Pagination - won't scrape all pages :(

Hey guys, I'm trying to scrape all 1267 locations (site name and address) listed at blinkcharging. I created a sitemap to paginate through all the pages, but it only outputs about 550 results. I've worked through some examples to no avail, and I'm not sure what I'm doing wrong. Any help would be greatly appreciated. Thanks!

Url: https://prod.blinknetwork.com/blinkMap.html?lat=36.70365959719456&lng=-97.9541015625&z=4#

Sitemap:
{"_id":"blink2","startUrl":["https://prod.blinknetwork.com/blinkMap.html?lat=36.70365959719456&lng=-97.9541015625&z=4#"],"selectors":[{"id":"pagination","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div#leftColumn","multiple":true,"delay":"200","clickElementSelector":"li.simplePagerNav:nth-of-type(8) a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueHTMLText"},{"id":"site","type":"SelectorElement","parentSelectors":["pagination"],"selector":"div.addressBody","multiple":true,"delay":0},{"id":"info","type":"SelectorText","parentSelectors":["site"],"selector":"parent","multiple":false,"regex":"","delay":0}]}

Hey,

Firstly, a 0.2 second delay is a bit risky: the page might not finish rendering in that time, which can cause your scraping job to crash. Usually, 1.5-2 s is a good range.

Secondly, you selected the 'Next' button with the :nth-of-type(8) selector, which works fine on the first page. As the pages progress, though, the 'Next' button is no longer in the 8th position, so the click lands on whatever element now occupies that position. In this case that is a page number, which causes pages to be skipped. To avoid this, use the a:contains('Next') selector for the click element: it only clicks an element whose text contains 'Next', regardless of where the button is positioned.
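
To see why a positional selector skips pages, here is a rough Python simulation. The pager layout in it (a 'Prev' link that appears from page 2 on, a window of 7 page numbers, 20 pages in total) is made up for illustration and is not the actual markup of the Blink map. The crawl loop is also a simplification, not how Web Scraper itself decides what to click.

```python
def pager_items(current, total, window=7):
    """Labels of a hypothetical pager as shown on page `current`."""
    start = max(1, min(current - 3, total - window + 1))
    numbers = [str(n) for n in range(start, min(start + window, total + 1))]
    items = (["Prev"] if current > 1 else []) + numbers
    if current < total:
        items.append("Next")
    return items


def crawl(total, pick):
    """Follow the pager from page 1, using `pick` to choose what to click."""
    visited = []
    current = 1
    while current not in visited:
        visited.append(current)
        label = pick(pager_items(current, total))
        if label is None:
            break  # nothing left to click
        if label == "Next":
            current += 1
        elif label == "Prev":
            current -= 1
        else:
            current = int(label)  # clicked a page number, so we jump there
    return visited


def pick_8th(items):
    # Mimics li:nth-of-type(8): always take the 8th item, whatever it is.
    return items[7] if len(items) > 7 else None


def pick_next(items):
    # Mimics a:contains('Next'): take the item labelled 'Next', if present.
    return "Next" if "Next" in items else None


TOTAL_PAGES = 20  # assumed page count for the demo
by_position = crawl(TOTAL_PAGES, pick_8th)
by_text = crawl(TOTAL_PAGES, pick_next)

print("nth-of-type(8) visits:", by_position)
print("contains('Next') visits:", by_text)
```

In this toy layout the positional pick reaches only a handful of the 20 pages, because from page 2 onward the 8th item is a page number. The text-based pick walks through every page in order.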

Here is a sitemap that works:

{"_id":"blink2","startUrl":["https://prod.blinknetwork.com/blinkMap.html?lat=36.70365959719456&lng=-97.9541015625&z=4#"],"selectors":[{"id":"pagination","type":"SelectorElementClick","parentSelectors":["_root"],"selector":".locationDetails","multiple":true,"delay":"1500","clickElementSelector":"li.simplePagerNav a:contains('Next')","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"},{"id":"name","type":"SelectorText","parentSelectors":["pagination"],"selector":".name","multiple":false,"regex":"","delay":0},{"id":"address","type":"SelectorText","parentSelectors":["pagination"],"selector":".addressBody","multiple":false,"regex":"","delay":0}]}
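
If you want to check a sitemap for these two issues before running it, a small script along these lines could help. This is only a sketch: audit_sitemap is a hypothetical helper, not part of Web Scraper, the 1500 ms threshold is just the rule of thumb from this reply, and only the pagination selectors from the two sitemaps in this thread are included to keep it short.

```python
import json
import re

MIN_DELAY_MS = 1500  # rule-of-thumb lower bound from this thread, not an official limit


def audit_sitemap(sitemap_json):
    """Return warnings for the click selectors found in a sitemap JSON string."""
    warnings = []
    for sel in json.loads(sitemap_json)["selectors"]:
        if sel.get("type") != "SelectorElementClick":
            continue
        delay = int(sel.get("delay") or 0)  # delay may be stored as a string
        if delay < MIN_DELAY_MS:
            warnings.append(f"{sel['id']}: delay {delay} ms is below {MIN_DELAY_MS} ms")
        click = sel.get("clickElementSelector", "")
        if re.search(r":nth-(of-type|child)\(", click):
            warnings.append(f"{sel['id']}: click selector is positional: {click}")
    return warnings


def pagination_only(delay, click_selector):
    """Build a cut-down sitemap holding just the pagination selector."""
    return json.dumps({
        "_id": "blink2",
        "selectors": [{
            "id": "pagination",
            "type": "SelectorElementClick",
            "delay": delay,
            "clickElementSelector": click_selector,
        }],
    })


original = pagination_only("200", "li.simplePagerNav:nth-of-type(8) a")
fixed = pagination_only("1500", "li.simplePagerNav a:contains('Next')")

for name, sitemap in (("original", original), ("fixed", fixed)):
    print(name, audit_sitemap(sitemap))
```

The original pagination selector gets flagged for both the short delay and the positional click selector, while the fixed one produces no warnings.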


Thanks so much, webber! Appreciate it.