Pagination not working for "next" & "last" pages

I can scrape pages 1-9 using the sitemap below, but I can't continue scraping past page 9. I've tested two other sitemaps: 1. putting all pages in the startUrl as https://www.urban.com.au/developments?page=[1-342] and removing pagination; 2. adding the "next" and "last" links to my pagination selector. Neither worked. Help would be much appreciated.

Url: https://www.urban.com.au/developments

Sitemap:
{"_id":"devs","startUrl":["https://www.urban.com.au/developments"],"selectors":[{"id":"details","type":"SelectorLink","selector":"li.node-readmore a","parentSelectors":["_root","pagination"],"multiple":true,"delay":"20"},{"id":"project","type":"SelectorText","selector":"span.current","parentSelectors":["details"],"multiple":false,"regex":"","delay":0},{"id":"suburb state","type":"SelectorText","selector":"span.views-field.views-field-field-location-taxonomize-terms a span","parentSelectors":["details"],"multiple":false,"regex":"","delay":0},{"id":"pagination","type":"SelectorLink","selector":"ul.pager li:nth-of-type(n+3) a","parentSelectors":["_root"],"multiple":true,"delay":"20"},{"id":"developer","type":"SelectorText","selector":"span.views-field.views-field-field-developer a span","parentSelectors":["details"],"multiple":false,"regex":"","delay":0}]}

Cheers,
Nils

Hi!

If you're using a page range in the start URL, the page numbers have to be exact. This particular website numbers its pages from 0, not 1. You can see this by clicking page 2 from the first page: the URL changes to page=1. The last page number is 341.

Try this URL in your sitemap:
https://www.urban.com.au/developments?page=[0-341]

Also note that the site uses Cloudflare DDoS protection. I'd recommend increasing Page Load Delay to 6 seconds, otherwise you will get blocked.

P.S. Don't forget to remove your pagination selector.
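
For reference, the adjusted sitemap might look roughly like the one below. This is only a sketch derived from the sitemap posted above, assuming nothing else needs to change: the start URL uses the zero-based range, the "pagination" selector is removed, and "details" is parented to "_root" only. The other selectors are copied unchanged.

```json
{
  "_id": "devs",
  "startUrl": ["https://www.urban.com.au/developments?page=[0-341]"],
  "selectors": [
    {"id": "details", "type": "SelectorLink", "selector": "li.node-readmore a", "parentSelectors": ["_root"], "multiple": true, "delay": "20"},
    {"id": "project", "type": "SelectorText", "selector": "span.current", "parentSelectors": ["details"], "multiple": false, "regex": "", "delay": 0},
    {"id": "suburb state", "type": "SelectorText", "selector": "span.views-field.views-field-field-location-taxonomize-terms a span", "parentSelectors": ["details"], "multiple": false, "regex": "", "delay": 0},
    {"id": "developer", "type": "SelectorText", "selector": "span.views-field.views-field-field-developer a span", "parentSelectors": ["details"], "multiple": false, "regex": "", "delay": 0}
  ]
}
```

The range [0-341] covers 342 pages, the same count as the [1-342] attempt, just shifted down by one so that it matches the site's numbering.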

Brilliant, this is working flawlessly now.

Cheers!
