Pagination scraping on non-sequential URLs

I'm looking to scrape a list of business directories across numerous pages, but the page URLs don't follow a numeric sequence. How can I scrape the data while moving on to the next page? I tried the pagination method, but it didn't work.

Url: https://www.sgpbusiness.com/browse/A/

So far I have only managed to scrape with the following sitemap:

{"_id":"company-a","startUrl":["https://www.sgpbusiness.com/browse/A/after/1118834/"],"selectors":[{"id":"company_link","type":"SelectorLink","parentSelectors":["_root"],"selector":"a.list-group-item","multiple":true,"delay":0},{"id":"wrapper","type":"SelectorElement","parentSelectors":["company_link"],"selector":"div[itemtype='https://schema.org/LocalBusiness']","multiple":false,"delay":0},{"id":"company_name","type":"SelectorText","parentSelectors":["wrapper"],"selector":"[itemprop='name'] a","multiple":false,"regex":"","delay":0},{"id":"wrapper-details","type":"SelectorElement","parentSelectors":["wrapper"],"selector":"div.list-group-item","multiple":true,"delay":0},{"id":"details","type":"SelectorText","parentSelectors":["wrapper-details"],"selector":"div.col-sm-9","multiple":true,"regex":"","delay":0}]}

The pagination works the same way as with a number sequence; you just need to use the "Next" button to iterate through the pages.

Just add a delay of around 6000 (milliseconds) to the sitemap when scraping, as the page takes a couple of seconds to render.

{"_id":"company-a","startUrl":["https://www.sgpbusiness.com/browse/A/after/1118834/"],"selectors":[{"id":"company_link","type":"SelectorLink","parentSelectors":["_root","pagination"],"selector":"a.list-group-item","multiple":true,"delay":0},{"id":"wrapper","type":"SelectorElement","parentSelectors":["company_link"],"selector":"div[itemtype='https://schema.org/LocalBusiness']","multiple":false,"delay":0},{"id":"company_name","type":"SelectorText","parentSelectors":["wrapper"],"selector":"[itemprop='name'] a","multiple":false,"regex":"","delay":0},{"id":"wrapper-details","type":"SelectorElement","parentSelectors":["wrapper"],"selector":"div.list-group-item","multiple":true,"delay":0},{"id":"details","type":"SelectorText","parentSelectors":["wrapper-details"],"selector":"div.col-sm-9","multiple":true,"regex":"","delay":0},{"id":"pagination","type":"SelectorLink","parentSelectors":["_root","pagination"],"selector":".next a","multiple":true,"delay":0}]}


Looks like it's working. I am trying out scraping the site now; it might take a while. May I ask how you managed to duplicate the pagination?