I've built a scrape that gets all the data I want from Yelp, but I would also like to limit how much data is scraped. The pagination works a little differently in the URL: the start parameter is an offset rather than a page number, so start=0 refers to the start of the reviews (i.e. the first 10 reviews are at start=0).
I've tried using a range in the start URL to scrape only the first 100 reviews (start=[0-100]), but it doesn't work as hoped. Any suggestions?
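To make clear which pages I'm after, here is a rough Python sketch of the URLs I'd expect for the first 100 results. It assumes each page holds 10 results, so the offsets are 0, 10, ..., 90. The `paged_urls` helper and the `page_size` value are my own assumptions for illustration, not anything Yelp or the scraper documents.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.yelp.com/search"


def paged_urls(limit=100, page_size=10):
    """Build one search URL per page, covering offsets below `limit`.

    page_size=10 is an assumption about how many results a page holds.
    """
    urls = []
    for start in range(0, limit, page_size):
        params = {
            "find_desc": "Eyewear & Opticians",
            "find_loc": "Salt Lake City, UT",
            "start": start,  # offset of the first result on the page
            "sortby": "rating",
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


urls = paged_urls()
for url in urls:
    print(url)
```

A plain start=[0-100] range would presumably step through every offset one at a time (0, 1, 2, ...) instead of page by page, which may be why it doesn't behave as hoped. If the scraper's range syntax supports an increment, something like start=[0-90:10] might match the list above, but I haven't confirmed that.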
Sitemap:
{"_id":"yelp","startUrl":["https://www.yelp.com/search?find_desc=Eyewear+%26+Opticians&find_loc=Salt+Lake+City,+UT&start=0&sortby=rating"],"selectors":[{"id":"business-name","type":"SelectorLink","selector":"span.indexed-biz-name a.biz-name","parentSelectors":["next"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"h1.biz-page-title","parentSelectors":["business-name"],"multiple":false,"regex":"","delay":0},{"id":"website","type":"SelectorText","selector":"span.biz-website a","parentSelectors":["business-name"],"multiple":false,"regex":"","delay":0},{"id":"next","type":"SelectorElementClick","selector":"div.clearfix.scroll-map-container div.column.column-alpha","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"a.u-decoration-none.next span.icon","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"stars","type":"SelectorHTML","selector":"div.biz-rating.biz-rating-very-large.clearfix > div","parentSelectors":["business-name"],"multiple":false,"regex":"","delay":0},{"id":"review-count","type":"SelectorText","selector":"div.rating-info span.review-count","parentSelectors":["business-name"],"multiple":false,"regex":"","delay":0}]}