Hi there,
I'm trying to scrape customer reviews for restaurants on Yelp. My scraper partly works, but it stops well before it has scraped all the reviews for each restaurant. I'm also getting duplicates of each review in my CSV.
Process:
Start on the Yelp search results page for "Restaurants in Montreal" (start URL):
https://www.yelp.ca/search?cflt=restaurants&find_loc=Montreal%2C+QC%2C+Canada
Follow each restaurant link from the search results page to scrape the customer review data.
On the restaurant page, I created a review-wrapper element selector so I can pull the same fields from every review, plus a simple text selector for the name of the shop.
I need the scraper to paginate through every page of the restaurant search results to collect all the restaurant links, and also to paginate within each restaurant page to reach every review for that restaurant.
Where am I going wrong? I appreciate any insight and am happy to clarify if anything isn't clear with my process.
Here's my sitemap:
{"_id":"mtl-rests-reviews","startUrl":["https://www.yelp.ca/search?cflt=restaurants&find_loc=Montreal%2C+QC%2C+Canada"],"selectors":[{"id":"shop","type":"SelectorLink","parentSelectors":["_root","pages"],"selector":"li:nth-of-type(n+8) .css-1pxmz4g a","multiple":true,"delay":0},{"id":"shop-name (key)","type":"SelectorText","parentSelectors":["shop"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"review-wrapper","type":"SelectorElement","parentSelectors":["shop","review_pages"],"selector":"div.review__373c0__13kpL","multiple":true,"delay":0},{"id":"customer-name","type":"SelectorText","parentSelectors":["review-wrapper"],"selector":"a.css-166la90","multiple":false,"regex":"","delay":0},{"id":"customer-location","type":"SelectorText","parentSelectors":["review-wrapper"],"selector":"span.css-n6i4z7","multiple":false,"regex":"","delay":0},{"id":"rating","type":"SelectorElementAttribute","parentSelectors":["review-wrapper"],"selector":"div.i-stars__373c0__1T6rz","multiple":false,"extractAttribute":"aria-label","delay":0},{"id":"review-date","type":"SelectorText","parentSelectors":["review-wrapper"],"selector":"span.css-e81eai","multiple":false,"regex":"","delay":0},{"id":"review-text","type":"SelectorText","parentSelectors":["review-wrapper"],"selector":"span.raw__373c0__3rcx7","multiple":false,"regex":"","delay":0},{"id":"pages","type":"SelectorLink","parentSelectors":["_root"],"selector":"a.pagination-link-component__09f24__H0ICg","multiple":true,"delay":0},{"id":"review_pages","type":"SelectorLink","parentSelectors":["_root","shop"],"selector":"a.pagination-link-component__373c0__1fUdr","multiple":true,"delay":0}]}