Tripadvisor mail pagination

Hello Everyone,

I have been trying to scrape restaurants' emails from Tripadvisor. So far, I have managed to scrape the first 30 emails, corresponding to the 30 restaurants on the first page. Nothing is scraped from pages 2, 3 and so on.

There are similar posts in the forum and I've tested their solutions, but they don't seem to work. They all seem to run at first, but after a while they stop and nothing is scraped.

Thank you for helping me out.

PS: Is there a way to limit the number of pages scraped? Let's say to 15, for example?

Start URL : https://www.tripadvisor.com/Restaurants-g187514-Madrid.html

Sitemap: This one extracts the first 30 restaurants from the first page.

{"_id":"tripmadrid","startUrl":["https://www.tripadvisor.com/Restaurants-g187514-Madrid.html"],"selectors":[{"id":"Restaurant","type":"SelectorLink","parentSelectors":["_root"],"selector":"a._15_ydu6b","multiple":true,"delay":0},{"id":"mail","type":"SelectorLink","parentSelectors":["Restaurant"],"selector":"div._36TL14Jn:nth-of-type(2) a","multiple":false,"delay":0}]}

Tripadvisor's URL does change with each page. For example, pages 2 to 4 are:
https://www.tripadvisor.com/Restaurants-g187514-oa30-Madrid.html#EATERY_LIST_CONTENTS

https://www.tripadvisor.com/Restaurants-g187514-oa60-Madrid.html#EATERY_LIST_CONTENTS

https://www.tripadvisor.com/Restaurants-g187514-oa90-Madrid.html#EATERY_LIST_CONTENTS

So you don't really need a paginator; you can get the first 4 pages with these start URLs:
https://www.tripadvisor.com/Restaurants-g187514-Madrid.html
https://www.tripadvisor.com/Restaurants-g187514-oa[30-90:30]-Madrid.html#EATERY_LIST_CONTENTS
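To limit the scrape to a fixed number of pages (15, say), you only need to change the upper bound of the range. Here is a rough Python sketch of the arithmetic, assuming every page lists 30 restaurants and that the `oa` offset keeps growing by 30 beyond the pages shown above (I have only checked pages 2 to 4). The names `start_urls` and `range_url` are made up for illustration; they are not part of Web Scraper.

```python
# Sketch: derive Tripadvisor listing URLs for the first N pages.
# Assumption: 30 restaurants per page, offset in the URL grows by 30 per page.
BASE = "https://www.tripadvisor.com/Restaurants-g187514-Madrid.html"
PAGED = ("https://www.tripadvisor.com/Restaurants-g187514-oa{offset}"
         "-Madrid.html#EATERY_LIST_CONTENTS")
PER_PAGE = 30


def start_urls(max_pages):
    """Return one explicit URL per page, for pages 1..max_pages."""
    urls = [BASE] if max_pages >= 1 else []
    for page in range(2, max_pages + 1):
        # Page 2 has offset 30, page 3 has offset 60, and so on.
        urls.append(PAGED.format(offset=(page - 1) * PER_PAGE))
    return urls


def range_url(max_pages):
    """Return the single range-style start URL covering pages 2..max_pages."""
    if max_pages < 2:
        raise ValueError("a range is only needed from page 2 onwards")
    last = (max_pages - 1) * PER_PAGE
    return PAGED.format(offset=f"[{PER_PAGE}-{last}:{PER_PAGE}]")


if __name__ == "__main__":
    print(range_url(15))
    for url in start_urls(4):
        print(url)
```

For 15 pages the last offset is 14 × 30 = 420, so the second start URL would use `oa[30-420:30]` in place of `oa[30-90:30]`, with the plain first-page URL kept as the other start URL.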

Ref: Tripadvisor paginator
Also see: the "Specify multiple URLs with ranges" method.

Thank you for the information and the links. I'm just discovering scraping, so I hope I'll manage to apply your tips!


Update: I finally managed to get the info I was interested in.

Hey Correlations,

Can you share the sitemap, please?