TripAdvisor hotel HTML links: can't go to the next page

plokiploki2018 · July 21, 2018, 2:41pm

Hi.

I'm trying to extract the URL of each hotel listed on TripAdvisor. Later, I'll post-process these addresses to extract more information about each hotel.

My start page is this one: https://www.tripadvisor.pt/Hotels-g189100-Portugal-Hotels.html

Each hotel name is a link, and that link is what I'm after.

I need to click "Seguinte" (Next) and extract each following page until the last one.

It seems to work while it is scraping, but when it finishes I only have 30 URLs.

{"_id":"trip","startUrl":["https://www.tripadvisor.pt/Hotels-g189100-Portugal-Hotels.html"],"selectors":[{"id":"quadros","type":"SelectorElement","selector":"div.prw_rup.prw_meta_hsx_responsive_listing","parentSelectors":["_root"],"multiple":true,"delay":"3000"},{"id":"link","type":"SelectorElementAttribute","selector":"a.property_title","parentSelectors":["quadros"],"multiple":false,"extractAttribute":"href","delay":0},{"id":"seguinte","type":"SelectorElementClick","selector":"div.prw_rup.prw_meta_hsx_responsive_listing","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"span.nav.next","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}

What am I doing wrong?
Can anybody help me, please?

Thank you in advance.
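For the post-processing step mentioned above, note that href values taken straight from the page's HTML may be relative paths rather than full URLs, so they need to be resolved against the site before they can be visited. A minimal Python sketch, assuming the scraped values are already in a list; the function name `absolutize` and the sample paths are hypothetical:

```python
from urllib.parse import urljoin

# Page the links were scraped from; relative hrefs are resolved against it.
START_URL = "https://www.tripadvisor.pt/Hotels-g189100-Portugal-Hotels.html"


def absolutize(hrefs, base=START_URL):
    """Turn scraped hrefs into absolute URLs, skipping blanks and duplicates."""
    resolved = (urljoin(base, h.strip()) for h in hrefs if h and h.strip())
    # dict.fromkeys removes repeats while keeping first-seen order.
    return list(dict.fromkeys(resolved))


if __name__ == "__main__":
    # Hypothetical sample values; real ones come from the scraper's export.
    sample = [
        "/Hotel_Review-example.html",
        "/Hotel_Review-example.html",
        "https://www.tripadvisor.pt/Hotel_Review-other.html",
        "",
    ]
    for url in absolutize(sample):
        print(url)
```

Already-absolute URLs pass through `urljoin` unchanged, so mixed input is safe.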

bretfeig · July 21, 2018, 7:24pm

Here you go! You only need one Element click selector. I used .nav:last as the click element selector and .main_col as the selector.

Everything else goes as a child of that.

{"_id":"trip","startUrl":["https://www.tripadvisor.pt/Hotels-g189100-Portugal-Hotels.html"],"selectors":[{"id":"seguinte","type":"SelectorElementClick","selector":".main_col","parentSelectors":["_root"],"multiple":true,"delay":"4000","clickElementSelector":".nav:last","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Hotel HREF","type":"SelectorElementAttribute","selector":".property_title","parentSelectors":["seguinte"],"multiple":false,"extractAttribute":"href","delay":0}]}
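The parent/child structure is easier to see when a sitemap is printed as a tree. In the question's sitemap, "link" hangs under "quadros", a plain element selector that runs once on the start page (hence the 30 URLs), while "seguinte" has no children, so nothing is extracted from the pages it clicks through. In the sitemap above, "Hotel HREF" sits under the click selector instead. A minimal Python sketch that prints this tree from a sitemap's `selectors` list; `selector_tree` is a hypothetical helper, not part of Web Scraper:

```python
import json


def selector_tree(sitemap):
    """Return the sitemap's selectors as an indented tree rooted at _root.

    Hypothetical helper: it only reads the "id" and "parentSelectors" fields.
    """
    children = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children.setdefault(parent, []).append(sel["id"])

    lines = []

    def walk(node, depth, path):
        for child in children.get(node, []):
            if child in path:  # selector listed as its own ancestor (recursion)
                lines.append("  " * depth + child + " (repeats)")
                continue
            lines.append("  " * depth + child)
            walk(child, depth + 1, path | {child})

    walk("_root", 0, {"_root"})
    return "\n".join(lines)


if __name__ == "__main__":
    # The question's sitemap, trimmed to the fields this helper reads.
    question = json.loads(
        '{"selectors":['
        '{"id":"quadros","parentSelectors":["_root"]},'
        '{"id":"link","parentSelectors":["quadros"]},'
        '{"id":"seguinte","parentSelectors":["_root"]}]}'
    )
    print(selector_tree(question))
    # quadros
    #   link
    # seguinte
```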

plokiploki2018 · July 22, 2018, 7:13pm

Thank you!!! You saved my life. I learned a lot from you.

webcom · April 12, 2019, 10:36am

Please help me too :)

{"_id":"tripsaranda","startUrl":["https://www.tripadvisor.ru/Restaurants-g303165-Saranda_Vlore_County.html"],"selectors":[{"id":"seguinte","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.pageNumbers","multiple":true,"delay":"4000","clickElementSelector":".nav:last","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"Name","type":"SelectorLink","parentSelectors":["_root"],"selector":"a.property_title","multiple":true,"delay":0}]}

It only scrapes the first page. I need all pages.

bretfeig · May 4, 2019, 10:12am

Here you go

{"_id":"tripadviser","startUrl":["https://www.tripadvisor.ru/Restaurants-g303165-Saranda_Vlore_County.html#EATERY_OVERVIEW_BOX"],"selectors":[{"id":"Element Click","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.listing","multiple":true,"delay":0,"clickElementSelector":"a.nav:contains(\"next\")","clickType":"clickMore","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueCSSSelector"},{"id":"Name","type":"SelectorLink","parentSelectors":["Element Click"],"selector":"a.property_title","multiple":false,"delay":0}]}

This will paginate through all pages and extract the name of each restaurant. You can add any other selectors you want as children of the Element click selector. If you want it to visit each listing, add a Link selector.

Note: do not tick "multiple" on the child selectors, as this is already covered by the main Element click selector, which acts as both a link selector and an element selector in one.
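The same rule explains why the earlier restaurant sitemap only scraped the first page: its "Name" selector has "_root" as its parent, so it runs once on the start page and never sees the listings the click selector gathers. A minimal Python sketch that lists the selectors not descending from a given Element click selector; `outside_subtree` is a hypothetical helper, not a Web Scraper feature:

```python
def outside_subtree(sitemap, click_id):
    """Return ids of selectors that do not descend from click_id.

    Hypothetical helper: such selectors never receive the elements the
    click selector collects, so they extract nothing from later pages.
    """
    parents = {s["id"]: s["parentSelectors"] for s in sitemap["selectors"]}

    def descends(sel_id, seen):
        for parent in parents.get(sel_id, []):
            if parent == click_id:
                return True
            # "seen" guards against selectors that list themselves as parents.
            if parent not in seen and descends(parent, seen | {parent}):
                return True
        return False

    return [sid for sid in parents if sid != click_id and not descends(sid, {sid})]


if __name__ == "__main__":
    # The earlier restaurant sitemap, trimmed to the fields this helper reads.
    asked = {"selectors": [
        {"id": "seguinte", "parentSelectors": ["_root"]},
        {"id": "Name", "parentSelectors": ["_root"]},
    ]}
    print(outside_subtree(asked, "seguinte"))  # ['Name']
```

An empty list means every data selector hangs under the click selector, as in the sitemap in this reply.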