Tripadvisor: reviews by language dropdown

Hello,

I’ve been trying to scrape the number of reviews per language for each attraction in a city from Tripadvisor, but I’m stuck.

I am able to get the data for a single location:

{"_id":"tripadvisor_idiomas_poi_individual","startUrl":["https://www.tripadvisor.es/Attraction_Review-g187457-d675885-Reviews-La_Concha_Beach-San_Sebastian_Donostia_Province_of_Guipuzcoa_Basque_Country.html"],"selectors":[{"clickElementSelector":"button[aria-label='Español (España): Español (España) (4733)']","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","id":"dropdown","multiple":false,"parentSelectors":["_root"],"selector":"div[data-automation='WebPresentation_PoiReviewsAndQAWeb']","type":"SelectorElementClick"},{"id":"languages","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"ul.IIbRQ","type":"SelectorText"}]}

But when I try to make the scraper go through several location links (https://www.tripadvisor.es/Attractions-g187457-Activities-oa0-San_Sebastian_Donostia_Province_of_Guipuzcoa_Basque_Country.html), it doesn’t work, because the dropdown button selector is different for each location: it depends on the number of reviews in your language (Spanish in my case; you can see the 4733 above).

So far I’ve tried to extract the HTML code of the button that includes those reviews, which then would need a regex in order to get the number of reviews of that particular location (which I don’t know how to do).

So, the returned HTML (which is constant for every location except for the number of reviews between brackets) is:

<button class="OKHdJ z Pc PQ Pp PD W _S Gn Z B2 BF _M PQFNM wSSLS" type="button" aria-haspopup="listbox" aria-label="Español (España): Español (España) (4733)"><div class="RCAPL u"><span class="biGQs _P vvmrG">Español (España)</span><span class="NK"><svg viewBox="0 0 25 24" width="20px" height="20px" class="d Vb UmNoP"><path fill-rule="evenodd" clip-rule="evenodd" d="M5.188 7.521l6.836 6.837 6.837-6.837 1.06 1.06-7.366 7.368a.75.75 0 01-1.061 0L4.127 8.582l1.06-1.06z"></path></svg></span></div></button>

Would there then be a way to feed that number into a child Element click selector so that the scraper works?

Does that make any sense at all? Any alternatives?

Any help would be very much appreciated. And thanks a lot for this magnificent tool.

After you click the languages button, you can scrape the same dropdown repeatedly and pick out the review count for each language with a regex, e.g.

(?<=Inglés\()\d+(?=\))

Each regex anchors on the language name, so the order of the entries in the dropdown should not matter.
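If you want to check the pattern outside the scraper first, here is a minimal Python sketch of the same lookbehind/lookahead idea. The helper name `review_count` and the sample string are made up for illustration (only the 4733 comes from the post above), and it assumes the dropdown text comes back as name-then-count pairs like `Inglés(812)`, which is what the regex above expects.

```python
import re
from typing import Optional


def review_count(menu_text: str, language: str) -> Optional[int]:
    """Return the number in brackets right after `language`, or None."""
    # Escape the language name so "Español (España)" is matched literally,
    # then require "(" before the digits and ")" after them.
    pattern = rf"(?<={re.escape(language)}\()\d+(?=\))"
    match = re.search(pattern, menu_text)
    return int(match.group()) if match else None


# Hypothetical dropdown text; the counts other than 4733 are invented.
sample = "Todos los idiomas(6120)Español (España)(4733)Inglés(812)"

for name in ("Español (España)", "Inglés", "Todos los idiomas"):
    print(name, review_count(sample, name))
```

A language that is missing from the dropdown simply gives `None`, so a location with no reviews in that language should not break anything.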

Example sitemap; modify as needed:

{"_id":"tripadvisor-get-languages","startUrl":["https://www.tripadvisor.es/Attraction_Review-g187457-d675885-Reviews-La_Concha_Beach-San_Sebastian_Donostia_Province_of_Guipuzcoa_Basque_Country.html"],"selectors":[{"id":"Location","parentSelectors":["_root"],"type":"SelectorText","selector":"header h1","multiple":false,"regex":""},{"id":"Click languages button","parentSelectors":["_root"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":"div#tab-data-qa-reviews-0 > div:first-of-type div > div > div > div > div.C:nth-child(2) button","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":false,"selector":"div#tab-data-qa-reviews-0"},{"id":"Español ","parentSelectors":["_root"],"type":"SelectorText","selector":"div[data-menu=\"true\"]","multiple":false,"regex":"(?<=Español \\(España\\)\\()\\d+(?=\\))"},{"id":"Inglés","parentSelectors":["_root"],"type":"SelectorText","selector":"div[data-menu=\"true\"]","multiple":false,"regex":"(?<=Inglés\\()\\d+(?=\\))"},{"id":"Todos los idiomas","parentSelectors":["_root"],"type":"SelectorText","selector":"div[data-menu=\"true\"]","multiple":false,"regex":"(?<=Todos los idiomas\\()\\d+(?=\\))"}]}
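To cover several attractions and more languages without editing the JSON by hand, one option is to generate the sitemap with a small script. The sketch below is only an illustration: the function names are hypothetical, the selectors are copied from the sitemap above, and it assumes you have already collected the attraction URLs and list them all under `startUrl`.

```python
import json


def language_selector(language: str) -> dict:
    """Build one SelectorText entry that extracts the count for `language`."""
    # Escape brackets in the name so "Español (España)" is matched literally.
    escaped = language.replace("(", r"\(").replace(")", r"\)")
    return {
        "id": language,
        "parentSelectors": ["_root"],
        "type": "SelectorText",
        "selector": 'div[data-menu="true"]',
        "multiple": False,
        "regex": rf"(?<={escaped}\()\d+(?=\))",
    }


def build_sitemap(urls: list, languages: list) -> dict:
    """Assemble a sitemap with one start URL per attraction."""
    location = {
        "id": "Location",
        "parentSelectors": ["_root"],
        "type": "SelectorText",
        "selector": "header h1",
        "multiple": False,
        "regex": "",
    }
    click = {
        "id": "Click languages button",
        "parentSelectors": ["_root"],
        "type": "SelectorElementClick",
        "clickActionType": "real",
        "clickElementSelector": (
            "div#tab-data-qa-reviews-0 > div:first-of-type "
            "div > div > div > div > div.C:nth-child(2) button"
        ),
        "clickElementUniquenessType": "uniqueText",
        "clickType": "clickOnce",
        "delay": 2000,
        "discardInitialElements": "do-not-discard",
        "multiple": False,
        "selector": "div#tab-data-qa-reviews-0",
    }
    selectors = [location, click] + [language_selector(l) for l in languages]
    return {
        "_id": "tripadvisor-get-languages",
        "startUrl": list(urls),
        "selectors": selectors,
    }


# Append further attraction URLs to this list.
urls = [
    "https://www.tripadvisor.es/Attraction_Review-g187457-d675885-Reviews-"
    "La_Concha_Beach-San_Sebastian_Donostia_Province_of_Guipuzcoa_Basque_Country.html",
]
languages = ["Español (España)", "Inglés", "Todos los idiomas"]

sitemap = build_sitemap(urls, languages)
# ensure_ascii=False keeps the accented names readable in the output.
print(json.dumps(sitemap, ensure_ascii=False))
```

The printed JSON can be pasted into the import dialog. Since the click selector no longer contains the review count, the same sitemap should apply to every URL in the list, provided the page layout is the same for each attraction.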

NOTE: Close any pop-ups (login prompts, ads, notices, etc.) first, or the button click may not work.