Hello,
I’ve been trying to scrape the number of reviews per language for each attraction in a city from TripAdvisor, but I can’t seem to make any progress.
I am able to get the data for a single location:
{"_id":"tripadvisor_idiomas_poi_individual","startUrl":["https://www.tripadvisor.es/Attraction_Review-g187457-d675885-Reviews-La_Concha_Beach-San_Sebastian_Donostia_Province_of_Guipuzcoa_Basque_Country.html"],"selectors":[{"clickElementSelector":"button[aria-label='Español (España): Español (España) (4733)']","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","id":"dropdown","multiple":false,"parentSelectors":["_root"],"selector":"div[data-automation='WebPresentation_PoiReviewsAndQAWeb']","type":"SelectorElementClick"},{"id":"languages","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"ul.IIbRQ","type":"SelectorText"}]}
But when I try to make the scraper go through several location links (https://www.tripadvisor.es/Attractions-g187457-Activities-oa0-San_Sebastian_Donostia_Province_of_Guipuzcoa_Basque_Country.html), it doesn’t work, because the dropdown button selector is different for each location: it depends on the number of reviews in your language (Spanish in my case; you can see the 4733 above).
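For the step of going through several locations, one option is to collect the attraction URLs from the listing page first and then feed them to the scraper. Below is a minimal Python sketch of that step. It assumes that every attraction link on the listing page has the same `/Attraction_Review-g<geo>-d<id>-Reviews-<name>.html` shape as the La Concha URL above. `attraction_urls` is a hypothetical helper, and fetching the page (including any pagination through the `oa0` offset) is left out.

```python
import re
from typing import List
from urllib.parse import urljoin

# Assumed shape of an attraction link, inferred from the single URL in the sitemap above,
# e.g. /Attraction_Review-g187457-d675885-Reviews-La_Concha_Beach-<rest>.html
# Anything after ".html" (such as a "#REVIEWS" fragment) is not captured.
ATTRACTION_HREF = re.compile(r'href="(/Attraction_Review-g\d+-d\d+-Reviews-[^"#?]*\.html)')


def attraction_urls(listing_html: str, base: str = "https://www.tripadvisor.es") -> List[str]:
    """Hypothetical helper: return unique absolute attraction URLs in first-seen order."""
    seen = {}  # dicts keep insertion order, so this doubles as an ordered set
    for path in ATTRACTION_HREF.findall(listing_html):
        seen.setdefault(urljoin(base, path), None)
    return list(seen)
```

Each returned URL could then go into the `startUrl` array of the sitemap, which already accepts a list.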
So far I’ve tried extracting the HTML of the button that contains the review count. I would then need a regex to get the number of reviews for that particular location, which I don’t know how to write.
The returned HTML, which is the same for every location except for the number of reviews in parentheses, is:
`<button class="OKHdJ z Pc PQ Pp PD W _S Gn Z B2 BF _M PQFNM wSSLS" type="button" aria-haspopup="listbox" aria-label="Español (España): Español (España) (4733)"><div class="RCAPL u"><span class="biGQs _P vvmrG">Español (España)</span><span class="NK"><svg viewBox="0 0 25 24" width="20px" height="20px" class="d Vb UmNoP"><path fill-rule="evenodd" clip-rule="evenodd" d="M5.188 7.521l6.836 6.837 6.837-6.837 1.06 1.06-7.366 7.368a.75.75 0 01-1.061 0L4.127 8.582l1.06-1.06z"></path></svg></span></div></button>`
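For the regex part: once the button HTML has been extracted, the count can be read from the `aria-label` attribute. The visible `<span>` only says `Español (España)`, so the number is available only in that attribute. The following is a minimal Python post-processing sketch, not something Web Scraper does by itself. `LanguageButtonParser` and `review_count` are hypothetical names, and tolerating thousands separators is an assumption because the sample label has none.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional

# Trailing "(4733)" of a label such as "Español (España): Español (España) (4733)".
# Accepting "12.345" / "12,345" style separators is an assumption; the sample has none.
COUNT_RE = re.compile(r"\((\d{1,3}(?:[.,]\d{3})+|\d+)\)\s*$")


class LanguageButtonParser(HTMLParser):
    """Collects the aria-label of every <button aria-haspopup="listbox">."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # HTMLParser lower-cases attribute names
        label = attributes.get("aria-label")
        if tag == "button" and attributes.get("aria-haspopup") == "listbox" and label:
            self.labels.append(label)


def review_count(label: str) -> Optional[int]:
    """Return the number in the trailing parentheses, or None if there is none."""
    match = COUNT_RE.search(label)
    if match is None:
        return None
    return int(re.sub(r"[.,]", "", match.group(1)))


if __name__ == "__main__":
    button_html = ('<button type="button" aria-haspopup="listbox" '
                   'aria-label="Español (España): Español (España) (4733)">'
                   '<span>Español (España)</span></button>')
    parser = LanguageButtonParser()
    parser.feed(button_html)
    print(review_count(parser.labels[0]))  # 4733
```

The pattern is plain regex, so it may also be adaptable to the `regex` field that already appears in the `languages` selector, although that has not been checked here.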
Would there then be a way to insert that number into a child Element link selector so that the scraping works?
Does that make any sense at all? Any alternatives?
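As for alternatives, it may be possible to avoid knowing the number at all. CSS attribute selectors support prefix matching (`[attr^='value']`), so `button[aria-label^='Español (España)']` should match the dropdown button regardless of the count that follows. The Python sketch below only builds a sitemap dict that uses that selector together with a list of start URLs. It reuses the keys from the sitemap above. The `_id` value and the `build_sitemap` name are hypothetical. Whether Web Scraper's `clickElementSelector` accepts `^=` is an assumption that needs testing in the extension.

```python
import json
from typing import Any, Dict, List


def build_sitemap(start_urls: List[str], language_label: str = "Español (España)") -> Dict[str, Any]:
    """Hypothetical builder mirroring the single-location sitemap above.

    The only substantive change is the click selector: "^=" is CSS prefix matching,
    so the review count in parentheses no longer has to be hard-coded.
    Assumes language_label contains no single quote.
    """
    return {
        "_id": "tripadvisor_idiomas_poi_multi",  # hypothetical id
        "startUrl": list(start_urls),
        "selectors": [
            {
                "clickElementSelector": f"button[aria-label^='{language_label}']",
                "clickElementUniquenessType": "uniqueCSSSelector",
                "clickType": "clickOnce",
                "delay": 2000,
                "discardInitialElements": "do-not-discard",
                "id": "dropdown",
                "multiple": False,
                "parentSelectors": ["_root"],
                "selector": "div[data-automation='WebPresentation_PoiReviewsAndQAWeb']",
                "type": "SelectorElementClick",
            },
            {
                "id": "languages",
                "multiple": False,
                "parentSelectors": ["_root"],
                "regex": "",
                "selector": "ul.IIbRQ",
                "type": "SelectorText",
            },
        ],
    }


if __name__ == "__main__":
    urls = [
        "https://www.tripadvisor.es/Attraction_Review-g187457-d675885-Reviews-La_Concha_Beach-San_Sebastian_Donostia_Province_of_Guipuzcoa_Basque_Country.html",
    ]
    # ensure_ascii=False keeps "Español (España)" readable when pasting into the sitemap import
    print(json.dumps(build_sitemap(urls), ensure_ascii=False))
```

This sketch keeps everything the single-location sitemap already does and changes only the part that varied between locations. With that change, the `startUrl` array could in principle hold as many attraction URLs as needed.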
Any help would be very much appreciated. And thanks a lot for this magnificent tool.
