Hello,
I am trying to scrape the reviews for the Morton Arboretum from TripAdvisor. In addition to the full review text, I would like to scrape the username, date of experience, bubble rating, and title. As of today, March 28, 2019, there are 868 reviews spread across 87 pages. My first approach was to paginate through all the pages and then open each review to scrape the full review, username, experience date, bubble rating, and title. The problem with this method is that, after the first page of reviews, the detailed review page lists the username as "TripAdvisor Member" instead of the actual username.
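For reference, 868 reviews at 10 per page works out to 87 pages (868 / 10 = 86.8, rounded up). Instead of clicking the "next" link, the page URLs could be listed up front in the sitemap's `startUrl` array. The sketch below generates them. It assumes 10 reviews per page and that TripAdvisor paginates by inserting an `-orN-` offset (N = 10, 20, …) after `Reviews` in the URL; both are assumptions, and `review_page_urls` is a hypothetical helper, not part of Web Scraper.

```python
import math

# ASSUMPTION: page 1 has no offset; later pages insert "-or10", "-or20", ...
# right after "Reviews" in the URL.
BASE = ("https://www.tripadvisor.com/Attraction_Review-g36269-d132786-"
        "Reviews{offset}-Morton_Arboretum-Lisle_DuPage_County_Illinois.html")


def review_page_urls(total_reviews: int, per_page: int = 10) -> list[str]:
    """Build one URL per review page (hypothetical helper)."""
    pages = math.ceil(total_reviews / per_page)  # 868 / 10 -> 87 pages
    urls = []
    for page in range(pages):
        offset = page * per_page
        # The first page has no "-orN" segment.
        segment = "" if offset == 0 else f"-or{offset}"
        urls.append(BASE.format(offset=segment))
    return urls


if __name__ == "__main__":
    urls = review_page_urls(868)
    print(len(urls))  # 87
    print(urls[1])
```

The resulting list could be pasted into `startUrl`, so each page is loaded directly and the usernames come from the listing pages rather than from the detailed review pages.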
The second method I have tried is to go through the 87 pages of reviews and use the Element click selector to expand the ‘more’ link, since TripAdvisor hides the complete review text behind it. So far I have only been able to expand the first couple of ‘more’ links.
Thoughts?
Scrape method 1:
{"_id":"march21mortonarboretum2","startUrl":["https://www.tripadvisor.com/Attraction_Review-g36269-d132786-Reviews-Morton_Arboretum-Lisle_DuPage_County_Illinois.html"],"selectors":[
{"id":"elementselector","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.ratings_and_types","multiple":true,"delay":0,"clickElementSelector":"div.mobile-more a.nav.next","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"},
{"id":"link_review","type":"SelectorLink","parentSelectors":["elementselector"],"selector":"a.title","multiple":true,"delay":0},
{"id":"wholereviewinsidewholereview","type":"SelectorText","parentSelectors":["link_review"],"selector":"span.fullText","multiple":false,"regex":"","delay":0},
{"id":"username","type":"SelectorText","parentSelectors":["elementselector"],"selector":"div.info_text div:nth-of-type(1)","multiple":false,"regex":"","delay":0},
{"id":"location","type":"SelectorText","parentSelectors":["elementselector"],"selector":"strong","multiple":false,"regex":"","delay":0},
{"id":"revieweddate","type":"SelectorText","parentSelectors":["elementselector"],"selector":"span.ratingDate","multiple":false,"regex":"","delay":0},
{"id":"rating","type":"SelectorElementAttribute","parentSelectors":["elementselector"],"selector":"span.ui_bubble_rating","multiple":false,"extractAttribute":"class","delay":0},
{"id":"contributionsnumber","type":"SelectorText","parentSelectors":["elementselector"],"selector":"span.badgetext:nth-of-type(2)","multiple":false,"regex":"","delay":0},
{"id":"helpfulvotes","type":"SelectorText","parentSelectors":["elementselector"],"selector":"span.badgetext:nth-of-type(4)","multiple":false,"regex":"","delay":0}
]}
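The `rating` selector in method 1 stores the `class` attribute of `span.ui_bubble_rating` rather than a number, so the scraped value needs a small post-processing step. The sketch below assumes the class string looks like `ui_bubble_rating bubble_40`, where the digits are ten times the number of bubbles (so `bubble_35` would be 3.5). That format is an assumption about TripAdvisor's markup, and `bubble_rating` is a hypothetical helper.

```python
import re
from typing import Optional

# ASSUMPTION: the rating is encoded as a "bubble_NN" class, NN = 10 x bubbles.
_BUBBLE_CLASS = re.compile(r"\bbubble_(\d+)\b")


def bubble_rating(class_attr: str) -> Optional[float]:
    """Turn a class string like 'ui_bubble_rating bubble_40' into 4.0.

    Returns None when no bubble_NN class is present.
    """
    match = _BUBBLE_CLASS.search(class_attr)
    if match is None:
        return None
    return int(match.group(1)) / 10


if __name__ == "__main__":
    for raw in ["ui_bubble_rating bubble_40",
                "ui_bubble_rating bubble_35",
                "ui_bubble_rating"]:
        print(raw, "->", bubble_rating(raw))
```

The word boundary before `bubble_` keeps the pattern from matching inside `ui_bubble_rating`, and returning `None` makes rows without a rating easy to spot after export.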
Scrape method 2: a shorter sitemap to test clicking the ‘more’ links
{"_id":"march27morton5","startUrl":["https://www.tripadvisor.com/Attraction_Review-g36269-d132786-Reviews-Morton_Arboretum-Lisle_DuPage_County_Illinois.html"],"selectors":[
{"id":"clickmore","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.rev_wrap","multiple":true,"delay":"300","clickElementSelector":"div.prw_rup p.partial_entry span.taLnk","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"},
{"id":"review","type":"SelectorText","parentSelectors":["clickmore"],"selector":"div.entry","multiple":true,"regex":"","delay":"300"}
]}