Impossible to scrape email in TripAdvisor

Hi there,
I've run into a problem with the sitemap below.
To reproduce it, open the link below, inspect the page, then import the sitemap. Go to exploreur/Fiche/ and you will see what I mean.

On the page at the URL below, an Email link is displayed. When you click it, a popup opens and shows the contact's correct email address.
But I can't find a way to scrape that email. Usually, in a similar case, I use the "link" option, but Web Scraper doesn't seem to recognize this email link as a link.
Only the popup link selector returns data, but the href is "null".

Any thoughts on how to get that email? Is TripAdvisor starting to protect its emails? Could a new feature be developed to get around that?

Thank you very much for your advice,

Url: https://www.tripadvisor.fr/Attraction_Review-g187147-d1600380-Reviews-La_Cuisine_Paris_Cooking_Classes-Paris_Ile_de_France.html

Sitemap:
{"_id":"tripadvisor_atelier","startUrl":["https://www.tripadvisor.fr/Attractions-g187147-Activities-c41-Paris_Ile_de_France.html","https://www.tripadvisor.fr/Attractions-g187139-Activities-c41-Corsica.html","https://www.tripadvisor.fr/Attractions-g187079-Activities-c41-Bordeaux_Gironde_Nouvelle_Aquitaine.html","https://www.tripadvisor.fr/Attractions-g187265-Activities-c41-Lyon_Rhone_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g187234-Activities-c41-Nice_French_Riviera_Cote_d_Azur_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g187253-Activities-c41-Marseille_Bouches_du_Rhone_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g187175-Activities-c41-Toulouse_Haute_Garonne_Occitanie.html","https://www.tripadvisor.fr/Attractions-g187178-Activities-c41-Lille_Nord_Hauts_de_France.html","https://www.tripadvisor.fr/Attractions-g187153-Activities-c41-Montpellier_Herault_Occitanie.html","https://www.tripadvisor.fr/Attractions-g187075-Activities-c41-Strasbourg_Bas_Rhin_Grand_Est.html","https://www.tripadvisor.fr/Attractions-g187209-Activities-c41-Aix_en_Provence_Bouches_du_Rhone_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g187198-Activities-c41-Nantes_Loire_Atlantique_Pays_de_la_Loire.html","https://www.tripadvisor.fr/Attractions-g187221-Activities-c41-Cannes_French_Riviera_Cote_d_Azur_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g187217-Activities-c41-Antibes_French_Riviera_Cote_d_Azur_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g13320446-Activities-c41-Carcassonne_Aude_Occitanie.html","https://www.tripadvisor.fr/Attractions-g187206-Activities-c41-La_Rochelle_Charente_Maritime_Nouvelle_Aquitaine.html","https://www.tripadvisor.fr/Attractions-g187212-Activities-c41-Avignon_Vaucluse_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g187257-Activities-c41-Toulon_Var_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g187264-Activities-c41-Grenoble_Isere_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g187137-Activities-c41-Reims_Marne_Grand_Est.html","https://www.tripadvisor.fr/Attractions-g187091-Activities-c41-Clermont_Ferrand_Puy_de_Dome_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g187111-Activities-c41-Dijon_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attractions-g187148-Activities-c41-Versailles_Yvelines_Ile_de_France.html","https://www.tripadvisor.fr/Attractions-g187109-Activities-c41-Beaune_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attractions-g187130-Activities-c41-Tours_Indre_et_Loire_Centre_Val_de_Loire.html","https://www.tripadvisor.fr/Attractions-g187226-Activities-c41-Hyeres_French_Riviera_Cote_d_Azur_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g187080-Activities-c41-Biarritz_Basque_Country_Pyrenees_Atlantiques_Nouvelle_Aquitaine.html","https://www.tripadvisor.fr/Attractions-g580167-Activities-c41-Ile_d_Oleron_Charente_Maritime_Nouvelle_Aquitaine.html","https://www.tripadvisor.fr/Attractions-g187151-Activities-c41-Carcassonne_Center_Carcassonne_Aude_Occitanie.html","https://www.tripadvisor.fr/Attractions-g196484-Activities-c41-Porto_Vecchio_Corse_du_Sud_Corsica.html","https://www.tripadvisor.fr/Attractions-g207359-Activities-c41-Morzine_Portes_du_Soleil_Haute_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g187269-Activities-c41-Saint_Etienne_Loire_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g669639-Activities-c41-Saint_Martin_de_Belleville_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g196504-Activities-c41-Anglet_Basque_Country_Pyrenees_Atlantiques_Nouvelle_Aquitaine.html","https://www.tripadvisor.fr/Attractions-g196714-Activities-c41-Tignes_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g187256-Activities-c41-Saint_Remy_de_Provence_Bouches_du_Rhone_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g187157-Activities-c41-Uzes_Gard_Occitanie.html","https://www.tripadvisor.fr/Attractions-g187266-Activities-c41-Megeve_Haute_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g187271-Activities-c41-Val_d_Isere_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g187262-Activities-c41-Courchevel_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g187224-Activities-c41-Grasse_French_Riviera_Cote_d_Azur_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g641840-Activities-c41-Capbreton_Landes_Nouvelle_Aquitaine.html","https://www.tripadvisor.fr/Attractions-g1080950-Activities-c41-Les_Allues_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g580182-Activities-c41-Meribel_Les_Allues_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g196716-Activities-c41-Val_Thorens_Saint_Martin_de_Belleville_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g562740-Activities-c41-Bourg_Saint_Maurice_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g196712-Activities-c41-Samoens_Grand_Massif_Haute_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g196707-Activities-c41-Les_Deux_Alpes_Isere_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g187272-Activities-c41-Valence_Drome_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g196672-Activities-c41-Bandol_French_Riviera_Cote_d_Azur_Provence_Alpes_Cote_d_Azur.html","https://www.tripadvisor.fr/Attractions-g425002-Activities-c41-Hossegor_Landes_Nouvelle_Aquitaine.html","https://www.tripadvisor.fr/Attractions-g13231714-Activities-c41-Val_Cenis_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g1056015-Activities-c41-Saint_Gilles_Croix_de_Vie_Vendee_Pays_de_la_Loire.html","https://www.tripadvisor.fr/Attractions-g1933300-Activities-c41-Abondance_Portes_du_Soleil_Haute_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g227893-Activities-c41-Montrouge_Hauts_de_Seine_Ile_de_France.html","https://www.tripadvisor.fr/Attractions-g562738-Activities-c41-Avoriaz_Morzine_Portes_du_Soleil_Haute_Savoie_Auvergne_Rhone_Alpes.html","https://www.tripadvisor.fr/Attractions-g790302-Activities-c41-Vincennes_Val_de_Marne_Ile_de_France.html","https://www.tripadvisor.fr/Attractions-g226885-Activities-c41-Louviers_Eure_Haute_Normandie_Normandy.html"],"selectors":[{"id":"Exploreur","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.listing_title a","multiple":true,"delay":0,"clickElementSelector":"a.nav","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"fiche","type":"SelectorLink","parentSelectors":["Exploreur"],"selector":"parent","multiple":true,"delay":0},{"id":"Nom","type":"SelectorText","parentSelectors":["fiche"],"selector":"h1.ui_header","multiple":false,"regex":"","delay":0},{"id":"Popularité","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.header_popularity","multiple":false,"regex":"","delay":0},{"id":"Adresse1","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.detail span.street-address","multiple":false,"regex":"","delay":0},{"id":"Adresse2","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.detail span.locality","multiple":false,"regex":"","delay":0},{"id":"Adresse3","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.detail span.country-name","multiple":false,"regex":"","delay":0},{"id":"Types","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.is-hidden-mobile","multiple":false,"regex":"","delay":0},{"id":"Urls","type":"SelectorPopupLink","parentSelectors":["fiche"],"selector":"div.contactType.website span.taLnk","multiple":false,"delay":0},{"id":"Emails","type":"SelectorPopupLink","parentSelectors":["fiche"],"selector":"div.contactType.email span.taLnk","multiple":false,"delay":"0"}]}

Where is the email on the page? Can you share a screenshot?

Not sure about this one. Clicking Email runs what I think is a JavaScript function that calls an open-window function. Not sure how to scrape it. Anton? @iconoclast

Hi there,
Thank you Bretfreig and nctbrtc,

Sorry, the link I shared was the root of the sitemap. Here is the right page, where you can see the Email link at the bottom of the content box on the right.

I tried to share a screenshot, but it wouldn't upload.
It works now:

[screenshot]

Yes, it does call an open-window function. I used to be able to scrape that kind of link on other websites, but here I'm struggling.

If anybody has a solution, I would be very happy :slight_smile:

Thank you

Hello there!

Have you tried clicking the E-mail button? Does it return anything? It doesn't work for me at all.

Hello Iconoclast,

Thanks for your answer.
Yes, I tried Element Click.
I also tried using a Link selector as a child of the Element Click, but it always returns null.

Maybe something is hiding the real email?

I don't know...

Hi there @iconoclast,

Here is some news. I discussed it with my developer.
Here is what we found out:

  1. The address is indeed not in the page's code.
  2. TripAdvisor uses JavaScript to open a window that redirects to your default mail manager.
  3. The contact's email appears in the URL of your mail manager (see screenshot).
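Point 3 means the address only travels inside a `mailto:` URL. Once that URL is available as text (for example, copied from the mail manager's address bar), extracting the address is plain string parsing. Below is a minimal Python sketch; `email_from_mailto` is a hypothetical helper, and the branch for a webmail URL that wraps the original link in a `url` query parameter is an assumption to check against what your own mail manager actually shows:

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit


def email_from_mailto(url: str) -> Optional[str]:
    """Return the address inside a mailto: URL, or None if none is found."""
    parts = urlsplit(url.strip())
    if parts.scheme.lower() == "mailto":
        # For mailto: URLs the address is the "path"; ?subject=... is the query.
        return unquote(parts.path) or None
    # Assumption: a webmail handler may wrap the original link in a "url"
    # query parameter, e.g. ...?extsrc=mailto&url=mailto%3Acontact%40example.com
    wrapped = parse_qs(parts.query).get("url")
    if wrapped:
        return email_from_mailto(wrapped[0])
    return None


if __name__ == "__main__":
    print(email_from_mailto("mailto:contact@example.com?subject=Hello"))
```

`parse_qs` already percent-decodes the wrapped value, so the recursive call sees a plain `mailto:` link.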

Now the question is: with Web Scraper, is it possible to get the URL (http) from a website that uses JavaScript to redirect to a different link? If yes, how? And if not, is an update for it coming soon? :slight_smile:

For example, here I tried using a click selector on Email; it then opens my Gmail, in the same window, with the address. So I tried using a Text selector as a child, but it does not recognize the parent selector (the JavaScript email redirect link).

A new feature to implement?

Thank you for your help !
Have a nice one

It can be done with a Tampermonkey script. I need some time to write one.

I don't know much about scripting, but I found this; I don't know if it could be of any help (apparently it's a script to block pop-ups): https://greasyfork.org/fr/scripts/37654-popup-blocker-script

It's very interesting.
We could make an application that stores the parameters it is launched with in a .txt file, and set that application up as the default mail manager.
Or just use a .bat file and compile it with bat2exe.
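The "default mail manager" idea boils down to a tiny program that the browser or OS launches with the clicked `mailto:` link as an argument, and that appends that argument to a text file instead of opening a mail client. Below is a minimal Python sketch of that program. The file name `captured_mailto.txt` is hypothetical, and registering the script (or a compiled executable) as the default mailto: handler is OS-specific and not shown:

```python
import sys
from pathlib import Path

# Hypothetical output file name; adjust to taste.
LOG_FILE = Path("captured_mailto.txt")


def record(args, log_file=LOG_FILE):
    """Append each argument (e.g. 'mailto:contact@example.com') as one line."""
    with log_file.open("a", encoding="utf-8") as fh:
        for arg in args:
            fh.write(arg + "\n")


if __name__ == "__main__":
    # When registered as the mailto: handler, the clicked link is expected
    # to arrive as a command-line argument; do nothing if there is none.
    if len(sys.argv) > 1:
        record(sys.argv[1:])
```

Run by hand as `python capture_mailto.py "mailto:contact@example.com"` (script name hypothetical), it appends one line to the file, which can then be matched back to the scraped rows.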

Any news on Web Scraper being able to get these emails from the URLs?

Thanks in advance for your help :slight_smile:

On my PC, I have a separate instance of Chrome used only for scraping. It does not have any mail program set up, so if I click on any email or mailto: link, nothing happens.

For the TripAdvisor example above, I clicked on the Email button and nothing happened on screen. But checking the inspector, I found that the server returns the email as an href in the HTML:

Once the email appears, it is only a matter of scraping it with Element Attribute selector:

[Screenshot: 2019-09-18_212152]

Settings I used:
Type: Element Attribute
Selector: div.attractions-contact-card-ContactCard__contactCard--zMh3H > div:nth-child(2) > div:nth-child(3) > div > div > a
Attribute: href
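For anyone who saves the page HTML after clicking Email and wants to do the same extraction outside Web Scraper, here is a minimal Python sketch of what those settings do: take the `href` of each `<a>` and keep the `mailto:` ones. It uses only the standard-library `html.parser`; `MailtoCollector` and `extract_mailto` are hypothetical names. It cannot click anything, so the email link must already be present in the HTML you feed it:

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Collect addresses from <a href="mailto:..."> links."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the "mailto:" scheme and any ?subject=... part.
            self.emails.append(href[len("mailto:"):].split("?", 1)[0])


def extract_mailto(html):
    parser = MailtoCollector()
    parser.feed(html)
    parser.close()
    return parser.emails
```

Unlike the CSS selector above, this does not depend on the generated class name (`...ContactCard__contactCard--zMh3H`), which may change whenever the site is rebuilt.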

Of course, you would need to figure out the site navigation and button clicking first.

Thank you so much @leemeng for this update,

Very interesting... You gave me some hope :smiley:
But... it does not work for me for some reason :confused:

Where did you get this selector "div.attractions-contact-card-ContactCard__contactCard--zMh3H > div:nth-child(2) >"? When I inspect the Email element, here is what I get:

Here is a sample of the data I can get:
Email returns a "null" result with the selector you suggested.


{"_id":"tripadvisor_test","startUrl":["https://www.tripadvisor.fr/Attractions-g11038854-Activities-c41-Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attractions-g2349678-Activities-c56-t272-Haute_Garonne_Occitanie.html","https://www.tripadvisor.fr/Attractions-g187153-Activities-c56-Montpellier_Herault_Occitanie.html"],"selectors":[{"id":"Exploreur","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.listing_details","multiple":true,"delay":"5000","clickElementSelector":"a.next:last","clickType":"clickMore","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"fiche","type":"SelectorLink","parentSelectors":["Exploreur"],"selector":".listing_info > a","multiple":true,"delay":0},{"id":"Nom","type":"SelectorText","parentSelectors":["fiche"],"selector":"h1.ui_header","multiple":false,"regex":"","delay":0},{"id":"Popularité","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.header_popularity","multiple":false,"regex":"","delay":0},{"id":"Adresse1","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.detail span.street-address","multiple":false,"regex":"","delay":0},{"id":"Adresse2","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.detail span.locality","multiple":false,"regex":"","delay":0},{"id":"Adresse3","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.detail span.country-name","multiple":false,"regex":"","delay":0},{"id":"Types","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.is-hidden-mobile","multiple":false,"regex":"","delay":0},{"id":"Urls","type":"SelectorPopupLink","parentSelectors":["fiche"],"selector":"div.contactType.website span.taLnk","multiple":false,"delay":0},{"id":"Emails","type":"SelectorElementAttribute","parentSelectors":["fiche"],"selector":"div.attractions-contact-card-ContactCard__contactCard--zMh3H > div:nth-child(2) > div:nth-child(3) > div > div > a","multiple":false,"extractAttribute":"href","delay":"0"}]}

Hi, you would need to click on the Email button (or use Element Click) to trigger the server. If it works, you'll see the email element pop up in real time in the inspector.

But as I mentioned, this only seems to work if you have a "clean" copy of Chrome which does not have email client set up (i.e. Chrome must not intercept mailto: links).

Thank you for the reply,

Do you have a sitemap you could share?
So you are saying I need an Element Attribute selector as a child of an Element Click, in a clean Chrome setup?

I'm going to try :slight_smile: Thank you

But I'm still curious how you got an inspector view like this...

On a page like this one: https://www.tripadvisor.fr/Attraction_Review-g187147-d3704252-Reviews-Le_Foodist-Paris_Ile_de_France.html
I get this:

On a page like this one: https://www.tripadvisor.fr/Attraction_Review-g187109-d1792256-Reviews-Burgundy_Wine_School-Beaune_Cote_d_Or_Bourgogne_Franche_Comte.html

So there are two types of layout, and therefore two different element attributes to extract, I suppose.

And also, I'm curious: how do you see the element after the click?
Thank you for your help :slight_smile: you're saving me heaps of time

I can see the email when tripadvisor shows the Burgundy Wine layout, but not for the other type of layout. I don't know how to extract that one.

[Video: 2019-09-19_210730]

I'm not really sure how to replicate this. It kind of works by accident for me because I don't set any default mailto: handler, i.e. I don't want email links to launch anything. If I recall correctly, when you launch Chrome for the first time, it eventually asks whether you want Chrome to handle mailto: links, or something like that. I just click No. There may also be a similar setting in Windows.

Thank you very much @leemeng,

I think much more of the data is displayed in the first, encoded state than in the "open" state, like Burgundy, where the email seems to pop into the HTML.

I suppose there is a workaround that @NetworkReject tried to explain to me in this thread:

Pulling the "data-encoded-url" attribute and then decoding it with third-party software.
So I suppose all the pieces are close at hand, but I still can't get it right.
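The thread does not say how `data-encoded-url` is encoded. Assuming it is Base64 (an assumption to verify against real values; the decoded text may also carry extra characters around the link), the decoding step needs no third-party software. Below is a minimal Python sketch; `decode_encoded_url` is a hypothetical helper, and the attribute value itself would come from an Element Attribute selector set to `data-encoded-url`:

```python
import base64
from typing import Optional


def decode_encoded_url(value: str) -> Optional[str]:
    """Base64-decode a data-encoded-url value (the encoding is an assumption)."""
    value = value.strip()
    # Base64 text must be a multiple of 4 characters; restore stripped '=' padding.
    value += "=" * (-len(value) % 4)
    try:
        return base64.b64decode(value, validate=True).decode("utf-8")
    except ValueError:
        # Covers binascii.Error (not Base64), non-ASCII input and UnicodeDecodeError.
        return None
```

If the decoded text contains `mailto:`, the address follows that prefix; if the function returns None, the value is probably not plain Base64 and the assumption above does not hold.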

I'll dig deeper by turning off mailto: handling in my Chrome, and I'll keep you updated on my progress...

So frustrating, though, that I was able to get it right a few months ago and now I can't get any data.

Thank you for your help :slight_smile:

If you want to explore more, try the link below. It is currently beyond my skills and I don't know how to integrate it with WS:

https://kaijento.github.io/2017/03/17/web-scraping-tripadvisor.com/

LATEST: I have found a solution for the "Le Foodist" type of layout. It turns out the cleartext for both the website and the email is stored right in the source code, but it is not located anywhere near the contact info elements.

So I just used an HTML selector to grab body, and I picked out the email and website with Regex. A bit crude (we're basically grabbing the source code for the entire page), but it seems to work here. You will need to do some post-cleanup, but it should be straightforward. This sitemap works on my PC:

{"_id":"tripadvisor_get_email","startUrl":["https://www.tripadvisor.fr/Attraction_Review-g187147-d3704252-Reviews-Le_Foodist-Paris_Ile_de_France.html"],"selectors":[{"id":"name","type":"SelectorText","parentSelectors":["_root"],"selector":"h1.ui_header","multiple":false,"regex":"","delay":0},{"id":"get_email","type":"SelectorHTML","parentSelectors":["_root"],"selector":"body","multiple":false,"regex":"email\":\"[\\w_@\\.\\-]{3,99}\\b","delay":0},{"id":"get_website","type":"SelectorHTML","parentSelectors":["_root"],"selector":"body","multiple":false,"regex":"website\":\"[^\"]{5,125}\\b","delay":0}]}
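The post-cleanup can be done in any tool once the scraped `body` HTML is exported. Below is a minimal Python sketch that uses the same `email":"` and `website":"` anchors as the sitemap's `get_email` and `get_website` regexes, but captures the value up to its closing quote (instead of ending at `\b`), so the key prefix is dropped and a trailing `/` on the website is kept. It assumes, as observed above for this layout, that the page source embeds the values as JSON-style `"email":"..."` pairs; the `\/` unescaping is an extra cleanup step in case the embedded JSON escapes slashes:

```python
import re

# Same anchors as the sitemap regexes, with the value in a capture group.
EMAIL_RE = re.compile(r'email":"([^"]{3,99})"')
WEBSITE_RE = re.compile(r'website":"([^"]{5,125})"')


def _clean(value):
    # Embedded JSON often escapes "/" as "\/"; undo that.
    return value.replace("\\/", "/")


def extract_contacts(page_html):
    """Return (emails, websites) found in the page source, de-duplicated in order."""
    emails = list(dict.fromkeys(_clean(v) for v in EMAIL_RE.findall(page_html)))
    websites = list(dict.fromkeys(_clean(v) for v in WEBSITE_RE.findall(page_html)))
    return emails, websites
```

`dict.fromkeys` removes repeats while keeping the first-seen order, since the same contact block can appear more than once in the page source.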

@leemeng, thank you so much !!!

I had a very rough day and a tough week, to be honest,
and your post just cleared my mind and soul before I take my holiday.

I can now be at peace for the whole year xD

Thank you so much ! It seems to work.

I'll see when I come back from holiday :wink:

Have a good one !