Scrape a URL (http) from a website that uses JS to redirect to a different link

Hi there,

Following up on my "how to" topic (you can get my long sitemap from there).

Example link:
https://www.tripadvisor.fr/Attraction_Review-g187147-d1600380-Reviews-La_Cuisine_Paris_Cooking_Classes-Paris_Ile_de_France.html

Screenshot of the contact section:
image

TripAdvisor uses JavaScript to open a new window that redirects to your default mail client (I set it up in Chrome so that it opens Gmail in the same window).

The contact's email address appears in the URL of the mail client (see screenshot):

image

I would like to get that URL (http) from a website that uses JavaScript to redirect to a different link. I tried all the different combinations without success...

For example:

  1. An Element Click selector on the email,
  2. Opening the redirected mail-client URL in the same window,
  3. Using a Text selector as a child of the Element Click selector, but it does not recognize the parent selector (the JS email redirection link).

Looking forward to this new functionality.
Have a nice one!

The data can be extracted with the 'Element Attribute Selector', but it will be encoded in base64. Use 'span.email ~ div div.ui_link' as your selector and 'data-encoded-url' as the attribute you are extracting. From there you will get something like "NXlPX21haWx0bzpjb250YWN0QGxhY3Vpc2luZXBhcmlzLmNvbV94cjI=", which can be decoded at various sites such as base64decode.org, or you can write a script to do so in bulk. A quick Google search shows that you can also write a script in Google Sheets to do this: https://joshuatz.com/posts/2019/google-sheets-quick-start-with-base64/
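For the bulk-decoding idea, here is a minimal Python sketch rather than a definitive tool. It decodes the sample value quoted above and pulls out the address after "mailto:". The function name `decode_email` and the regular expression are my own assumptions. The sample decodes to a string with a short token before and after the mailto link, and the pattern is written so that those extra tokens are dropped.

```python
import base64
import re

# Matches the address that follows "mailto:". The domain part excludes "_",
# so a trailing token such as "_xr2" in the decoded string is left out.
EMAIL_RE = re.compile(r"mailto:([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+)")


def decode_email(encoded):
    """Base64-decode a data-encoded-url value and return the email, or None."""
    decoded = base64.b64decode(encoded).decode("utf-8", errors="replace")
    match = EMAIL_RE.search(decoded)
    return match.group(1) if match else None


# Values as exported from the scraper; only the first one comes from this thread.
values = ["NXlPX21haWx0bzpjb250YWN0QGxhY3Vpc2luZXBhcmlzLmNvbV94cjI="]
for value in values:
    print(decode_email(value))  # prints contact@lacuisineparis.com
```

To process a whole export, the `values` list could be filled from the scraped CSV column instead of being typed in by hand.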

Good Luck!

Hi @NetworkReject,

Thank you very much for your answer.

My first question is: where did you find this 'span.email ~ div div.ui_link'?
When I use Element preview or inspect the page, I can't find anything similar, so I'm curious and I wonder whether my sitemap recognizes this attribute at all. All it returns after running is "null" :confused:

Do you have a sitemap that shows where Web Scraper returns the data-encoded-url value?
I would really appreciate it if you could adapt it to my sitemap:

{"_id":"trip_link2","startUrl":["https://www.tripadvisor.fr/Attraction_Review-g187109-d15609964-Reviews-Caveau_Domaine_Loubet_Dewailly-Beaune_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g187109-d12593821-Reviews-La_Manufacture_Beaune-Beaune_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g641850-d8089959-Reviews-Art_du_Tonneau-Meursault_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g2168052-d6415167-Reviews-In_Anna_s_Kitchen-Villy_le_Moutier_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g187111-d15355477-Reviews-Cours_Photo_Dijon-Dijon_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g187109-d1887274-Reviews-Sensation_Vin-Beaune_Cote_d_Or_Bourgogne_Franche_Comte.html"],"selectors":[{"id":"Nom","type":"SelectorText","parentSelectors":["_root"],"selector":"h1.ui_header","multiple":false,"regex":"","delay":0},{"id":"Popularité","type":"SelectorText","parentSelectors":["_root"],"selector":"span.header_popularity","multiple":false,"regex":"","delay":0},{"id":"Adresse1","type":"SelectorText","parentSelectors":["_root"],"selector":"span.detail span.street-address","multiple":false,"regex":"","delay":0},{"id":"Adresse2","type":"SelectorText","parentSelectors":["_root"],"selector":"span.detail span.locality","multiple":false,"regex":"","delay":0},{"id":"Adresse3","type":"SelectorText","parentSelectors":["_root"],"selector":"span.detail span.country-name","multiple":false,"regex":"","delay":0},{"id":"Types","type":"SelectorText","parentSelectors":["_root"],"selector":"span.is-hidden-mobile","multiple":false,"regex":"","delay":0},{"id":"Urls","type":"SelectorPopupLink","parentSelectors":["_root"],"selector":"div.contactType.website span.taLnk","multiple":false,"delay":0},{"id":"element","type":"SelectorElementAttribute","parentSelectors":["_root"],"selector":"span.email ~ div div.ui_link","multiple":false,"extractAttribute":"data-encoded-url","delay":0}]}
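A "null" result may mean either that the selector matched no element or that the matched element does not carry the attribute. One way to check, sketched here in Python under assumptions, is to save the page source and list every tag that has a data-encoded-url attribute. The class `EncodedUrlFinder` and the sample markup are hypothetical and only illustrate the idea. The real TripAdvisor markup may differ, and an attribute added later by JavaScript would not show up in a saved source file.

```python
from html.parser import HTMLParser


class EncodedUrlFinder(HTMLParser):
    """Collects every tag that carries a data-encoded-url attribute."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, class attribute, encoded value)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "data-encoded-url" in attr_map:
            self.found.append(
                (tag, attr_map.get("class"), attr_map["data-encoded-url"])
            )


# Hypothetical markup for illustration only; the real page structure may differ.
sample_html = """
<div class="contactType email">
  <span class="email"></span>
  <div><div class="ui_link" data-encoded-url="NXlPX21haWx0bzpjb250YWN0QGxhY3Vpc2luZXBhcmlzLmNvbV94cjI="></div></div>
</div>
"""

finder = EncodedUrlFinder()
finder.feed(sample_html)
for tag, css_class, value in finder.found:
    print(tag, css_class, value)
```

If the list comes back empty for a saved page, the attribute is probably not present in that version of the page, and no selector will be able to return it.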

@iconoclast, still no clue how to extract this data?

A few months ago, I did manage to extract the data using an Element Click selector on .email span.taLnk.
Since I set every tab to open in the same window and I use Gmail as my default mail client,
I was using that click selector with a delay of about 17000 (to allow Gmail to load and display the email at the top),
and then I would use a Text selector to grab the email within the Element Click selector.

Problem: I lost all my earlier sitemaps (a mistaken resynchronisation wiped all my extensions...) and now, when I try this technique again, whether I use a Popup Link or an Element Click selector,
whatever page the Gmail link opens does not appear in the scraper's browsing window.

Here is the most advanced method I have tried so far. To try this sitemap, you need to set Gmail as your default email client and set your browser to open every new window in the same tab.

Here is the sitemap: {"_id":"tripadvisor","startUrl":["https://www.tripadvisor.fr/Attraction_Review-g187109-d15609964-Reviews-Caveau_Domaine_Loubet_Dewailly-Beaune_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g187109-d12593821-Reviews-La_Manufacture_Beaune-Beaune_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g641850-d8089959-Reviews-Art_du_Tonneau-Meursault_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g2168052-d6415167-Reviews-In_Anna_s_Kitchen-Villy_le_Moutier_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g187111-d15355477-Reviews-Cours_Photo_Dijon-Dijon_Cote_d_Or_Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attraction_Review-g187109-d1887274-Reviews-Sensation_Vin-Beaune_Cote_d_Or_Bourgogne_Franche_Comte.html"],"selectors":[{"id":"Nom","type":"SelectorText","parentSelectors":["_root"],"selector":"h1.ui_header","multiple":false,"regex":"","delay":0},{"id":"Popularité","type":"SelectorText","parentSelectors":["_root"],"selector":"span.header_popularity","multiple":false,"regex":"","delay":0},{"id":"Adresse1","type":"SelectorText","parentSelectors":["_root"],"selector":"span.detail span.street-address","multiple":false,"regex":"","delay":0},{"id":"Adresse2","type":"SelectorText","parentSelectors":["_root"],"selector":"span.detail span.locality","multiple":false,"regex":"","delay":0},{"id":"Adresse3","type":"SelectorText","parentSelectors":["_root"],"selector":"span.detail span.country-name","multiple":false,"regex":"","delay":0},{"id":"Types","type":"SelectorText","parentSelectors":["_root"],"selector":"span.is-hidden-mobile","multiple":false,"regex":"","delay":0},{"id":"Urls","type":"SelectorPopupLink","parentSelectors":["_root"],"selector":"div.contactType.website span.taLnk","multiple":false,"delay":0},{"id":"Emails","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.M9","multiple":false,"delay":"20000","clickElementSelector":"div.contactType.email span.taLnk","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"mail","type":"SelectorText","parentSelectors":["element"],"selector":"div.vT","multiple":false,"regex":"","delay":3000},{"id":"element","type":"SelectorElement","parentSelectors":["Emails"],"selector":"form","multiple":false,"delay":0}]}
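When rebuilding a lost sitemap by hand, a child selector that names a parent id which no longer exists may be one reason nothing comes back. The sketch below is a small Python check for that, assuming only the JSON layout visible in the sitemaps in this thread ("selectors", "id", "parentSelectors", and "_root" as the top level). The function name `find_unknown_parents` and the shortened demo sitemap are hypothetical.

```python
import json


def find_unknown_parents(sitemap):
    """Return (selector id, parent id) pairs whose parent is not defined."""
    known = {"_root"} | {sel["id"] for sel in sitemap["selectors"]}
    problems = []
    for sel in sitemap["selectors"]:
        for parent in sel.get("parentSelectors", []):
            if parent not in known:
                problems.append((sel["id"], parent))
    return problems


# Shortened, hypothetical sitemap: "mail" points at "element", which is missing.
sitemap_json = (
    '{"_id":"demo","startUrl":["https://example.com"],"selectors":['
    '{"id":"Emails","type":"SelectorElementClick","parentSelectors":["_root"]},'
    '{"id":"mail","type":"SelectorText","parentSelectors":["element"]}]}'
)

print(find_unknown_parents(json.loads(sitemap_json)))  # prints [('mail', 'element')]
```

An empty list only means the parent chain is intact. It says nothing about whether the CSS selectors themselves still match the page.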

I would need some guidance from an expert :slight_smile: I've been struggling for 7 days to get this sitemap back on track.
I know that a few months ago I did manage to get the data, so I know it is possible,
but I can't remember how...

Thank you very much for your insights!

Hi, I provided a solution for scraping tripadvisor email here:

Thank you very much @leemeng,

But I still can't get anything with this Element Attribute selector.
It always returns "null".

@NetworkReject, could you try to adapt the Element Attribute selector to get the email in the following sitemap:

{"_id":"tripadvisor_test","startUrl":["https://www.tripadvisor.fr/Attractions-g11038854-Activities-c41-Bourgogne_Franche_Comte.html","https://www.tripadvisor.fr/Attractions-g2349678-Activities-c56-t272-Haute_Garonne_Occitanie.html","https://www.tripadvisor.fr/Attractions-g187153-Activities-c56-Montpellier_Herault_Occitanie.html"],"selectors":[{"id":"Exploreur","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.listing_details","multiple":true,"delay":"5000","clickElementSelector":"a.next:last","clickType":"clickMore","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"fiche","type":"SelectorLink","parentSelectors":["Exploreur"],"selector":".listing_info > a","multiple":true,"delay":0},{"id":"Nom","type":"SelectorText","parentSelectors":["fiche"],"selector":"h1.ui_header","multiple":false,"regex":"","delay":0},{"id":"Popularité","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.header_popularity","multiple":false,"regex":"","delay":0},{"id":"Adresse1","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.detail span.street-address","multiple":false,"regex":"","delay":0},{"id":"Adresse2","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.detail span.locality","multiple":false,"regex":"","delay":0},{"id":"Adresse3","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.detail span.country-name","multiple":false,"regex":"","delay":0},{"id":"Types","type":"SelectorText","parentSelectors":["fiche"],"selector":"span.is-hidden-mobile","multiple":false,"regex":"","delay":0},{"id":"Urls","type":"SelectorPopupLink","parentSelectors":["fiche"],"selector":"div.contactType.website span.taLnk","multiple":false,"delay":0},{"id":"Emails","type":"SelectorElementAttribute","parentSelectors":["fiche"],"selector":"div.attractions-contact-card-ContactCard__contactCard--link > div:nth-child(2) > div:nth-child(3) > div > div > a","multiple":false,"extractAttribute":"href","delay":"0"}]}

Thank you very much in advance for your help!
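If an Element Attribute selector on "href" does return a value, and assuming that value is a plain mailto: link (the thread does not confirm this), the address can be pulled out with Python's standard library. This is only a sketch. The function name `emails_from_mailto` and the example addresses are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def emails_from_mailto(href):
    """Return the addresses in a mailto: link, or an empty list otherwise."""
    parts = urlsplit(href)
    if parts.scheme != "mailto":
        return []
    # Several recipients may be comma-separated; percent-encoding is undone.
    return [unquote(addr).strip() for addr in parts.path.split(",") if addr.strip()]


print(emails_from_mailto("mailto:contact@example.com?subject=Hello"))
print(emails_from_mailto("https://example.com/contact"))
```

The first call yields a one-element list with the address and the second an empty list, so non-mail links in the same column can be skipped without special handling.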