Hello, where am I going wrong?

sony · July 18, 2024, 3:07pm

Describe the problem.
I am using Web Scraper and I watched the tutorial.
I only want a basic scrape,
from page 1 to the end,
with: Level 1: name + address
Level 2 (click the link on the name): phone numbers (the whole block of text) + email + the block of text named "Structure"

Url: Annuaire

Sitemap:

{"_id":"avocats_val_de_marne","startUrl":["Annuaire lien","parentSelectors":["_root"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":".card-title a","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":".card-title a"},{"id":"numero","parentSelectors":["suivre lien"],"type":"SelectorText","selector":".colonne_fiche p:nth-of-type(1)","multiple":false,"regex":""}]}

Thanks

don2010 · July 19, 2024, 12:51pm

Something like this:

{"_id":"AVOCATS","startUrl":["https://www.avocats-valdemarne.com/annuaire?recherche=1&page=[1-10]"],"selectors":[{"id":"link","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":".card-title a","type":"SelectorLink"},{"extractAttribute":"","id":"fixe","parentSelectors":["link"],"selector":"div.colonne_fiche a[href*=\"tel\"]","type":"SelectorGroup"},{"id":"email","multiple":false,"parentSelectors":["link"],"regex":"","selector":"div.colonne_fiche a[href*=\"mailto\"]:contains(\"@\")","type":"SelectorText"},{"id":"adres","multiple":false,"parentSelectors":["link"],"regex":"","selector":"address","type":"SelectorText"}]}

You can change the range of pages to scrape in your start URL. With &page=[1-10], as in this example, pages 1 through 10 are scraped; raise the upper bound to cover more pages.
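
To make the range notation concrete, here is a rough Python sketch of how a start URL containing `[1-10]` can be thought of as expanding into one URL per page. This is only an illustration, not the extension's actual code: the function name `expand_range_url` is hypothetical, and the sketch handles only a plain `[start-end]` range.

```python
import re

# Matches a simple numeric range such as [1-10] inside a URL.
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range_url(url: str) -> list[str]:
    """Expand the first [start-end] range in a URL into one URL per number.

    Hypothetical helper for illustration; returns the URL unchanged
    (as a one-item list) when it contains no range.
    """
    match = RANGE_PATTERN.search(url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    # The range is inclusive on both ends, so [1-10] yields 10 URLs.
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


urls = expand_range_url(
    "https://www.avocats-valdemarne.com/annuaire?recherche=1&page=[1-10]"
)
for u in urls:
    print(u)
```

Each generated URL is then treated as its own start page, so the `link` selector runs on every listing page in the range.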

sony · July 20, 2024, 8:01am

Thanks don2010, it works