How to scrape from link url with dropdown selector

Hello! I would like to ask for your help.

I want to scrape information (for example the name and street of a store) from a page that is reached through a link. Before I get to that page, there is a page that lists all stores. Each store (Element selector) already shows some information, but it also has a "more information" button (Click selector). After you click that button, new data appears, including a link URL. I want the scraper to open that link URL and extract the information (name and street of the store) from the page it leads to.

Therefore, I want the web scraper to repeat the extraction for each link, so that in the end I have the information from every link (that is, from every store).

I hope somebody can help me with this topic, and I'm looking forward to hearing your ideas.

@TAP Hello, would you be able to provide the starting URL or your sitemap?

Hey @viesturs ! Thank you for your reply!

The URL is: Angebote in meiner Nähe | Kaufland
When you open it, you have to click the minus button to zoom out and get all stores in Germany.

This is my sitemap:
{"_id":"kaufland2","startUrl":["Angebote in meiner Nähe | Kauflandparent","multiple":true,"delay":2000,"clickElementSelector":"button","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"Detailseite","type":"SelectorLink","parentSelectors":["Mehrinformationen"],"selector":"a.a-link--store-list-detail","multiple":true,"delay":0},{"id":"Filiale","type":"SelectorText","parentSelectors":["Detailseite"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"Öffnungszeitheute","type":"SelectorText","parentSelectors":["Detailseite"],"selector":"div.m-store-info__status","multiple":false,"regex":"","delay":0},{"id":"Öffnungszeiten","type":"SelectorText","parentSelectors":["Detailseite"],"selector":"dl","multiple":false,"regex":"","delay":0}]}

I basically want the following information for every store: the store name, the street, and the opening times for the whole week. You get these by opening the "more information" dropdown and clicking the link that appears in it.

I hope you can understand what I mean to say.

Thank you for your help in advance! I appreciate it a lot :smiley:

@TAP Hello, your sitemap didn't work. If you try to copy it, you will see that the JSON is invalid, so when pasting it I'd suggest using the preformatted text option.
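A quick way to check a sitemap before pasting it is to run it through a JSON parser. Below is a minimal Python sketch of that idea. The function name `check_sitemap_json` and the two shortened sample strings are made up for illustration; they are not part of Web Scraper.

```python
import json


def check_sitemap_json(text):
    """Return (True, None) if text parses as a JSON object, else (False, message)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # Report where the parser gave up, which usually points near the damage.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not a JSON object"
    return True, None


# Shortened, hypothetical fragments: the first mimics a sitemap whose URL was
# replaced by a link title when pasted, the second is a well-formed skeleton.
broken = '{"_id":"kaufland2","startUrl":["Angebote in meiner Nähe | Kauflandparent","multiple":true}]}'
fixed = '{"_id":"kaufland2","startUrl":["https://filiale.kaufland.de/#filial-finder"],"selectors":[]}'

print(check_sitemap_json(broken))
print(check_sitemap_json(fixed))
```

This only checks that the text is valid JSON. It does not check that the selectors themselves make sense.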

Overall I think this should do the job, although it seems there are no valid links after you click on more details, so you should probably go for the sitemap XML approach.

Sitemap example:

{"_id":"kaufland-de","startUrl":["https://filiale.kaufland.de/#filial-finder"],"selectors":[{"id":"zoom-out-click","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"body","multiple":true,"delay":"1200","clickElementSelector":"button[title='Zoom out']","clickType":"clickMore","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueHTMLText"},{"id":"store-wrapper","type":"SelectorElement","parentSelectors":["_root"],"selector":"li.m-store-list__item","multiple":true,"delay":0},{"id":"store-address","type":"SelectorText","parentSelectors":["store-wrapper"],"selector":"span.m-store-list__address","multiple":false,"regex":"","delay":0},{"id":"store-telephone","type":"SelectorText","parentSelectors":["store-wrapper"],"selector":"span.m-store-list__phone","multiple":false,"regex":"","delay":0},{"id":"store-open","type":"SelectorText","parentSelectors":["store-wrapper"],"selector":"span.m-store-open-status__time","multiple":false,"regex":"","delay":0}]}

 {"_id":"kaufland2","startUrl":["https://filiale.kaufland.de/#filial-finder"],"selectors":[{"id":"Deutschland","type":"SelectorElement","parentSelectors":["_root"],"selector":"li.m-store-list__item","multiple":true,"delay":0},{"id":"Mehrinformationen","type":"SelectorElementClick","parentSelectors":["Deutschland"],"selector":"_parent_","multiple":true,"delay":2000,"clickElementSelector":"button","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"Detailseite","type":"SelectorLink","parentSelectors":["Mehrinformationen"],"selector":"a.a-link--store-list-detail","multiple":true,"delay":0},{"id":"Filiale","type":"SelectorText","parentSelectors":["Detailseite"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"Öffnungszeitheute","type":"SelectorText","parentSelectors":["Detailseite"],"selector":"div.m-store-info__status","multiple":false,"regex":"","delay":0},{"id":"Öffnungszeiten","type":"SelectorText","parentSelectors":["Detailseite"],"selector":"dl","multiple":false,"regex":"","delay":0}]}

So this is my sitemap. I copied it using the preformatted text option, like you suggested. Please tell me if this is correct. :slight_smile:

I basically want to crawl the opening times and the headline of every store. But I first need to press "Mehr Informationen" and then "Detailseite" to get to the link (see screenshots below). How do I crawl this information, and do so repeatedly so that I get it for every store?
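To make the intended click path easier to see, the nesting of the selectors can be printed from the `parentSelectors` fields. The following is a rough Python sketch, not a Web Scraper feature. `selector_tree` is a hypothetical helper, and `SAMPLE_SITEMAP` is the kaufland2 sitemap from above trimmed down to the `id` and `parentSelectors` fields.

```python
import json
from collections import defaultdict

# The kaufland2 sitemap, trimmed to the fields that define the hierarchy.
SAMPLE_SITEMAP = json.dumps({
    "_id": "kaufland2",
    "selectors": [
        {"id": "Deutschland", "parentSelectors": ["_root"]},
        {"id": "Mehrinformationen", "parentSelectors": ["Deutschland"]},
        {"id": "Detailseite", "parentSelectors": ["Mehrinformationen"]},
        {"id": "Filiale", "parentSelectors": ["Detailseite"]},
        {"id": "Öffnungszeitheute", "parentSelectors": ["Detailseite"]},
        {"id": "Öffnungszeiten", "parentSelectors": ["Detailseite"]},
    ],
})


def selector_tree(sitemap_text):
    """Return the selector hierarchy as indented lines, starting at _root."""
    sitemap = json.loads(sitemap_text)
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])

    lines = []

    def walk(node, depth, seen):
        lines.append("  " * depth + node)
        for child in children.get(node, []):
            # Skip selectors already on this path so recursive parents cannot loop.
            if child not in seen:
                walk(child, depth + 1, seen | {child})

    walk("_root", 0, {"_root"})
    return lines


print("\n".join(selector_tree(SAMPLE_SITEMAP)))
```

Each level of indentation is one step the scraper takes: store list item, then the click, then the link, then the fields on the detail page.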


Thank you @viesturs for helping me. :slight_smile: :smiley:

Hey @viesturs, are you still there? Is your sitemap already the answer? There is a dropdown with the link "Detailseite", as you can see in the screenshots, which leads to a new page, and that is where I want to scrape the information.

Thank you :slight_smile: :grinning:

Hi @TAP

I think it will be easier to do this by using the sitemap XML.


{"_id":"kaufland","startUrl":["https://filiale.kaufland.de"],"selectors":[{"id":"xml","type":"SelectorSitemapXmlLink","parentSelectors":["_root"],"sitemapXmlMinimumPriority":0.1,"sitemapXmlUrlRegex":"service","sitemapXmlUrls":["https://filiale.kaufland.de/.sitemap.xml"]},{"id":"store-card","type":"SelectorElement","parentSelectors":["xml"],"selector":"body:has(h1[itemprop=\"name\"])","multiple":true,"delay":0},{"id":"store-name","type":"SelectorText","parentSelectors":["store-card"],"selector":"div.m-store-info__name","multiple":false,"regex":"","delay":0},{"id":"store-address-street","type":"SelectorText","parentSelectors":["store-card"],"selector":"div[itemprop='streetAddress']","multiple":false,"regex":"","delay":0},{"id":"store-address-city","type":"SelectorText","parentSelectors":["store-card"],"selector":"div.m-store-info__city","multiple":false,"regex":"","delay":0},{"id":"store-contact","type":"SelectorText","parentSelectors":["store-card"],"selector":"div.m-store-info__telephone","multiple":false,"regex":"","delay":0},{"id":"store-open","type":"SelectorText","parentSelectors":["store-card"],"selector":"dl.m-store-info__shophours-data","multiple":false,"regex":"","delay":0}]}

Hope it helps.

@TAP If you want to learn more, check this out - Sitemap xml selector | Web Scraper Documentation
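For anyone wondering what the XML approach does in principle: it reads the page URLs from the site's sitemap.xml and keeps only the ones matching a regex, instead of clicking through the store list. Here is a minimal Python sketch of that idea. It is not how Web Scraper is implemented; `filter_sitemap_urls`, the example.com URLs, and the assumption that a missing priority counts as 0.5 (the default in the sitemaps.org protocol) are all illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, used instead of a real download.
SAMPLE_XML = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/service/store-1.html</loc><priority>0.8</priority></url>
  <url><loc>https://example.com/service/store-2.html</loc></url>
  <url><loc>https://example.com/angebote.html</loc><priority>0.8</priority></url>
  <url><loc>https://example.com/service/old.html</loc><priority>0.0</priority></url>
</urlset>
"""


def filter_sitemap_urls(xml_text, url_regex, min_priority=0.0):
    """Return URLs from a sitemap that match url_regex and meet min_priority."""
    root = ET.fromstring(xml_text.strip())
    pattern = re.compile(url_regex)
    urls = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        priority_text = url.findtext("sm:priority", default=None, namespaces=SITEMAP_NS)
        # Assumption: entries without <priority> use the protocol default of 0.5.
        priority = float(priority_text) if priority_text else 0.5
        if loc and pattern.search(loc) and priority >= min_priority:
            urls.append(loc)
    return urls


# Same regex and minimum priority as in the sitemap above.
print(filter_sitemap_urls(SAMPLE_XML, "service", 0.1))
```

The `"service"` regex and the `0.1` minimum priority mirror the `sitemapXmlUrlRegex` and `sitemapXmlMinimumPriority` values in the sitemap above. Each URL that passes the filter is then opened as its own page, and the store-card selectors run on it.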