How to scrape from a site in 3 levels?

Hi all, I found this amazing tool 2 days ago. I've been trying it with the examples, and now I want to scrape some similar sites that are organized in 3 levels: States (Level 1), Cities (Level 2) and Locations (Level 3), and finally get the address of each location. I'd like to have columns for state, city, address and zip code, plus the links for each level. I've made this sitemap, but it is only getting 34 records and not all the information I'd like.

Not sure what I'm missing. Thanks for any help.

{"_id":"Bojangles","startUrl":["https://locations.bojangles.com/index.html"],"selectors":[{"id":"state","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":"a.c-directory-list-content-item-link","type":"SelectorLink"},{"id":"city","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":"a.c-directory-list-content-item-link","type":"SelectorLink"},{"id":"location","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":"a.Teaser-titleLink","type":"SelectorLink"},{"id":"city-1","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"h1","type":"SelectorText"},{"id":"address","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span.c-address-street-1","type":"SelectorText"}]}

Hi,

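To traverse the links level by level, the selectors have to be nested under each other: each link selector needs the previous level as its parent (state under _root, city under state, location under city) instead of all of them sitting under _root.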

Please see the reference sitemap below:

{"_id":"Bojangles","startUrl":["https://locations.bojangles.com/index.html"],"selectors":[{"id":"state","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":"a.c-directory-list-content-item-link","type":"SelectorLink"},{"id":"city","linkType":"linkFromHref","multiple":true,"parentSelectors":["state"],"selector":"a.c-directory-list-content-item-link","type":"SelectorLink"},{"id":"location","linkType":"linkFromHref","multiple":true,"parentSelectors":["city"],"selector":".c-LocationGrid a.Teaser-titleLink","type":"SelectorLink"},{"id":"city-1","multiple":false,"parentSelectors":["state","city","location"],"regex":"","selector":"h1#location-name","type":"SelectorText"},{"id":"address","multiple":false,"parentSelectors":["state","city","location"],"regex":"","selector":"span.c-address-street-1","type":"SelectorText"}]}
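To make the difference between the two sitemaps above easier to see, here is a small Python sketch that computes how deep each selector sits in the parent chain. It is not part of Web Scraper; the `depth` helper and the trimmed `FLAT` / `NESTED` lists are only for illustration and keep just the `id` and `parentSelectors` fields copied from the two sitemaps.

```python
# Only the fields that control nesting, copied from the two sitemaps in this thread.
FLAT = [  # original sitemap: every selector hangs directly off _root
    {"id": "state", "parentSelectors": ["_root"]},
    {"id": "city", "parentSelectors": ["_root"]},
    {"id": "location", "parentSelectors": ["_root"]},
    {"id": "city-1", "parentSelectors": ["_root"]},
    {"id": "address", "parentSelectors": ["_root"]},
]
NESTED = [  # reference sitemap: each link level is a child of the previous one
    {"id": "state", "parentSelectors": ["_root"]},
    {"id": "city", "parentSelectors": ["state"]},
    {"id": "location", "parentSelectors": ["city"]},
    {"id": "city-1", "parentSelectors": ["state", "city", "location"]},
    {"id": "address", "parentSelectors": ["state", "city", "location"]},
]


def depth(selectors, selector_id):
    """Length of the longest parent chain from _root down to selector_id."""
    by_id = {s["id"]: s for s in selectors}
    parents = by_id[selector_id]["parentSelectors"]
    return 1 + max(0 if p == "_root" else depth(selectors, p) for p in parents)


for name, selectors in (("flat", FLAT), ("nested", NESTED)):
    print(name, {s["id"]: depth(selectors, s["id"]) for s in selectors})
```

In the flat version every selector has depth 1, so everything is evaluated on the start page only. In the nested version the link selectors sit at depths 1, 2 and 3, which matches the state, city and location levels of the site.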

@JanAp Hi JanAp, thank you so much for your help. It now seems to work as it should. As an exercise I did the first example on the WebScraper site (the ecommerce-1), which looks similar to what I want to do, and I thought I had done it correctly, but I hadn't. So what is the right way to create nested selectors? Is there a video example for this?

Is there a way to generalize the JSON so that it works for other websites with a similar structure, or does it have to be hardcoded for each site?

Besides that, I've just installed this awesome app, but when I run a simple sitemap it opens the URL and after 2 or 3 seconds the browser window closes automatically. I don't know what is happening. I have only been able to test it in the cloud version.

Last question: in the cloud version, where can I see the configured sitemap and selectors? I've only found the Export window, which shows the sitemap in JSON format.

Regards

Hi,

Basically, the 'Extension intro video' describes how to set up multi-level selectors.

It is quite safe to say that each website will have a different navigation setup, thus no universal sitemap can be created.
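That said, the nesting pattern itself is the same from site to site; only the CSS selectors change. As a rough sketch (not an official Web Scraper feature), a sitemap with the same shape as the reference one could be generated in Python. The `build_sitemap` function and its parameters are hypothetical names, the JSON fields are the ones that appear in the sitemap earlier in this thread, and the CSS selectors still have to be found by hand for every site.

```python
import json


def build_sitemap(sitemap_id, start_url, link_levels, text_fields):
    """Build a nested sitemap dict (hypothetical helper, for illustration only).

    link_levels: list of (id, css_selector) pairs, outermost level first.
    text_fields: list of (id, css_selector) pairs to extract as text.
    """
    selectors = []
    parent = "_root"
    for sel_id, css in link_levels:
        # Each link level becomes a child of the level before it.
        selectors.append({
            "id": sel_id,
            "type": "SelectorLink",
            "linkType": "linkFromHref",
            "multiple": True,
            "parentSelectors": [parent],
            "selector": css,
        })
        parent = sel_id
    # Like the reference sitemap, attach text selectors to every link level.
    link_ids = [sel_id for sel_id, _ in link_levels]
    for sel_id, css in text_fields:
        selectors.append({
            "id": sel_id,
            "type": "SelectorText",
            "multiple": False,
            "regex": "",
            "parentSelectors": list(link_ids),
            "selector": css,
        })
    return {"_id": sitemap_id, "startUrl": [start_url], "selectors": selectors}


# Selectors below are the ones from the Bojangles sitemap in this thread.
sitemap = build_sitemap(
    "Bojangles",
    "https://locations.bojangles.com/index.html",
    link_levels=[
        ("state", "a.c-directory-list-content-item-link"),
        ("city", "a.c-directory-list-content-item-link"),
        ("location", ".c-LocationGrid a.Teaser-titleLink"),
    ],
    text_fields=[
        ("city-1", "h1#location-name"),
        ("address", "span.c-address-street-1"),
    ],
)
print(json.dumps(sitemap))
```

The printed JSON should be importable into the extension in the same way as the sitemaps pasted in this thread, but it has not been tested against other sites.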

Sitemaps are created locally by utilizing the Webscraper Chrome extension. The purpose of the cloud solution is to execute the sitemaps.

Thanks JanAp, last question. I've run your code and some of the examples, and even though the summary said that more than 300 pages were processed, when I export to an Excel file there are only 100 rows. Why does this happen?

If you ran the sitemap in the cloud with a trial account, that is the cause: trial accounts are limited to 100 rows of data, and the limit is lifted when you switch to a paid plan.
