How to scrape a SPA website

JoAllen · October 17, 2020, 7:28am

Describe the problem.

The website (https://www.woolworths.com.au/) is a single-page app written in Angular. The URL in the address bar does not change, and page contents are fetched by JavaScript.

I am trying to build a sitemap for Product Categories:

In the navbar there are main categories. When you click on one of the links, it opens a side menu of subcategories. When you click on an item in the subcategories, it opens a further menu of sub-sub-categories.
I am trying to capture this category tree.

The web scraper sitemap looks fine, and "element preview" and "data preview" show results as expected.

However, when I actually run the scraper, I get an empty CSV file (only the headings, no data).

I am guessing this has to do with the dynamic nature of the website and that somehow I need to "click" each link so that the DOM content can get updated.

Please advise what to do.

Sitemap:
{"_id":"wollies","startUrl":["https://www.woolworths.com.au/"],"selectors":[{"id":"topCat","type":"SelectorLink","parentSelectors":["_root"],"selector":"nav.categoryHeader-navigation a.categoryHeader-navigationLink","multiple":true,"delay":0},{"id":"midCat","type":"SelectorLink","parentSelectors":["topCat"],"selector":"nav.categoriesNavigation--category a.categoriesNavigation-link","multiple":true,"delay":0},{"id":"subCat","type":"SelectorLink","parentSelectors":["midCat"],"selector":"nav.categoriesNavigation--subcategory a.categoriesNavigation-link","multiple":true,"delay":0}]}
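
For anyone who wants to check the selector hierarchy outside the extension, here is a minimal sketch that parses the sitemap above and prints each selector indented under its parent. It relies only on the fields visible in the exported JSON (`selectors`, `id`, `type`, `parentSelectors`); the function name `selector_tree` is made up for illustration and is not part of Web Scraper.

```python
import json

# The sitemap from this post, reflowed for readability.
SITEMAP_JSON = """
{"_id": "wollies",
 "startUrl": ["https://www.woolworths.com.au/"],
 "selectors": [
   {"id": "topCat", "type": "SelectorLink", "parentSelectors": ["_root"],
    "selector": "nav.categoryHeader-navigation a.categoryHeader-navigationLink",
    "multiple": true, "delay": 0},
   {"id": "midCat", "type": "SelectorLink", "parentSelectors": ["topCat"],
    "selector": "nav.categoriesNavigation--category a.categoriesNavigation-link",
    "multiple": true, "delay": 0},
   {"id": "subCat", "type": "SelectorLink", "parentSelectors": ["midCat"],
    "selector": "nav.categoriesNavigation--subcategory a.categoriesNavigation-link",
    "multiple": true, "delay": 0}
 ]}
"""


def selector_tree(sitemap, parent="_root", depth=0, path=frozenset()):
    """Return indented lines, one per selector nested under `parent`.

    `path` holds the ids already on the current branch, so a selector
    that lists itself as a parent does not recurse forever.
    """
    lines = []
    for sel in sitemap["selectors"]:
        if parent not in sel["parentSelectors"] or sel["id"] in path:
            continue
        lines.append("  " * depth + f"{sel['id']} ({sel['type']})")
        lines.extend(
            selector_tree(sitemap, sel["id"], depth + 1, path | {sel["id"]})
        )
    return lines


sitemap = json.loads(SITEMAP_JSON)
tree = selector_tree(sitemap)
print("\n".join(tree))
```

This only confirms that the three Link selectors are chained topCat, then midCat, then subCat; it says nothing about whether the CSS selectors match anything on the pages the scraper actually opens.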

leemeng · October 18, 2020, 12:31am

I have covered this site before; just search for woolworths.

While the site uses a lot of JavaScript, the majority of links, including those to categories, are still plain old HTML (a href). You can check for this by hovering the mouse over a link and right-clicking; you should see options to "Open link in new tab/new window". The URL will change when you use this method.

That means the woolworths site can be navigated using Type: Link.
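
The same check can be done programmatically on a saved copy of the page. The following is a rough sketch using only Python's standard library: it collects anchors that carry a real `href` and counts those that do not. The markup in `SAMPLE`, including the `/shop/browse/example-category` path, is a hypothetical stand-in and not copied from the site; only the class names come from the sitemap in this thread.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect usable href values from <a> tags and count the rest."""

    def __init__(self):
        super().__init__()
        self.hrefs = []        # anchors a Link selector could follow
        self.without_href = 0  # anchors with no href, "#", or javascript:

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and not href.startswith(("#", "javascript:")):
            self.hrefs.append(href)
        else:
            self.without_href += 1


# Hypothetical markup for illustration only.
SAMPLE = """
<nav class="categoryHeader-navigation">
  <a class="categoryHeader-navigationLink" href="/shop/browse/example-category">Example</a>
  <a class="categoryHeader-navigationLink" href="#">Script-only link</a>
  <a class="categoryHeader-navigationLink">No href at all</a>
</nav>
"""

collector = LinkCollector()
collector.feed(SAMPLE)
print("followable:", collector.hrefs)
print("not followable:", collector.without_href)
```

Anchors that end up in `hrefs` are the kind that Type: Link can follow; the others would need a click-based approach instead.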

JoAllen · October 18, 2020, 5:19am

Thanks leemeng for the quick reply.

I have tried several web scraping tools and found Web Scraper to be the best: the easiest to use while still being robust with CSS selectors. I also tested Web Scraper on a few other websites and it worked perfectly.

I am sure Web Scraper can handle this website as well, but I think Woolworths is a bit tricky, and I would appreciate it if you could give it a bit of your consideration.

The Woolworths website has three nested levels: categories, subcategories and sub-sub-categories.

Yes, it is possible to click on any category link and open it in a new window. I designed the sitemap using Link, as you advised. Also, "Element preview" and "Data preview" do show the results exactly as expected.

The problem is when I actually run Scrape. The program loops through the top categories in the navbar, but then it skips the subcategories in the left menu.

The CSV output contains only the column headers.

The expected output is a list of categories, subcategories, sub-sub-categories and their hrefs.
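
To make the expected shape concrete, here is a small sketch that flattens a three-level category tree into rows of names plus href. The tree contents, the `flatten` helper and the column layout are all placeholders chosen for illustration; this is not the exact CSV format Web Scraper exports.

```python
import csv
import io

# Hypothetical tree of (name, href, children); names and paths are placeholders.
TREE = [
    ("Top A", "/top-a", [
        ("Mid A1", "/top-a/mid-a1", [
            ("Sub A1x", "/top-a/mid-a1/sub-a1x", []),
        ]),
    ]),
]


def flatten(nodes, trail=()):
    """Yield one row per node: path names padded to three levels, then href."""
    for name, href, children in nodes:
        path = trail + (name,)
        yield list(path) + [""] * (3 - len(path)) + [href]
        yield from flatten(children, path)


rows = list(flatten(TREE))

# Write the rows to an in-memory CSV so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["topCat", "midCat", "subCat", "href"])
writer.writerows(rows)
print(buffer.getvalue())
```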

You can replicate the problem using the sitemap:

Sitemap:
{"_id":"wollies","startUrl":["https://www.woolworths.com.au/"],"selectors":[{"id":"topCat","type":"SelectorLink","parentSelectors":["_root"],"selector":"nav.categoryHeader-navigation a.categoryHeader-navigationLink","multiple":true,"delay":0},{"id":"midCat","type":"SelectorLink","parentSelectors":["topCat"],"selector":"nav.categoriesNavigation--category a.categoriesNavigation-link","multiple":true,"delay":0},{"id":"subCat","type":"SelectorLink","parentSelectors":["midCat"],"selector":"nav.categoriesNavigation--subcategory a.categoriesNavigation-link","multiple":true,"delay":0}]}

Thanks for your advice.