How to extract text from a click-activated scrolling mobile

Tentle · January 14, 2021, 11:01pm

The problem is Web Scraper doesn't extract all the text that's encapsulated in the designated element.

Target URL:

Sitemap:

{"_id":"nuts-and-seeds","startUrl":["https://www.fritolay.com/brands/nuts-and-seeds"],"selectors":[{"id":"items","type":"SelectorLink","parentSelectors":["_root"],"selector":"a.product-thumbnail","multiple":true,"delay":0},{"id":"itemName","type":"SelectorText","parentSelectors":["items"],"selector":".anymore-wow h2","multiple":false,"regex":"","delay":0},{"id":"nfClick","type":"SelectorElementClick","parentSelectors":["items"],"selector":".hide-mobile h3","multiple":false,"delay":2000,"clickElementSelector":".hide-mobile h3","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"nFacts","type":"SelectorText","parentSelectors":["items"],"selector":".hide-mobile div.item-container","multiple":false,"regex":"","delay":0}]}

The selector in question is "nFacts"

I'm not sure if I titled this correctly, but Web Scraper calls the element a hide-mobile element-container, and it scrolls, so that's why I'm calling it a scrolling mobile.

Tentle · January 21, 2021, 6:00pm

Hi, I'm having the same problem here again, on another root URL, this time from Nesquik. The actual trouble URL is

https://www.nesquik.com/en/products/ready-to-drink/nesquik-chocolate-lowfat-milk-6-pack

If you click on the "SmartLabel" button at the bottom of the nutrition information section, you get the same situation as in my original post. You can use the CSS selector .desktop span.smartlabel-icon to find the button I'm talking about. I've learned a lot more about Web Scraper and CSS selectors since I originally posted this, but I still can't get Web Scraper to capture the text in the SmartLabel nutrition facts table. For Nesquik the CSS selector [class*="nfp container-fluid"] should do the trick, but neither the element preview nor the data preview shows anything: the element preview highlights no element, and the data preview returns null.

Below is the sitemap for Nesquik. Note that I added a second selector, nFacts2, for comparison with nFacts; it does show an element preview. I'm going to try scraping with it now to see what happens.

{"_id":"nesquik","startUrl":["https://www.nesquik.com/en/products?page=[0-1]"],"selectors":[{"id":"links","type":"SelectorLink","parentSelectors":["_root"],"selector":".tsp-product-teaser a","multiple":true,"delay":0},{"id":"itemName","type":"SelectorText","parentSelectors":["links"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"nfClick","type":"SelectorElementClick","parentSelectors":["links"],"selector":".desktop span.smartlabel-icon","multiple":false,"delay":2000,"clickElementSelector":".desktop span.smartlabel-icon","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueCSSSelector"},{"id":"nFacts","type":"SelectorText","parentSelectors":["links"],"selector":"[class*=\"nfp container-fluid\"]","multiple":false,"regex":"","delay":0},{"id":"nFacts2","type":"SelectorText","parentSelectors":["links"],"selector":"div.modal-body","multiple":false,"regex":"","delay":0}]}

leemeng · January 21, 2021, 10:40pm

A tough one, cos the Smartlabel link is Ajax, plus it loads up an iframe.

However if it is the nutritional info you seek, the info is right in the source code but not visible on page. So you can get at it without clicking. You can see it if you do an HTML dump of div.product_detail_nutritional.pdContainer

Type: HTML
Selector: div.product_detail_nutritional.pdContainer

From there, you can start building data scrapers, e.g.

Total Calories
Type: Text
Selector: div.product_detail_nutritional.pdContainer tbody > tr:contains('Total Calories') td:first-of-type

Total Fat
Type: Text
Selector: div.product_detail_nutritional.pdContainer tbody > tr:contains('Total Fat') td:first-of-type

Tip: You can paste HTML dumps into the HTML Formatter site to get a better view of the structure and data.
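
If you would rather pull the values out of the HTML dump after the scrape, the same row lookup can be sketched in Python with only the standard library. This is a rough sketch, not a tested recipe for the Nesquik page: the names RowCollector and first_td_for are made up for illustration, the sample table is hypothetical (the real dump's markup may differ), and the lookup only approximates tr:contains(...) td:first-of-type by checking each cell's text.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of (tag, text) cells
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # [tag, text fragments] of the currently open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = [tag, []]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[1].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._cell[1]).split())
            self._row.append((self._cell[0], text))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def first_td_for(html, label):
    """Return the first <td> text of the first row with a cell mentioning label."""
    parser = RowCollector()
    parser.feed(html)
    for row in parser.rows:
        if any(label in text for _, text in row):
            for tag, text in row:
                if tag == "td":
                    return text
    return None


# Hypothetical sample standing in for the real HTML dump.
SAMPLE = """
<table><tbody>
  <tr><th>Total Calories</th><td>150</td><td>8%</td></tr>
  <tr><th>Total Fat</th><td>2.5 g</td><td>3%</td></tr>
</tbody></table>
"""

print(first_td_for(SAMPLE, "Total Fat"))
```

If a label is missing from the dump, first_td_for returns None, so it is easy to spot products whose table is laid out differently.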

Tentle · January 23, 2021, 12:41am

Hey, that's great! That definitely works for getting the SmartLabel HTML out of the Nesquik website, but not the Frito-Lay website. I can't find an element on the Frito-Lay website that lets me extract the SmartLabel HTML. Moreover, I noticed that both websites have <iframe src=...> elements that point to the SmartLabel HTML (on the Nesquik website you have to click the button first). I thought about extracting those links using iframe[src*="smartlabel"], but Web Scraper still can't reach into those elements. If I can get those SmartLabel URLs, I can then scrape them with Web Scraper.

For example,

Can be easily scraped.

leemeng · January 26, 2021, 12:55pm

That was actually my original plan, just gather all the iframe source URLs and scrape them separately. For Frito-Lay pages, you can get at them with

Type: Element attribute
Selector: div#nutrition-facts > iframe
Attribute name: src

Tentle · February 1, 2021, 5:28pm

Hey, that's absolutely fantastic! It works great. Thank you very much. Since I last posted I had written some code expecting the input to be the iframe HTML, e.g.:

<iframe src="https://smartlabel.pepsico.info/028400435703-0010-en-US/index.html" class=""></iframe>

So for the other websites I'm still using the HTML selector, but I modified it with your CSS selector:

Type: HTML
Selector: div#nutrition-facts

If I ever have to redo my sitemaps and my code, I'll probably go directly for the URL using your approach. Thanks!
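
For anyone following the same route, the post-processing step (turning the scraped iframe HTML into a SmartLabel URL) could look roughly like the sketch below. It is not the actual code mentioned above, just a minimal standard-library illustration; the names IframeSrcParser and iframe_sources are made up for the example, and the input is the iframe snippet quoted earlier in this post.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:  # skip iframes with a missing or empty src
                self.sources.append(src)


def iframe_sources(html):
    """Return all iframe src URLs found in html, in document order."""
    parser = IframeSrcParser()
    parser.feed(html)
    return parser.sources


# The iframe HTML as exported by the HTML selector.
scraped = '<iframe src="https://smartlabel.pepsico.info/028400435703-0010-en-US/index.html" class=""></iframe>'

print(iframe_sources(scraped))
```

The resulting URLs can then be used as start URLs for a second sitemap that scrapes the SmartLabel pages directly.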