Data Preview shows product description fine, but exported CSV shows null. Why?

gayleseo · January 12, 2020, 1:16pm

Hi All

I am having a problem with Web Scraper and urgently need your help.

I am trying to scrape product descriptions. In Data Preview the description content shows up, but when I export the data as CSV, the description field in the exported file shows "null".

I am using the following sitemap to scrape the site with the Web Scraper Chrome extension:

{"_id":"computer1","startUrl":["https://www.tokopedia.com/search?st=product&q=computer&ob=5&page=1"],"selectors":[{"id":"productlink","type":"SelectorLink","parentSelectors":["_root"],"selector":"._2OBup6Zd a, a.anchor-overlay","multiple":true,"delay":0},{"id":"name","type":"SelectorText","parentSelectors":["productlink"],"selector":"h1.css-x7lc0h","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["productlink"],"selector":"h3","multiple":false,"regex":"","delay":0},{"id":"productdescription","type":"SelectorText","parentSelectors":["productlink"],"selector":"p.css-1ngxow7-unf-heading","multiple":false,"regex":"","delay":0}]}

This is the metadata:
Sitemap name: computer1
Start URL: https://www.tokopedia.com/search?st=product&q=computer&ob=5&page=[1-2]

Scrape parameters:
Request interval (ms): 3000
Page load delay (ms): 3000

I am scraping only 3 fields: name, price, and productdescription.
In the exported CSV, the name and price fields are filled with data, but productdescription shows "NULL",
even though Data Preview shows the content of the product description.
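
A quick way to see how widespread the problem is in an export is to count empty or "null" cells per column. The sketch below is only an illustration: the sample rows are made up, the column names are assumed to match the selector IDs in the sitemap, and treating both an empty cell and the text "null" (in any case) as missing is an assumption, not documented behaviour.

```python
import csv
import io

def count_nulls(csv_text, null_markers=("", "null")):
    """Count, per column, how many rows hold an empty or 'null' value (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames or []}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            value = (row.get(name) or "").strip().lower()
            if value in null_markers:
                counts[name] += 1
    return total, counts

# Made-up sample rows standing in for a real export (hypothetical data).
SAMPLE = """name,price,productdescription
Dust filter,Rp25.000,null
Keyboard,Rp150.000,NULL
Mouse,Rp80.000,Wireless mouse
"""

total, counts = count_nulls(SAMPLE)
print(total, counts)
```

If only one column is missing on most rows while the others are filled, that points at that one selector (or at when its element appears on the page) rather than at the export step.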

I have tried the solutions from the following threads, but they don't work.

1. Anti-scraping measure

The excerpt:
"Your selectors are too specific and are likely to fail. You've probably noticed that the site uses random-looking element names like sc-cLQEGU, sc-cLQEGU, sc-hORach etc. This is intentional and is probably an anti-scraping measure.

You'll need to spend a significant amount of time in the Chrome inspector and also have some knowledge of CSS selectors to scrape such sites.

As an example, for "nomcontact", you can try:
span[class^="ant-avatar"] ~ div[class^="sc-"] > div[class^="sc-"]

This means: look for a div element whose class starts with "sc-" (ignoring the random strings), which is a child of a div whose class also starts with "sc-", where that parent div is preceded by a sibling span element whose class starts with "ant-avatar"."

MY RESULT

I tried all kinds of span-based selectors to extract the productdescription content within the specified div. It doesn't work.
It doesn't even show any content in Data Preview.

2. Page load delay not long enough

MY RESULT
I have tried page load delays from 2000 ms to 5000 ms; productdescription is still "NULL".

3. Scroller

MY RESULT
When I load the product page
https://www.tokopedia.com/xinvicious/computer-dust-filter-with-magnet-tebal-9d01

I can see that the product description loads normally, with no lazy loading or dynamic loading.
I don't see how adding a scroller, so that Web Scraper scrolls all the way to the bottom, would help fix the null issue.

Please help me, guys. I am out of ideas.
PS: I am sure this issue affects every user of Web Scraper. How can we subscribe to the paid/cloud version if the free one cannot even extract a product description?

leemeng · January 12, 2020, 2:50pm

Great that you have read thru some previous posts.

The site does use a form of lazy-loading in the product pages, so you would need a scroller.

You can confirm this yourself by going to a completely new product page (never visited before). But do not scroll down yet. Click the data preview and it should return "null".

Now manually scroll down to the prod desc area, and click data preview again. Now it should return some data.

Also, some of your selectors contain random characters like 2OBup6Zd and x7lc0h, so they may not work properly in the future (site owner can change these anytime).
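
As a rough illustration of that last point, here is a small Python sketch that scans a sitemap for class names that look auto-generated. The heuristic (a hyphen- or underscore-separated part of 6 or more characters that mixes letters and digits) is an assumption made for this example and can give false positives or miss some names; the sitemap is the one from the first post, trimmed to the two fields the check needs.

```python
import json
import re

# Sitemap from the first post, trimmed to selector IDs and selector strings.
SITEMAP = json.loads('''
{"_id":"computer1","selectors":[
 {"id":"productlink","selector":"._2OBup6Zd a, a.anchor-overlay"},
 {"id":"name","selector":"h1.css-x7lc0h"},
 {"id":"price","selector":"h3"},
 {"id":"productdescription","selector":"p.css-1ngxow7-unf-heading"}
]}
''')

# Class tokens written as ".classname" inside a CSS selector.
CLASS_TOKEN = re.compile(r"\.(-?[_a-zA-Z][\w-]*)")

def looks_generated(class_name):
    """Heuristic (assumption): a part of 6+ characters mixing letters and digits
    is probably a build-time hash rather than a hand-written name."""
    for part in re.split(r"[-_]", class_name):
        if len(part) >= 6 and re.search(r"[a-zA-Z]", part) and re.search(r"\d", part):
            return True
    return False

def fragile_selectors(sitemap):
    """Map selector id -> class names in its selector that look auto-generated."""
    report = {}
    for sel in sitemap["selectors"]:
        hits = [c for c in CLASS_TOKEN.findall(sel["selector"]) if looks_generated(c)]
        if hits:
            report[sel["id"]] = hits
    return report

print(fragile_selectors(SITEMAP))
```

Selectors flagged this way are candidates for rewriting with prefix or suffix attribute matches such as [class^='css'], which do not depend on the random part.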

gayleseo · January 12, 2020, 11:26pm

Hi leemeng
Thank you for your prompt response. I really appreciate it.

How can I implement the scroller? In the lazy-loading post I only see imported sitemaps with a scroller already in them, pasted as-is.

How can I add a scroller to my sitemap using the Web Scraper GUI (under F12), so that Web Scraper scrolls all the way to the bottom before it starts scraping the data? I know the scroller goes in the same section as name, price and productdescription.
Where should I put the scroller? Should it go before name, at the very top, so that it scrolls down before scraping starts?

And after I enter the selector title, which option should I select in the dropdown?

There is no tutorial for this on https://webscraper.io/, not even a YouTube tutorial. The scroller is such an important feature, since most e-commerce sites use lazy loading to speed up page loads.

And you will surely receive hundreds of the same question from other users who scrape e-commerce sites as I do, since there is no tutorial for adding the scroller.

Regarding the random characters, I am not worried, because I can always recheck and change the selectors before I scrape.

leemeng · January 14, 2020, 11:42pm

Try this version below with scroller. I've improved some of your selectors.

There is already a very good tutorial on how to use Element Scroll Down (AKA scrollers AKA infinite scroll) from the WS team.

The video title is "Web Scraper multiple record extraction tutorial"; it doesn't mention "infinite scroll".

The "infinite scroll" bit actually starts around 3:25, but you should really watch all the way from the beginning 'cos it explains important concepts such as wrapper elements, element selection, child selectors, etc.

Also useful: https://www.w3schools.com/cssref/css_selectors.asp

{"_id":"toko-test","startUrl":["https://www.tokopedia.com/search?st=product&q=computer&ob=5&page=1"],"selectors":[{"id":"productlink","type":"SelectorLink","parentSelectors":["_root"],"selector":"div[class$='pcr'] > div > span ~ a,div.ta-product-container > div.ta-product > div > a.anchor-overlay","multiple":true,"delay":0},{"id":"Scroller","type":"SelectorElementScroll","parentSelectors":["productlink"],"selector":"h6:contains('Catatan')","multiple":false,"delay":"2500"},{"id":"name","type":"SelectorText","parentSelectors":["productlink"],"selector":"div > h1[class^='css']","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["productlink"],"selector":"h3","multiple":false,"regex":"","delay":0},{"id":"productdescription","type":"SelectorText","parentSelectors":["productlink"],"selector":"h2 ~ p[data-merchant-test][class^='css']","multiple":false,"regex":"","delay":0}]}
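
For anyone who wants to see the structural change in isolation: compared with the original sitemap, the version above adds one selector of type SelectorElementScroll whose parent is productlink, so it sits alongside name, price and productdescription. The Python sketch below reproduces that edit on a trimmed copy of the original sitemap. The function name add_scroller and its parameters are made up for this illustration, and placing the scroller right after its parent simply mirrors the sitemap above; whether the order in the list matters is not established in this thread.

```python
import json

def add_scroller(sitemap, parent_id, scroll_selector, delay_ms=2500):
    """Return a copy of `sitemap` with an Element scroll down selector added
    under `parent_id`, placed right after the parent selector in the list."""
    scroller = {
        "id": "Scroller",
        "type": "SelectorElementScroll",
        "parentSelectors": [parent_id],
        "selector": scroll_selector,
        "multiple": False,
        "delay": str(delay_ms),  # stored as a string in the sitemap posted above
    }
    selectors = list(sitemap["selectors"])
    position = next(
        (i + 1 for i, s in enumerate(selectors) if s["id"] == parent_id),
        len(selectors),  # fall back to appending if the parent is not found
    )
    selectors.insert(position, scroller)
    return {**sitemap, "selectors": selectors}

# Trimmed copy of the original sitemap from the first post.
ORIGINAL = {
    "_id": "computer1",
    "startUrl": ["https://www.tokopedia.com/search?st=product&q=computer&ob=5&page=1"],
    "selectors": [
        {"id": "productlink", "type": "SelectorLink", "parentSelectors": ["_root"],
         "selector": "._2OBup6Zd a, a.anchor-overlay", "multiple": True, "delay": 0},
        {"id": "productdescription", "type": "SelectorText", "parentSelectors": ["productlink"],
         "selector": "p.css-1ngxow7-unf-heading", "multiple": False, "regex": "", "delay": 0},
    ],
}

updated = add_scroller(ORIGINAL, "productlink", "h6:contains('Catatan')")
print(json.dumps(updated))
```

The printed JSON can be pasted into the extension's sitemap import, which is the same route as importing the full sitemap above.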

gayleseo · January 15, 2020, 9:43am

Thank you so much, leemeng, for your generous and prompt help. After using numerous scrapers such as import.io, I find Web Scraper the most user-friendly scraper on the market. Your prompt and generous support in the forum threads makes Web Scraper the best scraper on the market.

djbox9 · August 14, 2020, 8:28am

leemeng is the most supportive scraper on this forum. I like him too; he has helped me with many things.