Trying to scrape the Shopee website

I want to scrape the listings below the banners.
I can get all 50 listing titles in the data preview,
but when I press scrape, most of the data comes back as null.

What did I do wrong?
Please assist.

Thank you

Url: https://shopee.sg/Toys-Kids-Babies-cat.12

The Shopee site uses lazy loading, so it needs a scroller to load the bottom-of-page items. If you don't scroll down, you will get null results.


Thank you Lee Meng
It works

To add on,
does anyone know how to extract the image URL when the HTML looks like this?

<div class="_1T9dHf _3XaILN" style="background-image: url(&quot;https://cf.shopee.sg/file/6ebce3ce91591590a1179fc5598c18cb_tn&quot;); background-size: contain; background-repeat: no-repeat;"></div>

The image URL is in the CSS background-image property.

Thank you

Hmm, I am not sure. Maybe use a regex to pull it out? @iconoclast, I see you're back. Any thoughts here?

Normally you could just use Element attribute for this, but since the URL has to be extracted from inside the style value, you can use Element: HTML with a regex instead (Element attribute does not support regex yet):

Selector: div[style^="background-image"]

Regex: (?<=&quot;).+(?=&quot;)

This assumes the URL between the two &quot; entities is what you want.
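
To check the regex outside the browser, here is a minimal Python sketch run against the div from the question. The first pattern is the one suggested above. The helper `extract_background_url` is a hypothetical name, not part of Web Scraper; it swaps in a different technique (decode the HTML entities first, then match the CSS `url(...)` value) for cases where the quotes arrive as plain `"` or `'` instead of `&quot;`.

```python
import html
import re
from typing import Optional

# Element HTML from the question; the inner quotes are HTML-encoded as &quot;
element_html = (
    '<div class="_1T9dHf _3XaILN" style="background-image: '
    'url(&quot;https://cf.shopee.sg/file/6ebce3ce91591590a1179fc5598c18cb_tn&quot;); '
    'background-size: contain; background-repeat: no-repeat;"></div>'
)

# The regex from this thread: everything between the first and last &quot;
THREAD_REGEX = re.compile(r"(?<=&quot;).+(?=&quot;)")

match = THREAD_REGEX.search(element_html)
image_url = match.group(0) if match else None
print(image_url)

# Alternative (an assumption, not the thread's method): decode entities,
# then capture whatever sits inside url(...), with or without quotes.
CSS_URL = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def extract_background_url(raw_html: str) -> Optional[str]:
    """Return the first CSS url(...) value found in raw_html, or None."""
    decoded = html.unescape(raw_html)
    found = CSS_URL.search(decoded)
    return found.group(1) if found else None


print(extract_background_url(element_html))
```

If the first pattern returns nothing while the second finds a URL, the HTML being matched most likely does not contain literal `&quot;` entities.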


I tried this just now, but it is showing "null" ):

This is my demo sitemap:

{"_id":"shopeekidcart","startUrl":["https://shopee.co.th/เสื้อผ้าแฟชั่นผู้ชาย-cat.48"],"selectors":[{"id":"Product Selector","type":"SelectorLink","parentSelectors":["_root"],"selector":".col-xs-2-4 a","multiple":true,"delay":0},{"id":"name","type":"SelectorText","parentSelectors":["Product Selector"],"selector":".qaNIZv span","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["Product Selector"],"selector":"div._3n5NQx","multiple":false,"regex":"","delay":0},{"id":"Img","type":"SelectorHTML","parentSelectors":["Product Selector"],"selector":"div[style^=\"background-image\"]","multiple":false,"regex":"(\").+(\")","delay":0},{"id":"scrolldown","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"svg.icon-arrow-left","multiple":false,"delay":0},{"id":"imgtest","type":"SelectorHTML","parentSelectors":["Product Selector"],"selector":"div._1RzplO","multiple":false,"regex":"(\").+(\")","delay":0},{"id":"imgtest2","type":"SelectorHTML","parentSelectors":["Product Selector"],"selector":"div._2JMB9h","multiple":false,"regex":"","delay":0}]}

leemeng, I have been following you in this forum, and you solve literally every problem so easily. I wanted to add you as a friend, but I could not find a profile link. Could you share your address so I can ask you for advice from time to time? Thanks a lot.


Another option is to clean up the image URL afterwards in Microsoft Excel.

I also tried this code, but it is showing "null". Any idea?
Thank you, leemeng.

Shopee uses randomized class names like div._1RzplO, so you'll need more stable selectors. Try the sitemap below, which will get the first 4 pages. Modify as needed. To make it click through to every product page, you'll need to add data selectors under "Page link" (currently it just collects the product URLs and does not click through).

Note that the page number in Shopee URLs is offset by 1 (the first page has no page parameter, and the second page is page=1), so:

Page 1 - https://shopee.co.th/เสื้อผ้าแฟชั่นผู้ชาย-cat.48
Page 2 - https://shopee.co.th/เสื้อผ้าแฟชั่นผู้ชาย-cat.48?page=1
Page 3 - https://shopee.co.th/เสื้อผ้าแฟชั่นผู้ชาย-cat.48?page=2
and so on
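
The offset can be sketched in a few lines of Python. This is only an illustration of the pattern listed above; `shopee_page_url` is a hypothetical helper name, and it assumes the category URL has no other query parameters.

```python
def shopee_page_url(base_url: str, page_number: int) -> str:
    """Build the URL for a 1-based page number, following the pattern above.

    Page 1 is the bare category URL; page N (N >= 2) uses ?page=N-1.
    """
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    if page_number == 1:
        return base_url
    return f"{base_url}?page={page_number - 1}"


base = "https://shopee.co.th/เสื้อผ้าแฟชั่นผู้ชาย-cat.48"

# URLs for the first 4 pages: the bare URL, then ?page=1 through ?page=3
first_four_pages = [shopee_page_url(base, n) for n in range(1, 5)]
for url in first_four_pages:
    print(url)
```

The same four pages can be covered in a sitemap with two start URLs: the bare category URL plus a ?page=[1-3] range.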

Sitemap:
{"_id":"shopee-thailand-test","startUrl":["https://shopee.co.th/%E0%B9%80%E0%B8%AA%E0%B8%B7%E0%B9%89%E0%B8%AD%E0%B8%9C%E0%B9%89%E0%B8%B2%E0%B9%81%E0%B8%9F%E0%B8%8A%E0%B8%B1%E0%B9%88%E0%B8%99%E0%B8%9C%E0%B8%B9%E0%B9%89%E0%B8%8A%E0%B8%B2%E0%B8%A2-cat.48?page=[1-3]","https://shopee.co.th/%E0%B9%80%E0%B8%AA%E0%B8%B7%E0%B9%89%E0%B8%AD%E0%B8%9C%E0%B9%89%E0%B8%B2%E0%B9%81%E0%B8%9F%E0%B8%8A%E0%B8%B1%E0%B9%88%E0%B8%99%E0%B8%9C%E0%B8%B9%E0%B9%89%E0%B8%8A%E0%B8%B2%E0%B8%A2-cat.48"],"selectors":[{"id":"Results wrapper","type":"SelectorElement","parentSelectors":["_root"],"selector":"div[role='main'] div.shopee-search-item-result","multiple":false,"delay":0},{"id":"Separate scroller","type":"SelectorElementScroll","parentSelectors":["Results wrapper"],"selector":"div.col-xs-2-4:nth-of-type(n+8)","multiple":true,"delay":"2500"},{"id":"Product wrappers","type":"SelectorElement","parentSelectors":["Results wrapper"],"selector":"div.col-xs-2-4","multiple":true,"delay":0},{"id":"Product name","type":"SelectorText","parentSelectors":["Product wrappers"],"selector":"div[data-sqe='name']","multiple":false,"regex":"","delay":0},{"id":"Price","type":"SelectorText","parentSelectors":["Product wrappers"],"selector":"div[data-sqe='name'] + div","multiple":false,"regex":"","delay":0},{"id":"Page link","type":"SelectorLink","parentSelectors":["Product wrappers"],"selector":"a","multiple":false,"delay":0},{"id":"Page number","type":"SelectorText","parentSelectors":["Results wrapper"],"selector":"div.shopee-page-controller button.shopee-button-solid","multiple":false,"regex":"","delay":0}]}