Need help scraping

Hello,

I want to scrape the products and their prices from an e-commerce site, but I am having problems with it.
I cannot select or click the elements on the page. Is this page somehow protected from scraping?

I would like to scrape the GTIN, the product name, and the product price, which is hidden.

Url: Mercator Spletna Trgovina | Več Kot 13.000 Izdelkov

Sitemap:
{id:"sitemap code"}

@testuser Hello, after inspecting the page it seems that you can extract the desired data points with the following sitemap:

{"_id":"trgovina-mercator-si","startUrl":["https://trgovina.mercator.si/market/brskaj#offset=0"],"selectors":[{"delay":0,"id":"wrapper","multiple":true,"parentSelectors":["_root"],"selector":"div.grid > div[data-type=\"product\"]","type":"SelectorElement"},{"delay":0,"id":"price","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".lib-product-price","type":"SelectorText"},{"delay":0,"extractAttribute":"data-gtin","id":"GTIN","multiple":false,"parentSelectors":["wrapper"],"selector":"_parent_","type":"SelectorElementAttribute"},{"delay":0,"id":"name","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":"div.product-name","type":"SelectorText"}]}

Also, it appears that you have enabled the 'Toggle device toolbar' option, which is not necessary in this case.
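
For anyone new to sitemaps, the JSON above is easier to read as a tree: `wrapper` selects each product element, and `price`, `GTIN` and `name` are read inside each of those elements. Here is a minimal Python sketch that prints this tree. The selector list is copied from the sitemap above with the `delay` and `regex` fields left out, and `describe` is a hypothetical helper name, not part of Web Scraper.

```python
import json

# Selectors copied from the sitemap above ("delay" and "regex" omitted for brevity).
SITEMAP = json.loads(r'''
{"_id": "trgovina-mercator-si",
 "startUrl": ["https://trgovina.mercator.si/market/brskaj#offset=0"],
 "selectors": [
  {"id": "wrapper", "parentSelectors": ["_root"], "type": "SelectorElement",
   "selector": "div.grid > div[data-type=\"product\"]", "multiple": true},
  {"id": "price", "parentSelectors": ["wrapper"], "type": "SelectorText",
   "selector": ".lib-product-price", "multiple": false},
  {"id": "GTIN", "parentSelectors": ["wrapper"], "type": "SelectorElementAttribute",
   "selector": "_parent_", "extractAttribute": "data-gtin", "multiple": false},
  {"id": "name", "parentSelectors": ["wrapper"], "type": "SelectorText",
   "selector": "div.product-name", "multiple": false}
 ]}
''')


def describe(sitemap, parent_id="_root", depth=0):
    """Return one indented line per selector, children listed under their parent."""
    lines = []
    for sel in sitemap["selectors"]:
        if parent_id in sel["parentSelectors"]:
            lines.append("  " * depth + f'{sel["id"]} ({sel["type"]}): {sel["selector"]}')
            lines.extend(describe(sitemap, sel["id"], depth + 1))
    return lines


print("\n".join(describe(SITEMAP)))
```

The three child selectors are indented under `wrapper`, which shows that they are evaluated once per product element. `GTIN` uses `_parent_` because the `data-gtin` attribute is read from the wrapper element itself.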

Hello, thank you. I am kind of new to scraping... I managed to click something after I turned off "Toggle device toolbar".
How can I extract data using the sitemap you've posted?

@testuser You have to import it into the extension and simply launch the scrape.

All of the necessary learning resources can be found in the 'Learn' section.

Tutorial videos: Web Scraper Tutorials
Documentation: Installation | Web Scraper Documentation
How-tos: Web Scraper << How to >> video tutorials

Thank you very much for your help. I am getting somewhere now! :slight_smile:
I just have to figure out how to remove the limit of 200 rows in the export (there are more products).

I am teaching myself... :slight_smile: I probably have to make more sitemaps, e.g. Mercator Spletna Trgovina | Več Kot 13.000 Izdelkov 02, 03, 04, 05, etc.

I have tried to scrape products from more pages, but I can't get more than 400 products in the export.

@testuser If you are looking to load all of the additional products you will have to use an 'Element scroll' selector.

{"_id":"trgovina-mercator-si-1","startUrl":["https://trgovina.mercator.si/market/brskaj#offset=0"],"selectors":[{"id":"wrapper","parentSelectors":["_root"],"type":"SelectorElementScroll","selector":"div.grid > div[data-type=\"product\"]","multiple":true,"delay":4000},{"id":"price","parentSelectors":["wrapper"],"type":"SelectorText","selector":".lib-product-price","multiple":false,"delay":0,"regex":""},{"id":"GTIN","parentSelectors":["wrapper"],"type":"SelectorElementAttribute","selector":"_parent_","multiple":false,"delay":0,"extractAttribute":"data-gtin"},{"id":"name","parentSelectors":["wrapper"],"type":"SelectorText","selector":"div.product-name","multiple":false,"delay":0,"regex":""}]}
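
Compared with the first sitemap, only the `wrapper` selector changes here: its type becomes `SelectorElementScroll` and its delay becomes 4000, and the sitemap gets a new `_id`. A rough Python sketch of that change, assuming the sitemap has already been loaded into a dict (`to_scroll_sitemap` is a hypothetical helper, not something the extension provides):

```python
import copy


def to_scroll_sitemap(sitemap, wrapper_id="wrapper", delay=4000, new_id=None):
    """Return a copy of the sitemap whose wrapper selector scrolls to load more items."""
    result = copy.deepcopy(sitemap)  # leave the original sitemap untouched
    for sel in result["selectors"]:
        if sel["id"] == wrapper_id:
            sel["type"] = "SelectorElementScroll"
            sel["delay"] = delay  # wait between scrolls so new products can load
    if new_id is not None:
        result["_id"] = new_id
    return result


# Minimal stand-in for the first sitemap (only the wrapper selector is shown).
first = {
    "_id": "trgovina-mercator-si",
    "startUrl": ["https://trgovina.mercator.si/market/brskaj#offset=0"],
    "selectors": [
        {"id": "wrapper", "parentSelectors": ["_root"], "type": "SelectorElement",
         "selector": 'div.grid > div[data-type="product"]', "multiple": True, "delay": 0},
    ],
}

scroll = to_scroll_sitemap(first, new_id="trgovina-mercator-si-1")
print(scroll["selectors"][0]["type"], scroll["selectors"][0]["delay"])
```

The child selectors (`price`, `GTIN`, `name`) stay exactly as they were, so nothing else needs editing when switching to the scroll version.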

Thank you for all your help. I somehow got the script running, but it scrolls indefinitely and the export contains only 280 results. The search shows 382 results :expressionless:
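
One way to narrow down a mismatch like this is to check whether the exported rows are all distinct products or whether some were collected twice or without a GTIN. The sketch below is only an illustration: it assumes the export is a CSV whose column names match the selector ids (`GTIN`, `name`, `price`), the sample rows are made-up placeholders, and `summarize_export` is a hypothetical helper.

```python
import csv
import io


def summarize_export(csv_text, key="GTIN"):
    """Count rows, distinct keys, repeated keys and rows with an empty key."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [row[key] for row in rows if row.get(key)]
    unique = set(values)
    return {
        "rows": len(rows),
        "unique": len(unique),
        "duplicates": len(values) - len(unique),
        "missing": len(rows) - len(values),
    }


# Made-up sample data, not taken from the real site.
SAMPLE = (
    "GTIN,name,price\n"
    "0000000000001,Sample A,1.00\n"
    "0000000000002,Sample B,2.00\n"
    "0000000000001,Sample A,1.00\n"
    ",Sample C,3.00\n"
)

summary = summarize_export(SAMPLE)
print(summary)
```

If `unique` comes out well below the number of results shown on the site, the scroll probably stopped loading new products. If `duplicates` is high, the same products were picked up more than once.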

This is the sitemap code:

{"_id":"mercator2","startUrl":["Mercator Spletna Trgovina | Več Kot 13.000 Izdelkov > div[data-type="product"]","type":"SelectorElementScroll"},{"delay":0,"id":"price","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".lib-product-price","type":"SelectorText"},{"delay":0,"extractAttribute":"data-gtin","id":"GTIN","multiple":false,"parentSelectors":["wrapper"],"selector":"_parent_","type":"SelectorElementAttribute"},{"delay":0,"id":"name","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":"div.product-name","type":"SelectorText"},{"delay":0,"id":"oldprice","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":"div.discount-price","type":"SelectorText"},{"delay":0,"id":"discount","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".discount div","type":"SelectorText"},{"delay":0,"extractAttribute":"class","id":"discountType","multiple":false,"parentSelectors":["wrapper"],"selector":"div.discount","type":"SelectorElementAttribute"}]}

The same happens if I want to scrape the whole site. It should scroll from the start to page 50, but it does not scrape any data and runs indefinitely. :confused:

{"_id":"mercator2","startUrl":["Mercator Spletna Trgovina | Več Kot 13.000 Izdelkov > div[data-type="product"]","type":"SelectorElementScroll"},{"delay":0,"id":"price","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".lib-product-price","type":"SelectorText"},{"delay":0,"extractAttribute":"data-gtin","id":"GTIN","multiple":false,"parentSelectors":["wrapper"],"selector":"_parent_","type":"SelectorElementAttribute"},{"delay":0,"id":"name","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":"div.product-name","type":"SelectorText"},{"delay":0,"id":"oldprice","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":"div.discount-price","type":"SelectorText"},{"delay":0,"id":"discount","multiple":false,"parentSelectors":["wrapper"],"regex":"","selector":".discount div","type":"SelectorText"},{"delay":0,"extractAttribute":"class","id":"discountType","multiple":false,"parentSelectors":["wrapper"],"selector":"div.discount","type":"SelectorElementAttribute"}]}
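
As pasted here, the two sitemaps above seem to have been altered by the forum's formatting: the start URL shows the page title instead of a link, and the quotes around `product` are not escaped, so the text is not valid JSON. Before importing a sitemap that was copied from a post, it can help to run a quick check such as the sketch below. `check_sitemap` is a hypothetical helper, and it only checks the keys that the sitemaps in this thread use (`_id`, `startUrl`, `selectors`, `parentSelectors`).

```python
import json


def check_sitemap(text):
    """Return a list of problems found in a sitemap string (empty list if none)."""
    try:
        sitemap = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} at position {err.pos}"]
    if not isinstance(sitemap, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing key: {key}"
                for key in ("_id", "startUrl", "selectors") if key not in sitemap]

    selectors = sitemap.get("selectors", [])
    known = {"_root"} | {sel.get("id") for sel in selectors}
    for sel in selectors:
        for parent in sel.get("parentSelectors", []):
            if parent not in known:
                problems.append(f"selector {sel.get('id')!r} has unknown parent {parent!r}")
    return problems


# Unescaped inner quotes, as in the pasted sitemaps, make the JSON invalid.
broken = '{"selector":"div[data-type="product"]"}'
# The same selector with the inner quotes escaped parses fine.
fixed = r'{"_id":"x","startUrl":[],"selectors":[{"id":"wrapper","parentSelectors":["_root"],"selector":"div[data-type=\"product\"]"}]}'

print(check_sitemap(broken))
print(check_sitemap(fixed))
```

An empty list only means the structure looks plausible; it does not say whether the selectors match anything on the page.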