Site impossible to scrape correctly

Hey guys!

I'm trying to scrape a site, but it seems impossible.
I'm not sure how to explain it, but the selectors seem to be "dynamic": the selector changes on every page.

Example

Product 1: URL Product 1

A) [title='Energiajook Red Bull, RED BULL, 355 ml'] span.text-xl
B) [title='Red Bull energiajook 0.355L'] span.text-xl
C) [title='RED BULL ENERGIAJOOK 355 ML'] span.text-xl

Product 2:

A) [title='Energiajook Energy, MONSTER, 500 ml'] span.text-xl
B) [title='Monster Energy energiajook 0.5L'] span.text-xl
C) [title='MONSTER ENERGY ENERGIAJOOK 50'] span.text-xl

The selector structure seems to be the same on every page, but on each page the price selector depends on the product title, as you can see above.

Is there a way to scrape this site, or does it simply not work for this website?

I'm pasting the sitemap below, in case someone can help me.

Thanks!

{"_id":"Ostukorvid","startUrl":["https://ostukorvid.ee/kategooriad"],"selectors":[{"id":"cat","linkType":"linkFromHref","multiple":true,"parentSelectors":["_root"],"selector":".grid a","type":"SelectorLink"},{"id":"sub-cat","linkType":"linkFromHref","multiple":true,"parentSelectors":["cat"],"selector":"a.items-center","type":"SelectorLink"},{"id":"title","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"h1","type":"SelectorText"},{"id":"Seler-Price","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"[title='White Edition, RED BULL, 250 ml'] span.text-xl","type":"SelectorText"},{"id":"Selver-PriceKG","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"span.block.text-gray-600","type":"SelectorText"},{"id":"Selver-URL","linkType":"linkFromHref","multiple":false,"parentSelectors":["sub-cat"],"selector":"a[title='White Edition, RED BULL, 250 ml']","type":"SelectorLink"},{"id":"Prisma-Price","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"[title='Smoked cheese, 200 g'] span.text-xl","type":"SelectorText"},{"id":"Prisma-Price-KG","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"[title='Red Bull energiajook White Edition 250 ml'] span.block.text-gray-600","type":"SelectorText"},{"id":"Prisma-URL","linkType":"linkFromHref","multiple":false,"parentSelectors":["sub-cat"],"selector":"a[title='Red Bull energiajook White Edition 250 ml']","type":"SelectorLink"},{"id":"Coop","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"span.coop","type":"SelectorText"},{"id":"Coop-Price","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"[title='Red Bull White Edition energiajook 0.25L'] span.text-xl","type":"SelectorText"},{"id":"Coop-Price-KG","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"[title='Red Bull White Edition energiajook 0.25L'] span.block.text-gray-600","type":"SelectorText"},{"id":"Coop-URL","linkType":"linkFromHref","multiple":false,"parentSelectors":["sub-cat"],"selector":"a[title='Red Bull White Edition energiajook 0.25L']","type":"SelectorLink"},{"id":"Rimi-Price","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"[title='Energiajook Red Bull Red Edition 0,25l'] span.text-xl","type":"SelectorText"},{"id":"Rimi-Price-KG","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"[title='Energiajook Red Bull Red Edition 0,25l'] span.block.text-gray-600","type":"SelectorText"},{"id":"Rimi-URL","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"[title='Energiajook Red Bull Red Edition 0,25l'] .h-8 path","type":"SelectorText"},{"id":"Category","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"a.group","type":"SelectorText"},{"id":"Maxima-Price","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"span.text-xl","type":"SelectorText"},{"id":"Maxima-Price-KG","multiple":false,"parentSelectors":["sub-cat"],"regex":"","selector":"[title='Smoked cheese HELLO, 200g'] span.block.text-gray-600","type":"SelectorText"},{"id":"Maxima-URL","linkType":"linkFromHref","multiple":false,"parentSelectors":["sub-cat"],"selector":"a[title='Smoked cheese HELLO, 200g']","type":"SelectorLink"}]}

You will need better selectors for this site, which means having some knowledge of CSS and HTML. Also, it is better to test problematic areas separately on a single product page instead of running the full sitemap, e.g.:

{"_id":"ostukorvid-test","startUrl":["https://ostukorvid.ee/tooted/10036f4d-d358-4591-8984-21dca55459e7/apelsin-navel"],"selectors":[{"id":"Title","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"h1","type":"SelectorText"},{"id":"price1","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div > a[title] > span:contains('€') > span:first-of-type","type":"SelectorText"},{"id":"price2","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div > a[title] > span:contains('€') > span:nth-of-type(2)","type":"SelectorText"}]}
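
To illustrate why a[title] helps here: it matches any link that has a title attribute, whatever the title text is, so nothing has to be hard-coded per product. Below is a rough Python sketch of that idea using only the standard library. It is not Web Scraper code, and the markup in SAMPLE_HTML (including the prices) is made up from the selectors in this thread, so the real page may be structured differently. OfferParser and extract_offers are hypothetical names for this example only.

```python
from html.parser import HTMLParser

# Hypothetical markup modeled on the selectors in this thread.
# The structure and the prices are assumptions, not copied from the site.
SAMPLE_HTML = """
<div>
  <a title="Energiajook Red Bull, RED BULL, 355 ml" href="/a">
    <span><span class="text-xl">1.49 €</span><span class="block text-gray-600">4.20 €/l</span></span>
  </a>
  <a title="Red Bull energiajook 0.355L" href="/b">
    <span><span class="text-xl">1.55 €</span><span class="block text-gray-600">4.37 €/l</span></span>
  </a>
</div>
"""


class OfferParser(HTMLParser):
    """Collects title and price for every <a title=...>, whatever the title text is."""

    def __init__(self):
        super().__init__()
        self.offers = []        # finished offers, in page order
        self._current = None    # offer being filled while inside an <a title>
        self._in_price = False  # True while inside a span with class text-xl

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "title" in attrs:
            # Same idea as the CSS selector a[title]: attribute present, value ignored.
            self._current = {"title": attrs["title"], "price": ""}
        elif tag == "span" and self._current is not None:
            classes = (attrs.get("class") or "").split()
            self._in_price = "text-xl" in classes

    def handle_data(self, data):
        if self._in_price and self._current is not None:
            self._current["price"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False
        elif tag == "a" and self._current is not None:
            self.offers.append(self._current)
            self._current = None


def extract_offers(html):
    """Return a list of {'title', 'price'} dicts, one per titled link."""
    parser = OfferParser()
    parser.feed(html)
    parser.close()
    return parser.offers


offers = extract_offers(SAMPLE_HTML)
for position, offer in enumerate(offers, start=1):
    # position plays the role of a:nth-child(1), a:nth-child(2), ...
    print(position, offer["title"], offer["price"])
```

The title is read as data instead of being part of the selector, and the position in the list is what tells the offers apart, so the same logic applies to every product page as long as the layout matches the assumption.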

Hi,

thanks for your help!
I ran a test with what you sent me, but it's not really working.

The problem seems to be that the CSS classes for the price are all the same ("span.block.text-gray-600"), so they don't really help.

I tried it on this product, for example, and it's not working:

For the first one, the results show up in 2 different spots, where the selector includes only the class and not the title.

The only difference I could see between those 2 lines was this part of the selector:
a:nth-child(1)
a:nth-child(2)

But I really have no idea how I could use that in Web Scraper.

I'm not sure what else I could do to make it work.
Do you have any idea?

Thanks in advance!

Hi @leemeng,

Thanks again for your help!
Actually, I worked through the HTML and CSS and managed to get it working.

So you can dismiss my previous message.
Thanks again!
