Scraping eCommerce Sites

Hi Community,

I started scraping today, so please be patient with me.
I want to scrape German eCommerce platforms category by category (so one sitemap for each category). I want to scrape the product name, price, number of reviews and the rating. All good so far. Unfortunately, the rating is only displayed as icons and not as text, so I am struggling to get it scraped. The HTML contains an aria-label attribute that holds the number.

Does anybody have a tip on how I can also scrape the rating within that sitemap?

Url: https://www.mediamarkt.de/de/category/_saugroboter-460055.html

Sitemap:
{"_id":"mediamarkt_saugroboter_name_price_anzahlreviews","startUrl":["https://www.mediamarkt.de/de/category/_saugroboter-460055.html"],"selectors":[{"id":"product-wrapping","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"[data-test] div.Row__StyledRow-x4c83j-0","multiple":false,"delay":2000,"clickElementSelector":"button.bxaOPl","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"select_productcard","type":"SelectorElementScroll","parentSelectors":["product-wrapping"],"selector":"div.ProductTilestyled__StyledCardWrapper-sc-1w38xrp-0","multiple":true,"delay":2000},{"id":"brandname","type":"SelectorText","parentSelectors":["select_productcard"],"selector":"p.kQkRBx","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["select_productcard"],"selector":".UnbrandedPriceDisplay__StyledUnbrandedPriceDisplayWrapper-sc-1pmc1sr-0 > div.ToolTipstyled__StyledTooltipWrapper-sc-1rht449-0","multiple":false,"regex":"","delay":0},{"id":"reviews","type":"SelectorText","parentSelectors":["select_productcard"],"selector":"div.ProductRating__StyledWrapper-q99jve-0","multiple":false,"regex":"","delay":0},{"id":"rating","type":"SelectorText","parentSelectors":["select_productcard"],"selector":"div.Rating__StyledRatingWrapper-sc-1v0kytr-0","multiple":false,"regex":"","delay":0},{"id":"ratinganzahl","type":"SelectorHTML","parentSelectors":["rating"],"selector":"print(elem.get('aria-label'))","multiple":false,"regex":"","delay":0}]}

If you have a div like
<div aria-label="Bewertung: 4.0857 von 5 Sternen" class="ProductRating__StyledWrapper-q99jve-0 bkeRRU">

you can use Element Attribute to extract the aria-label.

Type: Element Attribute
Selector: div[class^='ProductRating__StyledWrapper']
Attribute name: aria-label
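
In sitemap form, these settings might look roughly like the entry below. This is a sketch, not a tested sitemap: it assumes the rating div sits inside each product card (so the parent is select_productcard from the posted sitemap), and the id rating_value is just a made-up name. It would take the place of the ratinganzahl selector, since print(elem.get('aria-label')) is not a valid CSS selector.

```json
{
  "id": "rating_value",
  "type": "SelectorElementAttribute",
  "parentSelectors": ["select_productcard"],
  "selector": "div[class^='ProductRating__StyledWrapper']",
  "multiple": false,
  "extractAttribute": "aria-label"
}
```

The scraped value would be the whole label, e.g. "Bewertung: 4.0857 von 5 Sternen", so the number itself still has to be separated out afterwards.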
