Help Needed with Extracting Combo Box Data

Hello,

I need assistance with extracting the content of a combo box using the Web Scraper tool on Edge.

Despite not being a developer, I managed to extract other data from the page without any issues, including an invisible reference. However, I am struggling to extract the following:

  1. Container: The content of the combo box.
  2. Price: The complete price in one piece (I could only extract euros and cents separately).
  3. Image.
  4. Price per liter.
  5. Rating: The rating out of 5 AND the number of reviews.

Depending on the attempt, it returns either nothing or only the first line.

I have tried various “how-to” guides and followed responses to similar questions on forums, but none of the solutions worked for me.

Thank you for your help!

URL: https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone

Sitemap:
{"_id":"web-scraper-dropdown-variation","startUrl":["https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone?capacity=2&capacity-unit=ml"],"selectors":[{"clickElementSelector":"div.variant-selectors .vs__dropdown-menu","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1000,"discardInitialElements":"do-not-discard","id":"product-wrapper","multiple":true,"parentSelectors":["_root"],"selector":"div.variant-selectors .vs__dropdown-toggle","type":"SelectorElementClick"},{"id":"price","multiple":false,"parentSelectors":["product-wrapper"],"regex":"","selector":".sf-price div.container","type":"SelectorText"},{"id":"weight","multiple":false,"parentSelectors":["product-wrapper"],"regex":"","selector":"div.variant-selectors .vs__dropdown-toggle span","type":"SelectorText"},{"id":"price-liter","multiple":false,"parentSelectors":["product-wrapper"],"regex":"","selector":".product__add-to-cart__label span","type":"SelectorText"}]}

Hi, please try the sitemap below:

{"_id":"web-scraper-dropdown-variation","startUrl":["https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone?capacity=2&capacity-unit=ml"],"selectors":[{"clickElementSelector":"div.variant-selectors .vs__dropdown-toggle","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1000,"discardInitialElements":"do-not-discard","id":"product-wrapper","multiple":false,"parentSelectors":["_root"],"selector":"body","type":"SelectorElementClick"},{"extractAttribute":"content","id":"price","multiple":false,"parentSelectors":["_root"],"selector":"[itemprop=\"price\"]","type":"SelectorElementAttribute"},{"id":"weight","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div.variant-selectors .vs__dropdown-toggle span","type":"SelectorText"},{"id":"price-liter","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".product__add-to-cart__label span","type":"SelectorText"},{"id":"variants","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"[role=\"listbox\"]","type":"SelectorText"},{"extractAttribute":"content","id":"rating","multiple":false,"parentSelectors":["_root"],"selector":"[itemprop=\"ratingValue\"]","type":"SelectorElementAttribute"},{"id":"reviews","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div[itemprop='ratingCount']","type":"SelectorText"},{"id":"image","multiple":false,"parentSelectors":["_root"],"selector":".gallery__content__item img","type":"SelectorImage"}]}

Hello and thank you very much for your help!

Everything works, BUT only for one capacity. The exception is “variants”, which returns nothing, and I don’t know what it corresponds to.

Once again, I am not a developer and I just followed the tutorial that asked to use ElementClick, which you didn’t do here :wink: I could have searched for a long time! LOL.

However, I wanted to extract not just one value from the combo box .variant-selectors but all of them. On this page, there are 3 (2ml, 5ml, and 10ml) but on other pages, there may be none, or even 6 or 7.

I managed to get them by just putting “ul” in “selector”, and it retrieved everything, but on a single line like this: “2ml 9.95€ 5ml 22.95€ 10ml 39.95€”. In that case, however, I no longer had the data that changes according to the option chosen in the list (price per liter, image, etc.).
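
If the combined “ul” text is all that can be extracted, one possible workaround is to split it after export, outside Web Scraper. The Python sketch below assumes the text always alternates a capacity in ml and a price in €, as in the example quoted above; the function name `split_variants` is made up for illustration. It does not recover the data that depends on the selected option.

```python
import re
from decimal import Decimal

# Assumed pattern: a capacity such as "2ml" followed by a price such as "9.95€".
# A comma is also accepted as the decimal separator, in case the site uses one.
VARIANT_RE = re.compile(r"(\d+)\s*ml\s+(\d+(?:[.,]\d+)?)\s*€")


def split_variants(text):
    """Return a list of (capacity_ml, price_eur) pairs found in text."""
    pairs = []
    for capacity, price in VARIANT_RE.findall(text):
        pairs.append((int(capacity), Decimal(price.replace(",", "."))))
    return pairs


if __name__ == "__main__":
    for capacity, price in split_variants("2ml 9.95€ 5ml 22.95€ 10ml 39.95€"):
        print(f"{capacity} ml -> {price} EUR")
```

Each pair can then be written to its own row of a spreadsheet or CSV file.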

So I am back at the same point: I can retrieve the data for a single capacity (2ml), but I need them all. Just for information, I tested on both Edge and Google Chrome and got the same result.

This thorny part is the most important of my project, which is only in its early stages! And retrieving these dependencies from this list of choices is crucial.


OK. Thanks. Have a great day.

Hi, I noticed that the layout changed when the scrape was conducted in the small scraper window.

Please try expanding the pop-up window right after clicking Scrape; the variants selector should then display all variants.

Thank you, but I didn't understand what you mean. I always open the site in full screen and shrink the Web Scraper panel so that it doesn’t cover the elements I select when I check object names with Element preview. Is that what you’re talking about?

Also, I realize that even the smallest thing can trip up a web scraper! I noticed that during extraction, the data comes back in one order or in reverse order, depending on which combobox option had focus. I managed to extract two of the three options available in this combobox thanks to Bing AI, which did the job, but only if I run the Web Scraper once. If I run it again, it returns the same data twice! I have to refresh the page and rerun the Web Scraper to get some of the data back in the same way. Why? Should I start a new topic now that my problem has changed? See the modified sitemap below, as well as the image of the combobox and the result obtained.

The combobox

First scrape with the new sitemap (Bing AI), shown in the image (with default value 5ml)

And then, when I restart the scraping without refreshing the page

Thanks for your help!

Sorry, I forgot the NEW sitemap:

URL: https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone

{
  "_id": "az-data-2",
  "startUrl": ["https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone"],
  "selectors": [
    {
      "id": "product-wrapper",
      "parentSelectors": ["_root"],
      "type": "SelectorElementClick",
      "clickElementSelector": "div.variant-selectors .vs__dropdown-toggle",
      "clickElementUniquenessType": "uniqueCSSSelector",
      "clickType": "clickOnce",
      "delay": 1000,
      "discardInitialElements": "do-not-discard",
      "multiple": false,
      "selector": "body"
    },
    {
      "id": "variant-options",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorElementClick",
      "clickElementSelector": "[role='listbox'] [role='option']",
      "clickElementUniquenessType": "uniqueCSSSelector",
      "clickType": "clickMore",
      "delay": 2000,
      "discardInitialElements": "do-not-discard",
      "multiple": true,
      "selector": "body"
    },
    {
      "id": "price",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorElementAttribute",
      "selector": "[itemprop='price']",
      "multiple": false,
      "extractAttribute": "content"
    },
    {
      "id": "weight",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorText",
      "selector": "div.variant-selectors .vs__dropdown-toggle span",
      "multiple": false,
      "regex": ""
    },
    {
      "id": "price-liter",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorText",
      "selector": ".product__add-to-cart__label span",
      "multiple": false,
      "regex": ""
    },
    {
      "id": "rating",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorElementAttribute",
      "selector": "[itemprop='ratingValue']",
      "multiple": false,
      "extractAttribute": "content"
    },
    {
      "id": "reviews",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorText",
      "selector": "div[itemprop='ratingCount']",
      "multiple": false,
      "regex": ""
    },
    {
      "id": "image",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorImage",
      "selector": ".gallery__content__item img",
      "multiple": false
    }
  ]
}
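
To see how the selectors in a sitemap like the one above nest, a small Python sketch can print the parent/child tree. It relies only on the `id` and `parentSelectors` keys that appear in the sitemap; the helper names (`build_tree`, `format_tree`) and the shortened sample are illustrative, and the sketch says nothing about how Web Scraper itself runs the sitemap.

```python
import json
from collections import defaultdict


def build_tree(sitemap):
    """Map each parent id to the ids of the selectors declared under it."""
    children = defaultdict(list)
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            children[parent].append(selector["id"])
    return children


def format_tree(children, node="_root", depth=0, path=()):
    """Return the tree as indented lines, starting from node."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        # Skip selectors that list themselves (or an ancestor) as a parent,
        # so a cyclic sitemap cannot cause endless recursion.
        if child == node or child in path:
            continue
        lines.extend(format_tree(children, child, depth + 1, path + (node,)))
    return lines


# Shortened sample with the same nesting as the first four selectors above.
SAMPLE = json.loads("""
{"selectors": [
  {"id": "product-wrapper", "parentSelectors": ["_root"]},
  {"id": "variant-options", "parentSelectors": ["product-wrapper"]},
  {"id": "price", "parentSelectors": ["product-wrapper"]},
  {"id": "weight", "parentSelectors": ["product-wrapper"]}
]}
""")

if __name__ == "__main__":
    print("\n".join(format_tree(build_tree(SAMPLE))))
```

In the sitemap above, `price`, `weight` and the other data selectors come out as siblings of `variant-options` rather than as its children, which may be worth checking if the values do not change from one option to the next.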

It appears you are not scraping but just clicking the data preview.

To start scraping, click 'Sitemap -> Scrape'; a new window will pop up.