Help Needed with Extracting Combo Box Data

Hello,

I need assistance with extracting the content of a combo box using the Web Scraper tool on Edge.

Despite not being a developer, I managed to extract other data from the page without any issues, including an invisible reference. However, I am struggling to extract the following:

  1. Container: The content of the combo box.
  2. Price: The complete price in one piece (I could only extract euros and cents separately).
  3. Image: the product image.
  4. Price per liter.
  5. Rating: The rating out of 5 AND the number of reviews.

Depending on what I try, it returns either nothing or just the first line.
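For item 2, one possible workaround is to join the two pieces after export rather than inside Web Scraper. This is a minimal Python sketch, assuming the euros and cents were scraped as two separate text values. The function name `combine_price` and the sample inputs are hypothetical.

```python
from decimal import Decimal


def combine_price(euros, cents):
    """Join separately scraped euro and cent strings into one Decimal price."""
    # Keep only the digits, so stray symbols such as "€" or "," are ignored.
    euros_digits = "".join(ch for ch in euros if ch.isdigit())
    cents_digits = "".join(ch for ch in cents if ch.isdigit())
    if not euros_digits:
        raise ValueError("no euro amount found")
    # Assumption: a missing cents part means a whole-euro price.
    return Decimal(euros_digits + "." + (cents_digits or "00"))


full_price = combine_price("9", "95")
print(full_price)
```

Using `Decimal` instead of `float` avoids rounding surprises when the prices are summed or compared later.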

I have tried various “how-to” guides and followed responses to similar questions on forums, but none of the solutions worked for me.

Thank you for your help!

Url: https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone

Sitemap:
{"_id":"web-scraper-dropdown-variation","startUrl":["https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone?capacity=2&capacity-unit=ml"],"selectors":[{"clickElementSelector":"div.variant-selectors .vs__dropdown-menu","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1000,"discardInitialElements":"do-not-discard","id":"product-wrapper","multiple":true,"parentSelectors":["_root"],"selector":"div.variant-selectors .vs__dropdown-toggle","type":"SelectorElementClick"},{"id":"price","multiple":false,"parentSelectors":["product-wrapper"],"regex":"","selector":".sf-price div.container","type":"SelectorText"},{"id":"weight","multiple":false,"parentSelectors":["product-wrapper"],"regex":"","selector":"div.variant-selectors .vs__dropdown-toggle span","type":"SelectorText"},{"id":"price-liter","multiple":false,"parentSelectors":["product-wrapper"],"regex":"","selector":".product__add-to-cart__label span","type":"SelectorText"}]}

Hi, please try the sitemap below:

{"_id":"web-scraper-dropdown-variation","startUrl":["https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone?capacity=2&capacity-unit=ml"],"selectors":[{"clickElementSelector":"div.variant-selectors .vs__dropdown-toggle","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1000,"discardInitialElements":"do-not-discard","id":"product-wrapper","multiple":false,"parentSelectors":["_root"],"selector":"body","type":"SelectorElementClick"},{"extractAttribute":"content","id":"price","multiple":false,"parentSelectors":["_root"],"selector":"[itemprop=\"price\"]","type":"SelectorElementAttribute"},{"id":"weight","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div.variant-selectors .vs__dropdown-toggle span","type":"SelectorText"},{"id":"price-liter","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".product__add-to-cart__label span","type":"SelectorText"},{"id":"variants","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"[role=\"listbox\"]","type":"SelectorText"},{"extractAttribute":"content","id":"rating","multiple":false,"parentSelectors":["_root"],"selector":"[itemprop=\"ratingValue\"]","type":"SelectorElementAttribute"},{"id":"reviews","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div[itemprop='ratingCount']","type":"SelectorText"},{"id":"image","multiple":false,"parentSelectors":["_root"],"selector":".gallery__content__item img","type":"SelectorImage"}]}

Hello and thank you very much for your help!

Everything works, BUT only for one capacity. The exception is “variants”, which returns nothing, and I don’t know what it corresponds to.

Once again, I am not a developer. I just followed the tutorial that said to use ElementClick, which you didn’t do here. I could have searched for a long time! LOL.

However, I wanted to extract not just one value from the combo box .variant-selectors but all of them. On this page, there are 3 (2ml, 5ml, and 10ml) but on other pages, there may be none, or even 6 or 7.

I managed to retrieve everything by just putting “ul” in “selector”, but it all came back on a single line like this: “2ml 9.95€ 5ml 22.95€ 10ml 39.95€”. In that case I no longer had the data that changes according to the option chosen in the list (price per liter, image, etc.).

So I am still at the same point: I can retrieve the data for a single capacity (2ml), but I need all of them. For information, I tested on both Edge and Google Chrome and got the same result.
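If the single-line output is all that can be scraped, it can at least be split back into one row per capacity afterwards. This is a minimal Python sketch, assuming the text always follows the "capacity, then price in €" pattern of the line quoted above. The names `VARIANT_PATTERN` and `split_variants` are hypothetical, and it does not recover the image or the other per-option data.

```python
import re

# Matches a capacity such as "2ml" followed by a price such as "9.95€".
VARIANT_PATTERN = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*(ml|l)\s+(\d+(?:[.,]\d+)?)\s*€",
    re.IGNORECASE,
)


def split_variants(text):
    """Split a flattened dropdown string into (capacity, unit, price) tuples."""
    rows = []
    for amount, unit, price in VARIANT_PATTERN.findall(text):
        rows.append((
            float(amount.replace(",", ".")),  # accept "2,5" as well as "2.5"
            unit.lower(),
            float(price.replace(",", ".")),
        ))
    return rows


rows = split_variants("2ml 9.95€ 5ml 22.95€ 10ml 39.95€")
for capacity, unit, price in rows:
    print(capacity, unit, price)
```

A page with no dropdown simply yields an empty list, so the same function works whether a product has zero, three, or seven options.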

This thorny part is the most important of my project, which is only in its early stages! And retrieving these dependencies from this list of choices is crucial.

OK. Thanks. Have a great day.

Hi, I noticed that the layout changed when the scrape was conducted in the small scraper window.

Please try expanding the pop-up window right after clicking scrape and the variants selector should display all variants.

Thank you, but I didn't understand what you mean. I always open the site in full screen and shrink the Web Scraper panel so that it doesn’t cover the elements I select when I use Element preview to find the names of objects. Is that what you’re talking about?

Also, I realize that even the smallest thing can trip up a web scraper! I noticed that during extraction, the data comes back in one order or in reverse order depending on which option of the combobox was focused. I managed to extract two of the three available options in this combobox thanks to Bing AI, which did the job, but only if I run the web scraper once. Otherwise, it returns the same data twice! I have to refresh the page and rerun the web scraper to get the data back the same way. Why? Should I start a new topic now that my problem has changed? See the modified sitemap below, as well as the image of the combobox and the result obtained.

The combobox

First scrape with the new sitemap (Bing AI), shown in the image (with default value 5ml)

And then, when I restart the scraping without refreshing the page

Thanks for your help!

Sorry, I forgot the new sitemap.

URL: https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone

{
  "_id": "az-data-2",
  "startUrl": ["https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone"],
  "selectors": [
    {
      "id": "product-wrapper",
      "parentSelectors": ["_root"],
      "type": "SelectorElementClick",
      "clickElementSelector": "div.variant-selectors .vs__dropdown-toggle",
      "clickElementUniquenessType": "uniqueCSSSelector",
      "clickType": "clickOnce",
      "delay": 1000,
      "discardInitialElements": "do-not-discard",
      "multiple": false,
      "selector": "body"
    },
    {
      "id": "variant-options",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorElementClick",
      "clickElementSelector": "[role='listbox'] [role='option']",
      "clickElementUniquenessType": "uniqueCSSSelector",
      "clickType": "clickMore",
      "delay": 2000,
      "discardInitialElements": "do-not-discard",
      "multiple": true,
      "selector": "body"
    },
    {
      "id": "price",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorElementAttribute",
      "selector": "[itemprop='price']",
      "multiple": false,
      "extractAttribute": "content"
    },
    {
      "id": "weight",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorText",
      "selector": "div.variant-selectors .vs__dropdown-toggle span",
      "multiple": false,
      "regex": ""
    },
    {
      "id": "price-liter",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorText",
      "selector": ".product__add-to-cart__label span",
      "multiple": false,
      "regex": ""
    },
    {
      "id": "rating",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorElementAttribute",
      "selector": "[itemprop='ratingValue']",
      "multiple": false,
      "extractAttribute": "content"
    },
    {
      "id": "reviews",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorText",
      "selector": "div[itemprop='ratingCount']",
      "multiple": false,
      "regex": ""
    },
    {
      "id": "image",
      "parentSelectors": ["product-wrapper"],
      "type": "SelectorImage",
      "selector": ".gallery__content__item img",
      "multiple": false
    }
  ]
}

It appears you are not scraping but just clicking the data preview.

To start the scraping, click 'Sitemap -> Scrape' and a new window will pop up.

Thank you for your reply. Same thing here!

Do you understand why I need help?!

I only get the same product (2ml), and just 2 results out of the 3 available (2ml, 5ml, and 10ml).

Why?

Traversing through the variants does not work because the variant elements are removed from the HTML after a selection is made, which breaks the click sequence.

If the price is the only data point changing, I would recommend scraping the variants like this:

{"_id":"web-scraper-dropdown-variation","startUrl":["https://www.aroma-zone.com/info/fiche-technique/huile-essentielle-helichryse-italienne-bio-aroma-zone?capacity=2&capacity-unit=ml"],"selectors":[{"clickElementSelector":"div.variant-selectors .vs__dropdown-toggle","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":1000,"discardInitialElements":"do-not-discard","id":"product-wrapper","multiple":false,"parentSelectors":["_root"],"selector":"body","type":"SelectorElementClick"},{"extractAttribute":"content","id":"price","multiple":false,"parentSelectors":["_root"],"selector":"[itemprop=\"price\"]","type":"SelectorElementAttribute"},{"id":"weight","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div.variant-selectors .vs__dropdown-toggle span","type":"SelectorText"},{"id":"price-liter","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".product__add-to-cart__label span","type":"SelectorText"},{"extractAttribute":"","id":"variants","parentSelectors":["_root"],"selector":"[role=\"listbox\"] li","type":"SelectorGroup"},{"extractAttribute":"content","id":"rating","multiple":false,"parentSelectors":["_root"],"selector":"[itemprop=\"ratingValue\"]","type":"SelectorElementAttribute"},{"id":"reviews","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div[itemprop='ratingCount']","type":"SelectorText"},{"id":"image","multiple":false,"parentSelectors":["_root"],"selector":".gallery__content__item img","type":"SelectorImage"}]}

See for yourself in the image: just 1 result out of 3!

Try to maximize the pop-up window when the scraping starts.

You should see this result:

Result:

Just to be sure I understand, it's not possible to obtain ALL the information related to each option change, such as the image, price per liter, and SKU reference. Are we on the same page?

Because on this site, there are other categories where the information might change, like the description in addition to the photo! So far, I've focused on just one category. But I wasn't planning to stop there, which is why I emphasized this core aspect!


Unfortunately, no, due to the specific setup of the variant dropdown.

Thank you sooooooooooo much for your help, really, sooooooo much for your time!

Thank you, thank you!

I am disappointed that I didn't have the opportunity to do what I wanted, so I will adapt accordingly. It would be good to address this issue, this limitation, if the development team is committed to creating a versatile tool and improving it. Is this way of doing things a method to block scraping?! Will this become commonplace? To be continued! But thank you again for your help, Janis, thank you very much. All the best.


It is not really an anti-scraping measure, just an unfortunate design solution.
We will consider whether this can be added to the Web Scraper development pipeline.
Thank you for your input!

I have a similar problem on several pages. It is still workable, though: you can parse the output afterwards, even with Excel. If you don't know Excel, describe the formula you need and ask an AI. This is how I handled those problematic dropdowns.
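The same kind of post-processing can also be done in a script instead of a spreadsheet. This is a minimal Python sketch that recomputes the price per liter from a capacity and a price. It swaps Excel for Python, and the function name `price_per_liter` is hypothetical. The sample values are the three options quoted earlier in this thread.

```python
def price_per_liter(price_eur, capacity_ml):
    """Return the price in euros per liter for a given capacity in milliliters."""
    if capacity_ml <= 0:
        raise ValueError("capacity must be positive")
    # 1 liter = 1000 ml, so scale the per-ml price up by 1000.
    return round(price_eur / capacity_ml * 1000, 2)


# (capacity in ml, price in euros) pairs taken from the thread.
variants = [(2, 9.95), (5, 22.95), (10, 39.95)]
per_liter = [price_per_liter(price, capacity) for capacity, price in variants]
for (capacity, price), value in zip(variants, per_liter):
    print(f"{capacity} ml at {price} EUR -> {value} EUR per liter")
```

This only rebuilds values that can be derived from capacity and price. Data that changes per option on the page, such as the image, still has to come from the scrape itself.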

I see what you mean: converting some values, like the price per liter. Unfortunately, some information and images would still be missing. I will reduce the number of elements I try to recover, which is really a shame! But thank you for your suggestion.