Regex selector to extract text inside a selector

I am scraping a web site and I have the following text selector and want to extract what is before .
The values are sizes, so they can be XS, 2XS, S, M, L, XL, 2XL, or 3XL.

<a href="#" class="js-variant disabled" data-name="integration_size" data-value="2XS" data-isvariant="true" data-pk="169479"> 2XS </a>

If I understood correctly, you want to select the element immediately before this link (a) that has a "data-value" attribute.

I will assume the preceding element is a div, so adjust accordingly.

In this example, you will get all divs whose next sibling is an "a" with the attribute data-value="2XS":

div:has(+ a[data-value="2XS"])

A less specific option is to get all divs whose next sibling is an "a" that has a data-value attribute (note: this matches even if the attribute is empty, like data-value=""):

div:has(+ a[data-value])
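
To check what the two selectors above would match without running the scraper, here is a minimal Python sketch that mimics the adjacent-sibling test with the standard library only. The HTML snippet and the helper name divs_before_links are made up for illustration; it is not how the extension evaluates selectors, just the same rule written out by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet used only to exercise the rule.
SNIPPET = """
<ul>
  <li><div>first</div><a href="#" data-value="2XS"> 2XS </a></li>
  <li><div>second</div><a href="#" data-value=""> ? </a></li>
  <li><div>third</div><span>no link here</span></li>
</ul>
""".strip()

def divs_before_links(root, value=None):
    """Return the text of each div whose next element sibling is an <a> with data-value.

    value=None mimics div:has(+ a[data-value]);
    a string mimics div:has(+ a[data-value="..."]).
    """
    found = []
    for parent in root.iter():
        children = list(parent)
        # Pair every child with the element that directly follows it.
        for current, following in zip(children, children[1:]):
            if current.tag != "div" or following.tag != "a":
                continue
            if "data-value" not in following.attrib:
                continue
            if value is None or following.attrib["data-value"] == value:
                found.append(current.text)
    return found

root = ET.fromstring(SNIPPET)
print(divs_before_links(root, "2XS"))  # exact-value variant
print(divs_before_links(root))         # presence-only variant, also matches data-value=""
```

The second call also returns the div next to the link with an empty data-value, which is the behaviour noted above for the less specific selector.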

I am trying to capture the value after data-value inside the "a" tag (or the value before )
I tried the regex you posted and it won't let me save the selector (text or html ?)
Here is the link to the website: Erkek Açık Sarı Polo Yaka T-Shirt Basic | U.S.Polo Assn.

The selector is actually an HTML selector, and if I export it to CSV, the entry from the first post is split across several lines.

There are two types of entries, and I want to capture the data-value for the one that does not have the disabled class:

<a href="#" class="js-variant disabled" data-name="integration_size" data-value="2XS" data-isvariant="true" data-pk="169479"> 2XS </a>
<a href="#" class="js-variant " data-name="integration_size" data-value="L" data-isvariant="true" data-pk="165734"> L </a>

If you want to grab all the info but still be able to distinguish disabled from non-disabled sizes, you can also extract the class attribute.

{"_id":"forum_regex","startUrl":["https://tr.uspoloassn.com/erkek-yesil-polo-yaka-t-shirt-basic-50249149-vr083/?integration_color=VR004"],"selectors":[{"delay":0,"id":"sizes","multiple":true,"parentSelectors":["_root"],"selector":".js-product-sizes li a","type":"SelectorElement"},{"delay":0,"extractAttribute":"data-value","id":"size","multiple":false,"parentSelectors":["sizes"],"selector":"_parent_","type":"SelectorElementAttribute"},{"delay":0,"extractAttribute":"data-pk","id":"data_pk","multiple":false,"parentSelectors":["sizes"],"selector":"_parent_","type":"SelectorElementAttribute"},{"delay":0,"extractAttribute":"class","id":"status","multiple":false,"parentSelectors":["sizes"],"selector":"_parent_","type":"SelectorElementAttribute"}]}
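
With the class exported as the status column, the disabled rows can be dropped afterwards. A minimal Python sketch, assuming the export is a CSV with one column per selector id from the sitemap above (size, data_pk, status). The two rows are built from the two links quoted earlier, and available_sizes is a hypothetical helper name.

```python
import csv
import io

# Assumed shape of the exported CSV: one column per selector id in the sitemap.
EXPORT = """size,data_pk,status
2XS,169479,js-variant disabled
L,165734,"js-variant "
"""

def available_sizes(csv_text):
    """Return the sizes whose class list does not contain the 'disabled' token."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Split the class string into tokens so 'disabled' is matched as a whole class name.
    return [row["size"] for row in rows if "disabled" not in row["status"].split()]

print(available_sizes(EXPORT))
```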

If it is something different, please send your sitemap and an example of what the desired extraction should look like.

Here is my sitemap

{"_id":"link","startUrl":["https://tr.uspoloassn.com/erkek-yesil-polo-yaka-t-shirt-basic-50249149-vr083/?integration_color=VR004"],"selectors":[{"id":"Description","parentSelectors":["_root"],"type":"SelectorText","selector":"h1","multiple":false,"delay":0,"regex":""},{"id":"Price","parentSelectors":["_root"],"type":"SelectorText","selector":".product__payment--price.hidden-xs ins","multiple":false,"delay":0,"regex":""},{"id":"SKU","parentSelectors":["_root"],"type":"SelectorText","selector":"div.product-sku","multiple":false,"delay":0,"regex":""},{"id":"size-avail","parentSelectors":["_root"],"type":"SelectorHTML","selector":".js-product-sizes li","multiple":true,"regex":"<a href=\"#\" class=\"js-variant \" data-name=\"integration_size\" data-value=\"[A-Z2-5]{0,3}\"","delay":0}]}

I am trying to capture the description, price, sku and sizes (available ones only)
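
For reference, this is what the regex from the size-avail selector does when run in plain Python against the two links quoted earlier. It is only a sketch of the pattern's behaviour in Python's re module; how the extension's regex field treats capture groups is not confirmed here. The VALUE_ONLY pattern is an illustrative variant, not something from the sitemap.

```python
import re

# The two links quoted earlier in the thread.
HTML = (
    '<a href="#" class="js-variant disabled" data-name="integration_size" '
    'data-value="2XS" data-isvariant="true" data-pk="169479"> 2XS </a>\n'
    '<a href="#" class="js-variant " data-name="integration_size" '
    'data-value="L" data-isvariant="true" data-pk="165734"> L </a>'
)

# The pattern from the sitemap, unescaped from JSON.
WHOLE_TAG = re.compile(
    r'<a href="#" class="js-variant " data-name="integration_size" '
    r'data-value="[A-Z2-5]{0,3}"'
)

# Illustrative variant: a capture group around the value only.
VALUE_ONLY = re.compile(r'class="js-variant "[^>]*\bdata-value="([A-Z0-9]{1,3})"')

# The full match includes the start of the tag, not just the size.
print(WHOLE_TAG.search(HTML).group(0))
# The capture group isolates the size of the non-disabled link.
print(VALUE_ONLY.findall(HTML))
```

Both patterns skip the disabled link because they require the class attribute to be exactly "js-variant ", but only the capture group separates the size from the surrounding markup.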

In that case, you can use a selector that excludes the size links (the "a" inside each li) that have class="js-variant disabled".

You can use an element attribute selector or a plain text selector; both will work.

selector: element attribute

{"_id":"forum_regex2","startUrl":["https://tr.uspoloassn.com/erkek-yesil-polo-yaka-t-shirt-basic-50249149-vr083/?integration_color=VR004"],"selectors":[{"delay":0,"id":"Description","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"h1","type":"SelectorText"},{"delay":0,"id":"Price","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".product__payment--price.hidden-xs ins","type":"SelectorText"},{"delay":0,"id":"SKU","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div.product-sku","type":"SelectorText"},{"delay":0,"extractAttribute":"data-value","id":"size-avail","multiple":true,"parentSelectors":["_root"],"selector":".js-product-sizes li > a:not(.disabled)","type":"SelectorElementAttribute"}]}

selector: text

{"_id":"forum_regex2","startUrl":["https://tr.uspoloassn.com/erkek-yesil-polo-yaka-t-shirt-basic-50249149-vr083/?integration_color=VR004"],"selectors":[{"delay":0,"id":"Description","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"h1","type":"SelectorText"},{"delay":0,"id":"Price","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".product__payment--price.hidden-xs ins","type":"SelectorText"},{"delay":0,"id":"SKU","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div.product-sku","type":"SelectorText"},{"delay":0,"id":"size-avail","multiple":true,"parentSelectors":["_root"],"regex":"","selector":".js-product-sizes li > a:not(.disabled)","type":"SelectorText"}]}
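
To show why both variants give the same result here, a minimal Python sketch that applies the a:not(.disabled) part of the selector by hand and collects both the data-value attribute and the link text. The ul/li wrapper is a hypothetical stand-in for the page markup, the class name AvailableSizes is made up, and the sketch ignores the ".js-product-sizes li >" part of the selector.

```python
from html.parser import HTMLParser

# Hypothetical wrapper markup; the two <a> tags are the ones quoted in the thread.
PAGE = """
<ul class="js-product-sizes">
  <li><a href="#" class="js-variant disabled" data-name="integration_size" data-value="2XS" data-isvariant="true" data-pk="169479"> 2XS </a></li>
  <li><a href="#" class="js-variant " data-name="integration_size" data-value="L" data-isvariant="true" data-pk="165734"> L </a></li>
</ul>
"""

class AvailableSizes(HTMLParser):
    """Collect data-value and text for <a> tags that lack the 'disabled' class."""

    def __init__(self):
        super().__init__()
        self.values = []      # what the element attribute variant would extract
        self.texts = []       # what the text variant would extract
        self._inside = False  # True while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "disabled" not in classes:
            self.values.append(attrs.get("data-value"))
            self._inside = True

    def handle_data(self, data):
        if self._inside and data.strip():
            self.texts.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside = False

parser = AvailableSizes()
parser.feed(PAGE)
print(parser.values, parser.texts)
```

On this markup the attribute and the trimmed link text carry the same size, so either selector type gives the available sizes.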

Thank you, that worked. I have a new challenge now: the website shows only a limited number of items on each page, and the visitor has to click "show more" to see the rest, so the scraper only captures the first page.

{"_id":"uspolo","startUrl":["https://tr.uspoloassn.com/"],"selectors":[{"id":"category-link","parentSelectors":["_root"],"type":"SelectorLink","selector":"#hafta-sonu-firsatlari-1 .button a","multiple":true,"delay":0},{"id":"product","parentSelectors":["category-link"],"type":"SelectorLink","selector":".product__name a","multiple":true,"delay":0},{"id":"description","parentSelectors":["product"],"type":"SelectorText","selector":"h1","multiple":false,"delay":0,"regex":""},{"id":"price","parentSelectors":["product"],"type":"SelectorText","selector":".hidden-xs del","multiple":false,"delay":0,"regex":""},{"id":"sku","parentSelectors":["product"],"type":"SelectorText","selector":"div.product-sku","multiple":false,"delay":0,"regex":""},{"id":"size-avail","parentSelectors":["product"],"type":"SelectorElementAttribute","selector":".js-product-sizes li > a:not(.disabled)","multiple":true,"delay":0,"extractAttribute":"data-value"},{"id":"discounted_price","parentSelectors":["product"],"type":"SelectorText","selector":".product__payment--price.hidden-xs ins","multiple":false,"delay":0,"regex":""}]}