How to exclude part of an element?

clem · February 5, 2021, 10:16pm

Hi,

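How can I exclude part of an element? For example, in the sitemap below I only want the value "12 Watts", but I only manage to scrape "Power Handling (RMS) De belasting in elektrisch vermogen uitgedrukt in watt dat een luidsprekerspoel kan opnemen voor een langere tijd, zonder de luidsprekerspoel te beschadigen. RMS staat voor Root Mean Square. 12 Watts" (the Dutch tooltip text means: "The load, as electrical power expressed in watts, that a speaker coil can absorb for an extended period without damaging the speaker coil. RMS stands for Root Mean Square.").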

This is what I see when I inspect the element:

<li class>
  <span>Power Handling (RMS)
    <i class="info-wrapper">...</i>
  </span>
  12 Watts
  ::after
</li>

This is the selector I use, which gives me all the text: .list-info li:nth-of-type(4)
If I wanted everything BUT the value, I could use: .list-info li:nth-of-type(4) span
So what selector do I use to get only "12 Watts", without the span?

EDIT: In this particular case I can manage with a regex, (?<=Square.)[1-9 ]+ (result: 12), but there must be a better, more generalized way, no?
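A quick way to try the EDIT's regex outside the scraper is Python's standard `re` module; its syntax is close to, but not guaranteed identical to, the regex handling the scraper itself uses, so treat this as an approximation. The sketch also shows one pitfall: the character class `[1-9 ]` leaves out the digit 0, so a value such as "10 Watts" would come back as "1". Swapping in `\d` (any digit) avoids that. `extract`, `ORIGINAL` and `WITH_ZERO` are illustrative names, not part of any scraper API.

```python
import re
from typing import Optional

# Text that .list-info li:nth-of-type(4) returns, as quoted in the question
# (label + Dutch tooltip + value, all in one string).
SCRAPED = (
    "Power Handling (RMS) De belasting in elektrisch vermogen uitgedrukt in watt "
    "dat een luidsprekerspoel kan opnemen voor een langere tijd, zonder de "
    "luidsprekerspoel te beschadigen. RMS staat voor Root Mean Square. 12 Watts"
)

# Pattern from the EDIT: fixed-width lookbehind "Square" + any one character,
# then a run of the characters 1-9 and space.
ORIGINAL = r"(?<=Square.)[1-9 ]+"
# Same idea with \d (any digit, 0 included), so "10" is not cut short.
WITH_ZERO = r"(?<=Square.)[\d ]+"


def extract(pattern: str, text: str) -> Optional[str]:
    """Return the first match with surrounding spaces stripped, or None."""
    m = re.search(pattern, text)
    return m.group().strip() if m else None


print(extract(ORIGINAL, SCRAPED))                        # 12
print(extract(ORIGINAL, "Root Mean Square. 10 Watts"))   # 1  (0 is not in [1-9])
print(extract(WITH_ZERO, "Root Mean Square. 10 Watts"))  # 10
```

Note that the raw match is " 12 " with a space on each side, because the space character is part of the class; the `.strip()` call removes it.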

In case it helps: if I use the "CSS Selector Finder" plugin on the value in the developer console, it gives me the error "Can't generate CSS selector for non-element node type."
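The plugin's error points at the underlying reason: in the inspected markup, "12 Watts" is a bare text node that is a direct child of the li, not an element of its own. CSS selectors match elements, so there is no selector that addresses that text node by itself. To illustrate the distinction outside the scraper, here is a minimal sketch using Python's standard `html.parser` that keeps only the li's own text and skips everything inside its child elements. The HTML is a hypothetical reconstruction of the inspected markup with a shortened tooltip, and `OwnTextParser` / `li_own_text` are made-up names for this example.

```python
from html.parser import HTMLParser

# Hypothetical reconstruction of the inspected markup; the tooltip text is shortened.
SAMPLE = """
<ul class="list-info">
  <li class="">
    <span>Power Handling (RMS)
      <i class="info-wrapper">RMS staat voor Root Mean Square.</i>
    </span>
    12 Watts
  </li>
</ul>
"""


class OwnTextParser(HTMLParser):
    """Collect only the text nodes that are direct children of <li> elements.

    Text inside child elements (the <span> label and the <i> tooltip) is skipped.
    """

    def __init__(self) -> None:
        super().__init__()
        self.stack = []     # currently open tags, innermost last
        self.own_text = []  # one accumulated string per <li>

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag == "li":
            self.own_text.append("")

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed inner tags).
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Keep text only when the innermost open element is the <li> itself.
        if self.stack and self.stack[-1] == "li":
            self.own_text[-1] += data


def li_own_text(html: str) -> list:
    """Return the direct text of each <li>, with whitespace collapsed."""
    parser = OwnTextParser()
    parser.feed(html)
    parser.close()
    return [" ".join(text.split()) for text in parser.own_text]


print(li_own_text(SAMPLE))  # ['12 Watts']
```

In browser JavaScript the same idea is usually expressed by filtering an element's `childNodes` for text nodes; since a CSS selector field cannot do that, a regex on the full text is the practical fallback.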

URL: Tang Band W3-1878 woofer kopen - SoundImports (https://www.soundimports.eu/nl/tang-band-w3-1878.html)

Sitemap:
{"_id":"soundimports_oneitempage","startUrl":["https://www.soundimports.eu/nl/tang-band-w3-1878.html"],"selectors":[{"id":"specs_element","type":"SelectorElement","parentSelectors":["_root"],"selector":"article.a","multiple":false,"delay":0},{"id":"value-select--not-working","type":"SelectorText","parentSelectors":["specs_element"],"selector":"li:nth-of-type(4) ","multiple":false,"regex":"","delay":0}]}

leemeng · February 6, 2021, 11:37pm

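Due to the site's HTML structure, regex is the way to go: "12 Watts" is a bare text node inside the li rather than a separate element, so no CSS selector can target it by itself. You can try this better selector and regex: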

Selector: ul.list-info li:contains('Power Handling \(RMS')
Regex: [\d]+ Watts$

If some of the wattages have a decimal point, e.g. 3.5 Watts, use:

Regex: [\d\.]+ Watts$
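To see what the two patterns return, here is a minimal check with Python's standard `re` module (standing in for the scraper's own regex handling, so treat it as an approximation). Both patterns are anchored with `$` to the end of the text, which is why the long tooltip in front of the value does not get in the way. The check also shows why the decimal variant matters: on "3.5 Watts" the integer pattern still matches, but returns only "5 Watts". The sample strings are shortened stand-ins for the scraped text, and `first_match` is an illustrative helper. (The selector half is not reproduced here: `:contains()` comes from jQuery and is not part of standard CSS.)

```python
import re
from typing import Optional

# Regexes from the answer: digits followed by " Watts" at the very end of the text.
INTEGER_WATTS = r"[\d]+ Watts$"
# Variant that also accepts a decimal point, e.g. "3.5 Watts".
DECIMAL_WATTS = r"[\d\.]+ Watts$"


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


# Tooltip text shortened; only the end of the string matters for these patterns.
scraped = "Power Handling (RMS) RMS staat voor Root Mean Square. 12 Watts"
print(first_match(INTEGER_WATTS, scraped))                           # 12 Watts
print(first_match(INTEGER_WATTS, "Power Handling (RMS) 3.5 Watts"))  # 5 Watts
print(first_match(DECIMAL_WATTS, "Power Handling (RMS) 3.5 Watts"))  # 3.5 Watts
```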

clem · February 7, 2021, 12:31am

Thank you. I had already gone the regex route, but it's good to know there isn't a quicker way. The selector alternative is new to me, and for the regex I'll need to look up what \d does. For now I went with (?<="end of beginning string that I wish to ignore").*$
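For reference, `\d` matches a single digit and `\d+` a run of one or more digits; it does not match the decimal point, which is why the answer above adds `\.` inside the brackets. The generalized lookbehind can be wrapped in a small helper, sketched here in Python's `re` with a hypothetical name, `text_after`. Python only allows fixed-width lookbehind, which a literal prefix satisfies, and `re.escape` keeps characters such as the "." at the end of "Root Mean Square." from acting as wildcards.

```python
import re
from typing import Optional


def text_after(prefix: str, text: str) -> Optional[str]:
    """Return everything after the first occurrence of prefix, or None.

    Builds the same shape as (?<=prefix).*$ with the prefix escaped,
    so it matches literally.
    """
    m = re.search(r"(?<=" + re.escape(prefix) + r").*$", text)
    return m.group().strip() if m else None


scraped = "Power Handling (RMS) RMS staat voor Root Mean Square. 12 Watts"
print(text_after("Root Mean Square.", scraped))  # 12 Watts

# \d matches one digit at a time; \d+ grabs consecutive digits.
print(re.findall(r"\d+", "3.5 Watts"))  # ['3', '5']
```

In plain Python, `text.split(prefix, 1)` would do the same job; the lookbehind form is shown because it is what fits in the sitemap's "regex" field.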