Hi.
In short: I'd like to extract a specific piece of text from a text selector, but the text appears to be formatted with line breaks and I can't get any regex to work.
The details:
On this page: https://www.ha.com/c/search-results.zx?N=0+790+231&Nty=1&Ntt=Patrick+Nagel&Ntk=SI_Titles-Desc&erpp=24
Each search result has an item-title element that has this kind of structure:
<a href="url omitted" class="item-title">
<b>PATRICK NAGEL</b>
"(American, 1945-1984)"
<br>
<i>Untitled (Her Look)</i>
", 1983"
<br>
"Acrylic on canvas"
<br>
"36 x 33 in."
<br>
"Signed lower left"
</a>
The plain text looks like this:
PATRICK NAGEL (American, 1945-1984)
Untitled (Her Look) , 1983
Acrylic on canvas
36 x 33 in.
Signed lower left
Extracting the artist (b) and the title of the work (i) is simple enough, but I'd also like to extract the year (1983 in this case), which has no identifying element of its own. I've tried various regexes, but none seem to match, and I suspect the line breaks are the cause. (I'd also be happy with extracting each line separately, if that is doable.)
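To make the goal concrete, here is a minimal sketch in Python of the result I'm after, run outside Web Scraper on the plain text shown above. It assumes the extracted text keeps its line breaks, which I haven't confirmed, and the variable names are only for illustration. I don't know whether the selector's regex field treats line breaks the same way.

```python
import re

# Plain text of one item-title element, copied from the example above.
# Assumption: the scraped text keeps one line break per <br>.
item_text = """PATRICK NAGEL (American, 1945-1984)
Untitled (Her Look) , 1983
Acrylic on canvas
36 x 33 in.
Signed lower left"""

# Option 1: split into separate, trimmed lines and drop empty ones.
lines = [line.strip() for line in item_text.splitlines() if line.strip()]

# Option 2: grab the year directly. The pattern looks for a comma,
# optional whitespace, and four digits at the END of a line.
# re.MULTILINE makes "$" match before every line break, not only at
# the end of the whole string, so ", 1945-1984)" on the first line is
# skipped and ", 1983" on the second line is matched.
year_match = re.search(r",\s*(\d{4})\s*$", item_text, re.MULTILINE)
year = year_match.group(1) if year_match else None

print(lines)
print(year)
```

Either result would work for me: the list of lines, or just the year captured by the group in the pattern.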
Any ideas?
Sitemap:
{"_id":"test","startUrl":["https://www.ha.com/c/search-results.zx?N=0+790+231&Nty=1&Ntt=Patrick+Nagel&Ntk=SI_Titles-Desc&erpp=24"],"selectors":[{"id":"title","type":"SelectorText","parentSelectors":["_root"],"selector":"a.item-title","multiple":true,"regex":"","delay":0}]}