Extract only part of a cell

0101 · August 14, 2018, 9:22pm

Hi,

I need help again with this page: https://www.sattipp.de/spt/bl-18/1/
I want to extract some data from the table there, for example the score in the cell "2:0 [Info]". How can I scrape only "2:0", without the "[Info]"?

hossain007 · August 15, 2018, 1:20am

Here's a nice trick: extract all the info you want, copy and paste it into Excel, then press CTRL+F, choose Replace, type [Info] in "Find what", leave "Replace with" empty, and finally click Replace All. Problem solved.

0101 · August 15, 2018, 4:47am

Thanks for your answer.
I know this "trick".
Isn't it possible to extract a cell without a certain part of its text?

hossain007 · August 15, 2018, 2:14pm

I don't think it's possible in that case

iconoclast · August 16, 2018, 10:23pm

Hi!

To pick only the numbers, which all follow the same pattern (scores in the form number:number), you can use a regex in your selector:

\d+.+\d

Open your score selector and enter the regex above to pick up the numbers without [Info].

P.S. It looks like a funny meme, haha.

0101 · August 18, 2018, 3:30pm

Super, it works! But: Why? What does "\d+.+\d" mean?

iconoclast · August 18, 2018, 3:52pm

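\d stands for any digit character; the + after it matches one or more of the preceding kind of character, so \d+ picks up a whole run of digits.
. stands for any single character, so .+ matches one or more characters of any kind, and the final \d requires the match to end on a digit.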

You can test your regex before using it at https://regex101.com/

Please keep in mind that WebScraper does not support any flags, e.g. the global flag.

At the moment, the global flag can be used if WebScraper is paired with the Tampermonkey extension.
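
To show what the missing global flag means in practice, here is a small Python sketch. Python has no global flag, so re.search stands in for a match without it and re.findall for a match with it. The pattern \d+:\d+ and the sample text are assumptions for illustration only.

```python
import re

# Assumed pattern for illustration: digits, a colon, digits.
SCORE = re.compile(r"\d+:\d+")

# Hypothetical cell text containing two scores.
text = "2:0 [Info] 3:1 [Info]"

# Without a global flag, only the first match is returned.
first_only = SCORE.search(text).group(0)

# With a global flag, every match would be returned.
all_matches = SCORE.findall(text)

print(first_only)
print(all_matches)
```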

0101 · August 19, 2018, 6:59pm

Super, thank you very much!

I've another question (sorry!):

Another website (1) has a table where the football results are split across 3 cells. I want to scrape them into 1 cell. Is that also possible with regex?

(1) http://www.fussballvorhersage.de/b1/b1_20180817.htm

iconoclast · August 19, 2018, 8:19pm

The results are split into separate table data cells there. You can use an Element selector to pick up the rows and add Text selectors inside it to pick up the results. You don't need regex there.

Example sitemap:

{"_id":"fussballvorhersage","startUrl":["http://www.fussballvorhersage.de/b1/b1_20180817.htm"],"selectors":[{"id":"row","type":"SelectorElement","selector":"tr:nth-of-type(6) tr","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"count1","type":"SelectorText","selector":"td:nth-of-type(4)","parentSelectors":["row"],"multiple":false,"regex":"","delay":0},{"id":"colon","type":"SelectorText","selector":"td:nth-of-type(5)","parentSelectors":["row"],"multiple":false,"regex":"","delay":0},{"id":"count2","type":"SelectorText","selector":"td:nth-of-type(6)","parentSelectors":["row"],"multiple":false,"regex":"","delay":0}]}
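
The sitemap above yields three separate columns per row. If you want them in a single cell afterwards, one option is to join them after exporting. The following is only a sketch in Python: it assumes a CSV export whose columns are named after the selector ids (count1, colon, count2), and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical export: columns named after the selector ids in the sitemap.
SAMPLE_EXPORT = """count1,colon,count2
2,:,0
3,:,1
"""

def join_score(row):
    """Concatenate the three scraped cells into one score string."""
    return "".join(row[key].strip() for key in ("count1", "colon", "count2"))

scores = [join_score(row) for row in csv.DictReader(io.StringIO(SAMPLE_EXPORT))]
print(scores)
```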

0101 · August 22, 2018, 5:31am

Great, thanks!

Back to regex: "3:1 [Info]"

Scraping the data without "[Info]" works fine.
Now I want to extract the single digits.
First I extracted the first one (3) with (\d); it works!
But how do I scrape only the second one (1), without [Info]?

Edit: I got it: "[^d] "
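
For anyone reading later, here is a small Python sketch of what that pattern picks up. As written, it matches one character that is not the letter "d" followed by a space, so the result includes the trailing space. The lookahead variant is an assumption that was not tested in WebScraper itself.

```python
import re

cell = "3:1 [Info]"

# Pattern from the post: one character that is not the letter "d", then a space.
as_posted = re.search(r"[^d] ", cell).group(0)

# Assumed alternative: digits that are followed by optional whitespace and "[".
with_lookahead = re.search(r"\d+(?=\s*\[)", cell).group(0)

# repr() makes the trailing space in the first result visible.
print(repr(as_posted))
print(repr(with_lookahead))
```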