Regex noob and am baffled

I am scraping just fine (great tool), but I spend a lot of time afterwards sanitizing the data and throwing some of it away using find and replace in a spreadsheet or CSVed.

One field I collect information from contains many line breaks (new lines / EOL / CR etc.).

I only need the first 15 characters from that field anyway.

I have spent all afternoon trying different regular expressions to throw away or ignore anything from character 16 onwards, to ignore non-alphanumeric data such as carriage returns, or to select only letters and numbers, and I have failed miserably. :confused:

I've looked at dozens of regex examples for JavaScript, Perl, PHP etc., but they often try to do far more than I need and I can't get them to work. It feels like I'm just missing a quote or a bracket somewhere.

I'd be grateful for a few pointers.

Have you tried this: .{15} ?
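
To make the idea concrete, here is a minimal sketch in Python (my choice of language; the scraper's own regex flavour may behave slightly differently). One thing to watch: in most flavours the dot does not match a newline by default, so a field full of line breaks may need flattening before .{15} can grab 15 characters. The sample text and the function names first_chars and alnum_only are made up for illustration.

```python
import re


def first_chars(raw: str, n: int = 15) -> str:
    """Collapse every run of whitespace (including CR/LF) to one space, then keep up to n characters."""
    flat = re.sub(r"\s+", " ", raw).strip()
    # .{0,n} always matches, so fields shorter than n characters are returned whole.
    return re.match(r".{0,%d}" % n, flat).group(0)


def alnum_only(raw: str, n: int = 15) -> str:
    """Drop everything except letters and digits, then keep up to n characters."""
    return re.sub(r"[^a-zA-Z0-9]", "", raw)[:n]


# Hypothetical scraped value with embedded line breaks.
raw = "Hello\r\nWorld, this is\n a long field value"
print(first_chars(raw))  # Hello World, th
print(alnum_only(raw))   # HelloWorldthisi
```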

Thanks for replying and LOL, no, I ended up doing this:
[a-zA-Z0-9]..........................

Approx 15 dots :blush:

I've been making noob mistakes regarding wildcards (I'm used to * ).

The more examples I see, the better I get it. I can understand your example above and I'm going to change my regex to that.
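
For anyone else coming from shell-style wildcards, a small sketch in Python (chosen only for illustration) of the two points above: a row of dots typed by hand is the same as a counted quantifier, and the regex equivalent of a shell * is .* (any character, repeated zero or more times). The variable names are hypothetical, and the sample string is the field value quoted further down in this thread.

```python
import re

sample = "Views: 17 (12 Unique)"

# One letter or digit followed by 14 dots typed out by hand...
long_form = r"[a-zA-Z0-9]" + "." * 14
# ...is the same pattern as one letter or digit plus a counted quantifier.
short_form = r"[a-zA-Z0-9].{14}"

long_match = re.search(long_form, sample).group(0)
short_match = re.search(short_form, sample).group(0)
print(long_match == short_match)  # True

# A shell-style "Views*" is written "Views.*" in regex:
# "." is any character, "*" repeats the preceding token zero or more times.
glob_like = re.search(r"Views.*", sample).group(0)
print(glob_like)  # Views: 17 (12 Unique)
```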

The field I'm extracting data from typically has hundreds of rows with values like this:
Views: 17 (12 Unique)

The information I really want is the two figures.

After downloading my CSV I do a few search-and-replace passes.

I change the ( to a comma.

Unique) and Views: get deleted altogether (searched and replaced with nothing).

For the example above I end up with 17,12 - I can split that column easily in Google Sheets.

I'm sure there's probably a regular expression that would take care of most or all of that at the data collection stage.

Any pointers on that?

Select each row, if that is possible, and try this regex: \s\d+\s\(\d+
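
If the cleanup ends up being done after download instead, capture groups can pull out both figures in one pass. This is only a sketch in Python under my own assumptions: the stricter pattern and the function name views_to_csv are hypothetical, not something the scraper provides, and the pattern assumes every value looks like the sample quoted above.

```python
import re

# Stricter variant of the suggested pattern: each (\d+) captures one figure.
VIEWS = re.compile(r"Views:\s*(\d+)\s*\((\d+)\s+Unique\)")


def views_to_csv(field: str) -> str:
    """Return 'total,unique' for a value like 'Views: 17 (12 Unique)', or '' if it does not match."""
    m = VIEWS.search(field)
    return f"{m.group(1)},{m.group(2)}" if m else ""


print(views_to_csv("Views: 17 (12 Unique)"))  # 17,12
```

The result can be split on the comma in Google Sheets exactly as before, without the manual search and replace.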

Thanks again. That works perfectly. Could you explain it or is that too much to write?
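
A token-by-token reading of \s\d+\s\(\d+ may help. The sketch below uses Python's verbose mode purely so that each piece can carry a comment; the name PATTERN is my own, and the scraper's regex flavour may not support this layout, so treat it as an annotated illustration rather than something to paste into the tool.

```python
import re

PATTERN = re.compile(
    r"""
    \s      # one whitespace character (the space after "Views:")
    \d+     # one or more digits (the total views, 17 in the sample)
    \s      # one whitespace character
    \(      # a literal opening parenthesis; the backslash stops it from starting a group
    \d+     # one or more digits (the unique views, 12 in the sample)
    """,
    re.VERBOSE,
)

match = PATTERN.search("Views: 17 (12 Unique)")
print(repr(match.group(0)))  # ' 17 (12'
```

Everything outside that match ("Views:" at the start and " Unique)" at the end) is left out, which is why only the two figures and the bracket between them remain.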