Regex parsing matching backwards

I'm trying to strip UTM parameters from links, so I'm using regex to try to remove the ? from the URL and anything after.

My regex: \?.*

But instead of filtering out the string, the parser is only keeping things that match. How do I reverse it so it's filtering it the other way?

(In this example, there is only one string with UTM params, so it's only matching one record.)

Try this: .+?(?=/?)

Hmm, .+?(?=/?) just gives me the output of "h" on every row?
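
For anyone wondering where the "h" comes from: in that pattern the ? inside the lookahead is not escaped, so /? means "an optional slash" and the lookahead succeeds at every position. The lazy .+? then stops after a single character. Presumably the intent was \? (a literal question mark). A quick sketch in Python to show the difference; the URL is made up, and I'm assuming the scraper's regex engine treats these simple patterns the same way Python's re module does:

```python
import re

# Hypothetical URL, for illustration only.
url = "https://example.com/page?utm_source=newsletter"

# `/?` is an optional slash, so (?=/?) is satisfied everywhere and the
# lazy `.+?` stops after the minimum of one character.
lazy = re.match(r".+?(?=/?)", url).group(0)

# Escaping the question mark makes the lookahead require a literal "?".
fixed = re.match(r".+?(?=\?)", url).group(0)

print(lazy)   # h
print(fixed)  # https://example.com/page
```

Note that the escaped version finds nothing at all on a URL with no ?, so it would only return the rows that actually have parameters.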

Hi, you can try:

^[^?]*

That one did it, thank you!

I don't have a huge handle on regex (and mostly getting by with a little help from ChatGPT), but I understand what this is doing compared to mine. I guess my question going forward is... should I have expected my original syntax to work, or does Web Scraper just parse things in a different way?

Hey, no shame in using ChatGPT, that's how I did it :smiley:

My prompt: regex to match string until the first occurrence of ?

Your original regex matched the '?' and everything after it, and the parser returns the matched part.
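
Here is a rough sketch of the difference between "return the match" and "remove the match", using Python's re module as a stand-in. The URLs and the first_match helper are made up for illustration, and this is not how Web Scraper itself is implemented:

```python
import re

# Hypothetical rows, for illustration only.
urls = [
    "https://example.com/a?utm_source=x&utm_medium=email",
    "https://example.com/b",
]

def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# "Find" semantics: you get back what the pattern matched.
matched = [first_match(r"\?.*", u) for u in urls]

# "Remove" semantics: what the pattern matched is deleted.
removed = [re.sub(r"\?.*", "", u) for u in urls]

# Find semantics again, but with a pattern describing the part to keep.
kept = [first_match(r"^[^?]*", u) for u in urls]

print(matched)  # ['?utm_source=x&utm_medium=email', None]
print(removed)  # ['https://example.com/a', 'https://example.com/b']
print(kept)     # ['https://example.com/a', 'https://example.com/b']
```

So \?.* would be the right pattern for a replace-with-nothing step, while ^[^?]* is the right one when the tool hands back the match.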

This is the expected behavior for regex: it is a "match" or "find" operation, not a "remove" one. So you should think in terms of what you want to match, not what you need to remove. That is why JanAp's regex works. Personally I would go with something like the one below, which matches only strings that start with http:

^http[^?]+
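
To show how the stricter pattern differs from ^[^?]*, here is a small Python sketch. The rows and the first_match helper are invented for the example, and I'm assuming the scraper's engine handles these patterns like Python's re does:

```python
import re

# Hypothetical scraped values: one real URL, one placeholder, one empty cell.
rows = ["https://example.com/a?utm_campaign=spring", "n/a", ""]

def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# ^[^?]* accepts anything, including an empty string.
loose = [first_match(r"^[^?]*", r) for r in rows]

# ^http[^?]+ only matches values that start with "http".
strict = [first_match(r"^http[^?]+", r) for r in rows]

print(loose)   # ['https://example.com/a', 'n/a', '']
print(strict)  # ['https://example.com/a', None, None]
```

Which one is better depends on whether you want non-URL values passed through unchanged or dropped.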