The problem is that my formula doesn’t work properly when there are additional parameters in the URL like https://target-ads.example.com/track?campaign_id=9876543210&source=email&utm_medium=newsletter. I need to capture only the digits that immediately follow the equals sign, ignoring any ampersands or other characters that might come after the number sequence.
Had this exact issue migrating campaign data from our old tracking system. Your regex pattern is greedy and doesn’t know when to stop capturing digits. Here’s what worked for me - tweaked the SUBSTITUTE approach: IF(REGEX_MATCH({Campaign URL}, “campaign_id=\d+”), SUBSTITUTE(REGEX_EXTRACT({Campaign URL}, “campaign_id=\d+(?=&|$)”), “campaign_id=”, “”)) The trick is being specific about the parameter name and using (?=&|$) to stop at either an ampersand or end of string. Prevents it from grabbing digits from other parameters. I’ve found targeting the exact parameter name beats just matching any equals sign - avoids false matches when you’ve got other numeric parameters. Ran this on thousands of campaign URLs and it’s been solid.
Your regex doesn’t set proper boundaries - that’s the main problem. I had the same issue with tracking URLs and switching to a more precise pattern fixed it right away. The issue is \d+ keeps matching digits no matter what comes after. Try this instead: IF(REGEX_MATCH({Campaign URL}, “=(\d+)”), REGEX_EXTRACT({Campaign URL}, “=(\d+)(?=&|?|$)”, “$1”)) The (?=&|\?|$) part is a lookahead that stops capturing digits when it hits an ampersand, question mark, or the end of the string. This stops it from bleeding into other URL parameters while keeping accuracy across different URL formats. I’ve used this method on campaign data with tons of parameter variations. The lookahead consistently beats simple substitution because it handles edge cases like URLs that end with your target parameter or have multiple numbers.
The trick is using capture groups - wrap \d+ in parentheses, then reference $1 to get just the digits without the equals sign.
I hit the same problem building analytics dashboards at work. URLs get messy with multiple parameters, but this pattern stops at the first non-digit after the equals sign.
Still having trouble? Try being more specific about what follows the number:
Honestly, just add word boundaries to your pattern. Try REGEX_EXTRACT({Campaign URL}, "=(\d+)(?=\D|$)", "$1") - the (?=\D|$) stops at any non-digit or end of string. Works way better than lookaheads for ampersands since URLs get weird formatting sometimes.