I’m trying to extract multiple ID numbers from a URL string but my regex pattern is only capturing the first match instead of all of them.
Here’s what I’m working with:
import re
api_url = "https://example-service.com/api/messages/987654321,123456789.json"
pattern = r'\b\d+\b'
matches = re.findall(pattern, api_url)
result = []
for match in matches:
    result.append({'id': match})
return result
The URL contains several numeric IDs separated by commas, but I’m only getting back one ID instead of both. I need to extract all the numbers so I can create separate API requests for each ID later.
Is there something wrong with my regex pattern or how I’m processing the results? Any help would be great!
Check if you’re running this inside a function or class method with multiple return paths. I hit something similar where conditional logic was cutting off my results early. Your regex pattern looks good - \b\d+\b should definitely catch both numbers in that URL. I’d add some debug output right after the findall to see what’s being captured: print(f'Found matches: {matches}'). If you see both numbers there but still get cut off, then it’s definitely your processing logic after the regex, not the pattern itself. Also make sure you don’t have any list comprehensions or filtering elsewhere that might be limiting results to just the first element.
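A minimal sketch of that debug check, reusing the URL and pattern from the question. Run on its own, findall returns both IDs here, so if your own print shows the same list, the problem is in whatever happens after this point.

```python
import re

api_url = "https://example-service.com/api/messages/987654321,123456789.json"
pattern = r'\b\d+\b'

matches = re.findall(pattern, api_url)

# Print both the captured values and the count right after findall
print(f'Found matches: {matches}')
print(f'Total matches found: {len(matches)}')
```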
Hit this exact problem last month. Your regex works fine - I tested your code and it grabbed both numbers no problem. Like pixelPilot said, it’s probably your return statement placement, but I’ve seen another thing cause this. Check for early returns or breaks in any try/except blocks around that code. I had exception handling that was quietly catching something and only returning partial results. Also make sure you’re not slicing the matches list somewhere - I once had matches[:1] hidden in my code that took ages to find. Debugging trick that helped me: throw in print(f'Total matches found: {len(matches)}') right after findall. If it shows the right count but your result is still cut off, you know the regex isn’t the issue.
that’s strange - your regex looks fine. i tested r'\b\d+\b' with findall() on a similar url and it grabbed all the numbers perfectly. try printing the matches variable before your loop to see what’s actually getting captured. something else in your code might be causing this.
Your regex pattern works fine - I tested it and it grabs both numbers no problem. The issue’s probably in your code structure, not the pattern. I hit something similar where nested functions made me return from the wrong scope too early. Check that your whole code block has the same indentation and you don’t have any sneaky return statements above your loop. Also make sure you’re not overwriting the matches variable between findall and your loop. Quick test: throw in print(matches) right after the findall line. If you see both numbers there, your regex is perfect and something’s wrong with how you’re processing the results.
Been wrestling with URL parsing for years and this drives me nuts. Your regex looks fine - probably some hidden code issue like others said.
What bugs me is you’re building a one-off solution you’ll have to maintain forever. Plus separate API requests for each ID? That’s more complexity with error handling, rate limits, retries.
I used to write these scripts constantly. Parse URLs, extract values, loop API calls, handle responses. Now I set up the pipeline once and let it run.
With that in place, URL parsing, data extraction, and API requests all happen without custom regex. Error handling is built in, and I can change the workflow without touching Python.
Honestly sounds like you’re testing in Jupyter or an interactive shell - things get weird there sometimes, usually because of stale variables or an old function definition left over from an earlier cell. Try running the same code in a fresh .py file. findall itself behaves the same in a notebook as in a regular script, but leftover state can make it look otherwise. Also check that nothing is shadowing the standard re module, like a local file named re.py or a variable called re.
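A quick way to check for shadowing, as a rough sketch. The helper name describe_re_module is made up for illustration. If the printed path points into your project folder instead of your Python installation, a local file is being imported in place of the standard module.

```python
import re


def describe_re_module():
    """Report where the imported `re` came from (hypothetical helper)."""
    return {
        # Path of the file that `import re` actually loaded
        "file": getattr(re, "__file__", None),
        # A shadowing module or variable would usually lack a callable findall
        "has_findall": callable(getattr(re, "findall", None)),
    }


print(describe_re_module())
```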
Your regex pattern’s fine, but you’re probably returning results wrong. If you’ve got a return statement inside your loop, it’ll bail after the first match - that’s why you’re only seeing one result.
Move the return outside the loop:
import re
api_url = "https://example-service.com/api/messages/987654321,123456789.json"
pattern = r'\b\d+\b'
matches = re.findall(pattern, api_url)
result = []
for match in matches:
    result.append({'id': match})
return result # This should be outside the loop
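To make the difference concrete, here is a sketch that wraps the same logic in two functions. The names extract_ids_buggy and extract_ids are made up for illustration, and this assumes your code lives inside a function, since a bare return only works there.

```python
import re

PATTERN = r'\b\d+\b'


def extract_ids_buggy(api_url):
    result = []
    for match in re.findall(PATTERN, api_url):
        result.append({'id': match})
        return result  # inside the loop: exits after the first match
    return result  # only reached when there are no matches


def extract_ids(api_url):
    result = []
    for match in re.findall(PATTERN, api_url):
        result.append({'id': match})
    return result  # outside the loop: runs after every match is appended


url = "https://example-service.com/api/messages/987654321,123456789.json"
print(extract_ids_buggy(url))
print(extract_ids(url))
```

The only difference between the two is the indentation of the return line, which is why this kind of bug is easy to miss.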
Honestly though, this screams automation to me. Why write regex parsing code every time you need to extract data and hit APIs? You could automate this whole workflow.
I deal with this constantly - parsing URLs, grabbing multiple values, then firing off separate API requests for each one. The pattern matching, data transformation, and API calls all run automatically without custom Python scripts.
Had the same problem - findall worked but gave weird results. Check if you’re doing any string preprocessing before the regex runs. I wasted hours debugging this once, turns out I had a string replace earlier in my code that was stripping digits or messing with the URL format. Also make sure your URL variable actually has what you expect - throw a print statement right before the regex to see if the string looks right. Your pattern should definitely catch both numbers in that example URL. Oh, and if you’re using a web framework or async stuff, watch out for encoding issues that might mess with your string before it hits the regex.
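Here is a small sketch of how preprocessing can cause exactly this symptom. The replace call is a hypothetical cleanup step, not something from the question: stripping the comma fuses the two IDs into one long number, so findall returns a single match.

```python
import re

api_url = "https://example-service.com/api/messages/987654321,123456789.json"
pattern = r'\b\d+\b'

# Hypothetical cleanup step that removes commas before the regex runs
cleaned_url = api_url.replace(",", "")

# repr() shows the exact string the regex will see
print(repr(api_url))
print(repr(cleaned_url))

original_matches = re.findall(pattern, api_url)
cleaned_matches = re.findall(pattern, cleaned_url)
print(original_matches)
print(cleaned_matches)
```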