How to extract viewer numbers and channel names from streaming platform data using regex?

I’m working with streaming data that looks like this:

Late Night Gaming Session with @ninja
15,234 viewers on ninja


Just Chatting Stream
8,567 viewers on pokimane


Speedrun Attempts - New PB Today!
4,123 viewers on shroud


Music Production Live
2,890 viewers on deadmau5


Cooking Stream Tonight
1,675 viewers on amouranth


Ranked Games All Day
987 viewers on tfue

I need help creating a regex pattern that can pull out three things from each entry: the stream title, the viewer count number, and the streamer’s username. Right now I have this pattern, ([^\n]+)\n([^\n]+)\n{2}, but it only captures two groups: the title and the whole viewer line. What I really want is three separate capture groups, for example “Late Night Gaming Session with @ninja”, “15,234”, and “ninja”. Can regex handle this kind of data extraction? I’ve been struggling with this for a while and would appreciate any help with the pattern structure.
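Running the original pattern over a trimmed copy of the sample shows both problems. This is a Python sketch that assumes the data ends with a single newline (the question doesn't show how the file actually ends). Group 2 comes back as the whole viewer line, and because \n{2} requires two newlines after every entry, the final entry is skipped.

```python
import re

# Trimmed copy of the sample from the question (first, second, and last entries).
# Assumption: the text ends with a single newline after the last entry.
SAMPLE = (
    "Late Night Gaming Session with @ninja\n"
    "15,234 viewers on ninja\n"
    "\n\n"
    "Just Chatting Stream\n"
    "8,567 viewers on pokimane\n"
    "\n\n"
    "Ranked Games All Day\n"
    "987 viewers on tfue\n"
)

# The pattern from the question: a line, a newline, a line, then exactly two newlines.
ORIGINAL = re.compile(r"([^\n]+)\n([^\n]+)\n{2}")

matches = ORIGINAL.findall(SAMPLE)
for title, viewer_line in matches:
    # Group 2 holds the whole "N viewers on user" line, not the count alone.
    print(repr(title), "->", repr(viewer_line))

# Only two of the three entries match: "Ranked Games All Day" is followed by
# just one newline, so the trailing \n{2} can never succeed there.
print(len(matches))
```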

Try this pattern: ^(.+)\n(\d{1,3}(?:,\d{3})*) viewers on (\w+)$ with the multiline flag enabled. The first group catches the title, the second grabs the number, and the third gets the username. It worked for me with similar streaming data.
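Here's a minimal Python sketch of that pattern in use, with re.MULTILINE as the multiline flag. The helper name parse_streams and the int() conversion are hypothetical additions for illustration, not part of the answer, and the sample is a trimmed copy of the question's data. Because the comma groups in \d{1,3}(?:,\d{3})* are optional, the pattern also matches counts under 1,000 such as 987.

```python
import re

# Pattern from the answer; re.MULTILINE lets ^ and $ match at every line boundary.
STREAM_RE = re.compile(r"^(.+)\n(\d{1,3}(?:,\d{3})*) viewers on (\w+)$", re.MULTILINE)

# Trimmed copy of the sample from the question.
SAMPLE = (
    "Late Night Gaming Session with @ninja\n"
    "15,234 viewers on ninja\n"
    "\n\n"
    "Just Chatting Stream\n"
    "8,567 viewers on pokimane\n"
    "\n\n"
    "Ranked Games All Day\n"
    "987 viewers on tfue\n"
)

def parse_streams(text):
    """Hypothetical helper: return (title, viewer_count, username) tuples."""
    rows = []
    for title, count, user in STREAM_RE.findall(text):
        # Strip the thousands separators so the count can be used as a number.
        rows.append((title, int(count.replace(",", "")), user))
    return rows

for row in parse_streams(SAMPLE):
    print(row)
```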

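Your regex is close but needs a few tweaks. Try ^(.+?)\n(\d{1,3}(?:,\d{3})*)\s+viewers\s+on\s+(\w+) with the m flag. Main changes: a non-greedy quantifier for the title, and \s+ around “viewers” and “on” so extra spaces or tabs don’t break the match. The non-greedy first group is a safeguard against grabbing too much text, though here it behaves the same as (.+): without the DOTALL (s) flag, . never matches a newline, so the title group can’t run past its own line. On the numbers, \d{1,3}(?:,\d{3})* already handles counts under 1,000 like 987, but it fails on a larger count written without commas (e.g. 8567). If a platform exports counts that way, (\d+(?:,\d{3})*) is the more lenient option. Test it on your full dataset though, since streaming platforms format things differently.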
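To check those points, this Python sketch runs the strict and lenient number groups, plus a greedy-title version of the lenient pattern, against a small input. The input is hypothetical: it doubles the spaces around “viewers on” and writes 8,567 without its comma to mimic a platform that skips thousands separators.

```python
import re

# Hypothetical input: doubled spaces in the first entry, and a count
# without a thousands separator in the second.
MESSY = (
    "Late Night Gaming Session with @ninja\n"
    "15,234  viewers  on ninja\n"
    "\n\n"
    "Just Chatting Stream\n"
    "8567 viewers on pokimane\n"
)

# Pattern from the answer: \d{1,3} only allows up to 3 digits before each comma group.
STRICT = re.compile(r"^(.+?)\n(\d{1,3}(?:,\d{3})*)\s+viewers\s+on\s+(\w+)", re.MULTILINE)
# Lenient variant from the answer: \d+ accepts 8567 as well as 987 or 15,234.
LENIENT = re.compile(r"^(.+?)\n(\d+(?:,\d{3})*)\s+viewers\s+on\s+(\w+)", re.MULTILINE)
# Same as LENIENT but with a greedy title; "." stops at "\n" without DOTALL.
GREEDY = re.compile(r"^(.+)\n(\d+(?:,\d{3})*)\s+viewers\s+on\s+(\w+)", re.MULTILINE)

strict_rows = STRICT.findall(MESSY)
lenient_rows = LENIENT.findall(MESSY)
greedy_rows = GREEDY.findall(MESSY)

print(strict_rows)   # the 8567 entry is missing
print(lenient_rows)  # both entries
print(greedy_rows == lenient_rows)
```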

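Regex gets messy fast with streaming data. I’d use ^(.+)\n(\d[\d,]*) viewers on (\S+)$ in multiline mode. It grabs titles, handles numbers with commas, and is more permissive with usernames than \w+. Note that \w already matches underscores (along with letters and digits); what \S+ adds is characters like hyphens or dots, in case a platform allows them.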
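A small Python check of the username point, using hypothetical names (only tfue comes from the question): \w+ already accepts cool_streamer, and the difference from \S+ only shows up on characters like - and the dot. The second half runs the answer's pattern on a made-up entry (“Late Set”) that uses such a username.

```python
import re

# Pattern from the answer: \S+ accepts any run of non-whitespace characters.
LOOSE_RE = re.compile(r"^(.+)\n(\d[\d,]*) viewers on (\S+)$", re.MULTILINE)

def username_checks(names):
    """Map each name to (fully matches word chars, fully matches non-whitespace)."""
    return {
        name: (
            re.fullmatch(r"\w+", name) is not None,
            re.fullmatch(r"\S+", name) is not None,
        )
        for name in names
    }

# Hypothetical usernames; only "tfue" appears in the question's data.
checks = username_checks(["tfue", "cool_streamer", "dj-mix.live"])
for name, (word_ok, nonspace_ok) in checks.items():
    print(name, "word chars:", word_ok, "non-whitespace:", nonspace_ok)

# Hypothetical second entry with a hyphen and a dot in the username.
TEXT = (
    "Ranked Games All Day\n"
    "987 viewers on tfue\n"
    "\n\n"
    "Late Set\n"
    "1,200 viewers on dj-mix.live\n"
)
rows = LOOSE_RE.findall(TEXT)
print(rows)
```

One trade-off of \d[\d,]*: it also accepts malformed counts such as 1,,2, so it's worth validating the captured string before stripping commas and calling int() on it.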

Had this same problem scraping Twitch analytics last month. This pattern worked for me: (.*)\n(\d{1,3}(?:,\d{3})*)\s+viewers\s+on\s+(\w+), but watch out for edge cases. Some usernames have special characters that \w+ misses (it only covers letters, digits, and underscores), and if a platform writes counts of 1,000 or more without commas (e.g. 1234), the number group breaks; counts under 1,000 like 987 still match fine. Multi-line stream titles will also mess things up; learned that the hard way. I preprocess the data first to normalize whitespace, then run the regex. It’s been solid across different platforms, though you’ll probably need tweaks depending on how each one formats exports.
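The preprocess-then-match approach might look like this in Python. The normalize_whitespace helper and its rules (convert \r\n to \n, collapse runs of spaces, tabs, and non-breaking spaces, trim each line) are a guess at what “normalize whitespace” means here, and the CRLF sample is made up. Without the cleanup, the unanchored (.*) keeps trailing spaces and the \r in each title.

```python
import re

# Pattern from the answer above (no ^/$ anchors, so no multiline flag needed).
STREAM_RE = re.compile(r"(.*)\n(\d{1,3}(?:,\d{3})*)\s+viewers\s+on\s+(\w+)")

def normalize_whitespace(text):
    """Hypothetical cleanup step: unify line endings and tidy spaces per line."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces, tabs, and non-breaking spaces into one space.
    text = re.sub(r"[ \t\u00a0]+", " ", text)
    # Drop spaces left at the start or end of any line.
    text = re.sub(r" *\n *", "\n", text)
    return text.strip() + "\n"

# Hypothetical export with Windows line endings, trailing spaces, and a tab.
RAW = (
    "Late Night Gaming Session with @ninja  \r\n"
    "15,234\tviewers on ninja\r\n"
    "\r\n\r\n"
    "Ranked Games All Day\r\n"
    "987 viewers on tfue"
)

raw_rows = STREAM_RE.findall(RAW)
clean_rows = STREAM_RE.findall(normalize_whitespace(RAW))
print(raw_rows)
print(clean_rows)
```

This doesn't solve multi-line titles: if a title wraps onto a second line, (.*) still captures only the line directly above the count.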

Regex becomes a nightmare when data formats change or you’re juggling multiple sources. Trust me, I’ve been there.

You need a real data extraction pipeline that handles streaming APIs and parses different formats automatically. I’ve built systems like this for pulling live viewer counts and channel data from various platforms.

Skip the regex mess. Set up an automated workflow that connects directly to streaming APIs, grabs your data, and formats it how you want. When data structures change, you won’t be scrambling to fix broken patterns.

I use Latenode for exactly this. Build workflows that pull streaming data, parse it cleanly, and route it anywhere you need. Way more solid than regex that breaks every update.

Check it out: https://latenode.com