Extracting current game from Twitch streamer using web scraping

I’m trying to figure out how to get the name of the game a Twitch streamer is currently playing. I’ve been working with Selenium and BeautifulSoup, but I’m not sure if I’m on the right track.

Here’s what I’ve got so far:

import time
from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('https://www.twitch.tv/pokimane')
time.sleep(5)  # Wait for the page to load completely

page_content = browser.page_source
soup = BeautifulSoup(page_content, 'html.parser')

game_element = soup.find('p', {'data-a-target': 'stream-game-name'})
print(game_element.text if game_element else 'Game not found')

browser.quit()

This code opens a Twitch stream page, waits a few seconds, and then tries to extract the game name by looking for a specific element. However, it’s not returning the expected result. Am I overlooking something, or is there a better method to achieve this? Any guidance would be greatly appreciated!

hey, i’ve had similar issues. twitch can be a pain to scrape. have u tried using the python-twitch-client library? it wraps the api nicely and makes getting stream info way easier. something like:

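from twitch import TwitchClient  # TwitchClient ships with the python-twitch-client package

client = TwitchClient(client_id='your_client_id')
# NOTE: this call is documented as taking a channel ID, so a login name may need converting first
stream = client.streams.get_stream_by_user('pokimane')
print(stream['game'])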

might work better for ya. good luck!

I’ve been down this road before, and let me tell you, web scraping Twitch can be a real headache. While your approach isn’t bad, I’ve found a more reliable method using the TwitchAPI library. It’s a wrapper for the official Twitch API that’s much easier to work with than raw requests.

Here’s a snippet that’s worked wonders for me:

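from twitchAPI.twitch import Twitch

# Written against the synchronous 2.x releases of twitchAPI; newer releases are async
twitch = Twitch('your_client_id', 'your_client_secret')
user = twitch.get_users(logins=['pokimane'])
# get_streams expects a list of user IDs, so wrap the single ID in a list
stream = twitch.get_streams(user_id=[user['data'][0]['id']])

if stream['data']:
    print(f"Current game: {stream['data'][0]['game_name']}")
else:
    print('Stream offline or game not found')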

You’ll need to register your app on the Twitch dev portal to get the client ID and secret, but trust me, it’s worth it. This method is faster, more reliable, and less likely to break when Twitch updates their site. Plus, you won’t have to worry about getting your IP blocked for excessive scraping. The API does have its own rate limits, but an occasional lookup like this stays well within them.
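In case it helps to see what the client ID and secret are actually used for, here’s a rough sketch of the same lookup without a wrapper, using only the standard library. It assumes the OAuth client credentials flow and the Helix streams endpoint; the function names (get_app_token, fetch_streams, current_game) are just ones I made up for illustration.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = 'https://id.twitch.tv/oauth2/token'
STREAMS_URL = 'https://api.twitch.tv/helix/streams'


def get_app_token(client_id, client_secret):
    """Exchange the app credentials for an app access token (client credentials flow)."""
    body = urllib.parse.urlencode({
        'client_id': client_id,
        'client_secret': client_secret,
        'grant_type': 'client_credentials',
    }).encode()
    # Passing data makes urlopen send a POST request.
    with urllib.request.urlopen(TOKEN_URL, data=body, timeout=10) as resp:
        return json.load(resp)['access_token']


def fetch_streams(login, client_id, token):
    """Query the streams endpoint for a single channel login."""
    url = STREAMS_URL + '?' + urllib.parse.urlencode({'user_login': login})
    request = urllib.request.Request(url, headers={
        'Client-Id': client_id,
        'Authorization': 'Bearer ' + token,
    })
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def current_game(payload):
    """Return the game name from a streams response, or None if nothing is live."""
    streams = payload.get('data', [])
    return streams[0].get('game_name') if streams else None
```

Typical use would be token = get_app_token(client_id, client_secret), then current_game(fetch_streams('pokimane', client_id, token)). An empty data list means the channel is offline, which is why current_game returns None in that case.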

Your approach is on the right track, but Twitch’s dynamic content loading can be tricky. Have you considered using Twitch’s API instead? It’s more reliable and efficient for this task. You’d need to register an application and use OAuth for authentication, but it provides direct access to stream information, including the current game. If you’re set on web scraping, try increasing the wait time or, better, implementing an explicit wait for the specific element to load. Also, Twitch’s structure might have changed - double-check the element you’re targeting by inspecting the rendered page in your browser’s dev tools, since the raw page source won’t show content added by JavaScript. Lastly, some streamers use custom overlays that can interfere with scraping, so keep that in mind when testing your script.

I’ve actually tackled this problem before, and I found that using Twitch’s API is indeed the most reliable method. However, if you’re determined to stick with web scraping, I can share a workaround that worked for me.

Instead of relying on Selenium, I had success using requests and a custom header to mimic a browser. Here’s a snippet that might help:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

response = requests.get('https://www.twitch.tv/pokimane', headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

game_element = soup.find('a', {'data-a-target': 'stream-game-link'})
print(game_element.text if game_element else 'Game not found')

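This approach is faster and doesn’t require browser automation, but it only sees the HTML the server sends back, so anything Twitch renders with JavaScript afterwards won’t be in the response. Also remember that web scraping can break if Twitch changes their HTML structure, so you might need to update your selectors periodically.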
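One way to make selector updates less painful is to keep the candidates in a list and try them in order. This is only a sketch: both selectors below come from this thread and may already be stale, and extract_game is a helper name I made up.

```python
from bs4 import BeautifulSoup

# Candidate selectors, tried in order. Both are taken from this thread
# and are not guaranteed to match Twitch's current markup.
GAME_SELECTORS = [
    'a[data-a-target="stream-game-link"]',
    'p[data-a-target="stream-game-name"]',
]


def extract_game(html, selectors=GAME_SELECTORS):
    """Return the text of the first selector that matches, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    for selector in selectors:
        element = soup.select_one(selector)
        if element is not None:
            text = element.get_text(strip=True)
            if text:
                return text
    return None
```

With the snippet above you’d call extract_game(response.text). When Twitch changes the page, you only need to add a new selector at the top of the list instead of touching the parsing code.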

Having worked extensively with Twitch data, I can confirm that their API is the most reliable solution for obtaining current game information. However, if you’re committed to web scraping, consider using Playwright instead of Selenium. It’s more robust for handling dynamic content.

Here’s a snippet that might work:

from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeoutError

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://www.twitch.tv/pokimane')
    try:
        # wait_for_selector returns the element once it appears in the DOM
        game_element = page.wait_for_selector('[data-a-target="stream-game-link"]', timeout=15000)
        print(game_element.inner_text())
    except PlaywrightTimeoutError:
        # Raised if the element never shows up, e.g. when the channel is offline
        print('Game not found')
    finally:
        browser.close()

This approach waits for the specific element to load before attempting to extract the game name, which should improve reliability. Remember to install Playwright first with pip install playwright, then run playwright install to download the browser binaries.