I’m working on a project where I need to get weekly music rankings for different cities. I found that the official platform doesn’t provide API access for this type of city-specific chart data, so I’m trying to extract it directly from their charts pages.
Instead of getting the actual song data and rankings, my requests only return the basic HTML skeleton with lots of CSS and JavaScript references. The page seems to load the chart information dynamically after the initial page load.
What I’m looking for:
I need to access the actual track names, artist information, and ranking positions that show up when you visit the page normally in a browser. Since there’s no official API for city charts, web scraping seems like the only way to get this data.
Any suggestions on how to handle this dynamic content loading issue? I’m pretty new to this whole scraping thing so any help would be great.
You’re hitting JavaScript-rendered content - basic HTTP requests won’t work. Most people jump straight to Selenium or Playwright, but that’s way more than you need.
I ran into this same thing building a music analytics dashboard. The trick is executing the JavaScript first, then grabbing the data after it loads.
Skip the browser automation headache. I built this as a workflow that handles dynamic loading automatically. Set it to run weekly, pull whatever data you want, and dump it straight into your database or spreadsheet.
No browser dependencies, no timeout issues, no headless Chrome management. Just point it at your chart URLs and let it handle the JavaScript execution while you parse out track names, artists, and rankings.
Want multiple cities? Set up separate workflows and schedule them all. Way cleaner than maintaining scraping scripts that break every time the site changes its page structure.
Been scraping music chart data for a side project on regional trends, and the issue you’re hitting is super common with chart sites.

Skip Selenium - it’s a resource hog. Playwright is way faster and handles JavaScript much better for this kind of work.

Here’s what actually works: don’t just wait for the page load event, wait for the chart elements themselves. These sites load in phases, and the track data comes last. Target the song title containers specifically and wait until they’re actually populated before you scrape.

Watch out for rate limiting and bot detection, though. I throw in random delays between requests and rotate user agents when pulling data for multiple cities. Keeps me from getting blocked.
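To make that concrete, here’s a rough Playwright sketch of the approach: wait on the chart rows themselves, rotate user agents, and sleep a random interval between requests. The `div.chart-row` selector and the user agent strings are placeholders I made up - inspect the real page to find the actual container that holds the song titles.

```python
import random
import time

# A small pool of realistic user agents to rotate through (placeholders).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
]

def pick_user_agent():
    """Choose a user agent at random for the next request."""
    return random.choice(USER_AGENTS)

def scrape_city_chart(url, row_selector="div.chart-row"):
    """Render the page and wait for the chart rows, not just the load event.

    `row_selector` is a placeholder -- point it at whatever element actually
    holds the track titles on the real page.
    """
    # Imported here so the helpers above work even without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(user_agent=pick_user_agent())
        page.goto(url)
        # Block until the chart elements are actually populated.
        page.wait_for_selector(row_selector, timeout=15_000)
        rows = [row.inner_text() for row in page.query_selector_all(row_selector)]
        browser.close()

    # Random delay between requests helps avoid rate limiting.
    time.sleep(random.uniform(2.0, 5.0))
    return rows
```

Run one city at a time through `scrape_city_chart` rather than firing all the requests at once - the delay between calls is doing real work here.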
Your problem is the dynamic content loading. urllib only grabs the initial HTML shell before JavaScript loads the actual chart data. I hit the same issue scraping concert venue data that loaded async. requests-html worked way better than jumping straight to Selenium. Much lighter but still handles JavaScript.
from requests_html import HTMLSession

session = HTMLSession()
r = session.get(city_chart_url)  # city_chart_url = the chart page URL
r.html.render()  # executes the JavaScript in a headless Chromium
# Now you can parse r.html.html for the fully rendered content
render() executes the JavaScript and waits for the content to load. If the data takes extra time to populate, pass a sleep parameter, e.g. r.html.render(sleep=2). Then use BeautifulSoup (or whatever parser you prefer) to extract track names and rankings from the rendered HTML.
Way more reliable than managing full browser automation, especially when you’re hitting multiple city charts that need regular updates.
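To show the extraction step, here’s a minimal BeautifulSoup sketch of pulling rank, title, and artist out of the rendered HTML. The markup and class names (`chart-row`, `rank`, `title`, `artist`) are stand-ins I invented - inspect the real rendered page (`r.html.html`) to find the actual structure.

```python
from bs4 import BeautifulSoup

# Stand-in for r.html.html after render(); the real markup will differ.
RENDERED = """
<ol>
  <li class="chart-row"><span class="rank">1</span>
      <span class="title">Song A</span><span class="artist">Artist A</span></li>
  <li class="chart-row"><span class="rank">2</span>
      <span class="title">Song B</span><span class="artist">Artist B</span></li>
</ol>
"""

def parse_chart(html):
    """Extract rank, title, and artist from rendered chart HTML."""
    soup = BeautifulSoup(html, "html.parser")
    tracks = []
    for row in soup.select("li.chart-row"):
        tracks.append({
            "rank": int(row.select_one(".rank").get_text(strip=True)),
            "title": row.select_one(".title").get_text(strip=True),
            "artist": row.select_one(".artist").get_text(strip=True),
        })
    return tracks

# parse_chart(RENDERED) -> [{"rank": 1, "title": "Song A", "artist": "Artist A"}, ...]
```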
BeautifulSoup alone won’t work here - the data isn’t in the initial HTML response. You need something that can run JavaScript. Selenium is probably your best option even though it’s overkill. Just run it headless and grab the page source after everything loads. I’ve used this approach for scraping chart data from other music sites and it works great.
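For reference, a minimal headless Selenium sketch of that approach: load the page, wait for a chart element, then grab the page source. The CSS selector is a placeholder, and this assumes Chrome plus a matching chromedriver are available on your machine.

```python
def fetch_rendered_source(url, wait_selector="div.chart-row", timeout=15):
    """Load `url` in headless Chrome and return the HTML after the chart renders.

    `wait_selector` is a placeholder -- point it at a real chart element.
    """
    # Imported inside the function so the module loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait for an actual chart element, not just the initial page load.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Feed the returned source into BeautifulSoup the same way you would with any static HTML.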