Extract specific URL parameter from Google News API response

I’m working with the Google News API and the URLs it returns are really long and messy, with tons of parameters. What I need is to grab just the actual article URL, which comes after the &target= part in the response.

Here’s what I’m trying to do with PHP:

$apiResponse = 'news.google.com/articles?id=123&source=test&redirect=true&target=https://www.example.com/article/tech-news-today';
$urlParts = parse_url($apiResponse);
parse_str($urlParts['query'], $parameters);
$actualUrl = $parameters['target'];
echo $actualUrl;

This should extract the real article URL from the target parameter. Anyone know if this approach works correctly or if there’s a better way to handle this?

Ran into this same problem last month scraping news articles. Google sometimes returns malformed query strings where the URL parameter isn’t encoded properly - especially when the article URL has its own query parameters. Your parse_url method works but breaks if the nested URL contains an unencoded & (the value gets cut off there) or an unencoded # (everything after it is treated as a fragment). I ended up using regex as a fallback when parse_str fails - something like preg_match('/[&?]target=([^&]+)/', $apiResponse, $matches), then urldecode() on $matches[1]. Also, some Google News responses don’t use standard parameter separation, so you might hit edge cases where URLs run together without proper ampersands.
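A minimal sketch of that fallback, in case it helps. The function name extractTarget is made up for illustration, and it assumes the parameter is called target unless you pass another name. It tries parse_url/parse_str first and only uses the regex when that finds nothing.

```php
<?php
// Hypothetical helper: try parse_url/parse_str first, fall back to a regex.
function extractTarget(string $raw, string $param = 'target'): ?string
{
    // Returns null if there is no query part, false if the URL is badly malformed.
    $query = parse_url($raw, PHP_URL_QUERY);

    if (is_string($query)) {
        parse_str($query, $params);
        if (isset($params[$param]) && is_string($params[$param]) && $params[$param] !== '') {
            // parse_str has already URL-decoded the value.
            return $params[$param];
        }
    }

    // Fallback: capture everything after "<param>=" up to the next & or end of string.
    $pattern = '/[&?]' . preg_quote($param, '/') . '=([^&]+)/';
    if (preg_match($pattern, $raw, $matches)) {
        // The regex sees the raw string, so decode it here.
        return urldecode($matches[1]);
    }

    return null;
}

// Normal case, and a malformed case with no "?" before the parameters.
var_dump(extractTarget('news.google.com/articles?id=123&target=https://www.example.com/article/tech-news-today'));
var_dump(extractTarget('news.google.com/articles&id=1&target=https%3A%2F%2Fexample.com%2Fa'));
```

The regex still stops at the first &, so an article URL with its own unencoded query parameters will come back truncated either way.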

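your code looks solid, but heads up on url encoding. the target parameter might have encoded stuff like %20 for spaces. parse_str() already decodes that for you, so you only need urldecode() if you pull the value out some other way (decoding twice can mangle a url that contains a literal % or +). also, check that ‘target’ exists first so you don’t get notices.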
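A small sketch of the existence check, using a made-up encoded URL as sample input. One thing worth knowing here: parse_str() URL-decodes values on its own, so a second urldecode() is usually not needed and can change a value that contains a literal percent sign.

```php
<?php
// Sample input (invented for illustration): the target value is percent-encoded
// and itself contains an encoded percent sign (%2525 decodes to %25).
$apiResponse = 'news.google.com/articles?id=123&target=https%3A%2F%2Fwww.example.com%2Fa%3Fq%3D100%2525';

// Cast to string so a missing query (null) becomes an empty string.
parse_str((string) parse_url($apiResponse, PHP_URL_QUERY), $parameters);

// Null coalescing avoids the undefined-key notice when target is missing.
$actualUrl = $parameters['target'] ?? null;

if ($actualUrl === null) {
    echo "no target parameter\n";
} else {
    echo $actualUrl, "\n";            // already decoded once by parse_str
    echo urldecode($actualUrl), "\n"; // decoded twice: %25 collapses to %
}
```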

I’ve done similar URL extraction work and your approach looks solid for most cases. One caveat: Google News likes to switch parameter names depending on which API version or response format you hit. I’ve seen them use ‘url’ instead of ‘target’, or sometimes ‘link’. Worth checking for multiple parameter names as backup. Also watch out for nested redirects: the URL you extract might just lead to another redirect service instead of the final page. If you need the actual final destination, you’ll have to follow the redirect chain with get_headers() and a stream context that sets the HTTP follow_location / max_redirects options.
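Roughly what that could look like. Both helper names (firstParam, resolveFinalUrl) are hypothetical, and the list of candidate names is just the three mentioned above, not an official list. The redirect helper assumes PHP 7.1+ (for the context argument to get_headers()) and that the server answers HEAD requests; treat it as a sketch, not a tested crawler.

```php
<?php
// Hypothetical helper: first non-empty value among the candidate parameter names.
function firstParam(array $params, array $names = ['target', 'url', 'link']): ?string
{
    foreach ($names as $name) {
        if (isset($params[$name]) && is_string($params[$name]) && $params[$name] !== '') {
            return $params[$name];
        }
    }
    return null;
}

// Hypothetical helper: let PHP follow redirects and return the last Location seen.
// Falls back to the input URL if the request fails or there were no redirects.
function resolveFinalUrl(string $url, int $maxRedirects = 5): string
{
    $context = stream_context_create([
        'http' => [
            'method'          => 'HEAD',
            'follow_location' => 1,
            'max_redirects'   => $maxRedirects,
            'timeout'         => 10,
        ],
    ]);

    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return $url;
    }

    $final = $url;
    foreach ($headers as $header) {
        // Headers from every hop are returned in order; keep the last Location.
        if (stripos($header, 'Location:') === 0) {
            $final = trim(substr($header, strlen('Location:')));
        }
    }
    return $final; // may be a relative URL if the server sent one
}

parse_str('id=123&url=https://www.example.com/article', $params);
var_dump(firstParam($params));
```

resolveFinalUrl() is left uncalled here because it needs network access; you would pass it whatever firstParam() returns.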
