Extract URL parameter value from Google News API response

I’m working with Google News API responses and getting really long URLs with lots of query parameters. I only need the actual destination URL, which is the value of the &url= parameter.

What’s the best way to parse the string and grab only the value that follows &url=?

Here’s what I tried so far:

$response_url = 'news.google.com/articles?id=xyz123&source=search&url=https://example-news.com/article/breaking-news-story';
$url_parts = parse_url($response_url);
parse_str($url_parts['query'], $params);
echo $params['url'];

This approach should extract the target URL from the url parameter. Does this look right or is there a better method?

Been dealing with Google News URLs for months and your PHP solution works for most cases. Here’s what I learned the hard way - Google sometimes wraps URLs in extra redirects or uses different parameter names like ‘dest’ or ‘target’ depending on the source. I’d add a fallback check for these alternate parameters. Also, some Google News URLs don’t have the url parameter at all and just redirect through their system. You’ll need to follow the redirect with curl to get the final destination. Your code handles the standard case well, but add error handling for missing parameters. Trust me, it’ll save you debugging time later.
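A rough sketch of the two fallbacks described above, in case it helps. The helper names `pick_param` and `resolve_redirect` are made up for illustration, and the parameter names 'dest' and 'target' are taken from this reply rather than from any Google documentation, so treat them as assumptions. The curl part only helps if the link is a plain HTTP redirect.

```php
<?php
// Hypothetical helper: return the first non-empty string found under any of
// the candidate keys. 'dest' and 'target' are unverified alternate names.
function pick_param(array $params, array $keys = ['url', 'dest', 'target']): ?string
{
    foreach ($keys as $key) {
        if (isset($params[$key]) && is_string($params[$key]) && $params[$key] !== '') {
            return $params[$key];
        }
    }
    return null; // none of the candidate parameters is present
}

// Hypothetical helper: follow HTTP redirects and return the final URL,
// or null if the request failed. Requires the curl extension.
function resolve_redirect(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true, // follow Location headers
        CURLOPT_MAXREDIRS      => 5,    // avoid redirect loops
        CURLOPT_NOBODY         => true, // HEAD-style request, no body download
        CURLOPT_RETURNTRANSFER => true, // do not echo the response
        CURLOPT_TIMEOUT        => 10,
    ]);
    $ok    = curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    return ($ok !== false && is_string($final) && $final !== '') ? $final : null;
}
```

One way to combine them: call `pick_param($params)` first, and only fall back to `resolve_redirect($response_url)` when it returns null, since the network request is much slower than parsing the string.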

Looks good, but heads up on nested encoding. Google sometimes encodes the destination more than once, so you might need to run urldecode() two or three times to get the clean URL. Also, I’ve seen the url param come first instead of last. That would break a regex that looks for &url=, but parse_str() doesn’t care about the order.
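If repeated decoding does turn out to be necessary, something like the following might work. It is only a sketch: `decode_until_url` is a made-up name, the cap of three rounds is arbitrary, and it uses rawurldecode() rather than urldecode() so that a literal + in the destination is not turned into a space. It stops as soon as the value starts with http:// or https://, which keeps any escapes that belong to the destination itself.

```php
<?php
// Hypothetical helper: decode a value until it looks like an absolute
// http(s) URL, giving up after $maxRounds passes.
function decode_until_url(string $value, int $maxRounds = 3): string
{
    for ($i = 0; $i < $maxRounds && !preg_match('#^https?://#i', $value); $i++) {
        $value = rawurldecode($value); // strip one layer of percent-encoding
    }
    return $value;
}
```

A value that is already a clean URL is returned unchanged, so it should be safe to call this on whatever parse_str() hands back.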

Your parse_url() and parse_str() approach is solid - that’s the standard way to do this in PHP. I’ve used the same method for Google News URL extraction for two years and it’s reliable. One note on decoding: parse_str() already URL-decodes each value once, so you only need to run the value through urldecode() again if Google has encoded the destination more than once. Also check that the ‘url’ parameter exists in the array before accessing it, or you’ll get undefined index notices (an "Undefined array key" warning in PHP 8). You could use regex instead, but that’s messier and harder to maintain.
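For what it’s worth, here is a minimal sketch of the same approach with the missing-parameter check added. The function name `extract_destination` is just an illustration, and it assumes the destination always arrives under ‘url’.

```php
<?php
// Hypothetical helper: return the value of the 'url' query parameter,
// or null when the URL has no query string or no usable 'url' parameter.
function extract_destination(string $responseUrl): ?string
{
    $query = parse_url($responseUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or the URL could not be parsed
    }

    parse_str($query, $params); // values are URL-decoded once here

    $value = $params['url'] ?? null; // ?? avoids the undefined key warning
    return (is_string($value) && $value !== '') ? $value : null;
}

$response_url = 'https://news.google.com/articles?id=xyz123&source=search&url=https://example-news.com/article/breaking-news-story';
$destination  = extract_destination($response_url);
echo $destination ?? 'no url parameter found', PHP_EOL;
```

Returning null instead of echoing directly lets the caller decide what to do when the parameter is missing, for example logging the raw URL for later inspection.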
