Extract URL parameter value from Google News API response

I’m working with Google News API and getting these long redirect URLs that contain the actual article link as a parameter. The URLs look something like this:

http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNFhnbiaA4JmbJdwKBzJOxs2B49LfQ&clid=c3a7d30bb8a4878e06b80cf16b898331&ei=eBZlU8CyBIWc1QajIA&url=http://example-news-site.com/article/123

I need to extract just the part that comes after &url= which is the actual article URL. What’s the best way to parse this and get only the target URL from the parameter?

$redirect_url = 'http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNFhnbiaA4JmbJdwKBzJOxs2B49LfQ&clid=c3a7d30bb8a4878e06b80cf16b898331&url=http://example-news-site.com/article/123';

// Need to extract: http://example-news-site.com/article/123
$target_url = extract_target_url($redirect_url);
echo $target_url;

Any suggestions on how to implement the extract_target_url() function would be helpful.

Been dealing with similar URL parsing headaches for years. Quick PHP fix using parse_url() and parse_str():

function extract_target_url($redirect_url) {
    $parsed = parse_url($redirect_url);
    if (!isset($parsed['query'])) {
        return null; // no query string at all
    }
    parse_str($parsed['query'], $params);
    return isset($params['url']) ? $params['url'] : null;
}
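For a quick sanity check, here's that function run against the URL from the question (repeated in full so the snippet runs on its own):

```php
function extract_target_url($redirect_url) {
    $parsed = parse_url($redirect_url);
    if (!isset($parsed['query'])) {
        return null; // no query string at all
    }
    parse_str($parsed['query'], $params);
    return isset($params['url']) ? $params['url'] : null;
}

$redirect_url = 'http://news.google.com/news/url?sa=t&fd=R&ct2=us'
    . '&usg=AFQjCNFhnbiaA4JmbJdwKBzJOxs2B49LfQ'
    . '&clid=c3a7d30bb8a4878e06b80cf16b898331'
    . '&url=http://example-news-site.com/article/123';

echo extract_target_url($redirect_url); // http://example-news-site.com/article/123
```

No urldecode() needed here - parse_str() already decodes the values for you.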

But if you’re processing Google News feeds regularly, you’ll want this automated. I built a workflow that pulls from Google News API, extracts real URLs, then checks if articles are live, categorizes them, or pushes to other systems.

Automation handles edge cases too - encoded URL parameters, validation before using extracted URLs, all that stuff.
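Even without an automation platform, the validation part is only a few lines of plain PHP. A minimal sketch using filter_var() - the helper name is made up for illustration:

```php
// Minimal sketch: decode the extracted parameter, then validate it
// before fetching or storing. Helper name is hypothetical.
function is_usable_article_url($raw) {
    $decoded = urldecode($raw);
    // Reject anything that isn't a syntactically valid absolute URL.
    if (filter_var($decoded, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Only accept http/https targets.
    $scheme = strtolower((string) parse_url($decoded, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}
```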

I use Latenode for these data processing pipelines. Set it up to auto-fetch from Google News, parse redirect URLs, clean them up, and send results wherever needed. Way cleaner than running PHP scripts manually.

Had this exact issue when building a news aggregator. The parse_url solutions work fine, but Google sometimes puts the url parameter first instead of last - learned that the hard way.

Simple string method that handles both cases:

function extract_target_url($redirect_url) {
    $url_pos = strpos($redirect_url, '&url=');
    if ($url_pos === false) {
        $url_pos = strpos($redirect_url, '?url=');
        if ($url_pos === false) return null;
        $url_pos += 5; // length of '?url='
    } else {
        $url_pos += 5; // length of '&url='
    }
    
    $end_pos = strpos($redirect_url, '&', $url_pos);
    $target = ($end_pos !== false) ? 
        substr($redirect_url, $url_pos, $end_pos - $url_pos) : 
        substr($redirect_url, $url_pos);
    
    return urldecode($target);
}

Grabs everything between the url parameter and the next ampersand. No regex overhead and handles Google’s parameter ordering quirks. Saved me hours of debugging when they changed their URL structure last year.

Test with a few different Google News URLs first - their format shifts depending on the news source.

Regex works great for this - much simpler than parse_url functions. I use this when batch processing Google News URLs:

function extract_target_url($redirect_url) {
    preg_match('/[&?]url=([^&]+)/', $redirect_url, $matches);
    return isset($matches[1]) ? urldecode($matches[1]) : null;
}

The regex finds &url= or ?url= and grabs everything until the next ampersand. Works with all Google News URL formats I’ve seen. Don’t forget urldecode - the target URLs are usually URL-encoded. Handles hundreds of URLs without breaking a sweat.
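For batch runs, mapping that helper over a list is all you need. A self-contained sketch (the feed URLs below are made up; the function is repeated so it runs standalone):

```php
function extract_target_url($redirect_url) {
    preg_match('/[&?]url=([^&]+)/', $redirect_url, $matches);
    return isset($matches[1]) ? urldecode($matches[1]) : null;
}

$redirect_urls = [
    'http://news.google.com/news/url?sa=t&url=http://site-a.example/story/1',
    'http://news.google.com/news/url?url=http%3A%2F%2Fsite-b.example%2Fstory%2F2&sa=t',
    'http://news.google.com/news/other?sa=t', // no url param at all
];

// Extract every target, then drop the nulls from failed matches.
$targets = array_filter(array_map('extract_target_url', $redirect_urls));
print_r($targets);
```

Note the second URL: the url parameter comes first and is percent-encoded, and both cases come out clean.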

substring approach is way easier. find where &url= starts, then grab everything after it until you hit another & or the string ends. $start = strpos($url, '&url=') + 5; $end = strpos($url, '&', $start); return urldecode($end === false ? substr($url, $start) : substr($url, $start, $end - $start)); does the job without the parsing overhead. just guard the $end === false case - if the url param is last, strpos returns false and $end - $start goes negative, so substr hands you back an empty string

yeah, parse_str is great for this! just get the query part of the URL, then extract the ‘url’ param like this: parse_str((string) parse_url($redirect_url, PHP_URL_QUERY), $params); return isset($params['url']) ? $params['url'] : null;. the isset guard matters - not every URL actually has a url param. been using it for ages, works like a charm!

You’ll hit a wall scaling this up. Processing hundreds or thousands of Google News URLs manually? That gets messy fast.

PHP works for single URLs, but what about batch processing feeds? Error handling for broken URLs? Validating extracted URLs before you use them?

I’ve automated this entire process. Instead of writing custom PHP functions, I built a workflow that handles Google News API calls, parses redirect URLs automatically, validates extracted URLs, and routes them based on content type or source.

The automation handles Google’s URL format changes too. When they switched parameter ordering last time, my manual scripts broke. Now the workflow adapts on its own.

You can build the same thing with visual automation. Pull from Google News API, extract URL parameters with built-in parsing, validate results, then send clean URLs to your database or content system. No more manual PHP scripting.

Latenode makes these data processing pipelines simple to set up and maintain.

Hit this same problem scraping news feeds last month. The parse_url approach works, but definitely add error handling - Google’s URLs aren’t always consistent.

function extract_target_url($redirect_url) {
    $query_string = parse_url($redirect_url, PHP_URL_QUERY);
    if (!$query_string) return false;
    
    parse_str($query_string, $params);
    // parse_str() already URL-decodes the values, so no extra urldecode() here.
    return isset($params['url']) ? $params['url'] : false;
}

One gotcha: parse_str() already URL-decodes the values, so calling urldecode() on top of it can mangle targets that contain their own encoded query strings - only decode manually when you grab the raw substring yourself. Google also switches up their parameter structure sometimes, so check for both ‘url’ and ‘q’ parameters as backups. I’ve seen the parameter name change based on news source or region.
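If you want that fallback in code, here's a variant that tries 'url' first, then 'q'. The 'q' name comes from observed URLs, not any documented Google behavior, so treat it as a guess:

```php
// Variant of the parse_str approach with a fallback parameter name.
// The 'q' fallback is based on observation, not documented behavior.
function extract_target_url($redirect_url) {
    $query_string = parse_url($redirect_url, PHP_URL_QUERY);
    if (!$query_string) {
        return false;
    }

    parse_str($query_string, $params);
    // parse_str() already URL-decodes values, so return them as-is.
    foreach (['url', 'q'] as $name) {
        if (isset($params[$name])) {
            return $params[$name];
        }
    }
    return false;
}
```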