How to validate HTTP status codes for website URLs stored in database records

I’m working with a large database containing approximately 24,000 website records. Many of these URLs have formatting issues like missing slashes or extra spaces that cause them to fail. I need to identify which URLs are broken so I can fix them manually.

My current approach
I’ve been using a fetch-based solution to test each URL and capture the response status:

const config = input.config();
const websiteUrl = config.websiteUrl;
let responseCode;

try {
    const result = await fetch(websiteUrl);
    responseCode = result.status;
} catch (err) {
    responseCode = 'failed';
}

output.set('responseCode', responseCode);

Problems I’m facing

  1. My script doesn’t handle redirects properly and returns “failed” even when the URL works after following redirects
  2. I only get “200” for working URLs or “failed” for broken ones, but I can’t see the actual HTTP error codes like 404 or 500

How can I modify this to properly handle redirects and capture specific HTTP status codes for debugging purposes?

that’s strange - fetch should follow redirects by default. maybe try logging result.url to see if it’s changing. for status codes, you should see actual codes even for errors - only network failures throw an exception; a 404 just resolves with that status.

quick tip - check if your urls have trailing slashes or query params causing issues. some servers are picky about exact formatting and return errors even when the base url works fine. also watch out for mixed http/https - browsers handle this differently than fetch does.

Your fetch code looks correct for basic requests; the issue is how you’re interpreting promise rejections. The Fetch API only rejects on network errors, not on HTTP error responses like 404 or 500. Those requests resolve successfully, with the error code available in result.status. So adjust your error handling as follows:

const result = await fetch(websiteUrl);
responseCode = result.status; // This captures 200, 404, 500, etc.

As for redirects, the Fetch API follows them automatically (up to 20), so if you’re seeing “failed” results, it’s more likely CORS issues or genuine network failures than redirect problems. Tightening your error handling will help you distinguish network errors from valid HTTP error responses.
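Once you’re capturing the real status codes, it can also help to bucket them into readable labels so broken records are easier to triage later. A minimal sketch (describeStatus is a hypothetical helper, not part of any API):

```javascript
// Hypothetical helper: turn a numeric status into a label for the
// responseCode field, so filtering 24k records is easier.
function describeStatus(status) {
    if (status >= 200 && status < 300) return `ok (${status})`;
    if (status >= 300 && status < 400) return `redirect (${status})`;
    if (status >= 400 && status < 500) return `client error (${status})`;
    if (status >= 500 && status < 600) return `server error (${status})`;
    return `unexpected (${status})`;
}
```

You’d then store `describeStatus(result.status)` instead of the raw number, keeping the ‘failed’/‘timeout’ strings only for actual exceptions.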

Been dealing with URL validation at scale for years and hit this exact problem. Your fetch logic works, but you’re missing a few things.

First, add a timeout. Without it, broken URLs hang forever:

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 5000);

try {
    const result = await fetch(websiteUrl, { signal: controller.signal });
    clearTimeout(timeoutId);
    responseCode = result.status;
} catch (err) {
    clearTimeout(timeoutId);
    if (err.name === 'AbortError') {
        responseCode = 'timeout';
    } else {
        responseCode = 'network_error';
    }
}

Second - with 24k records, you probably have protocol issues. Always normalize first:

let normalizedUrl = websiteUrl.trim();
if (!normalizedUrl.startsWith('http')) {
    normalizedUrl = 'https://' + normalizedUrl;
}
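Since the question mentions extra spaces as well as missing protocols, the normalization above could be wrapped into a reusable helper. A sketch under those assumptions (normalizeUrl is a hypothetical name):

```javascript
// Hypothetical helper: trim surrounding whitespace, strip internal
// spaces, and default to https when no protocol is present.
function normalizeUrl(rawUrl) {
    let url = rawUrl.trim().replace(/\s+/g, '');
    if (!/^https?:\/\//i.test(url)) {
        url = 'https://' + url;
    }
    return url;
}
```

Run every record through this before fetching, and log both the original and normalized URL so you can tell formatting failures apart from genuinely dead sites.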

Fetch follows redirects automatically. If you’re getting failures on redirects, it’s likely SSL certificate issues or mixed content problems.


One more tip - batch your requests with delays. Hitting 24k URLs rapid fire gets you rate limited or blocked by most sites.
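A minimal sketch of that batching idea (checkInBatches and the per-URL checkUrl callback are hypothetical names, not an Airtable or Fetch API):

```javascript
// Hypothetical sketch: run URL checks in small concurrent batches with a
// pause between batches, so 24k requests don't hammer servers all at once.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function checkInBatches(urls, checkUrl, batchSize = 10, delayMs = 1000) {
    const results = [];
    for (let i = 0; i < urls.length; i += batchSize) {
        const batch = urls.slice(i, i + batchSize);
        // One batch runs concurrently; then wait before starting the next.
        const batchResults = await Promise.all(batch.map(checkUrl));
        results.push(...batchResults);
        if (i + batchSize < urls.length) await sleep(delayMs);
    }
    return results;
}
```

With a batch size of 10 and a one-second pause, 24k URLs finish in well under an hour without looking like an attack to anyone’s rate limiter.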

I encountered similar issues when bulk validating URLs. If you’re running this fetch in a browser context, CORS restrictions complicate things: many sites block cross-origin requests, which makes the approach unreliable regardless of URL quality.

With 24,000 URLs, a server-side solution is advisable. I transitioned to a Node.js script with axios, which greatly improved timeout and error handling; Puppeteer is an option if you need the pages rendered, though it’s much slower. For accurate status codes, check both result.ok and result.status, since CDNs and proxies sometimes return 200 even when the content is unavailable. Implementing timeout handling is crucial too; hanging requests become a significant issue at this scale.