I’m working with an Airtable base that contains approximately 24,000 website URL records. Many of these URLs have formatting issues like missing slashes or extra spaces that cause broken links. I need to identify which URLs are problematic so I can fix them manually.
My current approach
I’ve been using a fetch-based script to test each URL and check its status:
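Simplified (the table and field names here are placeholders for my actual ones), it looks roughly like this:

```javascript
// simplified sketch of the current script - "Websites", "URL", and "Status"
// are placeholder names for my real table and fields
let table = base.getTable("Websites");
let query = await table.selectRecordsAsync({ fields: ["URL", "Status"] });

for (let record of query.records) {
    let url = record.getCellValueAsString("URL");
    try {
        let result = await fetch(url);
        // anything fetch resolves gets marked "200", anything thrown gets "failed"
        await table.updateRecordAsync(record, { "Status": result.ok ? "200" : "failed" });
    } catch (e) {
        await table.updateRecordAsync(record, { "Status": "failed" });
    }
}
```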
Two problems with this:

- Redirects aren’t being handled properly: the script returns “failed” even when the URL works after a redirect.
- I only get “200” for working URLs or “failed” for broken ones. I’d prefer to see the actual HTTP status codes (404, 500, etc.) to better understand what’s wrong.
Any suggestions on how to improve this approach would be really helpful!
try adding redirect: 'manual' to your fetch options so redirects stop getting swallowed - fetch hands you the 3xx response instead of following it. check the location header to see where it points. also remember to separate network errors from http errors, they surface differently: fetch only throws for network problems. for failure status codes, check whether result.ok is false instead of calling everything a fail.
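something like this (rough sketch - assumes a node-style fetch where manual redirects expose the 3xx status and location header; browser fetch gives you an opaque response instead):

```javascript
// sketch of the redirect: 'manual' approach
async function checkUrl(url) {
    try {
        const result = await fetch(url, { redirect: 'manual' });
        if (result.status >= 300 && result.status < 400) {
            // redirect: record where it points so the url can be fixed later
            return { status: result.status, location: result.headers.get('location') };
        }
        // plain http response: ok is true for 2xx, false for 404/500/etc.
        return { status: result.status, ok: result.ok };
    } catch (err) {
        // only genuine network failures (dns, refused connection) throw
        return { status: 'network-error', error: String(err) };
    }
}
```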
I’ve hit the same URL validation issues before. Your error handling needs work - don’t lump everything into ‘failed’. Check result.status properly since 404s and 500s actually tell you something useful. Only catch actual network failures like CORS or DNS problems in your try-catch block. With 24,000 records, you’ll want delays between requests or you’ll get rate limited. Many sites will block you for hammering them too fast.
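Something like this, as a sketch - the 250 ms delay is a starting guess, tune it for the sites you're hitting:

```javascript
// sequential checks with a fixed delay between requests
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function checkAll(urls) {
    const results = [];
    for (const url of urls) {
        try {
            const result = await fetch(url);
            // a 404 or 500 is still a successful fetch - record the real code
            results.push({ url, status: result.status });
        } catch (err) {
            // only dns/connection problems land here
            results.push({ url, status: 'network-error' });
        }
        await sleep(250); // ~4 req/s, so 24,000 urls take roughly 100 minutes
    }
    return results;
}
```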
also worth mentioning - set proper user-agent headers or sites will auto-block you. check result.redirected to see if the url redirected, then grab result.url for the final destination. super helpful when you’re fixing broken links later.
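rough sketch - the user-agent string here is made up, and some fetch environments won't let you override that header, so treat it as best-effort:

```javascript
// custom user-agent plus redirect tracking via result.redirected / result.url
async function checkWithHeaders(url) {
    const result = await fetch(url, {
        headers: { 'User-Agent': 'Mozilla/5.0 (compatible; LinkChecker/1.0)' },
    });
    if (result.redirected) {
        // fetch followed the redirect chain; result.url is the final destination
        console.log(`${url} -> ${result.url} (status ${result.status})`);
    }
    return result.status;
}
```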
Your fetch is missing proper error handling - you’re treating network timeouts, DNS failures, and HTTP errors all the same as ‘failed’. Don’t catch everything the same way. When fetch succeeds but returns a non-200 status, you can still read result.status to see whether it’s a 404, 500, etc. Only mark true network failures as ‘failed’. Fetch follows 301/302 redirects automatically unless you override it. Add a timeout too - fetch has no built-in timeout option, so abort the request yourself - because some URLs will hang forever and slow down processing the whole dataset.
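A sketch of the timeout piece using AbortController (the 10-second cutoff is arbitrary):

```javascript
// abort the request if it hasn't finished within `ms` milliseconds
async function fetchWithTimeout(url, ms = 10000) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), ms);
    try {
        const result = await fetch(url, { signal: controller.signal });
        // http errors (404, 500, ...) still resolve - report the real status
        return { status: result.status, ok: result.ok };
    } catch (err) {
        // AbortError means the timeout fired; anything else is a network failure
        return { status: err.name === 'AbortError' ? 'timeout' : 'network-error' };
    } finally {
        clearTimeout(timer);
    }
}
```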