Hey everyone! I’m working with a huge Airtable base that has over 24,000 website records. Many of these URLs contain small mistakes, like missing slashes or extra spaces. We want to find these errors before we start fixing them by hand.
We tried using the fetch method to check each URL and get its status. Here’s a simple script we used:
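A minimal sketch of the kind of script described, assuming Node.js 18+ (where `fetch` is built in) and a hard-coded sample list standing in for the URLs pulled from Airtable:

```javascript
// Check a single URL and report its HTTP status, or 'failed' when the
// request could not be made at all (bad scheme, DNS error, timeout...).
async function checkUrl(url) {
  try {
    // HEAD avoids downloading the page body; note that a few servers
    // reject HEAD, in which case switching to GET is the safer choice.
    const response = await fetch(url, { method: 'HEAD' });
    return response.status;
  } catch (err) {
    // Malformed URLs and network-level errors all land here
    return 'failed';
  }
}

async function main() {
  // Hypothetical sample; in practice these come from the Airtable base
  const urls = ['https://example.com', 'htp://broken-scheme.example'];
  for (const url of urls) {
    console.log(url, await checkUrl(url));
  }
}

main();
```

The limitation the answers below address: this version only tells you "some status" or "failed", without distinguishing redirect chains, 4xx/5xx responses, or the reason a request failed.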
I’ve dealt with similar challenges when auditing large URL datasets. From my experience, using a more robust HTTP client library like axios can significantly improve your URL checking process because it handles redirects automatically and provides detailed error information.
Install axios with npm install axios, then update your function to follow redirects and set a timeout. Require axios at the top of your script and modify your checkUrl function so it returns the response status on success; on error, return error.response.status if error.response is defined, otherwise return ‘Network Error’.
This approach helped me reliably handle similar large-scale URL validations. Ensure you also implement rate limiting to prevent overwhelming the servers you query. Good luck with your project!
Having worked on similar projects, I’d suggest using the ‘node-fetch’ library instead of the built-in fetch. It’s more feature-rich and handles redirects out of the box. You can install it via npm.
This will follow redirects and return actual HTTP status codes, not just 200 or ‘failed’. For large datasets, consider implementing a queue system with rate limiting to avoid overwhelming the target servers or getting your IP blocked. Also, running this in batches might be more manageable than trying to process all 24k URLs at once.