How to check status codes for URLs in a large Airtable dataset?

Hey everyone! I’m working with a huge Airtable database that has over 24k website records. A lot of these URLs have small mistakes like missing slashes or extra spaces. We want to find these errors before we start fixing them by hand.

We tried using the fetch method to check each URL and get its status. Here’s a simple script we used:

async function checkUrl(url) {
  try {
    const result = await fetch(url);
    return result.status;
  } catch (err) {
    return 'failed';
  }
}

But we hit some snags:

  1. It doesn’t follow redirects, so it says ‘failed’ even if the URL works after redirecting.
  2. We only get ‘200’ for working URLs or ‘failed’ for errors. We’d love to see the actual error codes.

Any ideas on how to improve this? We’re stuck and could use some help. Thanks!

I’ve dealt with similar challenges when auditing large URL datasets. In my experience, a more robust HTTP client library like axios makes URL checking much easier, because it follows redirects automatically and gives you detailed error information.

Install axios with npm install axios, require it at the top of your script, and pass options for following redirects and setting a timeout. Then change your checkUrl function so that it returns the response status on success. On failure, it should return error.response.status if error.response is defined (the server replied, e.g. with a 404), and ‘Network Error’ otherwise.

This approach helped me reliably handle similar large-scale URL validations. Ensure you also implement rate limiting to prevent overwhelming the servers you query. Good luck with your project!
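One simple way to rate-limit, sketched under the assumption that checking URLs one at a time is acceptable: pad each iteration so a new request starts at most once per interval. checkAllRateLimited is a hypothetical helper name, and checkUrl is passed in so any of the checkers in this thread can be used. At the default 200 ms interval that is at most 5 requests per second, so 24,000 URLs take at least 80 minutes (24,000 / 5 = 4,800 s). This is a single global limit, not a per-host one.

```javascript
// Sketch only: `checkAllRateLimited` is hypothetical; `checkUrl` is whichever
// checker you use (axios, got or node-fetch version), passed in as a parameter.
const { setTimeout: sleep } = require('timers/promises'); // Node.js 15+

async function checkAllRateLimited(urls, checkUrl, minIntervalMs = 200) {
  const results = new Map(); // url -> status code or error label
  for (const url of urls) {
    const started = Date.now();
    results.set(url, await checkUrl(url));
    // Pad the gap so a new request starts at most once every minIntervalMs
    const elapsed = Date.now() - started;
    if (elapsed < minIntervalMs) {
      await sleep(minIntervalMs - elapsed);
    }
  }
  return results;
}
```

Usage: checkAllRateLimited(urls, checkUrl).then((results) => console.log(results));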

yo, been there done that! try using the got library, it’s a game changer. install it with npm and update ur function like this:

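const got = require('got'); // needs got v11 (npm install got@11); v12+ is ESM-only and can't be require()d

async function checkUrl(url) {
  try {
    const res = await got(url); // got follows redirects by default
    return res.statusCode;
  } catch (err) {
    // HTTP errors (4xx/5xx) carry a response; network errors don't
    return err.response?.statusCode || 'Error';
  }
}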

this’ll handle redirects and give u proper status codes. good luck mate!

Having worked on similar projects, I’d suggest using the ‘node-fetch’ library instead of the built-in fetch. It’s more feature-rich and handles redirects out of the box. You can install it via npm (npm install node-fetch@2, since version 3 is ESM-only and can’t be loaded with require).

Here’s an improved version of your function:

const fetch = require('node-fetch');

async function checkUrl(url) {
  try {
    const response = await fetch(url, { redirect: 'follow' });
    return response.status;
  } catch (err) {
    return err.code || 'Unknown error';
  }
}

This will follow redirects and return actual HTTP status codes, not just 200 or ‘failed’; for network failures it returns the system error code instead (e.g. ENOTFOUND for a domain that doesn’t resolve). For large datasets, consider implementing a queue system with rate limiting to avoid overwhelming the target servers or getting your IP blocked. Also, running this in batches might be more manageable than trying to process all 24k URLs at once.
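A minimal sketch of the batching idea, assuming a batch size of 20 (an arbitrary choice; with 24,000 URLs that is 1,200 batches). checkInBatches is a hypothetical helper name, and checkUrl can be any of the checkers in this thread. Each batch runs concurrently, and the next one starts only when it has finished, so at most batchSize requests are in flight at once.

```javascript
// Sketch only: `checkInBatches` is hypothetical. The checkers in this thread all
// catch their own errors, so Promise.all never rejects here.
async function checkInBatches(urls, checkUrl, batchSize = 20) {
  const results = []; // { url, status } in the same order as `urls`
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    // Check one batch concurrently; the next batch waits for this one to finish
    const statuses = await Promise.all(batch.map((url) => checkUrl(url)));
    batch.forEach((url, j) => results.push({ url, status: statuses[j] }));
    console.log(`Checked ${results.length} of ${urls.length}`);
  }
  return results;
}
```

If you plug in a checker that can reject, switch to Promise.allSettled so one failure doesn't abort the whole batch.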