Performance Issues with Large Dataset Lookup in n8n Self-Hosted Environment - Looking for Solutions

Hi there! I’m dealing with a frustrating performance issue in my n8n automation workflow and hoping someone here can point me in the right direction.

My Setup:

I’ve got a Google Spreadsheet containing 135,000+ business records. My n8n workflow processes job postings and needs to verify if each company name appears in this master list.

Current Challenge:

The Google Sheets integration completely fails when handling this data volume. When I try Get Rows:

  • It sometimes throws a “Maximum call stack size exceeded” error
  • Other times it just hangs indefinitely without returning any results

Solutions I’ve Attempted:

  1. Filtered queries on the “Company Name” column - still crashes from data size
  2. Exported data to JSON format using a Python script locally
  3. File import attempts in n8n:
    • File operations node - JSON parsing issues with binary data
    • HTTP Request node fetching the raw file from GitHub - works, but parsing is extremely slow and the data can’t be pinned in n8n because the file is 12MB+
  4. Manual data entry via Set node - browser crashes from memory overload
  5. Code node with workflow cache (this.getWorkflowStaticData) - doesn’t persist between executions
  6. Batch processing ideas - still blocked by initial data loading problems

What I Need:

A reliable method to:

  • Quickly verify company existence in this large dataset
  • Avoid re-processing all 135k records on each workflow execution
  • Stay within n8n memory constraints

Any suggestions for caching strategies, external databases, or alternative file hosting approaches? How do others handle large reference datasets in n8n?

Thanks for any help!

Had the same problem with large datasets. Skip n8n for heavy data work - I built a simple REST API with Node.js and used a Map object for O(1) lookups. Loads all records in under 2 seconds, and company checks take less than 10ms. Just throw it on a cheap VPS and have the API return true/false for company names. Hook it up to n8n with an HTTP Request node - way faster than messing with spreadsheets directly. Redis works too if you need more scale, but keeping everything in memory has been rock solid for me.
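
Stripped down, the lookup service is basically this - a minimal sketch assuming Express and a companies.json export of the sheet; the route name, file name, and port are just placeholders:

```js
// Load every company name into memory once at startup, then answer
// existence checks from a Map in O(1).
const express = require('express');

// companies.json is assumed to be a JSON array of company name strings
// exported from the spreadsheet.
const companies = require('./companies.json');

// Normalize names so "Acme Inc." and "acme inc." match.
const lookup = new Map(companies.map((name) => [name.trim().toLowerCase(), true]));

const app = express();

// GET /check?company=Acme%20Inc  ->  { "exists": true }
app.get('/check', (req, res) => {
  const name = String(req.query.company || '').trim().toLowerCase();
  res.json({ exists: lookup.has(name) });
});

app.listen(3000, () => console.log('Company lookup API listening on :3000'));
```

On the n8n side, a single HTTP Request node hitting /check with the company name as a query parameter gives you a clean true/false to branch on with an IF node.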

PostgreSQL with a lookup table fixed this for me. I imported the CSV once, indexed the company_name column, and now queries are instant. The n8n Postgres node works great - just use a simple EXISTS query to check if companies exist. Grab a free PostgreSQL instance on Railway or Supabase if you don’t want to manage your own. Takes 30 minutes to set up but kills all those memory issues. You’ll also get proper data persistence between runs instead of relying on n8n’s sketchy internal caching.
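
The check itself is a couple of lines of SQL. Here’s a sketch using the pg client, mostly to show the query - the same EXISTS statement goes straight into the Postgres node’s query field (table and column names here are just examples):

```js
// Lookup against a Postgres table of company names.
// Assumed one-time setup:
//   CREATE TABLE companies (company_name TEXT);
//   CREATE INDEX idx_company_name ON companies (company_name);
const { Pool } = require('pg');

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function companyExists(name) {
  // Parameterized EXISTS query - indexed, so it stays fast at 135k rows.
  const { rows } = await pool.query(
    'SELECT EXISTS (SELECT 1 FROM companies WHERE company_name = $1) AS found',
    [name]
  );
  return rows[0].found;
}

companyExists('Acme Inc.')
  .then((found) => console.log(found)) // true or false
  .finally(() => pool.end());
```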

Split your dataset into chunks and use SQLite. I handle similar volumes by importing the 135k records in batches of ~5k rows into a local SQLite file. n8n’s SQLite node queries it super fast and you won’t get memory crashes. Just upload the .db file somewhere accessible and download it once per workflow run - way more reliable than parsing massive JSON files every time.
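
For reference, building and querying the .db file with better-sqlite3 looks roughly like this (a sketch; file, table, and column names are just what I’d pick):

```js
// Build an indexed SQLite lookup file once, then query it cheaply afterwards.
const Database = require('better-sqlite3');

const db = new Database('companies.db');

// One-time setup: table plus an index on the column we search by.
db.exec(`
  CREATE TABLE IF NOT EXISTS companies (company_name TEXT);
  CREATE INDEX IF NOT EXISTS idx_company_name ON companies (company_name);
`);

// companies.json is assumed to be a JSON array of company name strings;
// wrapping the inserts in one transaction keeps the import fast.
const companies = require('./companies.json');
const insert = db.prepare('INSERT INTO companies (company_name) VALUES (?)');
const insertAll = db.transaction((names) => {
  for (const name of names) insert.run(name.trim());
});
insertAll(companies);

// The per-run lookup: the index keeps this near-instant even at 135k rows.
const exists =
  db.prepare('SELECT 1 FROM companies WHERE company_name = ? LIMIT 1')
    .get('Acme Inc.') !== undefined;
console.log(exists);
```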