Adding custom metadata fields to LangChain documents in TypeScript

I’m working with LangChain documents in TypeScript and need to add extra fields to their metadata. When I use text splitters like RecursiveCharacterTextSplitter, I get documents but want to include additional metadata properties.

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const docs = await textSplitter.createDocuments([content]);

Each document has this format:

{
  "pageContent": "some text content here",
  "metadata": {
    "filename": "document.txt",
    "fileType": "text/plain",
    "fileSize": 8456,
    "createdAt": 1688123456789
  }
}

What’s the proper way to add custom fields to the metadata object? I need to include things like source URL, category, and processing timestamp.

You can modify metadata after creating the documents, too, which is handy when you need conditional logic or different values per document:

docs.forEach(doc => {
  doc.metadata.sourceUrl = 'your-url';
  doc.metadata.category = 'your-category';
  doc.metadata.processedAt = Date.now();
});

I use this when the metadata depends on each chunk's content, or when I'm processing multiple sources with different properties. LangChain's Document interface is flexible here: metadata is a plain object, so you can add whatever properties you need.
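For instance, conditional enrichment driven by each chunk's content. A standalone sketch where plain objects stand in for Document instances, and the field names (severity, processedAt) are just illustrative:

```typescript
interface DocLike {
  pageContent: string;
  metadata: Record<string, any>;
}

// Plain objects standing in for splitter output.
const docs: DocLike[] = [
  { pageContent: "ERROR: disk is full", metadata: { filename: "app.log" } },
  { pageContent: "startup complete", metadata: { filename: "app.log" } },
];

// Tag each chunk based on what it actually contains.
docs.forEach((doc) => {
  doc.metadata.severity = doc.pageContent.includes("ERROR") ? "error" : "info";
  doc.metadata.processedAt = Date.now();
});
// docs[0].metadata.severity === "error", docs[1].metadata.severity === "info"
```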

You can also extend the Document class directly if you’ve got consistent metadata patterns. I hit the same problem when processing docs from different sources with varying metadata needs.

import { Document } from "@langchain/core/documents";

class EnrichedDocument extends Document {
  constructor(
    content: string,
    metadata: Record<string, any>,
    customFields: Record<string, any>
  ) {
    super({
      pageContent: content,
      metadata: {
        ...metadata,
        ...customFields,
        // slice instead of the deprecated String.prototype.substr
        documentId: `doc_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`,
        version: '1.0'
      }
    });
  }
}

You get type safety and guaranteed required fields. Really useful when you need computed metadata or want to enforce schemas across different document processors. Bit more setup work, but it stops metadata inconsistencies from breaking your vector store queries down the road.
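If you want to try the pattern outside a LangChain project, here's a self-contained sketch. The minimal Document stand-in below replaces the real import from @langchain/core/documents so it runs without the package, and the field values are just illustrative:

```typescript
// Minimal stand-in for LangChain's Document class, so this sketch runs
// without the package. In a real project, use:
//   import { Document } from "@langchain/core/documents";
class Document {
  pageContent: string;
  metadata: Record<string, any>;
  constructor(fields: { pageContent: string; metadata: Record<string, any> }) {
    this.pageContent = fields.pageContent;
    this.metadata = fields.metadata;
  }
}

// Subclass that stamps required fields onto every document it creates.
class EnrichedDocument extends Document {
  constructor(
    content: string,
    metadata: Record<string, any>,
    customFields: Record<string, any>
  ) {
    super({
      pageContent: content,
      metadata: {
        ...metadata,
        ...customFields,
        documentId: `doc_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`,
        version: "1.0",
      },
    });
  }
}

const doc = new EnrichedDocument(
  "some chunk text",
  { filename: "document.txt" },
  { sourceUrl: "https://example.com", category: "tech" }
);
// doc.metadata now contains filename, sourceUrl, category, documentId, and version.
```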

You can also just pass metadata as the second parameter to createDocuments:

const docs = await textSplitter.createDocuments(
  [content],
  [{ sourceUrl: 'example.com', category: 'tech', processedAt: Date.now() }]
);

The splitter merges your custom fields into each chunk's metadata automatically; each object in the metadata array is applied to the chunks produced from the corresponding input text.

Both approaches work, but they get messy with multiple document types or complex metadata rules. I’ve been there - manual metadata handling becomes a nightmare to maintain.

What works better is automating the whole pipeline. Skip manually adding metadata in your code and set up workflows that handle different document types and apply metadata automatically.

You can create flows that detect document sources, extract properties, and apply metadata rules based on content patterns or file origins. No more remembering to add metadata fields every time you process documents.

Automation scales way better with multiple document sources, different processing rules, or changing metadata requirements. Just update the workflow instead of hunting down code everywhere.

I use Latenode for this - handles LangChain integration and lets you build flexible pipelines for metadata enrichment without the manual overhead.

I’ve found the cleanest way is using a custom metadata transformer function. Don’t handle metadata inline everywhere - wrap your document creation with proper metadata enrichment.

import { randomUUID } from "node:crypto"; // or the global crypto.randomUUID() on Node 19+
import { Document } from "@langchain/core/documents";

function enrichDocuments(docs: Document[], baseMetadata: Record<string, any>): Document[] {
  // Return real Document instances so downstream LangChain APIs still get the
  // class; a bare { ...doc } spread would produce plain objects and drop the prototype.
  return docs.map((doc) =>
    new Document({
      pageContent: doc.pageContent,
      metadata: {
        ...doc.metadata,
        ...baseMetadata,
        chunkId: randomUUID(),
        enrichedAt: Date.now(),
      },
    })
  );
}

const rawDocs = await textSplitter.createDocuments([content]);
const enrichedDocs = enrichDocuments(rawDocs, {
  sourceUrl: 'https://example.com',
  category: 'tech',
  priority: 'high'
});

This saved me hours of debugging with vector stores. You get consistent metadata structure and can easily add computed fields or validation.

The transformer approach makes testing much easier too - you can mock different metadata scenarios without messing with the text splitting logic.
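To make that concrete, here's a standalone sketch of that kind of test. Plain objects mock the splitter output, a loosely-typed version of the transformer works on them without any LangChain classes, and the assertions check that enrichment preserves the original fields (randomUUID comes from node:crypto, so no extra dependencies):

```typescript
import { randomUUID } from "node:crypto";

interface DocLike {
  pageContent: string;
  metadata: Record<string, any>;
}

// A version of the transformer typed against plain objects, so mocks
// don't need to be real Document instances.
function enrichDocuments(docs: DocLike[], baseMetadata: Record<string, any>): DocLike[] {
  return docs.map((doc) => ({
    ...doc,
    metadata: {
      ...doc.metadata,
      ...baseMetadata,
      chunkId: randomUUID(),
      enrichedAt: Date.now(),
    },
  }));
}

// Mock two chunks as a splitter might produce them.
const mocked: DocLike[] = [
  { pageContent: "chunk one", metadata: { filename: "a.txt" } },
  { pageContent: "chunk two", metadata: { filename: "a.txt" } },
];

const enriched = enrichDocuments(mocked, { category: "tech" });
// Each enriched doc keeps filename, gains category, and gets a unique chunkId.
```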