How to retrieve recordId from the text split skill in Azure AI Search?

CreatingStone · August 15, 2025, 5:01am

I’m just getting started with Azure AI Search and need help extracting the recordId attribute from my skillset. I want to track the position of each text chunk within the original document.

After the text gets split, the output structure looks something like this:

{'values': [{'recordId': '0', 'data': {'content': 'first chunk text'}}, {'recordId': '1', 'data': {'content': 'second chunk text'}}, {'recordId': '2', 'data': {'content': 'third chunk text'}}]}

I need to capture that recordId value as a field in my index. Here’s my current skillset configuration:

{
  "name": "document-processing-skillset",
  "description": "Split documents into chunks and create embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#1",
      "description": "Chunk documents using split skill",
      "context": "/document",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "chunks"
        }
      ],
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 1500,
      "pageOverlapLength": 300,
      "unit": "characters"
    }
  ],
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "my-document-index",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/chunks/*",
        "mappings": [
          {
            "name": "text_content",
            "source": "/document/chunks/*"
          },
          {
            "name": "document_title",
            "source": "/document/metadata_title"
          }
        ]
      }
    ]
  }
}

How can I add the recordId to my field mappings?

emmad · August 23, 2025, 3:17pm

You can’t map recordId directly from the split skill output. That JSON structure is only used for internal processing, so it isn’t available through normal field mappings.

I ran into this exact problem last year while building a document chunking solution. The recordId shows which array index each chunk has, but Azure AI Search doesn’t give you a way to read it directly. You’d probably need a custom skill to get the sequential position.

What I ended up doing was generating a unique ID for each chunk by combining the document key with a hash or timestamp. Just add a custom Web API skill that processes each chunk and assigns it a position-based ID.

If you only care about tracking chunk order within a document, you could also handle this in your application logic when querying results. The chunks stay in order anyway.
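Here is a minimal sketch of the custom Web API skill described above, written in Python. It assumes the skill runs with context `/document`, receives the split skill's `chunks` array and the document key as inputs named `chunks` and `doc_key`, and returns an output named `chunk_items`. All three names are hypothetical and must match whatever you declare in your skillset. Rather than a hash or timestamp, this version builds the ID from the document key and the zero-based position, which keeps IDs deterministic across re-indexing runs. The HTTP hosting layer (an Azure Function, Flask, etc.) is left out. `handle_skill_request` only illustrates the request/response shape of the custom skill interface.

```python
import json
from typing import Any


def build_chunk_items(doc_key: str, chunks: list[str]) -> list[dict[str, Any]]:
    """Attach a zero-based position and a position-based ID to each chunk of one document."""
    items = []
    for position, text in enumerate(chunks):
        items.append({
            "text": text,                         # chunk text, unchanged
            "chunk_index": position,              # 0, 1, 2, ... in the order the split skill emitted them
            "chunk_id": f"{doc_key}_{position}",  # e.g. "doc1_0"; deterministic per document
        })
    return items


def handle_skill_request(body: dict[str, Any]) -> dict[str, Any]:
    """Turn a custom-skill request body into a response body.

    Each input record is one document. Its recordId is echoed back unchanged
    so the indexer can match each response record to its request record.
    """
    results = []
    for record in body.get("values", []):
        record_id = record["recordId"]
        data = record.get("data", {})
        try:
            items = build_chunk_items(str(data["doc_key"]), list(data["chunks"]))
            results.append({
                "recordId": record_id,
                "data": {"chunk_items": items},
                "errors": [],
                "warnings": [],
            })
        except (KeyError, TypeError) as exc:
            # Report a per-record error instead of failing the whole batch
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": f"Missing or invalid input: {exc!r}"}],
                "warnings": [],
            })
    return {"values": results}


if __name__ == "__main__":
    sample_request = {
        "values": [
            {
                "recordId": "0",
                "data": {
                    "doc_key": "doc1",
                    "chunks": ["first chunk text", "second chunk text", "third chunk text"],
                },
            }
        ]
    }
    print(json.dumps(handle_skill_request(sample_request), indent=2))
```

To wire this up (again only a sketch, with hypothetical names), you would add a `#Microsoft.Skills.Custom.WebApiSkill` entry with your endpoint in `uri` and context `/document`. Its inputs would come from `/document/chunks` and from whichever field holds your document key, and its output would be `chunk_items`. In the index projection, `sourceContext` would then become `/document/chunk_items/*`, and `text_content` would map from `/document/chunk_items/*/text`. A new sortable integer field such as `chunk_index` would map from `/document/chunk_items/*/chunk_index`. Filtering on `parent_id` and sorting on that field should then return a document's chunks in their original order.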