I’m just getting started with Azure AI Search and need help extracting the recordId attribute from my skillset output. I want to track the position of each text chunk within the original document.
After the text gets split, the output structure looks something like this:
{'values': [{'recordId': '0', 'data': {'content': 'first chunk text'}}, {'recordId': '1', 'data': {'content': 'second chunk text'}}, {'recordId': '2', 'data': {'content': 'third chunk text'}}]}
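To make clear what I mean by "position", here is a minimal Python sketch. It is not part of my pipeline: the names `split_output` and `chunk_positions` are made up for illustration, the data is copied from the sample above, and it assumes recordId reflects the chunk order.

```python
# Sample split output, copied from the structure shown above.
split_output = {
    "values": [
        {"recordId": "0", "data": {"content": "first chunk text"}},
        {"recordId": "1", "data": {"content": "second chunk text"}},
        {"recordId": "2", "data": {"content": "third chunk text"}},
    ]
}


def chunk_positions(output):
    """Pair each chunk's recordId (as an int) with its text.

    Assumption: recordId is a numeric string that matches the chunk order.
    """
    return [
        (int(value["recordId"]), value["data"]["content"])
        for value in output["values"]
    ]


positions = chunk_positions(split_output)
for position, text in positions:
    print(position, text)
```

This (position, text) pairing is what I would like to end up with in the index, one document per chunk.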
I need to capture that recordId value as a field in my index. Here’s my current skillset configuration:
{
  "name": "document-processing-skillset",
  "description": "Split documents into chunks and create embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#1",
      "description": "Chunk documents using split skill",
      "context": "/document",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "chunks"
        }
      ],
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 1500,
      "pageOverlapLength": 300,
      "unit": "characters"
    }
  ],
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "my-document-index",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/chunks/*",
        "mappings": [
          {
            "name": "text_content",
            "source": "/document/chunks/*"
          },
          {
            "name": "document_title",
            "source": "/document/metadata_title"
          }
        ]
      }
    ]
  }
}
How can I add the recordId to my field mappings?