I’m working with a scanned PDF document that contains vehicle inspection forms spread across multiple pages. The document has various tables with checkboxes that can be either checked or unchecked.
I’ve successfully managed to retrieve the checkbox states using Azure Document Intelligence service by accessing the selection marks like this:
vehicle_form.pages[45].selection_marks
This approach works well for determining whether a checkbox is selected or not. But I’m having trouble figuring out how to get the text labels that correspond to each checkbox. I need to know what each checkbox represents, not just its checked status.
Has anyone dealt with this issue before? What’s the best way to associate checkbox values with their descriptive text using Azure’s document processing capabilities?
Had this exact problem with inspection forms last year. Azure Document Intelligence won’t automatically connect checkboxes to their labels - you’ve got to build spatial analysis to match them yourself. Here’s what worked: extract all selection marks and text lines from each page, then use their polygon coordinates to calculate how close each checkbox is to nearby text. Most forms put labels either right of the checkbox or above it within a reasonable distance. I built a function that loops through each checkbox’s coordinates and hunts for text within a set distance. Try starting with 50-100 pixels and tweak based on your form layout. Check the page’s unit first though: the service reports coordinates in pixels for images but in inches for PDFs, so for a PDF you’ll need to convert that threshold. Watch out for reading order too - the closest text isn’t always the right label if your form’s layout is irregular. Pro tip: scanned docs often have alignment issues, so build some tolerance into the coordinate matching or you’ll get unreliable results.
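A minimal sketch of that kind of proximity matching, not the exact function described above. It works on plain tuples so it runs without the service. The helper names (`box_from_polygon`, `match_labels`) and the `max_dist` and `tol` parameters are made up for illustration. It assumes each polygon is a flat list `[x0, y0, x1, y1, ...]` with y increasing downward; if your SDK version returns point objects instead, adapt `box_from_polygon`. It also assumes a label to the right should win over a label above, which may not hold for every form.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in page units (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def box_from_polygon(polygon):
    """Turn a flat [x0, y0, x1, y1, ...] polygon into its bounding box."""
    xs, ys = polygon[0::2], polygon[1::2]
    return Box(min(xs), min(ys), max(xs), max(ys))


def match_labels(marks, lines, max_dist, tol):
    """Pair each checkbox with the nearest text to its right or above it.

    marks: list of (state, polygon); lines: list of (text, polygon).
    max_dist and tol are in the same unit as the polygons.
    Returns a list of (label_or_None, state), one entry per mark.
    """
    results = []
    for state, mark_poly in marks:
        mbox = box_from_polygon(mark_poly)
        mx, my = mbox.center
        best = None
        for text, line_poly in lines:
            lbox = box_from_polygon(line_poly)
            _, ly = lbox.center
            # Text on the same row, starting at or after the checkbox's right edge.
            right_of = lbox.x0 >= mbox.x1 - tol and abs(ly - my) <= tol
            # Text ending above the checkbox and spanning its horizontal center.
            above = lbox.y1 <= mbox.y0 + tol and lbox.x0 - tol <= mx <= lbox.x1 + tol
            if not (right_of or above):
                continue
            gap = (lbox.x0 - mbox.x1) if right_of else (mbox.y0 - lbox.y1)
            gap = max(gap, 0.0)
            if gap > max_dist:
                continue
            # Assumption: right-hand labels are preferred, then the smaller gap.
            key = (0 if right_of else 1, gap)
            if best is None or key < best[0]:
                best = (key, text)
        results.append((best[1] if best else None, state))
    return results
```

To feed it real data you would build the tuples from a page, e.g. `(m.state, m.polygon)` for each entry in `page.selection_marks` and `(l.content, l.polygon)` for each entry in `page.lines`, and pick `max_dist` and `tol` in whatever unit the page reports.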
Try combining table extraction with selection marks. Azure Document Intelligence spots table structures in PDFs, and most inspection forms put checkboxes inside table cells anyway. When you extract tables, each cell’s content carries its text, and a checkbox in that cell shows up in the content as a :selected: or :unselected: token. I’ve had way better luck with this than trying to match coordinates - the service has already worked out how the document is laid out. Pull the tables from your pages first, then loop through the cells looking for those tokens. The rest of the cell text usually gives you the label right there; if the checkbox sits in a cell of its own, the label is typically in a neighboring cell of the same row. This handles alignment issues much better than measuring distances, especially with scanned docs where OCR shifts text around. Test it on a sample page to see if your forms work this way.
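Here is a rough sketch of the cell loop, again on plain tuples so it runs offline. It assumes checkboxes appear in cell content as `:selected:` / `:unselected:` tokens, and that a checkbox alone in its cell takes its label from the nearest text-only cell to its left in the same row. That fallback is a guess about the form layout, and `labels_from_cells` is a made-up name. Only the first checkbox in a cell is read.

```python
import re

# Tokens assumed to mark checkboxes inside cell content.
MARK = re.compile(r":(selected|unselected):")


def labels_from_cells(cells):
    """Extract (label_or_None, state) pairs from table cells.

    cells: iterable of (row_index, column_index, content) tuples.
    """
    rows = {}
    for row_index, column_index, content in cells:
        rows.setdefault(row_index, []).append((column_index, content))

    results = []
    for row_index in sorted(rows):
        row = sorted(rows[row_index])  # left-to-right by column index
        for i, (_, content) in enumerate(row):
            match = MARK.search(content)
            if not match:
                continue
            # Whatever remains after removing the tokens is the in-cell label.
            label = MARK.sub("", content).strip()
            if not label:
                # Assumed fallback: nearest text-only cell to the left.
                for _, other in reversed(row[:i]):
                    if other.strip() and not MARK.search(other):
                        label = other.strip()
                        break
            results.append((label or None, match.group(1)))
    return results
```

With real results you would build the tuples from each table, e.g. `(c.row_index, c.column_index, c.content)` for every cell in `table.cells`. Forms with several checkbox columns per row (say OK / Fail) would also need the column header to tell the boxes apart, which this sketch does not do.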