Challenges with Python-Docx multilevel numbered lists when converting to Google Docs

I am currently using python-docx to create documents that feature multilevel numbered lists. While single-level lists work without issues, I face significant challenges when I try to incorporate nested lists.

The main problem arises with the nested items, where their numbering appears broken. For some entries, the sub-items have no numbers, while for others, they show incorrect values like simply “1” instead of the expected hierarchical format.

This situation worsens after I upload the document to Google Docs. Even if the Word file appears correct, the sub-items lose their numbering after the upload.

To troubleshoot, I created a similar document directly in Microsoft Word, and the upload to Google Docs went smoothly, indicating that the issue lies within how I’m generating the lists using python-docx.

Here’s a code snippet illustrating my approach and the complications I encounter:

from docx import Document
from docx.shared import Inches
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

INDENT_SIZE = 0.4
MAX_DEPTH = 6.0
current_item = None

def create_numbered_item(document, content=None, previous=None, depth=None):
    def find_next_abstract_id(numbering_part):
        abstract_elements = numbering_part.findall(qn('w:abstractNum'))
        used_ids = [int(elem.get(qn('w:abstractNumId'))) for elem in abstract_elements]
        return max(used_ids) + 1 if used_ids else 0

    def find_next_num_id(numbering_part):
        num_elements = numbering_part.findall(qn('w:num'))
        used_ids = [int(elem.get(qn('w:numId'))) for elem in num_elements]
        return max(used_ids) + 1 if used_ids else 0

    def build_abstract_numbering(numbering_part, current_depth):
        new_abstract_id = find_next_abstract_id(numbering_part)
        abstract_element = OxmlElement('w:abstractNum')
        abstract_element.set(qn('w:abstractNumId'), str(new_abstract_id))

        level_element = OxmlElement('w:lvl')
        level_element.set(qn('w:ilvl'), str(current_depth))

        start_element = OxmlElement('w:start')
        start_element.set(qn('w:val'), '1')
        level_element.append(start_element)

        format_element = OxmlElement('w:numFmt')
        format_element.set(qn('w:val'), 'decimal')
        level_element.append(format_element)

        text_element = OxmlElement('w:lvlText')
        text_element.set(qn('w:val'), '%1.')
        level_element.append(text_element)

        justify_element = OxmlElement('w:lvlJc')
        justify_element.set(qn('w:val'), 'left')
        level_element.append(justify_element)

        abstract_element.append(level_element)
        numbering_part.append(abstract_element)
        return new_abstract_id

    def build_numbering_instance(numbering_part, abstract_id):
        new_num_id = find_next_num_id(numbering_part)
        num_element = OxmlElement('w:num')
        num_element.set(qn('w:numId'), str(new_num_id))

        abstract_ref = OxmlElement('w:abstractNumId')
        abstract_ref.set(qn('w:val'), str(abstract_id))
        num_element.append(abstract_ref)

        numbering_part.append(num_element)
        return new_num_id

    numbering_def = document.part.numbering_part.numbering_definitions._numbering

    if previous is None or previous._p.pPr is None:
        current_depth = 0 if depth is None else depth
        abstract_id = build_abstract_numbering(numbering_def, current_depth)
        numbering_id = build_numbering_instance(numbering_def, abstract_id)
    else:
        current_depth = previous._p.pPr.numPr.ilvl.val if depth is None else depth
        numbering_id = previous._p.pPr.numPr.numId.val

    content._p.get_or_add_pPr().get_or_add_numPr().get_or_add_numId().val = numbering_id
    content._p.get_or_add_pPr().get_or_add_numPr().get_or_add_ilvl().val = current_depth

def insert_list_entry(document, text_content, level_depth, entry_style):
    global current_item
    new_paragraph = document.add_paragraph(text_content, style=entry_style)
    new_paragraph.paragraph_format.left_indent = Inches(min(level_depth * INDENT_SIZE, MAX_DEPTH))
    new_paragraph.paragraph_format.line_spacing = 1

    if entry_style == 'List Number':
        create_numbered_item(document=document, content=new_paragraph, previous=current_item, depth=level_depth)
        current_item = new_paragraph

doc = Document()
doc.add_heading('Main Section')
insert_list_entry(document=doc, text_content='Primary Entry', level_depth=0, entry_style='List Number')
insert_list_entry(document=doc, text_content='Secondary Entry', level_depth=0, entry_style='List Number')

current_item = None
doc.add_heading('Another Section')
insert_list_entry(document=doc, text_content='Main Point', level_depth=0, entry_style='List Number')
insert_list_entry(document=doc, text_content='Sub Point A', level_depth=1, entry_style='List Number')
insert_list_entry(document=doc, text_content='Sub Point B', level_depth=1, entry_style='List Number')
insert_list_entry(document=doc, text_content='Another Main Point', level_depth=0, entry_style='List Number')

doc.save('nested_lists_output.docx')

It seems I might be overlooking an important aspect of the numbering structure in Word. After uploading to Google Docs, the nested items entirely lose their numbering. Does anyone have insights or solutions for this issue?

Your build_abstract_numbering function is the problem - you’re only creating one level per abstractNum, but multilevel lists need all hierarchy levels defined upfront in a single abstractNum. I hit the exact same issue and had to restructure the function to generate complete level definitions from 0 to max depth. Google Docs is way stricter about this than Word. Create an abstractNum with all six levels defined from the start, each with proper lvlText patterns like ‘%1.’, ‘%1.%2.’, ‘%1.%2.%3.’ etc. This fixed my conversion issues completely, though I had to refactor the numbering logic to reference existing abstractNum IDs instead of generating new ones for each depth change.
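To make the cascading patterns concrete, here is a minimal sketch that generates the lvlText value for each level. The helper name lvl_text_patterns is made up for illustration and is not part of python-docx.

```python
def lvl_text_patterns(level_count):
    """Return the w:lvlText value for each level, outermost first."""
    # Level i (zero-based) shows the counters of levels 0..i, each followed by a dot.
    return ["".join(f"%{n}." for n in range(1, i + 2)) for i in range(level_count)]


for ilvl, pattern in enumerate(lvl_text_patterns(3)):
    print(ilvl, pattern)
# 0 %1.
# 1 %1.%2.
# 2 %1.%2.%3.
```

Each string would go into the w:val attribute of the w:lvlText element for the matching w:lvl.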

Your numbering instance reuse is breaking everything. Every time you call create_numbered_item, you’re either making brand new numbering definitions or reusing old ones randomly. Google Docs hates messy numbering hierarchies, and that’s exactly what you’re creating.

I had the same conversion failures until I figured out the numbering was completely broken. Stop creating new abstractNum elements on the fly. Define your entire multilevel structure once when you initialize the document. Keep one numbering instance for the whole document and just change the ilvl property when you need different depths.

That previous parameter logic? It’s creating fragmented numbering chains. Word deals with it, but Google Docs won’t convert it. Set up one solid numbering definition at the start with all the levels you need, then stick with it. Don’t generate new instances halfway through.

Your problem is with the current_item variable handling across sections. When you reset current_item = None before the second section, you’re creating a brand new numbering definition instead of continuing the multilevel structure.

I’ve hit this same issue before. You need consistent numbering IDs throughout the entire document. Don’t reset current_item - keep the numbering context but restart the counter for new sections. Google Docs conversion makes this worse because it expects coherent numbering relationships.

Here’s what works: use one numbering definition for the whole document and control restart behavior through numPr properties. Don’t create new abstractNum elements. This keeps the multilevel structure intact so Google Docs can handle it properly during upload.
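As far as I know, numPr on a paragraph only selects a numId and an ilvl, so a restart is normally expressed on the w:num side: a second w:num that points at the same abstractNum and carries a w:lvlOverride with a w:startOverride. The sketch below builds that element with the standard library's ElementTree so it runs on its own; with python-docx you would build the same structure with OxmlElement and qn. The function name and the example IDs are assumptions.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Expand a local name such as 'num' to its namespaced form."""
    return f"{{{W_NS}}}{tag}"


def make_restarting_num(num_id, abstract_id, restart_level=0, start_at=1):
    """Build a w:num that reuses abstract_id but restarts one level's counter."""
    num = ET.Element(w("num"), {w("numId"): str(num_id)})
    ET.SubElement(num, w("abstractNumId"), {w("val"): str(abstract_id)})
    override = ET.SubElement(num, w("lvlOverride"), {w("ilvl"): str(restart_level)})
    ET.SubElement(override, w("startOverride"), {w("val"): str(start_at)})
    return num


# Example IDs only; real ones must not clash with IDs already in numbering.xml.
print(ET.tostring(make_restarting_num(num_id=11, abstract_id=10), encoding="unicode"))
```

Paragraphs in the new section would then use the new numId, while the level definitions stay in the one shared abstractNum.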

The Problem:

You’re encountering broken multilevel numbered lists when generating docx files using python-docx, particularly after uploading to Google Docs. The nested list items are either unnumbered or display incorrect numbering, indicating a problem with how the list structure is defined within the docx file itself. The issue stems from inconsistencies in how the numbering is defined and managed within your code, leading to a structure that Google Docs struggles to interpret correctly.

Understanding the “Why” (The Root Cause):

The root cause is the way your code builds the numbering structure in the docx file. Each time current_item is None, create_numbered_item adds a new abstractNum that defines only one level (the level of that first item) plus a new num pointing at it. Later items reuse that numId, so a sub-item with ilvl 1 references a level the abstractNum never defines. Word may tolerate this, but Google Docs appears to be stricter about the integrity of the numbering hierarchy: it expects a single, well-defined multilevel structure where every level in use is declared inside one abstractNum element. An item that points at an undefined level has no number format or lvlText to render, which matches the missing or wrong numbers you are seeing.

Step-by-Step Guide:

  1. Refactor Numbering Logic: The primary change is to restructure your numbering creation logic. Instead of generating a new single-level abstractNum whenever a list starts, create one abstractNum element up front that contains every level you need (up to your MAX_DEPTH). This establishes a complete, well-defined numbering scheme that Google Docs can interpret.

  2. Define Complete Numbering Hierarchy: Modify the build_abstract_numbering function to create a single abstractNum with all levels defined at once, correctly configuring the lvlText property for each level. These need to be cascading patterns like ‘%1.’, ‘%1.%2.’, ‘%1.%2.%3.’, and so on, reflecting the hierarchical structure of your list.

  3. Reuse a Single Numbering Instance: Avoid creating new numbering instances (numId) within the document. Assign one unique numId at the beginning of the document, using that numId for every list item by only adjusting the ilvl (list level) property. This eliminates the fragmented numbering issues that Google Docs struggles to handle.

  4. Consistent Numbering Across Sections: Do not build a new abstractNum when a new section starts, which is what resetting current_item to None triggers in the original code. Keep every list on the same abstractNum. Note that with a single shared numId the count carries on from the previous section; if a section’s list must restart at 1, give it its own num that references the same abstractNum and overrides the start value, rather than a new abstractNum.

  5. Remove previous Parameter Redundancy: The logic based on the previous parameter within create_numbered_item is redundant when you’re using a single and consistent numId. This parameter is causing further inconsistencies. Remove the code related to determining the numbering information based on the previous parameter.

Revised Code (Illustrative Example):

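The following is a conceptual example showing how to build a complete abstractNum at the start. You will need to adapt it to your existing code structure; in particular, find_next_abstract_id and build_numbering_instance must be moved to module level, since the original nests them inside create_numbered_item. The create_numbered_item function becomes far simpler because all levels and the numId are already defined.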

from docx import Document
from docx.shared import Inches
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

# ... (other code) ...

def build_abstract_numbering(numbering_part, max_depth):
    # Build one abstractNum that defines every level, plus one num that points at it.
    new_abstract_id = find_next_abstract_id(numbering_part)  # next unused abstractNum ID
    abstract_element = OxmlElement('w:abstractNum')
    abstract_element.set(qn('w:abstractNumId'), str(new_abstract_id))

    # int() because MAX_DEPTH is defined as a float (6.0) and range() needs an integer.
    for i in range(int(max_depth) + 1):
        level_element = OxmlElement('w:lvl')
        level_element.set(qn('w:ilvl'), str(i))

        # Cascading pattern: '%1.' for level 0, '%1.%2.' for level 1, and so on.
        pattern = ''.join('%{}.'.format(j + 1) for j in range(i + 1))

        # Same settings as the original, appended in schema order:
        # start, numFmt, lvlText, lvlJc.
        for tag, value in (('w:start', '1'), ('w:numFmt', 'decimal'),
                           ('w:lvlText', pattern), ('w:lvlJc', 'left')):
            child = OxmlElement(tag)
            child.set(qn('w:val'), value)
            level_element.append(child)

        abstract_element.append(level_element)
    numbering_part.append(abstract_element)
    return new_abstract_id, build_numbering_instance(numbering_part, new_abstract_id)


def create_numbered_item(document, content, depth, numbering_id):
    content._p.get_or_add_pPr().get_or_add_numPr().get_or_add_numId().val = numbering_id
    content._p.get_or_add_pPr().get_or_add_numPr().get_or_add_ilvl().val = depth


# Initialize numbering once
numbering_def = doc.part.numbering_part.numbering_definitions._numbering
abstract_id, numbering_id = build_abstract_numbering(numbering_def, MAX_DEPTH)


# ... (rest of your code, using create_numbered_item with depth and numbering_id) ...
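
One detail the snippet above leaves open is where the new element lands inside w:numbering. The WordprocessingML schema lists w:abstractNum children before w:num children, and append() puts the new abstractNum after any w:num already in the template. Whether a given consumer tolerates that order is something I have not verified; the sketch below simply keeps schema order. It uses ElementTree so it runs standalone, on the assumption that the lxml elements python-docx exposes accept the same iteration and insert() calls. insert_abstract_num is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NUM_TAG = f"{{{W_NS}}}num"
ABSTRACT_TAG = f"{{{W_NS}}}abstractNum"


def insert_abstract_num(numbering, abstract_num):
    """Insert abstract_num ahead of the first w:num child, or append if none."""
    for index, child in enumerate(numbering):
        if child.tag == NUM_TAG:
            numbering.insert(index, abstract_num)
            return
    numbering.append(abstract_num)


# Tiny stand-in for numbering.xml: one existing abstractNum and one num.
numbering = ET.Element(f"{{{W_NS}}}numbering")
ET.SubElement(numbering, ABSTRACT_TAG)
ET.SubElement(numbering, NUM_TAG)
insert_abstract_num(numbering, ET.Element(ABSTRACT_TAG))
print([child.tag.split("}")[1] for child in numbering])
# ['abstractNum', 'abstractNum', 'num']
```

The new w:num can still be appended at the end, since w:num elements come after all abstractNum elements.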

Common Pitfalls & What to Check Next:

  • Incorrect lvlText Patterns: Double-check that your lvlText patterns in build_abstract_numbering correctly represent the nested numbering scheme (e.g., ‘%1.’, ‘%1.%2.’, ‘%1.%2.%3.’, etc.). Incorrect patterns will lead to broken numbering.

  • XML Validation: A .docx file is a ZIP archive, so you can unzip it (for example with Python’s zipfile module) and inspect word/numbering.xml and word/document.xml directly. Check that every numId and ilvl used by a paragraph resolves to a num, an abstractNum, and a defined lvl. For a stricter check, validate the parts against the WordprocessingML schema from ECMA-376; the Open Packaging Conventions only cover the package layout, not the numbering elements.

  • Testing with Different Depths: Thoroughly test your code by creating nested lists with various levels of nesting to identify any remaining numbering inconsistencies.
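
The inspection described above can be partly automated. This is a rough sketch using only the standard library: it lists every (numId, ilvl) pair that a paragraph uses but that numbering.xml does not define. The function name is made up, and the check is deliberately narrow: it ignores numbering inherited from styles, w:lvlOverride definitions, and style-linked abstractNums.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def undefined_list_levels(docx_file):
    """Return sorted (numId, ilvl) pairs used by paragraphs but not defined."""
    with zipfile.ZipFile(docx_file) as archive:
        numbering = ET.fromstring(archive.read("word/numbering.xml"))
        body = ET.fromstring(archive.read("word/document.xml"))

    # abstractNumId -> set of ilvl values that have a w:lvl definition
    levels = {
        a.get(W + "abstractNumId"): {lvl.get(W + "ilvl") for lvl in a.findall(W + "lvl")}
        for a in numbering.findall(W + "abstractNum")
    }
    # numId -> abstractNumId it points at
    num_to_abstract = {
        n.get(W + "numId"): n.find(W + "abstractNumId").get(W + "val")
        for n in numbering.findall(W + "num")
    }

    missing = set()
    for num_pr in body.iter(W + "numPr"):
        num_id, ilvl = num_pr.find(W + "numId"), num_pr.find(W + "ilvl")
        # numId 0 means "no list", so there is nothing to resolve.
        if num_id is None or num_id.get(W + "val") == "0":
            continue
        level = "0" if ilvl is None else ilvl.get(W + "val")
        defined = levels.get(num_to_abstract.get(num_id.get(W + "val")), set())
        if level not in defined:
            missing.add((num_id.get(W + "val"), level))
    return sorted(missing)
```

Calling undefined_list_levels('nested_lists_output.docx') on the file from the question should flag the sub-items, since their numId points at an abstractNum with only one level; an empty list means every referenced level is defined.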


your lvlText patterns are messed up - you’re hardcoding ‘%1.’ for every level, which completely breaks the hierarchy. google docs needs proper cascading patterns: ‘%1.’ for level 0, ‘%1.%2.’ for level 1, and so on. plus your restart logic is backwards. nested items shouldn’t restart numbering unless you’re starting a new parent section.
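
To see what the cascading patterns and the restart rule should produce, here is a small pure-Python model of the counters: a new item at one level resets every deeper counter, and each label fills in its level's pattern. It only models the expected output and assumes no level is skipped; it is not how Word or Google Docs is implemented, and render_labels is a name made up for this sketch.

```python
def render_labels(levels, level_count=9):
    """Return the label each item should show, given its zero-based level."""
    counters = [0] * level_count
    labels = []
    for ilvl in levels:
        counters[ilvl] += 1
        # A new item at this level restarts every deeper level.
        for deeper in range(ilvl + 1, level_count):
            counters[deeper] = 0
        labels.append("".join(f"{counters[i]}." for i in range(ilvl + 1)))
    return labels


# Levels of the second list in the question:
# Main Point, Sub Point A, Sub Point B, Another Main Point.
print(render_labels([0, 1, 1, 0]))  # ['1.', '1.1.', '1.2.', '2.']
```

If the generated document shows anything other than these labels for that list, the level definitions or the ilvl values are the first things to check.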

Been there with multilevel list nightmares. Your XML approach is overkill and breaks easily.

Skip wrestling with numbering XML - automate the whole document creation instead. I’ve watched teams waste weeks debugging python-docx numbering when they could’ve fixed it in hours.

Build automation that creates your template directly in Google Docs via their API. Set up multilevel numbering styles once, then just populate content. No conversion mess, no broken numbering after upload.

For complex workflows like this, I use Latenode. Connect Google Docs API with your data sources, apply consistent formatting, generate documents with perfect numbering every time. Way cleaner than fighting Word XML.

Simple flow: data goes in, properly formatted Google Doc comes out. No intermediate Word files breaking your numbering.

I’ve hit this before. You’re making separate abstractNum elements for each depth level, but Google Docs wants all levels in one abstractNum element. Build a single abstractNum with multiple lvl elements covering all your depths instead of creating new ones each time.
