I’ve been exploring the Google Docs API and noticed that while the sidebar shows a clean tree based on headings, the API returns the document content as a flat list. I am looking for a way to convert this flat format into a hierarchical structure that mirrors the sidebar view. This change would really simplify processing the document structure programmatically.
Does anyone know if there’s a built-in feature for this, or has someone devised a workaround? Below is a conceptual example of what I have in mind:
    def organize_doc_items(item_list):
        hierarchy = {}
        current_section = hierarchy
        for element in item_list:
            if element['role'] == 'header':
                # Open a new section and descend into it.
                current_section[element['label']] = {}
                current_section = current_section[element['label']]
            else:
                # Attach body text to whichever section is currently open.
                current_section.setdefault('details', []).append(element['content'])
        return hierarchy
Any suggestions or alternate approaches would be greatly appreciated. Thanks in advance!
I’ve tackled a similar challenge with the Google Docs API. While there’s no built-in feature for this, your approach is on the right track. One improvement I’d suggest is to maintain a stack of open sections, one per heading level. As written, your version only ever descends, so every header ends up nested under the previous one. With a stack, you can pop back to the right ancestor whenever you encounter a higher-level header.
Here’s a modified version I’ve used:
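    def organize_doc_items(item_list):
        hierarchy = {}
        # stack[i] is the section currently open at heading level i (the root is level 0).
        stack = [hierarchy]
        for element in item_list:
            if element['role'] == 'header':
                # Assumes levels start at 1 and never skip (H1 -> H2 -> H3).
                level = element['level']
                # Close every section at this level or deeper.
                while len(stack) > level:
                    stack.pop()
                new_section = {}
                stack[-1][element['label']] = new_section
                stack.append(new_section)
            else:
                # Body text belongs to the innermost open section.
                stack[-1].setdefault('content', []).append(element['content'])
        return hierarchy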
This method has worked well for me in production. It handles nested headers more robustly and maintains the document’s structure accurately.
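In case it helps, here is a rough sketch of how the flat item_list could be produced from a documents.get response. It assumes the response is already loaded as a dict and that headings use the built-in HEADING_1 to HEADING_6 named styles. The function name flatten_doc_body and the 'text' role are made up for illustration; the 'role', 'level', 'label' and 'content' keys just follow the snippets in this thread.

```python
def flatten_doc_body(document):
    """Turn a Docs API document dict into a flat list of header/text items."""
    items = []
    for element in document.get('body', {}).get('content', []):
        paragraph = element.get('paragraph')
        if paragraph is None:
            continue  # skip tables, section breaks and other non-paragraph elements
        # A paragraph's text is split across one or more text runs.
        text = ''.join(
            run.get('textRun', {}).get('content', '')
            for run in paragraph.get('elements', [])
        ).strip()
        if not text:
            continue  # ignore empty paragraphs
        style = paragraph.get('paragraphStyle', {}).get('namedStyleType', '')
        if style.startswith('HEADING_'):
            level = int(style[len('HEADING_'):])
            items.append({'role': 'header', 'level': level, 'label': text})
        else:
            # 'text' is a hypothetical role name for anything that is not a heading.
            items.append({'role': 'text', 'content': text})
    return items
```

The result can be passed straight to organize_doc_items. Titles, subtitles and custom styles all fall into the non-header branch here, so adjust that check if your documents rely on them.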
hey there! i’ve messed with the docs API too. one thing that helped me was using a recursive function to build the hierarchy. it’s pretty neat cuz it handles nested headers automatically. you could try something like this:
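    def build_hierarchy(items, level=1):
        result = {}  # note: consumes `items` in place, so pass a copy
        while items:
            head = items[0]
            if 'level' not in head:  # body text for the current section
                result.setdefault('content', []).append(items.pop(0)['content'])
            elif head['level'] < level:  # shallower header: hand back to an ancestor
                break
            else:  # header at this level or deeper: recurse for its subtree
                items.pop(0)
                result[head['label']] = build_hierarchy(items, head['level'] + 1)
        return result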
I’ve dealt with this exact issue before, and it can be a real headache. One approach that worked well for me was using a stack-based solution, similar to what SwiftCoder42 suggested, but with a twist. I found it helpful to give every item an explicit 'depth' value and use it to decide how far to unwind the stack. This made it easier to handle cases where headers might skip levels (e.g., going from H1 to H3).
Here’s a rough example of what I ended up using:
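    def build_hierarchy(items):
        root = {'children': []}
        # Stack of (depth, node) pairs; the root sits at depth 0.
        stack = [(0, root)]
        for item in items:
            # Pop until the top of the stack is strictly shallower than this item,
            # so siblings stay siblings even when a level was skipped (H1 -> H3 -> H3).
            while stack[-1][0] >= item['depth']:
                stack.pop()
            node = {'content': item['content'], 'children': []}
            stack[-1][1]['children'].append(node)
            stack.append((item['depth'], node))
        return root['children']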
This approach gave me more flexibility when dealing with complex document structures. It’s not perfect, but it’s served me well in several projects. You might need to tweak it based on your specific use case, but it should give you a solid starting point.
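If you want to eyeball the result against the sidebar, a small helper along these lines can render the nested nodes as an indented outline. This is only a sketch: format_outline is a hypothetical name, and it assumes the {'content', 'children'} node shape used above.

```python
def format_outline(nodes, indent=0):
    """Return the tree as a list of indented lines, one line per node."""
    lines = []
    for node in nodes:
        lines.append('  ' * indent + node['content'])
        # Children are rendered one indentation step deeper than their parent.
        lines.extend(format_outline(node['children'], indent + 1))
    return lines


# Hand-written sample tree in the same shape build_hierarchy returns.
tree = [
    {'content': 'Introduction', 'children': [
        {'content': 'Background', 'children': []},
    ]},
    {'content': 'Methods', 'children': []},
]
print('\n'.join(format_outline(tree)))
```

Returning a list of lines rather than printing inside the recursion keeps the helper easy to test and reuse.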