How to identify page breaks in Google Docs using Apps Script

What I’m trying to do:

I need to split a multi-page Google document into separate documents, one for each page. My plan was to find page breaks and use them as splitting points.

My approach:

I’m trying to parse through the document body to locate PAGE_BREAK elements. When I find one, I want to create a range, copy that section, and make a new Google Doc from it.

The problem:

I made a test document with just one line on page 1 and one line on page 2. When I run my script to parse the document, I can’t find any PAGE_BREAK elements. I thought when text flows to the next page, there would be a PAGE_BREAK element, but it’s not showing up.

Here’s my code:

var currentDoc = DocumentApp.getActiveDocument();
var docBody = currentDoc.getBody();

function clearDocument() {
  docBody.clear(); 
  // Only works when I manually add page breaks with script
  // docBody.appendParagraph("First page content");
  // docBody.appendPageBreak();
  // docBody.appendParagraph("Second page content");
}

function startParsing() {
  const structure = parseElements(docBody);
  Logger.log(structure);
}

function parseElements(elem) {
  const nodeData = {
    element: elem,
  };
  
  // Only container elements (body, paragraphs, tables, ...) have children.
  if (elem.getNumChildren) {
    var totalChildren = elem.getNumChildren();
    var childNodes = [];

    for (var j = 0; j < totalChildren; j++) {
      var currentChild = elem.getChild(j);
      // Test the child itself, not its parent.
      var breakFound = searchForBreak(currentChild);
      if (breakFound) {
        Logger.log("Located page break at position " + j);
      }
      var childData = parseElements(currentChild);
      Logger.log(currentChild.getType());
      childNodes.push(childData);
    }

    nodeData["children"] = childNodes;
  }

  return nodeData;
}

function searchForBreak(elem) {
  // A PAGE_BREAK is a leaf element inside a paragraph, so compare the
  // element's own type instead of searching the whole body on every call.
  var isBreak = elem.getType() === DocumentApp.ElementType.PAGE_BREAK;
  if (isBreak) {
    Logger.log("Page break detected");
  }
  return isBreak;
}

The issue is that natural page breaks (where text flows to the next page) don’t seem to create PAGE_BREAK elements. Only manual page breaks added through the script are detectable.
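
For the manual breaks that are detectable, the splitting step could look roughly like the sketch below. The function names groupByManualBreaks and writePagesToDocs are made up for illustration, and the sketch assumes each break sits in a paragraph of its own (as it does after appendPageBreak()), so that paragraph is simply dropped. It does nothing for natural page breaks.

```javascript
// Sketch: group a Body's top-level children into pages, cutting at manual breaks.
// `body` is expected to behave like DocumentApp's Body; `pageBreakType` is
// expected to be DocumentApp.ElementType.PAGE_BREAK.
function groupByManualBreaks(body, pageBreakType) {
  var pages = [[]];
  for (var i = 0; i < body.getNumChildren(); i++) {
    var child = body.getChild(i);
    // A page break lives inside a paragraph, so search each container child.
    var hasBreak = typeof child.findElement === 'function' &&
        child.findElement(pageBreakType) != null;
    if (hasBreak) {
      // Assumption: the paragraph holds nothing but the break, so it is dropped.
      pages.push([]);
    } else {
      pages[pages.length - 1].push(child);
    }
  }
  return pages;
}

// Apps Script only: copy each group into its own new document.
function writePagesToDocs(pages, baseName) {
  pages.forEach(function (elements, index) {
    var target = DocumentApp.create(baseName + ' - page ' + (index + 1)).getBody();
    elements.forEach(function (element) {
      var copy = element.copy(); // detached copy, safe to append elsewhere
      var type = copy.getType();
      if (type === DocumentApp.ElementType.PARAGRAPH) target.appendParagraph(copy);
      else if (type === DocumentApp.ElementType.LIST_ITEM) target.appendListItem(copy);
      else if (type === DocumentApp.ElementType.TABLE) target.appendTable(copy);
      // Other element types are skipped in this sketch.
    });
  });
}
```

Inside Apps Script this would be called as writePagesToDocs(groupByManualBreaks(docBody, DocumentApp.ElementType.PAGE_BREAK), currentDoc.getName()).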

Has anyone figured out how to handle this? I need to split documents that have natural page breaks, not just manual ones.

The Google Docs API doesn’t expose automatic page breaks, only manual ones. That’s why your script isn’t finding anything: a PAGE_BREAK element exists only where someone explicitly inserted a break. I ran into this exact problem building a document splitter. Google Docs treats automatic pagination as visual rendering, not as part of the document structure, and the API only shows you the logical structure, not how the document is laid out on the page.

Your best bet is converting to PDF first, then using a PDF library to find the real page boundaries. You could try estimating breaks by counting characters or paragraphs, but that’s pretty unreliable across different content types and formatting. Another option is exporting as HTML and parsing for page break indicators, but even that’s hit-or-miss since the HTML export doesn’t always keep the exact pagination.
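
As a first step toward the PDF route, here is a minimal sketch that exports the document and counts its pages. It is a heuristic, not a PDF parser: it assumes the page objects appear uncompressed in the file as "/Type /Page", which may not hold for every export. The function names countPdfPages and countDocPages are hypothetical. It only tells you how many pages there are; finding which text falls on which page would still need a real PDF library.

```javascript
// Heuristic: count page objects in the raw text of a PDF.
// Assumes page dictionaries are stored uncompressed as "/Type /Page".
function countPdfPages(pdfText) {
  // \b stops the match from also counting the "/Type /Pages" tree node.
  var matches = pdfText.match(/\/Type\s*\/Page\b/g);
  return matches ? matches.length : 0;
}

// Apps Script only: export the document as a PDF and count its pages.
function countDocPages(docId) {
  var pdfBlob = DriveApp.getFileById(docId).getAs('application/pdf');
  return countPdfPages(pdfBlob.getDataAsString());
}
```

If the count comes back as 0 for a document that clearly has pages, the assumption about uncompressed page objects is probably wrong for that file.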

Yeah, this is a known limitation. Apps Script can’t see automatic page breaks in the document structure. I’ve worked around it by using Google Docs’ print preview API to grab page dimensions and calculate rough break points based on content height, but it’s messy and not totally accurate.
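
The height-based estimate can be sketched as below. Page dimensions are readable from the Body via getPageHeight(), getMarginTop() and getMarginBottom(), all in points. Rendered paragraph heights are not exposed, so the blockHeights array here is a hypothetical input you would have to estimate yourself (for example from font size and line count). The function names are made up for illustration, and the sketch ignores paragraphs that split across pages.

```javascript
// Usable vertical space per page, in points, from a Body-like object.
function usablePageHeight(body) {
  return body.getPageHeight() - body.getMarginTop() - body.getMarginBottom();
}

// Greedy estimate: walk the block heights and start a new page whenever the
// next block would overflow the usable height. Returns the index of the
// first block on each estimated page.
function estimatePageStarts(blockHeights, usableHeight) {
  var starts = blockHeights.length > 0 ? [0] : [];
  var used = 0;
  for (var i = 0; i < blockHeights.length; i++) {
    if (used > 0 && used + blockHeights[i] > usableHeight) {
      starts.push(i);
      used = 0;
    }
    used += blockHeights[i];
  }
  return starts;
}
```

Inside Apps Script, usablePageHeight(DocumentApp.getActiveDocument().getBody()) would supply the second argument. The result is only as good as the height estimates fed into it.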

You’re hitting one of the most annoying limitations of the Google Docs API. I spent weeks banging my head against this same wall for a client project. The API doesn’t expose automatic page breaks because they’re calculated on the fly from layout, fonts, and margins when the document renders.

Here’s what actually worked for me: export the doc to Google Slides first, then use the Slides API to grab individual slides, since each slide basically represents a page boundary. It’s not perfect because the formatting changes, but it kept the content structure intact enough for splitting. The conversion keeps paragraph breaks and most text formatting, so recreating separate docs becomes way easier. This beats trying to count characters (which is super unreliable) and gives you real page-based divisions you can actually work with programmatically.