How to extract data from all rows in a Notion database table using Selenium when only some rows are visible?

I’m working on extracting data from a Notion database table using Selenium, but I’m running into issues with rows that aren’t initially visible on the page.

My current workflow is:

  1. Load the Notion site
  2. Navigate through each table row
  3. Hover over the row to reveal the “Open” button
  4. Click to open the page in side view
  5. Extract the page content
  6. Move to the next row and repeat

The main issue is that only 26-28 rows are visible when the page loads, but my table has 47 total rows. Even after scrolling down, my script can’t detect more than 28 rows.

Here’s my function for processing individual cells:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def process_table_row(browser: webdriver.Chrome, row_index: int) -> str:
    """
    Processes a single table row and extracts its content.
    """
    
    print(f"Working on row {row_index}...")
    
    row_xpath = f"//*[@id='notion-app']/div/div[1]/div/div[1]/main/div/div/div[3]/div[2]/div/div/div/div[3]/div[2]/div[{row_index}]/div/div[1]/div/div[2]/div/div"
    print(f"Finding row {row_index}...")
    
    try:
        row_element = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.XPATH, row_xpath))
        )
        print(f"Row {row_index} found.")
    except Exception as error:
        print(f"Could not find row {row_index}: {error}")
        return ""
    
    # scroll container for rows beyond 16
    if row_index > 16:
        for attempt in range(8):
            try:
                scroll_container_down(browser, row_element, 50)
                print(f"Scrolled to row {row_index}.")
                break
            except Exception as error:
                print(f"Scrolling attempt {attempt + 1} failed: {error}")
    
    # hover over the row
    move_to_element(browser, row_element)
    
    # find and click the side peek button
    print(f"Looking for side peek button on row {row_index}...")
    
    try:
        peek_button = WebDriverWait(browser, 10).until(
            EC.element_to_be_clickable(
                (By.XPATH, "//div[@aria-label='Open in side peek']")
            )
        )
        print(f"Clicking side peek for row {row_index}...")
        peek_button.click()
    except Exception as error:
        print(f"Side peek button not found for row {row_index}: {error}")
        return ""
    
    time.sleep(3)
    
    # get the page content
    print(f"Getting content from row {row_index}...")
    try:
        page_content = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located(
                (By.CLASS_NAME, "notion-page-content")
            )
        )
        extracted_text = page_content.text
        print(f"Content extracted from row {row_index}.")
        return extracted_text
    except Exception as error:
        print(f"Failed to extract content from row {row_index}: {error}")
        return ""

And here’s how I count the total rows:

def count_table_rows(browser: webdriver.Chrome, table_selector: str) -> int:
    """
    Counts the total number of rows in the Notion table.
    """
    
    print("Counting table rows...")
    row_elements = browser.find_elements(By.XPATH, table_selector)
    row_count = len(row_elements)
    print(f"Found {row_count} rows in table")
    return row_count

I think the problem is that the remaining rows aren’t being detected in the first place. This works fine for small tables, but I need to handle tables with 400+ rows. Any suggestions on how to make Selenium detect all rows in a Notion table?

The problem is you’re counting rows before scrolling starts, so you only see the initially rendered rows. Notion virtualizes content - those other 19 rows don’t exist in the DOM yet.

I ditched the upfront count and used a while loop with dynamic detection instead: keep scrolling until you can’t find new rows after several attempts. I use a try-except that breaks when the next expected row XPath fails consistently.

Also, your XPath uses absolute positioning with div[{row_index}]. This breaks with lazy loading since row positions shift. Switch to relative selectors, or find rows by their data attributes instead of DOM position - Notion adds unique identifiers to table rows that stick around regardless of rendering order.

For big tables like your 400+ rows, add progress tracking by checking scroll position or counting processed rows rather than relying on the initial DOM state.
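The scroll-until-stale loop described above can be sketched as pure logic, independent of Selenium. This is a minimal sketch: `get_visible_ids` and `scroll_once` are hypothetical callables you would back with `find_elements` and a scroll script, and the existence of stable per-row data attributes is an assumption to verify against your own DOM.

```python
from typing import Callable, Iterable, Set


def collect_all_row_ids(
    get_visible_ids: Callable[[], Iterable[str]],  # IDs of rows currently in the DOM
    scroll_once: Callable[[], None],               # scrolls the table container one step
    max_stale_attempts: int = 5,
) -> Set[str]:
    """Scroll until no new row IDs appear for several consecutive attempts.

    Instead of counting rows upfront, accumulate IDs as rows are rendered
    and stop once scrolling stops producing anything new.
    """
    seen: Set[str] = set()
    stale = 0
    while stale < max_stale_attempts:
        new_ids = set(get_visible_ids()) - seen
        if new_ids:
            seen |= new_ids
            stale = 0   # progress made, reset the stale counter
        else:
            stale += 1  # nothing new after this scroll
        scroll_once()
    return seen
```

With Selenium you would plug in a lambda that reads the row identifiers from the rendered elements and another that runs a small `execute_script` scroll on the table container.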

Yeah, Notion’s infinite scroll is a pain. Had the same problem with huge databases. I stopped counting rows entirely and switched approaches. Instead of scrolling to each row by index, I scroll the whole table container bit by bit and grab rows as they show up. Track what you’ve already processed with a hash or ID from the row content. You don’t need the total count upfront and it handles any table size. Also, use CSS selectors instead of those long XPaths - they break way less when Notion updates their layout.
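The “track what you’ve already processed” idea above can be sketched with a content fingerprint. This is a minimal sketch; `row_fingerprint` and `filter_unprocessed` are hypothetical helper names, and if Notion exposes stable block IDs on the row elements, those are preferable to hashing text.

```python
import hashlib
from typing import List, Set


def row_fingerprint(row_text: str) -> str:
    """Derive a stable ID from a row's visible text, so a row that re-enters
    the rendered window after scrolling isn't processed twice."""
    return hashlib.sha256(row_text.strip().encode("utf-8")).hexdigest()


def filter_unprocessed(rows: List[str], processed: Set[str]) -> List[str]:
    """Return only rows whose fingerprint hasn't been seen yet,
    recording each new fingerprint in the processed set."""
    fresh = []
    for text in rows:
        fp = row_fingerprint(text)
        if fp not in processed:
            processed.add(fp)
            fresh.append(text)
    return fresh
```

After each scroll step, pass the currently rendered rows through `filter_unprocessed` and extract only what comes back; duplicates from overlapping scroll windows are dropped automatically.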

This is a common problem with Notion’s lazy loading - it only renders visible rows for performance, so you’ll never see the full dataset when counting upfront.

I switched to a scroll-and-detect approach instead. Made a loop that scrolls down bit by bit and checks for new rows after each scroll. You need to scroll slow enough for Notion to load content, but fast enough to stay practical.

I also changed my row detection to be more dynamic. Instead of using fixed row indices, I started collecting unique identifiers or data attributes while scrolling. This way I could track processed rows and avoid duplicates.

One more tip - add a check for when you hit the table bottom. Notion usually shows a loading indicator or the scroll position stops changing when there’s nothing left to load. Use that as your signal to stop extracting.
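The stop signal described above - the scroll position no longer changing - can be sketched like this. A minimal sketch: `get_scroll_top` and `scroll_down` are hypothetical callables you would implement with `execute_script` against the table container.

```python
from typing import Callable


def at_table_bottom(
    get_scroll_top: Callable[[], float],  # reads the container's scrollTop
    scroll_down: Callable[[], None],      # scrolls the container one step
    attempts: int = 3,
) -> bool:
    """Return True once several scroll attempts in a row fail to move
    the container, i.e. there is nothing left to load."""
    for _ in range(attempts):
        before = get_scroll_top()
        scroll_down()
        if get_scroll_top() != before:
            return False  # still moving, more content may be loading
    return True
```

Using a few attempts rather than one guards against a scroll that temporarily stalls while Notion is still fetching rows.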