Using Selenium to extract data from Notion table cells when rows are not fully loaded

I’m trying to build a data extraction tool for Notion tables but running into issues with row visibility. My script successfully processes the first 26-28 visible rows in a table but can’t access the remaining entries even though the table has 47 total rows.

My current workflow is:

  1. Load the Notion site
  2. Navigate through each table row
  3. Hover to reveal the access button
  4. Click to open the page content
  5. Extract the text data
  6. Move to the next row

The problem is that Notion only loads a limited number of rows initially. I’ve tried scrolling, but my row detection function still maxes out at 28 rows.

Here’s my data extraction function:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def process_table_row(browser: webdriver.Chrome, row_index: int) -> str:
    """
    Processes a single table row and extracts its content.
    """
    print(f"Working on row {row_index}...")
    
    row_xpath = f"//*[@id='notion-app']/div/div[1]/div/div[1]/main/div/div/div[3]/div[2]/div/div/div/div[3]/div[2]/div[{row_index}]/div/div[1]/div/div[2]/div/div"
    
    try:
        row_element = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.XPATH, row_xpath))
        )
        print(f"Found row {row_index}")
    except Exception as error:
        print(f"Could not find row {row_index}: {error}")
        return ""
    
    if row_index > 16:
        for attempt in range(8):
            try:
                scroll_table_view(browser, row_element, 50)
                print(f"Scrolled to row {row_index}")
                break
            except Exception as error:
                print(f"Scroll attempt {attempt + 1} failed: {error}")
    
    move_to_element(browser, row_element)
    
    try:
        # NOTE: this matches the FIRST 'Open in side peek' button anywhere in
        # the DOM; it relies on the hover above making that button this row's.
        view_button = WebDriverWait(browser, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//div[@aria-label='Open in side peek']"))
        )
        view_button.click()
        print(f"Opened content for row {row_index}")
    except Exception as error:
        print(f"Failed to open row {row_index}: {error}")
        return ""
    
    time.sleep(3)
    
    try:
        content_area = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "notion-page-content"))
        )
        extracted_text = content_area.text
        return extracted_text
    except Exception as error:
        print(f"Content extraction failed for row {row_index}: {error}")
        return ""

And my row counting function:

def count_table_rows(browser: webdriver.Chrome, table_path: str) -> int:
    """
    Counts the total number of rows in the table.
    """
    print("Counting table rows...")
    row_elements = browser.find_elements(By.XPATH, table_path)
    row_count = len(row_elements)
    print(f"Detected {row_count} rows")
    return row_count

The issue seems to be that rows beyond the viewport aren’t being detected initially. I need to handle this for a 400-row table, so manual processing isn’t feasible. Any suggestions for making Notion load all rows or detecting them dynamically?

Notion’s lazy loading is a pain - I skip counting rows and just use infinite scroll. Keep scrolling down, check for new elements, then process each batch. That XPath you’re using is way too specific though. It’ll break the moment Notion updates their UI. Try scrolling the last row you found into view - either ActionChains.scroll_to_element() (Selenium 4.2+) or execute_script() with JavaScript’s scrollIntoView(); there’s no scroll_into_view() method on WebElement in the Python bindings. That’ll automatically trigger more content to load.
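The scroll-and-recheck loop described above can be sketched like this. It is driver-agnostic (only `find_elements()` and `execute_script()` are used), and the `.notion-collection-item` selector plus the stall thresholds are assumptions - inspect your own page and adjust:

```python
import time

def load_all_rows(driver, row_selector, pause=1.0, max_stalls=3):
    """Scroll the last detected row into view until the row count stops growing.

    driver: a selenium webdriver instance (e.g. webdriver.Chrome).
    row_selector: CSS selector for one table row (an assumption; verify it).
    """
    seen = 0
    stalls = 0
    while stalls < max_stalls:
        # "css selector" is the literal value of selenium's By.CSS_SELECTOR
        rows = driver.find_elements("css selector", row_selector)
        if len(rows) > seen:
            seen = len(rows)
            stalls = 0          # progress was made, reset the stall counter
        else:
            stalls += 1         # no new rows appeared this pass
        if rows:
            # scrolling the last loaded row into view triggers lazy loading
            driver.execute_script(
                "arguments[0].scrollIntoView({block: 'end'});", rows[-1]
            )
        time.sleep(pause)       # give Notion time to render the next batch
    return seen
```

Stopping only after a few consecutive passes with no growth protects against one slow render being mistaken for the end of the table.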

I had the same problem scraping big Notion databases. Here’s what’s happening - Notion only renders the rows you can actually see to keep things fast. So when you count rows, you’re only getting what’s currently loaded, not the real total.

Ditch the upfront counting. Just keep scrolling until there’s no next row to find. I always add a small delay after each scroll - gives Notion time to load new rows before you try to grab them.

Try scrolling to the very bottom first, then work backwards. This forces more content to load. Also, 50 pixels seems way too small - bump up your scroll distance to actually trigger new rows.
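A minimal sketch of the scroll-to-bottom idea. Notion table views usually scroll inside their own container rather than the window, so this scrolls a container element; the `.notion-scroller` selector is an assumption - find the actual scrollable ancestor in devtools:

```python
import time

def scroll_container_to_bottom(driver, container_selector, passes=5, pause=1.0):
    """Repeatedly jump a scrollable container to its bottom to force loading."""
    for _ in range(passes):
        driver.execute_script(
            # scrollTop = scrollHeight jumps to the current bottom; newly
            # loaded rows grow scrollHeight, so repeat a few times
            "const el = document.querySelector(arguments[0]);"
            "el.scrollTop = el.scrollHeight;",
            container_selector,
        )
        time.sleep(pause)  # let the next batch of rows render
```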

Use a while loop that keeps going until no new rows show up instead of guessing the count ahead of time. Works way better when table sizes vary.
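That open-ended loop could look like the sketch below. `fetch_row`, `process`, and `trigger_more` are hypothetical callables standing in for the asker's row lookup (returning None when the row isn't there), the per-row extraction, and a scroll step:

```python
def process_rows(fetch_row, process, trigger_more, max_retries=3):
    """Walk rows by index until lookups keep failing even after scrolling."""
    results = []
    index = 1
    while True:
        row = None
        for _ in range(max_retries):
            row = fetch_row(index)
            if row is not None:
                break
            trigger_more()      # row not loaded yet: scroll and retry
        if row is None:
            break               # still missing after retries: end of table
        results.append(process(row))
        index += 1
    return results
```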

Notion’s virtual scrolling is your problem here. Don’t bother counting rows - just start at row 1 and keep going up until your element finder breaks. When you hit the visibility wall, scroll aggressively to load more content.

I had good luck with ActionChains sending multiple PAGE_DOWN keys straight to the table container. Works way better than pixel scrolling for forcing Notion to render new rows. After each scroll batch, wait for the DOM to settle before hunting for new rows. Your 3-second delay probably isn’t enough for big tables.
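A sketch of the keystroke-batch-then-settle pattern, written without ActionChains specifics: a focused element also accepts Page Down through `element.send_keys()`. Here `page_down` should be `Keys.PAGE_DOWN` from `selenium.webdriver.common.keys`, and `container` the table's scrollable element - both assumptions about your page:

```python
import time

def page_down_and_settle(driver, container, page_down, row_selector,
                         presses=10, timeout=10.0, poll=0.5):
    """Send a batch of Page Down keystrokes to the table container, then
    wait until the rendered row count stops changing (the DOM has settled)."""
    for _ in range(presses):
        container.send_keys(page_down)
    last = -1
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # "css selector" is the literal value of selenium's By.CSS_SELECTOR
        count = len(driver.find_elements("css selector", row_selector))
        if count == last:
            break               # two consecutive polls agree: settled
        last = count
        time.sleep(poll)
    return last
```

Polling until the count is stable replaces the fixed 3-second sleep, so slow batches get more time and fast ones don't waste it.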

Ditch the hardcoded XPath positions. Hunt for table row containers using data attributes or CSS classes that don’t change when the UI updates. This approach handled my 800+ row extractions just fine.
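For example, Notion blocks carry a `data-block-id` attribute, which tends to be far more stable than a positional XPath. Both selectors below are guesses about Notion's current markup - verify them in devtools before relying on them:

```python
def find_table_rows(driver, container_selector=".notion-table-view"):
    """Locate rows by a stable data attribute instead of positional XPath.

    container_selector and the [data-block-id] attribute are assumptions
    about Notion's markup; check them against your page first.
    """
    selector = f"{container_selector} [data-block-id]"
    # "css selector" is the literal value of selenium's By.CSS_SELECTOR
    return driver.find_elements("css selector", selector)
```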