I’m trying to build a data extraction tool for Notion tables but running into issues with row visibility. My script successfully processes the first 26-28 visible rows in a table but can’t access the remaining entries even though the table has 47 total rows.
My current workflow is:
- Load the Notion site
- Navigate through each table row
- Hover to reveal the access button
- Click to open the page content
- Extract the text data
- Move to the next row
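The steps above amount to a simple driver loop. A minimal sketch, assuming a hypothetical `extract_all_rows` wrapper (not part of the original script) that takes the per-row function as a callable so it can be exercised without a browser:

```python
from typing import Callable, Dict


def extract_all_rows(row_count: int, process_row: Callable[[int], str]) -> Dict[int, str]:
    """Run process_row for every row index and keep the non-empty results.

    `extract_all_rows` and `process_row` are illustrative names; in the real
    script process_row would wrap process_table_row(browser, row_index).
    """
    results: Dict[int, str] = {}
    # XPath positions are 1-based, so start at 1 rather than 0.
    for row_index in range(1, row_count + 1):
        text = process_row(row_index)
        if text:  # process_table_row returns "" on failure
            results[row_index] = text
    return results
```

With the functions from this post it would be called roughly as `extract_all_rows(count_table_rows(browser, table_path), lambda i: process_table_row(browser, i))`. The loop can only be as complete as the row count it is given, which is where the 28-row ceiling comes from.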
The problem is that Notion only loads a limited number of rows initially. I’ve tried scrolling, but my row detection function still maxes out at 28 rows.
Here’s my data extraction function:
    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait


    def process_table_row(browser: webdriver.Chrome, row_index: int) -> str:
        """
        Processes a single table row and extracts its content.
        """
        print(f"Working on row {row_index}...")
        # Positional XPath: div[{row_index}] selects the Nth row currently in the DOM.
        row_xpath = f"//*[@id='notion-app']/div/div[1]/div/div[1]/main/div/div/div[3]/div[2]/div/div/div/div[3]/div[2]/div[{row_index}]/div/div[1]/div/div[2]/div/div"

        try:
            row_element = WebDriverWait(browser, 10).until(
                EC.presence_of_element_located((By.XPATH, row_xpath))
            )
            print(f"Found row {row_index}")
        except Exception as error:
            print(f"Could not find row {row_index}: {error}")
            return ""

        # scroll_table_view and move_to_element are helpers defined elsewhere in my script.
        if row_index > 16:
            for attempt in range(8):
                try:
                    scroll_table_view(browser, row_element, 50)
                    print(f"Scrolled to row {row_index}")
                    break
                except Exception as error:
                    print(f"Scroll attempt {attempt + 1} failed: {error}")

        # Hover over the row so the "Open in side peek" button appears.
        move_to_element(browser, row_element)

        try:
            view_button = WebDriverWait(browser, 10).until(
                EC.element_to_be_clickable((By.XPATH, "//div[@aria-label='Open in side peek']"))
            )
            view_button.click()
            print(f"Opened content for row {row_index}")
        except Exception as error:
            print(f"Failed to open row {row_index}: {error}")
            return ""

        # Give the side peek panel time to render.
        time.sleep(3)

        try:
            content_area = WebDriverWait(browser, 10).until(
                EC.presence_of_element_located((By.CLASS_NAME, "notion-page-content"))
            )
            extracted_text = content_area.text
            return extracted_text
        except Exception as error:
            print(f"Content extraction failed for row {row_index}: {error}")
            return ""
And my row counting function:
    def count_table_rows(browser: webdriver.Chrome, table_path: str) -> int:
        """
        Counts the total number of rows in the table.
        """
        print("Counting table rows...")
        row_elements = browser.find_elements(By.XPATH, table_path)
        row_count = len(row_elements)
        print(f"Detected {row_count} rows")
        return row_count
The issue seems to be that rows beyond the viewport aren’t being detected initially. This also has to work on a 400-row table, so manual processing isn’t feasible. Any suggestions for making Notion load all rows or detecting them dynamically?
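One possible direction, as a minimal sketch rather than a tested Notion solution: if the table only keeps rows near the viewport in the DOM, a single `find_elements` count will never see all of them. The alternative is to scroll in small steps, record a stable identifier for every row seen along the way, and stop once several scrolls in a row reveal nothing new. The function name and both callables below are hypothetical; they are passed in so the loop itself does not depend on Selenium.

```python
from typing import Callable, Dict, Iterable, List


def collect_rows_while_scrolling(
    read_visible_ids: Callable[[], Iterable[str]],
    scroll_down: Callable[[], None],
    max_idle_rounds: int = 3,
    max_rounds: int = 500,
) -> List[str]:
    """Scroll step by step and accumulate every row identifier that appears.

    read_visible_ids: returns identifiers of the rows currently in the DOM
                      (hypothetical; e.g. built on browser.find_elements).
    scroll_down:      scrolls the table container by one step (hypothetical).
    Stops after max_idle_rounds consecutive scrolls that reveal no new rows,
    or after max_rounds scrolls as a safety limit.
    """
    seen: Dict[str, None] = {}  # dict keeps first-seen order and removes duplicates
    idle_rounds = 0
    for _ in range(max_rounds):
        before = len(seen)
        for row_id in read_visible_ids():
            seen.setdefault(row_id, None)
        if len(seen) == before:
            idle_rounds += 1
            if idle_rounds >= max_idle_rounds:
                break  # nothing new for several scrolls: assume the end was reached
        else:
            idle_rounds = 0
        scroll_down()
    return list(seen)
```

In a Selenium script, `read_visible_ids` could return an attribute of each row element found with `browser.find_elements(...)`, and `scroll_down` could call `browser.execute_script(...)` on the table's scroll container followed by a short wait. Which attribute and which container to use are assumptions that need checking in the browser's dev tools. Keying on a per-row identifier instead of `div[{row_index}]` matters because a positional index refers to a different row once earlier rows have been removed from the DOM.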