I’m trying to find a solution for saving an entire webpage using Selenium with Python while employing a headless browser. My goal is to ensure that the saved webpage appears exactly the same as it does when accessed in a regular browser, akin to the ‘Save as…’ functionality.
I previously used a code example from Andersson that worked well, but I need to adapt it for a headless setup. Is there a way to achieve this? Here’s an example of code I used:
from selenium.webdriver.chrome.service import Service
from selenium import webdriver
service = Service('path/to/chromedriver')
options = webdriver.ChromeOptions()
options.add_argument('--headless')
browser = webdriver.Chrome(service=service, options=options)
browser.get('http://www.example.com')
# Add code here to save the webpage
Any suggestions would be greatly appreciated!
To save a full webpage using Selenium with Python in a headless browser, you can use the execute_cdp_cmd method to send commands over the Chrome DevTools Protocol (CDP). CDP exposes many browser features, including capturing a page as an MHTML file, a single-file archive that bundles the HTML with resources such as stylesheets and images, so the saved copy keeps the page's appearance.
Here’s how you can adjust your code using this approach:
from selenium.webdriver.chrome.service import Service
from selenium import webdriver
service = Service('path/to/chromedriver')
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu') # For Windows
options.add_argument('--no-sandbox') # For Linux
browser = webdriver.Chrome(service=service, options=options)
browser.get('http://www.example.com')
# Use CDP command to save the webpage as MHTML
mhtml = browser.execute_cdp_cmd('Page.captureSnapshot', {})
with open('webpage.mhtml', 'w', encoding='utf-8', newline='') as f:
    f.write(mhtml['data'])  # newline='' keeps the snapshot's CRLF line endings intact
browser.quit()
In this example:
- The Page.captureSnapshot command from the Chrome DevTools Protocol captures the page's content, including its bundled resources, in MHTML format.
- The execute_cdp_cmd method lets you send DevTools commands directly from Selenium. It is available on Chromium-based drivers such as Chrome and Edge.
- The resulting MHTML file can be opened in Chromium-based browsers such as Chrome and Edge (other browsers may need an extension or converter). It preserves the text, structure, and styling of the original page, similar to the browser's 'Save as...' feature.
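If it helps, the capture step can be wrapped in a small helper, and the saved file can be inspected to check that stylesheets and images were actually bundled. The sketch below is only an illustration: save_page_as_mhtml and list_mhtml_parts are hypothetical names (not part of Selenium), and it assumes driver is a Chromium-based WebDriver that exposes execute_cdp_cmd. Because MHTML is a MIME multipart document, Python's standard email module should be able to parse it.

```python
from email import message_from_string
from pathlib import Path


def save_page_as_mhtml(driver, path):
    """Capture the current page via CDP and write it to `path`.

    Assumption: `driver` is a Chromium-based Selenium WebDriver,
    i.e. any object that provides `execute_cdp_cmd`.
    """
    snapshot = driver.execute_cdp_cmd("Page.captureSnapshot", {"format": "mhtml"})
    # newline="" stops Python from translating the CRLF line endings
    # that the MHTML text already contains.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(snapshot["data"])
    return Path(path)


def list_mhtml_parts(path):
    """Return (content type, Content-Location) for each bundled resource."""
    with open(path, encoding="utf-8", newline="") as f:
        message = message_from_string(f.read())
    return [
        (part.get_content_type(), part.get("Content-Location"))
        for part in message.walk()
        if not part.is_multipart()  # skip the multipart container itself
    ]
```

With the browser object from the code above, you would call save_page_as_mhtml(browser, 'webpage.mhtml') before browser.quit(), then loop over list_mhtml_parts('webpage.mhtml') and print each pair to see which resources made it into the archive.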