I’m working on a project where I need to pull hotel information from a travel booking site into Google Sheets. I want to extract multiple data points for each property including the hotel name, website link, street address, location, postal code, review count (numbers only), rating percentage, and available facilities.
My goal is to organize all this information so that each hotel’s complete details appear in one row of my spreadsheet. I’ve been experimenting with the IMPORTXML function but I’m having trouble getting the address information to display correctly next to the URL field.
Has anyone successfully scraped similar travel site data using Google Sheets? I’m particularly stuck on the address extraction part and would appreciate any guidance on the proper xpath syntax or alternative approaches.
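XPath for addresses is a pain - travel sites love nesting them in weird div structures. Try //div[contains(@class,'address')]//text(), which uses the contains() function to match any div whose class attribute includes "address". Keep in mind that //text() returns every text node separately, so IMPORTXML will spill the street, city, and postal code into several cells instead of one. But honestly, these big booking sites hate scrapers. You’ll probably have better luck targeting smaller hotel sites directly instead of the major platforms.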
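If the address comes back spread over several cells, one way to get it into a single cell next to the URL is a small Apps Script custom function that joins the pieces. This is only a sketch: JOIN_ADDRESS is a made-up name, and it assumes the XPath above actually returns the address fragments for your site. The built-in TEXTJOIN function should do much the same job without any script.

```javascript
/**
 * Hypothetical custom function: collapses the cells that IMPORTXML spills
 * out into one comma-separated address string.
 * `fragments` is a single value or a 2D array, which is how Sheets passes ranges.
 */
function JOIN_ADDRESS(fragments) {
  var rows = Array.isArray(fragments) ? fragments : [[fragments]];
  var parts = [];
  rows.forEach(function (row) {
    (Array.isArray(row) ? row : [row]).forEach(function (cell) {
      // Collapse runs of whitespace, then drop stray leading/trailing commas.
      var text = String(cell == null ? '' : cell)
        .replace(/\s+/g, ' ')
        .replace(/^[\s,]+|[\s,]+$/g, '');
      if (text) parts.push(text); // skip empty fragments
    });
  });
  return parts.join(', ');
}
```

In the sheet you would call it as =JOIN_ADDRESS(IMPORTXML(A2, "//div[contains(@class,'address')]//text()")), with A2 holding the hotel URL, so the whole address stays in the same row as the link.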
Travel booking sites typically implement measures to block scrapers, which means that IMPORTXML requests may not succeed. Additionally, these sites frequently change their HTML structures, so your xpath queries may break often. The challenge with address extraction arises because many booking sites load content via JavaScript dynamically, whereas IMPORTXML is limited to the initial HTML, missing any content that loads afterward. In my experience, Google Apps Script combined with UrlFetchApp works better for such tasks, but keep in mind the potential for rate limits and the risk of being blocked. It might also be worth reaching out to the site’s developer relations for any official API access they might provide for business use.
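For anyone who wants to try the Apps Script route, here is a minimal sketch of what it could look like. The function names and the markup it looks for (an h1 for the name, class names containing "address" and "review-count") are assumptions for illustration, not the structure of any real booking site. Note that UrlFetchApp does not run JavaScript either, so it only helps when the data is in the raw HTML.

```javascript
// Strips tags from an HTML fragment and collapses whitespace.
function textOf(fragment) {
  return fragment.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Returns the text of the first element whose class attribute contains
// `className`. Regex-based and naive: it does not handle an element nested
// inside another element with the same tag name.
function byClass(html, className) {
  var re = new RegExp(
    '<(\\w+)[^>]*class="[^"]*' + className + '[^"]*"[^>]*>([\\s\\S]*?)</\\1>', 'i');
  var m = re.exec(html);
  return m ? textOf(m[2]) : '';
}

// Pure parsing step: one spreadsheet row per hotel page.
// The class names here are assumed; adjust them to the page you inspect.
function extractHotelRow(html, url) {
  var nameMatch = /<h1[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  var name = nameMatch ? textOf(nameMatch[1]) : '';
  var address = byClass(html, 'address');
  var reviews = byClass(html, 'review-count').replace(/\D/g, ''); // digits only
  return [name, url, address, reviews];
}

// Apps Script entry point: fetches the page and parses it.
function fetchHotelRow(url) {
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() !== 200) {
    // Surface the status in the address column instead of throwing.
    return ['', url, 'HTTP ' + response.getResponseCode(), ''];
  }
  return extractHotelRow(response.getContentText(), url);
}
```

Keeping the parsing separate from the fetch means you can check extractHotelRow against saved HTML before pointing it at a live site, which helps when the markup changes.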
I hit the same wall trying to scrape hotel data last year. IMPORTXML doesn’t work well with travel sites because they serve different content to bots versus real browsers. You’ll usually get empty HTML or error pages when they detect scraping.
What actually worked for me was Google Apps Script with custom headers that fake browser requests - takes more setup but way more reliable. Also check if the site has a public API or affiliate program first. Booking.com and others have partner programs that give you clean data feeds.
If you go the script route, add random delays between requests and maybe use proxies. IMPORTXML itself gives you no control over either, since the request is made from Google’s servers. Just know you’re probably breaking their terms of service.
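A rough sketch of the custom headers and random delays in Apps Script, for reference. The header values, the 2 to 5 second pause, and the function names are all assumptions, and I’m not certain Apps Script honors an overridden User-Agent, so that header is left out here.

```javascript
// Random pause length in milliseconds, between minMs and maxMs inclusive.
function randomDelayMs(minMs, maxMs) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Request options with browser-like headers. The values are illustrative only.
function buildRequestOptions() {
  return {
    method: 'get',
    muteHttpExceptions: true, // return error responses instead of throwing
    followRedirects: true,
    headers: {
      'Accept': 'text/html,application/xhtml+xml',
      'Accept-Language': 'en-US,en;q=0.9'
    }
  };
}

// Fetches each URL in turn, pausing a random 2-5 seconds between requests.
// Returns one row per URL: [url, HTTP status, length of the returned HTML].
function fetchPagesSlowly(urls) {
  var options = buildRequestOptions();
  return urls.map(function (url, i) {
    if (i > 0) Utilities.sleep(randomDelayMs(2000, 5000));
    var response = UrlFetchApp.fetch(url, options);
    return [url, response.getResponseCode(), response.getContentText().length];
  });
}
```

Logging the status code and HTML length for each page is a quick way to see whether the site is serving real content or an empty/blocked page before you bother parsing anything.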