I’m trying to pull specific information from Wikipedia pages directly into my Google Sheets using the IMPORTXML function, but I can’t seem to get the XPath syntax right.
I want to extract the runtime from a movie’s Wikipedia page. The data I need is in the infobox on the right side of the page, which lists details such as the director, cast, and runtime.
I’ve been experimenting with different XPath expressions but keep getting error messages or no results. The movie page has a typical Wikipedia layout with the main content on the left and the summary table on the right.
Here’s what I tried so far:
=IMPORTXML("wikipedia_movie_url", "//table[@class='infobox']//tr[td[contains(text(),'Runtime')]]/td[2]")
This formula returns an error instead of the expected duration value. I’m not sure if the issue is with my XPath selector or if there’s something else I’m missing about how IMPORTXML works with Wikipedia’s HTML structure.
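To narrow down which part of the selector fails, a minimal Python sketch can run similar queries against a hand-written, simplified infobox. The markup below is an assumption about how a film infobox is structured, not a copy of a real page. It assumes the table's class attribute holds several classes (for example "infobox vevent"), each label sits in a `th` cell, the value sits in a sibling `td`, and the label reads "Running time" rather than "Runtime". Compare it with the actual page source before relying on it. Python's built-in ElementTree supports only a subset of XPath (no `contains()`), so the sketch uses exact-match predicates instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified infobox markup (an assumption, not copied from Wikipedia).
SAMPLE_HTML = """
<html><body>
<table class="infobox vevent">
  <tbody>
    <tr><th class="infobox-label">Directed by</th><td class="infobox-data">Jane Doe</td></tr>
    <tr><th class="infobox-label">Running time</th><td class="infobox-data">120 minutes</td></tr>
  </tbody>
</table>
</body></html>
"""

root = ET.fromstring(SAMPLE_HTML.strip())


def texts(path):
    """Return the stripped text of every element matching an ElementTree path query."""
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


# 1. An exact class match misses a table whose class attribute is "infobox vevent".
exact_class = texts(".//table[@class='infobox']")

# 2. Looking for the label inside a <td> finds nothing when the label is a <th>.
label_in_td = texts(".//tr[td='Running time']/td")

# 3. The label text "Runtime" does not match an assumed "Running time" label.
wrong_label = texts(".//tr[th='Runtime']/td")

# 4. Label in <th>, value in the sibling <td>: this one matches the sample.
label_in_th = texts(".//tr[th='Running time']/td")

print(exact_class)
print(label_in_td)
print(wrong_label)
print(label_in_th)
```

If only the fourth query matches the real markup, an untested full-XPath translation for IMPORTXML would be along the lines of `//table[contains(@class,'infobox')]//tr[th[contains(.,'Running time')]]/td`.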
Can anyone help me figure out the correct XPath syntax to extract this type of data from Wikipedia infoboxes?