Using Puppeteer with page.$$ for Twitter Data Extraction

I am trying to gather information from Twitter using Puppeteer and have successfully identified the necessary page elements. When I execute the following command in my browser’s console, I retrieve the expected results:

$x(`//div[@data-testid='tweetText']`)

Additionally, I can extract specific text using this command:

$x(`//div[@data-testid='tweetText']`)[0].childNodes[1].childNodes[0].wholeText

However, within my reusable script, I’m attempting to use page.$$ to grab all matching elements with this selector:

const [tweetElements] = await page.$$(`xpath/.//div[@data-testid='tweetText']`);

Unfortunately, this method fails to return any results despite verifying that the selector works in the Chrome console. Can anyone provide guidance on how to successfully implement this? It would be helpful to retrieve the specific text as well. Thank you!

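Whether page.$$ accepts XPath depends on your Puppeteer version. In older releases, page.$$ takes only CSS selectors, and XPath queries go through page.$x. Newer releases accept the xpath/ prefix in page.$$ and have deprecated, and later removed, page.$x. Two other things are worth checking. First, `const [tweetElements] = ...` destructures only the first match, so tweetElements is undefined whenever the result array is empty. Second, Twitter renders tweets dynamically, so a query that runs before they appear will find nothing even though the same selector works in the console of a fully loaded page.

On a version that still provides page.$x, you can modify your script like this to fetch all tweet elements and extract their text: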

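// page.$x is only available in older Puppeteer releases; it returns an array of ElementHandles.
const tweetElements = await page.$x(`//div[@data-testid='tweetText']`);
for (const element of tweetElements) {
  // Read the DOM textContent property and convert it to a plain string.
  const textHandle = await element.getProperty('textContent');
  const text = await textHandle.jsonValue();
  console.log(text);
}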

The script works in three steps:

  1. Calls page.$x with the XPath expression, which returns an array of element handles for every matching node.
  2. Iterates over the handles and reads the textContent property of each one.
  3. Logs the text content of each tweet element to the console.

Run this code inside an async function, since Puppeteer operations return promises. Because Twitter loads tweets dynamically, wait for the elements to appear before querying them; otherwise the result may be empty. With that in place, this should extract the tweet texts you are after.
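If your Puppeteer release no longer provides page.$x, a minimal sketch along the following lines may work instead. It assumes a version that supports the xpath/ selector prefix in both page.waitForSelector and page.$$. The function name extractTweetTexts and the timeoutMs parameter are hypothetical, chosen only for illustration.

```javascript
// Sketch only: assumes a Puppeteer version that understands the "xpath/" prefix.
// extractTweetTexts and timeoutMs are illustrative names, not part of Puppeteer.
async function extractTweetTexts(page, timeoutMs = 15000) {
  const selector = "xpath/.//div[@data-testid='tweetText']";

  // Tweets are rendered dynamically, so wait until at least one match exists.
  await page.waitForSelector(selector, { timeout: timeoutMs });

  // page.$$ returns an array of element handles; keep the whole array
  // rather than destructuring only the first entry.
  const handles = await page.$$(selector);

  const texts = [];
  for (const handle of handles) {
    // Run a small function in the page to read the node's text.
    texts.push(await handle.evaluate((node) => node.textContent));
    // Release the handle once its text has been read.
    await handle.dispose();
  }
  return texts;
}
```

After navigating with page.goto, you would call it as `const texts = await extractTweetTexts(page);` from inside an async function. If no tweet appears before the timeout, page.waitForSelector rejects, so wrap the call in try/catch if an empty page is an expected case.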

Hope this helps you in correctly implementing the data extraction within your Puppeteer script!