How to integrate Artoo.js library with Puppeteer for data extraction

I’m having trouble getting Artoo.js to work properly with Puppeteer in my web scraping project. I’ve tried a couple of different approaches but nothing seems to work.

First, I attempted to install it using npm install artoo-js and then require it in my script, but that didn’t give me the results I expected.

Next, I tried to inject the library directly into the page using the page.injectFile() method with the path to the Artoo.js distribution file, but that approach also failed.

Has anyone managed to successfully combine these two tools? I’m looking for a working example that shows the proper way to inject Artoo.js into a Puppeteer-controlled browser page so I can use its scraping capabilities.

Any code examples or step-by-step guidance would be really helpful. Thanks!

I ran into similar problems with Artoo.js and Puppeteer. My fix was to use page.addScriptTag() instead of page.injectFile(). You can get Artoo.js from the project on GitHub and inject the hosted build directly by URL:

await page.addScriptTag({
  url: 'https://medialab.github.io/artoo/public/dist/artoo.min.js'
});

It’s essential to wait for Artoo to finish loading before using its methods. I usually either add a short delay or check whether artoo is defined in the global scope. Once it’s loaded, run your scraping logic inside page.evaluate(): Artoo’s methods only exist in the browser context, not in Node.js.
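A minimal sketch of the approach above, assuming an already-open Puppeteer `page` and the Artoo URL from the snippet. The helper names `injectArtooFromUrl` and `scrapeTextWithArtoo` are hypothetical, and the `artoo.scrape(selector, 'text')` call is assumed from Artoo's scraping docs, so check it against the build you load. The "is artoo defined?" check uses page.waitForFunction() instead of a fixed delay.

```javascript
// Hypothetical helpers; `page` is assumed to be an already-open Puppeteer Page.
const ARTOO_URL = 'https://medialab.github.io/artoo/public/dist/artoo.min.js';

// Inject Artoo.js by URL, then wait until the browser defines the `artoo` global.
async function injectArtooFromUrl(page, url = ARTOO_URL, timeout = 10000) {
  await page.addScriptTag({ url });
  // Polls inside the page; rejects if `artoo` is still missing after `timeout` ms.
  await page.waitForFunction(() => typeof window.artoo !== 'undefined', { timeout });
}

// Scrape the text of every element matching `selector` using Artoo
// (assumed call form: artoo.scrape(selector, 'text')).
async function scrapeTextWithArtoo(page, selector) {
  await injectArtooFromUrl(page);
  // This callback is serialized and run in the browser, where `artoo` exists;
  // only its serializable return value comes back to Node.js.
  return page.evaluate((sel) => window.artoo.scrape(sel, 'text'), selector);
}
```

With Puppeteer you would call it after navigating, e.g. `await page.goto('https://example.com'); const texts = await scrapeTextWithArtoo(page, 'h2');`. Puppeteer's addScriptTag() with a `url` already resolves once the script's load event fires, so the waitForFunction() step is mainly a cheap safeguard in case the global is not ready yet.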

Honestly, I think Artoo.js is overkill for most use cases. Puppeteer’s built-in methods are pretty solid: you can use page.$() and page.$$() to select elements directly, then page.evaluate() for any custom logic. Way less hassle than injecting libraries.
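For comparison, a sketch of the Puppeteer-only route suggested above, with no library injected. `scrapeTextWithPuppeteer` is a hypothetical name, and `page` is assumed to be a Puppeteer Page that has already loaded the target site.

```javascript
// Collect the trimmed text of every element matching `selector`
// using only Puppeteer's own API.
async function scrapeTextWithPuppeteer(page, selector) {
  // page.$$ returns an array of ElementHandles (empty if nothing matches).
  const handles = await page.$$(selector);
  const texts = [];
  for (const handle of handles) {
    // The callback runs in the browser; the handle arrives as a DOM element.
    texts.push(await page.evaluate((el) => el.textContent.trim(), handle));
    // Release the browser-side reference once we are done with it.
    await handle.dispose();
  }
  return texts;
}
```

If you only need the values, `page.$$eval(selector, (els) => els.map((el) => el.textContent.trim()))` does the same work in a single browser round trip.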

I’ve worked with both libraries, and timing is usually the culprit. After injecting Artoo.js with page.addScriptTag(), you can’t start using it right away; you need to wait for it to initialize. Skip the arbitrary delays and use page.waitForFunction() instead: await page.waitForFunction(() => typeof artoo !== 'undefined') makes sure it’s actually loaded.

I’ve also had better luck downloading the Artoo.js file locally and using page.addScriptTag({ path: './artoo.min.js' }) rather than the CDN version, which can be flaky. And don’t forget: all Artoo operations need to be wrapped in page.evaluate(), since they run in the browser, not in your Node.js environment.
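A sketch of the local-file variant described above, assuming Artoo.js has already been downloaded (the default `./artoo.min.js` is just the path from the post). The helpers `injectLocalArtoo` and `withArtoo` are hypothetical; `withArtoo` simply injects Artoo and then hands your function to page.evaluate(), so everything Artoo-related runs in the browser.

```javascript
const fs = require('fs');
const path = require('path');

// Inject a locally saved copy of Artoo.js and wait for the `artoo` global.
async function injectLocalArtoo(page, file = './artoo.min.js', timeout = 10000) {
  const absPath = path.resolve(file);
  // Fail fast in Node.js with a clear message if the file is missing.
  if (!fs.existsSync(absPath)) {
    throw new Error(`Artoo.js not found at ${absPath}`);
  }
  // With `path`, Puppeteer reads the file and injects its contents into the page.
  await page.addScriptTag({ path: absPath });
  await page.waitForFunction(() => typeof window.artoo !== 'undefined', { timeout });
}

// Inject Artoo, then run `pageFunction` in the browser context via page.evaluate().
// Extra `args` must be serializable; they are passed to `pageFunction` in the page.
async function withArtoo(page, file, pageFunction, ...args) {
  await injectLocalArtoo(page, file);
  return page.evaluate(pageFunction, ...args);
}
```

For example, `const titles = await withArtoo(page, './artoo.min.js', (sel) => artoo.scrape(sel, 'text'), 'h2');`, again assuming that `artoo.scrape` call form from Artoo's docs.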