Hey everyone! I’m working with Puppeteer and Node.js to scrape some websites. Everything works fine when all the elements exist on the page, but I run into issues when certain elements are missing.
I keep getting errors like "Cannot read property 'src' of null" when trying to access properties of elements that don’t exist. In my case, some pages have images while others don’t, and my script crashes when it tries to get the src attribute from a missing image.
I tried using conditional checks but I’m still getting errors. Here’s what I’m working with:
for (let x = 1; x <= 3; x++) {
  const data = await page.evaluate(() => {
    let headline = document.querySelector('h2').textContent;
    let content = document.querySelector('.main-text').textContent;
    let image = document.querySelector('.photo img') ? document.querySelector('.photo img').src : 'Default Image';
    let website = "News Site";
    let topic = "General";
    if (!image)
      continue;
    return {
      headline,
      content,
      image,
      website,
      topic
    }
  });
}
How can I properly handle these null values without breaking my loop? Any suggestions would be really helpful!
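Your issue comes from using continue inside the page.evaluate callback. That callback is a separate function that runs in the browser context, not part of your Node.js loop, so continue has no loop to act on there (JavaScript rejects a continue outside a loop as a syntax error). Move the continue into the outer loop, as shown below: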
for (let x = 1; x <= 3; x++) {
  const data = await page.evaluate(() => {
    const headlineEl = document.querySelector('h2');
    const contentEl = document.querySelector('.main-text');
    const imageEl = document.querySelector('.photo img');
    return {
      headline: headlineEl?.textContent || null,
      content: contentEl?.textContent || null,
      image: imageEl?.src || null,
      website: "News Site",
      topic: "General"
    };
  });
  if (!data.image) {
    continue; // This is correct here
  }
  // Handle your data
}
The optional chaining operator ?. gives a cleaner approach: when an element does not exist, the expression evaluates to undefined instead of throwing, and the || null fallback turns that into null.
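To see what ?. and || null do with a missing element, here is a small sketch that runs in plain Node without a browser. The pickImage helper and the stand-in element objects are made up for illustration; in the real script the value would come from document.querySelector.

```javascript
// Hypothetical helper: takes whatever querySelector returned (an element or null).
function pickImage(imageEl) {
  // imageEl?.src is undefined when imageEl is null, so nothing throws.
  // `|| null` then normalizes undefined (and an empty string) to null.
  return imageEl?.src || null;
}

// Stand-ins for DOM elements; only the `src` property matters here.
const found = { src: 'https://example.com/a.jpg' };
const missing = null; // what querySelector returns when nothing matches

console.log(pickImage(found));   // prints the URL string
console.log(pickImage(missing)); // prints null
```

Note that || null also replaces an empty src with null, which is usually what you want when deciding whether to skip a page.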
I ran into the same thing scraping product pages: some had videos, others didn’t. Your code sets a default value and then checks whether the result is falsy, which ‘Default Image’ will never be, since any non-empty string is truthy.
Wrap your evaluation logic in try-catch and handle missing elements more defensively:
const data = await page.evaluate(() => {
  try {
    const headlineEl = document.querySelector('h2');
    const contentEl = document.querySelector('.main-text');
    const imageEl = document.querySelector('.photo img');
    if (!headlineEl || !contentEl) {
      return null; // Skip if critical elements are missing
    }
    return {
      headline: headlineEl.textContent,
      content: contentEl.textContent,
      image: imageEl ? imageEl.src : null,
      website: "News Site",
      topic: "General"
    };
  } catch (error) {
    return null;
  }
});
if (data && data.image) {
  // Process your data here
}
This way you check for null data outside page.evaluate and decide whether to continue or skip based on what elements actually matter for your use case.
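Put together with a loop, that pattern might look roughly like the sketch below. It assumes you are visiting a list of article URLs (the urls parameter and the collectArticles and extractArticle names are hypothetical, not from the original script) and that page is a Puppeteer Page.

```javascript
// Runs inside the browser via page.evaluate, so it can only use `document`
// and must return plain serializable data.
function extractArticle() {
  const headlineEl = document.querySelector('h2');
  const contentEl = document.querySelector('.main-text');
  const imageEl = document.querySelector('.photo img');
  if (!headlineEl || !contentEl) return null; // critical elements missing
  return {
    headline: headlineEl.textContent,
    content: contentEl.textContent,
    image: imageEl ? imageEl.src : null,
  };
}

// Runs in Node. `page` is a Puppeteer Page; `urls` is an assumed list of article URLs.
async function collectArticles(page, urls) {
  const articles = [];
  for (const url of urls) {
    await page.goto(url);
    const data = await page.evaluate(extractArticle);
    if (!data || !data.image) continue; // legal here: we are inside the Node loop
    articles.push({ ...data, website: 'News Site', topic: 'General' });
  }
  return articles;
}
```

Keeping the browser-side function free of any skip logic means the decision about which pages to drop lives in one place, in the Node loop where continue actually works.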
Try page.$eval with fallbacks instead of page.evaluate. It is a much easier way to target specific elements and handle missing ones:
let imageUrl = await page.$eval('.photo img', el => el.src).catch(() => null);
let headline = await page.$eval('h2', el => el.textContent).catch(() => '');
page.$eval rejects when no element matches the selector, and the .catch turns that rejection into your fallback value instead of crashing. Much cleaner than checking everything manually.
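One thing to be aware of: .catch(() => null) hides every error from the call, not only a missing element. If that matters to you, a possible variation is to look the element up with page.$ first, which resolves to null when nothing matches, and only then read from it. The evalOrDefault name and its parameters are my own for this sketch, so treat it as one option rather than the standard way.

```javascript
// Hypothetical helper: returns `fallback` only when the selector matches nothing.
// Errors thrown while reading the element still propagate to the caller.
async function evalOrDefault(page, selector, fn, fallback) {
  const handle = await page.$(selector); // resolves to null when nothing matches
  if (!handle) return fallback;
  try {
    return await handle.evaluate(fn); // run fn against the element in the browser
  } finally {
    await handle.dispose(); // release the element handle
  }
}
```

Usage would then be along the lines of: const imageUrl = await evalOrDefault(page, '.photo img', el => el.src, null);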