Hey everyone! I’m working on a Zapier automation and I’m stuck. I’ve got this JavaScript step that grabs HTML from a website. It’s working fine, but it’s pulling the whole page. What I really need is just one part of it.
Here’s what I’ve got so far:
getWebContent('https://example.com')
  .then(response => response.text())
  .then(content => {
    let result = {reference: 'ABC123', fullPage: content};
    finishUp(null, result);
  })
  .catch(finishUp);
This gets everything, but I only want a specific div. Like, how can I just grab the stuff inside a div with class ‘ImportantSection’? Is there a way to do this without getting the whole page first? Any help would be awesome!
hey excitedgamer85, have u tried using document.querySelector? it’s built-in js and might work in zapier. something like:
let importantStuff = document.querySelector('.ImportantSection').innerHTML;
let result = {reference: 'ABC123', extractedContent: importantStuff};
finishUp(null, result);
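not sure if it’ll work 100% though, since document only exists in browsers and zapier code steps run on node. if it comes back undefined you’ll have to parse the fetched html some other way. lmk how it goes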
I’ve dealt with a similar issue in Zapier before. You still have to fetch the entire page, but you can use a library like Cheerio to parse it and pull out just the element you need. Here’s a modified version of your code that should work:
const cheerio = require('cheerio');

getWebContent('https://example.com')
  .then(response => response.text())
  .then(content => {
    const $ = cheerio.load(content);
    const importantSection = $('.ImportantSection').html();
    let result = {reference: 'ABC123', extractedContent: importantSection};
    finishUp(null, result);
  })
  .catch(finishUp);
This approach loads the HTML into Cheerio, then uses jQuery-like syntax to select the div with class ‘ImportantSection’ and extract its inner HTML. It’s more reliable than picking the markup apart by hand, and it gives you full CSS selector support. Keep in mind that .html() returns null when nothing matches the selector, so check for that before using the result. You’ll also need an environment where you can add Cheerio as a dependency in your Zapier setup.
While Cheerio is a great solution, it might not be available in all Zapier environments. An alternative approach is to use regular expressions. Here’s how you could modify your code:
getWebContent('https://example.com')
  .then(response => response.text())
  .then(content => {
    const regex = /<div class="ImportantSection">(.*?)<\/div>/s;
    const match = content.match(regex);
    const importantSection = match ? match[1] : 'Not found';
    let result = {reference: 'ABC123', extractedContent: importantSection};
    finishUp(null, result);
  })
  .catch(finishUp);
This method uses a regex pattern to find and extract the content within the ‘ImportantSection’ div. It’s not as robust as DOM parsing: the opening tag has to match exactly (no extra attributes or additional classes), and the lazy match stops at the first closing </div>, so any nested div cuts the result short. For simple, flat markup it works well and doesn’t require additional libraries.
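If the target div has other attributes or contains nested divs, one way to stay library-free is to find the opening tag with a looser regex and then count opening and closing div tags until the matching close. The following is only a rough sketch of that idea, not a real HTML parser: extractDivByClass and the sample markup are made up for illustration, it assumes lowercase tag names, and it can be fooled by div-like text inside comments or scripts.

```javascript
// Hypothetical helper (not from the thread): returns the inner HTML of the
// first <div> whose class list contains `className`, or null if none is found.
function extractDivByClass(html, className) {
  // Escape regex metacharacters so the class name is matched literally.
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Opening tag: other attributes allowed, class value may list several
  // names, and either quote style is accepted.
  const openTag = new RegExp(
    '<div\\b[^>]*\\bclass\\s*=\\s*(["\'])(?:[^"\']*\\s)?' +
      escaped +
      '(?:\\s[^"\']*)?\\1[^>]*>'
  );
  const open = openTag.exec(html);
  if (!open) return null;

  const start = open.index + open[0].length;
  // Walk every later <div ...> and </div>, tracking nesting depth.
  const tag = /<(\/?)div\b[^>]*>/g;
  tag.lastIndex = start;
  let depth = 1;
  let m;
  while ((m = tag.exec(html)) !== null) {
    depth += m[1] ? -1 : 1;
    if (depth === 0) return html.slice(start, m.index);
  }
  return null; // unbalanced markup: no matching </div>
}

// Made-up sample: extra attribute, extra class, and a nested div.
const sample =
  '<body><div id="x" class="box ImportantSection">' +
  '<div class="inner">nested</div><p>tail</p>' +
  '</div><div class="Other">skip</div></body>';

console.log(extractDivByClass(sample, 'ImportantSection'));
// prints: <div class="inner">nested</div><p>tail</p>
```

In the snippet above, content.match(regex) could be swapped for extractDivByClass(content, 'ImportantSection'), with a null result handled the same way as the 'Not found' case.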