Hey everyone! I’m a Zapier newbie and I’m trying to set up a workflow that sends website updates to my Discord channel. The problem is, I’m getting raw HTML in my RSS feed and I only need the image URLs.
I’m looking at a table structure where the images are inside <td> tags. I need to grab the src attribute from the <img> tags within those table cells.
Does anyone know a good method to parse this HTML and pull out just the image URLs? I’ve been scratching my head over this for a while now.
Any tips or tricks would be super helpful! Thanks in advance for your advice.
hey mike, i’ve dealt with this before. you can use the ‘formatter’ step in zapier to extract urls. Choose ‘extract url’ as the transform. it might grab more than just image urls tho, so you might need to filter after. hope this helps!
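If the extract step hands back more than image links, a small Code step can do the filtering. This is only a rough sketch: the helper name filterImageUrls and the list of extensions are my own assumptions, and it only recognizes images by file extension, so adjust it for how your feed names its files.

```javascript
// Hypothetical helper: keep only URLs whose path ends in a common image extension.
const IMAGE_EXT = /\.(png|jpe?g|gif|webp|svg)$/i;

function filterImageUrls(urls) {
  return urls.filter((u) => {
    try {
      // Test the path only, so a query string like ?w=300 does not hide the extension.
      return IMAGE_EXT.test(new URL(u).pathname);
    } catch (err) {
      return false; // skip anything that is not a valid absolute URL
    }
  });
}

// In a Zapier Code step, assuming a hypothetical input field named "urls"
// that holds a comma-separated list, the last line could be:
// output = {urls: filterImageUrls(inputData.urls.split(','))};
```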
As someone who’s worked extensively with Zapier and web scraping, I can share a trick that’s worked wonders for me. Instead of relying solely on Zapier’s built-in tools, I’ve found that using a combination of Zapier’s ‘Webhook’ action and a simple external API can be incredibly effective.
I set up a small serverless function (using AWS Lambda or similar) that takes in HTML as input and returns just the image URLs. This function uses a robust HTML parser library to accurately extract the src attributes of <img> tags from <td> elements. Then, in Zapier, I use a Webhook step to send the HTML to this function and receive the cleaned list of image URLs.
This approach has been rock-solid for me, handling even complex HTML structures. It’s a bit more setup initially, but it’s incredibly flexible and reliable in the long run. Plus, you can reuse the same function for other similar tasks in your Zapier workflows.
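To give a rough idea of the shape of such a function, here is a minimal sketch, not my exact code. Assumptions: the webhook POSTs JSON like {"html": "..."}, the event carries that JSON as a string in event.body, and the names extractImageUrls and buildResponse are made up for illustration. To keep it dependency-free, the sketch swaps the HTML parser library for a simple regex, so treat the extraction part as a placeholder.

```javascript
// Stand-in extractor: a regex instead of a real HTML parser, to keep the sketch dependency-free.
function extractImageUrls(html) {
  const imgSrc = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi;
  return Array.from(html.matchAll(imgSrc), (m) => m[1]);
}

// Builds an HTTP-style response. Assumes the request body is JSON like {"html": "..."}.
function buildResponse(event) {
  let html;
  try {
    html = JSON.parse(event.body || '{}').html;
  } catch (err) {
    return {statusCode: 400, body: JSON.stringify({error: 'body must be JSON'})};
  }
  if (typeof html !== 'string') {
    return {statusCode: 400, body: JSON.stringify({error: 'missing "html" field'})};
  }
  return {statusCode: 200, body: JSON.stringify({urls: extractImageUrls(html)})};
}

// Lambda-style async handler; on AWS you would export this (for example as exports.handler).
const handler = async (event) => buildResponse(event);
```

The Webhook step would then read the urls array from the JSON response and pass it on to the Discord action.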
For extracting image URLs from HTML in Zapier, I’d recommend using a combination of tools. First, utilize the ‘Formatter’ step with ‘Extract URL’ as suggested, but then add a ‘Code’ step using JavaScript. In the Code step, you can implement a regex pattern to specifically target img src attributes within td tags. Something like:
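// Capture the src of the first <img> that follows each opening <td>.
const regex = /<td\b[^>]*>[\s\S]*?<img\b[^>]*?\ssrc=["']([^"']+)["'][^>]*>/gi;
let matches = [];
let match;
while ((match = regex.exec(inputData.html)) !== null) {
  matches.push(match[1]);
}
output = {urls: matches};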
This approach should give you more precise control over the URLs you’re extracting. Remember to test thoroughly with your specific HTML structure.
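One thing worth testing for: a single regex that starts at <td> tends to pick up only the first image in each cell. If your cells can hold several images, a two-pass variant may work better: grab each cell first, then every img src inside it. This is a sketch under assumptions. The function name extractTdImageUrls and the sample markup are made up, and it will not cope with nested tables.

```javascript
// Hypothetical helper: collect every <img> src found inside <td> cells.
function extractTdImageUrls(html) {
  const cellRegex = /<td\b[^>]*>([\s\S]*?)<\/td>/gi;
  const imgRegex = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi;
  const urls = [];
  for (const cell of html.matchAll(cellRegex)) {
    // cell[1] is the inner HTML of one table cell.
    for (const img of cell[1].matchAll(imgRegex)) {
      urls.push(img[1]);
    }
  }
  return urls;
}

// Made-up sample markup for a quick check before wiring it into the Zap.
const sample = `
<table><tr>
  <td><img src="https://example.com/one.png" alt="one"><img class="x" src='https://example.com/two.jpg'></td>
  <td>no image here</td>
</tr></table>
<p><img src="https://example.com/outside.gif"></p>`;

console.log(extractTdImageUrls(sample));
// In the Code step itself: output = {urls: extractTdImageUrls(inputData.html)};
```

With the sample above, the image outside the table is skipped and both images in the first cell are returned.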