Extract 'link' attribute from WordPress lightbox shortcodes using PHP regex

Hey folks, I’m stuck with a regex issue in PHP. I need to extract the ‘link’ attribute from all lightbox shortcodes in WordPress. Here’s an example of what I’m dealing with:

[lightbox link="https://example.com/image1.png" size="medium" position="center" caption="Cool pic" border="yes" type="photo"]
[lightbox link="https://example.com/image2.png" size="medium" position="center" caption="Another pic" border="yes" type="photo"]

There might be many such shortcodes, but I only want to fetch the link values like:

https://example.com/image1.png

I’ve tried this pattern:

$pattern = '/\[(\[?)(lightbox)(?![\w-])([^\]\/]*(?:\/(?!\])[^\]\/]*)*?)(?:(\/)\]|\](?:([^\[]*+(?:\[(?!\/\2\])[^\[]*+)*+)\[\/\2\])?)(\]?)/';

However, it doesn’t seem to work as I expected. Can someone suggest how to modify the regex to correctly capture the ‘link’ attributes? Thanks!

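While the previous solution is solid, I’d like to offer an alternative approach using DOM parsing. One catch: DOMDocument only understands angle-bracket tags, so the square-bracket shortcodes have to be rewritten as tags before parsing. Once that is done, this method can be more robust, especially if your shortcodes are complex or inconsistent:

$content = 'Your content with shortcodes here';

// DOMDocument cannot see [square-bracket] shortcodes, so rewrite each
// [lightbox ...] as a <lightbox ...></lightbox> element first.
$html = preg_replace('/\[lightbox\b([^\]]*)\]/', '<lightbox$1></lightbox>', $content);

libxml_use_internal_errors(true); // <lightbox> is not a real HTML tag; silence the parser warnings
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Collect the value of every link attribute on a <lightbox> element.
$links = [];
foreach ($xpath->query('//lightbox/@link') as $link) {
    $links[] = $link->nodeValue;
}

This method treats the rewritten shortcodes as HTML-like tags and lets the parser deal with attribute quoting, which can be less error-prone than a single large regex. It’s particularly useful if you need to extract multiple attributes or if the shortcode structure might change in the future. It still relies on one small regex for the bracket conversion and assumes attribute values never contain a ‘]’. Just ensure you have the DOM extension enabled in your PHP configuration.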

hey mate, i’ve got a quick n dirty solution for ya. try this regex:

/link="([^"]+)"/

it’ll grab everything between the quotes after ‘link=’. chuck it in preg_match_all() and you’re golden. lemme know if ya need anything else!
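
For reference, here is a minimal sketch of how that could look with preg_match_all(), using the two shortcodes from the question (shortened to two attributes each). The variable names are just for illustration. One thing to keep in mind: this pattern is not tied to lightbox, so it would also pick up link="..." from any other shortcode or tag in the content.

```php
<?php
// Sample content based on the shortcodes in the question.
$content = '[lightbox link="https://example.com/image1.png" size="medium"] '
         . '[lightbox link="https://example.com/image2.png" size="medium"]';

// Matches link="..." anywhere in the content, not only inside [lightbox].
preg_match_all('/link="([^"]+)"/', $content, $matches);

// Group 1 holds the text between the quotes for every match.
$links = $matches[1];
print_r($links);
```

With this input, $links should contain the two image URLs in the order they appear.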

I’ve faced a similar challenge before, and I found that using preg_match_all() with a simpler regex pattern works well for this kind of task. Here’s an approach that should do the trick:

$content = 'Your content with shortcodes here';
$pattern = '/\[lightbox.*?link="(.*?)".*?\]/';
preg_match_all($pattern, $content, $matches);

$links = $matches[1];

This pattern looks for ‘lightbox’ shortcodes and captures everything between the quotes after ‘link=’. It’s more flexible and doesn’t rely on the exact order or presence of other attributes.
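
To illustrate the point about attribute order, here is a small sketch. The sample content is made up for the demo, including the [gallery] shortcode, which is only there to show that link attributes outside a lightbox shortcode are skipped.

```php
<?php
// Made-up content: link is not the first attribute in the first shortcode,
// and [gallery] is a hypothetical shortcode that should be ignored.
$content = '[lightbox size="medium" link="https://example.com/a.png"] '
         . '[gallery link="https://example.com/skip.png"] '
         . '[lightbox link="https://example.com/b.png" type="photo"]';

preg_match_all('/\[lightbox.*?link="(.*?)".*?\]/', $content, $matches);

// $matches[0] holds the full shortcodes, $matches[1] only the captured URLs.
$links = $matches[1];
print_r($links);
```

Here $links should end up with a.png and b.png only. Note that `.*?` can run past the closing bracket of a lightbox shortcode that has no link attribute, so this works best when every lightbox shortcode carries one.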

If you need to handle cases where the link might use single quotes instead of double quotes, you can modify the pattern slightly:

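// Branch reset (?|...) keeps the link in group 1 and makes the closing quote match the opening one.
$pattern = '/\[lightbox.*?link=(?|"([^"]*)"|\'([^\']*)\').*?\]/';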

This approach has worked consistently for me across various WordPress setups. Hope it helps!
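
If you want something reusable, the ideas above could be wrapped up roughly like this. It is only a sketch: extract_lightbox_links() is a name I made up (not a WordPress function), the sample content is invented, and it assumes attribute values never contain a ‘]’.

```php
<?php
// Hypothetical helper; the name is illustrative, not part of WordPress.
function extract_lightbox_links(string $content): array
{
    // [^\]]* keeps the match inside a single shortcode; (?|...) is a branch
    // reset, so the URL lands in group 1 for both quote styles.
    $pattern = '/\[lightbox\b[^\]]*?\blink=(?|"([^"]*)"|\'([^\']*)\')[^\]]*\]/';
    preg_match_all($pattern, $content, $matches);

    return $matches[1];
}

// Invented sample: single quotes, a lightbox without a link,
// an unrelated shortcode, and an apostrophe inside another attribute.
$content = <<<'TXT'
[lightbox link='https://example.com/a.png' size="medium"]
[lightbox size="small"]
[gallery link="https://example.com/skip.png"]
[lightbox caption="it's" link="https://example.com/b.png"]
TXT;

$links = extract_lightbox_links($content);
print_r($links);
```

Because the pattern cannot cross a ‘]’, a lightbox shortcode without a link attribute simply produces no match instead of borrowing the link from a later shortcode.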