I’m working on a WordPress site where we have a contact form that lets users submit content which gets saved as draft posts. The moderators review these drafts before they go live on the site.
The problem is when people copy text from other websites or documents and paste it into our WYSIWYG editor field. All this messy code comes along with it like extra div tags, span elements, inline styles, and random line breaks. We want to keep the useful formatting like headings (h1, h2), bold text, and italic styling, but get rid of all the junk code.
Our users aren’t very tech-savvy so we can’t really tell them to paste as plain text and then format everything themselves. That would create way too much work for our admin team.
Is there a WordPress function or plugin that can automatically clean this pasted content and only keep the formatting tags we actually want? Any suggestions would be really helpful.
Been dealing with this for years on client sites. Skip filtering during paste or submission - handle it at display level instead. Run the content through wp_strip_all_tags() and then through a custom function that rebuilds only the approved elements. Big advantage: you keep the original submission in case the cleanup removes something important. Store the raw paste in post content and put the cleaned version in a custom field, so your moderators can compare both during review.
For cleaning, I built a simple parser that scans for text patterns and turns them into markup (text marked as bold becomes strong tags) instead of trying to preserve existing HTML. Most pasted formatting is garbage anyway, and rebuilding from text cues works way better than salvaging messy markup from Word or other sites. Plus you avoid most security issues since you’re never trusting external HTML structure - just make sure the raw copy is never output unescaped.
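A minimal sketch of the "rebuild from text cues" idea. The answer does not say which cues its parser looks for, so the ones here are purely an assumption for illustration: Markdown-style `**bold**`, `*italic*`, and lines starting with `#` or `##`. The function name `myprefix_rebuild_from_text()` is hypothetical, and `strip_tags()` is only a stand-in so the snippet also runs outside WordPress.

```php
<?php
/**
 * Hypothetical helper: throw away all pasted markup, then rebuild a small
 * set of tags from plain-text cues. The cues are assumptions, not the
 * original poster's actual rules.
 */
function myprefix_rebuild_from_text(string $raw): string
{
    // Inside WordPress, wp_strip_all_tags() also drops <script>/<style> contents.
    $text = function_exists('wp_strip_all_tags') ? wp_strip_all_tags($raw) : strip_tags($raw);

    // Escape what is left; the final "false" avoids double-encoding existing entities.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);

    $html = array();
    // Blank lines separate blocks (paragraphs or headings).
    foreach (preg_split('/\R{2,}/', trim($text)) as $block) {
        $block = trim($block);
        if ($block === '') {
            continue;
        }

        // Inline cues: **bold** first, then *italic*.
        $block = preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $block);
        $block = preg_replace('/\*(.+?)\*/s', '<em>$1</em>', $block);

        // Block cue: "# Title" or "## Title" becomes h1 or h2.
        if (preg_match('/^(#{1,2})\s+(.+)$/s', $block, $m)) {
            $level  = strlen($m[1]);
            $html[] = "<h{$level}>{$m[2]}</h{$level}>";
        } else {
            $html[] = '<p>' . $block . '</p>';
        }
    }

    return implode("\n", $html);
}
```

In a setup like the one described, the result could be saved with update_post_meta() under a meta key of your choosing while the raw paste stays in post content.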
I had the same issue on a WordPress site and got it working by customizing TinyMCE. paste_preprocess is a TinyMCE callback, so in functions.php it gets registered through the tiny_mce_before_init filter; mine strips unwanted tags but keeps essential formatting. The paste_auto_cleanup_on_paste option helps too - it cleans up messy formatting automatically when users paste content (it comes from the older TinyMCE 3.x paste plugin, so check that your editor version still supports it). For better control, I used the TinyMCE Advanced plugin (since renamed Advanced Editor Tools). It lets you filter pastes and only allow specific HTML tags like h1-h6, strong, and em. Users can paste whatever they want, but you get clean markup on the backend.
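Roughly what that could look like in functions.php, as a sketch rather than the exact code from this answer. `tiny_mce_before_init` is a core WordPress filter, and `valid_elements` and `paste_preprocess` are TinyMCE options. The function name, the tag list, and the regex in the callback are assumptions. As far as I know, WordPress prints a setting that starts with `function(` as raw JavaScript instead of a quoted string, which is what lets the callback work.

```php
<?php
/**
 * Hypothetical callback for the 'tiny_mce_before_init' filter.
 * Receives the TinyMCE settings array and returns it with paste rules added.
 */
function myprefix_tinymce_paste_settings(array $init): array
{
    // Elements the editor may keep. Listing a tag without [attributes]
    // allows no attributes; "strong/b" asks TinyMCE to turn <b> into <strong>.
    $init['valid_elements'] = 'h1,h2,h3,h4,h5,h6,p,br,strong/b,em/i,ul,ol,li';

    // JavaScript callback run on pasted HTML before it is inserted.
    // This one only removes div/span/font wrappers and keeps their text.
    $init['paste_preprocess'] = 'function(plugin, args) {'
        . ' args.content = args.content.replace(/<\/?(?:div|span|font)\b[^>]*>/gi, "");'
        . ' }';

    return $init;
}

// The guard only lets this file load outside WordPress; in a theme it is not needed.
if (function_exists('add_filter')) {
    add_filter('tiny_mce_before_init', 'myprefix_tinymce_paste_settings');
}
```

Editor-side rules like these are a convenience for users, not a security boundary, since anyone can post HTML to the form directly. The submission still needs server-side filtering.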
Check out wp_kses() - it’s built into WordPress core and perfect for this. You pass it a whitelist of only the tags you want (h1-h6, strong, em, etc.) and it strips everything else automatically. Just hook it into your form processing and run the content through wp_kses() before saving it as a draft. Way simpler than messing with TinyMCE settings.
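A short sketch of that approach. wp_kses(), wp_slash(), sanitize_text_field() and wp_insert_post() are core functions. The `myprefix_*` names and the exact tag list are assumptions, and the handler assumes its input has already been passed through wp_unslash().

```php
<?php
/**
 * Hypothetical helper: the whitelist in the format wp_kses() expects.
 * Each key is an allowed tag; an empty array means "no attributes allowed".
 */
function myprefix_allowed_submission_tags(): array
{
    $allowed = array();
    $tags    = array('h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'strong', 'em', 'p', 'br', 'ul', 'ol', 'li');
    foreach ($tags as $tag) {
        $allowed[$tag] = array();
    }
    return $allowed;
}

/**
 * Hypothetical form handler: clean the submitted HTML and store it as a draft.
 * $title and $raw_html are assumed to be unslashed already.
 */
function myprefix_save_submission_as_draft(string $title, string $raw_html)
{
    // Tags outside the whitelist are removed, along with every attribute.
    $clean = wp_kses($raw_html, myprefix_allowed_submission_tags());

    // wp_insert_post() expects slashed data, hence wp_slash().
    return wp_insert_post(wp_slash(array(
        'post_title'   => sanitize_text_field($title),
        'post_content' => $clean,
        'post_status'  => 'draft',
    )));
}
```

One thing to watch: wp_kses() removes disallowed tags but keeps the text inside them, so the contents of a pasted `<style>` block can survive as plain text unless you strip those elements separately.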
We hit this exact problem building a content portal at work. Ended up combining server-side filtering with a custom paste handler - worked great.
I built a function that catches content before it hits the database. Used DOMDocument with a whitelist approach to parse HTML and strip everything we didn’t want. Way more reliable than regex.
The trick was handling it on form submission, not in the editor. Users paste whatever junk they want, but when they submit, the backend only keeps h1-h6, strong, em, p, br, and ul/ol/li tags.
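Here is a rough sketch of that kind of DOMDocument whitelist filter, not the actual code from our portal. The function names are hypothetical, and dropping script/style elements together with their contents is an extra assumption on top of the tag list above. Disallowed tags are unwrapped so their text survives, and allowed tags lose all attributes.

```php
<?php
/** Hypothetical recursive helper: cleans the children of $node in place. */
function myprefix_dom_clean_node(DOMNode $node, array $allowed, array $dropped): void
{
    // Copy the child list first, because the loop modifies the tree.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMComment) {
            $node->removeChild($child);      // e.g. Word conditional comments
            continue;
        }
        if (!($child instanceof DOMElement)) {
            continue;                        // keep text nodes as they are
        }
        $tag = strtolower($child->nodeName);
        if (in_array($tag, $dropped, true)) {
            $node->removeChild($child);      // remove element and its contents
            continue;
        }
        myprefix_dom_clean_node($child, $allowed, $dropped);
        if (in_array($tag, $allowed, true)) {
            // Allowed tag: strip every attribute (style, class, id, ...).
            $names = array();
            foreach ($child->attributes as $attr) {
                $names[] = $attr->nodeName;
            }
            foreach ($names as $name) {
                $child->removeAttribute($name);
            }
        } else {
            // Disallowed tag: move its children up, then drop the wrapper.
            while ($child->firstChild !== null) {
                $node->insertBefore($child->firstChild, $child);
            }
            $node->removeChild($child);
        }
    }
}

/** Hypothetical entry point: returns $html reduced to the whitelisted tags. */
function myprefix_dom_clean_html(string $html): string
{
    $allowed = array('h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'strong', 'em', 'p', 'br', 'ul', 'ol', 'li');
    $dropped = array('script', 'style');

    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // silence warnings on messy markup
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    myprefix_dom_clean_node($body, $allowed, $dropped);

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return trim($out);
}
```

You would call myprefix_dom_clean_html() on the submitted field in the form handler, right before the draft is inserted.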
On the frontend, I added a JavaScript paste event that previews how content looks after cleanup. Takes 5 minutes to code and saves hours of moderation.
Users can paste normally, you get clean markup, and nobody’s workflow breaks. Beats forcing them to reformat everything by hand.