Fastest method to remove millions of records from small MySQL database

Hello everyone, I’m pretty new to databases so please bear with me. I have a MySQL database running on DigitalOcean with 2GB RAM and 1 vCPU. The database contains about 15 million test records across 4 tables that I need to delete regularly.

Currently it takes about 13 hours to clean up these records, which is way too slow. The test data is roughly five times the size of my actual production data. Each test record has a special marker column with a UUID value that identifies it as test data.

Right now I’m deleting records by finding each one using this UUID marker, but it’s incredibly slow. I was thinking about using table partitioning instead. My idea is to create partitions based on whether a record is test data or not, then just drop the entire partition containing test records.

Since I can’t just partition my existing tables in place (they have foreign keys, and MySQL doesn’t support foreign keys on partitioned tables), I would need to dump the data, recreate the tables with partitions, and reload everything. The steps would be removing the foreign keys, adding a computed column to identify test records, modifying the primary keys, creating the partitions, then dropping the test partition.
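Roughly what I have in mind for each table - the column names here are made up, and I’d populate the is_test flag from the UUID marker column while reloading the data:

    -- rebuilt table: flag column added, primary key extended to include it
    CREATE TABLE orders (
      id          BIGINT NOT NULL,
      uuid_marker CHAR(36) NULL,
      is_test     TINYINT NOT NULL DEFAULT 0,
      PRIMARY KEY (id, is_test)
    )
    PARTITION BY LIST (is_test) (
      PARTITION p_prod VALUES IN (0),
      PARTITION p_test VALUES IN (1)
    );

    -- regular cleanup would then be a near-instant metadata operation
    ALTER TABLE orders TRUNCATE PARTITION p_test;
    -- (or DROP PARTITION p_test if I never want the partition back)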

Would this partitioning approach work better for bulk deletion, or are there other faster methods I should consider? Any advice would be helpful since I need to minimize downtime.

just disable foreign key checks and TRUNCATE everything, then restore from your prod backup. yeah, it sounds scary but your production data’s small anyway - restoring will be way faster than waiting 13 hours for deletions. SET FOREIGN_KEY_CHECKS=0, truncate all tables, restore the prod data, flip checks back on. you’ll be done in 30 minutes instead of waiting half a day.
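rough sketch with made-up table names and backup file - swap in your real ones:

    -- in the mysql client, against the test database
    SET FOREIGN_KEY_CHECKS = 0;
    TRUNCATE TABLE orders;
    TRUNCATE TABLE order_items;
    TRUNCATE TABLE customers;
    TRUNCATE TABLE payments;
    SET FOREIGN_KEY_CHECKS = 1;

    -- then reload the production dump from the shell
    -- mysql your_database < prod_backup.sql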

Skip the complex partitioning - have you tried batch deletion with proper indexing? Your main problem is probably that the UUID marker column isn’t indexed, so every DELETE has to scan the table. Add a dedicated index on that column first. This alone might dramatically speed things up.

For the deletion itself, go with smaller batches instead of all at once. Try DELETE FROM table_name WHERE uuid_marker = 'your_test_uuid' LIMIT 10000 in a loop - see the sketch below. That prevents long lock waits and won’t crush your limited server resources. Run the batches back to back and track progress.

Here’s another idea: create new tables with just production data, then drop the old tables. You’ve got 15 million test records vs much smaller production data. It might be faster to INSERT ... SELECT only the production records into new tables, swap table names, and drop the old ones. That skips the DELETE overhead entirely and could beat your current 13-hour process.
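A minimal sketch of the index plus the batch loop, assuming a table called orders and a marker column uuid_marker (use your real names and the actual UUID):

    -- one-time: index the marker column so each batch is an index lookup, not a table scan
    CREATE INDEX idx_uuid_marker ON orders (uuid_marker);

    -- batched delete; with autocommit on, each 10k-row batch commits on its own
    DELIMITER //
    CREATE PROCEDURE purge_test_rows()
    BEGIN
      DECLARE rows_deleted INT DEFAULT 1;
      REPEAT
        DELETE FROM orders
         WHERE uuid_marker = 'your_test_uuid'
         LIMIT 10000;
        SET rows_deleted = ROW_COUNT();  -- rows removed by the last DELETE
      UNTIL rows_deleted = 0 END REPEAT;
    END //
    DELIMITER ;

    CALL purge_test_rows();

Repeat (or copy the procedure) for each of your four tables, and drop the index afterwards if nothing else filters on that column.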

Both methods work, but you’ll be babysitting the whole thing - scripting batches, watching progress, fixing errors, dealing with downtime. Not fun.

Just automate it. Set up a scheduled job that handles everything without you lifting a finger. It monitors your database, knows when cleanup’s needed, picks the best deletion strategy based on data volume, and switches approaches depending on server load.

For your situation, it’d start with batch deletions during off-peak hours but automatically switch to table recreation if deletions drag on too long. Handles foreign keys, creates temp tables, migrates data, swaps tables - all seamlessly.

Best part? Set it to run on whatever schedule fits your testing cycle. No more 13-hour manual marathons or guessing batch sizes.

You can build this database maintenance automation easily with a visual workflow builder. Connects straight to MySQL and handles logic, errors, and scheduling.
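For the scheduling piece itself, MySQL’s built-in event scheduler is enough to try the idea before wiring up a full workflow - a minimal sketch, reusing the hypothetical purge_test_rows() procedure from the batch-deletion answer and running nightly at 2am:

    -- needs the EVENT privilege; the scheduler must be enabled
    SET GLOBAL event_scheduler = ON;

    CREATE EVENT nightly_test_cleanup
      ON SCHEDULE EVERY 1 DAY
      STARTS CURRENT_DATE + INTERVAL 1 DAY + INTERVAL 2 HOUR
      DO CALL purge_test_rows();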

Your partitioning idea works, but there’s an easier way that’ll save you tons of trouble. Skip rebuilding everything with partitions and build fresh copies of the tables instead: create the same table structure (minus the data), then INSERT only your production records into these new tables. Since production data is way smaller than the test data, this backwards approach beats deleting 15 million rows by a mile.

After moving the production data, swap the table names and DROP the originals. You completely avoid those massive DELETE operations and work around your server’s resource limits. Downtime’s minimal since you’re only handling the small dataset.

You can even script the INSERT operations in chunks if you want. Way cleaner than fighting with partition recreation and foreign key headaches.
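For one table the whole swap is only a few statements - a sketch assuming a table called orders where production rows have a NULL uuid_marker (adjust the names and the WHERE condition to whatever your marker actually looks like):

    -- empty copy with the same columns and indexes
    -- (note: LIKE does not copy foreign key definitions - recreate those afterwards if needed)
    CREATE TABLE orders_new LIKE orders;

    -- copy only the much smaller production data
    INSERT INTO orders_new
    SELECT * FROM orders
     WHERE uuid_marker IS NULL;   -- or: uuid_marker <> 'your_test_uuid'

    -- atomic name swap, then throw away the old table with all the test rows
    RENAME TABLE orders TO orders_old, orders_new TO orders;
    DROP TABLE orders_old;

If other tables reference this one through foreign keys, you may need SET FOREIGN_KEY_CHECKS=0 around the rename and drop, and you’ll have to recreate those foreign keys on the new table yourself.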