Script to optimize MySQL by checking storage and compressing large .ibd files

I’m developing a bash script aimed at managing the disk usage of MySQL databases. The script will first examine the disk space and look for .ibd files that exceed 1GB in size in the MySQL data directory.

#!/bin/bash
df -k /var/lib/mysql
echo "==========="
# -size +1G matches .ibd files larger than 1 GiB; -print0 / read -d ''
# keeps paths intact even if they contain spaces
find /var/lib/mysql -name "*.ibd" -size +1G -print0 |
while IFS= read -r -d '' large_file; do
    echo "Current size: $(stat -c %s "$large_file") bytes"
    database_name=$(basename "$(dirname "$large_file")")
    table_name=$(basename "$large_file" .ibd)
    echo "DB: $database_name, Table: $table_name"
done

I’m looking to determine which tables and databases are using the most space so I can execute the OPTIMIZE TABLE command on them. Has anyone tried a similar method for maintaining MySQL? What’s the best strategy for automating this kind of storage management?

I’ve been doing this for three years. File path parsing will break on you if you’re using custom datadir setups or symlinks - learned that one the hard way. Now I just query INFORMATION_SCHEMA.TABLES - DATA_LENGTH + INDEX_LENGTH for size, DATA_FREE for fragmentation, all in one shot and way more reliable than scraping file paths. Quick heads up - you need innodb_file_per_table enabled, otherwise everything sits in the shared ibdata files and optimizing won’t give space back to the OS. I run analysis daily but save optimizations for weekly maintenance windows. Log your before/after sizes. Management loves seeing actual space savings when you need to justify those downtime windows.
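A rough sketch of that query, ready to pipe into the mysql client - `build_size_query` and the 1 GiB default threshold are my own naming and assumptions, not anything standard:

```shell
# Hypothetical helper: prints an INFORMATION_SCHEMA query for InnoDB tables
# over a byte-size threshold, largest first. Pipe into the mysql client.
build_size_query() {
  local min_bytes="${1:-1073741824}"   # default: 1 GiB
  cat <<SQL
SELECT table_schema, table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb,
       ROUND(data_free / 1024 / 1024) AS free_mb
FROM information_schema.TABLES
WHERE engine = 'InnoDB'
  AND (data_length + index_length) > ${min_bytes}
ORDER BY (data_length + index_length) DESC;
SQL
}

# Usage: build_size_query | mysql -u root -p --table
build_size_query 1073741824
```

This avoids touching the filesystem at all, so custom datadirs and symlinks stop being a problem.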

Your approach looks solid for finding large tables, but I’d add some safety checks based on my production experience. The biggest problem I hit was running OPTIMIZE TABLE during peak hours - it basically locked tables for ages. I started scheduling these during maintenance windows and checking for active connections first. For huge datasets, ALTER TABLE tablename ENGINE=InnoDB does essentially the same rebuild for InnoDB but I find its timing more predictable. Also check Data_free from SHOW TABLE STATUS before you start - not every large table actually needs optimization.
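That pre-flight check can be sketched as a small shell helper - `should_optimize` and the 10% threshold are assumptions of mine, and in practice the Data_free / Data_length values would come from SHOW TABLE STATUS rather than being hard-coded:

```shell
# Sketch of a pre-flight check: only optimize when fragmentation is high
# enough to be worth the rebuild. Arguments are Data_free and Data_length
# (bytes) as reported by SHOW TABLE STATUS.
should_optimize() {
  local data_free="$1" data_length="$2" threshold="${3:-10}"
  # fragmentation % = free space / (used + free), integer math
  local frag_pct=$(( data_free * 100 / (data_length + data_free) ))
  if [ "$frag_pct" -ge "$threshold" ]; then
    echo "optimize (fragmentation ${frag_pct}%)"
  else
    echo "skip (fragmentation ${frag_pct}%)"
  fi
}

should_optimize 314572800 1073741824   # 300 MB free on a ~1 GB table
```

Gating on fragmentation like this is what keeps the weekly window short - most large tables get skipped.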

good tips! i do something like that too, but make sure to monitor your InnoDB tables first. table locks can be a huge headache. just be cautious with OPTIMIZE on large tables, it can turn into a mess.

I’ve been fighting this same issue for years. Manual scripts work but become a maintenance nightmare when your database grows.

I ditched bash scripts for Latenode automation. Built workflows that check disk usage nightly, find large ibd files, cross-reference with fragmentation data from SHOW TABLE STATUS, then queue OPTIMIZE operations during quiet hours.

Best part? Direct integration with monitoring systems. Hit 85% disk usage and cleanup workflows trigger automatically. No more manual checks or missed maintenance windows.
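For anyone sticking with plain shell, the threshold trigger part is easy to approximate - `disk_usage_pct` here is a hypothetical helper, and you’d point it at your actual MySQL data partition:

```shell
# Sketch of a usage-threshold trigger: fire the cleanup once the data
# partition crosses 85%.
disk_usage_pct() {
  # POSIX df; column 5 is Use%, strip the trailing %
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# In production point this at the MySQL data partition, e.g. /var/lib/mysql
usage=$(disk_usage_pct /)
if [ "$usage" -ge 85 ]; then
  echo "disk at ${usage}% - trigger cleanup workflow"
else
  echo "disk at ${usage}% - below threshold, nothing to do"
fi
```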

You can add smart logic too - check active connections before operations, alert when tables are too risky to optimize, even integrate with backups to protect everything first.

Way more solid than cron jobs and bash scripts that break with every MySQL upgrade or directory change.