I’m working with a MySQL database that has a table containing a text column (varchar type). I need to write a query that will show me all the rows where the same value appears more than once in this specific column.
For example, if I have a column called ‘email’ and some email addresses appear multiple times in different rows, I want to find those duplicate email entries. I’ve tried a few approaches but I’m not sure what the most efficient method would be.
What would be the optimal SQL query to accomplish this task? I’m looking for a solution that can handle a reasonably large dataset without performance issues.
Hit this same issue last week! For big tables, run SELECT email FROM your_table GROUP BY email HAVING COUNT(*) > 1 to grab the duplicates first, then join back to your original table. Much faster than EXISTS on large datasets and won’t lock up your database.
u can use this sql: SELECT email, COUNT(*) FROM your_table GROUP BY email HAVING COUNT(*) > 1. it finds duplicates nicely & is pretty efficient even for large sets. hope this helps!
Here’s another way to tackle this - use a window function if you want to see the actual duplicate rows with their data. Run SELECT *, COUNT(*) OVER (PARTITION BY email) as duplicate_count FROM your_table then filter where duplicate_count > 1. This shows you the actual records instead of just counts. I find this super helpful when deciding which duplicates to keep or toss, since you can see all the data for each duplicate. Performance is fine for most datasets, though GROUP BY will be faster if you just need to identify duplicates.
I’ve hit this same issue on production databases. Adding an index on whatever column you’re checking will speed things up massively. The GROUP BY approach works, but if you want to see the actual duplicate rows, use EXISTS: SELECT * FROM your_table t1 WHERE EXISTS (SELECT 1 FROM your_table t2 WHERE t2.email = t1.email AND t2.id != t1.id). This shows you the duplicate records instead of just counts. Watch out for nulls though - multiple null entries won’t show as duplicates, so handle those separately if needed.
This topic was automatically closed 4 days after the last reply. New replies are no longer allowed.