How to retrieve all duplicate rows in MySQL database

I need help finding a way to get all the duplicate entries from my MySQL table. Right now I can identify which values appear multiple times using this query:

SELECT email, count(user_id) as total FROM customers
GROUP BY email HAVING total > 1

This gives me something like:

[email protected]    3

But what I really want is to see all the actual duplicate rows with their full data, like this:

MIKE    BROWN    [email protected]
SARA    DAVIS    [email protected]  
TOM     WHITE    [email protected]

Is there a way to do this with a single query instead of running the first query and then doing separate lookups for each duplicate value? I want to avoid making multiple database calls if possible.

cte is the way to go! Give this a shot: WITH cte AS (SELECT *, COUNT(*) OVER (PARTITION BY email) as cnt FROM customers) SELECT * FROM cte WHERE cnt > 1; - you’ll get all the dupes with full data. Good luck!

Window functions work great, but I’d automate the whole thing instead of running manual queries constantly.

Built something similar last year - customer data from multiple sources, duplicates everywhere. Instead of remembering SQL syntax and running queries manually, I automated it:

  • Runs duplicate detection on schedule
  • Exports to CSV or Google Sheets
  • Alerts when duplicates exceed thresholds
  • Auto-merges obvious duplicates with predefined rules

You can connect MySQL directly to notifications, data cleaning tools, whatever you need. No more copy-pasting results or running the same searches repeatedly.

The window function approach works for your immediate need. But if this is regular work, automation catches issues before they become problems and saves massive time.

Visual automation tools make building these workflows easy. Check out Latenode for connecting MySQL to other systems without code: https://latenode.com

try select * from customers where email in (select email from customers group by email having count(*) > 1) order by email - wraps your query in an IN clause. Way simpler than window functions or exists.

The Problem: You need to retrieve all rows from your MySQL customers table that have duplicate email addresses. You already have a query to identify the duplicate emails, but you want a single query to retrieve the complete data for all rows with those duplicate emails.

:thinking: Understanding the “Why” (The Root Cause):

The provided solution uses the EXISTS clause for efficient duplicate detection. The EXISTS clause is optimized for performance, especially on larger datasets, because it stops searching as soon as it finds a matching row. Unlike COUNT(*), which counts all matching rows before returning a result, EXISTS only needs to find one match to confirm the existence of duplicates. This makes it significantly faster, particularly in databases with many records. Furthermore, using EXISTS avoids the potential performance overhead associated with subqueries or joins that retrieve all duplicate rows before filtering them. The approach is therefore efficient for finding duplicate rows based on the specified criteria in one query.

:gear: Step-by-Step Guide:

  1. Execute the EXISTS Query: Connect to your MySQL database using a suitable client (e.g., MySQL Workbench, phpMyAdmin, or the command line). Paste and execute the following SQL query:
SELECT * FROM customers c1
WHERE EXISTS (
    SELECT 1 FROM customers c2
    WHERE c2.email = c1.email
    AND c2.user_id != c1.user_id
)
ORDER BY email;

This query selects all columns (*) from the customers table (aliased as c1). The WHERE EXISTS clause checks if another row (c2) exists with the same email address but a different user_id. The ORDER BY email clause sorts the results for better readability.

  1. Review the Results: The query will return all rows from the customers table where the email address appears more than once with different user_id values. Each row will contain the complete data for that entry.

:mag: Common Pitfalls & What to Check Next:

  • Index Optimization: For optimal performance with large tables, ensure that the email column has an index. Adding an index significantly speeds up the EXISTS subquery. You can check existing indexes using SHOW INDEX FROM customers; and add one using CREATE INDEX idx_email ON customers (email);.

  • Multiple Duplicate Columns: If you need to identify duplicates based on multiple columns (e.g., email and another field), you can extend the WHERE clause accordingly. For example, to find duplicates based on email and phone_number:

SELECT * FROM customers c1
WHERE EXISTS (
    SELECT 1 FROM customers c2
    WHERE c2.email = c1.email
    AND c2.phone_number = c1.phone_number
    AND c2.user_id != c1.user_id
)
ORDER BY email;
  • Case Sensitivity: MySQL’s case sensitivity depends on your configuration. If you’re not getting expected results, ensure case-insensitive comparison using functions like LOWER():
SELECT * FROM customers c1
WHERE EXISTS (
    SELECT 1 FROM customers c2
    WHERE LOWER(c2.email) = LOWER(c1.email)
    AND c2.user_id != c1.user_id
)
ORDER BY email;

:speech_balloon: Still running into issues? Share your (sanitized) config files, the exact command you ran, and any other relevant details. The community is here to help!

Here’s a subquery approach that joins back to your original table. Works great on medium datasets and it’s way easier to understand than window functions:

SELECT c.* FROM customers c
INNER JOIN (
    SELECT email FROM customers 
    GROUP BY email 
    HAVING COUNT(*) > 1
) duplicates ON c.email = duplicates.email
ORDER BY c.email;

Basically takes your duplicate-finding query and uses it as a filter to grab all matching rows. MySQL can optimize the subquery pretty well, so performance is solid. I’ve used this tons for data cleanup - handles several hundred thousand records no problem. Just index the email column and you’re good to go.

This topic was automatically closed 24 hours after the last reply. New replies are no longer allowed.