How to identify and display duplicate entries in a MySQL database?

Gizmo_Funny · May 18, 2025, 5:38pm

I’m working on a project where I need to find and show duplicate records in my MySQL database. I know how to count them, but I want to see the full details of each duplicate entry. Here’s what I’ve tried so far:

SELECT address, COUNT(id) AS cnt
FROM list
GROUP BY address
HAVING cnt > 1

This gives me the count of duplicates, but I want to see something like this:

Name     Surname    Address
Bob      Johnson    123 Oak St
Sarah    Williams   123 Oak St

Is there a way to do this in one query? I’d rather not use two separate queries if possible. Any help would be great!

SurfingWave · May 25, 2025, 8:45am

I’ve faced this issue in my database management tasks. A straightforward solution is using a correlated subquery. Here’s an example:

SELECT *
FROM list l1
WHERE EXISTS (
    SELECT 1
    FROM list l2
    WHERE l1.address = l2.address
    AND l1.id <> l2.id
)
ORDER BY address;

This query retrieves all columns for records with matching addresses but different IDs. It’s efficient for moderate-sized datasets and doesn’t require complex window functions or self-joins. I’ve found it particularly useful when I need to quickly identify and display duplicates for data cleaning or auditing purposes.

Remember to create an index on the ‘address’ column to optimize query performance, especially if you’re working with a large table.

SilentSailing34 · May 23, 2025, 8:39pm

I’ve encountered similar issues in my database projects. One efficient approach I’ve found is using a window function with ROW_NUMBER(). Here’s a query that might solve your problem:

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY address ORDER BY id) as row_num
    FROM list
) AS subquery
WHERE row_num > 1;

This method assigns a row number to each record within groups of the same address. By selecting rows where the number is greater than 1, you get all duplicates. It’s fast, especially for large datasets, and gives you full record details in one query.

Remember to index the ‘address’ column for better performance if you’re dealing with a lot of data. This approach has saved me countless hours in data cleaning tasks.

alexj · May 23, 2025, 1:32pm

hey there! i’ve dealt with this before. try using a self-join, like this:

SELECT l1.*
FROM list l1
JOIN (
  SELECT address
  FROM list
  GROUP BY address
  HAVING COUNT(*) > 1
) l2 ON l1.address = l2.address
ORDER BY l1.address;

this should give u the full details of duplicates in one go. hope it helps!