Removing duplicates from a table with no primary keys

Consider the following table:

mysql> SELECT * FROM Philosopher;
+---------------+-------------+-----------+
| philosopherID | firstName | lastName |
+---------------+-------------+-----------+
| 1234 | John | Locke |
| 1234 | John | Locke |
| 2345 | Rene | Descartes |
| 2347 | John Stuart | Mill |
| 1562 | Emmanuel | Kant |
| 1562 | Emmanuel | Kant |
| 1671 | Baruch | Spinoza |
| 1562 | Emmanuel | Kant |
| 1761 | Jean-Paul | Sartre |
+---------------+-------------+-----------+
9 rows in set (0.00 sec)

Come up with a strategy to first identify and then remove the duplicate rows.

What SQL queries would you use ?

Solution

1. Identify duplicate rows:

SELECT * FROM Philosopher
 GROUP BY philosopherID, firstName, lastName HAVING COUNT(*) > 1;

2. Remove duplicate rows

i. Create temporary table and copy data over:

CREATE TEMPORARY TABLE PhilosopherDups
(
 rowId INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
 philosopherID INT,
 firstName VARCHAR(50),
 lastName varchar(50)
)

SELECT philosopherID,firstName, lastName FROM Philosopher;

ii. Truncate original table and do SELECT DISTINCT:

TRUNCATE TABLE Philosopher;
INSERT INTO Philosopher(philosopherID, firstName, lastName)
SELECT DISTINCT philosopherID, firstName, lastName 
FROM PhilosopherDups;

iii. Another option to using DISTINCT would be to use the rowId in the temptable to delete duplicate rows, truncate the original table and copy data back:

DELETE FROM PhilosopherDups
WHERE rowID IN 
SELECT MAX(rowId) FROM PhilosopherDups  
GROUP BY philosopherID, firstName, lastName 
HAVING COUNT(*) > 1;

TRUNCATE TABLE Philosopher;

INSERT INTO Philosopher(philosopherID, firstName, lastName)
SELECT philosopherID, firstName, lastName 
FROM PhilosopherDups;

Leave a Reply

Your email address will not be published. Required fields are marked *