Navicat Blog

January 16, 2018 by Robert Gravelle

The majority of duplicate records fall into one of two categories: Duplicate Meaning and Non-unique Keys. The How to Spot and Delete Values with Duplicate Meaning in MySQL blog dealt with Duplicate Meaning; the follow-up addressed how to identify Non-unique Keys. That’s where two records in the same table have the same key, but may or may not have different values and meanings. Today’s blog will cover how to delete rows with duplicated data, but with different keys.

Identifying Duplicates by Type

The last query presented in the How to Identify Duplicates with Non-unique Keys in MySQL blog listed all the duplicates in a format that was easy to visually scan through:

Repetitions     row_data
----------------------------------------------------------------
2 22 (DAVIS, JENNIFER) | 22 (DAVIS, JENNIFER)
2 23 (LOLLOBRIGIDA, JOHNNY) | 23 (GABLE, CHRISTIAN)
2 41 (WAHLBERG, NICK) | 12 (WAHLBERG, NICK)

Having identified all of the duplicated keys and values, we can decide how best to deal with the redundant data.

JENNIFER DAVIS appears in two records with the same key of 22, making those rows exact duplicates. Nick Walberg’s name fields are duplicated, but the IDs are not. There is also a duplicated key that is associated with two unrelated actors: #23 for “JOHNNY LOLLOBRIGIDA” and “CHRISTIAN GABLE”. With regards to the duplicated keys of 22 and 23, the first is a true duplicate, whereas the second only needs a new key to be generated for one of the records.

Deleting Rows using DELETE JOIN

In the How to Spot and Delete Values with Duplicate Meaning in MySQL blog, we removed duplicates from SELECT result sets by performing a Search & Replace on values. Here we will permanently delete one of the duplicated rows using the DELETE JOIN statement.

Since we are comparing fields from the same table, we have to join the table to itself. We can choose to keep either the lower or higher id number by comparing the ids in the WHERE clause. The following statement keeps the highest id:

DELETE a FROM wp.amalgamated_actors a
  INNER JOIN wp.amalgamated_actors a2
WHERE a.id < a2.id
AND   a.first_name = a2.first_name
AND   a.last_name  = a2.last_name;

1 row(s) affected	0.093 sec

In this case, the affected (deleted) row is NICK WAHLBERG with an id of 12. A quick SELECT confirms the result:

id     first_name     last_name
-------------------------------------
10 PENELOPE GUINESS
12 NICK WAHLBERG
14 ED CHASE
22 JENNIFER DAVIS
23 JOHNNY LOLLOBRIGIDA
27 BETTE NICHOLSON
34 GRACE MOSTEL
39 JOE SWANK
23 CHRISTIAN GABLE
22 JENNIFER DAVIS

If you wanted to keep the lowest id, you would just change the a.id < a2.id expression to a.id > a2.id.

id     first_name     last_name
-------------------------------------
10 PENELOPE GUINESS
14 ED CHASE
22 JENNIFER DAVIS
23 JOHNNY LOLLOBRIGIDA
27 BETTE NICHOLSON
34 GRACE MOSTEL
41 NICK WAHLBERG
39 JOE SWANK
23 CHRISTIAN GABLE
22 JENNIFER DAVIS

Deleting Rows with Non-unique Keys

In the case of JENNIFER DAVIS, who appears twice with the same id of 22, we would need to employ a different approach because running the above statement with a.id = a2.id will target every row in the table! The reason is that we are essentially matching every row against itself! In the next blog, we’ll learn how to delete rows with non-unique keys such as these.

Navicat Blogs
Feed Entries
Blog Archives
Share