Help Center/ GaussDB(for MySQL)/ Troubleshooting/ SQL Issues/ Duplicate Data Exists After ORDER BY LIMIT Is Executed
Updated on 2024-09-05 GMT+08:00

Duplicate Data Exists After ORDER BY LIMIT Is Executed

Scenario

Sorting a table and paginating the results did not yield the expected outcome.

Suppose there is a table called merchants with only two columns: id and category. The table structure is as follows:

mysql> show create table merchants;
+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table     | Create Table                                                                                                                                                                                            |
+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| merchants | CREATE TABLE `merchants` (
  `id` int NOT NULL AUTO_INCREMENT,
  `category` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Run the following SQL statement to view data in the table:

mysql>  select * from merchants;
+----+----------+
| id | category |
+----+----------+
|  1 |        1 |
|  2 |        3 |
|  3 |        2 |
|  4 |        2 |
|  5 |        1 |
|  6 |        2 |
|  7 |        3 |
|  8 |        3 |
|  9 |        2 |
| 10 |        1 |
+----+----------+
10 rows in set (0.00 sec)

Run the following SQL statement to sort the data by category:

mysql> select * from merchants order by category;
+----+----------+
| id | category |
+----+----------+
|  1 |        1 |
|  5 |        1 |
| 10 |        1 |
|  3 |        2 |
|  4 |        2 |
|  6 |        2 |
|  9 |        2 |
|  2 |        3 |
|  7 |        3 |
|  8 |        3 |
+----+----------+
10 rows in set (0.00 sec)

Run the following SQL statement to paginate the sorted results with a limit of two rows per page:

mysql> select * from merchants order by category limit 0,2;
+----+----------+
| id | category |
+----+----------+
|  1 |        1 |
|  5 |        1 |
+----+----------+
2 rows in set (0.00 sec)
 
mysql> select * from merchants order by category limit 2,2;
+----+----------+
| id | category |
+----+----------+
|  1 |        1 |
|  9 |        2 |
+----+----------+
2 rows in set (0.00 sec)

The data on the second page is incorrect. According to the original sorting order without pagination, the second page should display rows with IDs 10 and 3. However, the actual result shows rows with IDs 1 and 9.

Possible Causes

When the optimizer encounters an ORDER BY LIMIT statement, the optimizer uses a priority queue structure for sorting. However, this sorting method is considered unstable. Once the LIMIT n results are filtered, they are directly returned without any guarantee of order.

Solution

  • Solution 1: Add an index to the column that needs to be sorted.

    Example: alter table ratings add index idx_category (category);

  • Solution 2: Add the primary key column after ORDER BY in the sorting statement.

    Example: select * from ratings order by category, id limit 2,2;

  • Solution 3: On the Parameters page of the GaussDB(for MySQL) console, set rds_force_stable_sort to ON. This parameter forces the use of stable sorting algorithms to ensure stable sorting results.
    Figure 1 Setting the rds_force_stable_sort parameter