Updated on 2025-05-29 GMT+08:00

Updating Data in a Table

Existing data in a database can be updated. You can update one row, all rows, or specified rows of data, or update data in a single column without affecting the data in the other columns.

The following information is required when the UPDATE statement is used to update rows:

  • Name of a table and name of a column to be updated
  • New column value
  • Rows to be updated

SQL does not provide a unique identifier for data rows. Therefore, the WHERE clause is used to locate the target row. If a primary key is defined in a table, the primary key value can be used to accurately specify the row to be updated. Otherwise, the update must depend on the combination of service logic conditions.

For details about how to create a table and insert data to it, see Creating Tables and Inserting Data to Tables.

The value of c_customer_sk in the table customer_t1 must be changed from 9527 to 9876.

1
gaussdb=# UPDATE customer_t1 SET c_customer_sk = 9876 WHERE c_customer_sk = 9527;

The table name can also be qualified with a schema name. Otherwise, the table is queried from the default schema of the database. SET is followed by the column and the new column value. The new column value can be a constant or an expression.

For example, increase all the values in the c_customer_sk column by 100.

1
gaussdb=# UPDATE customer_t1 SET c_customer_sk = c_customer_sk + 100;

This statement does not contain the WHERE clause, and therefore all rows are updated. If the statement contains the WHERE clause, only the rows matching the clause are updated.

In the SET clause, the equal sign (=) indicates value setting. In the WHERE clause, the equal sign indicates comparison. A WHERE condition does not have to be an equality comparison, but can be another operator.

You can use an UPDATE statement to update multiple columns by specifying multiple values in the SET clause, for example:

1
gaussdb=# UPDATE customer_t1 SET  c_customer_id = 'Admin', c_first_name = 'Local' WHERE c_customer_sk = 4421; 

After data has been updated or deleted in batches, a large number of deletion markers are generated in the data file. During query, data with these deletion markers needs to be scanned as well. In this case, a large amount of data with deletion marks can greatly affect the query performance after batch updates or deletions. If data needs to be updated or deleted in batches frequently, you are advised to periodically run the VACUUM FULL statement to ensure the query performance.

Updating Data in a Table in Batches

When you update a table that contains millions of records in batches, the table will be locked if you directly perform the update operation, which significantly affects the performance of other operations. To avoid this, you can use the following methods to optimize batch update operations:

  • Updating data in batches

    You can split a large-scale update into multiple small batches. Only some data is updated each time to shorten the time for locking a table at a time.

    For example, the original batch update statement is UPDATE products SET price = price * 1.1 WHERE categroy = 'Electronics';

    Optimization solution: Split the batch update statement into multiple UPDATE statements by ID and update some data each time.

    1
    2
    3
    4
    gaussdb=# UPDATE products SET price = price * 1.1 WHERE categroy = 'Electronics' AND id >= 1 and id <= 10000;
    gaussdb=# UPDATE products SET price = price * 1.1 WHERE categroy = 'Electronics' AND id > 10000 and id <= 20000;
    gaussdb=# UPDATE products SET price = price * 1.1 WHERE categroy = 'Electronics' AND id > 20000 and id <= 30000;
    ...
    
  • Using a temporary table

    If a transaction contains complex queries, you can insert the records to be updated into a temporary table and then update the main table from the temporary table to prevent the data rows from being locked for a long time.

    For example, the original batch update statement is UPDATE products SET price = price * 1.1 WHERE (size = 101 OR size = 102) AND categroy = 'Electronics' AND amount > 150;

    Optimization solution: Create a temporary table to store the target records that meet the conditions, update the records, and then update the main table from the temporary table.

    1
    2
    3
    gaussdb=# CREATE TEMPORARY TABLE temp_products AS SELECT id, price * 1.1 as new_price from products WHERE (size = 101 OR size = 102) AND categroy = 'Electronics' AND amount > 150;
    gaussdb=# UPDATE products p SET price = t.new_price FROM temp_products t where p.id = t.id;
    gaussdb=# DROP TABLE temp_products;
    
  • Using parallel updates

    In a multi-thread or multi-process environment, you can update records in parallel, that is, multiple threads update different records at the same time. This requires that the records processed by each thread do not overlap.

    For example, the original batch update statement is UPDATE products SET price = price * 1.1 WHERE categroy = 'Electronics';

    Optimization solution: Split the data into multiple small batches by ID, start multiple sessions to execute the UPDATE operation at the same time, and reduce the total time required for batch operations through parallel updates. Ensure that the tuples updated by each session do not overlap.

    session1:

    1
    gaussdb=# UPDATE products SET price = price * 1.1 WHERE categroy = 'Electronics' AND id >= 1 and id <= 10000;
    

    session2:

    1
    gaussdb=# UPDATE products SET price = price * 1.1 WHERE categroy = 'Electronics' AND id > 10000 and id <= 30000;
    

    session3:

    1
    gaussdb=# UPDATE products SET price = price * 1.1 WHERE categroy = 'Electronics' AND id > 30000;