Updated on 2026-01-09 GMT+08:00

Basic SELECT Statements

Function

This statement is a basic query statement and is used to return the query results.

Syntax

1
2
3
4
5
6
7
SELECT [ALL | DISTINCT] attr_expr_list FROM table_reference
  [WHERE where_condition]
  [GROUP BY col_name_list]
  [ORDER BY col_name_list][ASC | DESC]
  [CLUSTER BY col_name_list | DISTRIBUTE BY col_name_list]
  [SORT BY col_name_list]]
  [LIMIT number];

Keywords

Table 1 SELECT keywords

Parameter

Description

ALL

The keyword ALL is used to return all matching rows from a database, including duplicates. It can only be followed by an asterisk (*); otherwise, an error will occur.

This is the default behavior in SQL statements and is typically not written explicitly. If neither ALL nor DISTINCT is specified, the query result will include all rows, even if they are duplicates.

DISTINCT

When the DISTINCT keyword is used in a SELECT statement, the system removes duplicate data from the query results to ensure uniqueness.

WHERE

Specifies the filter criteria for a query. Arithmetic operators, relational operators, and logical operators are supported.

where_condition

Filter criteria.

GROUP BY

Specifies the grouping field. Single-field grouping and multi-field grouping are supported.

col_name_list

Field list

ORDER BY

Sort the query results.

ASC/DESC

ASC sorts from the lowest value to the highest value. DESC sorts from the highest value to the lowest value. ASC is the default sort order.

CLUSTER BY

CLUSTER BY is used to bucket the table according to the bucketing fields and then sort within the bucketed table. If the field of DISTRIBUTE BY is the same as the field of SORT BY and the sorting is in descending order, the combination of DISTRIBUTE BY and SORT BY achieves the same function as CLUSTER BY.

DISTRIBUTE BY

Specifies the bucketing fields without sorting the table.

SORT BY

The objects will be sorted in the bucket.

LIMIT

LIMIT is used to limit the query results. Only INT type is supported by the number parameter.

Precautions

  • The table to be queried must already exist, or an error message will be displayed.
  • When submitting SQL statements to display binary data on the DLI management console, the binary data is converted to Base64.
  • When using not in in the WHERE clause, ensure that the subquery result set does not contain any null value. Once null exists, the entire query result will be an empty set. Refer to FAQ: What Can I Do If the Query Result Is Empty When the Not In Subquery Contains Null Values? for the solution.

Example

To filter the record, in which the name is Mike, from the student table and sort the results in ascending order of score, run the following statement:

1
2
3
SELECT * FROM student
  WHERE name = 'Mike'
  ORDER BY score;

FAQ: What Can I Do If the Query Result Is Empty When the Not In Subquery Contains Null Values?

  • Issue overview: In SQL syntax, if there is even one null value in the NOT IN subquery, the entire query result will be an empty set (no data returned).
  • Example: In the following example, the expected result is 3, but the actual return is null.
    -- Table A
    id
    ---
    1
    2
    3
    
    -- Table B
    value
    -----
    1
    2

    Run the following SQL statements:

    SELECT * FROM A
    WHERE id NOT IN (SELECT value FROM B);
  • Possible causes

    id NOT IN (1, 2, NULL)

    Is equivalent to

    id <> 1 AND id <> 2 AND id <> NULL

    If the result of id <> NULL is UNKNOWN (three-valued logic in SQL), the entire expression is FALSE.

  • Solution:

    To avoid NULL resulting in an empty set, consider the following solutions:

    • Solution 1: Use NOT EXISTS instead of NOT IN.
      SELECT * FROM A
      WHERE NOT EXISTS (
        SELECT 1 FROM B WHERE B.value = A.id
      );
    • Solution 2: Filter out NULL values in the subquery.
      SELECT * FROM A
      WHERE id NOT IN (
        SELECT value FROM B WHERE value IS NOT NULL
      );