Updated on 2024-06-03 GMT+08:00

Skew Hints

Description

Specifies redistribution keys containing skew data and skew values, and are used to optimize redistribution involving Join or HashAgg.

Syntax

  • Specify single-table skew.
    1
    skew( [@queryblock] table (column) [(value)])
    
  • Specify intermediate result skew.
    1
    skew( [@queryblock] (join_rel) (column) [(value)])
    

Parameters

  • For details about @queryblock, see Hint Specifying the Query Block Where the Hint Is Located. @queryblock can be omitted, indicating that the hint takes effect in the current query block.
  • table specifies the table where skew occurs.
  • join_rel specifies two or more joined tables. For example, (t1 t2) indicates that the result of joining t1 and t2 tables contains skew data.
  • column specifies one or more columns where skew occurs.
  • value specifies one or more skew values.
  • Skew hints are used only if redistribution is required and the specified skew information matches the redistribution information.
  • Skew hints are controlled by the GUC parameter skew_option. If the parameter is disabled, skew hints cannot be used for solving skew.
  • Currently, skew hints support only the table relationships of the ordinary table and subquery types. Hints can be specified for base tables, subqueries, and WITH ... AS clauses. Unlike other hints, a subquery can be used in skew hints regardless of whether it is pulled up.
  • Use an alias (if any) to specify a table where data skew occurs.
  • You can use a name or an alias to specify a skew column as long as it is not ambiguous. The columns in skew hints cannot be expressions. If data skew occurs in the redistribution that uses an expression as a redistribution key, set the redistribution key as a new column and specify the column in skew hints.
  • The number of skew values must be an integer multiple of the number of columns. Skew values must be grouped based on the column sequence, with each group containing a maximum of 10 values. You can specify duplicate values to group skew columns having different number of skew values. For example, the c1 and c2 columns of the t1 table contain skew data. The skew value of the c1 column is a1, and the skew values of the c2 column are b1 and b2. In this case, the skew hint is skew(t1 (c1 c2) ((a1 b1)(a1 b2))). (a1 b1) is a value group, where NULL is allowed as a skew value. Each hint can contain a maximum of 10 groups and the number of groups should be an integer multiple of the number of columns.
  • In the redistribution optimization of Join, a skew value must be specified for skew hints. The skew value can be left empty for HashAgg.
  • If multiple tables, columns, or values are specified, separate items of the same type with spaces.
  • For skew values, forcible type conversion is not supported in hints. For the string type, single quotation marks are required.

Example:

  • Specify single-table skew.

    Each skew hint describes the skew information of one table relationship. To describe the skews of multiple table relationships in a query, specify multiple skew hints.

    Skew hints have the following formats:

    • One skew value in one column: skew(t (c1) (v1))

      Description: The v1 value in the c1 column of the t table relationship causes skew in query execution.

    • Multiple skew values in one column: skew(t (c1) (v1 v2 v3 ...))

      Description: Values including v1, v2, and v3 in the c1 column of the t table relationship cause skew in query execution.

    • Multiple columns, each having one skew value: skew(t (c1 c2) (v1 v2))

      Description: The v1 value in the c1 column and the v2 value in the c2 column of the t table relationship cause skew in query execution.

    • Multiple columns, each having multiple skew values: skew(t (c1 c2) ((v1 v2) (v3 v4) (v5 v6) ...))

      Description: Values including v1, v3, and v5 in the c1 column and values including v2, v4, and v6 in the c2 column of the t table relationship cause skew in query execution.

      In the last format, parentheses for skew value groups can be omitted, for example, skew(t (c1 c2) (v1 v2 v3 v4 v5 v6 ...)). In a skew hint, either use parentheses for all skew value groups or for none of them.

      Otherwise, a syntax error will be generated. For example, skew(t (c1 c2) (v1 v2 v3 v4 (v5 v6) ...)) will generate an error.

  • Specify intermediate result skew.

    If data skew does not occur in base tables but in an intermediate result during query execution, specify skew hints of the intermediate result to solve the skew. skew((t1 t2) (c1) (v1))

    Description: Data skew occurs after the table relationships t1 and t2 are joined. The c1 column of the t1 table contains skew data and its skew value is v1.

    c1 can exist only in a table relationship of join_rel. If there is another column having the same name, use aliases to avoid ambiguity.

Suggestion

  • For a multi-level query, write the hint on the layer where data skew occurs.
  • For a listed subquery, you can specify the subquery name in a hint. If you know data skew occurs on which base table, directly specify the table.
  • Aliases are preferred when you specify a table or column in a hint.