Bu sayfa henüz yerel dilinizde mevcut değildir. Daha fazla dil seçeneği eklemek için yoğun bir şekilde çalışıyoruz. Desteğiniz için teşekkür ederiz.

Compute
Elastic Cloud Server
Huawei Cloud Flexus
Bare Metal Server
Auto Scaling
Image Management Service
Dedicated Host
FunctionGraph
Cloud Phone Host
Huawei Cloud EulerOS
Networking
Virtual Private Cloud
Elastic IP
Elastic Load Balance
NAT Gateway
Direct Connect
Virtual Private Network
VPC Endpoint
Cloud Connect
Enterprise Router
Enterprise Switch
Global Accelerator
Management & Governance
Cloud Eye
Identity and Access Management
Cloud Trace Service
Resource Formation Service
Tag Management Service
Log Tank Service
Config
OneAccess
Resource Access Manager
Simple Message Notification
Application Performance Management
Application Operations Management
Organizations
Optimization Advisor
IAM Identity Center
Cloud Operations Center
Resource Governance Center
Migration
Server Migration Service
Object Storage Migration Service
Cloud Data Migration
Migration Center
Cloud Ecosystem
KooGallery
Partner Center
User Support
My Account
Billing Center
Cost Center
Resource Center
Enterprise Management
Service Tickets
HUAWEI CLOUD (International) FAQs
ICP Filing
Support Plans
My Credentials
Customer Operation Capabilities
Partner Support Plans
Professional Services
Analytics
MapReduce Service
Data Lake Insight
CloudTable Service
Cloud Search Service
Data Lake Visualization
Data Ingestion Service
GaussDB(DWS)
DataArts Studio
Data Lake Factory
DataArts Lake Formation
IoT
IoT Device Access
Others
Product Pricing Details
System Permissions
Console Quick Start
Common FAQs
Instructions for Associating with a HUAWEI CLOUD Partner
Message Center
Security & Compliance
Security Technologies and Applications
Web Application Firewall
Host Security Service
Cloud Firewall
SecMaster
Anti-DDoS Service
Data Encryption Workshop
Database Security Service
Cloud Bastion Host
Data Security Center
Cloud Certificate Manager
Edge Security
Situation Awareness
Managed Threat Detection
Blockchain
Blockchain Service
Web3 Node Engine Service
Media Services
Media Processing Center
Video On Demand
Live
SparkRTC
MetaStudio
Storage
Object Storage Service
Elastic Volume Service
Cloud Backup and Recovery
Storage Disaster Recovery Service
Scalable File Service Turbo
Scalable File Service
Volume Backup Service
Cloud Server Backup Service
Data Express Service
Dedicated Distributed Storage Service
Containers
Cloud Container Engine
Software Repository for Container
Application Service Mesh
Ubiquitous Cloud Native Service
Cloud Container Instance
Databases
Relational Database Service
Document Database Service
Data Admin Service
Data Replication Service
GeminiDB
GaussDB
Distributed Database Middleware
Database and Application Migration UGO
TaurusDB
Middleware
Distributed Cache Service
API Gateway
Distributed Message Service for Kafka
Distributed Message Service for RabbitMQ
Distributed Message Service for RocketMQ
Cloud Service Engine
Multi-Site High Availability Service
EventGrid
Dedicated Cloud
Dedicated Computing Cluster
Business Applications
Workspace
ROMA Connect
Message & SMS
Domain Name Service
Edge Data Center Management
Meeting
AI
Face Recognition Service
Graph Engine Service
Content Moderation
Image Recognition
Optical Character Recognition
ModelArts
ImageSearch
Conversational Bot Service
Speech Interaction Service
Huawei HiLens
Video Intelligent Analysis Service
Developer Tools
SDK Developer Guide
API Request Signing Guide
Terraform
Koo Command Line Interface
Content Delivery & Edge Computing
Content Delivery Network
Intelligent EdgeFabric
CloudPond
Intelligent EdgeCloud
Solutions
SAP Cloud
High Performance Computing
Developer Services
ServiceStage
CodeArts
CodeArts PerfTest
CodeArts Req
CodeArts Pipeline
CodeArts Build
CodeArts Deploy
CodeArts Artifact
CodeArts TestPlan
CodeArts Check
CodeArts Repo
Cloud Application Engine
MacroVerse aPaaS
KooMessage
KooPhone
KooDrive

Skew Hints

Updated on 2024-04-28 GMT+08:00

Function

Theses hints specify redistribution keys containing skew data and skew values, and are used to optimize redistribution involving Join or HashAgg.

Syntax

  • Specify single-table skew.
    1
    skew(table (column) [(value)])
    
  • Specify intermediate result skew.
    1
    skew((join_rel) (column) [(value)])
    

Parameter Description

  • table specifies the table where skew occurs.
  • join_rel specifies two or more joined tables. For example, (t1 t2) indicates that the result of joining t1 and t2 tables contains skew data.
  • column specifies one or more columns where skew occurs.
  • value specifies one or more skew values.
NOTE:
  • Skew hints are used only if redistribution is required and the specified skew information matches the redistribution information.
  • Skew hints are controlled by the GUC parameter skew_option. If the parameter is disabled, skew hints cannot be used for solving skew.
  • Currently, skew hints support only the table relationships of the ordinary table and subquery types. Hints can be specified for base tables, subqueries, and WITH ... AS clauses. Unlike other hints, a subquery can be used in skew hints regardless of whether it is pulled up.
  • Use an alias (if any) to specify a table where data skew occurs.
  • You can use a name or an alias to specify a skew column as long as it is not ambiguous. The columns in skew hints cannot be expressions. If data skew occurs in the redistribution that uses an expression as a redistribution key, set the redistribution key as a new column and specify the column in skew hints.
  • The number of skew values must be an integer multiple of the number of columns. Skew values must be grouped based on the column sequence, with each group containing a maximum of 10 values. You can specify duplicate values to group skew columns having different number of skew values. For example, the c1 and c2 columns of the t1 table contains skew data. The skew value of the c1 column is a1, and the skew values of the c2 column are b1 and b2. In this case, the skew hint is skew(t1 (c1 c2) ((a1 b1)(a1 b2))). (a1 b1) is a value group, where NULL is allowed as a skew value. Each hint can contain a maximum of 10 groups and the number of groups should be an integer multiple of the number of columns.
  • In the redistribution optimization of Join, a skew value must be specified for skew hints. The skew value can be left empty for HashAgg.
  • If multiple tables, columns, or values are specified, separate items of the same type with spaces.
  • The type of skew values cannot be forcibly converted in hints. To specify a string, enclose it with single quotation marks (' ').

Example:

  • Specify single-table skew.

    Each skew hint describes the skew information of one table relationship. To describe the skews of multiple table relationships in a query, specify multiple skew hints.

    Skew hints have the following formats:

    • One skew value in one column: skew(t (c1) (v1))

      Description: The v1 value in the c1 column of the t table relationship causes skew in query execution.

    • Multiple skew values in one column: skew(t (c1) (v1 v2 v3 ...))

      Description: Values including v1, v2, and v3 in the c1 column of the t table relationship cause skew in query execution.

    • Multiple columns, each having one skew value: skew(t (c1 c2) (v1 v2))

      Description: The v1 value in the c1 column and the v2 value in the c2 column of the t table relationship cause skew in query execution.

    • Multiple columns, each having multiple skew values: skew(t (c1 c2) ((v1 v2) (v3 v4) (v5 v6) ...))

      Description: Values including v1, v3, and v5 in the c1 column and values including v2, v4, and v6 in the c2 column of the t table relationship cause skew in query execution.

      NOTICE:

      In the last format, parentheses for skew value groups can be omitted, for example, skew(t (c1 c2) (v1 v2 v3 v4 v5 v6 ...)). In a skew hint, either use parentheses for all skew value groups or for none of them.

      Otherwise, a syntax error will be generated. For example, skew(t (c1 c2) (v1 v2 v3 v4 (v5 v6) ...)) will generate an error.

  • Specify intermediate result skew.

    If data skew does not occur in base tables but in an intermediate result during query execution, specify skew hints of the intermediate result to solve the skew. The format is skew((t1 t2) (c1) (v1)).

    Description: Data skew occurs after the table relationships t1 and t2 are joined. The c1 column of the t1 table contains skew data and its skew value is v1.

    c1 can exist only in a table relationship of join_rel. If there is another column having the same name, use aliases to avoid ambiguity.

Suggestion

  • For a multi-level query, write the hint on the layer where data skew occurs.
  • For a listed subquery, you can specify the subquery name in a hint. If you know data skew occurs on which base table, directly specify the table.
  • Aliases are preferred when you specify a table or column in a hint.

Examples

Specify single-table skew.

  • Specify hints in the original query.

    For example, the original query is as follows:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    explain
    with customer_total_return as
    (select sr_customer_sk as ctr_customer_sk
    ,sr_store_sk as ctr_store_sk
    ,sum(SR_FEE) as ctr_total_return
    from store_returns
    ,date_dim
    where sr_returned_date_sk = d_date_sk
    and d_year =2000
    group by sr_customer_sk
    ,sr_store_sk)
     select  c_customer_id
    from customer_total_return ctr1
    ,store
    ,customer
    where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
    from customer_total_return ctr2
    where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
    and s_store_sk = ctr1.ctr_store_sk
    and s_state = 'NM'
    and ctr1.ctr_customer_sk = c_customer_sk
    order by c_customer_id
    limit 100;
    

    Specify the hints of HashAgg in the inner with clause and of the outer Hash Join. The query containing hints is as follows:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    explain
    with customer_total_return as
    (select /*+ skew(store_returns(sr_store_sk sr_customer_sk)) */sr_customer_sk as ctr_customer_sk
    ,sr_store_sk as ctr_store_sk
    ,sum(SR_FEE) as ctr_total_return
    from store_returns
    ,date_dim
    where sr_returned_date_sk = d_date_sk
    and d_year =2000
    group by sr_customer_sk
    ,sr_store_sk)
     select  /*+ skew(ctr1(ctr_customer_sk)(11))*/  c_customer_id
    from customer_total_return ctr1
    ,store
    ,customer
    where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
    from customer_total_return ctr2
    where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
    and s_store_sk = ctr1.ctr_store_sk
    and s_state = 'NM'
    and ctr1.ctr_customer_sk = c_customer_sk
    order by c_customer_id
    limit 100;
    

    The hints indicate that the group by in the inner with clause contains skew data during redistribution by HashAgg, corresponding to the original Hash Agg operators 10 and 21; and that the ctr_customer_sk column in the outer ctr1 table contains skew data during redistribution by Hash Join, corresponding to operator 6 in the original plan. The optimized plan is as follows:

    To solve data skew in the redistribution, Hash Agg is changed to double-level Agg operators and the redistribution operators used by Hash Join are changed in the optimized plan.

  • Modify the query and then specify hints.

    For example, the original query and its plan are as follows:

    1
    explain select count(*) from store_sales_1 group by round(ss_list_price);
    

    Columns in hints do not support expressions. To specify hints, rewrite the query as several subqueries. The rewritten query and its plan are as follows:

    1
    2
    3
    4
    5
    explain 
    select count(*) 
    from (select round(ss_list_price),ss_hdemo_sk
    from store_sales_1)tmp(a,ss_hdemo_sk)                      
    group by a;
    

    Ensure that the service logic is not changed during the rewriting.

    Specify hints in the rewritten query as follows:

    1
    2
    3
    4
    5
    explain 
    select /*+ skew(tmp(a)) */ count(*) 
    from (select round(ss_list_price),ss_hdemo_sk
    from store_sales_1)tmp(a,ss_hdemo_sk)                      
    group by a;
    

    The plan shows that after Hash Agg is changed to double-layer Agg operators, redistributed data is greatly reduced and redistribution time shortened.

    You can specify hints in columns in a subquery, for example:

    1
    2
    3
    4
    5
    explain 
    select /*+ skew(tmp(b)) */ count(*) 
    from (select round(ss_list_price) b,ss_hdemo_sk
    from store_sales_1)tmp(a,ss_hdemo_sk)                      
    group by a;
    

Sitemizi ve deneyiminizi iyileştirmek için çerezleri kullanırız. Sitemizde tarama yapmaya devam ederek çerez politikamızı kabul etmiş olursunuz. Daha fazla bilgi edinin

Feedback

Feedback

Feedback

0/500

Selected Content

Submit selected content with the feedback