Creating a SQL Inspection Rule
What Is SQL Inspection?
The big data field offers numerous SQL engines. This diversity broadens the solution space but also introduces problems such as inconsistent quality of submitted SQL statements, difficulty in localizing SQL issues, and excessive resource consumption by large SQL statements.
Poor-quality SQL statements can have unforeseeable impacts on data analysis platforms, degrading system performance or platform stability.
To address this, DLI lets you create inspection rules for the Spark SQL engine. The rules guard against common issues such as large or low-quality SQL statements by providing hints and blocking before execution and circuit breaking during execution. You do not need to change how you submit SQL or the SQL syntax itself, so the feature is easy to adopt without affecting service operations.
- You can configure SQL inspection rules in a visualized manner and also have the ability to query and modify these rules.
- During query execution and response, each SQL engine proactively inspects SQL statements based on the rules.
- Administrators can choose to display hints for, block, or perform circuit breaking on SQL statements. The system logs SQL inspection events in real time for SQL audit. O&M engineers can analyze these logs to assess the quality of SQL statements on the live network, identify potential risks, and take preventive measures.
This section describes how to create a SQL inspection rule to enhance SQL defense capabilities.
Notes and Constraints of DLI SQL Inspection Rules
- SQL inspection is only supported by Spark 3.3.x or later.
- Only one inspection rule can be created for each combination of system rule, action, and queue.
- Each rule can be associated with a maximum of 50 SQL queues.
- A maximum of 1,000 rules can be created for each project.
Creating a SQL Inspection Rule
You can create SQL inspection rules for specified SQL queues on the SQL Inspector page. The system will prompt, block, or perform circuit breaking on SQL requests that trigger the rules.
When creating or modifying a SQL inspection rule, decide whether to enable each rule and what threshold to set based on your service scenario. An unreasonable rule may block or perform circuit breaking on legitimate SQL requests and disrupt services.
- Log in to the DLI management console.
- In the navigation pane on the left, choose Global Configuration > SQL Inspector.
- On the displayed SQL Inspector page, click Create Rule in the upper right corner. In the Create Rule dialog box, set parameters based on the table below.
Table 1 Parameters for creating a SQL inspection rule

| Parameter | Description |
|---|---|
| Rule Name | Name of the SQL inspection rule |
| System Rules | Select an inspection rule. For details about the system inspection rules supported by DLI, see SQL Inspection System Rules That DLI Supports. |
| Queues | Select the queues the rule is bound to. |
| Description | Enter a rule description. |
| Rule Action | Action the SQL inspection rule takes when it is triggered. SQL rules support the following actions: Info (record a log and provide a hint for handling the SQL request), Block (intercept the SQL request that meets the rule), and Circuit Breaker (perform circuit breaking on the SQL request that meets the rule). If the rule has parameters, you need to configure the threshold. |
- Click OK.
View the added inspection rule on the SQL Inspector page. The rule takes effect dynamically.
To modify a rule, click Modify in its Operation column.
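Once a rule is in effect, whether a statement triggers it depends only on the SQL you already submit; no change to the submission method or syntax is needed. As an illustration only, the following hypothetical Spark SQL statements sketch how the default Scan partitions number rule (default threshold 5000) might be triggered and avoided; the table name sales_orders and the partition column dt are assumptions made for this example.

-- May trigger Scan partitions number: scans every partition of a heavily partitioned table
SELECT * FROM sales_orders;

-- Less likely to trigger it: a partition filter limits the scan to a handful of partitions
SELECT * FROM sales_orders WHERE dt BETWEEN '2024-01-01' AND '2024-01-07';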
SQL Inspection System Rules That DLI Supports
This part describes the system inspection rules supported by DLI. For details, see Table 2.
- Default system rules are SQL inspection rules automatically created by DLI when a queue is created. These rules are bound to the queue and cannot be deleted.
- Default system rules include Scan files number, Scan partitions number, Shuffle data(GB), Count(distinct) occurrences, and Not in<Subquery>.
- Only one inspection rule can be created for each combination of system rule, action, and queue.
- A default system rule is created for each action the rule supports. For example, when a queue is created, a Scan files number rule is automatically created for both the Info and Block actions.
- Different engine versions support different inspection rules.
To view the engine version of a queue, choose Resources > Queue Management in the navigation pane on the left, select the queue, double-click the pane at the bottom of the page, and check the value of Default Version.
Figure 1 Viewing the engine version of a queue
Table 2 SQL inspection system rules supported by DLI

| Rule ID | Rule Name | Description | Type | Applicable Engine | Action | Value | Default System Rule | Example SQL Statement |
|---|---|---|---|---|---|---|---|---|
| dynamic_0001 | Scan files number | Maximum number of files to be scanned | Dynamic | Spark, Trino | Info, Block | Value range: 1–2000000; default value: 200000 | Yes | N/A |
| dynamic_0002 | Scan partitions number | Maximum number of partitions involved in the operations (select, delete, update, and alter) performed on a table | Dynamic | Spark | Info, Block | Value range: 1–500000; default value: 5000 | Yes | select * from Partitioned table |
| running_0002 | Memory used(MB) | Peak memory usage of the SQL statement | Running | Spark | Circuit Breaker | Unit: MB; value range: 1–8388608 | No | N/A |
| running_0003 | Run time(S) | Maximum running duration of the SQL statement | Running | Spark | Circuit Breaker | Unit: second; value range: 1–43200 | No | N/A |
| running_0004 | Scan data(GB) | Maximum amount of data to be scanned | Running | Spark | Circuit Breaker | Unit: GB; value range: 1–10240 | No | N/A |
| running_0005 | Shuffle data(GB) | Maximum amount of data to be shuffled | Running | Spark 3.3.1, Spark 2.4.5 | Circuit Breaker | Unit: GB; value range: 1–10240 | Yes | N/A |
| static_0001 | Count(distinct) occurrences | Maximum number of occurrences of count(distinct) in the SQL statement | Static | Spark | Info, Block | Value range: 1–100; default value: 10 | Yes | SELECT COUNT(DISTINCT deviceId), COUNT(DISTINCT collDeviceId) FROM table GROUP BY deviceName, collDeviceName, collCurrentVersion; |
| static_0002 | Not in<Subquery> | Checks whether not in <subquery> is used in the SQL statement | Static | Spark | Info, Block | Value range: Yes, No; default value: Yes | Yes | SELECT * FROM Orders o WHERE o.Order_ID NOT IN (SELECT Order_ID FROM HeldOrders h WHERE h.Order_ID = o.Order_ID); |
| static_0003 | Join occurrences | Maximum number of joins in the SQL statement | Static | Spark | Info, Block | Value range: 1–50 | No | SELECT name, text FROM table_1 JOIN table_2 ON table_1.Id = table_2.Id |
| static_0004 | Union occurrences | Maximum number of union all occurrences in the SQL statement | Static | Spark | Info, Block | Value range: 1–100 | No | select * from tables t1 union all select * from tables t2 union all select * from tables t3 |
| static_0005 | Subquery nesting layers | Maximum number of subquery nesting layers | Static | Spark | Info, Block | Value range: 1–20 | No | select * from ( with temp1 as (select * from tables) select * from temp1); |
| static_0006 | Sql size(KB) | Maximum size of a SQL file | Static | Spark | Info, Block | Unit: KB; value range: 1–1024 | No | N/A |
| static_0007 | Cartesian product | Restricts Cartesian products when multiple tables are joined | Static | Spark | Info, Block | Value range: 0–1 | No | select * from A,B; |
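The Not in<Subquery> rule (static_0002) flags not in <subquery> because, in Spark SQL, NOT IN over a subquery has null-sensitive semantics and is often costly to evaluate. Where the rule is enabled, the same intent can usually be expressed with a LEFT ANTI JOIN (or NOT EXISTS). The sketch below reuses the Orders and HeldOrders tables from the example in Table 2; it is illustrative only, and exact NULL handling should be verified against your data.

-- Flagged by static_0002 (Not in<Subquery>)
SELECT * FROM Orders o WHERE o.Order_ID NOT IN (SELECT Order_ID FROM HeldOrders h WHERE h.Order_ID = o.Order_ID);

-- Same intent without the flagged pattern
SELECT o.* FROM Orders o LEFT ANTI JOIN HeldOrders h ON h.Order_ID = o.Order_ID;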