Updated on 2024-12-13 GMT+08:00

Using Spark SQL Statements Without Aggregate Functions for Correlated Subqueries

This section applies only to MRS 3.3.1-LTS or later.

Scenario

In open-source Spark SQL, a correlated scalar subquery must use an aggregate function; otherwise, the error "Error in query: Correlated scalar subqueries must be aggregated" is reported. MRS allows you to run correlated scalar subqueries without aggregate functions.

Constraints

  • SQL statements similar to select id, (select group_name from emp2 b where a.group_id=b.group_id) as banji from emp1 a are supported.
  • SQL statements similar to select id, (select distinct group_name from emp2 b where a.group_id=b.group_id) as banji from emp1 a are supported.
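
    As an illustration, the following self-contained sketch uses the table and column names from the statements above with hypothetical sample data. On MRS, the query succeeds without an aggregate function once spark.sql.legacy.correlated.scalar.query.enabled is set to true; on open-source Spark it fails with the error mentioned in the Scenario section.

    ```sql
    -- Hypothetical sample tables matching the supported statements above
    CREATE TABLE emp1 (id INT, group_id INT);
    CREATE TABLE emp2 (group_id INT, group_name STRING);

    INSERT INTO emp1 VALUES (1, 10), (2, 20);
    INSERT INTO emp2 VALUES (10, 'groupA'), (20, 'groupB');

    -- Correlated scalar subquery without an aggregate function:
    -- for each row of emp1, look up the matching group name in emp2
    SELECT id,
           (SELECT group_name FROM emp2 b WHERE a.group_id = b.group_id) AS banji
    FROM emp1 a;
    ```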

Parameter Configuration

  • Spark SQL scenario: Modify the following parameters in the Client installation directory/Spark/spark/conf/spark-defaults.conf file on the Spark client.

    Parameter: spark.sql.legacy.correlated.scalar.query.enabled
    Description: If this parameter is set to true, Spark supports correlated subqueries without aggregate functions.
    Default Value: false
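
    For example, the entry in spark-defaults.conf would look like the following one-line config fragment (spark-defaults.conf uses whitespace-separated key-value pairs):

    ```
    spark.sql.legacy.correlated.scalar.query.enabled  true
    ```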

  • In the spark-beeline scenario, configure JDBCServer custom parameters.
    1. Log in to FusionInsight Manager, choose Cluster > Services > Spark, choose Configurations > All Configurations, and choose JDBCServer (Role) > Custom. Add the spark.sql.legacy.correlated.scalar.query.enabled parameter in the Custom area and set it to true.

    2. Click Save. Then click Instances, select all JDBCServer instances, choose More > Restart Instance, and complete the restart as prompted.

Note: If a correlated subquery uses multiple matching predicates, an exception occurs.