Updated on 2024-02-21 GMT+08:00

Preparing Environment

Before interconnecting open source Spark with LakeFormation, complete the following operations:

  1. Prepare a working open source Spark environment and Hive environment, and install Git.

    Currently, only Spark 3.1.1 and Spark 3.3.1 are supported. The Hive kernel version is 2.3.

  2. Prepare a LakeFormation instance. For details, see Creating an Instance.
  3. Create a LakeFormation access client in the same VPC and subnet as Spark. For details, see Managing Clients.
  4. Prepare the development environment. For details, see "Preparing the Environment" in Preparing the Development Environment. Installing and configuring IntelliJ IDEA is optional.
  5. Download the LakeFormation client.

  6. Prepare and replace JAR packages required by the Hive kernel.

    If only SparkCatalogPlugin is used for interconnection (MetastoreClient is not used), skip this step.

    • Method 1: downloading the prebuilt Hive JAR packages

      Download link: https://gitee.com/HuaweiCloudDeveloper/huaweicloud-lake-formation-lakecat-sdk-java/releases

      Download the JAR packages that match your Spark and Hive versions. For example, if the versions of Spark and Hive are 3.1.1 and 2.3.7, download hive-exec-2.3.7-core.jar and hive-common-2.3.7.jar.
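
The file names to look for on the releases page can be derived from the Spark version. The following is a minimal sketch, assuming the Spark-to-Hive pairing stated under Method 2 (Spark 3.1.1 with Hive 2.3.7, Spark 3.3.1 with Hive 2.3.9). The function names are hypothetical helpers and are not part of the LakeFormation client.

```shell
# Map a supported Spark version to the Hive version paired with it in this guide.
# (Hypothetical helper; the pairing itself comes from Method 2.)
hive_version_for_spark() {
  case "$1" in
    3.1.1) echo "2.3.7" ;;
    3.3.1) echo "2.3.9" ;;
    *) echo "unsupported Spark version: $1" >&2; return 1 ;;
  esac
}

# Print the two Hive JAR file names that correspond to a Spark version.
hive_jars_for_spark() {
  hive_version=$(hive_version_for_spark "$1") || return 1
  echo "hive-exec-${hive_version}-core.jar"
  echo "hive-common-${hive_version}.jar"
}

# Example: list the JAR names for Spark 3.1.1.
hive_jars_for_spark 3.1.1
```

An unsupported version makes the helper return a non-zero status, so a wrapper script can stop before downloading the wrong files.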

    • Method 2: modifying Hive-related JAR packages locally

      If the interconnected environment is Spark 3.1.1, use Hive 2.3.7. If it is Spark 3.3.1, use Hive 2.3.9.

      On Windows, run the Maven commands in a WSL development environment.

      1. Download the Hive source code based on the Hive version.

        For example, if the Hive kernel version is 2.3.9, the download link is https://github.com/apache/hive/tree/rel/release-2.3.9.

      2. Apply the patch in the LakeFormation client code to the Hive source code.
        1. Switch the Hive source code branch as required. For example, if the Hive kernel version is 2.3.9, run the following command:

          git checkout rel/release-2.3.9

        2. Run the following command to apply the patch file to the Hive source code project after the branch is switched:

          mvn patch:apply -DpatchFile=${your patch file location}

          In the command, your patch file location indicates the storage path of the hive-2_3_for_lakeformation.patch file. The patch file can be obtained from the client project, as shown in the following figure.

        3. Run the following command to recompile the Hive kernel source code:

          mvn clean install -DskipTests=true
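
The Method 2 commands above can be chained into one script. This is a sketch under stated assumptions, not an official build script: the clone URL is inferred from the GitHub link in step 1, and HIVE_SRC_DIR, PATCH_FILE, and DRY_RUN are hypothetical variables introduced here. With DRY_RUN=1 (the default) the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of Method 2: fetch the Hive source, apply the LakeFormation patch, rebuild.
HIVE_VERSION="${HIVE_VERSION:-2.3.9}"   # 2.3.7 for Spark 3.1.1, 2.3.9 for Spark 3.3.1
HIVE_SRC_DIR="${HIVE_SRC_DIR:-hive}"    # assumption: local checkout directory
# Assumption: placeholder path; point it at the patch file from the client project.
PATCH_FILE="${PATCH_FILE:-/path/to/hive-2_3_for_lakeformation.patch}"
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_patched_hive() {
  run git clone https://github.com/apache/hive.git "$HIVE_SRC_DIR" || return 1
  (
    # Work inside the checkout; the subshell keeps the caller's directory unchanged.
    if [ "$DRY_RUN" != "1" ]; then cd "$HIVE_SRC_DIR" || exit 1; fi
    run git checkout "rel/release-${HIVE_VERSION}" &&
    run mvn patch:apply "-DpatchFile=${PATCH_FILE}" &&
    run mvn clean install -DskipTests=true
  )
}

build_patched_hive
```

To execute the commands for real, set DRY_RUN=0 and PATCH_FILE to the actual location of hive-2_3_for_lakeformation.patch.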

  7. Add the JAR packages required by the Spark environment.

    Obtain the JAR packages listed in the following table and add them to the jars directory of Spark, replacing any existing versions of the same packages.

    Table 1 JAR packages required

    No.  JAR Package                               How to Obtain
    1    spring-web-5.3.24.jar                     https://mirrors.huaweicloud.com/repository/maven/org/springframework/spring-web/5.3.24/
    2    spring-core-5.3.24.jar                    https://mirrors.huaweicloud.com/repository/maven/org/springframework/spring-core/5.3.24/
    3    spring-context-5.3.24.jar                 https://mirrors.huaweicloud.com/repository/maven/org/springframework/spring-context/5.3.24/
    4    spring-beans-5.3.24.jar                   https://mirrors.huaweicloud.com/repository/maven/org/springframework/spring-beans/5.3.24/
    5    caffeine-2.9.3.jar                        https://mirrors.huaweicloud.com/repository/maven/com/github/ben-manes/caffeine/caffeine/2.9.3/
    6    mapstruct-1.5.3.Final.jar                 https://mirrors.huaweicloud.com/repository/maven/org/mapstruct/mapstruct/1.5.3.Final/
    7    log4j-api-2.19.0.jar                      https://mirrors.huaweicloud.com/repository/maven/org/apache/logging/log4j/log4j-api/2.19.0/
    8    java-sdk-core-3.2.4.jar                   https://mirrors.huaweicloud.com/repository/maven/huaweicloudsdk/com/huawei/apigateway/java-sdk-core/3.2.4/
         (If only Custom Authentication Information Obtaining Class is used for token authentication, this JAR package is not required.)
    9    bcprov-jdk15to18-1.70.jar                 https://mirrors.huaweicloud.com/repository/maven/org/bouncycastle/bcprov-jdk15to18/1.70/
    10   jca-1.0.4.jar                             https://mirrors.huaweicloud.com/repository/maven/org/openeuler/jca/1.0.4/
    11   hadoop-huaweicloud-3.1.1-hw-53.8.jar      https://github.com/huaweicloud/obsa-hdfs/blob/master/release/hadoop-huaweicloud-3.1.1-hw-53.8.jar
    12   lakeformation-lakecat-client-1.0.0.jar    Obtain the package by referring to the operation in 5.
    13   hive-exec-${version}-core.jar             Obtain the package by referring to the operation in 6.
    14   hive-common-${version}.jar                Obtain the package by referring to the operation in 6.
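
After copying the packages, it can help to confirm that nothing from Table 1 is missing. The following is a minimal sketch, assuming the Spark jars directory is $SPARK_HOME/jars. The function names required_jars and missing_jars are hypothetical, and the JAR names are copied from Table 1.

```shell
# Print the JAR file names from Table 1 for a given Hive version.
required_jars() {
  hive_version="$1"
  cat <<EOF
spring-web-5.3.24.jar
spring-core-5.3.24.jar
spring-context-5.3.24.jar
spring-beans-5.3.24.jar
caffeine-2.9.3.jar
mapstruct-1.5.3.Final.jar
log4j-api-2.19.0.jar
java-sdk-core-3.2.4.jar
bcprov-jdk15to18-1.70.jar
jca-1.0.4.jar
hadoop-huaweicloud-3.1.1-hw-53.8.jar
lakeformation-lakecat-client-1.0.0.jar
hive-exec-${hive_version}-core.jar
hive-common-${hive_version}.jar
EOF
}

# Print every required JAR that is absent from the given directory.
# Note: java-sdk-core is optional with custom token authentication, and the
# two Hive JARs are only needed when MetastoreClient is used (see step 6).
missing_jars() {
  jars_dir="$1"
  hive_version="$2"
  required_jars "$hive_version" | while IFS= read -r jar; do
    [ -f "${jars_dir}/${jar}" ] || echo "$jar"
  done
}

# Example: report missing packages when SPARK_HOME is set.
if [ -n "${SPARK_HOME:-}" ]; then
  missing_jars "${SPARK_HOME}/jars" "${HIVE_VERSION:-2.3.9}"
fi
```

An empty result means every listed package is present. Any names printed are packages still to be obtained from the sources in Table 1.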