
Updated on 2025-02-21 GMT+08:00

Java Example Code

Prerequisites

A datasource connection has been created on the DLI management console. For details, see Enhanced Datasource Connections.

CSS Non-Security Cluster

Development description
- Code implementation
  - Constructing dependency information and creating a Spark session
    1. Import dependencies.
      Maven dependency
```
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.2</version>
</dependency>
```
      Import dependency packages.
```
import org.apache.spark.sql.SparkSession;
```
    2. Create a session.
```
SparkSession sparkSession = SparkSession.builder().appName("datasource-css").getOrCreate();
```
- Connecting to data sources through SQL APIs
  1. Create a table to connect to a CSS data source.
```
sparkSession.sql("create table css_table(id long, name string) using css options( 'es.nodes' = '192.168.9.213:9200', 'es.nodes.wan.only' = 'true', 'resource' = '/mytest')");
```
  2. Insert data.
```
sparkSession.sql("insert into css_table values(18, 'John'),(28, 'Bob')");
```
  3. Query data.
```
sparkSession.sql("select * from css_table").show();
```
  4. Delete the datasource connection table.
```
sparkSession.sql("drop table css_table");
```
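The option list in the CREATE TABLE statement is assembled by hand, so a missing quote or comma is easy to introduce when editing it. The following is a minimal sketch of a hypothetical helper class, CssDdlBuilder (not part of the DLI or Spark API), that renders the same statement from a map of options. It assumes Spark SQL's default handling of backslash escapes in string literals when it escapes quotes and backslashes in option values.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: renders "create table ... using css options(...)". */
final class CssDdlBuilder {

    private CssDdlBuilder() {
    }

    /** Escapes backslashes first, then single quotes, so a value cannot end the SQL literal early. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("'", "\\'");
    }

    /** Builds the CREATE TABLE statement; options are emitted in the map's iteration order. */
    static String createTable(String table, String schema, Map<String, String> options) {
        StringJoiner joined = new StringJoiner(", ");
        for (Map.Entry<String, String> option : options.entrySet()) {
            joined.add("'" + escape(option.getKey()) + "' = '" + escape(option.getValue()) + "'");
        }
        return "create table " + table + "(" + schema + ") using css options(" + joined + ")";
    }
}
```

For example, sparkSession.sql(CssDdlBuilder.createTable("css_table", "id long, name string", options)) would issue the statement from step 1, where options is a LinkedHashMap holding es.nodes, es.nodes.wan.only, and resource. A LinkedHashMap keeps the options in insertion order, which makes the generated SQL predictable.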
- Submitting a Spark job
  1. Generate a JAR file based on the code file and upload the JAR file to the OBS bucket.
  2. In the Spark job editor, select the corresponding dependency module and execute the Spark job.
    For Spark 2.3.2 (soon to be taken offline) or 2.4.5, set Module to sys.datasource.css when submitting a job.

    For Spark 3.1.1 or later, you do not need to select a module. Instead, configure the following Spark parameters (--conf):
```
spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/css/*
spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/css/*
```
    For details about how to submit a job on the console, see Table 3 "Parameters for selecting dependency resources" in Creating a Spark Job.

    For details about how to submit a job through an API, see the description of the modules parameter in Table 2 "Request parameters" in Creating a Batch Processing Job.
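On the console, the two classpath settings for Spark 3.1.1 or later are entered as Spark parameters. The following sketch, using a hypothetical class CssClasspathConf, renders the same settings as --conf key=value argument pairs, assuming a submission path that accepts spark-submit style arguments. Supply them at submission time rather than through SparkSession.builder(): per the Spark configuration documentation, spark.driver.extraClassPath must not be set from application code in client mode because the driver JVM has already started by then.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the --conf pairs for the CSS connector classpath. */
final class CssClasspathConf {

    // Directory taken from the DLI instructions above; the trailing /* is a JVM classpath wildcard.
    static final String CSS_JARS = "/usr/share/extension/dli/spark-jar/datasource/css/*";

    private CssClasspathConf() {
    }

    /** Returns ["--conf", "key=value", ...] for the driver and executor classpaths. */
    static List<String> confArgs() {
        List<String> args = new ArrayList<>();
        for (String key : new String[] {"spark.driver.extraClassPath", "spark.executor.extraClassPath"}) {
            args.add("--conf");
            args.add(key + "=" + CSS_JARS);
        }
        return args;
    }
}
```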

Complete example code

Maven dependency

```
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.2</version>
</dependency>
```

Connecting to data sources through SQL APIs

```
import org.apache.spark.sql.SparkSession;

public class java_css_unsecurity {

    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder().appName("datasource-css-unsecurity").getOrCreate();

        // Create a DLI table associated with the CSS resource /mytest
        sparkSession.sql("create table css_table(id long, name string) using css options( 'es.nodes' = '192.168.15.34:9200', 'es.nodes.wan.only' = 'true', 'resource' = '/mytest')");

        //*****************************SQL model***********************************
        // Insert data into the DLI table
        sparkSession.sql("insert into css_table values(18, 'John'),(28, 'Bob')");

        // Read data from the DLI table
        sparkSession.sql("select * from css_table").show();

        // Drop the table
        sparkSession.sql("drop table css_table");

        sparkSession.close();
    }
}
```

CSS Security Cluster

Preparations
Generate the keystore.jks and truststore.jks files and upload them to the OBS bucket. For details, see CSS Security Cluster Configuration.
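Before uploading the files, it can help to confirm that each one opens with the password you plan to pass in es.net.ssl.keystore.pass or es.net.ssl.truststore.pass. The following minimal sketch uses only java.security.KeyStore; the class name KeystoreCheck is hypothetical, and it assumes the files are in JKS format, as their .jks extension suggests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

/** Hypothetical pre-upload check for a JKS keystore or truststore. */
final class KeystoreCheck {

    private KeystoreCheck() {
    }

    /**
     * Loads the JKS file with the given password and returns its number of entries.
     * For JKS, an incorrect password is reported as an IOException by KeyStore.load.
     */
    static int countEntries(Path jksFile, char[] password) throws IOException, GeneralSecurityException {
        KeyStore store = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(jksFile)) {
            store.load(in, password);
        }
        return store.size();
    }
}
```

For example, KeystoreCheck.countEntries(Paths.get("truststore.jks"), password) returns the number of entries, or fails before the file is ever uploaded if the password does not match.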

Description of development with HTTPS disabled

If HTTPS is disabled, the keystore.jks and truststore.jks files are not required. You only need to disable SSL (es.net.ssl) and set the username and password used to access the cluster.

Constructing dependency information and creating a Spark session

Import dependencies.

Maven dependency

```
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.2</version>
</dependency>
```

Import dependency packages.

```
import org.apache.spark.sql.SparkSession;
```

Create a session.

```
SparkSession sparkSession = SparkSession.builder().appName("datasource-css").getOrCreate();
```

Connecting to data sources through SQL APIs

Create a table to connect to a CSS data source.

```
sparkSession.sql("create table css_table(id long, name string) using css options( 'es.nodes' = '192.168.9.213:9200', 'es.nodes.wan.only' = 'true', 'resource' = '/mytest','es.net.ssl'='false','es.net.http.auth.user'='admin','es.net.http.auth.pass'='*******')");
```

For details about the parameters for creating a CSS datasource connection table, see Table 1.
In the preceding example, HTTPS access is disabled for the CSS security cluster. Therefore, you need to set es.net.ssl to false. es.net.http.auth.user and es.net.http.auth.pass are the username and password set during cluster creation, respectively.
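The example above writes the password directly into the SQL text. As a sketch of one alternative, the hypothetical class CssSecurityOptions below builds the same three options from a map of environment-style variables. The variable names CSS_USER and CSS_PASSWORD are assumptions for illustration, and how such values reach the driver depends on your job configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: security-cluster options with HTTPS disabled. */
final class CssSecurityOptions {

    private CssSecurityOptions() {
    }

    /** Reads CSS_USER and CSS_PASSWORD (hypothetical names) from the given map. */
    static Map<String, String> httpsDisabled(Map<String, String> env) {
        String user = env.get("CSS_USER");
        String pass = env.get("CSS_PASSWORD");
        if (user == null || pass == null) {
            throw new IllegalStateException("CSS_USER and CSS_PASSWORD must be set");
        }
        Map<String, String> options = new LinkedHashMap<>();
        options.put("es.net.ssl", "false");            // HTTPS is disabled on the cluster
        options.put("es.net.http.auth.user", user);    // username set during cluster creation
        options.put("es.net.http.auth.pass", pass);    // password set during cluster creation
        return options;
    }
}
```

In a job, you could call CssSecurityOptions.httpsDisabled(System.getenv()) and merge the result with es.nodes, es.nodes.wan.only, and resource. Accepting a Map instead of calling System.getenv() inside the method keeps the helper easy to test.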

Insert data.

```
sparkSession.sql("insert into css_table values(18, 'John'),(28, 'Bob')");
```

Query data.

```
sparkSession.sql("select * from css_table").show();
```

Delete the datasource connection table.

```
sparkSession.sql("drop table css_table");
```

Submitting a Spark job
1. Generate a JAR package based on the code file and upload the package to DLI.
  
  For details about console operations, see Creating a Package. For details about API operations, see Uploading a Package Group.
2. In the Spark job editor, select the corresponding dependency module and execute the Spark job.
  For details about console operations, see Creating a Spark Job. For details about API operations, see Creating a Batch Processing Job.
  When submitting a job, you need to specify a dependency module named sys.datasource.css.
  
  For details about how to submit a job on the console, see Table 3 "Parameters for selecting dependency resources" in Creating a Spark Job.
  
  For details about how to submit a job through an API, see the modules parameter in Request parameters of Creating a Batch Processing Job in the Data Lake Insight API Reference.

Complete example code

Maven dependency

```
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.2</version>
</dependency>
```

Description of development with HTTPS enabled

Constructing dependency information and creating a Spark session

Import dependencies.

Maven dependency

```
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.2</version>
</dependency>
```

Import dependency packages.

```
import org.apache.spark.SparkFiles;
import org.apache.spark.sql.SparkSession;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
```

Create a session.

```
SparkSession sparkSession = SparkSession.builder().appName("datasource-css").getOrCreate();
```

Copy the certificate.

```
// Distribute the certificate files from OBS to the job.
sparkSession.sparkContext().addFile("obs://Bucket name/Address/transport-keystore.jks");
sparkSession.sparkContext().addFile("obs://Bucket name/Address/truststore.jks");

// Obtain the path of the current working directory.
String pathUser = System.getProperty("user.dir");
System.out.println("path_user is " + pathUser);

// Obtain the local paths of the files downloaded by addFile().
String esTransportKeystoreFileName = SparkFiles.get("transport-keystore.jks");
String esTruststoreFileName = SparkFiles.get("truststore.jks");

System.out.println("esTransportKeystoreFileName is " + esTransportKeystoreFileName);
System.out.println("esTruststoreFileName is " + esTruststoreFileName);
// Build the destination paths in the working directory.
String esTransportKeystoreLocalPath = pathUser + "/" + "transport-keystore.jks";
String esTruststoreLocalPath = pathUser + "/" + "truststore.jks";

System.out.println("esTransportKeystoreLocalPath is " + esTransportKeystoreLocalPath);
System.out.println("esTruststoreLocalPath is " + esTruststoreLocalPath);
try {
    // Copy the keystore file (copyFile is defined in the complete example code).
    copyFile(esTransportKeystoreFileName, esTransportKeystoreLocalPath);
    // Copy the truststore file.
    copyFile(esTruststoreFileName, esTruststoreLocalPath);
    // Wait briefly (2 seconds).
    Thread.sleep(2000);

    System.out.println("Files copied successfully:");
    System.out.println("es_transport-keystore.jks: " + esTransportKeystoreLocalPath);
    System.out.println("es_truststore.jks: " + esTruststoreLocalPath);
} catch (IOException | InterruptedException e) {
    e.printStackTrace();
}
```
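The copy step above reads each file fully into memory and then sleeps for 2 seconds. A minimal alternative sketch, using a hypothetical class CertCopy, relies on java.nio.file.Files.copy with REPLACE_EXISTING instead. Because Files.copy returns only after the copy completes, the fixed sleep should not be needed for the copied file to be readable in the same process; treat this as an assumption to verify in your environment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical alternative to the copyFile helper used above. */
final class CertCopy {

    private CertCopy() {
    }

    /**
     * Copies sourcePath into targetDir under the same file name and returns the new path.
     * An existing target is replaced; if source and target are the same file,
     * Files.copy completes without copying.
     */
    static Path copyToDir(String sourcePath, String targetDir) throws IOException {
        Path source = Paths.get(sourcePath);
        Path target = Paths.get(targetDir).resolve(source.getFileName());
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For example, CertCopy.copyToDir(SparkFiles.get("truststore.jks"), System.getProperty("user.dir")) returns the destination path, whose absolute form can then follow file:// in es.net.ssl.truststore.location.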

Connecting to data sources through SQL APIs

Create a table to connect to a CSS data source.

```
sparkSession.sql("create table css_table(id long, name string) using css options( 'es.nodes' = '192.168.13.189:9200', 'es.nodes.wan.only' = 'true', 'resource' = '/mytest',"
        + "'es.net.ssl'='true','es.net.ssl.keystore.location' = 'file://" + esTransportKeystoreLocalPath + "','es.net.ssl.keystore.pass' = '**',"
        + "'es.net.ssl.truststore.location'='file://" + esTruststoreLocalPath + "',"
        + "'es.net.ssl.truststore.pass'='***','es.net.http.auth.user'='admin','es.net.http.auth.pass'='**')");
```

For details about the parameters for creating a CSS datasource connection table, see Table 1.
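With HTTPS enabled, the statement carries seven SSL and authentication options. The following sketch collects them in one place; the class CssSslOptions is hypothetical, and it assumes Linux-style absolute paths, so that "file://" followed by the path yields a URI such as file:///tmp/work/truststore.jks, the same form produced by the example above.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: security-cluster options with HTTPS enabled. */
final class CssSslOptions {

    private CssSslOptions() {
    }

    static Map<String, String> httpsEnabled(Path keystore, String keystorePass,
                                            Path truststore, String truststorePass,
                                            String user, String pass) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("es.net.ssl", "true");
        // "file://" + absolute Linux path, matching the form used in the example above.
        options.put("es.net.ssl.keystore.location", "file://" + keystore.toAbsolutePath());
        options.put("es.net.ssl.keystore.pass", keystorePass);
        options.put("es.net.ssl.truststore.location", "file://" + truststore.toAbsolutePath());
        options.put("es.net.ssl.truststore.pass", truststorePass);
        options.put("es.net.http.auth.user", user);
        options.put("es.net.http.auth.pass", pass);
        return options;
    }
}
```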

Insert data.

```
sparkSession.sql("insert into css_table values(18, 'John'),(28, 'Bob')");
```

Query data.

```
sparkSession.sql("select * from css_table").show();
```

Delete the datasource connection table.

```
sparkSession.sql("drop table css_table");
```

Submitting a Spark job
1. Generate a JAR package based on the code file and upload the package to DLI.
  
  For details about console operations, see Creating a Package. For details about API operations, see Uploading a Package Group.
2. In the Spark job editor, select the corresponding dependency module and execute the Spark job.
  For details about console operations, see Creating a Spark Job. For details about API operations, see Creating a Batch Processing Job.
  When submitting a job, you need to specify a dependency module named sys.datasource.css.
  
  For details about how to submit a job on the console, see Parameters for selecting dependency resources in the Data Lake Insight User Guide.
  
  For details about how to submit a job through an API, see the modules parameter in Request parameters of Creating a Batch Processing Job in the Data Lake Insight API Reference.

Complete example code

Maven dependency

```
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.2</version>
</dependency>
```

Connecting to data sources through SQL APIs

```
import org.apache.spark.SparkFiles;
import org.apache.spark.sql.SparkSession;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class java_css_security_httpson {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder().appName("datasource-css").getOrCreate();

        // Distribute the certificate files from OBS to the job.
        sparkSession.sparkContext().addFile("obs://Bucket name/Address/transport-keystore.jks");
        sparkSession.sparkContext().addFile("obs://Bucket name/Address/css/truststore.jks");

        // Obtain the path of the current working directory.
        String pathUser = System.getProperty("user.dir");
        System.out.println("path_user is " + pathUser);

        // Obtain the local paths of the files downloaded by addFile().
        String esTransportKeystoreFileName = SparkFiles.get("transport-keystore.jks");
        String esTruststoreFileName = SparkFiles.get("truststore.jks");

        System.out.println("esTransportKeystoreFileName is " + esTransportKeystoreFileName);
        System.out.println("esTruststoreFileName is " + esTruststoreFileName);
        // Build the destination paths in the working directory.
        String esTransportKeystoreLocalPath = pathUser + "/" + "transport-keystore.jks";
        String esTruststoreLocalPath = pathUser + "/" + "truststore.jks";

        System.out.println("esTransportKeystoreLocalPath is " + esTransportKeystoreLocalPath);
        System.out.println("esTruststoreLocalPath is " + esTruststoreLocalPath);
        try {
            // Copy the keystore file.
            copyFile(esTransportKeystoreFileName, esTransportKeystoreLocalPath);
            // Copy the truststore file.
            copyFile(esTruststoreFileName, esTruststoreLocalPath);
            // Wait briefly (2 seconds).
            Thread.sleep(2000);

            System.out.println("Files copied successfully:");
            System.out.println("es_transport-keystore.jks: " + esTransportKeystoreLocalPath);
            System.out.println("es_truststore.jks: " + esTruststoreLocalPath);
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }

        // Create a DLI table associated with CSS, using the copied certificates
        sparkSession.sql("create table css_table(id long, name string) using css options( 'es.nodes' = '192.168.13.189:9200', 'es.nodes.wan.only' = 'true', 'resource' = '/mytest','es.net.ssl'='true','es.net.ssl.keystore.location' = 'file://" + esTransportKeystoreLocalPath + "','es.net.ssl.keystore.pass' = '**','es.net.ssl.truststore.location'='file://" + esTruststoreLocalPath + "','es.net.ssl.truststore.pass'='**','es.net.http.auth.user'='admin','es.net.http.auth.pass'='**')");

        //*****************************SQL model***********************************
        // Insert data into the DLI table
        sparkSession.sql("insert into css_table values(34, 'Yuan'),(28, 'Kids')");

        // Read data from the DLI table
        sparkSession.sql("select * from css_table").show();

        // Drop the table
        sparkSession.sql("drop table css_table");

        sparkSession.close();
    }

    private static void copyFile(String sourcePath, String destinationPath) throws IOException {
        // Copy the file downloaded by SparkFiles to the destination path.
        byte[] fileContent = Files.readAllBytes(Paths.get(sourcePath));
        Files.write(Paths.get(destinationPath), fileContent);
    }
}
```
