Preparing for Development and Operating Environment
- Spark2x applications can be developed in Scala, Java, and Python. Table 1 describes the development and operating environment to be prepared.
Table 1 Development environment

- OS
  Operating system of the development environment. The commissioning steps in this section assume a Windows host for local commissioning and a Linux node for cluster-side commissioning.
- JDK installation
  Basic configuration for the Java/Scala development and operating environment. The version requirements are as follows:
  - The server and client support only the built-in OpenJDK 1.8.0_272; replacing the JDK is not allowed.
  - Customer applications that reference the SDK JAR packages and run in their own processes support Oracle JDK, IBM JDK, and OpenJDK.
  - For x86 nodes that run clients, the following JDKs can be used:
    - Oracle JDK 1.8
    - IBM JDK 1.8.5.11
  - For TaiShan nodes that run clients, the following JDK can be used:
    - OpenJDK 1.8.0_272
  NOTE: To ensure security, the server supports only TLS 1.2 or later. By default, the IBM JDK supports only TLS 1.0. If the IBM JDK is used, set com.ibm.jsse2.overrideDefaultTLS to true to enable TLS 1.0, 1.1, and 1.2. For details, see https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.security.component.80.doc/security-component/jsse2Docs/matchsslcontext_tls.html#matchsslcontext_tls.
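If the IBM JDK is used for commissioning, one way to apply this property is through the Spark client configuration. The following spark-defaults.conf fragment is a sketch only: the property name comes from the IBM documentation linked above, and whether your deployment already sets other extra Java options (which you would then append to rather than overwrite) depends on your client setup.

```
# Sketch: pass the IBM JSSE override to the driver and executor JVMs.
spark.driver.extraJavaOptions    -Dcom.ibm.jsse2.overrideDefaultTLS=true
spark.executor.extraJavaOptions  -Dcom.ibm.jsse2.overrideDefaultTLS=true
```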
- IntelliJ IDEA installation and configuration
  Tool used to develop Spark applications. Version 2019.1 or another compatible version is recommended.
  NOTE:
  - Ensure that the JDK configured in IntelliJ IDEA matches the JDK in use (IBM JDK, Oracle JDK, or OpenJDK).
  - Do not use the same workspace, or a sample project in the same path, for different IntelliJ IDEA programs.
- Maven installation
  Basic configuration of the development environment. Maven is used for project management throughout the software development lifecycle.
- Scala installation
  Basic configuration for the Scala development environment. The required version is 2.12.10.
- Scala plug-in installation
  Basic configuration for the Scala development environment. The required version is 2018.2.11 or another compatible version.
- Editra installation
  Editra is an editor for the Python development environment, used to write Python programs. You can also use other IDEs for Python programming.
- Developer account preparation
  See Preparing the Developer Account for configuration.
- 7-Zip installation
  Used to decompress .zip and .rar packages. 7-Zip 16.04 is supported.
- Python installation
  The Python version must be 2.6.6 or later.
Preparing a Runtime Environment
During application development, you need to prepare the environment for code running and commissioning to verify that the application is running properly.
- If the local Windows development environment can communicate with the cluster service plane network, download the cluster client to the local host, obtain the cluster configuration files required by the commissioning program, configure the network connection, and commission the program in Windows.
- Log in to the FusionInsight Manager portal and choose Cluster > Dashboard > More > Download Client. Set Select Client Type to Configuration Files Only. Select the platform type based on the type of the node where the client is to be installed (select x86_64 for the x86 architecture and aarch64 for the Arm architecture) and click OK. After the client files are packaged and generated, download the client to the local PC as prompted and decompress it.
For example, if the client file package is FusionInsight_Cluster_1_Services_Client.tar, decompress it to obtain FusionInsight_Cluster_1_Services_ClientConfig_ConfigFiles.tar file. Then, decompress FusionInsight_Cluster_1_Services_ClientConfig_ConfigFiles.tar file to the D:\FusionInsight_Cluster_1_Services_ClientConfig_ConfigFiles directory on the local PC. The directory name cannot contain spaces.
- Go to the client decompression path FusionInsight_Cluster_1_Services_ClientConfig_ConfigFiles\Spark2x\config and manually import the configuration files to the configuration file directory (usually the resources folder) of the Spark sample project.
The keytab file obtained in Preparing the Developer Account must also be stored in this directory. Table 2 describes the main configuration files.
Table 2 Configuration files

- carbon.properties: CarbonData configuration file.
- core-site.xml: Configures HDFS parameters.
- hdfs-site.xml: Configures HDFS parameters.
- hbase-site.xml: Configures HBase parameters.
- hive-site.xml: Configures Hive parameters.
- jaas-zk.conf: Java authentication configuration file.
- log4j-executor.properties: Executor log configuration file.
- mapred-site.xml: Hadoop MapReduce configuration file.
- ranger-spark-audit.xml: Ranger audit log configuration file.
- ranger-spark-security.xml: Ranger permission management configuration file.
- yarn-site.xml: Configures YARN parameters.
- spark-defaults.conf: Configures Spark2x parameters.
- spark-env.sh: Spark2x environment variable configuration file.
- user.keytab: Provides HDFS user information for Kerberos security authentication.
- krb5.conf: Provides Kerberos server configuration information.
- During application development, if you need to commission the application in the local Windows system, copy the content in the hosts file in the decompression directory to the hosts file of the node where the client is located. Ensure that the local host can communicate correctly with the hosts listed in the hosts file in the decompression directory.
- If the host where the client is installed is not a node in the cluster, configure network connections for the client to prevent errors when you run commands on the client.
- The local hosts file in a Windows environment is stored in, for example, C:\WINDOWS\system32\drivers\etc\hosts.
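For local commissioning, the user.keytab and krb5.conf files listed in Table 2 are typically made visible to the client JVM through the standard Java Kerberos and JAAS system properties. The following JVM options are a sketch: the property names are standard Java security properties, while the exact paths depend on where you decompressed the package (the directory below reuses the example path from this section).

```
# Sketch: point the client JVM at the downloaded Kerberos configuration files.
-Djava.security.krb5.conf=D:\FusionInsight_Cluster_1_Services_ClientConfig_ConfigFiles\Spark2x\config\krb5.conf
-Djava.security.auth.login.config=D:\FusionInsight_Cluster_1_Services_ClientConfig_ConfigFiles\Spark2x\config\jaas-zk.conf
```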
- If you use the Linux environment for commissioning, you need to prepare the Linux node where the cluster client is to be installed and obtain related configuration files.
- Install the client on the node. For example, install the client in the directory /opt/client.
Ensure that the difference between the client time and the cluster time is less than 5 minutes.
For details about how to use the client on a Master or Core node in the cluster, see Using an MRS Client on Nodes Inside a Cluster. For details about how to install the client outside the MRS cluster, see Using an MRS Client on Nodes Outside a Cluster.
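The 5-minute requirement can be verified by comparing epoch timestamps. The following self-contained sketch sets the cluster time to a placeholder value (in practice you would obtain it from a cluster node, for example over SSH with date +%s):

```shell
# Compare client time with cluster time, both as seconds since the epoch.
CLIENT_TIME=$(date +%s)
CLUSTER_TIME=$CLIENT_TIME   # placeholder: substitute the cluster node's epoch time
DIFF=$((CLIENT_TIME - CLUSTER_TIME))
# ${DIFF#-} strips a leading minus sign, giving the absolute skew in seconds.
if [ "${DIFF#-}" -lt 300 ]; then
  echo "clock skew OK"
else
  echo "clock skew exceeds 5 minutes"
fi
```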
- Log in to the FusionInsight Manager portal. Download the cluster client software package to the active management node and decompress it. Then, log in to the active management node as user root. Go to the decompression path of the cluster client and copy all configuration files in the FusionInsight_Cluster_1_Services_ClientConfig/Spark2x/config directory to the conf directory where the compiled JAR package is stored for subsequent commissioning, for example, /opt/client/conf.
For example, if the client software package is FusionInsight_Cluster_1_Services_Client.tar and the download path is /tmp/FusionInsight-Client on the active management node, run the following command:
cd /tmp/FusionInsight-Client
tar -xvf FusionInsight_Cluster_1_Services_Client.tar
tar -xvf FusionInsight_Cluster_1_Services_ClientConfig.tar
cd FusionInsight_Cluster_1_Services_ClientConfig
scp Spark2x/config/* root@<IP address of the client node>:/opt/client/conf
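The two-step extraction used above can be illustrated with a self-contained sketch; the archive and file names below are stand-ins for the real FusionInsight package names:

```shell
# Build a nested tar (an outer package containing an inner config package),
# then extract it in two steps, mirroring the client package layout.
mkdir -p demo/config && echo "core-site" > demo/config/core-site.xml
tar -cf ConfigFiles.tar -C demo config          # inner archive
tar -cf Services_Client.tar ConfigFiles.tar     # outer archive
mkdir -p out
tar -xf Services_Client.tar -C out              # step 1: extract the outer package
tar -xf out/ConfigFiles.tar -C out              # step 2: extract the inner package
ls out/config/core-site.xml
```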
The keytab file obtained in Preparing the Developer Account must also be stored in this directory. Table 3 describes the main configuration files.
Table 3 Configuration files

- carbon.properties: CarbonData configuration file.
- core-site.xml: Configures HDFS parameters.
- hdfs-site.xml: Configures HDFS parameters.
- hbase-site.xml: Configures HBase parameters.
- hive-site.xml: Configures Hive parameters.
- jaas-zk.conf: Java authentication configuration file.
- log4j-executor.properties: Executor log configuration file.
- mapred-site.xml: Hadoop MapReduce configuration file.
- ranger-spark-audit.xml: Ranger audit log configuration file.
- ranger-spark-security.xml: Ranger permission management configuration file.
- yarn-site.xml: Configures YARN parameters.
- spark-defaults.conf: Configures Spark2x parameters.
- spark-env.sh: Spark2x environment variable configuration file.
- user.keytab: Provides HDFS user information for Kerberos security authentication.
- krb5.conf: Provides Kerberos server configuration information.
- Check the network connection of the client node.
During the client installation, the system automatically configures the hosts file on the client node. You are advised to check whether the /etc/hosts file contains the host names of the nodes in the cluster. If no, manually copy the content in the hosts file in the decompression directory to the hosts file on the node where the client resides, to ensure that the local host can communicate with each host in the cluster.
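The manual copy can be scripted as an append-if-missing merge. The following self-contained sketch uses throwaway file names and addresses in place of the real /etc/hosts and the hosts file from the decompression directory:

```shell
# Merge entries from the cluster hosts file into the local hosts file,
# skipping lines that are already present verbatim.
CLUSTER_HOSTS=cluster_hosts.txt     # stand-in for <decompression dir>/hosts
LOCAL_HOSTS=local_hosts.txt         # stand-in for /etc/hosts
printf '192.168.0.11 node-master1\n192.168.0.12 node-core1\n' > "$CLUSTER_HOSTS"
printf '127.0.0.1 localhost\n192.168.0.11 node-master1\n' > "$LOCAL_HOSTS"
while IFS= read -r line; do
  grep -qxF "$line" "$LOCAL_HOSTS" || printf '%s\n' "$line" >> "$LOCAL_HOSTS"
done < "$CLUSTER_HOSTS"
```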