Updated on 2024-09-23 GMT+08:00

Configuring Hadoop Data Encryption During Transmission

Configuring Security Channel Encryption

Not all channels between components are encrypted by default. You can set the following parameters to configure security channel encryption.

To modify a parameter, log in to FusionInsight Manager, choose Cluster > Services > Target Service Name, click Configurations, and then click All Configurations. Enter the parameter name in the search box.

  • After modifying configuration parameters, restart the corresponding services for the changes to take effect.
  • This topic applies to MRS 3.x or later.
Table 1 Parameter description

Service

Parameter

Description

Default Value

HBase

hbase.rpc.protection

Indicates whether HBase channels are encrypted. These include the remote procedure call (RPC) channels that HBase clients use to access the HBase server and the RPC channels between HMaster and RegionServer. If this parameter is set to privacy, the channels are encrypted, and authentication, integrity, and privacy protection are all enabled. If it is set to integrity, the channels are not encrypted, and only authentication and integrity protection are enabled. If it is set to authentication, the channels are not encrypted, only packets are authenticated, and neither integrity nor privacy protection is provided.

NOTE:

The privacy mode encrypts transmitted content, including sensitive information such as user tokens, to secure it in transit. However, this mode significantly affects performance: compared with the other two modes, it reduces read/write performance by about 60%. Choose the mode based on your enterprise's security requirements. The value must be the same on the client and the server.

  • Security mode: privacy
  • Normal mode: authentication

HDFS

dfs.encrypt.data.transfer

Indicates whether the HDFS data transfer channels and the channels for clients to access HDFS are encrypted. The HDFS data transfer channels include the data transfer channels between DataNodes and the Data Transfer (DT) channels for clients to access DataNodes. The value true indicates that the channels are encrypted. The channels are not encrypted by default.

false

HDFS

dfs.encrypt.data.transfer.algorithm

Indicates the encryption algorithm used for the HDFS data transfer channels and the channels for clients to access HDFS. This parameter takes effect only when dfs.encrypt.data.transfer is set to true.

The default value 3des indicates that the 3DES algorithm is used to encrypt data. The value can also be set to rc4, but this is not recommended because of the associated security risks.

3des

HDFS

hadoop.rpc.protection

Indicates whether the RPC channels of each module in Hadoop are encrypted. The channels include:

  • RPC channels for clients to access HDFS
  • RPC channels between HDFS modules, for example, between DataNode and NameNode
  • RPC channels for clients to access YARN
  • RPC channels between NodeManager and ResourceManager
  • RPC channels for Spark to access YARN and HDFS
  • RPC channels for MapReduce to access YARN and HDFS
  • RPC channels for HBase to access HDFS

The value privacy indicates that transmission is encrypted. The value authentication indicates that transmission is not encrypted. The default depends on the cluster mode, as listed below.

NOTE:

Set this parameter on the HDFS configuration page. The setting applies globally: it determines whether RPC channels are encrypted for all modules in Hadoop.

  • Security mode: privacy
  • Normal mode: authentication
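The dependencies in Table 1 can be checked mechanically before you restart services. The following Python sketch assumes you have already collected the values from the cluster's Hadoop-style *-site.xml files (how you obtain those files is an assumption). The helpers parse_site_xml and check_transport_settings are hypothetical. The sketch flags unknown protection values, checks that hbase.rpc.protection is identical on client and server, and warns when data transfer encryption is off or uses rc4. The SASL QOP tokens (auth, auth-int, auth-conf) are the names Apache Hadoop and HBase use internally for the three levels and are included for reference only. For simplicity, the sketch also assumes a single value per parameter.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The three protection levels from Table 1. "sasl_qop" is the SASL quality-of-protection
# token Apache Hadoop/HBase use for each level (reference only; not read by this sketch).
PROTECTION_LEVELS = {
    "authentication": {"sasl_qop": "auth", "integrity": False, "encrypted": False},
    "integrity": {"sasl_qop": "auth-int", "integrity": True, "encrypted": False},
    "privacy": {"sasl_qop": "auth-conf", "integrity": True, "encrypted": True},
}


def parse_site_xml(text: str) -> dict[str, str]:
    """Parse a Hadoop-style *-site.xml string into a {name: value} dict."""
    props: dict[str, str] = {}
    for prop in ET.fromstring(text).iter("property"):
        name, value = prop.findtext("name"), prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def check_transport_settings(server: dict[str, str], client: dict[str, str]) -> list[str]:
    """Return warnings for the Table 1 parameters (hypothetical helper)."""
    warnings: list[str] = []
    # Assumes a single value per parameter (no comma-separated lists).
    for key in ("hbase.rpc.protection", "hadoop.rpc.protection"):
        for side, conf in (("server", server), ("client", client)):
            value = conf.get(key)
            if value is not None and value not in PROTECTION_LEVELS:
                warnings.append(f"{key}: unknown {side} value {value!r}")
    # hbase.rpc.protection must be identical on client and server.
    s, c = server.get("hbase.rpc.protection"), client.get("hbase.rpc.protection")
    if s is not None and c is not None and s != c:
        warnings.append(f"hbase.rpc.protection: client {c!r} does not match server {s!r}")
    # The algorithm only matters when data transfer encryption is enabled.
    if server.get("dfs.encrypt.data.transfer", "false").strip().lower() != "true":
        warnings.append("dfs.encrypt.data.transfer is not true: data transfer is unencrypted")
    elif server.get("dfs.encrypt.data.transfer.algorithm", "3des").strip().lower() == "rc4":
        warnings.append("dfs.encrypt.data.transfer.algorithm is rc4: not recommended")
    return warnings


if __name__ == "__main__":
    server_xml = """<configuration>
      <property><name>hbase.rpc.protection</name><value>privacy</value></property>
      <property><name>dfs.encrypt.data.transfer</name><value>true</value></property>
      <property><name>dfs.encrypt.data.transfer.algorithm</name><value>rc4</value></property>
    </configuration>"""
    client = {"hbase.rpc.protection": "authentication"}
    for warning in check_transport_settings(parse_site_xml(server_xml), client):
        print(warning)
```

Running the script prints two warnings: the client/server mismatch for hbase.rpc.protection and the rc4 algorithm choice.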

Setting the Maximum Number of Concurrent Web Connections

To ensure web server reliability, new connections are rejected when the number of user connections reaches a specified threshold. This helps protect against DDoS attacks and prevents service unavailability caused by too many users accessing the web server at the same time.
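The threshold behavior described above can be sketched as a non-blocking counting semaphore. A request is admitted only if a slot is free and is rejected immediately otherwise. This illustrates the idea only and is not how HDFS, YARN, or Spark implement hadoop.http.server.MaxRequests or spark.connection.maxRequest. The ConnectionLimiter class, the handle_request function, and the 503 status code are all assumptions.

```python
from __future__ import annotations

import threading
from typing import Callable


class ConnectionLimiter:
    """Admit at most max_requests concurrent requests; reject the rest (hypothetical)."""

    def __init__(self, max_requests: int) -> None:
        if max_requests < 1:
            raise ValueError("max_requests must be at least 1")
        # BoundedSemaphore raises ValueError if released more times than acquired.
        self._slots = threading.BoundedSemaphore(max_requests)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False at once instead of queuing the caller.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle_request(limiter: ConnectionLimiter, handler: Callable[[], str]) -> tuple[int, str]:
    """Run handler if a slot is free; otherwise reject (503 is an assumed status)."""
    if not limiter.try_acquire():
        return 503, "Service Unavailable"
    try:
        return 200, handler()
    finally:
        # Free the slot even if the handler raises.
        limiter.release()


if __name__ == "__main__":
    limiter = ConnectionLimiter(max_requests=2)  # a real limit would be e.g. 2000
    print([limiter.try_acquire() for _ in range(3)])  # [True, True, False]
```

Rejecting immediately instead of queuing means a flood of connections cannot pile up waiting for the server, which matches the rejection behavior described above.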

To modify a parameter, log in to FusionInsight Manager, choose Cluster > Services > Target Service Name, click Configurations, and then click All Configurations. Enter the parameter name in the search box.

Table 2 Parameter description

Service

Parameter

Description

Default Value

HDFS/YARN

hadoop.http.server.MaxRequests

Specifies the maximum number of concurrent web connections for each component.

2000

Spark2x

spark.connection.maxRequest

Specifies the maximum number of concurrent request connections to JobHistory.

5000