Updated on 2025-09-16 GMT+08:00

Configuring Hadoop Data Encryption During Transmission

Configuring Secure Channel Encryption for Hadoop Components

An encrypted channel protects remote procedure calls (RPCs) in HDFS. When a user invokes an RPC, the user's login name is passed to the RPC layer in the RPC header. The RPC layer then uses Simple Authentication and Security Layer (SASL) to select an authentication mechanism (Kerberos or DIGEST-MD5) and complete RPC authentication.

By default, communication between components in the big data cluster is not encrypted. You can encrypt it to prevent data from being intercepted on the Internet or another untrusted network.

To do this, set the parameters described in Table 1, which control whether the RPC channels of all Hadoop modules are encrypted. The settings take effect globally.

RPC channels in Hadoop include:

  • RPC channels for clients to access HDFS
  • RPC channels between HDFS modules, for example, between DataNode and NameNode
  • RPC channels for clients to access YARN
  • RPC channels between NodeManager and ResourceManager
  • RPC channels for Spark to access YARN and HDFS
  • RPC channels for MapReduce to access YARN and HDFS
  • RPC channels for HBase to access HDFS
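
As a rough illustration of how the protection level on these channels could be inspected, the following Python sketch parses a Hadoop-style configuration document and reports the hadoop.rpc.protection level, the SASL quality-of-protection (QOP) token that corresponds to it, and whether the channel is encrypted. The function names (load_properties, rpc_protection), the SAMPLE document, and the fallback to authentication when the parameter is absent are assumptions made for this example. On a real cluster the value is managed through FusionInsight Manager.

```python
import xml.etree.ElementTree as ET

# SASL quality-of-protection (QOP) token that corresponds to each level.
SASL_QOP = {
    "authentication": "auth",   # authentication only
    "integrity": "auth-int",    # authentication + integrity check
    "privacy": "auth-conf",     # authentication + integrity + encryption
}


def load_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def rpc_protection(props):
    """Return (level, qop, encrypted) for hadoop.rpc.protection."""
    # Assumption: treat a missing key as "authentication" (normal mode).
    level = props.get("hadoop.rpc.protection", "authentication").lower()
    if level not in SASL_QOP:
        raise ValueError(f"unknown protection level: {level!r}")
    return level, SASL_QOP[level], level == "privacy"


# Hypothetical sample document used only for this illustration.
SAMPLE = """<configuration>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(rpc_protection(load_properties(SAMPLE)))
```

Only the privacy level encrypts the channel. The other two levels leave the RPC payload readable on the network.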

On FusionInsight Manager, choose Cluster > Services > Target service name. Click Configurations and then All Configurations. Enter the parameter name in the search box, modify the value, save the configuration, and restart the corresponding service for the configuration to take effect.

Table 1 Parameter description

Service: HDFS
Parameter: hadoop.rpc.protection
Description: Whether to encrypt the RPC channels of each Hadoop module.

  • privacy: RPC channels are encrypted, and authentication, integrity, and privacy are all enabled. This mode may degrade performance.
  • integrity: RPC channels are not encrypted, and only authentication and integrity are enabled. Exercise caution when using this mode, because transmitted data is not encrypted.
  • authentication: RPC channels are not encrypted, and only packet authentication is required. Integrity and privacy are not enforced. This mode preserves performance but poses security risks.

This parameter applies to the RPC channels of all Hadoop modules.

  • The setting takes effect only after the service is restarted. Rolling restart is not supported.
  • After changing the setting, download the client configuration file again. Otherwise, HDFS cannot provide read and write services.

Default value:

  • Security mode: privacy
  • Normal mode: authentication

Service: HDFS
Parameter: dfs.encrypt.data.transfer
Description: Whether to encrypt the Data Transfer (DT) channels through which clients access DataNodes and the data transmission channels between DataNodes.

This parameter takes effect only when hadoop.rpc.protection is set to privacy. Enabling encryption can severely degrade performance when a large amount of service data is transmitted.

If data transmission encryption is configured for one cluster in a pair of trusted clusters, it must also be configured for the peer cluster.

  • true: the channels are encrypted.
  • false: the channels are not encrypted.

Default value: false

Service: HDFS
Parameter: dfs.encrypt.data.transfer.algorithm
Description: Encryption algorithm for the HDFS data transfer channels and the channels through which clients access HDFS. This parameter takes effect only when dfs.encrypt.data.transfer is set to true.

  • 3des: default value. Data is encrypted using the 3DES algorithm.
  • rc4: data is encrypted using the RC4 algorithm. This value is not recommended because it poses security risks.

Default value: 3des

Service: HDFS
Parameter: dfs.encrypt.data.transfer.cipher.suites
Description: Cipher suite used for data encryption.

This parameter can be left empty or set to AES/CTR/NoPadding. If it is left empty, the algorithm specified by dfs.encrypt.data.transfer.algorithm is used to encrypt data.

Default value: AES/CTR/NoPadding

Service: HBase
Parameter: hbase.rpc.protection
Description: Whether to encrypt HBase channels, including the RPC channels through which HBase clients access the HBase server and the RPC channels between the HMaster and RegionServers.

  • privacy: RPC channels are encrypted, and authentication, integrity, and privacy are all enabled.
  • integrity: RPC channels are not encrypted, and only authentication and integrity are enabled.
  • authentication: RPC channels are not encrypted, and only packet authentication is required. Integrity and privacy are not enforced.

The privacy mode encrypts transmitted content, including sensitive information such as user tokens. It secures the transmitted information but significantly affects performance: compared with the other two modes, it reduces read/write performance by about 60%.

Modify the configuration based on enterprise security requirements. The configuration items on the client and server must be the same.

Default value:

  • Security mode: privacy
  • Normal mode: authentication
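
The dependencies among the HDFS parameters in Table 1 can be sketched in Python as follows. The function resolves which cipher, if any, would protect the data transfer channels for a given set of values. The name data_transfer_cipher and the TABLE1_DEFAULTS dictionary are hypothetical, the defaults are copied from Table 1 (using the normal-mode default for hadoop.rpc.protection), and the logic only mirrors the rules stated in the table. It is not the HDFS implementation.

```python
# Defaults taken from Table 1. hadoop.rpc.protection defaults to
# "privacy" in security mode; the normal-mode default is used here.
TABLE1_DEFAULTS = {
    "hadoop.rpc.protection": "authentication",
    "dfs.encrypt.data.transfer": "false",
    "dfs.encrypt.data.transfer.algorithm": "3des",
    "dfs.encrypt.data.transfer.cipher.suites": "AES/CTR/NoPadding",
}


def data_transfer_cipher(props):
    """Return the cipher for data transfer channels, or None if unencrypted."""
    merged = {**TABLE1_DEFAULTS, **props}

    # dfs.encrypt.data.transfer is used only with privacy-level RPC.
    if merged["hadoop.rpc.protection"].lower() != "privacy":
        return None
    if merged["dfs.encrypt.data.transfer"].lower() != "true":
        return None

    # A non-empty cipher suite wins; otherwise fall back to the algorithm.
    suites = merged["dfs.encrypt.data.transfer.cipher.suites"].strip()
    if suites:
        return suites
    return merged["dfs.encrypt.data.transfer.algorithm"]


if __name__ == "__main__":
    print(data_transfer_cipher({
        "hadoop.rpc.protection": "privacy",
        "dfs.encrypt.data.transfer": "true",
    }))
```

Clearing dfs.encrypt.data.transfer.cipher.suites in the input makes the function fall back to the algorithm parameter, which matches the behavior described for an empty cipher suite.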

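Because the client and server settings must match, and the client configuration must be downloaded again after hadoop.rpc.protection changes, a simple comparison can help spot a stale client configuration. The sketch below assumes both configurations have already been loaded into dictionaries. The function name find_mismatches and the CHECKED_KEYS tuple are hypothetical names chosen for this example.

```python
# Parameters from Table 1 that must agree between client and server.
CHECKED_KEYS = ("hadoop.rpc.protection", "hbase.rpc.protection")


def find_mismatches(client_props, server_props, keys=CHECKED_KEYS):
    """Return {key: (client_value, server_value)} for keys that differ."""
    mismatches = {}
    for key in keys:
        client_value = client_props.get(key)
        server_value = server_props.get(key)
        if client_value != server_value:
            mismatches[key] = (client_value, server_value)
    return mismatches


if __name__ == "__main__":
    # Example values only: the client still carries the old HDFS setting.
    client = {"hadoop.rpc.protection": "authentication",
              "hbase.rpc.protection": "privacy"}
    server = {"hadoop.rpc.protection": "privacy",
              "hbase.rpc.protection": "privacy"}
    for key, (c, s) in find_mismatches(client, server).items():
        print(f"{key}: client={c} server={s}")
```

An empty result means the checked parameters agree. Any entry in the result suggests that the client configuration should be downloaded again.
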
Setting the Maximum Number of Concurrent Web Connections

To ensure web server reliability, new connections are rejected when the number of user connections reaches a specified threshold. This helps mitigate distributed denial-of-service (DDoS) attacks and prevents the service from becoming unavailable when too many users access the web server at the same time.
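
The rejection behavior described above can be illustrated with a minimal Python sketch of a connection limiter. This is a generic illustration of threshold-based admission, not the mechanism the Hadoop or Spark web servers actually use. The class name ConnectionLimiter and its methods are hypothetical.

```python
import threading


class ConnectionLimiter:
    """Admit at most max_connections concurrent connections; reject the rest."""

    def __init__(self, max_connections):
        if max_connections < 1:
            raise ValueError("max_connections must be positive")
        self._max = max_connections
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Return True and count the connection, or False if at the limit."""
        with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    def release(self):
        """Free one slot when an admitted connection closes."""
        with self._lock:
            if self._active > 0:
                self._active -= 1

    @property
    def active(self):
        """Number of connections currently admitted."""
        with self._lock:
            return self._active
```

A server loop would call try_acquire() when a connection arrives, reject the connection if the call returns False, and call release() when an admitted connection closes.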

On FusionInsight Manager, choose Cluster > Services > Target service name. Click Configurations and then All Configurations. Enter the parameter name in the search box, modify the value, save the configuration, and restart the corresponding service for the configuration to take effect.

Table 2 Parameter description

Service: HDFS/Yarn
Parameter: hadoop.http.server.MaxRequests
Description: Maximum number of concurrent web connections allowed for each component.
Default value: 2000

Service: Spark/Spark2x
Parameter: spark.connection.maxRequest
Description: Maximum number of request connections allowed for JobHistory.
Default value: 5000