Updated on 2025-08-11 GMT+08:00

Configuring Hadoop Data Encryption During Transmission

Configuring Security Channel Encryption for Hadoop Components

Encrypted channel is an encryption mechanism for remote procedure calls (RPC) in HDFS. When a user invokes an RPC, the user's login name is transmitted to the RPC service in the RPC header. RPC then uses Simple Authentication and Security Layer (SASL) to negotiate an authentication protocol (Kerberos or DIGEST-MD5) and complete RPC authentication.

By default, communication between components in the big data cluster is not encrypted. To prevent data interception on public or untrusted networks, you can encrypt the data transmission channels between Hadoop components.

To do this, set the related parameters to control whether the RPC channels of all Hadoop modules are encrypted. The settings take effect globally; a configuration sketch follows the list of channels below.

RPC channels in Hadoop include:

  • RPC channels for clients to access HDFS
  • RPC channels between HDFS modules, for example, between DataNode and NameNode
  • RPC channels for clients to access YARN
  • RPC channels between NodeManager and ResourceManager
  • RPC channels for Spark to access YARN and HDFS
  • RPC channels for MapReduce to access YARN and HDFS
  • RPC channels for HBase to access HDFS

On FusionInsight Manager, choose Cluster > Services > Target service name. Click Configurations and then All Configurations. Enter the parameter name in the search box, modify the value, and restart the corresponding service for the configuration to take effect. A consolidated configuration sketch follows Table 1.

Table 1 Parameter description

Service: HDFS
Parameter: hadoop.rpc.protection
Description: Whether to encrypt the RPC channels of each module in Hadoop.
  • privacy: the RPC channels are encrypted, and the authentication, integrity, and privacy functions are enabled. This mode may degrade performance.
  • integrity: the RPC channels are not encrypted, and only the authentication and integrity functions are enabled. To ensure data security, exercise caution when using this mode.
  • authentication: the RPC channels are not encrypted, and only authentication is performed; integrity and privacy are not enabled. This mode ensures performance but has security risks.
  After this parameter is set, the encryption attribute of the RPC channels of all modules in Hadoop takes effect globally.
  • The setting takes effect only after the service is restarted. Rolling restart is not supported.
  • After the setting, download the client configuration file again. Otherwise, HDFS cannot provide read and write services.
Default Value:
  • Security mode: privacy
  • Normal mode: authentication

Service: HDFS
Parameter: dfs.encrypt.data.transfer
Description: Whether to encrypt the channels through which clients access HDFS and the HDFS data transmission channels, including the data transmission channels between DataNodes and the Data Transfer (DT) channels through which clients access DataNodes.
  This parameter takes effect only when hadoop.rpc.protection is set to privacy. Note that enabling encryption may severely affect performance when a large amount of service data is transmitted.
  If data transmission encryption is configured for one cluster in a mutual trust relationship, the same data transmission encryption must be configured for the peer cluster.
  • true: the channels are encrypted.
  • false: the channels are not encrypted.
Default Value: false

Service: HDFS
Parameter: dfs.encrypt.data.transfer.algorithm
Description: Encryption algorithm for the HDFS data transfer channels and the channels for clients to access HDFS. This parameter takes effect only when dfs.encrypt.data.transfer is set to true.
  • 3des: default value; the 3DES algorithm is used to encrypt data.
  • rc4: the RC4 algorithm is used to encrypt data. This value is not recommended because it may cause security risks.
Default Value: 3des

Service: HDFS
Parameter: dfs.encrypt.data.transfer.cipher.suites
Description: Cipher suite used for data encryption.
  This parameter can be left empty or set to AES/CTR/NoPadding. If it is left empty, the encryption algorithm specified by dfs.encrypt.data.transfer.algorithm is used to encrypt data.
Default Value: AES/CTR/NoPadding

Service: HBase
Parameter: hbase.rpc.protection
Description: Whether to encrypt HBase channels, including the RPC channels for HBase clients to access the HBase server and the RPC channels between the HMaster and RegionServer.
  • privacy: the RPC channels are encrypted, and the authentication, integrity, and privacy functions are enabled.
  • integrity: the RPC channels are not encrypted, and only the authentication and integrity functions are enabled.
  • authentication: the RPC channels are not encrypted, and only authentication is performed; integrity and privacy are not enabled.
  The privacy mode encrypts transmitted content, including sensitive information such as user tokens. It ensures the security of transmitted information but has a great impact on performance: compared with the other two modes, it reduces read/write performance by about 60%.
  Modify the configuration based on enterprise security requirements. The configuration items on the client and server must be the same.
Default Value:
  • Security mode: privacy
  • Normal mode: authentication
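
The sketch below consolidates how the HDFS data transfer encryption parameters (hdfs-site.xml) and the HBase channel protection parameter (hbase-site.xml) from Table 1 might look in the underlying configuration files. It is a minimal illustration that assumes hadoop.rpc.protection has already been set to privacy; on FusionInsight Manager, change these values through the configuration page rather than by editing the files.

  <!-- hdfs-site.xml: encrypt HDFS data transfer channels
       (takes effect only when hadoop.rpc.protection is privacy) -->
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <property>
    <!-- fallback algorithm used when no cipher suite is specified -->
    <name>dfs.encrypt.data.transfer.algorithm</name>
    <value>3des</value>
  </property>
  <property>
    <!-- when set, AES/CTR/NoPadding is used instead of the fallback algorithm -->
    <name>dfs.encrypt.data.transfer.cipher.suites</name>
    <value>AES/CTR/NoPadding</value>
  </property>

  <!-- hbase-site.xml: must be set to the same value on HBase clients and servers -->
  <property>
    <name>hbase.rpc.protection</name>
    <value>privacy</value>
  </property>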

Setting the Maximum Number of Concurrent Web Connections

To ensure web server reliability, new connections are rejected once the number of user connections reaches a specific threshold. This prevents DDoS attacks and service unavailability caused by too many users accessing the web server at the same time.

On FusionInsight Manager, choose Cluster > Services > Target service name. Click Configurations and then All Configurations. Enter the parameter name in the search box, modify the value, and restart the corresponding service for the configuration to take effect. An illustrative sketch follows Table 2.

Table 2 Parameter description

Service: HDFS/Yarn
Parameter: hadoop.http.server.MaxRequests
Description: Specifies the maximum number of concurrent web connections of each component.
Default Value: 2000

Service: Spark/Spark2x
Parameter: spark.connection.maxRequest
Description: Specifies the maximum number of request connections to JobHistory.
Default Value: 5000
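
For illustration only, the sketch below shows how these values might be expressed in configuration-file form. Both parameters are FusionInsight-specific, so the file placement shown here (a Hadoop-style property block and a spark-defaults.conf entry) is an assumption; in practice, change them through the FusionInsight Manager configuration page as described above.

  <!-- HDFS/Yarn service configuration (assumed placement):
       cap on concurrent web connections per component -->
  <property>
    <name>hadoop.http.server.MaxRequests</name>
    <value>2000</value>
  </property>

  # spark-defaults.conf (assumed placement): cap on request connections to JobHistory
  spark.connection.maxRequest  5000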