Configuring Hadoop Data Encryption During Transmission
Configuring Security Channel Encryption for Hadoop Components
Encrypted channel is an encryption protocol for remote procedure calls (RPC) in HDFS. When a user invokes an RPC, the user's login name is transmitted to the RPC server in the RPC header. The server then uses the Simple Authentication and Security Layer (SASL) to select an authentication mechanism (Kerberos or DIGEST-MD5) and complete RPC authentication.
By default, communication between components in the big data cluster is not encrypted. To guard against data interception on public or untrusted networks, however, the data transmission channels between Hadoop components can be encrypted.
To do this, set the related parameters that control whether the RPC channels of all Hadoop modules are encrypted. The settings take effect globally.
RPC channels in Hadoop include:
- RPC channels for clients to access HDFS
- RPC channels between HDFS modules, for example, between DataNode and NameNode
- RPC channels for clients to access YARN
- RPC channels between NodeManager and ResourceManager
- RPC channels for Spark to access YARN and HDFS
- RPC channels for MapReduce to access YARN and HDFS
- RPC channels for HBase to access HDFS
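As an illustration, the global RPC protection level corresponds to the hadoop.rpc.protection property in core-site.xml. The sketch below is for reference only; on a FusionInsight cluster this value should be changed through the Manager UI as described next, not by editing files by hand. The three values shown in the comment are the standard Hadoop SASL protection levels.

```xml
<!-- core-site.xml fragment (illustrative sketch only) -->
<configuration>
  <property>
    <name>hadoop.rpc.protection</name>
    <!-- authentication: authenticate only
         integrity: authenticate and add integrity checks
         privacy: authenticate, check integrity, and encrypt the RPC payload -->
    <value>privacy</value>
  </property>
</configuration>
```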
On FusionInsight Manager, choose Cluster > Services > Target service name. Click the Configurations tab and then All Configurations. Enter the parameter name in the search box, modify the value, and restart the corresponding service for the configuration to take effect.
Service | Parameter | Description | Default Value |
---|---|---|---|
HDFS | hadoop.rpc.protection | Whether to encrypt the RPC channels of each module in Hadoop. After this parameter is set, the encryption attribute of the RPC channels of all Hadoop modules takes effect globally. | |
HDFS | dfs.encrypt.data.transfer | Whether to encrypt the channels through which clients access HDFS and the HDFS data transmission channels, including the channels between DataNodes and the Data Transfer (DT) channels through which clients access DataNodes. This parameter takes effect only when hadoop.rpc.protection is set to privacy. Note that enabling encryption may severely affect performance when a large amount of service data is transmitted. If data transmission encryption is configured for one cluster in a trusted-cluster relationship, the same encryption must be configured for the peer cluster. | false |
HDFS | dfs.encrypt.data.transfer.algorithm | Encryption algorithm for the HDFS data transfer channels and the channels through which clients access HDFS. This parameter is available only when dfs.encrypt.data.transfer is set to true. | 3des |
HDFS | dfs.encrypt.data.transfer.cipher.suites | Cipher suite for data encryption. This parameter can be left empty or set to AES/CTR/NoPadding. If it is left empty, the algorithm specified by dfs.encrypt.data.transfer.algorithm is used to encrypt data. | AES/CTR/NoPadding |
HBase | hbase.rpc.protection | Whether to encrypt HBase channels, including the RPC channels through which HBase clients access the HBase server and the RPC channels between the HMaster and RegionServers. The privacy mode encrypts transmitted content, including sensitive information such as user tokens. This mode ensures the security of transmitted information but has a great impact on performance: compared with the other two modes, it reduces read/write performance by about 60%. Modify the configuration based on enterprise security requirements. The client and server configurations must be the same. | |
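Putting the HDFS rows above together, a minimal hdfs-site.xml fragment might look as follows. This is an illustrative sketch only; on a FusionInsight cluster these values are managed through the Manager UI described above, and enabling them requires hadoop.rpc.protection to be set to privacy first.

```xml
<!-- hdfs-site.xml fragment (illustrative sketch only) -->
<configuration>
  <!-- Encrypt the DataNode data-transfer channels -->
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <!-- Key-negotiation algorithm; 3des is the default listed above -->
  <property>
    <name>dfs.encrypt.data.transfer.algorithm</name>
    <value>3des</value>
  </property>
  <!-- Cipher suite for the bulk data; overrides the algorithm above when set -->
  <property>
    <name>dfs.encrypt.data.transfer.cipher.suites</name>
    <value>AES/CTR/NoPadding</value>
  </property>
</configuration>
```

On a client with the HDFS configuration deployed, the effective value can be checked with `hdfs getconf -confKey dfs.encrypt.data.transfer`.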
Setting the Maximum Number of Concurrent Web Connections
To ensure web server reliability, new connections are rejected once the number of user connections reaches a specified threshold. This prevents distributed denial-of-service (DDoS) attacks and service unavailability caused by too many users accessing the web server at the same time.
On FusionInsight Manager, choose Cluster > Services > Target service name. Click the Configurations tab and then All Configurations. Enter the parameter name in the search box, modify the value, and restart the corresponding service for the configuration to take effect.
Service | Parameter | Description | Default Value |
---|---|---|---|
HDFS/Yarn | hadoop.http.server.MaxRequests | Specifies the maximum number of concurrent web connections of each component. | 2000 |
Spark/Spark2x | spark.connection.maxRequest | Specifies the maximum number of request connections of JobHistory. | 5000 |
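The rejection behavior described above can be sketched in a few lines: once the number of active connections reaches the configured maximum, further connection attempts are turned away until existing connections close. This is a simplified illustration of the threshold mechanism, not FusionInsight's actual implementation.

```python
class ConnectionLimiter:
    """Toy model of a web server's concurrent-connection cap."""

    def __init__(self, max_requests):
        self.max_requests = max_requests  # e.g. hadoop.http.server.MaxRequests
        self.active = 0

    def try_connect(self):
        """Admit the connection and return True, or reject it and return False."""
        if self.active >= self.max_requests:
            return False
        self.active += 1
        return True

    def disconnect(self):
        """Release one active connection, freeing a slot for new clients."""
        if self.active > 0:
            self.active -= 1


limiter = ConnectionLimiter(max_requests=2000)
admitted = sum(limiter.try_connect() for _ in range(2500))
print(admitted)  # 2000: the 500 connections beyond the limit are rejected
```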