Big Data Cluster Design
When designing the big data cluster deployment architecture on the cloud, you are advised to comply with the following principles:
- Big data cloud services preferred: If the source is a self-built big data cluster and the destination cloud platform offers equivalent cloud services that meet the functional, performance, and compatibility requirements with minimal assessed reconstruction workload, preferentially use those big data cloud services in the deployment architecture. Retain your existing setup if the destination cloud platform lacks the corresponding big data cluster components, or if its components have limited compatibility and would require significant rework.
- Minimum reconstruction: Unless there is a special service requirement, avoid large-scale reconstruction. Design all components of the big data cluster in 1:1 benchmarking mode against the source cluster and keep their versions consistent. Before any version upgrade, assess the adaptive reconstruction workload.
- Elastic and auto scaling: When designing a big data cluster on the cloud, build in elastic and auto scaling capabilities so that compute and storage resources can automatically grow or shrink with workload demand, improving performance and efficiency while reducing costs (see the scaling sketch after this list).
- Fault tolerance and high availability: Big data clusters deployed on the cloud must have fault tolerance and high availability to ensure system reliability and stability. Use multiple replicas, redundant nodes, and failover mechanisms to ensure data and task persistence in the event of hardware or software failures.
- Data security and compliance: Big data clusters deployed on the cloud need strong data security and compliance measures. Implement appropriate data encryption, identity authentication, access control, and data isolation to safeguard sensitive data against potential security threats (see the encryption sketch after this list).
- Cost-effectiveness: When deploying big data clusters on the cloud, weigh cost-effectiveness. Cloud service providers offer elastic compute and storage resources, avoiding upfront investment in and maintenance costs of physical hardware. In addition, resources can be optimized and adjusted as required to minimize costs and improve resource utilization.
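The elastic scaling principle can be made concrete with a small decision routine. The following Python sketch is purely illustrative: the CPU metric, thresholds, node limits, and step size are assumptions chosen for the example, not parameters of any specific cloud service's auto scaling API.

```python
# Illustrative threshold-based scaling decision for a big data cluster.
# All thresholds and limits below are example values, not defaults of
# any particular cloud service.

def plan_scaling(avg_cpu_percent: float,
                 current_nodes: int,
                 min_nodes: int = 3,
                 max_nodes: int = 50,
                 scale_out_threshold: float = 80.0,
                 scale_in_threshold: float = 30.0,
                 step: int = 2) -> int:
    """Return the target node count for the next scaling action."""
    if avg_cpu_percent > scale_out_threshold:
        # Under heavy load, add nodes up to the configured ceiling.
        return min(current_nodes + step, max_nodes)
    if avg_cpu_percent < scale_in_threshold:
        # Under light load, remove nodes down to the configured floor.
        return max(current_nodes - step, min_nodes)
    # Within the comfort band: keep the cluster size unchanged.
    return current_nodes


if __name__ == "__main__":
    # Example: a 10-node cluster averaging 92% CPU scales out to 12 nodes.
    print(plan_scaling(avg_cpu_percent=92.0, current_nodes=10))
```

In practice, managed big data services expose such policies as configuration (target metrics, bounds, and cooldown periods) rather than code; the sketch only shows the underlying decision logic.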
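For the data security principle, the sketch below shows client-side encryption of a record before it is stored, using the Fernet API of the open-source cryptography package. It is a generic illustration of the encryption measure, not the mechanism of any particular cloud service; in production, keys would typically be managed by a key management service (KMS) rather than generated inline.

```python
# Generic client-side encryption sketch using the "cryptography" package.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in production, obtain keys from a KMS
cipher = Fernet(key)

plaintext = b"customer_id=42,balance=1000"
ciphertext = cipher.encrypt(plaintext)  # store or transmit this form only

# Round-trip check: decryption recovers the original record.
assert cipher.decrypt(ciphertext) == plaintext
```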
Parent topic: Designing a Big Data Architecture