What Are the Differences Between a Data Warehouse and the Hadoop Big Data Platform?
The Hadoop big data platform can be regarded as a next-generation data warehousing system. It has the characteristics of modern data warehouses and is widely used by enterprises. Because of the scalability of MPP, the MPP-based data warehousing system is sometimes classified as a big data platform.
However, data warehouses greatly differ from the Hadoop platform in function and user experience in different scenarios. For details, see the following table.
Feature |
Hadoop |
Data Warehouse |
---|---|---|
Number of compute nodes |
1000s |
Max 256 |
Data volume |
Over 10 PB |
Max 10 PB |
Data type |
Relational, semi-relational, unstructured (voice, images, and video) |
Relational only |
Latency |
Medium to high |
Low |
Application ecosystem |
Innovative/AI |
Traditional/BI |
Application development API |
SQL and other programming language APIs, such as MapReduce |
Standard database SQL |
Scalability |
Unlimited, with comprehensive programming APIs |
Limited, supported by UDFs |
Transaction support |
Limited |
Comprehensive |
Data warehouses and the Hadoop platform work together in different scenarios. GaussDB(DWS) on the public cloud can seamlessly integrate with Hadoop-based MRS on the public cloud to provide the SQL-over-Hadoop data sharing across platforms and services. GaussDB(DWS) serves as a data warehouse for managing massive data while relishing the openness, convenience, and innovation of the Hadoop platform. You can also enjoy the upper-layer applications of conventional data warehouses, especially BI applications, using GaussDB(DWS).
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.