Introduction to Impala Application Development

Impala provides fast, interactive SQL queries directly on Apache Hadoop data stored in HDFS, HBase, or Object Storage Service (OBS). Besides sharing the same unified storage platform, Impala uses the same metadata, SQL syntax (HiveQL), ODBC driver, and user interface (the Impala query UI in Hue) as Apache Hive. This gives users a familiar, unified platform for both real-time and batch queries. Impala is a supplementary tool for querying big data. It complements batch processing frameworks built on MapReduce, such as Hive, rather than replacing them: Hive and other MapReduce-based frameworks remain best suited to long-running batch jobs, while Impala is designed for fast, interactive queries.

Impala supports the following features:

- Most common SQL-92 features provided by Hive Query Language (HiveQL), including SELECT, JOIN, and aggregate functions
- HDFS, HBase, and OBS storage, including:
  - HDFS file formats: delimited text files, Parquet, Avro, SequenceFile, and RCFile
  - Compression codecs: Snappy, GZIP, Deflate, and BZIP
- Common data access methods, including:
  - JDBC driver
  - ODBC driver
  - Hue Beeswax and the Impala query UI
- impala-shell command-line interface
- Kerberos authentication

Impala is well suited to scenarios such as real-time queries for offline analysis (for example, log analysis and cluster status analysis) and large-scale data mining (for example, user behavior analysis, interest analysis, and geographic display).
