Updated on 2025-11-13 GMT+08:00

Using CarbonData for First Query

Tool Overview

The first query on a CarbonData table is slow, which can cause noticeable delays on nodes that have strict real-time performance requirements. The prequery tool reduces this delay by preheating tables in advance.

The tool provides the following function:

  • Preheats tables whose first query must return with low latency.

Tool Usage

  1. Install the Spark client.
  2. Log in to the Spark client node as the client installation user.
  3. Go to the {Client installation directory}/Spark2x/spark/bin directory and run the following command:

    sh start-prequery.sh

    Before running the command, configure prequeryParams.properties by referring to Table 1.

    Table 1 Parameters

    | Parameter | Description | Example Value |
    |---|---|---|
    | spark.prequery.period.max.minute | Maximum preheating duration, in minutes. | 60 |
    | spark.prequery.tables | Tables to preheat, in the format database.table:int. The table name supports the wildcard (*). int is a period in days: a table is preheated only if it was updated within that period. | default.test*:10 |
    | spark.prequery.maxThreads | Maximum number of concurrent threads during preheating. | 50 |
    | spark.prequery.sslEnable | Set to true in security mode and false in non-security mode. | true |
    | spark.prequery.driver | IP address and port number of JDBCServer, in the format IP address:Port number. To preheat multiple servers, enter the IP address:Port number of each server and separate the entries with commas (,). | 192.168.0.2:22550 |
    | spark.prequery.sql | SQL statements for preheating. Separate multiple statements with semicolons (;). Every statement configured here is executed on each preheated table. Use %s as the placeholder for the table name. | SELECT COUNT(*) FROM %s;SELECT * FROM %s LIMIT 1 |
    | spark.security.url | URL required by JDBC in security mode. | ;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.hadoop.com@HADOOP.COM; |
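To show how the values in Table 1 fit together, the following sketch writes a sample file that uses only the example values from the table, then expands the spark.prequery.sql template for one table. It rests on several assumptions: the file uses the standard key=value properties layout, statements are separated by semicolons as in the example value, the output file name prequeryParams.sample.properties is a placeholder, default.test1 is a hypothetical table matching default.test*, and expand_prequery_sql is an illustrative helper rather than part of the tool.

```shell
# Sketch only: write a sample file from the Example Value column of Table 1.
# The file name and location are placeholders, not the tool's real path.
sample=./prequeryParams.sample.properties
cat > "$sample" <<'EOF'
spark.prequery.period.max.minute=60
spark.prequery.tables=default.test*:10
spark.prequery.maxThreads=50
spark.prequery.sslEnable=true
spark.prequery.driver=192.168.0.2:22550
spark.prequery.sql=SELECT COUNT(*) FROM %s;SELECT * FROM %s LIMIT 1
spark.security.url=;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.hadoop.com@HADOOP.COM;
EOF

# Hypothetical helper (not part of the tool): split a spark.prequery.sql value
# on ';', drop empty pieces, and substitute a table name for every %s.
expand_prequery_sql() {
  printf '%s\n' "$1" | tr ';' '\n' | sed -e '/^[[:space:]]*$/d' -e "s/%s/$2/g"
}

# Read the template back from the sample file and expand it for one table.
sql_template="$(sed -n 's/^spark\.prequery\.sql=//p' "$sample")"
expand_prequery_sql "$sql_template" 'default.test1'
```

With the example values, the last command prints one statement per line: SELECT COUNT(*) FROM default.test1, followed by SELECT * FROM default.test1 LIMIT 1. This mirrors the substitution described for spark.prequery.sql; how the tool performs it internally may differ.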

    Script Usage

    Command format: sh start-prequery.sh

    Before running this command, place krb5.conf (mandatory) and either user.keytab or jaas.conf in the conf directory.
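The file requirement above can be verified before the script is started. The following is a minimal sketch: check_prequery_conf is a hypothetical helper, and the path passed to it stands for whichever conf directory applies on your client.

```shell
# Hypothetical pre-flight check (not shipped with the tool).
# $1: path to the conf directory used by start-prequery.sh (assumed by the caller).
check_prequery_conf() {
  conf_dir="$1"
  # krb5.conf is mandatory.
  if [ ! -f "$conf_dir/krb5.conf" ]; then
    echo "missing krb5.conf in $conf_dir" >&2
    return 1
  fi
  # Either user.keytab or jaas.conf must be present.
  if [ ! -f "$conf_dir/user.keytab" ] && [ ! -f "$conf_dir/jaas.conf" ]; then
    echo "need user.keytab or jaas.conf in $conf_dir" >&2
    return 1
  fi
  return 0
}

# Example use (path is a placeholder):
#   check_prequery_conf /path/to/conf && sh start-prequery.sh
```

Chaining the check with && ensures the script starts only when the required files are in place.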

    • Currently, this tool supports only Carbon tables.
    • This tool initializes the Carbon environment and pre-reads table metadata into JDBCServer. It is therefore best suited to multi-active instances and static allocation mode.