Updated on 2024-11-29 GMT+08:00

Configuring a MySQL Data Source

You can interconnect HetuEngine with MySQL data sources to access and query MySQL data. This section describes how to add a MySQL JDBC data source on HSConsole.

Prerequisites

  • The data source and the HetuEngine cluster nodes can communicate with each other.
  • If Kerberos authentication is enabled for the cluster (the cluster is in security mode), create a HetuEngine administrator user. If Kerberos authentication is disabled for the cluster (the cluster is in normal mode), create a HetuEngine service user, and assign the HDFS administrator permission to the user. That is, the user is added to both the hadoop and hadoopmanager user groups. For details about how to create a user, see Creating a HetuEngine User.
  • A HetuEngine compute instance has been created. For details, see Creating a HetuEngine Compute Instance.
  • You have obtained the IP address, port number, username, and password for logging in to the MySQL database.

Constraints on Interconnection Between HetuEngine and MySQL Data Sources

  • HetuEngine supports interconnecting with MySQL using the following SQL syntaxes: SHOW CATALOGS, SCHEMAS, TABLES, COLUMNS, DESCRIBE, USE, and SELECT TABLE/VIEW.
  • The schema and table names of MySQL data sources supported by HetuEngine are case insensitive.
  • Predicate pushups or pushdowns are not allowed on columns of text types such as CHAR or VARCHAR.

    For example, if name is a column of the VARCHAR type, the predicates of the following two queries cannot be pushed down.

    SELECT * FROM nation WHERE name>'abcd';
    SELECT * FROM nation WHERE name='abcd';

Configuring a MySQL Data Source

Installing a cluster client

  1. Install the cluster client that contains the HetuEngine service in the /opt/hadoopclient directory.

Prepare the MySQL driver

  1. Obtain the MySQL driver file (xxx.jar) from the MySQL official website. The supported versions are MySQL 5.7, MySQL 8.0, and later versions.
  2. Upload the MySQL driver file to the cluster where HetuEngine is deployed.

    You can use either of the following methods:
    • Upload the file to HDFS on FusionInsight Manager.
      1. Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HDFS.
      2. In the Basic Information area on the Dashboard page, click the link next to NameNode Web UI.
      3. Select Utilities > Browse the file system, click , and create the /user/hetuserver/fiber/extra_file/driver/mysql directory.
      4. Go to the /user/hetuserver/fiber/extra_file/driver/mysql directory and click to upload the MySQL driver file obtained in 2.
      5. Click the value in the Permission column in the row containing the driver file, select Read and Write in the User column, Read in the Group column, and Read in the Other column, and click Set.
    • Run HDFS commands to upload the file.
      1. Log in to the node where the HDFS service client is deployed and switch to the client installation directory, for example, /opt/hadoopclient.

        cd /opt/hadoopclient

      2. Configure environment variables.

        source bigdata_env

      3. If the cluster is in security mode, authenticate the user. For a normal cluster, user authentication is not required.

        kinit HetuEngine administrator username

        Enter the password as prompted.

      4. Run the following commands to create /user/hetuserver/fiber/extra_file/driver/mysql, upload the MySQL driver obtained in 2, and modify the permission:

        hdfs dfs -mkdir -p /user/hetuserver/fiber/extra_file/driver/mysql

        hdfs dfs -put ./MySQL driver file /user/hetuserver/fiber/extra_file/driver/mysql

        hdfs dfs -chmod -R 644 /user/hetuserver/fiber/extra_file/driver/mysql

Configuring a MySQL Data Source

  1. Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine.
  2. In the displayedDashboard tab, find the Basic Information area, and click the link next to HSConsole WebUI.
  3. Choose Data Source and click Add Data Source. Configure parameters on the Add Data Source page.

    1. In the Basic Configuration area, configure Name and choose JDBC > MySQL for Data Source Type.
    2. In the MySQL Configuration area, configure the parameters according to Table 1.
      Table 1 MySQL configuration

      Parameter

      Description

      Example Value

      Driver

      The default value is mysql.

      mysql

      Driver Name

      Select the MySQL driver that has been uploaded in 2. The driver format is xxx.jar.

      mysql-connector-java-8.0.11.jar

      JDBC URL

      JDBC URL for connecting to MySQL.

      Format: jdbc:mysql://IP address of the MySQL database:Port number.

      The default port number is 3306.

      • IPV4: jdbc:mysql://10.10.10.11:3306
      • IPV6: jdbc:mysql://[10:10::10:11]:3306

      Username

      MySQL username for connecting to the MySQL data source

      -

      Password

      Password of the MySQL username for connecting to the MySQL data source

      -

    3. (Optional) Customize the configuration.
      Click Add to add custom configuration parameters. Configure custom parameters of the MySQL data source. For details, see Table 2.
      Table 2 Custom parameters of the MySQL data source

      Parameter

      Description

      Example Value

      mysql.auto-reconnect

      Whether to reconnect automatically

      • true (default value): Enable automatic reconnection.
      • false: Disable automatic reconnection.

      true

      mysql.max-reconnects

      Maximum number of reconnection attempts. The default value is 3.

      3

      mysql.jdbc.use-information-schema

      Whether the driver should use INFORMATION_SCHEMA to derive the information used by DatabaseMetaData.

      true

      use-connection-pool

      Whether to use the JDBC connection pool. The default value is false.

      false

      jdbc.connection.pool.maxTotal

      Maximum number of connections in the JDBC connection pool. The default value is 8.

      8

      jdbc.connection.pool.maxIdle

      Maximum number of idle connections in the JDBC connection pool. The default value is 8.

      8

      jdbc.connection.pool.minIdle

      Minimum number of idle connections in the JDBC connection pool. The default value is 0.

      0

      case-insensitive-name-matching

      The schema and table names of MySQL data sources supported by HetuEngine are case sensitive.

      • false (default value): Only schemas and tables whose names contain only lowercase letters can be queried.
      • true:
        • If no schema or table is matched ignoring case sensitivity, the schema and table can be queried.
        • If schemas and tables are matched ignoring case sensitivity, the schema and table cannot be queried.

      false

      case-insensitive-name-matching.cache-ttl

      Timeout interval for caching case-sensitive schema and table names of the MySQL data source. The default value is 1 minute.

      1m

      dynamic-filtering.enabled

      Whether dynamic filters will be pushed down to JDBC queries.

      • true (default value): Enable pushdown.
      • false: Disable pushdown.

      true

      dynamic-filtering.wait-timeout

      The maximum duration that HetuEngine will wait to collect dynamic filters from the build side of the connection before starting a JDBC query. Using a larger value may result in a more detailed dynamic filter. However, the latency of some queries is increased. The default value is 20s.

      20s

      unsupported-type-handling

      How data types that are not supported by the connector will be processed

      • CONVERT_TO_VARCHAR: Convert unsupported types to VARCHAR and allow only read operations on them.
      • IGNORE (default value): Do not display the unsupported types.

      IGNORE

      join-pushdown.enabled

      Whether join pushdown is enabled.

      • true (default value): Enable join pushdown.
      • false: Disable join pushdown.

      true

      join-pushdown.strategy

      Policy used to evaluate whether the Join operation is pushed down.

      • AUTOMATIC (default value): Enable cost-based connection pushdown.
      • EAGER: Push down joins as much as possible. Even if table statistics are unavailable, using EAGER will push down joins, which may cause query performance deterioration. Use EAGER only in test and troubleshooting scenarios.

      AUTOMATIC

      Click Delete to delete custom configuration parameters.

    4. Click OK

  4. Log in to the node where the cluster client is deployed and run the following commands to switch to the client installation directory and authenticate the user:

    cd /opt/hadoopclient

    source bigdata_env

    kinit User performing HetuEngine operations (If the cluster is in normal mode, skip this command.)

  5. Log in to the catalog of the data source.

    hetu-cli --catalog Data source name --schema Database name

    For example, run the following command:

    hetu-cli --catalog mysql_1 --schema mysql

  6. Run the following command. If the database table information can be viewed or no error is reported, the connection is successful.

    show tables;

Mapping Between MySQL and HetuEngine Data Types

Mapping from MySQL data types to HetuEngine data types

MySQL Type

HetuEngine Data Type

BIT

BOOLEAN

BOOLEAN

TINYINT

TINYINT

TINYINT

SMALLINT

SMALLINT

INTEGER

INTEGER

BIGINT

BIGINT

DOUBLE PRECISION

DOUBLE

FLOAT

REAL

REAL(m, d)

REAL(m, d)

DECIMAL(p, s)

DECIMAL(p, s)

CHAR(n)

CHAR(n)

VARCHAR(n)

VARCHAR(n)

TINYTEXT

VARCHAR(255)

TEXT

VARCHAR(65535)

MEDIUMTEXT

VARCHAR(16777215)

LONGTEXT

VARCHAR

ENUM(n)

VARCHAR(n)

BINARY, VARBINARY, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB

VARBINARY

JSON

JSON

DATE

DATE

TIME(n)

TIME(n)

DATETIME(n)

TIMESTAMP(n)

TIMESTAMP(n)

TIMESTAMP(n)