Adding a MySQL Data Source

This section applies to MRS 3.3.0 or later.

You can interconnect HetuEngine with MySQL data sources to access and query MySQL data. This section describes how to add a MySQL JDBC data source on HSConsole.

Prerequisites for Adding a MySQL Data Source

The data source and the HetuEngine cluster nodes can communicate with each other.
If Kerberos authentication is enabled for the cluster (the cluster is in security mode), create a HetuEngine administrator user. If Kerberos authentication is disabled for the cluster (the cluster is in normal mode), create a HetuEngine service user, and assign the HDFS administrator permission to the user. That is, the user is added to both the hadoop and hadoopmanager user groups. For details about how to create a user, see Creating a HetuEngine Permission Role.
A HetuEngine compute instance has been created. For details, see Creating a HetuEngine Compute Instance.
You have obtained the IP address, port number, username, and password for logging in to the MySQL database.

Constraints on Interconnection Between HetuEngine and MySQL Data Sources

HetuEngine supports interconnecting with MySQL using the following SQL syntaxes: SHOW CATALOGS, SCHEMAS, TABLES, COLUMNS, DESCRIBE, USE, and SELECT TABLE/VIEW.
The schema and table names of MySQL data sources supported by HetuEngine are case insensitive.
Predicate pushups or pushdowns are not allowed on columns of text types such as CHAR or VARCHAR.
For example, if name is a column of the VARCHAR type, the predicates of the following two queries cannot be pushed down.
```
SELECT * FROM nation WHERE name>'abcd';
SELECT * FROM nation WHERE name='abcd';
```

Configuring a MySQL Data Source

Installing a cluster client

Install the cluster client that contains the HetuEngine service in the /opt/hadoopclient directory.

Prepare the MySQL driver

Obtain the MySQL driver file (xxx.jar) from the MySQL official website. The supported versions are MySQL 5.7, MySQL 8.0, and later versions.
Upload the MySQL driver file to the cluster where HetuEngine is deployed.
You can use either of the following methods:
- Upload the file to HDFS on FusionInsight Manager.
  1. Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HDFS.
  2. In the Basic Information area on the Dashboard page, click the link next to NameNode Web UI.
  3. Select Utilities > Browse the file system, click , and create the /user/hetuserver/fiber/extra_file/driver/mysql directory.
  4. Go to the /user/hetuserver/fiber/extra_file/driver/mysql directory and click to upload the MySQL driver file obtained in 2.
  5. Click the value in the Permission column in the row containing the driver file, select Read and Write in the User column, Read in the Group column, and Read in the Other column, and click Set.
- Run HDFS commands to upload the file.
  1. Log in to the node where the HDFS service client is deployed and switch to the client installation directory, for example, /opt/hadoopclient.
    cd /opt/hadoopclient
  2. Configure environment variables.
    source bigdata_env
  3. If the cluster is in security mode, authenticate the user. For a normal cluster, user authentication is not required.
    kinit HetuEngine administrator username
    
    Enter the password as prompted.
  4. Run the following commands to create /user/hetuserver/fiber/extra_file/driver/mysql, upload the MySQL driver obtained in 2, and modify the permission:
    hdfs dfs -mkdir -p /user/hetuserver/fiber/extra_file/driver/mysql
    
    hdfs dfs -put ./MySQL driver file /user/hetuserver/fiber/extra_file/driver/mysql
    
    hdfs dfs -chmod -R 644 /user/hetuserver/fiber/extra_file/driver/mysql

Configuring a MySQL Data Source

Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine.
In the displayed Dashboard tab, find the Basic Information area, and click the link next to HSConsole WebUI.

Choose Data Source and click Add Data Source. Configure parameters on the Add Data Source page.

In the Basic Configuration area, configure Name and choose JDBC > MySQL for Data Source Type.

In the MySQL Configuration area, configure the parameters according to Table 1.

**Table 1** MySQL configuration
Parameter	Description	Example Value
Driver	The default value is mysql.	mysql
Driver Name	Select the MySQL driver that has been uploaded in 2. The driver format is xxx.jar.	mysql-connector-java-8.0.11.jar
JDBC URL	JDBC URL for connecting to MySQL. Format: jdbc:mysql://IP address of the MySQL database:Port number. The default port number is 3306.	IPV4: jdbc:mysql://10.10.10.11:3306 IPV6: jdbc:mysql://[10:10::10:11]:3306
Username	MySQL username for connecting to the MySQL data source	-
Password	Password of the MySQL username for connecting to the MySQL data source	-

(Optional) Customize the configuration.

Click Add to add custom configuration parameters. Configure custom parameters of the MySQL data source. For details, see Table 2.

**Table 2** Custom parameters of the MySQL data source
Parameter	Description	Example Value
mysql.auto-reconnect	Whether to reconnect automatically true (default value): Enable automatic reconnection. false: Disable automatic reconnection.	true
mysql.max-reconnects	Maximum number of reconnection attempts. The default value is 3.	3
mysql.jdbc.use-information-schema	Whether the driver should use INFORMATION_SCHEMA to derive the information used by DatabaseMetaData.	true
use-connection-pool	Whether to use the JDBC connection pool. The default value is false.	false
jdbc.connection.pool.maxTotal	Maximum number of connections in the JDBC connection pool. The default value is 8.	8
jdbc.connection.pool.maxIdle	Maximum number of idle connections in the JDBC connection pool. The default value is 8.	8
jdbc.connection.pool.minIdle	Minimum number of idle connections in the JDBC connection pool. The default value is 0.	0
case-insensitive-name-matching	The schema and table names of MySQL data sources supported by HetuEngine are case sensitive. false (default value): Only schemas and tables whose names contain only lowercase letters can be queried. true: If no schema or table is matched ignoring case sensitivity, the schema and table can be queried. If schemas and tables are matched ignoring case sensitivity, the schema and table cannot be queried.	false
case-insensitive-name-matching.cache-ttl	Timeout interval for caching case-sensitive schema and table names of the MySQL data source. The default value is 1 minute.	1m
dynamic-filtering.enabled	Whether dynamic filters will be pushed down to JDBC queries. true (default value): Enable pushdown. false: Disable pushdown.	true
dynamic-filtering.wait-timeout	The maximum duration that HetuEngine will wait to collect dynamic filters from the build side of the connection before starting a JDBC query. Using a larger value may result in a more detailed dynamic filter. However, the latency of some queries is increased. The default value is 20s.	20s
unsupported-type-handling	How data types that are not supported by the connector will be processed CONVERT_TO_VARCHAR: Convert unsupported types to VARCHAR and allow only read operations on them. IGNORE (default value): Do not display the unsupported types.	IGNORE
join-pushdown.enabled	Whether join pushdown is enabled. true (default value): Enable join pushdown. false: Disable join pushdown.	true
join-pushdown.strategy	Policy used to evaluate whether the Join operation is pushed down. AUTOMATIC (default value): Enable cost-based connection pushdown. EAGER: Push down joins as much as possible. Even if table statistics are unavailable, using EAGER will push down joins, which may cause query performance deterioration. Use EAGER only in test and troubleshooting scenarios.	AUTOMATIC

Click Delete to delete custom configuration parameters.

Click OK

Log in to the node where the cluster client is deployed and run the following commands to switch to the client installation directory and authenticate the user:

cd /opt/hadoopclient

source bigdata_env

kinit User performing HetuEngine operations (If the cluster is in normal mode, skip this command.)
Log in to the catalog of the data source.

hetu-cli --catalog Data source name --schema Database name

For example, run the following command:

hetu-cli --catalog mysql_1 --schema mysql
Run the following command. If the database table information can be viewed or no error is reported, the connection is successful.

show tables;

Mapping Between MySQL and HetuEngine Data Types

Mapping from MySQL data types to HetuEngine data types

MySQL Type	HetuEngine Data Type
BIT	BOOLEAN
BOOLEAN	TINYINT
TINYINT	TINYINT
SMALLINT	SMALLINT
INTEGER	INTEGER
BIGINT	BIGINT
DOUBLE PRECISION	DOUBLE
FLOAT	REAL
REAL(m, d)	REAL(m, d)
DECIMAL(p, s)	DECIMAL(p, s)
CHAR(n)	CHAR(n)
VARCHAR(n)	VARCHAR(n)
TINYTEXT	VARCHAR(255)
TEXT	VARCHAR(65535)
MEDIUMTEXT	VARCHAR(16777215)
LONGTEXT	VARCHAR
ENUM(n)	VARCHAR(n)
BINARY, VARBINARY, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB	VARBINARY
JSON	JSON
DATE	DATE
TIME(n)	TIME(n)
DATETIME(n)	TIMESTAMP(n)
TIMESTAMP(n)	TIMESTAMP(n)