Configuring a MySQL Data Source
You can interconnect HetuEngine with MySQL data sources to access and query MySQL data. This section describes how to add a MySQL JDBC data source on HSConsole.
Prerequisites
- The data source and the HetuEngine cluster nodes can communicate with each other.
- If Kerberos authentication is enabled for the cluster (the cluster is in security mode), create a HetuEngine administrator user. If Kerberos authentication is disabled for the cluster (the cluster is in normal mode), create a HetuEngine service user, and assign the HDFS administrator permission to the user. That is, the user is added to both the hadoop and hadoopmanager user groups. For details about how to create a user, see Creating a HetuEngine User.
- A HetuEngine compute instance has been created. For details, see Creating a HetuEngine Compute Instance.
- You have obtained the IP address, port number, username, and password for logging in to the MySQL database.
Constraints on Interconnection Between HetuEngine and MySQL Data Sources
- HetuEngine supports interconnecting with MySQL using the following SQL syntaxes: SHOW CATALOGS, SCHEMAS, TABLES, COLUMNS, DESCRIBE, USE, and SELECT TABLE/VIEW.
- The schema and table names of MySQL data sources supported by HetuEngine are case insensitive.
- Predicate pushups or pushdowns are not allowed on columns of text types such as CHAR or VARCHAR.
For example, if name is a column of the VARCHAR type, the predicates of the following two queries cannot be pushed down.
SELECT * FROM nation WHERE name>'abcd'; SELECT * FROM nation WHERE name='abcd';
Configuring a MySQL Data Source
Installing a cluster client
- Install the cluster client that contains the HetuEngine service in the /opt/hadoopclient directory.
Prepare the MySQL driver
- Obtain the MySQL driver file (xxx.jar) from the MySQL official website. The supported versions are MySQL 5.7, MySQL 8.0, and later versions.
- Upload the MySQL driver file to the cluster where HetuEngine is deployed.
You can use either of the following methods:
- Upload the file to HDFS on FusionInsight Manager.
- Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HDFS.
- In the Basic Information area on the Dashboard page, click the link next to NameNode Web UI.
- Select Utilities > Browse the file system, click , and create the /user/hetuserver/fiber/extra_file/driver/mysql directory.
- Go to the /user/hetuserver/fiber/extra_file/driver/mysql directory and click to upload the MySQL driver file obtained in 2.
- Click the value in the Permission column in the row containing the driver file, select Read and Write in the User column, Read in the Group column, and Read in the Other column, and click Set.
- Run HDFS commands to upload the file.
- Log in to the node where the HDFS service client is deployed and switch to the client installation directory, for example, /opt/hadoopclient.
cd /opt/hadoopclient
- Configure environment variables.
- If the cluster is in security mode, authenticate the user. For a normal cluster, user authentication is not required.
kinit HetuEngine administrator username
Enter the password as prompted.
- Run the following commands to create /user/hetuserver/fiber/extra_file/driver/mysql, upload the MySQL driver obtained in 2, and modify the permission:
hdfs dfs -mkdir -p /user/hetuserver/fiber/extra_file/driver/mysql
hdfs dfs -put ./MySQL driver file /user/hetuserver/fiber/extra_file/driver/mysql
hdfs dfs -chmod -R 644 /user/hetuserver/fiber/extra_file/driver/mysql
- Log in to the node where the HDFS service client is deployed and switch to the client installation directory, for example, /opt/hadoopclient.
- Upload the file to HDFS on FusionInsight Manager.
Configuring a MySQL Data Source
- Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine.
- In the displayedDashboard tab, find the Basic Information area, and click the link next to HSConsole WebUI.
- Choose Data Source and click Add Data Source. Configure parameters on the Add Data Source page.
- In the Basic Configuration area, configure Name and choose JDBC > MySQL for Data Source Type.
- In the MySQL Configuration area, configure the parameters according to Table 1.
Table 1 MySQL configuration Parameter
Description
Example Value
Driver
The default value is mysql.
mysql
Driver Name
Select the MySQL driver that has been uploaded in 2. The driver format is xxx.jar.
mysql-connector-java-8.0.11.jar
JDBC URL
JDBC URL for connecting to MySQL.
Format: jdbc:mysql://IP address of the MySQL database:Port number.
The default port number is 3306.
- IPV4: jdbc:mysql://10.10.10.11:3306
- IPV6: jdbc:mysql://[10:10::10:11]:3306
Username
MySQL username for connecting to the MySQL data source
-
Password
Password of the MySQL username for connecting to the MySQL data source
-
- (Optional) Customize the configuration.
Click Add to add custom configuration parameters. Configure custom parameters of the MySQL data source. For details, see Table 2.
Table 2 Custom parameters of the MySQL data source Parameter
Description
Example Value
mysql.auto-reconnect
Whether to reconnect automatically
- true (default value): Enable automatic reconnection.
- false: Disable automatic reconnection.
true
mysql.max-reconnects
Maximum number of reconnection attempts. The default value is 3.
3
mysql.jdbc.use-information-schema
Whether the driver should use INFORMATION_SCHEMA to derive the information used by DatabaseMetaData.
true
use-connection-pool
Whether to use the JDBC connection pool. The default value is false.
false
jdbc.connection.pool.maxTotal
Maximum number of connections in the JDBC connection pool. The default value is 8.
8
jdbc.connection.pool.maxIdle
Maximum number of idle connections in the JDBC connection pool. The default value is 8.
8
jdbc.connection.pool.minIdle
Minimum number of idle connections in the JDBC connection pool. The default value is 0.
0
case-insensitive-name-matching
The schema and table names of MySQL data sources supported by HetuEngine are case sensitive.
- false (default value): Only schemas and tables whose names contain only lowercase letters can be queried.
- true:
- If no schema or table is matched ignoring case sensitivity, the schema and table can be queried.
- If schemas and tables are matched ignoring case sensitivity, the schema and table cannot be queried.
false
case-insensitive-name-matching.cache-ttl
Timeout interval for caching case-sensitive schema and table names of the MySQL data source. The default value is 1 minute.
1m
dynamic-filtering.enabled
Whether dynamic filters will be pushed down to JDBC queries.
- true (default value): Enable pushdown.
- false: Disable pushdown.
true
dynamic-filtering.wait-timeout
The maximum duration that HetuEngine will wait to collect dynamic filters from the build side of the connection before starting a JDBC query. Using a larger value may result in a more detailed dynamic filter. However, the latency of some queries is increased. The default value is 20s.
20s
unsupported-type-handling
How data types that are not supported by the connector will be processed
- CONVERT_TO_VARCHAR: Convert unsupported types to VARCHAR and allow only read operations on them.
- IGNORE (default value): Do not display the unsupported types.
IGNORE
join-pushdown.enabled
Whether join pushdown is enabled.
- true (default value): Enable join pushdown.
- false: Disable join pushdown.
true
join-pushdown.strategy
Policy used to evaluate whether the Join operation is pushed down.
- AUTOMATIC (default value): Enable cost-based connection pushdown.
- EAGER: Push down joins as much as possible. Even if table statistics are unavailable, using EAGER will push down joins, which may cause query performance deterioration. Use EAGER only in test and troubleshooting scenarios.
AUTOMATIC
Click Delete to delete custom configuration parameters.
- Click OK
- Log in to the node where the cluster client is deployed and run the following commands to switch to the client installation directory and authenticate the user:
cd /opt/hadoopclient
source bigdata_env
kinit User performing HetuEngine operations (If the cluster is in normal mode, skip this command.)
- Log in to the catalog of the data source.
hetu-cli --catalog Data source name --schema Database name
For example, run the following command:
hetu-cli --catalog mysql_1 --schema mysql
- Run the following command. If the database table information can be viewed or no error is reported, the connection is successful.
show tables;
Mapping Between MySQL and HetuEngine Data Types
MySQL Type |
HetuEngine Data Type |
---|---|
BIT |
BOOLEAN |
BOOLEAN |
TINYINT |
TINYINT |
TINYINT |
SMALLINT |
SMALLINT |
INTEGER |
INTEGER |
BIGINT |
BIGINT |
DOUBLE PRECISION |
DOUBLE |
FLOAT |
REAL |
REAL(m, d) |
REAL(m, d) |
DECIMAL(p, s) |
DECIMAL(p, s) |
CHAR(n) |
CHAR(n) |
VARCHAR(n) |
VARCHAR(n) |
TINYTEXT |
VARCHAR(255) |
TEXT |
VARCHAR(65535) |
MEDIUMTEXT |
VARCHAR(16777215) |
LONGTEXT |
VARCHAR |
ENUM(n) |
VARCHAR(n) |
BINARY, VARBINARY, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB |
VARBINARY |
JSON |
JSON |
DATE |
DATE |
TIME(n) |
TIME(n) |
DATETIME(n) |
TIMESTAMP(n) |
TIMESTAMP(n) |
TIMESTAMP(n) |
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot