Adding an Oracle Data Source
This topic is available for MRS 3.5.0 and later versions only.
HetuEngine allows you to configure, access, and query Oracle data sources. This topic describes how to add an Oracle JDBC data source on the HSConsole page of the cluster.
Prerequisites
- The data source and the HetuEngine cluster nodes can communicate with each other.
- If Kerberos authentication is enabled for the cluster (the cluster is in security mode), create a HetuEngine administrator user. If Kerberos authentication is disabled for the cluster (the cluster is in normal mode), create a HetuEngine service user and assign the HDFS administrator permission to the user, that is, add the user to both the hadoop and hadoopmanager user groups. For details about how to create a user, see Creating a HetuEngine Permission Role.
- A HetuEngine compute instance has been created. For details, see Creating a HetuEngine Compute Instance.
- You have obtained the IP address, port, database instance name or PDB name, username, and password of the Oracle database.
Constraints on the Interconnection with Oracle Data Sources
- Currently, the Oracle data source is read-only by default. Oracle 12 and later versions are supported.
- The schema and table names of Oracle data sources supported by HetuEngine are case insensitive.
- The following syntaxes are not supported: CREATE SCHEMA, ALTER SCHEMA, DROP SCHEMA, INSERT INTO ... SELECT, INSERT OVERWRITE, UPDATE, ANALYZE, and VIEW-related syntax.
- Columns of the Oracle CLOB, NCLOB, BLOB, or RAW(n) types, or columns of Trino data types that are mapped to these types, do not support predicate pushdown.
- If you specify a WHERE clause, DELETE can be executed only when the predicate in the WHERE clause can be completely pushed down to Oracle, as shown in the example after this list.
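For example, the following is a minimal SQL sketch of the DELETE constraint, assuming a hypothetical catalog oracle_1 with a table test.orders in which order_id is a NUMBER column and note is a CLOB column (all names are illustrative only):
-- The predicate on the NUMBER column can be fully pushed down, so this DELETE can run:
DELETE FROM oracle_1.test.orders WHERE order_id = 1001;
-- The predicate on the CLOB column cannot be pushed down, so this DELETE is rejected:
DELETE FROM oracle_1.test.orders WHERE note = 'obsolete';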
Configuring the Oracle Data Source
Installing a cluster client
- Install the cluster client that contains the HetuEngine service in the /opt/hadoopclient directory.
Preparing the Oracle driver
- Obtain the Oracle driver file from the Oracle official website. The file name format is ojdbcxxx.jar, for example, ojdbc8.jar.
The name of the driver file to be uploaded must meet the following verification rules: the file name must start with ojdbc and end with jar, and the version string in between can be any characters. The total length of the file name cannot exceed 80 characters, and the file cannot be larger than 100 MB.
- Upload the Oracle driver file to the cluster where HetuEngine is deployed.
You can use either of the following methods:
- Upload the file to HDFS on FusionInsight Manager.
- Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HDFS.
- In the Basic Information area on the Dashboard page, click the link next to NameNode Web UI.
- Choose Utilities > Browse the file system and create the /user/hetuserver/fiber/extra_file/driver/oracle directory.
- Go to the /user/hetuserver/fiber/extra_file/driver/oracle directory and upload the Oracle driver file obtained in 2.
- Click the value in the Permission column in the row containing the driver file, select Read and Write in the User column, Read in the Group column, and Read in the Other column, and click Set.
- Run HDFS commands to upload the file.
- Log in to the node where the HDFS service client is deployed and switch to the client installation directory, for example, /opt/hadoopclient.
cd /opt/hadoopclient
- Configure environment variables.
source bigdata_env
- If the cluster is in security mode, run the following command to authenticate the user. In normal mode, user authentication is not required.
kinit HetuEngine administrator username
Enter the password as prompted.
- Create the /user/hetuserver/fiber/extra_file/driver/oracle directory, upload the Oracle driver obtained in 2, and modify the permission.
hdfs dfs -mkdir -p /user/hetuserver/fiber/extra_file/driver/oracle
hdfs dfs -put ./Oracle driver file /user/hetuserver/fiber/extra_file/driver/oracle
hdfs dfs -chmod -R 644 /user/hetuserver/fiber/extra_file/driver/oracle
Configuring the Oracle data source
- Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine.
- In the Basic Information area in the Dashboard tab, click the link next to HSConsole Web UI to access the HSConsole page.
- Choose Data Source and click Add Data Source. Configure parameters on the Add Data Source page.
- Configure the basic information, enter the data source name, and select JDBC > Oracle as the data source type.
- Configure the parameters in the Oracle Configuration area. For details, see Table 1 Oracle configuration.
Table 1 Oracle configuration

| Parameter | Description | Example Value |
|---|---|---|
| Driver Name | Select the Oracle driver uploaded in 2. The format is ojdbcxxx.jar. | ojdbc8.jar |
| JDBC URL | JDBC URL for connecting to the Oracle database. The default port is 1521. The following formats are available: jdbc:oracle:thin:@IP address of the Oracle database:Port/Oracle PDB name, or jdbc:oracle:thin:@IP address of the Oracle database:Port:Oracle database instance name | jdbc:oracle:thin:@192.168.1.1:1521/orclpdb (PDB name) or jdbc:oracle:thin:@192.168.1.1:1521:orcl (instance name) |
| Username | Username for connecting to the Oracle data source. | - |
| Password | Password of the user connecting to the Oracle data source. | - |
- (Optional) Customize the configuration.
Click Add to add custom configuration parameters. For details, see Table 2.
Table 2 Custom parameters for the Oracle data source

| Parameter | Description | Example Value |
|---|---|---|
| case-insensitive-name-matching | Whether schema and table names of the Oracle data source are matched ignoring case (see the example below). false (default value): only schemas and tables whose names contain only lowercase letters can be queried. true: a schema or table whose name is unique when case is ignored can be queried; schemas or tables whose names collide when case is ignored cannot be queried. | false |
| case-insensitive-name-matching.cache-ttl | Timeout interval for caching the case-insensitive schema and table name mappings of the Oracle data source. The default value is 1m (1 minute). | 1m |
| dynamic-filtering.enabled | Whether dynamic filters are pushed down to JDBC queries. true (default value): enable pushdown. false: disable pushdown. | true |
| dynamic-filtering.wait-timeout | Maximum duration for which HetuEngine waits to collect dynamic filters from the build side of a join before starting a JDBC query. A larger value may produce a more selective dynamic filter, but it increases the latency of some queries. The default value is 20s. | 20s |
| unsupported-type-handling | How data types that are not supported by the connector are handled. CONVERT_TO_VARCHAR: convert unsupported types to VARCHAR and allow only read operations on them; only the types in the data type mapping table are supported. IGNORE (default value): do not expose columns of unsupported types. | IGNORE |
| join-pushdown.enabled | Whether to enable join pushdown. Join pushdown may adversely affect some query statements. true: enable join pushdown. false (default value): disable join pushdown. | false |
| join-pushdown.strategy | Strategy used to decide whether a join is pushed down. AUTOMATIC (default value): enable cost-based join pushdown. EAGER: push down joins whenever possible; even if table statistics are unavailable, EAGER pushes down joins, which may degrade query performance, so use EAGER only in test and troubleshooting scenarios. | AUTOMATIC |
| oracle.number.default-scale | Scale (number of decimal places) of the HetuEngine DECIMAL type that Oracle NUMBER columns without specified precision and scale are mapped to. By default, this parameter is not set and such columns are not supported. | 5 |
| oracle.remarks-reporting.enabled | Whether to expose metadata comments through the REMARKS column. true: expose metadata comments. false (default value): do not expose metadata comments. | false |
| oracle.synonyms.enabled | Whether to enable Oracle SYNONYM support, which HetuEngine disables by default for performance reasons. true: the synonym function is enabled. false (default value): the synonym function is disabled. | false |
| oracle.source-encoding | Character set of the remote data source environment, used to prevent garbled characters. Options: ZHS16GBK or GBK; AL32UTF8 or UTF-8 (default value). | UTF-8 |
You can click Delete to delete custom configuration parameters.
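The following is a minimal SQL sketch of the effect of case-insensitive-name-matching, assuming a hypothetical Oracle schema TEST containing a table ORDERS (names stored in uppercase in Oracle, as is the Oracle default):
-- With case-insensitive-name-matching=false (default), the uppercase Oracle names do not
-- match the lowercase names used in HetuEngine, so this query fails:
SELECT count(*) FROM oracle_1.test.orders;
-- With case-insensitive-name-matching=true, and provided no other schema or table name
-- collides with these names when case is ignored, the same query succeeds:
SELECT count(*) FROM oracle_1.test.orders;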
- Click OK.
- Log in to the node where the cluster client is deployed and run the following commands to switch to the client installation directory and authenticate the user:
cd /opt/hadoopclient
source bigdata_env
kinit User performing HetuEngine operations (If the cluster is in normal mode, skip this command.)
- Log in to the catalog of the data source.
hetu-cli --catalog Data source name --schema Database name
For example, run the following command:
hetu-cli --catalog oracle_1 --schema oracle
- Run the following command. If the database table information can be viewed or no error is reported, the connection is successful.
show tables;
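You can then run simple queries against a table to verify read access. The following is a minimal sketch, assuming a hypothetical table named orders exists in the schema:
describe orders;
select * from orders limit 10;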
Mapping Between Oracle and HetuEngine Data Types
Constraints:
- HetuEngine cannot directly read Oracle NUMBER columns whose precision and scale are not specified. Add the custom parameter oracle.number.default-scale=s to the data source configuration to map such columns to the decimal(38, s) type (see the example after this list).
- HetuEngine cannot read Number(p, s) columns where p – s > 38.
- The Oracle DATE type stores values with second precision, so it is mapped to timestamp(0) in HetuEngine.
- If the fractional seconds precision of a timestamp queried by HetuEngine is greater than 3, the value is truncated to three decimal places instead of being rounded.
- The JDBC driver and HetuEngine support different time and date formats, so an error may occur when dates and times earlier than 1582-10-15 are inserted or queried.
- VARCHAR(n) in HetuEngine is mapped to VARCHAR2(n CHAR) in Oracle; if n is greater than 4000, it is mapped to NCLOB. CHAR(n) is mapped to CHAR(n CHAR); if n is greater than 2000, it is also mapped to NCLOB.
- HetuEngine cannot write CHAR or VARCHAR values to columns whose declared lengths do not match.
- When you use CREATE TABLE AS to create an NCLOB column from a CHAR value, trailing spaces in the initial value are removed. However, if you insert a CHAR value into an existing NCLOB column, trailing spaces are retained.
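For example, the following is a minimal sketch of the oracle.number.default-scale mapping, assuming a hypothetical Oracle table test.t with a column c declared as NUMBER (no precision or scale) and the custom parameter oracle.number.default-scale=5 configured on the data source:
-- In Oracle, the column is declared without precision or scale:
--   CREATE TABLE t (c NUMBER);
-- In HetuEngine, the column is expected to be exposed as decimal(38, 5):
DESCRIBE oracle_1.test.t;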
Table 3 Mapping from Oracle data types to HetuEngine data types

| Oracle Type | HetuEngine Data Type |
|---|---|
| NUMBER(p, s) | DECIMAL(p, s) |
| NUMBER(p) | DECIMAL(p, 0) |
| FLOAT[(p)] | DOUBLE |
| BINARY_FLOAT | REAL |
| BINARY_DOUBLE | DOUBLE |
| VARCHAR2(n CHAR) | VARCHAR(n) |
| VARCHAR2(n BYTE) | VARCHAR(n) |
| NVARCHAR2(n) | VARCHAR(n) |
| CHAR(n) | CHAR(n) |
| NCHAR(n) | CHAR(n) |
| CLOB | VARCHAR |
| NCLOB | VARCHAR |
| RAW(n) | VARBINARY |
| BLOB | VARBINARY |
| DATE | TIMESTAMP(0) |
| TIMESTAMP(p) | TIMESTAMP(3) |
| TIMESTAMP(p) WITH TIME ZONE | TIMESTAMP(3) WITH TIME ZONE |
Table 4 Mapping from HetuEngine data types to Oracle data types

| HetuEngine Type | Oracle Type |
|---|---|
| TINYINT | NUMBER(3) |
| SMALLINT | NUMBER(5) |
| INTEGER | NUMBER(10) |
| BIGINT | NUMBER(19) |
| DECIMAL(p, s) | NUMBER(p, s) |
| REAL | BINARY_FLOAT |
| DOUBLE | BINARY_DOUBLE |
| VARCHAR | NCLOB |
| VARCHAR(n) | VARCHAR2(n CHAR) or NCLOB |
| CHAR(n) | CHAR(n CHAR) or NCLOB |
| VARBINARY | BLOB |
| DATE | DATE |
| TIMESTAMP | TIMESTAMP(3) |
| TIMESTAMP WITH TIME ZONE | TIMESTAMP(3) WITH TIME ZONE |
Function Enhancement
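The Oracle connector supports passing a raw query statement directly through to the Oracle data source. Assuming the catalog is named oracle, the oracle.system.query table function executes the statement in Oracle and exposes its result as a table that can be queried in HetuEngine: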
SELECT * FROM TABLE( oracle.system.query( query => 'SELECT * FROM tpch.nation'));
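Because the statement inside query is executed by Oracle itself, this function can be useful for Oracle-specific syntax that HetuEngine does not natively support; the example above returns the contents of the tpch.nation table from the Oracle database.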