Connecting Presto to OBS
Overview
Presto is available in two distributions: PrestoSQL (since renamed Trino) and PrestoDB.
Only PrestoSQL (Trino) can connect to OBS. The following example describes how to connect PrestoSQL 333 to OBS. PrestoSQL 332 and later must use JDK 11.
Presto in this section refers to PrestoSQL (Trino).
Prerequisites
- Hadoop has been installed. For details, see Connecting Hadoop to OBS.
- Hive has been installed. For details, see Connecting Hive to OBS.
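Because PrestoSQL 332 and later require JDK 11, it may help to confirm the Java version on each node before installing. The following is a minimal sketch, not part of the official procedure. The helper name java_major is hypothetical, and it assumes the usual version "x.y.z" format printed by java -version.

```shell
# Hypothetical helper: extract the Java major version from the first line
# of `java -version` output, e.g.  openjdk version "11.0.8" 2020-07-14
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;   # legacy scheme: 1.8.0_252 -> 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;   # current scheme: 11.0.8 -> 11
  esac
}

# Only runs when a java binary is on the PATH.
if command -v java >/dev/null 2>&1; then
  first_line=$(java -version 2>&1 | head -n 1)
  echo "Java major version: $(java_major "$first_line")"
fi
```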
Installing the Presto Server
Version: PrestoSQL 333
- Download the Presto client and server.
- Download the hadoop-huaweicloud plug-in.
- Decompress the Presto server package:
tar -zxvf presto-server-333.tar.gz
Place the following JAR packages in the plugin/hive-hadoop2 directory under the Presto root directory:
- hadoop-huaweicloud-${hadoop.version}-hw-${version}.jar
- Apache commons-lang-xxx.jar
You can download them from the Maven central repository or copy them from the hadoop directory.
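Before starting the server, you may want to confirm that both JAR packages are in place. The following is a minimal sketch under stated assumptions: check_plugin_jars and PRESTO_HOME are hypothetical names, and the file name patterns are taken from the list above.

```shell
# Hypothetical helper: report whether the two JARs named above are present
# in the given plugin directory. Returns non-zero if either is missing.
check_plugin_jars() {
  dir=$1
  missing=0
  for pattern in 'hadoop-huaweicloud-*.jar' 'commons-lang-*.jar'; do
    # ls fails when the pattern matches no file
    if ls "$dir"/$pattern >/dev/null 2>&1; then
      echo "found:   $pattern"
    else
      echo "missing: $pattern"
      missing=1
    fi
  done
  return "$missing"
}

# PRESTO_HOME is an assumed variable pointing at the decompressed server directory.
PRESTO_HOME="${PRESTO_HOME:-./presto-server-333}"
check_plugin_jars "$PRESTO_HOME/plugin/hive-hadoop2" \
  || echo "Copy the missing JARs before starting Presto."
```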
Configuring Presto
Create an etc directory inside the installation directory. Under etc, create the following configuration files:
- Node configuration file: environment configurations of each node
- JVM configuration file: command line options for Java virtual machines (JVMs)
- Server configuration file: configurations of the Presto server
- Catalog configuration file: configurations of different Presto connectors (data sources)
- Log configuration file: Presto log configurations
Node Configuration File
etc/node.properties is the node property file that contains configurations of each node. A node is a Presto instance. This file is typically created when Presto is first installed. The minimum configuration is as follows:
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data
Explanations:
node.environment: environment name. All nodes in a Presto cluster must have the same environment name.
node.id: the unique identifier of a node. A node ID must remain unchanged across reboots and upgrades of Presto.
node.data-dir: data directory. It is used by Presto to store logs and other data.
Example:
node.environment=presto_cluster
node.id=bigdata00
# The data directory must be created manually.
node.data-dir=/home/modules/presto-server-0.215/data
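The node file and its data directory can also be created from the shell. The following is a minimal sketch: PRESTO_HOME, NODE_ENV, NODE_ID, and DATA_DIR are assumed variable names, and using the host name as the node ID is only one way to get an ID that stays the same across restarts.

```shell
# Assumed variables; override them in the environment as needed.
PRESTO_HOME="${PRESTO_HOME:-$PWD/presto-server-333}"
NODE_ENV="${NODE_ENV:-presto_cluster}"
NODE_ID="${NODE_ID:-$(uname -n)}"     # host name used as a stable per-node ID
DATA_DIR="$PRESTO_HOME/data"

# Create the etc directory and the data directory up front.
mkdir -p "$PRESTO_HOME/etc" "$DATA_DIR"

# Write one property per line.
cat > "$PRESTO_HOME/etc/node.properties" <<EOF
node.environment=$NODE_ENV
node.id=$NODE_ID
node.data-dir=$DATA_DIR
EOF
```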
JVM Configuration File
etc/jvm.config is the JVM configuration file that contains command line options for starting JVMs. Each command line option is on a separate line. This file is not interpreted by the shell, so options containing spaces or special characters should not be quoted.
Reference configurations:
-server
-Xmx16G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+UseGCOverheadLimit
-XX:+HeapDumpOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
The parameters above are from the Presto official website and must be adjusted in an actual environment.
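Because each option must be on its own line, a quick check of the file can catch options that were pasted onto a single line. The following is a rough sketch, not an official tool. check_jvm_config is a hypothetical helper, and skipping blank lines and lines starting with # is an assumption of this sketch.

```shell
# Hypothetical helper: flag lines in a jvm.config file that do not look
# like exactly one JVM option. Returns non-zero if any line is flagged.
check_jvm_config() {
  file=$1
  status=0
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*)
        continue ;;                      # assumption: blank/comment lines are ignored
      -*)
        case "$line" in
          *' -'*)
            echo "several options on one line: $line"
            status=1 ;;
        esac ;;
      *)
        echo "not a JVM option: $line"
        status=1 ;;
    esac
  done < "$file"
  return "$status"
}

if [ -f etc/jvm.config ]; then
  check_jvm_config etc/jvm.config || echo "Review etc/jvm.config."
fi
```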
Server Configuration File
etc/config.properties is a configuration property file that contains the configurations for the Presto server. A Presto server can serve as both a coordinator and a worker. In large clusters, you are advised to dedicate one machine to act only as the coordinator.
- Configuration file of the coordinator node
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=5050
discovery-server.enabled=true
discovery.uri=http://192.168.XX.XX:5050
query.max-memory=20GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
- Configuration file of the worker node
coordinator=false
http-server.http.port=5050
discovery.uri=http://192.168.XX.XX:5050
query.max-memory=20GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
Explanations:
coordinator: whether to run the instance as a coordinator, to receive queries from clients and manage query executions.
node-scheduler.include-coordinator: whether the coordinator also serves as a worker. For larger clusters, processing work on the coordinator can impact query performance.
http-server.http.port: HTTP port. Presto uses HTTP for all external and internal communications.
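query.max-memory: the maximum total memory across the cluster that a query can use.
query.max-memory-per-node: the maximum user memory that a query can use on a single node.
query.max-total-memory-per-node: the maximum user and system memory that a query can use on a single node.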
discovery-server.enabled: Presto uses the Discovery service to find all nodes in the cluster. The Presto coordinator has a built-in Discovery service, and each Presto instance will be registered with the Discovery service on startup. This way, the deployment can be simplified and no additional service is required.
discovery.uri: URI of the Discovery service. Because the Discovery service runs inside the coordinator, set this to the host and port of the coordinator (http://192.168.XX.XX:5050 in the examples above). The URI must not end with a slash; otherwise, error 404 is reported.
Additional properties:
jmx.rmiregistry.port: port of the JMX RMI registry. JMX clients connect to this port.
jmx.rmiserver.port: port of the JMX RMI server, which can be used for monitoring over JMX.
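The coordinator and worker files differ only in a few properties, so one script can produce both. The following is a minimal sketch that reuses the example values above. presto_config is a hypothetical helper name, and the memory limits should be adjusted for a real cluster.

```shell
# Hypothetical helper: print config.properties content for a role.
# $1: coordinator | worker   $2: coordinator host   $3: HTTP port
presto_config() {
  role=$1
  host=$2
  port=$3
  if [ "$role" = coordinator ]; then
    echo "coordinator=true"
    echo "node-scheduler.include-coordinator=true"
    echo "discovery-server.enabled=true"
  else
    echo "coordinator=false"
  fi
  # Properties shared by both roles; discovery.uri has no trailing slash.
  cat <<EOF
http-server.http.port=$port
discovery.uri=http://$host:$port
query.max-memory=20GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
EOF
}

# Print the worker variant; redirect the output to etc/config.properties on a worker node.
presto_config worker 192.168.XX.XX 5050
```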
Catalog Configuration File (Key)
Configure a Hive connector as follows:
- Create a catalog directory under etc.
- Create the configuration file hive.properties for the Hive connector.
# hive.properties
# Connector name
connector.name=hive-hadoop2
# Configure the Hive metastore connection.
hive.metastore.uri=thrift://192.168.XX.XX:9083
# Specify the Hadoop configuration files.
hive.config.resources=/home/modules/hadoop-2.8.3/etc/hadoop/core-site.xml,/home/modules/hadoop-2.8.3/etc/hadoop/hdfs-site.xml,/home/modules/hadoop-2.8.3/etc/hadoop/mapred-site.xml
# Grant the permission to drop tables.
hive.allow-drop-table=true
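The Hive connector depends on the Hadoop configuration files listed in hive.config.resources, so a wrong path there is worth catching early. The following is a minimal sketch that checks each listed file. check_hive_resources is a hypothetical helper, and it assumes the property value is a single comma-separated line, as in the example above.

```shell
# Hypothetical helper: verify that every file listed in
# hive.config.resources exists. Returns non-zero if any file is missing.
check_hive_resources() {
  props=$1
  # Take everything after "hive.config.resources=".
  value=$(sed -n 's/^hive\.config\.resources=//p' "$props")
  if [ -z "$value" ]; then
    echo "hive.config.resources is not set in $props"
    return 1
  fi
  status=0
  # Split the comma-separated list into one path per line.
  while IFS= read -r path; do
    if [ -f "$path" ]; then
      echo "ok:      $path"
    else
      echo "missing: $path"
      status=1
    fi
  done <<EOF
$(printf '%s\n' "$value" | tr ',' '\n')
EOF
  return "$status"
}

if [ -f etc/catalog/hive.properties ]; then
  check_hive_resources etc/catalog/hive.properties || echo "Fix the paths above."
fi
```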
Log Configuration File
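1. Create a log.properties file under etc.
2. Add the following line, which sets the minimum log level for the io.prestosql logger hierarchy (PrestoSQL classes are in the io.prestosql package; com.facebook.presto applies to PrestoDB): io.prestosql=INFO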
There are four log levels: DEBUG, INFO, WARN, and ERROR.
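With all five configuration files described, a short check can confirm that the etc directory is complete before the first start. The following is a minimal sketch. check_etc_layout is a hypothetical helper, and the file names are the ones used in this section.

```shell
# Hypothetical helper: confirm that the configuration files described in
# this section exist under the given etc directory.
check_etc_layout() {
  etc=$1
  status=0
  for f in node.properties jvm.config config.properties \
           catalog/hive.properties log.properties; do
    if [ -f "$etc/$f" ]; then
      echo "ok:      $f"
    else
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}

# Run from the Presto installation directory.
check_etc_layout etc || echo "Create the missing files before starting Presto."
```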
Starting Presto
The procedure is as follows:
- Run hive --service metastore & to start the Hive metastore.
- Run bin/launcher start to start the Presto server. To stop the Presto server, run bin/launcher stop.
- Start the Presto client.
- Rename presto-cli-333-executable.jar to presto, place it in the bin directory, and run the chmod +x presto command to make it executable.
- Run ./presto --server XX.XX.XX.XX:5050 --catalog hive --schema default to start the client.
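When the start steps are scripted, it can be useful to wait until the launcher reports the server process as running before starting the client. The following is a rough sketch. retry is a hypothetical helper, the attempt count and delay are arbitrary, and bin/launcher status only reports that the process is alive, not that the server has finished initializing.

```shell
# Hypothetical helper: run a command up to N times, sleeping between tries.
# $1: attempts   $2: delay in seconds   remaining arguments: the command
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Run from the Presto installation directory.
if [ -x bin/launcher ]; then
  bin/launcher start
  if retry 10 3 bin/launcher status; then
    echo "Presto server process is running."
  else
    echo "Presto server did not start; check the logs in the data directory."
  fi
fi
```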
Using Presto to Query OBS
Creating a Hive table
hive> CREATE TABLE sample01(id int,name string,address string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'obs://obs-east-bkt001/sample01';
hive> insert into sample01 values(1,'xiaoming','cd');
hive> insert into sample01 values(2,'daming','sh');
Using Presto to query the Hive table
./presto --server XX.XX.XX.XX:5050 --catalog hive --schema default
presto:default> select * from sample01;
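The same query can also be run without the interactive prompt by using the --execute option of the Presto CLI, which is convenient for scripts. The following is a minimal sketch. presto_query, PRESTO_CLI, and PRESTO_SERVER are hypothetical names, and the server address placeholder must be replaced with the coordinator address.

```shell
# Hypothetical helper: run one SQL statement against the hive catalog and
# print the result as CSV.
# PRESTO_CLI and PRESTO_SERVER are assumed variables with placeholder defaults.
presto_query() {
  "${PRESTO_CLI:-./presto}" \
    --server "${PRESTO_SERVER:-XX.XX.XX.XX:5050}" \
    --catalog hive \
    --schema default \
    --output-format CSV \
    --execute "$1"
}

# Only runs when the client has been installed as described above.
if [ -x "${PRESTO_CLI:-./presto}" ]; then
  presto_query 'select * from sample01'
fi
```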