Updated on 2023-12-20 GMT+08:00

Connecting Presto to OBS

Overview

Presto is available in two distributions: PrestoSQL (since renamed Trino) and PrestoDB.

Only PrestoSQL (Trino) can connect to OBS. The following example describes how to connect PrestoSQL 333 to OBS. PrestoSQL 332 and later must use JDK 11.

Presto in this section refers to PrestoSQL (Trino).

Prerequisites

Installing the Presto Server

Version: PrestoSQL 333

  1. Download the Presto client and server.

    Presto client

    Presto server

  2. Download the hadoop-huaweicloud plug-in.
  3. Decompress the Presto server package:

    tar -zxvf presto-server-333.tar.gz

    Place the following JAR packages in the Presto root directory /plugin/hive-hadoop2:

Configuring Presto

Create an etc directory inside the installation directory. Under etc, create the following configuration files:

  • Node configuration file: environment configurations of each node
  • JVM configuration file: command line options for Java virtual machines (JVMs)
  • Server configuration file: configurations of the Presto server
  • Catalog configuration file: configurations of different Presto connectors (data sources)
  • Log configuration file: Presto log configurations
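The layout listed above can be sketched with a few shell commands. This is only an illustration: PRESTO_HOME is an assumed variable that points at the installation directory, and it falls back to a scratch directory so the sketch can be tried without touching a real installation. The file names are the ones used in the sections that follow.

```shell
# Assumption: PRESTO_HOME points at the Presto installation directory.
# It falls back to a scratch directory so this sketch is safe to try out.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"

# etc holds the node, JVM, server, and log files;
# etc/catalog holds one properties file per connector.
mkdir -p "$PRESTO_HOME/etc/catalog"

for f in node.properties jvm.config config.properties log.properties catalog/hive.properties; do
    # Create each file only if it is missing, so existing settings are kept.
    [ -e "$PRESTO_HOME/etc/$f" ] || : > "$PRESTO_HOME/etc/$f"
done

# Show the resulting layout.
ls -R "$PRESTO_HOME/etc"
```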

Node Configuration File

etc/node.properties is the node property file that contains configurations of each node. A node is a Presto instance. This file is typically created when Presto is first installed. The minimum configuration is as follows:

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data

Explanations:

node.environment: environment name. All nodes in a Presto cluster must have the same environment name.

node.id: the unique identifier of a node. The node ID must remain unchanged across reboots and upgrades of Presto.

node.data-dir: data directory. It is used by Presto to store logs and other data.

Example:

node.environment=presto_cluster
node.id=bigdata00
# The data directory must be created manually.
node.data-dir=/home/modules/presto-server-0.215/data
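Because node.id must stay the same across restarts while the data directory has to be created by hand, one possible approach is to generate the file once and leave it alone afterwards. The following is a minimal sketch under stated assumptions: PRESTO_HOME and NODE_ENV are illustrative variables (not Presto settings), and the fallback ID built from the host name is only a stand-in for hosts without uuidgen.

```shell
# Assumptions: PRESTO_HOME and NODE_ENV are illustrative variables.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"
NODE_ENV="${NODE_ENV:-presto_cluster}"
DATA_DIR="$PRESTO_HOME/data"
NODE_FILE="$PRESTO_HOME/etc/node.properties"

# The data directory is not created automatically.
mkdir -p "$PRESTO_HOME/etc" "$DATA_DIR"

# Write the file only when it is missing or empty, so that an existing
# node.id survives restarts and upgrades.
if [ ! -s "$NODE_FILE" ]; then
    if command -v uuidgen >/dev/null 2>&1; then
        NODE_ID="$(uuidgen)"
    else
        # Fallback: host name plus timestamp, reduced to letters, digits, "_" and "-".
        NODE_ID="$(printf '%s-%s' "$(uname -n)" "$(date +%s)" | tr -c 'A-Za-z0-9_-' '_')"
    fi
    {
        echo "node.environment=$NODE_ENV"
        echo "node.id=$NODE_ID"
        echo "node.data-dir=$DATA_DIR"
    } > "$NODE_FILE"
fi

cat "$NODE_FILE"
```

Running the sketch a second time leaves the existing file, and therefore the node ID, untouched.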

JVM Configuration File

etc/jvm.config is the JVM configuration file that contains command line options for starting JVMs. Each command line option is on a separate line. This file is not interpreted by the shell, so options containing spaces or other special characters must not be quoted.

Reference configurations:

-server
-Xmx16G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+UseGCOverheadLimit
-XX:+HeapDumpOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000

The parameters above are taken from the official Presto website and should be adjusted to suit your environment, for example by setting -Xmx according to the memory available on each node.
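One way to keep the reference options in one place while adjusting the heap per machine is to write the file from a small script. This is a sketch only: PRESTO_HOME and HEAP_SIZE are hypothetical variables introduced for illustration, and the option list is copied from the reference configuration above.

```shell
# Assumptions: PRESTO_HOME and HEAP_SIZE are illustrative variables.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"
HEAP_SIZE="${HEAP_SIZE:-16G}"

mkdir -p "$PRESTO_HOME/etc"

# One JVM option per line; only the maximum heap size is parameterized.
cat > "$PRESTO_HOME/etc/jvm.config" <<EOF
-server
-Xmx$HEAP_SIZE
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+UseGCOverheadLimit
-XX:+HeapDumpOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
EOF

# Sanity check: report any line that does not look like a JVM option.
if grep -v '^-' "$PRESTO_HOME/etc/jvm.config"; then
    echo "jvm.config contains lines that do not start with '-'" >&2
fi
```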

Server Configuration File

etc/config.properties is the property file that contains the configurations of the Presto server. A Presto server can serve as both a coordinator and a worker. In large clusters, you are advised to dedicate one machine to coordination only.

  1. Configuration file of the coordinator node
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=5050
discovery-server.enabled=true
discovery.uri=http://192.168.XX.XX:5050
query.max-memory=20GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
  2. Configuration file of the worker node
coordinator=false
http-server.http.port=5050
discovery.uri=http://192.168.XX.XX:5050
query.max-memory=20GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB

Explanations:

coordinator: whether to run the instance as a coordinator, to receive queries from clients and manage query executions.

node-scheduler.include-coordinator: whether the coordinator also serves as a worker. For larger clusters, processing work on the coordinator can impact query performance.

http-server.http.port: HTTP port. Presto uses HTTP for all external and internal communications.

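query.max-memory: the maximum amount of distributed memory that a single query can use across the cluster

query.max-memory-per-node: the maximum amount of user memory that a single query can use on any one node

query.max-total-memory-per-node: the maximum amount of user and system memory that a single query can use on any one node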

discovery-server.enabled: Presto uses the Discovery service to find all nodes in the cluster. The Presto coordinator has a built-in Discovery service, and each Presto instance will be registered with the Discovery service on startup. This way, the deployment can be simplified and no additional service is required.

discovery.uri: URI of the Discovery service. Set it to the host and port of the coordinator (192.168.XX.XX:5050 in the examples above). The URI must not end with a slash; otherwise, error 404 is reported.

Additional properties:

jmx.rmiregistry.port: port of the JMX RMI registry. JMX clients connect to this port.

jmx.rmiserver.port: port of the JMX RMI server. Presto metrics can be monitored through JMX.
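The coordinator and worker files differ in only a few lines, so they can be produced by one helper. The sketch below is illustrative rather than an official tool: write_server_config and PRESTO_HOME are hypothetical names, and the property values are the ones from the examples above. The helper also drops a trailing slash from the Discovery URI, since a URI ending in a slash leads to error 404.

```shell
# Assumption: PRESTO_HOME is an illustrative variable.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"
mkdir -p "$PRESTO_HOME/etc"

# write_server_config ROLE DISCOVERY_URI
#   ROLE is "coordinator" or "worker" (hypothetical helper, not a Presto command).
write_server_config() {
    role="$1"
    uri="${2%/}"    # remove one trailing slash, if present
    out="$PRESTO_HOME/etc/config.properties"

    case "$role" in
        coordinator)
            printf '%s\n' \
                "coordinator=true" \
                "node-scheduler.include-coordinator=true" \
                "discovery-server.enabled=true" > "$out" ;;
        worker)
            printf '%s\n' "coordinator=false" > "$out" ;;
        *)
            echo "unknown role: $role" >&2
            return 1 ;;
    esac

    # Settings shared by both roles, taken from the examples above.
    printf '%s\n' \
        "http-server.http.port=5050" \
        "discovery.uri=$uri" \
        "query.max-memory=20GB" \
        "query.max-memory-per-node=1GB" \
        "query.max-total-memory-per-node=2GB" >> "$out"
}

write_server_config coordinator "http://192.168.XX.XX:5050/"
cat "$PRESTO_HOME/etc/config.properties"
```

On a worker node, the same helper would be called as write_server_config worker "http://192.168.XX.XX:5050".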

Catalog Configuration File (Key)

Configure a Hive connector as follows:

  1. Create a catalog directory under etc.
  2. Create the configuration file hive.properties for the Hive connector.
# hive.properties
#Connector name
connector.name=hive-hadoop2
#Configure the Hive metastore connection.
hive.metastore.uri=thrift://192.168.XX.XX:9083
#Specify the Hadoop configuration file.
hive.config.resources=/home/modules/hadoop-2.8.3/etc/hadoop/core-site.xml,/home/modules/hadoop-2.8.3/etc/hadoop/hdfs-site.xml,/home/modules/hadoop-2.8.3/etc/hadoop/mapred-site.xml
# Grant the permission to drop tables.
hive.allow-drop-table=true
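A mistyped path in hive.config.resources is easy to overlook, so it can help to build the list from one directory and warn about missing files. The following is a sketch under assumptions: PRESTO_HOME, HADOOP_CONF_DIR, and METASTORE_URI are illustrative variables whose defaults repeat the values from the example above.

```shell
# Assumptions: these variables are illustrative; defaults repeat the example values.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/home/modules/hadoop-2.8.3/etc/hadoop}"
METASTORE_URI="${METASTORE_URI:-thrift://192.168.XX.XX:9083}"

mkdir -p "$PRESTO_HOME/etc/catalog"

# Build the comma-separated list of Hadoop configuration files and
# warn about any file that does not exist on this host.
resources=""
for f in core-site.xml hdfs-site.xml mapred-site.xml; do
    path="$HADOOP_CONF_DIR/$f"
    [ -f "$path" ] || echo "warning: $path not found" >&2
    resources="${resources:+$resources,}$path"
done

cat > "$PRESTO_HOME/etc/catalog/hive.properties" <<EOF
connector.name=hive-hadoop2
hive.metastore.uri=$METASTORE_URI
hive.config.resources=$resources
hive.allow-drop-table=true
EOF

cat "$PRESTO_HOME/etc/catalog/hive.properties"
```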

Log Configuration File

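1. Create a log.properties file in the etc directory.

2. Add the following line to the file: io.prestosql=INFO. PrestoSQL uses the io.prestosql logger name. The com.facebook.presto logger name applies to PrestoDB.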

There are four log levels: DEBUG, INFO, WARN, and ERROR.

Starting Presto

The procedure is as follows:

  1. Run hive --service metastore & to start the Hive metastore.
  2. Run bin/launcher start to start the Presto server. To stop the Presto server, run bin/launcher stop.
  3. Start the Presto client.

    1. Rename presto-cli-333-executable.jar to presto, place it in the bin directory, and run the chmod +x presto command to make it executable.
    2. Run ./presto --server XX.XX.XX.XX:5050 --catalog hive --schema default to start the client.
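The server needs some time to initialize after bin/launcher start returns, so a client started immediately may fail to connect. A small polling helper can bridge that gap. This is a sketch, not part of Presto: wait_for_presto is a hypothetical function, and it assumes that curl is installed and that the server's /v1/info endpoint reports "starting":false once the server is ready.

```shell
# wait_for_presto HOST:PORT [ATTEMPTS] [DELAY_SECONDS]
#   Hypothetical helper: polls the server until it reports that startup
#   has finished. Returns 0 on success and 1 if all attempts fail.
wait_for_presto() {
    server="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Assumption: /v1/info returns JSON containing "starting":false when ready.
        if curl -fsS --max-time 5 "http://$server/v1/info" 2>/dev/null \
            | grep -q '"starting" *: *false'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (run from the Presto installation directory):
#   bin/launcher start && wait_for_presto XX.XX.XX.XX:5050 && \
#       bin/presto --server XX.XX.XX.XX:5050 --catalog hive --schema default
```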

Using Presto to Query OBS

Creating a Hive table

hive> CREATE TABLE sample01(id int, name string, address string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'obs://obs-east-bkt001/sample01';

insert into sample01 values(1,'xiaoming','cd');
insert into sample01 values(2,'daming','sh');

Using Presto to query the Hive table

./presto --server XX.XX.XX.XX:5050 --catalog hive --schema default

presto:default> select * from sample01;
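The same query can also be run without an interactive session, which is convenient in scripts. The sketch below wraps the client's --execute and --output-format options in a helper. presto_query, PRESTO_CLI, and PRESTO_SERVER are hypothetical names introduced for illustration. The server address is the placeholder used above and must be replaced with the coordinator address.

```shell
# Assumptions: PRESTO_CLI is the renamed presto-cli executable and
# PRESTO_SERVER is the coordinator address (placeholder shown here).
PRESTO_CLI="${PRESTO_CLI:-./presto}"
PRESTO_SERVER="${PRESTO_SERVER:-XX.XX.XX.XX:5050}"

# presto_query SQL
#   Hypothetical helper: runs one statement against the hive catalog,
#   default schema, and prints the result as CSV.
presto_query() {
    "$PRESTO_CLI" --server "$PRESTO_SERVER" --catalog hive --schema default \
        --output-format CSV --execute "$1"
}

# Example:
#   presto_query "select * from sample01;"
```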