Updated on 2024-08-16 GMT+08:00

Python Sample Code

Function Description

In JDBCServer-HA mode, the IP address and port number of the active JDBCServer are stored in a znode on ZooKeeper. The sample code reads that znode and then connects to the JDBCServer through PyHive. As a result, after an active/standby switchover, the new active JDBCServer can be accessed without modifying the code.

This function applies only to common clusters (clusters with Kerberos authentication disabled).

Environment Preparation

  1. Install the support environment. (For details, see Spark Application Development Environment.)

    Run the following commands to install the compilation tools:

    yum install cyrus-sasl-devel -y

    yum install gcc-c++ -y

  2. Install the Python modules SASL, Thrift, Thrift-SASL, and PyHive.

    pip install sasl

    pip install thrift

    pip install thrift-sasl

    pip install PyHive

  3. Install the Python tool for connecting to ZooKeeper.

    pip install kazoo

  4. Obtain related parameters from the MRS cluster.
    • To obtain the IP address and port number of ZooKeeper:

      View the configuration item spark.deploy.zookeeper.url in the configuration file /opt/client/Spark/spark/conf/hive-site.xml.

    • To obtain the znode that stores the IP address and port number of the active JDBCServer node:

      View the configuration item spark.thriftserver.zookeeper.dir (/thriftserver by default) in the configuration file /opt/client/Spark/spark/conf/hive-site.xml. The IP address and port number of the active JDBCServer node are stored in the child znode active_thriftserver under that directory.
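The two configuration items above can also be read with a script instead of by opening the file. The following is a minimal sketch, assuming hive-site.xml uses the usual Hadoop-style layout of property elements with name and value children. The helper name read_hadoop_properties and the sample addresses are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text, names):
    """Return {name: value} for the requested <property> entries."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in names:
            found[name] = (prop.findtext("value") or "").strip()
    return found


# Hypothetical sample content; a real hive-site.xml holds many more properties.
SAMPLE_XML = """<configuration>
  <property>
    <name>spark.deploy.zookeeper.url</name>
    <value>192.0.2.11:2181,192.0.2.12:2181</value>
  </property>
  <property>
    <name>spark.thriftserver.zookeeper.dir</name>
    <value>/thriftserver</value>
  </property>
</configuration>"""

props = read_hadoop_properties(
    SAMPLE_XML,
    {"spark.deploy.zookeeper.url", "spark.thriftserver.zookeeper.dir"},
)
# Value to pass to KazooClient(hosts=...).
zookeeper_hosts = props["spark.deploy.zookeeper.url"]
# Full path of the znode that stores the active JDBCServer address.
znode_path = props["spark.thriftserver.zookeeper.dir"] + "/active_thriftserver"
print(zookeeper_hosts)
print(znode_path)
```

To use it on a client node, read the text of /opt/client/Spark/spark/conf/hive-site.xml with open() and pass it in place of SAMPLE_XML.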

Sample Code

from kazoo.client import KazooClient
from pyhive import hive

# Read the address of the active JDBCServer ("IP address:port") from ZooKeeper.
zk = KazooClient(hosts='ZookeeperHost')
zk.start()
data, _stat = zk.get("/thriftserver/active_thriftserver")
zk.stop()

# Split the znode data into the host and the port.
JDBCServerHost, JDBCServerPort = data.decode('utf-8').strip().split(":")

# Connect to the active JDBCServer through PyHive and run a query.
conn = hive.Connection(host=JDBCServerHost, port=int(JDBCServerPort), database='default')
cursor = conn.cursor()
cursor.execute("select * from test")
for row in cursor.fetchall():
    print(row)

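Replace ZookeeperHost with the ZooKeeper IP address and port number obtained in step 4. If spark.thriftserver.zookeeper.dir is not set to /thriftserver, also change the znode path passed to zk.get accordingly.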
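The sample code depends on the znode data having the form IP address:port. As a hedged illustration, that parsing step can be isolated and validated so that unexpected data fails with a clear error before PyHive is called. The function name parse_active_address and the sample address are hypothetical, not part of the MRS sample.

```python
def parse_active_address(znode_data):
    """Split znode data such as b"192.0.2.21:22550" into (host, port)."""
    text = znode_data.decode("utf-8").strip()
    # Split on the last colon so that the port is always the final field.
    host, sep, port = text.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("unexpected JDBCServer address: %r" % text)
    return host, int(port)


# Hypothetical znode content, used here only to demonstrate the helper.
host, port = parse_active_address(b"192.0.2.21:22550")
print(host, port)
```

The returned port is an integer, so it can be passed directly as the port argument of hive.Connection.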