Updated on 2023-08-31 GMT+08:00

Accessing Hive Using Python 3

Function

Use Python 3 to connect to Hive and run data analysis tasks.

Example Code

Before connecting to Hive in security mode, authenticate on the cluster client: run the kinit command as a Kerberos user that has the required permissions. After authentication succeeds, you can run the sample analysis tasks provided in the hive-examples/python3-examples/pyCLI_sec.py file.
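Because the connection fails without a valid Kerberos ticket, it can help to check for one before running the sample. The following is a minimal sketch, not part of the sample file: the helper name has_valid_ticket is hypothetical, and it assumes an MIT Kerberos klist on the PATH, whose -s option reports ticket status through the exit code.

```python
import subprocess


def has_valid_ticket(klist_cmd="klist"):
    """Return True if `klist -s` reports a readable, unexpired ticket cache."""
    try:
        # -s runs klist silently; with MIT Kerberos the exit status is 0
        # only when the credential cache can be read and has not expired.
        result = subprocess.run([klist_cmd, "-s"], check=False)
    except OSError:
        # klist could not be started (for example, it is not on PATH).
        return False
    return result.returncode == 0
```

A script could call has_valid_ticket() first and ask the user to run kinit if it returns False.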

  1. Import Hive classes.
    from pyhive import hive
  2. Create a connection to HiveServer.
    connection = hive.Connection(host='hiveserverIp', port=hiveserverPort, username='hive', database='default', auth='KERBEROS', kerberos_service_name="hive", krbhost='hadoop.hadoop.com')

    Modify the following parameters as required for your environment:

    • hiveserverIp: Replace it with the IP address of the HiveServer node you want to connect to. To view the IP address, log in to FusionInsight Manager, choose Cluster > Service > Hive, and click the Instances tab.
    • hiveserverPort: Replace it with the port of the Hive service. To view the port number, log in to FusionInsight Manager, choose Cluster > Service > Hive, click the Configuration tab, and search for hive.server2.thrift.port. The default value is 10000.
    • username: Set this parameter to the username created as described in User Information for Cluster Authentication.
    • kerberos_service_name: Set this parameter to the name of the service you want to connect to. For example, to connect to Hive, set it to kerberos_service_name="hive".
    • krbhost: Set this parameter in the format hadoop.<domain name>. To obtain the domain name, log in to FusionInsight Manager and choose System > Permission > Domain and Mutual Trust. On the page that is displayed, the value of Local Domain is the domain name.
  3. Execute a HiveQL statement. The sample code only lists all tables. You can modify the HiveQL statement as needed.
    cursor = connection.cursor()
    cursor.execute('show tables')
  4. Obtain and output the result.
    for result in cursor.fetchall():
        print(result)
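
The four steps above can be combined into one small script. The following is a minimal sketch, not the content of pyCLI_sec.py: the helper names build_connection_kwargs and show_tables are hypothetical, the keyword arguments (including krbhost) are taken from the connection call in step 2, and closing the cursor and connection when the query finishes is an added precaution that the sample steps do not show.

```python
from contextlib import closing


def build_connection_kwargs(host, port, domain, username="hive"):
    """Assemble the keyword arguments used in step 2 of the procedure."""
    return {
        "host": host,
        "port": int(port),
        "username": username,
        "database": "default",
        "auth": "KERBEROS",
        "kerberos_service_name": "hive",
        # krbhost follows the hadoop.<domain name> format described above.
        "krbhost": "hadoop." + domain,
    }


def show_tables(**kwargs):
    """Run 'show tables' and return the rows, closing resources afterwards."""
    # Imported here so that build_connection_kwargs works without PyHive.
    from pyhive import hive

    with closing(hive.Connection(**kwargs)) as connection:
        with closing(connection.cursor()) as cursor:
            cursor.execute("show tables")
            return cursor.fetchall()
```

For example, show_tables(**build_connection_kwargs('hiveserverIp', 10000, 'hadoop.com')) would return the same rows that step 4 prints, with hiveserverIp and the domain replaced by the values for your cluster.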