Updated on 2023-08-31 GMT+08:00

Accessing Hive Using Python 3

Function

Use Python 3 to connect to Hive to execute data analysis tasks.

Example Code

You can execute sample analysis tasks provided in the hive-examples/python3-examples/pyCLI_sec.py file.

  1. Import Hive classes.
    from pyhive import hive
  2. Create a connection to HiveServer.
    connection = hive.Connection(host='hiveserverIp', port=hiveserverPort, username='hive', database='default', auth='KERBEROS', kerberos_service_name="hive", krbhost='hadoop.hadoop.com')

    Modify the following parameters based on the site requirements:

    • hiveserverIp: Replace it with the IP address of the HiveServer node you want to connect to. To view the IP address, log in to FusionInsight Manager, choose Cluster > Service > Hive, and click the Instances tab.
    • hiveserverPort: Replace it with the port of the Hive service. To view the port number, log in to FusionInsight Manager, choose Cluster > Service > Hive, click the Configuration tab, and search for hive.server2.thrift.port. The default value is 10000.
  3. Run a HiveQL statement. The sample code only lists all tables. Modify the HiveQL statement as needed.
    cursor = connection.cursor()
    cursor.execute('show tables')
  4. Obtain and output the result.
    for result in cursor.fetchall():
        print(result)
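
The four steps above can be combined into one script. The following is a minimal sketch, not the content of pyCLI_sec.py. The environment variable names HIVESERVER_IP and HIVESERVER_PORT and the helper functions build_connection_kwargs, run_query, and main are hypothetical and introduced only for illustration. The connection arguments, including krbhost, are copied from the sample in step 2 and are assumed to be accepted by the PyHive build used with the sample project.

```python
import os
from contextlib import closing


def build_connection_kwargs(env=os.environ):
    """Collect the hive.Connection arguments used in step 2.

    HIVESERVER_IP and HIVESERVER_PORT are hypothetical variable names;
    the remaining values are copied from the sample code.
    """
    return {
        "host": env["HIVESERVER_IP"],
        # Falls back to the default hive.server2.thrift.port value.
        "port": int(env.get("HIVESERVER_PORT", "10000")),
        "username": "hive",
        "database": "default",
        "auth": "KERBEROS",
        "kerberos_service_name": "hive",
        # Assumption: krbhost is accepted by the PyHive build in use.
        "krbhost": "hadoop.hadoop.com",
    }


def run_query(connection, statement):
    """Execute one statement and return all rows, closing the cursor afterwards."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(statement)
        return cursor.fetchall()


def main():
    # Imported here so the helpers above can be used without PyHive installed.
    from pyhive import hive

    with closing(hive.Connection(**build_connection_kwargs())) as connection:
        for row in run_query(connection, "show tables"):
            print(row)
```

Call main() after setting the two environment variables. Wrapping the connection and cursor in closing() releases them even if the statement fails, which the step-by-step sample does not do.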