Updated on 2023-08-31 GMT+08:00

Accessing Hive Using Python

Function

Use Python to connect to Hive to execute data analysis tasks.

Example Code

The sample program python-examples/pyCLI_sec.py submits a data analysis task using Python. The cluster that the sample program connects to must have security (Kerberos) authentication enabled. Before running the sample program, run the kinit command to authenticate a Kerberos user that has the required permissions.
  1. Import the HAConnection class.
     from pyhs2.haconnection import HAConnection     
  2. Declare the HiveServer IP address list. In this example, hosts lists the HiveServer nodes, and each xxx.xxx.xxx.xxx is a placeholder for the service IP address of one node.
    hosts = ["xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx"] 

    If a HiveServer instance is migrated, the IP addresses in the original sample program become invalid. In this case, update the HiveServer IP addresses in hosts before running the program again.

  3. Configure the Kerberos host name and service name. In this example, the host name is hadoop and the service name is hive, so the value of krb_host is hadoop.<System domain name> and the value of krb_service is hive. To obtain the system domain name, log in to FusionInsight Manager and choose System > Permission > Domain and Mutual Trust. On the page that is displayed, the value of Local Domain is the domain name.
    conf = {"krb_host":"hadoop.<System domain name>", "krb_service":"hive"}
  4. Create the connection and execute HiveQL statements. The sample code only lists all tables and prints the column information and the query results to the console. Modify the HiveQL statement as needed.
       try: 
           with HAConnection(hosts = hosts, 
                              port = 21066, 
                              authMechanism = "KERBEROS", 
                              configuration = conf) as haConn: 
               with haConn.getConnection() as conn: 
                   with conn.cursor() as cur: 
                       # show databases 
                       print cur.getdatabases() 
                        
                       # execute query 
                       cur.execute("show tables") 
                        
                       # return column info from query 
                       print cur.getschema() 
                        
                       # fetch table results 
                       for i in cur.fetch(): 
                           print i 
                            
       except Exception, e: 
           print e
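
The preparation in steps 2 and 3, together with the kinit prerequisite, can be sketched as a few small helpers. This is a minimal illustration and not part of the sample program: the function names parse_hosts, build_krb_conf, and has_kerberos_ticket are hypothetical, and the ticket check assumes that an MIT Kerberos klist command is on the PATH (klist -s exits with status 0 when a valid credential cache exists).

```python
import subprocess


def parse_hosts(raw):
    """Split a comma-separated string of HiveServer IP addresses into a list.

    Hypothetical helper: keeping the addresses in one string (for example in
    a configuration file) makes them easier to update after a migration.
    """
    return [h.strip() for h in raw.split(",") if h.strip()]


def build_krb_conf(domain_name):
    """Build the configuration dict from step 3 for a system domain name.

    The host name "hadoop" and the service name "hive" come from the
    example above; domain_name is the Local Domain value.
    """
    return {"krb_host": "hadoop." + domain_name, "krb_service": "hive"}


def has_kerberos_ticket(klist_cmd="klist"):
    """Return True if `klist -s` reports a valid credential cache.

    Assumption: an MIT Kerberos style klist is installed. If the command
    cannot be found, the function returns False instead of raising.
    """
    try:
        return subprocess.call([klist_cmd, "-s"]) == 0
    except OSError:
        # klist is not installed or is not on the PATH.
        return False
```

The results of parse_hosts and build_krb_conf could then be passed to HAConnection as the hosts and configuration arguments, and has_kerberos_ticket could be called first to print a reminder to run kinit when it returns False.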