Updated on 2023-08-31 GMT+08:00

Accessing Hive Using Python

Function

Use Python to connect to Hive to execute data analysis tasks.

Example Code

The sample program python-examples/pyCLI_sec.py submits a data analysis task as follows:
  1. Import the HAConnection class.
     from pyhs2.haconnection import HAConnection     
  2. Declare the HiveServer IP address list. In this example, hosts lists the HiveServer nodes, and each xxx.xxx.xxx.xxx is a placeholder for the service IP address of one node.
    hosts = ["xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx"] 

    If a HiveServer instance is migrated, the addresses in the sample program no longer work. In this case, update the HiveServer IP addresses in hosts.

  3. Create the connection and execute HiveQL statements. The sample code only lists all tables and prints the queried column names and results to the console. Modify the HiveQL statements as needed.
       # conf (the connection configuration) must be defined before this block; it is not shown here.
       try:
           with HAConnection(hosts = hosts,
                              port = 21066,
                              authMechanism = "KERBEROS",
                              configuration = conf) as haConn:
               with haConn.getConnection() as conn:
                   with conn.cursor() as cur:
                       # show databases
                       print cur.getDatabases()

                       # execute query
                       cur.execute("show tables")

                       # return column info from query
                       print cur.getSchema()

                       # fetch table results
                       for i in cur.fetch():
                           print i

       except Exception as e:
           print e
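
Step 2 notes that the hard-coded hosts list must be edited whenever a HiveServer instance is migrated. One possible way to avoid editing the program each time is to read the address list from the environment. The following is a minimal sketch, not part of pyCLI_sec.py: the function name load_hiveserver_hosts and the environment variable HIVESERVER_HOSTS are hypothetical names chosen for illustration.

```python
import os


def load_hiveserver_hosts(env_var="HIVESERVER_HOSTS", default=None):
    """Return HiveServer IP addresses from a comma-separated environment variable.

    Both the function name and the default variable name are assumptions
    made for this sketch; they do not come from the sample program.
    """
    raw = os.environ.get(env_var, "")
    # Split on commas and drop empty entries and surrounding whitespace.
    hosts = [item.strip() for item in raw.split(",") if item.strip()]
    if hosts:
        return hosts
    # Fall back to a caller-supplied list when the variable is unset or empty.
    if default:
        return list(default)
    raise ValueError("no HiveServer addresses configured in %s" % env_var)
```

With this helper, the declaration in step 2 could become hosts = load_hiveserver_hosts(), so that after a migration only the environment variable needs to change.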