
Using Hive from Scratch

Updated at: Sep 08, 2020 GMT+08:00

Hive is a data warehouse framework built on Hadoop. It maps structured data files to database tables and provides SQL-like functions to analyze and process data. It also lets you implement simple MapReduce statistics quickly using SQL-like statements, without developing a dedicated MapReduce application, which makes it well suited to the statistical analysis of data warehouses.
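For example, once a table like the user_info table created later in this guide exists, a single SQL-like statement computes per-city user counts, a job that would otherwise require a purpose-built MapReduce program. A minimal sketch (the table and column names are the ones defined below):

  select addr, count(*) as user_count from user_info group by addr;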

Background

After an MRS cluster is successfully created, the original client is stored in the /opt/client directory on all nodes in the cluster by default. Before using the client, download and update the client configuration file, and ensure that the active management node of MRS Manager is available.

Suppose a user develops an application to manage users who use service A in an enterprise. The procedure of operating service A on the Hive client is as follows:

Operations on internal tables:

  • Create the user_info table.
  • Add users' educational backgrounds and titles to the table.
  • Query user names and addresses by user ID.
  • Delete the user information table after service A ends.
Table 1 User information

ID            Name   Gender   Age   Address
12005000201   A      Male     19    City A
12005000202   B      Female   23    City B
12005000203   C      Male     26    City C
12005000204   D      Male     18    City D
12005000205   E      Female   21    City E
12005000206   F      Male     32    City F
12005000207   G      Female   29    City G
12005000208   H      Female   30    City H
12005000209   I      Male     26    City I
12005000210   J      Female   25    City J

Procedure

  1. Download the client configuration file.

    1. Log in to the MRS management console. In the navigation tree on the left, choose Clusters > Active Clusters and click the cluster name.
    2. Click Components.

      For MRS 2.0.1 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services.

    3. Click Download Client.

      Set Client Type to Only configuration files and Save Path to Server, then click OK to generate the client configuration file. The generated file is saved in the /tmp/MRS-client directory on the active management node by default.

      Figure 1 Downloading only the client configuration files

  2. Log in to the active management node of MRS Manager.

    1. In the MRS management console, choose Clusters > Active Clusters and click the cluster name. Select Nodes to view the Node parameter. The node that contains master1 in its name is the Master1 node. The node that contains master2 in its name is the Master2 node.

      The active and standby management nodes of MRS Manager are installed on Master nodes by default. Because Master1 and Master2 can switch over between the active and standby roles, Master1 is not always the active management node of MRS Manager. Run a command on Master1 to check whether it is the active management node of MRS Manager. For details about the command, see 2.d.

    2. Log in to the Master1 node as user root using your password. For details, see Logging In to an ECS in the User Guide.
    3. Run the following commands to switch to user omm:

      sudo su - root

      su - omm

    4. Run the following command to check the active management node of MRS Manager:

      sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh

      In the command output, the node whose HAActive is active is the active management node, and the node whose HAActive is standby is the standby management node. In the following example, mgtomsdat-sh-3-01-1 is the active management node, and mgtomsdat-sh-3-01-2 is the standby management node. (A one-line filter for this output is sketched at the end of this step.)

      Ha mode
      double
      NodeName              HostName                      HAVersion          StartTime                HAActive             HAAllResOK           HARunPhase 
      192-168-0-30          mgtomsdat-sh-3-01-1           V100R001C01        2014-11-18 23:43:02      active               normal               Actived    
      192-168-0-24          mgtomsdat-sh-3-01-2           V100R001C01        2014-11-21 07:14:02      standby              normal               Deactived
    5. Log in to the active management node as user root, for example, node 192-168-0-30.
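    To pick the active node out of the 2.d output programmatically, a one-liner like the one sketched below can help; it assumes the column layout shown in the sample output above, where StartTime spans two whitespace-separated fields, making HAActive the sixth field:

      sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh | awk '$6 == "active" {print $2}'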

  3. Run the following command to go to the client installation directory:

    After an MRS cluster is successfully created, the client is installed in the /opt/client directory by default.

    cd /opt/client

  4. Update the client configuration on the active management node.

    Switch to user omm:

    sudo su - omm

    Run the refresh script, providing the full path of the client configuration file package:

    sh refreshConfig.sh /opt/client <full path of the client configuration file package>

    For example, run the following command:

    sh refreshConfig.sh /opt/client /tmp/MRS-client/MRS_Services_Client.tar

    If the following information is displayed, the configuration is updated successfully.

     ReFresh components client config is complete.
     Succeed to refresh components client config.

    For clusters of MRS 1.8.5 or later, you can also perform steps 1 through 4 by following method 2 in Updating a Client.

  5. Use the client on a Master node.

    1. On the active management node where the client is updated, for example, node 192-168-0-30, run the following command to go to the client directory:

      cd /opt/client

    2. Run the following command to configure environment variables:

      source bigdata_env

    3. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. The user must have permission to create Hive tables. For details about configuring a role with this permission, see Creating a Role; for binding the role to a user, see Creating a User. If Kerberos authentication is disabled for the current cluster, skip this step.

      kinit <MRS cluster user>

      For example, kinit hiveuser.

    4. Run the Hive client command directly:

      beeline
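      Beeline can also run a single statement non-interactively, which makes a quick connectivity check easy; this assumes the environment configured by bigdata_env supplies the connection settings:

      beeline -e "show databases;"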

  6. Run the Hive client command to implement service A.

    Operations on internal tables:

    1. Create the user_info user information table according to Table 1 and add data to it.
      create table user_info(id string,name string,gender string,age int,addr string);
      insert into table user_info(id,name,gender,age,addr) values("12005000201","A","Male",19,"City A");
      ... (The statements for the remaining rows follow the same pattern.)
    2. Add education and title columns to the user_info table.

      For example, to record educational backgrounds and titles for users such as 12005000201, add the two columns by running the following command (a sketch for populating them appears after this list):

      alter table user_info add columns(education string,technical string);
    3. Query user names and addresses by user ID.

      For example, to query the name and address of user 12005000201, run the following command:

      select name,addr from user_info where id='12005000201';
    4. Run the following command to delete the user information table:
      drop table user_info;
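    A follow-up to step 2, referenced there: alter table only adds the columns, so rows inserted before the change return NULL for education and technical until they are repopulated. The sketch below uses a hypothetical extra user; run it after step 2 and before the table is dropped in step 4.

      -- confirm that the new columns exist
      desc user_info;
      -- existing rows show NULL for the new columns; insert a fully populated row
      -- (the user ID, education, and title values here are hypothetical)
      insert into table user_info(id,name,gender,age,addr,education,technical) values("12005000211","K","Male",28,"City K","Bachelor","Engineer");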

    Operations on external partition tables:

    Create an external partition table and import data.

    1. Create a path for storing external table data.
      hdfs dfs -mkdir /hive/user_info
    2. Create a table.
      create external table user_info(id string,name string,gender string,age int,addr string) partitioned by(year string) row format delimited fields terminated by ' ' lines terminated by '\n' stored as textfile location '/hive/user_info';

      fields terminated by specifies the field delimiter, a space in this example.

      lines terminated by specifies the line separator, \n in this example.

      /hive/user_info is the HDFS path where the table's data files are stored.

    3. Import data.
      1. Execute the insert statement to insert data.
        insert into user_info partition(year="2018") values ("12005000201", "A", "Male", "19", "City A");
      2. Run the load data command to import file data.
        1. Create a file based on the data in Table 1, for example, txt.log. Separate fields with single spaces and end each record with a line feed. (A sample file is shown after this procedure.)
        2. Upload the file to HDFS.
          hdfs dfs -put txt.log /tmp
        3. Load data to the table.
          load data inpath '/tmp/txt.log' into table user_info partition (year='2011');
    4. Query the imported data.
      select * from user_info;
    5. Delete the user information table.
      drop table user_info;
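    The sample file referenced in the load data step might look like the following (one record per line, fields separated by single spaces). Note that because the field delimiter is a space, no field value may itself contain a space, so the addresses here are written without one:

      12005000201 A Male 19 CityA
      12005000202 B Female 23 CityB
      12005000203 C Male 26 CityC

    After both the insert and load data steps, the table holds two partitions, and queries can filter on the partition column to prune data; for example:

      show partitions user_info;
      select name,addr from user_info where year='2018';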

  7. Delete the cluster.

    For details, see Terminating a Cluster in the User Guide.
