Updated on 2024-08-16 GMT+08:00

Storm-HDFS Development Guideline

Scenario

This topic applies only to the interaction between Storm and HDFS. The JAR file versions mentioned in this section are examples; use the versions that match your environment.

In security mode, two login modes are supported: ticket login and keytab file login. The development procedure is the same for both modes except where noted. Ticket login is an open-source capability that requires the ticket to be uploaded manually, which can cause reliability and usability problems. Therefore, keytab file login is recommended.

Procedure for Developing an Application

  1. Verify that the Storm and HDFS components have been installed and are running correctly.
  2. Import storm-examples to the Eclipse development environment. For details, see Configuring and Importing a Project.
  3. If security mode is enabled for the cluster, perform the following operations based on the login mode:

    • Keytab mode: Obtain a human-machine user from the administrator for authentication, and obtain the keytab file of that user.
    • Ticket mode: Obtain a human-machine user from the administrator for subsequent secure login, enable the renewable and forwardable functions of the Kerberos service, set the ticket update period, and restart Kerberos and related components.
    • The obtained user must belong to the storm group.
    • The parameters for enabling the renewable and forwardable functions and setting the ticket update interval are on the System tab of the Kerberos service configuration page. The ticket update interval can be set to kdc_renew_lifetime or kdc_max_renewable_life based on the actual situation.

  4. Download and install the HDFS client. For details, see section "Preparing a Linux Client Operating Environment."
  5. Obtain the HDFS-related configuration files by performing the following operations:

    Go to the /opt/client/HDFS/hadoop/etc/hadoop directory on the installed HDFS client, and obtain the configuration files core-site.xml and hdfs-site.xml.

    In keytab mode, obtain the keytab file as described in step 3. In ticket mode, no extra configuration file is required.

    Copy the obtained files to the src/main/resources directory of the sample project.

    The keytab file is named user.keytab by default. You can rename it as required; however, if you do, you must pass the new file name as a parameter when submitting the topology.
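Before building, it can help to confirm that the files from the steps above actually landed in the sample project. The following is a minimal sketch, assuming it is run from the sample project's root directory; the class ResourceFilesCheck and its method are hypothetical and not part of storm-examples. It checks for core-site.xml and hdfs-site.xml and, in keytab mode only, for the keytab file (user.keytab unless you renamed it).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of storm-examples): lists the client files
// that are missing from the sample project's resources directory.
final class ResourceFilesCheck {
    // keytabName is "user.keytab" unless the file was renamed;
    // keytabMode = false skips the keytab check (ticket mode needs no extra file).
    static List<String> missingFiles(Path resourcesDir, String keytabName, boolean keytabMode) {
        List<String> expected = new ArrayList<>(Arrays.asList("core-site.xml", "hdfs-site.xml"));
        if (keytabMode) {
            expected.add(keytabName);
        }
        List<String> missing = new ArrayList<>();
        for (String name : expected) {
            if (!Files.isRegularFile(resourcesDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumes the working directory is the root of the sample project.
        List<String> missing = missingFiles(Paths.get("src", "main", "resources"), "user.keytab", true);
        System.out.println(missing.isEmpty() ? "All required files are present" : "Missing: " + missing);
    }
}
```

If you renamed the keytab, pass the new name here as well, because the same name must later be given to the submission command.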

Eclipse Sample Code

Create a topology.

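  // Storm 0.10.x package names (backtype.storm.*); Storm 1.0 and later use org.apache.storm.*.
  import backtype.storm.Config;
  import backtype.storm.StormSubmitter;
  import backtype.storm.topology.TopologyBuilder;
  import backtype.storm.tuple.Fields;
  import org.apache.storm.hdfs.bolt.HdfsBolt;
  import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
  import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
  import org.apache.storm.hdfs.bolt.format.FileNameFormat;
  import org.apache.storm.hdfs.bolt.format.RecordFormat;
  import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
  import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
  import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
  import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
  import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

  public static void main(String[] args) throws Exception
    {
      TopologyBuilder builder = new TopologyBuilder();

      // Record format: separate the fields in a tuple with | instead of the default comma (,).
      // Mandatory HdfsBolt parameter
      RecordFormat format = new DelimitedRecordFormat()
              .withFieldDelimiter("|");

      // Sync policy: sync data to the file system every 1000 tuples.
      // Mandatory HdfsBolt parameter
      SyncPolicy syncPolicy = new CountSyncPolicy(1000);

      // File rotation policy: when a file reaches 5 MB, close it and start writing to a new file.
      // Mandatory HdfsBolt parameter
      FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);

      // HDFS directory to which the files are written
      // Mandatory HdfsBolt parameter
      FileNameFormat fileNameFormat = new DefaultFileNameFormat()
              .withPath("/user/foo/");

      // Create the HdfsBolt.
      HdfsBolt bolt = new HdfsBolt()
              .withFileNameFormat(fileNameFormat)
              .withRecordFormat(format)
              .withRotationPolicy(rotationPolicy)
              .withSyncPolicy(syncPolicy);

      // The spout generates random sentences; SplitSentence splits them into words.
      builder.setSpout("spout", new RandomSentenceSpout(), 1);
      builder.setBolt("split", new SplitSentence(), 1).shuffleGrouping("spout");
      // The HdfsBolt (component ID "count") writes the words to HDFS.
      builder.setBolt("count", bolt, 1).fieldsGrouping("split", new Fields("word"));

      // Create the topology configuration before it is used below.
      Config conf = new Config();

      // Add the plugin required for Kerberos authentication to the plugin list and write the list
      // to the Config.TOPOLOGY_AUTO_CREDENTIALS item. This operation is mandatory in security mode.
      // setSecurityConf and AuthenticationType are helpers from the storm-examples project, not Storm APIs.
      setSecurityConf(conf, AuthenticationType.KEYTAB);

      if (args.length >= 2)
      {
          // If the default keytab file name (user.keytab) has been changed, pass the new name as args[1].
          conf.put(Config.STORM_CLIENT_KEYTAB_FILE, args[1]);
      }

      // Submit the topology. args[0] is the topology name.
      StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    }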
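To make the three mandatory HdfsBolt policies concrete, the following plain-Java model (no Storm or HDFS dependency) replays what happens to each tuple: the record format turns the field values into one line, the count-based sync policy fires every 1000 tuples, and the size-based rotation policy starts a new file once 5 MB (5 × 1024 × 1024 bytes) has been written to the current file. It is a simplified sketch of open-source storm-hdfs behavior as we understand it; the class HdfsWriteModel is hypothetical, so check the storm-hdfs version in your cluster for exact semantics.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical model of the HdfsBolt settings above; it only counts events and writes nothing.
final class HdfsWriteModel {
    static final long MAX_BYTES = (long) (5.0f * 1024 * 1024); // FileSizeRotationPolicy(5.0f, Units.MB)
    static final int SYNC_EVERY = 1000;                        // CountSyncPolicy(1000)

    long bytesInCurrentFile = 0;
    int tuplesSinceSync = 0;
    int syncs = 0;
    int rotations = 0;

    // DelimitedRecordFormat with withFieldDelimiter("|"): join the field values
    // with "|" and end the record with the default "\n" record delimiter.
    static byte[] format(List<String> fieldValues) {
        return (String.join("|", fieldValues) + "\n").getBytes(StandardCharsets.UTF_8);
    }

    // Per tuple: write the record, then check the sync policy, then the rotation policy.
    void write(List<String> fieldValues) {
        bytesInCurrentFile += format(fieldValues).length;
        tuplesSinceSync++;
        if (tuplesSinceSync >= SYNC_EVERY) {   // sync policy fired: data is synced to HDFS
            syncs++;
            tuplesSinceSync = 0;
        }
        if (bytesInCurrentFile >= MAX_BYTES) { // rotation policy fired: close file, open a new one
            rotations++;
            bytesInCurrentFile = 0;
        }
    }
}
```

Assuming SplitSentence declares only the word field, each record in this topology is just a word followed by a newline; the | delimiter appears only when a tuple carries several fields (for example, format(Arrays.asList("a", "b")) yields "a|b\n").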

Running the Application and Viewing Results

  1. In the root directory of the Storm sample code, run the mvn package command. After the command is executed successfully, the storm-examples-1.0.jar file is generated in the target directory.
  2. Submit the topology using the storm jar command.

    In keytab mode, if the keytab file has been renamed, for example, to huawei.keytab, pass the new file name as an additional argument in the command. The following example submits a topology named hdfs-test:

    storm jar /opt/jartarget/storm-examples-1.0.jar com.huawei.storm.example.hdfs.SimpleHDFSTopology hdfs-test huawei.keytab

    In security mode, ensure that Kerberos login has been performed before the JAR file is submitted. In keytab mode, the logged-in user must be the same user to whom the uploaded keytab file belongs.

  3. After the topology is submitted successfully, log in to the HDFS cluster to check whether files are generated in the /user/foo directory.
  4. If you use ticket mode, upload the ticket periodically as follows. How often the ticket must be uploaded depends on the ticket renewal deadline.

    1. Add the following content to a new line at the end of the Storm/storm-0.10.0/conf/storm.yaml file in the Storm client installation directory.

      topology.auto-credentials:
      - backtype.storm.security.auth.kerberos.AutoTGT

    2. Run the ./storm upload-credentials hdfs-test command.
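If submission is scripted, the argument order from step 2 is the part that is easy to get wrong: everything after the main class name is handed to SimpleHDFSTopology.main, so the topology name becomes args[0] and a renamed keytab becomes args[1]. The following sketch assembles that command as a list suitable for java.lang.ProcessBuilder; the class SubmitCommand is hypothetical and assumes the storm client command is on the PATH.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the "storm jar" submission command.
final class SubmitCommand {
    static List<String> build(String jarPath, String mainClass,
                              String topologyName, String keytabName) {
        List<String> cmd = new ArrayList<>();
        cmd.add("storm");
        cmd.add("jar");
        cmd.add(jarPath);
        cmd.add(mainClass);
        cmd.add(topologyName);                 // becomes args[0]: topology name
        if (keytabName != null && !keytabName.equals("user.keytab")) {
            cmd.add(keytabName);               // becomes args[1]: only needed for a renamed keytab
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("/opt/jartarget/storm-examples-1.0.jar",
                "com.huawei.storm.example.hdfs.SimpleHDFSTopology",
                "hdfs-test", "huawei.keytab");
        System.out.println(String.join(" ", cmd));
        // To run it on a host with the Storm client: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Passing user.keytab explicitly would also work, since main then simply sets the default name again; the helper omits it only to match the documented command.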
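When checking /user/foo in step 3 (for example with hdfs dfs -ls /user/foo), it helps to know what the file names look like. The sketch below mirrors the naming pattern of the open-source storm-hdfs DefaultFileNameFormat as we understand it: prefix, component ID, task ID, rotation number, and timestamp in milliseconds, joined by hyphens, followed by the extension (empty prefix and ".txt" by default). The class ExpectedHdfsFileName is hypothetical; verify the pattern against the storm-hdfs version in your cluster.

```java
// Hypothetical helper reproducing the DefaultFileNameFormat naming pattern.
final class ExpectedHdfsFileName {
    static String name(String prefix, String componentId, int taskId,
                       long rotation, long timestampMillis, String extension) {
        return prefix + componentId + "-" + taskId + "-" + rotation + "-"
                + timestampMillis + extension;
    }

    // Sample topology: files are written under /user/foo/ by the bolt with component ID "count".
    static String samplePath(int taskId, long rotation, long timestampMillis) {
        return "/user/foo/" + name("", "count", taskId, rotation, timestampMillis, ".txt");
    }
}
```

A new rotation number appears each time the 5 MB rotation policy starts a new file, so a long-running topology produces a growing series of files per task.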
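In ticket mode, the upload in step 4 has to be repeated before the ticket can no longer be renewed. The following is a minimal sketch of automating that with a ScheduledExecutorService; the class TicketUploader is hypothetical, and uploading at half the ticket lifetime is an assumption chosen so that one failed run still leaves time for the next attempt. It only re-runs storm upload-credentials from the Storm client's bin directory; it does not obtain or renew the Kerberos ticket itself.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Storm or the sample project): re-runs
// "storm upload-credentials <topology>" on a fixed schedule.
final class TicketUploader {
    // stormBinDir is the bin directory of the Storm client installation.
    static List<String> uploadCommand(String stormBinDir, String topologyName) {
        return Arrays.asList(Paths.get(stormBinDir, "storm").toString(),
                "upload-credentials", topologyName);
    }

    // Assumption: upload at half the ticket lifetime, never more often than once a minute.
    static long uploadIntervalMinutes(long ticketLifetimeMinutes) {
        return Math.max(1, ticketLifetimeMinutes / 2);
    }

    static ScheduledExecutorService start(String stormBinDir, String topologyName,
                                          long ticketLifetimeMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        List<String> cmd = uploadCommand(stormBinDir, topologyName);
        scheduler.scheduleAtFixedRate(() -> {
            try {
                int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
                if (exit != 0) {
                    System.err.println("upload-credentials exited with code " + exit);
                }
            } catch (IOException e) {
                // Catch here: an exception escaping the task would cancel all later runs.
                System.err.println("upload-credentials failed to start: " + e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // scheduler is shutting down
            }
        }, 0, uploadIntervalMinutes(ticketLifetimeMinutes), TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Call shutdown() on the returned scheduler when the topology is killed, so the uploads stop as well.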