Updated on 2022-09-14 GMT+08:00

Storm-HDFS Development Guideline

Scenario

This topic applies only to the interaction between Storm and HDFS. The versions of the JAR packages described in this topic depend on your actual environment.

Procedure for Developing an Application

  1. Verify that the Storm and HDFS components have been installed and are running properly.
  2. Import storm-examples to the IntelliJ IDEA development environment. For details, see Environment Preparation.
  3. Download and install the HDFS client.
  4. Obtain the related configuration files using the following method.

    Go to the /opt/clientHDFS/HDFS/hadoop/etc/hadoop directory on the installed HDFS client, and obtain configuration files core-site.xml and hdfs-site.xml.

  5. Obtain the related JAR packages.

    • Go to the HDFS/hadoop/share/hadoop/common/lib directory on the installed HDFS client, and obtain the following JAR packages:
      • commons-cli-<version>.jar
      • commons-io-<version>.jar
      • commons-lang-<version>.jar
      • commons-lang3-<version>.jar
      • commons-collections-<version>.jar
      • commons-configuration2-<version>.jar
      • commons-logging-<version>.jar
      • guava-<version>.jar
      • hadoop-*.jar
      • protobuf-java-<version>.jar
      • jackson-databind-<version>.jar
      • jackson-core-<version>.jar
      • jackson-annotations-<version>.jar
      • re2j-<version>.jar
      • jaeger-core-<version>.jar
      • opentracing-api-<version>.jar
      • opentracing-noop-<version>.jar
      • opentracing-tracerresolver-<version>.jar
      • opentracing-util-<version>.jar
    • Go to the HDFS/hadoop/share/hadoop/common directory on the installed HDFS client, and obtain the hadoop-*.jar package.
    • Go to the HDFS/hadoop/share/hadoop/client directory on the installed HDFS client, and obtain the hadoop-*.jar package.
    • Go to the HDFS/hadoop/share/hadoop/hdfs directory on the installed HDFS client, and obtain the hadoop-hdfs-*.jar package.
    • Obtain the following JAR packages from the sample project /src/storm-examples/storm-examples/lib:
      • storm-hdfs-<version>.jar
      • storm-autocreds-<version>.jar
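
Before packaging, it can help to confirm that every file named in steps 4 and 5 was actually collected. The following is a minimal sketch in plain Java, not part of the sample project: the class name ArtifactChecklist and the idea of gathering all files into one directory are assumptions made for illustration. It also assumes that each <version> placeholder starts with a digit.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: reports which files from steps 4 and 5 are absent from a listing.
final class ArtifactChecklist {
    // Maps the label used in this guide to a file-name pattern.
    private static final Map<String, Pattern> REQUIRED = new LinkedHashMap<>();

    static {
        for (String conf : Arrays.asList("core-site.xml", "hdfs-site.xml")) {
            REQUIRED.put(conf, Pattern.compile(Pattern.quote(conf)));
        }
        for (String prefix : Arrays.asList(
                "commons-cli", "commons-io", "commons-lang", "commons-lang3",
                "commons-collections", "commons-configuration2", "commons-logging",
                "guava", "protobuf-java", "jackson-databind", "jackson-core",
                "jackson-annotations", "re2j", "jaeger-core", "opentracing-api",
                "opentracing-noop", "opentracing-tracerresolver", "opentracing-util",
                "storm-hdfs", "storm-autocreds")) {
            // Assumption: the version starts with a digit. This keeps
            // commons-lang-<version>.jar distinct from commons-lang3-<version>.jar.
            REQUIRED.put(prefix + "-<version>.jar",
                    Pattern.compile(Pattern.quote(prefix) + "-\\d.*\\.jar"));
        }
        REQUIRED.put("hadoop-*.jar", Pattern.compile("hadoop-.*\\.jar"));
        REQUIRED.put("hadoop-hdfs-*.jar", Pattern.compile("hadoop-hdfs-.*\\.jar"));
    }

    // Returns the labels of all required artifacts that no file name matches.
    static List<String> missing(List<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : REQUIRED.entrySet()) {
            boolean found = false;
            for (String name : fileNames) {
                if (entry.getValue().matcher(name).matches()) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The directory to check is passed as the first argument (defaults to ".").
        String[] names = new File(args.length > 0 ? args[0] : ".").list();
        List<String> files = names == null ? new ArrayList<>() : Arrays.asList(names);
        System.out.println("Missing: " + missing(files));
    }
}
```

Run it against the directory that holds the collected files. An empty list means that every pattern matched at least one file; it does not verify that the versions are compatible with the cluster.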

IntelliJ IDEA Code Sample

Create a topology.

  public static void main(String[] args) throws Exception  
   { 
     TopologyBuilder builder = new TopologyBuilder(); 

     // Field delimiter. Use "|" instead of the default "," to separate the fields in a tuple. 
     // Mandatory HdfsBolt parameter 
     RecordFormat format = new DelimitedRecordFormat() 
             .withFieldDelimiter("|"); 

     // Sync policy. Sync the file system after every 1000 tuples. 
     // Mandatory HdfsBolt parameter 
     SyncPolicy syncPolicy = new CountSyncPolicy(1000); 

     // File rotation policy. When a file reaches 5 MB, writing rotates to a new file.
     // Mandatory HdfsBolt parameter 
     FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB); 

     // Target directory for the files written to HDFS
     // Mandatory HdfsBolt parameter 
     FileNameFormat fileNameFormat = new DefaultFileNameFormat() 
             .withPath("/user/foo/"); 


     //Create HdfsBolt. 
     HdfsBolt bolt = new HdfsBolt() 
             .withFsUrl(DEFAULT_FS_URL)
             .withFileNameFormat(fileNameFormat) 
             .withRecordFormat(format) 
             .withRotationPolicy(rotationPolicy) 
             .withSyncPolicy(syncPolicy); 

     //The spout generates random sentences. 
     builder.setSpout("spout", new RandomSentenceSpout(), 1);  
     builder.setBolt("split", new SplitSentence(), 1).shuffleGrouping("spout"); 
     builder.setBolt("count", bolt, 1).fieldsGrouping("split", new Fields("word")); 

     Config conf = new Config(); 

     //Add the plugins required for Kerberos authentication to the list. Mandatory in security mode. 
     setSecurityConf(conf, AuthenticationType.KEYTAB); 

     //Write the plugin list configured on the client to the specified config item. Mandatory in security mode. 
     conf.put(Config.TOPOLOGY_AUTO_CREDENTIALS, auto_tgts); 

     if(args.length >= 2) 
     { 
         //If the user has changed the default keytab file name, pass the new file name as the second argument. 
         conf.put(Config.STORM_CLIENT_KEYTAB_FILE, args[1]); 
     } 

     //Submit the topology under the name given as the first argument. 
     StormSubmitter.submitTopology(args[0], conf, builder.createTopology()); 

   }
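
The effect of the three HdfsBolt settings in this sample can be pictured without a cluster. The following is a plain-Java sketch and not Storm's implementation: the class name HdfsBoltPolicySketch is hypothetical, and it assumes that records end with a newline and that 1 MB equals 1024 × 1024 bytes. It only counts when a sync or a rotation would be triggered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the delimiter, sync, and rotation settings used above.
final class HdfsBoltPolicySketch {
    static final String FIELD_DELIMITER = "|";    // withFieldDelimiter("|")
    static final int SYNC_EVERY_TUPLES = 1000;    // CountSyncPolicy(1000)
    // FileSizeRotationPolicy(5.0f, Units.MB); assumes 1 MB = 1024 * 1024 bytes.
    static final long ROTATE_AT_BYTES = (long) (5.0f * 1024 * 1024);

    private int tuplesSinceSync = 0;
    private long bytesInCurrentFile = 0;
    private int syncCount = 0;
    private int rotationCount = 0;

    // Joins the tuple fields with the delimiter; the trailing newline is an assumption.
    static String formatRecord(List<String> fields) {
        return String.join(FIELD_DELIMITER, fields) + "\n";
    }

    // Accounts for one tuple and records whether a sync or a rotation is due.
    void write(List<String> fields) {
        byte[] record = formatRecord(fields).getBytes(StandardCharsets.UTF_8);
        bytesInCurrentFile += record.length;
        tuplesSinceSync++;
        if (tuplesSinceSync >= SYNC_EVERY_TUPLES) {
            syncCount++;              // a real bolt would flush to the file system here
            tuplesSinceSync = 0;
        }
        if (bytesInCurrentFile >= ROTATE_AT_BYTES) {
            rotationCount++;          // a real bolt would close the file and open a new one
            bytesInCurrentFile = 0;
        }
    }

    int syncCount() {
        return syncCount;
    }

    int rotationCount() {
        return rotationCount;
    }

    public static void main(String[] args) {
        HdfsBoltPolicySketch sketch = new HdfsBoltPolicySketch();
        for (int i = 0; i < 2500; i++) {
            sketch.write(Arrays.asList("storm", "3"));
        }
        System.out.println("syncs=" + sketch.syncCount()
                + ", rotations=" + sketch.rotationCount());
    }
}
```

With small records, syncs happen far more often than rotations, so the sync count mainly affects how soon written data becomes durable, while the rotation size determines how many files accumulate in the target directory.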

The target file path that Storm writes to cannot be in an SM4-encrypted HDFS partition.
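
The target location is determined by the file system URL passed to withFsUrl together with the path passed to withPath. DEFAULT_FS_URL is defined elsewhere in the sample project and is not shown in this topic. As an illustration of where such a URL can come from, the following sketch reads the standard Hadoop property fs.defaultFS from a core-site.xml file using only the JDK. The class name CoreSiteReader and the value hdfs://hacluster in the usage note are assumptions.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: looks up one property in a Hadoop-style configuration file.
final class CoreSiteReader {
    // Returns the trimmed value of the named property, or null if it is absent.
    static String readProperty(InputStream xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml);
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse configuration", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // The path to core-site.xml is passed as the first argument.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readProperty(in, "fs.defaultFS"));
        }
    }
}
```

For a file that contains a fs.defaultFS property with the value hdfs://hacluster, readProperty returns that string. Whether the sample's DEFAULT_FS_URL matches this value depends on the cluster and should be checked against the project source.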

Running the Application and Viewing Results

  1. Export the local JAR package. For details, see Packaging IntelliJ IDEA Code.
  2. Combine the configuration files obtained in step 4 and the JAR packages obtained in step 5 of Procedure for Developing an Application, and export a complete service JAR package. For details, see Packaging Services.
  3. Run a command to submit the topology.

    storm jar /opt/jartarget/source.jar com.huawei.storm.example.hdfs.SimpleHDFSTopology hdfs-test

  4. After the topology is submitted successfully, log in to the HDFS cluster and check the files that the topology writes to the /user/foo/ directory.
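
The arguments that follow the main class in the submit command are passed to main as args. In the sample, args[0] is the topology name (hdfs-test in the command above) and an optional args[1] is a keytab file name. The following sketch shows that mapping in isolation. The class name SubmitArgs, the usage message, and the explicit check for a missing topology name are assumptions that the sample code does not contain.

```java
// Hypothetical holder for the command-line arguments of the sample topology.
final class SubmitArgs {
    final String topologyName;
    final String keytabFile;   // null when the default keytab file name is used

    SubmitArgs(String[] args) {
        if (args.length < 1) {
            // Added for illustration; the sample reads args[0] without this check.
            throw new IllegalArgumentException("usage: <topology-name> [keytab-file]");
        }
        topologyName = args[0];
        keytabFile = args.length >= 2 ? args[1] : null;
    }

    public static void main(String[] args) {
        SubmitArgs parsed = new SubmitArgs(args);
        System.out.println("topology=" + parsed.topologyName
                + ", keytab=" + parsed.keytabFile);
    }
}
```

To use a non-default keytab file, append its name after the topology name in the submit command.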