Access Hive with HCatalog
Function
This sample uses HCatalog to analyze Hive table data with a MapReduce job: it reads the int values in the first column of the input table, performs a count(distinct XX)-style aggregation, and writes the results to the output table.
Example Code
The sample code is in HCatalogExample.java under hive-examples/hcatalog-example. Its function modules are as follows:
- Implement the Mapper class: use HCatRecord to read the int value in the first column of each record and emit it with a count of 1.
  public static class Map extends Mapper<LongWritable, HCatRecord, IntWritable, IntWritable> {
      int age;

      @Override
      protected void map(LongWritable key, HCatRecord value,
              Mapper<LongWritable, HCatRecord, IntWritable, IntWritable>.Context context)
              throws IOException, InterruptedException {
          // Emit (age, 1) only when the first column actually holds an int;
          // records without an int first column are skipped.
          if (value.get(0) instanceof Integer) {
              age = (Integer) value.get(0);
              context.write(new IntWritable(age), new IntWritable(1));
          }
      }
  }
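To see the map step in isolation, its per-record logic can be sketched in plain Java, without a Hadoop runtime. The `mapRecord` helper and the sample column values below are hypothetical stand-ins for `value.get(0)`; this is a minimal sketch, not part of the sample program:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapLogicSketch {
    // Simulates one map() call: emit (value, 1) when the first column is an int,
    // or null (no emission) otherwise.
    static int[] mapRecord(Object firstColumn) {
        if (firstColumn instanceof Integer) {
            return new int[] {(Integer) firstColumn, 1};
        }
        return null; // non-int rows are skipped
    }

    public static void main(String[] args) {
        // A hypothetical first column, including one non-int value.
        List<Object> column = Arrays.<Object>asList(25, 30, 25, "n/a", 30);
        List<String> emitted = new ArrayList<>();
        for (Object v : column) {
            int[] kv = mapRecord(v);
            if (kv != null) {
                emitted.add(kv[0] + ":" + kv[1]);
            }
        }
        // Each entry is one key:value pair handed to the shuffle phase.
        System.out.println(emitted);
    }
}
```

Running the sketch shows that every valid row contributes one (value, 1) pair, while the non-int row contributes nothing.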
- Implement the Reducer class: for each key emitted by the map phase, count its occurrences, and use HCatRecord to write the (value, count) result.
  public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, HCatRecord> {
      @Override
      protected void reduce(IntWritable key, Iterable<IntWritable> values,
              Reducer<IntWritable, IntWritable, IntWritable, HCatRecord>.Context context)
              throws IOException, InterruptedException {
          // Count how many times this key was emitted by the map phase.
          int sum = 0;
          for (IntWritable ignored : values) {
              sum++;
          }
          // Write (key, count) as a two-column HCatRecord.
          HCatRecord record = new DefaultHCatRecord(2);
          record.set(0, key.get());
          record.set(1, sum);
          context.write(null, record);
      }
  }
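The shuffle-plus-reduce behavior can likewise be sketched in memory: the framework groups the map output by key, and the reducer counts the values in each group. The `countByKey` helper below is a hypothetical simulation, not Hadoop API:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReduceLogicSketch {
    // Groups the keys emitted by the map phase and counts occurrences per key,
    // mirroring what shuffle + reduce produce for this job.
    static Map<Integer, Integer> countByKey(List<Integer> mapOutputKeys) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (int key : mapOutputKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Keys emitted by the map phase for a sample input column.
        System.out.println(countByKey(Arrays.asList(25, 30, 25, 30, 40)));
        // → {25=2, 30=2, 40=1}
    }
}
```

Each map entry corresponds to one two-column HCatRecord written by the real reducer.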
- Define the MapReduce job: specify the input/output formats, the Mapper/Reducer classes, and the key-value types of each stage.
  // Define the MapReduce job.
  Job job = new Job(conf, "GroupByDemo");

  // Input: read from the Hive table through HCatalog.
  HCatInputFormat.setInput(job, dbName, inputTableName);
  job.setInputFormatClass(HCatInputFormat.class);

  job.setJarByClass(HCatalogExample.class);
  job.setMapperClass(Map.class);
  job.setReducerClass(Reduce.class);
  job.setMapOutputKeyClass(IntWritable.class);
  job.setMapOutputValueClass(IntWritable.class);
  job.setOutputKeyClass(WritableComparable.class);
  job.setOutputValueClass(DefaultHCatRecord.class);

  // Output: write to the Hive table through HCatalog, reusing the table's schema.
  String outputTableName = otherArgs[1];
  OutputJobInfo outputjobInfo = OutputJobInfo.create(dbName, outputTableName, null);
  HCatOutputFormat.setOutput(job, outputjobInfo);
  HCatSchema schema = outputjobInfo.getOutputSchema();
  HCatOutputFormat.setSchema(job, schema);
  job.setOutputFormatClass(HCatOutputFormat.class);
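Putting the two phases together, the job's end-to-end effect on a sample first column can be sketched in plain Java, with the HCatalog table I/O replaced by in-memory collections. The class and sample data are hypothetical; this only illustrates what rows the output table ends up containing:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByDemoSketch {
    // Simulates the whole job: filter the first column to int values (map),
    // then count occurrences per distinct value (reduce).
    static Map<Integer, Integer> run(List<Object> firstColumn) {
        Map<Integer, Integer> out = new LinkedHashMap<>();
        for (Object v : firstColumn) {
            if (v instanceof Integer) {                      // map-side filter
                out.merge((Integer) v, 1, Integer::sum);     // reduce-side count
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each resulting entry corresponds to one row written to the output table.
        System.out.println(run(Arrays.<Object>asList(25, 30, 25, "bad", 30, 40)));
        // → {25=2, 30=2, 40=1}
    }
}
```

In the real job this aggregation is distributed across map and reduce tasks, but the output rows are the same.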