Optimizing the Design of Partitioning Method
Scenarios
The divide of tasks can be optimized by optimizing the partitioning method. If data skew occurs in a certain task, the whole execution process is delayed. Therefore, when designing the partitioning method, ensure that partitions are evenly assigned.
Procedure
Partitioning methods are as follows:
- Random partitioning: randomly partitions data.
dataStream.shuffle();
- Rebalancing (round-robin partitioning): evenly partitions data based on round-robin. The partitioning method is useful to optimize data with data skew.
dataStream.rebalance();
- Rescaling: assign data to downstream subsets in the form of round-robin. The partitioning method is useful if you want to deliver data from each parallel instance of a data source to subsets of some mappers without the using rebalance (), that is, the complete rebalance operation.
dataStream.rescale();
- Broadcast: broadcast data to all partitions.
dataStream.broadcast();
- User-defined partitioning: use a user-defined partitioner to select a target task for each element. The user-defined partitioning allows user to partition data based on a certain feature to achieve optimized task execution.
The following is an example:
// fromElements builds simple Tuple2 stream DataStream<Tuple2<String, Integer>> dataStream = env.fromElements(Tuple2.of("hello",1), Tuple2.of("test",2), Tuple2.of("world",100)); // Defines the key value used for partitioning. Adding one to the value equals to the id. Partitioner<Tuple2<String, Integer>> strPartitioner = new Partitioner<Tuple2<String, Integer>>() { @Override public int partition(Tuple2<String, Integer> key, int numPartitions) { return (key.f0.length() + key.f1) % numPartitions; } }; // The Tuple2 data is used as the basis for partitioning. dataStream.partitionCustom(strPartitioner, new KeySelector<Tuple2<String, Integer>, Tuple2<String, Integer>>() { @Override public Tuple2<String, Integer> getKey(Tuple2<String, Integer> value) throws Exception { return value; } }).print();
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot