Yige

Yige

Build

Flink Basics - DataStream Programming

Evolution of Stream Processing API (Comparison with Storm)#

  • Storm's API has a lower level of abstraction, which is operation-oriented, allowing you to directly customize the construction of DAG through code.
TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
  • Flink is data-oriented, using APIs, which means a series of operators perform a series of transformations on the data, with Flink's underlying system constructing the DAG, thus having a relatively higher level of abstraction.

Basic Usage#

//1、Set up the execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//2、Configure the data source to read data
DataStream text = env.readTextFile ("input");
//3、Perform a series of transformations
DataStream<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).keyBy(0).sum(1);
//4、Configure the data sink to write out data
counts.writeAsText("output");
//5、Submit for execution
env.execute("Streaming WordCount");

Overview of Operations#

image.png

Basic Transformations of DataStream#

image.png

Physical Grouping Methods#

Grouping must be done before calling operators like reduce/sum for statistical calculations.

TypeDescription
keyBy()Most commonly used for sending by key type; the number of key types is often much larger than the number of operator instances.
global()All sent to the first task.
broadcast()Broadcast, suitable for small data volumes.
forward()One-to-one sending when upstream and downstream concurrency is consistent.
shuffle()Random and uniform distribution.
rebalance()Round-Robin distribution.
rescale()Local Round-Robin distribution.
partitionCustomer()Custom unicast.
Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.