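Today's post walks through an example of using Flink to read data produced to Kafka and aggregate it.
Step 1: Prepare the environment: kafka, flink, and zookeeper. I am using a CDH environment; kafka and zookeeper are already installed and have been tested to work normally.
Step 2: Start a kafka console producer to produce messages: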
./kafka-console-producer.sh --broker-list 192.168.58.177:9092 --topic my_topic
Step 3: Create a flink project in IDEA. The code is as follows:
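import java.util.Properties;
import java.util.stream.Stream;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;

// The statements below belong inside a main method declared with "throws Exception".
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "192.168.58.177:9092");
// zookeeper.connect is only required by the Kafka 0.8 connector; newer consumers ignore it.
properties.setProperty("zookeeper.connect", "192.168.58.171:2181,192.168.58.177:2181");
properties.setProperty("group.id", "test");
FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("my_topic", new SimpleStringSchema(), properties);
// Only the last setStartFrom* call takes effect, so keep one: resume from committed group offsets.
// Use myConsumer.setStartFromLatest() instead to read only newly arriving records.
myConsumer.setStartFromGroupOffsets();
env.setParallelism(2);
// Event time only matters once the window below is enabled; it also needs timestamps and watermarks.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<Tuple2<String, Integer>> stream = env.addSource(myConsumer)
        // Split each message on commas and emit (word, 1) for every token.
        .flatMap((String line, Collector<Tuple2<String, Integer>> out) ->
                Stream.of(line.split(",")).forEach(word -> out.collect(Tuple2.of(word, 1))))
        // Lambdas lose generic type information, so declare the output type explicitly.
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        // Key by the word (field f0); positional keyBy(0) is deprecated in newer Flink versions.
        .keyBy(t -> t.f0)
        //.window(TumblingEventTimeWindows.of(Time.seconds(5)))
        // Running sum of field 1 per key.
        .sum(1);
//stream.writeAsText("C:\\Users\\yaowentao\\Desktop\\a");
stream.print();
env.execute("my first stream flink");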
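The code leaves a 5-second tumbling event-time window commented out. As a rough illustration of what enabling it would change, here is a plain-Java sketch (no Flink dependency) that assigns each record to a 5-second bucket and counts per word within that bucket, instead of keeping one running total. The class name, the timestamps, and the "windowStart|word" key format are all made up for this example, and the bucket arithmetic is a simplification that assumes a zero window offset and non-negative millisecond timestamps. In a real job the window would also need timestamps and watermarks assigned to the stream, otherwise an event-time window never fires.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-in for keyBy + 5-second tumbling window + sum; not Flink code.
class TumblingWindowCount {
    static final long WINDOW_MS = 5_000L;

    // Start of the 5-second bucket containing the timestamp
    // (assumes offset 0 and a non-negative timestamp in milliseconds).
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    // Counts occurrences per "windowStart|word" (hypothetical key format).
    static Map<String, Integer> count(long[] timestampsMs, String[] words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < words.length; i++) {
            String key = windowStart(timestampsMs[i]) + "|" + words[i];
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up event timestamps: two records fall in [0, 5000), two in [5000, 10000).
        long[] timestampsMs = {1_000L, 4_999L, 5_000L, 7_200L};
        String[] words = {"a", "a", "a", "b"};
        System.out.println(count(timestampsMs, words));
    }
}
```

With the window enabled, a word that shows up in two different 5-second buckets is reported once per bucket with a separate count, rather than as a single ever-growing total.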
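Step 4: Go back to the kafka producer, type some messages, and check whether data appears in the console. Each message is split on commas, so a line such as a,b,a is counted as three words.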
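To get a feel for what should show up, the plain-Java sketch below replays the same split-and-count logic without Flink or Kafka. The class name and the sample messages are invented for illustration. The real job prints Flink tuples in the same "(word,count)" form, but with a parallelism of 2 each line is prefixed with the printing subtask's index (for example "1> "), and the interleaving between different words can differ from this single-threaded version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for flatMap -> keyBy -> sum(1); no Flink classes involved.
class RunningWordCount {

    // Returns one "(word,count)" string per token, in arrival order.
    static List<String> process(List<String> messages) {
        Map<String, Integer> counts = new HashMap<>(); // per-word running total
        List<String> emitted = new ArrayList<>();
        for (String message : messages) {
            for (String word : message.split(",")) {               // split step
                int updated = counts.merge(word, 1, Integer::sum); // running sum per word
                emitted.add("(" + word + "," + updated + ")");
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Hypothetical messages typed into the console producer.
        process(Arrays.asList("a,b,a", "b,c")).forEach(System.out::println);
    }
}
```

Because the sum is a running total with no window, every incoming word produces a new output line carrying the updated count for that word; earlier lines are not replaced.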
This gives a basic working setup for reading kafka messages with flink.