Flink basics

DataSet API: batch processing
DataStream API: stream processing
(Note: the DataSet API has been deprecated since Flink 1.12 in favor of running batch jobs on the unified DataStream API.)

Basic parts of a Flink program:

  1. Obtain an execution environment,
  2. Load/create the initial data,
  3. Specify transformations on this data,
  4. Specify where to put the results of your computations,
  5. Trigger the program execution

Lazy evaluation: operations are not run when they are defined; they are executed only when execution is explicitly triggered by an execute() call on the execution environment.
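The five steps above can be sketched as a minimal DataStream job. This is an illustrative sketch, assuming Flink's Java DataStream API is on the classpath; the class name WordLength and the job name are made up:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordLength {
    public static void main(String[] args) throws Exception {
        // 1. Obtain an execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 2. Load/create the initial data
        DataStream<String> words = env.fromElements("flink", "datastream", "lazy");

        // 3. Specify transformations on this data
        DataStream<Integer> lengths = words.map(new MapFunction<String, Integer>() {
            @Override
            public Integer map(String word) {
                return word.length();
            }
        });

        // 4. Specify where to put the results
        lengths.print();

        // 5. Trigger the program execution -- nothing runs before this call
        env.execute("word-length-job");
    }
}
```

Steps 1-4 only build a dataflow graph; the job is shipped to the runtime and executed when execute() is called.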

Some transformations (join, coGroup, keyBy, groupBy) require that a key be defined on the data.
Other transformations (Reduce, GroupReduce, Aggregate, Windows) allow the data to be grouped on a key before they are applied.

Keys are “virtual”: they are defined as functions over the actual data to guide the grouping operator.
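A small sketch of a virtual key, assuming the Java DataStream API; the class name KeyedSum is illustrative. The key is never stored as a separate field in the data: a KeySelector function extracts it from each record on the fly to guide the grouping:

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyedSum {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> counts = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3));

        counts
            // The "virtual" key: a function over the record, here the String field f0
            .keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                @Override
                public String getKey(Tuple2<String, Integer> record) {
                    return record.f0;
                }
            })
            // Running sum of the Integer field (position 1), maintained per key
            .sum(1)
            .print();

        env.execute("keyed-sum-job");
    }
}
```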