
Commit

[FLINK-24760][docs] Update user document for batch window tvf (apache…
beyond1920 committed Nov 25, 2021
1 parent 591c398 commit 5bc3951
Showing 8 changed files with 58 additions and 30 deletions.
10 changes: 6 additions & 4 deletions docs/content.zh/docs/dev/table/sql/queries/window-agg.md
@@ -26,7 +26,7 @@ under the License.

## Window TVF Aggregation

-{{< label Streaming >}}
+{{< label Batch >}} {{< label Streaming >}}

Window aggregations are defined in the `GROUP BY` clause contains "window_start" and "window_end" columns of the relation applied [Windowing TVF]({{< ref "docs/dev/table/sql/queries/window-tvf" >}}). Just like queries with regular `GROUP BY` clauses, queries with a group by window aggregation will compute a single result row per group.

@@ -40,7 +40,9 @@ Unlike other aggregations on continuous tables, window aggregation do not emit i

### Windowing TVFs

-Flink supports `TUMBLE`, `HOP` and `CUMULATE` types of window aggregations, which can be defined on either [event or processing time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}). See [Windowing TVF]({{< ref "docs/dev/table/sql/queries/window-tvf" >}}) for more windowing functions information.
+Flink supports `TUMBLE`, `HOP` and `CUMULATE` types of window aggregations.
+In streaming mode, the time attribute field of a window table-valued function must be on either [event or processing time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}). See [Windowing TVF]({{< ref "docs/dev/table/sql/queries/window-tvf" >}}) for more windowing functions information.
+In batch mode, the time attribute field of a window table-valued function must be an attribute of type `TIMESTAMP` or `TIMESTAMP_LTZ`.

Here are some examples for `TUMBLE`, `HOP` and `CUMULATE` window aggregations.

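Because a batch input is bounded, a window TVF aggregation can be pictured as an ordinary `GROUP BY window_start, window_end` over rows that have each been tagged with one tumbling window. The Python below is a minimal model of a 10-minute `TUMBLE` aggregation, not Flink code: naive `datetime` values stand in for a `TIMESTAMP` column, windows are aligned to the Unix epoch with no offset, and the rows and helper names (`tumble_start`, `tumble_sum`) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def tumble_start(ts: datetime, size: timedelta) -> datetime:
    """Start of the tumbling window containing ts (aligned to the epoch, no offset)."""
    return ts - (ts - EPOCH) % size

def tumble_sum(rows, size: timedelta) -> dict:
    """Model of SUM(price) ... GROUP BY window_start, window_end on a bounded input."""
    totals = defaultdict(float)
    for ts, price in rows:
        start = tumble_start(ts, size)
        totals[(start, start + size)] += price
    return dict(sorted(totals.items()))

# Hypothetical (bidtime, price) rows from a bounded table.
bids = [
    (datetime(2020, 4, 15, 8, 5), 4.0),
    (datetime(2020, 4, 15, 8, 7), 2.0),
    (datetime(2020, 4, 15, 8, 9), 5.0),
    (datetime(2020, 4, 15, 8, 11), 3.0),
    (datetime(2020, 4, 15, 8, 13), 1.0),
    (datetime(2020, 4, 15, 8, 17), 6.0),
]

for (start, end), total in tumble_sum(bids, timedelta(minutes=10)).items():
    print(start.time(), end.time(), total)
# 08:00:00 08:10:00 11.0
# 08:10:00 08:20:00 10.0
```

Each row contributes to exactly one group here because tumbling windows do not overlap.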
@@ -253,9 +255,9 @@ Group Window Aggregations are defined in the `GROUP BY` clause of a SQL query. J

### Time Attributes

-For SQL queries on streaming tables, the `time_attr` argument of the group window function must refer to a valid time attribute that specifies the processing time or event time of rows. See the [documentation of time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) to learn how to define time attributes.
+In streaming mode, the `time_attr` argument of the group window function must refer to a valid time attribute that specifies the processing time or event time of rows. See the [documentation of time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) to learn how to define time attributes.

-For SQL on batch tables, the `time_attr` argument of the group window function must be an attribute of type `TIMESTAMP`.
+In batch mode, the `time_attr` argument of the group window function must be an attribute of type `TIMESTAMP`.

### Selecting Group Window Start and End Timestamps

4 changes: 2 additions & 2 deletions docs/content.zh/docs/dev/table/sql/queries/window-join.md
@@ -23,9 +23,9 @@ under the License.
-->

# Window Join
-{{< label Streaming >}}
+{{< label Batch >}} {{< label Streaming >}}

-A window join adds the dimension of time into the join criteria themselves. In doing so, the window join joins the elements of two streams that share a common key and lie in the same window. The semantic of window join is same to the [DataStream window join]({{< ref "docs/dev/datastream/operators/joining" >}}#window-join)
+A window join adds the dimension of time into the join criteria themselves. In doing so, the window join joins the elements of two streams that share a common key and are in the same window. The semantic of window join is same to the [DataStream window join]({{< ref "docs/dev/datastream/operators/joining" >}}#window-join)

For streaming queries, unlike other joins on continuous tables, window join does not emit intermediate results but only emits final results at the end of the window. Moreover, window join purge all intermediate state when no longer needed.

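The join rule above (same key and same window) can be modelled on bounded inputs with a small hash join whose key is the pair (window, join key). This Python sketch is an illustration only, not Flink's operator: it assumes both sides use the same tumbling window over integer timestamps, models the window condition as equal `(window_start, window_end)`, and the names `tumble_window` and `window_join` are hypothetical.

```python
from collections import defaultdict

def tumble_window(ts: int, size: int) -> tuple:
    """(window_start, window_end) of the tumbling window containing ts."""
    start = ts - ts % size
    return start, start + size

def window_join(left, right, size: int) -> list:
    """Inner-join (ts, key, value) rows that share a key AND a tumbling window."""
    index = defaultdict(list)
    for ts, key, value in right:
        index[(tumble_window(ts, size), key)].append(value)
    joined = []
    for ts, key, value in left:
        window = tumble_window(ts, size)
        for other in index.get((window, key), []):
            joined.append((window, key, value, other))
    return joined

left = [(1, "K1", "L1"), (3, "K2", "L2"), (6, "K3", "L3")]
right = [(2, "K2", "R2"), (4, "K3", "R3"), (7, "K3", "R3b")]
print(window_join(left, right, size=5))
# [((0, 5), 'K2', 'L2', 'R2'), ((5, 10), 'K3', 'L3', 'R3b')]
```

Note that the `K3` rows at timestamps 4 and 6 share a key but fall into different windows, so they are not joined.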
2 changes: 1 addition & 1 deletion docs/content.zh/docs/dev/table/sql/queries/window-topn.md
@@ -23,7 +23,7 @@ under the License.
-->

# Window Top-N
-{{< label Streaming >}}
+{{< label Batch >}} {{< label Streaming >}}

Window Top-N is a special [Top-N]({{< ref "docs/dev/table/sql/queries/topn" >}}) which returns the N smallest or largest values for each window and other partitioned keys.

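In SQL, Window Top-N is typically written with `ROW_NUMBER()` over a partition that includes `window_start` and `window_end` (plus any extra partition keys) and a filter on the row number. The Python below only models the resulting rows on a bounded input: it assumes tumbling windows over integer timestamps and keeps the N largest values, and the helper `window_top_n` is hypothetical.

```python
import heapq
from collections import defaultdict

def window_top_n(rows, size: int, n: int) -> dict:
    """Keep the n largest values per (window_start, window_end, partition key).

    rows are (ts, key, value) tuples with integer timestamps.
    """
    groups = defaultdict(list)
    for ts, key, value in rows:
        start = ts - ts % size
        groups[(start, start + size, key)].append(value)
    return {group: heapq.nlargest(n, values) for group, values in sorted(groups.items())}

rows = [(1, "A", 4.0), (2, "A", 2.0), (3, "A", 5.0), (6, "A", 3.0), (8, "B", 1.0), (9, "B", 6.0)]
print(window_top_n(rows, size=5, n=2))
# {(0, 5, 'A'): [5.0, 4.0], (5, 10, 'A'): [3.0], (5, 10, 'B'): [6.0, 1.0]}
```

Swapping `heapq.nlargest` for `heapq.nsmallest` gives the "N smallest" variant.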
28 changes: 20 additions & 8 deletions docs/content.zh/docs/dev/table/sql/queries/window-tvf.md
@@ -24,7 +24,7 @@ under the License.

# Windowing table-valued functions (Windowing TVFs)

-{{< label Streaming >}}
+{{< label Batch >}} {{< label Streaming >}}

Windows are at the heart of processing infinite streams. Windows split the stream into “buckets” of finite size, over which we can apply computations. This document focuses on how windowing is performed in Flink SQL and how the programmer can benefit to the maximum from its offered functionality.

@@ -48,15 +48,21 @@ See more how to apply further computations based on windowing TVF:

## Window Functions

-Apache Flink provides 3 built-in windowing TVFs: `TUMBLE`, `HOP` and `CUMULATE`. The return value of windowing TVF is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The "window_time" field is a [time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) of the window after windowing TVF which can be used in subsequent time-based operations, e.g. another windowing TVF, or <a href="{{< ref "docs/dev/table/sql/queries/joins" >}}#interval-joins">interval joins</a>, <a href="{{< ref "docs/dev/table/sql/queries/over-agg" >}}">over aggregations</a>. The value of `window_time` always equal to `window_end - 1ms`.
+Apache Flink provides 3 built-in windowing TVFs: `TUMBLE`, `HOP` and `CUMULATE`. The return value of windowing TVF is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window.
+In streaming mode, the "window_time" field is a [time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) of the window.
+In batch mode, the "window_time" field is an attribute of type `TIMESTAMP` or `TIMESTAMP_LTZ` based on input time field type.
+The "window_time" field can be used in subsequent time-based operations, e.g. another windowing TVF, or <a href="{{< ref "docs/dev/table/sql/queries/joins" >}}#interval-joins">interval joins</a>, <a href="{{< ref "docs/dev/table/sql/queries/over-agg" >}}">over aggregations</a>. The value of `window_time` always equal to `window_end - 1ms`.

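As a quick check of the `window_time = window_end - 1ms` rule, the Python sketch below (an illustration, not Flink code; the helper `window_columns` is hypothetical) derives the three appended columns for one window, using naive `datetime` values as a stand-in for a batch-mode `TIMESTAMP` field. Subtracting one millisecond keeps `window_time` inside the half-open interval `[window_start, window_end)`.

```python
from datetime import datetime, timedelta

def window_columns(window_start: datetime, window_end: datetime) -> dict:
    """Return the three columns a windowing TVF appends to each row."""
    return {
        "window_start": window_start,
        "window_end": window_end,
        # window_time is always window_end minus one millisecond
        "window_time": window_end - timedelta(milliseconds=1),
    }

cols = window_columns(datetime(2020, 4, 15, 8, 0), datetime(2020, 4, 15, 8, 10))
print(cols["window_time"])  # 2020-04-15 08:09:59.999000
```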
### TUMBLE

The `TUMBLE` function assigns each element to a window of specified window size. Tumbling windows have a fixed size and do not overlap. For example, suppose you specify a tumbling window with a size of 5 minutes. In that case, Flink will evaluate the current window, and a new window started every five minutes, as illustrated by the following figure.

{{< img src="/fig/tumbling-windows.svg" alt="Tumbling Windows" width="70%">}}

-The `TUMBLE` function assigns a window for each row of a relation based on a [time attribute]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column. The return value of `TUMBLE` is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The original time attribute "timecol" will be a regular timestamp column after window TVF.
+The `TUMBLE` function assigns a window for each row of a relation based on a time attribute field.
+In streaming mode, the time attribute field must be either [event or processing time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}).
+In batch mode, the time attribute field of window table function must be an attribute of type `TIMESTAMP` or `TIMESTAMP_LTZ`.
+The return value of `TUMBLE` is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The original time attribute "timecol" will be a regular timestamp column after window TVF.

`TUMBLE` function takes three required parameters, one optional parameter:

@@ -65,7 +71,7 @@ TUMBLE(TABLE data, DESCRIPTOR(timecol), size [, offset ])
```

- `data`: is a table parameter that can be any relation with a time attribute column.
-- `timecol`: is a column descriptor indicating which [time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column of data should be mapped to tumbling windows.
+- `timecol`: is a column descriptor indicating which time attributes column of data should be mapped to tumbling windows.
- `size`: is a duration specifying the width of the tumbling windows.
- `offset`: is an optional parameter to specify the offset which window start would be shifted by.

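The `TUMBLE` parameters above can be modelled in a few lines of Python. This is an illustrative sketch, not Flink's implementation: it assumes timestamps are epoch milliseconds, treats each window as the half-open interval `[window_start, window_end)`, and uses a hypothetical helper `tumble` whose arguments mirror `size [, offset ]`.

```python
def tumble(ts: int, size: int, offset: int = 0) -> tuple:
    """Assign ts (epoch ms) to exactly one tumbling window.

    Returns (window_start, window_end, window_time). Windows are aligned to
    multiples of size, shifted by offset, and never overlap.
    """
    # Python's % is non-negative for a positive size, so timestamps that
    # fall before the offset are handled too.
    start = ts - (ts - offset) % size
    end = start + size
    return start, end, end - 1  # window_time = window_end - 1 ms

MINUTE = 60_000
print(tumble(7 * MINUTE, 5 * MINUTE))                 # (300000, 600000, 599999)
print(tumble(7 * MINUTE, 5 * MINUTE, offset=MINUTE))  # (360000, 660000, 659999)
```

With a 1-minute offset the window grid shifts from `[00:05, 00:10)` to `[00:06, 00:11)`; a timestamp exactly on a boundary belongs to the window that starts there.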
@@ -141,7 +147,10 @@ For example, you could have windows of size 10 minutes that slides by 5 minutes.

{{< img src="/fig/sliding-windows.svg" alt="Hopping windows" width="70%">}}

-The `HOP` function assigns windows that cover rows within the interval of size and shifting every slide based on a [time attribute]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column. The return value of `HOP` is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The original time attribute "timecol" will be a regular timestamp column after windowing TVF.
+The `HOP` function assigns windows that cover rows within the interval of size and shifting every slide based on a time attribute field.
+In streaming mode, the time attribute field must be either [event or processing time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}).
+In batch mode, the time attribute field of window table function must be an attribute of type `TIMESTAMP` or `TIMESTAMP_LTZ`.
+The return value of `HOP` is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The original time attribute "timecol" will be a regular timestamp column after windowing TVF.

`HOP` takes four required parameters, one optional parameter:

@@ -150,7 +159,7 @@ HOP(TABLE data, DESCRIPTOR(timecol), slide, size [, offset ])
```

- `data`: is a table parameter that can be any relation with an time attribute column.
-- `timecol`: is a column descriptor indicating which [time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column of data should be mapped to hopping windows.
+- `timecol`: is a column descriptor indicating which time attributes column of data should be mapped to hopping windows.
- `slide`: is a duration specifying the duration between the start of sequential hopping windows
- `size`: is a duration specifying the width of the hopping windows.
- `offset`: is an optional parameter to specify the offset which window start would be shifted by.
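
Unlike `TUMBLE`, `HOP` can assign one row to several windows when `size` is larger than `slide`. A minimal Python model, not Flink code, assuming epoch-millisecond timestamps and half-open windows; the helper `hop` is hypothetical and takes `slide` before `size` to match the `HOP` signature above. It starts from the latest window start at or before the row and walks back one `slide` at a time while the window still covers the row.

```python
def hop(ts: int, slide: int, size: int, offset: int = 0) -> list:
    """Return every hopping window [start, end) that contains ts (epoch ms)."""
    start = ts - (ts - offset) % slide  # latest window start at or before ts
    windows = []
    while start > ts - size:            # this window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

MINUTE = 60_000
# 10-minute windows sliding every 5 minutes: a row at 00:07 is in two windows.
print(hop(7 * MINUTE, 5 * MINUTE, 10 * MINUTE))  # [(0, 600000), (300000, 900000)]
```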
@@ -214,7 +223,10 @@ For example, you could have a cumulating window for 1 hour step and 1 day max si

{{< img src="/fig/cumulating-windows.png" alt="Cumulating Windows" width="70%">}}

-The `CUMULATE` functions assigns windows based on a [time attribute]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column. The return value of `CUMULATE` is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The original time attribute "timecol" will be a regular timestamp column after window TVF.
+The `CUMULATE` functions assigns windows based on a time attribute column.
+In streaming mode, the time attribute field must be either [event or processing time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}).
+In batch mode, the time attribute field of window table function must be an attribute of type `TIMESTAMP` or `TIMESTAMP_LTZ`.
+The return value of `CUMULATE` is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The original time attribute "timecol" will be a regular timestamp column after window TVF.

`CUMULATE` takes four required parameters, one optional parameter:

@@ -223,7 +235,7 @@ CUMULATE(TABLE data, DESCRIPTOR(timecol), step, size)
```

- `data`: is a table parameter that can be any relation with an time attribute column.
-- `timecol`: is a column descriptor indicating which [time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column of data should be mapped to tumbling windows.
+- `timecol`: is a column descriptor indicating which time attributes column of data should be mapped to cumulating windows.
- `step`: is a duration specifying the increased window size between the end of sequential cumulating windows.
- `size`: is a duration specifying the max width of the cumulating windows. `size` must be an integral multiple of `step`.
- `offset`: is an optional parameter to specify the offset which window start would be shifted by.
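
The `step`/`size` relationship above can be modelled in Python: all cumulating windows inside one max-size period share the same `window_start`, their ends grow by `step`, and a row belongs to every such window whose end lies after it. This is an illustrative sketch under stated assumptions (epoch-millisecond timestamps, half-open windows, hypothetical helper `cumulate`), not Flink's implementation.

```python
def cumulate(ts: int, step: int, size: int, offset: int = 0) -> list:
    """Return every cumulating window [start, end) that contains ts (epoch ms)."""
    if size % step != 0:
        raise ValueError("size must be an integral multiple of step")
    start = ts - (ts - offset) % size  # start of the enclosing max-size window
    first_end = start + ((ts - start) // step + 1) * step  # smallest end > ts
    return [(start, end) for end in range(first_end, start + size + 1, step)]

HOUR = 3_600_000
DAY = 24 * HOUR
windows = cumulate(2 * HOUR + 30 * 60_000, HOUR, DAY)  # a row at 02:30
print(windows[0], windows[-1], len(windows))  # (0, 10800000) (0, 86400000) 22
```

The row at 02:30 is counted in `[00:00, 03:00)`, `[00:00, 04:00)`, and so on up to the full day, which is what makes cumulating windows suitable for running totals.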
10 changes: 6 additions & 4 deletions docs/content/docs/dev/table/sql/queries/window-agg.md
@@ -26,7 +26,7 @@ under the License.

## Window TVF Aggregation

-{{< label Streaming >}}
+{{< label Batch >}} {{< label Streaming >}}

Window aggregations are defined in the `GROUP BY` clause contains "window_start" and "window_end" columns of the relation applied [Windowing TVF]({{< ref "docs/dev/table/sql/queries/window-tvf" >}}). Just like queries with regular `GROUP BY` clauses, queries with a group by window aggregation will compute a single result row per group.

@@ -40,7 +40,9 @@ Unlike other aggregations on continuous tables, window aggregation do not emit i

### Windowing TVFs

-Flink supports `TUMBLE`, `HOP` and `CUMULATE` types of window aggregations, which can be defined on either [event or processing time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}). See [Windowing TVF]({{< ref "docs/dev/table/sql/queries/window-tvf" >}}) for more windowing functions information.
+Flink supports `TUMBLE`, `HOP` and `CUMULATE` types of window aggregations.
+In streaming mode, the time attribute field of a window table-valued function must be on either [event or processing time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}). See [Windowing TVF]({{< ref "docs/dev/table/sql/queries/window-tvf" >}}) for more windowing functions information.
+In batch mode, the time attribute field of a window table-valued function must be an attribute of type `TIMESTAMP` or `TIMESTAMP_LTZ`.

Here are some examples for `TUMBLE`, `HOP` and `CUMULATE` window aggregations.

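With overlapping `HOP` windows, an aggregation counts a row once in every window that covers it, so the per-window totals add up to more than the plain sum of the input. The Python below models a `HOP` aggregation (10-unit windows sliding every 5 units) on a bounded input; it is a sketch, not Flink code, and assumes integer timestamps (minutes since 08:00 here), no offset, hypothetical rows, and hypothetical helpers `hop_windows` and `hop_sum`.

```python
from collections import defaultdict

def hop_windows(ts: int, slide: int, size: int) -> list:
    """All [start, end) hopping windows that contain ts (no offset)."""
    start = ts - ts % slide
    windows = []
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows

def hop_sum(rows, slide: int, size: int) -> dict:
    """Model of SUM(price) ... GROUP BY window_start, window_end over HOP windows."""
    totals = defaultdict(float)
    for ts, price in rows:
        for window in hop_windows(ts, slide, size):
            totals[window] += price  # counted once per window the row falls in
    return dict(sorted(totals.items()))

# Hypothetical rows: (minutes since 08:00, price)
bids = [(5, 4.0), (7, 2.0), (9, 5.0), (11, 3.0), (13, 1.0), (17, 6.0)]
print(hop_sum(bids, slide=5, size=10))
# {(0, 10): 11.0, (5, 15): 15.0, (10, 20): 10.0, (15, 25): 6.0}
```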
@@ -253,9 +255,9 @@ Group Window Aggregations are defined in the `GROUP BY` clause of a SQL query. J

### Time Attributes

-For SQL queries on streaming tables, the `time_attr` argument of the group window function must refer to a valid time attribute that specifies the processing time or event time of rows. See the [documentation of time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) to learn how to define time attributes.
+In streaming mode, the `time_attr` argument of the group window function must refer to a valid time attribute that specifies the processing time or event time of rows. See the [documentation of time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) to learn how to define time attributes.

-For SQL on batch tables, the `time_attr` argument of the group window function must be an attribute of type `TIMESTAMP`.
+In batch mode, the `time_attr` argument of the group window function must be an attribute of type `TIMESTAMP`.

### Selecting Group Window Start and End Timestamps

4 changes: 2 additions & 2 deletions docs/content/docs/dev/table/sql/queries/window-join.md
@@ -23,9 +23,9 @@ under the License.
-->

# Window Join
-{{< label Streaming >}}
+{{< label Batch >}} {{< label Streaming >}}

-A window join adds the dimension of time into the join criteria themselves. In doing so, the window join joins the elements of two streams that share a common key and lie in the same window. The semantic of window join is same to the [DataStream window join]({{< ref "docs/dev/datastream/operators/joining" >}}#window-join)
+A window join adds the dimension of time into the join criteria themselves. In doing so, the window join joins the elements of two streams that share a common key and are in the same window. The semantic of window join is same to the [DataStream window join]({{< ref "docs/dev/datastream/operators/joining" >}}#window-join)

For streaming queries, unlike other joins on continuous tables, window join does not emit intermediate results but only emits final results at the end of the window. Moreover, window join purge all intermediate state when no longer needed.

2 changes: 1 addition & 1 deletion docs/content/docs/dev/table/sql/queries/window-topn.md
@@ -23,7 +23,7 @@ under the License.
-->

# Window Top-N
-{{< label Streaming >}}
+{{< label Batch >}} {{< label Streaming >}}

Window Top-N is a special [Top-N]({{< ref "docs/dev/table/sql/queries/topn" >}}) which returns the N smallest or largest values for each window and other partitioned keys.
