docs: add documentation for using lambda functions (confluentinc#7092)
* docs: add documentation for using lambda functions

* Adding lambda example docs

* Updates for reduce

* Review update

* Apply suggestions from code review

Co-authored-by: Jim Galasyn <jim.galasyn@confluent.io>

* Review updates - adding index cards

* language clean-up

Co-authored-by: Steven Zhang <stevenz@confluent.io>
Co-authored-by: Jim Galasyn <jim.galasyn@confluent.io>
3 people committed Mar 8, 2021
1 parent 170a3c0 commit 8f684da
Showing 6 changed files with 314 additions and 1 deletion.
7 changes: 6 additions & 1 deletion docs/concepts/index.md
Original file line number Diff line number Diff line change
@@ -74,7 +74,12 @@ Learn the core concepts that ksqlDB is built around.
<p class="card-body"><small>Connectors source and sink data from external systems.</small></p>
<span><a href="/concepts/connectors">Learn →</a></span>
</div>

<div class="card concepts">
<strong>Lambda Functions</strong>
<p class="card-body"><small>Lambda functions allow you to apply in-line functions without creating a full UDF.</small></p>
<span><a href="/concepts/lambda-functions">Learn →</a></span>
</div>

<div class="card concepts">
<a href="/overview/apache-kafka-primer"><strong>Apache Kafka primer</strong></a>
<p class="card-body"><small>None of this making sense? Take a step back and learn the basics of Kafka first.</small></p>
14 changes: 14 additions & 0 deletions docs/concepts/lambda-functions.md
@@ -0,0 +1,14 @@
---
layout: page
title: Lambda Functions
keywords: ksqldb, function, udf, lambda
---

# Lambda Functions

Use lambda functions, or "lambdas" for short, to express simple inline functions that can be applied to input values in various ways.
For example, you could apply a lambda function to each element of a collection, resulting in a transformed output collection.
Also, you can use lambdas to filter the elements of a collection, or reduce a collection to a single value.
The advantage of a lambda is that you can express user-defined functionality in a way that doesn’t require implementing a full [UDF](/how-to-guides/create-a-user-defined-function).

Learn how to use lambda functions in the [how-to guide](/how-to-guides/use-lambda-functions-in-udfs).
54 changes: 54 additions & 0 deletions docs/developer-guide/ksqldb-reference/scalar-functions.md
@@ -520,6 +520,60 @@ SLICE(col1, from, to)
Slices a list based on the supplied indices. The indices start at 1 and
include both endpoints.

## Invocation Functions

Apply lambda functions to collections.

### `TRANSFORM`

Since: 0.17.0

```sql
TRANSFORM(array, x => ...)

TRANSFORM(map, (k,v) => ..., (k,v) => ...)
```

Transform a collection by using a lambda function.

If the collection is an array, the lambda function must have one input argument.

If the collection is a map, two lambda functions must be provided, and both lambdas must have two arguments: a map entry key and a map entry value.
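
The arity rules can be sketched with a small Python analogue (illustrative only; `transform_array` and `transform_map` are hypothetical helpers, not ksqlDB code):

```python
# Hypothetical Python analogue of TRANSFORM's arity rules.
def transform_array(arr, f):
    # Array: one lambda, applied to each element.
    return [f(x) for x in arr]

def transform_map(m, key_fn, val_fn):
    # Map: two lambdas; each receives the entry's key and value.
    return {key_fn(k, v): val_fn(k, v) for k, v in m.items()}

print(transform_array([1, 2, 3], lambda x: x + 5))  # [6, 7, 8]
print(transform_map({"a": 1}, lambda k, v: k.upper(), lambda k, v: v * 10))  # {'A': 10}
```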

### `REDUCE`

Since: 0.17.0

```sql
REDUCE(array, state, (s, x) => ...)

REDUCE(map, state, (s, k, v) => ...)
```

Reduce a collection starting from an initial state.

If the collection is an array, the lambda function must have two input arguments.

If the collection is a map, the lambda function must have three input arguments.

If the state is `null`, the result is `null`.
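
These rules can be sketched with a Python analogue (illustrative only; `reduce_array` and `reduce_map` are hypothetical helpers, not ksqlDB code):

```python
# Hypothetical Python analogue of REDUCE's semantics.
def reduce_array(arr, state, f):
    if state is None:           # a null initial state yields null
        return None
    for x in arr:
        state = f(state, x)     # lambda takes (state, element)
    return state

def reduce_map(m, state, f):
    if state is None:
        return None
    for k, v in m.items():
        state = f(state, k, v)  # lambda takes (state, key, value)
    return state

print(reduce_array([2, 3, 4], 0, lambda s, x: s + x))  # 9
```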

### `FILTER`

Since: 0.17.0

```sql
FILTER(array, x => ...)

FILTER(map, (k,v) => ...)
```

Filter a collection with a lambda function.

If the collection is an array, the lambda function must have one input argument.

If the collection is a map, the lambda function must have two input arguments.
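
A Python analogue makes the two cases concrete (illustrative only; `filter_array` and `filter_map` are hypothetical helpers, not ksqlDB code):

```python
# Hypothetical Python analogue of FILTER's semantics.
def filter_array(arr, pred):
    return [x for x in arr if pred(x)]                 # one lambda per element

def filter_map(m, pred):
    return {k: v for k, v in m.items() if pred(k, v)}  # lambda gets key and value

print(filter_array([1, -2, 3], lambda x: x > 0))          # [1, 3]
print(filter_map({"a": 0, "b": 2}, lambda k, v: v != 0))  # {'b': 2}
```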

## Strings

### `CHR`
6 changes: 6 additions & 0 deletions docs/how-to-guides/index.md
@@ -69,6 +69,12 @@ Follow compact lessons that help you work with common ksqlDB functionality.
</div>

<div class="cards">
<div class="card how-to-guide">
<strong>Transforming columns with structured data</strong>
<p class="card-body"><small>Transform columns of structured data without user-defined functions.</small></p>
<span><a href="/how-to-guides/use-lambda-functions">Learn →</a></span>
</div>

<div class="card how-to-guide contribute">
<a href="https://github.com/confluentinc/ksql"><strong>Help us write another?</strong></a>
<p class="card-body"><small>We're always looking for more guides. Just send a pull request!</small></p>
232 changes: 232 additions & 0 deletions docs/how-to-guides/use-lambda-functions.md
@@ -0,0 +1,232 @@
---
layout: page
title: How to transform columns with structured data
tagline: Transform columns of structured data without user-defined functions.
description: ksqlDB can compose existing functions to create new expressions over structured data
keywords: function, lambda, aggregation, user-defined function, ksqlDB
---
# Use lambda functions

## Context

You want to transform a column with structured data in a particular way, but no
built-in function suits your needs, and you're unable to implement and deploy a
user-defined function. ksqlDB can compose existing functions to create
new expressions over structured data. These are called lambda functions.

## In action
```sql
CREATE STREAM stream1 (
id INT,
lambda_map MAP<STRING, INTEGER>
) WITH (
kafka_topic = 'stream1',
partitions = 1,
value_format = 'avro'
);

CREATE STREAM output AS
SELECT id,
TRANSFORM(lambda_map, (k, v) => UCASE(k), (k, v) => v + 5)
FROM stream1
EMIT CHANGES;
```

## Syntax

The arguments for the lambda function are separated from the body of the lambda with the lambda operator, `=>`.

When there are two or more arguments, you must enclose the arguments with parentheses. Parentheses are optional for lambda functions with one argument.

Currently, ksqlDB supports up to three arguments in a single lambda function.

```sql
x => x + 5

(x,y) => x - y

(x,y,z) => z AND x OR y
```
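
For readers who know Python, the three lambdas above correspond roughly to the following (an informal analogy; ksqlDB lambdas are SQL expressions, and SQL's three-valued NULL logic has no direct Python equivalent):

```python
one_arg    = lambda x: x + 5
two_args   = lambda x, y: x - y
three_args = lambda x, y, z: (z and x) or y  # AND binds tighter than OR, as in SQL

print(one_arg(10))                    # 15
print(two_args(7, 2))                 # 5
print(three_args(True, False, True))  # True
```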

## Invocation UDFs

Lambda functions must be used inside designated invocation functions. The following invocation functions are available:

- [TRANSFORM](/developer-guide/ksqldb-reference/scalar-functions#transform)
- [REDUCE](/developer-guide/ksqldb-reference/scalar-functions#reduce)
- [FILTER](/developer-guide/ksqldb-reference/scalar-functions#filter)

## Create a lambda-compatible stream
Invocation functions require either a map or array input. The following example creates a stream
with a column type of `MAP<STRING, INTEGER>`.
```sql
CREATE STREAM stream1 (
id INT,
lambda_map MAP<STRING, INTEGER>
) WITH (
kafka_topic = 'stream1',
partitions = 1,
value_format = 'avro'
);
```

## Apply a lambda invocation function
A lambda invocation function is a [scalar UDF](/developer-guide/ksqldb-reference/scalar-functions), and you use it like other scalar functions.

The following example lambda function transforms both the key and the value of a map and produces a new map. The key is transformed
into an uppercase string using the built-in UDF `UCASE`, and the value is transformed through addition. The order of the variables
is important: the first item in the arguments list, named `k` in this example, is treated as the key, and the second,
named `v` in this example, is treated as the value. Keep this in mind if your map has different key and value types.
Note that `TRANSFORM` on a map requires two lambda functions, while `TRANSFORM` on an array requires one.
```sql
CREATE STREAM output AS
SELECT id,
TRANSFORM(lambda_map, (k, v) => UCASE(k), (k, v) => v + 5)
FROM stream1;
```
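
To see what the expression computes, the same key/value transform can be traced in plain Python (an informal analogue, not ksqlDB code):

```python
lambda_map = {"hello": 15, "goodbye": -5}

# (k, v) => UCASE(k) builds the new key; (k, v) => v + 5 builds the new value.
result = {k.upper(): v + 5 for k, v in lambda_map.items()}
print(result)  # {'HELLO': 20, 'GOODBYE': 0}
```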

Insert some values into `stream1`.
```sql
INSERT INTO stream1 (
id, lambda_map
) VALUES (
3, MAP('hello' := 15, 'goodbye' := -5)
);
```

Query the output.
```sql
SELECT * FROM output AS final_output EMIT CHANGES;
```

Your output should resemble:
```sql
+------------------------------+------------------------------+
|id |final_output |
+------------------------------+------------------------------+
|3                             |{HELLO: 20, GOODBYE: 0}      |
```

## Use a reduce lambda invocation function
The following example creates a stream with a column type `ARRAY<INTEGER>` and applies the `reduce` lambda
invocation function.
```sql
CREATE STREAM stream1 (
id INT,
lambda_arr ARRAY<INTEGER>
) WITH (
kafka_topic = 'stream1',
partitions = 1,
value_format = 'avro'
);

CREATE STREAM output AS
SELECT id,
REDUCE(lambda_arr, 2, (s, x) => ceil(x/s))
FROM stream1
EMIT CHANGES;
```
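
The fold can be traced step by step in Python (an informal analogue; it assumes real division inside `ceil`, which matches the documented result):

```python
import math

# REDUCE(lambda_arr, 2, (s, x) => ceil(x/s)): the state starts at 2 and is
# replaced by ceil(x / state) for each element in turn: 1, 3, 2, 3.
state = 2
for x in [2, 3, 4, 5]:
    state = math.ceil(x / state)
print(state)  # 3
```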
Insert some values into `stream1`.
```sql
INSERT INTO stream1 (
id, lambda_arr
) VALUES (
1, ARRAY[2, 3, 4, 5]
);
```

Query the output.
```sql
SELECT * FROM output AS final_output EMIT CHANGES;
```

You should see something similar to:
```sql
+------------------------------+------------------------------+
|id |final_output |
+------------------------------+------------------------------+
|1                             |3                             |
```

## Use a filter lambda invocation function
The following example creates a stream with a column of type `MAP<STRING, INTEGER>` and applies the `filter` lambda
invocation function.
```sql
CREATE STREAM stream1 (
id INT,
lambda_map MAP<STRING, INTEGER>
) WITH (
kafka_topic = 'stream1',
partitions = 1,
value_format = 'avro'
);

CREATE STREAM output AS
SELECT id,
FILTER(lambda_map, (k, v) => instr(k, 'name') > 0 AND v != 0)
FROM stream1
EMIT CHANGES;
```
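
The predicate can be traced in Python (an informal analogue; `instr(k, 'name') > 0` corresponds to a substring check):

```python
lambda_map = {"first name": 15, "middle": 25, "last name": 0, "alt name": 33}

# Keep entries whose key contains 'name' and whose value is nonzero.
result = {k: v for k, v in lambda_map.items() if "name" in k and v != 0}
print(result)  # {'first name': 15, 'alt name': 33}
```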
Insert some values into `stream1`.
```sql
INSERT INTO stream1 (
id, lambda_map
) VALUES (
1, MAP('first name' := 15, 'middle' := 25, 'last name' := 0, 'alt name' := 33)
);
```

Query the output.
```sql
SELECT * FROM output AS final_output EMIT CHANGES;
```

Your output should resemble:
```sql
+------------------------------+-----------------------------------------------+
|id |final_output |
+------------------------------+-----------------------------------------------+
|1 |{first name: 15, alt name: 33} |
```

## Advanced lambda use cases
The following example creates a stream with a column of type `MAP<STRING, ARRAY<DECIMAL(3,2)>>` and applies the `transform`
lambda invocation function with a nested `transform` lambda invocation function.
```sql
CREATE STREAM stream1 (
id INT,
lambda_map MAP<STRING, ARRAY<DECIMAL(3,2)>>
) WITH (
kafka_topic = 'stream1',
partitions = 1,
value_format = 'avro'
);

CREATE STREAM output AS
SELECT id,
TRANSFORM(lambda_map, (k, v) => concat(k, '_new'), (k, v) => transform(v, x => round(x)))
FROM stream1
EMIT CHANGES;
```
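
The nested transform can be traced in Python (an informal analogue; Python's `round()` differs from ksqlDB's `ROUND` at exact .5 values, but none occur in this data):

```python
# Outer transform renames each key; inner transform rounds each array element.
lambda_map = {"Mary": [1.23, 3.65, 8.45], "Jose": [5.23, 1.65]}
result = {k + "_new": [round(x) for x in v] for k, v in lambda_map.items()}
print(result)  # {'Mary_new': [1, 4, 8], 'Jose_new': [5, 2]}
```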
Insert some values into `stream1`.
```sql
INSERT INTO stream1 (
id, lambda_map
) VALUES (
1, MAP('Mary' := ARRAY[1.23, 3.65, 8.45], 'Jose' := ARRAY[5.23, 1.65])
);
```

Query the output.
```sql
SELECT * FROM output AS final_output EMIT CHANGES;
```

Your output should resemble:
```sql
+------------------------------+----------------------------------------------------------+
|id |final_output |
+------------------------------+----------------------------------------------------------+
|1 |{Mary_new: [1, 4, 8], Jose_new: [5, 2]} |
```
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -46,6 +46,7 @@ nav:
- Time and Windows: concepts/time-and-windows-in-ksqldb-queries.md
- User-defined functions: concepts/functions.md
- Connectors: concepts/connectors.md
- Lambda Functions: concepts/lambda-functions.md
- Apache Kafka primer: concepts/apache-kafka-primer.md
- How-to guides:
- Synopsis: how-to-guides/index.md
@@ -58,6 +59,7 @@ nav:
- Use a custom timestamp column: how-to-guides/use-a-custom-timestamp-column.md
- Test an application: how-to-guides/test-an-app.md
- Substitute variables: how-to-guides/substitute-variables.md
- Transforming columns with structured data: how-to-guides/use-lambda-functions.md
- Tutorials:
- Synopsis: tutorials/index.md
- Materialized cache: tutorials/materialized.md
