Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: native map/array constructors #4232

Merged
merged 3 commits into from
Jan 17, 2020
Merged

Conversation

agavra
Copy link
Contributor

@agavra agavra commented Jan 7, 2020

BREAKING CHANGE: the old syntax for creating arrays using the
builtin AS_ARRAY function will no longer work

Description

Follow-up on #4120 - this PR creates maps and arrays natively instead of going through the UDF path. This has the benefit of custom syntax and stricter type checking.

Testing done

  • QTT
  • Unit Tests

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@agavra agavra requested a review from a team as a code owner January 7, 2020 00:19
Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one @agavra

There are some comments inline, plus:

  1. Syntax wise, all our other constructors use standard brackets. I think array should too. I know Postgres supports both ARRAY[] and ARRAY(). But IMHO I think it's cleaner to leave [] for array access, not construction.
  2. This PR would benefit from RQTT test cases for INSERT VALUES. I'm confident this will flag up missing functionality in the coercer code.
  3. At the moment it looks like its not possible to have an empty arrays or maps, or arrays and maps with only null elements / values. I think this is fine for this initial PR, but this is a hole we'll want to fix quickly. We should raise, (and link), suitable issues to track this and, ideally, get this done soon. One possible solution is to support casting elements, e.g. Postgres supports ARRAY(null)::int[], ARRAY()::int[], which casts all the elements in the array to INT and the array type to ARRAY<INT>.
  4. You've dropped as_array. Is there the option to keep it until just before we go version 1.0? We could mark it as deprecated in the UDFs description and remove it from the docs. We could even log a warning when a query starts that uses it. Thereby giving people time to move over.

.collect(Collectors.toList());

if (sqlTypes.size() == 0) {
throw new KsqlException("Cannot extract type from array of nulls!");
Copy link
Contributor

@big-andy-coates big-andy-coates Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again with the exclamation marks ;)

If this is customer facing, which as a KsqlException I'm assuming it is, then maybe something like:

Can not construct array with all NULL elements.

You could even link to the github issue that will add support for empty/all-null arrays so that people can upvote it.

Can the user work around this by casting one of the NULLs to a type? Maybe include this in the error message and add a QTT test case.

What about for INSERT VALUES where the required array type is known.... can we support all nulls then? How about empty arrays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the user work around this by casting one of the NULLs to a type? Maybe include this in the error message and add a QTT test case.

Yes and done.

What about for INSERT VALUES where the required array type is known.... can we support all nulls then? How about empty arrays?

This is not a trivial change. I've thought about this in the context of various things (i.e. struct type inference) and it's out of scope for this PR, though technically possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we link in a github issue to track this second part please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


if (sqlTypes.size() != 1) {
throw new KsqlException(
"Invalid map expression! All values must be of the same type. " + exp);
Copy link
Contributor

@big-andy-coates big-andy-coates Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with ARRAYs, it might help the user if the error included the types of the values, so that they could more easily see where the issue is., e.g. a UDF that returns a type different to what they thought.

Maybe add details of how the user could use CAST to fix the issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a better error message, but I don't think we need to tell the user to CAST as that may not be the right solution for them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue with suggesting CAST as a possible solution? We're not forcing them to use it...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough

}

if (sqlTypes.get(0) == null) {
throw new KsqlException("Maps do not accept null values!");
Copy link
Contributor

@big-andy-coates big-andy-coates Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like:

Can not construct MAP with all NULL values.

You could even link to the github issue that will add support for empty/all-null arrays so that people can upvote it.

Can the user work around this by casting one of the NULLs to a type? Maybe include this in the error message and add a QTT test case.

What about for INSERT VALUES where the required map type is known.... can we support all nulls then? How about empty maps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Copy link
Contributor

@big-andy-coates big-andy-coates Jan 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second part you've covered above. Lets just make sure its captured in an issue, maybe linking to it from the error.

Still think the error message could be improve to:

| Can not construct MAP with all NULL values.

And add the trick of using CAST.

@agavra
Copy link
Contributor Author

agavra commented Jan 7, 2020

Thanks for the review @big-andy-coates! I agree with your comments about testing and casting and I'll add those in.

I don't think we should keep AS_ARRAY because anyone that is upgrading to 0.7 will already have lots of breaking changes to make and this one isn't hard to change.

For the array constructor syntax, I think I'd prefer to keep it as is (which is the sql99 standard: http://web.cecs.pdx.edu/~len/sql1999.pdf):

<array value list constructor> ::= ARRAY <left bracket or trigraph> <array element list> <right bracket or trigraph>

A trigraph is ??(, which very few sql implementations seem to support. All examples I've found online of sql systems use the ARRAY[] syntax.

@blueedgenick
Copy link
Contributor

blueedgenick commented Jan 7, 2020 via email

@agavra agavra requested review from big-andy-coates and a team January 7, 2020 21:14
@agavra agavra force-pushed the maps_arrays branch 2 times, most recently from a22eff5 to 841a14d Compare January 13, 2020 18:59
.collect(Collectors.toList());

if (sqlTypes.size() == 0) {
throw new KsqlException("Cannot extract type from array of nulls!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we link in a github issue to track this second part please?

throw new KsqlException("Only STRING keys are supported in maps but got: " + gkeyTypes);
}

final List<SqlType> sqlTypes = exp.getMap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: can we call this valueTypes and the other keyTypes to make it clear?

}

if (sqlTypes.get(0) == null) {
throw new KsqlException("Maps do not accept null values!");
Copy link
Contributor

@big-andy-coates big-andy-coates Jan 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second part you've covered above. Lets just make sure its captured in an issue, maybe linking to it from the error.

Still think the error message could be improve to:

| Can not construct MAP with all NULL values.

And add the trick of using CAST.

Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agavra for addressing my suggestions. Agree with everything else. Couple of points / nits below.

Andy

@@ -7,7 +7,7 @@
"name": "construct a list from two elements",
"statements": [
"CREATE STREAM TEST (a INT, b INT) WITH (kafka_topic='test_topic', value_format='JSON');",
"CREATE STREAM OUTPUT AS SELECT as_array(a, b, 3) as l FROM TEST;"
"CREATE STREAM OUTPUT AS SELECT ARRAY[a, b, 3] as l FROM TEST;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want to rename this test file... it's currently asarray.json and AS_ARRAY no longer exists ;) . How about arrays.json :p

@@ -3,6 +3,19 @@
"Tests covering map creation"
],
"tests": [
{
"name": "create map from named tuples",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise, consider renaming this test file.

Comment on lines 237 to 248
{
"name": "should handle arbitrary nested expressions",
"statements": [
"CREATE STREAM TEST (I INT, A ARRAY<ARRAY<INT>>) WITH (kafka_topic='test_topic', value_format='JSON');",
"INSERT INTO TEST (I, A) VALUES (-1, ARRAY[ARRAY[1]]);"
],
"inputs": [
],
"outputs": [
{"topic": "test_topic", "key": null, "value": {"I": -1, "A": [[1]]}}
]
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we're fully testing the coercer with this test as the 1 from ARRAY[ARRAY[1]] will default to a IntegerLiteral and the type is ARRAY<ARRAY<INT>>.

What happens if you make it ARRAY<ARRAY<BIGINT>>?

Same goes for the map test below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

BREAKING CHANGE: the old syntax for creating arrays using the builtin
AS_ARRAY function will no longer work
@agavra agavra merged commit 3ecfaad into confluentinc:master Jan 17, 2020
@agavra agavra deleted the maps_arrays branch January 17, 2020 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants