fix(expressions): Use compound names in function identifiers #23

wackywendell · 2023-08-07T21:57:11Z

This change addresses a bug where the type signatures in substrait proto files were lost in decoding -> encoding, and only simple names were used.

To address this, I updated the expression builders to always use compound names (e.g. add:i64_i64) and not just simple names (add). The spec currently requires compound names for functions with multiple impls (and may soon require them always), and the signatures were lost in the conversion from protobuf to internal formats. That is the first commit, with tests updated in the second commit.

Secondly, I changed the code so that if the function is not found with the given type signature, a "not found" error is returned, just as it is if the function is not found. That's the third commit. I'm less attached to this, but it helped in testing because I found a number of tests that "passed" incorrectly as (with the change above) they were referring to non-existent functions.

For that second part - I'm not sure it's necessary; we can drop that commit if you prefer. Alternatively - it might be worth removing the NewCustom*Func functions altogether.

zeroshade

I'm not sure about this change, the intent for allowing usage of the simple names was so that the builders can be used to auto resolve the compound function names based on the passed in argument types. You can see that resolution issue here when you had to change the types of the outputs. Notice that in the expressions_tests the output is no longer fp32. We want the type resolution to still work.

As for the NewCustom*Func functions, those are necessary for the cases where users utilize custom URNs rather than valid URIs. This is why we don't error out when we can't find the function. You can see more discussion of it here: #14 (comment)

zeroshade · 2023-08-09T14:54:50Z

expr/expressions_test.go

-func ExampleExpression_scalarFunction() {
+func TestExampleExpressionScalarFunction(t *testing.T) {


Why make this change? The intent was for this to show up as an example in the docs on pkg.go.dev, by making this change the test will no longer show up as a testable example.

Ah! That makes sense. I wanted to use assert as used elsewhere; I'm not used to example's like that, so I didn't realize the regression. Thanks for pointing it out!

zeroshade · 2023-08-09T14:55:53Z

expr/expressions_test.go

-	// print string represention of the expression
-	fmt.Println(fromProto)
-	// print the string representation of our
-	// manually constructed expression
+	// Check the string represention of the expression
+	assert.Equal(t, "add:i8_i8(.field(0), fp64(10)) => i8", fromProto.String())
+	// Check our manually constructed expression
+	assert.Equal(t, "add:i8_i8(.field(0) => i8, fp64(10)) => i8", ex.String())
 	fmt.Println(ex)

-	// verify that the Equals methods work recursively
-	fmt.Println(ex.Equals(fromProto))
+	// verify that the two are equal
+	assert.True(t, ex.Equals(fromProto))
 	// confirm our manually constructed expression is the same
 	// as the one we got from protojson
-	fmt.Println(pb.Equal(&exprProto, toProto))
-
-	// Output:
-	// add(.field(0), fp64(10)) => i32
-	// add(.field(0) => i32, fp64(10)) => i32
-	// true
-	// true


Same here, we want the // Output: and such so that this shows up as a testable example in the documentation to make it easy for consumers to see how to work with the library.

Good point, as above 👍

wackywendell · 2023-08-10T14:19:59Z

@zeroshade - great points! Yes, I had some uncertainty going into this, and you've confirmed it.

Perhaps we should start by clarifying goals / requirements, so that we can get an overall improvement?

I see a few that we've gathered:

The protobuf output from the builder should use compound names for functions in the substrait extensions. This is required by the spec for functions with multiple impls, and now for single impls as well. (This was the intent of this PR)
Converting a protobuf input to substrait internal representation should interpret compound function names into the correct SubstraitFunctionVariant with the appropriate types. (Also intended by this PR)
The builder should be able to accept function names without compound types, and determine the correct function variant. (Regression in this PR)
Users should be able to add custom functions specified with custom URNs, the NewCustom*Func functions. (Regression in this PR)
Examples should show up in the Godocs. (Regression in this PR).

So you pointed out 3-5, and that makes sense to me; I can work on that. I might do a bit of looking at the spec and ask about (3) on Slack, as the intention makes sense in a vague way to me, but I don't understand the specifics, and I'd like to.

For 1-2, do those make sense to you as reasonable goals? Also - please let me know if any of the above is incorrect or could be restated better, I'm new to this repo and trying to figure it out. Thanks for your help, I appreciate it!

zeroshade · 2023-08-14T21:02:06Z

Hey @wackywendell:

1 and 2 make perfect sense as reasonable goals for me. I believe that number 2 should already occur (to my knowledge compound function names should already get interpretted into the correct variant, if not then I must have missed a test somewhere and I'm glad you're fixing it!). Did you find a case where number 2 isn't currently true in the code?

wackywendell · 2023-08-14T22:03:09Z

For (2), it looks like that works most of the time, but has some issues with functions that are (a) specified with compound names, and (b) not found in the YAML, so they end up getting created with the NewCustomScalarFunc machinery.

See example here, where an ID with a compound name add:i32_i32 gets translated to simple name add:i32_i32 and type signature empty, leading to a full compound name add:i32_i32:.

This becomes relevant because if you fix (1), then you get problems related to (2), where parsing a protobuf file into the substrait-go representation and then back to protobuf fails.

🤔 I think this might be solved by changing the NewScalarFuncVariant function to parse compound names into appropriate arguments, if the name is found to be compound?

zeroshade · 2023-08-15T15:43:27Z

🤔 I think this might be solved by changing the NewScalarFuncVariant function to parse compound names into appropriate arguments, if the name is found to be compound?

Without digging too much into it myself at the moment, that sounds like a reasonable approach. Wanna try it out and see if that fixes the issue?

wackywendell · 2023-08-15T15:52:10Z

Yes - I'm giving it a try. Thanks! Let me see what code I can come up with and get back to you 🙇

wackywendell · 2023-08-17T18:35:27Z

Closing in favor of #24.

wackywendell added 3 commits August 7, 2023 17:41

Use compound names in function IDs

810e124

Update tests

5f35561

Function must be found in extensions

8f59b2d

vbarua requested a review from zeroshade August 8, 2023 20:31

zeroshade reviewed Aug 9, 2023

View reviewed changes

wackywendell mentioned this pull request Aug 15, 2023

fix(expressions): Use compound names in protobufs #24

Merged

wackywendell closed this Aug 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(expressions): Use compound names in function identifiers #23

fix(expressions): Use compound names in function identifiers #23

wackywendell commented Aug 7, 2023

zeroshade left a comment

zeroshade Aug 9, 2023

wackywendell Aug 10, 2023

zeroshade Aug 9, 2023

wackywendell Aug 10, 2023

wackywendell commented Aug 10, 2023

zeroshade commented Aug 14, 2023

wackywendell commented Aug 14, 2023

zeroshade commented Aug 15, 2023

wackywendell commented Aug 15, 2023

wackywendell commented Aug 17, 2023

		func ExampleExpression_scalarFunction() {
		func TestExampleExpressionScalarFunction(t *testing.T) {

fix(expressions): Use compound names in function identifiers #23

fix(expressions): Use compound names in function identifiers #23

Conversation

wackywendell commented Aug 7, 2023

zeroshade left a comment

Choose a reason for hiding this comment

zeroshade Aug 9, 2023

Choose a reason for hiding this comment

wackywendell Aug 10, 2023

Choose a reason for hiding this comment

zeroshade Aug 9, 2023

Choose a reason for hiding this comment

wackywendell Aug 10, 2023

Choose a reason for hiding this comment

wackywendell commented Aug 10, 2023

zeroshade commented Aug 14, 2023

wackywendell commented Aug 14, 2023

zeroshade commented Aug 15, 2023

wackywendell commented Aug 15, 2023

wackywendell commented Aug 17, 2023