fix(expressions): Use compound names in protobufs #24

Merged
merged 6 commits into from
Aug 17, 2023

wackywendell
Contributor

This PR finds and fixes a number of related issues:

  1. Adding multiple functions to the extension registry could insert the same extension file multiple times
  2. Unrecognized compound names from protobufs were parsed as simple names
  3. Compound names were not used for the protobuf output

This also updates the tests to match: in particular, every test with protobuf output now uses compound names.

I made some choices along the way for how to make this work; I'll comment inline in this PR.

Supersedes #23.

@@ -60,7 +60,7 @@ func ExampleExpression_scalarFunction() {
 "extensionFunction": {
 "extensionUriReference": 1,
 "functionAnchor": 2,
-"name": "add"
+"name": "add:i32_i32"
Contributor Author

The spec now requires compound names in the protobuf. Simple names are still accepted when parsing a protobuf, but a round trip turns them back into compound names, hence the update here.
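
For readers unfamiliar with the naming scheme, here is a minimal sketch of how a compound name such as `add:i32_i32` relates to the simple name `add`. The helpers `compoundName` and `simpleName` are hypothetical illustrations, not substrait-go API; the sketch assumes the short argument type names are joined with `_` after a `:`.

```go
package main

import (
	"fmt"
	"strings"
)

// compoundName is a hypothetical helper: it joins a simple function name
// with the short names of its argument types.
func compoundName(name string, argTypes []string) string {
	return name + ":" + strings.Join(argTypes, "_")
}

// simpleName is a hypothetical helper: it drops everything from the first
// ":" onward, so a simple name passes through unchanged.
func simpleName(compound string) string {
	name, _, _ := strings.Cut(compound, ":")
	return name
}

func main() {
	fmt.Println(compoundName("add", []string{"i32", "i32"})) // add:i32_i32
	fmt.Println(simpleName("add:i32_i32"))                    // add
	fmt.Println(simpleName("add"))                            // add
}
```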

@wackywendell wackywendell changed the title fix(expressions): Use compound names in function identifiers fix(expressions): Use compound names in protobufs Aug 15, 2023
@@ -142,7 +142,6 @@ func BoundFromProto(b *proto.Expression_WindowFunction_Bound) Bound {

 type ScalarFunction struct {
 funcRef uint32
-id extensions.ID
Contributor Author

The name and URI are already in the declaration, *extensions.ScalarFunctionVariant, so I removed the duplicate field to keep the two from getting out of sync.

Comment on lines +259 to +260
func (s *ScalarFunction) Name() string { return s.declaration.Name() }
func (s *ScalarFunction) CompoundName() string { return s.declaration.CompoundName() }
Contributor Author

I switched Name() to return the "simple" name (e.g. add) and CompoundName() to return the compound name (e.g. add:i32_i32), to keep the naming consistent with ScalarFunctionVariant. This wasn't strictly needed here; it just seemed like a relevant minor improvement.

-func (s *ScalarFunction) ID() extensions.ID { return s.id }
+func (s *ScalarFunction) Name() string { return s.declaration.Name() }
+func (s *ScalarFunction) CompoundName() string { return s.declaration.CompoundName() }
+func (s *ScalarFunction) ID() extensions.ID { return s.declaration.ID() }
Contributor Author

ID is now taken from the variant, which should always be at least as populated as the ScalarFunction.

Comment on lines +418 to +422
for k, v := range e.uris {
if v == uri {
return k, nil
}
}
Contributor Author

GetFuncAnchor() calls this function, so using the same URI for two different functions used to add it to the URI list twice. The loop above deduplicates the list, and I renamed the function to addOrGetURI to reflect that it may return a pre-existing index.
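
A minimal sketch of the add-or-get pattern described here, assuming an anchor-to-URI map like the one the quoted loop iterates over. The `uriSet` type and the way a fresh anchor is chosen are assumptions made for illustration; the real function also returns an error, which is omitted.

```go
package main

import "fmt"

// uriSet is a hypothetical stand-in for the registry's anchor -> URI map.
type uriSet struct {
	uris map[uint32]string
}

// addOrGetURI returns the anchor already assigned to uri if there is one,
// and otherwise stores uri under a new anchor.
func (s *uriSet) addOrGetURI(uri string) uint32 {
	for anchor, existing := range s.uris {
		if existing == uri {
			return anchor // reuse the pre-existing anchor
		}
	}
	// Assumption: anchors are only ever assigned here, so they run 1..n
	// and len+1 is guaranteed to be unused.
	anchor := uint32(len(s.uris)) + 1
	s.uris[anchor] = uri
	return anchor
}

func main() {
	s := &uriSet{uris: map[uint32]string{}}
	first := s.addOrGetURI("functions_arithmetic.yaml")
	again := s.addOrGetURI("functions_arithmetic.yaml")
	fmt.Println(first == again, len(s.uris))
}
```

The linear scan keeps the sketch close to the quoted loop; a reverse map from URI to anchor would avoid the scan if the list grew large.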

Comment on lines 86 to 90
// When parsing a function variant from a string, we may know the string form of
// an argument, but nothing else. This represents that type of argument.
type unknownArgType struct {
arg string
}
Contributor Author

This was useful to populate a ScalarFuncVariant from only a name, but it's a bit hacky. Suggestions welcome if you have better ideas.

Contributor

We already have a parser for the type strings; why not just parse the arg type strings into actual Type instances?

Contributor Author

That sounds like it would make sense 😅 . I looked for it a bit, but I didn't find it; can you point me to it?

Contributor

Contributor Author

Ah, thanks! 🤦 Now that I'm looking for it, I see it's already used in this very file 😓
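
To make the suggestion in this thread concrete, here is a rough sketch of the first step: splitting a compound name into the simple name and the individual argument type strings, each of which could then be handed to the existing type-string parser. `splitCompound` is a hypothetical helper; the parser itself is not shown because its name and signature are not given in this conversation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCompound is a hypothetical helper that separates a compound name
// into its simple name and argument type strings. A name without ":" is
// treated as a simple name with no known arguments.
func splitCompound(compound string) (name string, args []string) {
	name, sig, found := strings.Cut(compound, ":")
	if !found || sig == "" {
		return name, nil
	}
	return name, strings.Split(sig, "_")
}

func main() {
	name, args := splitCompound("add:i32_i32")
	fmt.Println(name, args) // add [i32 i32]
}
```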

@@ -101,13 +101,14 @@ func TestBuildEmitOutOfRangePlan(t *testing.T) {
}

func checkRoundTrip(t *testing.T, expectedJSON string, p *plan.Plan) {
t.Helper()
Contributor Author

Not required, but it was useful when debugging a test.

@@ -1004,6 +1005,159 @@ func TestSortRelationErrors(t *testing.T) {
assert.ErrorContains(t, err, "output mapping index out of range")
}

func TestProjectExpressions(t *testing.T) {
Contributor Author

New test that round-trips some function expressions. It fails without this PR: the URI gets duplicated, and simple function names are emitted instead of compound ones.

Comment on lines +259 to +260
// Sort extensions by the anchor for consistent output
sort.Slice(uris, func(i, j int) bool { return uris[i].ExtensionUriAnchor < uris[j].ExtensionUriAnchor })
Contributor Author

This is not strictly necessary, but it seemed nicer: the output is consistent, it is ordered by anchor, and it makes testing easier because it's deterministic.
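
As background on why the sort matters: Go does not guarantee map iteration order, so a slice built by ranging over a map can come out in a different order on each run. The sketch below shows the same `sort.Slice` approach on a hypothetical `uriDecl` type; only the `ExtensionUriAnchor` field name is taken from the snippet above, the rest is assumed for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// uriDecl is a hypothetical stand-in for the generated protobuf message.
type uriDecl struct {
	ExtensionUriAnchor uint32
	Uri                string
}

// sortedDecls flattens an anchor -> URI map into a slice ordered by anchor,
// so repeated calls always produce the same output.
func sortedDecls(m map[uint32]string) []uriDecl {
	decls := make([]uriDecl, 0, len(m))
	for anchor, uri := range m {
		decls = append(decls, uriDecl{ExtensionUriAnchor: anchor, Uri: uri})
	}
	sort.Slice(decls, func(i, j int) bool {
		return decls[i].ExtensionUriAnchor < decls[j].ExtensionUriAnchor
	})
	return decls
}

func main() {
	for _, d := range sortedDecls(map[uint32]string{2: "b.yaml", 1: "a.yaml"}) {
		fmt.Printf("%d %s\n", d.ExtensionUriAnchor, d.Uri)
	}
}
```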

Contributor

@zeroshade zeroshade left a comment

This all seems great to me, along with the tests. Thanks so much!

@zeroshade zeroshade merged commit f120601 into substrait-io:main Aug 17, 2023
5 checks passed