-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add varchar support for min/max_by #1846
Conversation
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
3 similar comments
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoxmeng A couple of initial questions. I'll review the PR later.
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoxmeng The implementation looks good, but there is a lot of copy paste. It feels like it should be possible to reduce that using templates.
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoxmeng The changes in the min_by/max_by aggregate function look good. These changes allow the function to take any type as 'value' including arrays, maps and structs. In a follow-up PR it would be nice to update the signature and add tests to confirm this use case works.
I'll read the test next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoxmeng Some comments for the tests. These are quite verbose and it is hard to tell what combinations are being testing. I wonder if there is a way to make this clearer.
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
struct { | ||
const RowVectorPtr inputRowVector; | ||
const std::string verifyDuckDbSql; | ||
const bool expectedBuildFailure = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why build failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is verify that some data type we don't support it and expect an exception here when build velox operator plan.
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng format-check CI job is failing. Would you take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoxmeng The aggregate function changes look good. There is a CircleCI failure that needs fixing. How do you feel about refactoring the test in a separate PR to make extending the test more readable?
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
@xiaoxmeng format-check CircleCI job is still failing. Would you run 'make format-fix' and update the PR? |
f663c72
to
f51a9d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Thank you!
velox/functions/prestosql/aggregates/tests/MinMaxByAggregationTest.cpp
Outdated
Show resolved
Hide resolved
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage. Pull Request resolved: facebookincubator#1846 Reviewed By: mbasmanova Differential Revision: D37325257 Pulled By: xiaoxmeng fbshipit-source-id: 4fc99a5864e4ac5a70935884741aad334fe4a59a
This pull request was exported from Phabricator. Differential Revision: D37325257 |
This pull request was exported from Phabricator. Differential Revision: D37325257 |
Summary: Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage. Pull Request resolved: facebookincubator#1846 Reviewed By: mbasmanova Differential Revision: D37325257 Pulled By: xiaoxmeng fbshipit-source-id: 288bb9285cf7ca50a32a55765223531709e59578
Summary: Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage. Pull Request resolved: facebookincubator#1846 Reviewed By: mbasmanova Differential Revision: D37325257 Pulled By: xiaoxmeng fbshipit-source-id: 18d3cfc5d96531522fdcb5158dccafe450cb6522
This pull request was exported from Phabricator. Differential Revision: D37325257 |
Summary: Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage. Pull Request resolved: facebookincubator#1846 Reviewed By: mbasmanova Differential Revision: D37325257 Pulled By: xiaoxmeng fbshipit-source-id: 63f3a5dc38d708d4915b76087b3037c80be51c97
This pull request was exported from Phabricator. Differential Revision: D37325257 |
This patch adds more spark 3.3 unit tests to gluten. The excluded ones will be fixed in a following patch. ``` - Various partition value types - Various inferred partition value types - Various partition value types - Various inferred partition value types - SPARK-32908: maximum target error in percentile_approx - SPARK-36825, SPARK-36854: year-month/day-time intervals written and read as INT32/INT64 - support batch reads for schema - SPARK-36182: read TimestampNTZ as TimestampLTZ - SPARK-36797: Union should resolve nested columns as top-level columns - SPARK-37371: UnionExec should support columnar if all children support columnar - SPARK-36280: Remove redundant aliases after RewritePredicateSubquery - SPARK-36182: read TimestampNTZ as TimestampLTZ - SPARK-39833: pushed filters with project without filter columns - SPARK-36825, SPARK-36854: year-month/day-time intervals written and read as INT32/INT64 - support batch reads for schema - SPARK-36794: Ignore duplicated key when building relation for semi/anti hash join ```
Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage.