Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add varchar support for min/max_by #1846

Closed
wants to merge 1 commit into from

Conversation

xiaoxmeng
Copy link
Contributor

Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 21, 2022
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

3 similar comments
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova mbasmanova self-requested a review June 23, 2022 15:13
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoxmeng A couple of initial questions. I'll review the PR later.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoxmeng The implementation looks good, but there is a lot of copy paste. It feels like it should be possible to reduce that using templates.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoxmeng The changes in the min_by/max_by aggregate function look good. These changes allow the function to take any type as 'value' including arrays, maps and structs. In a follow-up PR it would be nice to update the signature and add tests to confirm this use case works.

I'll read the test next.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoxmeng Some comments for the tests. These are quite verbose and it is hard to tell what combinations are being testing. I wonder if there is a way to make this clearer.

struct {
const RowVectorPtr inputRowVector;
const std::string verifyDuckDbSql;
const bool expectedBuildFailure = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why build failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is verify that some data type we don't support it and expect an exception here when build velox operator plan.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

@xiaoxmeng format-check CI job is failing. Would you take a look?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoxmeng The aggregate function changes look good. There is a CircleCI failure that needs fixing. How do you feel about refactoring the test in a separate PR to make extending the test more readable?

@mbasmanova
Copy link
Contributor

@xiaoxmeng format-check CircleCI job is still failing. Would you run 'make format-fix' and update the PR?

@xiaoxmeng xiaoxmeng force-pushed the velox branch 4 times, most recently from f663c72 to f51a9d5 Compare July 8, 2022 21:47
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Thank you!

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Jul 9, 2022
Summary:
Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage.

Pull Request resolved: facebookincubator#1846

Reviewed By: mbasmanova

Differential Revision: D37325257

Pulled By: xiaoxmeng

fbshipit-source-id: 4fc99a5864e4ac5a70935884741aad334fe4a59a
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D37325257

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D37325257

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Jul 9, 2022
Summary:
Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage.

Pull Request resolved: facebookincubator#1846

Reviewed By: mbasmanova

Differential Revision: D37325257

Pulled By: xiaoxmeng

fbshipit-source-id: 288bb9285cf7ca50a32a55765223531709e59578
xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Jul 9, 2022
Summary:
Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage.

Pull Request resolved: facebookincubator#1846

Reviewed By: mbasmanova

Differential Revision: D37325257

Pulled By: xiaoxmeng

fbshipit-source-id: 18d3cfc5d96531522fdcb5158dccafe450cb6522
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D37325257

Summary:
Add varchar support for min/max_by with unit tests. The followup is to add e2e test coverage.

Pull Request resolved: facebookincubator#1846

Reviewed By: mbasmanova

Differential Revision: D37325257

Pulled By: xiaoxmeng

fbshipit-source-id: 63f3a5dc38d708d4915b76087b3037c80be51c97
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D37325257

marin-ma pushed a commit to marin-ma/velox-oap that referenced this pull request Dec 15, 2023
This patch adds more spark 3.3 unit tests to gluten. The excluded ones will be fixed in a following patch.
```
- Various partition value types 
- Various inferred partition value types 
- Various partition value types 
- Various inferred partition value types 
- SPARK-32908: maximum target error in percentile_approx 

- SPARK-36825, SPARK-36854: year-month/day-time intervals written and read as INT32/INT64 
- support batch reads for schema 
- SPARK-36182: read TimestampNTZ as TimestampLTZ 
- SPARK-36797: Union should resolve nested columns as top-level columns 
- SPARK-37371: UnionExec should support columnar if all children support columnar 
- SPARK-36280: Remove redundant aliases after RewritePredicateSubquery 
- SPARK-36182: read TimestampNTZ as TimestampLTZ 
- SPARK-39833: pushed filters with project without filter columns 
- SPARK-36825, SPARK-36854: year-month/day-time intervals written and read as INT32/INT64 
- support batch reads for schema 
- SPARK-36794: Ignore duplicated key when building relation for semi/anti hash join 
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants