Implement DataFrame nunique #1137

Merged on Apr 9, 2020 (5 commits)

hekaisheng (Contributor) commented Apr 3, 2020

What do these changes do?

This PR implements df.nunique. The current approach is simple: record all unique values, then count them at the aggregation stage. It should be optimized for cases where the number of unique values is too large.
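
As an illustration only, the approach described above can be sketched in plain pandas. This is not the code in mars/dataframe/reduction/nunique.py; the function names `map_chunk` and `aggregate` are hypothetical, and the sketch assumes missing values are dropped, matching the pandas default `dropna=True`.

```python
import pandas as pd


def map_chunk(chunk: pd.DataFrame) -> dict:
    """Map stage (hypothetical): record the unique values of each column in one chunk."""
    # Assumption: missing values are dropped, as in pandas' default nunique(dropna=True).
    return {col: chunk[col].dropna().unique() for col in chunk.columns}


def aggregate(partials: list) -> pd.Series:
    """Aggregation stage (hypothetical): merge per-chunk uniques and count them."""
    counts = {}
    for col in partials[0]:
        # Concatenate the unique values recorded by every chunk for this column.
        merged = pd.concat([pd.Series(p[col]) for p in partials], ignore_index=True)
        # A value seen in several chunks appears several times here,
        # so the final count must deduplicate again.
        counts[col] = merged.nunique()
    return pd.Series(counts)


df = pd.DataFrame({"a": [1, 2, 2, 3, 3, 3], "b": ["x", "y", "x", "y", "x", "x"]})
chunks = [df.iloc[:3], df.iloc[3:]]  # stand-in for row-wise chunks
result = aggregate([map_chunk(c) for c in chunks])
```

The intermediate results grow with the number of distinct values per column, which is the cost the description flags as needing optimization.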

Related issue number

Resolves #1124.

@hekaisheng added the "type: feature", "mod: dataframe", and "to be backported" labels Apr 3, 2020
@hekaisheng added this to the v0.4.0rc1 milestone Apr 3, 2020
@hekaisheng changed the title from "Implement DataFrame unique" to "Implement DataFrame nunique" Apr 3, 2020
@hekaisheng force-pushed the feature/df-unique branch 2 times, most recently from f11595f to dc496be April 4, 2020 14:29

qinxuye (Collaborator) commented Apr 4, 2020

Does this PR solve #1124 or not? At a glance, docs are absent.

codecov bot commented Apr 8, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@2df0f68).
The diff coverage is 96.96%.

@@            Coverage Diff            @@
##             master    #1137   +/-   ##
=========================================
  Coverage          ?   92.97%           
=========================================
  Files             ?      642           
  Lines             ?    49944           
  Branches          ?     7416           
=========================================
  Hits              ?    46437           
  Misses            ?     2309           
  Partials          ?     1198

Impacted Files                          Coverage            Δ
mars/dataframe/reduction/nunique.py     100% <100%>         (ø)
mars/dataframe/merge/merge.py           96.27% <100%>       (ø)
mars/dataframe/reduction/core.py        97.89% <100%>       (ø)
mars/dataframe/reduction/__init__.py    100% <100%>         (ø)
mars/dataframe/utils.py                 94.59% <88.88%>     (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2df0f68...2b0e966.

qinxuye (Collaborator) left a comment

LGTM overall; I left some comments.

docs/source/locale/zh_CN/LC_MESSAGES/index.po (outdated, resolved)
docs/source/locale/zh_CN/LC_MESSAGES/index.po (outdated, resolved)
mars/dataframe/reduction/nunique.py (outdated, resolved)
mars/dataframe/reduction/nunique.py (outdated, resolved)
mars/dataframe/reduction/nunique.py (outdated, resolved)

qinxuye (Collaborator) left a comment

LGTM

@qinxuye merged commit b9f8434 into mars-project:master Apr 9, 2020
@hekaisheng mentioned this pull request Apr 12, 2020 (48 tasks)
hekaisheng added a commit to hekaisheng/mars that referenced this pull request Apr 20, 2020
hekaisheng added a commit to hekaisheng/mars that referenced this pull request Apr 20, 2020
@qinxuye added the "backport" and "backported already" labels and removed the "to be backported" and "backport" labels Apr 21, 2020
@hekaisheng deleted the feature/df-unique branch May 14, 2020 08:46
Successfully merging this pull request may close these issues.

Support nunique for DataFrame
2 participants