Implement DataFrame nunique #1137

hekaisheng · 2020-04-03T08:46:49Z

What do these changes do?

This PR implements df.nunique. For now it takes a simple approach: it records all unique values and counts them at the aggregation stage. This should be optimized for cases where the number of unique values is very large.
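The record-then-count idea described above can be sketched in plain Python. This is only an illustration, not the code in mars/dataframe/reduction/nunique.py: the names `map_chunk` and `aggregate`, the dict-of-lists chunk layout, and the choice to skip `None` (mirroring the pandas `dropna=True` default) are all assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, List, Set

# Hypothetical chunk layout for this sketch: column name -> list of values.
Chunk = Dict[str, List[Hashable]]


def map_chunk(chunk: Chunk) -> Dict[str, Set[Hashable]]:
    """Map stage: record the unique values of each column in one chunk.

    None is skipped here (an assumption, similar to dropna=True in pandas).
    """
    return {col: {v for v in values if v is not None} for col, values in chunk.items()}


def aggregate(partials: Iterable[Dict[str, Set[Hashable]]]) -> Dict[str, int]:
    """Aggregation stage: union the per-chunk sets, then count them."""
    merged: Dict[str, Set[Hashable]] = {}
    for partial in partials:
        for col, uniques in partial.items():
            merged.setdefault(col, set()).update(uniques)
    return {col: len(uniques) for col, uniques in merged.items()}


chunks = [
    {"a": [1, 2, 2], "b": ["x", "x", None]},
    {"a": [2, 3, 3], "b": ["y", "x", "y"]},
]
result = aggregate(map_chunk(c) for c in chunks)
print(result)  # {'a': 3, 'b': 2}
```

Because every distinct value is carried to the aggregation stage, memory use in this sketch grows with the number of unique values, which is the case the description flags for later optimization.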

Related issue number

Resolves #1124.

qinxuye · 2020-04-04T16:59:36Z

Does this PR solve #1124 or not? At a glance, docs are absent.

codecov · 2020-04-08T03:06:43Z

Codecov Report

❗ No coverage uploaded for pull request base (master@2df0f68).
The diff coverage is 96.96%.

@@            Coverage Diff            @@
##             master    #1137   +/-   ##
=========================================
  Coverage          ?   92.97%           
=========================================
  Files             ?      642           
  Lines             ?    49944           
  Branches          ?     7416           
=========================================
  Hits              ?    46437           
  Misses            ?     2309           
  Partials          ?     1198

| Impacted Files | Coverage Δ |
|---|---|
| mars/dataframe/reduction/nunique.py | `100% <100%> (ø)` |
| mars/dataframe/merge/merge.py | `96.27% <100%> (ø)` |
| mars/dataframe/reduction/core.py | `97.89% <100%> (ø)` |
| mars/dataframe/reduction/__init__.py | `100% <100%> (ø)` |
| mars/dataframe/utils.py | `94.59% <88.88%> (ø)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2df0f68...2b0e966. Read the comment docs.

qinxuye

LGTM overall, some comments are left.

docs/source/locale/zh_CN/LC_MESSAGES/index.po

mars/dataframe/reduction/nunique.py

qinxuye

LGTM


hekaisheng added the type: feature, mod: dataframe, and to be backported labels Apr 3, 2020

hekaisheng added this to the v0.4.0rc1 milestone Apr 3, 2020

hekaisheng changed the title ~~Implement DataFrame unique~~ Implement DataFrame nunique Apr 3, 2020

hekaisheng force-pushed the feature/df-unique branch 2 times, most recently from f11595f to dc496be Compare April 4, 2020 14:29

hekaisheng added 3 commits April 8, 2020 11:05

support dataframe unique

a58a1e7

add utils build_df and build_series

379e495

add docs

060eeb4

hekaisheng force-pushed the feature/df-unique branch from 627b44a to 93e7c70 Compare April 8, 2020 04:20

qinxuye reviewed Apr 8, 2020

fix install conda

4432777

hekaisheng force-pushed the feature/df-unique branch from bd3e22c to 4432777 Compare April 8, 2020 06:38

use loop instead of apply

2b0e966

qinxuye approved these changes Apr 9, 2020

qinxuye merged commit b9f8434 into mars-project:master Apr 9, 2020

hekaisheng mentioned this pull request Apr 12, 2020

Mars DataFrame roadmap #495

Closed

48 tasks

hekaisheng added a commit to hekaisheng/mars that referenced this pull request Apr 20, 2020

Implement DataFrame nunique (mars-project#1137)

45def4f

(cherry picked from commit b9f8434)

hekaisheng mentioned this pull request Apr 20, 2020

[BACKPORT]Implement DataFrame nunique (#1137) #1170

Merged

hekaisheng added a commit to hekaisheng/mars that referenced this pull request Apr 20, 2020

Implement DataFrame nunique (mars-project#1137)

5da5570

(cherry picked from commit b9f8434)

qinxuye added the backport and backported already labels and removed the to be backported and backport labels Apr 21, 2020

hekaisheng deleted the feature/df-unique branch May 14, 2020 08:46
