Sync hudi master #1

Merged: 102 commits merged on Jan 26, 2021

Changes from 1 commit

Commits
3d5e9fe
[MINOR] refactor code in HoodieMergeHandle (#2272)
leesf Nov 28, 2020
36ce5bc
[HUDI-1424] Write Type changed to BULK_INSERT when set ENABLE_ROW_WR…
pengzhiwei2018 Nov 30, 2020
b826c53
[HUDI-1373] Add Support for OpenJ9 JVM (#2231)
guykhazma Dec 1, 2020
ac23d25
[HUDI-1357] Added a check to validate records are not lost during mer…
prashantwason Dec 1, 2020
78fd122
[HUDI-1196] Update HoodieKey when deduplicating records with global i…
rmpifer Dec 1, 2020
1f0d5c0
[HUDI-1349] spark sql support overwrite use insert_overwrite_table (#…
lw309637554 Dec 3, 2020
62b392b
[HUDI-1343] Add standard schema postprocessor which would rewrite the…
liujinhui1994 Dec 4, 2020
319b7a5
[HUDI-1427] Fix FileAlreadyExistsException when set HOODIE_AUTO_COMMI…
pengzhiwei2018 Dec 5, 2020
de2fbea
[HUDI-1412] Make HoodieWriteConfig support setting different default …
wangxianghu Dec 7, 2020
3a91d26
fix typo (#2308)
jshmchenxi Dec 8, 2020
fce1453
[HUDI-1040] Make Hudi support Spark 3 (#2208)
zhedoubushishi Dec 9, 2020
007014c
[MINOR] Throw an exception when keyGenerator initialization failed (#…
wangxianghu Dec 10, 2020
bd9ccec
[HUDI-1395] Fix partition path using FSUtils (#2312)
xushiyan Dec 10, 2020
4bc45a3
[HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder (#…
danny0405 Dec 10, 2020
6cf25d5
[MINOR] Minor improve in IncrementalRelation (#2314)
wangxianghu Dec 10, 2020
236d1b0
[HUDI-1439] Remove scala dependency from hudi-client-common (#2306)
shenh062326 Dec 11, 2020
11bc1fe
[HUDI-1428] Clean old fileslice is invalid (#2292)
yui2010 Dec 13, 2020
facde4c
[HUDI-1448] Hudi dla sync support skip rt table syncing (#2324)
lw309637554 Dec 14, 2020
069a1dc
[HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned…
bvaradar Dec 15, 2020
93d9c25
[MINOR] Improve code readability by passing in the fileComparisonsRDD…
danny0405 Dec 15, 2020
26cdc45
[HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasou…
zhedoubushishi Dec 16, 2020
6a6b772
[MINOR] Fix error information in exception (#2341)
lichang-bd Dec 16, 2020
4ddfc61
[MINOR] Make QuickstartUtil generate random timestamp instead of 0 (#…
wangxianghu Dec 17, 2020
14d5d11
[HUDI-1406] Add date partition based source input selector for Delta …
bhasudha Dec 17, 2020
8b5d6f9
[HUDI-1437] support more accurate spark JobGroup for better performa…
lw309637554 Dec 17, 2020
5388c7f
[HUDI-1470] Use the latest writer schema, when reading from existing …
nbalajee Dec 18, 2020
33d338f
[HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with c…
nsivabalan Dec 20, 2020
e4e2fbc
[HUDI-1419] Add base implementation for hudi java client (#2286)
shenh062326 Dec 20, 2020
0c821fe
[MINOR] Pass root exception to HoodieKeyGeneratorException for more i…
jshmchenxi Dec 22, 2020
6dc03b6
[HUDI-1075] Implement simple clustering strategies to create Clusteri…
satishkotha Nov 8, 2020
959afb8
Merge pull request #2263 from satishkotha/sk/clustering
satishkotha Dec 22, 2020
f8ccb28
[HUDI-1471] Make QuickStartUtils generate deletes according to specif…
wangxianghu Dec 22, 2020
01ad449
[HUDI-1485] Fix Deletes issued without any prior commits exception (#…
wangxianghu Dec 22, 2020
38b9264
[HUDI-1488] Fix Test Case Failure in TestHBaseIndex (#2365)
pengzhiwei2018 Dec 23, 2020
89f482e
[HUDI-1489] Fix null pointer exception when reading updated written b…
zhedoubushishi Dec 23, 2020
286055c
[HUDI-1451] Support bulk insert v2 with Spark 3.0.0 (#2328)
zhedoubushishi Dec 25, 2020
e807bb8
[HUDI-1487] fix unit test testCopyOnWriteStorage random failed (#2364)
lw309637554 Dec 25, 2020
3ec9270
[HUDI-1490] Incremental Query should work even when there are partit…
bvaradar Dec 26, 2020
8cf6a72
[HUDI-1331] Adding support for validating entire dataset and long run…
nsivabalan Dec 26, 2020
9e6889a
[HUDI-1481] add structured streaming and delta streamer clustering …
lw309637554 Dec 28, 2020
6cdf59d
[HUDI-1354] Block updates and replace on file groups in clustering (#…
lw309637554 Dec 28, 2020
e177466
[HUDI-1350] Support Partition level delete API in HUDI (#2254)
lw309637554 Dec 28, 2020
76faf59
[HUDI-1495] Upgrade Flink version to 1.12.0 (#2384)
danny0405 Dec 29, 2020
0ecdec3
[MINOR] Remove the duplicate code in AbstractHoodieWriteClient.startC…
danny0405 Dec 29, 2020
4c17528
[HUDI-1398] Align insert file size for reducing IO (#2256)
yui2010 Dec 29, 2020
b83d1d3
[HUDI-1484] Escape the partition value in HiveSyncTool (#2363)
pengzhiwei2018 Dec 29, 2020
da51aa6
[HUDI-1474] Add additional unit tests to TestHBaseIndex (#2349)
nbalajee Dec 29, 2020
e33a8f7
[HUDI-1147] Modify GenericRecordFullPayloadGenerator to generate vali…
nbalajee Dec 29, 2020
c6bf952
[HUDI-1493] Fixed schema compatibility check for fields. (#2350)
prashantwason Dec 30, 2020
ef28763
[MINOR] Update report_coverage.sh (#2396)
wangxianghu Dec 30, 2020
605b617
[HUDI-1434] fix incorrect log file path in HoodieWriteStat (#2300)
garyli1019 Dec 30, 2020
c5e8a02
[HUDI-1418] Set up flink client unit test infra (#2281)
garyli1019 Dec 31, 2020
a23aa41
[MINOR] Sync UpsertPartitioner modify of HUDI-1398 to flink/java (#2390)
yui2010 Dec 31, 2020
ff8313c
[HUDI-1423] Support delete in hudi-java-client (#2353)
shenh062326 Jan 3, 2021
c3e9243
[MINOR] Add maven profile to support skipping shade sources jars (#2358)
jshmchenxi Jan 4, 2021
298808b
[HUDI-842] Implementation of HUDI RFC-15.
prashantwason Dec 31, 2020
4e64226
[HUDI-1450] Use metadata table for listing in HoodieROTablePathFilter…
umehrot2 Dec 31, 2020
2bd4a68
[HUDI-1469] Faster initialization of metadata table using parallelize…
prashantwason Dec 31, 2020
4b94529
[HUDI-1325] [RFC-15] Merge updates of unsynced instants to metadata t…
Dec 31, 2020
1a0579c
[HUDI-1312] [RFC-15] Support for metadata listing for snapshot querie…
rmpifer Dec 29, 2020
31e674e
[HUDI-1504] Allow log files generated during restore/rollback to be s…
vinothchandar Jan 4, 2021
698694a
[HUDI-1498] Read clustering plan from requested file for inflight ins…
satishkotha Jan 4, 2021
47c5e51
[HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils (#2405)
wangxianghu Jan 6, 2021
da2919a
[HUDI-1383] Fixing sorting of partition vals for hive sync computatio…
nsivabalan Jan 6, 2021
2c4868e
[HUDI-1507] Change timeline utils to support reading replacecommit me…
satishkotha Jan 6, 2021
b593f10
[MINOR] Rename unit test package of hudi-spark3 from scala to java (#…
wangxianghu Jan 6, 2021
5ff8e88
[HUDI-1513] Introduce WriteClient#preWrite() and relocate metadata ta…
vinothchandar Jan 7, 2021
17df517
[HUDI-1510] Move HoodieEngineContext and its dependencies to hudi-com…
umehrot2 Jan 7, 2021
c151147
[MINOR] Sync HUDI-1196 to FlinkWriteHelper (#2415)
Trevor-zhang Jan 8, 2021
1a836f9
[HUDI-1514] Avoid raw type use for parameter of Transformer interface…
puyvqi Jan 9, 2021
79ec7b4
[HUDI-920] Support Incremental query for MOR table (#1938)
garyli1019 Jan 9, 2021
65866c4
[HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata ta…
vinothchandar Jan 10, 2021
368c1a8
[HUDI-1399] support a independent clustering spark job to asynchronou…
lw309637554 Jan 10, 2021
23e93d0
[MINOR] fix spark 3 build for incremental query on MOR (#2425)
garyli1019 Jan 10, 2021
7ce3ac7
[HUDI-1479] Use HoodieEngineContext to parallelize fetching of partit…
umehrot2 Jan 11, 2021
de42adc
[HUDI-1520] add configure for spark sql overwrite use INSERT_OVERWRIT…
lw309637554 Jan 11, 2021
e3d3677
[HUDI-1502] MOR rollback and restore support for metadata sync (#2421)
nsivabalan Jan 11, 2021
e926c1a
HUDI-1525 fix test hbase index (#2436)
n3nash Jan 13, 2021
749f657
[HUDI-1509]: Reverting LinkedHashSet changes to combine fields from o…
n3nash Jan 14, 2021
a43e191
[MINOR] Bumping snapshot version to 0.7.0 (#2435)
nsivabalan Jan 16, 2021
3d1d5d0
[HUDI-1533] Make SerializableSchema work for large schemas and add ab…
satishkotha Jan 17, 2021
684e12e
[HUDI-1529] Add block size to the FileStatus objects returned from me…
umehrot2 Jan 18, 2021
a38612b
[HUDI-1532] Fixed suboptimal implementation of a magic sequence searc…
vburenin Jan 19, 2021
b9c2856
[HUDI-1535] Fix 0.7.0 snapshot (#2456)
nsivabalan Jan 19, 2021
91b9cb5
[MINOR] Fixing setting defaults for index config (#2457)
nsivabalan Jan 19, 2021
e23967b
[HUDI-1540] Fixing commons codec shading in spark bundle (#2460)
nsivabalan Jan 20, 2021
5ca0625
[HUDI 1308] Harden RFC-15 Implementation based on production testing …
vinothchandar Jan 20, 2021
c931dc5
[MINOR] Remove redundant judgments (#2466)
teeyog Jan 20, 2021
244f6de
[MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_cr…
liujinhui1994 Jan 20, 2021
3719e7b
Moving to 0.8.0-SNAPSHOT on master branch.
vinothchandar Jan 20, 2021
5e30fc1
[MINOR] Disabling problematic tests temporarily to stabilize CI (#2468)
vinothchandar Jan 20, 2021
81ccb0c
[MINOR] Make a separate travis CI job for hudi-utilities (#2469)
vinothchandar Jan 21, 2021
976420c
[HUDI-1512] Fix spark 2 unit tests failure with Spark 3 (#2412)
zhedoubushishi Jan 21, 2021
b64d22e
[HUDI-1511] InstantGenerateOperator support multiple parallelism (#2434)
loukey-lj Jan 22, 2021
641abe8
[HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client (#2375)
Nieal-Yang Jan 22, 2021
748dcc9
[MINOR] Remove InstantGeneratorOperator parallelism limit in HoodieFl…
wangxianghu Jan 22, 2021
048633d
[MINOR] Improve code readability,remove the continue keyword (#2459)
c-f-cooper Jan 22, 2021
d3ea0f9
[HOTFIX] Revert upgrade flink verison to 1.12.0 (#2473)
wangxianghu Jan 22, 2021
e302c6b
[HUDI-1453] Fix NPE using HoodieFlinkStreamer to etl data from kafka …
wangxianghu Jan 23, 2021
84df263
[MINOR] Use skipTests flag for skip.hudi-spark2.unit.tests property (…
xushiyan Jan 24, 2021
81836f0
Removing spring repos from pom (#2481)
vinothchandar Jan 24, 2021
c4afd17
[HUDI-1476] Introduce unit test infra for java client (#2478)
shenh062326 Jan 24, 2021
[HUDI-1428] Clean old fileslice is invalid (apache#2292)
Co-authored-by: zhang wen <wen.zhang@dmall.com>
Co-authored-by: zhang wen <steven@stevendeMac-mini.local>
3 people committed Dec 13, 2020
commit 11bc1fe6f498850d2c496151741813001d3850a3
@@ -80,7 +80,7 @@ public class CleanPlanner<T extends HoodieRecordPayload, I, K, O> implements Ser
   public CleanPlanner(HoodieTable<T, I, K, O> hoodieTable, HoodieWriteConfig config) {
     this.hoodieTable = hoodieTable;
     this.fileSystemView = hoodieTable.getHoodieView();
-    this.commitTimeline = hoodieTable.getCompletedCommitTimeline();
+    this.commitTimeline = hoodieTable.getCompletedCommitsTimeline();
     this.config = config;
     this.fgIdToPendingCompactionOperations =
         ((SyncableFileSystemView) hoodieTable.getSliceView()).getPendingCompactionOperations()
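The one-line change above swaps the timeline the cleaner consults. As a minimal Java sketch of my reading of the fix (not part of the PR; the assumption about which instant types each timeline covers is noted in the comments):

    // Sketch only: illustrates which timeline the cleaner should consult.
    // Assumption: getCompletedCommitTimeline() covers only completed COMMIT/REPLACE
    // instants, while getCompletedCommitsTimeline() also includes completed
    // DELTA_COMMIT instants, which MERGE_ON_READ tables write.
    import org.apache.hudi.common.table.timeline.HoodieTimeline;
    import org.apache.hudi.table.HoodieTable;

    class CleanerTimelineSketch {
      static HoodieTimeline timelineForCleaning(HoodieTable<?, ?, ?, ?> table) {
        // Before the fix the cleaner used table.getCompletedCommitTimeline(), so a
        // MOR table whose activity consists of delta commits looked empty to the
        // KEEP_LATEST_COMMITS policy. Using the commits timeline includes them.
        return table.getCompletedCommitsTimeline();
      }
    }

The new test below exercises exactly this path: it builds a MERGE_ON_READ table from delta commits only and checks that the cleaner retains the latest commits.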
@@ -644,6 +644,50 @@ public void testKeepLatestFileVersionsMOR() throws Exception {
     assertTrue(testTable.logFileExists(p0, "001", file1P0, 3));
   }
 
+  /**
+   * Test HoodieTable.clean() Cleaning by commit logic for MOR table with Log files.
+   */
+  @Test
+  public void testKeepLatestCommitsMOR() throws Exception {
+
+    HoodieWriteConfig config =
+        HoodieWriteConfig.newBuilder().withPath(basePath).withAssumeDatePartitioning(true)
+            .withCompactionConfig(HoodieCompactionConfig.newBuilder()
+                .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS).retainCommits(1).build())
+            .build();
+
+    HoodieTableMetaClient metaClient = HoodieTestUtils.init(hadoopConf, basePath, HoodieTableType.MERGE_ON_READ);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    String p0 = "2020/01/01";
+
+    // Make 3 files, one base file and 2 log files associated with base file
+    String file1P0 = testTable.addDeltaCommit("000").getFileIdsWithBaseFilesInPartitions(p0).get(p0);
+    testTable.forDeltaCommit("000")
+        .withLogFile(p0, file1P0, 1)
+        .withLogFile(p0, file1P0, 2);
+
+    // Make 2 files, one base file and 1 log files associated with base file
+    testTable.addDeltaCommit("001")
+        .withBaseFilesInPartition(p0, file1P0)
+        .withLogFile(p0, file1P0, 3);
+
+    // Make 2 files, one base file and 1 log files associated with base file
+    testTable.addDeltaCommit("002")
+        .withBaseFilesInPartition(p0, file1P0)
+        .withLogFile(p0, file1P0, 4);
+
+    List<HoodieCleanStat> hoodieCleanStats = runCleaner(config);
+    assertEquals(3,
+        getCleanStat(hoodieCleanStats, p0).getSuccessDeleteFiles()
+            .size(), "Must clean three files, one parquet and 2 log files");
+    assertFalse(testTable.baseFileExists(p0, "000", file1P0));
+    assertFalse(testTable.logFilesExist(p0, "000", file1P0, 1, 2));
+    assertTrue(testTable.baseFileExists(p0, "001", file1P0));
+    assertTrue(testTable.logFileExists(p0, "001", file1P0, 3));
+    assertTrue(testTable.baseFileExists(p0, "002", file1P0));
+    assertTrue(testTable.logFileExists(p0, "002", file1P0, 4));
+  }
+
   @Test
   public void testCleanMetadataUpgradeDowngrade() {
     String instantTime = "000";