-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27795: Define RPC API for cache cleaning #5492
base: master
Are you sure you want to change the base?
Conversation
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this initial work, @Kota-SH ! I have a few remarks, though. Please let me know if those make sense to you.
|
||
/** | ||
* Clean BucketCache | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Let's be more specific. What does it mean to clean BucketCache? We are evicting everything? And why this is only for BucketCache, not for all BlockCache implementations?
} | ||
|
||
message CleanBucketCacheResponse { | ||
repeated string uncached_files = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we return a Map<uncached_files, num_blocks_evicted> ?
@@ -170,4 +172,8 @@ default boolean isMetaBlock(BlockType blockType) { | |||
default Optional<Map<String, Boolean>> getFullyCachedFiles() { | |||
return Optional.empty(); | |||
} | |||
|
|||
default List<HStoreFile> getCachedButClosedFiles() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of exposing this info.
And let's use Optional to avoid returning null anti-pattern, just like the getFullyCachedFiles method above.
Also the naming is not very intuitive. We could call it getFilesWithStaleBlocks and let's put an explanatory javadoc.
Additionally, BlockCache interface should defined the clean method as well. It's up for the BlockCache implementation to deal with the eviction logic.
@@ -3946,4 +3949,19 @@ public GetCachedFilesListResponse getCachedFilesList(RpcController controller, | |||
}); | |||
return responseBuilder.addAllCachedFiles(fullyCachedFiles).build(); | |||
} | |||
|
|||
@Override | |||
public CleanBucketCacheResponse cleanBucketCache(RpcController controller, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused why we are implementing the eviction logic here. The RSRpcServices should be just a facade, finding "orphans" blocks and evict it should be a responsibility of the cache implementation class.
💔 -1 overall
This message was automatically generated. |
1 similar comment
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
|
||
@Override | ||
public Optional<Map<String, Integer>> uncacheStaleBlocks() { | ||
return l2Cache.uncacheStaleBlocks(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do this to L1 as well.
|
||
public void setFilesWithStaleBlocks(HRegion hRegion) { | ||
for (HStore hs : hRegion.getStores()) { | ||
filesWithStaleBlocks.addAll(hs.getStorefiles()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we check first which files from the region's store have blocks in the cache? If the total cache size is smaller than the store file size, we might not be able to cache all files.
Also, I wonder what's the heap usage impact of keeping such extra list. Could you check that with a heap dump?
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@wchevreuil - Could you please review this updated patch and provide feedback? Thanks, |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
* @return A map of filename and number of blocks evicted. | ||
*/ | ||
default Optional<Map<String, Integer>> | ||
uncacheStaleBlocks(RegionAvailabilityChecker regionAvailabilityChecker) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @Apache9 meant to use the RegionServerServices
interface in the method signature, not creating a new interface.
Map<String, Integer> uncachedStaleBlocksMap = | ||
l1Cache.uncacheStaleBlocks(regionAvailabilityChecker).orElseGet(HashMap::new); | ||
l2Cache.uncacheStaleBlocks(regionAvailabilityChecker).ifPresent( | ||
map2 -> map2.forEach((key, value) -> uncachedStaleBlocksMap.merge(key, value, Integer::sum))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why merge? We should just do putAll, no?
|
||
public Optional<Map<String, Integer>> | ||
uncacheStaleBlocks(RegionAvailabilityChecker regionAvailabilityChecker) { | ||
Map<String, Integer> evictedFilesWithStaleBlocks = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying my previous concerns shared on the other PR:
Hum, I think we should make this method async. And we shouldn't block the RPC call until its finished. We should dedicate a thread pool with a max size of one.
I'm worried now this might not be a lightweight operation, if the related RPC is synchronous (as implemented here), we might eventually timeout and client might submit another call, eventually exhausting RPC handlers.
Simply making the RPC async isn't enough, we should also make sure we don't have more than one background thread running this.`
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
this.getBlockCache().flatMap(BlockCache::getFullyCachedFiles).orElse(Collections.emptyMap()); | ||
Map<String, Integer> evictedFilesWithStaleBlocks = new ConcurrentHashMap<>(); | ||
|
||
ExecutorService executor = Executors.newFixedThreadPool(6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The threadpool should be global for HRegionServer. We don't want a new threadpool created on every client call to this method, that would potentially kill the RS if an operator calls it multiple times on a short period.
Callable<Void> task = () -> { | ||
HRegion regionOnServer = getRegion(value.getFirst()); | ||
int blocksEvicted = (regionOnServer == null || !regionOnServer.isAvailable()) | ||
? this.getBlockCache().get().evictBlocksByHfileName(fileName) | ||
: 0; | ||
evictedFilesWithStaleBlocks.put(fileName, blocksEvicted); | ||
LOG.info( | ||
"Uncached {} blocks belonging to the file {} as the region {} " | ||
+ "is not served by the region server {} anymore.", | ||
blocksEvicted, fileName, value.getFirst(), this.getServerName()); | ||
return null; | ||
}; | ||
tasks.add(task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do a single task per call to the method, not a task per file, that would create yet another collection in memory with as many objects as the total files cached. On large caches, that would be impacting.
} finally { | ||
executor.shutdown(); | ||
} | ||
return evictedFilesWithStaleBlocks; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should think about an alternative result here, as this map is effectively being updated by the background task, which can be confusing for the caller. I would rather make this as void, or a boolean, returning true indicating that the task is submitted and is running in the background.
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Jira: https://issues.apache.org/jira/browse/HBASE-27795