-
Notifications
You must be signed in to change notification settings - Fork 494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-4337. Implement RocksDB options cache for new datanode DB utilities. #1544
Conversation
Removal of DBProfile also got mixed in here, but not finished.
If I understand this correctly, in AbstractDatanodeStore.start(), it pulls the default CF options from a static variable where they are setup on first use. This creates a single Its a little hard to follow the logic to ensure this cache is definitely being reused. Could you create a docker cluster, use Freon to create some containers and then check the RocksDB log files to ensure the same cache instance is being used - There is an example of what to look for to prove this in https://issues.apache.org/jira/browse/HDDS-4246. I also think the 8MB shared cache is probably too small. Could we add a configuration for it, and make the default higher, maybe 64MB for now? What do you think? |
Thanks, @sodonnel , I will add a setting for cache size and test this in a docker cluster. Looking at the rocks docs it appears those LRUCache objects can't be accidentally modified by callers, so I could also expose it as public and add a unit test to check that the same cache is used. |
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look good to me. I am +1 on them, so I will go ahead and commit.
* master: (53 commits) HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585) HDDS-4442. Disable the location information of audit logger to reduce overhead (apache#1567) HDDS-4441. Add metrics for ACL related operations.(Addendum for HA). (apache#1584) HDDS-4081. Create ZH translation of StorageContainerManager.md in doc. (apache#1558) HDDS-4080. Create ZH translation of OzoneManager.md in doc. (apache#1541) HDDS-4079. Create ZH translation of Containers.md in doc. (apache#1539) HDDS-4184. Add Features menu for Chinese document. (apache#1547) HDDS-4235. Ozone client FS path validation is not present in OFS. (apache#1582) HDDS-4338. Fix the issue that SCM web UI banner shows "HDFS SCM". (apache#1583) HDDS-4337. Implement RocksDB options cache for new datanode DB utilities. (apache#1544) HDDS-4083. Create ZH translation of Recon.md in doc (apache#1575) HDDS-4453. Replicate closed container for random selected datanodes. (apache#1574) HDDS-4408: terminate Datanode when Datanode State Machine Thread got uncaught exception. (apache#1533) HDDS-4443. Recon: Using Mysql database throws exception and fails startup (apache#1570) HDDS-4315. Use Epoch to generate unique ObjectIDs (apache#1480) HDDS-4455. Fix typo in README.md doc (apache#1578) HDDS-4441. Add metrics for ACL related operations. (apache#1571) HDDS-4437. Avoid unnecessary builder conversion in setting volume Quota/Owner request (apache#1564) HDDS-4417. Simplify Ozone client code with configuration object (apache#1542) HDDS-4363. Add metric to track the number of RocksDB open/close operations. (apache#1530) ...
What changes were proposed in this pull request
Add a cache of
ColumnFamilyOptions
objects toAbstractDatanodeStore
, so that eachConfigurationSource
object used to construct a container will result in the same RocksDBColumnFamilyOptions
object being used. Since the same configuration instance is used for all containers on a datanode, this results in all containers on a datanode using the sameColumnFamilyOptions
, and by extension the same RocksDB metadata cache. A configuration parameter is added for adjusting the size of this cache. Each RocksDB instance will have its ownDBOptions
, since this requires the use of RocksDBStatistics
objects which are not thread safe and cannot be shared.The
DBStoreBuilder
class was refactored to make configuration easier. It is now possible to use the same configurable column family options for all column families, and it is easier to reason about which configurations will be applied. It is no longer an error to register a table twice with the builder, the second call will simply override the first, which is usually the expected behavior in setters for builders. There should be no other functionality changes to this class.What is the link to the Apache JIRA
HDDS-4337
How was this patch tested?
Automated Testing
DBStoreBuilder
unit tests were updated.TestKeyValueContainer
to test thatColumnFamilyOptions
objects are shared between containers with the same configuration.Manual Testing
Manual testing was also done with a docker cluster to verify that the same cache is used for all containers on a datanode. Steps:
ozone freon ockg -n 100
ozone admin pipeline list
to get the pipeline IDs.ozone admin pipeline close <id>
to close the pipeline.grep -A5 "block_cache:" /data/hdds/hdds/*/current/containerDir0/1/metadata/1-dn-container.db/LOG
and
grep -A5 "block_cache:" /data/hdds/hdds/*/current/containerDir0/2/metadata/2-dn-container.db/LOG
block_cache:
is the same for both containers.