Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-4337. Implement RocksDB options cache for new datanode DB utilities. #1544

Merged
merged 17 commits into from
Nov 13, 2020

Conversation

errose28
Copy link
Contributor

@errose28 errose28 commented Nov 2, 2020

What changes were proposed in this pull request

Add a cache of ColumnFamilyOptions objects to AbstractDatanodeStore, so that each ConfigurationSource object used to construct a container will result in the same RocksDB ColumnFamilyOptions object being used. Since the same configuration instance is used for all containers on a datanode, this results in all containers on a datanode using the same ColumnFamilyOptions, and by extension the same RocksDB metadata cache. A configuration parameter is added for adjusting the size of this cache. Each RocksDB instance will have its own DBOptions, since this requires the use of RocksDB Statistics objects which are not thread safe and cannot be shared.

The DBStoreBuilder class was refactored to make configuration easier. It is now possible to use the same configurable column family options for all column families, and it is easier to reason about which configurations will be applied. It is no longer an error to register a table twice with the builder, the second call will simply override the first, which is usually the expected behavior in setters for builders. There should be no other functionality changes to this class.

What is the link to the Apache JIRA

HDDS-4337

How was this patch tested?

Automated Testing

  • DBStoreBuilder unit tests were updated.
  • A unit test was added to TestKeyValueContainer to test that ColumnFamilyOptions objects are shared between containers with the same configuration.

Manual Testing

Manual testing was also done with a docker cluster to verify that the same cache is used for all containers on a datanode. Steps:

  1. Start an ozone docker cluster and enter a datanode.
  2. Write 100 keys: ozone freon ockg -n 100
  3. Close the pipeline those keys were written to.
    • ozone admin pipeline list to get the pipeline IDs.
    • ozone admin pipeline close <id> to close the pipeline.
  4. Write 100 more keys.
  5. Get the address of the cache object from the RocksDB log for each of the two containers:
    grep -A5 "block_cache:" /data/hdds/hdds/*/current/containerDir0/1/metadata/1-dn-container.db/LOG
    and
    grep -A5 "block_cache:" /data/hdds/hdds/*/current/containerDir0/2/metadata/2-dn-container.db/LOG
  • Make sure that the address listed after block_cache: is the same for both containers.

@sodonnel
Copy link
Contributor

sodonnel commented Nov 9, 2020

If I understand this correctly, in AbstractDatanodeStore.start(), it pulls the default CF options from a static variable where they are setup on first use. This creates a single LRUCache(CACHE_SIZE) which is passed into the the DBStoreBuilder for all container RocksDBs. This means they should share the LRU cache.

Its a little hard to follow the logic to ensure this cache is definitely being reused. Could you create a docker cluster, use Freon to create some containers and then check the RocksDB log files to ensure the same cache instance is being used - There is an example of what to look for to prove this in https://issues.apache.org/jira/browse/HDDS-4246.

I also think the 8MB shared cache is probably too small. Could we add a configuration for it, and make the default higher, maybe 64MB for now? What do you think?

@errose28
Copy link
Contributor Author

errose28 commented Nov 9, 2020

Thanks, @sodonnel , I will add a setting for cache size and test this in a docker cluster. Looking at the rocks docs it appears those LRUCache objects can't be accidentally modified by callers, so I could also expose it as public and add a unit test to check that the same cache is used.

@errose28 errose28 changed the title HDDS 4337. Implement RocksDB options cache for new datanode DB utilities. HDDS-4337. Implement RocksDB options cache for new datanode DB utilities. Nov 9, 2020
Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me. I am +1 on them, so I will go ahead and commit.

@sodonnel sodonnel merged commit 760c1e8 into apache:master Nov 13, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Nov 18, 2020
* master: (53 commits)
  HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585)
  HDDS-4442. Disable the location information of audit logger to reduce overhead (apache#1567)
  HDDS-4441. Add metrics for ACL related operations.(Addendum for HA). (apache#1584)
  HDDS-4081. Create ZH translation of StorageContainerManager.md in doc. (apache#1558)
  HDDS-4080. Create ZH translation of OzoneManager.md in doc. (apache#1541)
  HDDS-4079. Create ZH translation of Containers.md in doc. (apache#1539)
  HDDS-4184. Add Features menu for Chinese document. (apache#1547)
  HDDS-4235. Ozone client FS path validation is not present in OFS. (apache#1582)
  HDDS-4338. Fix the issue that SCM web UI banner shows "HDFS SCM". (apache#1583)
  HDDS-4337. Implement RocksDB options cache for new datanode DB utilities. (apache#1544)
  HDDS-4083. Create ZH translation of Recon.md in doc (apache#1575)
  HDDS-4453. Replicate closed container for random selected datanodes. (apache#1574)
  HDDS-4408: terminate Datanode when Datanode State Machine Thread got uncaught exception. (apache#1533)
  HDDS-4443. Recon: Using Mysql database throws exception and fails startup (apache#1570)
  HDDS-4315. Use Epoch to generate unique ObjectIDs (apache#1480)
  HDDS-4455. Fix typo in README.md doc (apache#1578)
  HDDS-4441. Add metrics for ACL related operations. (apache#1571)
  HDDS-4437. Avoid unnecessary builder conversion in setting volume Quota/Owner request (apache#1564)
  HDDS-4417. Simplify Ozone client code with configuration object (apache#1542)
  HDDS-4363. Add metric to track the number of RocksDB open/close operations. (apache#1530)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants