Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gitrest: initial write speedup #13249

Merged
merged 15 commits into from
Dec 12, 2022

Conversation

znewton
Copy link
Contributor

@znewton znewton commented Dec 5, 2022

Description

When using a remote filesystem, the overhead of filesystem operations makes shredded summaries extremely slow to read/write. We already work around this for Read by computing and storing a latest "Full Summary" single file that is used on read for most "Latest" summary reads.

The only Summary Write operation in the critical path is when creating the initial summary for a new document. This PR attempts to reduce the overhead of a shredded summary read/write by "compressing" the git tree for the initial summary into a single file (using Isomorphic git with memfs).

Breaking Changes

N/A, only impacts new documents

Reviewer Guidance

I validated the following scenarios to ensure functionality with each enableLowIoWrite setting ("initial", true, false) as well as forward/backwards compatibility using local R11s and my fluid app.

  • Write service summary from initial summary
  • Write client summary from initial summary from first client
  • Write client summary from initial summary from 2nd client (no service summary in the middle)
  • Write service summary after client summary(ies)
  • Write client summary after service summary(ies)
  • Load from service summary
  • Load from initial summary
  • Load from client summary

@github-actions github-actions bot added area: server Server related issues (routerlicious) base: main PRs targeted against main branch labels Dec 5, 2022
@znewton znewton marked this pull request as ready for review December 6, 2022 22:51
@znewton znewton requested review from msfluid-bot and a team as code owners December 6, 2022 22:51
);
Lumberjack.info("Low-IO mode: Writing summary to storage");
const summaryTreeHandle = this.writeSummaryTreeCore(
[{
Copy link
Member

@yangg-msft yangg-msft Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generate a metric to capture tenant id, doc id, content length, and processing time? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if we want to keep this as a long, we can also add those properties in the properties parameter for Lumberjack.info().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed most of these inner logs, but added additional properties to the wrapping summary write metric that will help us understand each one in our telemetry (newDocument, commit/tree sha, and enableLowIoWrite).

Copy link
Contributor

@hedasilv hedasilv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks for working on this. I have a few questions to help me understand the overall flow.

  1. In writeContainerSummary(), an in-memory repoManager is used as a way to obtain the full summary associated with an initial summary upload. Traditionally, we would have very high I/O latency because internally, GitRest would break up the initial summary payload into small Git objects, and upload each one of them individually to the chosen filesystem. The trick here is, then, to do exactly the same, but instead of using the original filesystem, have the uploads happen to an in-memory filesystem. By uploading the Git Objects to the temporary, in-memory repoManager, and later reading from there, we end up computing the full summary without the need to make multiple network calls. And then, once we have the full summary in hand, we write it as a single blob to Git. Is that correct?
  2. What happens to that initial full summary blob on subsequent summary writes? My current understanding is that on a subsequent incremental summary write, the full blob would be read from storage and parsed so GitRest and compute the incremental diff. Once that is done, it will generate Git objects (blobs, trees, commits and refs) as usual, uploading them in a "shredded" way to the repoManager/storage as usual, since this is not the critical path anymore. As a result, the initial full summary blob would not be referenced again, and would have to be eventually garbage collected (once that is available). Is that correct?

Copy link
Member

@yangg-msft yangg-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@znewton
Copy link
Contributor Author

znewton commented Dec 12, 2022

Merging because confident in correctness (tested with R11s and FRS extensively). Also, new functionality is hidden behind a feature flag, so it will only be enabled in intentional scenarios.

@znewton znewton merged commit 2f2e063 into microsoft:main Dec 12, 2022
@znewton znewton deleted the gitrest/initial-write-speedup branch December 12, 2022 18:27
znewton added a commit that referenced this pull request Feb 1, 2023
## Description

In theory, the `memfs` `Volume` object used in the new Gitrest
optimizations from #13249 should be garbage collected by Node.js after
the function ends. However, we see symptoms of a new memory leak in
Gitrest following this change, so I am adding a manual reset of the
created memfs Volume after its usage is complete. We do this in another
service in FRS that uses similar functionality, but I did not think it
would be necessary in Gitrest due to the differences in implementation.
zhenmichael pushed a commit that referenced this pull request Feb 2, 2023
## Description

In theory, the `memfs` `Volume` object used in the new Gitrest
optimizations from #13249 should be garbage collected by Node.js after
the function ends. However, we see symptoms of a new memory leak in
Gitrest following this change, so I am adding a manual reset of the
created memfs Volume after its usage is complete. We do this in another
service in FRS that uses similar functionality, but I did not think it
would be necessary in Gitrest due to the differences in implementation.
daesun-park pushed a commit to daesun-park/FluidFramework that referenced this pull request Feb 8, 2023
## Description

In theory, the `memfs` `Volume` object used in the new Gitrest
optimizations from microsoft#13249 should be garbage collected by Node.js after
the function ends. However, we see symptoms of a new memory leak in
Gitrest following this change, so I am adding a manual reset of the
created memfs Volume after its usage is complete. We do this in another
service in FRS that uses similar functionality, but I did not think it
would be necessary in Gitrest due to the differences in implementation.
znewton added a commit that referenced this pull request Mar 28, 2023
## Description

#13249 introduced `enableLowIoWrite` flag. We use `"initial"` in AFR to
speed up createDocument flow when contacting a remote storage service.
However, `true` flag was broken. I think somewhere down the line, the
error handler that I am introducing in this PR got removed in some
iteration, because I remember adding it early on. The previous version
of this error handler is visible in the [very first iteration of this
work](https://github.com/znewton/FluidFramework/pull/3/files#diff-d7c2a8fd402df5711115c7fba1f9e1a9da28cfaf472e5beb28c93afba58fe4abR524)
kian-thompson pushed a commit to kian-thompson/FluidFramework that referenced this pull request Mar 28, 2023
## Description

microsoft#13249 introduced `enableLowIoWrite` flag. We use `"initial"` in AFR to
speed up createDocument flow when contacting a remote storage service.
However, `true` flag was broken. I think somewhere down the line, the
error handler that I am introducing in this PR got removed in some
iteration, because I remember adding it early on. The previous version
of this error handler is visible in the [very first iteration of this
work](https://github.com/znewton/FluidFramework/pull/3/files#diff-d7c2a8fd402df5711115c7fba1f9e1a9da28cfaf472e5beb28c93afba58fe4abR524)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: server Server related issues (routerlicious) base: main PRs targeted against main branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants