Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOR: implement downward-compatible update #152807

Closed
pgayvallet opened this issue Mar 7, 2023 · 15 comments
Closed

SOR: implement downward-compatible update #152807

pgayvallet opened this issue Mar 7, 2023 · 15 comments
Assignees
Labels
Epic:ZDTmigrations Zero downtime migrations Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Copy link
Contributor

pgayvallet commented Mar 7, 2023

Part of #150312

Because our update and bulkUpdate operations are effectively partial updates, we can’t run the attributes against our migration function (migration functions take the full SO document as input/out).

So, in order to support the new modelVersion option for update operations, we would need to perform a ‘client-side’ update, by having Kibana fetch the document and perform the update locally, instead of using the _update Elasticsearch API.

Screenshot 2023-03-07 at 13 43 21

We could, in theory, do that only when we're in managed upgrade mode, but to avoid unnecessary divergence in the code, we will just do it for both modes.

questions

  • update counter ?
  • automatic retries ?
  • conflict handling ?
  • preflight checks in update operations no longer 'required' given we fetch the document for updates
@botelastic botelastic bot added the needs-team Issues missing a team label label Mar 7, 2023
@pgayvallet pgayvallet added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Mar 7, 2023
@botelastic botelastic bot removed the needs-team Issues missing a team label label Mar 7, 2023
@pgayvallet pgayvallet added the Epic:ZDTmigrations Zero downtime migrations label Mar 7, 2023
@rudolf
Copy link
Contributor

rudolf commented Mar 9, 2023

We should consider applying our schema validation (#118969) to the update method now that we have the full saved object

@pgayvallet
Copy link
Contributor Author

pgayvallet commented Mar 9, 2023

Yeah. FWIW we need to port the schema feature to model versions for that, as atm the model version interface doesn't have it - I created #152994

@pgayvallet pgayvallet assigned pgayvallet and unassigned pgayvallet Mar 28, 2023
@TinaHeiligers TinaHeiligers self-assigned this May 8, 2023
@TinaHeiligers
Copy link
Contributor

Tentatively assigned this to me, as discussed during planning.

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented May 17, 2023

It looks like #152994 was handled in https://github.com/elastic/kibana/pull/156788/files#top, correct?

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented May 18, 2023

I had to review the code again to see exactly how update's implemented.
My initial thoughts on step 1:

Fetch the document:

Update makes at least one round trip to ES in the preflight checks for alias conflict and we already have the full document early on:

public async preflightCheckNamespaces({
type,
id,
namespace,
initialNamespaces,
}: PreflightCheckNamespacesParams): Promise<PreflightCheckNamespacesResult> {
if (!this.registry.isMultiNamespace(type)) {
throw new Error(`Cannot make preflight get request for non-multi-namespace type '${type}'.`);
}
const { body, statusCode, headers } = await this.client.get<SavedObjectsRawDocSource>(
{
id: this.serializer.generateRawId(undefined, type, id),
index: this.getIndexForType(type),
},
{
ignore: [404],
meta: true,
}
);

While these preflight checks don't return the full doc, if we change that and work with the doc from there, we won't need to make another round trip to fetch it.

The only conditions I can think of where we won't have an existing doc from the preflight checks is if there's an error (we throw early anyway and don't need to do anything more), or the doc doesn't exist and upsert is used. Sure, we'll need to create a new doc, since one doesn't exist in the namespace we're working in but that's ok, right?

I've still got to work through the rest.
As for the model migration (v1 -> v2 & v2(updatedDoc) -> v1), the document migrator handles that, correct?

const migrated = migrator.migrateDocument({
id,
type,
...(savedObjectNamespace && { namespace: savedObjectNamespace }),
...(savedObjectNamespaces && { namespaces: savedObjectNamespaces }),
attributes: {
...(await encryptionHelper.optionallyEncryptAttributes(type, id, namespace, upsert)),
},
updated_at: time,
});

where did we land with the v2 -> v1 conversion? Drop extra optional fields?
I'm not too familiar with this part.

@pgayvallet thoughts?

@pgayvallet
Copy link
Contributor Author

While these preflight checks don't return the full doc, if we change that and work with the doc from there, we won't need to make another round trip to fetch it.

I agree, I think that's the trick here. Just fetch the document (or documents for bulk update) regardless of it's namespaceType at the beginning of the operation and then apply the preflight logic if required, then continue from there.

Sure, we'll need to create a new doc, since one doesn't exist in the namespace we're working in but that's ok, right?

Yeah, no doc + upsert => no problem I think.

As for the model migration (v1 -> v2 & v2(updatedDoc) -> v1), the document migrator handles that, correct?

Yes, this logic will be delegated to the Kibana/Document migrator.

where did we land with the v2 -> v1 conversion? Drop extra optional fields?

Yes, atm it's just based on the version schema. But this should be an implementation detail for what you do there, the SOR would just ask the migrator to convert the document to the correct version

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Jun 6, 2023

preflight checks in update operations no longer 'required' given we fetch the document for updates

Whether we need the check depends on how we're handling retrieving the doc but yes, barring edge cases, these shouldn't be needed.

I have a couple of edge cases in mind regarding race conditions between sharing and updating the same doc in the new shared space:

Scenario:

  • some user, say, Bob, shares a SO to a space it wasn't originally shared to and edits the doc in the new space
  • Some other user, say, James, is still busy editing and saving the original doc.

Given version control is concurrent though, the chances of that causing issues are minimal. The worst case is there's a conflict in the doc and the changes wouldn't stick.

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Jun 20, 2023

@pgayvallet
Update on progress:
I'm having a hard time refactoring and implementing version compatibility, given that we now have to implement the update or upsert behavior client-side.

Adding the new permutations has exploded the conditional code control flow, along with fighting with types (not wanting to force the issue with casting too much).

The upsert-as-create path is relatively simple, it's the pure update that's got me ATM 😵‍💫
At this stage, the work is at risk of not making the deadline and I'll need to pair for a fresh 👁️ .

@pgayvallet
Copy link
Contributor Author

Adding the new permutations has exploded the conditional code control flow

Anything in particular causing this, or just the overall added complexity of the client-update operation?

@TinaHeiligers
Copy link
Contributor

Anything in particular causing this, or just the overall added complexity of the client-update operation?

It's the overall complexity of implementing what ES would otherwise handle. I'm busy with a diagram to get all the variations nailed down and will share it with the team when done.

FWIW, an initial implementation failed miserably on Kibana startup with Fleet package registration 😖 so that didn't work!
I'm looking at extracting the decision tree code into smaller utils to ensure we have no uncaught errors.

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Jul 10, 2023

Progress update:

  • update refactored, working on updating tests
  • bulkUpdate still TODO

@TinaHeiligers
Copy link
Contributor

Progress update:

  • update api integration tests passing locally for:
    • test/api_integration/apis/saved_objects/update.ts,
    • x-pack/test/saved_object_api_integration/security_and_spaces (basic & trial license configs),
    • x-pack/test/saved_object_api_integration/spaces_only
  • src/core/server/integration_tests/saved_objects/routes/update.test.ts passing locally
  • update unit tests (mostly) refactored & passing: packages/core/saved-objects/core-saved-objects-api-server-internal/src/lib/repository.test.ts, todos: ~ half of the client calls.
  • Kibana runs against es snapshot

Remaining open questions:

  • implement retries
  • handle conflicts
  • update counter

All of #bulk_delete

@TinaHeiligers
Copy link
Contributor

Progress update:
The saved object related api integration test failed again, these are fixed.

Not done and can probably be with follow-up PRs:

  • validation on creating a saved object
  • refactor for better dev ex

The remaining open questions are still open. We need to decide if/how to handle those at some point too.
Neither create or index accept retrying, meaning another client-side implementation. A straightforward approach could be to only retry the calls to elasticsearch. Not sure it would be enough though.

@pgayvallet
Copy link
Contributor Author

opened a dedicated issue to track the work on bulkUpdate: #165434

pgayvallet added a commit that referenced this issue Sep 1, 2023
Part of #152807

## Summary

Change the way the `update` API of the savedObjects repository works, to
stop using ES `update` and perform the update manually instead by
fetching the document, applying the update and then re-indexing the
whole document.

This is required for BWC and version cohabitation reasons, to allow us
converting the document to the last known version before applying the
update.

The retry on conflict behavior is manually performed too, by detecting
conflict errors during the `index` calls and performing the whole loop
(fetch->migrate->update->index) again.

Upserts are done in a similar way, by checking the document's existence
first, and then using the `create` API.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: pgayvallet <pierre.gayvallet@elastic.co>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Sep 1, 2023
Part of elastic#152807

## Summary

Change the way the `update` API of the savedObjects repository works, to
stop using ES `update` and perform the update manually instead by
fetching the document, applying the update and then re-indexing the
whole document.

This is required for BWC and version cohabitation reasons, to allow us
converting the document to the last known version before applying the
update.

The retry on conflict behavior is manually performed too, by detecting
conflict errors during the `index` calls and performing the whole loop
(fetch->migrate->update->index) again.

Upserts are done in a similar way, by checking the document's existence
first, and then using the `create` API.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: pgayvallet <pierre.gayvallet@elastic.co>
(cherry picked from commit 7279296)
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Sep 1, 2023
Part of elastic#152807

## Summary

Change the way the `update` API of the savedObjects repository works, to
stop using ES `update` and perform the update manually instead by
fetching the document, applying the update and then re-indexing the
whole document.

This is required for BWC and version cohabitation reasons, to allow us
converting the document to the last known version before applying the
update.

The retry on conflict behavior is manually performed too, by detecting
conflict errors during the `index` calls and performing the whole loop
(fetch->migrate->update->index) again.

Upserts are done in a similar way, by checking the document's existence
first, and then using the `create` API.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: pgayvallet <pierre.gayvallet@elastic.co>
(cherry picked from commit 7279296)
kibanamachine added a commit that referenced this issue Sep 1, 2023
# Backport

This will backport the following commits from `main` to `8.10`:
- [[SOR] implement downward compatible `update`
(#161822)](#161822)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Christiane (Tina)
Heiligers","email":"christiane.heiligers@elastic.co"},"sourceCommit":{"committedDate":"2023-09-01T08:27:17Z","message":"[SOR]
implement downward compatible `update` (#161822)\n\nPart of
#152807
Summary\r\n\r\nChange the way the `update` API of the savedObjects
repository works, to\r\nstop using ES `update` and perform the update
manually instead by\r\nfetching the document, applying the update and
then re-indexing the\r\nwhole document.\r\n\r\nThis is required for BWC
and version cohabitation reasons, to allow us\r\nconverting the document
to the last known version before applying the\r\nupdate.\r\n\r\nThe
retry on conflict behavior is manually performed too, by
detecting\r\nconflict errors during the `index` calls and performing the
whole loop\r\n(fetch->migrate->update->index) again.\r\n\r\nUpserts are
done in a similar way, by checking the document's existence\r\nfirst,
and then using the `create` API.\r\n\r\n---------\r\n\r\nCo-authored-by:
kibanamachine
<42973632+kibanamachine@users.noreply.github.com>\r\nCo-authored-by:
pgayvallet
<pierre.gayvallet@elastic.co>","sha":"727929683e036abe1d5b8c2b05d9c0e5a7539b14","branchLabelMapping":{"^v8.11.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Team:Core","Feature:Saved
Objects","release_note:skip","backport:all-open","v8.10.0","v8.11.0"],"number":161822,"url":"#161822
implement downward compatible `update` (#161822)\n\nPart of
#152807
Summary\r\n\r\nChange the way the `update` API of the savedObjects
repository works, to\r\nstop using ES `update` and perform the update
manually instead by\r\nfetching the document, applying the update and
then re-indexing the\r\nwhole document.\r\n\r\nThis is required for BWC
and version cohabitation reasons, to allow us\r\nconverting the document
to the last known version before applying the\r\nupdate.\r\n\r\nThe
retry on conflict behavior is manually performed too, by
detecting\r\nconflict errors during the `index` calls and performing the
whole loop\r\n(fetch->migrate->update->index) again.\r\n\r\nUpserts are
done in a similar way, by checking the document's existence\r\nfirst,
and then using the `create` API.\r\n\r\n---------\r\n\r\nCo-authored-by:
kibanamachine
<42973632+kibanamachine@users.noreply.github.com>\r\nCo-authored-by:
pgayvallet
<pierre.gayvallet@elastic.co>","sha":"727929683e036abe1d5b8c2b05d9c0e5a7539b14"}},"sourceBranch":"main","suggestedTargetBranches":["8.10"],"targetPullRequestStates":[{"branch":"8.10","label":"v8.10.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.11.0","labelRegex":"^v8.11.0$","isSourceBranch":true,"state":"MERGED","url":"#161822
implement downward compatible `update` (#161822)\n\nPart of
#152807
Summary\r\n\r\nChange the way the `update` API of the savedObjects
repository works, to\r\nstop using ES `update` and perform the update
manually instead by\r\nfetching the document, applying the update and
then re-indexing the\r\nwhole document.\r\n\r\nThis is required for BWC
and version cohabitation reasons, to allow us\r\nconverting the document
to the last known version before applying the\r\nupdate.\r\n\r\nThe
retry on conflict behavior is manually performed too, by
detecting\r\nconflict errors during the `index` calls and performing the
whole loop\r\n(fetch->migrate->update->index) again.\r\n\r\nUpserts are
done in a similar way, by checking the document's existence\r\nfirst,
and then using the `create` API.\r\n\r\n---------\r\n\r\nCo-authored-by:
kibanamachine
<42973632+kibanamachine@users.noreply.github.com>\r\nCo-authored-by:
pgayvallet
<pierre.gayvallet@elastic.co>","sha":"727929683e036abe1d5b8c2b05d9c0e5a7539b14"}}]}]
BACKPORT-->

Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
bryce-b pushed a commit to bryce-b/kibana that referenced this issue Sep 19, 2023
Part of elastic#152807

## Summary

Change the way the `update` API of the savedObjects repository works, to
stop using ES `update` and perform the update manually instead by
fetching the document, applying the update and then re-indexing the
whole document.

This is required for BWC and version cohabitation reasons, to allow us
converting the document to the last known version before applying the
update.

The retry on conflict behavior is manually performed too, by detecting
conflict errors during the `index` calls and performing the whole loop
(fetch->migrate->update->index) again.

Upserts are done in a similar way, by checking the document's existence
first, and then using the `create` API.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: pgayvallet <pierre.gayvallet@elastic.co>
rudolf added a commit that referenced this issue Sep 26, 2023
## Summary

We've fixed the update limitation
(#152807), so now it only
applies to bulkUpdates #165434

### Checklist

Delete any items that are not applicable to this PR.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive
layout. (You can test this [in your
browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser
compatibility](https://www.elastic.co/support/matrix#matrix_browsers)


### Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to
identify risks that should be tested prior to the change/feature
release.

When forming the risk matrix, consider some of the following examples
and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |

|---------------------------|-------------|----------|-------------------------|
| Multiple Spaces&mdash;unexpected behavior in non-default Kibana Space.
| Low | High | Integration tests will verify that all features are still
supported in non-default Kibana Space and when user switches between
spaces. |
| Multiple nodes&mdash;Elasticsearch polling might have race conditions
when multiple Kibana nodes are polling for the same tasks. | High | Low
| Tasks are idempotent, so executing them multiple times will not result
in logical error, but will degrade performance. To test for this case we
add plenty of unit tests around this logic and document manual testing
procedure. |
| Code should gracefully handle cases when feature X or plugin Y are
disabled. | Medium | High | Unit tests will verify that any feature flag
or plugin combination still results in our service operational. |
| [See more potential risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) |


### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
@TinaHeiligers
Copy link
Contributor

Closed by #165434

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Epic:ZDTmigrations Zero downtime migrations Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

No branches or pull requests

3 participants