Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eds: Adding eds caching support to grpc-mux #28273

Merged
merged 9 commits into from
Aug 2, 2023

Conversation

adisuissa
Copy link
Contributor

Commit Message: eds: Adding eds caching support to grpc-mux
Additional Description:

Continuation of PR #28079 (as part of the work for issue #26749).
Currently after an EDS-cluster update, Envoy waits for an EDS response. If a timeout occurs, the EDS-cluster will be used without endpoints.
This PR adds the use of caching into the GrpcMux. The GrpcMux object adds an EDS resource to the cache when it is received/updated, and removes it when there are no longer subscriptions (watchers).
A runtime flag is added to disable the use of the cache, and will be enabled in a future PR when ADS is used.

Next PR will plumb this into ADS, and add fetching of resources from the cache as part of the EdsClusterImpl.

The entire change can be looked here: adisuissa@f0b7ac8

Risk Level: Low - the disabled runtime flag should prevent the use of the cache in non-tests code.
Testing: Added unit tests.
Docs Changes: N/A.
Release Notes: N/A (future PR).
Platform Specific Features: N/A.
Runtime guard: disabled by default: envoy_restart_features_use_eds_cache_for_ads

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@repokitteh-read-only
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #28273 was opened by adisuissa.

see: more, trace.

@repokitteh-read-only
Copy link

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

🐱

Caused by: #28273 was opened by adisuissa.

see: more, trace.

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa adisuissa marked this pull request as ready for review July 7, 2023 04:03
@adisuissa
Copy link
Contributor Author

Assigning @htuch for a high-level review.
/assign @htuch

@adisuissa
Copy link
Contributor Author

/retest

@htuch
Copy link
Member

htuch commented Jul 7, 2023

@adisuissa before diving in to review, can you remind me what the plan is for the scenario when we have a cluster being upgraded from non-TLS to TLS? Do we still have an opportunity to wait for the appropriate endpoints, or do we want to discourage the use of synchronization here?

@adisuissa
Copy link
Contributor Author

@adisuissa before diving in to review, can you remind me what the plan is for the scenario when we have a cluster being upgraded from non-TLS to TLS? Do we still have an opportunity to wait for the appropriate endpoints, or do we want to discourage the use of synchronization here?

Yeah, we still support the upgrade from non-TLS to TLS.

Envoy will wait for the EDS assignment, and only if it isn't received (timeout), it will be fetched from the cache.
The main benefit - not breaking anything, because previously an empty assignment will be set on timeout, making the cluster unusable.
The downside is that there is a delay between receiving the cluster update until that cluster is actually used (with the old endpoints). It is the same behavior as before, but may be a bit non-intuitive from operations point-of-view.

@alyssawilk
Copy link
Contributor

/wait on review comments

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM implementation modulo nits. Have you verified no regression in test coverage?

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa
Copy link
Contributor Author

LGTM implementation modulo nits. Have you verified no regression in test coverage?

That's a good comment. I've added more tests to cover the call to the cache getter (it was only covered by the integration test that was planned in the followup PR).
Also added some tests to the unified_mux implementation, to make it on par with the non-unified mux.

htuch
htuch previously approved these changes Jul 24, 2023
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch htuch enabled auto-merge (squash) July 24, 2023 16:30
@adisuissa
Copy link
Contributor Author

/retest

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@htuch htuch merged commit 6e7d486 into envoyproxy:main Aug 2, 2023
115 checks passed
michaelfinch added a commit to michaelfinch/envoy that referenced this pull request Aug 2, 2023
* main: (144 commits)
  eds: Adding eds caching support to grpc-mux (envoyproxy#28273)
  deps: Bump `build_bazel_rules_apple` -> 2.5.0 (envoyproxy#28747)
  deps: Bump `com_github_cyan4973_xxhash` -> 0.8.2 (envoyproxy#28745)
  deps: Bump `rules_rust` -> 0.26.0 (envoyproxy#28744)
  Revert "Golang: remove HTTP related elements from the golang network … (envoyproxy#28759)
  Golang: remove HTTP related elements from the golang network filter (envoyproxy#28743)
  build(deps): bump jaegertracing/all-in-one from `93ba12f` to `2e210e4` in /examples/shared/jaeger (envoyproxy#28740)
  build(deps): bump pyparsing from 3.0.9 to 3.1.1 in /mobile/docs (envoyproxy#28714)
  deps: Bump `com_github_bufbuild_buf` -> 1.25.0 (envoyproxy#28746)
  Reduce log spam of load shed points if not configured. (envoyproxy#28753)
  [contrib-siporoxy] fix SipTraTest.TraCompleteRetrieveRsp flakyness (envoyproxy#28660)
  Remove implicit deps on fmt_lib (envoyproxy#28662)
  UHV: harmonize check for %00 with legacy path normalization (envoyproxy#28727)
  build(deps): bump redis from `08a82d4` to `b0bdc1a` in /examples/redis (envoyproxy#28713)
  build(deps): bump otel/opentelemetry-collector from `6a76e13` to `dbf77b0` in /examples/opentelemetry (envoyproxy#28741)
  bazel: Switch `exec_tools` -> `tools` (envoyproxy#28729)
  correct some typos for thread_synchronizer.cc and outlier_detection_i… (envoyproxy#28707)
  Cleanup ActiveUdpListenerBase dispatcher call (envoyproxy#28697)
  cleanup: moving checkApiConfigSourceNames to unnamed namespace (envoyproxy#28701)
  ci: Pass through `--nocache_test_results` correctly on scheduled runs (envoyproxy#28690)
  ...
htuch pushed a commit that referenced this pull request Aug 4, 2023
…28842)

#28273 introduced the support of EDS caching in the gRPC muxes.

However, there's a bug when multiple SotW subscriptions are added to the same resource, and the resource is removed once the first subscription is deleted. This PR fixes the issue by removing the resource from the cache only after the last subscription is removed.

Risk Level: Low - code not used yet (already runtime guarded)
Testing: Added an appropriate unit test.

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
htuch pushed a commit that referenced this pull request Aug 8, 2023
Following #28273 this is a cleanup requested to pass c'tor parameters in a single struct.

Risk Level: low - refactor
Testing: N/A

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
htuch pushed a commit that referenced this pull request Mar 1, 2024
…32604)

Caching of EDS responses was added in #28273 and we have been using it internally.
This PR flips the runtime guard to be true by default.
This solves the issue of not receiving EDS assignments after receiving a cluster update (in a CDS response).

Risk Level: medium - may impact memory, but should not impact data-plane/config-plane behavior
Testing: updated in previous PRs
Docs Changes: updated in previous PRs
Release Notes: Added
Platform Specific Features: N/A

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants