proposal: Resolving the SVC resolution issue #4388

Open
oncilla opened this issue Sep 2, 2023 · 3 comments
Labels
i/proposal A new idea requiring additional input and discussion

Comments

@oncilla
Contributor

oncilla commented Sep 2, 2023

Status: Draft
Note that this proposal is not finished yet. There are some TODOs, and we probably want some PoCs to validate the claims.

Resolving the SVC resolution issue

Currently, the SCION control plane exchanges include a process called SVC
(service) resolution. Its purpose is to resolve the address of a SCION control
plane service. This is done by sending a packet with an SVC destination address
to the target AS. The response contains the address where the service is
reachable (technically a map from protocol to address, but so far we have only
ever supported one protocol).

TODO: Why SVC resolution is an issue

Uses

SVC resolution currently has the following two uses:

  1. Bootstrapping communication to remote AS

    Given we are at a very deep layer in the networking stack, we cannot rely on
    many other systems during bootstrapping. To establish SCION control plane
    connections, we need to talk to services whose addresses we do not know
    beforehand. The connection is stream oriented, thus we need packets to be
    consistently delivered to the same server. SVC resolution is a way to achieve
    this. However, currently, I don't see the use case.

  2. Bootstrapping SCION path from One-Hop paths

    Currently, during beaconing, One-Hop paths are used to send the SVC resolution
    requests. The responses are sent using a full SCION path, which allows the SVC
    resolution client to bootstrap a valid SCION dataplane path.

Both of these uses can be achieved in a different manner. Use 1 can be solved
in various ways; we do not necessarily need SVC resolution. Use 2 is not
obviously necessary. If we still require it, we could also decouple it from
SVC resolution.

Background

TODO: Add more background

  • How we got here?
  • Packet based RPC stack -> move to stream based approach
  • Limitations of quic-go

Proposal 1 - SVC resolution free communication

One key observation is that we are using QUIC in our SCION control plane
protocol. During the design of QUIC, a lot of thought went into connection
migration and resumption. Every QUIC packet carries a connection ID. Different
connections are de-multiplexed based on their connection ID, and not based on
the addresses of the packets. The connection ID can also be used by load
balancers to infer where the packet should be sent to. E.g., quic-lb
attempts to standardize a scheme to encode routing data in the connection ID.

We can leverage this fact in our SCION control plane too. In the world of QUIC
connections, resolving an address first to establish the connection is
unnecessary. As long as the packets get routed to the same server, it will
manage to identify the connection by the connection ID. The destination address
is irrelevant. Thus, the client can simply establish a QUIC connection with an
SVC destination address. This has been proven to work with
#4387, which simply drops SVC
resolution when dialing gRPC connections for the SCION control plane. All
packets from the client contain an SVC address as the destination. The reply
packets contain the real server address.

:::{note}

In theory, the client could change the destination address after it has received
the first response. However, in the past this has been proven to be hard in
practice because various parts of the quic-go implementation have assumptions
that the destination address does not change. It is also not required, thus,
I would advocate for not doing it.

:::

Keeping Backwards Compatibility - De-multiplexing

Simply dropping SVC resolution is not an AS-internal change; the whole network
needs to adapt. Naively switching on this new behavior would lead to
interruptions, which is not acceptable given our production deployments.

However, there are two key observations here:

  • Other than dialing the QUIC connection, neither the client nor the server
    change at all.
  • SVC resolution requests can be distinguished from QUIC packets. The UDP
    payload of an SVC resolution request is empty (0 bytes), whereas every QUIC
    packet carries some information, so its UDP payload is never 0 bytes.

To keep backwards compatibility, we can use these observations. We initialize a
packet connection that passes all non-empty payload packets further up the stack
to the QUIC server, and treats all empty payload packets as SVC resolution
requests. Luckily, we have implemented something like this in the past, which
can be used as inspiration:
svc.ResolverPacketDispatcher.
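
A minimal sketch of the de-multiplexing idea, written against the standard library `net.PacketConn` rather than the actual SCION connection types. The names `svcDemuxConn`, `isSVCResolutionRequest`, and the pre-serialized `reply` field are assumptions made for illustration; this is not the code of `svc.ResolverPacketDispatcher`.

```go
package main

import "net"

// isSVCResolutionRequest applies the rule from the proposal: an SVC
// resolution request has an empty UDP payload, a QUIC packet never does.
func isSVCResolutionRequest(payload []byte) bool {
	return len(payload) == 0
}

// svcDemuxConn wraps a packet connection. Empty datagrams are answered as
// SVC resolution requests; everything else is handed to the caller, which
// would typically be the QUIC server.
type svcDemuxConn struct {
	net.PacketConn
	// reply is a hypothetical, already serialized SVC resolution response.
	reply []byte
}

// ReadFrom only returns non-empty datagrams. Empty ones are answered
// directly and never reach the caller.
func (c *svcDemuxConn) ReadFrom(p []byte) (int, net.Addr, error) {
	for {
		n, addr, err := c.PacketConn.ReadFrom(p)
		if err != nil {
			return n, addr, err
		}
		if !isSVCResolutionRequest(p[:n]) {
			return n, addr, nil
		}
		// Best-effort reply; a lost response is covered by the client's
		// own retry logic, so the error is deliberately ignored here.
		_, _ = c.PacketConn.WriteTo(c.reply, addr)
	}
}
```

The wrapper would be handed to the QUIC listener in place of the raw socket, so that both kinds of traffic share the single UDP/SCION socket.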

With this change, the control server will open a single UDP/SCION socket and
handle SVC resolution and regular QUIC connections on the same socket. This allows
a two-phase rollout plan: first upgrade the whole network with the server side
changes, then enable the client side. We could even do a "happy eyeballs" approach
and try both at the same time.

Supporting Multiple SVC Destinations

In current deployments, there is usually only one control/discovery service
reachable from any given border router. However, in the future, we might want to
support setups where multiple instances can be reached via a border router. This
is still possible. In such a setup, the connection ID will encode the target
instance (e.g., with the quic-lb scheme). This will allow for consistent routing
across the different QUIC packets of the same connection. Given we do not have
such a use case yet, we do not need to handle it right now. Implementations may
vary, but they are AS-internal implementation details. Every AS can decide how
to do this without affecting any other AS in the network.
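
To illustrate how connection-ID-based routing could work inside an AS, here is a minimal sketch. It reads the destination connection ID using the version-independent QUIC header layout (RFC 8999) and maps it to an instance. The mapping (first connection ID byte modulo the number of instances) is a deliberately simplified stand-in and not the quic-lb encoding; `shortHeaderCIDLen`, `destConnID`, and `pickInstance` are hypothetical names.

```go
package main

import "errors"

// shortHeaderCIDLen is an assumption: short headers do not carry the
// connection ID length, so servers and router must agree on it.
const shortHeaderCIDLen = 8

var errBadPacket = errors.New("packet too short to contain a connection ID")

// destConnID extracts the destination connection ID from a QUIC packet.
func destConnID(pkt []byte) ([]byte, error) {
	if len(pkt) == 0 {
		return nil, errBadPacket
	}
	if pkt[0]&0x80 != 0 {
		// Long header: flags (1) | version (4) | DCID length (1) | DCID.
		if len(pkt) < 6 {
			return nil, errBadPacket
		}
		n := int(pkt[5])
		if len(pkt) < 6+n {
			return nil, errBadPacket
		}
		return pkt[6 : 6+n], nil
	}
	// Short header: flags (1) | DCID of the agreed length.
	if len(pkt) < 1+shortHeaderCIDLen {
		return nil, errBadPacket
	}
	return pkt[1 : 1+shortHeaderCIDLen], nil
}

// pickInstance maps a connection ID to one of n service instances using
// the first byte. This is an illustrative scheme only.
func pickInstance(cid []byte, n int) (int, error) {
	if len(cid) == 0 || n <= 0 {
		return 0, errors.New("cannot route: empty connection ID or no instances")
	}
	return int(cid[0]) % n, nil
}
```

With such a scheme, the client's randomly chosen initial connection ID selects an instance, and that instance would have to issue connection IDs that map back to itself so that later packets keep arriving at the same place.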

Drawbacks

This proposal relies on the fact that we are using QUIC. It is crucial that we
have connection IDs, such that connections can be identified and routed
consistently. This is a slight layer violation.

Alternatives Considered

TODO

@oncilla oncilla added the i/proposal A new idea requiring additional input and discussion label Sep 2, 2023
@jiceatscion
Contributor

jiceatscion commented Sep 12, 2023

Thanks for the write-up. I do not quite understand how proposal 2 addresses the titular "Resolving the SVC resolution issue". Aren't the two topics distinct? Give or take one enabling the other... maybe?

Regarding your first proposal.

  • Is the fact that we use QUIC in any way material to your proposed approach? It seems that you're just using QUIC as an inspiration here. Did I miss something?

Otherwise, I agree that the number of moving parts involved in resolving exactly one service is ludicrous. So, I'm all for removing some of the redundancy. From what I understand, there are currently two layers of indirection (even 3 if counting the dispatcher):

  • The router maps the SVC number to a previously registered address. (From a config file, so that could be considered redundant too).
  • What the router maps to isn't the service, but yet another resolution service, whose sole purpose is to respond with the real service's address.

Now, I think you're suggesting to remove the second resolution mechanism and let the router do the mapping for every packet. Given the simplicity of performing the mapping compared to everything else, that seems reasonable. On the other hand, I don't understand how the mapping is kept stable between packets in the presence of more than one instance of the service. It does not, right? The QUIC connection ID is enough for the server to [de]multiplex its clients, but there still has to be a single server. If we want to solve that later, it might not actually be possible without a visible inter-AS protocol change.

So, maybe we should consider taking a slightly higher road? How about actually solving the client-side destination-address update? Is that outright infeasible? Quic-LB looks a bit Rube-Goldbergesque and maybe was not designed to solve that problem. It is meant to be handled by load balancers two layers above the router, so I am not sure we can leverage it.

I have more than one reason to consider this... I'd like to see the day when ordinary, unmodified applications can use SCION without relying on a gateway. There are many necessary conditions, but one of them is that TCP connections, or at least UDP connections, can map to some underlying SCION protocol. Because the control service is an application, if there's ever more than one instance, we'd have to be able to support anycast properly. Therefore QUIC, even with its connection ID, isn't enough unless the client side updates its destination address.

@matzf
Contributor

matzf commented Oct 3, 2023

Regarding Proposal 1, Supporting Multiple SVC Destinations:

This proposal relies on the fact that we are using QUIC. It is crucial that we have connection IDs, such that connections can be identified and routed consistently. This is a slight layer violation.

As a simple, layer-violation-free and transport-agnostic alternative, a stateful load balancer can use the tuple (SourceISD, SrcAS, SrcHostAddr, FlowID) as connection identifier to consistently route packets to an anycast destination. (I'm aware that the flow ID is currently not initialized appropriately in the snet library, but this is something that we should be able to fix.)
This is more clumsy than using the QUIC connection IDs (because state), but it could be a fallback option for the case that we'd move the RPCs away from QUIC to something that does not make this easy.
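
A minimal sketch of such a stateful flow table, to make the trade-off concrete. The field types in `flowKey` are simplified assumptions, the round-robin assignment is an arbitrary choice, and the names `flowTable`, `newFlowTable`, and `route` are hypothetical. Entries never expire here, which a real load balancer would have to handle.

```go
package main

import (
	"errors"
	"sync"
)

// flowKey identifies a flow by the tuple named above. The field types are
// simplified for illustration and do not mirror the snet types.
type flowKey struct {
	SrcISD  uint16
	SrcAS   uint64
	SrcHost string
	FlowID  uint32
}

// flowTable pins each flow to the backend that was chosen for its first
// packet. This is the state that the connection ID approach avoids.
type flowTable struct {
	mu       sync.Mutex
	backends []string
	flows    map[flowKey]string
	next     int
}

func newFlowTable(backends []string) (*flowTable, error) {
	if len(backends) == 0 {
		return nil, errors.New("at least one backend is required")
	}
	return &flowTable{
		backends: backends,
		flows:    make(map[flowKey]string),
	}, nil
}

// route returns the backend for a flow, assigning new flows round-robin.
func (t *flowTable) route(k flowKey) string {
	t.mu.Lock()
	defer t.mu.Unlock()
	if b, ok := t.flows[k]; ok {
		return b
	}
	b := t.backends[t.next%len(t.backends)]
	t.next++
	t.flows[k] = b
	return b
}
```

Every packet of a flow then goes through `route` with the key built from its SCION header, so packets with the same tuple keep reaching the same instance regardless of the transport on top.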

@oncilla
Contributor Author

oncilla commented Nov 3, 2023

Alternative Proposal

(copying from slack so that we do not lose it: https://scionproto.slack.com/archives/C8ADA9CEP/p1696322961523659)

A while back I wrote a proposal to remove the SVC resolution round trip. (Still need to address the comments on that proposal)
The proposal made the assumption that we still want to have SVC addresses. The only real use-case nowadays we still have is for bootstrapping initial communication to an AS that is not our direct neighbor.
E.g., when you want to send a segment request to a remote AS.
For direct neighbors, we technically do not need the SVC resolution, because they can agree on the address out-of-band. They need to agree on information out-of-band anyway (e.g., interface ID).
Now, going from a service description to a layer 3 address is a solved problem. It is called DNS.
(A bit tongue in cheek, but essentially true)
Sadly, we cannot rely completely on DNS because of the circular dependency it would create.
I.e., to reach the DNS server you already need a path, but to get a path, you need to know the service address, which you would get from DNS.
If the only "real" use case for SVC addresses is the bootstrapping of communication, I think we can do away with it altogether.
We already have a mechanism to distribute information about an AS in the system. Our path beacons/segments.
The idea would be to include the address of a DNS (or DNS-like) service in the AS information.
The clients can then resolve the target using this address (and cache it if necessary)
For load balancing, the initial proposal relied on the fact that the transport is QUIC.
With this alternative proposal, I think load balancing can be done without requiring a certain transport.
(Although, I do not see us switching away from QUIC any time soon for our control plane traffic.)
At this point, I don't think we need the SVC address at all anymore, and can just drop it.
The beautiful thing is that even this change can be done in a backwards compatible way.
We can add the additional info to the AS info. New clients choose the new mechanism based on its existence in the AS info.
Old clients keep the same behavior. As soon as all clients have been upgraded, we can just turn off the old way.
Now, my question to you:

  • Do you think SVC addresses have a different purpose that I have missed?
  • Do you think SVC addresses are worth keeping?
  • Do you think it is worth fleshing out these thoughts in an alternative proposal?

(will be further fleshed out)
