-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall seems reasonable.
return | ||
} | ||
} | ||
}() |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
return err.Error() | ||
} | ||
return "" | ||
} |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
} | ||
|
||
// Subscribe registers a remote platform.Platform implementation as | ||
// representing a particular instance ID, blocking indefinitely. |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
Error string | ||
} | ||
|
||
type ping struct{} |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
As it stands, this will have problems with long-running regrades, since they take an unbounded amount of time, and the timeouts here will look like failures. So it will have to wait until we sort regrades out (#256). |
dec4396
to
ee2e50b
Compare
|
ee2e50b
to
7ec508c
Compare
Deployment notes This can be deployed without changing config (it will default to using the standalone bus). To use NATS:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple questions...
|
||
// requester just collect the things you need to make a request | ||
// together | ||
type requester struct { |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
// platform.FatalError returned when processing requests will result | ||
// in the platform being deregistered, with the error put on the | ||
// channel `done`. | ||
func (n *NATS) Subscribe(instID flux.InstanceID, remote platform.Platform, done chan<- error) { |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
Fixes #216 |
fc07f89
to
147e32c
Compare
The expected behaviour of Subscribe is that any errors will cause it to exit (as well as being passed back to the caller), so do that. There's no need to use a subscription for each method, when we have wildcards. We have to switch on the method and do our own decoding, but we had to do a (type) switch anyway. Doing it this way makes tidying up easier.
.. Use the method `Ping` (and `ping{}` and `".Platform.Ping"`) instead of Presence .. and rewrite Subscribe .. and use the platform.MockPlatform in tests .. and make the tests work otherwise
.. that is, call the subscribed platform's Ping.
`errorAsString` and `stringAsError` are better tin labels.
We now distinguish between application errors and "fatal" errors, that is those indicating the connection has failed. Only the latter must tear down the subscription.
The Regrade method can take an arbitrary amount of time, since it depends on k8s completing a rolling upgrade. While we stick to the current daemon-service protocol, this means we can't fail a regrade after 5 seconds -- most of them will take longer than that. So, set it to twenty minutes, because why not. The risk that comes with this is that a particular call *should* time out after five seconds, because the platform has actually gone away and will never answer. However, since Regrade comes after other calls (on short timeouts) to the platform, there's a fairly small window for this to happen.
This is done simply by broadcasting a "kick" message to the instance topic. A unique ID is used so that a subscription doesn't kick itself. (This involved moving GUID (or pseudo-GUID) generation to its own package, since I wanted guids for subscriptions too.)
This adds a metric for the total kicks, which may be helpful in detecting flip-flopping between multiple daemons using the same instance token.
147e32c
to
ada8781
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🌮
@@ -0,0 +1,190 @@ | |||
package nats |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
Going to wait until a weekday to merge .. |
In short: this provides an implementation of
MessageBus
which sends requests over NATS.What's the point? If we are serving daemon connections from more than one pod (i.e., if we scale the fluxsvc deployment to > 1) then the pod that answers a client request may be different to the pod that is serving the daemon's connection. Using a message bus (i.e., NATS) connects them up.
It may be possible to make the
net/rpc
abstractions work with NATS (though I didn't find a library for it). However, we have a requirement which is difficult to reconcile withnet/rpc
: each request must be addressed and routed to a specific daemon. For that reason I implemented a simple scheme using NATS operations (Subscribe, Request) directly.