Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Use NATS as a message bus #246

Merged
merged 12 commits into from
Dec 20, 2016
Merged

Use NATS as a message bus #246

merged 12 commits into from
Dec 20, 2016

Conversation

squaremo
Copy link
Member

In short: this provides an implementation of MessageBus which sends requests over NATS.

What's the point? If we are serving daemon connections from more than one pod (i.e., if we scale the fluxsvc deployment to > 1) then the pod that answers a client request may be different to the pod that is serving the daemon's connection. Using a message bus (i.e., NATS) connects them up.

It may be possible to make the net/rpc abstractions work with NATS (though I didn't find a library for it). However, we have a requirement which is difficult to reconcile with net/rpc: each request must be addressed and routed to a specific daemon. For that reason I implemented a simple scheme using NATS operations (Subscribe, Request) directly.

@squaremo squaremo changed the title Use nats messagebus WIP Use nats messagebus Nov 21, 2016
Copy link
Contributor

@paulbellamy paulbellamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall seems reasonable.

return
}
}
}()

This comment was marked as abuse.

@squaremo squaremo changed the title WIP Use nats messagebus Use nats messagebus Nov 21, 2016
return err.Error()
}
return ""
}

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

}

// Subscribe registers a remote platform.Platform implementation as
// representing a particular instance ID, blocking indefinitely.

This comment was marked as abuse.

This comment was marked as abuse.

Error string
}

type ping struct{}

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@squaremo squaremo changed the title Use nats messagebus WIP Use nats messagebus Nov 30, 2016
@squaremo
Copy link
Member Author

squaremo commented Nov 30, 2016

As it stands, this will have problems with long-running regrades, since they take an unbounded amount of time, and the timeouts here will look like failures.

So it will have to wait until we sort regrades out (#256).

@squaremo squaremo force-pushed the use-nats-messagebus branch 2 times, most recently from dec4396 to ee2e50b Compare December 12, 2016 12:23
@squaremo
Copy link
Member Author

squaremo commented Dec 12, 2016

  • Increase or remove timeout on Regrade, or otherwise figure out what to do with them

@squaremo squaremo changed the title WIP Use nats messagebus Use NATS as a message bus Dec 12, 2016
@squaremo
Copy link
Member Author

Deployment notes

This can be deployed without changing config (it will default to using the standalone bus). To use NATS:

  • deploy NATS (a minimal configuration is fine to start with)
  • add a --nats-url argument to fluxsvc's config
  • bump up the replicas!

Copy link
Contributor

@paulbellamy paulbellamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple questions...


// requester just collect the things you need to make a request
// together
type requester struct {

This comment was marked as abuse.

This comment was marked as abuse.

// platform.FatalError returned when processing requests will result
// in the platform being deregistered, with the error put on the
// channel `done`.
func (n *NATS) Subscribe(instID flux.InstanceID, remote platform.Platform, done chan<- error) {

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@paulbellamy
Copy link
Contributor

Fixes #216

The expected behaviour of Subscribe is that any errors will cause it
to exit (as well as being passed back to the caller), so do that.

There's no need to use a subscription for each method, when we have
wildcards. We have to switch on the method and do our own decoding,
but we had to do a (type) switch anyway. Doing it this way makes
tidying up easier.
.. Use the method `Ping` (and `ping{}` and `".Platform.Ping"`) instead of Presence
.. and rewrite Subscribe
.. and use the platform.MockPlatform in tests
.. and make the tests work otherwise
.. that is, call the subscribed platform's Ping.
`errorAsString` and `stringAsError` are better tin labels.
We now distinguish between application errors and "fatal" errors, that
is those indicating the connection has failed. Only the latter must
tear down the subscription.
The Regrade method can take an arbitrary amount of time, since it
depends on k8s completing a rolling upgrade. While we stick to the
current daemon-service protocol, this means we can't fail a regrade
after 5 seconds -- most of them will take longer than that.

So, set it to twenty minutes, because why not. The risk that comes
with this is that a particular call *should* time out after five
seconds, because the platform has actually gone away and will never
answer. However, since Regrade comes after other calls (on short
timeouts) to the platform, there's a fairly small window for this to
happen.
This is done simply by broadcasting a "kick" message to the instance
topic. A unique ID is used so that a subscription doesn't kick itself.

(This involved moving GUID (or pseudo-GUID) generation to its own
package, since I wanted guids for subscriptions too.)
This adds a metric for the total kicks, which may be helpful in
detecting flip-flopping between multiple daemons using the same
instance token.
Copy link
Contributor

@paulbellamy paulbellamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🌮

@@ -0,0 +1,190 @@
package nats

This comment was marked as abuse.

This comment was marked as abuse.

@squaremo
Copy link
Member Author

Going to wait until a weekday to merge ..

@squaremo squaremo merged commit d2d85f2 into master Dec 20, 2016
@paulbellamy paulbellamy deleted the use-nats-messagebus branch December 20, 2016 09:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants