Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

watchman & fb-watchman - doc error, impl error, misunderstanding? #564

Open
dcombslinkedin opened this issue Jan 16, 2018 · 13 comments
Open

Comments

@dcombslinkedin
Copy link

In fb-watchman’s README.md is the comment “change notifications are relative to the project root”. It appears this is for both watch and watch-project. However, when I run tests where I do a ‘watch-project’ on a dir 2 levels below the project root (what we seem to be calling the “watch root”), then change a file below that somewhere, the value returned is relative to the watch root, not the project root.

For example, I have ember project test123, in which the directory 'app' exists. If I do a 'watch-project' on 'app', then change the file app/styles/app.css, the change notification I get back is for file "styles/app.css", not "app/styles/app.css".

Is the documentation wrong, or is the implementation wrong? Or am I misunderstanding something?

@sunshowers
Copy link
Contributor

Thanks for the bug report!

Could you provide the JSON output of running watch-project on the app subdirectory?

@wez
Copy link
Contributor

wez commented Jan 16, 2018

Please also include the watchman query that is being run? It sounds like the relative root parameter may be missing?

@wez
Copy link
Contributor

wez commented Jan 16, 2018

(or the opposite; maybe you don't want it to have a relative root set in that case?)

@dcombslinkedin
Copy link
Author

Thanks! A couple questions based on your questions (hope this isn't WAY too much):

  • Which query are we talking about - the request to watch-project, or the subscribe? Right now the 'amasad/sane' code does the following:
    o issues a capabilityCheck (optional relative_root and wildmatch),
    o on return, if relative_root is supported, calls 'watch-project' on the watch root (not project root). if not, issues 'watch' on same dir.
    o on return (assuming w-p was called) stores the project root and relative path to watch root
    (i.e. the two together are the original watch root).
    o on return, Issues a 'clock' command to get the latest time
    o on return, computes options (including setting options.relative_root if it applies, based on
    what was stored 2 steps above.) When I went through the docs, relative_root wasn't mentioned as an option for 'subscribe'. I see it as an option for 'query'. I assumed (apparently incorrectly, that this was an error and removed that line. That's probably my issue.)
    o once options are computed, the original code issues a 'subscribe' against the project root
    (because of storing the project and relative root above). I had changed this to subscribe against the watch root, because my reading of the docs indicated that you didn't explicitly have to subscribe against the project root--it would happen automatically that a single watcher would be created internally for multiple watch roots.
  • If 'watch-project' was originally used, and provided the watch root, why does 'relative_path' need to be provided for 'subscribe' - isn't that implicit in using 'watch-project' in the first place?
  • Should 'relative_root' be added to the 'subscribe' doc?

@dcombslinkedin
Copy link
Author

Sid, you asked about the 'watch-project' call. The input was
client.command(['watch-project', '/Users/dcombs/dev/test123/app'], cb).
The output was {
relative_path: "app",
version: "4.9.0",
watch: "/Users/dcombs/dev/test123",
watcher: "fsevents"
}
so the watch + relative = original watch root.

@dcombslinkedin
Copy link
Author

Maybe a separate question that might clear some of this up: what's the difference between 'watch' and 'subscribe'? The first seems to be 'enable the watching mechanism' and the second seems to be 'tell me about the changes that you see with what happened while watching'. Is that right? (I'm trying to understand why both are needed - doesn't saying "watch" or "watch project" imply that you are interested in being told about them? Sorry for my confusion on this! If there's a better place to ask, I can take my questions there.

@wez
Copy link
Contributor

wez commented Jan 17, 2018

re: terminology: The watch command is an instruction to watchman to watch a portion of the filesystem. It creates an object that we refer to as a root internally, but that our command UI tends to refer to as a watch. Everything inside watchman is relative to one of these root objects. You need to have established a watch on a root in order to be able to perform query or subscribe commands. subscribe is used to ask watchman to delivery realtime updates about changes on a root. query is used to perform a one-off immediate query.

A project is what we use to refer to a sub-tree of a root. We desire the root (and thus all the operating system watch resources) to be aligned to the source control repository boundary so that we can avoid creating overlapping watches. However, this is a little at odds with the interface; clients want to think in terms of querying sub-trees so the watch-project command is used to establish the root in alignment with the repo boundary and give you an offset that you can then use in your queries to make them operate on the sub-tree. (This project aspect of our interface is a bit clunky and is something that I'd like to see evolve to be easier to consume in the future)

So in our model/terminology the watch should be the outermost thing and there may be some number of projects contained within it.

Are you saying that sane wasn't behaving so you tried tweaking it, or that you tried tweaking it and then hit the error you're reporting in this issue?

@dcombslinkedin
Copy link
Author

Wez, thanks for the explanation! That's very helpful.

What I'm doing is based on a request from Stef Penner (I'm also at LinkedIn, working on a team concerned with developer productivity). Our primary Ember application is large, and uses ember-cli and broccoli to build/serve it (this is all in a dev environment). Ultimately when served, everything is under a single root directory with a .watchmanconfig in that directory. Watchman is being used to watch for source changes to cause rebuild+serve. Broccoli ends up creating a ton of 'sane' instances (WatchmanWatcher) for various directories in the tree, and each of those is creating a watchman.client. So we're ending up with hundreds of connections from our process to watchman, ultimately all coming from under the same root dir (at least that's how it appears from my watching the log). I was trying to create a 'WatchmanClient' in the 'sane' project that would wrap a singleton watchman.Client, and have all the WatchmanWatcher instances use that, as opposed to doing a single 'watch-project' and a lot of subscribes (per discussion with Stef). In looking at the WatchmanWatcher code, it never seems to do anything that is outside of the original directory root that was passed to it (not the project root, which for us is the repo root dir, but what I called the watch root). So it appeared that since watch-project returns the "project" root + offset to the requested root dir, I could remove references to the project root entirely from the WW code. That's where I lost putting the 'relative_root' attribute on the subscribe options.

I'm still unclear - how would I know that 'relative_root' should be used in the subscribe, since it's not in the 'subscribe' description? I had assumed (apparently wrongly) that 'watch-project' would then imply that any subscribes done after that would be from the project root. That's my confusion. It's actually preferable to me that they are from the 'watch root' for each subscribe, since we don't appear to need the project root info at all in responding to the subscription messages.

Sorry this is so long! And again, I do appreciate the help. Suggestions are definitely encouraged!

@dcombslinkedin
Copy link
Author

Oh, one last thing: the WatchmanWatcher code never actually does a 'query' command. It just does watch or watch-project, relevant stuff to get things set up, then a 'subscribe' and listens for file change/add/delete events.

Also, my above comment about watch-project should read "I had assumed (apparently wrongly) that 'watch-project' would then imply that any subscribes done after that would return the offset from the project root as the affected file (rather than offset from the 'watch root'). That's my confusion."

Finally, just FYI: the way I've thought of the organization is that the 'project' (i.e. repo) root is highest in the tree, and that the 'roots' (what I called 'watch roots') are below that. It sounds like that's backwards to what you're using. Right? If so, then we're establishing a single 'root' using 'watch-project', and each WW corresponds to what you call a project underneath that root.

@wez
Copy link
Contributor

wez commented Jan 18, 2018

Thanks for the info; that helps set context and clarify what's going on. Yeah, having one connection per subscription is overkill!

The watchman protocol is pretty stateless; none of the commands implicitly impact how subsequent commands are interpreted. There's a little bit of magic regarding the encoding of the transport, but semantically, commands you executed earlier don't implicitly change how we parse subsequent commands. Side effects of previous commands impact whether a subsequent command will succeed, but they won't change how we plan to execute the command. With that in mind, using watch-project won't implicitly change subsequent query or subscribe parsing to include the relative_root parameter.

Re: knowing about relative_root on subscribe, I think its fair to call that a documentation bug. It certainly could be clearer! The nodejs example we have in our docs shows how this is used in the canonical example for nodejs using our client directly:
https://facebook.github.io/watchman/docs/nodejs.html#subscribing-to-changes

What I'd recommend for your use case, ignoring how sane does things for the moment and focusing purely on the watchman interaction, is:

  • Each time you want to subscribe to a directory, perform watch-project to obtain the root and relative_root information.
  • Generate a subscription name that you can then use to disambiguate the watchman subscription responses. The name is arbitrary and watchman doesn't care how it is formatted, so you could get creative and encode information in the name, or just generate a name like "subscription1" that you use to key into a local hash to lookup the information that you care about. The important thing is that the subscription name should be unique among the subscriptions you establish on that same watchman connection.
  • Generate your subscribe command passing in the name, root and relative_root from above.

When you get a subscription notification from watchman it will be tagged with the name (eg: it will have its subscription field set to your chosen name) and you can use that to route and process the results.

If you'd prefer that watchman gives you all your results relative to the root (eg: the highest level in the tree) rather than the project (the sub-tree), you can omit the relative_root parameter from the subscription, but you will want to include a query term to filter the results to that subdir. In other words, something like this:

expression = /* some query expression */;
relative_root = /* "relative_path" value from `watch-project` result */;
name = "sub1";
root = "/some/path";

// Using relative_root.  Subscription results will be relative to root + relative_path
subscribe = ["subscribe", root, name, {
    expression: expression,
    relative_root: relative_root,
}];

// Alternatively: not using relative_root (results will be relative root)
subscribe = ["subscribe", root, name, {
    expression: ["allof", 
                          expression,
                         ["dirname", relative_root]]
}];

FWIW, I'm not sure that you can use sane without using relative_root as sane is implemented today.

@dcombslinkedin
Copy link
Author

I'd like to ask something more complicated, but for now - if I'm really interested for a single subscribe in the data under some directory below the 'watch-project' root, can I just subscribe to that by setting the second item in the subscribe array to that dir, and leaving out the relative_root param? Simple example hierarchy:

/Users/dcombs/dev/test123 .  (my project 'test123', containing a .watchmanconfig)
    app/
        styles/
          app.css

Let's say broccoli-sane-watcher says it wants to put a sane WatchmanWatcher on 'styles'.
If I do a 'watch-project' on the styles dir, I will get a response of form
{watch: "/Users/dcombs/dev/test123", relative_path: "app/styles"} Note that that combination is the full path to the "root" directory I want to watch. Great.
If I then do a command of the form ["subscribe", "/Users/dcombs/dev/test123/app/styles", options], it appears I don't need 'relative_root' to be in the options. The 'sane' code seems to put ["subscribe", "/Users/dcombs/dev/test123", options] with specifying relative_root: "app/styles" instead.

It appears from my tests that both will return a "subscription" result with "app.css" as the path if that file changes, which is relative to the styles dir. Is there a reason to use relative_path in this scenario, or ever?

@wez
Copy link
Contributor

wez commented Jan 18, 2018

Aah, I understand. It's because of some legacy broken behavior that we fairly recently fixed up (e02729e) and that we don't see often due to some restrictive configuration at FB (briefly: we turn on a root_restrict option that ONLY allows watches to be made at the repo root and forces all integrators to use watch-project correctly).

The subscribe command had an undocumented and unexpected side effect of implicitly performing watch on the supplied dir. The commit I referenced removed that behavior but we haven't run a formal release since that commit landed.

So in your example above, your watch-project call will create a root for /Users/dcombs/dev/test123, but your subscribe call would then implicitly create an additional overlapping root for /Users/dcombs/dev/test123/app/styles, which then happens to work the way you expect but is doing a couple of bad things:

  • 2 Overlapping watches generates twice the notification traffic from the kernel, which increases the chances of an overflow and recrawl
  • 2 Overlapping watches generate competing IO requests as changes come in and need to be observed to build up the watchman view. This can increase latency and churn and impact overall performance in some pretty serious ways for larger directory trees.
  • The watches are separate and you cannot use the clock values from one to reason about the other. The notification streams are not in sync with each other.

So: best practice is to do what sane is already doing; subscribe to the root and specify the relative_root parameter by feeding in the result from watch-project!

@dcombslinkedin
Copy link
Author

Aha! That's it - that's exactly what I was looking for. Thank you!! I'll do things the way sane is doing them now, though might be able to clean things up slightly there (I need to look at it again). All the other points you made earlier (separate subscription IDs and stuff) are what I was already intending to do, so I think we're good. Thank you again, Wez!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants