Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[macOS] new hard link to file outside watched root not reported #858

Open
chrisd8088 opened this issue Sep 29, 2020 · 2 comments
Open

[macOS] new hard link to file outside watched root not reported #858

chrisd8088 opened this issue Sep 29, 2020 · 2 comments

Comments

@chrisd8088
Copy link

I realize this may not be the ideal place to report this issue, as it might be better addressed to Apple, but I thought I would describe it here first for the benefit of anyone else who comes across the problem.

On macOS Catalina, it appears that Watchman often does not report the creation within a watched directory of a hard link to an existing file outside the watched directory.

I'm currently testing on macOS Catalina 10.15.17 but saw the issue on prior Catalina versions as well, and in a CI context using Catalina, but notably did not see this issue in some CI jobs which used macOS Mojave. I've tried with both Watchman v4.9.0 and with a fresh build of Watchman master.

Here is a short shell script which demonstrates the issue; this reproduces it pretty consistently for me:

#!/bin/bash
mkdir foo
WATCH_PATH="$PWD/foo"

printf "Setting watch:\n"
watchman watch "$WATCH_PATH"
CLOCK=$(watchman --no-pretty clock "$WATCH_PATH" | sed 's/.*"clock":"//' | sed 's/".*//')

touch bar
cd foo
printf "\nShould report no changes yet:\n"
watchman since "$WATCH_PATH" "$CLOCK"

printf "\nCreating hard link.\n"
ln ../bar

printf "\nMay report no changes, despite hard link creation:\n"
watchman since "$WATCH_PATH" "$CLOCK"

printf "\nCreating new file.\n"
touch baz

printf "\nShould report new file, but may not report hard link:\n"
watchman since "$WATCH_PATH" "$CLOCK"

printf "\nCurrent watched directory contents:\n"
ls

printf "\nStopping watch:\n"
watchman watch-del "$WATCH_PATH"
watchman shutdown-server

Here's the output on my system using Watchman v4.9.0, but the results are the same with a fresh build of Watchman from master and using watchman -S to set the watch with the built binary:

Setting watch:
{
    "version": "4.9.0",
    "watch": "/Users/chrisd8088/foo",
    "watcher": "fsevents"
}

Should report no changes yet:
{
    "version": "4.9.0",
    "is_fresh_instance": false,
    "clock": "c:1601347496:85726:1:3",
    "files": []
}

Creating hard link.

May report no changes, despite hard link creation:
{
    "version": "4.9.0",
    "is_fresh_instance": false,
    "clock": "c:1601347496:85726:1:5",
    "files": []
}

Creating new file.

Should report new file, but may not report hard link:
{
    "version": "4.9.0",
    "is_fresh_instance": false,
    "clock": "c:1601347496:85726:1:7",
    "files": [
        {
            "cclock": "c:1601347496:85726:1:6",
            "nlink": 1,
            "dev": 16777220,
            "ctime": 1601347496,
            "new": true,
            "mtime": 1601347496,
            "gid": 20,
            "mode": 33188,
            "size": 0,
            "oclock": "c:1601347496:85726:1:6",
            "ino": 37925764,
            "uid": 501,
            "exists": true,
            "name": "baz"
        }
    ]
}

Current watched directory contents:
bar	baz

Stopping watch:
{
    "version": "4.9.0",
    "watch-del": true,
    "root": "/Users/chrisd8088/foo"
}
{
    "version": "4.9.0",
    "shutdown-server": true
}

One interesting note is that dumping the corresponding data file under my home volume's .fseventsd directory (in this case, /System/Volumes/Data/.fseventsd/) using the handy fse_dump utility, I see two events recorded, one for the creation of the bar file outside the watched directory and one for the creation of the hard link inside the watched directory. Both events share a common node ID, which makes sense.

But looking at a Watchman log generated during a run of the above test with a parallel invocation of watchman --server-encoding=json --persistent log-level debug, there is no fsevents: got /Users/chrisd8088/foo/bar ... entry for the hard link's creation. An initial skim through the code in watcher/fsevents.cpp suggests to me that no events are being inappropriately ignored, and I didn't see any obvious missing registration for link-related events or anything like that.

So I'm a bit suspicious that possibly the FSEvents API is treating these as events on the link source path and not the link target path, and hence not delivering them to Watchman, which has only registered for event notifications on the watched path and its descendants. But I admit that is pure speculation on my part and I haven't dug any further into the issue than I've described above, and I'm also not a macOS expert or anything, either.

@wez
Copy link
Contributor

wez commented Sep 29, 2020

Thanks for digging in!
I wouldn't expect this to work reliably or perhaps at all.

Hardlinks are similar to symlinks in that they present as aliases for the same node in the filesystem. These aliases are problematic because there is no backwards edge from the node to all possible names. Those edges are incoming edges (name -> node). I'm not actually sure that any of the underlying VFS APIs would allow a reverse lookup to work; I think it's only possible to resolve node -> canonical name.

The fsevents watcher API is path based (because we're watching a sub-directory tree) so the thing doing the dispatching needs to decide on the path that is reported for a changed node. The most obvious path to pick is going to be the cheapest one to compute based on what you already know. For creation cases is the name of the newly created thing, because the source name was converted to the source inode and the hardlink creation callback in the kernel just has the new name and the source inode. And even if it wanted to resolve that source inode, it could resolve ambiguously if there were already 2+ hardlinks, or if the create is racing with a rename of the source name (TOCTOU).

With that in mind, it makes sense to me that creating a new link outside of a watched root doesn't get reported as a change to watchman inside the watched root.

What would it take to enable detecting this? It would require that watchman hook into every change event for the entire VFS and build a complete forward and reverse index for name<->inode for everything. That presents a huge memory, IO and time cost to build and maintain such an index; and that full filesystem examination would need to happen before watchman can respond to any queries. That isn't something that we're going to support--we already disallow attempting to watch / because it is too large.

@chrisd8088
Copy link
Author

chrisd8088 commented Sep 29, 2020

With that in mind, it makes sense to me that creating a new link outside of a watched root doesn't get reported as a change to watchman inside the watched root.

Just to clarify one detail, in case it got lost in my report: the problem is not about hard link creation outside of the watched directory, but inside it, so the reverse of what you described. In other words, creating a new link within the watched directory but which happens to point to an existing inode outside of that directory.

This works reliably for us using Watchman on Linux, where Watchman uses inotify(7). I believe it also used to work on macOS Mojave, but now does not on Catalina. Hence my suspicion that this may be a change in macOS itself, such that it is no longer reporting the events properly to Watchman.

What would it take to enable detecting this? It would require that watchman hook into every change event for the entire VFS and build a complete forward and reverse index for name<->inode for everything.

Yeah, I understand how inodes and hard links work, and we don't need this (i.e., we don't need a report on a new hard link somewhere outside the watched directory to an inode whose other linking path(s) are inside it; that would indeed be very hard to do, as you describe). But we would expect that a new link inside the watched directory -- regardless of where the existing link(s) to the source inode live -- be reported, since we have the path to new link and that path should match against the path to the watched directory.

Watchman does this reliably on Linux now with inotify events; the script I provided in the issue description always shows the new link when run on Linux. I'm fairly sure it also used to work on macOS Mojave because our Azure Dev Ops CI jobs succeed, and those execute on Mojave-based runners. But when we started using Catalina-based macOS runners, we saw problems with a specific test, and that's what prompted me to look into the source of the problem. My hunch is that this is a possible bug in macOS's event handling, but I would appreciate someone with familiarity with FSEvents confirming that.


To provide a bit more context, the CI suite for microsoft/scalar is partly written in C#, and exercises Git when used with Watchman via Git's fsmonitor hook. One of those tests moves a file from outside the watched Git working tree inside it, and expects that git status will notice the new file. As it happens, on Unix (macOS/Linux) platforms, .NET Core implements its File.Move() method using link(2) followed by unlink(2) of the source, rather than a straight rename(2). This detail is what triggers the failure of the "file moved into watched repository" CI test, but only on macOS Catalina, because Watchman usually doesn't see or report the new link within the watched directory to Git.

Again, I think this may be a change of behaviour from Mojave to Catalina, and may not be due to any change or failing in Watchman itself. But as I'm more of a Linux than macOS person, I'd appreciate someone else double-checking that before possibly reporting it upstream to Apple, in case I'm missing something about Watchman's use of the FSEvents API. And I also wanted to document it here because others may stumble across the same difference between Watchman on Linux and Watchman on macOS.

chrisd8088 added a commit to chrisd8088/scalar that referenced this issue Sep 30, 2020
On macOS and Linux platforms, we use rename(2) directly when
moving files with the SystemIORunner's MoveFile() method, instead
of using .NET Core's File.Move() method.  The latter actually
implements most file moves using a combination of link(2)
followed by unlink(2) on the source file.

The consequence of using link() instead of rename() is that on
recent versions of macOS, when running with Watchman, file
creation events may not be delivered by Watchman to Git when a
new hard link within the Git working tree is created if that
link points to an existing inode outside of the watched tree.
This in turn causes the MoveFileFromOutsideRepoToInsideRepoAndAdd()
test in GitCommandsTests to fail intermittently.

We can work around the problem by utilizing rename() directly, which
also parallels the behaviour of SystemIORunner's RenameDirectory()
method, added in commit 0a517fa.

See also:

facebook/watchman#858
https://github.com/dotnet/runtime/blob/94515f7bd34aabaab5ba6a1316f4a881cbf2370c/src/libraries/System.IO.FileSystem/src/System/IO/FileSystem.Unix.cs#L142-L144
derrickstolee added a commit to microsoft/scalar that referenced this issue Oct 2, 2020
…Hub Actions CI

We add Watchman-enabled runs of the functional test suite on all GitHub Actions runners for all three platforms.

The code changes in this PR are primarily the result of needing to work around some inconsistent behaviour from Watchman when running on macOS Catalina, which is used by the Actions macOS runners.  As described in facebook/watchman#858, creating a hard link to an existing inode from inside the watched Git working tree may not be reported to Git as a new file if the inode's other, existing "source" path (as passed to `link(2)`) resides outside the watched directory.

This problem is tickled by our functional test suite because the [`MoveFileFromOutsideRepoToInsideRepoAndAdd()`](https://github.com/microsoft/scalar/blob/7a86016e9bb7a6c7560a87aba2c7c0739f896571/Scalar.FunctionalTests/Tests/GitCommands/GitCommandsTests.cs#L418-L449) test [relies](https://github.com/microsoft/scalar/blob/7a86016e9bb7a6c7560a87aba2c7c0739f896571/Scalar.FunctionalTests/FileSystemRunners/SystemIORunner.cs#L21) on the `System.IO.File.Move()` method, which on Unix/POSIX systems like macOS, actually [performs](https://github.com/dotnet/runtime/blob/94515f7bd34aabaab5ba6a1316f4a881cbf2370c/src/libraries/System.IO.FileSystem/src/System/IO/FileSystem.Unix.cs#L142-L144) a `link(2)` call rather than `rename(2)`.

While the underlying problem is outside of Scalar's control, we can work around it by using `rename(2)` directly to move files in the `SystemIORunner` test helper class.

This particular intermittent but persistent test failure also exposed another, minor issue in the `ValidateGitCommand()` utility method, which passed its actual and expected string values to the `LinesShouldMatch()` method in the reverse order from what was intended, so we swap them here.

Debugging these issues also turned up some minor typos in the error messages output from the `run clone` command when a repository can not be initialized, so those are fixed here also.

Finally, to enable the functional test suite to run in CI on macOS, we add to the Mac version of the `RunFunctionalTests.sh` script so that like the Linux and Windows versions it copies the Scalar binary into the functional test binary's build location so it can be located by the functional test program.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants