Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

precious-files.txt: new document proposing new precious file type #1627

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

newren
Copy link

@newren newren commented Dec 26, 2023

A couple months ago, we had another in-depth discussion of precious files[1]. As with previous times, multiple strategies were discussed (including multiple new ones), meaning we keep making the possible solution space wider and never nail down an agreed path. I also got the feeling we were potentially pigeonholing on a subset of the problem space, and I thought it'd be good to better enumerate what areas of Git are affected.

So, I went through the exercise of creating a design document to: (1) provide a specific design proposal and explore it, (2) cover at a high level the breadth of issues that an implementor needs to at least think about and which reviewers should be aware of in terms of readiness of a potential implementation, and (3) provide links to other discussions and alternative proposals for completeness.

I had some off-list discussions with Sebastian about this proposal, and he provided some helpful feedback. The idea at this point is that if folks agree with the general direction, that he is going to be implementing at least the first cut basic capability. I'll help review changes, but I'm mostly interested in avoiding unfortunate surprises.

So...does the proposed direction seem reasonable to folks?

[1] https://lore.kernel.org/git/79901E6C-9839-4AB2-9360-9EBCA1AAE549@icloud.com/

CC: Sebastian Thiel sebastian.thiel@icloud.com
CC: Josh Triplett josh@joshtriplett.org
cc: Elijah Newren newren@gmail.com
cc: Phillip Wood phillip.wood123@gmail.com

We have traditionally considered all ignored files to be expendable, but
users occasionally want ignored files that are not considered
expendable.  Add a design document covering how to split ignored files
into two types: 'trashable' (what all ignored files are currently
considered) and 'precious' (the new type of ignored file).

Helped-by: Sebastian Thiel <sebastian.thiel@icloud.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
@newren
Copy link
Author

newren commented Dec 27, 2023

/submit

Copy link

gitgitgadget bot commented Dec 27, 2023

Submitted as pull.1627.git.1703643931314.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1627/newren/precious-files-v1

To fetch this version to local tag pr-1627/newren/precious-files-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1627/newren/precious-files-v1

Copy link

gitgitgadget bot commented Dec 27, 2023

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> We have traditionally considered all ignored files to be expendable, but
> users occasionally want ignored files that are not considered
> expendable.  Add a design document covering how to split ignored files
> into two types: 'trashable' (what all ignored files are currently
> considered) and 'precious' (the new type of ignored file).

The proposed syntax is a bit different from what I personally prefer
(which is Phillip's [P14] or something like it), but I consider that
the more valuable parts of this document is about how various
commands ought to interact with precious paths, which shouldn't
change regardless of the syntax.

Thanks for putting this together.

Copy link

gitgitgadget bot commented Dec 27, 2023

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Dec 26, 2023 at 9:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > We have traditionally considered all ignored files to be expendable, but
> > users occasionally want ignored files that are not considered
> > expendable.  Add a design document covering how to split ignored files
> > into two types: 'trashable' (what all ignored files are currently
> > considered) and 'precious' (the new type of ignored file).
>
> The proposed syntax is a bit different from what I personally prefer
> (which is Phillip's [P14] or something like it), but I consider that
> the more valuable parts of this document is about how various
> commands ought to interact with precious paths, which shouldn't
> change regardless of the syntax.

I agree that syntax and command behavior are mostly separate issues,
but unfortunately they are not orthogonal.  In particular, syntax of
precious file specification is directly tied to fallback behavior for
older Git clients, and it might potentially affect backward
compatibility of non-cone-mode sparse-checkout syntax as well.

I think fallback behavior is of particular importance.  There are
precisely two choices in our design for how older Git versions can
treat precious files:
  * ignored-and-expendable
  * untracked-and-precious
If we pick syntax that causes older Git versions to treat precious
files as ignored-and-expendable, we risk deleting important files.
Alternatively, if we pick syntax that causes older Git versions to
treat precious files as untracked-and-precious, they won't be ignored
by e.g. git-status, and are easier to accidentally add with git-add.

I felt the "precious" bit was much more important than the "ignored"
bit of "precious" files, so I thought untracked-and-precious was a
better fallback.  However, to get that, we have to rule out lots of
the syntax proposals, such as Phillip's [P14].

Anyway, I'm open to alternative syntax, but we need to measure it
against the relevant criteria, which I believe are:
  (A) ease for users to understand, remember, and use
  (B) size of backward compatibility break with .gitignore syntax
  (C) appropriateness of implied fallback behavior for older Git
clients with precious files
  (D) room for additional extension in .gitignore files
  (E) potential affects on backward compatibility of non-cone
sparse-checkout syntax
We probably also need to agree on the relative importance of these
criteria; personally, I would probably order them from most important
to least as C, B, E, A, D.

Phillip's P14 is better for D, and perhaps a little better for B, but
I thought slightly worse for A, and much worse for C.  (I think
there's no significant relative difference for E between his proposed
syntax and mine.)

> Thanks for putting this together.

Copy link

gitgitgadget bot commented Dec 27, 2023

User Elijah Newren <newren@gmail.com> has been added to the cc: list.

Copy link

gitgitgadget bot commented Dec 27, 2023

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <newren@gmail.com> writes:

> There are
> precisely two choices in our design for how older Git versions can
> treat precious files:
>   * ignored-and-expendable
>   * untracked-and-precious
> If we pick syntax that causes older Git versions to treat precious
> files as ignored-and-expendable, we risk deleting important files.

Yes but not really.  I'd expect the adoption of precious feature and
the adoption of versions of Git that supports that feature will go
more or less hand in hand.  Projects that, for any reason, need to
keep their participants at pre-precious versions of Git would
naturally refrain from marking the "precious" paths in their "ignore"
mechanism before their participants are ready, so even if we chose
syntax that will make the precious ones mistaken as merely ignored,
the damage would be fairly small.

Copy link

gitgitgadget bot commented Jan 18, 2024

On the Git mailing list, Sebastian Thiel wrote (reply to this):

I thought it would be helpful to see the syntax being referred to here,
as first brought up by Phillip Wood:

#(keep)
/my-precious-file

The main benefit I see for it is that it's extensible, despite having
trouble imagining what such extension would be 10 years from now.
On the flip side, since it's already using a comment, people will
be even more inclined to document the reason for the preciousness
of the file.

# The kernel configuration, typically created by running a TUI program
#(keep)
.config

As a side-effect of the syntax, it's obvious this is an 'upgrade', with
perfect backwards compatibility as old git does the same as always.

I'd love to take first steps into the implementation, and if the above
should be the syntax to use, I'd be happy to submit a patch for parsing
it, along with initial support for precious files in `git clean` and
`git status`.

Does that sound like a reasonable next step?


On 27 Dec 2023, at 23:15, Junio C Hamano wrote:

> Elijah Newren <newren@gmail.com> writes:
>
>> There are
>> precisely two choices in our design for how older Git versions can
>> treat precious files:
>>   * ignored-and-expendable
>>   * untracked-and-precious
>> If we pick syntax that causes older Git versions to treat precious
>> files as ignored-and-expendable, we risk deleting important files.
>
> Yes but not really.  I'd expect the adoption of precious feature and
> the adoption of versions of Git that supports that feature will go
> more or less hand in hand.  Projects that, for any reason, need to
> keep their participants at pre-precious versions of Git would
> naturally refrain from marking the "precious" paths in their "ignore"
> mechanism before their participants are ready, so even if we chose
> syntax that will make the precious ones mistaken as merely ignored,
> the damage would be fairly small.

Copy link

gitgitgadget bot commented Jan 18, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Sebastian Thiel <sebastian.thiel@icloud.com> writes:

> #(keep)
> .config
>
> As a side-effect of the syntax, it's obvious this is an 'upgrade', with
> perfect backwards compatibility as old git does the same as always.

Yes but ...

The point Elijah makes is worth considering.  To existing versions
of git, having this entry for ".config" means that it is ignored
(i.e. "git add ." will not include it), but expendable (i.e. "git
clean" considers ".config" as a candidate for removal; "git checkout
other", if the "other" branch has it as a tracked path, will clobber
it).  Compared to the case where ".config" is not mentioned in
".gitignore", where it may be added by use of "git add .", it won't
be clobbered by "git clean".

So this syntax having "perfect backward compatibility" is not quite
true.  It does have downsides when used by existing versions of Git.

If we use Elijah's syntax to say

	$.config

then the entry to existing versions of git is a no-op wrt a file
named ".config".  It simply does not match the pattern, so an
accidental "git add ." *will* add ".config" to the index, while "git
clean" may not touch it, simply because it is treated as "untracked
and precious".  In other words, its downside is the same as not
marking the path ".config" in any way in ".gitignore", as far as
existing versions of Git are concerned.

We of course discount the possibility that people keep a file whose
name literally is dollar-dot followed by "config" and older versions
of Git would start treating them as ignored-and-expendable.  While
it *is* an additional downside compared to Phillip's "#(keep)"
approach, I do not think that particular downside is worth worrying
about.  Yet another downside compared to Phillip's is that it is
less extensible.  Over the years, however, the ignored-but-precious
is the only one we heard from users that lack of which is hurting
them, so lack of extensibility may not be too huge a deal.

For projects that are currently listing these files in ".gitignore"
as "ignored-and-expendable" already and want to categorize them as
"ignored-and-precious" by changing ".config" to "$.config" (or
adding "#(keep)" comment before the existing entry), the
pros-and-cons equation may differ.  Their current participants are
protected from accidentally adding them with "git add ." but risking
to lose them with "git clean -f".  They may even be trained to be
careful to see "git clean -n" output before actually running the
command with "-f".  Now, if their project ships a new version of
".gitignore" that marks these paths as "ignored-and-precious", both
approaches will have intended effect to participants who upgraded to
the version of Git.

To participants using the current version of Git:

 * Phillip's approach to add "#(keep)" will not change anything.
   They will be protected from accidental "git add ." as before, and
   they will still have to be careful about "git clean -f".

 * Elijah's approach to rewrite existing'.config' to '$.config',
   however, will stop protecting them from "git add .", even though
   it will start protecting them from "git clean -f".

The devil you already know may be the lessor of two evils in such a
situation.

So, all it boils down to is these two questions.

 * Which one between "'git add .' adds '.config' that users did not
   want to add" and "'git clean -f' removes '.config' together with
   other files" a larger problem to the users, who participate in a
   project that already decided to use the new .gitignore feature to
   mark ".config" as "precious", of older versions of Git that
   predate "precious"?

 * What are projects doing to paths that they want to make
   "precious" with the current system?  Do they leave them out of
   ".gitignore" and have them subject to accidental "git add ." to
   protect them from "git clean -f"?  Or do they list them in
   ".gitignore" to prevent "git add ." from touching, but leave them
   susceptible to accidental removal by "git clean -f"?

Thanks.

Copy link

gitgitgadget bot commented Jan 18, 2024

On the Git mailing list, Sebastian Thiel wrote (reply to this):

Thanks so much for the analysis, as seeing the problem of choosing
a syntax from the perspective of its effects when using common commands
like "git add" and "git clean -f" seems very promising!

When thinking about "git add ." vs "git clean -f" one difference comes to
mind: "git clean -f" is much less desirable it's fatal. "git add ." on the
other hand leaves room for correction, even when used with `git commit -a"
(and with the exception of "git commit -am 'too late'").

From that point of view I'd naturally prefer the "$.config" syntax as it
will turn precious files into untracked ones for current Git.

>  * Which one between "'git add .' adds '.config' that users did not
>    want to add" and "'git clean -f' removes '.config' together with
>    other files" a larger problem to the users, who participate in a
>    project that already decided to use the new .gitignore feature to
>    mark ".config" as "precious", of older versions of Git that
>    predate "precious"?
>

If the user should have a choice, than both syntaxes could also be allowed
to let them choose what to optimise for.

Doing so might be less relevant in the `.config` case, but most relevant
for ignored files like ".env" or ".env.secret" which under no circumstances
must be tracked.

>  * What are projects doing to paths that they want to make
>    "precious" with the current system?  Do they leave them out of
>    ".gitignore" and have them subject to accidental "git add ." to
>    protect them from "git clean -f"?  Or do they list them in
>    ".gitignore" to prevent "git add ." from touching, but leave them
>    susceptible to accidental removal by "git clean -f"?

I did hear that some projects use make files with specifically configured
"git clean" invocations to specifically "--exclude" precious files.
Thus far I didn't encounter one that use such a technique to prevent
"git add" from tracking too much though.

To my mind, in order to support projects with both ".config" and
".env.secret" they would have to be given a choice of which syntax
to use, e.g.

    # This file shouldn't accidentally be deleted by `git clean`
    $.config

    # These files should never be accidentally tracked
    #(keep)
    .env*


On 18 Jan 2024, at 20:14, Junio C Hamano wrote:

> Sebastian Thiel <sebastian.thiel@icloud.com> writes:
>
>> #(keep)
>> .config
>>
>> As a side-effect of the syntax, it's obvious this is an 'upgrade', with
>> perfect backwards compatibility as old git does the same as always.
>
> Yes but ...
>
> The point Elijah makes is worth considering.  To existing versions
> of git, having this entry for ".config" means that it is ignored
> (i.e. "git add ." will not include it), but expendable (i.e. "git
> clean" considers ".config" as a candidate for removal; "git checkout
> other", if the "other" branch has it as a tracked path, will clobber
> it).  Compared to the case where ".config" is not mentioned in
> ".gitignore", where it may be added by use of "git add .", it won't
> be clobbered by "git clean".
>
> So this syntax having "perfect backward compatibility" is not quite
> true.  It does have downsides when used by existing versions of Git.
>
> If we use Elijah's syntax to say
>
>     $.config
>
> then the entry to existing versions of git is a no-op wrt a file
> named ".config".  It simply does not match the pattern, so an
> accidental "git add ." *will* add ".config" to the index, while "git
> clean" may not touch it, simply because it is treated as "untracked
> and precious".  In other words, its downside is the same as not
> marking the path ".config" in any way in ".gitignore", as far as
> existing versions of Git are concerned.
>
> We of course discount the possibility that people keep a file whose
> name literally is dollar-dot followed by "config" and older versions
> of Git would start treating them as ignored-and-expendable.  While
> it *is* an additional downside compared to Phillip's "#(keep)"
> approach, I do not think that particular downside is worth worrying
> about.  Yet another downside compared to Phillip's is that it is
> less extensible.  Over the years, however, the ignored-but-precious
> is the only one we heard from users that lack of which is hurting
> them, so lack of extensibility may not be too huge a deal.
>
> For projects that are currently listing these files in ".gitignore"
> as "ignored-and-expendable" already and want to categorize them as
> "ignored-and-precious" by changing ".config" to "$.config" (or
> adding "#(keep)" comment before the existing entry), the
> pros-and-cons equation may differ.  Their current participants are
> protected from accidentally adding them with "git add ." but risking
> to lose them with "git clean -f".  They may even be trained to be
> careful to see "git clean -n" output before actually running the
> command with "-f".  Now, if their project ships a new version of
> ".gitignore" that marks these paths as "ignored-and-precious", both
> approaches will have intended effect to participants who upgraded to
> the version of Git.
>
> To participants using the current version of Git:
>
>  * Phillip's approach to add "#(keep)" will not change anything.
>    They will be protected from accidental "git add ." as before, and
>    they will still have to be careful about "git clean -f".
>
>  * Elijah's approach to rewrite existing'.config' to '$.config',
>    however, will stop protecting them from "git add .", even though
>    it will start protecting them from "git clean -f".
>
> The devil you already know may be the lessor of two evils in such a
> situation.
>
> So, all it boils down to is these two questions.
>
>  * Which one between "'git add .' adds '.config' that users did not
>    want to add" and "'git clean -f' removes '.config' together with
>    other files" a larger problem to the users, who participate in a
>    project that already decided to use the new .gitignore feature to
>    mark ".config" as "precious", of older versions of Git that
>    predate "precious"?
>
>  * What are projects doing to paths that they want to make
>    "precious" with the current system?  Do they leave them out of
>    ".gitignore" and have them subject to accidental "git add ." to
>    protect them from "git clean -f"?  Or do they list them in
>    ".gitignore" to prevent "git add ." from touching, but leave them
>    susceptible to accidental removal by "git clean -f"?
>
> Thanks.

Copy link

gitgitgadget bot commented Jan 19, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Jan 18, 2024 at 1:33 PM Sebastian Thiel
<sebastian.thiel@icloud.com> wrote:
>
> Thanks so much for the analysis, as seeing the problem of choosing
> a syntax from the perspective of its effects when using common commands
> like "git add" and "git clean -f" seems very promising!
>
> When thinking about "git add ." vs "git clean -f" one difference comes to
> mind: "git clean -f" is much less desirable it's fatal. "git add ." on the
> other hand leaves room for correction, even when used with `git commit -a"
> (and with the exception of "git commit -am 'too late'").

"git commit -a" and "git commit -am 'too late'", by themselves, will
only commit changes to already-tracked files.  So they wouldn't be
problematic alone.

But perhaps the -a was distracting and you were thinking of "git add .
&& git commit -m whatever".  That does remove the chance to correct
before creating a commit, but I don't think it's too bad either.  Even
though it skips the chance to catch the problem pre-commit, there's
still time to review & correct before publishing for patch review (or
PR review or MR review or whatever you want to call it).  And, even if
published for patch review, it can still be caught & corrected by
those doing patch review as well.

So, I just don't see the "accidental add" problem as being very
severe; there are so many chances to catch and correct it.

> To my mind, in order to support projects with both ".config" and
> ".env.secret" they would have to be given a choice of which syntax
> to use, e.g.
>
>     # This file shouldn't accidentally be deleted by `git clean`
>     $.config
>
>     # These files should never be accidentally tracked
>     #(keep)
>     .env*

Reminds me of https://www.emacswiki.org/pics/static/TabsSpacesBoth.png

;-)

Besides, if for a specific file or filetype, accidental additions are
more important to protect against than accidental nuking, then can't
folks achieve that by simply using

    # Don't let older git versions add the file
    .env.secret

    # For newer git versions, override the above; treat it as precious
(i.e. don't add AND don't accidentally nuke)
    $.env.secret

In contrast, if protection against accidental nuking is more important
for certain files, one can use just the second line without the first.

And, whether you have a file with both lines or just the second line,
newer git versions will protect against both accidental nuking and
accidental adding.

In contrast...

Phillip's syntax provides no way to achieve treating accidental nuking
as more important than accidental adding; it can only handle
protection against accidental adding in older Git versions.  And, as I
discussed above, the accidental add problem seems much less severe and
is thus the less important problem to protect against.

Copy link

gitgitgadget bot commented Jan 19, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi,

On Thu, Jan 18, 2024 at 11:14 AM Junio C Hamano <gitster@pobox.com> wrote:
>
[...]
> So, all it boils down to is these two questions.

Thanks for summarizing this.

>  * Which one between "'git add .' adds '.config' that users did not
>    want to add" and "'git clean -f' removes '.config' together with
>    other files" a larger problem to the users, who participate in a
>    project that already decided to use the new .gitignore feature to
>    mark ".config" as "precious", of older versions of Git that
>    predate "precious"?

Accidental "git add ." comes with 3 opportunities to correct the
problem before it becomes permanent: before commiting, after
committing but before pushing, and after publishing for patch review
(where it can even be caught by third parties) but before the
patch/PR/MR is accepted and included.  At each stage there's a chance
to go back and correct the problem.

Accidental nuking of a file (via either git clean or git checkout or
git merge or whatever), cannot be reviewed or corrected; it's
immediately too late.  And given that we're calling this feature
"precious", that seems a little extra unfortunate.

>  * What are projects doing to paths that they want to make
>    "precious" with the current system?  Do they leave them out of
>    ".gitignore" and have them subject to accidental "git add ." to
>    protect them from "git clean -f"?  Or do they list them in
>    ".gitignore" to prevent "git add ." from touching, but leave them
>    susceptible to accidental removal by "git clean -f"?

Good questions; I have no answers to these.

However, on a closely related note, in my response to Sebastian I
point out that the '$' syntax permits individual teams to prioritize
avoiding either accidental deletions or accidental adds on a filename
or glob granularity, so if folks are concerned with handling by older
Git versions or are just extra concerned with certain files, they can
optimize accordingly.  Sadly, the '#(keep)' syntax does not permit
such prioritization and always treats avoiding accidental adds as the
priority (which, in my opinion, is the less important one to generally
prioritze).

Copy link

gitgitgadget bot commented Jan 19, 2024

On the Git mailing list, Sebastian Thiel wrote (reply to this):

Yes, indeed I was a little confused when making the "git commit..." based examples,
thanks for correcting them.

>
> Reminds me of https://www.emacswiki.org/pics/static/TabsSpacesBoth.png
>
> ;-)
>

😅

> Besides, if for a specific file or filetype, accidental additions are
> more important to protect against than accidental nuking, then can't
> folks achieve that by simply using
>
>     # Don't let older git versions add the file
>     .env.secret
>
>     # For newer git versions, override the above; treat it as precious
> (i.e. don't add AND don't accidentally nuke)
>     $.env.secret
>
> In contrast, if protection against accidental nuking is more important
> for certain files, one can use just the second line without the first.
>

I am glad I can pull my initial proposition of 'having both syntaxes' off
the table to side with this version - it's gorgeous.

It's easy to forget that the search-order when matching ignore patterns
is back to front, which makes this 'trick' work.

If the insights gained with the last couple of emails would see their digest
in the user-facing documentation, I think precious files wouldn't only become
usable but would also allow projects to make the their choice during
the transition period during which some users will inevitably access the repository
with a Git that doesn't know about precious files yet.

Copy link

gitgitgadget bot commented Jan 19, 2024

On the Git mailing list, Phillip Wood wrote (reply to this):

Hi Elijah

On 19/01/2024 02:58, Elijah Newren wrote:
> On Thu, Jan 18, 2024 at 11:14 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
> [...]
>> So, all it boils down to is these two questions.
> > Thanks for summarizing this.

Yes, thank you Junio - I found it very helpful as well

>>   * Which one between "'git add .' adds '.config' that users did not
>>     want to add" and "'git clean -f' removes '.config' together with
>>     other files" a larger problem to the users, who participate in a
>>     project that already decided to use the new .gitignore feature to
>>     mark ".config" as "precious", of older versions of Git that
>>     predate "precious"?
> > Accidental "git add ." comes with 3 opportunities to correct the
> problem before it becomes permanent: before commiting, after
> committing but before pushing, and after publishing for patch review
> (where it can even be caught by third parties) but before the
> patch/PR/MR is accepted and included.  At each stage there's a chance
> to go back and correct the problem.

If you've added a secret then catching it after you've published the patch for review is likely to be too late. I agree there are a couple of chances to catch it before that though.

> Accidental nuking of a file (via either git clean or git checkout or
> git merge or whatever), cannot be reviewed or corrected; it's
> immediately too late.

Indeed, though "git clean" requires the user to pass a flag before it will delete anything does have a dry-run mode to check what's going to happen so there is an opportunity for users to avoid accidental deletions.

> [...] > However, on a closely related note, in my response to Sebastian I
> point out that the '$' syntax permits individual teams to prioritize
> avoiding either accidental deletions or accidental adds on a filename
> or glob granularity, so if folks are concerned with handling by older
> Git versions or are just extra concerned with certain files, they can
> optimize accordingly.

That is an advantage. I do worry that the '$' syntax is unintuitive and will further add to the impression that git is hard to use. I think the choice comes down how much we are worried about the way older versions of git treat ".gitignore" files with the new syntax.

While I can see it would be helpful to settle the syntax question I think parsing the new syntax is a relatively small part of the work that needs to be done to implement precious files.

Best Wishes

Phillip

Copy link

gitgitgadget bot commented Jan 19, 2024

User Phillip Wood <phillip.wood123@gmail.com> has been added to the cc: list.

Copy link

gitgitgadget bot commented Jan 19, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Phillip Wood <phillip.wood123@gmail.com> writes:

> If you've added a secret then catching it after you've published the
> patch for review is likely to be too late. I agree there are a couple
> of chances to catch it before that though.

Yes, this is one of the two remaining things that still make me a
bit worried about the "$.config" syntax.

> Indeed, though "git clean" requires the user to pass a flag before it
> will delete anything does have a dry-run mode to check what's going to
> happen so there is an opportunity for users to avoid accidental
> deletions.

True, too.

The other one that still make me a bit worried about the "$.config"
syntax is what I called "the devil you already know" that is
applicable only for participants of a project that currently mark
precious files as ignored, to avoid the accidental "git add ." of
secrets.

I think we already are in agreement that all other points (aside
from possible ergonomics preferences and future extensibility, both
feel a lot minor) raised during this discussion are in favor of the
"$.config" syntax.

> While I can see it would be helpful to settle the syntax question I
> think parsing the new syntax is a relatively small part of the work
> that needs to be done to implement precious files.

True.  The parser can be isolated and it should be relatively easy
to revamp.  My current preference is to (at least) tentatively agree
on using the "$.config" syntax, which would allow us to update
dir.c:parse_path_pattern(), and that would make it possible for us
to start adjusting dir.c:is_excluded(), adding is_precious() next to
it, and adjusting all current callers of the former.

Thanks.

Copy link

gitgitgadget bot commented Jan 19, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Sebastian Thiel <sebastian.thiel@icloud.com> writes:

> I am glad I can pull my initial proposition of 'having both syntaxes' off
> the table to side with this version - it's gorgeous.
>
> It's easy to forget that the search-order when matching ignore patterns
> is back to front, which makes this 'trick' work.

The true gem is not the search-order, though.  It is the "last one
wins" rule.  Back to front search is merely an implementation detail
to optimize the search so that we can stop at the first hit ;-)

> If the insights gained with the last couple of emails would see their digest
> in the user-facing documentation, I think precious files wouldn't only become
> usable but would also allow projects to make the their choice during
> the transition period during which some users will inevitably access the repository
> with a Git that doesn't know about precious files yet.

OK.

Copy link

gitgitgadget bot commented Jan 24, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi Phillip,

On Fri, Jan 19, 2024 at 8:53 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
[...]
> >>   * Which one between "'git add .' adds '.config' that users did not
> >>     want to add" and "'git clean -f' removes '.config' together with
> >>     other files" a larger problem to the users, who participate in a
> >>     project that already decided to use the new .gitignore feature to
> >>     mark ".config" as "precious", of older versions of Git that
> >>     predate "precious"?
> >
> > Accidental "git add ." comes with 3 opportunities to correct the
> > problem before it becomes permanent: before commiting, after
> > committing but before pushing, and after publishing for patch review
> > (where it can even be caught by third parties) but before the
> > patch/PR/MR is accepted and included.  At each stage there's a chance
> > to go back and correct the problem.
>
> If you've added a secret then catching it after you've published the
> patch for review is likely to be too late. I agree there are a couple of
> chances to catch it before that though.

Ah, good point.

> > Accidental nuking of a file (via either git clean or git checkout or
> > git merge or whatever), cannot be reviewed or corrected; it's
> > immediately too late.
>
> Indeed, though "git clean" requires the user to pass a flag before it
> will delete anything does have a dry-run mode to check what's going to
> happen so there is an opportunity for users to avoid accidental deletions.

Yes, good point again for "git clean"; it does have one level of check
before the operation users can take advantage of.  The same cannot be
said for the files nuked by checkout/merge/rebase/cherry-pick, though.

> > [...]
> > However, on a closely related note, in my response to Sebastian I
> > point out that the '$' syntax permits individual teams to prioritize
> > avoiding either accidental deletions or accidental adds on a filename
> > or glob granularity, so if folks are concerned with handling by older
> > Git versions or are just extra concerned with certain files, they can
> > optimize accordingly.
>
> That is an advantage. I do worry that the '$' syntax is unintuitive and
> will further add to the impression that git is hard to use. I think the
> choice comes down how much we are worried about the way older versions
> of git treat ".gitignore" files with the new syntax.

Interesting, I thought the mixture of '!' as a prefix and '#(keep)' as
a previous-line directive would be somewhat inconsistent and add
further to the impression that git is hard to use, though I can also
see your point that '$' as a prefix can as well.

> While I can see it would be helpful to settle the syntax question I
> think parsing the new syntax is a relatively small part of the work that
> needs to be done to implement precious files.

Oh, I agree it's a small part of the work, but as stated previously,
I'm not doing that work (Sebastian is).  I'm just trying to help avoid
getting unintended consequences in the design, and to me this is an
important edge case to consider, get an agreement on, and document in
some fashion.

Anyway, Junio seems to have weighed in with a tentative path forward,
and everyone has been very good about bringing up additional
considerations around this issue that are worth documenting in the
design document, so I'll try to put together an update soon-ish.

Copy link

gitgitgadget bot commented Feb 11, 2024

On the Git mailing list, Sebastian Thiel wrote (reply to this):

I didn't know where I would best reply to give an update on my work
on precious file support, but here I go.

On my journey to daring implementing precious files in Git, I decided
to implement it in Gitoxide first to ease myself into it.

After what felt like months of work on the Gitoxide-equivalent of
dir.c, it just took 2 days to cobble together a 'gix clean' with
precious files support.

You might say that something as destructive as a 'clean' subcommand
would better not be rushed, but it was surprisingly straightforward
to implement. It was so inviting even that I could spend the second
day, today, entirely on polishing, yielding a 'gix clean' which is
fun to use, with some extras I never knew I wanted until I had full
control over it and could play around easily.

What I found myself do immediately by the way is adjust `.gitignore`
files of the project to have precious declarations right after
their non-precious counterparts for backwards compatibility.

It works perfectly, from what I can tell, and it is truly wonderful
to be able to wipe a repo clean without fear of destroying anything
valuable. And I am aware that we all know that, but wanted to write
it to underline how psychologically valuable this feature is.

Without further ado, I invite you all to give it a go yourself
for first experiences with precious files maybe.

    git clone https://github.com/Byron/gitoxide
    cd gitoxide
    cargo build --release --bin gix --no-default-features --features max-pure
	target/release/gix clean

This should do the trick - from there the program should guide the
user.

If you want to see some more interesting features besides precious
files, you can run 'cargo test -p gix' and follow the 'gix clean -xd'
instructions along with the `--debug` flag.

A word about performance: It is slower.
It started out to be only about 1% slower even on the biggest repositories
and under optimal conditions (i.e. precomposeUnicode and ignoreCase off
and skipHash true). But as I improved correctness and added features,
that was lost and it's now about 15% slower on bigger repositories.

I appended a benchmark run on the Linux kernel at the end, and it shows
that Gitoxide definitely spends more time in userland. I can only
assume that some performance was lost when I started to deviate from
the 'only do the work you need' recipe that I learned from Git to
'always provide a consistent set of information about directory entries'.

On top of that, there is multiple major shortcomings in this realm:

- Gitoxide doesn't actually get faster when reading indices with multiple
  threads for some reason.
- the icase-hashtable is created only with a single thread.
- the precompose-unicode conversion is very slow and easily costs 25%
  performance.

But that's details, some of which you can see yourself when running
'gix --trace -v clean'.

Now I hope you will have fun trying 'gix clean' with precious files in your
repositories. Also, I am particularly interested in learning how it fares
in situations where you know 'git clean' might have difficulties.
I tried very hard to achieve correctness, and any problem you find
will be fixed ASAP.

With this experience, I think I am in a good position to get precious
files support for 'git clean' implemented, once I get to make the start.

Cheers,
Sebastian

----

Here is the benchmark result (and before I forget, Gitoxide also uses about 25% more memory
for some reason, so really has some catchup to do, eventually)

linux (ffc2532) +369 -819 [!] took 2s
❯ hyperfine -N -w1 -r4  'gix clean -xd --skip-hidden-repositories=non-bare' 'gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare' 'git clean -nxd'
Benchmark 1: gix clean -xd --skip-hidden-repositories=non-bare
  Time (mean ± σ):     171.7 ms ±   3.0 ms    [User: 70.4 ms, System: 101.4 ms]
  Range (min … max):   167.4 ms … 174.2 ms    4 runs

Benchmark 2: gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
  Time (mean ± σ):     156.3 ms ±   3.1 ms    [User: 56.9 ms, System: 99.3 ms]
  Range (min … max):   154.1 ms … 160.8 ms    4 runs

Benchmark 3: git clean -nxd
  Time (mean ± σ):     138.4 ms ±   2.7 ms    [User: 40.5 ms, System: 103.7 ms]
  Range (min … max):   136.1 ms … 142.0 ms    4 runs

Summary
  git clean -nxd ran
    1.13 ± 0.03 times faster than gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
    1.24 ± 0.03 times faster than gix clean -xd --skip-hidden-repositories=non-bare


On 27 Dec 2023, at 6:28, Junio C Hamano wrote:

> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Elijah Newren <newren@gmail.com>
>>
>> We have traditionally considered all ignored files to be expendable, but
>> users occasionally want ignored files that are not considered
>> expendable.  Add a design document covering how to split ignored files
>> into two types: 'trashable' (what all ignored files are currently
>> considered) and 'precious' (the new type of ignored file).
>
> The proposed syntax is a bit different from what I personally prefer
> (which is Phillip's [P14] or something like it), but I consider that
> the more valuable parts of this document is about how various
> commands ought to interact with precious paths, which shouldn't
> change regardless of the syntax.
>
> Thanks for putting this together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant