Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

int64 IDs #597

Closed
bradfitz opened this issue Mar 16, 2017 · 29 comments
Closed

int64 IDs #597

bradfitz opened this issue Mar 16, 2017 · 29 comments

Comments

@bradfitz
Copy link
Contributor

On 32-bit platforms, a Go int is an int32.

I worry about the use of int as the type for Github IDs in the go-github package.

Is there documentation to suggest that Github IDs are limited to 2 billion items? Seems unlikely. But I can't find anything.

I propose we keep "Number" fields as int, but change all the ID fields to be int64.

Thoughts?

/cc @gmlewis @willnorris @shurcooL

@gmlewis
Copy link
Collaborator

gmlewis commented Mar 17, 2017

I totally agree. Absolutely.

@dmitshur
Copy link
Member

dmitshur commented Mar 17, 2017

I've noticed that too.

By now, there are definitely some IDs that exceed the capacity of int32. I have the following example. Consider the endpoint that returns all public events:

https://api.github.com/events

The first event I got from a query just now was:

{
  "id": "5510765310",
  "type": "WatchEvent",
  "actor": {
    "id": 12230841,

  },
  "repo": {
    "id": 76620213,

  },
  "payload": {
    "action": "started"
  },
  "public": true,
  "created_at": "2017-03-17T01:19:24Z"
}

The ID of that event exceeds 2^31 - 1.

So far we've been using int IDs for consistency with existing fields. But it definitely makes sense to address all of them at once in a future change.

I propose we keep "Number" fields as int, but change all the ID fields to be int64.

SGTM.

@dmitshur
Copy link
Member

Actually, that's not a great example because https://godoc.org/github.com/google/go-github/github#Event.ID type is string, not int.

@dmitshur
Copy link
Member

dmitshur commented Mar 17, 2017

The fact that both events and notifications (types of objects that are very likely to have IDs surpass 2 billion) encode their IDs as JSON strings rather than JSON numbers suggests that other IDs may be much smaller.

Is there documentation to suggest that Github IDs are limited to 2 billion items? Seems unlikely. But I can't find anything.

I've looked, and couldn't find anything either. But this is something that's important for the API docs to specify.

Perhaps it'd be useful to send GitHub support an email to inquire about this. If they can assure us that those IDs are guaranteed to fit into int32, we won't have to make a change now. If they can confirm that int32 may not be enough, we can feel better about making the change.

@bradfitz
Copy link
Contributor Author

If they can assure us that those IDs are guaranteed to fit into int32

It's a global ID, right? It's not an ID scoped to just a single org or repo, right? If so, we don't need their confirmation. 2 billion is too small.

because https://godoc.org/github.com/google/go-github/github#Event.ID type is string

That is its own question. Why is it a string and not some integer type? Can it actually be non-numeric?

@dmitshur
Copy link
Member

dmitshur commented Mar 17, 2017

It's a global ID, right? It's not an ID scoped to just a single org or repo, right?

Which ID are you talking about? As far as I know, different types of objects have different ID namespaces.

There are IDs for users, repositories, events, notifications, gists (these are non-numbers), installations, etc.

I'm sure that at least the user and repo IDs are separate namespaces (so may share equal IDs; here is user with ID 1, and here's repo with ID 1), but not completely sure about the others.

Why is it a string and not some integer type?

The JSON object actually contains a JSON string, so it has to be decoded into a string.

We could consider the "string" option that encoding/json offers, then decoding into int64 is possible:

The "string" option signals that a field is stored as JSON inside a JSON-encoded string. It applies only to fields of string, floating point, integer, or boolean types. This extra level of encoding is sometimes used when communicating with JavaScript programs:

Int64String int64 `json:",string"`

But then we'd need to be very confident it would always work (not contain non-numeric characters, and not overflow int64). It's easier to be confident that string will always work.

Both Event.ID and Notification.ID were added in 2013-2014, and I didn't see any discussion about the decision to use string type for ID there (maybe the "string" option wasn't even in encoding/json back then).

Can it actually be non-numeric?

It doesn't seem that way. I find it somewhat hard to feel confident if GitHub API docs don't actually specify this.

On the other hand, Gist IDs are non-numeric, and it's clear from examples.

@bradfitz
Copy link
Contributor Author

I didn't know (or don't know whether) there is an ID namespace per resource type. But that also doesn't change my point. That just gives us ~5x more headroom if so.

If we're concerned about quoted integers versus not, we could make our own numeric ID type:

type ID int64

... that implements https://golang.org/pkg/encoding/json/#Unmarshaler and deals with string-quoted integers or non-integers and not work about ",string" or changes to the encoding over time. The accessor methods could return int64 instead of github.ID.

For Gists or other hashes-as-IDs, I'd just keep it as "ID *string".

@dmitshur
Copy link
Member

Relevant issue (in a theoretical discussion sense, not practical one):

golang/go#19623

@sahildua2305
Copy link
Member

@shurcooL @gmlewis I see that there was agreement on changing to int64. What's the take on this at this point?

@dmitshur
Copy link
Member

dmitshur commented Nov 7, 2017

There is no agreement at this time. We're waiting on someone to run into a concrete issue before we can decide how to act on this. So far, there have not been any issue reports, so it's better to wait.

Go 2 might change int to be arbitrary precision (golang/go#19623), so it would be better not to change to int64 if that happens.

Also, see what I wrote in #733 (comment) relatively recently:

IMO it's unclear what the state of #597 is right now. The issue has been opened a while ago, but I haven't seen any reports of issues from production use. I'm not convinced that we should act right away, or how we should act. I think that we need to collect more data in that issue (or, wait until there's a concrete need to act).

@sahildua2305
Copy link
Member

Makes sense! Thanks for clearing, @shurcooL

@dmitshur
Copy link
Member

dmitshur commented Nov 7, 2017

A relevant comment from GitHub Staff at https://platform.github.community/t/a-few-issues-ive-had-with-the-projects-preview-api/531/3:

When we refer to something as an "int", it's a 32-bit int. That being said, I'd like to stress that it may change in the future, and we'd do our best to communicate that to you should that day come.

That, to me, supports the idea that we don't need to act preemptively at this time, and waiting is a sound strategy.

@dmitshur
Copy link
Member

We've had a report of this issue causing real errors in production on a 32-bit architecture in #806. /cc @maruel

It had to do with RepoStatus.ID ID specifically, but the value "4396447431" provided there was within int64 but outside of int32 range. That means go-github, as is, couldn't be used on a 32-bit arch.

@kshitij10496
Copy link
Contributor

In the light of the recent crash report, are we actively interested in migrating all the ID fields corresponding to GitHub IDs to a more suitable type?

In continuation of the same comment from GitHub Staff as linked above,

Based on that information, we'd recommend using an unsigned 64-bit integer.

Also, shouldn't uint64 be the desired final type instead of int64 based on the recommendation?

cc @shurcooL

@dmitshur
Copy link
Member

@kshitij10496 Are you using go-github on a 32-bit environment? The problem is only relevant there.

Also, shouldn't uint64 be the desired final type instead of int64 based on the recommendation?

I had the same thought. That's one of the options we should consider (and why I didn't want to accept PR #807 before discussing all the options in here first).

@maruel
Copy link

maruel commented Dec 15, 2017

My vote against uint64 and for int64 instead. IMHO you shouldn't use unsigned unless strictly necessary and I don't think this is not the case. Preferably via type ID int64 as proposed above.

@kshitij10496
Copy link
Contributor

@kshitij10496 Are you using go-github on a 32-bit environment? The problem is only relevant there.

@shurcooL I thank you for your concern. Currently, all my systems are 64-bit architecture so this doesn't directly affect me. (I did figure the implications of this issue after reading this thread and the external resources linked to it.) I thought of addressing the unsigned numeric unint64 type as an alternative since it had not been mentioned above and I wondered if we might have overlooked it somehow.

IMHO you shouldn't use unsigned unless strictly necessary and I don't think this is not the case.

Thanks @maruel for this great advice. I will make a note of it for future references.
I just have a couple of questions regarding usage of signed numeric types here. It would be a good learning experience for me if you can answer them:

  1. Can the GitHub IDs be negative?
  2. Is it not possible for the size of resource IDs become larger than the maximum value int64 is capable of storing(namely, 2^63-1)?

@bradfitz
Copy link
Contributor Author

Go style is to use signed integers by default. Use of unsigned is reserved for bizarre cases when it's absolutely required.

Some examples of Go using signed:

@dmitshur
Copy link
Member

dmitshur commented Dec 18, 2017

Go style is to use signed integers by default.

That makes sense in situations where the integer is often manipulated (and therefore, may end up being negative, even if it can never start out negative).

That's the best reason I know why the builtin len returns a signed int, it's so you can safely do this without an infinite loop due to underflow:

for i := len(s) - 1; i >= 0; i-- {
	// use s[i]
}

I'm not sure if the same rationale applies to IDs, since as I understand, they're not expected to be manipulated (by adding, subtracting amounts). They're only assigned or copied. So using an unsigned type can be acceptable.

For completeness, some examples of Go using unsigned ints (where it's appropriate):

@bradfitz
Copy link
Contributor Author

Sure. I wouldn't object to uint64 in this case if it's required. But I would be very surprised if GitHub uses the high bit of those 64-bit IDs, as that would be painful for their Java and JavaScript users.

@willnorris
Copy link
Collaborator

Talked this through with @gmlewis a bit, and both of us are having a hard time forming a strong opinion either way on signed versus unsigned int64. I think @bradfitz is right that it is very unlikely that GitHub ever uses the high bit, and my gut tells me that signed int64 would me slightly easier to work with, since I suspect that's what people are using in other backends, and so will result in less casting. So let's migrate all of these IDs to int64.

I'll try and get #772 resolved and submitted, so we've got proper versioning in place before this lands. But someone can go ahead and start the work on it.

@bradfitz
Copy link
Contributor Author

Yet another option that I didn't see mentioned here is to not make changes to the package itself, but instead solve this issue in a similar way to the Google App Engine usage:

That seems gross and unnecessary. Also, App Engine supports Go 1.8 now, so even that existing hack should be deleted.

@dmitshur
Copy link
Member

After thinking about the alternatives, I'm convinced. Changing the ID fields from int to int64 (and incorporating that policy for all future PRs) seems like the best overall choice.

@kshitij10496
Copy link
Contributor

kshitij10496 commented Dec 21, 2017

Go style is to use signed integers by default. Use of unsigned is reserved for bizarre cases when it's absolutely required.

Thanks, @bradfitz for the guideline on the conventional use of numbers. I'm sure this will come in handy for me in the future.

There are two considerations that I would like to make with regards to the migration of ID fields from int to int64:

  1. Modification of the corresponding unit tests to deal explicitly with int64 rather than ints.
  2. Creation of a new helper routine Int64 similar to Int which can be used to allocate int64 values. Given the fundamental role of ID fields in most of the types and the widespread use of Int by many other fields, I think it would be reasonable to create a new helper which can assist with the allocation of ID values.

I can give this a try if no one has already started working on it.

Here is a list of all the fields which correspond to GitHub IDs. This can act as an exhaustive, lookup reference while updating the methods which deal with IDs of particular structs as parameters. Some of them are referred as strings which we can ignore. However, I thought of mentioning them as well for the sake of completeness.
If I have somehow missed any type, feel free to add them to the table.

Field Field Type Struct File
ID *string Event activity_events.go
ID *string Notification activity_notifications.go
ID *int TeamLDAPMapping admin.go
ID *int UserLDAPMapping admin.go
ID *int App apps.go
ID *int Installation apps_installation.go
ID *int MarketplacePlan apps_marketplace.go
ID *int MarketplacePlanAccount apps_marketplace.go
ID *int Authorization authorizations.go
ID *int Grant authorizations.go
ID *int PageBuildEvent event_types.go
HookID *int PingEvent event_types.go
AfterID *int ProjectCardEvent event_types.go
AfterID *int ProjectColumnEvent event_types.go
PushID *int PushEvent event_types.go
ID *int PushEventRepository event_types.go
ID *int StatusEvent event_types.go
ID *string Gist gists.go
ID *string GistFork gists.go
ID *int GistComment gists_comment.go
ID *int Issue issues.go
ID *int IssueComment issues_comments.go
ID *int IssueEvent issues_events.go
ID *int Label issues_labels.go
ID *int Milestone issues_milestones.go
ID *int Timeline issues_timeline.go
ID *int Source issues_timeline.go
ID *int Migration migrations.go
ID *int SourceImportAuthor migrations_source_import.go
RemoteID *string SourceImportAuthor migrations_source_import.go
OID *string LargeFile migrations_source_import.go
ID *int Organization orgs.go
ID *int Team orgs_teams.go
ID *int Invitation orgs_teams.go
ParentTeamID *int NewTeam orgs_teams.go
ID *int Project projects.go
ID *int ProjectColumn projects.go
ID *int ProjectCard projects.go
ColumnID *int ProjectCard projects.go
ContentID *int ProjectCardOptions projects.go
ColumnID *int ProjectCardMoveOptions projects.go
ID *int PullRequest pulls.go
ID *int PullRequestComment pulls_comments.go
InReplyTo *int PullRequestComment pulls_comments.go
ID *int PullRequestReview pulls_reviews.go
CommitID *string PullRequestReview pulls_reviews.go
CommitID *string PullRequestReviewRequest pulls_reviews.go
ID *int Reaction reactions.go
ID *int Repository repos.go
TeamID *int Repository repos.go
Since int RepositoryListAllOptions repos.go
ID *int Contributor repos.go
TeamID []int TransferRequest repos.go
ID *int RepositoryComment repos_comments.go
ID *int Deployment repos_deployments.go
ID *int DeploymentStatus repos_deployments.go
ID *int Hook repos_hooks.go
ID *int RepositoryInvitation repos_invitations.go
ID *int RepositoryRelease repos_releases.go
ID *int ReleaseAsset repos_releases.go
ID *int RepoStatus repos_statuses.go
ID *int User users.go
Since int UserListOptions users.go
ID *int GPGKey users_gpg_keys.go
PrimaryKeyID *int GPGKey users_gpg_keys.go
ID *int Key users_keys.go

kshitij10496 pushed a commit to kshitij10496/go-github that referenced this issue Dec 23, 2017
In this commit, the complete migration of all the struct IDs(which refer
to GitHub IDs) from platform dependent `int` type to the signed `int64`
takes place.

A helper function "Int64" is also implemented to assist the migration of
unit tests.

Refer google#597
kshitij10496 pushed a commit to kshitij10496/go-github that referenced this issue Dec 23, 2017
In this commit, the complete migration of all the struct IDs(which refer
to GitHub IDs) from platform dependent `int` type to the signed `int64`
takes place.

A helper function "Int64" is also implemented to assist the migration of
unit tests.

Refer google#597
kshitij10496 pushed a commit to kshitij10496/go-github that referenced this issue Jan 11, 2018
In this commit, the complete migration of all the struct IDs(which refer
to GitHub IDs) from platform dependent `int` type to the signed `int64`
takes place.

A helper function "Int64" is also implemented to assist the migration of
unit tests.

Refer google#597
@dmitshur
Copy link
Member

Edited the table above to add these fields that were missing in the table but present in PR #816 (and they're valid ID fields):

MarketplacePlanAccount.ID
ProjectCard.ColumnID
Repository.TeamID
GPGKey.PrimaryKeyID

kshitij10496 pushed a commit to kshitij10496/go-github that referenced this issue Jan 18, 2018
In this commit, the complete migration of all the struct IDs(which refer
to GitHub IDs) from platform dependent `int` type to the signed `int64`
takes place.

A helper function "Int64" is also implemented to assist the migration of
unit tests.

Refer google#597
gmlewis pushed a commit that referenced this issue Jan 25, 2018
A helper function "Int64" is also implemented to assist the migration of
unit tests.

Refer #597.
@dmitshur
Copy link
Member

Resolved via #816, closing.

dmitshur added a commit to shurcooL/users that referenced this issue Jan 25, 2018
dmitshur added a commit to shurcooL/issues that referenced this issue Mar 29, 2018
dmitshur added a commit that referenced this issue Mar 30, 2018
This is a breaking API change that is a part of issue #597. It should've
been applied earlier, but we missed it because the parameter was
misleadingly named number rather than id or commentID.

Rename the parameter to commentID to make it more clear that it's the
comment ID and not the PR ID nor PR number. This should help improve
clarity of the API, and reduce the chance to mix up the IDs.

Also document which PullRequestComment fields need to be set in
EditComment method.

Similar to #886.
Updates #885.
Updates #597.
gopherbot pushed a commit to golang/build that referenced this issue Mar 30, 2018
When editing a comment, the current code used the issue number in place
of the comment ID. But the EditComment endpoint needs the comment ID.

The issue number is not needed at all in updateGithubComment because the
EditComment endpoint uses the repository and comment ID to uniquely
identify the comment.

References:

-	https://developer.github.com/v3/issues/comments/#edit-a-comment.
-	https://godoc.org/github.com/google/go-github/github#IssuesService.EditComment.

Updates golang/go#24598.
Updates google/go-github#883.
Updates google/go-github#597.

Change-Id: Iae9d967d7be7a75b1bcee7118a3c80fe8f2375b4
Reviewed-on: https://go-review.googlesource.com/103398
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
dmitshur added a commit that referenced this issue Apr 2, 2018
…888)

This is a breaking API change that is a part of issue #597. It should've
been applied earlier, but we missed it because the parameter was
misleadingly named number rather than id or commentID.

Rename the parameter to commentID to make it more clear that it's the
comment ID and not the PR ID nor PR number. This should help improve
clarity of the API, and reduce the chance to mix up the IDs.

Also document which PullRequestComment fields need to be set in
EditComment method.

Similar to #886.
Updates #885.
Updates #597.
dmitshur pushed a commit that referenced this issue Apr 24, 2018
Field Since in OrganizationsListOptions refers to an organization ID,
so its type needs to be int64 to per prior decision in #597.
This is already the case in other similar structs such as
RepositoryListAllOptions and UserListOptions.

This is a breaking API change that is a part of resolving issue #597.
It should've been applied earlier, but we missed it (likely because
the field doesn't have "ID" in its name, so it's harder to spot).

Updates #597.
nbareil pushed a commit to nbareil/go-github that referenced this issue May 1, 2018
A helper function "Int64" is also implemented to assist the migration of
unit tests.

Refer google#597.
nbareil pushed a commit to nbareil/go-github that referenced this issue May 1, 2018
…oogle#888)

This is a breaking API change that is a part of issue google#597. It should've
been applied earlier, but we missed it because the parameter was
misleadingly named number rather than id or commentID.

Rename the parameter to commentID to make it more clear that it's the
comment ID and not the PR ID nor PR number. This should help improve
clarity of the API, and reduce the chance to mix up the IDs.

Also document which PullRequestComment fields need to be set in
EditComment method.

Similar to google#886.
Updates google#885.
Updates google#597.
nbareil pushed a commit to nbareil/go-github that referenced this issue May 1, 2018
Field Since in OrganizationsListOptions refers to an organization ID,
so its type needs to be int64 to per prior decision in google#597.
This is already the case in other similar structs such as
RepositoryListAllOptions and UserListOptions.

This is a breaking API change that is a part of resolving issue google#597.
It should've been applied earlier, but we missed it (likely because
the field doesn't have "ID" in its name, so it's harder to spot).

Updates google#597.
wlynch added a commit to wlynch/ghinstallation that referenced this issue Oct 9, 2019
This change makes ghinstallation consistent with
github.com/google/go-github for App and Installation identifiers. See
google/go-github#597 for related discussion.

Fixes bradleyfalzon#27
wlynch added a commit to wlynch/ghinstallation that referenced this issue Oct 9, 2019
This change makes ghinstallation consistent with
`github.com/google/go-github` for App and Installation identifiers. See
google/go-github#597 for related discussion.

Fixes bradleyfalzon#27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants