Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature RFC: Flatbuffers support for Optional types. #6014

Closed
CasperN opened this issue Jun 29, 2020 · 136 comments
Closed

Feature RFC: Flatbuffers support for Optional types. #6014

CasperN opened this issue Jun 29, 2020 · 136 comments

Comments

@CasperN
Copy link
Collaborator

CasperN commented Jun 29, 2020

Note: This top comment is heavily edited. It was kept up with the current state until this issue closed.

Motivation

Flatbuffers should allow users to choose to use optional types. There has been some interest in distinguishing between default values (which are not stored in the binary) and some notion of None which the user controls.

Here are some links to previous interest in this idea:

Currently, a user can control a field's presence in the binary by specifying "force_defaults" and checking "IsFieldPresent" which is a bit of a hack. This proposal should define proper Flatbuffers optional types, which should be a better way of doing this. Use of this feature is only advisable for new fields, since changing default values is in general backwards-incompatible.

How do we represent this in the schema file?

We will specify it like so

table Monster { mana: int = null; }

This visually implies that optional types are at odds with default values and is "consistent" since the value to the right of the equals sign is what we interpret non-presence to mean.

Change to Schema Grammar:

field_decl = ident : type [ = (scalar | null) ] metadata ;

We can add a field tag, e.g. "optional" or "no_default", that triggers this behavior. Hopefully no one is using those tags. Maybe we can make it specifiable to flatc, an "--optional-field-keyword-tag" flag, just in case people are using it and can't stop.

How do we represent this in the binary?

We are going with option (A).

(A) Non-Presence means None

Instead of omitting zero-like values, the generated code must store them. Non-presence for optional fields no longer means "whatever the default is," now it means None. You can interpret it as "the default value is None". This also means we cannot specify both specify a non-null default and mark the field as optional.

Pros:

  • This seems more intuitive.
  • It aligns with the "force_defaults" + "IsFieldPresent" hacky manual approximation of this feature.
  • If Nones are more common than zero-likes then this will have smaller binaries.

Cons:

  • @aardappel thinks this is harder to implement

    "making presence an indicator would require we pass this special field status down to the field construction code to override the current val == default check, which means slowdown, codegen and runtime changes in all languages.. whereas my "least likely to be used default" trick requires no changes"

(B) Some Sentinel value means None

In this scenario, zero-like values are still not stored. Instead we choose some "sentinel" value which we interpret to be None (e.g. int can use int_min and float can use some kind of Nan).

Pros:

  • @aardappel thinks this is easier to implement

    "it requires the schema parser to set default values for you, and no changes anywhere else"

  • If zero-likes are more common than None then this will have smaller binaries

Cons:

  • Someone might want to use the sentinel value (Hyrum's law).
    • This can be mitigated by publishing the sentinels and letting users decide whether they need the sentinels.
  • This probably won't work for fields representing raw bits.

How do we represent this in every language API?

We'll need to change the type signature of all generated code (building/reading/mutating/object/etc) around the optional type to signal its optional-ness. I think we should use the language's local standard for optional types. Suggestions:

  • Python: Optional[T].
  • Rust: Option<T>.
  • C++17 has std::optional<T> but its not obvious what to use for earlier versions. T* would work.
  • Java: Optional shows up in Java 1.8 and triggers autoboxing, so idk :/

The exact generated-API for a language should be discussed in the PR implementing this feature in that language.

Out of scope

(I'll add links if you make issues for these feature requests)

  • Syntactic types
  • Default values for strings and tables

TODO

task owner done
Change flatc to support schemas with optional types and cause an error if they're used in unsupported languages @CasperN #6026
Implement optional type API in C++ @vglavnyy #6155
Implement optional type API in Java @paulovap #6212
Implement optional type API in Rust @CasperN #6034
Implement optional type API in Swift @mustiikhalil #6038
Implement optional type API in lobster @aardappel
Implement optional type API in Kotlin @paulovap #6115
Implement optional type API in Python @rw?
Implement optional type API in Go @rw?
Implement optional type API in C @mikkelfj
Implement optional type API in C# @dbaileychess #6217
Implement optional type API in Typescript/javacscript @krojew #6215
Php, Dart, etc ... ?
Update documentation to advertise this feature @cneo #6270

[edits]

  • added todo list
  • added points from the discussion to each section.
  • added out of scope section
  • Decision: go with (A) and use = null syntax in schema file (cross out alternatives)
  • Updated TODO list, finished parser, Rust in progress
  • Change to schema grammar, link to swift PR, Note at top
  • Added more languages to the TODO
  • Lobster support 🦞
  • Kotlin and C support
  • Java, C#, TS/JS support and docs, issue closed, no longer editing.
@aardappel
Copy link
Collaborator

I mentioned no_default since all fields are already optional, in a way. But it's not a great name either :)

A:

  • The slowdown can be avoided if a language implementation has a second set of runtime functions for each kind of scalar that does not take a default at all, and the codegen calls this function instead for fields that have this attribute set. I'd say this would have to be the implementation technique if we decide on this approach, since passing another bool to test down to the existing functions would penalize code that doesn't use this feature, which is currently all of it.
  • Generally the bigger downside of this approach is that the feature won't work correctly until all languages implement it. I'd feel better about it if a PR implementing this feature would attempt to cover most languages at once.
  • In C++, an int field with this attribute set would return an int * which may be nullptr, which is equivalent to the current workaround, which is to make the field of type struct Integer { i:int; }. We could then decide to make this generate std::optional when generating code explicitly for C++17 and up, though frankly that is not actually an improvement in ergonomics. The fun thing about the int * is that it can actually point into the buffer, so doesn't actually touch the int in question.

B:

  • Attractiveness of this approach is that it can work reasonably even if languages don't support it explicitly. It is however not very ergonomic having to deal with these special values manually, though helper functions can make this better bool FieldPresent(int i) { return i != 0x80000000; }
  • This trick is already in use by some users. @lu-wang-g

@aardappel
Copy link
Collaborator

@paulovap
Copy link
Collaborator

paulovap commented Jun 29, 2020

A:
On Java, Optional shows up on API 1.8, so we would have to set this API as minimum requirements. Also, the use of Optional would trigger autoboxing for scalars which is a performance penalty.

B:
I have a question here. Is the plan to set up default sentinels for each type or allowing users to define their own sentinel? Like:

table Test {
    myInt: Int32 (sentinel=-1);
}

That way the user would have control of what sentinel to use.

@mikkelfj
Copy link
Contributor

I agree this is an issue. I mentioned this in the FB2 discussion as analyze SQL NULL values.

I have been thinking about this problem,

A) using is present: this doesn't work well with tables and strings etc. because these have not default value. A null value would have to be an offset to a special null object that would have to be checked on access, or the current null value would have to change interpretation. On a related note I have been considering offering a dummy object instead of NULL when the C api discovers a null because that is a safer and simpler approach. You can ask if it is a dummy object explicitly, and otherwise access it safely but getting only dummy results.

B) sentinel values are not good, and if they are, they are better done by the end user for specific use cases, i.e. FlatBuffers need not change.

C) My proposal. Add an additional bit-vector field at vtable offset 0 for table that have optional values. The bit-vector is stored inline in the table, but could also be an external offset if is believed that a) the are large enough to make this worthwhile, and b) data sharing is likely.

Now you can access data just as before, without overhead, but you can explicitly ask if the field is marked as NULL, if you care. This can be very efficient, and the builder can easily add this information as it already has vtable vector in memory that it writes out, and can add the null vector at the same time.

@mikkelfj
Copy link
Contributor

The reason why B) is not good is because you cannot roundtrip SQL data transparently. This is an absolute requirement for this feature.

@CasperN
Copy link
Collaborator Author

CasperN commented Jun 29, 2020

One thing we can do for (A) and the "simultaneous implementation problem," which I agree is a huge problem, instead of implementing it in a huge PR, is to not generate code for optional fields (and warn) if the language isn't ready yet. I think this will at least make code review easier.

Perhaps we can also give users the option to use existing codegen and zero-defaults if they really want it and promise they'll always check for field presence manually... but such an option is definitely a foot-gun.

I agree with the use of nullable pointers over optional for performance reasons. Rust should prefer Option<&mut T> (if it had a mutation API) since raw pointers are frowned upon in that language and it'll compile down to a nullable pointer anyway.

@mikkelfj
Copy link
Contributor

mikkelfj commented Jun 29, 2020

Oh, regarding C) it might be strange if a NULL field returns 4 because 4 happens to be the default value. For this reason it makes sense to support a default null value also. That is effectively option B), but without the hack since you have precise NULL information.

EDIT: this would add runtime overhead because it would need to check the null table to see if default or null default should be returned. But at least it only happens for absent values.

On Wouters suggestion to have a parallel set of accessors. That would add a lot of loud on compilers to strip out even more templates in C++ or macros generated inlines in C, and complicate code generation quite a bit in C.

@mikkelfj
Copy link
Contributor

We could also require the default values only apply for non-optional fields, and always store values in optional fields, unless they are NULL, and have the bit vector. The default value then becomes the return value if NULL, and NULL can be checked separately.

@aardappel
Copy link
Collaborator

@paulovap the sentinal IS the default, so if you wanted to use -1, you could already write myInt:int = -1 already :)

@mikkelfj certainly do not want to add extra encoding to the binary to support this feature, I think that is too high a cost for such a simple feature, even compared to A).

@CasperN we'd have to actively error-out if someone compiles with an unsupported language, rather than ignoring the field, I think, like we currently do with other unsupported features. Doing that in multiple PRs would be ok, as long there is an effort to support most languages.

@mustiikhalil
Copy link
Collaborator

As @aardappel already said, most of the languages can return a null value (from the ones I’ve worked with) so why not introduce an optional flag ‘age: int (optional)’ and instead of returning a default value we just return null. Most of languages has some type of optional, if they don’t we can mark the name of the object as optional when we generate the code. This won’t require us to play with the binary data, and would only require small refactoring for each of the language generators. As as example swift already can return optional structs, tables, strings (we are just missing scaler optional)

@mikkelfj
Copy link
Contributor

Well, if the cost is too high, I think we should not do it. I'd rather not have a half-baked solution on top of an already half-baked solution.

@aardappel
Copy link
Collaborator

@mikkelfj what is half baked about A) ? only scalars require this feature, all other types can already be null.

@CasperN
Copy link
Collaborator Author

CasperN commented Jun 30, 2020

@mikkelfj

Can you elaborate on

A) using is present: this doesn't work well with tables and strings etc. because these have not default value. A null value would have to be an offset to a special null object that would have to be checked on access, or the current null value would have to change interpretation

I'm not sure what you mean by special null object, can you give an example?

@vglavnyy
Copy link
Contributor

vglavnyy commented Jun 30, 2020

With C++ the simplest way is to embed an optional scalar into a struct.

table Foo{
 bar : int (boxed);
}
// implicitly translated to:
table Foo{
  struct _ { bar : int = 0 }; // anonymous transparent proxy
}
//, or translated to
table Foo{
  struct bar { _ : int = 0 }; // named proxy
}

Struct might be present or not.
With C++17 it is possible to generate Optional or Expected for any pointer-like (non-scalar) fields.
In a json it can be like this:

"Foo" : {
 "bar" : [-12]
}
// or 
"Foo" : {
 "bar" : { "_" :-12 }
}

@krojew
Copy link
Contributor

krojew commented Jun 30, 2020

I'm for option A, since option B is what we have now in practice. On how to implement A - well, that's another problem. As far as I know, all our supported languages (maybe except Lobster, whatever that is 😄 ) have some notion of optional data, so we're fine in terms of compatibility. Of course, this would require making changes everywhere, but I personally, am fine with such cost if we finally get proper optional data support for scalars.

@mikkelfj
Copy link
Contributor

mikkelfj commented Jul 1, 2020

@aardappel

what is half baked about A) ? only scalars require this feature, all other types can already be null.

First, the current use of null and default values is half-baked because strings and other non-inline data do not have default values, different from scalars. We could add default values for strings.

Second, hijacking presence for NULL is half-baked because it hijacks the current use of default values for a differen purposes. Granted, it can work if you change schema semantics for optional values, but now you have yet another layer of whatever presence and null offsets means, depending on a schema flag.

I'd rather have null as a clearly stated non-negotiable data point for all imaginable data types. If not a bit vector, it could also be a special vtable offset value like 0xffff - it takes up more space, but only affects verifiers since non-supporting implementations would just see a future version of the format and choose the default value of the field, while supporting implementations would only take a performance hit if detecting the field is out of range, which it has to check anyway.

@CasperN

I'm not sure what you mean by special null object, can you give an example?

It was actually Wouter's suggestion originally when I once mentioned the lack of Null transparency for SQL data.

The idea is that to avoid interfering with the current null interpretation, you store a special object in the buffer. The uoffset (not voffset) poins to this object for NULL entries. It can be added to the buffer as an object with just an empty vtable (+/- handling of required fields for the type, but ideally just one object for all types, but NULLable shouldn't have required fields?). Any user level code can do this today, except it is hard to tell if an object is the NULL object or some other object. One could argue that you could just store and actuall NULL instead, but that changes the semantics of the current system and would prevent adding default values for strings and tables etc.

@mikkelfj
Copy link
Contributor

mikkelfj commented Jul 1, 2020

To extract a key point from my above comment:

D) store NULL values as virtual table voffset 0xffff because it is transparent to current implementations and acts the same as a missing entry with voffset 0x0000 except it looks like a future version instead of entirely missing. in both cases a default value is returned. It is now possible to add a separate support function: IsNull() similar to IsPresent(), but it avoids overloading current semantics, and it does not introduce a new bitmap data structure as my proposal C). The benefit of C) is that you can have short vtables if all the last fields are NULL, but I think D) is probably preferable.

A larger benefit of D is that you entirely avoid changing the schema semantics with an 'optional' setting because you can just store as value as NULL by adding an API function: SetNull() in the builder. Of course you could add attributes to block, or allow NULL, but it is not required.

@mikkelfj
Copy link
Contributor

mikkelfj commented Jul 1, 2020

Sorry, I'm not thinking straight. You cannot transparently store 0xffff as a vtable offset. Future versioning requires a field in the vtable outside the currently expected max vtable size, but you cannot store invalid offsets within the table because common reader logic will not check the validity of this offset, only verifiers will. The approach could still work if all readers learned to check this similar to the 0 check they perform now, but that is not practical. That is why the bitmap approach is better.

@CasperN
Copy link
Collaborator Author

CasperN commented Jul 1, 2020

@mikkelfj

Second, hijacking presence for NULL is half-baked because it hijacks the current use of default values for a differen purposes. Granted, it can work if you change schema semantics for optional values, but now you have yet another layer of whatever presence and null offsets means, depending on a schema flag.

I see it as as having the same semantics, just that the default value is a value outside of the type of the scalar itself, NULL.


I might be misinterpreting you but I think you're looking for a default vs null system for vectors, tables, and strings (in addition to scalars). The current state of the world is that scalars are default-y and the more complex objects are null-y. This proposal allows users to choose default-y or null-y semantics for scalars but you're worried that it may complicate a future feature where users can choose to make the complex stuff default-y. Therefore this proposal should also design a solution so tables, strings, and vectors can be default-y.


On the bitmap thing,

I'm not sure of the value of introducing bits to track "default value" vs "null value". It seems to me like a premature optimization of a case where some default and null values are very common, that has a cost on every other use case, though I'm not sure what cases you have in mind.

@aardappel
Copy link
Collaborator

aardappel commented Jul 2, 2020

@mikkelfj

First, the current use of null and default values is half-baked because strings and other non-inline data do not have default values, different from scalars. We could add default values for strings.

I believe that is a separate issue. Not having defaults is maybe not great, but it doesn't affect the string's ability to be optional in the sense of this issue. In fact, non-inline data (and structs) are ahead of scalars in this sense.

There are "issues" with supporting default values. For strings it is reasonably easy, but some form of default table is not exactly a lightweight feature.

We're not hijacking anything, we're merely extending the existing semantics. Under proposal A), an int (optional) would have the same semantics and representation (and almost the same API) as struct Int { i:int }. It is not that new.. for me the big deal is getting all languages to support it.

And I am not at all familiar with SQL, so I really don't follow your concerns around null in general. I am mostly familiar with the nulls in C/C++/Java.. and I believe Flatbuffers follows a similar model.

@mikkelfj
Copy link
Contributor

mikkelfj commented Jul 3, 2020

And I am not at all familiar with SQL, so I really don't follow your concerns around null in general.

NULL in SQL is controversial because the original designers (see book by C.J. Date on SQL) considered it impure in an otherwise elegant and consistent model based on algebra. However, this leads to requiring a defined 0-value value (like an empty string, or a numeric -9999, or whatever) in the domain of any given type, such as integers or strings. Pragmatics found it better to explicitly have a NULL value outside the value domain when they had lots of tables with data in which only few contained actual data. You can also select records where fields are not NULL or vice versa without having to define what a 0-value is in a given context.

In actual SQL use all of this can be condensed down to: If you have data extracted from a database that you want to process and reinsert into the database, you can expect to see NULL values, and you cannot consistently assume that you can map NULL to some other value (a 0-value) because that might overlap with a value meaning something else. So if you need to reinsert data without loss, you must keep track of the NULLs. Of course this is a mess, but a practical necessity.

Now, for this to work with FlatBuffers, you need to know exactly what is a NULL and not just what is NULL by convention wether it be a NULL string, or a default scalar value or an absent scalar that is different from a present default value.

If you add optional types to the schema, you can impose exact semantics for NULL simply by defining what NULL means for the given type and make sure that the equivalent type for SQL usage can represent all values in addition to NULL. This can actually work, but it is somewhat messy and hard to explain, but can be helped with API support.

If you do not modify the schema but just decides what constitutes a NULL, you are open to all sorts of interpretations in present use which will not work.

Optional types corresponds to SQL schema types where a field can be allowed to be NULLable, or not.

You also need to consider how the data can be represented in JSON without loosing NULL, but this should be workable.

I'd prefer a bitmap because it takes all guesswork away for any current and future types. It can be directly exported to other data formats such as Arrow that also uses bitmaps for NULLs, and it is fast to process because you can just scan the bitmap rather than special casing each kind of value that can be NULL.

Bitmaps do not interfere with current implementations because they are just an extra field in a table that you are not forced to look at if you do not care, but of course you cannot write NULLs if your API does not support it, unless you opt to write the NULL bitmap yourself, which is possible because it would just appear as a ubyte array.

So, if you do not want bitmaps, it can still be made to work if you introduce the optional attribute in the schema, but if you do not, it will be a mess.

@mzaks
Copy link
Contributor

mzaks commented Jul 3, 2020

What about introducing the optional scalar types:

Byte Ubyte Bool
16 bit: Short Ushort
32 bit: Int Uint Float
64 bit: Long Ulong Double

where compiler when sees those per default creates structs:

struct Int {value:int};
struct UInt {value:int};
etc...

And languages which support actual optionals can replace this default behaviour by generating an API with optionals. This way there is an option to have a "small" non breaking change which will work for all languages and opens up gradual migration path for every language specific code generator.

PS: If by any chance people already have a Int type in their schemas, we can introduce a flag to suppress the Boxed scalar struct creation.

@CasperN
Copy link
Collaborator Author

CasperN commented Jul 3, 2020

@mikkelfj

[null] leads to requiring a defined 0-value value (like an empty string, or a numeric -9999, or whatever) in the domain of any given type, such as integers or strings. Pragmatics found it better to explicitly have a NULL value outside the value domain

This seems like options (B) and (A) respectively.

you can expect to see NULL values, and you cannot consistently assume that you can map NULL to some other value (a 0-value) because that might overlap with a value meaning something else. So if you need to reinsert data without loss, you must keep track of the NULLs ... Now, for this to work with FlatBuffers, you need to know exactly what is a NULL and not just what is NULL by convention whether it be a NULL string, or a default scalar value or an absent scalar that is different from a present default value.

I think we know exactly what is a NULL under (A) for optional scalars in a table. They are null if and only if IsPresent == False.

If you do not modify the schema but just decides what constitutes a NULL, you are open to all sorts of interpretations in present use which will not work.

I'm focusing on the words "in present use": I think you mean fields that are currently scalars cannot safely be redefined into optional scalars because we cannot distinguish between NULL and default. This is correct and intended. For existing fields, changing the optional-ness is a bad idea (and users will need to be educated about that).

Adding bits, as you suggest in option (c), will allow for retrofitting null onto existing scalars. Are you claiming retrofit-ability should be a requirement?

Fwiw, my opinion is that allowing existing scalars to convert into being optional is not worth the extra bits of option (c) and educating users that "optional (or lack thereof) is permanent" is sufficient for this feature.

You also need to consider how the data can be represented in JSON without loosing NULL, but this should be workable.

Good point. At first glance, json's null keyword/literal seems like a good fit to me.


@mzaks

I like your idea because it lets all languages generate a default API for optionals, though it is essentially syntactic sugar in the schema. What would happen to the Int/Uint after the language in question adds optional support? Would that be a code-breaking change?

Regardless, I think we should start with the "error-out if someone compiles [optional scalars] with an unsupported language" approach, since its a lot easier. Your idea should be revisited in the scenario where we go ahead with option (A) but haven't implemented optional scalars in all languages and really need to. However, I don't think we'll end up in that situation since a prerequisite of accepting this feature is confirming we'll actually do the work in enough languages.


@aardappel, what would you like to see before deciding between (a) and (b) and giving a go/no-go on this feature? What is the minimum set of languages that must support this feature for it to be worth it?

I can volunteer for the following tasks (unless someone else wants them):

  • Changes to flatc (adding the tbd optional field attribute and error-ing out)
  • C++ generated API changes
  • Rust generated API changes

@mikkelfj
Copy link
Contributor

mikkelfj commented Jul 4, 2020

Adding bits, as you suggest in option (c), will allow for retrofitting null onto existing scalars. Are you claiming retrofit-ability should be a requirement?

One the one hand it would be useful, on the other hand it still requires marking the table as having NULLable fields, as opposed to marking individual fields. Bitmaps also provide fast processing of data which can be rather helpful, and it takes up less space than forcing default writes on some cases.

But mostly it rubs me the wrong way that you can declare -4 to be the default value, but it really means nothing, since -4 will be stored if not NULL, and if NULL, well, it doesn't really matter what the default value is.

@mzaks
Copy link
Contributor

mzaks commented Jul 4, 2020

Here is how I imagined my proposal in detail:

  1. Make following changes to parser and the AST model. Make sure that parser accepts the new optional types (e.g. Int) and marks which of them were used in schema. Add a new bool property to AST model representing structs (e.g. syntaticType: bool). Generate AST models for marked optional types with syntaticType = true.
  2. Start changing the code generators to not generate types for syntaticType = true, but use native types like Option<i32> in Rust.

Step one can be done by one person and will work even without step two, the "old" code generators will just generate code for the Int struct and use this type for table fields. The new generators which know about syntaticType will not generate the Int struct, but type the fields in the table appropriately (e.g. Option<i32> in Rust).

Potentially the syntaticType concept could be improved upon to support more native types for a language e.g.:

struct UUID (syntaticType) {
  bytes: [u8];
}

Where in all languages which have a native UUID type (e.g. Swift), the native type is used and for other languages a struct is generated.

@aardappel
Copy link
Collaborator

@mikkelfj from what you say, you'd be happy with A) also, it encodes a null value out of the domain explicitly. Adding bits wouldn't make this any different, they are redundant.

@mzaks that could work, but I wouldn't want to clutter generated code with all these definitions for these auto generated structs. Option A) is what you get if you instead access the scalar directly as if it was a struct. I'd prefer that to adding new types.

@CasperN option B) is trivial, it is merely a convenience feature on the status quo. It is just if (attr == "optional") default_val = INT_MIN. I am "warming up" to option A), but I'd like whoever person(s) commit to implementing this also commit to a "majority" of languages over not too long a time. They don't have to do it all themselves, but maybe help in tracking and coordinating with language authors (quite a few have been active on this thread already).

@CasperN
Copy link
Collaborator Author

CasperN commented Jul 6, 2020

@aardappel, I can do tracking/coordination too.

@mzaks I'm a little wary of making that part of making your SyntacticTypes feature part of optional scalars since it seems like a bit of additional work. I think it should be a separate feature request.


Can everyone participating vote on A vs B vs do-nothing and maybe volunteer a language to implement?

I support (A). I'll track work by editing the top comment

@aardappel
Copy link
Collaborator

Ok, @CasperN is in charge of this feature :)
I'll vote (A), then (B), then (do-nothing) in that order.
I can help with C++ if no-one else takes it on.. @vglavnyy @yaoshengzhe ?

@vglavnyy
Copy link
Contributor

vglavnyy commented Jul 8, 2020

What is the plan for implementing of (A)?
In fact, all scalars in binary already are optional values. A scalar value is present or not.
We only need a special attribute:

  • to enable the generation of an additional getter with MayBe return value;
  • to modify implementation of setter (local force_defaults).

That’s all. These changes are compatible with already existing binaries and user programs.
The real problem is generated Object API. This attribute will change a type of a scalar inside T-generated tables.

Example:

table Monster { mana : int = -1 (maybe); }
class Monster {
  int16_t mana() const { return value or default; }
  flatbuffres::optional<int16_t>  get_mana() const;
  void add_mana(int16_t mana) {
    fbb_.AddElement<int16_t>(Monster::VT_MANA, mana); // WITHOUT default value!
  }
};
// Optional for default-constructable and copyable T.
// Can have a special implementation if T is a pointer.
template <class T > class optional {
  optional(); // none with T{};
  optional(bool none, const T& val); // only with explicit default value
  T& value(); // return ref to value or ref to default value (never throw)
  T& operator*(); // return value()
  operator bool() const;
  bool has_value() const;
  std::optional<T> get() const; // down-cast to std::optional<T>
  operator std::optional<T>() const; // implicit down-cast to std::optional<T>
};

@CasperN
Copy link
Collaborator Author

CasperN commented Jul 8, 2020

@vglavnyy,

In fact, all scalars in binary already are optional values. A scalar value is present or not.

Currently they're optional in the binary, but non-presence is interpreted as "return the default value according to the schema" so from the perspective of user's code, "all values are there". Proposal (A) allows non-presence to be interpreted to mean None (ie "Nothing" in the words of haskell's optional type, "maybe"). That is semantically at odds with "non-presence means default".

I think your table:

table Monster { mana : int = -1 (maybe); }

should be an error because I think there shouldn't be a non-null default for optional types. The following is fine.

table Monster { mana : int (maybe); }

Minor bike-shedding idea, we could specify optional types like so:

table Monster { mana : int = null; }

This visually implies that optional types are at odds with default values and is "consistent" since the value to the right of the equals sign is what we interpret non-presence to mean.

The real problem is generated Object API. This attribute will change a type of a scalar inside T-generated tables.

As discussed above, changing a field from default-y to optional is a binary-semantic breaking change. It is also intended to be a code breaking change (in all languages except maybe python) since we want to show, in code, that the type is optional. And not just in the object API, even the normal accessor should change: e.g. optional<int16_t> mana(); or int16_t* mana(); (the exact form should be debated in the C++ PR).

What is the plan for implementing of (A)?

  1. First, flatc should recognize whatever syntax we choose to represent optional values and error-out when optionals are detected.

  2. For each language, implement support for optional types (with whatever API is best for the native language). In that PR, remove the flatc error when generating code in that language.

  3. When we have enough languages supporting this, then start changing the documentation and advertise the feature to users.

@adsharma
Copy link

adsharma commented Oct 28, 2020 via email

@mikkelfj
Copy link
Contributor

flatcc generates flatbuffer code in C. It is also a schema compiler library that can be used for other purposes. I use it for other purposes in addition to processing flatbuffer files.

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 28, 2020

@adsharma

Your original comment seems to suggest you're using (required) to specify optional scalars

I just pushed support for a similar feature, but instead of using
field = null
It uses
field: type (required);

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 28, 2020

@leventov

Will array fields support optional? How the grammar will look like?

No, we are only focusing on table fields in this issue.

@aardappel
Copy link
Collaborator

@adsharma you are free to do as you please of course, but it would be nice for your users to alert them to the fact that your tool uses an incompatible dialect of FlatBuffer schemas, and how so, if you feel strongly that you want to change what things mean to flatc.

@adsharma
Copy link

@aardappel - I don't want to gratuitously do something different, but here's my thinking on the matter.

The way we're using flatc.py, we're competing with:

  • SQL DDL as a way to describe data. I often advocate for flatbuffers to describe things at a conceptual level, whereas the SQL DDL tends to describe data at a physical level (although views could be used for conceptual data, but for various reasons they don't work as well as flatbuffers).

  • Describe common programming language constructs in a language neutral way

If I push required for non-scalars and = null syntax for scalars in these circumstances, my users are likely to say - no thank you, I'll describe data in SQL or $favorite_type_safe_language. I must also admit that I'm sympathetic to removing null as a concept and avoid the billion dollar mistake.

In most programming languages (rust, kotlin and swift among others), we have Optional[T] that works for both scalars and non-scalars.

@aardappel
Copy link
Collaborator

To me, people rejecting your solution because of such a small feature would be unlikely, and I think generally that a tool adheres to (and is compatible with) a well known standard is much more important.

Also, "the billion dollar mistake" is about implicit nulls, and the new use here is explicit (requires checking to access, at least in language that support it, like C++, Rust and C#). We already had nulls before, and any implicitness is due to the language they're implemented in, not FlatBuffers.

@mikkelfj
Copy link
Contributor

I'm not entirely comfortable with the way required and optional differ for different value types, so I understand @adsharmas concern. I still think schema compatibility is more important. However, if we improve FlatBuffers to more suitable as a DDL I'm all for it. That is part of my lengthy arguments in the past. But I fear that ship has sailed.

@adsharma
Copy link

@mikkelfj @aardappel - I can look into supporting the = null syntax in addition (and treat it the same as required). So we remain compatible. Does that address your concern?

CasperN added a commit that referenced this issue Nov 24, 2020
* Documentation updates for Optional Scalars

* Updated Support

* Reword stuff

* s/NULL/null

Co-authored-by: Casper Neo <cneo@google.com>
@Triazic
Copy link

Triazic commented Dec 9, 2020

Voting/requesting support in C# and typescript :)

@CasperN
Copy link
Collaborator Author

CasperN commented Dec 9, 2020

Voting/requesting support in C# and typescript :)

Oh we did that! I just forgot to update the top comment (doing that now).
With the documentation updates, I'll close this issue

@kyleb-imp
Copy link

Is there support for optional types in python? The top comment says there is not. Is that planned?

@adsharma
Copy link

Python: Optional[T]

works just fine for me in python3. Perhaps the issue is specific to how flatbuffers optional idea interacts with the python type system?

@kenroser
Copy link

Is there support for optional types in python? The top comment says there is not. Is that planned?

If I attempt to compile schema that has optional scalars with flatc -p, I get

error: /Users/xxx/test.fbs:14: 23: error: Optional scalars are not yet supported in at least one the of the specified programming languages.

@CasperN
Copy link
Collaborator Author

CasperN commented May 23, 2022

We would support it, there's no technical blocker, its just unstaffed. I'm happy to review a PR though

@kenroser
Copy link

It looks like the support for Python is in this PR, #7318

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests