
[AVRO] Generation of Avro schemas with logical types. #133

Closed · wants to merge 26 commits

Conversation

jcustenborder

This is the first part; we're not ready to merge yet. I want to get feedback on the approach I have so far. I have schema generation working. An Avro decimal must have its precision set, which means I have to know it during schema generation, so I decided to go with an annotation-based approach. This seems to work pretty well so far. For example, this schema:

{
  "type" : "record",
  "name" : "DecimalType",
  "namespace" : "com.fasterxml.jackson.dataformat.avro.schema.TestLogicalTypes$",
  "fields" : [ {
    "name" : "value",
    "type" : {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 5,
      "scale" : 0
    }
  } ]
}

is generated by this Java:

  static class RequiredDecimal {
    @JsonProperty(required = true)
    @AvroDecimal(precision = 3, scale = 3)
    public BigDecimal value;
  }

Using annotations gives us an odd mix of benefit and problem with existing code: if you don't have the annotation, a BigDecimal will still be serialized as a string. This is great if you have existing code, but a pain if you forget the annotation.

I went with a similar approach for all of the other logical types: in order to use them you need to add the corresponding annotation.
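
For reference, here is a minimal sketch of how such a schema could be generated, assuming the RequiredDecimal class above is in scope and using the module's existing AvroMapper / AvroSchemaGenerator API (the exact wiring in this PR may differ):

import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;
import com.fasterxml.jackson.dataformat.avro.schema.AvroSchemaGenerator;

public class GenerateDecimalSchema {
    public static void main(String[] args) throws Exception {
        AvroMapper mapper = new AvroMapper();
        AvroSchemaGenerator visitor = new AvroSchemaGenerator();
        // Visit the annotated POJO; the generator assembles the Avro schema,
        // including any logical-type metadata contributed by the annotations.
        mapper.acceptJsonFormatVisitor(RequiredDecimal.class, visitor);
        AvroSchema schema = visitor.getGeneratedSchema();
        System.out.println(schema.getAvroSchema().toString(true));
    }
}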

#132

@jcustenborder (Author)

Decimals are looking good. I have made some great progress here. I now have schema generation support for BYTES- and FIXED-based schemas via AvroDecimal:

{
  "type" : "record",
  "name" : "FixedDecimalType",
  "namespace" : "com.fasterxml.jackson.dataformat.avro.schema.TestLogicalTypes$",
  "fields" : [ {
    "name" : "value",
    "type" : {
      "type" : "fixed",
      "name" : "foo",
      "namespace" : "com.fasterxml.jackson.dataformat.avro.schema",
      "size" : 8,
      "logicalType" : "decimal",
      "precision" : 5,
      "scale" : 0
    }
  } ]
}
{
  "type" : "record",
  "name" : "BytesDecimalType",
  "namespace" : "com.fasterxml.jackson.dataformat.avro.schema.TestLogicalTypes$",
  "fields" : [ {
    "name" : "value",
    "type" : {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 5,
      "scale" : 0
    }
  } ]
}

Right now I am going down the path of using the conversions that are built into Avro for this. @cowtowncoder The big thing I could use your assistance with is direction on the deserialization side. Should I be looking into extending AvroParser with methods like decodeDecimal(), etc., to handle these?
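
For context, the built-in Avro conversion referred to here is org.apache.avro.Conversions.DecimalConversion; a small standalone sketch of what it does (Apache Avro library API, not this PR's code):

import java.math.BigDecimal;
import java.nio.ByteBuffer;
import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class DecimalConversionExample {
    public static void main(String[] args) {
        // A bytes schema carrying the decimal(5, 0) logical type.
        Schema bytesSchema = LogicalTypes.decimal(5, 0)
                .addToSchema(SchemaBuilder.builder().bytesType());
        Conversions.DecimalConversion conversion = new Conversions.DecimalConversion();

        // BigDecimal -> bytes (what a serializer would emit) and back again.
        ByteBuffer encoded = conversion.toBytes(
                new BigDecimal("12345"), bytesSchema, bytesSchema.getLogicalType());
        BigDecimal decoded = conversion.fromBytes(
                encoded, bytesSchema, bytesSchema.getLogicalType());
        System.out.println(decoded); // 12345
    }
}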

@cowtowncoder (Member)

Excellent progress.

One update from my side: I am now leaning towards creating a 2.10 branch, as a sort of forward-looking minor version with some concepts from 3.0 (basically simple builders for ObjectMapper, JsonFactory), to allow writing code that works with 2.10 and is easier to convert to 3.0. But I also want to limit the number of new features in 2.10 to as little as possible, to stabilize it as the only long-living 2.x maintenance branch.

But I think this feature could make lots of sense for 2.10. That would make more sense than adding it in a 2.9 patch.

I hope to release 2.9.6 sometime in May (next 2 weeks), and then create 2.10 branch.

@jcustenborder (Author)

@cowtowncoder I have decimals finished and working properly. I added some testing so now I'm doing round trips as well as comparing the serialized output against the equivalent GenericRecord generated by Avro. I want to make sure that the output from Jackson is readable Avro.

I need a little help on the date types. Take timestamp-micros: it's the microsecond timestamp format for Avro. I'm not sure where to hook in on the Jackson side to get the actual Date object; it looks like Jackson is converting the field to a long before it gets to me. How can I get the actual date value? How do I handle the java.time.* classes? Should this require the JavaTimeModule? Everything I have done so far forces a schema to be defined for the Java type.
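
For illustration, the kind of serializer override being discussed might look roughly like this (a sketch only -- the class name is hypothetical, and java.util.Date itself only carries millisecond precision):

import java.io.IOException;
import java.util.Date;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.SerializerProvider;

// Writes a java.util.Date as an Avro timestamp-micros value (microseconds since the epoch).
public class TimestampMicrosSerializer extends JsonSerializer<Date> {
    @Override
    public void serialize(Date value, JsonGenerator gen, SerializerProvider provider)
            throws IOException {
        // Date stores milliseconds since the epoch; scale up to microseconds for the logical type.
        gen.writeNumber(value.getTime() * 1000L);
    }
}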

…avro module to stay java 7 while supporting mapping of java.time out to their own serializers.
@jcustenborder (Author)

@cowtowncoder We are nearly there. I need your help with one more spot and then I'm finished, assuming there are no changes required for merging. I'm not sure how to override how java.util.Date is serialized; it seems like the serializer is selected before the AnnotationIntrospector has a chance to pass off the serializer. To support timestamp-micros I need to convert this value to microseconds, and I also want to grab the time part of the Date for time-micros and time-millis. I can't do this unless I am able to override their serializers and deserializers. Can you give me any pointers?
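
One way to hook in is through an AnnotationIntrospector, which appears to be the direction the diff further down in this PR takes; a rough sketch (the AvroType annotation constant and the serializer class are placeholders, not necessarily the PR's exact names):

import com.fasterxml.jackson.core.Version;
import com.fasterxml.jackson.databind.AnnotationIntrospector;
import com.fasterxml.jackson.databind.introspect.Annotated;

public class LogicalTypeAnnotationIntrospector extends AnnotationIntrospector {
    @Override
    public Version version() {
        return Version.unknownVersion();
    }

    @Override
    public Object findSerializer(Annotated a) {
        // Look for the (hypothetical) logical-type annotation on the property.
        AvroType type = _findAnnotation(a, AvroType.class);
        if (type != null && type.logicalType() == AvroType.LogicalType.TIMESTAMP_MICROSECOND) {
            // Placeholder serializer from the sketch above.
            return new TimestampMicrosSerializer();
        }
        return null; // fall back to default serializer selection
    }
}

An introspector like this would then be registered alongside the module's existing one, e.g. via AnnotationIntrospectorPair.pair(...).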

For completeness I have implemented all of the Avro logical types. I also created another module that will allow us to have java.time.* support while letting the Avro module stay on Java 7.

@jcustenborder (Author)

Never mind on the AnnotationIntrospector. I made a mistake and inherited from the wrong class.

@jcustenborder (Author)

@cowtowncoder I think I'm done. I have every logical type implemented with their Java 7 and Java 8 types. The build is failing for the Java 8 portion; it's in a different module. Let me know if you want it moved someplace else. Please let me know what changes you need from here. Thanks for your help, it's much appreciated -- I'm a huge fan of Jackson.

@cowtowncoder (Member)

@jcustenborder Excellent. I hope to get the 2.9 release done soon, and after that could consider merging this. The Java 8 issue is a bit tricky since we'll really only require Java 8 with Jackson 3.x. Although I could consider making the Avro module Java 8 for 2.10, I don't know if it'd be easy to do that just for this module and keep the other binary format modules on Java 7.

I think what we could do is to merge the Java 7 portions in the 2.10 branch, but the Java 8 ones only in master?

But I should probably have a look at the code changes -- if the changes are safe for 2.9 (aside from the Java 8 part), we could just merge for 2.9, but announce the changes for 2.10; that is, the official addition of these features.

@cowtowncoder (Member)

Ah, ok. I should have looked at the code a bit earlier, as I think I misunderstood the approach.

One thing I am not quite sure about is the addition of many new annotations. I was kind of assuming that these would not (always?) be needed, since:

  1. The Avro schema has logical type information, and that should be exposed via introspection functionality so that the encoder/decoder is aware of the mapping
  2. POJOs also have type information, for databinding, to do coercions.

I could perhaps see the need for some cases where the information is not otherwise sufficient (like an override to decide between BYTES/FIXED encoding). And obviously things like precision are problematic for (2) (the schema has the information, the Java type does not).

If new annotations are needed, on the other hand, a smaller set -- perhaps just one annotation? -- would seem better, with the value being the Java type (BigDecimal, date/time types), and possibly another property in case the Avro shape (bytes, fixed, string etc., as an Enum) can be one of multiple alternatives. This could be problematic if some specific attributes are needed (precision), but at the same time it seems that adding yet another annotation type for each logical type could become a problem wrt maintenance.

@jcustenborder (Author)

@cowtowncoder Sorry about the delay, I was on vacation. What about moving to a single annotation with an enum for the Avro type and the logical type? Decimal is really the only weird one because it can be either a fixed or bytes, hence the implementation. My goal was to put something in place where the current code in the wild would still work as before but also support the new logical types; that's why I had so many annotations. I could reduce this all to a single annotation.

…ute for a single AvroType attribute that can handle all types.
@jcustenborder (Author)

@cowtowncoder I removed all of the individual annotations in favor of a single one and deprecated AvroFixedSize. This leaves a single AvroType annotation that can be used to declare both logical types and standard types such as a fixed. For example, you can create a fixed with @AvroType(typeName = "FixedFieldBytes", fixedSize = 4, schemaType = Schema.Type.FIXED), and a decimal with @AvroType(schemaType = Schema.Type.BYTES, logicalType = AvroType.LogicalType.DECIMAL, precision = 5). Let me know your thoughts about this one.
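
Written out as fields, the two usages described above would look something like this (field names and Java types are illustrative):

  static class AvroTypeExamples {
    // Plain fixed type, replacing the deprecated @AvroFixedSize:
    @AvroType(typeName = "FixedFieldBytes", fixedSize = 4, schemaType = Schema.Type.FIXED)
    public byte[] fixedField;

    // Decimal logical type backed by bytes:
    @AvroType(schemaType = Schema.Type.BYTES, logicalType = AvroType.LogicalType.DECIMAL, precision = 5)
    public BigDecimal decimalField;
  }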

@jcustenborder (Author)

@cowtowncoder Sorry to bug you. Just wanted to close the gap on this one. Is this closer to what you were looking for?

@jcustenborder (Author)

@cowtowncoder Sorry to bug you again. Please let me know if you need more changes on this one or if you want me to break any more of it out. I'm also happy to assist on what's needed here for the 3.0 release.

@jcustenborder (Author)

@cowtowncoder Sorry to bug you again. Do you need me to do anything else with this pull?

@jcustenborder (Author)

@cowtowncoder Do you need me to do anything else with this pull?

AvroType logicalType = _findAnnotation(a, AvroType.class);
if (null != logicalType) {
    switch (logicalType.logicalType()) {
        case TIMESTAMP_MILLISECOND:

@lukejackson

Timestamps have no timezone, explicit or implicit. It is more correct to model them using java.time.Instant; this can be converted into any XXXDateTime class by providing a timezone, or stating that it is implicit -- this information is not available from the encoded timestamp itself.

@jcustenborder (Author)

@lukejackson That makes sense, because Avro stores everything without a timezone as well. With Jackson I was under the impression that I had to create a serializer and deserializer for each of the java.time.* classes. Is there another approach that I can take?

@lukejackson

My suggestion would be to add java.time.Instant support - this is arguably the canonical representation of timestamps in Java 8+.

As for the other XXXDateTime types, I think there is an argument for them not to be supported, given there is no corresponding type in Avro.

If you do introduce a conversion, it will introduce information not on the wire (e.g. timezone) and also be lossy, given a date time with a time zone can map to two different timestamps during daylight savings time transitions.

I would argue that given these limitations, it should be left to the user to perform the conversion and understand their assumptions and associated limitations.

Note that the closest Avro does have is its date and time-xxx logical types. To model a LocalDateTime in Avro requires two fields with these corresponding types; to model a ZonedDateTime requires a third field with the zone id.

@jcustenborder (Author)

I get what you are saying now. Not having Instant support was an oversight; in my last comment I thought you were telling me to convert to Instant and delegate that to a serializer. I'll add Instant support. LocalDateTime and LocalDate would be the closest thing to the Avro specification, given that neither the Local* classes nor the Avro specification include timezone information.

@lukejackson · Aug 13, 2018

Yes, I was talking about a user of this library doing the conversion themselves, rather than Jackson itself.

I can see how LocalDate is the closest to the Avro spec for date, but I'm not sure I understand what you mean by LocalDateTime -- I don't think there is anything in the Avro spec that relates to that?

I don't want to sound nitpicky, and I appreciate your effort here. The concepts behind the date and time types are deceptively complicated. Given the apparent recent stagnation in the Avro project (and there is a long-standing PR to add java.time there), I am hoping that this Jackson module could be a workable alternative to core Avro for some of my use cases.

@jcustenborder (Author)

This is what I was thinking when I started this PR. The Local* time classes do not have timezone information, so they translate perfectly to the corresponding Avro logical types. Not having Instant was just an oversight; I've already started on adding it to this pull. Here is what I'm thinking, considering your comments on OffsetDateTime and ZonedDateTime: you are correct, I was planning on doing some lossy conversions to support these types. Alternatively I could create logical types for them. The point of logical types in Avro is to support things outside of the specification. To do this I could create logical types called zoned-date, zoned-time-millis, zoned-time-micros, zoned-timestamp-millis, and zoned-timestamp-micros. This would allow me to store the timezone information along with the date/timestamp information.

These were my original thoughts:

| Java type | Avro logical type | Notes |
| --- | --- | --- |
| LocalDate | date | Lossless conversion |
| LocalTime | time-millis, time-micros | Lossless conversion |
| LocalDateTime | timestamp-millis, timestamp-micros | Lossless conversion |
| Instant | timestamp-millis, timestamp-micros | Lossless conversion |
| OffsetDateTime | timestamp-millis, timestamp-micros | Lossy conversion? Custom logical type? Timestamp adjusted to UTC. |
| ZonedDateTime | timestamp-millis, timestamp-micros | Lossy conversion? Custom logical type? Timestamp adjusted to UTC. |
| OffsetTime | time-millis, time-micros | Lossy conversion? Custom logical type? Time adjusted to UTC. |
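
As a sketch of the lossless rows above, using plain java.time arithmetic (illustrative only, not the code in this PR):

import java.time.LocalDate;
import java.time.LocalTime;

public class LocalConversions {
    public static void main(String[] args) {
        // LocalDate <-> Avro date (days since the unix epoch)
        int epochDays = (int) LocalDate.of(2018, 8, 13).toEpochDay();
        LocalDate date = LocalDate.ofEpochDay(epochDays);

        // LocalTime <-> Avro time-millis (milliseconds after midnight)
        int timeMillis = (int) (LocalTime.of(10, 15, 30).toNanoOfDay() / 1_000_000L);
        LocalTime time = LocalTime.ofNanoOfDay(timeMillis * 1_000_000L);

        System.out.println(date + " " + time); // 2018-08-13 10:15:30
    }
}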

@lukejackson · Aug 16, 2018

I hadn't thought about adding new logical types to Avro.

I think the way forward depends on what you're looking to achieve. If it's a way to represent all the Java related time classes in Avro - i.e. the Java POJO is the canonical schema and the Avro representation is just derived from that - then I can understand you'd look for a way to encode them in Avro. There are some options:

  1. reusing another type which is close in meaning (e.g. timestamp-millis for LocalDateTime and have any consumers - which may not be this library - know that they should convert the timestamp into a ZonedDateTime with UTC time zone and convert to LocalDateTime)
  2. encoding the types as strings, and parsing them as such. I believe this is what Jackson does when encoding Java times to JSON.
  3. adding your own logical types. this would mean that any consumer - which again may not be this library - would also have to implement them.
  4. using the record type composed of types supported by Avro. this is what Jackson does when converting POJOs to JSON - they become JSON dictionaries. Avro records would be the equivalent here. e.g. LocalDateTime would be a record containing a date field and a time-millis/time-micros field.

If instead you are looking for a way to directly represent Avro records as Java classes - i.e. the Avro schema is the canonical schema and the Java POJOs are just derived from that - then you just need to support the Avro types and no more, and leave it to the user to define their schemas to model the data they want to encode (e.g. manually define a record representing local date and local time), or let them choose how to shoehorn the value into an existing Avro type (e.g. represent as a string, as a long, etc.)

For my use cases, the Avro schemas are the canonical schemas. As such I only use the Avro types. This is because I use Avro to communicate between applications written in multiple languages, such as Java, C++, Python, and I rely on Kafka tools that only support Avro types (e.g. Connect and its various sinks).

Where the Avro types don't support my use case I either use a different type (e.g. long for nano precision timestamps) or multiple fields (e.g. combination of date and time-millis for a local date time), and I'm happy to accept that these fields would appear as such in Java, C++, Python, my database tables, etc.

@lukejackson

big +1 from me on getting this feature reviewed and merged

throw new IllegalStateException("Date type should not have any time fields set to non-zero values.");
}
long unixMillis = calendar.getTimeInMillis();
int output = (int) (unixMillis / MILLIS_PER_DAY);

Minor: you can use TimeUnit.MILLISECONDS.toDays(unixMillis) and avoid the MILLIS_PER_DAY constant.
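
For reference, the suggested replacement would be roughly (sketch):

import java.util.concurrent.TimeUnit;

class EpochDays {
    // Same result as unixMillis / MILLIS_PER_DAY, without the hand-rolled constant.
    static int toEpochDays(long unixMillis) {
        return (int) TimeUnit.MILLISECONDS.toDays(unixMillis);
    }
}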


@Override
public void serialize(Date value, JsonGenerator jsonGenerator, SerializerProvider serializerProvider) throws IOException {
Calendar calendar = Calendar.getInstance(UTC);

To avoid Thai Buddhist or Japanese Imperial calendars, it may be safer to explicitly instantiate a GregorianCalendar here.

Alternatively, you can convert via Instant and ZonedDateTime, avoiding Calendar altogether.
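
A sketch of that Calendar-free variant (illustrative, not this PR's code):

import java.time.ZoneOffset;
import java.util.Date;

class DateToEpochDays {
    static int toEpochDays(Date value) {
        // Interpret the Date on the UTC timeline, then count whole days since 1970-01-01.
        return (int) value.toInstant().atZone(ZoneOffset.UTC).toLocalDate().toEpochDay();
    }
}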

@jcustenborder (Author)

@marcospassos I'm not sure if I follow. LocalDateTime and Date are equivalents and are just storage of time since epoch which is UTC. The Avro logical type 'timestamp-millis' is milliseconds since epoch. There is no timezone information stored because Avro does not have a specification for storing timezone information.

@marcospassos (Contributor)

“LocalDateTime and Date are equivalents and are just storage of time since epoch which is UTC.”

They're not. Quoting the LocalDateTime Javadoc:

“This class does not store or represent a time-zone. Instead, it is a description of the date, as used for birthdays, combined with the local time as seen on a wall clock. It cannot represent an instant on the time-line without additional information such as an offset or time-zone.”

@jcustenborder (Author)

Technically you are correct: the epoch is UTC, and this pull follows what Avro itself does -- it uses UTC for the time zone. For example:

@marcospassos (Contributor)

That's precisely why it's wrong. Encoding a LocalDateTime in UTC is not serialization, but conversion. We should not make assumptions like that.

@jcustenborder (Author)

jcustenborder commented Sep 2, 2019 via email

@marcospassos (Contributor)

I think you missed my point. LocalDateTime cannot be stored as a timestamp, so it should not be mapped like so.

@jcustenborder (Author)

jcustenborder commented Sep 2, 2019 via email

@cowtowncoder (Member)

@jcustenborder I would recommend you read more about the Java 8 "local" date/time types: they are NOT timestamps -- the "zoned" variant (ZonedDateTime) is (or OffsetDateTime).
"Local" here is a bit of an odd naming convention, as it could easily be thought to refer to a date/time local to a timezone, but that is not true. Rather, it is an "abstract" notion of a specific date, or specific time (or both), not bound to anything -- not to a timezone and not to UTC either. It must be bound to produce a timestamp. SO has this, for example:

https://stackoverflow.com/questions/32437550/whats-the-difference-between-instant-and-localdatetime

But as to my earlier question: "The goal for this PR was to provide support for Avro Logical Types" is a bit general, as support can mean multiple things. What I was earlier assuming (incorrectly, I think) was that it would:

  1. Use logical type information provided in the schema to interpret encoded data, allowing Jackson to read such Avro data into POJOs.

but I think this would not be doing that, but rather:

  1. Use additional annotation information to generate an Avro schema with logical type metadata.

That is fine, but it leads to a follow-up question of exactly where this should be supported. I am ok with the format module adding such generation, although in some sense it might make sense to have a separate "Avro schema generation" module, as there is one for JSON Schema.
But at the same time I am not sure this separation works cleanly.

@jcustenborder (Author)

@cowtowncoder I think the disconnect is that my comments are centered around binary compatibility with Avro data. My goal for this PR is to be able to read and write logical type data using Jackson that is compatible with Apache Avro.

We have had some of these discussions before. LocalDateTime is a timestamp without timezone information. The Avro library assumes that objects of LocalDateTime are UTC. The priority of this PR is to read and write data in a way that is compatible with Apache Avro.

@marcospassos (Contributor)

“LocalDateTime is a timestamp without timezone information.”

There is no such thing as a "timestamp without timezone information." A timestamp is a point on the timeline, always in UTC. What Avro does is conversion, rather than serialization -- and it's wrong, IMHO.

@jcustenborder (Author)

Conversion is fine given that it's a known conversion. If we were talking about serialization in general I would agree with you, but we are talking about conforming to a specific specification. Avro has a known specification for logical types, and both timestamp-millis and timestamp-micros state they are from UTC:

A timestamp-millis logical type annotates an Avro long, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000 UTC.

A timestamp-micros logical type annotates an Avro long, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.000000 UTC.
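
In java.time terms, those two definitions boil down to something like this (sketch only):

import java.time.Instant;
import java.time.temporal.ChronoUnit;

class InstantTimestamps {
    static long toTimestampMillis(Instant value) {
        return value.toEpochMilli();
    }

    static long toTimestampMicros(Instant value) {
        // Whole microseconds between the unix epoch and the instant.
        return ChronoUnit.MICROS.between(Instant.EPOCH, value);
    }
}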

<scope>test</scope>
</dependency>
<!-- A bit of help to reduce boiler-plate in dummy test classes -->
<dependency>
Member

Quick note: no, unfortunately I will not allow Lombok as a dependency (some usage did sneak in, and I have been busy removing it). The original reason was that Lombok use prevented a clean build out of a cloned repo (some component had to be locally installed).


public abstract class BaseTimeJsonDeserializer<T> extends JsonDeserializer<T> {
private final TimeUnit resolution;
final ZoneId zoneId = ZoneId.of("UTC");
Member

static?

@cowtowncoder (Member)

First of all, thank you for explaining where the mapping of LocalDateTime comes from. I do not know why the Apache project added such a misguided conversion, but I can see why, for compatibility reasons, it would make sense to follow that direction. Still, to me that is just Wrong: it should only support the Zoned- and/or Offset- variants, and the Local- representations should either be textual, or structured to use numeric components for year/month/day and hour/minute/second.

Does the Avro lib generate LocalDate / LocalTime / LocalDateTime in some form? And no Offset- variant at least? If not the latter, why? That simply makes no sense: it is perfectly fine to create an OffsetDateTime and default to UTC.

I'm a bit at a loss here, as I can see the issue but not a good solution. I would not want to propagate what seems like misguided processing of date/time values.


import com.fasterxml.jackson.core.Version;
import com.fasterxml.jackson.databind.introspect.Annotated;
import com.fasterxml.jackson.dataformat.avro.AvroAnnotationIntrospector;
Member

Will need to use a different Java package if this remains a separate component, as the Java Module system does not allow split packages (classes for a single Java package coming from multiple modules, that is, jars with module-info).
But this is work-in-progress so it's fine until packaging details are decided.

if (null != logicalType) {
    switch (logicalType.logicalType()) {
        case TIMESTAMP_MILLISECOND:
            if (a.getRawType().isAssignableFrom(LocalDateTime.class)) {
Member

For serializers, you should check isAssignableFrom (since subtype serialization is probably fine); for deserializers it is fine (and perhaps preferable) to do an equality check -- you cannot substitute a base value for a subtype property.


@Override
public T deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
final long input = p.getLongValue();
Member

Should probably add checking for other input types, to give a more meaningful error message in case the underlying physical token is not a 64-bit long (schema mismatch)? A unit test for the same would be useful as well.
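
A sketch of the kind of check being suggested, using an illustrative Instant deserializer (class name and mapping are placeholders, not the PR's code):

import java.io.IOException;
import java.time.Instant;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.deser.std.StdScalarDeserializer;

public class TimestampMillisInstantDeserializer extends StdScalarDeserializer<Instant> {
    public TimestampMillisInstantDeserializer() {
        super(Instant.class);
    }

    @Override
    public Instant deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
        if (!p.hasToken(JsonToken.VALUE_NUMBER_INT)) {
            // Report a schema mismatch instead of letting getLongValue() fail obscurely.
            return (Instant) ctxt.handleUnexpectedToken(Instant.class, p);
        }
        return Instant.ofEpochMilli(p.getLongValue());
    }
}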

import java.time.temporal.ChronoUnit;
import java.util.concurrent.TimeUnit;

public abstract class BaseTimeJsonDeserializer<T> extends JsonDeserializer<T> {
Member

I'd suggest extending StdScalarDeserializer instead.

Member

Also, on naming, since this is Avro-specific, maybe rather BaseTimeAvroDeserializer (or something).

import java.time.temporal.ChronoUnit;
import java.util.concurrent.TimeUnit;

public abstract class BaseTimeJsonSerializer<T> extends JsonSerializer<T> {
Member

As with deserializer, probably better extend StdScalarSerializer; change Json -> Avro

*/
int scale() default 0;

enum LogicalType {
Member

Javadocs would be crucial here, to explain mapping details.

protected final static ScalarDecoder READER_LONG = new LongReader();
protected final static ScalarDecoder READER_NULL = new NullReader();
protected final static ScalarDecoder READER_STRING = new StringReader();
protected final static ScalarDecoder READER_BOOLEAN = new ScalarDecoder.BooleanDecoder();
Member

These could be changed to factory methods I suppose?

import java.io.IOException;
import java.util.UUID;

public class AvroUUIDDeserializer extends JsonDeserializer<UUID> {
Member

Is there need to add this? Wouldn't standard UUIDDeserializer from jackson-databind work?
(it actually supports both textual and 16-byte binary value)

Member

After playing with this a bit on 2.10, it looks like default serialization of java.util.UUID with Avro is unfortunately String. I can work on changing that in 2.11, but for 2.10 this would actually make sense I guess.

String.format("Cannot encode decimal with scale %s as scale %s.", decimal.scale(), scale)
);
}
jsonGenerator.writeBinary(decimal.unscaledValue().toByteArray());
Member

This works, but it'd probably be acceptable to just cast JsonGenerator as AvroGenerator if that simplifies things.

@cowtowncoder (Member)

On the goal here: ok, yes. I can see that with the serializers, deserializers, and use of annotations it would implement read/write support for Java 8 date/time types, using binary representations, and not just generate schema information.
The approach used makes sense, too, as Avro does not have any such metadata at the binary level, so it needs to come from somewhere.

I can probably come around wrt Local date/time types as well, with some more time.

But with that, some follow-up questions:

  1. (open question, thinking out loud) Is there any way to split this into pieces that can be merged first? Like changes to readers that are not specific to the new logical datatypes (but will be needed).
  2. Would it make sense to directly register (de)serializers for types where there is nothing to configure? For example, for BigDecimal, simply have the Module register the serializer and deserializer to use, overriding what jackson-databind would otherwise use? (see the sketch after this list)
    • In fact, even if an annotation is required, the (de)serializer's createContextual() could check that information (with help of the AnnotationIntrospector, perhaps, or directly) to create a differently configured (de)serializer
    • Or is there so much metadata necessary for schema generation that @AvroType is needed?
  3. Do you think any of the binary representations are universal enough that default Jackson deserializers could use them -- as an example, UUIDDeserializer does accept 16-byte binary data as input, and UUIDSerializer writes such content if (and only if) JsonGenerator.canWriteBinaryNatively() returns true -- this is used by the Smile and CBOR backends
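
A sketch of the registration idea in point 2, reusing the placeholder handlers from the earlier sketches (SimpleModule is standard jackson-databind API; the handler classes are illustrative):

import java.time.Instant;
import java.util.Date;

import com.fasterxml.jackson.databind.module.SimpleModule;
import com.fasterxml.jackson.dataformat.avro.AvroMapper;

public class LogicalTypeModuleExample {
    public static void main(String[] args) {
        SimpleModule logicalTypes = new SimpleModule("AvroLogicalTypes");
        // Placeholder handlers from the sketches earlier in this thread:
        logicalTypes.addSerializer(Date.class, new TimestampMicrosSerializer());
        logicalTypes.addDeserializer(Instant.class, new TimestampMillisInstantDeserializer());

        AvroMapper mapper = new AvroMapper();
        mapper.registerModule(logicalTypes);
    }
}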

@cowtowncoder (Member)

Ohhhhh. Looks like the Avro backend is NOT overriding JsonGenerator.canWriteBinaryNatively(). Will file an issue and fix, and see if I can write a test to show that handling works...
although this can have a downside, too, if someone is counting on UUIDs (having to) be Strings.

@jcustenborder (Author)

Thank you for reviewing this! Much appreciated. I'll go back over your comments and get some updates to address your feedback. I really appreciate your help.

@cowtowncoder (Member)

@jcustenborder At this point I think I have a good understanding of what would be needed here, at a high level. But the discussion might be easiest to have over email (and summarize here as needed).
I sent you an email note to see if the address I had still works.

But for others' benefit, here's what I think:

  1. I understand the goals, and I actually am ok with the date/time parts: it is very important to document/describe it right, and if so, I think the use and mappings actually do make sense, looking purely from the Java direction (I won't go deeper into it here but am happy to elaborate if anyone is interested)
  2. There are time constraints in getting 2.10 out by the end of September, which requires a decision soon on what to include
  3. I have some plans to tackle some aspects (but not all) of what is worked on here, in a bit more general way -- some issues are similar with the CBOR and Protobuf format modules, for example
  4. Since the general changes (that I haven't done or fully fleshed out yet) and the specific proposed changes here likely overlap, some coordination would be good so that the short-term specific ones do not block the general changes (or vice versa)

So, with that, I was thinking it would be great to essentially:

  1. Not yet fully integrate this functionality in 2.10 (due to time constraints, and the risks in rushing changes -- this is my fault, mostly)
  2. Provide enough of the base changes to allow an external implementation to be published and used with 2.10, to help guide the 2.11 design
  3. Likely integrate the rest of the functionality (however divided across general functionality / Avro-specific handlers) in 2.11 (and 3.0).

The trick, really, is just to figure out the minimal set of changes needed in the Avro module for 2.10.
I can also share some ideas I have for logical types, but I hope to first agree on the approach here.

Thank you everyone for working on getting this done: it is pretty ambitious but important undertaking, and once working, should open up new possibilities, beyond what I had thought practical.

@cowtowncoder (Member)

@jcustenborder Ok. After one more re-read, things are much clearer. The changes to the existing Avro backend make sense, the only possible caveat being the UUID handlers, but even those would be fine I think (we could improve them later). The changes that add a new module are something that I think should start as a separate module first, and then we could figure out how to package it -- probably adding it as a sub-module here in 2.11 or so.

Would it be possible to split this into two parts? I think that the Avro-package changes are only related to internal API, not externally usable (... or if someone uses it, I think that'd be unsupported), so I would be ok with this going into one of the 2.10.x patches if it does not make it into 2.10.0 (we only have 1 more day until that).
But I'd like to get that part in soon, if possible. And I'll create the 2.11 branch relatively quickly after the 2.10.0 release, and we can iterate over some of the other changes.

@cowtowncoder (Member)

I think this is probably obsoleted by later work, closing.
