Discussion:
[ANN] Lancaster 0.6.0 - Avro Schema Creation / Serialization / Deserialization
Chad Harrington
2018-11-18 02:56:58 UTC
https://github.com/deercreeklabs/lancaster

Lancaster is an Apache Avro <http://avro.apache.org/docs/current/> library
for Clojure and ClojureScript. It aims to be fully compliant with the Avro
Specification <http://avro.apache.org/docs/current/spec.html>. Lancaster is
faster than JSON encoding / decoding and produces much smaller output. It
also supports Avro schema evolution
<http://avro.apache.org/docs/current/spec.html#Schema+Resolution>, allowing
your data formats to change over time without breaking things.

Issues and PRs are welcomed.

Chad Harrington
***@gmail.com
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to ***@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+***@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Alan Thompson
2018-11-18 05:34:49 UTC
Looks nice.
Alan
Matching Socks
2018-11-18 13:13:40 UTC
"Faster than JSON" sounds reasonable -- in the JVM -- inasmuch as Avro uses
a binary data format. But what about in JavaScript? I ask because the
motivation for Transit-JSON was an observation that the JavaScript
interpreters' built-in JSON parsers made binary formats uncompetitive. Has
the playing field been re-leveled in the 4 years or so since Transit was
conceived?
Gerard Klijs
2018-11-19 04:47:11 UTC
The big penalty for the smaller size is that you need the schema in order to deserialize the message. This can make some uses much more complex. For example, when sending multiple types of messages, the client somehow needs to know which schema is used.
When Avro is used with the Confluent Schema Registry, this is solved by having a way to globally register schemas and encoding the global ID in the message.
It looks similar to https://github.com/damballa/abracad, which hasn't been updated in a while, but I use it in a pet project to use Avro from Clojure. I might try replacing it.
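
As a rough illustration of the registry approach mentioned above: Confluent's documented wire format prefixes each Avro payload with a zero "magic" byte and a 4-byte big-endian schema ID. The sketch below (Python, standard library only) frames and unframes a payload that way. The function names and the sample ID 42 are hypothetical, and resolving the ID against a real registry is left out.

```python
import struct

MAGIC_BYTE = 0  # marker byte at the start of a Confluent-framed message


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an already-encoded Avro payload with magic byte + schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes):
    """Split a framed message into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte: %d" % magic)
    return schema_id, message[5:]


# Hypothetical usage: 42 stands in for an ID handed out by a registry.
framed = frame(42, b"\x06foo")
schema_id, payload = unframe(framed)
```

The reader uses the recovered ID to fetch the writer's schema before decoding the payload, which is how a client can handle several message types on one channel.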
Tommi Reiman
2018-11-19 05:37:01 UTC
Looks good. It would be interesting to see the JVM JSON perf also tested with https://github.com/metosin/jsonista.
Chad Harrington
2018-11-19 18:23:32 UTC
Good idea. I will try to add that to the perf results sometime this week.

Chad Harrington
Chad Harrington
2018-11-19 18:33:20 UTC
It's definitely a tradeoff. In my experience, it's well worth it. For
messaging, I use a union of the possible message schemas. I wrote a library
(https://github.com/deercreeklabs/capsule) that makes messaging (events and
RPC) simple and easy. It uses Lancaster / Avro over WebSockets.
Unfortunately, it is not yet documented for public use. I hope to get to
that early in the new year.

For persisting data in storage, I store the fingerprint64 of the schema
along with the data, then have a lookup table of fingerprint64 to schemas.
Again, this works very well for me. Your mileage may vary.
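
A minimal sketch of that storage pattern, in Python with only the standard library. It assumes an 8-byte fingerprint is written in front of each encoded value and that schemas are kept as JSON text. Note that `fingerprint64` here is a stand-in (the first 8 bytes of SHA-256), not Avro's CRC-64-AVRO fingerprint, and `SchemaTable` and its methods are hypothetical names, not part of Lancaster.

```python
import hashlib
import struct


def fingerprint64(schema_json: str) -> int:
    """Stand-in 64-bit fingerprint (NOT Avro's CRC-64-AVRO)."""
    digest = hashlib.sha256(schema_json.encode("utf-8")).digest()
    return struct.unpack(">Q", digest[:8])[0]


class SchemaTable:
    """Lookup table from fingerprint to the schema that wrote the data."""

    def __init__(self):
        self._by_fp = {}

    def register(self, schema_json: str) -> int:
        fp = fingerprint64(schema_json)
        self._by_fp[fp] = schema_json
        return fp

    def wrap(self, schema_json: str, payload: bytes) -> bytes:
        """Return the fingerprint (8 bytes, big-endian) followed by the payload."""
        return struct.pack(">Q", self.register(schema_json)) + payload

    def unwrap(self, stored: bytes):
        """Return (writer_schema_json, payload) for a stored value."""
        (fp,) = struct.unpack(">Q", stored[:8])
        return self._by_fp[fp], stored[8:]


table = SchemaTable()
stored = table.wrap('{"type": "string"}', b"\x06foo")
writer_schema, payload = table.unwrap(stored)
```

On read, the recovered writer schema can be passed to the deserializer together with the reader's current schema, so old records remain readable after the schema evolves.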

Avro's compact size and Lancaster's speed make it very nice for messaging
and data storage. I also value having the data types validated at
serialization time. I love dynamic typing, but believe that types should be
checked at boundaries (messaging, storage, etc.). Lancaster makes it easy
to do that.

Chad Harrington
Post by Gerard Klijs
The big penalty for the smaller size is that you need the schema in order
to deserialize the message. This can make some uses much more complex. For
example, when sending multiple types of messages, the client somehow needs
to know which schema is used.
When Avro is used with the Confluent Schema Registry, this is solved by
having a way to globally register schemas and encoding the global ID in the
message.
It looks similar to https://github.com/damballa/abracad, which hasn't been
updated in a while, but I use it in a pet project to use Avro from Clojure.
I might try replacing it.
Chad Harrington
2018-11-19 18:36:54 UTC
The JavaScript perf seems okay to me:
https://github.com/deercreeklabs/lancaster/blob/master/README.md#performance

I probably need to test Transit alongside the other cljs performance tests.
I will try to do more / better perf testing this week.

Thanks for your interest,

Chad Harrington
Post by Matching Socks
"Faster than JSON" sounds reasonable -- in the JVM -- inasmuch as Avro
uses a binary data format. But what about in JavaScript? I ask because
the motivation for Transit-JSON was an observation that the JavaScript
interpreters' built-in JSON parsers made binary formats uncompetitive. Has
the playing field been re-leveled in the 4 years or so since Transit was
conceived?
Łukasz Korecki
2018-11-19 08:39:26 UTC
This looks great! Thank you for sharing.
Any plans for logical types[1] support? That's one of the biggest things
missing in Abracad, imho.


Łukasz

[1] - https://avro.apache.org/docs/1.8.0/spec.html#Logical+Types
Chad Harrington
2018-11-19 18:21:59 UTC
Hi Łukasz,
Logical types could certainly be added. Are you more interested in
arbitrary logical type support or the specific logical types defined in the
spec (Decimal, Date, Time, Timestamp, Duration)? Understanding your use
case will help with the design.

Thanks,

Chad Harrington
Łukasz Korecki
2018-11-19 18:37:55 UTC
Hi!

We're interested in the Date/Time/Timestamp part of the spec. A bit of
background: our RabbitMQ framework
(https://github.com/nomnom-insights/nomnom.bunnicula) supports pluggable
serialization, and adopting Avro made our lives easier. We're using it in
a couple of RPC-over-HTTP calls, and that also simplified a couple of things.
The downside of the current approach is that we have to encode all timestamps
as longs and hard-code in the application logic which attributes need to be
converted to DateTimes after deserializing (or the other way around).
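
The workaround described above might look roughly like the following Python sketch (standard library only). It assumes timestamps travel as milliseconds since the Unix epoch, which is what Avro's `timestamp-millis` logical type specifies. The field name `created_at` and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical: the application has to know which fields hold timestamps.
TIMESTAMP_FIELDS = {"created_at"}


def to_millis(dt: datetime) -> int:
    """Timezone-aware datetime -> whole milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_millis(ms: int) -> datetime:
    """Milliseconds since the epoch -> UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def before_serialize(record: dict) -> dict:
    """Replace datetimes with longs so the record fits a plain Avro schema."""
    return {k: to_millis(v) if k in TIMESTAMP_FIELDS else v
            for k, v in record.items()}


def after_deserialize(record: dict) -> dict:
    """Turn the listed long fields back into datetimes."""
    return {k: from_millis(v) if k in TIMESTAMP_FIELDS else v
            for k, v in record.items()}


record = {"id": 7,
          "created_at": datetime(2018, 11, 19, 18, 37, 55, tzinfo=timezone.utc)}
wire = before_serialize(record)
restored = after_deserialize(wire)
```

The list of timestamp fields lives outside the schema here, which is exactly the duplication that logical type support would remove.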

Thanks,

Łukasz
Chad Harrington
2018-11-19 18:47:20 UTC
I do the same thing with dates & times (convert them to a more
Avro-friendly form first). I didn't want to couple clj-time and cljs-time
to the Lancaster library, since not everyone uses those. I think the right
design is to allow an arbitrary logical type mechanism with user-specified
serializers / deserializers. Lancaster could provide default
implementations where appropriate, but those could be overridden by the
user. I will think more about this. If you'd like, please open a GitHub
issue about this and we can discuss it further there. I think that is a
better forum for working out the details of the feature.
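
One way to picture that design is a table of converters keyed by logical type name, with defaults that a caller can override. This is a sketch only, in Python rather than Clojure, and not Lancaster's API: `make_codec`, `DEFAULT_CONVERTERS`, and the ISO-string override are all hypothetical. The default shown follows the Avro `date` logical type, which is an int counting days since the Unix epoch.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

# Default converters: logical type name -> (to_avro, from_avro).
DEFAULT_CONVERTERS = {
    "date": (lambda d: (d - EPOCH).days,
             lambda n: EPOCH + timedelta(days=n)),
}


def make_codec(overrides=None):
    """Build encode/decode functions; user overrides win over defaults."""
    converters = {**DEFAULT_CONVERTERS, **(overrides or {})}

    def encode(logical_type, value):
        to_avro, _ = converters[logical_type]
        return to_avro(value)

    def decode(logical_type, raw):
        _, from_avro = converters[logical_type]
        return from_avro(raw)

    return encode, decode


# Defaults: the application works with datetime.date values.
encode, decode = make_codec()

# Override: an application that prefers ISO-8601 strings for dates.
iso_dates = {
    "date": (lambda s: (date.fromisoformat(s) - EPOCH).days,
             lambda n: (EPOCH + timedelta(days=n)).isoformat()),
}
encode_iso, decode_iso = make_codec(iso_dates)
```

Keeping the converters in a plain table means the core library needs no dependency on any particular date/time library; users plug in whichever representation they already use.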

Thanks for your interest,

Chad Harrington
Łukasz Korecki
2018-11-19 18:52:36 UTC
Makes absolute sense 👍 (and I like the approach, it's similar to how
Cheshire handles it)