Release v3.1.0 · databricks/spark-avro

⚠️ Important: If you are using Spark 1.x, then use v2.0.1 instead. 3.x releases of this library are only compatible with Spark 2.x.

The 3.1.0 release (which supports Spark 2.x) contains the following changes:

New Features:

Custom schema support:
- Support for user-defined Avro schemas when reading: using the avroSchema option, users can specify a custom Avro schema (as a JSON string) to use when reading Avro files (#160, #161). Default values specified in the Avro schema will be respected (#176, #195).
- Improved handling of custom schemas specified via .schema(): the behavior for additional or missing fields is now clearly defined, which supports more kinds of schema evolution (#155, #96).
- Together, these changes support several schema-evolution use cases: #31, #165, #49.
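A minimal sketch of passing a reader schema through the avroSchema option from PySpark. It assumes an existing SparkSession with spark-avro 3.1.0 on the classpath and uses the package's data source name com.databricks.spark.avro. The record name User, its fields, and the helpers build_reader_schema and read_with_avro_schema are hypothetical. The email field carries a default (respected per #176, #195), with the intent that records from older files written without that field pick up the default.

```python
import json


def build_reader_schema() -> str:
    """Return a hypothetical Avro reader schema as a JSON string."""
    schema = {
        "type": "record",
        "name": "User",  # hypothetical record name
        "fields": [
            {"name": "name", "type": "string"},
            # Hypothetical field added after older files were written.
            # For a union, the default must match the first branch ("null").
            {"name": "email", "type": ["null", "string"], "default": None},
        ],
    }
    # json.dumps turns the Python None default into JSON null.
    return json.dumps(schema)


def read_with_avro_schema(spark, path: str, schema_json: str):
    """Hypothetical helper: read Avro files at `path` using a custom schema.

    `spark` is assumed to be an active SparkSession with spark-avro loaded.
    """
    return (
        spark.read.format("com.databricks.spark.avro")
        .option("avroSchema", schema_json)  # custom Avro schema as JSON
        .load(path)
    )
```

Keeping the schema in a plain dict and serializing it with json.dumps avoids hand-escaping a JSON string literal.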
Improved UNION support: previously, this library supported only union(int, long), union(float, double), and union(*, null). As of this release, it also supports all other union types ("complex unions") by converting them into structs that contain at most one non-null field (#108, #117).
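The complex-union mapping can be modeled in plain Python as a sketch. This is an illustration of the idea, not the library's code. Assumptions: null branches are dropped before the remaining branches are numbered; the struct fields are named member0, member1, and so on; and a null value maps to a null struct. The function name union_to_struct is hypothetical. The special-cased unions listed above (int/long, float/double) are not modeled.

```python
from typing import Any, Dict, List, Optional


def union_to_struct(branch_types: List[str], branch_index: int,
                    value: Any) -> Optional[Dict[str, Any]]:
    """Model one value of a complex Avro union as a struct (dict).

    branch_types: the union's branch type names, in schema order.
    branch_index: which branch the value was written with.
    """
    # Assumption: null branches are removed before numbering fields.
    non_null = [i for i, t in enumerate(branch_types) if t != "null"]
    if branch_types[branch_index] == "null":
        # Assumption: a null value becomes a null struct.
        return None
    # One nullable field per non-null branch (memberN naming is assumed).
    struct: Dict[str, Any] = {f"member{pos}": None for pos in range(len(non_null))}
    # Only the field for the branch actually written is set.
    struct[f"member{non_null.index(branch_index)}"] = value
    return struct
```

For example, for a union of int, string, and null, a string value "abc" maps to {"member0": None, "member1": "abc"}.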

Bug fixes:

Avro files are now splittable during reads: version 3.0 of this library broke the ability to read a large Avro file as multiple splits / partitions. As of 3.1.0, such splitting / partitioning is supported again (#179, #182).
