Placeholder for Maven repo inside Iglu Server #88

Open
alexanderdean opened this issue Nov 16, 2015 · 12 comments
@alexanderdean
Member

alexanderdean commented Nov 16, 2015

To host POJOs, Scala case classes, Clojure Schemas auto-generated from JSONs, Thrifts, Avros etc

@chuwy
Contributor

chuwy commented Jul 1, 2016

Initial draft

This is not exactly about a Maven repo, but rather about "Creating Scala classes from JSON Schemas"; it is posted here because this issue was referenced from the internal issue tracker.

Few projects to explore:

To me, SBT Datatype looks the most promising (especially taking into account its origin). I think I will explore what we can do with it. The other libs are listed here just for some technical details.

Definition generation

SBT Datatype uses a format other than JSON Schema, but a fairly straightforward one. We can generate it the same way as we do with Redshift DDL. So, for example, given the following JSON Schema (for iglu:com.acme/example/jsonschema/1-0-0):

{
  "type": "object",
  "properties": {
    "firstLevel": {
      "type": "integer"
    },
    "nested": {
      "type": "object",
      "properties": {
        "nestedInt": {
          "type": "integer"
        }
      },
      "additionalProperties": false
    }
  },
  "additionalProperties": false
}

We can generate the following SBT Datatype definition:

{
  "types": [
    {
      "name": "Example$Nested",
      "type": "record",
      "target": "Scala",
      "fields": [
        {
          "name": "nestedInt",
          "type": "Int"
        }
      ]
    },
    {
      "name": "Example",
      "type": "record",
      "target": "Scala",
      "fields": [
        {
          "name": "firstLevel",
          "type": "Int"
        },
        {
          "name": "nested",
          "type": "Example$Nested"
        }
      ]
    }
  ]
}

Adding $ to the class name should help us avoid namespace collisions (nested can be defined on many objects). The definition above should in the end generate case class-like classes (with toString, hashCode, companion object etc., but without unapply):

final class Example$Nested(val nestedInt: Int) extends Serializable
final class Example(val firstLevel: Int, val nested: Example$Nested) extends Serializable

So we could write the following in a type-safe way:

example.nested.nestedInt 

We still need to make many decisions about the dynamic-JSON-to-static-Scala correspondence, but some simple cases should work.
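To make the "simple cases" concrete, here is a minimal sketch of how the JSON Schema for iglu:com.acme/example/jsonschema/1-0-0 could be walked to produce records, with nested objects named Parent$Child. Everything here is an assumption for illustration: `Schema`, `Field`, `Record` and `DatatypeGenerator` are hypothetical names, only integer and object properties are handled, and rendering the records to SBT Datatype's actual JSON format (with "type": "record" and "target": "Scala") is left out.

```scala
// Hypothetical, minimal JSON Schema model: only the two cases the example needs.
sealed trait Schema
case object IntegerSchema extends Schema
final case class ObjectSchema(properties: List[(String, Schema)]) extends Schema

// Hypothetical stand-ins for an SBT Datatype record and its fields.
final case class Field(name: String, tpe: String)
final case class Record(name: String, fields: List[Field])

object DatatypeGenerator {
  // Returns the record for `schema`, preceded by the records of all nested
  // objects, so every referenced type is listed before the record using it.
  def records(name: String, schema: ObjectSchema): List[Record] = {
    val (nested, fields) = schema.properties.map {
      case (key, IntegerSchema) =>
        (List.empty[Record], Field(key, "Int"))
      case (key, obj: ObjectSchema) =>
        // Prefix with the parent name so same-named properties on
        // different objects do not collide.
        val childName = name + "$" + key.capitalize
        (records(childName, obj), Field(key, childName))
    }.unzip
    nested.flatten :+ Record(name, fields)
  }
}
```

Called as DatatypeGenerator.records("Example", schema) on a model of the schema above, this yields an Example$Nested record followed by an Example record whose nested field points at it.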

Iglu integration

So, assuming the above works, we need to:

  1. Include the registry in the sbt project
  2. Reference the generated classes in code in some Iglu-compatible way (optional)
  3. Parse plain JSON into the generated classes

I'm trying to design it relying on as few non-existent SBT Datatype features as I can. So I'm going to mark everything we cannot do with it (we can fork or open a PR of course, but I'm not sure they're going to include anything Iglu-specific).

Including into project

Let's assume we want to create a SqlQueryEnrichmentConfig class from Iglu Central in Scala Common Enrich (SCE).

a. Enable the SBT Datatype plugin in SCE's plugins.sbt
b. Run igluctl against the Iglu Central JSON Schemas to generate SBT definitions in an sbt-datatype directory (alongside schemas, ddl etc.) in the Iglu registry
c. Release the Iglu registry to a Maven repository
d. Include it as a dependency: "com.snowplow" %% "iglu-central" % "58", so it can be an embedded registry (assuming it is OK to publish projects containing only resources)
e. Set datatypeSource in generateDatatypes := file("resources/sbt-datatype"). Not sure if SBT Datatype can do this for third-party projects.

For now I'm really unsure only about the last one; everything else in this step should work.

Reference and parse JSONs in Iglu-compatible way

This is a bit trickier. It's definitely possible to create some macro flavor to access a class using its Iglu URI as a string (though it looks overcomplicated):

import com.snowplowanalytics.iglu.registry // macro for access to class by string and containing serializers for parse

val json: JValue = ???

// otherwise it can be shapeless-like problem, when type name twice as long as its value
type SqlQueryEnrichmentConfig = 
  registry
  .schema("iglu:com.snowplowanalytics.snowplow.enrichments/sql_query_enrichment_config/jsonschema/1-0-0")
  .OutType

// it won't compile if corresponding URI hasn't been found
val enrichment: Either[String, SqlQueryEnrichmentConfig] = 
  registry
  .schema("iglu:com.snowplowanalytics.snowplow.enrichments/sql_query_enrichment_config/jsonschema/1-0-0") // jsonschema? Or sbt-datatype?
  .parse(json)
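
For comparison, a macro-free variant could tie each generated class to its Iglu URI through a plain type class. This is not the macro approach above, just a simpler technique swapped in to show the shape of the API. All names here are hypothetical (`IgluSchema`, `Nested`), the Example class from the earlier schema is used instead of the enrichment config, `Map[String, Any]` stands in for JValue to keep the sketch self-contained, and Scala 2.12+ is assumed for right-biased Either.

```scala
// Hand-written stand-ins for the generated classes of
// iglu:com.acme/example/jsonschema/1-0-0.
final class Nested(val nestedInt: Int)
final class Example(val firstLevel: Int, val nested: Nested)

// Hypothetical type class: links a class to its Iglu URI and a parser.
trait IgluSchema[A] {
  def uri: String
  def parse(json: Map[String, Any]): Either[String, A]
}

object IgluSchema {
  // Resolved at compile time: fails to compile if no instance exists for A.
  def parse[A](json: Map[String, Any])(implicit schema: IgluSchema[A]): Either[String, A] =
    schema.parse(json)

  private def int(json: Map[String, Any], key: String): Either[String, Int] =
    json.get(key) match {
      case Some(i: Int) => Right(i)
      case other        => Left(s"Expected integer at '$key', got $other")
    }

  private def obj(json: Map[String, Any], key: String): Either[String, Map[String, Any]] =
    json.get(key) match {
      case Some(m: Map[_, _]) => Right(m.asInstanceOf[Map[String, Any]])
      case other              => Left(s"Expected object at '$key', got $other")
    }

  implicit val exampleSchema: IgluSchema[Example] = new IgluSchema[Example] {
    val uri = "iglu:com.acme/example/jsonschema/1-0-0"
    def parse(json: Map[String, Any]): Either[String, Example] =
      for {
        first     <- int(json, "firstLevel")
        nested    <- obj(json, "nested")
        nestedInt <- int(nested, "nestedInt")
      } yield new Example(first, new Nested(nestedInt))
  }
}
```

With this, IgluSchema.parse[Example](json) returns an Either[String, Example], and the type name stays short at the cost of losing the lookup by URI string.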

/cc @alexanderdean

@chuwy
Contributor

chuwy commented Jul 1, 2016

But actually, the original idea, the other way round (Maven inside Iglu, not Iglu on Maven), has its own clear benefits.

@alexanderdean
Member Author

Very interesting approach @chuwy ! Looking forward to mulling it some more...

@alexanderdean
Member Author

alexanderdean commented Jul 3, 2016

Having thought about it some more: while the idea of making e.g. Iglu Central embeddable inside an app is interesting, one of the flaws is that it depends on a versioning scheme for a registry which doesn't really exist: "com.snowplow" %% "iglu-central" % "58". The R58 there has no semantic meaning - it's just an artifact of the fact that we are using git with formal "releases" to back Iglu Central. The versioning inside an Iglu registry is all at the schema level, so really a developer would want to pull in a dependency like:

"com.mandrill" %% "message_opened_1" % "0.0"

where this corresponds to com.mandrill/message_opened/jsonschema/1-0-0.

@chuwy
Contributor

chuwy commented Jul 3, 2016

A releasable registry with explicitly defined milestones is more or less protected against the current patching approach.

We can use it without milestones only if we're going to abandon patching once Open Versioning is embraced. I can't see whether open versioning can really help us with patching.

@chuwy
Contributor

chuwy commented Jul 3, 2016

To elaborate:

case class Example(foo: Integer)

After patching, it can easily become:

case class Example(foo: Option[Integer])

This is a huge problem for both releasable and unreleasable registries (as it is binary and source incompatible), but with an explicit milestone we can at least see exactly what the schema looked like at a given milestone.
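A small sketch of why this breaks callers: the two versions are given distinct names here (`ExampleV1`, `ExampleV2`, both hypothetical) only so they can sit side by side. In practice the class keeps its name, and code compiled against the first shape stops compiling or linking against the second.

```scala
object PatchIncompatibility {
  // Class generated from the schema as first published: `foo` is required.
  final case class ExampleV1(foo: Integer)

  // Class generated after a patch made `foo` optional.
  final case class ExampleV2(foo: Option[Integer])

  // Client code written against the original class.
  def incrementV1(e: ExampleV1): Int = e.foo.intValue + 1

  // The same logic has to be rewritten to handle absence; the original
  // body would not type-check against Option[Integer].
  def incrementV2(e: ExampleV2): Option[Int] = e.foo.map(_.intValue + 1)
}
```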

@alexanderdean
Member Author

The problem is that releasable registries are just a convention, not an intrinsic part of Iglu - a GitHub tag is not first class in any way in an Iglu registry. Even if it were, it's a very clunky level of indirection - "I want to reference Mandrill schema blah in my app, which GitHub tag do I need to cite to get that?"

Schema patching is easily handled like this:

"com.mandrill" %% "message_opened_1" % "0.0.4"

where this corresponds to com.mandrill/message_opened/jsonschema/1-0-0, 4th patch of the schema.
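
The mapping from a schema key to such coordinates could be sketched as follows. This is an illustration of the convention described in this thread, not an existing Iglu API: `IgluCoordinates`, `ModuleId` and `toModuleId` are hypothetical names, and the Scala binary version suffix that %% would append to the artifact name is not modelled.

```scala
object IgluCoordinates {
  // vendor/name/format/MODEL-REVISION-ADDITION, with an optional "iglu:" prefix.
  private val SchemaKey =
    """(?:iglu:)?([^/]+)/([^/]+)/([^/]+)/(\d+)-(\d+)-(\d+)""".r

  // Hypothetical holder for organization, artifact name and version.
  final case class ModuleId(organization: String, name: String, version: String)

  // MODEL goes into the artifact name; REVISION.ADDITION (plus an optional
  // patch number) becomes the artifact version.
  def toModuleId(schemaKey: String, patch: Option[Int] = None): Either[String, ModuleId] =
    schemaKey match {
      case SchemaKey(vendor, name, _, model, revision, addition) =>
        val base    = s"$revision.$addition"
        val version = patch.fold(base)(p => s"$base.$p")
        Right(ModuleId(vendor, s"${name}_$model", version))
      case _ =>
        Left(s"Not a valid Iglu schema key: $schemaKey")
    }
}
```

For com.mandrill/message_opened/jsonschema/1-0-0 this gives organization com.mandrill, name message_opened_1 and version 0.0, or 0.0.4 when called with patch = Some(4).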

@chuwy
Contributor

chuwy commented Jul 3, 2016

The patch approach looks good to me. Not that I really like the idea that a minor version can introduce source/binary incompatibilities, but for now it is probably the best we have.

@chuwy
Contributor

chuwy commented Jul 3, 2016

And where is the patch defined? If we're going to do it manually - we'll need some sort of release as well?

@alexanderdean
Member Author

It feels like we are going to have to make patches first class inside an Iglu registry - i.e. for a given schema you can see which patch release it is currently...

@chuwy
Contributor

chuwy commented Jul 3, 2016

Yep, feels like that was going to happen anyway. These patches can be too important sometimes to just drop this information.
