This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

Avro data from DStream #201

Open
lemanuel opened this issue Dec 12, 2016 · 3 comments

lemanuel commented Dec 12, 2016

Is there a way to create a DataFrame from generic Avro records that arrive in a DStream?
In the tests I have seen each RDD written to a temp file and then read back with spark-avro, but I do not want to add another step to the process.

cbyn commented Feb 9, 2017

You can do this with DStream.foreachRDD { rdd => df = rdd.toDF ... } using the code in #216.

@ananth3010

Could you please confirm whether the approach below is right? I am not able to create a DataFrame after pulling your code.

DStream.foreachRDD { rdd => df = rddToDataFrame(rdd) }

cbyn commented Jun 8, 2017

I implemented it as an implicit on RDD[GenericRecord]. If you import RddUtils.RddToDataFrame then you can call toDF on the RDD as I posted above.
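
For readers without the code from #216 at hand, the pattern described above (an implicit on RDD[GenericRecord] that adds toDF, called inside foreachRDD) might look roughly like the sketch below. This is not the implementation from #216: the names AvroRddSyntax, GenericRecordRddOps and StreamExample are hypothetical, and the sketch assumes a flat record with only a long field "id" and a string field "name" instead of deriving the schema from Avro.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper for illustration only; NOT the RddUtils code from #216.
object AvroRddSyntax {
  // Assumption: every record has a non-null long "id" and a nullable string "name".
  val schema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("name", StringType, nullable = true)
  ))

  // Enriches RDD[GenericRecord] with a toDF method via an implicit class.
  implicit class GenericRecordRddOps(val rdd: RDD[GenericRecord]) extends AnyVal {
    def toDF(spark: SparkSession): DataFrame = {
      val rows = rdd.map { rec =>
        val name = rec.get("name")
        // Avro strings are usually Utf8 instances, so convert them explicitly.
        Row(rec.get("id").asInstanceOf[Long], if (name == null) null else name.toString)
      }
      spark.createDataFrame(rows, schema)
    }
  }
}

object StreamExample {
  import AvroRddSyntax._ // brings the implicit toDF into scope

  def process(stream: DStream[GenericRecord]): Unit =
    stream.foreachRDD { rdd =>
      // Reuse (or lazily create) the session on the driver for each batch.
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      val df = rdd.toDF(spark)
      df.show()
    }
}
```

Without the import of the implicit class, rdd.toDF does not resolve for RDD[GenericRecord], which is one likely reason a plain rddToDataFrame(rdd) call fails to compile. Note also that GenericRecord is not java.io.Serializable, so a job that shuffles or caches these RDDs typically needs Kryo serialization configured.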
