v3.0.0

@tim-quix tim-quix released this 10 Oct 11:01
· 116 commits to main since this release

Quix Streams v3.0.0

Why the "major" version bump (v2.X --> v3.0)?

Quix Streams v3.0 brings branching and multiple topic consumption support, which changed some functionality under the hood. We want users to be mindful when upgrading to v3.0.

❗ Potential breaking change ❗ - Dropping Python v3.8 support:

Python v3.8 reaches End of Life in October 2024, so we are dropping support for it accordingly.

We currently support Python v3.9 through v3.12.
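If you want to guard against an unsupported interpreter before upgrading, a check along these lines may help. It is a minimal sketch: the version bounds come from the range stated above, while the helper name `is_supported_python` is purely illustrative and not part of Quix Streams.

```python
import sys

# Range taken from the release notes: Python v3.9 through v3.12.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair falls inside the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if not is_supported_python():
    # Illustrative warning only; adapt to your own deployment checks.
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is outside the supported range")
```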

❗ Potential breaking change ❗ - keyword arguments only for Application :

While not really a functional change (and most people are doing this anyway), v3.0 requires all arguments to Application to be passed as keyword arguments rather than positionally, so be sure to check this during your upgrade!

Previously (v2.X):
app = Application("localhost:9092")

Now (v3.0):
app = Application(broker_address="localhost:9092")
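To show what the enforcement looks like from the caller's side, here is a small sketch using a hypothetical stand-in class. `KeywordOnlyApp` is not the real `Application`; it only demonstrates one common way Python enforces keyword-only arguments (a bare `*` in the signature), and the actual implementation may differ.

```python
class KeywordOnlyApp:
    """Hypothetical stand-in, NOT the real quixstreams.Application."""

    def __init__(self, *, broker_address=None):
        # The bare `*` makes every parameter after it keyword-only.
        self.broker_address = broker_address


def try_positional():
    """Return the error message raised by a v2.X-style positional call."""
    try:
        KeywordOnlyApp("localhost:9092")
    except TypeError as exc:
        return str(exc)
    return None


# The keyword form works; the positional form raises TypeError.
app = KeywordOnlyApp(broker_address="localhost:9092")
print(try_positional())
```

A positional call fails loudly at construction time, so any call sites you missed should surface as soon as the app starts.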

❗ Potential "data-altering" change ❗ - changelog topic name adjustment for "named" windows:

This change is primarily for accommodating windowing with branching.

If you have a windowed operation where the name parameter was provided (ex: sdf.tumbling_window(name=<NAME>)), the changelog topic naming scheme has changed. A new changelog topic will be created, and the window will temporarily be inaccurate since its state starts from scratch.

It's important to note that this change will not cause an exception to be raised, so be aware!!

❗ Existing Sources and Sinks have been moved ❗

To accommodate the new structure in Connectors, we moved the existing Sinks and Sources to new modules.
To use them, you need to update the import paths:

  • InfluxDB3Sink -> quixstreams.sinks.core.influxdb3.InfluxDB3Sink
  • CSVSink -> quixstreams.sinks.core.csv.CSVSink
  • KafkaReplicatorSource -> quixstreams.sources.core.kafka.KafkaReplicatorSource
  • CSVSource -> quixstreams.sources.core.csv.CSVSource
  • QuixEnvironmentSource -> quixstreams.sources.core.kafka.QuixEnvironmentSource
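As a quick reference while updating imports, the mapping above can be written out as a small lookup. The module paths are copied from the list; the `NEW_MODULES` dictionary and the `new_import_line` helper are illustrative names, not part of the library.

```python
# New module for each moved class, copied from the list above.
NEW_MODULES = {
    "InfluxDB3Sink": "quixstreams.sinks.core.influxdb3",
    "CSVSink": "quixstreams.sinks.core.csv",
    "KafkaReplicatorSource": "quixstreams.sources.core.kafka",
    "CSVSource": "quixstreams.sources.core.csv",
    "QuixEnvironmentSource": "quixstreams.sources.core.kafka",
}


def new_import_line(class_name):
    """Build the v3.0 import statement for a moved Sink or Source."""
    return f"from {NEW_MODULES[class_name]} import {class_name}"


for name in NEW_MODULES:
    print(new_import_line(name))
```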

v3.0 General Backwards compatibility with v2.X

v3.0 should otherwise be fully backwards compatible with any code working with 2.X (assuming no other breaking changes between 2.X versions you upgraded from) and should produce the same results. However, pay close attention to your apps after upgrading, just in case!

To learn more about the underlying StreamingDataFrame assignment pattern adjustments, along with some supplemental clarifications, check out the new assignment rules docs section, which also highlights the differences between v2.X and v3.0 (in short: always re-assign your SDFs and you'll be good).

❗ Potential Breaking Changes (summarized) ❗

  • Dropping Support for Python v3.8
  • Topic naming change for explicitly named StreamingDataFrame Window operations.
  • Enforcement of keyword argument usage only for Application
  • Removal of deprecated Application.Quix() (can just use Application now)
  • Moved Sinks and Sources

🌱 New Features 🌱

  1. StreamingDataFrame Branching
  2. Consuming multiple topics per Application ("multiple StreamingDataFrames")
  3. Automatic StreamingDataFrame tracking (no arguments needed for Application.run())

1. StreamingDataFrame (SDF) Branching

An SDF can now be "branched" (or forked) into multiple independent chains of operations, with no limit on the number of branches.

Previously (v2.X), only linear operations were possible:

sdf
└── apply()
    └── apply()
        └── apply()
            └── apply()

But now (v3.0), things like this are possible:

sdf
└── apply()
    └── apply()
        ├── apply()
        │   └── apply()
        └── filter()  - (does following operations only to this filtered subset)
            ├── apply()
            ├── apply()
            └── apply()

Or, as a simple (unrelated) pseudocode snippet:

sdf_0 = app.dataframe().apply(func_a)
sdf_0 = sdf_0.apply(func_b)  # sdf_0 -> sdf_0: NOT a (new) branch
sdf_1 = sdf_0.apply(func_c)  # sdf_0 -> sdf_1: generates new branch off sdf_0
sdf_2 = sdf_0.apply(func_d)  # sdf_0 -> sdf_2: generates new branch off sdf_0

app.run()

What Branches enable:

  • Handle Multiple data formats/transformations in one Application
  • Conditional operations
    • ex: producing to different topics based on different criteria
  • Consolidating Applications that originally spanned multiple due to previous limitations

Limitations of Branching:

  • Cannot filter or assign columns using two different branches at once (see docs for more info)
  • Copies data for each branch, which can have performance implications (but may be better compared to running another Application).

To learn more, check out the in-depth branching docs.
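To make the tree shape and the copy-per-branch limitation more concrete, here is a plain-Python toy model. It is a hypothetical sketch, not how Quix Streams implements branching: the `Node` class and its methods are invented for illustration, and unlike real SDFs every `apply()` or `filter()` call here adds a child node.

```python
import copy


class Node:
    """Toy pipeline node; a hypothetical model, not the Quix Streams implementation."""

    def __init__(self, func=None, keep=None):
        self.func = func      # transformation applied to the value (like apply)
        self.keep = keep      # predicate deciding whether to continue (like filter)
        self.children = []
        self.results = []     # values that reached the end of this branch

    def _add(self, node):
        self.children.append(node)
        return node

    def apply(self, func):
        return self._add(Node(func=func))

    def filter(self, keep):
        return self._add(Node(keep=keep))

    def process(self, value):
        if self.keep is not None and not self.keep(value):
            return  # filtered out: nothing downstream of this node runs
        if self.func is not None:
            value = self.func(value)
        if not self.children:
            self.results.append(value)
            return
        for child in self.children:
            # Each branch receives its own copy, so branches cannot affect each other.
            child.process(copy.deepcopy(value))


root = Node()
base = root.apply(lambda v: {**v, "seen": True})
doubled = base.apply(lambda v: {**v, "x": v["x"] * 2})   # first branch off base
large = base.filter(lambda v: v["x"] > 10)               # second branch off base
tagged = large.apply(lambda v: {**v, "tag": "large"})    # runs only on the filtered subset

for x in (5, 20):
    root.process({"x": x})
```

The `deepcopy` per child is where the performance cost in this toy comes from: the more branches, the more copies of each message.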

2. Multiple Topic Consumption (multiple StreamingDataFrame).

Applications now support consuming multiple topics by initializing multiple StreamingDataFrames (SDFs) with a single Application:

from quixstreams import Application

app = Application(broker_address="localhost:9092")
input_topic_a = app.topic("input_a")
input_topic_b = app.topic("input_b")
output_topic = app.topic("output")

sdf_a = app.dataframe(input_topic_a)
sdf_a = sdf_a.apply(func_x).to_topic(output_topic)

sdf_b = app.dataframe(input_topic_b)
sdf_b = sdf_b.update(func_y).to_topic(output_topic)

app.run()

Each SDF can then perform any operation you could normally use, including branching, but treat each SDF as independent: it should not rely on or interact with the others.

Also, note they run concurrently (1 consumer that's subscribed to multiple topics), NOT in parallel.
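The difference between concurrent and parallel here can be pictured with a plain-Python toy: a single loop (standing in for the one consumer) receives interleaved messages from several topics and hands each to that topic's pipeline, one at a time. This is only an illustration under that assumption; `run_single_consumer` and the sample data are hypothetical, not Quix Streams internals.

```python
from collections import defaultdict


def run_single_consumer(messages, pipelines):
    """Process (topic, value) pairs one at a time, as a single consumer would.

    `pipelines` maps a topic name to the function handling that topic.
    """
    outputs = defaultdict(list)
    for topic, value in messages:
        # One message is handled at a time: pipelines interleave, they never overlap.
        outputs[topic].append(pipelines[topic](value))
    return dict(outputs)


# Hypothetical interleaved stream from two topics.
messages = [("input_a", 1), ("input_b", 10), ("input_a", 2), ("input_b", 20)]
pipelines = {"input_a": lambda v: v + 1, "input_b": lambda v: v * 2}
result = run_single_consumer(messages, pipelines)
print(result)
```

In this model a slow pipeline on one topic delays messages on every other topic, which is worth keeping in mind when deciding whether to consolidate applications.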

3. Automatic StreamingDataFrame tracking

As a result of branching and multiple SDFs, it was necessary to automate the tracking of SDFs, so you no longer need to pass any SDF when calling Application.run():

Previously (v2.X):

app = Application("localhost:9092")
sdf = app.dataframe(topic)
app.run(sdf)

Now (v3.0):

app = Application(broker_address="localhost:9092")
sdf = app.dataframe(topic)
app.run()

💎 Enhancements 💎

  • Extensive Documentation improvements and additions

🦠 Bugfixes 🦠

  • Fixed an issue with handling of Quix Cloud topics where a topic was being created with the workspace ID appended twice.
  • Overlapping window names now properly print a message explaining how to resolve the conflict.

Full Changelog: v2.11.1...v3.0.0