
Releases: quixio/quix-streams

v3.6.1

17 Jan 14:29
a624bf3

What's Changed

⚠️ Fix a bug where creating a changelog topic also set cleanup.policy to compact on the source topic

Only topics created on the fly and repartition topics were affected; the configuration of existing topics remains intact.

Please check cleanup.policy for the topics used in your applications and adjust it if necessary.
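The audit above can be sketched as a small helper. This is a minimal illustration, not part of quixstreams: it assumes you have already fetched each topic's cleanup.policy (e.g. via your Kafka admin tooling), and the topic names below are hypothetical.

```python
# Hypothetical helper for auditing topic configs fetched via your Kafka admin
# tooling (e.g. confluent-kafka's AdminClient or kafka-configs.sh).

def topics_needing_review(policies: dict[str, str]) -> list[str]:
    """Return non-changelog topics whose cleanup.policy contains 'compact'.

    Changelog topics are compacted by design; any other topic that became
    'compact' after v3.4.0 may have been affected by this bug.
    """
    return sorted(
        topic
        for topic, policy in policies.items()
        if "compact" in policy and "changelog" not in topic
    )

policies = {
    "orders": "compact",                           # suspicious: source topic
    "repartition--orders--agg": "compact,delete",  # suspicious: repartition topic
    "changelog--orders--window": "compact",        # expected: changelogs compact
    "payments": "delete",                          # fine
}
print(topics_needing_review(policies))  # → ['orders', 'repartition--orders--agg']
```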

Introduced in v3.4.0.

Fixed by @quentin-quix in #716

Other changes

  • Influxdb3 Sink: add some functionality and QoL improvements by @tim-quix in #689
  • Bump types-protobuf from 5.28.3.20241030 to 5.29.1.20241207 by @dependabot in #683

Full Changelog: v3.6.0...v3.6.1

v3.6.0

15 Jan 14:40
1c43924

What's Changed

Main Changes

⚠️ Switch to "range" assignor strategy from "cooperative-sticky"

Due to issues discovered with the "cooperative-sticky" assignment strategy, commits made during the rebalancing phase were failing.
To avoid this, we changed the partition assignor to "range", which doesn't have this issue.
Note that the "range" assignor is enforced for consumers used by Application, but it can be overridden for consumers created via the app.get_consumer() API.

How to update:
Since the "cooperative-sticky" and "range" strategies must not be mixed within one consumer group, all consumers in the group must first leave the group and then rejoin it after upgrading the application to Quix Streams v3.6.0.

For more details, see #705 and #712

Other Changes

Docs

Full Changelog: v3.5.0...v3.6.0

v3.5.0

19 Dec 15:17
39ec91b

What's Changed

Features

Fixes

Docs

New Contributors

Full Changelog: v3.4.0...v3.5.0

v3.4.0

04 Dec 15:39
01de03e

What's Changed

Breaking changes💥

Prefix topic names with source__ for auto-generated source topics

By default, each Source provides a default topic by implementing the default_topic() method.
⚠️ Since v3.4.0, the names of default topics are always prefixed with "source__" for better visibility across other topics in the cluster.
This doesn't apply when the topic is passed explicitly via app.dataframe(source, topic) or app.add_source(source, topic).

After upgrading to v3.4.0, existing Sources using default topics will look for the topic under the new name on restart and create it if it doesn't exist.
To keep using the existing topics, pass the pre-configured Topic instance with the existing name and serialization config:

from quixstreams import Application

app = Application(...)
# Configure the topic instance to use it together with the Source
topic = app.topic("<existing topic name>", value_serializer=..., value_deserializer=..., key_serializer=..., key_deserializer=...)
source = SomeSource(...)

# To run Sources together with a StreamingDataFrame:
sdf = app.dataframe(source=source, topic=topic)

# or for running Sources stand-alone:
app.add_source(source=source, topic=topic)

by @daniil-quix in #651 #662

Features 🌱

Improvements 💎

Docs 📄

  • Remove the list of supported connectors from the Connectors docs. by @daniil-quix in #664

Other

Full Changelog: v3.3.0...v3.4.0

v3.3.0

19 Nov 11:22
9494d8d

What's Changed

New Connectors for Google Cloud

In this release, 3 new connectors have been added:

To learn more about them, see the respective docs pages.

Other updates

Full Changelog: v3.2.1...v3.3.0

v3.2.1

08 Nov 15:21
308c197

What's Changed

This is a bugfix release that downgrades confluent-kafka to 2.4.0 because of an authentication issue introduced in 2.6.0.

Full Changelog: v3.2.0...v3.2.1

v3.2.0

07 Nov 12:51
ec72f97

What's Changed

[new] Sliding Windows

Sliding windows are overlapping time-based windows that advance with each incoming message rather than at fixed intervals like hopping windows.
They have a fixed 1 ms resolution, perform better, and are less resource-intensive than hopping windows with a 1 ms step.
Read more in Sliding Windows docs.
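The semantics can be illustrated with plain Python. This is a conceptual sketch only, not the quixstreams API: each incoming message closes a window ending at its own timestamp, rather than at fixed boundaries like hopping windows.

```python
# Conceptual sketch only, not the quixstreams API: a sliding window emits an
# aggregate over the last `duration_ms` for every incoming message.

def sliding_sum(events, duration_ms):
    """For each (timestamp_ms, value), sum values in [ts - duration_ms + 1, ts]."""
    results = []
    for i, (ts, _) in enumerate(events):
        start = ts - duration_ms + 1
        window_sum = sum(v for t, v in events[: i + 1] if start <= t <= ts)
        results.append((start, ts, window_sum))
    return results

events = [(1000, 1), (1500, 2), (2600, 3)]  # (timestamp_ms, value)
print(sliding_sum(events, duration_ms=1000))
# → [(1, 1000, 1), (501, 1500, 3), (1601, 2600, 3)]
```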

PR by @gwaramadze - #515

[new] FileSink and FileSource connectors

FileSink allows writing batches of data to files on disk in JSON and Parquet formats.

FileSource enables processing data streams from JSON or Parquet files.
The resulting messages can be produced in "replay" mode, where the time between producing records matches the original timing as closely as possible.
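Replay pacing can be sketched in a few lines. This is a conceptual illustration, not the actual FileSource implementation: it derives a delay from the gap between consecutive record timestamps.

```python
import time

# Conceptual sketch of "replay" pacing, not the actual FileSource code:
# sleep for the gap between consecutive record timestamps so producing
# roughly matches the original timing.

def replay_delays(timestamps_ms):
    """Delay (in seconds) to wait before producing each record."""
    delays = [0.0]  # produce the first record immediately
    for prev, curr in zip(timestamps_ms, timestamps_ms[1:]):
        delays.append(max(curr - prev, 0) / 1000)
    return delays

timestamps = [1000, 1250, 1250, 2000]  # record timestamps in ms
for delay in replay_delays(timestamps):
    time.sleep(delay)  # then produce the record here
```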

Learn more on the FileSink and FileSource docs pages.

PRs:

[upd] Updated time tracking in windowed aggregations

In previous versions, windowed aggregations tracked time in the streams per topic-partition but expired windows per key.
This behavior was not fully consistent, and it also created problems when processing data from misaligned producers.

For example, IoT and other physical devices may produce data at a certain frequency, which results in misaligned data streams within one topic-partition, so more data is considered "late" and dropped from processing.

To make the processing of such data more complete, Quix Streams now tracks event time for each message key in the windows.
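The difference can be illustrated in plain Python. This is not library code: it shows which records would count as "late" when the high-water timestamp is tracked per partition versus per message key (no grace period is modeled, for simplicity).

```python
# Illustration only (not quixstreams internals): which records count as "late"
# when the watermark is tracked per partition vs. per message key.

def late_records(records, per_key):
    """records: list of (key, timestamp_ms). A record is late if its timestamp
    is below the max timestamp seen so far in its tracking scope."""
    max_seen = {}
    late = []
    for key, ts in records:
        scope = key if per_key else "partition"
        if ts < max_seen.get(scope, float("-inf")):
            late.append((key, ts))
        else:
            # simplification: the watermark only advances on non-late records
            max_seen[scope] = ts
    return late

# Two devices producing at different offsets within one topic-partition:
records = [("a", 100), ("b", 50), ("a", 200), ("b", 60)]
print(late_records(records, per_key=False))  # → [('b', 50), ('b', 60)]
print(late_records(records, per_key=True))   # → []
```

With per-partition tracking, device "b" falls behind device "a" and its records are dropped as late; with per-key tracking, each device only competes with its own history.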

PRs:

[upd] Updated CSVSource

Some breaking changes were made to CSVSource to make it easier to use:

  • It now accepts CSV files in arbitrary formats and produces each row as a message value, making it less opinionated about the data format.
  • It now requires the name to be passed explicitly. Previously, the file name was used as the name of the source.
  • Message keys and timestamps can be extracted from the rows via the key_extractor and timestamp_extractor parameters.
  • The key_serializer and value_serializer parameters were removed.

PR by @daniil-quix in #602
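The two extractor parameters above take callables over a row. Below is a hypothetical sketch of such callables; the column names ("device_id", "ts_ms") are made up for illustration, and the exact expected signatures should be verified against your installed quixstreams version.

```python
# Hypothetical extractors for the CSVSource key_extractor and
# timestamp_extractor parameters; column names are invented for illustration.

def key_extractor(row: dict) -> str:
    # Use one CSV column as the message key
    return row["device_id"]

def timestamp_extractor(row: dict) -> int:
    # CSV values arrive as strings; the message timestamp should be an int (ms)
    return int(row["ts_ms"])

row = {"device_id": "sensor-1", "ts_ms": "1700000000000", "temp": "21.4"}
print(key_extractor(row), timestamp_extractor(row))
# → sensor-1 1700000000000
```

These would then be passed to CSVSource via the key_extractor and timestamp_extractor parameters listed above.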

Bug fixes

Dependencies

  • Update confluent-kafka requirement from <2.5,>=2.2 to >=2.6,<2.7 by @dependabot in #578

Docs

Full Changelog: v3.1.1...v3.2.0

v3.1.1

30 Oct 15:24
fafc2f0

What's Changed

Fixes

  • Fix topics management for apps connecting to the Quix brokers by @tim-quix in #594

Other

Full Changelog: v3.1.0...v3.1.1

v3.1.0

22 Oct 18:07
5aedce7

What's Changed

[NEW] Apache Iceberg sink

A new sink that writes batches of data to an Apache Iceberg table.

It serializes incoming data batches into Parquet format and appends them to the
Iceberg table, updating the table schema as necessary.

Currently, it supports Apache Iceberg hosted in AWS and AWS Glue data catalogs.

To learn more about the Iceberg sink, see the docs.

Added by @tomas-quix in #555

Docs

Dependencies

  • Update pydantic-settings requirement from <2.6,>=2.3 to >=2.3,<2.7 by @dependabot in #583
  • Bump testcontainers from 4.8.1 to 4.8.2 by @dependabot in #579

Misc

New Contributors

Full Changelog: v3.0.0...v3.1.0

v3.0.0

10 Oct 11:01

Quix Streams v3.0.0

Why the "major" version bump (v2.X --> v3.0)?

Quix Streams v3.0 brings branching and multiple topic consumption support, which changed some functionality under the hood. We want users to be mindful when upgrading to v3.0.

❗ Potential breaking change ❗ - Dropping Python v3.8 support:

Python v3.8 reaches End of Life in October 2024, so we are dropping support for it accordingly.

We currently support Python v3.9 through v3.12.

❗ Potential breaking change ❗ - keyword arguments only for Application :

While not really a functional change (and most people do this anyway), v3.0 enforces that all arguments to Application are keyword arguments rather than positional, so be sure to check this during your upgrade!

Previously (v2.X):
app = Application("localhost:9092")

Now (v3.0):
app = Application(broker_address="localhost:9092")

❗ Potential "data-altering" change ❗ - changelog topic name adjustment for "named" windows:

This change is primarily for accommodating windowing with branching.

If you have a windowed operation where the name parameter was provided (ex: sdf.tumbling_window(name=<NAME>)), the changelog topic naming scheme has changed, meaning a new topic will be created and the window will temporarily be inaccurate since it starts from scratch.

It's important to note that this change will not raise an exception, so be aware!

❗ Existing Sources and Sinks have been moved ❗

To accommodate the new Connectors structure, we moved the existing Sinks and Sources to new modules.
To use them, update the import paths:

  • InfluxDB3Sink -> quixstreams.sinks.core.influxdb3.InfluxDB3Sink
  • CSVSink -> quixstreams.sinks.core.csv.CSVSink
  • KafkaReplicatorSource -> quixstreams.sources.core.kafka.KafkaReplicatorSource
  • CSVSource -> quixstreams.sources.core.csv.CSVSource
  • QuixEnvironmentSource -> quixstreams.sources.core.kafka.QuixEnvironmentSource

v3.0 General Backwards compatibility with v2.X

v3.0 should otherwise be fully backwards compatible with any code working with 2.X (assuming no other breaking changes between 2.X versions you upgraded from) and should produce the same results. However, pay close attention to your apps after upgrading, just in case!

To learn more about the specifics of the underlying StreamingDataFrame assignment pattern adjustments, along with some additional supplemental clarifications, check out the new assignment rules docs section, which also highlights the differences between v2.X and v3.0 (in short: always re-assign your SDFs and you'll be good).

❗ Potential Breaking Changes (summarized) ❗

  • Dropping Support for Python v3.8
  • Topic naming change for explicitly named StreamingDataFrame Window operations.
  • Enforcement of keyword argument usage only for Application
  • Removal of deprecated Application.Quix() (can just use Application now)
  • Moved Sinks and Sources

🌱 New Features 🌱

  1. StreamingDataFrame Branching
  2. Consuming multiple topics per Application ("multiple StreamingDataFrames")
  3. Automatic StreamingDataFrame tracking (no arguments needed for Application.run())

1. StreamingDataFrame (SDF) Branching

SDFs now support "branching" (or forking) into multiple independent operations, with no limit on the number of branches.

Previously (v2.X), only linear operations were possible:

sdf
└── apply()
    └── apply()
        └── apply()
            └── apply()

But now (v3.0), things like this are possible:

sdf
└── apply()
    └── apply()
        ├── apply()
        │   └── apply()
        └── filter()  - (does following operations only to this filtered subset)
            ├── apply()
            ├── apply()
            └── apply()

Or, as an (unrelated) simple pseudo-code snippet:

sdf_0 = app.dataframe().apply(func_a)
sdf_0 = sdf_0.apply(func_b)  # sdf_0 -> sdf_0: NOT a (new) branch
sdf_1 = sdf_0.apply(func_c)  # sdf_0 -> sdf_1: generates new branch off sdf_0
sdf_2 = sdf_0.apply(func_d)  # sdf_0 -> sdf_2: generates new branch off sdf_0

app.run()

What Branches enable:

  • Handle Multiple data formats/transformations in one Application
  • Conditional operations
    • ex: producing to different topics based on different criteria
  • Consolidating Applications that previously had to be split across multiple instances due to these limitations

Limitations of Branching:

  • Cannot filter or assign columns using two different branches at once (see docs for more info)
  • Data is copied for each branch, which can have performance implications (though this may still be cheaper than running another Application).

To learn more, check out the in-depth branching docs.

2. Multiple Topic Consumption (multiple StreamingDataFrames)

Applications now support consuming multiple topics by initializing multiple StreamingDataFrames (SDFs) with one Application:

from quixstreams import Application

app = Application(broker_address="localhost:9092")
input_topic_a = app.topic("input_a")
input_topic_b = app.topic("input_b")
output_topic = app.topic("output")

sdf_a = app.dataframe(input_topic_a)
sdf_a = sdf_a.apply(func_x).to_topic(output_topic)

sdf_b = app.dataframe(input_topic_b)
sdf_b.update(func_y).to_topic(output_topic)

app.run()

Each SDF can then perform any operations you could normally do, including branching (but each SDF should be treated as if the others do not exist).

Also, note they run concurrently (one consumer subscribed to multiple topics), NOT in parallel.

3. Automatic StreamingDataFrame tracking

As a result of branching and multiple SDFs, it was necessary to automate the tracking of SDFs, so you no longer need to pass an SDF to Application.run():

Previously (v2.X):

app = Application("localhost:9092")
sdf = app.dataframe(topic)
app.run(sdf)

Now (v3.0):

app = Application(broker_address="localhost:9092")
sdf = app.dataframe(topic)
app.run()

💎 Enhancements 💎

  • Extensive Documentation improvements and additions

🦠 Bugfixes 🦠

  • Fixed an issue with the handling of Quix Cloud topics where the topic was created with the workspace ID appended twice.
  • Overlapping window names now properly print a message explaining how to resolve the conflict.

Full Changelog: v2.11.1...v3.0.0