Skip to content

Commit

Permalink
Merge remote-tracking branch 'origin/topic/awelzel/cluster-backends'
Browse files Browse the repository at this point in the history
* origin/topic/awelzel/cluster-backends:
  devel: Add notes about the ZeroMQ cluster backend implementation
  cluster: Add Cluster::publish()
  • Loading branch information
ckreibich committed Jan 6, 2025
2 parents 423885f + b4b6b99 commit 1e58775
Show file tree
Hide file tree
Showing 11 changed files with 388 additions and 2 deletions.
120 changes: 120 additions & 0 deletions devel/cluster-backend-zeromq.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
.. _cluster_backend_zeromq:

======================
ZeroMQ Cluster Backend
======================

.. versionadded:: 7.1

*Experimental*

Quickstart
==========

To switch a Zeek cluster with a static cluster layout over to use ZeroMQ
as cluster backend, add the following snippet to ``local.zeek``:

.. code-block:: zeek
@load frameworks/cluster/backend/zeromq/connect
Note that the function :zeek:see:`Broker::publish` will be non-functional
and a warning emitted when used - use :zeek:see:`Cluster::publish` instead.

By default, a configuration based on hard-coded endpoints and cluster layout
information is created. For more customization, refer to the module documentation
at :doc:`cluster/backend/zeromq/main.zeek </scripts/policy/frameworks/cluster/backend/zeromq/main.zeek>`.


Architecture
============

Publish-Subscribe of Zeek Events
--------------------------------

The `ZeroMQ <https://zeromq.org/>`_ based cluster backend uses a central
XPUB/XSUB broker for publish-subscribe functionality. Zeek events published
via :zeek:see:`Cluster::publish` are distributed by this central broker to
interested nodes.

.. figure:: /images/cluster/zeromq-pubsub.png


As depicted in the figure above, each cluster node connects to the central
broker twice, once via its XPUB socket and once via its XSUB socket. This
results in two TCP connections from every cluster node to the central broker.
This setup allows every node in the cluster to see messages from all other
nodes, avoiding the need for cluster topology awareness.

.. note::

Scalability of the central broker in production setups, but for small
clusters on a single node, may be fast enough.

On a cluster node, the XPUB socket provides notifications about subscriptions
created by other nodes: For every subscription created by any node in
the cluster, the :zeek:see:`Cluster::Backend::ZeroMQ::subscription` event is
raised locally on every other node (unless another node had created the same
subscription previously).

This mechanism is used to discover the existence of other cluster nodes by
matching the topics with the prefix for node specific subscriptions as produced
by :zeek:see:`Cluster::nodeid_topic`.

As of now, the implementation of the central broker calls ZeroMQ's
``zmq::proxy()`` function to forward messages between the XPUB and
XSUB socket.

While the diagram above indicates the central broker being deployed separately
from Zeek cluster nodes, by default the manager node will start and run this
broker using a separate thread. There's nothing that would prevent from running
a long running central broker independently from the Zeek cluster nodes, however.

The serialization of Zeek events is done by the selected
:zeek:see:`Cluster::event_serializer` and is independent of ZeroMQ.
The central broker needs no knowledge about the chosen format, it is
only shuffling messages between nodes.


Logging
-------

While remote events always pass through the central broker, nodes connect and
send log writes directly to logger nodes in a cluster. The ZeroMQ cluster backend
leverages ZeroMQ's pipeline pattern for this functionality. That is, logger nodes
(including the manager if configured using :zeek:see:`Cluster::manager_is_logger`)
open a ZeroMQ PULL socket to receive log writes. All other nodes connect their
PUSH socket to all available PULL sockets. These connections are separate from
the publish-subscribe setup outlined above.

When sending log-writes over a PUSH socket, load balancing is done by ZeroMQ.
Individual cluster nodes do not have control over the decision which logger
node receives log writes at any given time.

.. figure:: /images/cluster/zeromq-logging.png

While the previous paragraph used "log writes", a single message to a logger
node actually contains a batch of log writes. The options :zeek:see:`Log::flush_interval`
and :zeek:see:`Log::write_buffer_size` control the frequency and maximum size
of these batches.

The serialization format used to encode such batches is controlled by the
selected :zeek:see:`Cluster::log_serializer` and is independent of ZeroMQ.

With the default serializer (:zeek:see:`Cluster::LOG_SERIALIZER_ZEEK_BIN_V1`),
every log batch on the wire has a header prepended that describes it. This allows
interpretation of log writes even by non-Zeek processes. This opens the possibility
to implement non-Zeek logger processes as long as the chosen serializer format
is understood by the receiving process. In the future, a JSON lines serialization
may be provided, allowing easier interpretation than a proprietary binary format.


Summary
-------

Combining the diagrams above, the connections between the different socket
types in a Zeek cluster looks something like the following.

.. figure:: /images/cluster/zeromq-cluster.png

1 change: 1 addition & 0 deletions devel/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,4 @@ transient, etc. compared to other documentation).
Documentation Guide </README.rst>
contributors
maintainers
cluster-backend-zeromq
23 changes: 21 additions & 2 deletions frameworks/cluster.rst
Original file line number Diff line number Diff line change
Expand Up @@ -329,10 +329,10 @@ function to publish events, including:
- Description
- Use

* - :zeek:see:`Broker::publish`
* - :zeek:see:`Cluster::publish`
- Publishes an event at a given topic
- Standard function to send an event to all nodes subscribed to a given
topic
topic.

* - :zeek:see:`Cluster::publish_hrw`
- Publishes an event to a node within a pool according to
Expand All @@ -345,6 +345,25 @@ function to publish events, including:
distribution strategy.
- Generally used inside Zeek for multiple logger nodes.

* - :zeek:see:`Broker::publish`
- Publishes an event at a given topic
- Standard function to send an event to all nodes subscribed to a given
topic.

Starting with Zeek 7.1, this function should only be used in
Broker-specific scripts. Use :zeek:see:`Cluster::publish` otherwise.


.. note::

The ``Cluster::publish`` function was added in Zeek 7.1. In contrast to
``Broker:publish``, it publishes events even when a non-Broker cluster
backend is in use. Going forward, ``Cluster:publish`` should be preferred
over ``Broker::publish``, unless the script is specific to the Broker backend,
e.g. when interacting with an external application using native Python
bindings for Broker.


An example sending an event from worker to manager:

.. code-block:: zeek
Expand Down
6 changes: 6 additions & 0 deletions images/cluster/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
MMDC?=./node_modules/.bin/mmdc

%.png : %.mermaid
$(MMDC) -i $< -e png -o $@

all: zeromq-cluster.png zeromq-pubsub.png zeromq-logging.png
24 changes: 24 additions & 0 deletions images/cluster/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
## Install mermaid-cli

npm install @mermaid-js/mermaid-cli

## Apparmor Errors

If running ``mmdc`` fails under Linux (e.g. with Ubuntu 24.04) with apparmor
errors about ``userns_create`` in the ``demsg`` output, put the following into
``/etc/apparmor.d/chrome-headless``

# This profile allows everything and only exists to give the
# application a name instead of having the label "unconfined"
abi <abi/4.0>,
include <tunables/global>

profile chrome /home/awelzel/.cache/puppeteer/**/chrome-headless-shell flags=(unconfined) {
userns,

# Site-specific additions and overrides. See local/README for details.
include if exists <local/chrome>
}


See also: https://chromium.googlesource.com/chromium/src/+/main/docs/security/apparmor-userns-restrictions.md#option-2_a-safer-way
95 changes: 95 additions & 0 deletions images/cluster/zeromq-cluster.mermaid
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
graph TD

w1-xpub-->broker-xsub
broker-xpub-->w1-xsub

w2-xpub-->broker-xsub
broker-xpub-->w2-xsub

w3-xpub-->broker-xsub
broker-xpub-->w3-xsub

p1-xpub-->broker-xsub
broker-xpub-->p1-xsub

p2-xpub-->broker-xsub
broker-xpub-->p2-xsub

l1-xpub-->broker-xsub
broker-xpub-->l1-xsub

l2-xpub-->broker-xsub
broker-xpub-->l2-xsub

m-xpub-->broker-xsub
broker-xpub-->m-xsub

%% Logging
w1-push-->l1-pull
w1-push-->l2-pull
w2-push-->l1-pull
w2-push-->l2-pull
w3-push-->l1-pull
w3-push-->l2-pull
p1-push-->l1-pull
p1-push-->l2-pull
p2-push-->l1-pull
p2-push-->l2-pull
m-push-->l1-pull
m-push-->l2-pull

subgraph broker ["broker"]
broker-xpub((XPUB))
broker-xsub((XSUB))
broker-xpub-->broker-xsub
broker-xsub-->broker-xpub
end


subgraph l1 [logger-1]
l1-xpub((XPUB))
l1-xsub((XSUB))
l1-pull(PULL)
end

subgraph l2 [logger-2]
l2-xpub((XPUB))
l2-xsub((XSUB))
l2-pull(PULL)
end

subgraph manager
m-xpub((XPUB))
m-xsub((XSUB))
m-push(PUSH)
end

subgraph p1 [proxy-1]
p1-xpub((XPUB))
p1-xsub((XSUB))
p1-push(PUSH)
end

subgraph p2 [proxy-2]
p2-xpub((XPUB))
p2-xsub((XSUB))
p2-push(PUSH)
end

subgraph w1 [worker-1]
w1-xpub((XPUB))
w1-xsub((XSUB))
w1-push(PUSH)
end

subgraph w2 [worker-2]
w2-xpub((XPUB))
w2-xsub((XSUB))
w2-push(PUSH)
end

subgraph w3 [worker-3]
w3-xpub((XPUB))
w3-xsub((XSUB))
w3-push(PUSH)
end
Binary file added images/cluster/zeromq-cluster.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
49 changes: 49 additions & 0 deletions images/cluster/zeromq-logging.mermaid
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
flowchart TD

%% Logging
w1-push-->l1-pull
w1-push-->l2-pull
w2-push-->l1-pull
w2-push-->l2-pull
w3-push-->l1-pull
w3-push-->l2-pull
p1-push-->l1-pull
p1-push-->l2-pull
p2-push-->l1-pull
p2-push-->l2-pull
m-push-->l1-pull
m-push-->l2-pull

subgraph l1 [logger-1]
l1-pull(PULL)
end

subgraph l2 [logger-2]
l2-pull(PULL)
end

subgraph m [manager]
m-push(PUSH)
end

subgraph p1 [proxy-1]
p1-push(PUSH)
end

subgraph p2 [proxy-2]
p2-push(PUSH)
end

subgraph w1 [worker-1]
w1-push(PUSH)
end

subgraph w2 [worker-2]
w2-push(PUSH)
end

subgraph w3 [worker-3]

w3-push(PUSH)
end

Binary file added images/cluster/zeromq-logging.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit 1e58775

Please sign in to comment.