
How does Zeebe behave with NFS

· 13 min read
Christopher Kujawa
Chaos Engineer @ Zeebe

This week, we (Lena, Nicolas, Roman, and I) held a workshop where we looked into how Zeebe behaves with network file storage (NFS).

We ran several experiments with NFS and Zeebe, messing around with connectivity.

TL;DR: We were able to show that NFS can handle certain connectivity issues, which just cause Zeebe to process more slowly. If we completely lose the connection to the NFS server, several issues can arise, like IOExceptions on flush (where Raft goes into inactive mode) or SIGBUS errors on reading (e.g. during replay), causing the JVM to crash.

Setup

Note:

You can skip this section if you're not interested in how we set up the NFS server

For our experiments, we wanted a quick feedback loop and a small blast radius, meaning we avoided Kubernetes or any other cloud services. The idea was to set up an NFS server via Docker and mess with the network to cause NFS errors.

Run NFS Docker Container

After a little research, we found a project that provides an NFS server Docker image.

This can be run via:

# Needs privileged access for setting up the exports rule, etc.
# -v mounts a local directory as a volume into the container,
# -p exposes the NFS port,
# NFS_SERVER_ALLOWED_CLIENTS allows the local host IP range to access the NFS server,
# NFS_SERVER_DEBUG enables debug logs.
sudo podman run \
  --privileged \
  -v /home/cqjawa/nfs-workshop/nfs:/mnt/data:rw \
  -p 2049:2049 \
  -e NFS_SERVER_ALLOWED_CLIENTS=10.88.0.0/12 \
  -e NFS_SERVER_DEBUG=1 \
  ghcr.io/normal-computing/nfs-server:latest

Mount the NFS to local file storage

To use the NFS server and make it available to our Zeebe container, we first have to mount it via the NFS client.

This can be done via:

sudo mount -v -t nfs4 \
-o proto=tcp,port=2049,soft,timeo=10 \
localhost:/ \
~/nfs-workshop/nfs-client-mount/
  • -v: verbose
  • -t: file system type, telling the client to use NFSv4
  • -o: options for the mount: proto=tcp,port=2049,soft,timeo=10
    • transport via TCP, the port to be used, a soft mount (so requests eventually fail instead of blocking forever when the server is unavailable), and a retransmission timeout of 10 deciseconds (timeo is given in tenths of a second)
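To sanity-check that the share is mounted with these options, the mount can be listed again together with its effective options, for example:

# List NFS v4 mounts and their effective options (proto, timeo, soft/hard, ...)
findmnt -t nfs4
# or, show all NFS mounts including the negotiated mount options
nfsstat -m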

Run the Zeebe Container

After mounting the NFS share to our local filesystem, we can start our Zeebe container.

podman run -d \
  -v /home/cqjawa/nfs-workshop/nfs-client-mount/:/usr/local/zeebe/data \
  -p 26500:26500 \
  -p 9600:9600 \
  gcr.io/zeebe-io/zeebe:8.7.5-root

This mounts our NFS-backed directory into the container as the Zeebe data directory.
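Once the container is up, a quick sanity check is to query the cluster topology via the gateway port we exposed above:

zbctl --insecure status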

Running load

For simplicity, we used zbctl to start some load. As a first step, we had to deploy a process model.

 zbctl --insecure deploy one_task.bpmn 

This used the one_task.bpmn process model from go-chaos/.

Creating instances in a loop:

while true; do
  zbctl --insecure \
    create instance 2251799813685250;
  sleep 5;
done

Running worker:

 zbctl --insecure \
create worker "benchmark-task" \
--handler "echo {\"result\":\"Pong\"}"

Chaos Experiment - Use iptables with containerized NFS Server

We wanted to disrupt the NFS connections with iptables and cause some errors.

Expected

We can drop packets with iptables and observe errors in the Zeebe container logs.

Actual

Setting up the following iptables rule should have allowed us to disrupt the NFS connection, but it didn't work.

sudo iptables -A OUTPUT -p tcp --dport 2049 --sport 2049 -d localhost -j DROP

In the end, we set up a lot of different rules, but nothing seemed to work.

Every 1.0s: sudo iptables -L -v                                                                                                             cq-p14s: Thu Jun 12 16:01:28 2025

Chain INPUT (policy ACCEPT 6090K packets, 11G bytes)
pkts bytes target prot opt in out source destination
0 0 DROP tcp -- any any anywhere cq-p14s tcp dpt:nfs
0 0 DROP tcp -- any any anywhere localhost tcp dpt:nfs
0 0 DROP tcp -- any any anywhere localhost tcp dpt:nfs
0 0 DROP tcp -- any any anywhere localhost tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere localhost tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere 10.0.88.5 tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere 10.0.88.1 tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere 10.88.0.5 tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere cq-p14s tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere anywhere tcp spt:nfs dpt:nfs

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 6182K packets, 22G bytes)
pkts bytes target prot opt in out source destination
0 0 DROP tcp -- any any anywhere localhost tcp dpt:nfs
0 0 DROP tcp -- any any anywhere localhost tcp dpt:nfs
0 0 DROP tcp -- any any anywhere localhost tcp dpt:nfs
0 0 DROP tcp -- any any anywhere 0.0.0.0 tcp dpt:nfs
0 0 DROP tcp -- any any anywhere cq-p14s tcp dpt:nfs
0 0 DROP tcp -- any any anywhere localhost tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere localhost tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere 11.0.88.5 tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere 10.0.88.5 tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere 0.0.0.0 tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere 10.0.88.1 tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere 10.88.0.5 tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere cq-p14s tcp spt:nfs dpt:nfs
0 0 DROP tcp -- any any anywhere anywhere tcp spt:nfs dpt:nfs

We even suspended the NFS server via docker pause. We were able to observe that data was still being synced between the directories.

This was an indication for us that the kernel might be doing some magic behind the scenes, and that the NFS setup didn't work as we expected it to.
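In hindsight, a useful check here would have been to look at which connection the NFS client actually uses, so that the iptables rules can be matched against the real source and destination addresses, for example:

# List established TCP connections on the NFS port; the local and peer
# addresses show what an iptables rule would actually have to match.
ss -tnp state established '( dport = :2049 or sport = :2049 )'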

Chaos Experiment - Use iptables with an external NFS Server

As we were not able to disrupt the network, we thought it might make sense to externalize the NFS server (to a different host).

Setup external NFS

We followed this guide to set up an NFS server running on a different machine.
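For reference, the server-side setup boils down to roughly the following sketch (assuming a Debian/Ubuntu host and the /srv/nfs export path we mount below; the exact steps depend on the guide and the distribution):

sudo apt install nfs-kernel-server
sudo mkdir -p /srv/nfs
sudo chown nobody:nogroup /srv/nfs
# Allow the client network to access the export
echo "/srv/nfs 192.168.24.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra
sudo systemctl restart nfs-kernel-server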

Mount external NFS

The mounting was quite similar to before, now using a different host:

sudo mount -v -t nfs4 -o proto=tcp,port=2049,soft,timeo=10 192.168.24.110:/ ~/nfs-workshop/nfs-client-mount/

Run Zeebe Container

The same goes for running the Zeebe container.

podman run -d \
  -v /home/cqjawa/nfs-workshop/nfs-client-mount/srv/nfs/:/usr/local/zeebe/data \
  -p 26500:26500 -p 9600:9600 \
  gcr.io/zeebe-io/zeebe:8.7.5-root

Expected

We were expecting some errors during processing and writing when the connection was completely dropped.

Actual

Similar to the previous iptables experiment, we dropped all outgoing packets for port 2049, now with the new destination.

sudo iptables -A OUTPUT -p tcp --dport 2049 -d 192.168.24.110 -j DROP
Every 1.0s: sudo iptables -L -v                                                                                                            cq-p14s: Thu Jun 12 16:13:44 2025

Chain INPUT (policy ACCEPT 6211K packets, 11G bytes)
pkts bytes target prot opt in out source destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 6297K packets, 23G bytes)
pkts bytes target prot opt in out source destination
35 2064K DROP tcp -- any any anywhere 192.168.24.110 tcp dpt:nfs

Now we were actually able to observe some errors. The clients (starter and worker) were receiving DEADLINE_EXCEEDED errors.

2025/06/12 16:14:03 Failed to activate jobs for worker 'zbctl': rpc error: code = DeadlineExceeded desc = context deadline exceeded
Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Error: rpc error: code = DeadlineExceeded desc = stream terminated by RST_STREAM with error code: CANCEL
Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded

After some time running with the disconnected NFS server, Zeebe actually failed to flush:

[2025-06-12 09:02:00.819] [raft-server-0-1] [{actor-name=raft-server-1, actor-scheduler=Broker-0, partitionId=1, raft-role=LEADER}] ERROR
io.atomix.raft.impl.RaftContext - An uncaught exception occurred, transition to inactive role
java.io.UncheckedIOException: java.io.IOException: Input/output error (msync with parameter MS_SYNC failed)
at java.base/java.nio.MappedMemoryUtils.force(Unknown Source) ~[?:?]
at java.base/java.nio.Buffer$2.force(Unknown Source) ~[?:?]
at java.base/jdk.internal.misc.ScopedMemoryAccess.forceInternal(Unknown Source) ~[?:?]
at java.base/jdk.internal.misc.ScopedMemoryAccess.force(Unknown Source) ~[?:?]
at java.base/java.nio.MappedByteBuffer.force(Unknown Source) ~[?:?]
at java.base/java.nio.MappedByteBuffer.force(Unknown Source) ~[?:?]
at io.camunda.zeebe.journal.file.Segment.flush(Segment.java:125) ~[zeebe-journal-8.7.5.jar:8.7.5]
at io.camunda.zeebe.journal.file.SegmentsFlusher.flush(SegmentsFlusher.java:58) ~[zeebe-journal-8.7.5.jar:8.7.5]
at io.camunda.zeebe.journal.file.SegmentedJournalWriter.flush(SegmentedJournalWriter.java:125) ~[zeebe-journal-8.7.5.jar:8.7.5]
at io.camunda.zeebe.journal.file.SegmentedJournal.flush(SegmentedJournal.java:173) ~[zeebe-journal-8.7.5.jar:8.7.5]
at io.atomix.raft.storage.log.RaftLogFlusher$DirectFlusher.flush(RaftLogFlusher.java:73) ~[zeebe-atomix-cluster-8.7.5.jar:8.7.5]
at io.atomix.raft.storage.log.RaftLog.flush(RaftLog.java:196) ~[zeebe-atomix-cluster-8.7.5.jar:8.7.5]
at io.atomix.raft.impl.RaftContext.setCommitIndex(RaftContext.java:538) ~[zeebe-atomix-cluster-8.7.5.jar:8.7.5]
at io.atomix.raft.roles.LeaderAppender.appendEntries(LeaderAppender.java:560) ~[zeebe-atomix-cluster-8.7.5.jar:8.7.5]
at io.atomix.raft.roles.LeaderRole.replicate(LeaderRole.java:740) ~[zeebe-atomix-cluster-8.7.5.jar:8.7.5]
at io.atomix.raft.roles.LeaderRole.safeAppendEntry(LeaderRole.java:735) ~[zeebe-atomix-cluster-8.7.5.jar:8.7.5]
at io.atomix.raft.roles.LeaderRole.lambda$appendEntry$15(LeaderRole.java:701) ~[zeebe-atomix-cluster-8.7.5.jar:8.7.5]
at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:178) ~[zeebe-atomix-utils-8.7.5.jar:8.7.5]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.io.IOException: Input/output error (msync with parameter MS_SYNC failed)
at java.base/java.nio.MappedMemoryUtils.force0(Native Method) ~[?:?]
... 24 more

This caused the Raft leader role to become inactive, uninstalling all related services.

INFO io.atomix.raft.impl.RaftContext - Transitioning to INACTIVE

Furthermore, it is interesting that the DiskSpaceMonitor detected an out-of-disk-space (OOD) condition and paused the stream processor.

[2025-06-12 09:02:00.795] [zb-actors-0] [{actor-name=DiskSpaceUsageMonitorActor, actor-scheduler=Broker-0}] WARN 
io.camunda.zeebe.broker.system - Out of disk space. Current available 0 bytes. Minimum needed 2147483648 bytes.
[2025-06-12 09:02:00.796] [zb-actors-0] [{actor-name=ZeebePartition-1, actor-scheduler=Broker-0, partitionId=1}] WARN
io.camunda.zeebe.broker.system - Disk space usage is above threshold. Pausing stream processor.

In the end, the system was not running anymore. This means availability was impacted, but not durability, as we do not write anything wrong (and do not continue with dirty data).
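Between experiments, the DROP rules have to be removed again so that the NFS connection can recover, either by deleting the specific rule or by flushing the whole OUTPUT chain (assuming no other OUTPUT rules are needed):

# Delete the specific rule...
sudo iptables -D OUTPUT -p tcp --dport 2049 -d 192.168.24.110 -j DROP
# ...or flush all OUTPUT rules at once
sudo iptables -F OUTPUT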

Chaos Experiment 3 - Randomly dropping packets

With iptables it is possible to randomly drop packets, allowing us to validate how the system behaves under a certain amount of packet loss.

Expected

We expected that the system might also fail here, potentially with some exceptions.

Actual

Running the following command sets up an iptables rule that randomly drops packets for destination port 2049 with 80% probability:

sudo iptables -A OUTPUT -p tcp --dport 2049 -d 192.168.24.110 -m statistic --mode random --probability 0.80 -j DROP

As NFS is TCP-based, it seems that NFS can handle a certain amount of packet loss, since dropped packets are retransmitted.

General processing was much slower; we observed this via the rate at which instances were created and jobs completed.

Other than that, the system continued to run and stayed healthy.
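To confirm that the rule actually matched (and to get a rough idea of how many packets were dropped), the packet counters of the OUTPUT chain can be watched, just as in the earlier experiments:

watch -n1 sudo iptables -L OUTPUT -v -n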

Chaos Experiment 4 - Drop connection on reading

We wanted to cause some SIGBUS errors, as we knew this can happen with memory-mapped (mmapped) files, which Zeebe uses. This might be reproducible when reading memory-mapped data.

For this, we planned to create a lot of data on our Zeebe system and then restart it, causing Zeebe to fail on replay when the connection is blocked.

Expected

We expected that during reads we would cause a SIGBUS error, causing the system to crash.

Actual

To make sure we kept creating new segments and avoided compaction (causing a longer replay), we increased the snapshot period and reduced the log segment size.

podman run -d \
-v /home/cqjawa/nfs-workshop/nfs-client-mount/srv/nfs/:/usr/local/zeebe/data \
-p 26500:26500 -p 9600:9600 \
-e ZEEBE_BROKER_THREADS_CPUTHREADCOUNT=2 \
-e ZEEBE_BROKER_THREADS_IOTHREADCOUNT=2 \
-e ZEEBE_BROKER_DATA_LOGSEGMENTSIZE=16MB \
-e ZEEBE_BROKER_DATA_SNAPSHOTPERIOD=8h \
gcr.io/zeebe-io/zeebe:8.7.5-root

First, we set up an iptables rule to make sure that reading from NFS was slower (by randomly dropping ~80% of packets).

sudo iptables -A OUTPUT -p tcp --dport 2049 -d 192.168.24.110 -m statistic --mode random --probability 0.80 -j DROP
[2025-06-12 09:25:00.543] [zb-actors-1] [{actor-name=StreamProcessor-1, actor-scheduler=Broker-0, partitionId=1}] INFO 
io.camunda.zeebe.processor - Processor starts replay of events. [snapshot-position: 611, replay-mode: PROCESSING]

When we saw that the StreamProcessor started its replay, we again dropped packets completely.

sudo iptables -A OUTPUT -p tcp --dport 2049 -d 192.168.24.110 -j DROP

After a certain period of time, we ran into a SIGBUS error:

[2025-06-12 09:25:00.543] [zb-actors-1] [{actor-name=StreamProcessor-1, actor-scheduler=Broker-0, partitionId=1}] INFO 
io.camunda.zeebe.processor - Processor starts replay of events. [snapshot-position: 611, replay-mode: PROCESSING]
[2025-06-12 09:25:00.545] [zb-actors-1] [{actor-name=ZeebePartition-1, actor-scheduler=Broker-0, partitionId=1}] INFO
io.camunda.zeebe.broker.system - Transition to LEADER on term 4 - transitioning CommandApiService
[2025-06-12 09:25:00.547] [zb-actors-1] [{actor-name=ZeebePartition-1, actor-scheduler=Broker-0, partitionId=1}] INFO
io.camunda.zeebe.broker.system - Transition to LEADER on term 4 - transitioning SnapshotDirector
[2025-06-12 09:25:00.549] [zb-actors-1] [{actor-name=ZeebePartition-1, actor-scheduler=Broker-0, partitionId=1}] INFO
io.camunda.zeebe.broker.system - Transition to LEADER on term 4 - transitioning ExporterDirector
[2025-06-12 09:25:00.555] [zb-actors-1] [{actor-name=ZeebePartition-1, actor-scheduler=Broker-0, partitionId=1}] INFO
io.camunda.zeebe.broker.system - Transition to LEADER on term 4 - transitioning BackupApiRequestHandler
[2025-06-12 09:25:00.557] [zb-actors-1] [{actor-name=ZeebePartition-1, actor-scheduler=Broker-0, partitionId=1}] INFO
io.camunda.zeebe.broker.system - Transition to LEADER on term 4 - transitioning Admin API
[2025-06-12 09:25:00.558] [zb-actors-1] [{actor-name=ZeebePartition-1, actor-scheduler=Broker-0, partitionId=1}] INFO
io.camunda.zeebe.broker.system - Transition to LEADER on term 4 completed
[2025-06-12 09:25:00.561] [zb-actors-1] [{actor-name=ZeebePartition-1, actor-scheduler=Broker-0, partitionId=1}] INFO
io.camunda.zeebe.broker.system - ZeebePartition-1 recovered, marking it as healthy
[2025-06-12 09:25:00.562] [zb-actors-1] [{actor-name=HealthCheckService, actor-scheduler=Broker-0}] INFO
io.camunda.zeebe.broker.system - Partition-1 recovered, marking it as healthy
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f89ec4601a5, pid=2, tid=49
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.7+6 (21.0.7+6) (build 21.0.7+6-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.7+6 (21.0.7+6-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# v ~StubRoutines::updateBytesCRC32C 0x00007f89ec4601a5
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %d" (or dumping to /usr/local/zeebe/core.2)
#
# An error report file with more information is saved as:
# /usr/local/zeebe/hs_err_pid2.log
[275.689s][warning][os] Loading hsdis library failed
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues

This caused the JVM to crash and stop the container, as expected.
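The crash can also be confirmed from the container state and the JVM error report mentioned in the log (the container name zeebe here is a placeholder for the actual container ID):

# Exit code of the crashed container
podman inspect --format '{{.State.ExitCode}}' zeebe
# Copy the JVM error report out of the (stopped) container
podman cp zeebe:/usr/local/zeebe/hs_err_pid2.log .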

Results

From the workshop experimenting with NFS, we took away several learnings about how Zeebe and NFS behave under connectivity issues, summarized as follows:

  • We could confirm that network errors lead to unrecoverable SIGBUS errors, which cause the broker to crash.
    • This is due primarily to our usage of mmap both in RocksDB and Zeebe.
    • There is an easy workaround with RocksDB where you can simply turn off mmap, but no such workaround exists in Zeebe at the moment.
    • This only impacts availability, as the application crashes; since Zeebe is designed to be crash resilient, there are no inconsistencies or data corruption.
    • We don’t have a clear idea of the frequency of these errors - it’s essentially environment-based (i.e., how bad the network connectivity is).
  • With only partial connectivity (simulated by dropping packets, e.g. 70% of packets), we mostly observed performance issues, as things got slower; however, messages were retried, so no errors occurred.
  • Network errors when using normal file I/O resulted in IOException as expected.
    • This caused the Raft partition to go inactive, for example, when the leader fails to flush on commit (a known issue which is already planned to be fixed for graceful error handling).
  • When the NFS server was unavailable, the disk space monitor detected that there was no more disk space available, and writes stopped.
  • We did not test whether it recovers when the server is back, but we expect it would.
  • Minor, but we should open an issue for it:
    • When the leader goes inactive, we report an internal error that there is no message handler for command-api-1, but we should really be returning UNAVAILABLE as a proper error, and not log this at error level (we have other means to detect this).

What does this mean?

  • We can say Zeebe can work with NFS, but it is not yet supported.
  • We need to improve certain error handling, like flushing errors, to better support it.
  • When operating Zeebe on bare metal in an unreliable environment, SIGBUS errors might be more likely, and a crashing JVM is more problematic than in a Kubernetes deployment, where pods automatically get rescheduled.