Recovery (Fail Over) time
In the last quarter we worked on a new "feature" called "building state on followers". In short, it means that followers apply the events to build their state, which makes regular snapshot replication unnecessary and allows a faster role transition from follower to leader. In this chaos day I wanted to experiment a bit with this property; we already did some benchmarks here. Today, I want to see how it behaves with larger state (bigger snapshots), since in previous versions of Zeebe the snapshot had to be copied and the broker had to replay more events than with the newest version.
If you want to know more about building state on followers, check out the ZEP.
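To make the idea a bit more concrete, here is a toy sketch of why applying events on the follower shortens the role transition. This is not Zeebe's implementation: the `Replica` class, its fields, and the `apply_on_follower` flag are hypothetical names made up for illustration, and snapshots are left out entirely. It only compares how many events a replica still has to replay at the moment it becomes leader.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical, not Zeebe code): a log plus derived state."""
    apply_on_follower: bool
    log: list = field(default_factory=list)    # replicated (key, value) events
    state: dict = field(default_factory=dict)  # state built from applied events
    applied: int = 0                           # number of log entries applied

    def replicate(self, event):
        """Receive an event while being a follower."""
        self.log.append(event)
        if self.apply_on_follower:
            self._apply_pending()

    def become_leader(self):
        """Catch up on unapplied events; return how many were replayed."""
        return self._apply_pending()

    def _apply_pending(self):
        pending = self.log[self.applied:]
        for key, value in pending:
            self.state[key] = value
        self.applied = len(self.log)
        return len(pending)


events = [(f"instance-{i}", "created") for i in range(10_000)]

old_style = Replica(apply_on_follower=False)  # state is built only on promotion
new_style = Replica(apply_on_follower=True)   # state is built while following

for event in events:
    old_style.replicate(event)
    new_style.replicate(event)

replayed_old = old_style.become_leader()
replayed_new = new_style.become_leader()
print(replayed_old, replayed_new)  # prints "10000 0"
```

Both replicas end up with the same state, but the one that applied events while following has nothing left to replay when it takes over. That is the property this experiment pokes at with a larger state.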
TL;DR; In our experiment with version 1.2 we had almost no downtime; the new leader was able to pick up the next work (accept new commands) very quickly.