Elastic restart impact on Camunda
In today's Chaos Day, we explored the impact of Elasticsearch availability on Camunda 8.9+ (testing against main).
Last year, we already tested the resiliency of our system against ES restarts (see previous post), but we ran the OC cluster only. Additionally, certain configurations have been improved since then (default replica configurations, etc.).
This time, we wanted to see how the system behaves with OC + ES Exporter + Optimize enabled.
I was joined by Jon and Pranjal, the newest members of the reliability testing team.
TL;DR; We found that short ES unavailability does not affect processing performance, but depending on the configuration, it can affect data availability. Longer outages would then also impact Camunda processing. To mitigate this problem, the corresponding exporters should be configured accordingly, but the necessary configuration options are not properly exposed in the Helm Chart, which needs to be fixed.




