João Freitas

The following is a retrospective on recent outages caused by configuration changes, and on how such changes can impact production software even when it is designed to be failproof:

At a glance, a good percentage of outages are caused by bad configuration changes: the 2021 global Facebook outage, the $440 million bad configuration that brought down Knight Capital in 2012, and numerous global outages at Google Cloud, Microsoft Azure, Cloudflare, and other companies with serious engineering cultures. Why do configuration changes cause so many outages?

What helps prevent outages due to bad configuration?

But configuration change outages are anything but a solved problem.

#reads #matt rickard #sre #cloud