The SRE Manifesto

We, site reliability engineers from all over the world, met to discover better ways of defining the site reliability engineer (SRE) persona.

We believe we can create a more unified, concise, and solid SRE persona by stating their essential responsibilities:

  • SREs make sure their systems are reliable and not just available and resilient.

  • SREs guarantee all systems and applications are observable and under monitoring.

  • SREs manage systems, services, and infrastructure to learn how to automate toil.

  • SREs use data science and statistical methods to understand observability data.

  • SREs identify, measure, and reduce toil arising from operational and engineering work.

  • SREs implement test cases, execute software delivery tests, and stay ahead with capacity planning.

  • SREs employ chaos engineering to unveil systemic weaknesses in production.

  • SREs respond to meaningful incidents, implement complex changes, and conduct blameless postmortems.

Anyone is welcome to use this manifesto freely, as we have made it public. We will review it in the GitHub repository and update it yearly.

February 27, 2024.

Signing,

Signatories and supporters team

Disclaimer: The opinions expressed by the signatories and supporters in this material are their own and not necessarily those of the companies listed here or their subsidiaries. The companies are mentioned solely for transparency and fairness. Any SRE, from any company or as an individual, is welcome to sign or support this manifesto.

Revisions

  • Link to the doc
