The SRE Manifesto
We, site reliability engineers from all over the world, met to discover better ways of defining the site reliability engineer (SRE) persona.
We believe we can create a more unified, concise, and solid SRE persona by stating their essential responsibilities:
- SREs make sure their systems are reliable and not just available and resilient.
- SREs guarantee all systems and applications are observable and under monitoring.
- SREs manage systems, services, and infrastructure to learn how to automate toil.
- SREs use data science and statistical methods to understand the observability of data.
- SREs identify, measure, and reduce toil arising from operational and engineering work.
- SREs implement test cases, execute software delivery tests, and stay ahead with capacity planning.
- SREs employ chaos engineering to unveil systemic weaknesses in production.
- SREs respond to meaningful incidents, implement complex changes, and conduct blameless postmortems.
Anyone is welcome to use this manifesto freely, as we have made it public. We will review this manifesto in the GitHub repository and update it yearly.
February 27, 2024.
Signing,
Signatories and supporters team
Disclaimer:
The opinions expressed by the signatories and supporters in this material are their own, not necessarily those of the companies listed here or their subsidiaries. The above companies are mentioned solely for transparency and fairness. Any SRE from any company (or acting as an individual) is welcome to sign or support this manifesto.
Revisions
- Link to the doc