The SRE Manifesto
Short version
An open-source site reliability engineering manifesto of the SREs, by the SREs, for the SREs.
- Jump to The SRE Manifesto
- Jump to The SRE Practices
- Jump to The SRE Resources
TL;DR version
Intro
In 2016, Google published the book Site Reliability Engineering, revealing the operational model behind its well-known success. Although it's a well-written work, it didn't set out to prescribe the primary responsibilities of a site reliability engineer (SRE) or their core skills.
The SRE Manifesto is a modest project that sets out the SRE's primary responsibilities and core skills in a single document. In doing so, it gives shape to the profession outside Google and tries to universalize the role. It also aims to be a vendor-agnostic one-stop shop for valuable site reliability engineering practices and resources.
Basic Definitions
Term | Definition |
---|---|
Reliability | A measurement of how trustworthy a system or service is. It's a function of many other dimensions, including availability, resiliency, robustness, scalability, security, and performance. |
Site Reliability Engineer | A person who applies Site Reliability Engineering to applications, platforms, solutions, systems, and services. |
Site Reliability Engineering | An engineering discipline that combines software and systems engineering aspects, practices, and techniques and applies them to infrastructure and operations problems to improve overall reliability. |
Background
We started this work as an appendix to the Becoming a Rockstar SRE book; however, site reliability engineering is too big a subject to fit into a single book.
Who should read this document?
Besides site reliability engineers (SREs), we recommend the following audience for this manifesto:
- DevOps Engineers
- Platform Engineers
- Cloud Engineers
- Performance Engineers
- Software Engineers
- Engineering Managers
It is also intended for anyone in a managerial role who wants to understand what an SRE does.