
The SRE Manifesto

Short version

An open-source site reliability engineering manifesto of the SREs, by the SREs, for the SREs.

TL;DR version

Intro

In 2016, Google published the book Site Reliability Engineering, unveiling the operational model behind its well-known success. Although it's a well-written work, it didn't set out to prescribe the primary responsibilities of a site reliability engineer (SRE) or their core skills.

The SRE Manifesto is a modest project that conveys the SRE's primary responsibilities and core skills in a single document, giving shape to this profession outside Google and helping to universalize the role. It also intends to be a vendor-agnostic one-stop shop for valuable Site Reliability Engineering practices and resources.

Basic Definitions

Term | Definition
Reliability | A measure of how trustworthy a system or service is. It's a function of many other dimensions, including availability, resiliency, robustness, scalability, security, and performance.
Site Reliability Engineer | A person who applies Site Reliability Engineering to applications, platforms, solutions, systems, and services.
Site Reliability Engineering | An engineering discipline that combines aspects, practices, and techniques of software and systems engineering and applies them to infrastructure and operations problems to improve overall reliability.

Background

We started this work as an appendix to the book Becoming a Rockstar SRE; however, site reliability engineering is too big to fit into a single book.

Who should read this document?

Besides site reliability engineers (SREs), we recommend this manifesto to the following audience:

  • DevOps Engineers
  • Platform Engineers
  • Cloud Engineers
  • Performance Engineers
  • Software Engineers
  • Engineering Managers

It is also for anyone in a managerial role who wants to understand what an SRE does.
