Site Reliability Engineering: Measuring and Managing Reliability

  • Rating: 4.5
Approx. 12 hours to complete

Course Summary

This course teaches the principles of Site Reliability Engineering (SRE) and how to use Service Level Objectives (SLOs) to manage and improve service reliability.

Key Learning Points

  • Learn the fundamentals of Site Reliability Engineering (SRE)
  • Understand how to use Service Level Objectives (SLOs) to measure and improve service reliability
  • Gain practical experience with real-world case studies

Learning Outcomes

  • Understand the principles and best practices of Site Reliability Engineering
  • Learn how to use Service Level Objectives (SLOs) to improve service reliability
  • Gain practical experience through real-world case studies

Prerequisites and recommended background

  • Basic knowledge of software engineering principles
  • Familiarity with Linux command line

Course Difficulty Level

Intermediate

Course Format

  • Online
  • Self-paced

Similar Courses

  • Google Cloud Platform Fundamentals: Core Infrastructure
  • Introduction to DevOps: Transforming and Improving Operations

Notable People in This Field

  • Ben Treynor Sloss
  • Niall Murphy

Description

This course teaches the theory of Service Level Objectives (SLOs), a principled way of describing and measuring the desired reliability of a service. Upon completion, learners should be able to apply these principles to develop the first SLOs for services they are familiar with in their own organizations.

Knowledge

  • How to make systems reliable
  • Understanding SLIs, SLOs and SLAs
  • Quantifying risks to and consequences of SLOs
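
To make the relationship between an SLI, an SLO and an error budget concrete, here is a minimal sketch. It assumes the commonly used formulation in which an SLI is the percentage of valid events that were good, and the error budget is the share of events an SLO allows to be bad. The function names, parameters and numbers are hypothetical illustrations, not material taken from the course.

```python
def availability_sli(good_events: int, valid_events: int) -> float:
    """Return the SLI as the percentage of valid events that were good."""
    if valid_events <= 0:
        raise ValueError("valid_events must be positive")
    return 100.0 * good_events / valid_events


def error_budget_remaining(good_events: int, valid_events: int,
                           slo_target: float) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target is a percentage such as 99.9. A result of 1.0 means no
    budget has been spent; a negative result means the SLO was missed.
    """
    if valid_events <= 0:
        raise ValueError("valid_events must be positive")
    if not 0.0 < slo_target < 100.0:
        # A 100% target leaves no budget at all, so it is rejected here.
        raise ValueError("slo_target must be strictly between 0 and 100")
    allowed_bad = valid_events * (1.0 - slo_target / 100.0)
    actual_bad = valid_events - good_events
    return 1.0 - actual_bad / allowed_bad


# Hypothetical measurement window: 1,000,000 valid requests, 500 of them bad.
sli = availability_sli(999_500, 1_000_000)
remaining = error_budget_remaining(999_500, 1_000_000, slo_target=99.9)
print(f"SLI: {sli:.2f}%  error budget remaining: {remaining:.0%}")
```

In this sketch, a 99.9% target over one million requests permits roughly 1,000 bad requests, so 500 failures consume about half of the budget.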

Outline

  • Introduction to SRE
  • Course structure
  • Introduction
  • Intro
  • CRE's Three Reliability Principles
  • Reliability in the Cloud
  • How SLOs help your business make decisions
  • How SLOs help you build features faster
  • How SLOs help you balance operational and project work
  • Making SLOs work for your organization
  • DevOps/SRE
  • Targeting Reliability
  • Introduction
  • SLOs vs SLAs
  • The happiness test
  • How do we measure reliability?
  • Edge cases
  • 100% is the wrong target
  • Iterating
  • A working service
  • SLOs and SLAs
  • Reliability and iterating
  • Targeting Reliability Assessment
  • Operating for Reliability
  • Introduction
  • Error budgets
  • Everything is a trade-off
  • Error budgets: advanced concepts
  • Axes of improvement
  • Operational approach to increasing reliability
  • Module summary
  • Error budgets
  • Increasing reliability
  • Operating for Reliability Assessment
  • Choosing a Good SLI
  • Introduction
  • User happiness in metric form
  • The properties of good SLI metrics
  • Ways of measuring SLIs
  • The SLI menu
  • The SLI equation
  • Request / Response SLIs
  • Data processing SLIs
  • "But my system is really complex!"
  • Managing complexity with aggregation
  • Managing complexity with bucketing
  • Achievable SLOs
  • Aspirational SLOs
  • Continuous improvement
  • Measuring happiness
  • Commonly used SLIs
  • Correctness and Coverage
  • Developing SLOs and SLIs
  • Introduction
  • The 4 step process
  • Our example game
  • Loading the profile page
  • Refining SLI specifications
  • Looking for observability gaps
  • Failure modes
  • Postmortem!
  • Setting Achievable SLO targets
  • Quantifying Risks to SLOs
  • Introduction
  • Is your error budget realistic?
  • Modeling risks in our spreadsheet
  • Analyzing risk
  • Consequences of SLO Misses
  • Introduction
  • No surprises
  • A dashboard example
  • Why an error budget policy?
  • Fundamentals of an error budget policy
  • How to draft an error budget policy
  • Example policy thresholds
  • A hypothetical policy scenario
  • Course conclusion and video wrap up
  • Error budget policies
  • Error budget policy -- considerations
  • Consequences of SLO Misses

Summary of User Reviews

Students highly recommend this course, praising its real-world relevance and practical application. Some note, however, that it may be too basic for experienced engineers.

Key Aspect Users Liked About This Course

Real-world relevance and practical application

Pros from User Reviews

  • Course content is relevant and applicable to real-world scenarios
  • Instructors are knowledgeable and provide clear explanations
  • Hands-on labs and exercises reinforce learning

Cons from User Reviews

  • May be too basic for experienced engineers
  • Some technical difficulties with the Coursera platform
  • Limited interaction with instructors and other students
English
Available now
Google Cloud Training
Google Cloud
Coursera

Instructor

Google Cloud Training

  • 4.5 Rating