PagerDuty Summit 2020

Sep. 24, 2020 · 5 min read
Learn how you can move faster and focus on the things that matter by using incident analysis as your secret weapon. Operating at speed and at scale tests the capabilities of even the most experienced engineering teams. In this software world, it is inevitable that things will break. When they do, what do you do? Pick up the pieces and carry on? What if that’s not enough? Learning from incidents has taught us that broken things can lead to powerful opportunities, but only when we’re looking at them through the right lens.

Resilient Management

Aug. 17, 2020 · 17 min read
Think about your team for a moment. How well is it functioning? Are you currently on a high-performing team, having found your groove and flow state as a group? Are you on a team that isn’t quite in that magical state of being yet, but it feels like you’re on your way? Are you feeling some friction—frustration, confusion—with your team? Or have you just joined a new team, so none of these apply yet?

Chaos Engineering

Jun. 25, 2020 · 6 min read
Breaking things on purpose. Chaos Engineering is the facilitation of experiments to uncover systemic weakness. Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production. Performance Improvement = Errors ↓ + Insights ↑

Team Topologies

May 14, 2020 · 12 min read
Team Topologies provides four fundamental team types – stream-aligned, platform, enabling, and complicated-subsystem – and three core team interaction modes – collaboration, X-as-a-Service, and facilitating. Together with awareness of Conway’s law, team cognitive load, and how to become a sensing organization, Team Topologies results in an effective and humanistic approach to building and running software systems.

Making Work Visible

Nov. 8, 2019 · 3 min read
We grumble that there just aren’t enough hours in the day and that someone else sure seems to have a lot of free time. But we regular mortals only have twenty-four hours in a day. The problem is that we don’t protect our hours from being stolen. We allow thieves to steal time from us, day after day after day. Who are these thieves of time?


Oct. 30, 2019 · 3 min read
Resilience is something a system does, not what a system has; creating and sustaining ‘adaptive capacity’ within an organisation (while being unable to justify doing it specifically) is resilient action; and learning about how people cope with surprise is the path to finding sources of resilience.

Remote-First vs Remote-Friendly

Sep. 27, 2019 · 4 min read
So how do we do remote work right? It takes much more than the half-hearted “you’re allowed to work from home” policy you see at companies nowadays. Remote-first means working remote is the default. It means making sure your remote employees are as much a part of the team as those in the office.

Turn the Ship Around!

Aug. 5, 2019 · 6 min read
Achieve excellence, don’t just avoid errors. Build trust and take care of your people. Use your legacy for inspiration. Use guiding principles for decision criteria. Use immediate recognition to reinforce desired behaviors. Begin with the end in mind. Encourage a questioning attitude over blind obedience.

The Dip

Jan. 21, 2019 · 1 min read
How do you avoid killing something too early, or celebrating too early? And last, how do you know when to kill a dud?

Just Culture

Sep. 26, 2016 · 10 min read
A just culture balances the need for an open and honest reporting environment with the goal of a quality learning environment and culture. While the organization has a duty and responsibility to employees (and ultimately to patients), all employees are held responsible for the quality of their choices. Just culture requires a change in focus from errors and outcomes to system design and management of the behavioral choices of all employees.