The central thesis of this website is that software should be predictable. Originally I took this view from learning about Statistical Process Control (SPC), but I've since come to think it can also be used to awkwardly weld together ideas from other fields as well.
For my purposes, I see predictability as having three main facets:
- As a statistical or probabilistic subject. A system that is predictable operates within an envelope of measurements which can be described as a probability distribution. Within the envelope, the system may vary at random; it is the likelihood of any given measurement which is predictable. All predictable systems can be compared to their reference predictions to see if they are beginning to misbehave.
- As a psychological phenomenon. A system that is predictable operates in harmony with how human beings sense, perceive, process, reason about and act upon the world. Systems should never cause a nasty surprise. All states should be comprehensible and predictions of future states shouldn't require heavy use of direct attention.
- As subject to microeconomics. Humans act according to their environment, according to other humans, to incentives created by these and within the limits of information they have in hand. Economics in the 20th century made massive strides in understanding diverse phenomena that had previously been weakly treated by the use of pure equations. Economists can now talk about surprise, deception, confusion and so forth in a way that can make an overall system's behaviour more predictable.
Woven through these are the concepts of System Dynamics, particularly in the style pioneered by Forrester at MIT. System Dynamics can create models pulling threads from the statistical, psychological and economic facets of predictability.
Not listed is determinism. To be sure, all deterministic systems are, in theory, predictable. But not all predictable systems can be represented in a deterministic way and not all deterministic systems are tractably predictable. Some systems are too complex to model easily, but important enough that we at least want an approximate answer. Whenever achievable, software should be as deterministic as possible. But frequently this won't be economical, and approximate methods will be necessary.
I dislike uncertainty. I am not alone in disliking uncertainty. But it's a fact of life. Statistics, along with cousins like fuzzy logic and belief functions, provides mechanisms for wrestling the world's uncertainty into useful outcomes.
The fields I most borrow from are Statistical Process Control (SPC) and Statistical Quality Control (SQC). These usually ride together. I will often use the terms interchangeably.
SPC is focused on a process. We can think of this as a function that takes a stream of inputs and produces a stream of outputs.
So far, so uninteresting, if we imagine inputs that are fixed, discrete and identical. But real "inputs" and "outputs" may not be fixed, might not be discrete and may never be identical.
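To make the distinction concrete, here is a toy sketch. The function, sizes and noise figures are all invented for illustration:

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

def process(request_bytes):
    """A toy process: handling time depends on the input's size, plus
    noise standing in for everything we cannot see (cache state, noisy
    neighbours, garbage collection, ...)."""
    return 0.5 * request_bytes + random.gauss(0, 5)

# Even nominally identical inputs yield a stream of differing outputs...
same_input = [process(100) for _ in range(3)]

# ...and real inputs vary too, widening the output envelope further.
varied_input = [process(random.randint(50, 150)) for _ in range(3)]
```

Each call lands somewhere in a distribution centred on the deterministic part; SPC is about characterising that envelope rather than predicting any single value.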
And this is assuming that we have any kind of visibility into the inputs to our process, or into the process itself. Most of the time we're doing well to see the outputs.
SPC introduces various terms that could be framed as levels of maturity. Specifically:
Stable, sometimes termed in control.
A process that is stable is operating within predictable limits. They might be very wide limits. Undesirably wide, in fact. But known. When a process is stable, we can ask questions like "is this measured value ordinary or extraordinary?"
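As a sketch of the "ordinary or extraordinary?" question, here is one common recipe: an individuals (XmR) chart in the style Wheeler popularises. The deploy-duration numbers are invented:

```python
from statistics import mean

def xmr_limits(values):
    """Natural process limits for an individuals (XmR) chart.

    Wheeler's screening constant 2.66 is applied to the average
    moving range between consecutive measurements."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre + spread

# Daily deploy durations in minutes (hypothetical measurements).
durations = [12, 14, 11, 13, 15, 12, 14, 13, 29, 12]
lo, hi = xmr_limits(durations)
signals = [x for x in durations if not lo <= x <= hi]
print(signals)  # the 29-minute deploy falls outside the limits
```

Points beyond the computed limits are signals worth investigating, not proof of a fault; everything inside them is routine variation, even if it looks alarming on a given day.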
Stable here is not a human-centric term. A process may be bouncing around in a way that upsets observers and still be a stable process. Spinning the roulette wheel is chaotic and for any given spin, the outcome is unknown in advance. But over time the wheel will show a predictable mix of outcomes (and, if it doesn't, regulators and casinos both take a close interest).
Similarly, the definition of "in control" is not about control in a feedback system sense. Rather than being about the behaviour of controllers that govern some system, "in control" simply refers to the fact that a process is stable. In SPC/SQC land the distinction used is between "process control" or "quality control" on the one hand, and "engineering control" or "feedback control" on the other.
A specified process has chosen limits for acceptable outcomes. For example, it may be acceptable for the P99 of request times on a website to be no more than 100ms. This is an "upper specification limit". The relation to the concept of SLOs is hopefully obvious.
Given that the process is stable and specified, can it actually perform within specification? How comfortably can it do so? What's the margin between control and specification limits?
This is where target-setting by itself usually fails to deliver. Periodically we set SLOs, or we create alerts for this metric or that, without knowing (1) whether crossing the threshold would be ordinary or extraordinary behaviour or (2) whether the threshold is achievable. Process capability is about knowing how well you can hit a target specification. Without it you don't know if improvements are improvements. In their introductory book about SPC [1], Wheeler & Chambers make a relevant point about blind goal-setting.
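One conventional way to quantify the margin between process behaviour and a specification is a capability index. A minimal sketch for a one-sided (upper) limit, with invented P99 figures:

```python
from statistics import mean, stdev

def capability_upper(values, usl):
    """One-sided capability index Cpu = (USL - mean) / (3 * sigma).

    Cpu comfortably above 1 means the process fits under the upper
    specification limit with room to spare; below 1, it cannot
    reliably meet the specification as it currently behaves."""
    return (usl - mean(values)) / (3 * stdev(values))

# Hypothetical daily P99 request times, in milliseconds.
p99_ms = [82, 85, 79, 88, 84, 81, 86, 83, 80, 87]
print(round(capability_upper(p99_ms, usl=100), 2))  # 1.82
```

An index above 1 suggests the 100ms specification is achievable for this process; below 1, no amount of target-setting will make it so without changing the process itself.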
It is easy to be confused.
Research into aeronautical accidents demonstrates that no matter how skilled or experienced, a flight crew can lose situational awareness with fatal consequences. Out of fields like cognitive science, human factors and engineering psychology comes a vast literature on helping humans do better what they do best.
In describing software as psychologically predictable I mean two things:
- Human mental models of system state are broadly in sync with the system state.
- Future trajectories of system state can be mentally simulated.
One way to think of these is that a system should not create surprises. Surprise occurs because situational awareness of the software has broken down. The mental model of what the software does and is doing has become detached from what the software actually does and is actually doing. Surprise also includes the system taking a trajectory that was not foreseen.
Take a moment to look at the metrics dashboard of any production system of moderate vintage. It is crowded with various plots, histograms, dials and geegaws of all kinds. Ignoring out-of-the-box defaults, what is the unifying theme of such displays? It's past surprises. If you then look at alerts which were manually crafted, you will see the same phenomenon. Every plot or alert is the scar tissue of some previous incident.
Taking a cue from System Dynamics, I figure that the struggle to maintain a mental model comes from two major sources: detail complexity and dynamic complexity. The former is the one we are all most familiar with. Many moving parts, many layers, many possible interactions, many concepts to hold in our minds simultaneously. It's to detail complexity that a large amount of software engineering lore is directed: cohesion and coupling, choice of names, automated testing, type systems ... this multi-decadal program has largely grown in splendid, majestic isolation from the psychological principles that have made it necessary.
But of more interest to me is dynamic complexity. This is the complexity of forecasting how a system's state will evolve through time.
Humans are actually bad at this. Consider some empirical results:
- Tetlock [2] found that geopolitical experts consistently over-predicted the likelihood of dramatic events occurring. In fact, most periods are much like the period immediately preceding them, so "yesterday's weather" is usually a good bet.
- Dörner [3] gave people the simplest control problem: a thermostat. This problem involves only one controlled variable and one response variable. He asked subjects to keep a simulated room at a target temperature by controlling the rate of cold air flow into the room. Most subjects performed atrociously. Those who succeeded often formed elaborate theories of correct procedure. Those who failed often blamed the experiment or experimenters (Tetlock also recorded this phenomenon).
- Cronin et al. [4] presented a variety of highly educated, highly numerate subjects with a simple problem of estimating accumulations over time. Despite trying different subpopulations and variations on the questions and their presentation, subjects typically performed poorly. They discovered that most subjects were using a simple heuristic: that accumulations (stocks) are perfectly correlated with flows. But this is not generally true at all.
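The accumulation finding is easy to reproduce for oneself. A stock is the running sum of its net flow, not a mirror of the flow, so a stock can keep rising even while its inflow falls. A sketch with invented queue numbers:

```python
# A stock accumulates its net flow. Here the inflow falls every step,
# yet the stock keeps rising the whole time, because inflow never
# drops below outflow. The correlation heuristic predicts the opposite.
inflow = [10, 9, 8, 7, 6, 5]   # e.g. requests arriving per second
outflow = [4] * 6              # requests served per second
stock = [0]                    # queue depth, starting empty
for i, o in zip(inflow, outflow):
    stock.append(stock[-1] + i - o)
print(stock)  # [0, 6, 11, 15, 18, 20, 21] -- still growing as inflow shrinks
```

Anyone who has watched a backlog keep growing after "traffic went down" has met this effect in production.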
In short, human reckoning breaks down for both detail and dynamic complexity. We need machines to assist us to maintain an accurate mental model of the current and future states of the world. Software which reduces cognitive burden and human error is more psychologically predictable.
Economics is typically ill-treated by commentators, because most of their commentary reacts to gross caricatures of what economists actually say. In fact it is a vast and highly variegated subject.
Economists are able to provide explanations for a wide variety of phenomena. For example, the concept of externalities and the study of how they are distributed can explain the origin of DevOps. Coasean theories of the firm explain software architecture and the rise of microservices. The theory of incomplete contracts can be applied to explain the roots of abstraction layers. Information economics can describe the spread of technology in terms of "information cascades", where observation of other agents can rationally obliterate one's own beliefs about the world.
It goes on and on. And so will I. But for now I leave you with a question to ponder: who is responsible for the reliability of a production system? Why do you think that way?
[1] Donald J. Wheeler & David S. Chambers. Understanding Statistical Process Control, 3rd Ed.
[2] Philip E. Tetlock. Expert Political Judgment: How Good Is It? How Can We Know?
[3] Dietrich Dörner. The Logic Of Failure: Recognizing And Avoiding Error In Complex Situations.
[4] Matthew A. Cronin, Cleotilde Gonzalez & John D. Sterman. "Why don't well-educated adults understand accumulation? A challenge to researchers, educators, and citizens". Organizational Behavior and Human Decision Processes, 108(1), 116–130.