Whenever we experience a problem in production, the usual intervention I observe is introducing more gates on the way to production.

Typical examples are additional reviews and approvals, isolating work on a branch for longer, deploying to separate dev/QA/staging environments, adding more integration and e2e tests, etc.

As Jay Forrester would say: a good intervention point in the system, but pushed in the wrong direction.

Counterintuitively, as a result, you make things even worse.

Or, in other words, systems often deteriorate not in spite of our efforts to improve them, but because of them.

When you introduce more stages, it takes longer to reach production, which means a higher transaction cost per batch, which drives the average batch size up, which drives risk up exponentially and delays feedback. In the end, you have more risk and more rework.
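One way to see the link between transaction cost and batch size is the classic economic order quantity (EOQ) formula from inventory theory. The sketch below is an illustration under that borrowed model, not a measurement of any real pipeline: the function name and all the numbers (changes per week, cost of waiting, cost per release) are hypothetical.

```python
import math


def economic_batch_size(transaction_cost, holding_cost_per_item, demand_rate):
    """Classic EOQ formula: the batch size that minimizes the sum of
    per-release transaction cost and the cost of changes waiting to ship."""
    return math.sqrt(2 * transaction_cost * demand_rate / holding_cost_per_item)


# Hypothetical numbers: the team produces 20 changes per week, and each
# change costs 1 unit for every week it waits for a release.
for transaction_cost in (1, 4, 16, 64):
    batch = economic_batch_size(
        transaction_cost, holding_cost_per_item=1.0, demand_rate=20
    )
    print(f"cost per release {transaction_cost:>2} -> batch of about {batch:.1f} changes")
```

Under this model, every fourfold increase in the cost of a release doubles the batch size that looks economical. Each extra gate nudges the team toward shipping more at once.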

You need a faster, not slower, way to reach production. That lowers the relative cost of releasing smaller changes, which allows you to validate assumptions sooner and reduce the risk along the way.

Think about what smaller incremental step you could have made that would have lowered the risk of things going south.

As a thought experiment, which can serve as an enabling constraint, imagine that every commit you make lands in production the moment you commit.

How would you go about making the change you want to make with less risk?

Keep the cadence or speed it up, and lower the risk. In fact, it’s hard to lower the risk if you don’t keep or speed up the cadence.