Every team I've worked with eventually ships a "graceful" fallback path. The intent is honest: when the primary path is unhealthy, route around it, return a cached value, drop a feature, anything to keep the user from seeing a blank page. The implementation is usually small — a few hundred lines, an extra timeout, a circuit breaker.
And then, twelve months later, the fallback is the only path anyone trusts.
The pattern
The decline always looks the same. Month one brings a sharp, easy-to-spot incident: the fallback fires, the team congratulates itself, the postmortem ships. Then the fallback fires again, this time for a reason no one can immediately explain. It works, so the on-call moves on. Six weeks later it's firing daily. Eight weeks later it's firing during deploys. By month nine the team has built tooling to monitor the fallback, not the primary.
You've now built a second system, with second-class observability, that runs in production more often than the one you intended to ship. Your happy path has withered.
Why it happens
Three forces, all rational:
- The fallback is cheap to extend. It already returns "something". Adding one more special case is a single PR.
- The fallback owns the incident. When it fires, the on-call sees it. When the primary path quietly does its job, nobody gets a Slack ping.
- The fallback is forgiving. Latency budgets, freshness, consistency: all are held to a lower bar. Engineers naturally prefer the codepath that doesn't punish them.
What I do now
Two architectural choices, made early, that have stopped this for me:
- Treat the fallback as a separate product. Different SLO, different dashboards, different on-call rotation if you can afford it. If it's worth shipping, it's worth owning.
- Alert when the fallback ratio creeps up. Not when it fires — that's noisy. When the rolling 7-day ratio of fallback-served requests exceeds, say, 1%. That's the early warning that the primary is rotting.
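For the ratio alert, here is a minimal sketch of the idea in Python. It assumes you already export daily counts of total requests and fallback-served requests from whatever metrics system you use. The class name `FallbackRatioMonitor` and its methods are hypothetical and exist only for illustration. The 7-day window and 1% threshold are the example figures from above, not recommendations.

```python
from collections import deque

WINDOW_DAYS = 7          # rolling window length (example figure from the text)
ALERT_THRESHOLD = 0.01   # 1% of requests served by the fallback (example figure)


class FallbackRatioMonitor:
    """Hypothetical helper: tracks the rolling fallback-served ratio."""

    def __init__(self, window_days: int = WINDOW_DAYS,
                 threshold: float = ALERT_THRESHOLD) -> None:
        self.threshold = threshold
        # Each entry is (total_requests, fallback_requests) for one day.
        # maxlen drops the oldest day automatically once the window is full.
        self._days: deque[tuple[int, int]] = deque(maxlen=window_days)

    def record_day(self, total: int, fallback: int) -> None:
        """Append one day's request counts to the window."""
        if total < 0 or fallback < 0 or fallback > total:
            raise ValueError("need 0 <= fallback <= total")
        self._days.append((total, fallback))

    def ratio(self) -> float:
        """Fallback-served requests over all requests in the window."""
        total = sum(t for t, _ in self._days)
        fallback = sum(f for _, f in self._days)
        return fallback / total if total else 0.0

    def should_alert(self) -> bool:
        """True when the rolling ratio exceeds the threshold."""
        return self.ratio() > self.threshold
```

The ratio is computed over summed counts rather than as an average of daily percentages, so a low-traffic day cannot dominate the window. A single firing does not trip the alert by itself; it only fires once fallback traffic pushes the whole window past the threshold, whether through a slow creep or one very bad day.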
"A graceful degradation is a contract with your future self. Read the contract."
None of this is novel. It's just hard to do when the fallback is winning every postmortem.