ENGINEERING DISCIPLINE

Schema-Aware Gates: Validate, Don't Coerce

A missing field is not zero. If a CI gate pretends it is, the green check becomes false evidence.

Niraj Kumar · 2026-06-02 · 6 min read
Image: a structured drill receipt with its recovery-time row blank and a ghosted zero stamped into the gap beneath a green pass verdict.

A disaster-recovery drill that restored nothing passed as the fastest recovery in the project's history. The receipt said recovery time: zero seconds, outcome: pass. Nothing failed. No alert fired. The gate that was supposed to catch exactly this waved it through — because one line read a missing field with a default and turned absence into success.

I now use a blunt rule for it: a schema-aware gate validates, it does not coerce. Defaulting a missing field does not make the check safer — it swaps a correctness question ("is this document right?") for a completeness question ("does this document exist?"). The damage is quiet, which is what makes it dangerous.

The gate that approved a disaster that didn't recover

A disaster-recovery drill is supposed to prove one thing: that you can destroy your infrastructure and rebuild it. The proof is a committed receipt — a structured file recording the outcome, the recovery time, and the scope of what was validated. A freshness gate reads that receipt and asserts the drill actually happened and actually passed.

The gate had a single innocent-looking line. It read the recovery-time field with a default: if the field was missing, treat it as zero. One parameter, one fallback value, the kind of defensive default you write without thinking.

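```python
# Banned: turns absence into a pass
rto = receipt.get("rto_seconds", 0)

# Required: fail loud, name the field
rto = receipt.get("rto_seconds")
if rto is None:
    reject("missing rto_seconds in drill receipt")
elif rto <= 0:
    # A bare truthiness test (if not rto) would also reject 0, but mislabel it "missing".
    reject(f"rto_seconds={rto!r} is not a plausible recovery time")
# .get() keeps an absent scope from crashing with a bare KeyError; None never qualifies.
if receipt.get("scope") not in QUALIFYING_SCOPES:
    reject(f"scope {receipt.get('scope')!r} not in qualifying set")
```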

Then a different kind of receipt arrived — one from a drift-detection-only run, which checks for configuration drift but never destroys or restores anything. It had no recovery-time field, because no recovery happened. The gate read the missing field, substituted zero, and signed off: zero-second recovery, pass.

The gate did not fail. It approved a non-event as a success — and filed it as the best recovery number on record.
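To make the failure concrete, here is a minimal reproduction in Python. It assumes the receipt is a plain dict; rto_seconds and scope come from the snippet above, while the outcome field, the scope names, and both gate functions are hypothetical. The coercing gate passes the drift-only receipt with a zero-second recovery; the strict gate refuses it and names the missing field.

```python
# Hypothetical reproduction: rto_seconds and scope follow the drill receipt above;
# the "outcome" field and the scope names are assumptions for illustration.
QUALIFYING_SCOPES = ("full_restore",)


class GateRejected(Exception):
    """Raised when a drill receipt cannot support a pass verdict."""


def coercing_gate(receipt):
    # Banned shape: a missing rto_seconds silently becomes 0, which reads as a fast recovery.
    rto = receipt.get("rto_seconds", 0)
    return f"pass (rto={rto}s)" if receipt.get("outcome") == "pass" else "fail"


def strict_gate(receipt):
    # Required shape: absence and a non-qualifying scope are refusals, not readings.
    if receipt.get("rto_seconds") is None:
        raise GateRejected("missing rto_seconds in drill receipt")
    if receipt.get("scope") not in QUALIFYING_SCOPES:
        raise GateRejected(f"scope {receipt.get('scope')!r} not in qualifying set")
    if receipt.get("outcome") != "pass":
        raise GateRejected(f"outcome {receipt.get('outcome')!r} is not 'pass'")
    return f"pass (rto={receipt['rto_seconds']}s)"


# A drift-detection-only run: nothing was destroyed or restored, so no rto_seconds.
drift_only = {"outcome": "pass", "scope": "drift_detection"}

print(coercing_gate(drift_only))  # pass (rto=0s)
try:
    strict_gate(drift_only)
except GateRejected as err:
    print(f"rejected: {err}")  # rejected: missing rto_seconds in drill receipt
```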

The category error underneath it

The bug looks like a typo. It is actually a category error, and naming it is what makes it fixable.

A correctness gate asks: is this document right? A completeness gate asks: does this document exist? The moment a gate defaults a missing field, the question it answers changes without anyone noticing: it stops asking whether the document is right and asks only whether a document showed up. Once that swap happens, the gate will accept a document that is present but wrong, because it has invented a reading for the parts that are absent.

No gate at all is at least an honest admission that you don't know. The coercing version hands you a green check certifying that the thing you cared about was verified, even when the exact failure case sailed through. You will trust the check. That is the trap.

The bug is everywhere because the language makes it easy. In Python, get("field", 0) reads as defensive. It feels safe. It never throws. But in a validation context, failing loud is the entire feature. The fallback that protects you in application code is the thing that blinds you in a gate. The same shape hides wherever a schema gets parsed: coercing an enum to "unknown", substituting "default" for a missing mode, filling in an absent status with a placeholder. Each one converts a failure into a silent pass.
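Python's standard enum module makes the strict version of the enum case cheap: looking up a member by value already raises ValueError for anything it does not recognize, including None. A minimal sketch with hypothetical scope names: the coercing parser swallows that error and returns "unknown"; the strict parser re-raises it with the field, the source file, and the allowed values.

```python
from enum import Enum


class Scope(Enum):
    # Hypothetical scope values; the real schema's names are not given in the post.
    FULL_RESTORE = "full_restore"
    DRIFT_DETECTION = "drift_detection"


def parse_scope_coercing(raw):
    # Banned: an absent or unrecognized value becomes a harmless-looking "unknown".
    try:
        return Scope(raw).value
    except ValueError:
        return "unknown"


def parse_scope_strict(raw, source):
    # Required: absence and unrecognized values are errors naming the field and file.
    if raw is None:
        raise ValueError(f"{source}: missing required field 'scope'")
    try:
        return Scope(raw)
    except ValueError:
        allowed = sorted(member.value for member in Scope)
        raise ValueError(f"{source}: scope {raw!r} not in {allowed}") from None


print(parse_scope_coercing("full_restor"))  # unknown
print(parse_scope_strict("full_restore", "drills/latest.json"))  # Scope.FULL_RESTORE
```

A typo in the receipt ("full_restor") is exactly the case where the coercing parser hands back a plausible string and the strict one stops the gate.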

Two gates, one mistake

If this were a single careless line, it would not be worth a name. It is worth a name because it recurred.

The same family of bug appeared in a second gate in the same codebase: a backup-verification path where the reader's expected format silently diverged from the writer's actual format. The verification step looked for a manifest in a format the writer never produced, found nothing to check, and reported success. It ran as a silent no-op for thirteen days, reporting pass the entire time.

Two different gates, two different authors, the same underlying error: a check that resolves an absent or mismatched input into a passing result instead of refusing it. When the same mistake shows up twice, it isn't carelessness — it's a pattern the language and your instincts actively push you toward. That's when you name the rule.
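The backup-verification failure has a cheap structural guard: count what was actually checked, and refuse to pass when the answer is zero or when fewer entries parsed than the manifest contains. The post does not describe the real manifest format, so this sketch assumes a sha256sum-style "<digest>  <path>" line format and a caller-supplied checksum_of function; both are hypothetical.

```python
import re

# Hypothetical reader format: sha256sum-style lines, "<64 hex digits>  <path>".
# The post does not say what the real manifest looked like.
LINE = re.compile(r"^([0-9a-f]{64})  (\S+)$")


def verify_manifest(text, checksum_of):
    """Verify each manifest entry with checksum_of(path); refuse empty or partial runs."""
    lines = [line for line in text.splitlines() if line.strip()]
    parsed = [m.groups() for line in lines if (m := LINE.match(line))]
    # Guard 1: a reader that parsed nothing has verified nothing.
    if not parsed:
        return "fail: 0 manifest entries parsed (reader/writer format mismatch?)"
    # Guard 2: every non-blank line must parse, or the run only covered part of the backup.
    if len(parsed) != len(lines):
        return f"fail: parsed {len(parsed)} of {len(lines)} manifest lines"
    for digest, path in parsed:
        if checksum_of(path) != digest:
            return f"fail: checksum mismatch for {path}"
    return f"pass: {len(parsed)}/{len(lines)} entries verified"


# The writer switched to "path,digest"; without Guard 1 the loop would check nothing and pass.
csv_style = "db.dump," + "a" * 64
print(verify_manifest(csv_style, lambda path: "a" * 64))
# fail: 0 manifest entries parsed (reader/writer format mismatch?)
```

The guards do not need to know what the correct format is. They only need to notice that the verifier did less work than the manifest implies.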

The rule, in one line

Validate explicitly, never coerce to defaults.

Concretely: when a gate parses a versioned schema, every required field must be checked for presence and for a qualifying value before anything else runs. If the field is missing, or its value falls outside the allowed set, the gate rejects — and the rejection message names the offending field and the file it came from, so the failure is diagnosable in one read.

The banned shapes are specific: reading a numeric field with a zero default, coercing an enum to "unknown," substituting any string for an absent required value. The required shape is equally specific: if the field is absent or its scope isn't in the qualifying set, reject with a clear message. The discipline is narrower than "validate more" — never let an absence resolve into a pass.
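A minimal sketch of that rule for a JSON drill receipt, assuming the receipt is a top-level JSON object. The REQUIRED table, the allowed values, and validate_receipt are hypothetical; the shape is the point. Every required field is checked for presence and for a qualifying value before the receipt is returned, and every rejection line names the field and the file.

```python
import json
from pathlib import Path

# Hypothetical rules: each required field maps to a test its value must pass.
# Field names follow the drill receipt in this post; the allowed values are assumptions.
REQUIRED = {
    "outcome": lambda v: v == "pass",
    "rto_seconds": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool) and v > 0,
    "scope": lambda v: v in ("full_restore",),
}


class GateRejected(Exception):
    """Raised with every failing field, each tagged with the source file."""


def validate_receipt(path):
    """Load a receipt and return it only if every required field is present and qualifying."""
    receipt = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(receipt, dict):
        raise GateRejected(f"{path}: expected a JSON object, got {type(receipt).__name__}")
    errors = []
    for field, qualifies in REQUIRED.items():
        if field not in receipt:
            errors.append(f"{path}: missing required field {field!r}")
        elif not qualifies(receipt[field]):
            errors.append(f"{path}: field {field!r} has non-qualifying value {receipt[field]!r}")
    if errors:
        # Report every failure at once so one read of the log diagnoses the receipt.
        raise GateRejected("\n".join(errors))
    return receipt
```

Collecting all errors before raising is a design choice: a receipt with two bad fields is diagnosed in one run instead of two.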

Where this rule breaks

Validate-don't-coerce is not free, and there is a place it stops being worth it: a coercing gate beats a missing gate, even if it loses to a strict one. If the alternative to a defaulting check is no check at all — because writing the strict version blocks a release you need today — a default that catches the common case is a defensible interim, as long as you name it honestly. It is a completeness gate wearing a correctness gate's badge. The danger is not the fallback you can see; it is the fallback you have forgotten is there. Treat every fallback in a schema-parsing gate as debt with a name, not as finished work.
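One way to keep an interim fallback honest, sketched under assumptions: register each tolerated default with a reason, and make the verdict say when it leaned on one. The INTERIM_DEFAULTS registry, read_field, and verdict are hypothetical names. Any field not in the registry still fails loud.

```python
# Hypothetical registry: every tolerated default carries a reason, so it stays visible as debt.
INTERIM_DEFAULTS = {
    "rto_seconds": (0, "interim: strict check pending"),
}


def read_field(receipt, field, notes):
    """Return the field's value; fall back only to a registered interim default, and record it."""
    if field in receipt:
        return receipt[field]
    if field in INTERIM_DEFAULTS:
        value, reason = INTERIM_DEFAULTS[field]
        notes.append(f"{field} defaulted to {value!r} ({reason})")
        return value
    raise KeyError(f"missing required field {field!r}")


def verdict(receipt):
    notes = []
    rto = read_field(receipt, "rto_seconds", notes)
    # An absent outcome resolves to fail, never to pass.
    status = "pass" if receipt.get("outcome") == "pass" else "fail"
    if notes:
        # The badge says what it actually is: a pass that rests on assumed values.
        return f"{status}, with interim defaults (rto={rto}s; {'; '.join(notes)})"
    return f"{status} (rto={rto}s)"


print(verdict({"outcome": "pass"}))
```

The green check still appears, but its text separates "verified" from "assumed", and deleting the registry entry is how the debt gets paid down.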

There is also a real cost to strictness: a gate that rejects on every absent optional field becomes brittle in a different direction, failing loud on documents that were fine. The rule applies to required fields and qualifying values — the ones a pass actually depends on. Defaulting a genuinely optional field is not the bug; defaulting a field the verdict hinges on is. Match the strictness to what the pass is asserting, not to every key in the schema.
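A sketch of matching strictness to what the pass asserts, with a hypothetical split of the drill receipt into fields the verdict hinges on and fields that only enrich the report. Value checks are omitted here to keep the focus on presence; the field names beyond rto_seconds and scope are assumptions.

```python
# Hypothetical split: the verdict hinges on REQUIRED; OPTIONAL fields only enrich the report.
REQUIRED = ("outcome", "rto_seconds", "scope")
OPTIONAL = {"notes": "", "operator": "unrecorded"}


def parse_receipt(receipt, source):
    missing = [field for field in REQUIRED if field not in receipt]
    if missing:
        # Fail loud on the fields a pass depends on, naming each one and the file.
        raise ValueError(f"{source}: missing required field(s): {', '.join(missing)}")
    parsed = {field: receipt[field] for field in REQUIRED}
    # Defaulting here is fine: no pass/fail decision reads these fields.
    for field, default in OPTIONAL.items():
        parsed[field] = receipt.get(field, default)
    return parsed
```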

What to actually change Monday

Grep your gate code for the coercion shape (get("field", 0), or the same call with "" or "unknown" as the default) wherever you parse a schema you trust to make a pass/fail call. Each hit is a candidate completeness gate in disguise. For every required field, replace the fallback with an explicit presence-and-validity check that fails loud and names the field and source file.
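If your gates are written in Python, a syntax-aware search catches variants a text grep can miss, such as single quotes, extra spaces, or a call split across lines. A minimal sketch using the standard ast module (ast.unparse needs Python 3.9 or later); the script name and the SUSPECT_DEFAULTS tuple are illustrative. It is a heuristic: it also flags defaults on genuinely optional fields, which you then triage by hand.

```python
import ast
import sys

# Literal defaults the post flags as coercion shapes. Because False == 0 and 0.0 == 0,
# those literals are flagged too.
SUSPECT_DEFAULTS = (0, "", "unknown")


def find_coercions(source, filename="<string>"):
    """Return (filename, line, code) for each <expr>.get(key, <suspect literal>) call."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and len(node.args) == 2
            and isinstance(node.args[1], ast.Constant)
            and node.args[1].value in SUSPECT_DEFAULTS
        ):
            hits.append((filename, node.lineno, ast.unparse(node)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_coercions.py gates/*.py
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for filename, line, code in find_coercions(handle.read(), path):
                print(f"{filename}:{line}: {code}")
```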

Then add the test that proves it: feed the gate a document with the field missing and confirm it rejects. A gate with no committed test proving it rejects bad input is a green light you haven't earned. The whole value of a gate is that it says no to the right things.
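A minimal sketch of that test with the standard unittest module. The drill_gate function stands in for your real gate and is hypothetical; the test input mirrors the drift-only receipt from the incident. The second test checks the other direction, so strictness does not turn into rejecting valid receipts.

```python
import unittest


class GateRejected(Exception):
    """Raised by the gate when a receipt cannot support a pass."""


def drill_gate(receipt, source):
    # Hypothetical gate under test; substitute your real gate's entry point.
    if receipt.get("rto_seconds") is None:
        raise GateRejected(f"{source}: missing rto_seconds in drill receipt")
    return "pass"


class DrillGateRejectsBadInput(unittest.TestCase):
    def test_missing_rto_is_rejected_and_named(self):
        # The shape of the incident: a drift-only run has no recovery, so no rto_seconds.
        drift_only = {"outcome": "pass", "scope": "drift_detection"}
        with self.assertRaises(GateRejected) as ctx:
            drill_gate(drift_only, "drills/drift-only.json")
        self.assertIn("rto_seconds", str(ctx.exception))
        self.assertIn("drills/drift-only.json", str(ctx.exception))

    def test_complete_receipt_passes(self):
        # Guards the other direction: strictness must not reject a valid receipt.
        self.assertEqual(drill_gate({"rto_seconds": 420}, "drills/full.json"), "pass")
```

Run it with python -m unittest from the directory holding the test module (discovery looks for files named test*.py by default).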

A gate that can't say no is decoration.

Series linkage

Part 1 of 3 in the series "Gates that make rigor and speed compound." Read in order: this piece (schema-aware gates), then definition of done (cite an artifact, not a self-assertion), then RFC before ADR (the maybe layer that saves bad decisions).

About AmanERP

AmanERP is an ERP for SMBs built in public. We ship gates that fail loud — green checks are supposed to mean something, not decorate a pipeline. www.amanerp.com