In 2020, a Citibank operator intended to send a $7.8 million interest payment to Revlon's lenders. What arrived was $900 million: the full principal of the loan.
This was not a typo. The operator used Oracle Flexcube's "principal forgiveness" payment type, which required setting three non-obvious fields ("front," "fund," and "principal") to prevent a full repayment. The operator set them incorrectly. Three reviewers in Citibank's six-eyes protocol reviewed the same screen and missed the same error. All three. Simultaneously.
Citibank lost approximately $500 million when courts ruled the lenders were not required to return the inadvertent payment. A judge analyzing the case concluded that the Flexcube interface "violates almost every single design heuristic known to mankind."
The question worth sitting with isn't how Citibank hired people who make that mistake. It is what class of interface design produces identical errors across multiple trained, independent reviewers.
What is mode confusion in software design?
That class has a name. Mode confusion is what happens when a system multiplies operational states without supporting the cognitive work of tracking which one you are in. Human-factors research named it three decades ago; it is a design failure, not user error, and its fingerprint is exactly what that screen produced: trained people, checking independently, failing identically.
Sarter and Woods named it in 1995
Nadine Sarter and David Woods formalized mode confusion in their 1995 paper "How in the World Did We Ever Get into That Mode?", a landmark in human factors research with over 860 citations. Their taxonomy identified four failure types:
- Lack of modal awareness: operators do not know which mode the system is in or how they entered it.
- Inaccurate mental models: incomplete understanding of how modes interact.
- Indirect mode activation: modes that change through non-obvious triggers rather than direct commands.
- Delayed feedback: time between input and observable result long enough to make errors hard to detect.
Their key finding: "When designers proliferate modes without supporting new cognitive demands, new mode-related error forms and failure paths result."
Aviation had already accumulated the body count. Korean Air Flight 007 strayed into Soviet airspace after a mode transition failure: the aircraft was in heading mode instead of inertial navigation. Asiana Flight 214 lost airspeed because a co-pilot believed the auto-throttle was in automatic intervention mode when it was not. Victor Riley, writing in 2005, noted that despite three decades of research, aviation interfaces had "changed little and mode confusion continues to contribute to safety events."
The pattern migrated into enterprise software as soon as enterprise software accumulated enough modes to matter.
Working memory is not the problem
The instinct when reading the Citibank case is to reach for a capacity argument: the interface was too complex, users needed better training, the six-eyes process needed a fourth reviewer. That is the wrong frame.
The research on working memory is directional, not definitive on exact numbers. But its core finding is consistent: human working memory capacity under realistic conditions is limited (classic estimates suggest 4 +/- 1 items when cognitive demand is high), and context switching consumes a significant share of productive cognitive capacity. Under time pressure, these effects compound. Research in Frontiers in Psychology has documented that time pressure restricts cognitive resource allocation and reduces decision accuracy. These are effects Sarter and Woods identified thirty years ago under the specific label of "delayed feedback," which makes errors harder to detect.
The design implication is not that users need more capacity but that modes must be designed to work within the capacity that exists, not the capacity the designer assumed.
The banner blindness problem compounds this
There is a second mechanism operating on top of mode confusion: habituation.
Anderson et al.'s 2016 JMIS study "From Warning to Wallpaper" used fMRI to observe warning habituation directly. The key finding: "A dramatic drop in visual processing centers of the brain after only the second exposure to a warning, with further decreases with subsequent exposures."
A static environment indicator, whether a yellow "STAGING" banner or a red "TEST MODE" badge, stops registering in the user's brain after as few as two exposures. The banner is still visible to the camera. It is not visible to the user.
Anderson and Kirwan's 2015 CHI paper on polymorphic warnings showed that warnings which change their appearance, varying size, color, and text ordering, sustained neural activation significantly longer; habituation only set in on average after the 13th variation, versus 2 for conventional static warnings. That is a 6.5x difference in the amount of exposure before the warning becomes invisible.
The enterprise software implications are direct: any system that uses a single persistent visual indicator to communicate mode, "you are in production," "you are in training," is shipping a design that neurologically stops working within minutes of the user starting work for the day.
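One way to picture the alternative is a mode banner that changes its appearance between exposures instead of staying fixed. What follows is a minimal sketch, not the method used in the studies above: the names BannerVariant, banner_variants, and variant_for_exposure, and the specific colors, sizes, and wordings, are all hypothetical, and nothing here implies that these particular variations reproduce the measured effect.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BannerVariant:
    """One rendering of the same mode message (all values are hypothetical)."""
    color: str
    size: str
    text: str


def banner_variants(mode: str) -> list[BannerVariant]:
    """Build every combination of color, size, and word order for a mode."""
    colors = ["red", "orange", "magenta"]
    sizes = ["large", "xlarge"]
    texts = [f"{mode}: changes are live", f"Changes are live: {mode}"]
    return [BannerVariant(c, s, t) for c, s, t in product(colors, sizes, texts)]


def variant_for_exposure(mode: str, exposure: int) -> BannerVariant:
    """Pick the variant for the nth time the user sees the banner.

    The stride of 5 is coprime with the 12 variants, so every variant is
    shown once before any of them repeats.
    """
    variants = banner_variants(mode)
    return variants[(exposure * 5) % len(variants)]
```

The message itself never changes; only its presentation does, so the user is not asked to learn twelve different warnings.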
The GitLab postmortem as a case study in mode error
In 2017, a GitLab engineer ran a cleanup command on the production database instead of the replica. The engineer's explanation in the public postmortem: "The primary and secondary databases looked identical in the terminal." The result was 300GB of production data lost.
All five of GitLab's backup methods had independently failed. But the root cause of the triggering action was mode confusion: two environments, visually indistinguishable, one bearing full production data.
GitLab published the incident openly, including the contributing factors and corrective actions. The incident is a clean example of the Sarter/Woods taxonomy: the operator lacked modal awareness of which environment they were in, the interfaces provided insufficient differentiation, and the feedback from the incorrect action came too late.
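A guard against this kind of environment mix-up can be sketched in a few lines. This is an illustration under stated assumptions, not GitLab's corrective action: the function names and the example.com host names are hypothetical, and the idea is only that a destructive command refuses to run unless the operator types the name of the host they believe they are on and it matches the host they are actually on.

```python
class WrongEnvironmentError(Exception):
    """Raised when the operator is not on the host they think they are on."""


def confirm_target(actual_host: str, typed_host: str) -> None:
    """Allow a destructive action only if the operator names the real host.

    In real use, actual_host would come from the machine itself, for example
    socket.gethostname(); it is a parameter here to keep the sketch testable.
    """
    if typed_host.strip().lower() != actual_host.strip().lower():
        raise WrongEnvironmentError(
            f"you typed {typed_host!r} but this shell is on {actual_host!r}"
        )


def wipe_data_directory(actual_host: str, typed_host: str) -> str:
    """Hypothetical destructive command, gated by the host check."""
    confirm_target(actual_host, typed_host)
    return f"data directory wiped on {actual_host}"
```

An operator who believes they are on the replica types the replica's name; if the shell is actually on the primary, the names disagree and the command stops before anything is deleted.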
The design rule
The research converges on a framework with three components.
First: minimize modes. Every mode the system has is a cognitive demand the user must track. If two modes can be collapsed, if a configuration that requires three fields can be replaced by a configuration that requires one, the right answer is almost always to collapse them.
Second: make modes unmissable, not just visible. NN/G research on visual indicators found that combining a unique icon with a color performed best across all usability metrics; users were 37% faster at locating items when indicators varied both in color and icon compared to text alone. The implication: a static color-coded banner is insufficient. At minimum, two of color, shape/icon, and text label should vary between modes, and the distinction should be applied persistently.
Third: add structural barriers, not visual reminders. The Citibank disaster did not require a better-looking screen; it required a workflow that made the accidental action structurally difficult. The difference between "a warning that you will send the full principal" and "a second confirmation screen that requires you to type the amount before proceeding" is not a UX preference. It is a category difference in failure prevention.
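The first and third rules can be sketched together. This is a minimal illustration under stated assumptions, not a description of Flexcube or of any fix Citibank adopted: PaymentIntent, outgoing_amount, and confirm_and_release are hypothetical names. One explicit intent replaces several interacting override fields, and the payment is released only if the operator retypes the amount the system is about to send.

```python
from decimal import Decimal, InvalidOperation
from enum import Enum


class PaymentIntent(Enum):
    """One explicit choice instead of several interacting override fields."""
    INTEREST_ONLY = "interest_only"
    PRINCIPAL_AND_INTEREST = "principal_and_interest"


class ConfirmationError(Exception):
    """Raised when the retyped amount does not match the computed amount."""


def outgoing_amount(intent: PaymentIntent, interest_due: Decimal,
                    principal_outstanding: Decimal) -> Decimal:
    """Derive the amount leaving the bank from the single declared intent."""
    if intent is PaymentIntent.INTEREST_ONLY:
        return interest_due
    return interest_due + principal_outstanding


def confirm_and_release(amount: Decimal, typed_amount: str) -> Decimal:
    """Release the payment only if the operator retypes the exact amount."""
    try:
        typed = Decimal(typed_amount.replace(",", "").strip())
    except InvalidOperation:
        raise ConfirmationError(f"{typed_amount!r} is not a number") from None
    if typed != amount:
        raise ConfirmationError(
            f"typed {typed:,} but the system would send {amount:,}"
        )
    return amount
```

An operator who expects to send the interest payment types 7,800,000. If the wrong intent was selected, the computed amount includes the principal, the two numbers disagree, and the release fails regardless of how many people glanced at the screen.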
Where minimalism is the wrong lesson
The rule is not "simplify everything." Some domains are irreducibly complex, and stripping a mode a power user needs to hit speed is its own failure. The claim is narrower and harder: never add a mode without supporting the cognitive demand it creates, and never assume the trained expert under time pressure is immune. Trained experts are the ones mode errors hit most. Collapse the modes you can; for the ones you cannot, make them unmissable and put a structural barrier in front of the irreversible ones.
Why this matters for ERP
Most ERP software is built for power users who will eventually memorize the modes. That design philosophy is explicit in the product teams I have talked to. "The learning curve is high, but once you are trained, you are productive."
Sarter and Woods spent thirty years demonstrating why this is wrong. Mode errors happen most to trained experts under time pressure: exactly the users who "know the system." The Citibank reviewers were not untrained. They were the six-eyes protocol.
The goal is not to remove all complexity from enterprise software but to refuse cognitive burden that adds no capability, and to stop assuming the user will always operate under ideal conditions. Citibank's three reviewers were presumably under pressure, in a time-bounded process, reviewing a routine payment.
"Of course, this is how I expected it to work" is not just a pleasant aspiration for business software. It is a safety requirement.
About AmanERP
AmanERP is an AI-native ERP for SMBs, built in public around a single promise: Aman means peace, and business software should feel calm instead of chaotic. This piece is part of taking that promise seriously as engineering.
A note on sources
The Citibank Flexcube case is documented across multiple legal and journalistic sources; the figure of approximately $500M reflects the portion of the payment Citibank could not recover per court rulings. The Sarter/Woods and Anderson/Vance studies are academic papers; the NN/G 37% figure is from NN/G visual indicators research. External readers should consult primary sources for exact figures.
