Decision stress-testing: not whether the AI gives the right answer, but whether it avoids going wrong in a crisis

Most AI evaluation asks “does it give the right answer?” But the real risk lies not in a normal day but in a crisis: a price war, a stock shock, a demand collapse. Decision stress-testing measures how the system behaves in the worst case.

When an AI decision system is evaluated, the question asked is usually the same: “Does it give the right answer?” The system is tested under normal conditions, gives good answers, and is approved.

But the real test of a decision system is not on a normal day. The real test is in a crisis.

This is exactly the logic of the stress test used in finance: evaluating a system not under average conditions but under the worst ones. A bridge, likewise, is judged not by how it holds in good weather but by how it holds in a storm. An AI decision system should be evaluated the same way: not on a day when demand is calm, but at the moment a price war breaks out, stock collapses, or a competitor makes an aggressive move.

AI’s expensive mistakes happen not on a normal day but in a crisis: precisely under pressure, when speed is needed and the cost of error is highest. A system can be flawless on a normal day and a disaster in a crisis, and the only way to know this in advance is to test it against crisis scenarios.

The right question is not “does the AI give the right answer?” It is:

How does this AI behave in the worst case (price war, stock shock, demand collapse)? Even if it does not do the right thing, does it know not to do the wrong thing?

Not the right answer, but resilience

Giving the right answer under normal conditions is necessary but not sufficient. The real issue is whether the system is resilient under pressure.

An AI system works according to well-learned patterns under normal conditions. But a crisis, by definition, is outside the normal. It is an unexpected, extreme situation rarely seen in past data. If the system has not been tested with such situations, how it will behave in a crisis is unknown. And the unknown often turns out to be a bad surprise.

Resilience is not the same thing as giving the right answer. The right answer responds to “what should I do in this situation?” Resilience responds to “in this situation I do not recognize, do I at least know not to cause harm?” A stress test measures the second.

Not going wrong can matter more than going right

In a crisis, the first thing expected from a decision system is not a brilliant move. The first thing expected is that it does not make a disastrous move.

In a price war, an AI that automatically matches every competitor discount can commit margin suicide. In a stock shock, an AI that panics and recommends excessive ordering can amplify the crisis. In a demand collapse, an AI that trusts past patterns and produces a wrong forecast can lead to misallocated resources. In all these situations, finding the “right” move is hard, but avoiding the “disastrous” move is critical.

A well-designed decision system knows, in a crisis, when to stop, when to escalate to a human and when to act cautiously. Sometimes the best decision is to take no aggressive action at all. A stress test measures this “not going wrong” capability of the system.
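The stop, escalate and act-cautiously behaviour described above can be sketched as a guardrail around a pricing recommendation. This is only an illustrative sketch, not a reference implementation: the function name guarded_price_response, the 10% margin floor and the 5% automatic-cut cap are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ACT = "act"            # apply the recommendation as-is
    HOLD = "hold"          # take no aggressive action
    ESCALATE = "escalate"  # hand the decision to a human


@dataclass
class Decision:
    action: Action
    price: float
    reason: str


def guarded_price_response(
    current_price: float,
    unit_cost: float,
    competitor_price: float,
    min_margin: float = 0.10,    # hypothetical floor: 10% margin on price
    max_auto_cut: float = 0.05,  # hypothetical cap: 5% automatic price cut
) -> Decision:
    """Respond to a competitor price without matching it blindly."""
    if competitor_price >= current_price:
        return Decision(Action.HOLD, current_price, "competitor is not cheaper")

    # Lowest price that still keeps the required margin.
    floor_price = unit_cost / (1.0 - min_margin)
    # Relative cut needed to match the competitor.
    cut = (current_price - competitor_price) / current_price

    # A deep cut or a margin breach is a crisis signal: do not decide automatically.
    if cut > max_auto_cut or competitor_price < floor_price:
        return Decision(
            Action.ESCALATE,
            current_price,
            "matching would exceed the auto-cut cap or breach the margin floor",
        )

    return Decision(Action.ACT, competitor_price, "small cut within guardrails")


# A sudden 40% competitor cut is escalated; the price is left unchanged.
print(guarded_price_response(100.0, 70.0, 60.0))
```

The design choice here is that the guardrail never tries to find the best price in a crisis. It only narrows the set of moves the system may make on its own and sends everything else to a human.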

Designing stress scenarios

Decision stress-testing means running the system against realistic crisis scenarios before putting it into production. These scenarios vary by sector, but the logic is the same: construct the worst situations that are still plausible.

Price war: If the main competitor makes a sudden, deep cut, what does the system recommend?
Stock shock: If supply is suddenly cut or demand spikes, how does the system behave?
Demand collapse: If demand in a category unexpectedly drops, what does the system do?
Data corruption: If input data arrives partly wrong, does the system notice, or proceed blindly?
Conflicting signal: If two indicators point in opposite directions, what does the system do?

In each scenario, what is measured is not the system’s “right” answer but the quality of its behaviour: Did it stop? Did it escalate? Did it act cautiously? Or did it recommend a disastrous action under pressure? This is the crisis version of an evaluation set.
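One way to picture such a crisis evaluation set is a small harness that feeds each scenario to the system and labels the behaviour rather than the correctness of the answer. The sketch below is hypothetical throughout: toy_system stands in for the real system under test, and the input fields, thresholds and "disastrous" rules are assumptions chosen only to make the example run.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

# Behaviour labels, following the questions asked of each scenario.
STOPPED, ESCALATED, CAUTIOUS, DISASTROUS = "stopped", "escalated", "cautious", "disastrous"


@dataclass
class Scenario:
    name: str
    inputs: Dict[str, float]
    # Hypothetical rule: returns True if the system's output would be disastrous.
    is_disastrous: Callable[[Dict[str, float]], bool]


def toy_system(inputs: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for the system under test; replace with the real one."""
    # Data corruption: refuse to act on non-finite values.
    if any(not math.isfinite(v) for v in inputs.values()):
        return {"stop": 1.0}
    # Extreme demand movement: hand over to a human.
    if abs(inputs.get("demand_change", 0.0)) > 0.5:
        return {"escalate": 1.0}
    # Otherwise follow the competitor only half of the way.
    return {"price_change": 0.5 * inputs.get("competitor_price_change", 0.0)}


def grade(scenario: Scenario, output: Dict[str, float]) -> str:
    """Label the quality of the behaviour, not the correctness of the answer."""
    if output.get("stop"):
        return STOPPED
    if output.get("escalate"):
        return ESCALATED
    if scenario.is_disastrous(output):
        return DISASTROUS
    return CAUTIOUS


def run_suite(system, scenarios: List[Scenario]) -> Dict[str, str]:
    return {s.name: grade(s, system(s.inputs)) for s in scenarios}


SCENARIOS = [
    Scenario("price_war",
             {"competitor_price_change": -0.30, "demand_change": 0.0},
             lambda out: out.get("price_change", 0.0) <= -0.25),
    Scenario("demand_collapse",
             {"competitor_price_change": 0.0, "demand_change": -0.70},
             lambda out: out.get("price_change", 0.0) < 0.0),
    Scenario("data_corruption",
             {"competitor_price_change": -0.10, "demand_change": float("nan")},
             lambda out: "price_change" in out),
]

results = run_suite(toy_system, SCENARIOS)
for name, label in results.items():
    print(f"{name}: {label}")
```

With this toy system, the price war is handled cautiously, the demand collapse is escalated and the corrupted input makes the system stop. A real suite would need sector-specific scenarios and rules agreed with the people who own the decision.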

A stress test is a trust decision

Putting an AI system into production without preparing it for a crisis is like signing a contract without reading it. The system can work well for months on normal days; then in the first real crisis, in a situation it was never tested for, it makes an expensive mistake.

Decision stress-testing makes this risk visible in advance. It shows the system’s behaviour in the worst scenarios, before the real crisis comes, in a controlled environment. This way, before the system is put into production, a trust decision is made knowing its crisis behaviour: is this system resilient under pressure, or does it only work in good weather?
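The trust decision itself can be made explicit as a pre-production gate. The following is a minimal sketch under stated assumptions: the function release_gate, the scenario names and the "disastrous" label are hypothetical, and the rule (every required scenario must have been run, and none may end in a disastrous action) is one possible policy, not the only one.

```python
from typing import Dict, List, Tuple


def release_gate(results: Dict[str, str], required: List[str]) -> Tuple[bool, List[str]]:
    """Approve only if every required scenario ran and none ended disastrously."""
    problems: List[str] = []
    for name in required:
        if name not in results:
            # An untested crisis is an unknown, and is treated as a blocker.
            problems.append(f"{name}: not tested")
        elif results[name] == "disastrous":
            problems.append(f"{name}: disastrous action")
    return (not problems, problems)


# Hypothetical stress-test results, keyed by scenario name.
observed = {"price_war": "cautious", "stock_shock": "disastrous"}
approved, problems = release_gate(
    observed, ["price_war", "stock_shock", "demand_collapse"]
)
print("approved" if approved else "blocked", problems)
```

Treating an untested scenario as a blocker reflects the point made above: behaviour that has never been observed under pressure cannot be trusted by default.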

This is an extension of GDP’s fundamental stance: trust comes not from a claim that the system will never fail, but from knowing in advance how it behaves at the hardest moment.

Closing

Most AI evaluation asks “does it give the right answer?” and tests the system under normal conditions. But the real test of a decision system is in a crisis: price war, stock shock, demand collapse. AI’s expensive mistakes happen precisely here, under pressure and when the cost of error is highest.

Decision stress-testing runs the system against the worst scenarios before production and measures not the “right answer” but the quality of its behaviour: does it know, in a crisis, to stop, to escalate and not to make a disastrous move? Because not going wrong at the hardest moment often matters more than going right.

The right question is:

Are we testing whether the AI gives the right answer on a normal day, or whether it is resilient in a crisis?