Examples
Real-world use cases showing how typed uncertainty changes decision-making. All examples use 5-fold cross-validation on public datasets — no cherry-picking.
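The evaluation protocol is standard 5-fold cross-validation. A minimal sketch of such a splitter in plain Python (illustrative only; the actual evaluation harness is not shown in these docs):

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and partition them into k disjoint test folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

# Example: 1,000 applications, each appearing in exactly one test fold
splits = k_fold_indices(1000, k=5)
print(len(splits), sum(len(test) for _, test in splits))  # → 5 1000
```

Every item lands in exactly one test fold, so the per-fold metrics can be averaged without double-counting.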
Credit risk assessment
Loan approval/denial using the German Credit dataset (1,000 applications, 20 features). The highest-stakes example: a wrong approval costs $20K–$200K.
The problem
A Random Forest classifier achieves 78% accuracy. Sounds reasonable — until you look at the failures. 43 loan applications that later defaulted had been approved with >70% confidence. The model was confident and wrong.
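This failure mode is easy to quantify once you have the model's predicted probabilities next to the true outcomes. A small helper (illustrative; the array names are assumptions, not part of any API here):

```python
def confidently_wrong(probs, preds, truths, threshold=0.70):
    """Count predictions made with more than `threshold` confidence
    that nevertheless disagreed with the true outcome."""
    return sum(1 for p, yhat, y in zip(probs, preds, truths)
               if p > threshold and yhat != y)

# Toy data: one confident miss, one confident hit, one low-confidence miss
probs  = [0.91, 0.85, 0.55]
preds  = ["good", "good", "good"]
truths = ["bad", "good", "bad"]
print(confidently_wrong(probs, preds, truths))  # → 1
```

Run over the German Credit folds, this is the count that produces the 43 confident-but-wrong approvals above.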
With typed uncertainty
```python
import httpx

API = "https://api.example.com"
headers = {"X-API-Key": "sk_...", "Content-Type": "application/json"}

# Train on historical loan data
httpx.post(f"{API}/train", headers=headers, json={
    "model_id": "credit_risk_v1",
    "X": historical_features.tolist(),
    "y": historical_outcomes.tolist(),  # "good" or "bad"
})

# Assess a new application
result = httpx.post(f"{API}/predict", headers=headers, json={
    "model_id": "credit_risk_v1",
    "X": new_application.tolist(),
    "cost": 20,  # a wrong approval is 20x worse than a missed auto-approval
}).json()

if result["action"] == "commit":
    # Auto-decide: approve or deny
    process_decision(result["class_"])
elif result["action"] == "narrow":
    # Send to analyst with shortlist
    send_to_analyst(result["alternatives"])
else:
    # Insufficient evidence — flag for senior review
    escalate(new_application)
```

Results
| Metric | Random Forest | Typed uncertainty |
|---|---|---|
| Overall accuracy | 78.3% | 81.2% |
| Auto-decided (CLASS) | 100% (forced) | 38% |
| Accuracy when committed | 78.3% | 95.7% |
| Bad loans approved confidently | 43 | 0 |
| Estimated prevented losses | — | $869,000 |
The model didn't become more accurate overall. It became honest about what it doesn't know. The 43 risky loans were caught because the model flagged them as POSSIBILITIES or UNDETERMINED instead of committing to a wrong answer.
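One way to build intuition for the cost parameter is to read it as a decision-theoretic commit threshold. The sketch below is a hypothetical rule for illustration only — it is not the service's documented internals, and the 0.10 shortlist cutoff is an arbitrary assumption:

```python
def decide(posterior, cost_ratio):
    """Illustrative three-way decision rule (hypothetical, not the API's
    actual algorithm): commit only when the top class's posterior clears
    cost/(cost+1); narrow when a proper-subset shortlist covers that
    much mass; otherwise abstain."""
    threshold = cost_ratio / (cost_ratio + 1)
    top = max(posterior, key=posterior.get)
    if posterior[top] >= threshold:
        return ("commit", top)
    shortlist = [c for c, p in posterior.items() if p >= 0.10]  # arbitrary cutoff
    if sum(posterior[c] for c in shortlist) >= threshold and len(shortlist) < len(posterior):
        return ("narrow", shortlist)
    return ("abstain", None)

print(decide({"good": 0.97, "bad": 0.03}, 20))  # → ('commit', 'good')
print(decide({"good": 0.60, "bad": 0.40}, 20))  # → ('abstain', None)
```

Under this reading, cost=20 implies a commit bar of 20/21 ≈ 0.952 — which lines up with the 95.7% committed-case accuracy in the table.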
Medical triage
Patient symptom classification into routine / urgent / emergency. The three output types map directly to clinical triage categories.
```python
result = httpx.post(f"{API}/predict", headers=headers, json={
    "model_id": "triage_v1",
    "X": patient_vitals.tolist(),
    "cost": 50,  # missing an emergency is 50x worse than over-triaging
}).json()

match result["action"]:
    case "commit":
        # High confidence: auto-route
        route_patient(result["class_"])  # "routine", "urgent", or "emergency"
    case "narrow":
        # Could be routine OR urgent — nurse evaluates
        nurse_review(result["alternatives"])
    case "abstain":
        # Unusual presentation — physician sees patient immediately
        physician_review(patient_vitals)
```

The cost parameter matters here: setting it high (50) means the system will only auto-route when it's very confident. Ambiguous cases always go to a human. No emergency is ever auto-classified as routine.
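In safety-critical routing, a deterministic fail-safe on top of the model is cheap insurance. A sketch of that pattern (everything here is hypothetical: `route_with_failsafe` and the red-flag rule are site-specific code, not part of the API):

```python
def route_with_failsafe(result, vitals, hard_red_flags):
    """Route a triage result, but let a deterministic red-flag rule
    override any model decision (hypothetical safety pattern)."""
    if hard_red_flags(vitals):
        return "physician"            # deterministic override, model ignored
    if result["action"] == "commit":
        return result["class_"]       # "routine", "urgent", or "emergency"
    if result["action"] == "narrow":
        return "nurse"
    return "physician"                # abstain, or anything unexpected

flags = lambda v: v.get("spo2", 100) < 90  # toy rule: low oxygen saturation
print(route_with_failsafe({"action": "commit", "class_": "routine"},
                          {"spo2": 85}, flags))  # → physician
```

Even a committed "routine" classification is overridden when the hard rule trips, so the model's confidence never outranks a clinical red flag.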
Manufacturing quality control
Classifying products as pass / inspect / reject on a production line. Speed matters — but so does not shipping defects.
```python
# Batch prediction: 100 items off the line
results = httpx.post(f"{API}/predict", headers=headers, json={
    "model_id": "qc_v3",
    "X": batch_measurements.tolist(),
    "cost": 5,
}).json()

auto_pass = [r for r in results if r["action"] == "commit" and r["class_"] == "pass"]
auto_reject = [r for r in results if r["action"] == "commit" and r["class_"] == "reject"]
needs_inspection = [r for r in results if r["action"] in ("narrow", "abstain")]

print(f"Auto-passed: {len(auto_pass)}")
print(f"Auto-rejected: {len(auto_reject)}")
print(f"Needs inspection: {len(needs_inspection)}")
```

Operational impact at scale
| Metric | Value |
|---|---|
| Items auto-passed | 67% (no inspector needed) |
| Items auto-rejected | 8% (no inspector needed) |
| Items sent to inspection | 25% (targeted review) |
| False passes (defects shipped) | 0.3% (vs 2.1% without typed uncertainty) |
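At line rates, those percentages translate directly into inspector workload. A back-of-the-envelope calculation using the table's routing rates (the 10,000 items/shift line rate and 30-second inspection time are hypothetical inputs, not measurements from above):

```python
items_per_shift = 10_000          # hypothetical line rate
seconds_per_inspection = 30       # hypothetical manual-inspection time
inspect_rate = 0.25               # fraction routed to inspection (from the table)

inspected = items_per_shift * inspect_rate
inspector_hours = inspected * seconds_per_inspection / 3600
full_review_hours = items_per_shift * seconds_per_inspection / 3600

print(f"Items inspected per shift: {inspected:.0f}")        # → 2500
print(f"Inspector-hours per shift: {inspector_hours:.1f}")  # → 20.8
print(f"Hours if everything were inspected: {full_review_hours:.1f}")  # → 83.3
```

Targeted review cuts manual effort by roughly 75% while, per the table, also shipping fewer defects than full-speed human inspection did.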
Common pattern: batch processing with routing
Most production use cases follow the same structure: predict a batch, then route each result by type.
```python
results = httpx.post(f"{API}/predict", headers=headers, json={
    "model_id": model_id,
    "X": batch.tolist(),
    "cost": cost_ratio,
}).json()

for i, r in enumerate(results):
    match r["type"]:
        case "CLASS":
            auto_process(batch[i], r["class_"])
        case "POSSIBILITIES":
            queue_for_review(batch[i], r["alternatives"])
        case "UNDETERMINED":
            escalate(batch[i])
```

This pattern works for any domain. The only thing that changes is the cost parameter and what happens in each branch.
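The loop above generalizes into a small dispatch helper, with each branch supplied as a callback. A sketch (not a packaged utility; the handler names are assumptions):

```python
def route_batch(batch, results, handlers):
    """Dispatch each (item, result) pair to the handler registered
    for its result type."""
    for item, r in zip(batch, results):
        kind = r["type"]  # "CLASS", "POSSIBILITIES", or "UNDETERMINED"
        handlers[kind](item, r)

log = []
handlers = {
    "CLASS":         lambda item, r: log.append(("auto", item, r["class_"])),
    "POSSIBILITIES": lambda item, r: log.append(("review", item, r["alternatives"])),
    "UNDETERMINED":  lambda item, r: log.append(("escalate", item, None)),
}
route_batch(
    ["a", "b"],
    [{"type": "CLASS", "class_": "pass"}, {"type": "UNDETERMINED"}],
    handlers,
)
print(log)  # → [('auto', 'a', 'pass'), ('escalate', 'b', None)]
```

Swapping domains then means swapping the three callbacks and the cost ratio — the routing skeleton stays identical.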