5 lines
Of middleware, before any AI call
- Per-feature ceiling
- Automatic shutoff + alert
- Fallback path on every feature
Per-feature cost ceiling, automatic shutoff, alert, fallback. Five lines of middleware. Saves teams from $5K to $50K surprises monthly. Every AI feature in production should have one. Below: the pattern, code for Node and Python, and the operational playbook.
5 lines
Of middleware, before any AI call
The pattern
[Request] │ ▼ [Middleware: cost check] ├── Under ceiling: pass through └── Over ceiling: disable feature, alert, fallback │ ▼ [AI provider call] │ ▼ [Cost record per feature] │ ▼ [Response]
The five components
Drop any one of these and the pattern fails: silent overage, no diagnosis, or a UX cliff.
Not one global ceiling for all AI calls. Per-feature, so you know which feature is the runaway. Maps cleanly to product feature flags.
Both OpenAI and Anthropic return token counts. Multiply by the current model rate. Store with feature tag, customer ID, timestamp.
Slack or PagerDuty. Without the 80% alert, the first signal is a full kill. With it, you can raise the ceiling deliberately or investigate.
When monthly spend hits the ceiling, the feature returns a fallback response instead of calling the model. Atomic, no human intervention.
Cached response, non-AI alternative, or a clear “temporarily unavailable” message. Pick based on user impact; never crash the request.
Code: Node.js
Wrap every AI call. Ceiling map per feature, atomic check, atomic record, alert at 80%, kill at 100%.
// middleware/aiKillSwitch.ts
import { db } from "@/db";
import { alertSlack } from "@/lib/alerts";
const FEATURE_CEILINGS: Record<string, number> = {
support_agent: 500, // $500/month
doc_search: 300,
autocomplete: 1000,
};
export async function withKillSwitch<T>(
featureKey: string,
fn: () => Promise<{ result: T; costUsd: number }>,
) {
const monthSpend = await db.aiCost.sum({
feature: featureKey,
month: getCurrentMonth(),
});
const ceiling = FEATURE_CEILINGS[featureKey] ?? 100;
if (monthSpend >= ceiling) {
await alertSlack(`KILL: ${featureKey} hit $${ceiling} ceiling`);
return { fallback: true, message: "Feature temporarily unavailable" };
}
if (monthSpend >= ceiling * 0.8) {
await alertSlack(`WARN: ${featureKey} at 80% of $${ceiling}`);
}
const { result, costUsd } = await fn();
await db.aiCost.insert({
feature: featureKey,
month: getCurrentMonth(),
cost_usd: costUsd,
timestamp: new Date(),
});
return { result, fallback: false };
}Code: Python
# middleware/ai_kill_switch.py
from typing import Callable, Awaitable
from db import db
from alerts import alert_slack
FEATURE_CEILINGS = {
"support_agent": 500, # $500/month
"doc_search": 300,
"autocomplete": 1000,
}
async def with_kill_switch(feature_key: str, fn: Callable[[], Awaitable[dict]]):
month_spend = await db.ai_cost.sum(
feature=feature_key,
month=current_month(),
)
ceiling = FEATURE_CEILINGS.get(feature_key, 100)
if month_spend >= ceiling:
await alert_slack(f"KILL: {feature_key} hit ${ceiling} ceiling")
return {"fallback": True, "message": "Feature temporarily unavailable"}
if month_spend >= ceiling * 0.8:
await alert_slack(f"WARN: {feature_key} at 80% of ${ceiling}")
payload = await fn()
await db.ai_cost.insert(
feature=feature_key,
month=current_month(),
cost_usd=payload["cost_usd"],
timestamp=now(),
)
return {"result": payload["result"], "fallback": False}Multi-tenant
Useful for B2B SaaS where a single customer can burn a feature budget. Add the customer check before the feature check.
// Per-customer ceiling on top of per-feature ceiling
const customerSpend = await db.aiCost.sum({
customer_id: customerId,
feature: featureKey,
month: getCurrentMonth(),
});
if (customerSpend >= customerCeiling) {
return {
fallback: true,
message: "Plan limit reached. Upgrade to continue?",
};
}Fallback strategies
“This feature is temporarily unavailable.” Cheap to ship, hard on users. Acceptable for low-traffic, non-critical features.
Last successful response for similar input. Works well for autocomplete, search, summary use cases.
Static behavior that delivers some value (rule-based search, default ranking). Engineered up front; pays back every kill event.
Common mistakes
The pattern is simple. The failure modes are subtle and only show up under load.
When you hit the cap, you cannot tell which feature is the runaway. Always per-feature.
Full kill becomes the first signal. By then you have lost a day to investigating instead of raising the ceiling deliberately.
UX cliff when the ceiling is hit. Always launch with a fallback, even a basic one.
Ratchet creep: ceiling rises every month because last month's overage is normalised. Quarterly review with a reason for every change.
One heavy customer burns the whole feature budget. Per-feature plus per-tenant ceiling for B2B SaaS.
FAQ
A per-feature cost ceiling implemented in middleware. When monthly spend on a feature crosses a threshold, the feature is automatically disabled, an alert fires, and requests fall back to a non-AI experience. Five lines of code on most stacks; saves teams from $5K to $50K surprises monthly.
Add exponential backoff and a circuit breaker around AI calls separately. Kill-switch handles cost; circuit breaker handles availability. Both are needed for production AI features.
Only if you launched without a fallback. Always launch with a fallback (cached response, non-AI alternative, or a clear unavailable message). Kill-switch without fallback is a UX cliff, not a safety net.
Start at 1.5× your highest reasonable forecast. Watch for two to three months. Raise on demonstrated need, with a reason in your runbook. Easier to explain a brief disable than a 5× over-spend.
Kill-switch catches the budget impact (e.g. an attacker burning your free tier). Prompt injection prevention is a separate concern: input validation, output filters, sandbox tools.
Yes. Per-request token cap stops a single runaway prompt; kill-switch stops a feature trend. Different timescales, different mechanisms, both useful.
Helicone, LangSmith, and OpenAI usage limits each cover part of the pattern. Custom middleware gives more control over fallback logic and per-tenant ceilings; off-the-shelf tools are faster to start.
Free, 48-hour SLA, no sales call
Every YATE Web build includes a kill-switch on every AI feature, fallback paths, and live cost dashboards from day one. Free Product Audit returns the ceiling and fallback design for your specific feature.