Engineering pattern · Updated April 2026

Kill-switch pattern for AI cost control

Per-feature cost ceiling, automatic shutoff, alert, fallback. Five lines of middleware. Saves teams from $5K to $50K surprises monthly. Every AI feature in production should have one. Below: the pattern, code for Node and Python, and the operational playbook.

The cap

5 lines

Of middleware, before any AI call

  • Per-feature ceiling
  • Automatic shutoff + alert
  • Fallback path on every feature

The pattern

Architecture in one diagram

[Request]
   │
   ▼
[Middleware: cost check]
   ├── Under ceiling: pass through
   └── Over ceiling: disable feature, alert, fallback
   │
   ▼
[AI provider call]
   │
   ▼
[Cost record per feature]
   │
   ▼
[Response]

The five components

What every kill-switch needs

Drop any one of these and the pattern fails: silent overage, no diagnosis, or a UX cliff.

  • Per-feature ceiling

    Not one global ceiling for all AI calls. Per-feature, so you know which feature is the runaway. Maps cleanly to product feature flags.

  • Cost recording per call

    Both OpenAI and Anthropic return token counts. Multiply by the current model rate. Store with feature tag, customer ID, timestamp.

  • Threshold alerts at 50/80/100%

    Slack or PagerDuty. Without the 80% alert, the first signal is a full kill. With it, you can raise the ceiling deliberately or investigate.

  • Automatic shutoff

    When monthly spend hits the ceiling, the feature returns a fallback response instead of calling the model. Atomic, no human intervention.

  • Fallback path

    Cached response, non-AI alternative, or a clear “temporarily unavailable” message. Pick based on user impact; never crash the request.

Code: Node.js

Drop-in middleware (TypeScript)

Wrap every AI call. Ceiling map per feature, atomic check, atomic record, alert at 80%, kill at 100%.

// middleware/aiKillSwitch.ts
import { db } from "@/db";
import { alertSlack } from "@/lib/alerts";

const FEATURE_CEILINGS: Record<string, number> = {
  support_agent: 500,    // $500/month
  doc_search: 300,
  autocomplete: 1000,
};

export async function withKillSwitch<T>(
  featureKey: string,
  fn: () => Promise<{ result: T; costUsd: number }>,
) {
  const monthSpend = await db.aiCost.sum({
    feature: featureKey,
    month: getCurrentMonth(),
  });

  const ceiling = FEATURE_CEILINGS[featureKey] ?? 100;

  if (monthSpend >= ceiling) {
    await alertSlack(`KILL: ${featureKey} hit $${ceiling} ceiling`);
    return { fallback: true, message: "Feature temporarily unavailable" };
  }

  if (monthSpend >= ceiling * 0.8) {
    await alertSlack(`WARN: ${featureKey} at 80% of $${ceiling}`);
  }

  const { result, costUsd } = await fn();

  await db.aiCost.insert({
    feature: featureKey,
    month: getCurrentMonth(),
    cost_usd: costUsd,
    timestamp: new Date(),
  });

  return { result, fallback: false };
}

Code: Python

Same pattern, async Python

# middleware/ai_kill_switch.py
from typing import Callable, Awaitable
from db import db
from alerts import alert_slack

FEATURE_CEILINGS = {
    "support_agent": 500,    # $500/month
    "doc_search": 300,
    "autocomplete": 1000,
}

async def with_kill_switch(feature_key: str, fn: Callable[[], Awaitable[dict]]):
    month_spend = await db.ai_cost.sum(
        feature=feature_key,
        month=current_month(),
    )

    ceiling = FEATURE_CEILINGS.get(feature_key, 100)

    if month_spend >= ceiling:
        await alert_slack(f"KILL: {feature_key} hit ${ceiling} ceiling")
        return {"fallback": True, "message": "Feature temporarily unavailable"}

    if month_spend >= ceiling * 0.8:
        await alert_slack(f"WARN: {feature_key} at 80% of ${ceiling}")

    payload = await fn()

    await db.ai_cost.insert(
        feature=feature_key,
        month=current_month(),
        cost_usd=payload["cost_usd"],
        timestamp=now(),
    )

    return {"result": payload["result"], "fallback": False}

Multi-tenant

Per-customer ceiling on top of per-feature

Useful for B2B SaaS where a single customer can burn a feature budget. Add the customer check before the feature check.

// Per-customer ceiling on top of per-feature ceiling
const customerSpend = await db.aiCost.sum({
  customer_id: customerId,
  feature: featureKey,
  month: getCurrentMonth(),
});

if (customerSpend >= customerCeiling) {
  return {
    fallback: true,
    message: "Plan limit reached. Upgrade to continue?",
  };
}

Fallback strategies

Three options, ranked by user impact

Plain unavailable message

UX: WorstEffort: Lowest

“This feature is temporarily unavailable.” Cheap to ship, hard on users. Acceptable for low-traffic, non-critical features.

Cached response

UX: MediumEffort: Medium

Last successful response for similar input. Works well for autocomplete, search, summary use cases.

Non-AI fallback

UX: BestEffort: Highest

Static behavior that delivers some value (rule-based search, default ranking). Engineered up front; pays back every kill event.

Common mistakes

Five ways the pattern silently fails

The pattern is simple. The failure modes are subtle and only show up under load.

  • One global ceiling for all AI calls

    When you hit the cap, you cannot tell which feature is the runaway. Always per-feature.

  • No alert at 80%

    Full kill becomes the first signal. By then you have lost a day to investigating instead of raising the ceiling deliberately.

  • No fallback path

    UX cliff when the ceiling is hit. Always launch with a fallback, even a basic one.

  • Ceiling reset monthly without review

    Ratchet creep: ceiling rises every month because last month's overage is normalised. Quarterly review with a reason for every change.

  • Missing per-tenant cap

    One heavy customer burns the whole feature budget. Per-feature plus per-tenant ceiling for B2B SaaS.

FAQ

Common questions on AI cost control

What is an AI kill-switch?

A per-feature cost ceiling implemented in middleware. When monthly spend on a feature crosses a threshold, the feature is automatically disabled, an alert fires, and requests fall back to a non-AI experience. Five lines of code on most stacks; saves teams from $5K to $50K surprises monthly.

What about retry storms?

Add exponential backoff and a circuit breaker around AI calls separately. Kill-switch handles cost; circuit breaker handles availability. Both are needed for production AI features.

Does the kill-switch break user flows?

Only if you launched without a fallback. Always launch with a fallback (cached response, non-AI alternative, or a clear unavailable message). Kill-switch without fallback is a UX cliff, not a safety net.

How do I set the ceiling?

Start at 1.5× your highest reasonable forecast. Watch for two to three months. Raise on demonstrated need, with a reason in your runbook. Easier to explain a brief disable than a 5× over-spend.

What about prompt injection attacks?

Kill-switch catches the budget impact (e.g. an attacker burning your free tier). Prompt injection prevention is a separate concern: input validation, output filters, sandbox tools.

Should I also cap tokens per call?

Yes. Per-request token cap stops a single runaway prompt; kill-switch stops a feature trend. Different timescales, different mechanisms, both useful.

Are there tools that do this for me?

Helicone, LangSmith, and OpenAI usage limits each cover part of the pattern. Custom middleware gives more control over fallback logic and per-tenant ceilings; off-the-shelf tools are faster to start.

Free, 48-hour SLA, no sales call

Ship AI features with cost control built in

Every YATE Web build includes a kill-switch on every AI feature, fallback paths, and live cost dashboards from day one. Free Product Audit returns the ceiling and fallback design for your specific feature.