Client-side · BYOK · $0 to run

Stop paying flagship prices for trivial prompts.

ModelRoute scores every prompt's difficulty in your browser, then routes it to the cheapest model that should still handle it well — instead of defaulting to the most expensive one. It runs your prompt live with your own API key, measures real latency, computes real cost from actual token usage, and shows the savings against an "always use the top model" baseline, side-by-side.

Try the live tool · How the routing works

🔒 Your key stays in your browser, is sent only to your chosen provider over HTTPS, and is never logged or kept beyond the current tab.

1 The Router

Pick a sample prompt or write your own. With no key set, you get a transparent estimate using typical token counts. Add your own key below to run it live against both the routed model and the top model for real numbers.

Sample prompts (easy → hard) Prompt

Routing threshold: 35/100 — prompts scoring below this go to the CHEAP model

Provider

OpenAI (gpt-4o-mini / gpt-4o) · Anthropic (Haiku / Sonnet)

Your API key (optional — leave blank for estimate mode)

Stored only in this tab's sessionStorage and cleared when the tab closes. Never sent anywhere except the official endpoint for your selected provider.

Routing decision

Run a prompt to see the routing decision.

Session totals no runs yet

Requests run 0

Routed cost (total) $0.00

Baseline cost (total) $0.00

Overall % saved 0%

Recorded sample run (real numbers, captured earlier)

This table is neither live nor an estimate: it is a real run of all 6 samples against Anthropic's API, captured once and frozen here so the methodology can be verified even without a key. Run the batch above with your own key to reproduce it live.

Sample	Score	Tier	Routed model	Tokens in/out	Latency	Routed cost	Baseline cost	Saved

2 Methodology

Classifier (transparent, inspectable)

A heuristic scorer reads the raw prompt text in your browser — no network call — and produces a 0–100 difficulty score from signals including:

Estimated token length (very short prompts score low)
Presence of code (fenced blocks, common syntax tokens)
Math / proof / formal-reasoning keywords ("prove", "derive", "complexity")
Multi-step or analytical asks ("explain step by step", "analyze", "compare")
Number of distinct questions / sub-requests in one prompt
Ambiguity / open-endedness signals ("design", "architecture", "trade-offs")

Prompts scoring below your threshold route to the CHEAP model; everything else routes to the TOP model. The exact score and the specific reasons that fired are shown for every request — nothing is hidden.
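
A minimal sketch of how a scorer of this kind could be put together. The weights, regular expressions, the 4-characters-per-token estimate, and the names SIGNALS, scorePrompt, and route are all assumptions made for illustration; they are not ModelRoute's actual values. Only the signal categories, the keywords quoted above, and the default threshold of 35 come from this page.

```javascript
// Hypothetical signal table: the weights and patterns are illustrative guesses.
const SIGNALS = [
  { reason: "contains code", weight: 25, test: (p) => /`{3}|\bfunction\b|=>|\bdef\b/.test(p) },
  { reason: "math / formal reasoning", weight: 25, test: (p) => /\b(prove|derive|complexity)\b/i.test(p) },
  { reason: "multi-step or analytical ask", weight: 20, test: (p) => /\b(step by step|analyze|compare)\b/i.test(p) },
  { reason: "open-ended design ask", weight: 20, test: (p) => /\b(design|architecture|trade-?offs)\b/i.test(p) },
];

function scorePrompt(prompt) {
  const reasons = [];
  let score = 0;

  // Rough token estimate: about 4 characters per token (a rule of thumb, assumed here).
  const estTokens = Math.ceil(prompt.length / 4);
  const lengthPoints = Math.min(20, Math.floor(estTokens / 10));
  if (lengthPoints > 0) {
    score += lengthPoints;
    reasons.push(`length ~${estTokens} tokens`);
  }

  // Each keyword signal contributes its weight at most once.
  for (const signal of SIGNALS) {
    if (signal.test(prompt)) {
      score += signal.weight;
      reasons.push(signal.reason);
    }
  }

  // Every question mark beyond the first is treated as an extra sub-request.
  const questions = (prompt.match(/\?/g) || []).length;
  if (questions > 1) {
    score += Math.min(10, (questions - 1) * 5);
    reasons.push(`${questions} questions`);
  }

  return { score: Math.min(100, score), reasons };
}

// Scores below the threshold go to the CHEAP tier; everything else goes to TOP.
function route(prompt, threshold = 35) {
  const { score, reasons } = scorePrompt(prompt);
  return { score, reasons, tier: score < threshold ? "CHEAP" : "TOP" };
}
```

Returning the reasons next to the score is what lets the page show why a prompt was routed the way it was.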

Cost

Cost is (input_tokens × input_price + output_tokens × output_price) / 1,000,000, where the token counts come from the literal usage object each provider's API returns with the response and the prices are the per-1M-token rates in the price table below. In live mode, nothing is fabricated: if a call fails or usage is missing, cost shows as unavailable rather than guessed.
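
The formula can be sketched as follows. The price values here are placeholders invented for the example, not the rates in the price table, and the names EXAMPLE_PRICES, normalizeUsage, and computeCost are hypothetical. The field names reflect that OpenAI's Chat Completions usage object reports prompt_tokens and completion_tokens, while Anthropic's Messages API reports input_tokens and output_tokens.

```javascript
// Placeholder prices in USD per 1M tokens. Hypothetical values for illustration only.
const EXAMPLE_PRICES = {
  cheap: { input: 1.0, output: 4.0 },
  top: { input: 10.0, output: 40.0 },
};

// Map either provider's usage object onto one shape, or null if it is missing.
function normalizeUsage(usage) {
  if (!usage) return null;
  const input = usage.input_tokens ?? usage.prompt_tokens;
  const output = usage.output_tokens ?? usage.completion_tokens;
  if (!Number.isFinite(input) || !Number.isFinite(output)) return null;
  return { input_tokens: input, output_tokens: output };
}

// Returns the cost in USD, or null ("unavailable") instead of guessing.
function computeCost(rawUsage, price) {
  const usage = normalizeUsage(rawUsage);
  if (usage === null) return null;
  return (usage.input_tokens * price.input + usage.output_tokens * price.output) / 1_000_000;
}
```

With the placeholder cheap prices, 1,000 input tokens and 500 output tokens come to (1,000 × 1.0 + 500 × 4.0) / 1,000,000 = $0.003.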

Latency

Measured client-side with performance.now() immediately before the request is sent and immediately after the response resolves — wall-clock time for the full round trip, including network.
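
As a rough sketch, the measurement could wrap any request function like this. The helper name timed and its sendRequest callback are hypothetical; performance.now() is the standard high-resolution timer.

```javascript
// Times an async operation from just before it starts to just after it resolves.
// `sendRequest` is any function that returns a promise, such as a fetch wrapper.
async function timed(sendRequest) {
  const start = performance.now();
  const response = await sendRequest();
  const latencyMs = performance.now() - start;
  return { response, latencyMs };
}
```

Because the timer stops only once the promise resolves, the figure includes network transit and the provider's processing time, not just model inference.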

Baseline

Every live request is sent twice: once to whichever model the router chose, and once to the TOP model for that provider — so the savings comparison and the quality comparison are both real, not assumed.
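
Given a routed cost and a baseline cost for each request, the savings figures could be derived along these lines. This is a sketch under assumptions: the function names and the shape of the runs array are hypothetical, and runs with an unavailable cost are simply skipped.

```javascript
// Percentage saved relative to the baseline, or null when it cannot be computed.
function percentSaved(routedCost, baselineCost) {
  if (routedCost == null || baselineCost == null || baselineCost <= 0) return null;
  return ((baselineCost - routedCost) / baselineCost) * 100;
}

// Aggregates a list of { routedCost, baselineCost } runs into session totals.
function sessionTotals(runs) {
  let routed = 0;
  let baseline = 0;
  let counted = 0;
  for (const run of runs) {
    if (run.routedCost == null || run.baselineCost == null) continue; // cost unavailable
    routed += run.routedCost;
    baseline += run.baselineCost;
    counted += 1;
  }
  return { requests: counted, routed, baseline, saved: percentSaved(routed, baseline) };
}
```

Computing the overall percentage from the summed costs, rather than averaging per-request percentages, keeps large requests from being underweighted.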

3 Price table

Public list prices per 1M tokens, as of June 24, 2026. Prices change — edit the PRICES object near the top of app.js if a provider updates theirs. The cost math above reads directly from this table at request time.

Provider	Tier	Model	Input $/1M	Output $/1M

4 Limitations & threat model

Honest limitations

The classifier is a heuristic, not a guarantee — it can misjudge an unusually phrased prompt in either direction.
Real-world savings depend entirely on your prompt mix, your chosen threshold, and current provider prices — the demo numbers are illustrative, not a promise.
"Handles it well" is judged by you, looking at both responses — ModelRoute does not score output quality automatically.
Token estimates in estimate mode are typical-case approximations; only live (BYOK) mode uses the provider's real usage counts.

Security model

Endpoint allowlist: requests can only go to api.openai.com or api.anthropic.com — both hardcoded and frozen in code. There is no field anywhere to type a custom upstream URL, so this page can never become an open relay.
Key storage: your key lives in sessionStorage for this tab only. It is cleared when the tab closes, never written to localStorage, never logged to the console, and never rendered back to the screen.
Empty-key guard: live mode refuses to fire a network request with an empty key and shows an inline error instead.
Output handling: all model output is inserted with textContent — never innerHTML, never eval — so a response can't execute as markup or script.
Transport: both allowlisted endpoints are HTTPS-only.
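
The allowlist, HTTPS, and empty-key checks above could be combined into a single guard that runs before any request. This is a sketch only: ALLOWED_HOSTS and assertAllowedRequest are hypothetical names, and the real app.js may structure these checks differently.

```javascript
// Frozen so the list cannot be extended at runtime.
const ALLOWED_HOSTS = Object.freeze(["api.openai.com", "api.anthropic.com"]);

// Throws unless the request is HTTPS, targets an allowlisted host, and has a non-empty key.
function assertAllowedRequest(url, apiKey) {
  const parsed = new URL(url);
  if (parsed.protocol !== "https:") {
    throw new Error("Only HTTPS endpoints are allowed");
  }
  // Compare the exact hostname so lookalikes such as api.openai.com.evil.example fail.
  if (!ALLOWED_HOSTS.includes(parsed.hostname)) {
    throw new Error("Host is not on the allowlist");
  }
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("API key is empty");
  }
  return parsed;
}
```

Matching the parsed hostname exactly, rather than checking whether the URL string contains an allowed name, is what stops a lookalike domain from slipping through.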