Service tiers & savings

Features

Service tiers & savings

Choose how each request is routed — by cost, speed, quality, or throughput — step up to enterprise capacity tiers, and understand how providers are ranked and how savings are credited.

Service types

Every request is routed under a service type that sets the optimization objective. Set it per workspace or override it per request with the service field.

request
{
  "model": "auto",
  "service": "saver",   // standard | saver | turbo | precision | scale | agent | custom
  "messages": [ ... ]
}

Standard

The balanced default. Neural Router scores eligible models on quality-per-dollar and dispatches to the best one, failing over automatically if a provider degrades. Use it when you have no strong cost or latency preference.

Saver

Optimizes for the lowest cost that still clears the request's quality bar — ideal for high-volume, cost-sensitive workloads like classification or summarization.

example
"service": "saver"

Turbo

Optimizes for the lowest latency, routing to the fastest healthy provider. Use it for interactive, user-facing calls where time-to-first-token matters most.

Precision

Optimizes for the highest quality, preferring top-scoring models regardless of price. Use it for hard reasoning, evaluation, or final-output generation.

Scale

Tuned for high throughput: spreads load across providers and absorbs bursts without throttling, so large batch jobs complete predictably.

Agent

Optimized for tool-calling and multi-step agent loops, favoring models with reliable function calling and stable structured output across turns.

Custom

Define your own objective and constraints — allowed models, regions, and per-request budgets — for full control over routing.

request
{
  "service": "custom",
  "routing": {
    "objective": "quality",
    "max_cost_usd": 0.02,
    "allow_models": ["gpt-4o", "claude-3-5-sonnet"]
  }
}

Enterprise tiers

Enterprise tiers layer reserved capacity and stronger guarantees on top of any service type.

Priority

Reserved capacity and elevated rate limits so your traffic is served ahead of standard demand during peaks.

Platinum

Everything in Priority plus stricter SLAs, faster support response targets, and enhanced observability and reporting.

Dedicated

Isolated, single-tenant capacity with custom SLAs, dedicated routing, and white-glove onboarding for the most demanding workloads.

Provider scorecard

The router continuously measures each provider's quality, latency, uptime, and price and rolls them into a scorecard. Routing decisions rank eligible providers by these live metrics, so traffic naturally shifts toward the best-performing options.

Provider eligibility

Before scoring, Neural Router filters the provider set by your policies — data region, allow-lists, BYOK requirements, and current health. Only eligible providers are ranked, so a request never routes somewhere your governance rules forbid.

Value-based savings

Savings figures are value-based: each routed (or cached) request is credited against what the equivalent direct provider call would have cost. The dollars shown therefore reflect real avoided spend, not list-price guesses.

Dashboard reference

Each feature in the dashboard carries a (?) icon that links back to the matching reference section below.

Overview

The Overview gives a snapshot of the active workspace — recent spend, requests, and tokens — with quick links into keys, usage, and routing so you can jump straight to whatever needs attention.

Usage

Usage charts spend, requests, and token volume over time for the active workspace, broken down by model and by key so you can see where consumption concentrates.

Logs

Logs is a live feed of recent inference requests — model, status, latency, and cost per call — for quickly spotting errors, slow calls, or unexpected spend.

Model catalog

The model catalog lists every model the router can dispatch to, with its provider, context window, and per-million-token pricing for prompt, completion, and cached tokens.

API keys

Create, rotate, and revoke API keys. Each key carries a service type, an optional spend limit, an allowed-models list, and a rate limit — all enforced by the router on every request.

Routing policy

The routing policy is the workspace default: the service type and candidate models applied to any request that doesn't override them. Per-request and per-key settings take precedence over this default.

Enterprise tiers (reservations)

Reserve dedicated capacity for a workspace and choose a tier — Priority, Platinum, or Dedicated — that layers stronger guarantees on top of whatever service type each request uses. See the tier descriptions above for what each one adds.

Cost Advisor

Cost Advisor surfaces estimated savings, spend anomalies against the trailing baseline, and one-click suggestions such as moving a key to a cheaper service type when its traffic allows it.

Billing

Billing shows your plan and its limits alongside monthly statements aggregated from real spend and top-ups recorded in the ledger — never fabricated rows.

Credits

Credits is your org's prepaid balance. Buy credits and configure auto top-up — a threshold and refill amount — so traffic never stalls on an empty balance.

Team

Team manages org members and pending invites. Roles — owner, admin, and member — govern who can change keys, billing, and routing policy.

Workspaces

Each workspace isolates its own API keys, routing policy, and usage. Switching the active workspace scopes the rest of the dashboard to it.

Presets

A preset is a saved, named routing configuration — a service type plus tuned intent dimensions — that you can reuse across keys instead of re-tuning the same Custom settings each time.

Provider reference

If your organization serves models on the network, the supply-side pages below each carry their own (?) reference.

Provider hub

The Provider hub is the supply-side home for organizations serving models — application status, listed models, endpoints, and earnings in one place.

Audition

New provider endpoints start in audition: the router probes them on live-shadow traffic and only promotes them to serve real requests once they clear the quality and uptime bar.

Earnings

Earnings reports the revenue from traffic routed to your models, broken down by model and request volume over the selected period.

Yield

Yield turns the router's real routing decisions into supply-side telemetry: how often each of your models won, why it lost when it didn't (price, quality, latency, or eligibility), and — for price losses — the average price gap to the winner. Use it to see where a small price move would win volume.

Endpoints

Endpoints lists the API endpoints you serve along with the router's last connectivity probe — reachability, status code, and latency — so you can confirm the router can reach you.

Provider models

Provider models are the models you list on the network, including family, context window, and the input/output pricing the router quotes to buyers.

Where to set this

Pick a default service type per workspace, then override per request with the service field. The (?) icons across the dashboard link back to the matching section here.