Provider routing

Routing

Provider routing

A model is often served by several providers. Neural Router load-balances across them for uptime and cost, and you can shape that behavior per request with the provider object.

Default behavior

By default, requests are load-balanced across the providers that serve a model, weighted toward lower price and recent uptime, and the router fails over to the next provider if one errors or times out. This maximizes reliability without any configuration.

The provider object

Add a provider object to a chat completion request to control selection:

provider object
"provider": {
  "order": ["acme", "openai"],        // try these providers, in order
  "allow_fallbacks": true,             // fall back to others if they fail
  "only": ["acme", "anthropic"],       // allowlist
  "ignore": ["some-provider"],         // denylist
  "sort": "price",                     // or "latency" | "throughput"
  "data_collection": "deny",           // exclude providers that retain data
  "zero_retention": true,              // route only to zero-retention endpoints
  "max_price": { "input_per_m": 1.0 }  // spend ceiling per million tokens
}

Ordering & fallbacks

order tries the listed providers first, in order; any not listed are used as fallbacks unless allow_fallbacks is false. With fallbacks off, the request fails if your listed providers are all unavailable.

Sorting

Setting sort disables load balancing and tries providers in order of the chosen metric — lowest price, lowest latency (p50), or highest throughput. Latency and throughput are tracked as rolling percentiles per provider.

Filtering & compliance

only / ignore create allow/deny lists. data_collection and zero_retention restrict routing to providers that meet your data policy — the same guarantees you can enforce workspace-wide under Zero-retention & residency.