Provider routing
A model is often served by several providers. Neural Router load-balances across them for uptime and cost, and you can shape that behavior per request with the provider object.
Default behavior
By default, Neural Router load-balances requests across the providers that serve a model, weighting the choice toward lower price and better recent uptime. If a provider errors or times out, the router fails over to the next one, so you get high reliability without any configuration.
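To make the default behavior concrete, here is a minimal Python sketch of weighted load balancing with failover. It is a hypothetical model, not Neural Router's implementation. The Provider fields, the uptime / price weighting, the helpers weight, attempt_order, route and fake_call, and the provider name "provider-b" are all assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    price: float   # hypothetical: price per million input tokens (must be > 0)
    uptime: float  # hypothetical: recent success rate in [0, 1]


def weight(p: Provider) -> float:
    # Assumed weighting: higher uptime and lower price mean more traffic.
    # The router's real weighting formula is not documented here.
    return p.uptime / p.price


def attempt_order(providers, rng):
    """Weighted shuffle: repeatedly draw a provider with probability proportional to its weight."""
    remaining = list(providers)
    ordered = []
    while remaining:
        pick = rng.choices(remaining, weights=[weight(p) for p in remaining])[0]
        ordered.append(pick)
        remaining.remove(pick)
    return ordered


def route(providers, call, rng=None):
    """Try providers in weighted order, failing over on errors or timeouts."""
    rng = rng or random.Random()
    failed = []
    for p in attempt_order(providers, rng):
        try:
            return p.name, call(p)
        except (RuntimeError, TimeoutError):
            failed.append(p.name)  # fail over to the next provider
    raise RuntimeError(f"all providers failed: {failed}")


def fake_call(p):
    # Simulated upstream call: "acme" always times out.
    if p.name == "acme":
        raise TimeoutError(f"{p.name} timed out")
    return f"response from {p.name}"


providers = [Provider("acme", 0.5, 0.99), Provider("provider-b", 1.0, 0.999)]
print(route(providers, fake_call, random.Random(0)))
# → ('provider-b', 'response from provider-b')
```

Whichever provider the weighted draw tries first, the request ends up on provider-b because acme always times out; that is the failover path described above.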
The provider object
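Add a provider object to a chat completion request to control provider selection. The // comments below are annotations only; remove them before sending, because JSON does not allow comments: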
"provider": {
  "order": ["acme", "openai"],         // try these providers first, in order
  "allow_fallbacks": true,             // fall back to other providers if these fail
  "only": ["acme", "openai"],          // allowlist
  "ignore": ["some-provider"],         // denylist
  "sort": "price",                     // or "latency" | "throughput"
  "data_collection": "deny",           // exclude providers that retain data
  "zero_retention": true,              // route only to zero-retention endpoints
  "max_price": { "input_per_m": 1.0 }  // price ceiling per million input tokens
}
Ordering & fallbacks
order tries the listed providers first, in order; any not listed are used as fallbacks unless allow_fallbacks is false. With fallbacks off, the request fails if your listed providers are all unavailable.
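The ordering and fallback rules can be modeled in a few lines of Python. This is a hypothetical sketch: candidate_order and route are illustrative helpers, "other-provider" is a made-up name, and it assumes that listed names which do not serve the model are skipped and that unlisted fallbacks are tried in the order the model's providers are given, whereas the real router may load-balance them.

```python
def candidate_order(serving, order, allow_fallbacks=True):
    """Providers a request would try, in sequence (hypothetical model).

    serving: providers that serve the model; order: the request's "order" list.
    """
    # Listed providers come first, in the requested order
    # (assumption: names that do not serve the model are skipped).
    listed = [name for name in order if name in serving]
    if not allow_fallbacks:
        return listed
    # Any provider not listed becomes a fallback.
    return listed + [name for name in serving if name not in listed]


def route(serving, order, allow_fallbacks, is_up):
    """Return the first available provider, or fail if none is reachable."""
    for name in candidate_order(serving, order, allow_fallbacks):
        if is_up(name):
            return name
    raise RuntimeError("no listed or fallback provider is available")


serving = ["acme", "openai", "other-provider"]
print(candidate_order(serving, ["openai", "acme"]))
# → ['openai', 'acme', 'other-provider']
print(candidate_order(serving, ["openai", "acme"], allow_fallbacks=False))
# → ['openai', 'acme']
```

With allow_fallbacks=False and both openai and acme down, route raises instead of reaching other-provider, which matches the failure case described above.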
Sorting
Setting sort disables load balancing: the router instead tries providers in order of the chosen metric, that is, lowest price, lowest latency (p50), or highest throughput. Latency and throughput are tracked per provider as rolling percentiles.
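A minimal Python sketch of the sort modes follows, assuming per-provider statistics are already available. RollingP50, ProviderStats, SORT_KEYS, sorted_providers, the window size and the provider names "provider-b" and "provider-c" are hypothetical; the sketch uses a median over a fixed-size window of recent samples as one plausible reading of "rolling p50", not the router's actual bookkeeping.

```python
from collections import deque
from dataclasses import dataclass
from statistics import median


class RollingP50:
    """Median (p50) over the most recent `window` samples."""

    def __init__(self, window=100):  # hypothetical window size
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically

    def add(self, value):
        self.samples.append(value)

    def value(self):
        return median(self.samples)


@dataclass
class ProviderStats:
    name: str
    price: float           # per million input tokens
    latency_p50_ms: float  # rolling median latency
    throughput_tps: float  # rolling throughput, tokens per second


# Sort keys: ascending for price and latency, descending for throughput.
SORT_KEYS = {
    "price": lambda s: s.price,
    "latency": lambda s: s.latency_p50_ms,
    "throughput": lambda s: -s.throughput_tps,
}


def sorted_providers(stats, sort):
    """Provider names in the order a sorted request would try them."""
    if sort not in SORT_KEYS:
        raise ValueError(f"unknown sort: {sort!r}")
    return [s.name for s in sorted(stats, key=SORT_KEYS[sort])]


p50 = RollingP50(window=3)
for ms in (100, 200, 900, 300):
    p50.add(ms)
print(p50.value())  # → 300 (median of the last three samples: 200, 900, 300)

stats = [
    ProviderStats("acme", 0.5, 420, 80),
    ProviderStats("provider-b", 1.0, 250, 120),
    ProviderStats("provider-c", 0.8, 300, 150),
]
print(sorted_providers(stats, "price"))       # → ['acme', 'provider-c', 'provider-b']
print(sorted_providers(stats, "latency"))     # → ['provider-b', 'provider-c', 'acme']
print(sorted_providers(stats, "throughput"))  # → ['provider-c', 'provider-b', 'acme']
```

Negating throughput lets a single ascending sort serve all three metrics; the same provider set yields three different attempt orders.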
Filtering & compliance
only and ignore act as an allowlist and a denylist, respectively. data_collection and zero_retention restrict routing to providers that meet your data policy; these are the same guarantees you can enforce workspace-wide under Zero-retention & residency.
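The filters can be pictured as a chain of checks that each endpoint must pass before it is eligible for routing. This Python sketch is hypothetical: the Endpoint fields (collects_data, zero_retention, input_price_per_m), the eligible helper, the provider name "provider-b" and the example data are invented, and treating max_price as a filter that excludes pricier endpoints is an assumption based only on the field's description above.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    collects_data: bool       # hypothetical: provider retains or uses prompt data
    zero_retention: bool      # hypothetical: endpoint stores nothing
    input_price_per_m: float  # hypothetical: price per million input tokens


def eligible(endpoints, only=None, ignore=(), data_collection=None,
             zero_retention=False, max_input_per_m=None):
    """Names of endpoints that survive every filter (hypothetical model)."""
    result = []
    for e in endpoints:
        if only is not None and e.name not in only:
            continue  # allowlist: keep only listed providers
        if e.name in ignore:
            continue  # denylist
        if data_collection == "deny" and e.collects_data:
            continue  # data policy: drop providers that retain data
        if zero_retention and not e.zero_retention:
            continue  # keep only zero-retention endpoints
        if max_input_per_m is not None and e.input_price_per_m > max_input_per_m:
            continue  # assumed: max_price excludes pricier endpoints
        result.append(e.name)
    return result


endpoints = [
    Endpoint("acme", collects_data=False, zero_retention=True, input_price_per_m=0.5),
    Endpoint("provider-b", collects_data=False, zero_retention=False, input_price_per_m=1.0),
    Endpoint("some-provider", collects_data=True, zero_retention=False, input_price_per_m=0.3),
]
print(eligible(endpoints, only=["acme", "provider-b"]))                 # → ['acme', 'provider-b']
print(eligible(endpoints, data_collection="deny", zero_retention=True))  # → ['acme']
print(eligible(endpoints, max_input_per_m=0.6))                         # → ['acme', 'some-provider']
```

Because every check must pass, combining filters can only shrink the candidate set; if it becomes empty, there is nothing left to route to.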