Overview
All requests are made over HTTPS to:
https://api.neuralrouter.ai/v1
Requests and responses are JSON and follow the OpenAI Chat Completions schema, with additive Neural Router fields (notably route on requests and routing on responses). Existing OpenAI client libraries work by setting the base URL and API key.
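As a rough illustration of what "setting the base URL and API key" amounts to on the wire, the following Python sketch builds an authenticated JSON request with the standard library only, without sending it. The helper name build_request is hypothetical, and the /chat/completions path is an assumption carried over from the OpenAI API, since this page does not list endpoint paths.

```python
import json
import urllib.request

BASE_URL = "https://api.neuralrouter.ai/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Workspace key passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumption: the chat endpoint lives at /chat/completions, as in the OpenAI API.
req = build_request(
    "/chat/completions",
    {"model": "auto", "messages": [{"role": "user", "content": "Hello"}]},
    api_key="nr-xxxxxxxxxxxxxxxxxxxx",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

With an OpenAI client library, the equivalent step would be passing the same base URL and key to the client's constructor.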
Authentication
Pass your workspace key as a bearer token. Keys are prefixed nr- and scoped to one workspace.
Authorization: Bearer nr-xxxxxxxxxxxxxxxxxxxx
Chat completions
Generate a model response for a conversation. Request parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | required | A model id, or "auto" to let the router choose. |
| messages | array | required | Conversation messages in OpenAI chat format. |
| route | object | optional | Routing extension (objective, fallbacks, max_latency_ms, task_class). |
| stream | boolean | optional | Stream the response as server-sent events. Defaults to false. |
| temperature | number | optional | Sampling temperature, 0–2. |
| max_tokens | integer | optional | Maximum tokens to generate. |
| tools | array | optional | Tool/function definitions, OpenAI-compatible. |
| response_format | object | optional | Force JSON output with { "type": "json_object" }. |
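The constraints in the table can be checked client-side before a request is sent. The sketch below is one possible way to do that in Python; build_chat_payload is a hypothetical helper, not part of any SDK. Treating an empty messages list as missing and rejecting route keys outside the four listed in the table are both assumptions, since the table may not be exhaustive.

```python
ROUTE_KEYS = {"objective", "fallbacks", "max_latency_ms", "task_class"}


def build_chat_payload(model, messages, *, route=None, temperature=None,
                       max_tokens=None, stream=False):
    """Assemble a chat completion body, checking the constraints in the table."""
    if not model:
        raise ValueError("model is required (a model id or 'auto')")
    if not messages:
        # Assumption: an empty list is treated the same as a missing field.
        raise ValueError("messages is required")
    payload = {"model": model, "messages": list(messages), "stream": stream}
    if route is not None:
        # Assumption: only the route keys named in the table are accepted.
        unknown = set(route) - ROUTE_KEYS
        if unknown:
            raise ValueError(f"unknown route keys: {sorted(unknown)}")
        payload["route"] = dict(route)
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = int(max_tokens)
    return payload


payload = build_chat_payload(
    "auto",
    [{"role": "user", "content": "Summarize routing in one line."}],
    route={"objective": "quality-per-dollar", "fallbacks": ["gpt-4o"]},
    temperature=0.7,
    max_tokens=256,
)
```

Optional fields are left out of the body entirely when not supplied, so server-side defaults apply.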
Example request body:
{
  "model": "auto",
  "messages": [
    { "role": "system", "content": "You are concise." },
    { "role": "user", "content": "Summarize routing in one line." }
  ],
  "route": { "objective": "quality-per-dollar", "fallbacks": ["gpt-4o"] },
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false
}
Example response:
{
  "id": "nrc_8f2a1c...",
  "object": "chat.completion",
  "created": 1782000000,
  "model": "claude-sonnet",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Routing sends each request to the best model for your objective." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 16, "total_tokens": 40 },
  "routing": {
    "selected": "claude-sonnet",
    "objective": "quality-per-dollar",
    "candidates_considered": 7,
    "reason": "Best quality-per-dollar within latency budget"
  }
}
Models
Returns the models available to your workspace, each with provider, pricing, context window, and live health.
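One use of this listing is choosing a model explicitly instead of passing "auto". The Python sketch below filters a parsed response for healthy models and picks the cheapest one. The helper cheapest_healthy, the example-small entry, and the "degraded" health value are hypothetical; summing input_per_m and output_per_m is only a crude price proxy, and the unit of those fields is not stated on this page.

```python
def cheapest_healthy(models_response: dict, min_context: int = 0):
    """Return the id of the healthy model with the lowest combined price, or None."""
    candidates = [
        m for m in models_response.get("data", [])
        if m.get("health") == "healthy"
        and m.get("context_window", 0) >= min_context
    ]
    if not candidates:
        return None
    # Crude proxy: input price plus output price.
    best = min(candidates, key=lambda m: m["input_per_m"] + m["output_per_m"])
    return best["id"]


# The first entry mirrors this section's sample; the others are hypothetical.
models = {
    "object": "list",
    "data": [
        {"id": "claude-sonnet", "provider": "Anthropic", "context_window": 200000,
         "input_per_m": 3.0, "output_per_m": 15.0, "health": "healthy"},
        {"id": "example-small", "provider": "Example", "context_window": 32000,
         "input_per_m": 0.5, "output_per_m": 1.5, "health": "healthy"},
        {"id": "example-down", "provider": "Example", "context_window": 128000,
         "input_per_m": 0.1, "output_per_m": 0.2, "health": "degraded"},
    ],
}

choice = cheapest_healthy(models)
large_context_choice = cheapest_healthy(models, min_context=150000)
```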
{
  "object": "list",
  "data": [
    {
      "id": "claude-sonnet",
      "provider": "Anthropic",
      "context_window": 200000,
      "input_per_m": 3.0,
      "output_per_m": 15.0,
      "health": "healthy"
    }
  ]
}
Embeddings
Create an embedding vector for the given input. The endpoint is OpenAI-compatible, and routing applies the same way: set model to a specific embedding model or "auto".
{
  "model": "auto",
  "input": "The quick brown fox."
}
Streaming format
When stream is true, the response is a sequence of server-sent events. Each event is a data: line carrying a chat.completion.chunk, ending with data: [DONE].
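A client typically reads the stream line by line, decodes each data: payload, and appends the delta content until it sees [DONE]. The Python sketch below shows that loop over already-decoded text lines. collect_stream_text is a hypothetical helper, and it assumes each event fits on a single data: line, as in this section's sample.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta content from an iterable of SSE lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"choices":[{"delta":{"content":"Rou"}}]}',
    'data: {"choices":[{"delta":{"content":"ting"}}]}',
    'data: [DONE]',
]
text = collect_stream_text(sample)
print(text)  # Routing
```

When reading from a live HTTP response, the lines arrive as bytes and need decoding (for example with .decode("utf-8")) before being passed in.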
data: {"choices":[{"delta":{"content":"Rou"}}]}
data: {"choices":[{"delta":{"content":"ting"}}]}
data: [DONE]
Rate limits
Rate limits apply per workspace. Every response reports the current state in the x-ratelimit-remaining and x-ratelimit-reset headers. On a 429, wait for the reset window and retry with exponential backoff. Upstream provider 429s are absorbed by failover and do not count against your limit.
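The retry guidance can be sketched as a small delay calculation in Python. retry_delay is a hypothetical helper with illustrative base and cap values. It assumes x-ratelimit-reset carries the number of seconds until the window resets; this page does not specify the header's format, so adjust the parsing if it turns out to be a timestamp.

```python
def retry_delay(attempt, reset_header=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    # Exponential backoff: base, 2*base, 4*base, ... limited to `cap`.
    backoff = min(cap, base * (2 ** attempt))
    try:
        # Assumption: the header value is seconds until the window resets.
        reset = float(reset_header) if reset_header is not None else 0.0
    except ValueError:
        reset = 0.0  # unparseable header: fall back to plain backoff
    # Never retry before the reset window has passed.
    return max(backoff, reset)


delays = [retry_delay(n) for n in range(4)]
```

Jitter is left out so the result is deterministic; a real client would usually add a small random offset to avoid synchronized retries.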
Errors
Errors use standard HTTP status codes and a JSON body of the form { "error": { "type", "message" } }.
| Status | Type | Description |
|---|---|---|
| 400 | invalid_request | Malformed request or missing required field. |
| 401 | unauthenticated | Missing or invalid API key. |
| 402 | budget_exceeded | The workspace budget cap has been reached. |
| 403 | forbidden | Key lacks access, or a guardrail blocked the request. |
| 404 | model_not_found | Requested model id is not available to this workspace. |
| 429 | rate_limited | Too many requests; retry with backoff. |
| 503 | no_provider_available | All candidates and fallbacks are unavailable. |
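Client code can turn an error body into an exception that carries the type and message fields. The Python sketch below is one way to do it; NeuralRouterError and parse_error are hypothetical names. Treating rate_limited and no_provider_available as the retryable types is an assumption based on the descriptions in the table, not a documented rule.

```python
import json

# Assumption: only these error types are worth retrying.
RETRYABLE_TYPES = {"rate_limited", "no_provider_available"}


class NeuralRouterError(Exception):
    """Carries the HTTP status plus the error type and message from the body."""

    def __init__(self, status, error_type, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self):
        return self.error_type in RETRYABLE_TYPES


def parse_error(status: int, body: str) -> NeuralRouterError:
    """Parse a { "error": { "type", "message" } } body, tolerating bad input."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # not JSON, or not a JSON object
    if not isinstance(err, dict):
        err = {}
    return NeuralRouterError(
        status, err.get("type", "unknown"), err.get("message", body)
    )


limited = parse_error(
    429, '{"error": {"type": "rate_limited", "message": "Too many requests"}}'
)
denied = parse_error(
    401, '{"error": {"type": "unauthenticated", "message": "Invalid API key"}}'
)
```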