API Reference

The Neural Router API

An OpenAI-compatible REST API. Point an existing OpenAI SDK at the base URL below and add the route extension to control model selection.

Overview

All requests are made over HTTPS to:

Base URL
https://api.neuralrouter.ai/v1

Requests and responses are JSON and follow the OpenAI Chat Completions schema, with additive Neural Router fields (notably route on requests and routing on responses). Existing OpenAI client libraries work by setting the base URL and API key.

Authentication

Pass your workspace key as a bearer token. Keys are prefixed nr- and scoped to one workspace.

Header
Authorization: Bearer nr-xxxxxxxxxxxxxxxxxxxx
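
As a minimal sketch of the header above, the following builds an authenticated request with Python's standard library without sending it. The helper name build_request is hypothetical and not part of the API; the Content-Type header is assumed from the statement that requests are JSON.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.neuralrouter.ai/v1"


def build_request(path: str, api_key: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request. Hypothetical helper."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # JSON bodies are assumed to need an explicit content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Placeholder key from the example above; substitute a real workspace key.
request = build_request("/models", "nr-xxxxxxxxxxxxxxxxxxxx")
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without network access.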

Chat completions

POST /v1/chat/completions

Generate a model response for a conversation. Request parameters:

Field            Type     Required  Description
model            string   required  A model id, or "auto" to let the router choose.
messages         array    required  Conversation messages in OpenAI chat format.
route            object   optional  Routing extension (objective, fallbacks, max_latency_ms, task_class).
stream           boolean  optional  Stream the response as server-sent events. Defaults to false.
temperature      number   optional  Sampling temperature, 0–2.
max_tokens       integer  optional  Maximum tokens to generate.
tools            array    optional  Tool/function definitions, OpenAI-compatible.
response_format  object   optional  Force JSON output with { "type": "json_object" }.

Example request body:

Request
{
  "model": "auto",
  "messages": [
    { "role": "system", "content": "You are concise." },
    { "role": "user", "content": "Summarize routing in one line." }
  ],
  "route": { "objective": "quality-per-dollar", "fallbacks": ["gpt-4o"] },
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false
}
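
A request body like the one above can be assembled programmatically. The sketch below is one possible helper, not an official client: the function name chat_payload is hypothetical, and the only route keys it knows are the four listed in the parameter table. Unset routing options are omitted so the route object is only sent when needed.

```python
from typing import Any, Optional, Sequence


def chat_payload(
    messages: Sequence[dict],
    model: str = "auto",
    objective: Optional[str] = None,
    fallbacks: Optional[Sequence[str]] = None,
    max_latency_ms: Optional[int] = None,
    task_class: Optional[str] = None,
    **options: Any,
) -> dict:
    """Build a chat completions body with the optional route extension."""
    payload: dict = {"model": model, "messages": list(messages)}
    route = {
        "objective": objective,
        "fallbacks": list(fallbacks) if fallbacks is not None else None,
        "max_latency_ms": max_latency_ms,
        "task_class": task_class,
    }
    # Drop routing options that were not set.
    route = {key: value for key, value in route.items() if value is not None}
    if route:
        payload["route"] = route
    # Remaining keyword arguments (temperature, max_tokens, stream, ...) pass through unchanged.
    payload.update(options)
    return payload


body = chat_payload(
    [{"role": "user", "content": "Summarize routing in one line."}],
    objective="quality-per-dollar",
    fallbacks=["gpt-4o"],
    max_tokens=256,
)
```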

Example response:

200 OK
{
  "id": "nrc_8f2a1c...",
  "object": "chat.completion",
  "created": 1782000000,
  "model": "claude-sonnet",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Routing sends each request to the best model for your objective." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 16, "total_tokens": 40 },
  "routing": {
    "selected": "claude-sonnet",
    "objective": "quality-per-dollar",
    "candidates_considered": 7,
    "reason": "Best quality-per-dollar within latency budget"
  }
}
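
To illustrate how the additive routing block sits alongside the standard OpenAI fields, here is a small sketch that pulls the commonly needed values out of a parsed response. The function name summarize_completion is hypothetical, and treating routing as possibly absent is a defensive assumption rather than documented behavior.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the reply text and routing decision from a parsed response."""
    choice = response["choices"][0]
    # Assumption: tolerate a missing routing block and fall back to the top-level model.
    routing = response.get("routing", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "model": routing.get("selected", response.get("model")),
        "reason": routing.get("reason"),
        "total_tokens": response.get("usage", {}).get("total_tokens"),
    }


# Trimmed copy of the example response above.
sample = {
    "model": "claude-sonnet",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Routing sends each request to the best model for your objective."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 16, "total_tokens": 40},
    "routing": {"selected": "claude-sonnet", "reason": "Best quality-per-dollar within latency budget"},
}
summary = summarize_completion(sample)
```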

Models

GET /v1/models

Returns the models available to your workspace, each with provider, pricing, context window, and live health.

Response
{
  "object": "list",
  "data": [
    {
      "id": "claude-sonnet",
      "provider": "Anthropic",
      "context_window": 200000,
      "input_per_m": 3.0,
      "output_per_m": 15.0,
      "health": "healthy"
    }
  ]
}
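
The pricing fields can be used for rough cost estimates. The sketch below assumes input_per_m and output_per_m are prices per one million tokens (the currency is not stated in this reference) and that "healthy" is the value that marks a usable model. Both function names are hypothetical.

```python
from typing import Optional


def estimate_cost(model: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the price of one request, assuming per-million-token pricing."""
    total = prompt_tokens * model["input_per_m"] + completion_tokens * model["output_per_m"]
    return total / 1_000_000


def cheapest_healthy(models_response: dict, min_context: int = 0) -> Optional[dict]:
    """Pick the healthy model with the lowest combined price, or None if there is none."""
    candidates = [
        m for m in models_response.get("data", [])
        if m.get("health") == "healthy" and m.get("context_window", 0) >= min_context
    ]
    return min(candidates, key=lambda m: m["input_per_m"] + m["output_per_m"], default=None)


listing = {
    "object": "list",
    "data": [
        {
            "id": "claude-sonnet",
            "provider": "Anthropic",
            "context_window": 200000,
            "input_per_m": 3.0,
            "output_per_m": 15.0,
            "health": "healthy",
        }
    ],
}
choice = cheapest_healthy(listing, min_context=100_000)
cost = estimate_cost(choice, prompt_tokens=24, completion_tokens=16)
```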

Embeddings

POST /v1/embeddings

Create an embedding vector for the given input, using the OpenAI-compatible request format. Routing applies the same way as for chat completions: set model to a specific embedding model or to "auto".

Request
{
  "model": "auto",
  "input": "The quick brown fox."
}

Streaming format

When stream is true, the response is a sequence of server-sent events. Each event is a data: line carrying a chat.completion.chunk object, and the stream ends with a final data: [DONE] line.

text/event-stream
data: {"choices":[{"delta":{"content":"Rou"}}]}

data: {"choices":[{"delta":{"content":"ting"}}]}

data: [DONE]
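
A stream in this format can be consumed line by line. The following is a minimal sketch, assuming the lines have already been decoded to text; the function name iter_deltas is hypothetical, and only the delta.content field shown in the example is handled.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from server-sent event lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


stream = [
    'data: {"choices":[{"delta":{"content":"Rou"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"ting"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_deltas(stream))  # "Routing"
```

Because the function is a generator, fragments can be printed as they arrive instead of being joined at the end.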

Rate limits

Rate limits apply per workspace. Every response reports the current state in the x-ratelimit-remaining and x-ratelimit-reset headers. On a 429, wait until the reset window has passed, then retry with exponential backoff. Upstream provider 429s are absorbed by failover and do not count against your limit.
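
One way to combine the reset header with exponential backoff is sketched below. It assumes x-ratelimit-reset holds the number of seconds until the window resets, which this reference does not specify; if it is a timestamp, subtract the current time first. The function name retry_delay, the base and cap values, and the jitter scheme are all illustrative choices.

```python
import random
from typing import Callable, Mapping


def retry_delay(
    attempt: int,
    headers: Mapping[str, str],
    base: float = 0.5,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    # Exponential backoff: base, 2*base, 4*base, ... limited to `cap`.
    backoff = min(cap, base * (2 ** attempt))
    # Jitter spreads retries over [backoff/2, backoff) to avoid synchronized clients.
    jittered = backoff * (0.5 + rng() / 2)
    try:
        # Assumption: header value is seconds until reset, with a lowercase key.
        reset = float(headers.get("x-ratelimit-reset", 0))
    except (TypeError, ValueError):
        reset = 0.0
    # Never retry before the reset window has passed.
    return max(jittered, reset)


delay = retry_delay(attempt=2, headers={"x-ratelimit-reset": "3"})
```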

Errors

Errors use standard HTTP status codes and a JSON body of the form { "error": { "type", "message" } }.

Status  Type                   Description
400     invalid_request        Malformed request or missing required field.
401     unauthenticated        Missing or invalid API key.
402     budget_exceeded        The workspace budget cap has been reached.
403     forbidden              Key lacks access, or a guardrail blocked the request.
404     model_not_found        Requested model id is not available to this workspace.
429     rate_limited           Too many requests; retry with backoff.
503     no_provider_available  All candidates and fallbacks are unavailable.
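
Client code can turn the error body into an exception that carries the status and type. This is a sketch only: the class and function names are hypothetical, and treating 503 as retryable alongside 429 is an assumption, since this reference recommends retrying only for 429.

```python
import json

# 429 is retryable per the table above; including 503 is an assumption.
RETRYABLE_STATUSES = {429, 503}


class NeuralRouterError(Exception):
    """Hypothetical exception wrapping an API error response."""

    def __init__(self, status: int, error_type: str, message: str) -> None:
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, body: str) -> NeuralRouterError:
    """Build an exception from a status code and a raw response body."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}  # body was not in the documented { "error": {...} } form
    return NeuralRouterError(status, err.get("type", "unknown"), err.get("message", ""))


error = parse_error(429, '{"error": {"type": "rate_limited", "message": "Too many requests"}}')
```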
