API Reference | Neural Router

Overview

All requests are made over HTTPS to:

Base URL

https://api.neuralrouter.ai/v1

Requests and responses are JSON and follow the OpenAI Chat Completions schema, with additive Neural Router fields (notably route on requests and routing on responses). Existing OpenAI client libraries work by setting the base URL and API key.

Authentication

Pass your workspace key as a bearer token. Keys are prefixed nr- and scoped to one workspace.

Header

Authorization: Bearer nr-xxxxxxxxxxxxxxxxxxxx
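
As an illustration, the base URL and bearer header can be combined with the Python standard library as sketched below. The helper name build_request is hypothetical, and the sketch assumes endpoint paths are appended to the base URL (so /v1/models becomes BASE_URL + "/models"). It only constructs the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "https://api.neuralrouter.ai/v1"  # base URL from this reference


def build_request(path, api_key, body=None):
    """Build (but do not send) an authenticated request.

    Hypothetical helper: `path` is relative to BASE_URL, e.g. "/models".
    A JSON body makes it a POST; no body makes it a GET.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method="POST" if body is not None else "GET",
    )
    # Workspace key goes in the Authorization header as a bearer token.
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


req = build_request("/models", "nr-xxxxxxxxxxxxxxxxxxxx")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call; an OpenAI client library configured with the same base URL and key should be equivalent.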

Chat completions

POST /v1/chat/completions

Generate a model response for a conversation. Request parameters:

Field	Type	Required	Description
model	string	required	A model id, or "auto" to let the router choose.
messages	array	required	Conversation messages in OpenAI chat format.
route	object	optional	Routing extension (objective, fallbacks, max_latency_ms, task_class).
stream	boolean	optional	Stream the response as server-sent events. Defaults to false.
temperature	number	optional	Sampling temperature, 0–2.
max_tokens	integer	optional	Maximum tokens to generate.
tools	array	optional	Tool/function definitions, OpenAI-compatible.
response_format	object	optional	Force JSON output with { "type": "json_object" }.

Example request body:

Request

{
  "model": "auto",
  "messages": [
    { "role": "system", "content": "You are concise." },
    { "role": "user", "content": "Summarize routing in one line." }
  ],
  "route": { "objective": "quality-per-dollar", "fallbacks": ["gpt-4o"] },
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false
}
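
A request body like the one above could be assembled programmatically as in the following sketch. The function chat_payload is a hypothetical helper, not part of any SDK. It checks only the constraints stated in the parameter table (messages required, temperature between 0 and 2) and omits optional fields that are not set.

```python
import json


def chat_payload(messages, model="auto", route=None,
                 temperature=None, max_tokens=None, stream=False):
    """Assemble a chat completions body (hypothetical helper)."""
    if not messages:
        raise ValueError("messages is required and must not be empty")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload = {"model": model, "messages": messages, "stream": stream}
    # Optional fields are included only when the caller sets them.
    if route is not None:
        payload["route"] = route
    if temperature is not None:
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload


body = chat_payload(
    [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Summarize routing in one line."},
    ],
    route={"objective": "quality-per-dollar", "fallbacks": ["gpt-4o"]},
    temperature=0.7,
    max_tokens=256,
)
print(json.dumps(body, indent=2))
```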

Example response:

200 OK

{
  "id": "nrc_8f2a1c...",
  "object": "chat.completion",
  "created": 1782000000,
  "model": "claude-sonnet",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Routing sends each request to the best model for your objective." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 16, "total_tokens": 40 },
  "routing": {
    "selected": "claude-sonnet",
    "objective": "quality-per-dollar",
    "candidates_considered": 7,
    "reason": "Best quality-per-dollar within latency budget"
  }
}
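
The routing object is the main addition to the OpenAI response shape. A minimal sketch of reading it follows; routing_summary is a hypothetical helper, and it assumes the fields shown in the example response may be absent, so it falls back to the top-level model field.

```python
def routing_summary(response):
    """Return a one-line description of the routing decision.

    Hypothetical helper; field names follow the example response.
    """
    routing = response.get("routing")
    if not isinstance(routing, dict):
        # No routing metadata: report only the model that answered.
        return f"model={response.get('model')} (no routing metadata)"
    return (
        f"selected={routing.get('selected')} "
        f"objective={routing.get('objective')} "
        f"candidates={routing.get('candidates_considered')}"
    )


example = {
    "model": "claude-sonnet",
    "routing": {
        "selected": "claude-sonnet",
        "objective": "quality-per-dollar",
        "candidates_considered": 7,
        "reason": "Best quality-per-dollar within latency budget",
    },
}
print(routing_summary(example))
```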

Models

GET /v1/models

Returns the models available to your workspace, each with provider, pricing, context window, and live health.

Response

{
  "object": "list",
  "data": [
    {
      "id": "claude-sonnet",
      "provider": "Anthropic",
      "context_window": 200000,
      "input_per_m": 3.0,
      "output_per_m": 15.0,
      "health": "healthy"
    }
  ]
}
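
One possible use of this listing is to filter by health and estimate request cost. The sketch below assumes input_per_m and output_per_m are prices per one million input and output tokens; this reference does not define the fields or the currency, so treat that reading as an assumption. Both function names are hypothetical.

```python
def healthy_models(listing):
    """Keep only entries whose health field is "healthy"."""
    return [m for m in listing.get("data", []) if m.get("health") == "healthy"]


def estimate_cost(model, prompt_tokens, completion_tokens):
    """Estimate the cost of one request.

    ASSUMPTION: input_per_m / output_per_m are prices per 1,000,000 tokens.
    """
    total = (prompt_tokens * model["input_per_m"]
             + completion_tokens * model["output_per_m"])
    return total / 1_000_000


listing = {
    "object": "list",
    "data": [
        {
            "id": "claude-sonnet",
            "provider": "Anthropic",
            "context_window": 200000,
            "input_per_m": 3.0,
            "output_per_m": 15.0,
            "health": "healthy",
        }
    ],
}

for model in healthy_models(listing):
    # Token counts taken from the usage object in the example response.
    print(model["id"], estimate_cost(model, 24, 16))
```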

Embeddings

POST /v1/embeddings

Creates an embedding vector for the given input, using the OpenAI-compatible request format. Routing applies as it does for chat completions: set model to a specific embedding model or "auto".

Request

{
  "model": "auto",
  "input": "The quick brown fox."
}

Streaming format

When stream is true, the response is a sequence of server-sent events. Each event is a data: line carrying a chat.completion.chunk, ending with data: [DONE].

text/event-stream

data: {"choices":[{"delta":{"content":"Rou"}}]}

data: {"choices":[{"delta":{"content":"ting"}}]}

data: [DONE]
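
A stream in this format can be consumed line by line. The sketch below assumes, as described above, that each event fits on a single data: line. The function name iter_content is hypothetical. It accepts any iterable of decoded text lines, so it can be tried without a network connection.

```python
import json


def iter_content(lines):
    """Yield the content fragments from a chat.completion.chunk stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


sample = [
    'data: {"choices":[{"delta":{"content":"Rou"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"ting"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_content(sample))
print(text)
```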

Rate limits

Rate limits apply per workspace. Every response reports the current state in the x-ratelimit-remaining and x-ratelimit-reset headers. On a 429, wait for the reset window to pass, then retry with exponential backoff. Upstream provider 429s are absorbed by failover and do not count against your limit.
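
The retry guidance can be sketched as a delay calculation. The function backoff_delay and its parameters (base, cap) are hypothetical. The unit of x-ratelimit-reset is not specified here, so the caller is expected to convert it to seconds remaining before passing it in as reset_seconds.

```python
def backoff_delay(attempt, reset_seconds=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff: base * 2**attempt, capped at `cap`.
    If the caller knows how long until the rate-limit window resets
    (ASSUMPTION: derived from x-ratelimit-reset), wait at least that long.
    """
    delay = min(cap, base * (2 ** attempt))
    if reset_seconds is not None:
        delay = max(delay, float(reset_seconds))
    return delay


schedule = [backoff_delay(n) for n in range(5)]
print(schedule)
```

In production, adding random jitter to each delay is a common way to keep many clients from retrying at the same instant.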

Errors

Errors use standard HTTP status codes and return a JSON body of the form { "error": { "type": "<type>", "message": "<message>" } }.

Status	Type	Description
400	invalid_request	Malformed request or missing required field.
401	unauthenticated	Missing or invalid API key.
402	budget_exceeded	The workspace budget cap has been reached.
403	forbidden	Key lacks access, or a guardrail blocked the request.
404	model_not_found	Requested model id is not available to this workspace.
429	rate_limited	Too many requests; retry with backoff.
503	no_provider_available	All candidates and fallbacks are unavailable.
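
A client might turn these error bodies into exceptions along the following lines. The class NeuralRouterError and the function parse_error are hypothetical names. Treating 429 and 503 as retryable is an assumption of this sketch; the table only states retry guidance for 429.

```python
import json

# ASSUMPTION: statuses worth retrying. Only 429 is documented as retryable.
RETRYABLE_STATUSES = {429, 503}


class NeuralRouterError(Exception):
    """Hypothetical exception carrying the documented error fields."""

    def __init__(self, status, error_type, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self):
        return self.status in RETRYABLE_STATUSES


def parse_error(status, body_text):
    """Build an exception from a status code and raw response body."""
    try:
        parsed = json.loads(body_text)
    except ValueError:
        parsed = None  # body was not JSON
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    return NeuralRouterError(
        status,
        err.get("type", "unknown"),
        err.get("message", body_text),
    )


exc = parse_error(
    429, '{"error": {"type": "rate_limited", "message": "Too many requests"}}'
)
print(exc, exc.retryable)
```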
