REST API Reference¶
HTTP REST endpoints for the StrataRouter Runtime server.
Base URL¶
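The examples in this reference use http://localhost:8080 as the base URL.
Endpoints return JSON unless noted otherwise (GET /metrics returns Prometheus text, and successful deletes return 204 No Content with an empty body). When authentication is enabled, send the token in an Authorization: Bearer <token> header.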
Health¶
GET /health¶
Liveness check. Returns 200 OK when the server is running.
GET /health/ready¶
Readiness check. In addition to liveness, it verifies database and cache connectivity.
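A deployment script often needs to wait for the readiness endpoint before sending traffic. The sketch below polls GET /health/ready using only the standard library. It assumes that a server that is not ready answers with a non-2xx status or refuses the connection, which the reference does not state; the function names and the retry parameters are illustrative.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True when GET /health/ready answers with a 2xx status."""
    try:
        with urllib.request.urlopen(f"{base_url}/health/ready", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx replies (HTTPError).
        return False


def wait_until_ready(check, attempts: int = 10, delay: float = 0.5, sleep=time.sleep) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Typical use would be `wait_until_ready(lambda: is_ready("http://localhost:8080"))`. Passing the check as a callable keeps the retry loop testable without a running server.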
Routing¶
POST /route¶
Route a query to the best-matching route.
Request
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The query text |
| embedding | float[] | Yes | Query embedding vector |
| k | int | No | Number of alternatives to return (default: 1) |
Response
{
  "route_id": "billing",
  "scores": {
    "semantic": 0.91,
    "keyword": 0.85,
    "pattern": 0.00,
    "total": 0.88,
    "confidence": 0.93
  },
  "latency_ms": 1.3,
  "alternatives": []
}
Status codes: 200 OK · 400 Bad Request · 429 Too Many Requests (rate limit exceeded)
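As a rough client-side illustration, the request fields and response shape above can be wrapped in two small helpers. This is a sketch, not an official client: the helper names are hypothetical, and BASE_URL is assumed from the curl examples on this page.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: address used by this page's curl examples


def build_route_request(query, embedding, k=None, token=None, base_url=BASE_URL):
    """Build (but do not send) a POST /route request."""
    payload = {"query": query, "embedding": list(embedding)}
    if k is not None:
        payload["k"] = k  # optional field; the documented default is 1
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when authentication is enabled on the server.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url}/route",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def best_route(response_body):
    """Extract (route_id, confidence) from a /route response body."""
    data = json.loads(response_body)
    return data["route_id"], data["scores"]["confidence"]
```

To send the request, pass the result to `urllib.request.urlopen` and feed the decoded body to `best_route`.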
POST /execute¶
Route a query and execute it via a configured LLM provider. Returns both the routing decision and the LLM response.
Request
{
  "query": "Where is my invoice?",
  "embedding": [0.12, -0.34, "..."],
  "context": {
    "user_id": "user-123",
    "session_id": "sess-456"
  }
}
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The query text |
| embedding | float[] | Yes | Query embedding vector |
| context | object | No | Execution context (user, session, metadata) |
Response
{
  "route_id": "billing",
  "response": "Here is your invoice for March 2026...",
  "cache_hit": false,
  "latency_ms": 312.4,
  "cost_usd": 0.0021,
  "provider": "openai",
  "metadata": {
    "model": "gpt-4",
    "tokens_used": 180
  }
}
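Because the /execute response carries cost and cache information, a caller may want to track spend across calls. The following sketch parses the response shown above into a typed record and aggregates a few totals. The class and function names are hypothetical, and treating metadata as optional is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecuteResult:
    route_id: str
    response: str
    cache_hit: bool
    latency_ms: float
    cost_usd: float
    provider: str
    model: Optional[str] = None
    tokens_used: Optional[int] = None


def parse_execute_response(body: str) -> ExecuteResult:
    """Convert a /execute JSON body into an ExecuteResult."""
    data = json.loads(body)
    meta = data.get("metadata") or {}  # assumption: metadata may be absent
    return ExecuteResult(
        route_id=data["route_id"],
        response=data["response"],
        cache_hit=data["cache_hit"],
        latency_ms=data["latency_ms"],
        cost_usd=data["cost_usd"],
        provider=data["provider"],
        model=meta.get("model"),
        tokens_used=meta.get("tokens_used"),
    )


def summarize(results):
    """Aggregate call count, cache hits, and total cost over several results."""
    return {
        "calls": len(results),
        "cache_hits": sum(1 for r in results if r.cache_hit),
        "total_cost_usd": sum(r.cost_usd for r in results),
    }
```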
Route Management¶
GET /routes¶
List all registered routes.
Response
{
  "routes": [
    {
      "id": "billing",
      "description": "Billing and payment questions",
      "examples": ["Where's my invoice?"],
      "keywords": ["invoice", "payment"],
      "threshold": null
    }
  ],
  "total": 12
}
POST /routes¶
Create a new route.
Request
{
  "id": "billing",
  "description": "Billing and payment questions",
  "examples": ["Where's my invoice?", "Update payment"],
  "keywords": ["invoice", "payment", "billing"],
  "threshold": 0.75
}
Response: 201 Created with the full route object.
GET /routes/{route_id}¶
Get a single route by ID.
Response: 200 OK with the route object, or 404 Not Found.
PUT /routes/{route_id}¶
Update an existing route. Accepts the same body as POST /routes.
Response: 200 OK with updated route object, or 404 Not Found.
DELETE /routes/{route_id}¶
Delete a route.
Response: 204 No Content or 404 Not Found.
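The five route-management endpoints share one URL pattern, so a single request builder can cover them. The sketch below is one way to do that with the standard library. The function name and the SUCCESS_STATUS table are illustrative, and percent-encoding the route ID is a defensive assumption rather than a documented requirement.

```python
import json
import urllib.parse
import urllib.request

# Success codes documented above for the write operations on routes.
SUCCESS_STATUS = {"POST": 201, "PUT": 200, "DELETE": 204}


def route_request(method, base_url, route_id=None, body=None):
    """Build a request for /routes or /routes/{route_id} without sending it."""
    path = "/routes"
    if route_id is not None:
        # Percent-encode the ID so characters such as "/" cannot change the path.
        path += "/" + urllib.parse.quote(route_id, safe="")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)
```

For example, `route_request("PUT", base, "billing", body=route)` targets /routes/billing with the same JSON body that POST /routes accepts.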
Batch¶
POST /batch/route¶
Route multiple queries in a single request. This is more efficient than issuing one /route call per query.
Request
{
  "queries": [
    { "query": "Where's my invoice?", "embedding": ["..."] },
    { "query": "App is crashing", "embedding": ["..."] }
  ]
}
Response
{
  "results": [
    { "route_id": "billing", "confidence": 0.93, "latency_ms": 1.2 },
    { "route_id": "support", "confidence": 0.88, "latency_ms": 1.1 }
  ],
  "batch_latency_ms": 2.8,
  "deduplicated": 0
}
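The batch results carry no query text, so a caller has to match them back to the inputs. The sketch below builds the batch payload and pairs results with queries by position. It assumes the server returns results in request order, which the sample suggests but the reference does not state; both helper names are hypothetical.

```python
import json


def build_batch_payload(items):
    """items: sequence of (query, embedding) pairs."""
    return {"queries": [{"query": q, "embedding": list(e)} for q, e in items]}


def pair_results(items, response_body):
    """Return (query, route_id, confidence) tuples, matched by position."""
    results = json.loads(response_body)["results"]
    if len(results) != len(items):
        # Assumption violated: positional matching would silently misalign.
        raise ValueError(f"expected {len(items)} results, got {len(results)}")
    return [(q, r["route_id"], r["confidence"]) for (q, _), r in zip(items, results)]
```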
Cache¶
GET /cache/stats¶
Get current cache statistics.
Response
{
  "hit_rate": 0.847,
  "total_requests": 52840,
  "cache_hits": 44756,
  "cache_misses": 8084,
  "entries": 4312,
  "memory_mb": 128.4
}
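The fields in the sample are internally consistent: 44756 hits plus 8084 misses equals 52840 requests, and 44756 / 52840 is approximately 0.847. A monitoring script could recompute the rate as a sanity check, as in this small sketch (the function names are illustrative).

```python
def hit_rate(stats: dict) -> float:
    """Recompute the hit rate from the raw counters in a /cache/stats body."""
    total = stats["total_requests"]
    return stats["cache_hits"] / total if total else 0.0


def counters_consistent(stats: dict) -> bool:
    """Check that hits and misses add up to the total request count."""
    return stats["cache_hits"] + stats["cache_misses"] == stats["total_requests"]
```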
DELETE /cache¶
Flush the entire cache.
Response: 204 No Content
DELETE /cache/{key}¶
Evict a specific cache entry by key.
Response: 204 No Content
Metrics¶
GET /metrics¶
Returns metrics in the Prometheus text exposition format (plain text, not JSON).
# HELP stratarouter_requests_total Total routing requests
# TYPE stratarouter_requests_total counter
stratarouter_requests_total{route="billing",status="success"} 14821
# HELP stratarouter_latency_seconds Routing latency histogram
# TYPE stratarouter_latency_seconds histogram
stratarouter_latency_seconds_bucket{le="0.01"} 48921
stratarouter_latency_seconds_bucket{le="0.05"} 52384
stratarouter_latency_seconds_bucket{le="+Inf"} 52840
stratarouter_latency_seconds_sum 241.3
stratarouter_latency_seconds_count 52840
# HELP stratarouter_cache_hit_rate Cache hit rate
# TYPE stratarouter_cache_hit_rate gauge
stratarouter_cache_hit_rate 0.847
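For quick checks without a Prometheus server, the text output can be parsed directly. The sketch below is a deliberately simplified parser: it ignores optional timestamps and assumes label values contain no closing brace, so it is not a full implementation of the exposition format. The function names are hypothetical.

```python
import re

# metric name, optional {labels}, then a numeric value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)$')


def parse_metrics(text: str) -> dict:
    """Map 'name{labels}' to a float value, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[name + (labels or "")] = float(value)
    return samples


def mean_latency_seconds(samples: dict) -> float:
    """Mean routing latency: histogram sum divided by histogram count."""
    count = samples["stratarouter_latency_seconds_count"]
    return samples["stratarouter_latency_seconds_sum"] / count if count else 0.0
```

With the sample output above, the mean latency works out to 241.3 s / 52840 requests, or roughly 4.57 ms per request.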
Error Responses¶
All errors use a consistent JSON envelope:
{
  "error": "RateLimitExceeded",
  "message": "Rate limit: 100 req/s exceeded",
  "code": "RATE_LIMIT_EXCEEDED",
  "retry_after": 5
}
HTTP Status Code Reference
| Code | Error | Meaning |
|---|---|---|
| 400 | InvalidRequest | Invalid request body or missing fields |
| 404 | RouteNotFound | Route ID does not exist |
| 429 | RateLimitExceeded | Request rate limit exceeded |
| 500 | InternalError | Internal server error |
| 502 | ProviderError | LLM provider returned an error |
| 504 | ExecutionTimeout | Execution exceeded timeout |
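Since every error shares the same envelope, a client can turn it into one exception type and decide centrally whether to retry. The sketch below shows one possible mapping. The class and function names are hypothetical, and retrying 502 and 504 after a fixed delay is an assumption about sensible client behaviour, not something the reference prescribes.

```python
import json


class StrataRouterError(Exception):
    """Client-side exception carrying the fields of the error envelope."""

    def __init__(self, status, error, message, code=None, retry_after=None):
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.code = code
        self.retry_after = retry_after


def parse_error(status: int, body: str) -> StrataRouterError:
    """Build an exception from an HTTP status and a (possibly non-JSON) body."""
    try:
        data = json.loads(body)
    except ValueError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return StrataRouterError(
        status,
        data.get("error", "Unknown"),
        data.get("message", body),
        data.get("code"),
        data.get("retry_after"),
    )


def retry_delay(err: StrataRouterError, default: float = 1.0):
    """Seconds to wait before retrying, or None when a retry is pointless."""
    if err.status == 429:
        return float(err.retry_after) if err.retry_after is not None else default
    if err.status in (502, 504):  # assumption: upstream failures may be transient
        return default
    return None
```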
Authentication¶
When authentication is enabled, include your API key in the Authorization header:
curl -X POST http://localhost:8080/route \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "my invoice", "embedding": [...]}'
Rate Limiting¶
Rate limit headers are included in every response:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1709500800
Retry-After: 5 # Only on 429 responses
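A client can use these headers to pace itself instead of waiting for a 429. The sketch below computes how long to pause before the next request. It assumes X-RateLimit-Reset is a Unix timestamp in seconds and Retry-After is a number of seconds, as the sample values suggest; the function name is illustrative.

```python
def seconds_to_wait(headers: dict, now: float) -> float:
    """Return how long to pause before the next request.

    headers: response headers as a name-to-value mapping.
    now: current Unix time in seconds, e.g. time.time().
    """
    h = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in h:
        # Present on 429 responses; assumed to be delay-seconds, not an HTTP date.
        return max(0.0, float(h["retry-after"]))
    if int(h.get("x-ratelimit-remaining", "1")) > 0:
        return 0.0  # quota left in the current window
    # Quota exhausted: wait until the window resets (assumed Unix seconds).
    reset = float(h.get("x-ratelimit-reset", now))
    return max(0.0, reset - now)
```

Taking `now` as a parameter rather than calling `time.time()` inside keeps the function deterministic and easy to test.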
Quick Examples¶
Route a query¶
curl -X POST http://localhost:8080/route \
  -H "Content-Type: application/json" \
  -d '{
    "query": "I need my invoice",
    "embedding": [0.12, -0.34, 0.56]
  }'