# Runtime System

How the StrataRouter Runtime orchestrates routing decisions into production-grade LLM executions.
## What is the Runtime?

The Runtime is the "operating system" for semantic routing. It takes routing decisions from Core and executes them safely, efficiently, and observably:

- **Execution Engine** — run routing decisions in isolated, retryable environments
- **Provider Management** — unified interface for OpenAI, Anthropic, Google, and local models
- **Semantic Caching** — cache similar queries to reduce LLM costs by 70–80%
- **Batch Coordination** — deduplicate and batch similar requests for 3–5× throughput
- **State Management** — persistent execution state with automatic crash recovery
- **Observability** — Prometheus metrics, OpenTelemetry traces, structured JSON logs
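Semantic caching matches queries by embedding similarity rather than exact text, so paraphrases of a cached question can hit the cache. A minimal in-memory sketch of the idea — the `SemanticCache` class and its methods here are illustrative, not the library's actual API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy semantic cache: returns a stored response when a new
    query's embedding is close enough to a cached one."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        best, best_sim = None, 0.0
        for emb, response in self.entries:
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

The threshold plays the same role as `cache_similarity_threshold` in the runtime configuration: higher values trade hit rate for answer fidelity. A production cache would use Redis plus an approximate-nearest-neighbor index instead of a linear scan.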
## Architecture

```mermaid
flowchart TB
    subgraph APP["Application Layer"]
        A["Your Application"]
    end
    subgraph CORE["StrataRouter Runtime"]
        B["Core-Runtime Bridge"]
        C["Execution Engine"]
        D["Cache Layer"]
        E["Batch Coordinator"]
        F["State Manager"]
        G["Provider Clients"]
    end
    subgraph INFRA["Infrastructure"]
        H[(PostgreSQL)]
        I[(Redis)]
        J["Prometheus"]
        K["OpenTelemetry"]
    end
    subgraph LLM["LLM Providers"]
        L["OpenAI"]
        M["Anthropic"]
        N["Google"]
        O["Local / vLLM"]
    end
    A --> B
    B --> C
    C --> D
    C --> E
    C --> F
    C --> G
    D --> I
    F --> H
    C --> J
    C --> K
    G --> L
    G --> M
    G --> N
    G --> O
```
## End-to-End Latency Breakdown

```text
End-to-End (P99) — with cache miss
├─ Core routing:   1.2ms  (2%)
├─ Bridge:         0.5ms  (1%)
├─ Cache lookup:   2.0ms  (4%)
│  ├─ Exact match: 0.5ms
│  └─ Semantic:    1.5ms
├─ Provider call: 45.0ms (90%)
│  ├─ Network:    15ms
│  └─ LLM:        30ms
└─ Post-process:   1.3ms  (3%)
─────────────────────────────────
Total:            50.0ms

With cache hit:    4.0ms (12.5× faster)
```
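The blended latency your application actually sees depends on the cache hit rate. A quick back-of-envelope using the figures above (this is a rough average across requests, not a true P99):

```python
def expected_latency(hit_rate, hit_ms=4.0, miss_ms=50.0):
    """Blend cache-hit and cache-miss latencies by hit rate."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

for rate in (0.0, 0.5, 0.8):
    print(f"hit rate {rate:.0%}: {expected_latency(rate):.1f} ms")
# hit rate 0%: 50.0 ms
# hit rate 50%: 27.0 ms
# hit rate 80%: 13.2 ms
```

At the 70–80% hit rates quoted for semantic caching, average latency drops roughly 4×.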
## Basic Usage

```python
from stratarouter import Router
from stratarouter_runtime import CoreRuntimeBridge, RuntimeConfig

# Initialize Core
core = Router()
core.add_routes(routes)
core.build_index()

# Initialize Runtime
config = RuntimeConfig(
    cache_enabled=True,
    batch_enabled=True,
    execution_timeout=60,
)
bridge = CoreRuntimeBridge(config)

async def handle_query(query: str, user_id: str):
    embedding = await get_embedding(query)
    decision = core.route(query, embedding)
    result = await bridge.execute(
        decision=decision,
        context={"user_id": user_id},
    )
    return result.response
```
## Runtime Configuration

```python
from stratarouter_runtime import RuntimeConfig

config = RuntimeConfig(
    # Execution
    execution_timeout=60,
    max_retries=3,
    retry_delay_ms=100,
    # Cache
    cache_enabled=True,
    cache_backend="redis",  # "redis" | "memory"
    cache_ttl=3600,
    cache_similarity_threshold=0.95,
    # Batch
    batch_enabled=True,
    batch_window_ms=50,
    batch_max_size=32,
    batch_similarity_threshold=0.98,
    # State
    state_backend="postgresql",
    checkpoint_interval=10,
    # Observability
    metrics_enabled=True,
    tracing_enabled=True,
    log_level="info",
)
```
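The `max_retries` and `retry_delay_ms` settings suggest a retry loop with growing delays between attempts. A hypothetical sketch of how such parameters could drive one — this is illustrative, not StrataRouter's actual retry policy:

```python
import time

def execute_with_retries(call, max_retries=3, retry_delay_ms=100):
    """Invoke `call`, retrying on failure with an exponentially
    doubling delay. (Illustrative; the library's real policy,
    including which errors are retryable, may differ.)"""
    delay = retry_delay_ms / 1000.0
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
            delay *= 2
```

With the defaults above, a request that keeps failing waits roughly 100 ms, 200 ms, then 400 ms before giving up after four total attempts.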
## Environment Variables

```bash
# Database
export DATABASE_URL="postgresql://localhost/stratarouter"

# Cache
export REDIS_URL="redis://localhost:6379"

# LLM providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Observability
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export PROMETHEUS_PORT="9090"
```
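These variables are typically read once at startup. A minimal sketch of collecting them with sensible defaults — the `config_from_env` helper is hypothetical; the library may ship its own loader:

```python
import os

def config_from_env():
    """Gather runtime settings from the environment, falling back to
    the local-development defaults shown above. (Illustrative.)"""
    return {
        "database_url": os.environ.get(
            "DATABASE_URL", "postgresql://localhost/stratarouter"),
        "redis_url": os.environ.get("REDIS_URL", "redis://localhost:6379"),
        "otlp_endpoint": os.environ.get(
            "OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        "prometheus_port": int(os.environ.get("PROMETHEUS_PORT", "9090")),
    }
```

Provider API keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) are intentionally left out of the dict: provider SDKs generally read them from the environment directly, and keeping them out of config objects avoids accidentally logging secrets.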