# Runtime System

How the StrataRouter Runtime orchestrates routing decisions into production-grade LLM executions.
## What is the Runtime?

The Runtime is the "operating system" for semantic routing. It takes routing decisions from Core and executes them safely, efficiently, and observably:

- **Execution Engine** — run routing decisions in isolated, retryable environments
- **Provider Management** — unified interface for OpenAI, Anthropic, Google, and local models
- **Semantic Caching** — cache similar queries to reduce LLM costs by 70–80%
- **Batch Coordination** — deduplicate and batch similar requests for 3–5× throughput
- **State Management** — persistent execution state with automatic crash recovery
- **Observability** — Prometheus metrics, OpenTelemetry traces, structured JSON logs
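Semantic caching matches queries by embedding similarity rather than exact text, so paraphrases of a cached question can hit the cache. A minimal in-memory sketch of the idea — the `SemanticCache` class and its methods here are illustrative, not the library's actual API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy semantic cache: returns a stored response when a new
    query's embedding is close enough to a cached one."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        best, best_sim = None, 0.0
        for emb, response in self.entries:
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

The threshold plays the same role as `cache_similarity_threshold` in the runtime configuration: higher values trade hit rate for answer fidelity. A production cache would use Redis plus an approximate-nearest-neighbor index instead of a linear scan.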
## Architecture

```mermaid
flowchart TB
    subgraph APP["Application Layer"]
        A["Your Application"]
    end
    subgraph CORE["StrataRouter Runtime"]
        B["Core-Runtime Bridge"]
        C["Execution Engine"]
        D["Cache Layer"]
        E["Batch Coordinator"]
        F["State Manager"]
        G["Provider Clients"]
    end
    subgraph INFRA["Infrastructure"]
        H[(PostgreSQL)]
        I[(Redis)]
        J["Prometheus"]
        K["OpenTelemetry"]
    end
    subgraph LLM["LLM Providers"]
        L["OpenAI"]
        M["Anthropic"]
        N["Google"]
        O["Local / vLLM"]
    end
    A --> B
    B --> C
    C --> D
    C --> E
    C --> F
    C --> G
    D --> I
    F --> H
    C --> J
    C --> K
    G --> L
    G --> M
    G --> N
    G --> O
```
## End-to-End Latency Breakdown

```text
End-to-End (P99) — with cache miss
├─ Core routing:   1.2ms  (2%)
├─ Bridge:         0.5ms  (1%)
├─ Cache lookup:   2.0ms  (4%)
│  ├─ Exact match: 0.5ms
│  └─ Semantic:    1.5ms
├─ Provider call: 45.0ms (90%)
│  ├─ Network:    15ms
│  └─ LLM:        30ms
└─ Post-process:   1.3ms  (3%)
─────────────────────────────────
Total:            50.0ms

With cache hit:    4.0ms (12.5× faster)
```
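The blended latency your application actually sees depends on the cache hit rate. A quick back-of-envelope using the figures above (this is a rough average across requests, not a true P99):

```python
def expected_latency(hit_rate, hit_ms=4.0, miss_ms=50.0):
    """Blend cache-hit and cache-miss latencies by hit rate."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

for rate in (0.0, 0.5, 0.8):
    print(f"hit rate {rate:.0%}: {expected_latency(rate):.1f} ms")
# hit rate 0%: 50.0 ms
# hit rate 50%: 27.0 ms
# hit rate 80%: 13.2 ms
```

At the 70–80% hit rates quoted for semantic caching, average latency drops roughly 4×.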
## Basic Usage

```python
from stratarouter import Router
from stratarouter_runtime import CoreRuntimeBridge, RuntimeConfig

# Initialize Core
core = Router()
core.add_routes(routes)
core.build_index()

# Initialize Runtime
config = RuntimeConfig(
    cache_enabled=True,
    batch_enabled=True,
    execution_timeout=60,
)
bridge = CoreRuntimeBridge(config)

async def handle_query(query: str, user_id: str):
    embedding = await get_embedding(query)
    decision = core.route(query, embedding)
    result = await bridge.execute(
        decision=decision,
        context={"user_id": user_id},
    )
    return result.response
```
## Runtime Configuration

```python
from stratarouter_runtime import RuntimeConfig

config = RuntimeConfig(
    # Execution
    execution_timeout=60,
    max_retries=3,
    retry_delay_ms=100,
    # Cache
    cache_enabled=True,
    cache_backend="redis",  # "redis" | "memory"
    cache_ttl=3600,
    cache_similarity_threshold=0.95,
    # Batch
    batch_enabled=True,
    batch_window_ms=50,
    batch_max_size=32,
    batch_similarity_threshold=0.98,
    # State
    state_backend="postgresql",
    checkpoint_interval=10,
    # Observability
    metrics_enabled=True,
    tracing_enabled=True,
    log_level="info",
)
```
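The `max_retries` and `retry_delay_ms` settings suggest a retry loop with growing delays between attempts. A hypothetical sketch of how such parameters could drive one — this is illustrative, not StrataRouter's actual retry policy:

```python
import time

def execute_with_retries(call, max_retries=3, retry_delay_ms=100):
    """Invoke `call`, retrying on failure with an exponentially
    doubling delay. (Illustrative; the library's real policy,
    including which errors are retryable, may differ.)"""
    delay = retry_delay_ms / 1000.0
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
            delay *= 2
```

With the defaults above, a request that keeps failing waits roughly 100 ms, 200 ms, then 400 ms before giving up after four total attempts.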
## Environment Variables

```bash
# Database
export DATABASE_URL="postgresql://localhost/stratarouter"

# Cache
export REDIS_URL="redis://localhost:6379"

# LLM providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Observability
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export PROMETHEUS_PORT="9090"
```
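These variables are typically read once at startup. A minimal sketch of collecting them with sensible defaults — the `config_from_env` helper is hypothetical; the library may ship its own loader:

```python
import os

def config_from_env():
    """Gather runtime settings from the environment, falling back to
    the local-development defaults shown above. (Illustrative.)"""
    return {
        "database_url": os.environ.get(
            "DATABASE_URL", "postgresql://localhost/stratarouter"),
        "redis_url": os.environ.get("REDIS_URL", "redis://localhost:6379"),
        "otlp_endpoint": os.environ.get(
            "OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        "prometheus_port": int(os.environ.get("PROMETHEUS_PORT", "9090")),
    }
```

Provider API keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) are intentionally left out of the dict: provider SDKs generally read them from the environment directly, and keeping them out of config objects avoids accidentally logging secrets.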