# Runtime Architecture

Comprehensive architecture guide for StrataRouter Runtime.

## System Overview
StrataRouter Runtime provides production-grade execution infrastructure for semantic routing at scale.
```mermaid
graph TB
    subgraph "Application Layer"
        A[Web Applications]
        B[AI Agents]
        C[Workflows]
    end
    subgraph "Runtime Layer"
        D[Core-Runtime Bridge]
        E[Execution Engine]
        F[Cache Layer]
        G[Batch Processor]
    end
    subgraph "Infrastructure"
        H[Provider Clients]
        I[State Manager]
        J[Observability]
    end
    subgraph "External Services"
        K[(PostgreSQL)]
        L[(Redis)]
        M[OpenAI]
        N[Anthropic]
        O[Google]
    end
    A --> D
    B --> D
    C --> D
    D --> E
    D --> F
    D --> G
    E --> H
    E --> I
    E --> J
    F --> L
    I --> K
    H --> M
    H --> N
    H --> O
    style D fill:#14b8a6
    style E fill:#10b981
    style F fill:#f59e0b
    style G fill:#3b82f6
```
## Core Components

### 1. Core-Runtime Bridge
Purpose: Connects routing decisions to execution
```mermaid
graph LR
    A[Route Decision] --> B[Bridge]
    B --> C[Validation]
    B --> D[Translation]
    B --> E[Context Enrichment]
    C --> F[Execution Plan]
    D --> F
    E --> F
    style B fill:#14b8a6
```
Key Responsibilities:

- Translate route IDs to execution plans
- Validate execution context
- Enrich with user metadata
- Collect feedback for learning
Example:
```python
from stratarouter_runtime import CoreRuntimeBridge

bridge = CoreRuntimeBridge(
    core_router=router,
    runtime_config=config
)

result = await bridge.execute(
    query="Where's my invoice?",
    context={"user_id": "user-123", "org_id": "org-456"}
)
```
### 2. Execution Engine
Purpose: Safe, isolated execution with reliability features
```mermaid
graph TB
    A[Execution Request] --> B{Timeout Check}
    B -->|Within Limit| C[Execute]
    B -->|Exceeded| D[Timeout Error]
    C --> E{Success?}
    E -->|Yes| F[Return Result]
    E -->|No| G{Retries Left?}
    G -->|Yes| H[Exponential Backoff]
    H --> C
    G -->|No| I[Circuit Breaker]
    style C fill:#10b981
    style I fill:#ef4444
```
Features:

- Process isolation with sandboxing
- Configurable timeouts per operation
- Exponential backoff with jitter
- Circuit breakers to prevent cascading failures
- Resource limits (CPU, memory, time)
Configuration:
```python
from stratarouter_runtime import ExecutionEngine

engine = ExecutionEngine(
    timeout=60,                   # seconds
    max_retries=3,
    retry_delay_ms=100,
    circuit_breaker_threshold=5,
    max_memory_mb=512,
    max_cpu_percent=80
)
```
### 3. Provider Clients
Purpose: Unified interface for multiple LLM providers
```mermaid
graph TB
    A[LLM Request] --> B[Provider Registry]
    B --> C[OpenAI Client]
    B --> D[Anthropic Client]
    B --> E[Google Client]
    B --> F[Local Client]
    C --> G[GPT-3.5]
    C --> H[GPT-4]
    D --> I[Claude 3]
    D --> J[Claude Sonnet 4]
    E --> K[Gemini]
    E --> L[Vertex AI]
    F --> M[Ollama]
    F --> N[vLLM]
    style B fill:#14b8a6
```
Supported Providers:

- ✅ OpenAI (GPT-3.5, GPT-4, Embeddings)
- ✅ Anthropic (Claude 2, 3, Sonnet 4)
- ✅ Google (Gemini, Vertex AI)
- ✅ Cohere (Command, Embed)
- ✅ Azure OpenAI
- ✅ Local (Ollama, vLLM, HuggingFace)
Example:
```python
from stratarouter_runtime import LLMClientRegistry

registry = LLMClientRegistry()
registry.register("openai", OpenAIClient(api_key="..."))
registry.register("anthropic", AnthropicClient(api_key="..."))

# Execute with automatic fallback
result = await registry.complete(
    primary="openai",
    fallback=["anthropic", "google"],
    messages=[{"role": "user", "content": "Hello"}]
)
```
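Under the hood, fallback is an ordered loop over registered clients: try the primary, and on failure move down the fallback list. A minimal sketch of that pattern, assuming each client exposes an async `complete` method (this is not the registry's actual code):

```python
async def complete_with_fallback(clients, order, messages):
    """Try each named provider in order; return (name, response) from the first success."""
    errors = {}
    for name in order:
        client = clients.get(name)
        if client is None:
            continue
        try:
            return name, await client.complete(messages)
        except Exception as exc:  # in practice: provider-specific error types
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {list(errors)}")
```

Recording which provider actually answered (the returned `name`) is what makes fallback observable in metrics and logs.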
### 4. Cache Layer
Purpose: Intelligent semantic caching for cost and latency reduction
```mermaid
graph TB
    A[Query] --> B{Exact Match?}
    B -->|Yes| C[Return Cached<br/>< 1ms]
    B -->|No| D{Semantic Match?}
    D -->|Yes, >95%| E[Return Similar<br/>< 5ms]
    D -->|No| F[Execute LLM<br/>~50ms]
    F --> G[Store in Cache]
    G --> H[Return Result]
    style C fill:#10b981
    style E fill:#14b8a6
    style F fill:#f59e0b
```
Cache Types:

- Exact Match: hash-based, <1ms lookup
- Semantic Match: embedding similarity, <5ms lookup
- Response Cache: full response caching with TTL

Performance:

- 85%+ hit rate in production workloads
- 70-80% cost reduction
- 10-15x latency improvement
Configuration:
```python
from stratarouter_runtime import CacheManager

cache = CacheManager(
    backend="redis",              # "redis" or "memory"
    ttl=3600,                     # 1 hour
    similarity_threshold=0.95,    # 95% similarity
    max_cache_size_mb=1024        # 1 GB
)

# Automatic caching
result = await cache.get_or_execute(
    key=query,
    embedding=embedding,
    executor=lambda: expensive_llm_call()
)
```
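A semantic match is essentially a similarity search over stored query embeddings. The sketch below is illustrative only (not the `CacheManager` internals) and shows the core check against the 0.95 threshold:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_lookup(embedding, cache, threshold=0.95):
    """Return the best cached response whose embedding clears the threshold.

    cache: iterable of (cached_embedding, cached_response) pairs.
    Returns None on a miss, signalling that the LLM call must run.
    """
    best_score, best_value = 0.0, None
    for cached_embedding, value in cache:
        score = cosine_similarity(embedding, cached_embedding)
        if score >= threshold and score > best_score:
            best_score, best_value = score, value
    return best_value
```

Production backends replace this linear scan with an approximate nearest-neighbor index, which is how the <5ms lookup time is achievable at scale.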
### 5. Batch Processor
Purpose: Automatic request batching and deduplication
```mermaid
graph TB
    A[Request 1] --> B[Batch Window<br/>50ms]
    C[Request 2] --> B
    D[Request 3] --> B
    E[Request 4] --> B
    B --> F{Check Similarity}
    F --> G[Unique Requests<br/>N=2]
    G --> H[Execute Batch]
    H --> I[Return N=4<br/>Results]
    style B fill:#3b82f6
    style G fill:#10b981
```
Features:

- Request coalescing within a time window
- Similarity-based deduplication (>98%)
- Automatic result distribution

Benefits:

- 3-5x throughput improvement
- 40-60% cost reduction from deduplication
Configuration:
```python
from stratarouter_runtime import BatchProcessor

batch = BatchProcessor(
    window_ms=50,           # Collect for 50ms
    max_size=32,            # Max 32 requests per batch
    dedup_threshold=0.98    # 98% similarity counts as a duplicate
)
```
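The deduplication step can be pictured as grouping requests whose embeddings clear the similarity threshold, executing each group once, and fanning the shared result back out. A rough sketch with hypothetical helper names (not the `BatchProcessor` internals):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedup_batch(requests, threshold=0.98):
    """Coalesce near-duplicate requests collected in one window.

    requests: list of (embedding, payload) pairs.
    Returns (unique, assignment), where assignment[i] indexes into unique,
    so every original caller can be answered from its group's one execution.
    """
    unique, assignment = [], []
    for emb, payload in requests:
        match = next(
            (j for j, (u_emb, _) in enumerate(unique)
             if cosine(emb, u_emb) >= threshold),
            None,
        )
        if match is None:
            unique.append((emb, payload))
            match = len(unique) - 1
        assignment.append(match)
    return unique, assignment
```

Only the `unique` requests hit the provider; caller `i` then receives `results[assignment[i]]`, which is where the deduplication cost savings come from.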
### 6. State Manager
Purpose: Persistent execution state with crash recovery
```mermaid
graph TB
    A[Execution Start] --> B[Checkpoint 1]
    B --> C[Step 1]
    C --> D[Checkpoint 2]
    D --> E[Step 2]
    E --> F[Checkpoint 3]
    F --> G[Step 3]
    G --> H[Completion]
    E -.->|Crash| I[Recovery]
    I -.->|Resume from| D
    style B fill:#3b82f6
    style D fill:#3b82f6
    style F fill:#3b82f6
    style I fill:#f59e0b
```
Features:

- PostgreSQL backend for ACID guarantees
- Automatic checkpointing at configurable intervals
- Crash recovery with automatic resume
- Full audit trail for compliance
- Transaction support
Configuration:
```python
from stratarouter_runtime import StateManager

state = StateManager(
    db_url="postgresql://localhost/stratarouter",
    checkpoint_interval=10,    # Every 10 steps
    retention_days=30
)

# Save checkpoint
await state.checkpoint(execution_id, state_data)

# Recover from crash
state_data = await state.recover(execution_id)
```
### 7. Observability Stack
Purpose: Production monitoring and debugging
```mermaid
graph TB
    A[Runtime] --> B[Metrics]
    A --> C[Traces]
    A --> D[Logs]
    B --> E[Prometheus]
    C --> F[Jaeger/Tempo]
    D --> G[Loki/ES]
    E --> H[Grafana]
    F --> H
    G --> H
    H --> I[Dashboards]
    H --> J[Alerts]
    style B fill:#f59e0b
    style C fill:#3b82f6
    style D fill:#10b981
```
Metrics (Prometheus format):
```text
# Request metrics
stratarouter_runtime_requests_total
stratarouter_runtime_requests_duration_seconds

# Cache metrics
stratarouter_runtime_cache_hits_total
stratarouter_runtime_cache_misses_total
stratarouter_runtime_cache_hit_rate

# Cost metrics
stratarouter_runtime_cost_usd_total
stratarouter_runtime_tokens_total

# Error metrics
stratarouter_runtime_errors_total
stratarouter_runtime_timeouts_total
```
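As an illustration of how counters and histograms like these are emitted, here is a minimal sketch using the open-source `prometheus_client` library. Only the metric names come from the list above; the label set and wiring are assumptions:

```python
from prometheus_client import Counter, Histogram, generate_latest

REQUESTS = Counter(
    "stratarouter_runtime_requests_total",
    "Total requests handled by the runtime",
    ["provider", "model"],
)
DURATION = Histogram(
    "stratarouter_runtime_requests_duration_seconds",
    "End-to-end request duration in seconds",
)
CACHE_HITS = Counter("stratarouter_runtime_cache_hits_total", "Cache hits")

def record_request(provider, model, duration_s, cache_hit):
    """Record one completed request against the metrics above."""
    REQUESTS.labels(provider=provider, model=model).inc()
    DURATION.observe(duration_s)
    if cache_hit:
        CACHE_HITS.inc()
```

`generate_latest()` then renders everything in the Prometheus text exposition format, ready to serve from a `/metrics` endpoint.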
Distributed Tracing (OpenTelemetry):

- Full request flow visibility
- Span attribution across services
- Performance bottleneck identification
- Error tracking and debugging
Structured Logging (JSON):
```json
{
  "timestamp": "2026-01-15T10:30:45Z",
  "level": "info",
  "event": "execution_complete",
  "execution_id": "exec-123",
  "duration_ms": 45.2,
  "cache_hit": false,
  "provider": "openai",
  "model": "gpt-4",
  "tokens": 1250,
  "cost_usd": 0.0375
}
```
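Events in this shape can be produced with a small `logging.Formatter` that serializes records as JSON. This is a minimal stdlib sketch; the `fields` convention for passing structured context is an assumption, not the runtime's actual logger:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON event."""
    def format(self, record):
        event = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "event": record.getMessage(),
        }
        # Structured context passed via `extra={"fields": {...}}`
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)

logger = logging.getLogger("stratarouter.runtime")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("execution_complete", extra={"fields": {
    "execution_id": "exec-123", "duration_ms": 45.2, "cache_hit": False,
}})
```

One-line JSON events are what lets Loki or Elasticsearch index fields like `provider` and `cost_usd` without custom parsing rules.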
## Data Flow

### Standard Request Flow
```mermaid
sequenceDiagram
    participant App as Application
    participant Bridge as Bridge
    participant Cache as Cache
    participant Batch as Batch
    participant Exec as Executor
    participant LLM as LLM Provider
    participant State as State Manager

    App->>Bridge: Execute Query
    Bridge->>Cache: Check Cache
    Cache-->>Bridge: Cache Miss
    Bridge->>Batch: Add to Batch
    Note over Batch: Wait 50ms or<br/>32 requests
    Batch->>Exec: Execute Batch
    Exec->>State: Save Checkpoint
    Exec->>LLM: API Call
    LLM-->>Exec: Response
    Exec->>State: Save Completion
    Exec->>Cache: Store Result
    Exec-->>Batch: Results
    Batch-->>Bridge: Deduplicated Results
    Bridge-->>App: Response
```
### Cache Hit Flow
```mermaid
sequenceDiagram
    participant App as Application
    participant Bridge as Bridge
    participant Cache as Cache

    App->>Bridge: Execute Query
    Bridge->>Cache: Check Cache
    Cache-->>Bridge: Cache Hit! (4ms)
    Bridge-->>App: Cached Response
    Note over Bridge,Cache: 10-15x faster<br/>No LLM cost
```
## Deployment Patterns

### Pattern 1: Single Instance
```mermaid
graph TB
    subgraph "Single Server"
        A[Runtime Instance]
        B[(PostgreSQL)]
        C[(Redis)]
    end
    A --> B
    A --> C
    D[Load Balancer] --> A
    style A fill:#14b8a6
```
Use Case: Development, small deployments
Capacity: 1-5K requests/second
Cost: ~$50-100/month
### Pattern 2: Horizontal Scaling
```mermaid
graph TB
    A[Load Balancer]
    subgraph "Runtime Instances"
        B[Runtime 1]
        C[Runtime 2]
        D[Runtime 3]
        E[Runtime N]
    end
    subgraph "Shared State"
        F[(PostgreSQL<br/>Primary)]
        G[(Redis<br/>Cluster)]
    end
    A --> B
    A --> C
    A --> D
    A --> E
    B --> F
    C --> F
    D --> F
    E --> F
    B --> G
    C --> G
    D --> G
    E --> G
    style B fill:#14b8a6
    style C fill:#14b8a6
    style D fill:#14b8a6
    style E fill:#14b8a6
```
Use Case: Production, high traffic
Capacity: 50K+ requests/second (linear scaling)
Cost: Scales with load
### Pattern 3: High Availability
```mermaid
graph TB
    subgraph "Region 1 - Primary"
        A[LB 1]
        B[Runtime 1A]
        C[Runtime 1B]
        D[(PostgreSQL<br/>Primary)]
        E[(Redis 1)]
    end
    subgraph "Region 2 - Standby"
        F[LB 2]
        G[Runtime 2A]
        H[Runtime 2B]
        I[(PostgreSQL<br/>Replica)]
        J[(Redis 2)]
    end
    A --> B
    A --> C
    F --> G
    F --> H
    B --> D
    C --> D
    G --> I
    H --> I
    D -.->|Replication| I
    E -.->|Sync| J
    style B fill:#14b8a6
    style C fill:#14b8a6
    style G fill:#10b981
    style H fill:#10b981
```
Use Case: Enterprise, SLA requirements
Uptime: 99.95%+ guaranteed
RTO: < 30 seconds
RPO: < 1 minute
## Performance Characteristics

### Latency Breakdown (P99)
```mermaid
pie title "Latency Distribution (~50ms total)"
    "LLM API Call" : 45
    "Batch Processing" : 3
    "Cache Lookup" : 2
    "State Save" : 1.5
    "Bridge Overhead" : 0.5
```
With Cache Hit: ~4ms total (12.5x faster)
### Resource Usage
| Resource | Idle | Normal Load | Peak Load |
|---|---|---|---|
| CPU | 5% | 45% | 85% |
| Memory | 500MB | 2GB | 4GB |
| Network | 1 Mbps | 10 Mbps | 50 Mbps |
| Disk I/O | 10 IOPS | 100 IOPS | 500 IOPS |
## Configuration Examples

### Basic Configuration
```python
from stratarouter_runtime import RuntimeConfig

config = RuntimeConfig(
    execution_timeout=60,
    cache_enabled=True,
    batch_enabled=True
)
```
### Production Configuration
```python
config = RuntimeConfig(
    # Execution
    execution_timeout=60,
    max_retries=3,
    retry_delay_ms=100,
    circuit_breaker_threshold=5,

    # Cache
    cache_backend="redis",
    cache_ttl=3600,
    cache_similarity_threshold=0.95,

    # Batch
    batch_enabled=True,
    batch_window_ms=50,
    batch_max_size=32,
    batch_dedup_threshold=0.98,

    # State
    state_backend="postgresql",
    checkpoint_interval=10,
    state_retention_days=30,

    # Observability
    metrics_enabled=True,
    metrics_port=9090,
    tracing_enabled=True,
    tracing_endpoint="http://jaeger:4317",
    log_level="info"
)
```
### Enterprise Configuration
```python
config = RuntimeConfig(
    # All production settings, plus:

    # Security
    auth_enabled=True,
    jwt_secret="your-secret-key",
    api_key_validation=True,

    # Multi-tenancy
    tenant_isolation=True,
    resource_quotas=True,

    # High Availability
    region="us-east-1",
    failover_region="us-west-2",
    replication_lag_ms=1000,

    # Compliance
    audit_logging=True,
    data_retention_days=2555,    # 7 years
    encryption_at_rest=True
)
```
## Security Architecture
```mermaid
graph TB
    A[API Request] --> B{Authentication}
    B -->|Valid| C{Authorization}
    B -->|Invalid| D[401 Unauthorized]
    C -->|Allowed| E{Rate Limit}
    C -->|Denied| F[403 Forbidden]
    E -->|Within Limit| G[Execution]
    E -->|Exceeded| H[429 Too Many Requests]
    G --> I{Sandbox}
    I --> J[Execute Safely]
    J --> K{Audit Log}
    K --> L[Return Response]
    style B fill:#f59e0b
    style C fill:#f59e0b
    style I fill:#10b981
```
### Security Features
- Authentication: API keys, JWT, OAuth 2.0
- Authorization: RBAC, resource-level permissions
- Rate Limiting: Token bucket, per-user/tenant
- Sandboxing: Process isolation, resource limits
- Encryption: TLS in transit, AES-256 at rest
- Audit Logging: All operations logged with context
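The token-bucket algorithm behind the rate limiting above admits a burst up to the bucket capacity, then refills tokens at a steady rate; requests that find the bucket empty get a 429. A small illustrative sketch, not the runtime's implementation:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should respond 429 Too Many Requests
```

In a per-user or per-tenant setup, one bucket is kept per key (typically in Redis), so a noisy tenant exhausts only its own tokens.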
## Related Documentation
- Runtime Overview - Feature overview
- Core-Runtime Bridge - Integration layer
- Deployment Guide - Production deployment
- Monitoring - Observability setup