
Configuration

Complete guide to configuring StrataRouter for your use case.

Overview

StrataRouter offers flexible configuration at multiple levels:

  • Router Config - Core routing behavior
  • Runtime Config - Execution and caching
  • Environment Variables - Infrastructure settings
  • YAML/TOML Files - Declarative configuration

Router Configuration

Basic Setup

from stratarouter import Router, RouterConfig

config = RouterConfig(
    dimension=384,          # Embedding dimension
    threshold=0.5,          # Minimum confidence threshold
    max_candidates=10       # HNSW search depth
)

router = Router(config=config)

Configuration Options

config = RouterConfig(
    # Embedding settings
    dimension=384,              # Must match your embedding model

    # Routing thresholds
    threshold=0.5,              # Min confidence (0.0-1.0)
    max_candidates=10,          # HNSW candidates to score

    # HNSW index parameters
    hnsw_m=16,                  # Connections per layer (8-64)
    hnsw_ef_construction=200,   # Build quality (100-500)
    hnsw_ef_search=50,          # Search quality (10-200)
)

Parameter Guidance:

| Parameter            | Low               | Medium         | High               |
|----------------------|-------------------|----------------|--------------------|
| hnsw_m               | 8 (fast)          | 16 (balanced)  | 32 (accurate)      |
| hnsw_ef_construction | 100 (quick build) | 200 (standard) | 400 (best quality) |
| hnsw_ef_search       | 20 (fast)         | 50 (balanced)  | 100 (thorough)     |

Hybrid Scoring

config = RouterConfig(
    # Enable components
    enable_keyword_boost=True,
    enable_calibration=True,

    # Scoring weights (must sum to 1.0)
    semantic_weight=0.64,       # Embedding similarity
    keyword_weight=0.29,        # Keyword matching
    rule_weight=0.07,           # Rule-based scoring
)
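
Since the three weights must sum to 1.0, it can help to validate them before constructing the config. A minimal sketch, assuming only the weight semantics described above (`validate_weights` is an illustrative helper, not part of the StrataRouter API):

```python
# Illustrative helper: check that scoring weights form a proper mixture
# (non-negative, summing to 1.0 within floating-point tolerance).
def validate_weights(semantic: float, keyword: float, rule: float, tol: float = 1e-6) -> bool:
    weights = (semantic, keyword, rule)
    return abs(sum(weights) - 1.0) <= tol and all(w >= 0 for w in weights)

assert validate_weights(0.64, 0.29, 0.07)       # the defaults above
assert not validate_weights(0.64, 0.29, 0.17)   # sums to 1.10 -> rejected
```

Running this check once at startup turns a silently mis-weighted config into an immediate failure.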

Performance Tuning

config = RouterConfig(
    # SIMD optimization
    enable_simd=True,           # Use AVX2 instructions

    # Parallel processing
    num_threads=4,              # Worker threads

    # Memory optimization
    enable_quantization=False,  # Reduce memory (slight accuracy loss)
)

Runtime Configuration

Basic Setup

from stratarouter_runtime import RuntimeConfig, CoreRuntimeBridge

config = RuntimeConfig(
    # Execution
    execution_timeout=60,
    max_retries=3,

    # Cache
    cache_enabled=True,
    cache_backend="redis",

    # Batch processing
    batch_enabled=True,
)

bridge = CoreRuntimeBridge(config=config)

Execution Settings

config = RuntimeConfig(
    # Timeouts
    execution_timeout=60,           # Max execution time (seconds)
    provider_timeout=30,            # LLM call timeout

    # Retry logic
    max_retries=3,                  # Retry attempts
    retry_delay_ms=100,             # Initial delay
    retry_backoff=2.0,              # Exponential multiplier

    # Circuit breaker
    circuit_breaker_threshold=5,    # Failures before opening
    circuit_breaker_timeout=60,     # Reset timeout
)
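
With `retry_delay_ms=100` and `retry_backoff=2.0`, the wait before each retry grows exponentially. A sketch of the implied schedule, assuming the runtime multiplies the initial delay by the backoff factor per attempt (the exact jitter/backoff policy may differ):

```python
# Delay (in ms) before each retry attempt: initial_delay * backoff ** attempt.
def retry_delays(max_retries: int, retry_delay_ms: int, retry_backoff: float) -> list[float]:
    return [retry_delay_ms * retry_backoff ** attempt for attempt in range(max_retries)]

print(retry_delays(3, 100, 2.0))  # [100.0, 200.0, 400.0]
```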

Cache Configuration

config = RuntimeConfig(
    # Enable/disable
    cache_enabled=True,

    # Backend selection
    cache_backend="redis",          # "redis" or "memory"

    # Cache behavior
    cache_ttl=3600,                 # Time-to-live (seconds)
    cache_max_size=10000,           # Max cached entries

    # Semantic matching
    cache_similarity_threshold=0.95, # Min similarity for cache hit
    cache_enable_semantic=True,      # Enable semantic matching
)
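
Semantic caching treats a query as a hit when its embedding is close enough to a cached one. A toy illustration of the threshold check using cosine similarity (this is the general technique, not StrataRouter's actual internals):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# A cached entry is a hit when similarity clears cache_similarity_threshold.
def is_cache_hit(query_vec, cached_vec, threshold=0.95):
    return cosine_similarity(query_vec, cached_vec) >= threshold

assert is_cache_hit([1.0, 0.0], [1.0, 0.0])      # identical -> hit
assert not is_cache_hit([1.0, 0.0], [0.0, 1.0])  # orthogonal -> miss
```

This is why raising `cache_similarity_threshold` trades hit rate for precision: fewer near-matches qualify, but the ones that do are closer to the cached query.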

Cache Backend Comparison:

| Backend | Latency | Capacity | Persistence | Use Case                     |
|---------|---------|----------|-------------|------------------------------|
| memory  | <1ms    | Limited  | No          | Development, single instance |
| redis   | 2-5ms   | Large    | Yes         | Production, distributed      |

Batch Processing

config = RuntimeConfig(
    # Enable/disable
    batch_enabled=True,

    # Batch collection
    batch_window_ms=50,             # Collection window
    batch_max_size=32,              # Max batch size

    # Deduplication
    batch_similarity_threshold=0.98, # Dedup threshold
    batch_enable_dedup=True,         # Enable deduplication
)
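
Deduplication collapses near-identical queries within a batch window so the provider is called once per cluster, with a mapping to route each result back to its caller. A simplified sketch; StrataRouter dedups on embedding similarity (`batch_similarity_threshold`), while this illustration only collapses exact duplicates:

```python
# Collapse duplicate queries in a batch; `mapping` sends each original
# position to the index of its canonical query in `unique`.
def dedup_batch(queries: list[str]) -> tuple[list[str], dict[int, int]]:
    unique: list[str] = []
    index_of: dict[str, int] = {}
    mapping: dict[int, int] = {}
    for i, query in enumerate(queries):
        if query not in index_of:
            index_of[query] = len(unique)
            unique.append(query)
        mapping[i] = index_of[query]
    return unique, mapping

unique, mapping = dedup_batch(["refund status", "refund status", "cancel order"])
# unique == ["refund status", "cancel order"]; mapping == {0: 0, 1: 0, 2: 1}
```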

State Management

config = RuntimeConfig(
    # Storage backend
    state_backend="postgresql",     # "postgresql" or "memory"

    # Checkpointing
    checkpoint_interval=10,         # Steps between checkpoints
    checkpoint_retention=100,       # Max checkpoints to keep

    # Recovery
    enable_auto_recovery=True,      # Auto-recover on restart
)

Observability

config = RuntimeConfig(
    # Metrics
    metrics_enabled=True,
    metrics_port=9090,

    # Tracing
    tracing_enabled=True,
    tracing_sample_rate=0.1,        # Sample 10% of requests

    # Logging
    log_level="info",               # "debug", "info", "warn", "error"
    log_format="json",              # "json" or "text"
)

Environment Variables

Core Configuration

# Database
export DATABASE_URL="postgresql://localhost/stratarouter"
export DATABASE_POOL_SIZE="20"

# Cache
export REDIS_URL="redis://localhost:6379"
export REDIS_POOL_SIZE="10"

# API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

Runtime Settings

# Execution
export EXECUTION_TIMEOUT="60"
export MAX_RETRIES="3"

# Cache
export CACHE_ENABLED="true"
export CACHE_TTL="3600"

# Observability
export PROMETHEUS_PORT="9090"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
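
Environment variables always arrive as strings, so numeric and boolean settings need explicit casting with sensible defaults. A defensive-parsing sketch (`env_int` and `env_bool` are illustrative helpers, not StrataRouter functions):

```python
import os

def env_int(name: str, default: int) -> int:
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

def env_bool(name: str, default: bool) -> bool:
    raw = os.environ.get(name)
    return raw.strip().lower() in ("1", "true", "yes") if raw is not None else default

os.environ["CACHE_TTL"] = "3600"
os.environ["CACHE_ENABLED"] = "true"
assert env_int("CACHE_TTL", 600) == 3600
assert env_bool("CACHE_ENABLED", False) is True
assert env_int("STRATAROUTER_UNSET_EXAMPLE", 3) == 3  # unset -> default
```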

Provider Configuration

# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_ORG_ID="org-..."
export OPENAI_TIMEOUT="30"

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
export ANTHROPIC_TIMEOUT="30"

# Google
export GOOGLE_API_KEY="..."
export GOOGLE_PROJECT_ID="..."

File-Based Configuration

YAML Configuration

# config.yaml
router:
  dimension: 384
  threshold: 0.5
  max_candidates: 10

  hnsw:
    m: 16
    ef_construction: 200
    ef_search: 50

  scoring:
    semantic_weight: 0.64
    keyword_weight: 0.29
    rule_weight: 0.07

runtime:
  execution:
    timeout: 60
    max_retries: 3

  cache:
    enabled: true
    backend: redis
    ttl: 3600

  batch:
    enabled: true
    window_ms: 50
    max_size: 32

Load configuration:

import yaml
from stratarouter import Router, RouterConfig
from stratarouter_runtime import RuntimeConfig

with open("config.yaml") as f:
    config = yaml.safe_load(f)

router_config = RouterConfig(**config["router"])
runtime_config = RuntimeConfig(**config["runtime"])
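
Note that the YAML above nests `hnsw` and `scoring` sub-sections, while `RouterConfig` takes flat keyword arguments (`hnsw_m`, `hnsw_ef_search`, ...). If your version of `RouterConfig` does not accept nested dicts, one option is to flatten the mapping first. A sketch; the underscore-joining convention here is an assumption, and it matches the `hnsw_*` names only (scoring keys would additionally need their `scoring_` prefix dropped):

```python
# Flatten nested config sections into prefixed keys: hnsw.m -> hnsw_m.
def flatten(section: dict, parent: str = "") -> dict:
    flat: dict = {}
    for key, value in section.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

print(flatten({"dimension": 384, "hnsw": {"m": 16, "ef_search": 50}}))
# {'dimension': 384, 'hnsw_m': 16, 'hnsw_ef_search': 50}
```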

TOML Configuration

# config.toml
[router]
dimension = 384
threshold = 0.5
max_candidates = 10

[router.hnsw]
m = 16
ef_construction = 200
ef_search = 50

[runtime.cache]
enabled = true
backend = "redis"
ttl = 3600

Route Configuration

Basic Route

from stratarouter import Route

route = Route(
    id="billing",
    description="Billing and payment questions",
    keywords=["invoice", "payment", "refund"],
    examples=[
        "Where's my invoice?",
        "How do I update payment?"
    ]
)

Advanced Route

route = Route(
    id="urgent_escalation",
    description="Urgent issues requiring immediate attention",

    # Semantic matching
    keywords=["urgent", "emergency", "asap"],
    examples=[
        "This is urgent",
        "Emergency situation"
    ],

    # Rule-based matching
    rules=[
        r"(?i)urgent",
        r"(?i)emergency",
        r"(?i)immediately"
    ],

    # Metadata
    priority=10,                # Higher priority
    enabled=True,               # Route is active

    # Custom metadata
    metadata={
        "category": "support",
        "sla_minutes": 5,
        "requires_human": True
    }
)
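
The `rules` entries are plain regular expressions, and the `(?i)` flag makes each one case-insensitive. A quick sketch of how such patterns behave, using the standard `re` module rather than StrataRouter's rule matcher:

```python
import re

rules = [r"(?i)urgent", r"(?i)emergency", r"(?i)immediately"]

# A text matches the rule set if any pattern is found anywhere in it.
def matches_any(text: str, patterns: list[str]) -> bool:
    return any(re.search(p, text) for p in patterns)

assert matches_any("This is URGENT, please help", rules)
assert not matches_any("Just a routine question", rules)
```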

Environment-Specific Configurations

Development

# development config
config = RouterConfig(
    threshold=0.4,              # Lower threshold for testing
    hnsw_ef_search=20,          # Faster search
    enable_simd=False,          # Better debugging
    num_threads=1,              # Deterministic
)

runtime_config = RuntimeConfig(
    cache_backend="memory",     # No external dependencies
    cache_ttl=60,               # Short TTL for testing
    log_level="debug",          # Verbose logging
)

Production

# production config
config = RouterConfig(
    threshold=0.5,              # Standard threshold
    hnsw_ef_search=50,          # Balanced performance
    enable_simd=True,           # Maximum performance
    num_threads=8,              # Multi-threaded
)

runtime_config = RuntimeConfig(
    cache_backend="redis",      # Distributed cache
    cache_ttl=3600,             # Long TTL
    log_level="info",           # Standard logging
    tracing_enabled=True,       # Full observability
)
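
A common pattern is selecting between the development and production profiles at startup via an environment variable. A sketch; `APP_ENV` is an assumed variable name, and the dicts stand in for the `RouterConfig` objects above:

```python
import os

PROFILES = {
    "development": {"threshold": 0.4, "hnsw_ef_search": 20, "num_threads": 1},
    "production": {"threshold": 0.5, "hnsw_ef_search": 50, "num_threads": 8},
}

# Default to the development profile when APP_ENV is unset.
def select_profile() -> dict:
    return PROFILES[os.environ.get("APP_ENV", "development")]

os.environ["APP_ENV"] = "production"
assert select_profile()["num_threads"] == 8
```

Defaulting to the development profile keeps local runs safe; production must be opted into explicitly.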

Configuration Best Practices

1. Start Simple

# Minimal config for getting started
config = RouterConfig(dimension=384)
runtime_config = RuntimeConfig()

2. Tune for Your Use Case

High Accuracy Priority:

config = RouterConfig(
    threshold=0.7,              # Higher bar
    hnsw_ef_search=100,         # Thorough search
    semantic_weight=0.8,        # Favor semantics
)

Low Latency Priority:

config = RouterConfig(
    max_candidates=5,           # Fewer candidates
    hnsw_ef_search=20,          # Fast search
    enable_simd=True,           # SIMD optimization
)

Memory Constrained:

config = RouterConfig(
    hnsw_m=8,                   # Fewer connections
    enable_quantization=True,   # Compress vectors
)

runtime_config = RuntimeConfig(
    cache_max_size=1000,        # Limit cache
)

3. Monitor and Adjust

# Log key metrics
metrics = router.get_metrics()
# {
#   "avg_latency_ms": 1.2,
#   "p99_latency_ms": 8.7,
#   "avg_confidence": 0.84,
#   "cache_hit_rate": 0.87
# }

# Adjust based on metrics
if metrics["avg_confidence"] < 0.7:
    config.threshold = 0.6  # Raise the threshold to filter low-confidence matches

Use Case Profiles

Quick-start configurations optimised for common scenarios:

High Accuracy (Strict Matching)

For critical routing where precision is paramount:

config = RouterConfig(
    threshold=0.8,              # Only high-confidence matches
    dimension=768,              # Better embeddings
    hnsw_ef_search=200,         # More thorough search
    semantic_weight=0.75,       # Prefer semantic understanding
)

High Throughput (Fast Routing)

For high-volume applications prioritising speed:

config = RouterConfig(
    threshold=0.6,              # Relaxed threshold
    dimension=384,              # Smaller vectors (faster)
    hnsw_m=16,                  # Balanced index
    hnsw_ef_search=20,          # Fast search
    enable_simd=True,           # Hardware acceleration
    num_threads=8,              # Multi-threaded
)

runtime_config = RuntimeConfig(
    cache_enabled=True,
    batch_enabled=True,
    batch_max_size=128,         # Large batches
)

Balanced (General Purpose)

Good balance of accuracy, speed, and resource usage:

config = RouterConfig(
    threshold=0.5,              # Standard threshold
    dimension=384,              # Good accuracy/speed balance
    semantic_weight=0.64,       # Default weights
    keyword_weight=0.29,
    rule_weight=0.07,
    hnsw_m=16,
    hnsw_ef_search=50,
    enable_simd=True,
)

Memory-Constrained

For limited memory environments:

config = RouterConfig(
    dimension=384,              # Smaller embeddings
    hnsw_m=8,                   # Minimal index connections
    hnsw_ef_search=20,          # Fast, light search
    enable_quantization=True,   # Compress vectors
)

runtime_config = RuntimeConfig(
    cache_max_size=500,         # Small cache
    batch_max_size=8,           # Small batches
)

Multilingual

For routing across multiple languages:

from sentence_transformers import SentenceTransformer

config = RouterConfig(
    dimension=768,
    threshold=0.5,              # Slightly lower for cross-lingual variance
)

# Use a multilingual embedding model
model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

Pre-Deployment Checklist

  • Embedding dimension matches your model
  • Threshold tuned for your accuracy/recall tradeoff
  • Scoring weights reflect your use case
  • SIMD enabled on your CPU architecture
  • Cache size appropriate for available memory
  • Number of threads matches CPU core count
  • HNSW parameters tuned for route count
  • Configurations validated with sample queries
  • Monitoring and alerting active

Troubleshooting

| Symptom                              | Solution                                                  |
|--------------------------------------|-----------------------------------------------------------|
| Low routing accuracy                 | Increase threshold, use better embeddings, tune weights   |
| Routing latency > 10ms               | Reduce hnsw_ef_search, enable SIMD, use smaller dimension |
| High memory usage                    | Reduce hnsw_m, enable quantization, lower cache size      |
| All queries fall through to fallback | Lower threshold, improve route descriptions               |
| Cache hit rate < 70%                 | Increase cache_ttl, lower cache_similarity_threshold      |

Next Steps