Configuration¶
Complete guide to configuring StrataRouter for your use case.
Overview¶
StrataRouter offers flexible configuration at multiple levels:
- Router Config - Core routing behavior
- Runtime Config - Execution and caching
- Environment Variables - Infrastructure settings
- YAML/TOML Files - Declarative configuration
Router Configuration¶
Basic Setup¶
from stratarouter import Router, RouterConfig
config = RouterConfig(
dimension=384, # Embedding dimension
threshold=0.5, # Minimum confidence threshold
max_candidates=10 # HNSW search depth
)
router = Router(config=config)
Configuration Options¶
Vector Search¶
config = RouterConfig(
# Embedding settings
dimension=384, # Must match your embedding model
# Routing thresholds
threshold=0.5, # Min confidence (0.0-1.0)
max_candidates=10, # HNSW candidates to score
# HNSW index parameters
hnsw_m=16, # Connections per layer (8-64)
hnsw_ef_construction=200, # Build quality (100-500)
hnsw_ef_search=50, # Search quality (10-200)
)
Parameter Guidance:
| Parameter | Low | Medium | High |
|---|---|---|---|
| hnsw_m | 8 (fast) | 16 (balanced) | 32 (accurate) |
| hnsw_ef_construction | 100 (quick build) | 200 (standard) | 400 (best quality) |
| hnsw_ef_search | 20 (fast) | 50 (balanced) | 100 (thorough) |
Hybrid Scoring¶
config = RouterConfig(
# Enable components
enable_keyword_boost=True,
enable_calibration=True,
# Scoring weights (must sum to 1.0)
semantic_weight=0.64, # Embedding similarity
keyword_weight=0.29, # Keyword matching
rule_weight=0.07, # Rule-based scoring
)
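The three weights suggest a weighted sum of per-component scores. The following is a minimal sketch of that combination, assuming each component score lies in [0, 1]. `check_weights` and `hybrid_score` are hypothetical helpers, not part of the StrataRouter API, and the library's actual formula (for example, how calibration is applied) may differ.

```python
import math

def check_weights(semantic_weight: float, keyword_weight: float, rule_weight: float) -> None:
    """Raise ValueError unless the three weights sum to 1.0 (within tolerance)."""
    total = semantic_weight + keyword_weight + rule_weight
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"scoring weights sum to {total:.4f}, expected 1.0")

def hybrid_score(semantic: float, keyword: float, rule: float,
                 semantic_weight: float = 0.64,
                 keyword_weight: float = 0.29,
                 rule_weight: float = 0.07) -> float:
    """Combine the component scores as a weighted sum (assumed formula)."""
    check_weights(semantic_weight, keyword_weight, rule_weight)
    return (semantic_weight * semantic
            + keyword_weight * keyword
            + rule_weight * rule)

# Strong semantic match, partial keyword match, one rule hit.
score = hybrid_score(semantic=0.9, keyword=0.5, rule=1.0)
```

Under this assumed formula, raising one weight without lowering another breaks the sum-to-1.0 constraint, so the weights should be adjusted together.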
Performance Tuning¶
config = RouterConfig(
# SIMD optimization
enable_simd=True, # Use AVX2 instructions
# Parallel processing
num_threads=4, # Worker threads
# Memory optimization
enable_quantization=False, # Reduce memory (slight accuracy loss)
)
Runtime Configuration¶
Basic Setup¶
from stratarouter_runtime import RuntimeConfig, CoreRuntimeBridge
config = RuntimeConfig(
# Execution
execution_timeout=60,
max_retries=3,
# Cache
cache_enabled=True,
cache_backend="redis",
# Batch processing
batch_enabled=True,
)
bridge = CoreRuntimeBridge(config=config)
Execution Settings¶
config = RuntimeConfig(
# Timeouts
execution_timeout=60, # Max execution time (seconds)
provider_timeout=30, # LLM call timeout
# Retry logic
max_retries=3, # Retry attempts
retry_delay_ms=100, # Initial delay
retry_backoff=2.0, # Exponential multiplier
# Circuit breaker
circuit_breaker_threshold=5, # Failures before opening
circuit_breaker_timeout=60, # Reset timeout
)
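The retry settings describe exponential backoff: each retry waits longer than the previous one by a factor of `retry_backoff`. A minimal sketch of the resulting wait schedule, assuming no jitter and no upper cap (the source does not say whether StrataRouter applies either); `retry_delays_ms` is a hypothetical helper:

```python
def retry_delays_ms(max_retries: int = 3,
                    retry_delay_ms: float = 100,
                    retry_backoff: float = 2.0) -> list:
    """Delay before each retry: retry_delay_ms * retry_backoff ** attempt."""
    return [retry_delay_ms * retry_backoff ** attempt
            for attempt in range(max_retries)]

delays = retry_delays_ms()        # waits before retries 1, 2 and 3
total_wait_ms = sum(delays)       # time spent waiting, excluding the calls
```

With the values shown above, the waits add up to well under a second, so the provider timeout dominates the worst-case duration of a failing request.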
Cache Configuration¶
config = RuntimeConfig(
# Enable/disable
cache_enabled=True,
# Backend selection
cache_backend="redis", # "redis" or "memory"
# Cache behavior
cache_ttl=3600, # Time-to-live (seconds)
cache_max_size=10000, # Max cached entries
# Semantic matching
cache_similarity_threshold=0.95, # Min similarity for cache hit
cache_enable_semantic=True, # Enable semantic matching
)
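Semantic matching means a lookup can hit on a query that is similar to a cached one, not only on an identical one. The sketch below illustrates how `cache_ttl`, `cache_max_size` and `cache_similarity_threshold` could interact. `SemanticCache` is a hypothetical class using a linear scan and cosine similarity; it is an assumption for illustration and says nothing about how the StrataRouter cache is implemented.

```python
import math
import time

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, ttl=3600, max_size=10000, similarity_threshold=0.95):
        self.ttl = ttl
        self.max_size = max_size
        self.similarity_threshold = similarity_threshold
        self._entries = []  # (embedding, value, expires_at), oldest first

    def put(self, embedding, value, now=None):
        now = time.monotonic() if now is None else now
        if len(self._entries) >= self.max_size:
            self._entries.pop(0)  # evict the oldest entry
        self._entries.append((embedding, value, now + self.ttl))

    def get(self, embedding, now=None):
        now = time.monotonic() if now is None else now
        # Drop entries whose time-to-live has elapsed.
        self._entries = [e for e in self._entries if e[2] > now]
        best_value, best_sim = None, self.similarity_threshold
        for cached_embedding, value, _ in self._entries:
            sim = cosine(embedding, cached_embedding)
            if sim >= best_sim:  # keep the most similar entry above threshold
                best_value, best_sim = value, sim
        return best_value

cache = SemanticCache(ttl=60)
cache.put([1.0, 0.0], "billing", now=0)
hit = cache.get([0.99, 0.05], now=10)     # near-duplicate query
miss = cache.get([0.5, 0.5], now=10)      # too dissimilar
expired = cache.get([1.0, 0.0], now=61)   # past the TTL
```

A lower threshold yields more hits at the risk of returning a cached answer for a query that only looks similar.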
Cache Backend Comparison:
| Backend | Latency | Capacity | Persistence | Use Case |
|---|---|---|---|---|
| memory | <1ms | Limited | No | Development, single instance |
| redis | 2-5ms | Large | Yes | Production, distributed |
Batch Processing¶
config = RuntimeConfig(
# Enable/disable
batch_enabled=True,
# Batch collection
batch_window_ms=50, # Collection window
batch_max_size=32, # Max batch size
# Deduplication
batch_similarity_threshold=0.98, # Dedup threshold
batch_enable_dedup=True, # Enable deduplication
)
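The collection window and the maximum size both bound a batch: whichever limit is reached first closes it. A minimal sketch of that grouping over timestamped arrivals, assuming the window is measured from the first item in the batch; `group_into_batches` is a hypothetical helper and deduplication is not shown:

```python
def group_into_batches(arrivals, window_ms=50, max_size=32):
    """Group (timestamp_ms, item) pairs, sorted by time, into batches.

    A batch closes when it already holds max_size items, or when the next
    item arrives window_ms or more after the batch's first item.
    """
    batches, current, opened_at = [], [], None
    for ts, item in arrivals:
        if current and (ts - opened_at >= window_ms or len(current) >= max_size):
            batches.append(current)
            current = []
        if not current:
            opened_at = ts  # this item opens a new batch
        current.append(item)
    if current:
        batches.append(current)
    return batches

arrivals = [(0, "a"), (10, "b"), (49, "c"), (50, "d"), (120, "e")]
by_window = group_into_batches(arrivals, window_ms=50, max_size=32)
by_size = group_into_batches(arrivals, window_ms=50, max_size=2)
```

A longer window produces larger batches but adds up to `batch_window_ms` of latency to the first request in each batch.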
State Management¶
config = RuntimeConfig(
# Storage backend
state_backend="postgresql", # "postgresql" or "memory"
# Checkpointing
checkpoint_interval=10, # Steps between checkpoints
checkpoint_retention=100, # Max checkpoints to keep
# Recovery
enable_auto_recovery=True, # Auto-recover on restart
)
Observability¶
config = RuntimeConfig(
# Metrics
metrics_enabled=True,
metrics_port=9090,
# Tracing
tracing_enabled=True,
tracing_sample_rate=0.1, # Sample 10% of requests
# Logging
log_level="info", # "debug", "info", "warn", "error"
log_format="json", # "json" or "text"
)
Environment Variables¶
Core Configuration¶
# Database
export DATABASE_URL="postgresql://localhost/stratarouter"
export DATABASE_POOL_SIZE="20"
# Cache
export REDIS_URL="redis://localhost:6379"
export REDIS_POOL_SIZE="10"
# API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
Runtime Settings¶
# Execution
export EXECUTION_TIMEOUT="60"
export MAX_RETRIES="3"
# Cache
export CACHE_ENABLED="true"
export CACHE_TTL="3600"
# Observability
export PROMETHEUS_PORT="9090"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
Provider Configuration¶
# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_ORG_ID="org-..."
export OPENAI_TIMEOUT="30"
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
export ANTHROPIC_TIMEOUT="30"
# Google
export GOOGLE_API_KEY="..."
export GOOGLE_PROJECT_ID="..."
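Environment variables are always strings, so numeric and boolean settings need converting before use. The source does not state whether StrataRouter reads these variables automatically; the sketch below assumes it does not and shows one way to turn them into keyword arguments. `env_int`, `env_bool` and `runtime_settings_from_env` are hypothetical helpers, and the fallback defaults simply mirror the example values above.

```python
import os

def env_int(name, default):
    """Read an integer variable, falling back to default when unset or empty."""
    raw = os.environ.get(name)
    return int(raw) if raw not in (None, "") else default

def env_bool(name, default):
    """Read a boolean variable; accepts 1/true/yes/on (case-insensitive)."""
    raw = os.environ.get(name)
    if raw is None or raw == "":
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")

def runtime_settings_from_env():
    """Collect the Runtime Settings variables as typed keyword arguments."""
    return {
        "execution_timeout": env_int("EXECUTION_TIMEOUT", 60),
        "max_retries": env_int("MAX_RETRIES", 3),
        "cache_enabled": env_bool("CACHE_ENABLED", True),
        "cache_ttl": env_int("CACHE_TTL", 3600),
    }

os.environ["EXECUTION_TIMEOUT"] = "45"
os.environ["CACHE_ENABLED"] = "false"
settings = runtime_settings_from_env()
```

The resulting dictionary uses the same names as the `RuntimeConfig` examples, so it could be passed as `RuntimeConfig(**settings)` if the class accepts those keywords.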
File-Based Configuration¶
YAML Configuration¶
# config.yaml
router:
  dimension: 384
  threshold: 0.5
  max_candidates: 10
  hnsw:
    m: 16
    ef_construction: 200
    ef_search: 50
  scoring:
    semantic_weight: 0.64
    keyword_weight: 0.29
    rule_weight: 0.07
runtime:
  execution:
    timeout: 60
    max_retries: 3
  cache:
    enabled: true
    backend: redis
    ttl: 3600
  batch:
    enabled: true
    window_ms: 50
    max_size: 32
Load configuration:
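import yaml
from stratarouter import Router, RouterConfig
from stratarouter_runtime import RuntimeConfig
with open("config.yaml") as f:
    config = yaml.safe_load(f)
# The nested sections (hnsw, scoring, execution, cache, batch) must map onto
# the flat keyword arguments shown earlier (hnsw_m, cache_ttl, ...). Flatten
# them first if the config classes do not accept nested mappings.
router_config = RouterConfig(**config["router"])
runtime_config = RuntimeConfig(**config["runtime"])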
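The YAML file groups HNSW and scoring options under their own keys, while the `RouterConfig` examples earlier use flat names such as `hnsw_m`. If `RouterConfig` only accepts the flat names (an assumption, since the source does not say), the `router` section needs flattening before it is unpacked. `flatten_router_section` is a hypothetical helper, and the dictionary below stands in for the parsed `config["router"]`:

```python
def flatten_router_section(section: dict) -> dict:
    """Turn the nested `router` mapping into flat keyword arguments."""
    flat = {}
    for key, value in section.items():
        if key == "hnsw":
            # hnsw.m -> hnsw_m, hnsw.ef_search -> hnsw_ef_search, ...
            flat.update({f"hnsw_{k}": v for k, v in value.items()})
        elif key == "scoring":
            # Scoring keys already match the flat argument names.
            flat.update(value)
        else:
            flat[key] = value
    return flat

# Stand-in for yaml.safe_load(...)["router"].
router_section = {
    "dimension": 384,
    "threshold": 0.5,
    "max_candidates": 10,
    "hnsw": {"m": 16, "ef_construction": 200, "ef_search": 50},
    "scoring": {"semantic_weight": 0.64, "keyword_weight": 0.29, "rule_weight": 0.07},
}
router_kwargs = flatten_router_section(router_section)
```

The `runtime` section would need its own mapping, because its names are less regular: `execution.timeout` corresponds to `execution_timeout`, but `execution.max_retries` corresponds to plain `max_retries`.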
TOML Configuration¶
# config.toml
[router]
dimension = 384
threshold = 0.5
max_candidates = 10
[router.hnsw]
m = 16
ef_construction = 200
ef_search = 50
[runtime.cache]
enabled = true
backend = "redis"
ttl = 3600
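The source shows no loader for the TOML file. A minimal sketch using Python's standard `tomllib` (Python 3.11+, with the `tomli` backport as a fallback) might look like this; the inline string stands in for `config.toml`, and how the parsed values are passed to `RouterConfig` is left open:

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

TOML_TEXT = """
[router]
dimension = 384
threshold = 0.5
max_candidates = 10

[router.hnsw]
m = 16
ef_construction = 200
ef_search = 50

[runtime.cache]
enabled = true
backend = "redis"
ttl = 3600
"""

config = tomllib.loads(TOML_TEXT)
hnsw = config["router"]["hnsw"]      # [router.hnsw] nests under "router"
cache = config["runtime"]["cache"]   # [runtime.cache] nests under "runtime"
```

To read from disk, open the file in binary mode and call `tomllib.load(f)` instead of `tomllib.loads`.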
Route Configuration¶
Basic Route¶
from stratarouter import Route
route = Route(
id="billing",
description="Billing and payment questions",
keywords=["invoice", "payment", "refund"],
examples=[
"Where's my invoice?",
"How do I update payment?"
]
)
Advanced Route¶
route = Route(
id="urgent_escalation",
description="Urgent issues requiring immediate attention",
# Semantic matching
keywords=["urgent", "emergency", "asap"],
examples=[
"This is urgent",
"Emergency situation"
],
# Rule-based matching
rules=[
r"(?i)urgent",
r"(?i)emergency",
r"(?i)immediately"
],
# Metadata
priority=10, # Higher priority
enabled=True, # Route is active
# Custom metadata
metadata={
"category": "support",
"sla_minutes": 5,
"requires_human": True
}
)
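The `rules` entries are regular expressions, and the `(?i)` prefix makes each one case-insensitive. The sketch below checks the patterns with Python's `re.search`, which matches anywhere in the text. Whether StrataRouter evaluates rules with search semantics, or with Python's regex engine at all, is an assumption; `matching_rules` is a hypothetical helper.

```python
import re

RULES = [r"(?i)urgent", r"(?i)emergency", r"(?i)immediately"]

def matching_rules(text, rules=RULES):
    """Return the patterns that match anywhere in text."""
    return [pattern for pattern in rules if re.search(pattern, text)]

hits = matching_rules("URGENT: production is down, need help immediately")
no_hits = matching_rules("Where's my invoice?")
```

Because these patterns are unanchored substrings, "urgent" also matches inside words such as "nonurgent"; adding word boundaries (`\b`) tightens the rule if that matters.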
Environment-Specific Configurations¶
Development¶
# development config
config = RouterConfig(
threshold=0.4, # Lower threshold for testing
hnsw_ef_search=20, # Faster search
enable_simd=False, # Better debugging
num_threads=1, # Deterministic
)
runtime_config = RuntimeConfig(
cache_backend="memory", # No external dependencies
cache_ttl=60, # Short TTL for testing
log_level="debug", # Verbose logging
)
Production¶
# production config
config = RouterConfig(
threshold=0.5, # Standard threshold
hnsw_ef_search=50, # Balanced performance
enable_simd=True, # Maximum performance
num_threads=8, # Multi-threaded
)
runtime_config = RuntimeConfig(
cache_backend="redis", # Distributed cache
cache_ttl=3600, # Long TTL
log_level="info", # Standard logging
tracing_enabled=True, # Full observability
)
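One way to switch between the two setups is to keep them as named profiles and choose one at startup. The sketch below stores the values from the examples above as plain dictionaries. `APP_ENV`, `PROFILES` and `select_profile` are hypothetical names, not something StrataRouter defines.

```python
import os

PROFILES = {
    "development": {
        "router": {"threshold": 0.4, "hnsw_ef_search": 20,
                   "enable_simd": False, "num_threads": 1},
        "runtime": {"cache_backend": "memory", "cache_ttl": 60,
                    "log_level": "debug"},
    },
    "production": {
        "router": {"threshold": 0.5, "hnsw_ef_search": 50,
                   "enable_simd": True, "num_threads": 8},
        "runtime": {"cache_backend": "redis", "cache_ttl": 3600,
                    "log_level": "info", "tracing_enabled": True},
    },
}

def select_profile(name=None):
    """Pick a profile by name, or from APP_ENV, defaulting to development."""
    name = name or os.environ.get("APP_ENV", "development")
    if name not in PROFILES:
        raise KeyError(f"unknown profile {name!r}; expected one of {sorted(PROFILES)}")
    return PROFILES[name]

profile = select_profile("production")
```

The selected dictionaries can then be unpacked, for example `RouterConfig(**profile["router"])`, assuming the classes accept these keywords as in the examples above.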
Configuration Best Practices¶
1. Start Simple¶
# Minimal config for getting started
config = RouterConfig(dimension=384)
runtime_config = RuntimeConfig()
2. Tune for Your Use Case¶
High Accuracy Priority:
config = RouterConfig(
threshold=0.7, # Higher bar
hnsw_ef_search=100, # Thorough search
semantic_weight=0.8, # Favor semantics; lower keyword_weight and rule_weight so the weights still sum to 1.0
)
Low Latency Priority:
config = RouterConfig(
max_candidates=5, # Fewer candidates
hnsw_ef_search=20, # Fast search
enable_simd=True, # SIMD optimization
)
Memory Constrained:
config = RouterConfig(
hnsw_m=8, # Fewer connections
enable_quantization=True, # Compress vectors
)
runtime_config = RuntimeConfig(
cache_max_size=1000, # Limit cache
)
3. Monitor and Adjust¶
# Log key metrics
metrics = router.get_metrics()
# {
#     "avg_latency_ms": 1.2,
#     "p99_latency_ms": 8.7,
#     "avg_confidence": 0.84,
#     "cache_hit_rate": 0.87
# }
# Adjust based on metrics
if metrics["avg_confidence"] < 0.7:
    config.threshold = 0.6 # Lower threshold (from a stricter setting such as 0.7)
Use Case Profiles¶
Quick-start configurations optimised for common scenarios:
High Accuracy (Strict Matching)¶
For critical routing where precision is paramount:
config = RouterConfig(
threshold=0.8, # Only high-confidence matches
dimension=768, # Better embeddings
hnsw_ef_search=200, # More thorough search
semantic_weight=0.75, # Prefer semantic understanding; rebalance the other weights so the total stays 1.0
)
High Throughput (Fast Routing)¶
For high-volume applications prioritising speed:
config = RouterConfig(
threshold=0.6, # Relaxed relative to the high-accuracy profile (0.8)
dimension=384, # Smaller vectors (faster)
hnsw_m=16, # Balanced index
hnsw_ef_search=20, # Fast search
enable_simd=True, # Hardware acceleration
num_threads=8, # Multi-threaded
)
runtime_config = RuntimeConfig(
cache_enabled=True,
batch_enabled=True,
batch_max_size=128, # Large batches
)
Balanced (Recommended Default)¶
Good balance of accuracy, speed, and resource usage:
config = RouterConfig(
threshold=0.5, # Standard threshold
dimension=384, # Good accuracy/speed balance
semantic_weight=0.64, # Default weights
keyword_weight=0.29,
rule_weight=0.07,
hnsw_m=16,
hnsw_ef_search=50,
enable_simd=True,
)
Memory-Constrained¶
For limited memory environments:
config = RouterConfig(
dimension=384, # Smaller embeddings
hnsw_m=8, # Minimal index connections
hnsw_ef_search=20, # Fast, light search
enable_quantization=True, # Compress vectors
)
runtime_config = RuntimeConfig(
cache_max_size=500, # Small cache
batch_max_size=8, # Small batches
)
Multilingual¶
For routing across multiple languages:
from sentence_transformers import SentenceTransformer
config = RouterConfig(
dimension=768,
threshold=0.5, # Standard threshold; lower it slightly if cross-lingual scores vary more
)
# Use a multilingual embedding model
model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
Pre-Deployment Checklist¶
- Embedding dimension matches your model
- Threshold tuned for your accuracy/recall tradeoff
- Scoring weights reflect your use case
- SIMD enabled where your CPU supports it (AVX2)
- Cache size appropriate for available memory
- Number of threads matches CPU core count
- HNSW parameters tuned for route count
- Configurations validated with sample queries
- Monitoring and alerting active
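Several checklist items can be verified mechanically before deployment. The sketch below checks a plain dictionary of settings against the ranges quoted in the Vector Search comments and the sum-to-1.0 rule for scoring weights. `preflight_issues` is a hypothetical helper and not a StrataRouter feature; the thread check treats "matches CPU core count" as "does not exceed it", which is an assumption.

```python
import math
import os

# Ranges taken from the comments in the Vector Search section.
HNSW_RANGES = {
    "hnsw_m": (8, 64),
    "hnsw_ef_construction": (100, 500),
    "hnsw_ef_search": (10, 200),
}

def preflight_issues(cfg, sample_embedding, cpu_count=None):
    """Return a list of human-readable problems; an empty list means OK."""
    cpu_count = cpu_count or os.cpu_count() or 1
    issues = []
    if len(sample_embedding) != cfg["dimension"]:
        issues.append(f"dimension {cfg['dimension']} != embedding length {len(sample_embedding)}")
    if not 0.0 <= cfg["threshold"] <= 1.0:
        issues.append("threshold must be within 0.0-1.0")
    total = cfg["semantic_weight"] + cfg["keyword_weight"] + cfg["rule_weight"]
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        issues.append(f"scoring weights sum to {total:.2f}, expected 1.0")
    for name, (low, high) in HNSW_RANGES.items():
        if not low <= cfg[name] <= high:
            issues.append(f"{name}={cfg[name]} outside {low}-{high}")
    if cfg["num_threads"] > cpu_count:
        issues.append(f"num_threads={cfg['num_threads']} exceeds {cpu_count} CPU cores")
    return issues

balanced = {
    "dimension": 384, "threshold": 0.5,
    "semantic_weight": 0.64, "keyword_weight": 0.29, "rule_weight": 0.07,
    "hnsw_m": 16, "hnsw_ef_construction": 200, "hnsw_ef_search": 50,
    "num_threads": 4,
}
ok = preflight_issues(balanced, sample_embedding=[0.0] * 384, cpu_count=8)
bad = preflight_issues({**balanced, "dimension": 768, "semantic_weight": 0.8},
                       sample_embedding=[0.0] * 384, cpu_count=8)
```

In practice `sample_embedding` would come from encoding a real query with the embedding model, which catches a dimension mismatch before any routes are indexed.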
Troubleshooting¶
| Symptom | Solution |
|---|---|
| Low routing accuracy | Increase threshold, use better embeddings, tune weights |
| Routing latency > 10ms | Reduce hnsw_ef_search, enable SIMD, use smaller dimension |
| High memory usage | Reduce hnsw_m, enable quantization, lower cache size |
| All queries fall through to fallback | Lower threshold, improve route descriptions |
| Cache hit rate < 70% | Increase cache_ttl, lower cache_similarity_threshold |