Performance Tuning¶
Optimization Guide for Production
Maximize StrataRouter performance for your specific workload.
Quick Wins¶
Enable Semantic Caching¶
Impact: 70-80% cost reduction, 5-10x faster responses
Enable Batch Processing¶
Impact: 3-5x throughput improvement
Optimize HNSW Parameters¶
Impact: 20-40% latency reduction
HNSW Index Tuning¶
Parameter Guide¶
| Parameter | Impact | Recommendation |
|---|---|---|
ef_construction |
Build time, accuracy | 100-400 |
ef_search |
Search time, accuracy | 30-100 |
M |
Memory, accuracy | 8-32 |
Workload-Specific Settings¶
Speed-Optimized:
config = RouterConfig(
hnsw_ef_construction=100,
hnsw_ef_search=30,
hnsw_m=8
)
# Latency: ~0.6ms, Accuracy: ~93%
Balanced:
config = RouterConfig(
hnsw_ef_construction=200,
hnsw_ef_search=50,
hnsw_m=16
)
# Latency: ~1.2ms, Accuracy: ~95.4%
Accuracy-Optimized:
config = RouterConfig(
hnsw_ef_construction=400,
hnsw_ef_search=100,
hnsw_m=32
)
# Latency: ~2.5ms, Accuracy: ~97.2%
Cache Optimization¶
Cache Configuration¶
config = RuntimeConfig(
cache_enabled=True,
cache_backend="redis", # or "memory"
cache_ttl=3600, # 1 hour
cache_max_size=10000,
cache_similarity_threshold=0.95
)
Similarity Threshold¶
| Threshold | Hit Rate | Trade-off |
|---|---|---|
| 0.99 | ~60% | Very strict |
| 0.95 | ~85% | Recommended |
| 0.90 | ~92% | Looser matching |
Cache Backend Comparison¶
| Backend | Latency | Scalability | Persistence |
|---|---|---|---|
| Memory | ~0.1ms | Single node | No |
| Redis | ~0.5ms | Multi-node | Yes |
| Memcached | ~0.3ms | Multi-node | No |
Batch Processing¶
Optimal Batch Size¶
# Test different batch sizes
batch_sizes = [8, 16, 32, 64]
for size in batch_sizes:
config = RuntimeConfig(batch_size=size)
# Measure throughput
Recommendations:
- Low latency: batch_size=8, timeout=10ms
- Balanced: batch_size=32, timeout=50ms
- High throughput: batch_size=64, timeout=100ms
Resource Limits¶
CPU Optimization¶
# Set thread pool size
config = RuntimeConfig(
worker_threads=8, # Match CPU cores
blocking_threads=16
)
Memory Management¶
Provider Optimization¶
Timeout Tuning¶
config = RuntimeConfig(
provider_timeout_ms=5000, # Lower for faster failover
routing_timeout_ms=10,
total_timeout_ms=30000
)
Retry Strategy¶
config = RuntimeConfig(
retry_max_attempts=3,
retry_initial_delay_ms=100,
retry_max_delay_ms=5000,
retry_backoff_multiplier=2.0
)
Network Optimization¶
Connection Pooling¶
HTTP/2¶
# Enable HTTP/2 for providers
config = RuntimeConfig(
http2_enabled=True,
http2_max_concurrent_streams=100
)
Profiling¶
Latency Breakdown¶
from stratarouter.runtime import enable_profiling
enable_profiling()
result = executor.execute(query)
print(result.profile)
Output:
Routing: 1.2ms (10%)
Cache Lookup: 0.3ms (2%)
LLM Call: 9.8ms (85%)
State Update: 0.3ms (3%)
Total: 11.6ms
Benchmarking¶
Load Testing¶
import asyncio
from stratarouter.testing import load_test
results = await load_test(
executor=executor,
queries=queries,
duration_seconds=60,
concurrent_users=100
)
print(f"P99: {results.p99_ms}ms")
print(f"RPS: {results.requests_per_second}")
print(f"Error rate: {results.error_rate}%")
Best Practices¶
Profile before optimizing — Measure to find bottlenecks Enable caching — Biggest performance win Batch when possible — Improves throughput Right-size HNSW — Balance speed vs accuracy Monitor metrics — Track impact of changes Test under load — Validate in staging
Common Issues¶
High Latency¶
Symptoms: P99 > 20ms
Solutions:
- Lower hnsw_ef_search
- Enable caching
- Reduce provider timeout
- Check network latency
Low Throughput¶
Symptoms: RPS < 10K
Solutions: - Increase batch size - Add more workers - Scale horizontally - Enable connection pooling
High Memory¶
Symptoms: Memory > 500MB
Solutions:
- Reduce hnsw_m
- Limit cache size
- Reduce max routes
- Use Redis for cache