Performance Tuning¶

Optimization Guide for Production

Maximize StrataRouter performance for your specific workload.

Quick Wins¶

Enable Semantic Caching¶

Impact: 70-80% cost reduction, 5-10x faster responses

config = RuntimeConfig(cache_enabled=True)
executor = RuntimeExecutor(router, config=config)

Enable Batch Processing¶

Impact: 3-5x throughput improvement

config = RuntimeConfig(
    batch_size=32,
    batch_timeout_ms=50
)

Optimize HNSW Parameters¶

Impact: 20-40% latency reduction

config = RouterConfig(
    hnsw_ef_search=30,  # Lower for speed
    hnsw_m=8  # Lower for memory
)

HNSW Index Tuning¶

Parameter Guide¶

Parameter	Impact	Recommendation
`ef_construction`	Build time, accuracy	100-400
`ef_search`	Search time, accuracy	30-100
`M`	Memory, accuracy	8-32

In `RouterConfig`, these parameters are set through the `hnsw_ef_construction`, `hnsw_ef_search`, and `hnsw_m` fields.

Workload-Specific Settings¶

Speed-Optimized:

config = RouterConfig(
    hnsw_ef_construction=100,
    hnsw_ef_search=30,
    hnsw_m=8
)
# Latency: ~0.6ms, Accuracy: ~93%

Balanced:

config = RouterConfig(
    hnsw_ef_construction=200,
    hnsw_ef_search=50,
    hnsw_m=16
)
# Latency: ~1.2ms, Accuracy: ~95.4%

Accuracy-Optimized:

config = RouterConfig(
    hnsw_ef_construction=400,
    hnsw_ef_search=100,
    hnsw_m=32
)
# Latency: ~2.5ms, Accuracy: ~97.2%
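
One way to choose among the three presets is to encode them as data and pick the fastest one that meets an accuracy target. The sketch below uses plain dictionaries and the approximate figures quoted above. `PRESETS`, `pick_preset`, and `hnsw_kwargs` are hypothetical helper names for illustration, not part of StrataRouter.

```python
# Presets copied from the settings above; latency and accuracy are the
# approximate figures quoted in this guide, not measurements.
PRESETS = {
    "speed": {"hnsw_ef_construction": 100, "hnsw_ef_search": 30, "hnsw_m": 8,
              "latency_ms": 0.6, "accuracy": 0.93},
    "balanced": {"hnsw_ef_construction": 200, "hnsw_ef_search": 50, "hnsw_m": 16,
                 "latency_ms": 1.2, "accuracy": 0.954},
    "accuracy": {"hnsw_ef_construction": 400, "hnsw_ef_search": 100, "hnsw_m": 32,
                 "latency_ms": 2.5, "accuracy": 0.972},
}


def pick_preset(min_accuracy: float) -> str:
    """Return the lowest-latency preset whose quoted accuracy meets the target."""
    candidates = [
        (preset["latency_ms"], name)
        for name, preset in PRESETS.items()
        if preset["accuracy"] >= min_accuracy
    ]
    if not candidates:
        raise ValueError(f"no preset reaches accuracy {min_accuracy}")
    return min(candidates)[1]


def hnsw_kwargs(name: str) -> dict:
    """Keep only the hnsw_* keys, suitable for RouterConfig(**kwargs)."""
    return {k: v for k, v in PRESETS[name].items() if k.startswith("hnsw_")}
```

For example, `pick_preset(0.95)` selects the balanced preset, and `hnsw_kwargs("balanced")` yields the three `hnsw_*` values shown for it above. Treat the quoted figures as starting points and re-measure on your own routes.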

Cache Optimization¶

Cache Configuration¶

config = RuntimeConfig(
    cache_enabled=True,
    cache_backend="redis",  # or "memory"
    cache_ttl=3600,  # 1 hour
    cache_max_size=10000,
    cache_similarity_threshold=0.95
)

Similarity Threshold¶

Threshold	Hit Rate	Trade-off
0.99	~60%	Very strict
0.95	~85%	Recommended
0.90	~92%	Looser matching
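
To make the threshold trade-off concrete, here is a minimal sketch of a similarity-gated cache lookup. It assumes queries are compared by cosine similarity between embedding vectors; the guide does not say how StrataRouter scores matches, so `SemanticCache` and `cosine` are illustrative stand-ins rather than the library's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy cache: a lookup hits only if the best match clears the threshold."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        best_score, best_response = 0.0, None
        for cached, response in self.entries:
            score = cosine(embedding, cached)
            if score > best_score:
                best_score, best_response = score, response
        # Below the threshold, treat the lookup as a miss.
        return best_response if best_score >= self.threshold else None
```

With one cached entry at `[1.0, 0.0]`, a query embedded at `[1.0, 0.2]` scores about 0.98: a hit at a threshold of 0.95, but a miss at 0.99. Lowering the threshold raises the hit rate at the cost of occasionally returning a response cached for a different question.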

Cache Backend Comparison¶

Backend	Latency	Scalability	Persistence
Memory	~0.1ms	Single node	No
Redis	~0.5ms	Multi-node	Yes
Memcached	~0.3ms	Multi-node	No

Batch Processing¶

Optimal Batch Size¶

# Test different batch sizes
batch_sizes = [8, 16, 32, 64]
for size in batch_sizes:
    config = RuntimeConfig(batch_size=size)
    # Measure throughput

Recommendations:
- Low latency: batch_size=8, timeout=10ms
- Balanced: batch_size=32, timeout=50ms
- High throughput: batch_size=64, timeout=100ms
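
The interplay between batch size and timeout can be sketched with a small simulation. It assumes a batch is flushed either when it is full or when the timeout has elapsed since its first item arrived, which is a common micro-batching policy; the guide does not spell out StrataRouter's exact flush rule, and `form_batches` is a hypothetical helper.

```python
def form_batches(arrivals_ms, batch_size, timeout_ms):
    """Group arrival timestamps (in ms) into batches.

    A batch closes when it reaches batch_size, or when the next arrival
    comes timeout_ms or more after the batch's first item.
    """
    batches, current = [], []
    for t in sorted(arrivals_ms):
        if current and t - current[0] >= timeout_ms:
            batches.append(current)  # timeout expired: flush a partial batch
            current = []
        current.append(t)
        if len(current) == batch_size:
            batches.append(current)  # batch is full: flush immediately
            current = []
    if current:
        batches.append(current)  # flush whatever is left at the end
    return batches
```

For arrivals at 0, 2, 4, 6, 30, and 31 ms, `batch_size=8` with a 10 ms timeout produces two partial batches, while `batch_size=2` produces three full ones. Under this policy, the timeout bounds the extra wait of the first request in each batch, which is why the low-latency recommendation pairs a small batch with a short timeout.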

Resource Limits¶

CPU Optimization¶

# Set thread pool size
config = RuntimeConfig(
    worker_threads=8,  # Match CPU cores
    blocking_threads=16
)
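
Rather than hard-coding the thread counts, they can be derived from the host at startup. The sketch below uses only the standard library; the 2:1 ratio of blocking to worker threads simply mirrors the example above and is an assumption, not a documented rule.

```python
import os


def suggested_thread_counts(blocking_ratio: int = 2) -> dict:
    """Derive thread-pool sizes from the number of CPUs the OS reports.

    os.cpu_count() returns logical CPUs and can exceed a container's CPU
    quota, so cap the result manually when running under a CPU limit.
    """
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return {
        "worker_threads": cores,
        "blocking_threads": cores * blocking_ratio,
    }
```

On an 8-core host this returns the same 8 and 16 used above; the resulting dictionary can be unpacked into `RuntimeConfig(**suggested_thread_counts())`.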

Memory Management¶

config = RouterConfig(
    max_routes=10000,
    max_index_memory_mb=512
)
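
To judge whether a memory cap is realistic, a rough estimate of index size helps. The sketch below applies a commonly cited rule of thumb for HNSW indexes (as used for hnswlib): about `dim * 4` bytes for a float32 vector plus `M * 2 * 4` bytes for base-layer links per element. StrataRouter's actual index layout and embedding dimension are not described in this guide, so the function name and the 384-dimension example are assumptions.

```python
def estimate_hnsw_memory_mb(num_vectors: int, dim: int, m: int) -> float:
    """Approximate HNSW index size in MiB.

    Assumes float32 vectors (4 bytes per dimension) and 2 * m base-layer
    links of 4 bytes each per element; upper-layer links and bookkeeping
    overhead are ignored, so treat the result as a lower bound.
    """
    bytes_per_vector = dim * 4
    bytes_per_links = m * 2 * 4
    total_bytes = num_vectors * (bytes_per_vector + bytes_per_links)
    return total_bytes / (1024 * 1024)


# Example: 10,000 routes, hypothetical 384-dimensional embeddings, M=16.
estimate = estimate_hnsw_memory_mb(10_000, dim=384, m=16)
print(f"~{estimate:.1f} MiB")
```

Under these assumptions the vectors dominate: doubling `M` from 16 to 32 adds only about 1.2 MiB for 10,000 routes, so `hnsw_m` matters most for memory when embeddings are small or route counts are very large.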

Provider Optimization¶

Timeout Tuning¶

config = RuntimeConfig(
    provider_timeout_ms=5000,  # Lower for faster failover
    routing_timeout_ms=10,
    total_timeout_ms=30000
)

Retry Strategy¶

config = RuntimeConfig(
    retry_max_attempts=3,
    retry_initial_delay_ms=100,
    retry_max_delay_ms=5000,
    retry_backoff_multiplier=2.0
)
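
The delays these settings produce can be worked out ahead of time. The sketch below assumes plain exponential backoff without jitter, and that `retry_max_attempts` counts the initial attempt (so three attempts mean two waits); neither detail is confirmed by this guide, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(max_attempts=3, initial_delay_ms=100,
                   max_delay_ms=5000, multiplier=2.0):
    """Return the wait (in ms) before each retry.

    Assumes exponential backoff with no jitter, capped at max_delay_ms,
    where max_attempts includes the first (non-retry) attempt.
    """
    delays = []
    delay = float(initial_delay_ms)
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_delay_ms))
        delay *= multiplier
    return delays


delays = backoff_delays()
# Worst case if every attempt runs into a 5000 ms provider timeout.
worst_case_ms = 3 * 5000 + sum(delays)
```

With the values above the waits are 100 ms and 200 ms, so the worst case with a 5000 ms provider timeout is 15,300 ms, inside a 30,000 ms total timeout. Raising the attempt count to 8 shows the cap taking effect: the final wait is 5000 ms rather than 6400 ms.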

Network Optimization¶

Connection Pooling¶

config = RuntimeConfig(
    connection_pool_size=100,
    connection_timeout_ms=5000,
    keep_alive=True
)
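
A pool size can be sanity-checked with Little's law: the average number of requests in flight equals the arrival rate multiplied by the average latency. The sketch below assumes each connection carries one request at a time (no HTTP/2 multiplexing); both function names are hypothetical.

```python
import math


def required_pool_size(target_rps: float, avg_latency_ms: float) -> int:
    """Connections needed to sustain target_rps at the given average latency."""
    return math.ceil(target_rps * avg_latency_ms / 1000)


def max_sustained_rps(pool_size: int, avg_latency_ms: float) -> float:
    """Upper bound on throughput for a pool of the given size."""
    return pool_size * 1000 / avg_latency_ms
```

For instance, a pool of 100 connections with a 50 ms average provider latency tops out at 2,000 requests per second, and sustaining 10,000 requests per second at 10 ms needs about 100 connections. If measured throughput plateaus near this bound, the pool rather than the router is the likely bottleneck.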

HTTP/2¶

# Enable HTTP/2 for providers
config = RuntimeConfig(
    http2_enabled=True,
    http2_max_concurrent_streams=100
)

Profiling¶

Latency Breakdown¶

from stratarouter.runtime import enable_profiling

enable_profiling()

result = executor.execute(query)
print(result.profile)

Output:

Routing: 1.2ms (10%)
Cache Lookup: 0.3ms (3%)
LLM Call: 9.8ms (84%)
State Update: 0.3ms (3%)
Total: 11.6ms

Benchmarking¶

Load Testing¶

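import asyncio
from stratarouter.testing import load_test

async def main():
    # Drive the executor with 100 concurrent users for 60 seconds
    results = await load_test(
        executor=executor,
        queries=queries,
        duration_seconds=60,
        concurrent_users=100
    )

    print(f"P99: {results.p99_ms}ms")
    print(f"RPS: {results.requests_per_second}")
    print(f"Error rate: {results.error_rate}%")

# `await` is only valid inside a coroutine, so run it on an event loop
asyncio.run(main())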
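
When reading the P99 figure, it helps to know what a percentile over latency samples means. The sketch below uses the nearest-rank definition on a list of raw samples; how `load_test` computes `p99_ms` is not documented here, so this is an illustration of the metric, not of the library's method.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# 99 fast requests and one slow outlier.
samples = [10] * 99 + [500]
p99 = percentile(samples, 99)
p100 = percentile(samples, 100)
```

Here the single 500 ms outlier shows up in the maximum but not in the P99, which stays at 10 ms. With only 100 samples the P99 rests on one or two data points, so longer runs give a more stable tail estimate.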

Best Practices¶

- Profile before optimizing: Measure to find bottlenecks
- Enable caching: Biggest performance win
- Batch when possible: Improves throughput
- Right-size HNSW: Balance speed vs accuracy
- Monitor metrics: Track impact of changes
- Test under load: Validate in staging

Common Issues¶

High Latency¶

Symptoms: P99 > 20ms

Solutions:
- Lower hnsw_ef_search
- Enable caching
- Reduce provider timeout
- Check network latency

Low Throughput¶

Symptoms: RPS < 10K

Solutions:
- Increase batch size
- Add more workers
- Scale horizontally
- Enable connection pooling

High Memory¶

Symptoms: Memory > 500MB

Solutions:
- Reduce hnsw_m
- Limit cache size
- Reduce max routes
- Use Redis for cache

Next Steps¶

Monitoring

Track performance metrics

Setup Monitoring →

Scaling

Handle more traffic

Scaling Guide →