Performance Tuning

Optimization Guide for Production

Maximize StrataRouter performance for your specific workload.


Quick Wins

Enable Semantic Caching

Impact: 70-80% cost reduction, 5-10x faster responses

config = RuntimeConfig(cache_enabled=True)
executor = RuntimeExecutor(router, config=config)

Enable Batch Processing

Impact: 3-5x throughput improvement

config = RuntimeConfig(
    batch_size=32,
    batch_timeout_ms=50
)

Optimize HNSW Parameters

Impact: 20-40% latency reduction

config = RouterConfig(
    hnsw_ef_search=30,  # Lower for speed
    hnsw_m=8  # Lower for memory
)

HNSW Index Tuning

Parameter Guide

Parameter        Impact                 Recommendation
ef_construction  Build time, accuracy   100-400
ef_search        Search time, accuracy  30-100
M                Memory, accuracy       8-32

Workload-Specific Settings

Speed-Optimized:

config = RouterConfig(
    hnsw_ef_construction=100,
    hnsw_ef_search=30,
    hnsw_m=8
)
# Latency: ~0.6ms, Accuracy: ~93%

Balanced:

config = RouterConfig(
    hnsw_ef_construction=200,
    hnsw_ef_search=50,
    hnsw_m=16
)
# Latency: ~1.2ms, Accuracy: ~95.4%

Accuracy-Optimized:

config = RouterConfig(
    hnsw_ef_construction=400,
    hnsw_ef_search=100,
    hnsw_m=32
)
# Latency: ~2.5ms, Accuracy: ~97.2%
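
One way to choose between the three presets is to start from a latency budget and take the most accurate preset that fits. The sketch below is illustrative only: `PRESETS` and `pick_preset` are hypothetical names, not part of StrataRouter, and the latency and accuracy figures are the approximate values quoted above, which will vary with hardware and route count.

```python
# Figures copied from the three presets above; indicative only.
PRESETS = {
    "speed": {"hnsw_ef_construction": 100, "hnsw_ef_search": 30,
              "hnsw_m": 8, "latency_ms": 0.6, "accuracy": 0.930},
    "balanced": {"hnsw_ef_construction": 200, "hnsw_ef_search": 50,
                 "hnsw_m": 16, "latency_ms": 1.2, "accuracy": 0.954},
    "accuracy": {"hnsw_ef_construction": 400, "hnsw_ef_search": 100,
                 "hnsw_m": 32, "latency_ms": 2.5, "accuracy": 0.972},
}


def pick_preset(max_latency_ms):
    """Return the most accurate preset within the latency budget, or None."""
    eligible = [name for name, p in PRESETS.items()
                if p["latency_ms"] <= max_latency_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda name: PRESETS[name]["accuracy"])


if __name__ == "__main__":
    for budget in (0.5, 1.5, 5.0):
        print(budget, pick_preset(budget))
```

A `None` result means no preset meets the budget, in which case the routing latency target itself probably needs revisiting.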


Cache Optimization

Cache Configuration

config = RuntimeConfig(
    cache_enabled=True,
    cache_backend="redis",  # or "memory"
    cache_ttl=3600,  # 1 hour
    cache_max_size=10000,
    cache_similarity_threshold=0.95
)
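
To make `cache_ttl` and `cache_max_size` concrete, here is a minimal sketch of a cache that expires entries after a time-to-live and evicts the least recently used entry once it is full. It is not StrataRouter's implementation: `TTLCache` is a hypothetical class, and the LRU eviction policy is an assumption, since the configuration above does not say which policy the real cache uses.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Tiny LRU cache with per-entry expiry (illustrative only)."""

    def __init__(self, max_size, ttl_seconds, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self.clock() + self.ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict least recently used


if __name__ == "__main__":
    cache = TTLCache(max_size=10000, ttl_seconds=3600)
    cache.put("what is my balance?", "route: billing")
    print(cache.get("what is my balance?"))
```

A shorter TTL keeps answers fresher at the cost of hit rate; a smaller maximum size bounds memory at the cost of evicting entries that might have been reused.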

Similarity Threshold

Threshold  Hit Rate  Trade-off
0.99       ~60%      Very strict
0.95       ~85%      Recommended
0.90       ~92%      Looser matching
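
The trade-off in the table comes from how a threshold gates cache hits: a query reuses a cached response only if its embedding is similar enough to a cached one. The sketch below assumes cosine similarity over embedding vectors, which the text above does not confirm; `cosine_similarity` and `lookup` are hypothetical helpers, and the two-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def lookup(query_vec, cached, threshold):
    """Return the closest cached response if it meets the threshold, else None."""
    best_score, best_response = -1.0, None
    for vec, response in cached:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None


if __name__ == "__main__":
    cached = [([1.0, 0.0], "answer A"), ([0.0, 1.0], "answer B")]
    query = [0.9, 0.3]  # similarity to "answer A" is just under 0.95
    print(lookup(query, cached, threshold=0.95))
    print(lookup(query, cached, threshold=0.90))
```

The same query misses at 0.95 and hits at 0.90, which is why a looser threshold raises the hit rate while increasing the chance of returning a response to a question that was only roughly the same.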

Cache Backend Comparison

Backend    Latency  Scalability  Persistence
Memory     ~0.1ms   Single node  No
Redis      ~0.5ms   Multi-node   Yes
Memcached  ~0.3ms   Multi-node   No

Batch Processing

Optimal Batch Size

# Test different batch sizes
batch_sizes = [8, 16, 32, 64]
for size in batch_sizes:
    config = RuntimeConfig(batch_size=size)
    # Measure throughput

Recommendations:
- Low latency: batch_size=8, timeout=10ms
- Balanced: batch_size=32, timeout=50ms
- High throughput: batch_size=64, timeout=100ms
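
The recommendations above trade latency for throughput because a batch is typically dispatched when it fills up or when its timeout expires, whichever comes first. The following sketch simulates that rule on a list of arrival times. It assumes that flush rule (the text above does not spell it out), and `simulate_batches` is a hypothetical helper, not a StrataRouter function.

```python
def simulate_batches(arrivals_ms, batch_size, timeout_ms):
    """Group sorted arrival times into batches.

    Returns a list of (flush_time_ms, arrivals_in_batch) tuples.
    """
    batches = []
    current = []
    for t in arrivals_ms:
        # The open batch's deadline passed before this request arrived.
        if current and t >= current[0] + timeout_ms:
            batches.append((current[0] + timeout_ms, current))
            current = []
        current.append(t)
        if len(current) == batch_size:
            batches.append((t, current))  # a full batch flushes immediately
            current = []
    if current:
        # The final partial batch waits out its timeout.
        batches.append((current[0] + timeout_ms, current))
    return batches


if __name__ == "__main__":
    arrivals = [0, 2, 4, 6, 40, 41]
    for flush, items in simulate_batches(arrivals, batch_size=4, timeout_ms=10):
        # The first request in a batch waits the longest.
        print(f"flush at {flush}ms, size {len(items)}, "
              f"longest wait {flush - items[0]}ms")
```

Under this rule the timeout is the upper bound on the queueing delay batching adds to any single request, so it should be set from the latency budget first and the batch size tuned afterwards.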


Resource Limits

CPU Optimization

# Set thread pool size
config = RuntimeConfig(
    worker_threads=8,  # Match CPU cores
    blocking_threads=16
)
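
Rather than hard-coding the core count, the values can be derived at startup. This is a sketch under stated assumptions: `usable_cores` and `suggest_thread_counts` are hypothetical helpers, and the 2x ratio for blocking threads simply mirrors the 8/16 example above rather than any documented rule.

```python
import os


def usable_cores():
    """Cores this process may run on, falling back to the machine total."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # cpu_count() may return None


def suggest_thread_counts(blocking_factor=2):
    cores = usable_cores()
    return {
        "worker_threads": cores,
        "blocking_threads": cores * blocking_factor,
    }


if __name__ == "__main__":
    print(suggest_thread_counts())
```

Note that `os.cpu_count()` reports logical CPUs on the host, so in a container with a CPU quota the result can still overstate what the process is actually allowed to use.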

Memory Management

config = RouterConfig(
    max_routes=10000,
    max_index_memory_mb=512
)
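
A rough estimate helps check whether a route count fits within a memory limit. The sketch below uses a common rule of thumb for HNSW indexes: each vector stores its float32 components plus up to 2*M neighbour ids in the base layer. It ignores upper layers and allocator overhead, the 384-dimension embedding size is an assumption, and `estimate_hnsw_bytes` is a hypothetical helper, so treat the result as an order-of-magnitude figure.

```python
def estimate_hnsw_bytes(num_vectors, dim, m, bytes_per_float=4, bytes_per_link=4):
    """Approximate index size: raw vectors plus base-layer neighbour links."""
    vector_bytes = dim * bytes_per_float
    link_bytes = 2 * m * bytes_per_link  # base layer keeps up to 2*M neighbours
    return num_vectors * (vector_bytes + link_bytes)


if __name__ == "__main__":
    max_routes = 10_000
    dim = 384  # assumed embedding dimension
    for m in (8, 16, 32):
        mb = estimate_hnsw_bytes(max_routes, dim, m) / (1024 * 1024)
        print(f"M={m}: ~{mb:.1f} MB")
```

At this scale the vectors themselves dominate, so lowering M saves relatively little; the link overhead matters more with low-dimensional embeddings or much larger route counts.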

Provider Optimization

Timeout Tuning

config = RuntimeConfig(
    provider_timeout_ms=5000,  # Lower for faster failover
    routing_timeout_ms=10,
    total_timeout_ms=30000
)

Retry Strategy

config = RuntimeConfig(
    retry_max_attempts=3,
    retry_initial_delay_ms=100,
    retry_max_delay_ms=5000,
    retry_backoff_multiplier=2.0
)
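
The retry settings interact with the timeouts, so it is worth checking that the worst case still fits inside the total budget. The sketch assumes plain exponential backoff without jitter, delay = min(initial * multiplier^n, max), and that `retry_max_attempts` counts the first attempt. Neither is confirmed above, and both function names are hypothetical.

```python
def backoff_delays(max_attempts, initial_ms, max_ms, multiplier):
    """Delays before each retry: initial * multiplier**n, capped at max_ms."""
    return [min(initial_ms * multiplier ** n, max_ms)
            for n in range(max_attempts - 1)]


def worst_case_ms(max_attempts, provider_timeout_ms, delays, routing_timeout_ms=0):
    """Every attempt times out and every backoff delay is waited in full."""
    return routing_timeout_ms + max_attempts * provider_timeout_ms + sum(delays)


if __name__ == "__main__":
    delays = backoff_delays(max_attempts=3, initial_ms=100,
                            max_ms=5000, multiplier=2.0)
    worst = worst_case_ms(3, provider_timeout_ms=5000, delays=delays,
                          routing_timeout_ms=10)
    total_timeout_ms = 30000
    print(delays, worst, worst <= total_timeout_ms)
```

With the values shown in the two configurations, the worst case is about 15.3 seconds, comfortably inside the 30-second total. Raising either the attempt count or the provider timeout should prompt the same check.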

Network Optimization

Connection Pooling

config = RuntimeConfig(
    connection_pool_size=100,
    connection_timeout_ms=5000,
    keep_alive=True
)

HTTP/2

# Enable HTTP/2 for providers
config = RuntimeConfig(
    http2_enabled=True,
    http2_max_concurrent_streams=100
)

Profiling

Latency Breakdown

from stratarouter.runtime import enable_profiling

enable_profiling()

result = executor.execute(query)
print(result.profile)

Output:

Routing: 1.2ms (10%)
Cache Lookup: 0.3ms (3%)
LLM Call: 9.8ms (84%)
State Update: 0.3ms (3%)
Total: 11.6ms
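
To decide where to spend tuning effort, it helps to turn the stage timings into shares of the total and find the dominant one. The sketch works on a plain dictionary of timings; the structure of `result.profile` is not documented above, so the `stages` mapping is an assumed stand-in and `breakdown` is a hypothetical helper.

```python
def breakdown(stages_ms):
    """Return (total_ms, {stage: percent_of_total})."""
    total = sum(stages_ms.values())
    shares = {name: 100 * ms / total for name, ms in stages_ms.items()}
    return total, shares


if __name__ == "__main__":
    stages = {
        "Routing": 1.2,
        "Cache Lookup": 0.3,
        "LLM Call": 9.8,
        "State Update": 0.3,
    }
    total, shares = breakdown(stages)
    dominant = max(shares, key=shares.get)
    for name, pct in shares.items():
        print(f"{name}: {stages[name]}ms ({pct:.0f}%)")
    print(f"Total: {total:.1f}ms, dominant stage: {dominant}")
```

When the provider call dominates like this, index tuning can only improve a small fraction of the end-to-end latency, and caching or provider timeouts are the more effective levers.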


Benchmarking

Load Testing

import asyncio
from stratarouter.testing import load_test

async def main():
    # `await` is only valid inside a coroutine, so wrap the call
    results = await load_test(
        executor=executor,
        queries=queries,
        duration_seconds=60,
        concurrent_users=100
    )

    print(f"P99: {results.p99_ms}ms")
    print(f"RPS: {results.requests_per_second}")
    print(f"Error rate: {results.error_rate}%")

asyncio.run(main())
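
For readers who collect raw latencies themselves, P99 can be computed with the nearest-rank method: sort the samples and take the smallest value that at least 99% of them do not exceed. This is a generic sketch, not how `load_test` computes `p99_ms`, which is not documented above; `percentile` is a hypothetical helper, and other tools may interpolate and report slightly different values.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Mostly fast requests with a slow tail.
    latencies_ms = [1.0] * 95 + [8.0, 12.0, 20.0, 35.0, 250.0]
    mean = sum(latencies_ms) / len(latencies_ms)
    print(f"mean: {mean:.2f}ms")
    print(f"P50: {percentile(latencies_ms, 50)}ms")
    print(f"P99: {percentile(latencies_ms, 99)}ms")
```

The mean and median hide the slow tail that a small share of users actually experience, which is why the load test reports P99.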

Best Practices

- Profile before optimizing: measure to find bottlenecks
- Enable caching: the biggest performance win
- Batch when possible: improves throughput
- Right-size HNSW: balance speed against accuracy
- Monitor metrics: track the impact of changes
- Test under load: validate in staging


Common Issues

High Latency

Symptoms: P99 > 20ms

Solutions:
- Lower hnsw_ef_search
- Enable caching
- Reduce provider timeout
- Check network latency

Low Throughput

Symptoms: RPS < 10K

Solutions:
- Increase batch size
- Add more workers
- Scale horizontally
- Enable connection pooling

High Memory

Symptoms: Memory > 500MB

Solutions:
- Reduce hnsw_m
- Limit cache size
- Reduce max routes
- Use Redis for cache


Next Steps

Monitoring

Track performance metrics

Setup Monitoring →

Scaling

Handle more traffic

Scaling Guide →