Benchmarks

Performance benchmarks comparing StrataRouter with semantic-router and llamaindex on latency, throughput, memory, accuracy, scaling, caching, and cost.

Executive Summary

StrataRouter outperforms both alternatives by 20-50x on latency, throughput, and memory, and by 10.7-13.1 percentage points on top-1 accuracy:

Metric	StrataRouter	semantic-router	llamaindex
P99 Latency	8.7ms	178ms	245ms
Throughput	18K req/s	450 req/s	380 req/s
Memory (1K routes)	64MB	2.1GB	3.2GB
Top-1 Accuracy	95.4%	84.7%	82.3%

Test Environment

Hardware

CPU: AMD Ryzen 9 5950X (16 cores / 32 threads @ 3.4GHz base)
RAM: 64GB DDR4-3200
Storage: NVMe SSD
OS: Ubuntu 22.04 LTS
Python: 3.11.5
Rust: 1.75.0

Dataset

Routes: 1,000 semantic categories
Queries: 10,000 test queries
Embedding: all-MiniLM-L6-v2 (384 dimensions)

Latency Benchmarks

P50, P95, P99 Latencies

System	P50	P95	P99	P99.9
StrataRouter	0.8ms	3.2ms	8.7ms	15ms
semantic-router	89ms	142ms	178ms	215ms
llamaindex	124ms	198ms	245ms	289ms

StrataRouter is 20-28x faster at P99
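
The page does not say how its percentiles are computed. One common definition is the nearest-rank method, sketched below; the sample timings are made up for illustration, and the speedup ratios simply divide the P99 values from the table (178 / 8.7 and 245 / 8.7).

```python
import math

def percentile_nearest_rank(samples_ms, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest rank: smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

def speedup(baseline_ms, candidate_ms):
    """How many times lower the candidate's latency is than the baseline's."""
    return baseline_ms / candidate_ms

# Hypothetical per-request timings in ms (not benchmark data).
samples = [0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1.2, 2.5, 3.0, 9.0]
print(percentile_nearest_rank(samples, 50))  # 0.8
print(percentile_nearest_rank(samples, 99))  # 9.0

# P99 values from the table above.
print(round(speedup(178, 8.7), 1))  # 20.5 (vs. semantic-router)
print(round(speedup(245, 8.7), 1))  # 28.2 (vs. llamaindex)
```

With only ten samples, P99 is simply the maximum; tail percentiles such as P99.9 need thousands of requests before they are meaningful.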

Latency Distribution

Percentile	StrataRouter	semantic-router
P10	0.4ms	n/a
P25	0.6ms	n/a
P50	0.8ms	89ms
P75	1.5ms	n/a
P90	2.8ms	n/a
P95	3.2ms	142ms
P99	8.7ms	178ms
P99.9	15ms	215ms

Throughput Benchmarks

Requests Per Second

Concurrency	StrataRouter	semantic-router	llamaindex
1 thread	1,200	11	8
4 threads	4,500	42	31
8 threads	8,800	78	58
16 threads	16,200	142	105
32 threads	18,500	380	290

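StrataRouter scales near-linearly up to 16 threads (16,200 req/s) and levels off at about 18.5K req/s at 32 threads, beyond the test CPU's 16 physical cores.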
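
Assuming the load generator is closed-loop (each client thread waits for its response before sending the next request; the page does not say), Little's law (L = λW) links these rows to latency: mean time per request = requests in flight / throughput. A minimal sketch:

```python
def mean_latency_ms(throughput_rps, in_flight=1):
    """Little's law (L = lambda * W) solved for W, returned in milliseconds.

    Assumes a closed-loop load generator that keeps `in_flight` requests
    outstanding at all times (one per client thread).
    """
    return in_flight / throughput_rps * 1000

# Single-thread rows from the table above.
print(round(mean_latency_ms(1_200), 2))  # 0.83 (StrataRouter)
print(round(mean_latency_ms(11), 1))     # 90.9 (semantic-router)

# StrataRouter with 32 client threads.
print(round(mean_latency_ms(18_500, 32), 2))  # 1.73
```

Under that assumption the single-thread figures line up with the P50 latencies (0.8ms and 89ms), and the mean time per request roughly doubles at 32 threads, which fits requests queuing for cores once threads outnumber physical cores.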

Sustained Load Test

Test: 1 hour at 10K req/s

StrataRouter:
  Total requests: 36,000,000
  Success rate: 100%
  Avg latency: 0.9ms
  P99 latency: 8.9ms
  Memory: 64MB (stable)
  CPU: 45% avg

semantic-router:
  Total requests: 1,620,000 (about 450 req/s; the 10K req/s target was not sustained)
  Success rate: 98.7%
  Avg latency: 95ms
  P99 latency: 189ms
  Memory: 2.3GB (growing)
  CPU: 89% avg
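
The total-request counts can be turned back into achieved rates by dividing by the test duration. A small sketch using the figures above; `TARGET_RPS` and `DURATION_S` are illustrative names, not settings from the benchmark suite:

```python
def effective_rate(total_requests, duration_s):
    """Average completed requests per second over the test window."""
    return total_requests / duration_s

def shortfall(target_rps, achieved_rps):
    """Fraction of the target load that was not served."""
    return 1 - achieved_rps / target_rps

DURATION_S = 3600   # 1-hour test
TARGET_RPS = 10_000

for name, total in [("StrataRouter", 36_000_000), ("semantic-router", 1_620_000)]:
    rate = effective_rate(total, DURATION_S)
    print(f"{name}: {rate:,.0f} req/s, shortfall {shortfall(TARGET_RPS, rate):.1%}")
# StrataRouter: 10,000 req/s, shortfall 0.0%
# semantic-router: 450 req/s, shortfall 95.5%
```

The 450 req/s figure matches semantic-router's throughput in the Executive Summary, so it appears to have run at its own ceiling for the whole hour.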

Memory Benchmarks

Memory Usage by Route Count

Routes	StrataRouter	semantic-router	llamaindex
100	8MB	180MB	245MB
1K	64MB	2.1GB	3.2GB
10K	352MB	21GB	32GB
100K	3.2GB	210GB	OOM

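At 1K routes StrataRouter uses 33-50x less memory; across the measured route counts the ratio ranges from about 22x (100 routes, vs. semantic-router) to about 91x (10K routes, vs. llamaindex).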
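
Because the alternatives grow faster with route count than StrataRouter does, the memory ratio is not a single number. This sketch recomputes it per row of the table, reading 1GB as 1,000MB (the page does not say whether it uses decimal or binary units):

```python
# Memory from the table above, in MB (1GB read as 1,000MB).
MEMORY_MB = {
    100:     {"StrataRouter": 8,     "semantic-router": 180,     "llamaindex": 245},
    1_000:   {"StrataRouter": 64,    "semantic-router": 2_100,   "llamaindex": 3_200},
    10_000:  {"StrataRouter": 352,   "semantic-router": 21_000,  "llamaindex": 32_000},
    100_000: {"StrataRouter": 3_200, "semantic-router": 210_000, "llamaindex": None},  # OOM
}

def memory_ratios(table):
    """For each route count, how many times more memory each alternative uses."""
    out = {}
    for routes, row in table.items():
        base = row["StrataRouter"]
        out[routes] = {
            name: round(mb / base, 1)
            for name, mb in row.items()
            if name != "StrataRouter" and mb is not None  # skip the OOM entry
        }
    return out

for routes, ratios in memory_ratios(MEMORY_MB).items():
    print(routes, ratios)
```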

Memory Profile

StrataRouter Memory Breakdown (1K routes):
  HNSW Index:      42MB  (66%)
  Embeddings:      15MB  (23%)
  Metadata:         5MB  (8%)
  Overhead:         2MB  (3%)
  ─────────────────────
  Total:           64MB  (100%)

semantic-router Memory Breakdown (1K routes):
  Embeddings:     1.5GB  (71%)
  Index:          450MB  (21%)
  Cache:          100MB  (5%)
  Overhead:        50MB  (2%)
  ─────────────────────
  Total:          2.1GB  (100%)
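
The Embeddings lines can be sanity-checked against the raw vector size: all-MiniLM-L6-v2 produces 384-dimensional vectors, which take 1,536 bytes each if stored as float32. The sketch assumes float32 storage; the page does not state how many example utterances each route keeps, so the 10-per-route figure is purely hypothetical.

```python
def embedding_bytes(num_vectors, dim=384, bytes_per_value=4):
    """Raw storage for dense float32 embeddings, excluding index overhead."""
    return num_vectors * dim * bytes_per_value

print(embedding_bytes(1))      # 1536 bytes per vector
print(embedding_bytes(1_000))  # 1536000 bytes: about 1.5MB for one vector per route

# Hypothetical: 10 example utterances per route, 1K routes.
print(embedding_bytes(10 * 1_000) / 1e6)  # 15.36 (MB)
```

Under that assumption the raw vectors come to about 15MB, the same order as StrataRouter's Embeddings line; index structures such as HNSW graphs are stored on top of this.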

Accuracy Benchmarks

Routing Accuracy

System	Top-1 Accuracy	Top-3 Accuracy	Top-5 Accuracy
StrataRouter	95.4%	98.7%	99.2%
semantic-router	84.7%	92.1%	94.8%
llamaindex	82.3%	89.5%	92.1%

Accuracy by Query Type

Query Type	StrataRouter	semantic-router	llamaindex
Exact match	99.1%	92.3%	89.7%
Semantic	94.8%	83.2%	81.5%
Ambiguous	87.2%	76.8%	73.9%
Out-of-domain	91.5%	82.1%	78.4%

Confidence Calibration

Expected Calibration Error (ECE):

System	ECE	Quality
StrataRouter	0.027	Excellent
semantic-router	0.142	Poor
llamaindex	0.189	Very Poor

Lower is better. ECE < 0.05 is excellent.
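
ECE groups predictions by confidence and averages the gap between accuracy and mean confidence in each group, weighted by group size: ECE = Σ_b (|B_b| / n) · |acc(B_b) - conf(B_b)|. A minimal sketch with 10 equal-width bins; the bin count and binning scheme behind the table are not stated, so treat these as assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error with equal-width confidence bins.

    confidences: confidence reported for the chosen route, each in [0, 1].
    correct: True where the chosen route was the right one (same length).
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bin i holds [i/n_bins, (i+1)/n_bins); a confidence of 1.0 goes in the last bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of all samples.
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

# Toy data (not from the benchmark): four answers at 0.9 confidence, three correct;
# two answers at 0.6 confidence, one correct.
confs = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
hits = [True, True, True, False, True, False]
print(round(expected_calibration_error(confs, hits), 3))  # 0.133
```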

Scaling Benchmarks

Horizontal Scaling

Target load: 100K req/s, distributed across instances (percentages show capacity relative to the target):

1 instance:  18K req/s  (18%)
2 instances: 36K req/s  (36%)
4 instances: 72K req/s  (72%)
8 instances: 144K req/s (144%)  ← Linear!

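Throughput scales linearly with instance count (18K req/s per instance); 6 instances (108K req/s) would cover the 100K req/s target.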
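
If per-instance throughput stays constant as instances are added, capacity planning reduces to a ceiling division. A sketch under that linear-scaling assumption; the `headroom` parameter is a hypothetical illustration, not a StrataRouter setting:

```python
import math

def instances_needed(target_rps, per_instance_rps, headroom=0.0):
    """Smallest instance count whose combined capacity covers the target.

    headroom: extra capacity fraction to keep in reserve (0.2 = 20%).
    Assumes throughput adds linearly across instances, as measured above.
    """
    return math.ceil(target_rps * (1 + headroom) / per_instance_rps)

print(instances_needed(100_000, 18_000))       # 6
print(instances_needed(100_000, 18_000, 0.2))  # 7
```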

Vertical Scaling

CPU Cores	Throughput	Efficiency
1	1,200 req/s	100%
2	2,400 req/s	100%
4	4,800 req/s	100%
8	9,200 req/s	96%
16	16,500 req/s	86%

Near-linear scaling up to 16 cores
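
The Efficiency column is measured throughput divided by ideal linear throughput (cores × the 1-core rate). A short sketch that recomputes it from the table:

```python
def scaling_efficiency(throughput_rps, cores, single_core_rps):
    """Measured throughput as a fraction of perfect linear scaling."""
    return throughput_rps / (cores * single_core_rps)

# (cores, req/s) pairs from the table above.
measurements = [(1, 1_200), (2, 2_400), (4, 4_800), (8, 9_200), (16, 16_500)]
single_core = measurements[0][1]

for cores, rps in measurements:
    print(f"{cores} cores: {scaling_efficiency(rps, cores, single_core):.0%}")
```

The last two rows come out at 9,200 / 9,600 ≈ 96% and 16,500 / 19,200 ≈ 86%, matching the table.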

Cache Performance

Hit Rate vs. Latency

Cache Hit Rate	Avg Latency	P99 Latency	Cost Savings
0% (no cache)	45ms	250ms	0%
50%	23ms	125ms	40%
75%	12ms	65ms	65%
85% (prod)	7ms	40ms	75%
95%	3ms	15ms	90%
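
The Avg Latency column is consistent with a simple mixture model: avg = h · t_hit + (1 - h) · t_miss, where h is the hit rate. Taking the 45ms no-cache row as the miss latency and assuming about 1ms per cache hit (an inferred figure; the table does not state it), the model stays within about 0.6ms of every row. It does not apply to the P99 column, because tail percentiles do not mix linearly.

```python
def expected_latency_ms(hit_rate, hit_ms, miss_ms):
    """Mean latency when a fraction `hit_rate` of requests is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

MISS_MS = 45.0  # "0% (no cache)" row
HIT_MS = 1.0    # assumption: typical cache-hit latency, not given in the table

for h in (0.50, 0.75, 0.85, 0.95):
    print(f"{h:.0%}: {expected_latency_ms(h, HIT_MS, MISS_MS):.1f}ms")
# 50%: 23.0ms
# 75%: 12.0ms
# 85%: 7.6ms
# 95%: 3.2ms
```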

Cache Hit Rate by Time

Production workload (30 days):

Week 1: 78% hit rate
Week 2: 83% hit rate
Week 3: 85% hit rate
Week 4: 87% hit rate (stable)

Average: 83% hit rate (unweighted mean of weeks 1-4); 85-87% from week 3 onward
Cost savings: $12,500/month
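
A monthly hit rate can be reported as the plain mean of the weekly figures (83.25% here) or weighted by each week's request volume, which gives a different number whenever traffic is uneven. A sketch of both; the weekly volumes are hypothetical, since the page does not report them.

```python
def overall_hit_rate(periods):
    """Traffic-weighted hit rate: total hits divided by total requests.

    periods: list of (requests, hit_rate) pairs.
    """
    total = sum(req for req, _ in periods)
    hits = sum(req * rate for req, rate in periods)
    return hits / total

weekly_rates = [0.78, 0.83, 0.85, 0.87]

# Unweighted mean of the four weekly figures above.
print(round(sum(weekly_rates) / len(weekly_rates), 4))  # 0.8325

# Hypothetical weekly volumes (not from the source): heavier late-month
# traffic pulls the weighted rate toward the later, higher weeks.
volumes = [100_000, 120_000, 150_000, 200_000]
print(round(overall_hit_rate(list(zip(volumes, weekly_rates))), 4))
```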

Real-World Performance

Customer Support Routing

Workload: 50K queries/day
Peak: 200 req/s
Routes: 25 categories

Metrics:
  Avg latency: 1.2ms
  P99 latency: 8.5ms
  Accuracy: 96.2%
  Cache hit rate: 89%
  Cost/day: $45 (vs $320 without caching)

Multi-Model LLM System

Workload: 1M queries/day
Peak: 2,500 req/s
Routes: 5 models

Metrics:
  Avg latency: 0.9ms
  P99 latency: 7.8ms
  Accuracy: 94.1%
  Cost savings: $18K/month

Cost Analysis

Infrastructure Costs

StrataRouter (1K routes, 100K req/day):

Compute:     $20/month   (t3.small)
Redis:       $15/month   (cache.t3.micro)
PostgreSQL:  $25/month   (db.t3.micro)
────────────────────────────
Total:       $60/month

semantic-router (1K routes, 100K req/day):

Compute:     $180/month  (c5.2xlarge, 16 GiB RAM)
Redis:       $80/month   (cache.r5.large)
PostgreSQL:  $25/month   (db.t3.micro)
────────────────────────────
Total:       $285/month

Savings: $225/month (79%)
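
The savings line follows directly from the two itemized lists. A short sketch of the arithmetic, using the monthly prices listed above:

```python
def monthly_total(line_items):
    """Sum of monthly line-item costs in dollars."""
    return sum(line_items.values())

strata = {"compute": 20, "redis": 15, "postgresql": 25}
semantic = {"compute": 180, "redis": 80, "postgresql": 25}

savings = monthly_total(semantic) - monthly_total(strata)
print(savings)                                     # 225
print(f"{savings / monthly_total(semantic):.0%}")  # 79%
```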

LLM Cost Savings

Without caching:
  100K queries × $0.002 = $200/day

With 85% cache hit rate:
  15K queries × $0.002 = $30/day

Monthly savings: $5,100
Annual savings: $61,200
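
These figures assume every cache hit skips one full-price LLM call and ignore the cost of running the cache itself. A sketch of that model; the monthly figure uses a 30-day month, and the annual figure above equals 12 × $5,100 (a 365-day year gives slightly more):

```python
def daily_llm_cost(queries_per_day, price_per_query, hit_rate=0.0):
    """LLM spend per day when cache hits avoid the model call entirely."""
    return queries_per_day * (1 - hit_rate) * price_per_query

baseline = daily_llm_cost(100_000, 0.002)               # $200/day
cached = daily_llm_cost(100_000, 0.002, hit_rate=0.85)  # $30/day
daily_savings = baseline - cached                       # $170/day

print(round(daily_savings * 30))   # 5100  (30-day month)
print(round(daily_savings * 360))  # 61200 (12 x 30-day months)
print(round(daily_savings * 365))  # 62050 (365-day year)
```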

Performance Comparison Summary

Speed Comparison

                StrataRouter  semantic-router  llamaindex
Latency P99        8.7ms         178ms          245ms
Throughput         18K/s         450/s          380/s
Speedup (P99)      ────          20x            28x

Efficiency Comparison

                StrataRouter  semantic-router  llamaindex
Memory (1K)        64MB          2.1GB          3.2GB
CPU Usage          45%           89%            92%
Cost/month         $60           $285           $340

Quality Comparison

                StrataRouter  semantic-router  llamaindex
Accuracy           95.4%         84.7%          82.3%
ECE                0.027         0.142          0.189
Quality            ★★★★★         ★★★☆☆          ★★☆☆☆

Reproduction

Run Benchmarks

# Clone repository
git clone https://github.com/stratarouter/stratarouter
cd stratarouter

# Install dependencies
pip install -r requirements.txt

# Run benchmark suite
python benchmarks/run_all.py

# Generate report
python benchmarks/generate_report.py

Benchmark Configuration

# benchmarks/config.py
ROUTES = 1000
QUERIES = 10000
CONCURRENCY = [1, 4, 8, 16, 32]
DURATION = 3600  # 1 hour

Conclusion

StrataRouter delivers:

✅ 20-28x lower P99 latency than alternatives
✅ 40-47x higher throughput
✅ 33-50x less memory at 1K routes
✅ 10.7-13.1 percentage points higher top-1 accuracy
✅ 79% lower infrastructure cost than semantic-router

Perfect for production AI systems.

Next Steps

Performance Tuning - Optimize for your workload
Production Deployment - Deploy to production
Monitoring - Track performance