Frequently Asked Questions¶
Answers to the most common questions about StrataRouter.
General¶
What is StrataRouter?
StrataRouter is a production-grade semantic routing engine that intelligently routes queries to the right handler — model, agent, API endpoint, or workflow — with sub-10ms latency and 95.4% accuracy. It is built in Rust with Python bindings and designed to run in production from day one.
How is StrataRouter different from other semantic routers?
Performance:
- 20–28× faster than alternatives (8.7ms P99 vs 178–245ms)
- 33–50× less memory (64MB vs 2.1–3.2GB)
- 40–47× higher throughput (18K req/s vs 380–450 req/s)
Accuracy: 95.4% vs 84.7% (semantic-router) and 82.3% (LlamaIndex) on the same benchmark.
Production features: Semantic caching (85%+ hit rate), batch processing, circuit breakers, full OpenTelemetry — not bolt-ons, built in from the start.
What languages are supported?
- Python 3.8+ — Primary API with the full feature set
- Rust 1.70+ — Native implementation and crate
- REST API — Language-agnostic HTTP interface
- gRPC — High-performance binary protocol
Is StrataRouter open source?
Yes. The Core and Runtime are MIT licensed and fully open source. Enterprise features (multi-tenancy, RBAC, compliance packs) are available under a commercial license. See Pricing.
Installation & Setup¶
How do I install StrataRouter?
That's it for the Python package. See Installation for source builds, extras, and Docker.
What are the system requirements?
Minimum: Python 3.8+, 64-bit OS (Linux, macOS, Windows), 512 MB RAM.
Recommended for production: 4+ CPU cores, 4+ GB RAM, Redis 6+ for caching, PostgreSQL 13+ for state persistence.
Can I run StrataRouter without external dependencies?
Yes. Basic routing works with zero external dependencies — just pip install stratarouter. Redis and PostgreSQL are optional and only needed for distributed caching and persistent state.
Routing & Accuracy¶
How accurate is StrataRouter?
95.4% on our public benchmark dataset. For comparison: semantic-router achieves 84.7% and LlamaIndex 82.3% on the same benchmark.
Accuracy depends heavily on the quality of your route examples. See Core Concepts for guidance on writing good routes.
How does routing work under the hood?
Every query passes through a three-stage pipeline:
- Candidate selection — HNSW approximate nearest-neighbor search finds the top-N candidate routes in sub-millisecond time.
- Hybrid scoring — Dense semantic score (64%) + BM25 sparse score (29%) + rule-based patterns (7%) are fused into a single score.
- Calibration — Isotonic regression maps the fused score to a true probability. Expected Calibration Error < 0.03.
See Routing Engine for the full deep dive.
Can I use my own embedding model?
Yes. Implement BaseEncoder with two methods — encode() and dimension — and pass it to Router(encoder=your_encoder). See Custom Encoders.
What if routing confidence is low?
Set a fallback for low-confidence results:
result = router.route(query, embedding)
route_id = result.route_id if result.confidence >= 0.5 else "fallback"
You can also configure fallback_route globally in RouterConfig.
Performance¶
How fast is StrataRouter?
| Percentile | Latency |
|---|---|
| P50 | < 1ms |
| P95 | < 6ms |
| P99 | 8.7ms |
| P99.9 | 15ms |
Throughput: ~18K req/s single-node (4 cores). Scales linearly with cores.
What's the fastest way to optimize latency?
- Use a smaller embedding dimension (384 instead of 1536)
- Reduce
max_candidates(try 3 instead of 10) - Enable SIMD (
enable_simd=True) - Disable calibration if you don't need calibrated probabilities
Does semantic caching really help?
Dramatically. At an 85% cache hit rate:
| Metric | Without Cache | With Cache |
|---|---|---|
| Avg latency | 250ms | 21ms |
| Throughput | 200 req/s | 1,500 req/s |
| Cost | $0.002/req | $0.0003/req |
The 85% figure is typical for production workloads with repeated query patterns.
Integrations¶
Which frameworks does StrataRouter support?
- LangChain — Full chain and agent integration
- LangGraph — Conditional edge routing in graphs
- CrewAI — Multi-agent crew orchestration
- AutoGen — Group chat routing
- OpenAI — Native embeddings and models
- Anthropic — Claude embeddings and models
- Google — Gemini embeddings and models
See Integrations for code examples.
Can I use StrataRouter with local models?
Yes. Register a LocalClient pointing at Ollama, vLLM, HuggingFace TGI, or any OpenAI-compatible endpoint:
Production & Deployment¶
Is StrataRouter production-ready?
Yes. It runs in production at Fortune 500 companies, healthcare providers (HIPAA), financial services, and SaaS companies handling 50K+ requests/day. The runtime includes semantic caching, batch processing, automatic retry with circuit breakers, and full OpenTelemetry observability.
What deployment options are supported?
Docker, Kubernetes, AWS / GCP / Azure VMs, and on-premises / air-gapped. See Deployment.
What monitoring is available?
- Prometheus metrics — request rate, latency histograms, cache hit rate, error rate, cost
- OpenTelemetry traces — distributed tracing across your entire request flow
- Structured JSON logs — configurable log levels and output targets
See Observability and Monitoring.
Security & Compliance¶
What compliance certifications does StrataRouter hold?
Enterprise: SOC 2 Type II · HIPAA · GDPR · ISO 27001 · FedRAMP (in progress)
Can StrataRouter handle PHI / PII?
Yes, under HIPAA-compliant Enterprise configurations: AES-256 encryption at rest, TLS 1.3 in transit, audit logging, role-based access, data residency controls, and a Business Associate Agreement (BAA) available.
Costs¶
How much does StrataRouter cost?
Open Source: Free. Full Core + Runtime, MIT licensed.
Enterprise: Custom pricing based on request volume, number of tenants, and support tier. Contact enterprise@stratarouter.dev for a quote.
How much can I save on LLM costs?
Typical production savings:
- Semantic caching: 70–80% reduction (85%+ hit rate)
- Multi-model routing: 40–60% reduction (cheap model for simple queries)
- Batch deduplication: 20–30% reduction
Combined: most customers see 60–85% total LLM cost reduction. See Pricing for a detailed example.
Example: $2,000/month → $400/month at an 80% cache hit rate.
Is there a free trial for Enterprise?
Yes — a 90-day POC with full Enterprise access, dedicated engineering support, and no credit card required. Apply here.
Troubleshooting¶
Routing accuracy is below 80%
Common causes:
- Too few examples per route (add 10+ per route)
- Routes are too semantically similar (make descriptions more distinct)
- Poor-quality embeddings (try OpenAI
text-embedding-3-smallor a fine-tuned encoder) - Threshold too low (increase
thresholdinRouterConfig)
See Troubleshooting.
Performance is slower than expected
Check:
stratarouter.simd_enabled— should beTruemax_candidates— reduce from 10 to 3 for lower latency- Embedding dimension — 384 is much faster than 1536
- Cache enabled — dramatically reduces average latency
See Performance Tuning.
ImportError on install
pip install --force-reinstall stratarouter
python -c "import stratarouter; print(stratarouter.__version__)"
If building from source, ensure Rust 1.70+ and maturin are installed. See Installation.