Frequently Asked Questions¶

Answers to the most common questions about StrataRouter.

General¶

What is StrataRouter?

StrataRouter is a production-grade semantic routing engine that intelligently routes queries to the right handler — model, agent, API endpoint, or workflow — with sub-10ms latency and 95.4% accuracy. It is built in Rust with Python bindings and designed to run in production from day one.

How is StrataRouter different from other semantic routers?

Performance:

20–28× faster than alternatives (8.7ms P99 vs 178–245ms)
33–50× less memory (64MB vs 2.1–3.2GB)
40–47× higher throughput (18K req/s vs 380–450 req/s)

Accuracy: 95.4% vs 84.7% (semantic-router) and 82.3% (LlamaIndex) on the same benchmark.

Production features: Semantic caching (85%+ hit rate), batch processing, circuit breakers, full OpenTelemetry — not bolt-ons, built in from the start.

What languages are supported?

Python 3.8+ — Primary API with the full feature set
Rust 1.70+ — Native implementation and crate
REST API — Language-agnostic HTTP interface
gRPC — High-performance binary protocol

Is StrataRouter open source?

Yes. The Core and Runtime are MIT licensed and fully open source. Enterprise features (multi-tenancy, RBAC, compliance packs) are available under a commercial license. See Pricing.

Installation & Setup¶

How do I install StrataRouter?

pip install stratarouter

That's it for the Python package. See Installation for source builds, extras, and Docker.

What are the system requirements?

Minimum: Python 3.8+, 64-bit OS (Linux, macOS, Windows), 512 MB RAM.

Recommended for production: 4+ CPU cores, 4+ GB RAM, Redis 6+ for caching, PostgreSQL 13+ for state persistence.

Can I run StrataRouter without external dependencies?

Yes. Basic routing works with zero external dependencies — just pip install stratarouter. Redis and PostgreSQL are optional and only needed for distributed caching and persistent state.

Routing & Accuracy¶

How accurate is StrataRouter?

95.4% on our public benchmark dataset. For comparison: semantic-router achieves 84.7% and LlamaIndex 82.3% on the same benchmark.

Accuracy depends heavily on the quality of your route examples. See Core Concepts for guidance on writing good routes.

How does routing work under the hood?

Every query passes through a three-stage pipeline:

Candidate selection — HNSW approximate nearest-neighbor search finds the top-N candidate routes in sub-millisecond time.
Hybrid scoring — Dense semantic score (64%) + BM25 sparse score (29%) + rule-based patterns (7%) are fused into a single score.
Calibration — Isotonic regression maps the fused score to a true probability. Expected Calibration Error < 0.03.

See Routing Engine for the full deep dive.

Can I use my own embedding model?

Yes. Implement BaseEncoder with two methods — encode() and dimension — and pass it to Router(encoder=your_encoder). See Custom Encoders.

What if routing confidence is low?

Set a fallback for low-confidence results:

result = router.route(query, embedding)
route_id = result.route_id if result.confidence >= 0.5 else "fallback"

You can also configure fallback_route globally in RouterConfig.

Performance¶

How fast is StrataRouter?

Percentile	Latency
P50	< 1ms
P95	< 6ms
P99	8.7ms
P99.9	15ms

Throughput: ~18K req/s single-node (4 cores). Scales linearly with cores.

What's the fastest way to optimize latency?

Use a smaller embedding dimension (384 instead of 1536)
Reduce max_candidates (try 3 instead of 10)
Enable SIMD (enable_simd=True)
Disable calibration if you don't need calibrated probabilities

Does semantic caching really help?

Dramatically. At an 85% cache hit rate:

Metric	Without Cache	With Cache
Avg latency	250ms	21ms
Throughput	200 req/s	1,500 req/s
Cost	$0.002/req	$0.0003/req

The 85% figure is typical for production workloads with repeated query patterns.

Integrations¶

Which frameworks does StrataRouter support?

LangChain — Full chain and agent integration
LangGraph — Conditional edge routing in graphs
CrewAI — Multi-agent crew orchestration
AutoGen — Group chat routing
OpenAI — Native embeddings and models
Anthropic — Claude embeddings and models
Google — Gemini embeddings and models

See Integrations for code examples.

Can I use StrataRouter with local models?

Yes. Register a LocalClient pointing at Ollama, vLLM, HuggingFace TGI, or any OpenAI-compatible endpoint:

from stratarouter_runtime.clients import LocalClient
bridge.register_client("local", LocalClient(endpoint="http://localhost:11434"))

Production & Deployment¶

Is StrataRouter production-ready?

Yes. It runs in production at Fortune 500 companies, healthcare providers (HIPAA), financial services, and SaaS companies handling 50K+ requests/day. The runtime includes semantic caching, batch processing, automatic retry with circuit breakers, and full OpenTelemetry observability.

What deployment options are supported?

Docker, Kubernetes, AWS / GCP / Azure VMs, and on-premises / air-gapped. See Deployment.

What monitoring is available?

Prometheus metrics — request rate, latency histograms, cache hit rate, error rate, cost
OpenTelemetry traces — distributed tracing across your entire request flow
Structured JSON logs — configurable log levels and output targets

See Observability and Monitoring.

Security & Compliance¶

What compliance certifications does StrataRouter hold?

Enterprise: SOC 2 Type II · HIPAA · GDPR · ISO 27001 · FedRAMP (in progress)

Can StrataRouter handle PHI / PII?

Yes, under HIPAA-compliant Enterprise configurations: AES-256 encryption at rest, TLS 1.3 in transit, audit logging, role-based access, data residency controls, and a Business Associate Agreement (BAA) available.

Costs¶

How much does StrataRouter cost?

Open Source: Free. Full Core + Runtime, MIT licensed.

Enterprise: Custom pricing based on request volume, number of tenants, and support tier. Contact enterprise@stratarouter.dev for a quote.

How much can I save on LLM costs?

Typical production savings:

Semantic caching: 70–80% reduction (85%+ hit rate)
Multi-model routing: 40–60% reduction (cheap model for simple queries)
Batch deduplication: 20–30% reduction

Combined: most customers see 60–85% total LLM cost reduction. See Pricing for a detailed example.

Example: $2,000/month → $400/month at an 80% cache hit rate.

Is there a free trial for Enterprise?

Yes — a 90-day POC with full Enterprise access, dedicated engineering support, and no credit card required. Apply here.

Troubleshooting¶

Routing accuracy is below 80%

Common causes:

Too few examples per route (add 10+ per route)
Routes are too semantically similar (make descriptions more distinct)
Poor-quality embeddings (try OpenAI text-embedding-3-small or a fine-tuned encoder)
Threshold too low (increase threshold in RouterConfig)

See Troubleshooting.

Performance is slower than expected

Check:

stratarouter.simd_enabled — should be True
max_candidates — reduce from 10 to 3 for lower latency
Embedding dimension — 384 is much faster than 1536
Cache enabled — dramatically reduces average latency

See Performance Tuning.

ImportError on install

pip install --force-reinstall stratarouter
python -c "import stratarouter; print(stratarouter.__version__)"

If building from source, ensure Rust 1.70+ and maturin are installed. See Installation.

Still Have Questions?¶

Discord Community

Ask in #support — community members and the core team respond quickly.

→

GitHub Issues

File a bug or feature request with full context.

→