Frequently Asked Questions

Answers to the most common questions about StrataRouter.


General

What is StrataRouter?

StrataRouter is a production-grade semantic routing engine that intelligently routes queries to the right handler — model, agent, API endpoint, or workflow — with sub-10ms latency and 95.4% accuracy. It is built in Rust with Python bindings and designed for production use from day one.

How is StrataRouter different from other semantic routers?

Performance:

  • 20–28× faster than alternatives (8.7ms P99 vs 178–245ms)
  • 33–50× less memory (64MB vs 2.1–3.2GB)
  • 40–47× higher throughput (18K req/s vs 380–450 req/s)

Accuracy: 95.4% vs 84.7% (semantic-router) and 82.3% (LlamaIndex) on the same benchmark.

Production features: Semantic caching (85%+ hit rate), batch processing, circuit breakers, and full OpenTelemetry observability — built in from the start, not bolted on.

What languages are supported?
  • Python 3.8+ — Primary API with the full feature set
  • Rust 1.70+ — Native implementation and crate
  • REST API — Language-agnostic HTTP interface
  • gRPC — High-performance binary protocol
Is StrataRouter open source?

Yes. The Core and Runtime are MIT licensed and fully open source. Enterprise features (multi-tenancy, RBAC, compliance packs) are available under a commercial license. See Pricing.


Installation & Setup

How do I install StrataRouter?
pip install stratarouter

That's it for the Python package. See Installation for source builds, extras, and Docker.

What are the system requirements?

Minimum: Python 3.8+, 64-bit OS (Linux, macOS, Windows), 512 MB RAM.

Recommended for production: 4+ CPU cores, 4+ GB RAM, Redis 6+ for caching, PostgreSQL 13+ for state persistence.

Can I run StrataRouter without external dependencies?

Yes. Basic routing works with zero external dependencies — just pip install stratarouter. Redis and PostgreSQL are optional and only needed for distributed caching and persistent state.


Routing & Accuracy

How accurate is StrataRouter?

95.4% on our public benchmark dataset. For comparison: semantic-router achieves 84.7% and LlamaIndex 82.3% on the same benchmark.

Accuracy depends heavily on the quality of your route examples. See Core Concepts for guidance on writing good routes.

How does routing work under the hood?

Every query passes through a three-stage pipeline:

  1. Candidate selection — HNSW approximate nearest-neighbor search finds the top-N candidate routes in sub-millisecond time.
  2. Hybrid scoring — Dense semantic score (64%) + BM25 sparse score (29%) + rule-based patterns (7%) are fused into a single score.
  3. Calibration — Isotonic regression maps the fused score to a true probability. Expected Calibration Error < 0.03.

See Routing Engine for the full deep dive.
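The hybrid scoring stage can be illustrated with a small sketch (a toy stand-in, not StrataRouter's internals; the 64/29/7 weights match the split described above, and the candidate names are invented for illustration):

```python
def fused_score(dense: float, bm25: float, rule: float) -> float:
    """Fuse the three per-route signals into one score.

    Each input is assumed to be normalized to [0, 1]; the weights
    mirror the 64% / 29% / 7% split described above.
    """
    return 0.64 * dense + 0.29 * bm25 + 0.07 * rule

# Score the top-N candidates returned by stage 1 and pick the best.
candidates = {
    "billing": fused_score(0.91, 0.80, 1.0),
    "support": fused_score(0.55, 0.40, 0.0),
}
best = max(candidates, key=candidates.get)  # → "billing"
```

In the real pipeline this fused score is then passed through isotonic calibration (stage 3) before being compared against the confidence threshold.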

Can I use my own embedding model?

Yes. Implement BaseEncoder with two methods — encode() and dimension — and pass it to Router(encoder=your_encoder). See Custom Encoders.
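A minimal sketch of what such an encoder can look like. The hash-based "embedding" below is a toy placeholder, not a real model; in real code you would subclass StrataRouter's `BaseEncoder`, but the class here is deliberately self-contained so the shape of the interface — `encode()` plus `dimension` — is clear:

```python
import hashlib

class ToyHashEncoder:
    """Illustrative encoder exposing the two members StrataRouter expects:
    encode() and dimension. The hash digest stands in for a real
    embedding model and is only useful as a demonstration."""

    dimension = 8

    def encode(self, text: str) -> list[float]:
        # Deterministic pseudo-embedding derived from a SHA-256 digest,
        # scaled into [0, 1].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [b / 255.0 for b in digest[: self.dimension]]

encoder = ToyHashEncoder()
vec = encoder.encode("refund my order")
# In real code: Router(encoder=encoder)
```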

What if routing confidence is low?

Set a fallback for low-confidence results:

result = router.route(query, embedding)
route_id = result.route_id if result.confidence >= 0.5 else "fallback"

You can also configure fallback_route globally in RouterConfig.
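The global form might look like the following configuration sketch. `fallback_route` and `threshold` are the parameter names mentioned in these docs; the import path and the `Router(config=...)` hookup are assumptions:

```python
from stratarouter import Router, RouterConfig  # import path assumed

config = RouterConfig(
    threshold=0.5,             # minimum confidence to accept a route
    fallback_route="fallback", # used whenever confidence falls below it
)
router = Router(config=config)
```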


Performance

How fast is StrataRouter?
Percentile   Latency
P50          < 1ms
P95          < 6ms
P99          8.7ms
P99.9        15ms

Throughput: ~18K req/s single-node (4 cores). Scales linearly with cores.
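To compare your own deployment against these numbers, an empirical percentile over measured latencies can be computed with plain Python — nothing StrataRouter-specific (nearest-rank method, one of several common conventions):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value below which ~p% of samples fall."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

# Example: latencies (ms) measured for ten requests.
latencies_ms = [0.8, 0.9, 1.1, 1.3, 2.0, 2.4, 3.1, 4.8, 6.2, 8.7]
p50 = percentile(latencies_ms, 50)  # median of the sample
p99 = percentile(latencies_ms, 99)  # tail latency
```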

What's the fastest way to optimize latency?
  1. Use a smaller embedding dimension (384 instead of 1536)
  2. Reduce max_candidates (try 3 instead of 10)
  3. Enable SIMD (enable_simd=True)
  4. Disable calibration if you don't need calibrated probabilities
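Combined into one configuration, that tuning might look like the sketch below. `enable_simd` and `max_candidates` appear in these docs; the other parameter names and the import path are assumptions:

```python
from stratarouter import Router, RouterConfig  # import path assumed

config = RouterConfig(
    embedding_dim=384,         # assumption: smaller dimension, faster distance math
    max_candidates=3,          # fewer candidates to score per query
    enable_simd=True,          # vectorized similarity kernels
    enable_calibration=False,  # assumption: skip if raw scores suffice
)
router = Router(config=config)
```
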
Does semantic caching really help?

Dramatically. At an 85% cache hit rate:

Metric        Without Cache   With Cache
Avg latency   250ms           21ms
Throughput    200 req/s       1,500 req/s
Cost          $0.002/req      $0.0003/req

The 85% figure is typical for production workloads with repeated query patterns.
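The cost row follows from the hit rate alone: cache hits cost roughly nothing, so only the misses pay the full per-request price. A quick sanity check of the table's cost numbers:

```python
hit_rate = 0.85
cost_per_miss = 0.002  # $/request without a cache

# Only the 15% of requests that miss the cache pay the full price.
effective_cost = (1 - hit_rate) * cost_per_miss
# 0.15 * $0.002 = $0.0003 per request, matching the table
```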


Integrations

Which frameworks does StrataRouter support?
  • LangChain — Full chain and agent integration
  • LangGraph — Conditional edge routing in graphs
  • CrewAI — Multi-agent crew orchestration
  • AutoGen — Group chat routing
  • OpenAI — Native embeddings and models
  • Anthropic — Claude embeddings and models
  • Google — Gemini embeddings and models

See Integrations for code examples.

Can I use StrataRouter with local models?

Yes. Register a LocalClient pointing at Ollama, vLLM, HuggingFace TGI, or any OpenAI-compatible endpoint:

from stratarouter_runtime.clients import LocalClient

# "bridge" is your existing StrataRouter runtime bridge instance
bridge.register_client("local", LocalClient(endpoint="http://localhost:11434"))

Production & Deployment

Is StrataRouter production-ready?

Yes. It runs in production at Fortune 500 companies, healthcare providers (HIPAA), financial services firms, and SaaS companies handling 50K+ requests/day. The runtime includes semantic caching, batch processing, automatic retry with circuit breakers, and full OpenTelemetry observability.

What deployment options are supported?

Docker, Kubernetes, AWS / GCP / Azure VMs, and on-premises / air-gapped. See Deployment.

What monitoring is available?
  • Prometheus metrics — request rate, latency histograms, cache hit rate, error rate, cost
  • OpenTelemetry traces — distributed tracing across your entire request flow
  • Structured JSON logs — configurable log levels and output targets

See Observability and Monitoring.


Security & Compliance

What compliance certifications does StrataRouter hold?

Enterprise: SOC 2 Type II · HIPAA · GDPR · ISO 27001 · FedRAMP (in progress)

Can StrataRouter handle PHI / PII?

Yes, under HIPAA-compliant Enterprise configurations: AES-256 encryption at rest, TLS 1.3 in transit, audit logging, role-based access, data residency controls, and a Business Associate Agreement (BAA) available.


Costs

How much does StrataRouter cost?

Open Source: Free. Full Core + Runtime, MIT licensed.

Enterprise: Custom pricing based on request volume, number of tenants, and support tier. Contact enterprise@stratarouter.dev for a quote.

How much can I save on LLM costs?

Typical production savings:

  • Semantic caching: 70–80% reduction (85%+ hit rate)
  • Multi-model routing: 40–60% reduction (cheap model for simple queries)
  • Batch deduplication: 20–30% reduction

Combined: most customers see 60–85% total LLM cost reduction. See Pricing for a detailed example.

Example: $2,000/month → $400/month at an 80% cache hit rate.

Is there a free trial for Enterprise?

Yes — a 90-day POC with full Enterprise access, dedicated engineering support, and no credit card required. Apply here.


Troubleshooting

Routing accuracy is below 80%

Common causes:

  • Too few examples per route (add 10+ per route)
  • Routes are too semantically similar (make descriptions more distinct)
  • Poor-quality embeddings (try OpenAI text-embedding-3-small or a fine-tuned encoder)
  • Threshold too low (increase threshold in RouterConfig)

See Troubleshooting.

Performance is slower than expected

Check:

  1. stratarouter.simd_enabled — should be True
  2. max_candidates — reduce from 10 to 3 for lower latency
  3. Embedding dimension — 384 is much faster than 1536
  4. Cache enabled — dramatically reduces average latency

See Performance Tuning.

ImportError on install
pip install --force-reinstall stratarouter
python -c "import stratarouter; print(stratarouter.__version__)"

If building from source, ensure Rust 1.70+ and maturin are installed. See Installation.


Still Have Questions?