Core Concepts¶
Understand StrataRouter's fundamental building blocks and how they work together.
Semantic Routing vs Traditional Routing¶
Traditional Routing (Rule-Based)¶
Traditional routing uses explicit rules and pattern matching:
# Traditional approach — brittle and hard to maintain
def route_query(query: str) -> str:
    if "invoice" in query or "payment" in query:
        return "billing"
    elif "bug" in query or "error" in query:
        return "technical"
    else:
        return "general"
Problems with rule-based routing:
- Brittle — breaks with synonyms ("receipt" vs "invoice")
- Hard to maintain — need rules for every variation
- No confidence scores — binary yes/no decisions
- Doesn't understand context or intent
Semantic Routing (StrataRouter)¶
Semantic routing uses meaning and context:
# Semantic approach — understands intent, not just keywords
embedding = get_embedding(query) # Convert to vector
result = router.route(query, embedding) # Find semantically similar route
# Works with synonyms, context, and intent
# Returns confidence score and latency metrics
Advantages of semantic routing:
- Understands meaning — "receipt" is close to "invoice"
- Context-aware — considers full query intent
- Confidence scores — 0–100% certainty per decision
- Self-improving — learns from examples
Core Concepts¶
1. Routes¶
A route represents a destination for queries — a handler, agent, model, or workflow.
from stratarouter import Route

billing_route = Route(
    id="billing",                      # Unique identifier
    description="Billing questions",   # What this route handles
    keywords=["invoice", "payment"],   # Semantic hints
    examples=[                         # Representative queries
        "Where is my invoice?",
        "Update payment method"
    ]
)
Route components:
| Field | Purpose |
|---|---|
| `id` | Unique identifier used in code |
| `description` | Human-readable explanation of what this route handles |
| `keywords` | Important terms that boost the keyword-matching score |
| `examples` | Representative queries that train the embedding |
Well-designed vs poorly-designed routes:
# Good — specific, clear, 5-10 examples
Route(
    id="billing_refunds",
    description="Handle refund requests and payment disputes",
    keywords=["refund", "dispute", "chargeback"],
    examples=[
        "I want a refund",
        "Dispute this charge",
        "Request chargeback",
        "Get my money back",
        "Cancel and refund"
    ]
)

# Bad — vague, no examples, useless keywords
Route(
    id="stuff",
    description="Various things",
    keywords=["help"]
)
2. Embeddings¶
Embeddings are vector representations of text that capture semantic meaning.
# Text to vector conversion
text = "Where is my invoice?"
embedding = get_embedding(text)
# Returns [0.1, -0.3, 0.5, ...] — 384 floating-point numbers
# Similar meanings produce similar vectors
# "invoice" and "receipt" → close vectors
# "invoice" and "cat" → distant vectors
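"Close" and "distant" here mean cosine similarity. A minimal sketch of the measure, using toy 4-dimensional vectors as stand-ins for real 384-dimensional embeddings (the `cosine` helper is illustrative, not a StrataRouter API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: real embeddings have hundreds of dimensions
invoice = [0.9, 0.1, 0.0, 0.2]
receipt = [0.8, 0.2, 0.1, 0.3]   # similar meaning -> similar direction
cat     = [0.0, 0.1, 0.9, 0.0]   # unrelated -> near-orthogonal

print(cosine(invoice, receipt))  # high (close to 1.0)
print(cosine(invoice, cat))      # low (close to 0.0)
```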
Choosing an embedding dimension:
| Dimension | Model | Speed | Quality |
|---|---|---|---|
| 384d | `all-MiniLM-L6-v2` | Fastest | Good |
| 768d | `all-mpnet-base-v2` | Fast | Better |
| 1536d | OpenAI `text-embedding-3-small` | Medium | Best |
Common embedding providers:
# OpenAI (recommended for production)
from openai import OpenAI

client = OpenAI()
embedding = client.embeddings.create(
    model="text-embedding-3-small",  # 1536 dims
    input="your text"
).data[0].embedding

# Sentence Transformers (free, local)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # 384 dims
embedding = model.encode("your text")

# Cohere
import cohere

co = cohere.Client(api_key)
embedding = co.embed(texts=["your text"]).embeddings[0]
3. Similarity Threshold¶
The threshold is the minimum confidence required to route a query to a destination.
router = Router(dimension=384, threshold=0.5)  # 50% minimum

result = router.route(query, embedding)
if result.confidence >= 0.5:
    handle(result.route_id)   # Confident — proceed
else:
    escalate_to_human()       # Not confident — fallback
Threshold guidance:
| Range | Behavior | Recommended Use Case |
|---|---|---|
| 0.3–0.5 | Permissive | Exploratory apps, high recall needed |
| 0.5–0.7 | Balanced | Production default |
| 0.7–0.9 | Conservative | High-precision requirements |
| 0.9+ | Strict | Critical or high-stakes decisions only |
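One way to pick a value within these ranges is to sweep candidate thresholds over logged routing decisions and compare coverage (how many queries are routed automatically) against accuracy. A hypothetical sketch, assuming you have `(confidence, was_correct)` pairs from audited production traffic:

```python
# Audited decisions: (confidence, was_correct) pairs from production logs
decisions = [
    (0.92, True), (0.88, True), (0.81, True), (0.74, True), (0.71, False),
    (0.63, True), (0.55, False), (0.48, False), (0.41, True), (0.33, False),
]

def evaluate(threshold: float):
    """Coverage and accuracy of auto-routing at a given threshold."""
    accepted = [(c, ok) for c, ok in decisions if c >= threshold]
    coverage = len(accepted) / len(decisions)
    accuracy = (sum(ok for _, ok in accepted) / len(accepted)) if accepted else 0.0
    return coverage, accuracy

for t in (0.3, 0.5, 0.7, 0.9):
    cov, acc = evaluate(t)
    print(f"threshold={t:.1f}  coverage={cov:.0%}  accuracy={acc:.0%}")
```

Raising the threshold trades coverage for accuracy; the sweep makes that trade-off visible before you commit to a value.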
4. Router¶
The Router orchestrates the routing pipeline: manages routes, builds the index, and scores queries.
from stratarouter import Router

router = Router(
    dimension=384,      # Embedding dimension
    threshold=0.5,      # Minimum confidence
    max_candidates=10   # HNSW search depth
)

router.add_route(billing_route)
router.add_route(support_route)

# Build index once — then route many times
router.build_index(embeddings)
result = router.route(query, embedding)
Router lifecycle:
graph LR
A[Create Router] --> B[Add Routes]
B --> C[Build Index]
C --> D[Route Queries]
D --> D
style C fill:#FFC107
style D fill:#00C853
5. HNSW Index¶
The HNSW (Hierarchical Navigable Small World) index enables O(log N) approximate nearest-neighbor search.
# Build index once — O(N log N) time
router.build_index(embeddings)
# Search is fast — O(log N) per query
result = router.route(query, embedding)
HNSW performance at scale:
| Routes | Build Time | Query Latency | Recall |
|---|---|---|---|
| 10 | 5ms | <1ms | 99.9% |
| 100 | 50ms | 1–2ms | 99.5% |
| 1,000 | 500ms | 2–3ms | 99.0% |
| 10,000 | 5s | 3–5ms | 98.5% |
See HNSW Index Deep Dive for full details.
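Conceptually, the index answers "which route embeddings are nearest to the query embedding?" A brute-force linear scan gives the exact answer in O(N) per query; HNSW approximates the same top-k in roughly O(log N) by navigating a layered proximity graph. A minimal exact-scan reference (hypothetical names, not StrataRouter's internals):

```python
import math

def top_k(query: list[float], route_vecs: dict[str, list[float]], k: int = 3):
    """Exact nearest neighbours by cosine similarity: the O(N) baseline that HNSW approximates."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
    # Score every route, then keep the k highest — this full scan is what HNSW avoids
    scored = sorted(((cos(query, v), rid) for rid, v in route_vecs.items()), reverse=True)
    return [(rid, s) for s, rid in scored[:k]]

routes = {
    "billing":   [0.9, 0.1, 0.1],
    "technical": [0.1, 0.9, 0.1],
    "general":   [0.4, 0.4, 0.4],
}
print(top_k([0.8, 0.2, 0.1], routes, k=2))  # "billing" ranks first
```

At 10 routes the scan and HNSW are indistinguishable; the recall column in the table above shows the small price of approximation as N grows.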
6. Hybrid Scoring¶
StrataRouter combines three signals for robust, accurate routing:
final_score = (
    0.64 * semantic_similarity +  # Dense embedding cosine similarity
    0.29 * keyword_match +        # BM25 sparse keyword score
    0.07 * rule_score             # Pattern/regex matching
)
Why hybrid scoring outperforms embeddings alone:
| Signal | Weight | Strength |
|---|---|---|
| Semantic similarity | 64% | Paraphrase detection, general intent |
| Keyword matching (BM25) | 29% | Exact terms, domain-specific jargon |
| Rule-based patterns | 7% | Known phrases, deterministic cases |
Example calculation:
query = "invoice PDF download"
# Signal scores
semantic = 0.74 # Understands "invoice" intent
keyword = 0.95 # Exact match on "invoice"
rule = 0.00 # No regex rule matches
# Hybrid score
score = 0.64 * 0.74 + 0.29 * 0.95 + 0.07 * 0.00
# = 0.4736 + 0.2755 + 0.0 = 0.749
See Scoring Algorithms for the full implementation.
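The worked example above can be wrapped in a small helper. A sketch using the weights from the formula, assuming the three signal scores are already computed upstream:

```python
def hybrid_score(semantic: float, keyword: float, rule: float,
                 weights: tuple = (0.64, 0.29, 0.07)) -> float:
    """Weighted blend of the three routing signals."""
    w_sem, w_kw, w_rule = weights
    return w_sem * semantic + w_kw * keyword + w_rule * rule

# The "invoice PDF download" example from above:
score = hybrid_score(semantic=0.74, keyword=0.95, rule=0.00)
print(round(score, 3))  # 0.749
```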
7. Confidence Calibration¶
Raw similarity scores are calibrated to true probabilities using isotonic regression.
raw_score = 0.85 # Uncalibrated similarity score
calibrated = 0.92 # Calibrated probability
# Calibrated score approximates the true probability of correct routing
Why calibration matters:
- Raw scores are not probabilities: a 0.85 similarity does not mean an 85% chance the route is correct, and raw scores cannot be compared across queries
- Calibration maps raw scores onto [0, 1] so that a calibrated 0.9 means roughly 90% of decisions at that score turn out correct
- This enables reliable confidence-based thresholding and fallback decisions
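Isotonic regression can be sketched with the classic pool-adjacent-violators algorithm. This is an illustrative simplification, not StrataRouter's implementation: it fits a nondecreasing step function from `(raw_score, was_correct)` pairs, then uses it to look up calibrated probabilities for new scores.

```python
from bisect import bisect_right

def isotonic_calibrate(scores, labels):
    """Fit a nondecreasing map raw score -> P(correct) via pool-adjacent-violators."""
    pairs = sorted(zip(scores, labels))
    # Each block holds [label_sum, count]; merge while block means violate monotonicity
    blocks = []
    for _, y in pairs:
        blocks.append([float(y), 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    # Expand block means back to one fitted probability per training point
    xs = [s for s, _ in pairs]
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    def predict(score: float) -> float:
        # Step-function lookup: value of the nearest training score at or below
        i = bisect_right(xs, score) - 1
        return fitted[max(0, i)]
    return predict

# Toy data: higher raw scores are correct more often
raw   = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
right = [0,   0,   1,   0,   1,   1,   1,   1]
calibrate = isotonic_calibrate(raw, right)
print(calibrate(0.85))  # high calibrated probability
print(calibrate(0.25))  # low calibrated probability
```

The fitted curve is guaranteed monotone, so a higher raw score never maps to a lower calibrated probability.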
Calibration options:
router = Router(calibration_method="isotonic") # Default — most accurate
router = Router(calibration_method="platt") # Faster, slightly less accurate
router = Router(enable_calibration=False) # No calibration — raw scores
8. Route Result¶
The RouteResult contains all routing information needed to act on a decision:
result = router.route(query, embedding)
result.route_id # "billing"
result.confidence # 0.94 (94% confident)
result.latency_ms # 1.23 (milliseconds)
result.alternatives # [("support", 0.31), ("sales", 0.18)]
result.metadata # Additional routing info
Acting on results:
if result.confidence > 0.8:
    # High confidence — route automatically
    handle(result.route_id)
elif result.confidence > 0.5:
    # Medium confidence — confirm with user
    confirm_with_user(result.route_id)
else:
    # Low confidence — escalate
    escalate_to_human()
Routing Flow¶
graph TB
A[User Query] --> B[Get Embedding]
B --> C{Router}
C --> D[HNSW Search]
D --> E[Find Candidates]
E --> F[Hybrid Scoring]
F --> G[Calibrate]
G --> H{Confidence Above Threshold?}
H -->|Yes| I[Return Route]
H -->|No| J[Fallback]
I --> K[Handle Request]
J --> L[Escalate or Default]
style C fill:#4A9EFF
style F fill:#00C853
style G fill:#FFC107
Best Practices¶
Route Design¶
Do:
- Create specific, focused routes with clear boundaries
- Provide 5–10 diverse examples per route
- Use descriptive names that reflect the handler's purpose
- Include relevant domain-specific keywords
Avoid:
- Overlapping routes with similar descriptions
- Vague descriptions like "various things"
- Skimping on examples — more is better
- Forgetting to test edge cases and ambiguous queries
Threshold Tuning¶
Do:
- Start at 0.5–0.7 and tune based on real data
- Monitor confidence distribution in production
- A/B test different thresholds before committing
Avoid:
- Setting too high — valid queries miss their route
- Setting too low — low-confidence queries get incorrectly routed
- Using one threshold for all routes regardless of criticality
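The last point can be handled with per-route overrides. A hypothetical application-level sketch (this guide shows `Router` with a single global threshold, so the per-route map and `accept` helper here are illustrative, not StrataRouter APIs):

```python
# Stricter bars for high-stakes routes; everything else uses the default
ROUTE_THRESHOLDS = {
    "billing_refunds": 0.85,   # money moves: be conservative
    "technical":       0.60,
}
DEFAULT_THRESHOLD = 0.50

def accept(route_id: str, confidence: float) -> bool:
    """Accept a routing decision only if it clears that route's own bar."""
    return confidence >= ROUTE_THRESHOLDS.get(route_id, DEFAULT_THRESHOLD)

print(accept("billing_refunds", 0.80))  # False: below the stricter bar
print(accept("general", 0.55))          # True: clears the default
```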
Performance¶
Do:
- Call `router.build_index()` once during initialization
- Cache embeddings — avoid re-generating the same text repeatedly
- Batch requests where possible for throughput gains
- Monitor latency with `result.latency_ms`
Avoid:
- Rebuilding the index per request (expensive)
- Generating embeddings inside the routing hot path
- Making synchronous blocking calls to embedding APIs
Common Patterns¶
Pattern 1: Fallback Chain¶
def route_with_fallback(query: str, embedding: list) -> str:
    result = router.route(query, embedding)
    if result.confidence > 0.8:
        return handle_route(result.route_id)
    elif result.confidence > 0.5:
        return ask_clarification(result.route_id)
    else:
        return default_handler()
Pattern 2: Two-Stage Routing¶
def two_stage_route(query: str, embedding: list) -> str:
    # Stage 1 — broad category
    category = category_router.route(query, embedding)
    # Stage 2 — specific sub-route within category
    if category.route_id == "support":
        specific = support_router.route(query, embedding)
        return specific.route_id
    return category.route_id
Pattern 3: Alternatives Exploration¶
result = router.route(query, embedding, k=3)
print(f"Primary: {result.route_id} ({result.confidence:.1%})")
for alt_id, alt_conf in result.alternatives:
    print(f"  Alt: {alt_id} ({alt_conf:.1%})")
Quick Reference¶
| Term | Definition |
|---|---|
| Route | Destination for queries (e.g., "billing", "support") |
| Embedding | Vector representation of text (e.g., [0.1, -0.3, ...]) |
| HNSW | Hierarchical graph for fast nearest-neighbor search |
| Threshold | Minimum confidence to accept a route (e.g., 0.5 = 50%) |
| Calibration | Mapping raw similarity scores to true probabilities |
| Hybrid Scoring | Combining semantic + keyword + rule signals |