Core Concepts¶
Understand StrataRouter's fundamental building blocks and how they work together.
Semantic Routing vs Traditional Routing¶
Traditional Routing (Rule-Based)¶
Traditional routing uses explicit rules and pattern matching:
# Traditional approach — brittle and hard to maintain
def route_query(query: str) -> str:
    if "invoice" in query or "payment" in query:
        return "billing"
    elif "bug" in query or "error" in query:
        return "technical"
    else:
        return "general"
Problems with rule-based routing:
- Brittle — breaks with synonyms ("receipt" vs "invoice")
- Hard to maintain — need rules for every variation
- No confidence scores — binary yes/no decisions
- Doesn't understand context or intent
Semantic Routing (StrataRouter)¶
Semantic routing uses meaning and context:
# Semantic approach — understands intent, not just keywords
embedding = get_embedding(query) # Convert to vector
result = router.route(query, embedding) # Find semantically similar route
# Works with synonyms, context, and intent
# Returns confidence score and latency metrics
Advantages of semantic routing:
- Understands meaning — "receipt" is close to "invoice"
- Context-aware — considers full query intent
- Confidence scores — 0–100% certainty per decision
- Self-improving — learns from examples
Core Concepts¶
1. Routes¶
A route represents a destination for queries — a handler, agent, model, or workflow.
from stratarouter import Route

billing_route = Route(
    id="billing",                      # Unique identifier
    description="Billing questions",   # What this route handles
    keywords=["invoice", "payment"],   # Semantic hints
    examples=[                         # Representative queries
        "Where is my invoice?",
        "Update payment method"
    ]
)
Route components:
| Field | Purpose |
|---|---|
| `id` | Unique identifier used in code |
| `description` | Human-readable explanation of what this route handles |
| `keywords` | Important terms that boost the keyword-matching score |
| `examples` | Representative queries that train the embedding |
Well-designed vs poorly-designed routes:
# Good — specific, clear, 5-10 examples
Route(
    id="billing_refunds",
    description="Handle refund requests and payment disputes",
    keywords=["refund", "dispute", "chargeback"],
    examples=[
        "I want a refund",
        "Dispute this charge",
        "Request chargeback",
        "Get my money back",
        "Cancel and refund"
    ]
)

# Bad — vague, no examples, useless keywords
Route(
    id="stuff",
    description="Various things",
    keywords=["help"]
)
2. Embeddings¶
Embeddings are vector representations of text that capture semantic meaning.
# Text to vector conversion
text = "Where is my invoice?"
embedding = get_embedding(text)
# Returns [0.1, -0.3, 0.5, ...] — 384 floating-point numbers
# Similar meanings produce similar vectors
# "invoice" and "receipt" → close vectors
# "invoice" and "cat" → distant vectors
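"Close" and "distant" here mean cosine similarity. A minimal sketch of the measure, using toy 4-dimensional vectors as stand-ins for real 384-dimensional embeddings (the `cosine` helper is illustrative, not a StrataRouter API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: real embeddings have hundreds of dimensions
invoice = [0.9, 0.1, 0.0, 0.2]
receipt = [0.8, 0.2, 0.1, 0.3]   # similar meaning -> similar direction
cat     = [0.0, 0.1, 0.9, 0.0]   # unrelated -> near-orthogonal

print(cosine(invoice, receipt))  # high (close to 1.0)
print(cosine(invoice, cat))      # low (close to 0.0)
```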
Choosing an embedding dimension:
| Dimension | Model | Speed | Quality |
|---|---|---|---|
| 384d | `all-MiniLM-L6-v2` | Fastest | Good |
| 768d | `all-mpnet-base-v2` | Fast | Better |
| 1536d | OpenAI `text-embedding-3-small` | Medium | Best |
Common embedding providers:
# OpenAI (recommended for production)
from openai import OpenAI

client = OpenAI()
embedding = client.embeddings.create(
    model="text-embedding-3-small",  # 1536 dims
    input="your text"
).data[0].embedding

# Sentence Transformers (free, local)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # 384 dims
embedding = model.encode("your text")

# Cohere
import cohere

co = cohere.Client(api_key)
embedding = co.embed(texts=["your text"]).embeddings[0]
3. Similarity Threshold¶
The threshold is the minimum confidence required to route a query to a destination.
router = Router(dimension=384, threshold=0.5)  # 50% minimum

result = router.route(query, embedding)
if result.confidence >= 0.5:
    handle(result.route_id)   # Confident — proceed
else:
    escalate_to_human()       # Not confident — fallback
Threshold guidance:
| Range | Behavior | Recommended Use Case |
|---|---|---|
| 0.3–0.5 | Permissive | Exploratory apps, high recall needed |
| 0.5–0.7 | Balanced | Production default |
| 0.7–0.9 | Conservative | High-precision requirements |
| 0.9+ | Strict | Critical or high-stakes decisions only |
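One way to pick a value within these ranges is to sweep candidate thresholds over logged routing decisions and compare coverage (how many queries are routed automatically) against accuracy. A hypothetical sketch, assuming you have `(confidence, was_correct)` pairs from audited production traffic:

```python
# Audited decisions: (confidence, was_correct) pairs from production logs
decisions = [
    (0.92, True), (0.88, True), (0.81, True), (0.74, True), (0.71, False),
    (0.63, True), (0.55, False), (0.48, False), (0.41, True), (0.33, False),
]

def evaluate(threshold: float):
    """Coverage and accuracy of auto-routing at a given threshold."""
    accepted = [(c, ok) for c, ok in decisions if c >= threshold]
    coverage = len(accepted) / len(decisions)
    accuracy = (sum(ok for _, ok in accepted) / len(accepted)) if accepted else 0.0
    return coverage, accuracy

for t in (0.3, 0.5, 0.7, 0.9):
    cov, acc = evaluate(t)
    print(f"threshold={t:.1f}  coverage={cov:.0%}  accuracy={acc:.0%}")
```

Raising the threshold trades coverage for accuracy; the sweep makes that trade-off visible before you commit to a value.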
4. Router¶
The Router orchestrates the routing pipeline: manages routes, builds the index, and scores queries.
from stratarouter import Router

router = Router(
    dimension=384,      # Embedding dimension
    threshold=0.5,      # Minimum confidence
    max_candidates=10   # HNSW search depth
)

router.add_route(billing_route)
router.add_route(support_route)

# Build index once — then route many times
router.build_index(embeddings)
result = router.route(query, embedding)
Router lifecycle:
graph LR
A[Create Router] --> B[Add Routes]
B --> C[Build Index]
C --> D[Route Queries]
D --> D
style C fill:#FFC107
style D fill:#00C853
5. HNSW Index¶
The HNSW (Hierarchical Navigable Small World) index enables O(log N) approximate nearest-neighbor search.
# Build index once — O(N log N) time
router.build_index(embeddings)
# Search is fast — O(log N) per query
result = router.route(query, embedding)
HNSW performance at scale:
| Routes | Build Time | Query Latency | Recall |
|---|---|---|---|
| 10 | 5ms | <1ms | 99.9% |
| 100 | 50ms | 1–2ms | 99.5% |
| 1,000 | 500ms | 2–3ms | 99.0% |
| 10,000 | 5s | 3–5ms | 98.5% |
See HNSW Index Deep Dive for full details.
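Conceptually, the index answers "which route embeddings are nearest to the query embedding?" A brute-force linear scan gives the exact answer in O(N) per query; HNSW approximates the same top-k in roughly O(log N) by navigating a layered proximity graph. A minimal exact-scan reference (hypothetical names, not StrataRouter's internals):

```python
import math

def top_k(query: list[float], route_vecs: dict[str, list[float]], k: int = 3):
    """Exact nearest neighbours by cosine similarity: the O(N) baseline that HNSW approximates."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
    # Score every route, then keep the k highest — this full scan is what HNSW avoids
    scored = sorted(((cos(query, v), rid) for rid, v in route_vecs.items()), reverse=True)
    return [(rid, s) for s, rid in scored[:k]]

routes = {
    "billing":   [0.9, 0.1, 0.1],
    "technical": [0.1, 0.9, 0.1],
    "general":   [0.4, 0.4, 0.4],
}
print(top_k([0.8, 0.2, 0.1], routes, k=2))  # "billing" ranks first
```

At 10 routes the scan and HNSW are indistinguishable; the recall column in the table above shows the small price of approximation as N grows.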
6. Hybrid Scoring¶
StrataRouter combines three signals for robust, accurate routing:
final_score = (
    0.64 * semantic_similarity +  # Dense embedding cosine similarity
    0.29 * keyword_match +        # BM25 sparse keyword score
    0.07 * rule_score             # Pattern/regex matching
)
Why hybrid scoring outperforms embeddings alone:
| Signal | Weight | Strength |
|---|---|---|
| Semantic similarity | 64% | Paraphrase detection, general intent |
| Keyword matching (BM25) | 29% | Exact terms, domain-specific jargon |
| Rule-based patterns | 7% | Known phrases, deterministic cases |
Example calculation:
query = "invoice PDF download"
# Signal scores
semantic = 0.74 # Understands "invoice" intent
keyword = 0.95 # Exact match on "invoice"
rule = 0.00 # No regex rule matches
# Hybrid score
score = 0.64 * 0.74 + 0.29 * 0.95 + 0.07 * 0.00
# = 0.4736 + 0.2755 + 0.0 = 0.749
See Scoring Algorithms for the full implementation.
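The worked example above can be wrapped in a small helper. A sketch using the weights from the formula, assuming the three signal scores are already computed upstream:

```python
def hybrid_score(semantic: float, keyword: float, rule: float,
                 weights: tuple = (0.64, 0.29, 0.07)) -> float:
    """Weighted blend of the three routing signals."""
    w_sem, w_kw, w_rule = weights
    return w_sem * semantic + w_kw * keyword + w_rule * rule

# The "invoice PDF download" example from above:
score = hybrid_score(semantic=0.74, keyword=0.95, rule=0.00)
print(round(score, 3))  # 0.749
```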
7. Confidence Calibration¶
Raw similarity scores are calibrated to true probabilities using isotonic regression.
raw_score = 0.85 # Uncalibrated similarity score
calibrated = 0.92 # Calibrated probability
# Calibrated score approximates the true probability of correct routing
Why calibration matters:
- Raw scores are not probabilities: a 0.85 similarity does not mean an 85% chance the route is correct, and raw scores cannot be compared across queries
- Calibration maps raw scores onto [0, 1] so that a calibrated 0.9 means roughly 90% of decisions at that score turn out correct
- This enables reliable confidence-based thresholding and fallback decisions
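Isotonic regression can be sketched with the classic pool-adjacent-violators algorithm. This is an illustrative simplification, not StrataRouter's implementation: it fits a nondecreasing step function from `(raw_score, was_correct)` pairs, then uses it to look up calibrated probabilities for new scores.

```python
from bisect import bisect_right

def isotonic_calibrate(scores, labels):
    """Fit a nondecreasing map raw score -> P(correct) via pool-adjacent-violators."""
    pairs = sorted(zip(scores, labels))
    # Each block holds [label_sum, count]; merge while block means violate monotonicity
    blocks = []
    for _, y in pairs:
        blocks.append([float(y), 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    # Expand block means back to one fitted probability per training point
    xs = [s for s, _ in pairs]
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    def predict(score: float) -> float:
        # Step-function lookup: value of the nearest training score at or below
        i = bisect_right(xs, score) - 1
        return fitted[max(0, i)]
    return predict

# Toy data: higher raw scores are correct more often
raw   = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
right = [0,   0,   1,   0,   1,   1,   1,   1]
calibrate = isotonic_calibrate(raw, right)
print(calibrate(0.85))  # high calibrated probability
print(calibrate(0.25))  # low calibrated probability
```

The fitted curve is guaranteed monotone, so a higher raw score never maps to a lower calibrated probability.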
Calibration options:
router = Router(calibration_method="isotonic") # Default — most accurate
router = Router(calibration_method="platt") # Faster, slightly less accurate
router = Router(enable_calibration=False) # No calibration — raw scores
8. Route Result¶
The RouteResult contains all routing information needed to act on a decision:
result = router.route(query, embedding)
result.route_id # "billing"
result.confidence # 0.94 (94% confident)
result.latency_ms # 1.23 (milliseconds)
result.alternatives # [("support", 0.31), ("sales", 0.18)]
result.metadata # Additional routing info
Acting on results:
if result.confidence > 0.8:
    # High confidence — route automatically
    handle(result.route_id)
elif result.confidence > 0.5:
    # Medium confidence — confirm with user
    confirm_with_user(result.route_id)
else:
    # Low confidence — escalate
    escalate_to_human()
Routing Flow¶
graph TB
A[User Query] --> B[Get Embedding]
B --> C{Router}
C --> D[HNSW Search]
D --> E[Find Candidates]
E --> F[Hybrid Scoring]
F --> G[Calibrate]
G --> H{Confidence Above Threshold?}
H -->|Yes| I[Return Route]
H -->|No| J[Fallback]
I --> K[Handle Request]
J --> L[Escalate or Default]
style C fill:#4A9EFF
style F fill:#00C853
style G fill:#FFC107
Best Practices¶
Route Design¶
Do:
- Create specific, focused routes with clear boundaries
- Provide 5–10 diverse examples per route
- Use descriptive names that reflect the handler's purpose
- Include relevant domain-specific keywords
Avoid:
- Overlapping routes with similar descriptions
- Vague descriptions like "various things"
- Skimping on examples — more is better
- Forgetting to test edge cases and ambiguous queries
Threshold Tuning¶
Do:
- Start at 0.5–0.7 and tune based on real data
- Monitor confidence distribution in production
- A/B test different thresholds before committing
Avoid:
- Setting too high — valid queries miss their route
- Setting too low — low-confidence queries get incorrectly routed
- Using one threshold for all routes regardless of criticality
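The last point can be handled with per-route overrides. A hypothetical application-level sketch (this guide shows `Router` with a single global threshold, so the per-route map and `accept` helper here are illustrative, not StrataRouter APIs):

```python
# Stricter bars for high-stakes routes; everything else uses the default
ROUTE_THRESHOLDS = {
    "billing_refunds": 0.85,   # money moves: be conservative
    "technical":       0.60,
}
DEFAULT_THRESHOLD = 0.50

def accept(route_id: str, confidence: float) -> bool:
    """Accept a routing decision only if it clears that route's own bar."""
    return confidence >= ROUTE_THRESHOLDS.get(route_id, DEFAULT_THRESHOLD)

print(accept("billing_refunds", 0.80))  # False: below the stricter bar
print(accept("general", 0.55))          # True: clears the default
```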
Performance¶
Do:
- Call `router.build_index()` once during initialization
- Cache embeddings — avoid re-generating the same text repeatedly
- Batch requests where possible for throughput gains
- Monitor latency with `result.latency_ms`
Avoid:
- Rebuilding the index per request (expensive)
- Generating embeddings inside the routing hot path
- Making synchronous blocking calls to embedding APIs
Common Patterns¶
Pattern 1: Fallback Chain¶
def route_with_fallback(query: str, embedding: list) -> str:
    result = router.route(query, embedding)
    if result.confidence > 0.8:
        return handle_route(result.route_id)
    elif result.confidence > 0.5:
        return ask_clarification(result.route_id)
    else:
        return default_handler()
Pattern 2: Two-Stage Routing¶
def two_stage_route(query: str, embedding: list) -> str:
    # Stage 1 — broad category
    category = category_router.route(query, embedding)
    # Stage 2 — specific sub-route within category
    if category.route_id == "support":
        specific = support_router.route(query, embedding)
        return specific.route_id
    return category.route_id
Pattern 3: Alternatives Exploration¶
result = router.route(query, embedding, k=3)
print(f"Primary: {result.route_id} ({result.confidence:.1%})")
for alt_id, alt_conf in result.alternatives:
    print(f"  Alt: {alt_id} ({alt_conf:.1%})")
Quick Reference¶
| Term | Definition |
|---|---|
| Route | Destination for queries (e.g., "billing", "support") |
| Embedding | Vector representation of text (e.g., [0.1, -0.3, ...]) |
| HNSW | Hierarchical graph for fast nearest-neighbor search |
| Threshold | Minimum confidence to accept a route (e.g., 0.5 = 50%) |
| Calibration | Mapping raw similarity scores to true probabilities |
| Hybrid Scoring | Combining semantic + keyword + rule signals |