
Core Concepts

Understand StrataRouter's fundamental building blocks and how they work together.


Semantic Routing vs Traditional Routing

Traditional Routing (Rule-Based)

Traditional routing uses explicit rules and pattern matching:

# Traditional approach — brittle and hard to maintain
if "invoice" in query or "payment" in query:
    return "billing"
elif "bug" in query or "error" in query:
    return "technical"
else:
    return "general"

Problems with rule-based routing:

  • Brittle — breaks with synonyms ("receipt" vs "invoice")
  • Hard to maintain — need rules for every variation
  • No confidence scores — binary yes/no decisions
  • Doesn't understand context or intent
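The synonym problem is easy to reproduce. The sketch below wraps the rules above in a hypothetical helper, `rule_route` (a name introduced here for illustration), and shows a billing question falling through to the default branch:

```python
def rule_route(query: str) -> str:
    """Keyword rules from the example above, wrapped in a function."""
    q = query.lower()
    if "invoice" in q or "payment" in q:
        return "billing"
    elif "bug" in q or "error" in q:
        return "technical"
    return "general"


print(rule_route("Where is my invoice?"))  # billing
print(rule_route("Where is my receipt?"))  # general: the synonym is missed
```

Covering "receipt" means adding another rule, and the same is true for every other phrasing a user might choose.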

Semantic Routing (StrataRouter)

Semantic routing uses meaning and context:

# Semantic approach — understands intent, not just keywords
embedding = get_embedding(query)         # Convert to vector
result = router.route(query, embedding)  # Find semantically similar route

# Works with synonyms, context, and intent
# Returns confidence score and latency metrics

Advantages of semantic routing:

  • Understands meaning — "receipt" is close to "invoice"
  • Context-aware — considers full query intent
  • Confidence scores — 0–100% certainty per decision
  • Improves with examples: adding representative queries to a route refines matching without writing new rules

Core Concepts

1. Routes

A route represents a destination for queries — a handler, agent, model, or workflow.

from stratarouter import Route

billing_route = Route(
    id="billing",                       # Unique identifier
    description="Billing questions",    # What this route handles
    keywords=["invoice", "payment"],    # Terms that boost the keyword score
    examples=[                          # Representative queries
        "Where is my invoice?",
        "Update payment method"
    ]
)

Route components:

| Field | Purpose |
|-------|---------|
| id | Unique identifier used in code |
| description | Human-readable explanation of what this route handles |
| keywords | Important terms that boost the keyword-matching score |
| examples | Representative queries used to build the route's embedding |

Well-designed vs poorly-designed routes:

# Good — specific, clear, 5-10 examples
Route(
    id="billing_refunds",
    description="Handle refund requests and payment disputes",
    keywords=["refund", "dispute", "chargeback"],
    examples=[
        "I want a refund",
        "Dispute this charge",
        "Request chargeback",
        "Get my money back",
        "Cancel and refund"
    ]
)

# Bad — vague, no examples, useless keywords
Route(
    id="stuff",
    description="Various things",
    keywords=["help"]
)
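The contrast above can be turned into a quick pre-flight check. This is a minimal sketch that works on plain dicts rather than `Route` objects; `lint_route` and its cut-offs (5 examples, 3 words) are assumptions for illustration, not part of StrataRouter:

```python
def lint_route(route: dict) -> list:
    """Return warnings for a route definition given as a plain dict."""
    warnings = []
    examples = route.get("examples", [])
    if len(examples) < 5:
        warnings.append(f"only {len(examples)} examples; aim for 5-10")
    if len(route.get("description", "").split()) < 3:
        warnings.append("description is too short to be specific")
    if not route.get("keywords"):
        warnings.append("no keywords")
    return warnings


good = {
    "id": "billing_refunds",
    "description": "Handle refund requests and payment disputes",
    "keywords": ["refund", "dispute", "chargeback"],
    "examples": ["I want a refund", "Dispute this charge", "Request chargeback",
                 "Get my money back", "Cancel and refund"],
}
bad = {"id": "stuff", "description": "Various things", "keywords": ["help"]}

print(lint_route(good))  # []
print(lint_route(bad))
```

A check like this only catches structural gaps. Whether the keywords are actually useful still needs a human review.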

2. Embeddings

Embeddings are vector representations of text that capture semantic meaning.

# Text to vector conversion
text = "Where is my invoice?"
embedding = get_embedding(text)
# Returns [0.1, -0.3, 0.5, ...] — 384 floating-point numbers

# Similar meanings produce similar vectors
# "invoice" and "receipt" → close vectors
# "invoice" and "cat"    → distant vectors
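"Close" and "distant" are usually measured with cosine similarity, which is also the semantic signal named in the Hybrid Scoring section. A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration only
invoice = [0.9, 0.1, 0.0]
receipt = [0.8, 0.2, 0.1]
cat = [0.0, 0.1, 0.9]

print(cosine_similarity(invoice, receipt) > cosine_similarity(invoice, cat))  # True
```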

Choosing an embedding dimension:

| Dimension | Model | Speed | Quality |
|-----------|-------|-------|---------|
| 384d | all-MiniLM-L6-v2 | Fastest | Good |
| 768d | all-mpnet-base-v2 | Fast | Better |
| 1536d | OpenAI text-embedding-3-small | Medium | Best |

Common embedding providers:

# OpenAI (recommended for production)
import openai
embedding = openai.embeddings.create(
    model="text-embedding-3-small",  # 1536 dims
    input="your text"
).data[0].embedding

# Sentence Transformers (free, local)
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')  # 384 dims
embedding = model.encode("your text")

# Cohere
import cohere
co = cohere.Client(api_key)
embedding = co.embed(texts=["your text"]).embeddings[0]

3. Similarity Threshold

The threshold is the minimum confidence required to route a query to a destination.

router = Router(dimension=384, threshold=0.5)  # 50% minimum

result = router.route(query, embedding)

if result.confidence >= 0.5:
    handle(result.route_id)   # Confident — proceed
else:
    escalate_to_human()       # Not confident — fallback

Threshold guidance:

| Range | Behavior | Recommended use case |
|-------|----------|----------------------|
| 0.3–0.5 | Permissive | Exploratory apps, high recall needed |
| 0.5–0.7 | Balanced | Production default |
| 0.7–0.9 | Conservative | High-precision requirements |
| 0.9+ | Strict | Critical or high-stakes decisions only |
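The trade-off in the table can be measured on labeled traffic. The sketch below sweeps a few thresholds over hypothetical `(confidence, was_correct)` pairs and reports coverage (the share of queries routed) and precision (the share of routed queries that were correct). The data and the `evaluate` helper are invented for illustration:

```python
# Hypothetical evaluation data: (confidence, was the chosen route correct?)
samples = [
    (0.95, True), (0.88, True), (0.72, True), (0.65, False),
    (0.58, True), (0.45, False), (0.40, True), (0.30, False),
]


def evaluate(threshold, samples):
    """Return (coverage, precision) for queries accepted at this threshold."""
    accepted = [ok for conf, ok in samples if conf >= threshold]
    coverage = len(accepted) / len(samples)
    precision = sum(accepted) / len(accepted) if accepted else 0.0
    return coverage, precision


for t in (0.3, 0.5, 0.7, 0.9):
    coverage, precision = evaluate(t, samples)
    print(f"threshold={t:.1f}  coverage={coverage:.3f}  precision={precision:.3f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 lifts precision from 0.625 to 1.0 while coverage drops from 1.0 to 0.375.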

4. Router

The Router orchestrates the routing pipeline: manages routes, builds the index, and scores queries.

from stratarouter import Router

router = Router(
    dimension=384,      # Embedding dimension
    threshold=0.5,      # Minimum confidence
    max_candidates=10   # HNSW search depth
)

router.add_route(billing_route)
router.add_route(support_route)

# Build index once — then route many times
router.build_index(embeddings)

result = router.route(query, embedding)

Router lifecycle:

graph LR
    A[Create Router] --> B[Add Routes]
    B --> C[Build Index]
    C --> D[Route Queries]
    D --> D
    style C fill:#FFC107
    style D fill:#00C853

5. HNSW Index

The HNSW (Hierarchical Navigable Small World) index enables O(log N) approximate nearest-neighbor search.

# Build index once — O(N log N) time
router.build_index(embeddings)

# Search is fast — O(log N) per query
result = router.route(query, embedding)

HNSW performance at scale:

| Routes | Build Time | Query Latency | Recall |
|--------|------------|---------------|--------|
| 10 | 5ms | <1ms | 99.9% |
| 100 | 50ms | 1–2ms | 99.5% |
| 1,000 | 500ms | 2–3ms | 99.0% |
| 10,000 | 5s | 3–5ms | 98.5% |

See HNSW Index Deep Dive for full details.
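Recall compares the approximate results against the exact nearest neighbors, so measuring it needs a brute-force baseline. The sketch below is plain Python and does not touch StrataRouter's index; the vectors, the `approx` list, and both helper names are made up for illustration:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query, vectors, k):
    """Brute-force O(N) search: score every vector, keep the k best ids."""
    ranked = sorted(vectors, key=lambda vid: cosine(query, vectors[vid]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = {
    "billing": [0.9, 0.1, 0.0],
    "support": [0.1, 0.9, 0.1],
    "sales": [0.5, 0.5, 0.2],
    "legal": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]

exact = exact_top_k(query, vectors, k=2)
approx = ["billing", "legal"]  # pretend output of an approximate index
print(exact, recall_at_k(approx, exact))  # ['billing', 'sales'] 0.5
```

Brute force is fine for an offline evaluation set even when it is too slow for the request path.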


6. Hybrid Scoring

StrataRouter combines three signals for robust, accurate routing:

final_score = (
    0.64 * semantic_similarity +  # Dense embedding cosine similarity
    0.29 * keyword_match +        # BM25 sparse keyword score
    0.07 * rule_score             # Pattern/regex matching
)

Why hybrid scoring outperforms embeddings alone:

| Signal | Weight | Strength |
|--------|--------|----------|
| Semantic similarity | 64% | Paraphrase detection, general intent |
| Keyword matching (BM25) | 29% | Exact terms, domain-specific jargon |
| Rule-based patterns | 7% | Known phrases, deterministic cases |

Example calculation:

query = "invoice PDF download"

# Signal scores
semantic = 0.74   # Understands "invoice" intent
keyword  = 0.95   # Exact match on "invoice"
rule     = 0.00   # No regex rule matches

# Hybrid score
score = 0.64 * 0.74 + 0.29 * 0.95 + 0.07 * 0.00
# = 0.4736 + 0.2755 + 0.0 = 0.749
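The same formula can be wrapped in a function to see how the keyword signal changes a ranking. This is a sketch using the weights above; `hybrid_score` and the per-route signal values are hypothetical:

```python
WEIGHTS = {"semantic": 0.64, "keyword": 0.29, "rule": 0.07}


def hybrid_score(semantic, keyword, rule, weights=WEIGHTS):
    """Weighted sum of the three signals, each expected in [0, 1]."""
    return (weights["semantic"] * semantic
            + weights["keyword"] * keyword
            + weights["rule"] * rule)


# Hypothetical signals for two candidate routes on the same query
candidates = {
    "billing": {"semantic": 0.70, "keyword": 0.95, "rule": 0.0},
    "account": {"semantic": 0.78, "keyword": 0.10, "rule": 0.0},
}

by_semantic = max(candidates, key=lambda r: candidates[r]["semantic"])
by_hybrid = max(candidates, key=lambda r: hybrid_score(**candidates[r]))
print(by_semantic, by_hybrid)  # account billing
```

Here the embedding alone slightly prefers "account", but the exact keyword hit moves "billing" ahead (about 0.72 versus 0.53).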

See Scoring Algorithms for the full implementation.


7. Confidence Calibration

Raw similarity scores are calibrated into estimated probabilities, using isotonic regression by default.

raw_score  = 0.85   # Uncalibrated similarity score
calibrated = 0.92   # Calibrated probability

# Calibrated score approximates the true probability of correct routing

Why calibration matters:

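  • Raw scores are not probabilities, so a raw 0.8 does not mean an 80% chance of a correct route, and raw scores cannot be compared across queries
  • Calibration maps each score to a value in [0, 1] that can be read as the estimated probability that the route is correct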
  • Enables reliable confidence-based thresholding and fallback decisions
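To make the idea concrete, here is a minimal pure-Python sketch of isotonic regression using the pool-adjacent-violators algorithm. It illustrates the technique only and is not StrataRouter's implementation; the training data is invented, and the lookup is a simple step function, whereas libraries such as scikit-learn's `IsotonicRegression` interpolate between points:

```python
def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: return [(upper_score, calibrated_value), ...]."""
    blocks = []  # each block: [sum_of_labels, count, max_score_in_block]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge while the previous block's mean exceeds the newest block's mean
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def calibrate(raw, fitted):
    """Step lookup: value of the first block whose upper score covers `raw`."""
    for upper, value in fitted:
        if raw <= upper:
            return value
    return fitted[-1][1]


# Hypothetical history: raw score, and whether that routing was correct (1) or not (0)
scores = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

fitted = fit_isotonic(scores, labels)
print(calibrate(0.45, fitted))  # 0.5
```

The out-of-order pair at 0.4 and 0.5 is pooled into one block with value 0.5, which keeps the mapping monotonic: a higher raw score never yields a lower calibrated value.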

Calibration options:

router = Router(calibration_method="isotonic")  # Default — most accurate
router = Router(calibration_method="platt")     # Faster, slightly less accurate
router = Router(enable_calibration=False)       # No calibration — raw scores

8. Route Result

The RouteResult contains all routing information needed to act on a decision:

result = router.route(query, embedding)

result.route_id       # "billing"
result.confidence     # 0.94 (94% confident)
result.latency_ms     # 1.23 (milliseconds)
result.alternatives   # [("support", 0.31), ("sales", 0.18)]
result.metadata       # Additional routing info

Acting on results:

if result.confidence > 0.8:
    # High confidence — route automatically
    handle(result.route_id)

elif result.confidence > 0.5:
    # Medium confidence — confirm with user
    confirm_with_user(result.route_id)

else:
    # Low confidence — escalate
    escalate_to_human()

Routing Flow

graph TB
    A[User Query] --> B[Get Embedding]
    B --> C{Router}
    C --> D[HNSW Search]
    D --> E[Find Candidates]
    E --> F[Hybrid Scoring]
    F --> G[Calibrate]
    G --> H{Confidence Above Threshold?}
    H -->|Yes| I[Return Route]
    H -->|No| J[Fallback]
    I --> K[Handle Request]
    J --> L[Escalate or Default]
    style C fill:#4A9EFF
    style F fill:#00C853
    style G fill:#FFC107

Best Practices

Route Design

Do:

  • Create specific, focused routes with clear boundaries
  • Provide 5–10 diverse examples per route
  • Use descriptive names that reflect the handler's purpose
  • Include relevant domain-specific keywords

Avoid:

  • Overlapping routes with similar descriptions
  • Vague descriptions like "various things"
  • Skimping on examples — more is better
  • Forgetting to test edge cases and ambiguous queries

Threshold Tuning

Do:

  • Start at 0.5–0.7 and tune based on real data
  • Monitor confidence distribution in production
  • A/B test different thresholds before committing

Avoid:

  • Setting too high — valid queries miss their route
  • Setting too low — low-confidence queries get incorrectly routed
  • Using one threshold for all routes regardless of criticality
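One way to avoid a single global threshold is a per-route lookup applied after routing. A minimal sketch; the route names, values, and `accept` helper are assumptions, and nothing here relies on StrataRouter supporting per-route thresholds natively:

```python
# Hypothetical per-route thresholds: stricter for higher-stakes routes
ROUTE_THRESHOLDS = {"billing_refunds": 0.8, "general": 0.4}
DEFAULT_THRESHOLD = 0.5


def accept(route_id: str, confidence: float) -> bool:
    """Accept the route only if it clears its own threshold."""
    return confidence >= ROUTE_THRESHOLDS.get(route_id, DEFAULT_THRESHOLD)


print(accept("billing_refunds", 0.7))  # False: refunds need 0.8
print(accept("general", 0.45))         # True
```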

Performance

Do:

  • Call router.build_index() once during initialization
  • Cache embeddings so the same text is not embedded more than once
  • Batch requests where possible for throughput gains
  • Monitor latency with result.latency_ms

Avoid:

  • Rebuilding the index per request (expensive)
  • Generating embeddings inside the routing hot path
  • Making synchronous blocking calls to embedding APIs
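For caching, the standard library's `functools.lru_cache` is often enough as a first step. In this sketch `embed_uncached` is a stand-in for a real embedding call (hypothetical, it just counts invocations) and the cache size is arbitrary:

```python
from functools import lru_cache

CALLS = {"n": 0}  # counts how often the expensive function actually runs


def embed_uncached(text: str) -> tuple:
    """Stand-in for a real embedding call; returns a fake 2-d vector."""
    CALLS["n"] += 1
    return (float(len(text)), float(text.count(" ")))


@lru_cache(maxsize=10_000)
def get_embedding_cached(text: str) -> tuple:
    # Returning a tuple keeps the cached value immutable
    return embed_uncached(text)


get_embedding_cached("Where is my invoice?")
get_embedding_cached("Where is my invoice?")  # served from the cache
print(CALLS["n"])  # 1
```

An in-process cache is lost on restart and is not shared between workers, so a multi-process deployment may need an external store instead.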

Common Patterns

Pattern 1: Fallback Chain

def route_with_fallback(query: str, embedding: list) -> str:
    result = router.route(query, embedding)

    if result.confidence > 0.8:
        return handle_route(result.route_id)
    elif result.confidence > 0.5:
        return ask_clarification(result.route_id)
    else:
        return default_handler()

Pattern 2: Two-Stage Routing

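# category_router and support_router are Router instances built elsewhere
def route_two_stage(query: str, embedding: list) -> str:
    # Stage 1: pick the broad category
    category = category_router.route(query, embedding)

    # Stage 2: pick a specific sub-route within the category
    if category.route_id == "support":
        specific = support_router.route(query, embedding)
        return specific.route_id

    # Categories without a sub-router resolve to the category itself
    return category.route_id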

Pattern 3: Alternatives Exploration

result = router.route(query, embedding, k=3)

print(f"Primary: {result.route_id} ({result.confidence:.1%})")
for alt_id, alt_conf in result.alternatives:
    print(f"  Alt: {alt_id} ({alt_conf:.1%})")
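One possible use of the alternatives is to flag ambiguous queries, where the runner-up is nearly as confident as the primary route. The sketch below uses a hypothetical stand-in for `RouteResult` that mirrors the fields shown earlier; the `is_ambiguous` helper and the 0.15 margin are arbitrary choices for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class RouteResult:  # stand-in with the same field names as the real result
    route_id: str
    confidence: float
    alternatives: list = field(default_factory=list)


def is_ambiguous(result, margin=0.15):
    """True when the best alternative is within `margin` of the primary route."""
    if not result.alternatives:
        return False
    runner_up = max(conf for _, conf in result.alternatives)
    return result.confidence - runner_up < margin


clear = RouteResult("billing", 0.94, [("support", 0.31), ("sales", 0.18)])
close = RouteResult("billing", 0.62, [("support", 0.55)])
print(is_ambiguous(clear), is_ambiguous(close))  # False True
```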

Quick Reference

| Term | Definition |
|------|------------|
| Route | Destination for queries (e.g., "billing", "support") |
| Embedding | Vector representation of text (e.g., [0.1, -0.3, ...]) |
| HNSW | Hierarchical graph for fast nearest-neighbor search |
| Threshold | Minimum confidence to accept a route (e.g., 0.5 = 50%) |
| Calibration | Mapping raw similarity scores to true probabilities |
| Hybrid Scoring | Combining semantic + keyword + rule signals |

Next Steps