Algorithms¶
Deep dive into the algorithms powering StrataRouter's routing engine.
HNSW (Hierarchical Navigable Small World)¶
Fast approximate nearest neighbor search for semantic routing.
Overview¶
HNSW builds a multi-layer graph structure where:

- Top layers: sparse, long-range connections for fast navigation
- Bottom layer: dense connections for accurate final search
Algorithm¶
```python
def search(query, k):
    # Start from the entry point on the top layer
    current = entry_point

    # Greedily navigate down through the upper layers
    for layer in range(max_layer, 0, -1):
        current = search_layer(query, current, layer, ef=1)

    # Search the bottom layer thoroughly
    candidates = search_layer(query, current, 0, ef=ef_search)

    # Return the top-k nearest
    return select_neighbors(candidates, k)
```
Construction¶
```python
def insert(point):
    # Determine the point's top layer (exponential distribution)
    layer = random_level()

    # Insert into each layer from 0 up to `layer`
    for l in range(layer + 1):
        # Find nearest neighbors on this layer
        neighbors = search_layer(point, entry, l, ef_construction)

        # Connect bidirectionally to the closest M of them
        for neighbor in neighbors[:M]:
            add_edge(point, neighbor)
            add_edge(neighbor, point)
```
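The `random_level()` call above is where the exponential layer distribution comes from. A minimal sketch, assuming the common `m_l = 1 / ln(M)` normalization with `M = 16` (the normalization constant is an assumption, not taken from StrataRouter's source):

```python
import math
import random

def random_level(m_l: float = 1.0 / math.log(16)) -> int:
    # Sample a layer from an exponential distribution: most points
    # land on layer 0; each higher layer holds ~1/M as many points.
    return int(-math.log(random.random()) * m_l)

random.seed(0)
levels = [random_level() for _ in range(10_000)]
```

With `M = 16`, roughly 15 out of 16 points stay on layer 0, which is what keeps the upper layers sparse.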
Parameters¶
| Parameter | Default | Description |
|---|---|---|
| `M` | 16 | Max connections per node |
| `ef_construction` | 200 | Construction quality (higher = better) |
| `ef_search` | 50 | Search quality (higher = more accurate) |
| `max_layer` | log(N) | Number of layers |
Time Complexity¶
- Construction: O(N log N * M)
- Query: O(log N * M)
- Space: O(N * M)
Performance Tuning¶
```python
# Fast construction, good quality
router = Router(m=16, ef_construction=200)

# Slower construction, better quality
router = Router(m=32, ef_construction=400)

# Faster search
router = Router(ef_search=30)

# More accurate search
router = Router(ef_search=100)
```
When to Use¶
✅ Best for:

- Large datasets (1K+ routes)
- Sub-millisecond latency requirements
- High-dimensional vectors (384+ dimensions)

❌ Avoid when:

- There are very few routes (<10)
- Exact nearest neighbors are required
- Memory is constrained (<64MB)
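For the few-routes and exact-neighbors cases, a brute-force scan is both simpler and exact. A self-contained sketch (the route vectors are made up for illustration):

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    # O(N * d) scan: fine for tens of routes, too slow for thousands
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    return ranked[:k]

routes = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
```

Unlike HNSW, this returns the true nearest neighbors every time, so it is the better baseline below roughly a few hundred routes.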
BM25 (Best Matching 25)¶
Probabilistic information retrieval for keyword matching.
Formula¶
```
score(D, Q) = Σ IDF(qi) * (f(qi, D) * (k1 + 1)) / (f(qi, D) + k1 * (1 - b + b * |D| / avgdl))

where:
    D        = document (route)
    Q        = query
    qi       = query term
    f(qi, D) = frequency of qi in D
    |D|      = document length
    avgdl    = average document length
```
Components¶
1. IDF (Inverse Document Frequency)¶
```python
def idf(term, documents):
    df = count_documents_containing(term, documents)
    N = len(documents)
    return log((N - df + 0.5) / (df + 0.5) + 1)
```
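To make the IDF snippet concrete, here is a self-contained version with a toy corpus (documents represented as term sets; the corpus is made up for illustration):

```python
import math

def idf(term, documents):
    # BM25 IDF: rare terms score higher than common ones
    df = sum(1 for doc in documents if term in doc)
    n = len(documents)
    return math.log((n - df + 0.5) / (df + 0.5) + 1)

docs = [{"invoice", "payment"}, {"refund"}, {"invoice"}, {"shipping"}]
```

Here "invoice" appears in 2 of 4 documents and "refund" in only 1, so "refund" gets the higher IDF.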
Intuition: Rare terms are more discriminative
2. Term Frequency Saturation¶
```python
def tf_component(tf, doc_length, avg_length, k1=1.5, b=0.75):
    normalized_length = doc_length / avg_length
    denominator = tf + k1 * (1 - b + b * normalized_length)
    return tf * (k1 + 1) / denominator
```
Intuition: More occurrences help, but with diminishing returns
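Plugging a fixed-length document into the formula shows the diminishing returns directly (the term frequencies below are chosen for illustration):

```python
def tf_component(tf, doc_length, avg_length, k1=1.5, b=0.75):
    # BM25 term-frequency saturation
    normalized_length = doc_length / avg_length
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * normalized_length))

# Average-length document: the score grows with tf but never
# exceeds k1 + 1 = 2.5, and each extra occurrence adds less
scores = [tf_component(tf, 10, 10) for tf in (1, 2, 5, 20)]
```

The first extra occurrence adds more than the jump from 5 to 20 occurrences: that asymptote at `k1 + 1` is the saturation the prose describes.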
Parameters¶
| Parameter | Default | Effect |
|---|---|---|
| `k1` | 1.5 | Term frequency saturation (0 = binary, ∞ = raw TF) |
| `b` | 0.75 | Length normalization (0 = none, 1 = full) |
Example¶
```
# Query: "invoice payment"
# Route: "Billing questions about invoices, payments, and refunds"
# (Assumes a 100-document corpus and stemming, so "invoice" matches "invoices".)

# Step 1: Tokenize
query_terms = ["invoice", "payment"]
doc_terms = ["billing", "questions", "about", "invoices",
             "payments", "and", "refunds"]

# Step 2: Calculate IDF (N = 100; df = 20 and 15 respectively)
idf("invoice") = log((100 - 20 + 0.5) / (20 + 0.5) + 1) ≈ 1.59
idf("payment") = log((100 - 15 + 0.5) / (15 + 0.5) + 1) ≈ 1.87

# Step 3: Calculate TF component (|D| = 7, avgdl = 10)
tf("invoice", doc) = 1
tf_component = 1 * (1.5 + 1) / (1 + 1.5 * (1 - 0.75 + 0.75 * 7/10))
             = 2.5 / 2.1625 ≈ 1.16

# Step 4: Sum scores
score = 1.59 * 1.16 + 1.87 * 1.16 ≈ 4.01
```
Optimization¶
```python
# Precompute IDF scores once, at index time
idf_cache = {}
for term in vocabulary:
    idf_cache[term] = calculate_idf(term)

# Fast lookup during query
def score_query(query, doc):
    score = 0
    for term in query:
        if term in idf_cache:
            tf = doc.count(term)
            score += idf_cache[term] * tf_component(tf, len(doc), avg_length)
    return score
```
When to Use¶
✅ Best for:

- Exact keyword matches needed
- Short documents (route descriptions)
- Domain-specific terminology

❌ Avoid when:

- Semantic similarity matters more than exact terms
- Paraphrasing is common
- No distinctive keywords are available
Isotonic Regression¶
Monotonic calibration for confidence scores.
Problem¶
Raw scores don't match actual accuracy:
```
Raw Score → Actual Accuracy
0.60      → 0.71  (underconfident)
0.70      → 0.76  (underconfident)
0.80      → 0.88  (slightly under)
0.90      → 0.91  (well calibrated)
```
Solution¶
Fit a monotonic curve that maps scores to accuracy:
```python
from sklearn.isotonic import IsotonicRegression

# Collect validation data
scores = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
accuracies = [0.71, 0.74, 0.76, 0.80, 0.88, 0.90, 0.91, 0.97]

# Fit monotonic curve
calibrator = IsotonicRegression(out_of_bounds='clip')
calibrator.fit(scores, accuracies)

# Calibrate new scores
raw_score = 0.73
calibrated = calibrator.predict([raw_score])[0]  # ≈ 0.78
```
Algorithm¶
1. Sort (score, accuracy) pairs by score
2. Pool adjacent violators (PAV algorithm):
- Scan from left to right
- If non-monotonic, average adjacent points
- Repeat until monotonic
3. Return piecewise constant function
PAV Algorithm¶
```python
def pool_adjacent_violators(x, y):
    """Pool Adjacent Violators Algorithm: isotonic fit of y ordered by x."""
    # Each block holds [sum, count]; its fitted value is sum / count.
    blocks = []
    for value in y:
        blocks.append([value, 1])
        # Pooling can cascade: merging two blocks may create a new
        # violation with the block before them, so keep merging
        # backwards until the block means are non-decreasing.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand blocks back into a piecewise-constant per-point solution
    solution = []
    for total, count in blocks:
        solution.extend([total / count] * count)
    return solution
```
Per-Route Calibration¶
Each route gets its own calibration curve:
```python
# Easy route (high baseline accuracy)
easy_calibrator = IsotonicRegression()
easy_calibrator.fit([0.6, 0.7, 0.8, 0.9], [0.82, 0.88, 0.93, 0.98])

# Hard route (lower baseline accuracy)
hard_calibrator = IsotonicRegression()
hard_calibrator.fit([0.6, 0.7, 0.8, 0.9], [0.65, 0.72, 0.81, 0.90])

# Apply the appropriate calibration
if route == 'easy':
    confidence = easy_calibrator.predict([raw_score])
else:
    confidence = hard_calibrator.predict([raw_score])
```
Benefits¶
- Accurate Confidence: Confidence = expected accuracy
- Maintains Ranking: Doesn't change route order
- Per-Route: Accounts for route difficulty
- Fast: O(log N) lookup with binary search
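The O(log N) lookup can be sketched with `bisect` over a piecewise-constant curve. The thresholds and calibrated values below are hypothetical, standing in for a fitted isotonic curve:

```python
from bisect import bisect_right

# Hypothetical piecewise-constant calibration curve:
# segment boundaries and one calibrated value per segment
thresholds = [0.65, 0.75, 0.85]
values = [0.72, 0.79, 0.89, 0.95]

def calibrate(raw_score):
    # O(log N): binary-search for the segment the raw score falls into
    return values[bisect_right(thresholds, raw_score)]
```

Because the curve is monotonic, this lookup preserves the ranking of raw scores while remapping them to observed accuracies.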
When to Use¶
✅ Always use for production systems where confidence matters
❌ Skip only for prototyping or when speed is absolutely critical (<1ms)
SIMD Optimization¶
Hardware-accelerated vector operations for speed.
Overview¶
StrataRouter Core uses SIMD (Single Instruction Multiple Data) operations to accelerate similarity computations by up to 10x. The implementation is found in src/algorithms/simd_ops.rs and automatically detects the best available instruction set.
Supported Operations¶
```rust
// Cosine similarity with SIMD acceleration
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
    {
        unsafe { cosine_similarity_avx2(a, b) }
    }
    #[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
    {
        cosine_similarity_scalar(a, b)
    }
}

// Scalar fallback for non-AVX2 systems
fn cosine_similarity_scalar(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```
Performance Gain¶
| Operation | Scalar | AVX2 | NEON | Speedup |
|---|---|---|---|---|
| Dot Product (384D) | 45ns | 12ns | 18ns | 2.5-3.8x |
| Cosine Similarity | 120ns | 35ns | 50ns | 2.4-3.4x |
| Euclidean Distance | 90ns | 28ns | 40ns | 2.3-3.2x |
Auto-Detection¶
// Automatically uses best available instruction set
let similarity = if is_x86_feature_detected!("avx2") {
cosine_similarity_avx2(a, b)
} else if is_x86_feature_detected!("sse4.1") {
cosine_similarity_sse(a, b)
} else {
cosine_similarity_scalar(a, b)
}
Requirements¶
- x86_64: AVX2 (2013+) or SSE4.1 (2006+)
- ARM: NEON (most ARM CPUs)
- Fallback: Scalar operations for all CPUs
Score Fusion¶
Combining multiple scores into a single confidence.
Weighted Average¶
```python
# Learned weights
w_dense = 0.64
w_sparse = 0.29
w_rule = 0.07

# Fusion
fused = w_dense * dense_score + w_sparse * sparse_score + w_rule * rule_score
```
Normalization¶
```python
# Sigmoid normalization
confidence = 1 / (1 + exp(-fused))

# Alternative: min-max scaling
confidence = (fused - min_score) / (max_score - min_score)

# Alternative: softmax (multi-route)
exp_scores = [exp(score) for score in all_scores]
confidence = exp(fused) / sum(exp_scores)
```
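Putting fusion and normalization together, a self-contained sketch (the weights match the defaults above; the per-signal scores and competing route scores are illustrative):

```python
import math

# Hypothetical per-signal scores for one route
w_dense, w_sparse, w_rule = 0.64, 0.29, 0.07
dense_score, sparse_score, rule_score = 0.91, 0.40, 1.0

fused = w_dense * dense_score + w_sparse * sparse_score + w_rule * rule_score

def sigmoid(x):
    # Squash any real-valued score into (0, 1)
    return 1 / (1 + math.exp(-x))

def softmax(scores):
    # Normalize a set of route scores into a probability distribution
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

confidence = sigmoid(fused)
probs = softmax([fused, 0.31, 0.12])  # vs. two other hypothetical routes
```

Sigmoid is the right choice for a single yes/no confidence; softmax is the right choice when the confidences across all routes should sum to 1.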
Learning Weights¶
```python
from sklearn.linear_model import LogisticRegression

# Training data
X = [[dense1, sparse1, rule1],
     [dense2, sparse2, rule2],
     ...]
y = [correct1, correct2, ...]  # 1 if correct, 0 if wrong

# Train
model = LogisticRegression()
model.fit(X, y)

# Extract weights
w_dense, w_sparse, w_rule = model.coef_[0]
```
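A runnable version of the weight-learning step with a small synthetic dataset. The data and labels below are made up for illustration; in practice they would come from routing logs:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: each row is
# [dense_score, sparse_score, rule_score]; label 1 = routed correctly
X = [
    [0.9, 0.8, 1.0], [0.8, 0.7, 0.0], [0.7, 0.9, 1.0], [0.9, 0.6, 1.0],
    [0.3, 0.2, 0.0], [0.2, 0.4, 0.0], [0.4, 0.1, 0.0], [0.1, 0.3, 0.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = LogisticRegression().fit(X, y)
w_dense, w_sparse, w_rule = model.coef_[0]
```

On real logs the relative magnitudes of the learned coefficients indicate how much each signal actually contributes to correct routing.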
Comparison¶
| Algorithm | Latency | Memory | Accuracy | Use Case |
|---|---|---|---|---|
| HNSW | 0.8ms | 50MB | 95% | Semantic search |
| BM25 | 0.5ms | 10MB | 85% | Keyword matching |
| Rules | 0.1ms | 2MB | 100% | Exact patterns |
| Isotonic | 0.1ms | 2MB | N/A | Calibration |
| SIMD | 0.01ms | 0MB | N/A | Acceleration |
Next Steps¶
- Routing Engine - How it all fits together
- Performance - Benchmarks and optimization
- API Reference - Implementation details