HNSW Index¶

Deep dive into the Hierarchical Navigable Small World (HNSW) graph for ultra-fast vector search.

Overview¶

HNSW is StrataRouter's secret weapon for sub-millisecond routing. It provides O(log N) approximate nearest neighbor search with 99.5%+ recall.

Performance: 0.4ms P99 for 10K routes

What is HNSW?¶

HNSW builds a multi-layer graph that enables logarithmic search complexity:

Layer 2:  A ←→ E
          ↓     ↓
Layer 1:  A → B → E → F
          ↓   ↓   ↓   ↓
Layer 0:  A-B-C-D-E-F-G-H  (all nodes)

Key Properties:

Hierarchical - Multiple layers with decreasing density
Navigable - Long-range connections in upper layers
Small World - Short paths between any two nodes
Logarithmic Search - O(log N) complexity

How It Works¶

1. Index Construction¶

router = Router(dimension=384)

# Add routes
router.add_route(Route("billing"))
router.add_route(Route("support"))

# Build HNSW index
embeddings = get_embeddings([r.description for r in router.routes])
router.build_index(embeddings)  # Constructs HNSW graph

Build Process:

Insert first node in all layers
For each new node:
Determine layer with probability function
Find nearest neighbors at each layer
Create bidirectional connections
Prune to maintain degree bounds

Time Complexity: O(M × log N) per insertion

2. Search Process¶

query_embedding = get_embedding("Where's my invoice?")
result = router.route(query, query_embedding)

Search Algorithm:

Start at top layer with entry point
Greedy search to nearest neighbor
Drop to next layer and repeat
At layer 0 - find K nearest neighbors
Return top candidates

Time Complexity: O(log N)

Configuration Parameters¶

Core Parameters¶

config = RouterConfig(
    hnsw_m=16,                  # Connections per layer
    hnsw_ef_construction=200,   # Build quality
    hnsw_ef_search=50,          # Search quality
)

M (Connections Per Layer)¶

Controls graph connectivity.

M Value	Build Time	Memory	Search Speed	Accuracy
8	Fast	Low	Fast	Good
16	Medium	Medium	Medium	Better
32	Slow	High	Slow	Best

Recommendation: Start with 16

ef_construction¶

Controls build quality (higher = better graph).

ef_construction	Build Time	Recall@10
100	1x	98.5%
200	2x	99.5%
400	4x	99.8%

Recommendation: 200 for production

ef_search¶

Controls search quality (higher = more thorough).

ef_search	Latency	Recall@10
20	0.3ms	97%
50	0.4ms	99%
100	0.6ms	99.8%

Recommendation: 50 for balanced performance

Performance Characteristics¶

Latency Breakdown¶

HNSW Search (P99): 0.4ms
├─ Entry point:    0.05ms
├─ Layer 2→1:      0.05ms
├─ Layer 1→0:      0.10ms
└─ Layer 0 search: 0.20ms

Scaling Behavior¶

Routes	Build Time	Memory	P50 Latency	P99 Latency
100	50ms	8MB	0.2ms	0.4ms
1K	500ms	64MB	0.3ms	0.5ms
10K	5s	352MB	0.4ms	0.7ms
100K	60s	3.2GB	0.6ms	1.2ms

Logarithmic scaling - 10x more routes = +30% latency

Optimization Techniques¶

1. Tuning for Accuracy¶

# Maximum accuracy
config = RouterConfig(
    hnsw_m=32,                  # More connections
    hnsw_ef_construction=400,   # Better build
    hnsw_ef_search=100,         # Thorough search
)

Trade-off: 2-3x slower but 99.9%+ recall

2. Tuning for Speed¶

# Minimum latency
config = RouterConfig(
    hnsw_m=8,                   # Fewer connections
    hnsw_ef_construction=100,   # Quick build
    hnsw_ef_search=20,          # Fast search
)

Trade-off: 40% faster but 97% recall

3. Balanced Configuration¶

# Recommended for production
config = RouterConfig(
    hnsw_m=16,
    hnsw_ef_construction=200,
    hnsw_ef_search=50,
)

Result: 99.5% recall @ 0.4ms P99

Memory Management¶

Memory Formula¶

Memory = BASE + (ROUTES × DIMENSIONS × 4 bytes × M × 1.5)

For 10K routes with 384 dimensions, M=16:
= 32MB + (10,000 × 384 × 4 × 16 × 1.5)
= 32MB + 368MB
= 400MB

Memory Optimization¶

# Reduce memory by 50%
config = RouterConfig(
    enable_quantization=True,   # Use int8 instead of float32
)

Trade-off: Slight accuracy loss (<1%)

Concurrent Access¶

HNSW index supports concurrent reads without locks:

# Safe for concurrent routing
import asyncio

async def route_query(query):
    embedding = await get_embedding(query)
    return router.route(query, embedding)

# All run concurrently
results = await asyncio.gather(
    route_query("Query 1"),
    route_query("Query 2"),
    route_query("Query 3"),
)

Note: Index updates (adding routes) require exclusive access

Index Persistence¶

Save Index¶

# Save to disk
router.save_index("router_index.bin")

Load Index¶

# Load from disk (instant!)
router = Router(dimension=384)
router.load_index("router_index.bin")

Benefits:

No rebuild needed
Instant startup
Consistent results

Advanced Usage¶

Dynamic Index Updates¶

# Add route to existing index
new_route = Route("new_route")
new_embedding = get_embedding(new_route.description)

router.add_route_to_index(new_route, new_embedding)

Note: Frequent updates may degrade graph quality. Rebuild periodically.

Batch Index Construction¶

# More efficient for bulk inserts
routes = [Route(f"route_{i}") for i in range(1000)]
embeddings = get_embeddings([r.description for r in routes])

router.build_index_batch(embeddings)  # 2x faster than individual inserts

Monitoring Index Health¶

stats = router.get_index_stats()
print(f"Routes: {stats.num_routes}")
print(f"Layers: {stats.num_layers}")
print(f"Avg degree: {stats.avg_degree}")
print(f"Max degree: {stats.max_degree}")

Comparison to Alternatives¶

Method	Time Complexity	Space Complexity	Latency (10K)	Recall
HNSW	O(log N)	O(N × M)	0.4ms	99.5%
Flat Index	O(N)	O(N)	5ms	100%
IVF	O(√N)	O(N)	2ms	95%
LSH	O(1)	O(N × L)	0.3ms	85%

Why HNSW wins:

Logarithmic scaling
High recall
No training needed
Efficient updates

Best Practices¶

1. Build Once, Query Many¶

# Build index during initialization
router.build_index(embeddings)
router.save_index("index.bin")

# Load in production
router.load_index("index.bin")

2. Choose M Based on Scale¶

if num_routes < 1000:
    m = 8      # Small dataset
elif num_routes < 10000:
    m = 16     # Medium dataset
else:
    m = 24     # Large dataset

3. Monitor Recall¶

# Validate recall on test set
test_queries = [...]
test_labels = [...]

correct = sum(
    router.route(q, e).route_id == label
    for q, e, label in zip(test_queries, embeddings, test_labels)
)

recall = correct / len(test_queries)
print(f"Recall: {recall:.2%}")

Troubleshooting¶

Low Recall¶

Problem: Recall < 95%

Solutions: - Increase ef_search to 100 - Increase ef_construction to 400 - Increase m to 32

High Latency¶

Problem: P99 > 2ms

Solutions: - Decrease ef_search to 20 - Decrease m to 8 - Enable SIMD: enable_simd=True

High Memory Usage¶

Problem: Memory > 1GB for 10K routes

Solutions: - Enable quantization: enable_quantization=True - Reduce m to 8 - Use smaller embedding dimension