HNSW Index¶
Deep dive into the Hierarchical Navigable Small World (HNSW) graph for ultra-fast vector search.
Overview¶
HNSW is StrataRouter's secret weapon for sub-millisecond routing. It provides O(log N) approximate nearest neighbor search with 99.5%+ recall.
Performance: 0.4ms P99 for 10K routes
What is HNSW?¶
HNSW builds a multi-layer graph that enables logarithmic search complexity:
Key Properties:
- Hierarchical - Multiple layers with decreasing density
- Navigable - Long-range connections in upper layers
- Small World - Short paths between any two nodes
- Logarithmic Search - O(log N) complexity
How It Works¶
1. Index Construction¶
router = Router(dimension=384)
# Add routes
router.add_route(Route("billing"))
router.add_route(Route("support"))
# Build HNSW index
embeddings = get_embeddings([r.description for r in router.routes])
router.build_index(embeddings) # Constructs HNSW graph
Build Process:
- Insert first node in all layers
- For each new node:
- Determine layer with probability function
- Find nearest neighbors at each layer
- Create bidirectional connections
- Prune to maintain degree bounds
Time Complexity: O(M × log N) per insertion
2. Search Process¶
query_embedding = get_embedding("Where's my invoice?")
result = router.route(query, query_embedding)
Search Algorithm:
- Start at top layer with entry point
- Greedy search to nearest neighbor
- Drop to next layer and repeat
- At layer 0 - find K nearest neighbors
- Return top candidates
Time Complexity: O(log N)
Configuration Parameters¶
Core Parameters¶
config = RouterConfig(
hnsw_m=16, # Connections per layer
hnsw_ef_construction=200, # Build quality
hnsw_ef_search=50, # Search quality
)
M (Connections Per Layer)¶
Controls graph connectivity.
| M Value | Build Time | Memory | Search Speed | Accuracy |
|---|---|---|---|---|
| 8 | Fast | Low | Fast | Good |
| 16 | Medium | Medium | Medium | Better |
| 32 | Slow | High | Slow | Best |
Recommendation: Start with 16
ef_construction¶
Controls build quality (higher = better graph).
| ef_construction | Build Time | Recall@10 |
|---|---|---|
| 100 | 1x | 98.5% |
| 200 | 2x | 99.5% |
| 400 | 4x | 99.8% |
Recommendation: 200 for production
ef_search¶
Controls search quality (higher = more thorough).
| ef_search | Latency | Recall@10 |
|---|---|---|
| 20 | 0.3ms | 97% |
| 50 | 0.4ms | 99% |
| 100 | 0.6ms | 99.8% |
Recommendation: 50 for balanced performance
Performance Characteristics¶
Latency Breakdown¶
HNSW Search (P99): 0.4ms
├─ Entry point: 0.05ms
├─ Layer 2→1: 0.05ms
├─ Layer 1→0: 0.10ms
└─ Layer 0 search: 0.20ms
Scaling Behavior¶
| Routes | Build Time | Memory | P50 Latency | P99 Latency |
|---|---|---|---|---|
| 100 | 50ms | 8MB | 0.2ms | 0.4ms |
| 1K | 500ms | 64MB | 0.3ms | 0.5ms |
| 10K | 5s | 352MB | 0.4ms | 0.7ms |
| 100K | 60s | 3.2GB | 0.6ms | 1.2ms |
Logarithmic scaling - 10x more routes = +30% latency
Optimization Techniques¶
1. Tuning for Accuracy¶
# Maximum accuracy
config = RouterConfig(
hnsw_m=32, # More connections
hnsw_ef_construction=400, # Better build
hnsw_ef_search=100, # Thorough search
)
Trade-off: 2-3x slower but 99.9%+ recall
2. Tuning for Speed¶
# Minimum latency
config = RouterConfig(
hnsw_m=8, # Fewer connections
hnsw_ef_construction=100, # Quick build
hnsw_ef_search=20, # Fast search
)
Trade-off: 40% faster but 97% recall
3. Balanced Configuration¶
# Recommended for production
config = RouterConfig(
hnsw_m=16,
hnsw_ef_construction=200,
hnsw_ef_search=50,
)
Result: 99.5% recall @ 0.4ms P99
Memory Management¶
Memory Formula¶
Memory = BASE + (ROUTES × DIMENSIONS × 4 bytes × M × 1.5)
For 10K routes with 384 dimensions, M=16:
= 32MB + (10,000 × 384 × 4 × 16 × 1.5)
= 32MB + 368MB
= 400MB
Memory Optimization¶
# Reduce memory by 50%
config = RouterConfig(
enable_quantization=True, # Use int8 instead of float32
)
Trade-off: Slight accuracy loss (<1%)
Concurrent Access¶
HNSW index supports concurrent reads without locks:
# Safe for concurrent routing
import asyncio
async def route_query(query):
embedding = await get_embedding(query)
return router.route(query, embedding)
# All run concurrently
results = await asyncio.gather(
route_query("Query 1"),
route_query("Query 2"),
route_query("Query 3"),
)
Note: Index updates (adding routes) require exclusive access
Index Persistence¶
Save Index¶
Load Index¶
Benefits:
- No rebuild needed
- Instant startup
- Consistent results
Advanced Usage¶
Dynamic Index Updates¶
# Add route to existing index
new_route = Route("new_route")
new_embedding = get_embedding(new_route.description)
router.add_route_to_index(new_route, new_embedding)
Note: Frequent updates may degrade graph quality. Rebuild periodically.
Batch Index Construction¶
# More efficient for bulk inserts
routes = [Route(f"route_{i}") for i in range(1000)]
embeddings = get_embeddings([r.description for r in routes])
router.build_index_batch(embeddings) # 2x faster than individual inserts
Monitoring Index Health¶
stats = router.get_index_stats()
print(f"Routes: {stats.num_routes}")
print(f"Layers: {stats.num_layers}")
print(f"Avg degree: {stats.avg_degree}")
print(f"Max degree: {stats.max_degree}")
Comparison to Alternatives¶
| Method | Time Complexity | Space Complexity | Latency (10K) | Recall |
|---|---|---|---|---|
| HNSW | O(log N) | O(N × M) | 0.4ms | 99.5% |
| Flat Index | O(N) | O(N) | 5ms | 100% |
| IVF | O(√N) | O(N) | 2ms | 95% |
| LSH | O(1) | O(N × L) | 0.3ms | 85% |
Why HNSW wins:
- Logarithmic scaling
- High recall
- No training needed
- Efficient updates
Best Practices¶
1. Build Once, Query Many¶
# Build index during initialization
router.build_index(embeddings)
router.save_index("index.bin")
# Load in production
router.load_index("index.bin")
2. Choose M Based on Scale¶
if num_routes < 1000:
m = 8 # Small dataset
elif num_routes < 10000:
m = 16 # Medium dataset
else:
m = 24 # Large dataset
3. Monitor Recall¶
# Validate recall on test set
test_queries = [...]
test_labels = [...]
correct = sum(
router.route(q, e).route_id == label
for q, e, label in zip(test_queries, embeddings, test_labels)
)
recall = correct / len(test_queries)
print(f"Recall: {recall:.2%}")
Troubleshooting¶
Low Recall¶
Problem: Recall < 95%
Solutions:
- Increase ef_search to 100
- Increase ef_construction to 400
- Increase m to 32
High Latency¶
Problem: P99 > 2ms
Solutions:
- Decrease ef_search to 20
- Decrease m to 8
- Enable SIMD: enable_simd=True
High Memory Usage¶
Problem: Memory > 1GB for 10K routes
Solutions:
- Enable quantization: enable_quantization=True
- Reduce m to 8
- Use smaller embedding dimension
Further Reading¶
- Algorithms - Hybrid scoring and calibration
- Performance - Optimization guide
- Benchmarks - Performance comparisons
- Original HNSW Paper