Skip to content

HNSW Index

Deep dive into the Hierarchical Navigable Small World (HNSW) graph for ultra-fast vector search.

Overview

HNSW is StrataRouter's secret weapon for sub-millisecond routing. It provides O(log N) approximate nearest neighbor search with 99.5%+ recall.

Performance: 0.4ms P99 for 10K routes


What is HNSW?

HNSW builds a multi-layer graph that enables logarithmic search complexity:

Layer 2:  A ←→ E
          ↓     ↓
Layer 1:  A → B → E → F
          ↓   ↓   ↓   ↓
Layer 0:  A-B-C-D-E-F-G-H  (all nodes)

Key Properties:

  • Hierarchical - Multiple layers with decreasing density
  • Navigable - Long-range connections in upper layers
  • Small World - Short paths between any two nodes
  • Logarithmic Search - O(log N) complexity

How It Works

1. Index Construction

router = Router(dimension=384)

# Add routes
router.add_route(Route("billing"))
router.add_route(Route("support"))

# Build HNSW index
embeddings = get_embeddings([r.description for r in router.routes])
router.build_index(embeddings)  # Constructs HNSW graph

Build Process:

  1. Insert first node in all layers
  2. For each new node:
  3. Determine layer with probability function
  4. Find nearest neighbors at each layer
  5. Create bidirectional connections
  6. Prune to maintain degree bounds

Time Complexity: O(M × log N) per insertion

2. Search Process

query_embedding = get_embedding("Where's my invoice?")
result = router.route(query, query_embedding)

Search Algorithm:

  1. Start at top layer with entry point
  2. Greedy search to nearest neighbor
  3. Drop to next layer and repeat
  4. At layer 0 - find K nearest neighbors
  5. Return top candidates

Time Complexity: O(log N)


Configuration Parameters

Core Parameters

config = RouterConfig(
    hnsw_m=16,                  # Connections per layer
    hnsw_ef_construction=200,   # Build quality
    hnsw_ef_search=50,          # Search quality
)

M (Connections Per Layer)

Controls graph connectivity.

M Value Build Time Memory Search Speed Accuracy
8 Fast Low Fast Good
16 Medium Medium Medium Better
32 Slow High Slow Best

Recommendation: Start with 16

ef_construction

Controls build quality (higher = better graph).

ef_construction Build Time Recall@10
100 1x 98.5%
200 2x 99.5%
400 4x 99.8%

Recommendation: 200 for production

Controls search quality (higher = more thorough).

ef_search Latency Recall@10
20 0.3ms 97%
50 0.4ms 99%
100 0.6ms 99.8%

Recommendation: 50 for balanced performance


Performance Characteristics

Latency Breakdown

HNSW Search (P99): 0.4ms
├─ Entry point:    0.05ms
├─ Layer 2→1:      0.05ms
├─ Layer 1→0:      0.10ms
└─ Layer 0 search: 0.20ms

Scaling Behavior

Routes Build Time Memory P50 Latency P99 Latency
100 50ms 8MB 0.2ms 0.4ms
1K 500ms 64MB 0.3ms 0.5ms
10K 5s 352MB 0.4ms 0.7ms
100K 60s 3.2GB 0.6ms 1.2ms

Logarithmic scaling - 10x more routes = +30% latency


Optimization Techniques

1. Tuning for Accuracy

# Maximum accuracy
config = RouterConfig(
    hnsw_m=32,                  # More connections
    hnsw_ef_construction=400,   # Better build
    hnsw_ef_search=100,         # Thorough search
)

Trade-off: 2-3x slower but 99.9%+ recall

2. Tuning for Speed

# Minimum latency
config = RouterConfig(
    hnsw_m=8,                   # Fewer connections
    hnsw_ef_construction=100,   # Quick build
    hnsw_ef_search=20,          # Fast search
)

Trade-off: 40% faster but 97% recall

3. Balanced Configuration

# Recommended for production
config = RouterConfig(
    hnsw_m=16,
    hnsw_ef_construction=200,
    hnsw_ef_search=50,
)

Result: 99.5% recall @ 0.4ms P99


Memory Management

Memory Formula

Memory = BASE + (ROUTES × DIMENSIONS × 4 bytes × M × 1.5)

For 10K routes with 384 dimensions, M=16:
= 32MB + (10,000 × 384 × 4 × 16 × 1.5)
= 32MB + 368MB
= 400MB

Memory Optimization

# Reduce memory by 50%
config = RouterConfig(
    enable_quantization=True,   # Use int8 instead of float32
)

Trade-off: Slight accuracy loss (<1%)


Concurrent Access

HNSW index supports concurrent reads without locks:

# Safe for concurrent routing
import asyncio

async def route_query(query):
    embedding = await get_embedding(query)
    return router.route(query, embedding)

# All run concurrently
results = await asyncio.gather(
    route_query("Query 1"),
    route_query("Query 2"),
    route_query("Query 3"),
)

Note: Index updates (adding routes) require exclusive access


Index Persistence

Save Index

# Save to disk
router.save_index("router_index.bin")

Load Index

# Load from disk (instant!)
router = Router(dimension=384)
router.load_index("router_index.bin")

Benefits:

  • No rebuild needed
  • Instant startup
  • Consistent results

Advanced Usage

Dynamic Index Updates

# Add route to existing index
new_route = Route("new_route")
new_embedding = get_embedding(new_route.description)

router.add_route_to_index(new_route, new_embedding)

Note: Frequent updates may degrade graph quality. Rebuild periodically.

Batch Index Construction

# More efficient for bulk inserts
routes = [Route(f"route_{i}") for i in range(1000)]
embeddings = get_embeddings([r.description for r in routes])

router.build_index_batch(embeddings)  # 2x faster than individual inserts

Monitoring Index Health

stats = router.get_index_stats()
print(f"Routes: {stats.num_routes}")
print(f"Layers: {stats.num_layers}")
print(f"Avg degree: {stats.avg_degree}")
print(f"Max degree: {stats.max_degree}")

Comparison to Alternatives

Method Time Complexity Space Complexity Latency (10K) Recall
HNSW O(log N) O(N × M) 0.4ms 99.5%
Flat Index O(N) O(N) 5ms 100%
IVF O(√N) O(N) 2ms 95%
LSH O(1) O(N × L) 0.3ms 85%

Why HNSW wins:

  • Logarithmic scaling
  • High recall
  • No training needed
  • Efficient updates

Best Practices

1. Build Once, Query Many

# Build index during initialization
router.build_index(embeddings)
router.save_index("index.bin")

# Load in production
router.load_index("index.bin")

2. Choose M Based on Scale

if num_routes < 1000:
    m = 8      # Small dataset
elif num_routes < 10000:
    m = 16     # Medium dataset
else:
    m = 24     # Large dataset

3. Monitor Recall

# Validate recall on test set
test_queries = [...]
test_labels = [...]

correct = sum(
    router.route(q, e).route_id == label
    for q, e, label in zip(test_queries, embeddings, test_labels)
)

recall = correct / len(test_queries)
print(f"Recall: {recall:.2%}")

Troubleshooting

Low Recall

Problem: Recall < 95%

Solutions: - Increase ef_search to 100 - Increase ef_construction to 400 - Increase m to 32

High Latency

Problem: P99 > 2ms

Solutions: - Decrease ef_search to 20 - Decrease m to 8 - Enable SIMD: enable_simd=True

High Memory Usage

Problem: Memory > 1GB for 10K routes

Solutions: - Enable quantization: enable_quantization=True - Reduce m to 8 - Use smaller embedding dimension


Further Reading