Custom Encoders¶
Build domain-specific embedding models for improved routing accuracy.
Overview¶
Custom encoders allow you to train specialized embedding models optimized for your specific routing use case, potentially improving accuracy by 5-15% over general-purpose models.
When to Use Custom Encoders¶
Consider custom encoders when:
- Your domain has specialized terminology
- General embeddings perform poorly (<85% accuracy)
- You have 10K+ labeled routing examples
- Routing accuracy is mission-critical
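Before investing in a custom encoder, measure your general-purpose baseline so you know whether you clear the 85% bar. A minimal sketch of nearest-route accuracy (the toy embeddings here are illustrative; in practice they come from your current encoder):

```python
import numpy as np

def routing_accuracy(query_embs: np.ndarray,
                     route_embs: np.ndarray,
                     true_routes: list[int]) -> float:
    """Fraction of queries whose nearest route (by cosine) is correct."""
    # Normalize rows so the dot product equals cosine similarity
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    r = route_embs / np.linalg.norm(route_embs, axis=1, keepdims=True)
    predicted = (q @ r.T).argmax(axis=1)
    return float((predicted == np.array(true_routes)).mean())

# Toy example: two routes, three labeled queries
route_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
query_embs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
print(routing_accuracy(query_embs, route_embs, [0, 1, 0]))  # 1.0
```

If this baseline already sits above your accuracy target, a custom encoder may not pay for its maintenance cost.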
Encoder Interface¶
Base Encoder Class¶
```python
from stratarouter.encoders import BaseEncoder
import numpy as np

class CustomEncoder(BaseEncoder):
    """Custom encoder interface."""

    def encode(self, texts: list[str]) -> np.ndarray:
        """Encode texts to embeddings.

        Args:
            texts: List of input texts

        Returns:
            Array of shape (len(texts), dimension)
        """
        raise NotImplementedError

    @property
    def dimension(self) -> int:
        """Return embedding dimension."""
        raise NotImplementedError
```
Implementation Examples¶
1. HuggingFace Encoder¶
```python
from transformers import AutoTokenizer, AutoModel
import numpy as np
import torch

class HuggingFaceEncoder(BaseEncoder):
    """Custom encoder using HuggingFace models."""

    def __init__(self, model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.model.eval()

    def encode(self, texts: list[str]) -> np.ndarray:
        """Encode texts using a mean-pooled HuggingFace model."""
        # Tokenize
        inputs = self.tokenizer(
            texts,
            padding=True,
            truncation=True,
            return_tensors="pt"
        )
        # Encode
        with torch.no_grad():
            outputs = self.model(**inputs)
        # Mean-pool over real tokens only; a plain mean over the
        # sequence would let padding tokens dilute the embedding
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        summed = (outputs.last_hidden_state * mask).sum(dim=1)
        embeddings = summed / mask.sum(dim=1).clamp(min=1e-9)
        return embeddings.numpy()

    @property
    def dimension(self) -> int:
        return self.model.config.hidden_size
```
2. OpenAI Encoder¶
```python
from openai import OpenAI
import numpy as np

class OpenAIEncoder(BaseEncoder):
    """Encoder using OpenAI's embedding API."""

    def __init__(self, model: str = "text-embedding-3-small"):
        self.client = OpenAI()
        self.model = model
        # text-embedding-3-small returns 1536 dims, text-embedding-3-large 3072
        self._dimension = 3072 if "large" in model else 1536

    def encode(self, texts: list[str]) -> np.ndarray:
        """Encode using the OpenAI embeddings endpoint."""
        response = self.client.embeddings.create(
            model=self.model,
            input=texts
        )
        embeddings = [item.embedding for item in response.data]
        return np.array(embeddings)

    @property
    def dimension(self) -> int:
        return self._dimension
```
3. Fine-Tuned Encoder¶
```python
from sentence_transformers import SentenceTransformer
import numpy as np

class FineTunedEncoder(BaseEncoder):
    """Fine-tuned sentence transformer."""

    def __init__(self, model_path: str):
        self.model = SentenceTransformer(model_path)

    def encode(self, texts: list[str]) -> np.ndarray:
        """Encode using the fine-tuned model."""
        return self.model.encode(texts)

    @property
    def dimension(self) -> int:
        return self.model.get_sentence_embedding_dimension()
```
Training Custom Models¶
Data Preparation¶
```python
# Prepare training data
training_data = [
    {
        "query": "Where's my invoice?",
        "route": "billing",
        "positive_examples": [
            "Show me my bill",
            "I need my receipt"
        ],
        "negative_examples": [
            "How do I login?",
            "Product not working"
        ]
    },
    # ... more examples
]

# Convert to sentence-transformers format
from sentence_transformers import InputExample

examples = []
for item in training_data:
    query = item["query"]
    # Positive pairs
    for pos in item["positive_examples"]:
        examples.append(InputExample(texts=[query, pos], label=1.0))
    # Negative pairs
    for neg in item["negative_examples"]:
        examples.append(InputExample(texts=[query, neg], label=0.0))
```
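Since the loss treats positive and negative pairs symmetrically, it is worth checking the label balance after conversion. A small sketch, using a lightweight stand-in for `InputExample` so it runs without sentence-transformers installed:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Example:
    """Stand-in for sentence_transformers.InputExample."""
    texts: list = field(default_factory=list)
    label: float = 0.0

def pair_balance(examples) -> dict:
    """Count positive (1.0) and negative (0.0) pairs."""
    counts = Counter(ex.label for ex in examples)
    return {"positive": counts.get(1.0, 0), "negative": counts.get(0.0, 0)}

examples = [
    Example(["Where's my invoice?", "Show me my bill"], 1.0),
    Example(["Where's my invoice?", "I need my receipt"], 1.0),
    Example(["Where's my invoice?", "How do I login?"], 0.0),
]
print(pair_balance(examples))  # {'positive': 2, 'negative': 1}
```

A heavily skewed ratio is a common cause of a fine-tuned model underperforming its base.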
Fine-Tuning¶
```python
from sentence_transformers import SentenceTransformer, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Create data loader
train_dataloader = DataLoader(examples, shuffle=True, batch_size=16)

# Define loss
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=10,
    warmup_steps=100,
    output_path='./models/custom-router-encoder'
)
```
Evaluation¶
```python
# Evaluate on a validation set
from sklearn.metrics.pairwise import cosine_similarity

def evaluate_encoder(encoder, validation_data, routes):
    """Evaluate encoder accuracy on a validation set."""
    # Embed route descriptions once, not per query
    route_embs = encoder.encode([r["description"] for r in routes])
    correct = 0
    for item in validation_data:
        query_emb = encoder.encode([item["query"]])[0]
        # Find the closest route
        similarities = cosine_similarity([query_emb], route_embs)[0]
        predicted_route = routes[similarities.argmax()]["id"]
        if predicted_route == item["route"]:
            correct += 1
    return correct / len(validation_data)

accuracy = evaluate_encoder(custom_encoder, validation_data, routes)
print(f"Accuracy: {accuracy:.2%}")
```
Model Optimization¶
Quantization¶
```python
import torch

def quantize_model(model_path: str, output_path: str):
    """Quantize model to INT8 for faster inference."""
    # Assumes the file was saved with torch.save(model)
    model = torch.load(model_path)
    # Dynamic quantization of the linear layers
    quantized_model = torch.quantization.quantize_dynamic(
        model,
        {torch.nn.Linear},
        dtype=torch.qint8
    )
    torch.save(quantized_model, output_path)
    return quantized_model
```
ONNX Export¶
```python
import torch

def export_to_onnx(model, output_path: str, vocab_size: int = 30522):
    """Export model to ONNX format."""
    # Transformer encoders take integer token IDs, not random floats
    dummy_input = torch.randint(0, vocab_size, (1, 128))
    torch.onnx.export(
        model,
        dummy_input,
        output_path,
        input_names=['input_ids'],
        output_names=['output'],
        dynamic_axes={
            'input_ids': {0: 'batch_size', 1: 'sequence'},
            'output': {0: 'batch_size'}
        }
    )
```
Using Custom Encoders¶
With Core Router¶
```python
from stratarouter import Router
from my_encoders import FineTunedEncoder

# Initialize custom encoder
encoder = FineTunedEncoder("./models/custom-router-encoder")

# Create router with the encoder's dimension
router = Router(dimension=encoder.dimension)

# Add routes
router.add_routes(routes)

# Build index with custom embeddings
route_texts = [r.description for r in router.routes]
embeddings = encoder.encode(route_texts)
router.build_index(embeddings)

# Route with the custom encoder
query = "Where's my payment?"
query_embedding = encoder.encode([query])[0]
result = router.route(query, query_embedding)
```
With Runtime¶
```python
from stratarouter_runtime import CoreRuntimeBridge, RuntimeConfig

# Initialize with the custom encoder
config = RuntimeConfig(
    encoder=encoder,
    cache_enabled=True
)
bridge = CoreRuntimeBridge(config)

# Execute (encoder used automatically)
result = await bridge.execute(
    query="Show me my invoice",
    context={}
)
```
Performance Comparison¶
Benchmark Results¶
| Encoder | Dimension | Latency | Accuracy | Use Case |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 512 | 50ms | 92% | General purpose |
| all-MiniLM-L6-v2 | 384 | 5ms | 88% | Fast, local |
| Custom fine-tuned | 384 | 5ms | 95% | Domain-specific |
| mpnet-base-v2 | 768 | 15ms | 91% | High accuracy |
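Latency figures like these depend heavily on hardware, batch size, and sequence length, so benchmark your own encoder before choosing. A minimal timing sketch (the dummy encoder is a placeholder for any object exposing `encode(texts)`):

```python
import time

def mean_latency_ms(encode_fn, texts, runs: int = 50) -> float:
    """Average wall-clock time of single-query calls to encode_fn."""
    # Warm-up call so one-time setup costs are excluded
    encode_fn([texts[0]])
    start = time.perf_counter()
    for i in range(runs):
        encode_fn([texts[i % len(texts)]])
    return (time.perf_counter() - start) / runs * 1000

# Dummy encoder: swap in your encoder's .encode
dummy_encode = lambda batch: [[0.0] * 384 for _ in batch]
latency = mean_latency_ms(dummy_encode, ["Where's my invoice?", "Reset password"])
print(f"{latency:.3f} ms/query")
```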
Best Practices¶
Model Selection¶
1. Start with general-purpose models
   - OpenAI embeddings for a quick start
   - sentence-transformers for self-hosted deployments
2. Fine-tune for production
   - Collect 10K+ labeled routing examples
   - Balance positive and negative pairs
   - Evaluate on a held-out test set
3. Optimize for inference
   - Quantize for a 2-4x speedup
   - Use ONNX for cross-platform deployment
   - Batch encode when possible
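The batch-encoding advice can be sketched as a thin wrapper around any encoder's `encode` (the dummy encoder below is a placeholder):

```python
import numpy as np

def encode_in_batches(encode_fn, texts: list[str], batch_size: int = 32) -> np.ndarray:
    """Encode texts in fixed-size chunks and stack the results."""
    chunks = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    return np.vstack([encode_fn(chunk) for chunk in chunks])

# Dummy encoder producing 4-dim embeddings
dummy = lambda batch: np.ones((len(batch), 4))
embs = encode_in_batches(dummy, [f"query {i}" for i in range(70)], batch_size=32)
print(embs.shape)  # (70, 4)
```

Batching keeps memory bounded for large corpora while still amortizing per-call overhead, which matters most for GPU-backed models.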
Training Tips¶
- Data quality over quantity - 10K high-quality examples beat 100K noisy ones
- Balance classes - give every route equal representation
- Hard negatives - include examples that are easily confused between routes
- Validate regularly - monitor accuracy on production data
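Hard negatives, the third tip above, are the off-route examples most similar to a query. A minimal mining sketch with toy embeddings (in practice the candidates would be embedded with your current encoder):

```python
import numpy as np

def mine_hard_negatives(query_emb: np.ndarray,
                        candidate_embs: np.ndarray,
                        k: int = 2) -> list[int]:
    """Indices of the k off-route candidates most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q
    # Highest-similarity wrong answers make the best training negatives
    return [int(i) for i in np.argsort(-sims)[:k]]

# Toy data: candidate 0 is the most confusable off-route example
query = np.array([1.0, 0.0])
candidates = np.array([[0.9, 0.4], [0.1, 1.0], [0.5, 0.9]])
print(mine_hard_negatives(query, candidates, k=2))  # [0, 2]
```

Pairing each query with its mined hard negatives (label 0.0) typically teaches the model more than randomly sampled negatives do.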
Troubleshooting¶
Low Accuracy¶
Issue: Custom encoder performs worse than baseline.
Solutions:
- Increase training data (aim for 1K+ examples per route)
- Add hard negative examples
- Try different base models
- Adjust the learning rate and number of epochs
Slow Inference¶
Issue: Custom encoder too slow for production.
Solutions:
- Quantize the model to INT8
- Export to ONNX
- Use a smaller base model
- Implement caching
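The caching suggestion can be as simple as memoizing embeddings per unique text, assuming queries repeat. A sketch that wraps any object exposing the `encode(texts)` interface (the counting encoder exists only to demonstrate the cache):

```python
class CachedEncoder:
    """Wraps an encoder and memoizes embeddings per unique text."""

    def __init__(self, encoder):
        self.encoder = encoder
        self._cache: dict = {}

    def encode(self, texts):
        # Only send cache misses to the underlying model
        missing = [t for t in texts if t not in self._cache]
        if missing:
            for text, emb in zip(missing, self.encoder.encode(missing)):
                self._cache[text] = emb
        return [self._cache[t] for t in texts]

class CountingEncoder:
    """Dummy encoder that counts how many texts it actually embeds."""
    def __init__(self):
        self.calls = 0
    def encode(self, texts):
        self.calls += len(texts)
        return [[float(len(t))] for t in texts]

inner = CountingEncoder()
cached = CachedEncoder(inner)
cached.encode(["a", "b"])
cached.encode(["a", "b", "c"])  # only "c" reaches the model
print(inner.calls)  # 3
```

For production use you would bound the cache (e.g. an LRU policy) rather than let the dict grow without limit.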
High Memory Usage¶
Issue: Model consumes too much memory.
Solutions:
- Use distilled models
- Quantize weights
- Reduce batch size
- Implement model pruning
Next Steps¶
- Policy Engine - Add governance policies
- Multi-Model - Route to multiple models
- Performance - Optimize inference speed
Build encoders optimized for your domain. 🎯