Hybrid Extension¶
The Hybrid Extension provides a multi-strategy approach to approximate nearest neighbor search, combining the strengths of different algorithms to achieve superior performance across diverse workloads.
Features¶
- Multi-Strategy Approach: Combines HNSW with complementary techniques like exact search, LSH, and data partitioning
- Adaptive Selection: Automatically selects the optimal search strategy based on dataset characteristics
- Improved Performance: Superior search efficiency and recall compared to single-strategy approaches
- Resource Optimization: Better memory utilization and scalability for large datasets
- Unified Interface: Seamless integration with the core HNSW API
Architecture¶
The Hybrid Index architecture integrates multiple search strategies within a unified framework:
- Index Manager: Coordinates the selection and combination of different indexing strategies
- HNSW Index: Provides efficient approximate search for medium to large datasets
- Exact Index: Delivers perfect recall for small datasets
- LSH Index: Generates candidate sets for high-dimensional data
- Partitioner: Divides the vector space into regions for improved scalability
Strategy Selection¶
The hybrid index employs a tiered strategy for index selection:
- For small datasets (below a configurable threshold), exact search is used to ensure perfect recall
- For medium-sized datasets, HNSW provides an optimal balance of speed and accuracy
- For large datasets, a combination of partitioning and HNSW is employed
- For high-dimensional data, LSH is used to generate candidate sets before refinement
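The tiered selection above can be sketched as a simple decision function. This is a minimal illustration with made-up thresholds and a hypothetical `selectStrategy` helper; the extension's real dispatch logic may differ:

```go
package main

import "fmt"

// Illustrative thresholds, mirroring the HybridConfig defaults described below.
const (
    exactThreshold = 1000   // below this size, use exact search
    dimThreshold   = 1000   // above this dimensionality, use LSH
    largeThreshold = 100000 // above this size, partition before HNSW
)

// selectStrategy sketches the tiered decision: small datasets get exact
// search, high-dimensional data gets LSH, very large datasets get
// partitioned HNSW, and everything else gets plain HNSW.
func selectStrategy(numVectors, dims int) string {
    switch {
    case numVectors < exactThreshold:
        return "exact"
    case dims > dimThreshold:
        return "lsh"
    case numVectors > largeThreshold:
        return "partitioned-hnsw"
    default:
        return "hnsw"
    }
}

func main() {
    fmt.Println(selectStrategy(500, 128))    // small dataset
    fmt.Println(selectStrategy(50000, 128))  // medium dataset
    fmt.Println(selectStrategy(500000, 128)) // large dataset
    fmt.Println(selectStrategy(50000, 2048)) // high-dimensional data
}
```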
Usage¶
Installation¶
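The extension lives under the hnsw module, so it can presumably be fetched with `go get` (path inferred from the import used in the examples below):

```shell
go get github.com/TFMV/hnsw/hnsw-extensions/hybrid
```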
Creating a Hybrid Index¶
```go
// Create a configuration
config := hybrid.DefaultHybridConfig()
config.ExactThreshold = 1000 // Use exact search for datasets smaller than 1000 vectors
config.DimThreshold = 1000   // Use LSH for dimensions greater than 1000
config.NumPartitions = 16    // Number of partitions for large datasets
config.Distance = hnsw.CosineDistance

// Create a new hybrid index
index, err := hybrid.NewHybridIndex[int](config)
if err != nil {
    // Handle error
}
defer index.Close()
```
Adding Vectors¶
```go
// Add a single vector
if err := index.Add(1, []float32{0.1, 0.2, 0.3}); err != nil {
    // Handle error
}

// Add multiple vectors
for i := 0; i < 1000; i++ {
    err := index.Add(i, generateVector(128)) // Your vector generation function
    if err != nil {
        // Handle error
    }
}
```
Searching¶
```go
// Search for the 5 nearest neighbors
query := []float32{0.2, 0.3, 0.4}
results, err := index.Search(query, 5)
if err != nil {
    // Handle error
}

// Process results
for i, result := range results {
    fmt.Printf("%d. ID: %d, Distance: %f\n", i+1, result.Key, result.Distance)
}
```
Deleting Vectors¶
```go
// Delete a single vector
deleted := index.Delete(1)

// Delete multiple vectors
keysToDelete := []int{2, 3, 4}
results := index.BatchDelete(keysToDelete)
```
Adaptive Hybrid Index¶
The Adaptive Hybrid Index extends the basic hybrid approach with real-time performance monitoring and dynamic strategy selection:
```go
// Create an adaptive configuration
config := hybrid.DefaultAdaptiveConfig()
config.LearningRate = 0.1
config.AdaptationInterval = 100 // Adapt strategy every 100 queries
config.PerformanceWeight = 0.7  // Weight for search performance vs recall

// Create a new adaptive hybrid index
index, err := hybrid.NewAdaptiveHybridIndex[int](config)
if err != nil {
    // Handle error
}
defer index.Close()
```
The adaptive index automatically learns from query patterns and adjusts its strategy selection to optimize performance over time.
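One plausible mechanism for this kind of adaptation is an exponentially weighted moving average over per-strategy scores: each query's observed score nudges a running estimate by the learning rate. The sketch below is illustrative only; `updateScore` is not part of the extension's API, and the real internals may differ:

```go
package main

import "fmt"

// updateScore moves the running score toward the latest observation by a
// fraction equal to the learning rate (an exponentially weighted moving
// average). A higher rate adapts faster but is noisier.
func updateScore(current, observed, learningRate float64) float64 {
    return current + learningRate*(observed-current)
}

func main() {
    score := 0.5 // initial score for some strategy
    // Four queries in a row observe a score of 0.9 (e.g. a blend of
    // latency and recall); the running estimate drifts toward it.
    for i := 0; i < 4; i++ {
        score = updateScore(score, 0.9, 0.1) // LearningRate = 0.1
    }
    fmt.Printf("%.4f\n", score)
}
```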
Components¶
Exact Index¶
The exact index maintains a simple map of keys to vectors and performs brute-force search by computing distances to all vectors:
```go
type ExactIndex[K cmp.Ordered] struct {
    vectors  map[K][]float32
    distance DistanceFunc
    mu       sync.RWMutex
}
```
LSH Index¶
The LSH index uses random projections to hash vectors into buckets, enabling efficient candidate generation:
```go
type LSHIndex[K cmp.Ordered] struct {
    hashTables    []map[uint64][]K
    projections   [][]float32
    numHashBits   int
    numHashTables int
    vectors       map[K][]float32
    distance      DistanceFunc
    mu            sync.RWMutex
}
```
Partitioner¶
The partitioner divides the vector space into regions using a clustering approach:
```go
type Partitioner[K cmp.Ordered] struct {
    numPartitions int
    centroids     [][]float32
    assignments   map[K]int
    counts        []int
    distance      DistanceFunc
    mu            sync.RWMutex
}
```
Configuration Options¶
HybridConfig¶
The `HybridConfig` struct allows customization of various parameters:
- `ExactThreshold`: Dataset size threshold for using exact search (default: 1000)
- `DimThreshold`: Dimensionality threshold for using LSH (default: 1000)
- `NumPartitions`: Number of partitions for large datasets (default: 16)
- `Distance`: Distance function (default: CosineDistance)
- `HNSWConfig`: Configuration for the HNSW index
- `LSHConfig`: Configuration for the LSH index
AdaptiveConfig¶
The `AdaptiveConfig` struct extends `HybridConfig` with additional parameters:
- `LearningRate`: Rate at which the adaptive index learns from query patterns (default: 0.1)
- `AdaptationInterval`: Number of queries between strategy adaptations (default: 100)
- `PerformanceWeight`: Weight for search performance vs recall in strategy selection (default: 0.7)
- `RecallTarget`: Target recall level for adaptive optimization (default: 0.95)
Performance Considerations¶
For optimal performance, consider the following:
- Dataset Size: Adjust the `ExactThreshold` based on your hardware capabilities. Smaller values favor speed, larger values favor recall.
- Dimensionality: Adjust the `DimThreshold` based on your data characteristics. Higher-dimensional data benefits more from LSH.
- Number of Partitions: Adjust the `NumPartitions` based on your dataset size. More partitions improve scalability but may reduce recall.
- Adaptive Parameters: Tune the `LearningRate` and `AdaptationInterval` based on your query patterns. More stable patterns benefit from slower adaptation.
Benchmark Results¶
The hybrid index demonstrates significant performance improvements over single-strategy approaches:
- Small Datasets: Up to 10x faster than HNSW with perfect recall
- High-Dimensional Data: Up to 5x faster than HNSW with comparable recall
- Large Datasets: Up to 3x better memory efficiency with minimal recall degradation
- Adaptive Selection: Up to 30% improvement in query throughput for mixed workloads
Example¶
Here's a complete example of using the Hybrid extension:
```go
package main

import (
    "fmt"
    "math"
    "math/rand"

    "github.com/TFMV/hnsw"
    "github.com/TFMV/hnsw/hnsw-extensions/hybrid"
)

func main() {
    // Create a new hybrid index with default configuration
    config := hybrid.DefaultHybridConfig()
    config.ExactThreshold = 1000
    config.Distance = hnsw.CosineDistance

    index, err := hybrid.NewHybridIndex[int](config)
    if err != nil {
        panic(err)
    }
    defer index.Close()

    // Add vectors
    dimensions := 128
    numVectors := 10000
    for i := 0; i < numVectors; i++ {
        err := index.Add(i, generateRandomVector(dimensions))
        if err != nil {
            panic(err)
        }
    }

    // Perform a search
    query := generateRandomVector(dimensions)
    results, err := index.Search(query, 10)
    if err != nil {
        panic(err)
    }

    fmt.Println("Search results:")
    for i, result := range results {
        fmt.Printf("  %d. ID: %d, Distance: %.6f\n", i+1, result.Key, result.Distance)
    }
}

// generateRandomVector creates a random unit-length vector
func generateRandomVector(dimensions int) []float32 {
    vector := make([]float32, dimensions)
    for i := range vector {
        vector[i] = rand.Float32()
    }

    // Normalize by dividing each component by the Euclidean norm
    var sum float32
    for _, v := range vector {
        sum += v * v
    }
    norm := float32(math.Sqrt(float64(sum)))
    for i := range vector {
        vector[i] /= norm
    }

    return vector
}
```
Adaptive Example¶
Here's an example using the Adaptive Hybrid Index:
```go
package main

import (
    "fmt"
    "math/rand"

    "github.com/TFMV/hnsw"
    "github.com/TFMV/hnsw/hnsw-extensions/hybrid"
)

func main() {
    // Create a new adaptive hybrid index
    config := hybrid.DefaultAdaptiveConfig()
    config.LearningRate = 0.1
    config.AdaptationInterval = 100
    config.Distance = hnsw.CosineDistance

    index, err := hybrid.NewAdaptiveHybridIndex[int](config)
    if err != nil {
        panic(err)
    }
    defer index.Close()

    // Add vectors
    dimensions := 128
    numVectors := 10000
    for i := 0; i < numVectors; i++ {
        err := index.Add(i, generateRandomVector(dimensions))
        if err != nil {
            panic(err)
        }
    }

    // Perform searches to train the adaptive index
    for i := 0; i < 1000; i++ {
        query := generateRandomVector(dimensions)
        _, err := index.Search(query, 10)
        if err != nil {
            panic(err)
        }
    }

    // The index has now adapted to the query patterns
    fmt.Println("Adaptive index has been trained on 1000 queries")

    // Perform a final search
    query := generateRandomVector(dimensions)
    results, err := index.Search(query, 10)
    if err != nil {
        panic(err)
    }

    fmt.Println("Search results:")
    for i, result := range results {
        fmt.Printf("  %d. ID: %d, Distance: %.6f\n", i+1, result.Key, result.Distance)
    }

    // Get statistics about strategy selection
    stats := index.GetStats()
    fmt.Printf("Strategy usage: HNSW: %d%%, Exact: %d%%, LSH: %d%%, Partitioned: %d%%\n",
        stats.StrategyUsage["hnsw"],
        stats.StrategyUsage["exact"],
        stats.StrategyUsage["lsh"],
        stats.StrategyUsage["partitioned"])
}

// generateRandomVector is the same helper as in the previous example.
```