Hybrid Extension

The Hybrid Extension provides a multi-strategy approach to approximate nearest neighbor search, combining the strengths of different algorithms to achieve superior performance across diverse workloads.

Features

  • Multi-Strategy Approach: Combines HNSW with complementary techniques like exact search, LSH, and data partitioning
  • Adaptive Selection: Automatically selects the optimal search strategy based on dataset characteristics
  • Improved Performance: Superior search efficiency and recall compared to single-strategy approaches
  • Resource Optimization: Better memory utilization and scalability for large datasets
  • Unified Interface: Seamless integration with the core HNSW API

Architecture

The Hybrid Index architecture integrates multiple search strategies within a unified framework:

  1. Index Manager: Coordinates the selection and combination of different indexing strategies
  2. HNSW Index: Provides efficient approximate search for medium to large datasets
  3. Exact Index: Delivers perfect recall for small datasets
  4. LSH Index: Generates candidate sets for high-dimensional data
  5. Partitioner: Divides the vector space into regions for improved scalability

Strategy Selection

The hybrid index employs a tiered strategy for index selection:

  • For small datasets (below a configurable threshold), exact search is used to ensure perfect recall
  • For medium-sized datasets, HNSW provides an optimal balance of speed and accuracy
  • For large datasets, a combination of partitioning and HNSW is employed
  • For high-dimensional data, LSH is used to generate candidate sets before refinement
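The tiers above can be sketched as a plain selection function. This is an illustrative sketch, not the extension's internal code: the first two thresholds mirror ExactThreshold and DimThreshold, while the 100,000-vector cutoff for partitioning is an assumed value chosen for the example.

```go
package main

import "fmt"

// selectStrategy mirrors the tiered selection described above.
// Thresholds and strategy names are illustrative only.
func selectStrategy(numVectors, dims, exactThreshold, dimThreshold int) string {
	switch {
	case numVectors < exactThreshold:
		return "exact" // small dataset: brute force, perfect recall
	case dims > dimThreshold:
		return "lsh" // high-dimensional: LSH candidates, then refinement
	case numVectors > 100_000:
		return "partitioned" // large dataset: partitions + per-partition HNSW
	default:
		return "hnsw" // medium dataset: plain HNSW
	}
}

func main() {
	fmt.Println(selectStrategy(500, 128, 1000, 1000))     // exact
	fmt.Println(selectStrategy(50_000, 128, 1000, 1000))  // hnsw
	fmt.Println(selectStrategy(50_000, 2048, 1000, 1000)) // lsh
	fmt.Println(selectStrategy(500_000, 128, 1000, 1000)) // partitioned
}
```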

Usage

Installation

import (
    "github.com/TFMV/hnsw"
    "github.com/TFMV/hnsw/hnsw-extensions/hybrid"
)

Creating a Hybrid Index

// Create a configuration
config := hybrid.DefaultHybridConfig()
config.ExactThreshold = 1000  // Use exact search for datasets smaller than 1000 vectors
config.DimThreshold = 1000    // Use LSH for dimensions greater than 1000
config.NumPartitions = 16     // Number of partitions for large datasets
config.Distance = hnsw.CosineDistance

// Create a new hybrid index
index, err := hybrid.NewHybridIndex[int](config)
if err != nil {
    // Handle error
}
defer index.Close()

Adding Vectors

// Add a single vector
err := index.Add(1, []float32{0.1, 0.2, 0.3})

// Add multiple vectors
for i := 0; i < 1000; i++ {
    err := index.Add(i, generateVector(128)) // Your vector generation function
    if err != nil {
        // Handle error
    }
}

Searching

// Search for the 5 nearest neighbors
query := []float32{0.2, 0.3, 0.4}
results, err := index.Search(query, 5)
if err != nil {
    // Handle error
}

// Process results
for i, result := range results {
    fmt.Printf("%d. ID: %d, Distance: %f\n", i+1, result.Key, result.Distance)
}

Deleting Vectors

// Delete a single vector
deleted := index.Delete(1)

// Delete multiple vectors
keysToDelete := []int{2, 3, 4}
results := index.BatchDelete(keysToDelete)

Adaptive Hybrid Index

The Adaptive Hybrid Index extends the basic hybrid approach with real-time performance monitoring and dynamic strategy selection:

// Create an adaptive configuration
config := hybrid.DefaultAdaptiveConfig()
config.LearningRate = 0.1
config.AdaptationInterval = 100 // Adapt strategy every 100 queries
config.PerformanceWeight = 0.7  // Weight for search performance vs recall

// Create a new adaptive hybrid index
index, err := hybrid.NewAdaptiveHybridIndex[int](config)
if err != nil {
    // Handle error
}
defer index.Close()

The adaptive index automatically learns from query patterns and adjusts its strategy selection to optimize performance over time.
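One common way to implement this kind of adaptation is to keep an exponentially weighted running score per strategy. The sketch below is illustrative only, not the extension's actual code; it assumes LearningRate acts as the smoothing factor and PerformanceWeight trades off speed against recall when ranking strategies.

```go
package main

import "fmt"

// updateScore blends a new observation into a strategy's running score
// using an exponential moving average; learningRate is the smoothing factor.
func updateScore(old, observed, learningRate float64) float64 {
	return (1-learningRate)*old + learningRate*observed
}

// combined ranks a strategy by weighting raw speed against recall,
// in the spirit of a PerformanceWeight-style trade-off.
func combined(speedScore, recallScore, performanceWeight float64) float64 {
	return performanceWeight*speedScore + (1-performanceWeight)*recallScore
}

func main() {
	score := 0.5
	for i := 0; i < 5; i++ {
		// five consecutive fast queries pull the score toward 0.9
		score = updateScore(score, 0.9, 0.1)
	}
	fmt.Printf("score after 5 updates: %.3f\n", score)
	fmt.Printf("combined rank: %.3f\n", combined(0.9, 0.95, 0.7))
}
```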

Components

Exact Index

The exact index maintains a simple map of keys to vectors and performs brute-force search by computing distances to all vectors:

type ExactIndex[K cmp.Ordered] struct {
    vectors  map[K][]float32
    distance DistanceFunc
    mu       sync.RWMutex
}
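A minimal sketch of the brute-force search such an exact index performs (types simplified; euclidean stands in for the configurable DistanceFunc, and the struct's locking is omitted):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// euclidean is a stand-in for the configurable DistanceFunc.
func euclidean(a, b []float32) float32 {
	var sum float64
	for i := range a {
		d := float64(a[i] - b[i])
		sum += d * d
	}
	return float32(math.Sqrt(sum))
}

type result struct {
	Key      int
	Distance float32
}

// exactSearch computes the distance to every stored vector and returns
// the k closest. It is O(n*d) per query, but recall is perfect.
func exactSearch(vectors map[int][]float32, query []float32, k int) []result {
	results := make([]result, 0, len(vectors))
	for key, vec := range vectors {
		results = append(results, result{key, euclidean(query, vec)})
	}
	sort.Slice(results, func(i, j int) bool { return results[i].Distance < results[j].Distance })
	if k > len(results) {
		k = len(results)
	}
	return results[:k]
}

func main() {
	vectors := map[int][]float32{1: {0, 0}, 2: {1, 0}, 3: {5, 5}}
	for _, r := range exactSearch(vectors, []float32{0.1, 0}, 2) {
		fmt.Printf("ID %d distance %.2f\n", r.Key, r.Distance)
	}
}
```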

LSH Index

The LSH index uses random projections to hash vectors into buckets, enabling efficient candidate generation:

type LSHIndex[K cmp.Ordered] struct {
    hashTables    []map[uint64][]K
    projections   [][]float32
    numHashBits   int
    numHashTables int
    vectors       map[K][]float32
    distance      DistanceFunc
    mu            sync.RWMutex
}
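The candidate-generation step can be illustrated with the classic random-hyperplane scheme that the projections and numHashBits fields suggest: each hash bit is the sign of a dot product with a random hyperplane, so nearby vectors tend to share signatures. This is a sketch under that assumption, not the extension's actual hashing code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// makeProjections draws numBits random hyperplanes for one hash table.
func makeProjections(dims, numBits int, seed int64) [][]float32 {
	rng := rand.New(rand.NewSource(seed))
	planes := make([][]float32, numBits)
	for b := range planes {
		planes[b] = make([]float32, dims)
		for i := range planes[b] {
			planes[b][i] = float32(rng.NormFloat64())
		}
	}
	return planes
}

// hashVector packs the sign bit of each projection into a signature,
// which indexes a bucket of candidate keys in one hash table.
func hashVector(vec []float32, planes [][]float32) uint64 {
	var sig uint64
	for bit, plane := range planes {
		var dot float32
		for i := range vec {
			dot += vec[i] * plane[i]
		}
		if dot >= 0 {
			sig |= 1 << uint(bit)
		}
	}
	return sig
}

func main() {
	planes := makeProjections(8, 16, 42)
	v := []float32{1, 0, 1, 0, 1, 0, 1, 0}
	fmt.Printf("signature: %016b\n", hashVector(v, planes))
}
```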

Partitioner

The partitioner divides the vector space into regions using a clustering approach:

type Partitioner[K cmp.Ordered] struct {
    numPartitions int
    centroids     [][]float32
    assignments   map[K]int
    counts        []int
    distance      DistanceFunc
    mu            sync.RWMutex
}
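The assignment step of such a partitioner can be sketched as a nearest-centroid lookup; this is illustrative only, and centroid training (the clustering itself) is omitted.

```go
package main

import (
	"fmt"
	"math"
)

// sqDist is the squared Euclidean distance (no sqrt needed for argmin).
func sqDist(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// nearestCentroid performs the assignment step of a clustering-based
// partitioner: each vector is routed to the region of its closest centroid.
func nearestCentroid(vec []float32, centroids [][]float32) int {
	best, bestDist := 0, float32(math.MaxFloat32)
	for i, c := range centroids {
		if d := sqDist(vec, c); d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

func main() {
	centroids := [][]float32{{0, 0}, {10, 10}}
	fmt.Println(nearestCentroid([]float32{1, 1}, centroids)) // 0
	fmt.Println(nearestCentroid([]float32{9, 8}, centroids)) // 1
}
```

At query time the same lookup routes the query to one (or a few) partitions, so only those partitions' HNSW graphs are searched.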

Configuration Options

HybridConfig

The HybridConfig struct allows customization of various parameters:

  • ExactThreshold: Dataset size threshold for using exact search (default: 1000)
  • DimThreshold: Dimensionality threshold for using LSH (default: 1000)
  • NumPartitions: Number of partitions for large datasets (default: 16)
  • Distance: Distance function (default: CosineDistance)
  • HNSWConfig: Configuration for the HNSW index
  • LSHConfig: Configuration for the LSH index

AdaptiveConfig

The AdaptiveConfig struct extends HybridConfig with additional parameters:

  • LearningRate: Rate at which the adaptive index learns from query patterns (default: 0.1)
  • AdaptationInterval: Number of queries between strategy adaptations (default: 100)
  • PerformanceWeight: Weight for search performance vs recall in strategy selection (default: 0.7)
  • RecallTarget: Target recall level for adaptive optimization (default: 0.95)

Performance Considerations

For optimal performance, consider the following:

  1. Dataset Size: Adjust the ExactThreshold based on your hardware capabilities; smaller values favor speed, while larger values favor recall.
  2. Dimensionality: Adjust the DimThreshold based on your data characteristics. Higher-dimensional data benefits more from LSH.
  3. Number of Partitions: Adjust the NumPartitions based on your dataset size. More partitions improve scalability but may reduce recall.
  4. Adaptive Parameters: Tune the LearningRate and AdaptationInterval based on your query patterns. More stable patterns benefit from slower adaptation.
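As a hedged illustration of how these knobs combine, a configuration biased toward a large, high-dimensional dataset might look like the following (field names as in the usage examples above; the values are arbitrary and should be tuned for your workload):

```go
config := hybrid.DefaultHybridConfig()
config.ExactThreshold = 500 // smaller exact tier: favor speed over perfect recall
config.DimThreshold = 768   // route common embedding sizes through LSH
config.NumPartitions = 32   // more partitions for a larger corpus
```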

Benchmark Results

The hybrid index demonstrates significant performance improvements over single-strategy approaches:

  • Small Datasets: Up to 10x faster than HNSW with perfect recall
  • High-Dimensional Data: Up to 5x faster than HNSW with comparable recall
  • Large Datasets: Up to 3x better memory efficiency with minimal recall degradation
  • Adaptive Selection: Up to 30% improvement in query throughput for mixed workloads

Example

Here's a complete example of using the Hybrid extension:

package main

import (
    "fmt"
    "math"
    "math/rand"

    "github.com/TFMV/hnsw"
    "github.com/TFMV/hnsw/hnsw-extensions/hybrid"
)

func main() {
    // Create a new hybrid index with default configuration
    config := hybrid.DefaultHybridConfig()
    config.ExactThreshold = 1000
    config.Distance = hnsw.CosineDistance

    index, err := hybrid.NewHybridIndex[int](config)
    if err != nil {
        panic(err)
    }
    defer index.Close()

    // Add vectors
    dimensions := 128
    numVectors := 10000

    for i := 0; i < numVectors; i++ {
        err := index.Add(i, generateRandomVector(dimensions))
        if err != nil {
            panic(err)
        }
    }

    // Perform a search
    query := generateRandomVector(dimensions)
    results, err := index.Search(query, 10)
    if err != nil {
        panic(err)
    }

    fmt.Println("Search results:")
    for i, result := range results {
        fmt.Printf("  %d. ID: %d, Distance: %.6f\n", i+1, result.Key, result.Distance)
    }
}

// generateRandomVector creates a random unit-length vector
func generateRandomVector(dimensions int) []float32 {
    vector := make([]float32, dimensions)
    for i := range vector {
        vector[i] = rand.Float32()
    }

    // Normalize by the Euclidean norm: divide by the square root of
    // the sum of squares, not by the sum of squares itself
    var sumSquares float64
    for _, v := range vector {
        sumSquares += float64(v) * float64(v)
    }
    norm := float32(1.0 / math.Sqrt(sumSquares))
    for i := range vector {
        vector[i] *= norm
    }

    return vector
}

Adaptive Example

Here's an example using the Adaptive Hybrid Index (it reuses the generateRandomVector helper from the previous example):

package main

import (
    "fmt"
    "math/rand"

    "github.com/TFMV/hnsw"
    "github.com/TFMV/hnsw/hnsw-extensions/hybrid"
)

func main() {
    // Create a new adaptive hybrid index
    config := hybrid.DefaultAdaptiveConfig()
    config.LearningRate = 0.1
    config.AdaptationInterval = 100
    config.Distance = hnsw.CosineDistance

    index, err := hybrid.NewAdaptiveHybridIndex[int](config)
    if err != nil {
        panic(err)
    }
    defer index.Close()

    // Add vectors
    dimensions := 128
    numVectors := 10000

    for i := 0; i < numVectors; i++ {
        err := index.Add(i, generateRandomVector(dimensions))
        if err != nil {
            panic(err)
        }
    }

    // Perform searches to train the adaptive index
    for i := 0; i < 1000; i++ {
        query := generateRandomVector(dimensions)
        _, err := index.Search(query, 10)
        if err != nil {
            panic(err)
        }
    }

    // The index has now adapted to the query patterns
    fmt.Println("Adaptive index has been trained on 1000 queries")

    // Perform a final search
    query := generateRandomVector(dimensions)
    results, err := index.Search(query, 10)
    if err != nil {
        panic(err)
    }

    fmt.Println("Search results:")
    for i, result := range results {
        fmt.Printf("  %d. ID: %d, Distance: %.6f\n", i+1, result.Key, result.Distance)
    }

    // Get statistics about strategy selection
    stats := index.GetStats()
    fmt.Printf("Strategy usage: HNSW: %d%%, Exact: %d%%, LSH: %d%%, Partitioned: %d%%\n",
        stats.StrategyUsage["hnsw"], 
        stats.StrategyUsage["exact"],
        stats.StrategyUsage["lsh"],
        stats.StrategyUsage["partitioned"])
}