Search Algorithms

This page explains the search algorithms used in HNSW, how they work, and how to optimize them for your specific use case.

Overview

HNSW (Hierarchical Navigable Small World) uses a multi-layered graph structure to enable efficient approximate nearest neighbor search. The search algorithm is the key component that lets HNSW achieve approximately logarithmic search time in practice.

The core search algorithm in HNSW is a greedy search with a beam-like approach that:

  1. Starts at the top layer of the graph (with fewest nodes)
  2. Performs a greedy search to find the closest node to the query
  3. Uses this node as an entry point to the next layer
  4. Repeats the process until reaching the bottom layer
  5. Returns the closest nodes found in the bottom layer

Basic Search Algorithm

The basic search algorithm in HNSW is implemented as follows:

func (g *Graph) Search(query Vector, k int) []Node {
    // Start from the entry point at the top layer
    currentNode := g.entryPoint
    currentDistance := g.Distance(query, currentNode.Vector)

    // Traverse the layers from top to bottom
    for layer := g.maxLayer; layer > 0; layer-- {
        // Greedy search in the current layer
        changed := true
        for changed {
            changed = false

            // Check all neighbors of the current node at this layer
            for _, neighbor := range currentNode.Neighbors[layer] {
                neighborDistance := g.Distance(query, neighbor.Vector)

                // If we found a closer node, move to it
                if neighborDistance < currentDistance {
                    currentNode = neighbor
                    currentDistance = neighborDistance
                    changed = true
                    break
                }
            }
        }
    }

    // Perform the final beam search in the bottom layer (layer 0).
    // candidates is a min-queue of nodes still to explore; results is a
    // bounded max-queue that keeps the k closest nodes found so far.
    // (Using a single queue for both roles would not work: popping a
    // node to explore it would also remove it from the results.)
    candidates := NewMinQueue()
    candidates.Push(currentNode, currentDistance)

    results := NewMaxQueue(k)
    results.Push(currentNode, currentDistance)

    visited := make(map[NodeID]bool)
    visited[currentNode.ID] = true

    for candidates.Len() > 0 {
        // Get the closest unexplored node
        current, currentDist := candidates.Pop()

        // Stop once the closest remaining candidate is farther away than
        // the furthest node we are keeping as a result
        if results.Len() >= k && currentDist > results.MaxDistance() {
            break
        }

        // Check all neighbors of the current node
        for _, neighbor := range current.Neighbors[0] {
            if visited[neighbor.ID] {
                continue
            }

            visited[neighbor.ID] = true
            neighborDistance := g.Distance(query, neighbor.Vector)

            // Keep the neighbor if we have fewer than k results so far,
            // or if it is closer than the furthest result
            if results.Len() < k || neighborDistance < results.MaxDistance() {
                candidates.Push(neighbor, neighborDistance)
                results.Push(neighbor, neighborDistance) // evicts the furthest when full
            }
        }
    }

    // Return the k closest nodes
    return results.Items()
}

HNSW uses a beam search approach in the bottom layer to improve the quality of search results. Instead of a simple greedy search, it maintains a priority queue of the best candidates found so far and explores their neighborhoods.

The beam search algorithm:

  1. Initializes a priority queue with the entry point
  2. Repeatedly extracts the closest node from the queue
  3. Explores its neighbors and adds promising ones to the queue
  4. Continues until the queue is empty or a stopping condition is met

This approach helps avoid getting stuck in local minima and improves the recall of the search.
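Stripped of the HNSW layer machinery, this bottom-layer beam search can be sketched on a plain adjacency-list graph. Everything below (the `item` type, the toy graph, the precomputed distance map standing in for distances to a query vector) is illustrative, not part of any HNSW API:

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// item pairs a node ID with its distance to the query.
type item struct {
	id   int
	dist float64
}

// minHeap is a min-priority queue ordered by distance.
type minHeap []item

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].dist < h[j].dist }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// beamSearch explores the graph from entry, keeping at most ef results.
// neighbors is a plain adjacency list; dist maps each node to its
// distance from the (implicit) query vector.
func beamSearch(neighbors map[int][]int, dist map[int]float64, entry, ef int) []int {
	candidates := &minHeap{{entry, dist[entry]}}
	visited := map[int]bool{entry: true}
	results := []item{{entry, dist[entry]}} // kept sorted, closest first

	for candidates.Len() > 0 {
		cur := heap.Pop(candidates).(item)
		// Stop once the closest candidate is worse than the worst result.
		if len(results) >= ef && cur.dist > results[len(results)-1].dist {
			break
		}
		for _, n := range neighbors[cur.id] {
			if visited[n] {
				continue
			}
			visited[n] = true
			if d := dist[n]; len(results) < ef || d < results[len(results)-1].dist {
				heap.Push(candidates, item{n, d})
				results = append(results, item{n, d})
				sort.Slice(results, func(i, j int) bool { return results[i].dist < results[j].dist })
				if len(results) > ef {
					results = results[:ef] // evict the furthest
				}
			}
		}
	}
	ids := make([]int, len(results))
	for i, r := range results {
		ids[i] = r.id
	}
	return ids
}

func main() {
	neighbors := map[int][]int{0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2}, 4: {0}}
	dist := map[int]float64{0: 5, 1: 3, 2: 1, 3: 0.5, 4: 4}
	fmt.Println(beamSearch(neighbors, dist, 0, 2)) // [3 2]
}
```

With ef = 2 and entry node 0, the search walks past the entry region and returns the two globally closest nodes, 3 and 2 — a purely greedy walk starting at node 0 would have reached them too, but the queue is what lets the search keep alternatives alive when a branch dead-ends.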

Search with ef Parameter

The ef parameter controls the size of the dynamic candidate list during search. A larger ef value results in more accurate but slower searches:

func (g *Graph) SearchWithEF(query Vector, k int, ef int) []Node {
    // Similar to the basic search, but with a larger candidate pool
    // controlled by the ef parameter

    // ... (code for traversing layers)

    // Use a result queue with capacity ef instead of k, so the search
    // keeps a wider pool of candidates while exploring layer 0
    candidates := NewPriorityQueue(ef)
    // ... (rest of the search algorithm)

    // Return only the k closest of the ef candidates found
}

The ef parameter allows you to trade off between search speed and accuracy:

  • Small ef (e.g., 10-50): Faster searches but potentially lower recall
  • Large ef (e.g., 100-1000): Slower searches but higher recall
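To choose ef empirically, you need a way to measure recall: the fraction of the true k nearest neighbors (from a brute-force scan) that the approximate search returned. A small helper, assuming search results are represented as slices of IDs (the names and sample IDs are illustrative):

```go
package main

import "fmt"

// recall returns the fraction of the exact nearest neighbors that the
// approximate search also returned.
func recall(approx, exact []int) float64 {
	if len(exact) == 0 {
		return 1
	}
	found := make(map[int]bool, len(approx))
	for _, id := range approx {
		found[id] = true
	}
	hits := 0
	for _, id := range exact {
		if found[id] {
			hits++
		}
	}
	return float64(hits) / float64(len(exact))
}

func main() {
	approx := []int{1, 2, 4, 7, 9} // e.g. IDs from an approximate search
	exact := []int{1, 2, 3, 7, 9}  // IDs from a brute-force scan
	fmt.Printf("recall@5 = %.2f\n", recall(approx, exact)) // recall@5 = 0.80
}
```

Sweeping ef over a sample of queries and plotting recall against latency is a practical way to pick a default for your workload.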

Concurrent Search

HNSW supports concurrent searches, which can significantly improve throughput when processing multiple queries:

func (g *Graph) ConcurrentSearch(queries []Vector, k int) [][]Node {
    results := make([][]Node, len(queries))

    var wg sync.WaitGroup
    for i, query := range queries {
        wg.Add(1)
        go func(i int, query Vector) {
            defer wg.Done()
            results[i] = g.Search(query, k)
        }(i, query)
    }

    wg.Wait()
    return results
}

As long as no inserts or deletes run at the same time, the HNSW graph structure is read-only during search operations, which allows safe concurrent access without locks.

Search with Filters

HNSW can be extended to support filtered search, where results must satisfy certain criteria:

func (g *FilteredGraph) SearchWithFilter(query Vector, k int, filter Filter) []Node {
    // Similar to the basic search, but with an additional filter step

    // ... (code for traversing layers)

    for candidates.Len() > 0 {
        current, _ := candidates.Pop()

        // Only consider neighbors that satisfy the filter
        for _, neighbor := range current.Neighbors[0] {
            if visited[neighbor.ID] || !filter(neighbor) {
                continue
            }

            // ... (rest of the search algorithm)
        }
    }
}

Filtered search may require exploring more nodes to find k results that satisfy the filter, potentially increasing search time.
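An alternative to filtering during graph traversal is over-fetching: ask the unfiltered search for more than k results, then keep the first k that pass the predicate. A sketch of the post-filtering step, with an illustrative `Node` payload and the assumption that results arrive sorted by distance:

```go
package main

import "fmt"

// Node is a stand-in for a search result; Price is an example payload
// the filter inspects.
type Node struct {
	ID    int
	Price float64
}

// postFilter keeps the first k nodes (assumed sorted by distance)
// that satisfy the predicate.
func postFilter(results []Node, keep func(Node) bool, k int) []Node {
	out := make([]Node, 0, k)
	for _, n := range results {
		if keep(n) {
			out = append(out, n)
			if len(out) == k {
				break
			}
		}
	}
	return out
}

func main() {
	// Pretend these came from an unfiltered search with an enlarged k,
	// leaving headroom for nodes the filter will reject.
	results := []Node{{1, 5}, {2, 50}, {3, 8}, {4, 70}, {5, 9}}
	cheap := func(n Node) bool { return n.Price < 10 }
	for _, n := range postFilter(results, cheap, 2) {
		fmt.Println(n.ID) // prints 1, then 3
	}
}
```

Post-filtering is simple and keeps the graph code untouched, but it can return fewer than k results when the filter is very selective; in-traversal filtering, as above, avoids that at the cost of exploring more nodes.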

Search with Negative Examples

HNSW can be extended to support search with negative examples, where the goal is to find vectors similar to positive examples but dissimilar to negative examples:

func (g *Graph) SearchWithNegatives(positiveQuery, negativeQuery Vector, k int) []Node {
    // ... (code for traversing layers)

    for candidates.Len() > 0 {
        current, _ := candidates.Pop()

        // Calculate combined distance: distance to positive - distance to negative
        for _, neighbor := range current.Neighbors[0] {
            if visited[neighbor.ID] {
                continue
            }

            visited[neighbor.ID] = true
            positiveDistance := g.Distance(positiveQuery, neighbor.Vector)
            negativeDistance := g.Distance(negativeQuery, neighbor.Vector)
            combinedDistance := positiveDistance - negativeDistance

            // ... (rest of the search algorithm using combinedDistance)
        }
    }
}

This approach can be useful for recommendation systems and other applications where you want to find items similar to some examples but dissimilar to others.
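The combined score can be sanity-checked outside the graph with a brute-force ranking. The sketch below orders toy 2-D vectors by `dist(positive) - dist(negative)` using squared Euclidean distance; all names here are illustrative, not the library API:

```go
package main

import (
	"fmt"
	"sort"
)

// sqDist returns the squared Euclidean distance between two vectors.
func sqDist(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// rankWithNegative orders item indices by distance-to-positive minus
// distance-to-negative: lower means "near the positive, far from the
// negative".
func rankWithNegative(items [][]float64, pos, neg []float64) []int {
	idx := make([]int, len(items))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool {
		sa := sqDist(items[idx[a]], pos) - sqDist(items[idx[a]], neg)
		sb := sqDist(items[idx[b]], pos) - sqDist(items[idx[b]], neg)
		return sa < sb
	})
	return idx
}

func main() {
	items := [][]float64{
		{0, 0}, // coincides with the positive example
		{1, 1}, // halfway between both
		{2, 2}, // coincides with the negative example
	}
	pos := []float64{0, 0}
	neg := []float64{2, 2}
	fmt.Println(rankWithNegative(items, pos, neg)) // [0 1 2]
}
```

In practice you may want a weighted combination, e.g. `positiveDistance - alpha*negativeDistance`, since an unweighted difference can let a strong negative match dominate the ranking.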

Performance Tuning

Optimizing ef Parameter

The ef parameter has a significant impact on search performance:

// For high recall (>95%)
results := graph.SearchWithEF(query, 10, 200)

// For balanced performance
results := graph.SearchWithEF(query, 10, 50)

// For high speed
results := graph.SearchWithEF(query, 10, 20)

You can dynamically adjust the ef parameter based on your requirements:

func adaptiveSearch(query Vector, k int, targetRecall float64) []Node {
    // Start with a small ef
    ef := k * 2

    // Increase ef until we reach the target recall
    for {
        results := graph.SearchWithEF(query, k, ef)
        recall := estimateRecall(results) // requires an exact baseline (e.g. brute force) to compare against

        if recall >= targetRecall {
            return results
        }

        ef *= 2
        if ef > 1000 {
            // Cap ef to avoid excessive computation
            return graph.SearchWithEF(query, k, 1000)
        }
    }
}

Batch Search

For processing multiple queries, batch search can be more efficient than individual searches:

func (g *Graph) BatchSearch(queries []Vector, k int) [][]Node {
    results := make([][]Node, len(queries))

    // Process queries in batches to utilize CPU caches better
    batchSize := 16
    for i := 0; i < len(queries); i += batchSize {
        end := i + batchSize
        if end > len(queries) {
            end = len(queries)
        }

        // Process this batch concurrently
        var wg sync.WaitGroup
        for j := i; j < end; j++ {
            wg.Add(1)
            go func(j int) {
                defer wg.Done()
                results[j] = g.Search(queries[j], k)
            }(j)
        }

        wg.Wait()
    }

    return results
}

Batch processing can improve cache locality and overall throughput.

Advanced Search Techniques

HNSW can be combined with exact search methods for a hybrid approach:

func hybridSearch(query Vector, k int, exactThreshold int) []Node {
    // Perform approximate search with HNSW
    approximateResults := graph.Search(query, k)

    // Fall back to exact (brute-force) search when the approximate pass
    // returns too few results for the required precision
    if len(approximateResults) < exactThreshold {
        exactResults := exactSearch(query, k)
        return exactResults
    }

    return approximateResults
}

This approach can provide a good balance between speed and accuracy.
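The `exactSearch` fallback referenced above is just a linear scan over all stored vectors. A minimal sketch, assuming the vectors live in a slice and using squared Euclidean distance (the types and names are illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

type Vector []float64

// sqDist returns the squared Euclidean distance between two vectors.
func sqDist(a, b Vector) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// exactSearch scans every stored vector and returns the indices of the
// k closest to the query: O(n*d) per query, but with perfect recall.
func exactSearch(data []Vector, query Vector, k int) []int {
	idx := make([]int, len(data))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool {
		return sqDist(data[idx[a]], query) < sqDist(data[idx[b]], query)
	})
	if k > len(idx) {
		k = len(idx)
	}
	return idx[:k]
}

func main() {
	data := []Vector{{0, 0}, {3, 3}, {1, 1}, {5, 5}}
	fmt.Println(exactSearch(data, Vector{0.9, 0.9}, 2)) // [2 0]
}
```

Because the scan touches every vector, it is only practical for small collections or as an occasional fallback; that is precisely the trade-off the hybrid approach exploits.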

For interactive applications, progressive search can provide quick initial results that improve over time:

func progressiveSearch(query Vector, k int, callback func([]Node)) {
    // Start with a small ef for quick initial results
    initialResults := graph.SearchWithEF(query, k, 10)
    callback(initialResults)

    // Improve results with larger ef values
    for ef := 20; ef <= 100; ef *= 2 {
        improvedResults := graph.SearchWithEF(query, k, ef)
        callback(improvedResults)
    }
}

This approach is useful for user interfaces where showing some results quickly is more important than waiting for the most accurate results.

Next Steps

Now that you understand the search algorithms in HNSW, you can explore: