Skip to content

HNSW Algorithm

Quiver's blazing-fast search capabilities are powered by the Hierarchical Navigable Small World (HNSW) algorithm. Let's dive into how this algorithm works and why it's so effective for vector similarity search! 🔍

What is HNSW?

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. It was introduced in 2016 by Yury Malkov and Dmitry Yashunin in their paper "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs".

The algorithm creates a multi-layered graph structure that allows for extremely fast navigation through the vector space, achieving logarithmic search complexity.

How HNSW Works

The Basic Idea

HNSW builds a multi-layered graph where:

  1. Each vector is represented as a node in the graph
  2. Nodes are connected to their "neighbors" (similar vectors)
  3. The graph has multiple layers, with fewer nodes in higher layers
  4. Higher layers act as "highways" for faster navigation

This hierarchical structure allows the algorithm to quickly narrow down the search space, making it incredibly efficient even with millions of vectors.

Graph Construction

When you add a vector to Quiver, the HNSW algorithm:

  1. Randomly assigns the vector to a maximum layer level
  2. Inserts the vector into all layers from 0 up to its maximum level
  3. For each layer, connects the vector to its nearest neighbors
  4. Maintains a maximum number of connections per node (the M parameter)

The construction process ensures that the graph has good navigational properties, allowing for efficient search.

// Simplified pseudocode for HNSW insertion
func (hnsw *Graph) Insert(vector []float32, id uint64) {
    // Randomly select maximum level for this vector
    maxLevel := randomLevel()

    // Start from the top layer of the existing graph
    currentNode := hnsw.entryPoint

    // For each layer from top to the vector's max level
    for level := hnsw.maxLevel; level > maxLevel; level-- {
        // Find the closest node to the vector at this level
        currentNode = findClosestNeighbor(vector, currentNode, level)
    }

    // For each layer from the vector's max level to bottom
    for level := maxLevel; level >= 0; level-- {
        // Find nearest neighbors at this level
        neighbors := findNearestNeighbors(vector, currentNode, level, hnsw.M)

        // Connect the vector to its neighbors
        hnsw.connect(id, neighbors, level)

        // Update the entry point if needed
        if level == hnsw.maxLevel {
            hnsw.entryPoint = id
        }

        // Use these neighbors as starting points for the next layer
        currentNode = findClosestNeighbor(vector, neighbors, level)
    }
}

Search Process

When you search for similar vectors, the HNSW algorithm:

  1. Starts at the entry point in the top layer
  2. Greedily moves to the nearest neighbor of the query vector
  3. When no closer neighbors are found, descends to the next layer
  4. Repeats until reaching the bottom layer
  5. Performs a more thorough search at the bottom layer

This approach allows for logarithmic search complexity, making it extremely fast even with large datasets.

// Simplified pseudocode for HNSW search
func (hnsw *Graph) Search(query []float32, k int) []SearchResult {
    // Start from the top layer of the graph
    currentNode := hnsw.entryPoint

    // For each layer from top to bottom
    for level := hnsw.maxLevel; level > 0; level-- {
        // Greedy search at this layer
        currentNode = greedySearch(query, currentNode, level)
    }

    // More thorough search at the bottom layer
    candidates := beamSearch(query, currentNode, 0, hnsw.efSearch)

    // Return the k closest candidates
    return getTopK(candidates, k)
}

Key Parameters

Quiver exposes several HNSW parameters that you can tune:

M (HNSWM)

The M parameter controls the maximum number of connections per node:

config := quiver.Config{
    // ... other settings ...
    HNSWM: 16, // Default is 16
}

Higher values of M:

  • Improve search accuracy
  • Increase memory usage
  • Slow down construction time

Lower values of M:

  • Reduce memory usage
  • Speed up construction
  • May reduce search accuracy

Recommended Values

Values between 12-64 work well for most applications. Start with 16 and adjust if needed.

efConstruction (HNSWEfConstruct)

The efConstruction parameter controls the quality of graph construction:

config := quiver.Config{
    // ... other settings ...
    HNSWEfConstruct: 200, // Default is 200
}

Higher values of efConstruction:

  • Create a better quality graph
  • Improve search accuracy
  • Slow down construction time

Lower values of efConstruction:

  • Speed up construction
  • May reduce search accuracy

Recommended Values

Values between 100-500 work well for most applications. Use higher values for better accuracy.

efSearch (HNSWEfSearch)

The efSearch parameter controls the quality of search:

config := quiver.Config{
    // ... other settings ...
    HNSWEfSearch: 100, // Default is 100
}

Higher values of efSearch:

  • Improve search accuracy
  • Slow down search speed

Lower values of efSearch:

  • Speed up search
  • May reduce search accuracy

Recommended Values

Values between 50-200 provide a good balance. If you need more precision, increase this value.

Performance Characteristics

HNSW offers several key performance advantages:

Search Complexity

  • Time complexity: O(log N), where N is the number of vectors
  • This is much better than the O(N) complexity of brute force search

Memory Usage

  • Memory usage: O(M * N), where M is the HNSWM parameter
  • Each vector requires storage for its connections

Construction Time

  • Construction time: O(M log N N)
  • Construction is slower than search, but still efficient

Quiver's Implementation

Quiver's implementation of HNSW includes several optimizations:

SIMD Acceleration

Quiver uses SIMD (Single Instruction, Multiple Data) instructions to accelerate distance calculations, making both construction and search faster.

Batch Processing

Quiver batches vector additions to improve throughput:

config := quiver.Config{
    // ... other settings ...
    BatchSize: 1000, // Default is 1000
}

Concurrent Access

Quiver's implementation is thread-safe, allowing for concurrent reads and writes.

Comparison with Other Algorithms

HNSW outperforms many other approximate nearest neighbor search algorithms:

Algorithm Search Time Build Time Memory Usage Accuracy
HNSW Very Fast Moderate Moderate High
Annoy Fast Fast Low Moderate
FAISS Fast Fast Moderate High
Brute Force Very Slow None Low Perfect

HNSW provides an excellent balance of speed, memory usage, and accuracy, making it ideal for most vector search applications.

Visualizing HNSW

To help understand how HNSW works, here's a simplified visualization of the multi-layered graph structure:

HNSW

In this visualization:

  • The top layer (purple) has fewer nodes and acts as a "highway"
  • The middle layer (orange) provides intermediate routing
  • The bottom layer (teal) contains all nodes with more connections
  • Search starts at the top layer and descends through the layers

Further Reading

If you're interested in learning more about HNSW, check out these resources:

Next Steps

Now that you understand how HNSW powers Quiver's vector search, check out: