Graph Structure¶

This page provides a comprehensive overview of the HNSW graph structure and architecture.

HNSW Architecture Overview¶

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. Its architecture consists of multiple layers of graphs, where each layer is a navigable small world graph.

The key components of the HNSW architecture include:

Hierarchical Structure: Multiple layers of graphs with decreasing density
Small World Graphs: Each layer is a navigable small world graph
Greedy Search: Efficient search algorithm that traverses the layers

This multi-layered approach allows HNSW to achieve logarithmic search complexity, making it one of the fastest algorithms for approximate nearest neighbor search.

Hierarchical Structure¶

The hierarchical structure of HNSW consists of multiple layers (L0, L1, L2, ..., Lmax), where:

L0 is the bottom layer containing all nodes
Each subsequent layer contains a subset of nodes from the layer below
The top layer (Lmax) contains very few nodes

The probability of a node being promoted to a higher layer follows a geometric distribution, resulting in an exponential decrease in the number of nodes as we move up the hierarchy:

Layer L0: 100% of nodes
Layer L1: ~1/M nodes from L0
Layer L2: ~1/M nodes from L1
And so on...

Where M is a parameter that controls the maximum number of connections per node.

This hierarchical structure enables efficient search by allowing the algorithm to quickly navigate to the approximate region of the query point in the higher layers, then refine the search in the lower layers.

Small World Graphs¶

Each layer in HNSW is a navigable small world graph, which has the following properties:

Short Path Length: The average path length between any two nodes is logarithmic in the number of nodes
High Clustering: Nodes tend to form clusters with their nearest neighbors
Long-Range Connections: Some connections span large distances, enabling efficient navigation

These properties allow for efficient navigation through the graph, as the algorithm can make large jumps using long-range connections and then refine the search using local connections.

Graph Construction¶

When adding a new vector to the HNSW graph, the following steps are performed:

Layer Assignment: Randomly assign the new node to layers based on a geometric distribution
Entry Point Selection: Start from the entry point at the top layer
Greedy Search: Find the closest node to the new vector at each layer
Connection Establishment: Connect the new node to its nearest neighbors at each assigned layer
Connection Optimization: Ensure that connections satisfy the small world properties

The graph construction process ensures that each node is connected to its nearest neighbors while maintaining the navigable small world properties.

Search Algorithm¶

The search algorithm in HNSW follows these steps:

Start at the Entry Point: Begin the search from the entry point at the top layer
Greedy Search at Each Layer: At each layer, perform a greedy search to find the closest node to the query
Layer Transition: Use the closest node found as the entry point for the next layer
Final Search: At the bottom layer, perform a more thorough search to find the k nearest neighbors

This approach allows HNSW to achieve logarithmic time complexity for search operations, making it highly efficient for large datasets.

Implementation Details¶

Data Structures¶

The HNSW graph is implemented using the following data structures:

type Graph[T comparable] struct {
    nodes       map[T]*Node[T]
    vectors     map[T][]float32
    entryPoint  T
    maxLayer    int
    M           int  // Maximum number of connections per node
    efConstruction int  // Size of the dynamic candidate list during construction
    Distance    DistanceFunc
    // ... other fields
}

type Node[T comparable] struct {
    ID        T
    Neighbors map[int][]T  // Map from layer to neighbor IDs
    // ... other fields
}

type DistanceFunc func(a, b []float32) float32

The graph stores nodes and their vectors separately, with connections represented as maps from layer to neighbor IDs.

Memory Layout¶

The memory layout of the HNSW graph is optimized for efficient access and cache locality:

Nodes: Stored in a map for O(1) access by ID
Vectors: Stored in a separate map to allow for different storage strategies
Connections: Stored as adjacency lists for each node at each layer

This layout allows for efficient traversal of the graph during search operations.

Performance Characteristics¶

The HNSW algorithm has the following performance characteristics:

Construction Time: O(n log n) for n vectors
Search Time: O(log n) for finding the nearest neighbor
Memory Usage: O(n M), where M is the maximum number of connections per node

These characteristics make HNSW suitable for large-scale nearest neighbor search applications.

Concurrency and Thread Safety¶

The HNSW implementation supports concurrent operations with the following characteristics:

Read Operations: Multiple search operations can be performed concurrently
Write Operations: Adding nodes requires exclusive access to the graph
Mixed Operations: Read operations can proceed concurrently with write operations with proper synchronization

This allows for efficient utilization of multi-core processors in high-throughput applications.

Next Steps¶

Now that you understand the graph structure of HNSW, you can explore:

Distance Functions: Learn about the different distance metrics available in HNSW
Search Algorithms: Dive deeper into the search algorithms used in HNSW
Performance Tuning: Learn how to optimize HNSW for your specific use case