Skip to content

Graph Structure

This page provides a comprehensive overview of the HNSW graph structure and architecture.

HNSW Architecture Overview

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. Its architecture consists of multiple layers of graphs, where each layer is a navigable small world graph.

The key components of the HNSW architecture include:

  1. Hierarchical Structure: Multiple layers of graphs with decreasing density
  2. Small World Graphs: Each layer is a navigable small world graph
  3. Greedy Search: Efficient search algorithm that traverses the layers

This multi-layered approach allows HNSW to achieve logarithmic search complexity, making it one of the fastest algorithms for approximate nearest neighbor search.

Hierarchical Structure

The hierarchical structure of HNSW consists of multiple layers (L0, L1, L2, ..., Lmax), where:

  • L0 is the bottom layer containing all nodes
  • Each subsequent layer contains a subset of nodes from the layer below
  • The top layer (Lmax) contains very few nodes

The probability of a node being promoted to a higher layer follows a geometric distribution, resulting in an exponential decrease in the number of nodes as we move up the hierarchy:

  • Layer L0: 100% of nodes
  • Layer L1: ~1/M nodes from L0
  • Layer L2: ~1/M nodes from L1
  • And so on...

Where M is a parameter that controls the maximum number of connections per node.

This hierarchical structure enables efficient search by allowing the algorithm to quickly navigate to the approximate region of the query point in the higher layers, then refine the search in the lower layers.

Small World Graphs

Each layer in HNSW is a navigable small world graph, which has the following properties:

  1. Short Path Length: The average path length between any two nodes is logarithmic in the number of nodes
  2. High Clustering: Nodes tend to form clusters with their nearest neighbors
  3. Long-Range Connections: Some connections span large distances, enabling efficient navigation

These properties allow for efficient navigation through the graph, as the algorithm can make large jumps using long-range connections and then refine the search using local connections.

Graph Construction

When adding a new vector to the HNSW graph, the following steps are performed:

  1. Layer Assignment: Randomly assign the new node to layers based on a geometric distribution
  2. Entry Point Selection: Start from the entry point at the top layer
  3. Greedy Search: Find the closest node to the new vector at each layer
  4. Connection Establishment: Connect the new node to its nearest neighbors at each assigned layer
  5. Connection Optimization: Ensure that connections satisfy the small world properties

The graph construction process ensures that each node is connected to its nearest neighbors while maintaining the navigable small world properties.

Search Algorithm

The search algorithm in HNSW follows these steps:

  1. Start at the Entry Point: Begin the search from the entry point at the top layer
  2. Greedy Search at Each Layer: At each layer, perform a greedy search to find the closest node to the query
  3. Layer Transition: Use the closest node found as the entry point for the next layer
  4. Final Search: At the bottom layer, perform a more thorough search to find the k nearest neighbors

This approach allows HNSW to achieve logarithmic time complexity for search operations, making it highly efficient for large datasets.

Implementation Details

Data Structures

The HNSW graph is implemented using the following data structures:

type Graph[T comparable] struct {
    nodes       map[T]*Node[T]
    vectors     map[T][]float32
    entryPoint  T
    maxLayer    int
    M           int  // Maximum number of connections per node
    efConstruction int  // Size of the dynamic candidate list during construction
    Distance    DistanceFunc
    // ... other fields
}

type Node[T comparable] struct {
    ID        T
    Neighbors map[int][]T  // Map from layer to neighbor IDs
    // ... other fields
}

type DistanceFunc func(a, b []float32) float32

The graph stores nodes and their vectors separately, with connections represented as maps from layer to neighbor IDs.

Memory Layout

The memory layout of the HNSW graph is optimized for efficient access and cache locality:

  1. Nodes: Stored in a map for O(1) access by ID
  2. Vectors: Stored in a separate map to allow for different storage strategies
  3. Connections: Stored as adjacency lists for each node at each layer

This layout allows for efficient traversal of the graph during search operations.

Performance Characteristics

The HNSW algorithm has the following performance characteristics:

  • Construction Time: O(n log n) for n vectors
  • Search Time: O(log n) for finding the nearest neighbor
  • Memory Usage: O(n M), where M is the maximum number of connections per node

These characteristics make HNSW suitable for large-scale nearest neighbor search applications.

Concurrency and Thread Safety

The HNSW implementation supports concurrent operations with the following characteristics:

  • Read Operations: Multiple search operations can be performed concurrently
  • Write Operations: Adding nodes requires exclusive access to the graph
  • Mixed Operations: Read operations can proceed concurrently with write operations with proper synchronization

This allows for efficient utilization of multi-core processors in high-throughput applications.

Next Steps

Now that you understand the graph structure of HNSW, you can explore: