Skip to content

Graph Analyzer

The Graph Analyzer provides tools for analyzing and evaluating the quality of HNSW graphs. This functionality is particularly useful for:

  • Diagnosing performance issues
  • Optimizing graph parameters
  • Understanding the internal structure of your HNSW index
  • Validating graph construction

Overview

The Analyzer offers methods to inspect various aspects of the graph structure, including connectivity patterns, layer distribution, and quality metrics. These insights can help you fine-tune your HNSW configuration for optimal performance.

Basic Usage

import (
    "fmt"

    "github.com/TFMV/hnsw"
)

func main() {
    // Create and populate a graph
    graph := hnsw.NewGraph[int]()

    // Add vectors to the graph
    // ...

    // Create an analyzer
    analyzer := &hnsw.Analyzer[int]{Graph: graph}

    // Get quality metrics
    metrics := analyzer.QualityMetrics()

    fmt.Printf("Graph height: %d\n", metrics.GraphHeight)
    fmt.Printf("Node count: %d\n", metrics.NodeCount)
    fmt.Printf("Average connectivity: %.2f\n", metrics.AvgConnectivity)
    fmt.Printf("Connectivity standard deviation: %.2f\n", metrics.ConnectivityStdDev)
    fmt.Printf("Distortion ratio: %.2f\n", metrics.DistortionRatio)
    fmt.Printf("Layer balance: %.2f\n", metrics.LayerBalance)
}

Available Methods

Height

func (a *Analyzer[T]) Height() int

Returns the number of layers in the graph. A well-constructed HNSW graph typically has a logarithmic number of layers relative to the number of nodes.

Example:

height := analyzer.Height()
fmt.Printf("Graph has %d layers\n", height)

Connectivity

func (a *Analyzer[T]) Connectivity() []float64

Returns the average number of connections per node for each non-empty layer. This helps you understand how well-connected the graph is at different levels.

Example:

connectivity := analyzer.Connectivity()
for i, avgConnections := range connectivity {
    fmt.Printf("Layer %d: %.2f average connections per node\n", i, avgConnections)
}

Topography

func (a *Analyzer[T]) Topography() []int

Returns the number of nodes in each layer of the graph. This helps you visualize the pyramid structure of the HNSW graph.

Example:

topography := analyzer.Topography()
for i, nodeCount := range topography {
    fmt.Printf("Layer %d: %d nodes\n", i, nodeCount)
}

QualityMetrics

func (a *Analyzer[T]) QualityMetrics() GraphQualityMetrics

Calculates and returns a comprehensive set of quality metrics for the graph. This is the most useful method for evaluating the overall health of your HNSW index.

Quality Metrics

The GraphQualityMetrics struct contains the following fields:

NodeCount

The total number of nodes in the graph. This should match the number of vectors you've added to the graph.

AvgConnectivity

The average number of connections per node in the base layer (layer 0). This value should be close to the M parameter you configured for the graph. Higher connectivity generally leads to better search accuracy but increases memory usage.

ConnectivityStdDev

The standard deviation of connections per node. A lower standard deviation indicates more uniform connectivity across the graph, which is generally desirable.

DistortionRatio

Measures how well the graph preserves distances between nodes. It compares the graph distance (number of hops) to the actual distance between vectors. Lower values indicate better distance preservation, which correlates with better search accuracy.

LayerBalance

Measures how well balanced the layers are compared to the theoretical ideal based on the level generation factor (Ml). Values closer to 1.0 indicate better balance, which leads to more efficient searches.

GraphHeight

The number of layers in the graph. This should scale logarithmically with the number of nodes for optimal performance.

Interpreting Results

Ideal Values

For a well-constructed HNSW graph:

  • AvgConnectivity: Should be close to the configured M value
  • ConnectivityStdDev: Should be low (typically < 2.0)
  • DistortionRatio: Lower is better (typically < 5.0)
  • LayerBalance: Should be close to 1.0 (typically > 0.8)
  • GraphHeight: Should be logarithmic in the number of nodes

Common Issues

High Distortion Ratio

A high distortion ratio (> 5.0) indicates that the graph is not preserving distances well. This can lead to lower search accuracy. Possible solutions:

  • Increase the M parameter to allow more connections per node
  • Increase the efConstruction parameter to improve graph quality during construction
  • Check if your distance function is appropriate for your data

Poor Layer Balance

A low layer balance (< 0.7) indicates that the layers are not distributed optimally. This can lead to inefficient searches. Possible solutions:

  • Adjust the Ml parameter to better match your data distribution
  • Rebuild the graph with more appropriate parameters

High Connectivity Standard Deviation

A high standard deviation (> 3.0) indicates uneven connectivity across the graph. This can lead to inconsistent search performance. Possible solutions:

  • Increase the efConstruction parameter
  • Check for outliers in your data
  • Consider normalizing your vectors

Advanced Usage

Monitoring Graph Quality Over Time

You can track graph quality metrics over time to detect degradation:

func monitorGraphQuality(graph *hnsw.Graph[int]) {
    analyzer := &hnsw.Analyzer[int]{Graph: graph}

    // Get initial metrics
    initialMetrics := analyzer.QualityMetrics()

    // Periodically check metrics
    ticker := time.NewTicker(1 * time.Hour)
    for range ticker.C {
        currentMetrics := analyzer.QualityMetrics()

        // Check for significant degradation
        if currentMetrics.DistortionRatio > initialMetrics.DistortionRatio*1.5 {
            fmt.Println("Warning: Graph quality has degraded significantly")
            // Consider rebuilding the graph
        }
    }
}

Comparing Different Graph Configurations

You can use the analyzer to compare different graph configurations:

func compareConfigurations(vectors [][]float32) {
    configs := []struct {
        m              int
        efConstruction int
        description    string
    }{
        {16, 200, "Default"},
        {32, 200, "Higher M"},
        {16, 400, "Higher efConstruction"},
        {32, 400, "Higher M and efConstruction"},
    }

    for _, config := range configs {
        graph := hnsw.NewGraph[int](
            hnsw.WithM(config.m),
            hnsw.WithEfConstruction(config.efConstruction),
        )

        // Add vectors
        for i, vector := range vectors {
            graph.Add(vector, i)
        }

        // Analyze graph
        analyzer := &hnsw.Analyzer[int]{Graph: graph}
        metrics := analyzer.QualityMetrics()

        fmt.Printf("Configuration: %s\n", config.description)
        fmt.Printf("  Distortion Ratio: %.2f\n", metrics.DistortionRatio)
        fmt.Printf("  Layer Balance: %.2f\n", metrics.LayerBalance)
        fmt.Printf("  Avg Connectivity: %.2f\n", metrics.AvgConnectivity)
    }
}

Limitations

The analyzer provides estimates and approximations rather than exact measurements, especially for large graphs. Some considerations:

  • Distortion Ratio: Calculated using a sample of nodes to keep computation reasonable
  • Graph Distance: Limited to a maximum depth to avoid excessive computation
  • Memory Usage: Not directly measured by the analyzer

Next Steps

Now that you understand how to analyze HNSW graphs, you can explore: