Graph Analyzer¶
The Graph Analyzer provides tools for analyzing and evaluating the quality of HNSW graphs. This functionality is particularly useful for:
- Diagnosing performance issues
- Optimizing graph parameters
- Understanding the internal structure of your HNSW index
- Validating graph construction
Overview¶
The Analyzer offers methods to inspect various aspects of the graph structure, including connectivity patterns, layer distribution, and quality metrics. These insights can help you fine-tune your HNSW configuration for optimal performance.
Basic Usage¶
import (
"fmt"
"github.com/TFMV/hnsw"
)
func main() {
// Create and populate a graph
graph := hnsw.NewGraph[int]()
// Add vectors to the graph
// ...
// Create an analyzer
analyzer := &hnsw.Analyzer[int]{Graph: graph}
// Get quality metrics
metrics := analyzer.QualityMetrics()
fmt.Printf("Graph height: %d\n", metrics.GraphHeight)
fmt.Printf("Node count: %d\n", metrics.NodeCount)
fmt.Printf("Average connectivity: %.2f\n", metrics.AvgConnectivity)
fmt.Printf("Connectivity standard deviation: %.2f\n", metrics.ConnectivityStdDev)
fmt.Printf("Distortion ratio: %.2f\n", metrics.DistortionRatio)
fmt.Printf("Layer balance: %.2f\n", metrics.LayerBalance)
}
Available Methods¶
Height¶
Returns the number of layers in the graph. A well-constructed HNSW graph typically has a logarithmic number of layers relative to the number of nodes.
Example:
Connectivity¶
Returns the average number of connections per node for each non-empty layer. This helps you understand how well-connected the graph is at different levels.
Example:
connectivity := analyzer.Connectivity()
for i, avgConnections := range connectivity {
fmt.Printf("Layer %d: %.2f average connections per node\n", i, avgConnections)
}
Topography¶
Returns the number of nodes in each layer of the graph. This helps you visualize the pyramid structure of the HNSW graph.
Example:
topography := analyzer.Topography()
for i, nodeCount := range topography {
fmt.Printf("Layer %d: %d nodes\n", i, nodeCount)
}
QualityMetrics¶
Calculates and returns a comprehensive set of quality metrics for the graph. This is the most useful method for evaluating the overall health of your HNSW index.
Quality Metrics¶
The GraphQualityMetrics
struct contains the following fields:
NodeCount¶
The total number of nodes in the graph. This should match the number of vectors you've added to the graph.
AvgConnectivity¶
The average number of connections per node in the base layer (layer 0). This value should be close to the M
parameter you configured for the graph. Higher connectivity generally leads to better search accuracy but increases memory usage.
ConnectivityStdDev¶
The standard deviation of connections per node. A lower standard deviation indicates more uniform connectivity across the graph, which is generally desirable.
DistortionRatio¶
Measures how well the graph preserves distances between nodes. It compares the graph distance (number of hops) to the actual distance between vectors. Lower values indicate better distance preservation, which correlates with better search accuracy.
LayerBalance¶
Measures how well balanced the layers are compared to the theoretical ideal based on the level generation factor (Ml
). Values closer to 1.0 indicate better balance, which leads to more efficient searches.
GraphHeight¶
The number of layers in the graph. This should scale logarithmically with the number of nodes for optimal performance.
Interpreting Results¶
Ideal Values¶
For a well-constructed HNSW graph:
- AvgConnectivity: Should be close to the configured
M
value - ConnectivityStdDev: Should be low (typically < 2.0)
- DistortionRatio: Lower is better (typically < 5.0)
- LayerBalance: Should be close to 1.0 (typically > 0.8)
- GraphHeight: Should be logarithmic in the number of nodes
Common Issues¶
High Distortion Ratio¶
A high distortion ratio (> 5.0) indicates that the graph is not preserving distances well. This can lead to lower search accuracy. Possible solutions:
- Increase the
M
parameter to allow more connections per node - Increase the
efConstruction
parameter to improve graph quality during construction - Check if your distance function is appropriate for your data
Poor Layer Balance¶
A low layer balance (< 0.7) indicates that the layers are not distributed optimally. This can lead to inefficient searches. Possible solutions:
- Adjust the
Ml
parameter to better match your data distribution - Rebuild the graph with more appropriate parameters
High Connectivity Standard Deviation¶
A high standard deviation (> 3.0) indicates uneven connectivity across the graph. This can lead to inconsistent search performance. Possible solutions:
- Increase the
efConstruction
parameter - Check for outliers in your data
- Consider normalizing your vectors
Advanced Usage¶
Monitoring Graph Quality Over Time¶
You can track graph quality metrics over time to detect degradation:
func monitorGraphQuality(graph *hnsw.Graph[int]) {
analyzer := &hnsw.Analyzer[int]{Graph: graph}
// Get initial metrics
initialMetrics := analyzer.QualityMetrics()
// Periodically check metrics
ticker := time.NewTicker(1 * time.Hour)
for range ticker.C {
currentMetrics := analyzer.QualityMetrics()
// Check for significant degradation
if currentMetrics.DistortionRatio > initialMetrics.DistortionRatio*1.5 {
fmt.Println("Warning: Graph quality has degraded significantly")
// Consider rebuilding the graph
}
}
}
Comparing Different Graph Configurations¶
You can use the analyzer to compare different graph configurations:
func compareConfigurations(vectors [][]float32) {
configs := []struct {
m int
efConstruction int
description string
}{
{16, 200, "Default"},
{32, 200, "Higher M"},
{16, 400, "Higher efConstruction"},
{32, 400, "Higher M and efConstruction"},
}
for _, config := range configs {
graph := hnsw.NewGraph[int](
hnsw.WithM(config.m),
hnsw.WithEfConstruction(config.efConstruction),
)
// Add vectors
for i, vector := range vectors {
graph.Add(vector, i)
}
// Analyze graph
analyzer := &hnsw.Analyzer[int]{Graph: graph}
metrics := analyzer.QualityMetrics()
fmt.Printf("Configuration: %s\n", config.description)
fmt.Printf(" Distortion Ratio: %.2f\n", metrics.DistortionRatio)
fmt.Printf(" Layer Balance: %.2f\n", metrics.LayerBalance)
fmt.Printf(" Avg Connectivity: %.2f\n", metrics.AvgConnectivity)
}
}
Limitations¶
The analyzer provides estimates and approximations rather than exact measurements, especially for large graphs. Some considerations:
- Distortion Ratio: Calculated using a sample of nodes to keep computation reasonable
- Graph Distance: Limited to a maximum depth to avoid excessive computation
- Memory Usage: Not directly measured by the analyzer
Next Steps¶
Now that you understand how to analyze HNSW graphs, you can explore:
- Performance Tuning: Learn how to optimize HNSW for your specific use case
- Graph Structure: Understand the hierarchical structure of HNSW
- Advanced Techniques: Explore sophisticated usage patterns