Distance Functions¶
Distance functions are a critical component of the HNSW algorithm, as they determine how similarity between vectors is measured. This page explains the different distance metrics available in HNSW and how to choose the right one for your use case.
Overview¶
In HNSW, distance functions are used to:
- Determine the nearest neighbors during graph construction
- Find the closest vectors during search operations
- Calculate the distance between vectors for ranking search results
The choice of distance function can significantly impact the quality of search results and should be selected based on the properties of your vector data.
Built-in Distance Functions¶
HNSW provides several built-in distance functions:
Cosine Distance¶
Cosine distance measures the cosine of the angle between two vectors. It is defined as:

    d(a, b) = 1 - (a · b) / (||a|| ||b||)

Where:
- a · b is the dot product of vectors a and b
- ||a|| and ||b|| are the magnitudes (Euclidean norms) of vectors a and b
Properties:
- Range: [0, 2] (0 means identical, 2 means opposite directions)
- Invariant to vector scaling (only direction matters)
- Best for normalized vectors (e.g., embeddings from language models)
Use cases:
- Text embeddings (e.g., from OpenAI, BERT, etc.)
- Semantic search
- Recommendation systems
Implementation:
```go
func CosineDistance(a, b Vector) float32 {
	var (
		dot   float32
		normA float32
		normB float32
	)
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	// Guard against zero-magnitude vectors, whose direction is undefined;
	// treat them as dissimilar (distance 1) rather than dividing by zero.
	if normA == 0 || normB == 0 {
		return 1.0
	}
	return 1.0 - dot/float32(math.Sqrt(float64(normA)*float64(normB)))
}
```
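For example, because cosine distance depends only on direction, scaling a vector leaves the distance unchanged. A minimal, self-contained sketch (defining `Vector` as a `[]float32` alias for illustration):

```go
package main

import (
	"fmt"
	"math"
)

// Vector is assumed to be a []float32, matching the snippets above.
type Vector = []float32

func CosineDistance(a, b Vector) float32 {
	var dot, normA, normB float32
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 1.0
	}
	return 1.0 - dot/float32(math.Sqrt(float64(normA)*float64(normB)))
}

func main() {
	a := Vector{1, 2, 3}
	b := Vector{2, 4, 6} // same direction, twice the magnitude

	// Parallel vectors have distance 0 regardless of scale.
	fmt.Println(CosineDistance(a, b)) // 0
}
```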
Euclidean Distance¶
Euclidean distance measures the straight-line distance between two points in Euclidean space. It is defined as:

    d(a, b) = sqrt( Σᵢ (aᵢ - bᵢ)² )
Properties:
- Range: [0, ∞) (0 means identical)
- Sensitive to vector scaling
- Preserves the geometric interpretation of distance
Use cases:
- Image feature vectors
- Geographical coordinates
- Physical measurements
- Any application where the absolute magnitude matters
Implementation:
```go
func EuclideanDistance(a, b Vector) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return float32(math.Sqrt(float64(sum)))
}
```
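As a quick sanity check, the classic 3-4-5 right triangle gives a distance of exactly 5 (`Vector` is again assumed to be a `[]float32` alias):

```go
package main

import (
	"fmt"
	"math"
)

type Vector = []float32 // assumed alias, matching the snippets above

func EuclideanDistance(a, b Vector) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d // accumulate squared per-coordinate differences
	}
	return float32(math.Sqrt(float64(sum)))
}

func main() {
	a := Vector{0, 0}
	b := Vector{3, 4}
	fmt.Println(EuclideanDistance(a, b)) // 5
}
```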
Dot Product Distance¶
Dot product distance is the negative of the dot product between two vectors. It is defined as:

    d(a, b) = -(a · b) = -Σᵢ aᵢ bᵢ
Properties:
- Range: (-∞, ∞)
- Lower values indicate higher similarity
- Sensitive to vector magnitude
Use cases:
- When vectors are already normalized
- Specialized applications where dot product is the natural similarity measure
Implementation:
```go
func DotProductDistance(a, b Vector) float32 {
	var dot float32
	for i := range a {
		dot += a[i] * b[i]
	}
	return -dot
}
```
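For unit-length vectors the negative dot product equals the negative cosine similarity, so it ranks neighbors the same way cosine distance does while skipping the square roots. A small sketch (with an assumed `Vector` alias):

```go
package main

import "fmt"

type Vector = []float32 // assumed alias

func DotProductDistance(a, b Vector) float32 {
	var dot float32
	for i := range a {
		dot += a[i] * b[i]
	}
	return -dot // more similar vectors yield lower (more negative) values
}

func main() {
	a := Vector{1, 0}     // unit vector
	b := Vector{0.6, 0.8} // unit vector at an angle to a

	fmt.Println(DotProductDistance(a, a)) // -1
	fmt.Println(DotProductDistance(a, b)) // -0.6
}
```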
Custom Distance Functions¶
You can define your own distance function to suit your specific needs:
```go
graph := hnsw.NewGraph[int]()

// Define a custom distance function
graph.Distance = func(a, b []float32) float32 {
	// Manhattan distance (L1 norm)
	var sum float32
	for i := range a {
		sum += float32(math.Abs(float64(a[i] - b[i])))
	}
	return sum
}
```
Common custom distance functions include:
Manhattan Distance (L1 Norm)¶
```go
func ManhattanDistance(a, b Vector) float32 {
	var sum float32
	for i := range a {
		sum += float32(math.Abs(float64(a[i] - b[i])))
	}
	return sum
}
```
Chebyshev Distance (L∞ Norm)¶
```go
func ChebyshevDistance(a, b Vector) float32 {
	var max float32
	for i := range a {
		d := float32(math.Abs(float64(a[i] - b[i])))
		if d > max {
			max = d
		}
	}
	return max
}
```
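To see how the norms differ, consider a = (1, 2) and b = (4, 6), whose per-coordinate differences are (3, 4): the L1 norm sums them (7), the L2 norm gives 5, and the L∞ norm keeps only the largest (4). A self-contained comparison (with an assumed `Vector` alias):

```go
package main

import (
	"fmt"
	"math"
)

type Vector = []float32 // assumed alias

// ManhattanDistance sums absolute per-coordinate differences (L1).
func ManhattanDistance(a, b Vector) float32 {
	var sum float32
	for i := range a {
		sum += float32(math.Abs(float64(a[i] - b[i])))
	}
	return sum
}

// ChebyshevDistance keeps the largest per-coordinate difference (L-infinity).
func ChebyshevDistance(a, b Vector) float32 {
	var max float32
	for i := range a {
		d := float32(math.Abs(float64(a[i] - b[i])))
		if d > max {
			max = d
		}
	}
	return max
}

func main() {
	a := Vector{1, 2}
	b := Vector{4, 6} // differences: (3, 4)
	fmt.Println(ManhattanDistance(a, b)) // 7
	fmt.Println(ChebyshevDistance(a, b)) // 4
}
```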
Weighted Euclidean Distance¶
```go
func WeightedEuclideanDistance(weights Vector) DistanceFunc {
	// weights must have the same length as the vectors passed in.
	return func(a, b Vector) float32 {
		var sum float32
		for i := range a {
			d := a[i] - b[i]
			sum += weights[i] * d * d
		}
		return float32(math.Sqrt(float64(sum)))
	}
}
```
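A sketch of how the closure might be used, where a weight of 0 effectively masks out a dimension (the `Vector` and `DistanceFunc` aliases are assumptions for illustration):

```go
package main

import (
	"fmt"
	"math"
)

type Vector = []float32
type DistanceFunc = func(a, b Vector) float32

func WeightedEuclideanDistance(weights Vector) DistanceFunc {
	// weights must have the same length as the vectors passed in.
	return func(a, b Vector) float32 {
		var sum float32
		for i := range a {
			d := a[i] - b[i]
			sum += weights[i] * d * d
		}
		return float32(math.Sqrt(float64(sum)))
	}
}

func main() {
	// Weight of 0 on the second dimension ignores it entirely.
	dist := WeightedEuclideanDistance(Vector{1, 0})
	fmt.Println(dist(Vector{3, 100}, Vector{0, -100})) // 3
}
```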
Choosing the Right Distance Function¶
The choice of distance function depends on the properties of your data and the requirements of your application:
- For text embeddings and semantic search:
  - Use Cosine Distance when vectors may vary in magnitude and only direction matters
  - Use Dot Product Distance if vectors are already normalized (it is cheaper and ranks neighbors identically to cosine in that case)
- For image features and physical measurements:
  - Use Euclidean Distance if the absolute magnitude matters
  - Use Manhattan Distance if you want to reduce the influence of outliers
- For geographical coordinates:
  - Use Euclidean Distance for small areas
  - Use a specialized geodesic distance for large areas
- For mixed data types:
  - Use a weighted combination of distance functions
  - Consider normalizing each feature before computing the distance
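For the geographical case, a haversine-based great-circle distance could be plugged in as a custom function. This is a sketch under the assumption that each vector stores {latitude, longitude} in degrees; the Earth-radius constant is approximate:

```go
package main

import (
	"fmt"
	"math"
)

const earthRadiusKm = 6371.0 // mean Earth radius, approximate

// HaversineDistance treats each vector as {latitude, longitude} in degrees
// and returns the great-circle distance in kilometers.
func HaversineDistance(a, b []float32) float32 {
	lat1 := float64(a[0]) * math.Pi / 180
	lat2 := float64(b[0]) * math.Pi / 180
	dLat := float64(b[0]-a[0]) * math.Pi / 180
	dLon := float64(b[1]-a[1]) * math.Pi / 180

	h := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(lat1)*math.Cos(lat2)*math.Sin(dLon/2)*math.Sin(dLon/2)
	return float32(earthRadiusKm * 2 * math.Asin(math.Sqrt(h)))
}

func main() {
	paris := []float32{48.8566, 2.3522}
	london := []float32{51.5074, -0.1278}
	fmt.Printf("%.0f km\n", HaversineDistance(paris, london)) // roughly 340 km
}
```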
Performance Considerations¶
Distance calculations are the most computationally intensive part of the HNSW algorithm, as they are performed many times during both construction and search. Here are some tips for optimizing distance calculations:
- Precompute norms for cosine distance:
  - If using cosine distance, consider precomputing and storing the norms of vectors
  - This can significantly reduce the computation time during search
- Use SIMD instructions:
  - Modern CPUs support SIMD (Single Instruction, Multiple Data) instructions
  - These can accelerate vector operations like dot products and Euclidean distance
  - The Go compiler may automatically use SIMD instructions for some operations
- Consider dimensionality reduction:
  - High-dimensional vectors require more computation for distance calculations
  - Consider dimensionality reduction techniques such as PCA or random projections (t-SNE is suited to visualization rather than indexing, since it provides no transform for new points)
  - This can improve both performance and memory usage
- Normalize vectors:
  - For cosine distance, normalizing vectors in advance can simplify calculations
  - This is especially useful for text embeddings from language models
Next Steps¶
Now that you understand distance functions in HNSW, you can explore:
- Graph Structure: Learn about the hierarchical structure of HNSW
- Search Algorithms: Dive deeper into the search algorithms used in HNSW
- Performance Tuning: Learn how to optimize HNSW for your specific use case