Distance Functions

Distance functions are a critical component of the HNSW algorithm, as they determine how similarity between vectors is measured. This page explains the different distance metrics available in HNSW and how to choose the right one for your use case.

Overview

In HNSW, distance functions are used to:

  1. Determine the nearest neighbors during graph construction
  2. Find the closest vectors during search operations
  3. Calculate the distance between vectors for ranking search results

The choice of distance function can significantly impact the quality of search results and should be selected based on the properties of your vector data.

Built-in Distance Functions

HNSW provides several built-in distance functions:

Cosine Distance

func CosineDistance(a, b Vector) float32

Cosine distance measures the cosine of the angle between two vectors. It is defined as:

CosineDistance(a, b) = 1 - (a·b) / (||a|| * ||b||)

Where:

  • a·b is the dot product of vectors a and b
  • ||a|| and ||b|| are the magnitudes (Euclidean norms) of vectors a and b

Properties:

  • Range: [0, 2] (0 means identical, 2 means opposite directions)
  • Invariant to vector scaling (only direction matters)
  • Best for normalized vectors (e.g., embeddings from language models)

Use cases:

  • Text embeddings (e.g., from OpenAI or BERT models)
  • Semantic search
  • Recommendation systems

Implementation:

func CosineDistance(a, b Vector) float32 {
    var (
        dot    float32
        normA  float32
        normB  float32
    )

    for i := range a {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }

    // A zero vector has no direction, so the angle is undefined;
    // fall back to a neutral distance of 1.
    if normA == 0 || normB == 0 {
        return 1.0
    }

    return 1.0 - dot/float32(math.Sqrt(float64(normA)*float64(normB)))
}
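As a quick sanity check, this standalone program (using a local copy of the function above) exercises the properties listed earlier: identical directions give 0, orthogonal vectors give 1, opposite directions give 2, and scaling a vector does not change the result.

```go
package main

import (
	"fmt"
	"math"
)

func cosineDistance(a, b []float32) float32 {
	var dot, normA, normB float32
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 1.0
	}
	return 1.0 - dot/float32(math.Sqrt(float64(normA)*float64(normB)))
}

func main() {
	a := []float32{1, 0}
	fmt.Println(cosineDistance(a, []float32{1, 0}))  // identical direction: 0
	fmt.Println(cosineDistance(a, []float32{0, 1}))  // orthogonal: 1
	fmt.Println(cosineDistance(a, []float32{-1, 0})) // opposite direction: 2
	fmt.Println(cosineDistance(a, []float32{5, 0}))  // same direction, scaled: 0
}
```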

Euclidean Distance

func EuclideanDistance(a, b Vector) float32

Euclidean distance measures the straight-line distance between two points in Euclidean space. It is defined as:

EuclideanDistance(a, b) = sqrt(sum((a_i - b_i)²))

Properties:

  • Range: [0, ∞) (0 means identical)
  • Sensitive to vector scaling
  • Preserves the geometric interpretation of distance

Use cases:

  • Image feature vectors
  • Geographical coordinates
  • Physical measurements
  • Any application where the absolute magnitude matters

Implementation:

func EuclideanDistance(a, b Vector) float32 {
    var sum float32

    for i := range a {
        d := a[i] - b[i]
        sum += d * d
    }

    return float32(math.Sqrt(float64(sum)))
}
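For a concrete check, a 3-4-5 right triangle gives the expected straight-line distance. This sketch uses a local copy of the function above and also shows that, unlike cosine distance, scaling the inputs changes the result.

```go
package main

import (
	"fmt"
	"math"
)

func euclideanDistance(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return float32(math.Sqrt(float64(sum)))
}

func main() {
	// 3-4-5 right triangle: distance from the origin to (3, 4) is 5.
	fmt.Println(euclideanDistance([]float32{0, 0}, []float32{3, 4})) // 5
	// Doubling the vector doubles the distance: magnitude matters.
	fmt.Println(euclideanDistance([]float32{0, 0}, []float32{6, 8})) // 10
}
```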

Dot Product Distance

func DotProductDistance(a, b Vector) float32

Dot product distance is the negative of the dot product between two vectors. It is defined as:

DotProductDistance(a, b) = -sum(a_i * b_i)

Properties:

  • Range: (-∞, ∞)
  • Lower values indicate higher similarity
  • Sensitive to vector magnitude
  • Not a true metric (it can be negative and violates the triangle inequality), but HNSW only needs a relative ordering of candidates

Use cases:

  • When vectors are already normalized
  • Specialized applications where dot product is the natural similarity measure

Implementation:

func DotProductDistance(a, b Vector) float32 {
    var dot float32

    for i := range a {
        dot += a[i] * b[i]
    }

    return -dot
}

Custom Distance Functions

You can define your own distance function to suit your specific needs:

graph := hnsw.NewGraph[int]()

// Define a custom distance function
graph.Distance = func(a, b []float32) float32 {
    // Manhattan distance (L1 norm)
    var sum float32
    for i := range a {
        sum += float32(math.Abs(float64(a[i] - b[i])))
    }
    return sum
}

Common custom distance functions include:

Manhattan Distance (L1 Norm)

func ManhattanDistance(a, b Vector) float32 {
    var sum float32

    for i := range a {
        sum += float32(math.Abs(float64(a[i] - b[i])))
    }

    return sum
}

Chebyshev Distance (L∞ Norm)

func ChebyshevDistance(a, b Vector) float32 {
    var max float32

    for i := range a {
        d := float32(math.Abs(float64(a[i] - b[i])))
        if d > max {
            max = d
        }
    }

    return max
}

Weighted Euclidean Distance

func WeightedEuclideanDistance(weights Vector) DistanceFunc {
    return func(a, b Vector) float32 {
        var sum float32

        for i := range a {
            d := a[i] - b[i]
            sum += weights[i] * d * d
        }

        return float32(math.Sqrt(float64(sum)))
    }
}
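For example, to make the first dimension count four times as heavily as the second, pass the weights once and reuse the returned function. Here Vector and DistanceFunc are defined locally as stand-ins for the library's types:

```go
package main

import (
	"fmt"
	"math"
)

// Local stand-ins for the library's Vector and DistanceFunc types.
type Vector = []float32
type DistanceFunc = func(a, b Vector) float32

func WeightedEuclideanDistance(weights Vector) DistanceFunc {
	return func(a, b Vector) float32 {
		var sum float32
		for i := range a {
			d := a[i] - b[i]
			sum += weights[i] * d * d
		}
		return float32(math.Sqrt(float64(sum)))
	}
}

func main() {
	// Weight dimension 0 four times as heavily as dimension 1.
	dist := WeightedEuclideanDistance(Vector{4, 1})
	fmt.Println(dist(Vector{0, 0}, Vector{1, 0})) // sqrt(4*1) = 2
	fmt.Println(dist(Vector{0, 0}, Vector{0, 1})) // sqrt(1*1) = 1
}
```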

Choosing the Right Distance Function

The choice of distance function depends on the properties of your data and the requirements of your application:

  1. For text embeddings and semantic search:
     • Use Cosine Distance for general-purpose semantic similarity (magnitude is ignored)
     • Use Dot Product Distance if vectors are already normalized; it then ranks neighbors identically to cosine distance but is cheaper to compute

  2. For image features and physical measurements:
     • Use Euclidean Distance if the absolute magnitude matters
     • Use Manhattan Distance if you want to reduce the influence of outliers

  3. For geographical coordinates:
     • Use Euclidean Distance for small areas
     • Use a specialized geodesic distance for large areas

  4. For mixed data types:
     • Use a weighted combination of distance functions
     • Consider normalizing each feature before computing the distance
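The mixed-data advice above can be sketched as a composite distance. This is a hypothetical example, assuming the first split dimensions form one feature group (e.g., image features) and the remainder another (e.g., a price), each measured with Euclidean distance and combined with weights:

```go
package main

import (
	"fmt"
	"math"
)

func euclidean(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return float32(math.Sqrt(float64(sum)))
}

// mixedDistance is a hypothetical composite: a weighted sum of the
// Euclidean distances of two feature groups split at index `split`.
func mixedDistance(split int, w1, w2 float32) func(a, b []float32) float32 {
	return func(a, b []float32) float32 {
		return w1*euclidean(a[:split], b[:split]) + w2*euclidean(a[split:], b[split:])
	}
}

func main() {
	// First two dims: image features; last dim: a price, down-weighted
	// so its larger scale does not dominate.
	dist := mixedDistance(2, 1.0, 0.25)
	a := []float32{0, 0, 100}
	b := []float32{3, 4, 200}
	fmt.Println(dist(a, b)) // 1.0*5 + 0.25*100 = 30
}
```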

Performance Considerations

Distance calculations are the most computationally intensive part of the HNSW algorithm, as they are performed many times during both construction and search. Here are some tips for optimizing distance calculations:

  1. Precompute norms for cosine distance:
     • If you use cosine distance, consider precomputing vector norms, or normalizing vectors once at insert time
     • This can significantly reduce computation time during search

  2. Use SIMD instructions:
     • Modern CPUs support SIMD (Single Instruction, Multiple Data) instructions
     • These can accelerate vector operations like dot products and Euclidean distance
     • The Go compiler applies only limited auto-vectorization, so hand-written assembly or a SIMD library may be needed for the largest gains

  3. Consider dimensionality reduction:
     • High-dimensional vectors make every distance calculation more expensive
     • Techniques such as PCA can reduce dimensionality before indexing (t-SNE is primarily a visualization technique and is less suitable here)
     • This can improve both performance and memory usage

  4. Normalize vectors:
     • For cosine distance, normalizing vectors in advance reduces the calculation to a dot product
     • This is especially useful for text embeddings from language models
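The first and last tips above combine naturally: if every vector is normalized once at insert time, cosine distance reduces to 1 minus a dot product, with no square roots at query time. A minimal sketch of this optimization:

```go
package main

import (
	"fmt"
	"math"
)

// normalize returns v scaled to unit length (v must be nonzero).
// Do this once per vector, at insert time.
func normalize(v []float32) []float32 {
	var sum float32
	for _, x := range v {
		sum += x * x
	}
	n := float32(math.Sqrt(float64(sum)))
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = x / n
	}
	return out
}

// normalizedCosineDistance assumes a and b are already unit length,
// so the norm terms in the full formula are 1 and drop out.
func normalizedCosineDistance(a, b []float32) float32 {
	var dot float32
	for i := range a {
		dot += a[i] * b[i]
	}
	return 1 - dot
}

func main() {
	a := normalize([]float32{3, 4})
	b := normalize([]float32{-4, 3})
	fmt.Println(normalizedCosineDistance(a, b)) // orthogonal vectors: 1
}
```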

Next Steps

Now that you understand distance functions in HNSW, you can explore: