Benchmarking

Want to know exactly how fast Quiver can go? Let's dive into benchmarking! This guide will show you how to measure Quiver's performance and compare different configurations. Time to put those vectors to the test! ⏱️

Why Benchmark?

Benchmarking helps you:

  1. Understand Quiver's performance characteristics
  2. Compare different configurations
  3. Identify bottlenecks
  4. Set realistic expectations for production use
  5. Track performance improvements over time

Built-in Benchmarks

Quiver comes with a comprehensive suite of benchmarks that measure various aspects of performance:

# Run all benchmarks
go test -bench=. -benchmem ./...

# Run a specific benchmark
go test -bench=BenchmarkSearch -benchmem ./...

Understanding Benchmark Output

Here's an example benchmark output:

BenchmarkSearch-10                 20054             59194 ns/op           24193 B/op         439 allocs/op

This tells you:

  • BenchmarkSearch-10: The benchmark name; the -10 suffix is the GOMAXPROCS value (typically the number of CPU cores) the benchmark ran with
  • 20054: Number of iterations run
  • 59194 ns/op: Average time per operation (59.2 microseconds)
  • 24193 B/op: Average memory allocated per operation (24.2 KB)
  • 439 allocs/op: Average number of allocations per operation
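
A handy rule of thumb: dividing one second (1e9 ns) by the ns/op figure gives approximate throughput. A tiny sketch of that conversion, using the number from the example output above:

package main

import "fmt"

func main() {
    // ns/op taken from the example benchmark line above
    nsPerOp := 59194.0

    // One second is 1e9 ns, so throughput ≈ 1e9 / (ns per op)
    opsPerSec := 1e9 / nsPerOp
    fmt.Printf("≈ %.1fK ops/sec\n", opsPerSec/1000) // prints ≈ 16.9K ops/sec
}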

Available Benchmarks

Quiver includes the following benchmarks:

Benchmark                                  Description
BenchmarkAdd                               Measures vector addition performance
BenchmarkSearch                            Measures basic vector search performance
BenchmarkHybridSearch                      Measures hybrid search (vector + metadata) performance
BenchmarkSearchWithNegatives               Measures search with negative examples performance
BenchmarkBatchAdd                          Measures batch addition performance with different batch sizes
BenchmarkSearchWithDifferentK              Measures search performance with different K values
BenchmarkSearchWithDifferentDimensions     Measures search performance with different vector dimensions
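
To time just one of these without also running the package's unit tests, anchor the -bench regular expression and disable test selection with -run:

# Run only BenchmarkHybridSearch, skipping regular unit tests
go test -run='^$' -bench='^BenchmarkHybridSearch$' -benchmem ./...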

Running Custom Benchmarks

Creating a Benchmark

You can create custom benchmarks to test specific scenarios:

// Example custom benchmark
func BenchmarkCustomSearch(b *testing.B) {
    // Setup
    logger := zap.NewNop() // Use a no-op logger for benchmarks (NewNop returns a *zap.Logger directly, no error)
    idx, _ := quiver.New(quiver.Config{
        Dimension:   128,
        StoragePath: ":memory:",
        HNSWM:       16,
        BatchSize:   1000,
    }, logger)
    defer idx.Close()

    // Add test vectors
    for i := 0; i < 10000; i++ {
        vector := generateRandomVector(128)
        metadata := map[string]interface{}{
            "category": "test",
            "id": i,
        }
        idx.Add(uint64(i), vector, metadata)
    }

    // Create query vector
    queryVector := generateRandomVector(128)

    // Reset timer before the actual benchmark
    b.ResetTimer()

    // Run the benchmark
    for i := 0; i < b.N; i++ {
        idx.Search(queryVector, 10, 1, 10)
    }
}
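
The benchmarks in this guide call a generateRandomVector helper that isn't part of Quiver's API; it's just local test scaffolding. A minimal sketch of such a helper (assumes a "math/rand" import):

// generateRandomVector is local test scaffolding, not part of Quiver.
// It builds a vector of the requested dimension with components in [0, 1).
func generateRandomVector(dim int) []float32 {
    v := make([]float32, dim)
    for i := range v {
        v[i] = rand.Float32() // requires "math/rand"
    }
    return v
}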

Benchmark Parameters

You can customize benchmark parameters:

# Run for at least 5 seconds per benchmark
go test -bench=. -benchtime=5s -benchmem ./...

# Run with a specific number of CPU cores
go test -bench=. -cpu=1,4,8 -benchmem ./...

# Run with verbose output
go test -bench=. -v -benchmem ./...
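
-benchtime also accepts a fixed iteration count (the Nx form), which can be useful when you want directly comparable runs across machines:

# Run each benchmark for exactly 1000 iterations instead of a time budget
go test -bench=. -benchtime=1000x -benchmem ./...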

Benchmark Scenarios

Vector Addition

Benchmark vector addition with different batch sizes:

func BenchmarkAddWithDifferentBatchSizes(b *testing.B) {
    batchSizes := []int{100, 1000, 10000}

    for _, batchSize := range batchSizes {
        b.Run(fmt.Sprintf("BatchSize-%d", batchSize), func(b *testing.B) {
            // Setup with specific batch size
            logger := zap.NewNop()
            idx, _ := quiver.New(quiver.Config{
                Dimension: 128,
                StoragePath: ":memory:",
                BatchSize: batchSize,
            }, logger)
            defer idx.Close()

            // Reset timer
            b.ResetTimer()

            // Run benchmark
            for i := 0; i < b.N; i++ {
                vector := generateRandomVector(128)
                metadata := map[string]interface{}{
                    "category": "test",
                    "id": i,
                }
                idx.Add(uint64(i), vector, metadata)
            }
        })
    }
}
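
Because the batch sizes are registered with b.Run, each one appears as a sub-benchmark, and you can target a single case by including its name in the -bench pattern:

# Run only the 1000-element batch size sub-benchmark
go test -bench='BenchmarkAddWithDifferentBatchSizes/BatchSize-1000' -benchmem ./...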

Search Performance

Benchmark search with different HNSW parameters:

func BenchmarkSearchWithDifferentEfSearch(b *testing.B) {
    efValues := []int{50, 100, 200, 400}

    for _, ef := range efValues {
        b.Run(fmt.Sprintf("Ef-%d", ef), func(b *testing.B) {
            // Setup with specific efSearch
            logger := zap.NewNop()
            idx, _ := quiver.New(quiver.Config{
                Dimension: 128,
                StoragePath: ":memory:",
                HNSWEfSearch: ef,
            }, logger)
            defer idx.Close()

            // Add test vectors
            for i := 0; i < 10000; i++ {
                vector := generateRandomVector(128)
                metadata := map[string]interface{}{
                    "category": "test",
                    "id": i,
                }
                idx.Add(uint64(i), vector, metadata)
            }

            // Create query vector
            queryVector := generateRandomVector(128)

            // Reset timer
            b.ResetTimer()

            // Run benchmark
            for i := 0; i < b.N; i++ {
                idx.Search(queryVector, 10, 1, 10)
            }
        })
    }
}
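
Single-goroutine numbers don't always predict behaviour under concurrent load. If you also want to measure concurrent search, testing.B's RunParallel drives the benchmark body from multiple goroutines; here is a minimal sketch, reusing the setup pattern above and assuming idx.Search is safe for concurrent readers:

func BenchmarkParallelSearch(b *testing.B) {
    // Setup (same pattern as the benchmarks above)
    logger := zap.NewNop()
    idx, _ := quiver.New(quiver.Config{
        Dimension:   128,
        StoragePath: ":memory:",
    }, logger)
    defer idx.Close()

    for i := 0; i < 10000; i++ {
        idx.Add(uint64(i), generateRandomVector(128), map[string]interface{}{"id": i})
    }
    queryVector := generateRandomVector(128)

    b.ResetTimer()

    // Each goroutine executes a share of the b.N search iterations.
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            idx.Search(queryVector, 10, 1, 10)
        }
    })
}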

Benchmark hybrid search with different filter selectivity:

func BenchmarkHybridSearchWithDifferentFilters(b *testing.B) {
    filters := []struct{
        name string
        filter string
        selectivity string
    }{
        {"HighlySelective", "id < 100", "1%"},
        {"MediumSelective", "id < 1000", "10%"},
        {"LowSelective", "id < 5000", "50%"},
    }

    for _, f := range filters {
        b.Run(fmt.Sprintf("%s-%s", f.name, f.selectivity), func(b *testing.B) {
            // Setup
            logger := zap.NewNop()
            idx, _ := quiver.New(quiver.Config{
                Dimension: 128,
                StoragePath: ":memory:",
            }, logger)
            defer idx.Close()

            // Add test vectors
            for i := 0; i < 10000; i++ {
                vector := generateRandomVector(128)
                metadata := map[string]interface{}{
                    "category": "test",
                    "id": i,
                }
                idx.Add(uint64(i), vector, metadata)
            }

            // Create query vector
            queryVector := generateRandomVector(128)

            // Reset timer
            b.ResetTimer()

            // Run benchmark
            for i := 0; i < b.N; i++ {
                idx.SearchWithFilter(queryVector, 10, f.filter)
            }
        })
    }
}

Comparing Results

Using benchstat

The benchstat tool helps compare benchmark results:

# Install benchstat
go install golang.org/x/perf/cmd/benchstat@latest

# Run benchmarks before changes (-count gives benchstat multiple samples per benchmark)
go test -bench=. -benchmem -count=10 ./... > before.txt

# Run benchmarks after changes
go test -bench=. -benchmem -count=10 ./... > after.txt

# Compare results
benchstat before.txt after.txt

Example output:

name                old time/op    new time/op    delta
Search-10             59.2µs ± 2%    52.1µs ± 3%  -12.00%  (p=0.000 n=10+10)
HybridSearch-10       208µs ± 5%     187µs ± 4%   -10.10%  (p=0.000 n=10+10)

name                old alloc/op   new alloc/op   delta
Search-10             24.2kB ± 0%    22.1kB ± 0%   -8.68%  (p=0.000 n=10+10)
HybridSearch-10       80.6kB ± 0%    75.2kB ± 0%   -6.70%  (p=0.000 n=10+10)

name                old allocs/op  new allocs/op  delta
Search-10               439 ± 0%       412 ± 0%    -6.15%  (p=0.000 n=10+10)
HybridSearch-10         822 ± 0%       798 ± 0%    -2.92%  (p=0.000 n=10+10)

This shows the performance change between the old and new versions.

Visualizing Results

You can visualize benchmark results with standard plotting tools such as matplotlib and pandas.

Example Python script for plotting:

import matplotlib.pyplot as plt
import pandas as pd
import re

# Parse benchmark output
def parse_benchmark(filename):
    data = []
    with open(filename, 'r') as f:
        for line in f:
            if line.startswith('Benchmark'):
                parts = re.split(r'\s+', line.strip())
                # With -benchmem the columns are:
                #   name  iterations  <ns> ns/op  <bytes> B/op  <allocs> allocs/op
                name = parts[0]
                ops = int(parts[1])
                ns_per_op = float(parts[2])
                mb_per_op = float(parts[4]) / 1024 / 1024   # B/op value
                allocs_per_op = int(parts[6])               # allocs/op value
                data.append({
                    'name': name,
                    'ops': ops,
                    'ns_per_op': ns_per_op,
                    'mb_per_op': mb_per_op,
                    'allocs_per_op': allocs_per_op
                })
    return pd.DataFrame(data)

# Load data
df = parse_benchmark('benchmark_results.txt')

# Plot
plt.figure(figsize=(12, 6))
plt.bar(df['name'], df['ns_per_op'] / 1000)  # Convert to microseconds
plt.ylabel('Time per operation (µs)')
plt.xticks(rotation=45, ha='right')
plt.title('Quiver Benchmark Performance')
plt.tight_layout()
plt.savefig('benchmark_performance.png')
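
The script reads plain go test output from benchmark_results.txt, so a typical workflow looks something like this (the plot_benchmarks.py filename is just an example):

# Capture benchmark output, then plot it
go test -bench=. -benchmem ./... > benchmark_results.txt
python plot_benchmarks.py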

Real-world Benchmarks

Here are some real-world benchmark results from Quiver running on an M2 Pro CPU:

Basic Operations

Operation               Throughput       Latency   Memory/Op   Allocs/Op
Add                     6.4K ops/sec     156µs     20.9 KB     370
Search                  16.9K ops/sec    59µs      24.2 KB     439
Hybrid Search           4.8K ops/sec     208µs     80.6 KB     822
Search with Negatives   7.9K ops/sec     126µs     32.5 KB     491

Batch Performance

Batch Size   Throughput          Latency   Memory/Op   Allocs/Op
100          63 batches/sec      15.8ms    2.0 MB      35.8K
1000         6.6 batches/sec     152ms     19.0 MB     331K
10000        0.64 batches/sec    1.57s     208 MB      3.7M

Search with Different K Values

K Value   Throughput       Latency   Memory/Op   Allocs/Op
10        16.5K ops/sec    61µs      23.8 KB     441
50        2.1K ops/sec     480µs     190 KB      2.9K
100       1.9K ops/sec     516µs     317 KB      2.9K

Search with Different Dimensions

Dimension   Throughput       Latency   Memory/Op   Allocs/Op
32          27.6K ops/sec    36µs      23.3 KB     429
128         16.0K ops/sec    63µs      25.6 KB     457
512         7.0K ops/sec     143µs     24.2 KB     455

Performance Tuning Based on Benchmarks

Based on benchmark results, here are some tuning recommendations:

For High-Throughput Addition

  • Increase batch size (1000-5000)
  • Use Arrow integration for bulk loading
  • Consider parallel additions with multiple goroutines

For Low-Latency Search

  • Reduce efSearch (50-80)
  • Use smaller vector dimensions if possible
  • Keep index size manageable
  • Consider in-memory storage

For High Recall

  • Increase M (32-64)
  • Increase efConstruction (300-500)
  • Increase efSearch (200-400)
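
To make these recommendations concrete, here is a minimal sketch of a recall-oriented configuration. It uses only the Config fields that appear earlier in this guide; the efConstruction setting is not shown here, so check the configuration reference for its exact field name:

// Recall-oriented configuration: larger M and efSearch trade latency for accuracy.
// Assumes the same imports as the earlier examples.
logger := zap.NewNop()
idx, err := quiver.New(quiver.Config{
    Dimension:    128,
    StoragePath:  ":memory:", // in-memory storage also reduces latency
    HNSWM:        32,         // more graph connections per node (32-64)
    HNSWEfSearch: 200,        // wider candidate list at query time (200-400)
}, logger)
if err != nil {
    panic(err)
}
defer idx.Close()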

Next Steps

Now that you've benchmarked Quiver's performance, use the results to tune your configuration for your workload.