Skip to content

Architecture

Ever wonder what makes Quiver tick? Let's pop the hood and take a look at the inner workings of this blazing-fast vector database! This guide will walk you through Quiver's architecture, components, and design decisions. Time to get technical! 🔧

High-Level Overview

Quiver is designed with a focus on performance, simplicity, and reliability. Here's a bird's-eye view of the architecture:

Quiver Architecture

Core Components

HNSW Graph

The heart of Quiver is the Hierarchical Navigable Small World (HNSW) graph, which enables lightning-fast approximate nearest neighbor search:

  • Implementation: Quiver uses a highly optimized HNSW implementation
  • Parameters: Configurable M (max connections), efConstruction (build quality), and efSearch (search quality)
  • Distance Metrics: Supports Cosine and L2 (Euclidean) distance
type Index struct {
    // ...
    hnsw *hnsw.Graph[uint64]
    // ...
}

Vector Storage

Vectors are stored in memory for fast access during search operations:

  • In-memory Map: Vectors are stored in a map with vector IDs as keys
  • Float32 Precision: Vectors use 32-bit floats for a good balance of precision and memory usage
  • Concurrent Access: Protected by read-write locks for thread safety
type Index struct {
    // ...
    vectors map[uint64][]float32
    // ...
}

Metadata Store

Metadata is stored in DuckDB, a fast analytical database:

  • SQL Interface: Enables complex filtering and aggregation
  • Arrow Integration: Efficient data transfer with Apache Arrow
  • Schema Flexibility: Supports various data types for metadata fields
type Index struct {
    // ...
    duckdb *DuckDB
    dbConn *DuckDBConn
    // ...
}

Persistence & Backup

Quiver ensures data durability through:

  • Periodic Persistence: Configurable intervals for writing to disk
  • Incremental Backups: Only changed data is backed up
  • Compression: Optional compression for backups
  • Encryption: Optional encryption for sensitive data
type Index struct {
    // ...
    persistInterval time.Duration
    persistTicker   *time.Ticker
    persistDone     chan struct{}
    // ...
}

Dimensionality Reduction

Quiver includes built-in dimensionality reduction capabilities to optimize storage and search performance:

  • PCA Implementation: Principal Component Analysis for efficient dimension reduction
  • Adaptive Reduction: Automatically determines optimal dimensions based on data variance
  • Inline Processing: Reduction happens during insertion and query operations
  • Explicit API: Allows reducing vectors without adding them to the index
type Index struct {
    // ...
    dimReduction         bool
    dimReductionMethod   string
    dimReductionTarget   int
    dimReductionAdaptive bool
    dimReductionVariance float64
    // ...
}

Semantic Routing

Semantic routing enables intelligent query routing to the most relevant sub-indices:

  • Multi-Index Support: Route queries to the most appropriate indices
  • Routing Models: Uses lightweight models to determine query destinations
  • Parallel Search: Execute searches in parallel across multiple indices
  • Result Aggregation: Combine and rank results from multiple indices
type Router struct {
    // ...
    indices      map[string]*Index
    routingModel *RoutingModel
    // ...
}

Data Flow

Adding Vectors

When you add a vector to Quiver, here's what happens:

Add Vector

  1. Validation: Vector dimension and metadata are validated
  2. Dimensionality Reduction: If enabled, vector dimensions are reduced using the configured method
  3. Batching: Vector is added to a batch buffer
  4. HNSW Insertion: When the batch is full, vectors are added to the HNSW graph
  5. Metadata Storage: Metadata is stored in DuckDB
  6. Persistence: Data is periodically persisted to disk
func (idx *Index) Add(id uint64, vector []float32, meta map[string]interface{}) error {
    // Validation
    if len(vector) != idx.config.Dimension {
        return fmt.Errorf("vector dimension mismatch: expected %d, got %d", idx.config.Dimension, len(vector))
    }

    // Dimensionality reduction if enabled
    if idx.config.EnableDimReduction {
        vector = idx.reduceVector(vector)
    }

    // Add to batch buffer
    idx.batchLock.Lock()
    idx.batchBuffer = append(idx.batchBuffer, vectorMeta{id, vector, meta})
    idx.batchLock.Unlock()

    // Flush if batch is full
    if len(idx.batchBuffer) >= idx.config.BatchSize {
        return idx.flushBatch()
    }

    return nil
}

Searching Vectors

When you search for similar vectors:

Search Vector

  1. Query Preparation: Query vector is validated
  2. Dimensionality Reduction: If enabled, query dimensions are reduced using the same method as during insertion
  3. HNSW Search: HNSW graph is traversed to find nearest neighbors
  4. Distance Calculation: Distances are calculated using the configured metric
  5. Metadata Retrieval: Metadata for result vectors is retrieved from DuckDB
  6. Result Formatting: Results are formatted and returned
func (idx *Index) Search(query []float32, k, page, pageSize int) ([]SearchResult, error) {
    // Validation
    if len(query) != idx.config.Dimension {
        return nil, fmt.Errorf("query dimension mismatch: expected %d, got %d", idx.config.Dimension, len(query))
    }

    // Dimensionality reduction if enabled
    if idx.config.EnableDimReduction {
        query = idx.reduceVector(query)
    }

    // HNSW search
    idx.lock.RLock()
    neighbors, err := idx.hnsw.Search(query, k, idx.config.HNSWEfSearch)
    idx.lock.RUnlock()

    // Format results
    results := make([]SearchResult, len(neighbors))
    for i, n := range neighbors {
        results[i] = SearchResult{
            ID:       n.ID,
            Distance: n.Distance,
            Metadata: idx.getMetadata(n.ID),
        }
    }

    return results, nil
}

Quiver's hybrid search combines vector similarity with metadata filtering:

Hybrid Search

  1. Strategy Selection: Choose between filter-then-search or search-then-filter
  2. SQL Query: Execute SQL query to filter metadata
  3. Vector Search: Search for similar vectors among filtered results
  4. Result Combination: Combine vector similarity with metadata filtering
func (idx *Index) SearchWithFilter(query []float32, k int, filter string) ([]SearchResult, error) {
    // Execute SQL query to get filtered IDs
    sql := fmt.Sprintf("SELECT id FROM metadata WHERE %s", filter)
    reader, stmt, _, err := idx.dbConn.Query(context.Background(), sql)

    // Get filtered IDs
    filteredIDs := make([]uint64, 0)
    // ... process results ...

    // Search among filtered IDs
    results, err := idx.searchSubset(query, k, filteredIDs)

    return results, nil
}

Semantic Routing Flow

When using semantic routing for search across multiple indices:

Semantic Routing

  1. Query Routing: The routing model determines which indices are most relevant for the query
  2. Parallel Search: Searches are executed in parallel across the selected indices
  3. Result Aggregation: Results from all indices are combined and ranked
  4. Response: The final ranked results are returned to the client
func (router *Router) Search(query []float32, k int, options SearchOptions) ([]SearchResult, error) {
    // Determine target indices
    targetIndices := router.routingModel.DetermineTargetIndices(query)

    // Execute parallel searches
    var wg sync.WaitGroup
    resultChan := make(chan []SearchResult, len(targetIndices))
    errorChan := make(chan error, len(targetIndices))

    for _, indexName := range targetIndices {
        wg.Add(1)
        go func(name string) {
            defer wg.Done()
            idx := router.indices[name]
            results, err := idx.Search(query, k, options)
            if err != nil {
                errorChan <- err
                return
            }
            resultChan <- results
        }(indexName)
    }

    // Wait for all searches to complete
    wg.Wait()
    close(resultChan)
    close(errorChan)

    // Check for errors
    if len(errorChan) > 0 {
        return nil, <-errorChan
    }

    // Combine and rank results
    allResults := make([]SearchResult, 0)
    for results := range resultChan {
        allResults = append(allResults, results...)
    }

    // Sort by distance
    sort.Slice(allResults, func(i, j int) bool {
        return allResults[i].Distance < allResults[j].Distance
    })

    // Limit to k results
    if len(allResults) > k {
        allResults = allResults[:k]
    }

    return allResults, nil
}

Concurrency Model

Quiver uses a combination of locks and channels to manage concurrency:

Concurrency

  • Read-Write Locks: Protect shared data structures during concurrent operations
  • Batch Processing: Asynchronous batch processing for vector additions
  • Background Workers: Separate goroutines for persistence and backup
  • Connection Pool: Managed connections to DuckDB
type Index struct {
    // ...
    lock            sync.RWMutex
    batchLock       sync.Mutex
    cache           sync.Map
    batchTicker     *time.Ticker
    batchDone       chan struct{}
    persistTicker   *time.Ticker
    persistDone     chan struct{}
    backupTicker    *time.Ticker
    backupDone      chan struct{}
    bgWG            sync.WaitGroup
    // ...
}

DuckDB Integration

Quiver uses DuckDB for metadata storage and querying:

  • ADBC Interface: Uses Apache Arrow Database Connectivity for communication
  • Connection Management: Maintains a pool of connections
  • Arrow Record Batches: Efficient data transfer with Arrow record batches
  • SQL Queries: Executes SQL queries for metadata filtering
type DuckDB struct {
    mu     sync.Mutex
    db     adbc.Database
    driver adbc.Driver
    opts   DuckDBOptions
    conns  []*DuckDBConn
}

type DuckDBConn struct {
    parent *DuckDB
    conn   adbc.Connection
}

HTTP API

Quiver provides a RESTful HTTP API for remote access:

  • Fiber Framework: Uses the high-performance Fiber web framework
  • JSON Interface: Standard JSON request/response format
  • Authentication: API key authentication for security
  • Middleware: Logging, error handling, and rate limiting middleware
  • Health Checks: Endpoints for monitoring and health checks
type Server struct {
    app   *fiber.App
    log   *zap.Logger
    port  string
    index *quiver.Index
}

Performance Optimizations

Quiver includes several optimizations for maximum performance:

  • Batch Processing: Vectors are added in batches for better throughput
  • Metadata Caching: Frequently accessed metadata is cached in memory
  • Connection Pooling: DuckDB connections are pooled and reused
  • Concurrent Reads: Multiple read operations can execute in parallel
  • SIMD Instructions: Distance calculations use SIMD instructions when available
  • Memory Mapping: Efficient memory mapping for large indices
  • Dimensionality Reduction: Reduces vector dimensions to improve search speed and memory usage
  • Adaptive Reduction: Automatically determines optimal dimensions based on data variance
  • Semantic Routing: Routes queries to the most relevant indices to reduce search space
  • Parallel Search: Executes searches in parallel across multiple indices

Security Features

Quiver includes several security features:

  • API Key Authentication: Secure access to the HTTP API
  • Encryption at Rest: Optional encryption for stored data
  • TLS Support: HTTPS support for secure communication
  • Input Validation: Thorough validation of all inputs
  • Error Handling: Secure error handling that doesn't leak sensitive information

Configuration System

Quiver's behavior is controlled through a comprehensive configuration system:

type Config struct {
    Dimension           int
    StoragePath         string
    Distance            DistanceMetric
    MaxElements         uint64
    HNSWM               int
    HNSWEfConstruct     int
    HNSWEfSearch        int
    BatchSize           int
    PersistInterval     time.Duration
    BackupInterval      time.Duration
    BackupPath          string
    BackupCompression   bool
    MaxBackups          int
    EncryptionEnabled   bool
    EncryptionKey       string
    EnableDimReduction  bool
    DimReductionMethod  string
    DimReductionTarget  int
    DimReductionAdaptive bool
    DimReductionVariance float64
    EnableSemanticRouting bool
    RoutingModel        string
    RoutingThreshold    float64
}

Logging and Monitoring

Quiver uses structured logging and exposes metrics for monitoring:

  • Zap Logger: High-performance structured logging
  • Metrics Collection: Collects and exposes performance metrics
  • Query Logging: Optional logging of queries for debugging
  • Health Checks: Endpoints for monitoring system health
func (idx *Index) CollectMetrics() map[string]interface{} {
    metrics := make(map[string]interface{})
    metrics["vector_count"] = len(idx.vectors)
    metrics["metadata_count"] = len(idx.metadata)
    metrics["cache_size"] = idx.cacheSize()
    // ... more metrics ...
    return metrics
}

Design Decisions

Why HNSW?

HNSW was chosen for Quiver because:

  1. Performance: One of the fastest ANN algorithms available
  2. Memory Efficiency: Good balance of memory usage and search speed
  3. Quality: High recall even with approximate search
  4. Incremental Updates: Supports adding vectors without rebuilding the index

Why DuckDB?

DuckDB was chosen for metadata storage because:

  1. Performance: Extremely fast analytical database
  2. Embedded: No separate server process required
  3. SQL Interface: Familiar and powerful query language
  4. Arrow Integration: Efficient data transfer with Arrow

Why Go?

Go was chosen as the implementation language because:

  1. Performance: Compiled language with good performance
  2. Concurrency: Excellent support for concurrent programming
  3. Simplicity: Clean and straightforward language design
  4. Tooling: Great tooling for testing, profiling, and deployment

Why PCA for Dimensionality Reduction?

PCA was chosen as the initial dimensionality reduction method because:

  1. Efficiency: Computationally efficient compared to other methods
  2. Deterministic: Produces consistent results for the same input
  3. Variance Preservation: Maximizes the variance retained in the reduced dimensions
  4. Simplicity: Conceptually simple and well-understood algorithm
  5. Adaptability: Works well with the adaptive dimensionality approach

Why Semantic Routing?

Semantic routing was implemented to address the challenges of scaling vector search:

  1. Scalability: Enables efficient search across multiple indices
  2. Performance: Reduces the search space by targeting only relevant indices
  3. Flexibility: Allows for specialized indices for different types of data
  4. Resource Efficiency: Optimizes resource usage by avoiding unnecessary searches

Next Steps

Now that you understand Quiver's architecture, check out: