Architecture¶

Ever wonder what makes Quiver tick? Let's pop the hood and take a look at the inner workings of this blazing-fast vector database! This guide will walk you through Quiver's architecture, components, and design decisions. Time to get technical! 🔧

High-Level Overview¶

Quiver is designed with a focus on performance, simplicity, and reliability. Here's a bird's-eye view of the architecture:

Quiver Architecture

Core Components¶

HNSW Graph¶

The heart of Quiver is the Hierarchical Navigable Small World (HNSW) graph, which enables lightning-fast approximate nearest neighbor search:

Implementation: Quiver uses a highly optimized HNSW implementation
Parameters: Configurable M (max connections), efConstruction (build quality), and efSearch (search quality)
Distance Metrics: Supports Cosine and L2 (Euclidean) distance

type Index struct {
    // ...
    hnsw *hnsw.Graph[uint64]
    // ...
}

Vector Storage¶

Vectors are stored in memory for fast access during search operations:

In-memory Map: Vectors are stored in a map with vector IDs as keys
Float32 Precision: Vectors use 32-bit floats for a good balance of precision and memory usage
Concurrent Access: Protected by read-write locks for thread safety

type Index struct {
    // ...
    vectors map[uint64][]float32
    // ...
}

Metadata Store¶

Metadata is stored in DuckDB, a fast analytical database:

SQL Interface: Enables complex filtering and aggregation
Arrow Integration: Efficient data transfer with Apache Arrow
Schema Flexibility: Supports various data types for metadata fields

type Index struct {
    // ...
    duckdb *DuckDB
    dbConn *DuckDBConn
    // ...
}

Persistence & Backup¶

Quiver ensures data durability through:

Periodic Persistence: Configurable intervals for writing to disk
Incremental Backups: Only changed data is backed up
Compression: Optional compression for backups
Encryption: Optional encryption for sensitive data

type Index struct {
    // ...
    persistInterval time.Duration
    persistTicker   *time.Ticker
    persistDone     chan struct{}
    // ...
}

Dimensionality Reduction¶

Quiver includes built-in dimensionality reduction capabilities to optimize storage and search performance:

PCA Implementation: Principal Component Analysis for efficient dimension reduction
Adaptive Reduction: Automatically determines optimal dimensions based on data variance
Inline Processing: Reduction happens during insertion and query operations
Explicit API: Allows reducing vectors without adding them to the index

type Index struct {
    // ...
    dimReduction         bool
    dimReductionMethod   string
    dimReductionTarget   int
    dimReductionAdaptive bool
    dimReductionVariance float64
    // ...
}

Semantic Routing¶

Semantic routing enables intelligent query routing to the most relevant sub-indices:

Multi-Index Support: Route queries to the most appropriate indices
Routing Models: Uses lightweight models to determine query destinations
Parallel Search: Execute searches in parallel across multiple indices
Result Aggregation: Combine and rank results from multiple indices

type Router struct {
    // ...
    indices      map[string]*Index
    routingModel *RoutingModel
    // ...
}

Data Flow¶

Adding Vectors¶

When you add a vector to Quiver, here's what happens:

Add Vector

Validation: Vector dimension and metadata are validated
Dimensionality Reduction: If enabled, vector dimensions are reduced using the configured method
Batching: Vector is added to a batch buffer
HNSW Insertion: When the batch is full, vectors are added to the HNSW graph
Metadata Storage: Metadata is stored in DuckDB
Persistence: Data is periodically persisted to disk

func (idx *Index) Add(id uint64, vector []float32, meta map[string]interface{}) error {
    // Validation
    if len(vector) != idx.config.Dimension {
        return fmt.Errorf("vector dimension mismatch: expected %d, got %d", idx.config.Dimension, len(vector))
    }

    // Dimensionality reduction if enabled
    if idx.config.EnableDimReduction {
        vector = idx.reduceVector(vector)
    }

    // Add to batch buffer
    idx.batchLock.Lock()
    idx.batchBuffer = append(idx.batchBuffer, vectorMeta{id, vector, meta})
    idx.batchLock.Unlock()

    // Flush if batch is full
    if len(idx.batchBuffer) >= idx.config.BatchSize {
        return idx.flushBatch()
    }

    return nil
}

Searching Vectors¶

When you search for similar vectors:

Query Preparation: Query vector is validated
Dimensionality Reduction: If enabled, query dimensions are reduced using the same method as during insertion
HNSW Search: HNSW graph is traversed to find nearest neighbors
Distance Calculation: Distances are calculated using the configured metric
Metadata Retrieval: Metadata for result vectors is retrieved from DuckDB
Result Formatting: Results are formatted and returned

func (idx *Index) Search(query []float32, k, page, pageSize int) ([]SearchResult, error) {
    // Validation
    if len(query) != idx.config.Dimension {
        return nil, fmt.Errorf("query dimension mismatch: expected %d, got %d", idx.config.Dimension, len(query))
    }

    // Dimensionality reduction if enabled
    if idx.config.EnableDimReduction {
        query = idx.reduceVector(query)
    }

    // HNSW search
    idx.lock.RLock()
    neighbors, err := idx.hnsw.Search(query, k, idx.config.HNSWEfSearch)
    idx.lock.RUnlock()

    // Format results
    results := make([]SearchResult, len(neighbors))
    for i, n := range neighbors {
        results[i] = SearchResult{
            ID:       n.ID,
            Distance: n.Distance,
            Metadata: idx.getMetadata(n.ID),
        }
    }

    return results, nil
}

Hybrid Search¶

Quiver's hybrid search combines vector similarity with metadata filtering:

Strategy Selection: Choose between filter-then-search or search-then-filter
SQL Query: Execute SQL query to filter metadata
Vector Search: Search for similar vectors among filtered results
Result Combination: Combine vector similarity with metadata filtering

func (idx *Index) SearchWithFilter(query []float32, k int, filter string) ([]SearchResult, error) {
    // Execute SQL query to get filtered IDs
    sql := fmt.Sprintf("SELECT id FROM metadata WHERE %s", filter)
    reader, stmt, _, err := idx.dbConn.Query(context.Background(), sql)

    // Get filtered IDs
    filteredIDs := make([]uint64, 0)
    // ... process results ...

    // Search among filtered IDs
    results, err := idx.searchSubset(query, k, filteredIDs)

    return results, nil
}

Semantic Routing Flow¶

When using semantic routing for search across multiple indices:

Semantic Routing

Query Routing: The routing model determines which indices are most relevant for the query
Parallel Search: Searches are executed in parallel across the selected indices
Result Aggregation: Results from all indices are combined and ranked
Response: The final ranked results are returned to the client

func (router *Router) Search(query []float32, k int, options SearchOptions) ([]SearchResult, error) {
    // Determine target indices
    targetIndices := router.routingModel.DetermineTargetIndices(query)

    // Execute parallel searches
    var wg sync.WaitGroup
    resultChan := make(chan []SearchResult, len(targetIndices))
    errorChan := make(chan error, len(targetIndices))

    for _, indexName := range targetIndices {
        wg.Add(1)
        go func(name string) {
            defer wg.Done()
            idx := router.indices[name]
            results, err := idx.Search(query, k, options)
            if err != nil {
                errorChan <- err
                return
            }
            resultChan <- results
        }(indexName)
    }

    // Wait for all searches to complete
    wg.Wait()
    close(resultChan)
    close(errorChan)

    // Check for errors
    if len(errorChan) > 0 {
        return nil, <-errorChan
    }

    // Combine and rank results
    allResults := make([]SearchResult, 0)
    for results := range resultChan {
        allResults = append(allResults, results...)
    }

    // Sort by distance
    sort.Slice(allResults, func(i, j int) bool {
        return allResults[i].Distance < allResults[j].Distance
    })

    // Limit to k results
    if len(allResults) > k {
        allResults = allResults[:k]
    }

    return allResults, nil
}

Concurrency Model¶

Quiver uses a combination of locks and channels to manage concurrency:

Concurrency

Read-Write Locks: Protect shared data structures during concurrent operations
Batch Processing: Asynchronous batch processing for vector additions
Background Workers: Separate goroutines for persistence and backup
Connection Pool: Managed connections to DuckDB

type Index struct {
    // ...
    lock            sync.RWMutex
    batchLock       sync.Mutex
    cache           sync.Map
    batchTicker     *time.Ticker
    batchDone       chan struct{}
    persistTicker   *time.Ticker
    persistDone     chan struct{}
    backupTicker    *time.Ticker
    backupDone      chan struct{}
    bgWG            sync.WaitGroup
    // ...
}

DuckDB Integration¶

Quiver uses DuckDB for metadata storage and querying:

ADBC Interface: Uses Apache Arrow Database Connectivity for communication
Connection Management: Maintains a pool of connections
Arrow Record Batches: Efficient data transfer with Arrow record batches
SQL Queries: Executes SQL queries for metadata filtering

type DuckDB struct {
    mu     sync.Mutex
    db     adbc.Database
    driver adbc.Driver
    opts   DuckDBOptions
    conns  []*DuckDBConn
}

type DuckDBConn struct {
    parent *DuckDB
    conn   adbc.Connection
}

HTTP API¶

Quiver provides a RESTful HTTP API for remote access:

Fiber Framework: Uses the high-performance Fiber web framework
JSON Interface: Standard JSON request/response format
Authentication: API key authentication for security
Middleware: Logging, error handling, and rate limiting middleware
Health Checks: Endpoints for monitoring and health checks

type Server struct {
    app   *fiber.App
    log   *zap.Logger
    port  string
    index *quiver.Index
}

Performance Optimizations¶

Quiver includes several optimizations for maximum performance:

Batch Processing: Vectors are added in batches for better throughput
Metadata Caching: Frequently accessed metadata is cached in memory
Connection Pooling: DuckDB connections are pooled and reused
Concurrent Reads: Multiple read operations can execute in parallel
SIMD Instructions: Distance calculations use SIMD instructions when available
Memory Mapping: Efficient memory mapping for large indices
Dimensionality Reduction: Reduces vector dimensions to improve search speed and memory usage
Adaptive Reduction: Automatically determines optimal dimensions based on data variance
Semantic Routing: Routes queries to the most relevant indices to reduce search space
Parallel Search: Executes searches in parallel across multiple indices

Security Features¶

Quiver includes several security features:

API Key Authentication: Secure access to the HTTP API
Encryption at Rest: Optional encryption for stored data
TLS Support: HTTPS support for secure communication
Input Validation: Thorough validation of all inputs
Error Handling: Secure error handling that doesn't leak sensitive information

Configuration System¶

Quiver's behavior is controlled through a comprehensive configuration system:

type Config struct {
    Dimension           int
    StoragePath         string
    Distance            DistanceMetric
    MaxElements         uint64
    HNSWM               int
    HNSWEfConstruct     int
    HNSWEfSearch        int
    BatchSize           int
    PersistInterval     time.Duration
    BackupInterval      time.Duration
    BackupPath          string
    BackupCompression   bool
    MaxBackups          int
    EncryptionEnabled   bool
    EncryptionKey       string
    EnableDimReduction  bool
    DimReductionMethod  string
    DimReductionTarget  int
    DimReductionAdaptive bool
    DimReductionVariance float64
    EnableSemanticRouting bool
    RoutingModel        string
    RoutingThreshold    float64
}

Logging and Monitoring¶

Quiver uses structured logging and exposes metrics for monitoring:

Zap Logger: High-performance structured logging
Metrics Collection: Collects and exposes performance metrics
Query Logging: Optional logging of queries for debugging
Health Checks: Endpoints for monitoring system health

func (idx *Index) CollectMetrics() map[string]interface{} {
    metrics := make(map[string]interface{})
    metrics["vector_count"] = len(idx.vectors)
    metrics["metadata_count"] = len(idx.metadata)
    metrics["cache_size"] = idx.cacheSize()
    // ... more metrics ...
    return metrics
}

Design Decisions¶

Why HNSW?¶

HNSW was chosen for Quiver because:

Performance: One of the fastest ANN algorithms available
Memory Efficiency: Good balance of memory usage and search speed
Quality: High recall even with approximate search
Incremental Updates: Supports adding vectors without rebuilding the index

Why DuckDB?¶

DuckDB was chosen for metadata storage because:

Performance: Extremely fast analytical database
Embedded: No separate server process required
SQL Interface: Familiar and powerful query language
Arrow Integration: Efficient data transfer with Arrow

Why Go?¶

Go was chosen as the implementation language because:

Performance: Compiled language with good performance
Concurrency: Excellent support for concurrent programming
Simplicity: Clean and straightforward language design
Tooling: Great tooling for testing, profiling, and deployment

Why PCA for Dimensionality Reduction?¶

PCA was chosen as the initial dimensionality reduction method because:

Efficiency: Computationally efficient compared to other methods
Deterministic: Produces consistent results for the same input
Variance Preservation: Maximizes the variance retained in the reduced dimensions
Simplicity: Conceptually simple and well-understood algorithm
Adaptability: Works well with the adaptive dimensionality approach

Why Semantic Routing?¶

Semantic routing was implemented to address the challenges of scaling vector search:

Scalability: Enables efficient search across multiple indices
Performance: Reduces the search space by targeting only relevant indices
Flexibility: Allows for specialized indices for different types of data
Resource Efficiency: Optimizes resource usage by avoiding unnecessary searches

Next Steps¶

Now that you understand Quiver's architecture, check out:

HNSW Algorithm - Learn more about how HNSW works
DuckDB Integration - Understand how Quiver uses DuckDB
Performance Tuning - Optimize Quiver for your needs
Dimensionality Reduction - Learn about reducing vector dimensions
Semantic Routing - Understand how query routing works