Architecture¶
Ever wonder what makes Quiver tick? Let's pop the hood and take a look at the inner workings of this blazing-fast vector database! This guide will walk you through Quiver's architecture, components, and design decisions. Time to get technical! 🔧
High-Level Overview¶
Quiver is designed with a focus on performance, simplicity, and reliability. Here's a bird's-eye view of the architecture:
Core Components¶
HNSW Graph¶
The heart of Quiver is the Hierarchical Navigable Small World (HNSW) graph, which enables lightning-fast approximate nearest neighbor search:
- Implementation: Quiver uses a highly optimized HNSW implementation
- Parameters: Configurable
M
(max connections),efConstruction
(build quality), andefSearch
(search quality) - Distance Metrics: Supports Cosine and L2 (Euclidean) distance
Vector Storage¶
Vectors are stored in memory for fast access during search operations:
- In-memory Map: Vectors are stored in a map with vector IDs as keys
- Float32 Precision: Vectors use 32-bit floats for a good balance of precision and memory usage
- Concurrent Access: Protected by read-write locks for thread safety
Metadata Store¶
Metadata is stored in DuckDB, a fast analytical database:
- SQL Interface: Enables complex filtering and aggregation
- Arrow Integration: Efficient data transfer with Apache Arrow
- Schema Flexibility: Supports various data types for metadata fields
Persistence & Backup¶
Quiver ensures data durability through:
- Periodic Persistence: Configurable intervals for writing to disk
- Incremental Backups: Only changed data is backed up
- Compression: Optional compression for backups
- Encryption: Optional encryption for sensitive data
type Index struct {
// ...
persistInterval time.Duration
persistTicker *time.Ticker
persistDone chan struct{}
// ...
}
Dimensionality Reduction¶
Quiver includes built-in dimensionality reduction capabilities to optimize storage and search performance:
- PCA Implementation: Principal Component Analysis for efficient dimension reduction
- Adaptive Reduction: Automatically determines optimal dimensions based on data variance
- Inline Processing: Reduction happens during insertion and query operations
- Explicit API: Allows reducing vectors without adding them to the index
type Index struct {
// ...
dimReduction bool
dimReductionMethod string
dimReductionTarget int
dimReductionAdaptive bool
dimReductionVariance float64
// ...
}
Semantic Routing¶
Semantic routing enables intelligent query routing to the most relevant sub-indices:
- Multi-Index Support: Route queries to the most appropriate indices
- Routing Models: Uses lightweight models to determine query destinations
- Parallel Search: Execute searches in parallel across multiple indices
- Result Aggregation: Combine and rank results from multiple indices
Data Flow¶
Adding Vectors¶
When you add a vector to Quiver, here's what happens:
- Validation: Vector dimension and metadata are validated
- Dimensionality Reduction: If enabled, vector dimensions are reduced using the configured method
- Batching: Vector is added to a batch buffer
- HNSW Insertion: When the batch is full, vectors are added to the HNSW graph
- Metadata Storage: Metadata is stored in DuckDB
- Persistence: Data is periodically persisted to disk
func (idx *Index) Add(id uint64, vector []float32, meta map[string]interface{}) error {
// Validation
if len(vector) != idx.config.Dimension {
return fmt.Errorf("vector dimension mismatch: expected %d, got %d", idx.config.Dimension, len(vector))
}
// Dimensionality reduction if enabled
if idx.config.EnableDimReduction {
vector = idx.reduceVector(vector)
}
// Add to batch buffer
idx.batchLock.Lock()
idx.batchBuffer = append(idx.batchBuffer, vectorMeta{id, vector, meta})
idx.batchLock.Unlock()
// Flush if batch is full
if len(idx.batchBuffer) >= idx.config.BatchSize {
return idx.flushBatch()
}
return nil
}
Searching Vectors¶
When you search for similar vectors:
- Query Preparation: Query vector is validated
- Dimensionality Reduction: If enabled, query dimensions are reduced using the same method as during insertion
- HNSW Search: HNSW graph is traversed to find nearest neighbors
- Distance Calculation: Distances are calculated using the configured metric
- Metadata Retrieval: Metadata for result vectors is retrieved from DuckDB
- Result Formatting: Results are formatted and returned
func (idx *Index) Search(query []float32, k, page, pageSize int) ([]SearchResult, error) {
// Validation
if len(query) != idx.config.Dimension {
return nil, fmt.Errorf("query dimension mismatch: expected %d, got %d", idx.config.Dimension, len(query))
}
// Dimensionality reduction if enabled
if idx.config.EnableDimReduction {
query = idx.reduceVector(query)
}
// HNSW search
idx.lock.RLock()
neighbors, err := idx.hnsw.Search(query, k, idx.config.HNSWEfSearch)
idx.lock.RUnlock()
// Format results
results := make([]SearchResult, len(neighbors))
for i, n := range neighbors {
results[i] = SearchResult{
ID: n.ID,
Distance: n.Distance,
Metadata: idx.getMetadata(n.ID),
}
}
return results, nil
}
Hybrid Search¶
Quiver's hybrid search combines vector similarity with metadata filtering:
- Strategy Selection: Choose between filter-then-search or search-then-filter
- SQL Query: Execute SQL query to filter metadata
- Vector Search: Search for similar vectors among filtered results
- Result Combination: Combine vector similarity with metadata filtering
func (idx *Index) SearchWithFilter(query []float32, k int, filter string) ([]SearchResult, error) {
// Execute SQL query to get filtered IDs
sql := fmt.Sprintf("SELECT id FROM metadata WHERE %s", filter)
reader, stmt, _, err := idx.dbConn.Query(context.Background(), sql)
// Get filtered IDs
filteredIDs := make([]uint64, 0)
// ... process results ...
// Search among filtered IDs
results, err := idx.searchSubset(query, k, filteredIDs)
return results, nil
}
Semantic Routing Flow¶
When using semantic routing for search across multiple indices:
- Query Routing: The routing model determines which indices are most relevant for the query
- Parallel Search: Searches are executed in parallel across the selected indices
- Result Aggregation: Results from all indices are combined and ranked
- Response: The final ranked results are returned to the client
func (router *Router) Search(query []float32, k int, options SearchOptions) ([]SearchResult, error) {
// Determine target indices
targetIndices := router.routingModel.DetermineTargetIndices(query)
// Execute parallel searches
var wg sync.WaitGroup
resultChan := make(chan []SearchResult, len(targetIndices))
errorChan := make(chan error, len(targetIndices))
for _, indexName := range targetIndices {
wg.Add(1)
go func(name string) {
defer wg.Done()
idx := router.indices[name]
results, err := idx.Search(query, k, options)
if err != nil {
errorChan <- err
return
}
resultChan <- results
}(indexName)
}
// Wait for all searches to complete
wg.Wait()
close(resultChan)
close(errorChan)
// Check for errors
if len(errorChan) > 0 {
return nil, <-errorChan
}
// Combine and rank results
allResults := make([]SearchResult, 0)
for results := range resultChan {
allResults = append(allResults, results...)
}
// Sort by distance
sort.Slice(allResults, func(i, j int) bool {
return allResults[i].Distance < allResults[j].Distance
})
// Limit to k results
if len(allResults) > k {
allResults = allResults[:k]
}
return allResults, nil
}
Concurrency Model¶
Quiver uses a combination of locks and channels to manage concurrency:
- Read-Write Locks: Protect shared data structures during concurrent operations
- Batch Processing: Asynchronous batch processing for vector additions
- Background Workers: Separate goroutines for persistence and backup
- Connection Pool: Managed connections to DuckDB
type Index struct {
// ...
lock sync.RWMutex
batchLock sync.Mutex
cache sync.Map
batchTicker *time.Ticker
batchDone chan struct{}
persistTicker *time.Ticker
persistDone chan struct{}
backupTicker *time.Ticker
backupDone chan struct{}
bgWG sync.WaitGroup
// ...
}
DuckDB Integration¶
Quiver uses DuckDB for metadata storage and querying:
- ADBC Interface: Uses Apache Arrow Database Connectivity for communication
- Connection Management: Maintains a pool of connections
- Arrow Record Batches: Efficient data transfer with Arrow record batches
- SQL Queries: Executes SQL queries for metadata filtering
type DuckDB struct {
mu sync.Mutex
db adbc.Database
driver adbc.Driver
opts DuckDBOptions
conns []*DuckDBConn
}
type DuckDBConn struct {
parent *DuckDB
conn adbc.Connection
}
HTTP API¶
Quiver provides a RESTful HTTP API for remote access:
- Fiber Framework: Uses the high-performance Fiber web framework
- JSON Interface: Standard JSON request/response format
- Authentication: API key authentication for security
- Middleware: Logging, error handling, and rate limiting middleware
- Health Checks: Endpoints for monitoring and health checks
Performance Optimizations¶
Quiver includes several optimizations for maximum performance:
- Batch Processing: Vectors are added in batches for better throughput
- Metadata Caching: Frequently accessed metadata is cached in memory
- Connection Pooling: DuckDB connections are pooled and reused
- Concurrent Reads: Multiple read operations can execute in parallel
- SIMD Instructions: Distance calculations use SIMD instructions when available
- Memory Mapping: Efficient memory mapping for large indices
- Dimensionality Reduction: Reduces vector dimensions to improve search speed and memory usage
- Adaptive Reduction: Automatically determines optimal dimensions based on data variance
- Semantic Routing: Routes queries to the most relevant indices to reduce search space
- Parallel Search: Executes searches in parallel across multiple indices
Security Features¶
Quiver includes several security features:
- API Key Authentication: Secure access to the HTTP API
- Encryption at Rest: Optional encryption for stored data
- TLS Support: HTTPS support for secure communication
- Input Validation: Thorough validation of all inputs
- Error Handling: Secure error handling that doesn't leak sensitive information
Configuration System¶
Quiver's behavior is controlled through a comprehensive configuration system:
type Config struct {
Dimension int
StoragePath string
Distance DistanceMetric
MaxElements uint64
HNSWM int
HNSWEfConstruct int
HNSWEfSearch int
BatchSize int
PersistInterval time.Duration
BackupInterval time.Duration
BackupPath string
BackupCompression bool
MaxBackups int
EncryptionEnabled bool
EncryptionKey string
EnableDimReduction bool
DimReductionMethod string
DimReductionTarget int
DimReductionAdaptive bool
DimReductionVariance float64
EnableSemanticRouting bool
RoutingModel string
RoutingThreshold float64
}
Logging and Monitoring¶
Quiver uses structured logging and exposes metrics for monitoring:
- Zap Logger: High-performance structured logging
- Metrics Collection: Collects and exposes performance metrics
- Query Logging: Optional logging of queries for debugging
- Health Checks: Endpoints for monitoring system health
func (idx *Index) CollectMetrics() map[string]interface{} {
metrics := make(map[string]interface{})
metrics["vector_count"] = len(idx.vectors)
metrics["metadata_count"] = len(idx.metadata)
metrics["cache_size"] = idx.cacheSize()
// ... more metrics ...
return metrics
}
Design Decisions¶
Why HNSW?¶
HNSW was chosen for Quiver because:
- Performance: One of the fastest ANN algorithms available
- Memory Efficiency: Good balance of memory usage and search speed
- Quality: High recall even with approximate search
- Incremental Updates: Supports adding vectors without rebuilding the index
Why DuckDB?¶
DuckDB was chosen for metadata storage because:
- Performance: Extremely fast analytical database
- Embedded: No separate server process required
- SQL Interface: Familiar and powerful query language
- Arrow Integration: Efficient data transfer with Arrow
Why Go?¶
Go was chosen as the implementation language because:
- Performance: Compiled language with good performance
- Concurrency: Excellent support for concurrent programming
- Simplicity: Clean and straightforward language design
- Tooling: Great tooling for testing, profiling, and deployment
Why PCA for Dimensionality Reduction?¶
PCA was chosen as the initial dimensionality reduction method because:
- Efficiency: Computationally efficient compared to other methods
- Deterministic: Produces consistent results for the same input
- Variance Preservation: Maximizes the variance retained in the reduced dimensions
- Simplicity: Conceptually simple and well-understood algorithm
- Adaptability: Works well with the adaptive dimensionality approach
Why Semantic Routing?¶
Semantic routing was implemented to address the challenges of scaling vector search:
- Scalability: Enables efficient search across multiple indices
- Performance: Reduces the search space by targeting only relevant indices
- Flexibility: Allows for specialized indices for different types of data
- Resource Efficiency: Optimizes resource usage by avoiding unnecessary searches
Next Steps¶
Now that you understand Quiver's architecture, check out:
- HNSW Algorithm - Learn more about how HNSW works
- DuckDB Integration - Understand how Quiver uses DuckDB
- Performance Tuning - Optimize Quiver for your needs
- Dimensionality Reduction - Learn about reducing vector dimensions
- Semantic Routing - Understand how query routing works