Dimensionality Reduction

Quiver includes built-in dimensionality reduction that can reduce storage and memory requirements and speed up search while largely preserving search quality.

Overview

Dimensionality reduction reduces the number of features in a dataset while preserving as much of its information as possible. In a vector database like Quiver, this means storing and searching vectors with fewer dimensions without significantly degrading search quality.

Benefits of dimensionality reduction include:

- Reduced storage requirements: smaller vectors take less storage space.
- Faster search: the cost of each distance computation grows with the number of dimensions, so searching in a lower-dimensional space is faster.
- Reduced memory usage: lower-dimensional vectors consume less memory.
- Potential noise reduction: removing less significant dimensions can reduce noise.
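For a rough sense of the storage saving, the sketch below computes the raw payload of one million vectors before and after halving the dimension from 1536 to 768. It assumes each component is stored as a 4-byte float32 and ignores index overhead; this page does not specify Quiver's storage format, and `raw_vector_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4  # assumption: components stored as 32-bit floats


def raw_vector_bytes(num_vectors: int, dimension: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of the vector payload in bytes, excluding any index structures."""
    return num_vectors * dimension * bytes_per_value


num_vectors = 1_000_000
original = raw_vector_bytes(num_vectors, 1536)
reduced = raw_vector_bytes(num_vectors, 768)
print(f"original: {original / 1e9:.2f} GB, reduced: {reduced / 1e9:.2f} GB")
# prints "original: 6.14 GB, reduced: 3.07 GB"
```

Because brute-force distance computations also scale linearly with dimension, the same factor of two roughly applies to the arithmetic per comparison.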

How It Works in Quiver

Quiver's dimensionality reduction happens at two key points:

- At insertion time: when vectors are added to the index, they are automatically reduced to the target dimension before storage.
- At query time: when searching, query vectors are reduced using the same method, so queries and stored vectors are compared in the same space.

There is no separate background process: the reduction happens inline during these operations.
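The principle can be illustrated with a small NumPy sketch using PCA as the method: a projection is learned once, and the same mean and projection matrix are applied both when vectors are added and when the index is queried. `ReducedIndexSketch` and its explicit `fit` step are hypothetical; this page does not describe when or on which data Quiver fits its reduction model, and the brute-force Euclidean search stands in for Quiver's real index.

```python
import numpy as np


class ReducedIndexSketch:
    """Hypothetical brute-force index that applies PCA inline (not Quiver's API)."""

    def __init__(self, target_dim: int):
        self.target_dim = target_dim
        self.mean = None         # per-feature mean learned in fit()
        self.components = None   # (target_dim, original_dim) projection matrix
        self.ids = []
        self.stored = []         # reduced vectors, one per id

    def fit(self, sample: np.ndarray) -> None:
        # Assumption: the projection is learned once from a representative sample.
        self.mean = sample.mean(axis=0)
        _, _, vt = np.linalg.svd(sample - self.mean, full_matrices=False)
        self.components = vt[: self.target_dim]  # top principal directions

    def _reduce(self, vectors: np.ndarray) -> np.ndarray:
        # The same mean and projection are used at insertion and at query time.
        return (vectors - self.mean) @ self.components.T

    def add(self, vector_id: int, vector: np.ndarray) -> None:
        # Insertion time: reduce before storing.
        self.ids.append(vector_id)
        self.stored.append(self._reduce(vector[np.newaxis, :])[0])

    def search(self, query: np.ndarray, k: int = 5) -> list:
        # Query time: reduce the query with the same projection, then compare.
        reduced_query = self._reduce(query[np.newaxis, :])[0]
        distances = np.linalg.norm(np.stack(self.stored) - reduced_query, axis=1)
        return [self.ids[i] for i in np.argsort(distances)[:k]]


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 64))
index = ReducedIndexSketch(target_dim=16)
index.fit(data)
for vector_id, vector in enumerate(data):
    index.add(vector_id, vector)
print(index.search(data[3], k=1))  # a stored vector is its own nearest neighbor: [3]
```

Both paths go through `_reduce`; reducing queries with a different projection, or not at all, would compare vectors in incompatible coordinate systems.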

Available Methods

Quiver supports the following dimensionality reduction methods; only PCA is available today:

- PCA (Principal Component Analysis): a linear technique that identifies the directions (principal components) along which the data has the most variance.
- t-SNE (coming soon): a non-linear technique that is particularly good at preserving local structure.
- UMAP (coming soon): a non-linear technique that can preserve both local and global structure.
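To make the PCA description concrete, here is a minimal NumPy implementation via singular value decomposition. It is a generic textbook PCA, not Quiver's internal code: the rows of `vt` are the principal directions, sorted by the variance they capture, and the squared singular values give the variance along each direction.

```python
import numpy as np


def pca(data: np.ndarray, n_components: int):
    """Return (mean, components, explained_variance_ratio) for the top n_components."""
    mean = data.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions,
    # sorted by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(data - mean, full_matrices=False)
    # Variance along each direction is proportional to its squared singular value.
    variance = singular_values ** 2
    ratio = variance / variance.sum()
    return mean, vt[:n_components], ratio[:n_components]


rng = np.random.default_rng(42)
# Synthetic 3-D data with most of its variance along the first axis.
data = rng.normal(size=(500, 3)) * np.array([10.0, 1.0, 0.1])
mean, components, ratio = pca(data, n_components=2)
reduced = (data - mean) @ components.T   # project onto the top 2 directions
print(reduced.shape, ratio.round(3))
```

The first component here should align closely with the first axis, where the synthetic data is stretched the most.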

Configuration

To enable dimensionality reduction, configure it when you create the index:

```python
from quiver import Quiver

# Create a Quiver index with dimensionality reduction
index = Quiver(
    dimension=1536,               # original dimension of incoming vectors
    enable_dim_reduction=True,
    dim_reduction_method="PCA",
    dim_reduction_target=768,     # dimension vectors are reduced to
)

# Add vectors as usual; they are reduced to 768 dimensions automatically
vector = [0.1] * 1536  # placeholder: use a real 1536-dimensional embedding
index.add(id=1, vector=vector)
```

Configuration Options

| Option | Description | Default |
| --- | --- | --- |
| `enable_dim_reduction` | Whether to enable dimensionality reduction | `False` |
| `dim_reduction_method` | Method to use: `PCA` (`TSNE` and `UMAP` are coming soon) | `PCA` |
| `dim_reduction_target` | Target dimension | `dimension / 2` |
| `dim_reduction_adaptive` | Whether to choose the target dimension adaptively | `False` |
| `dim_reduction_min_variance` | Minimum fraction of variance to preserve (0.0 to 1.0) when adaptive reduction is enabled | `0.95` |

Adaptive Dimensionality Reduction

Quiver supports adaptive dimensionality reduction, which automatically determines the number of dimensions to keep based on the data. When it is enabled, Quiver:

1. Analyzes the variance explained by each principal component.
2. Chooses the minimum number of components needed to explain at least the specified minimum variance (`dim_reduction_min_variance`).
3. Uses this number as the target dimension.

This is particularly useful when you don't know the optimal dimension in advance or when your data characteristics change over time.

```python
# Create a Quiver index with adaptive dimensionality reduction
index = Quiver(
    dimension=1536,
    enable_dim_reduction=True,
    dim_reduction_method="PCA",
    dim_reduction_adaptive=True,
    dim_reduction_min_variance=0.95,  # Preserve 95% of variance
)
```
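The adaptive rule can be sketched as a cumulative explained-variance threshold over PCA components, assuming that is how the three steps above are realized. `adaptive_target_dim` is a hypothetical helper written with NumPy, not part of Quiver's API.

```python
import numpy as np


def adaptive_target_dim(data: np.ndarray, min_variance: float = 0.95) -> int:
    """Smallest number of PCA components whose cumulative explained variance >= min_variance."""
    if not 0.0 < min_variance <= 1.0:
        raise ValueError("min_variance must be in (0.0, 1.0]")
    # Singular values of the centered data, sorted in decreasing order.
    singular_values = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    # First index where the cumulative ratio reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, min_variance)) + 1
    # Guard against floating-point round-off when min_variance is 1.0.
    return min(k, len(variance))


rng = np.random.default_rng(7)
# 50-dimensional vectors that vary along only 5 latent directions, plus small noise.
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 50))
data = latent @ mixing + 0.01 * rng.normal(size=(1000, 50))
print(adaptive_target_dim(data, min_variance=0.95))
```

On this synthetic data the chosen dimension is at most 5, because almost all of the variance lies in the five latent directions.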

Explicit Reduction API

You can also call Quiver's dimensionality reduction API directly, without storing the vectors:

```python
# Reduce vectors explicitly using the configured index
reduced_vectors = index.reduce_vectors([
    [0.1] * 1536,  # placeholder: original high-dimensional vectors
    [0.4] * 1536,
])
```

Best Practices

- Start with PCA: it is fast and works well for most use cases.
- Use adaptive reduction when you are unsure about the optimal dimension.
- Benchmark different settings: the optimal configuration depends on your specific data.
- Consider the trade-off: lower dimensions mean faster searches but potentially lower accuracy.
- Test with your actual queries: make sure the reduction does not hurt results for your specific use case.
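One way to benchmark settings is to compare the neighbors found in the reduced space with exact neighbors in the original space and report recall@k. The sketch below does this with NumPy on synthetic data, using brute-force Euclidean search and PCA as stand-ins for Quiver's own index and distance metric; the helper names are hypothetical, and in practice you would use a sample of your real vectors and queries.

```python
import numpy as np


def exact_top_k(database: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k nearest neighbors by Euclidean distance; returns row indices."""
    # Squared distances via ||q||^2 - 2 q.d + ||d||^2 for all query/database pairs.
    sq_dist = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ database.T
        + (database ** 2).sum(axis=1)[None, :]
    )
    return np.argsort(sq_dist, axis=1)[:, :k]


def recall_at_k(truth: np.ndarray, approx: np.ndarray) -> float:
    """Fraction of true neighbors that also appear in the approximate results."""
    hits = sum(len(set(t) & set(a)) for t, a in zip(truth, approx))
    return hits / truth.size


def fit_pca(data: np.ndarray, target_dim: int):
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:target_dim]


rng = np.random.default_rng(0)
# Synthetic 128-D data with 16 informative directions plus small noise.
mixing = rng.normal(size=(16, 128))
database = rng.normal(size=(2000, 16)) @ mixing + 0.1 * rng.normal(size=(2000, 128))
queries = rng.normal(size=(50, 16)) @ mixing + 0.1 * rng.normal(size=(50, 128))

truth = exact_top_k(database, queries, k=10)
results = {}
for target_dim in (4, 16, 64):
    mean, components = fit_pca(database, target_dim)
    reduced_db = (database - mean) @ components.T
    reduced_q = (queries - mean) @ components.T   # same projection for queries
    approx = exact_top_k(reduced_db, reduced_q, k=10)
    results[target_dim] = recall_at_k(truth, approx)
    print(f"target_dim={target_dim}: recall@10 = {results[target_dim]:.2f}")
```

Plotting recall against target dimension (and against search time) on your own data shows where the accuracy trade-off starts to bite.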