Metadata & Filtering

One of Quiver's superpowers is its ability to combine vector search with rich metadata filtering. Let's dive into how you can use metadata to supercharge your vector searches! 🔍

Metadata Basics

What is Metadata?

In Quiver, metadata is structured information attached to each vector. It's stored as a map of string keys to arbitrary values:

metadata := map[string]interface{}{
    "category": "science",
    "name": "black hole",
    "tags": []string{"astronomy", "physics"},
    "created_at": time.Now().Unix(),
    "rating": 4.8,
    "is_featured": true,
}

Metadata can include:

Strings
Numbers
Booleans
Arrays
Nested objects

Adding Metadata

You add metadata when you add a vector:

idx.Add(1, vector, metadata)

Required Fields

Quiver requires at least a "category" field in the metadata. This helps with organization and filtering.

Retrieving Metadata

When you search for vectors, the results include the metadata:

results, err := idx.Search(queryVector, 10, 1, 10)
if err != nil {
    log.Fatal(err)
}
for _, result := range results {
    // Metadata values are interface{}, so print them with %v.
    fmt.Printf("ID: %d, Name: %v, Category: %v\n",
        result.ID,
        result.Metadata["name"],
        result.Metadata["category"])
}

Metadata Storage

Behind the scenes, Quiver uses DuckDB to store and query metadata. This gives you the power of SQL for filtering and organizing your vectors.

Schema

Metadata is stored in a table with the following schema:

CREATE TABLE metadata (
    id BIGINT PRIMARY KEY,
    json JSON
)

id: The vector ID
json: The metadata as a JSON object

This schema allows for flexible metadata while still enabling efficient queries.
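To make the json column more concrete, here is a minimal sketch of a metadata map being serialized and read back with Go's standard encoding/json package. Whether Quiver uses this exact encoder internally is an assumption, and encodeMetadata and decodeMetadata are hypothetical helper names. The sketch mainly shows a property of JSON itself: after a round trip into a generic map, arrays come back as []interface{} and numbers as float64, which matters when you type-assert metadata values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeMetadata serializes a metadata map to JSON text, the kind of value
// a column like `json` could hold. Hypothetical helper, not part of Quiver.
func encodeMetadata(meta map[string]interface{}) (string, error) {
	b, err := json.Marshal(meta) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeMetadata parses JSON text back into a generic metadata map.
func decodeMetadata(s string) (map[string]interface{}, error) {
	var meta map[string]interface{}
	if err := json.Unmarshal([]byte(s), &meta); err != nil {
		return nil, err
	}
	return meta, nil
}

func main() {
	meta := map[string]interface{}{
		"category":    "science",
		"tags":        []string{"astronomy", "physics"},
		"rating":      4.8,
		"is_featured": true,
	}
	s, err := encodeMetadata(meta)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)

	decoded, err := decodeMetadata(s)
	if err != nil {
		panic(err)
	}
	// Print the dynamic types of two fields after the round trip.
	fmt.Printf("%T %T\n", decoded["tags"], decoded["rating"])
}
```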

Filtering with SQL

Basic Filtering

The simplest way to filter is with the SearchWithFilter method:

results, _ := idx.SearchWithFilter(queryVector, 10, 
    "category = 'science'")

The filter is the body of a SQL WHERE clause (without the WHERE keyword), evaluated against the stored metadata.
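Because the filter is a plain SQL string, values that come from user input should be escaped before they are spliced in. The sketch below uses the standard SQL rule of doubling single quotes inside a string literal. quoteSQLString and categoryFilter are hypothetical helpers, not part of Quiver, and the sketch assumes the filter is handed to the database as written.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteSQLString wraps s in single quotes and doubles any embedded single
// quote, the standard SQL way to escape a string literal.
// Hypothetical helper, not part of Quiver.
func quoteSQLString(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// categoryFilter builds a filter fragment for an exact category match.
func categoryFilter(category string) string {
	return "category = " + quoteSQLString(category)
}

func main() {
	fmt.Println(categoryFilter("science"))
	fmt.Println(categoryFilter("children's books"))
}
```

The resulting string can then be passed as the filter argument, for example idx.SearchWithFilter(queryVector, 10, categoryFilter(userInput)).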

Advanced Filtering

You can use any SQL expression that DuckDB supports:

// Complex filter with multiple conditions
filter := `
    category = 'science' 
    AND json_contains(tags, '"physics"')
    AND rating > 4.0 
    AND created_at > 1609459200
`
results, _ := idx.SearchWithFilter(queryVector, 10, filter)
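Hard-coded timestamps such as 1609459200 (midnight UTC on 1 January 2021) are easy to get wrong, so it can help to compute them and assemble the conditions in code. The following is a minimal sketch with two hypothetical helpers, createdAfterUTC and allOf. It assumes created_at holds Unix seconds, as in the metadata example earlier on this page.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// createdAfterUTC returns a condition matching records whose created_at
// (Unix seconds) is later than midnight UTC on the given date.
// Hypothetical helper, not part of Quiver.
func createdAfterUTC(year int, month time.Month, day int) string {
	cutoff := time.Date(year, month, day, 0, 0, 0, 0, time.UTC)
	return fmt.Sprintf("created_at > %d", cutoff.Unix())
}

// allOf joins conditions with AND, parenthesizing each one so operator
// precedence inside a condition cannot affect its neighbors.
func allOf(conds ...string) string {
	parts := make([]string, len(conds))
	for i, c := range conds {
		parts[i] = "(" + c + ")"
	}
	return strings.Join(parts, " AND ")
}

func main() {
	filter := allOf(
		"category = 'science'",
		"rating > 4.0",
		createdAfterUTC(2021, time.January, 1),
	)
	fmt.Println(filter)
}
```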

JSON Functions

DuckDB provides powerful JSON functions for querying nested data:

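// Query nested properties. json_extract_string returns the value as plain
// text, so it compares cleanly with a SQL string literal (json_extract
// returns JSON, where strings keep their double quotes).
filter := `json_extract_string(json, '$.details.publisher') = 'Nature'`
results, _ := idx.SearchWithFilter(queryVector, 10, filter)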

Array Operations

You can query arrays in metadata:

// Check if an array contains a value
filter := `json_contains(tags, '"quantum"')`
results, _ := idx.SearchWithFilter(queryVector, 10, filter)

// Check array length (plain assignment, since both variables are already declared above)
filter = `json_array_length(tags) > 3`
results, _ = idx.SearchWithFilter(queryVector, 10, filter)

Direct Metadata Queries

Querying Without Vectors

You can query metadata directly without vector search:

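// Query metadata using SQL. Per the schema above, fields live inside the
// json column, so extract them rather than naming them as columns.
results, _ := idx.QueryMetadata(
    "SELECT * FROM metadata " +
    "WHERE json_extract_string(json, '$.category') = 'science' " +
    "ORDER BY CAST(json_extract_string(json, '$.created_at') AS BIGINT) DESC LIMIT 10")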

This returns a slice of maps, one per result row, keyed by column name.

Aggregations and Analytics

You can use SQL aggregations for analytics:

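// Count vectors by category. json_extract_string yields the category as
// plain text rather than as a JSON-quoted value.
counts, _ := idx.QueryMetadata(
    "SELECT json_extract_string(json, '$.category') AS category, COUNT(*) AS count " +
    "FROM metadata GROUP BY category ORDER BY count DESC")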

Hybrid Search Strategies

Two-Stage Search

For large datasets, a two-stage search can be more efficient:

First, filter the metadata to get a subset of IDs
Then, perform vector search only on those IDs

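// Step 1: Get IDs matching the filter (fields are read from the json column)
idResults, err := idx.QueryMetadata(
    "SELECT id FROM metadata " +
    "WHERE json_extract_string(json, '$.category') = 'science' " +
    "AND CAST(json_extract_string(json, '$.rating') AS DOUBLE) > 4.5")
if err != nil {
    log.Fatal(err)
}
// Extract IDs. A BIGINT column usually arrives in Go as int64, so accept
// both integer types instead of asserting uint64 (which would panic).
ids := make([]uint64, 0, len(idResults))
for _, row := range idResults {
    switch v := row["id"].(type) {
    case int64:
        ids = append(ids, uint64(v))
    case uint64:
        ids = append(ids, v)
    }
}

// Step 2: Search only those IDs
results, err := idx.SearchSubset(queryVector, ids, 10)
if err != nil {
    log.Fatal(err)
}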

Faceted Search

Faceted search allows filtering by specific metadata facets:

// Search with facets
results, _ := idx.FacetedSearch(queryVector, 10, map[string]string{
    "category": "science",
    "year": "2023",
})
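One way to read a facet map is as a set of exact-match conditions that must all hold. The sketch below renders facets as an equivalent SQL filter over the json column. It illustrates the idea only and is not a description of how FacetedSearch is implemented. facetsToFilter and isSimpleKey are hypothetical helpers, and the exact-match semantics are an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isSimpleKey reports whether k contains only letters, digits, and
// underscores, so it is safe to embed in a JSON path.
func isSimpleKey(k string) bool {
	if k == "" {
		return false
	}
	for _, r := range k {
		ok := r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
		if !ok {
			return false
		}
	}
	return true
}

// facetsToFilter renders facets as an AND of exact-match conditions.
// Keys are sorted so the output is deterministic.
// Hypothetical helper, not part of Quiver.
func facetsToFilter(facets map[string]string) (string, error) {
	keys := make([]string, 0, len(facets))
	for k := range facets {
		if !isSimpleKey(k) {
			return "", fmt.Errorf("unsafe facet key %q", k)
		}
		keys = append(keys, k)
	}
	sort.Strings(keys)

	conds := make([]string, 0, len(keys))
	for _, k := range keys {
		// Double single quotes so a value cannot break out of its literal.
		v := strings.ReplaceAll(facets[k], "'", "''")
		conds = append(conds, fmt.Sprintf("json_extract_string(json, '$.%s') = '%s'", k, v))
	}
	return strings.Join(conds, " AND "), nil
}

func main() {
	filter, err := facetsToFilter(map[string]string{"category": "science", "year": "2023"})
	if err != nil {
		panic(err)
	}
	fmt.Println(filter)
}
```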

Best Practices

Metadata Design

Keep metadata fields consistent across vectors
Use a schema validation function for consistency
Consider indexing frequently queried fields
Use meaningful categories for organization
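A schema validation function can be as small as a map from field names to expected kinds, checked before each vector is added. The sketch below is one possible shape. The schema type, the kind constants, and validate are all hypothetical names, and the only rule taken from this page is that "category" must be present.

```go
package main

import (
	"errors"
	"fmt"
)

// fieldKind names the broad type a metadata field is expected to hold.
type fieldKind int

const (
	kindString fieldKind = iota
	kindNumber
	kindBool
)

// schema maps field names to their expected kinds. Hypothetical type.
type schema map[string]fieldKind

// matches reports whether v has a Go type compatible with kind k.
func matches(v interface{}, k fieldKind) bool {
	switch k {
	case kindString:
		_, ok := v.(string)
		return ok
	case kindNumber:
		switch v.(type) {
		case int, int64, float64:
			return true
		}
		return false
	case kindBool:
		_, ok := v.(bool)
		return ok
	}
	return false
}

// validate checks that "category" is a non-empty string and that every
// field listed in the schema, when present, has the expected kind.
func validate(meta map[string]interface{}, s schema) error {
	cat, ok := meta["category"].(string)
	if !ok || cat == "" {
		return errors.New(`metadata must include a non-empty "category" string`)
	}
	for name, kind := range s {
		if v, present := meta[name]; present && !matches(v, kind) {
			return fmt.Errorf("field %q has unexpected type %T", name, v)
		}
	}
	return nil
}

func main() {
	docSchema := schema{"title": kindString, "word_count": kindNumber}
	meta := map[string]interface{}{
		"category":   "document",
		"title":      "Quantum Computing Basics",
		"word_count": 2500,
	}
	if err := validate(meta, docSchema); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("metadata accepted")
}
```

Calling validate just before idx.Add keeps malformed records out of the index, which in turn keeps filters predictable.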

Performance Tips

Avoid overly complex SQL queries on large datasets
Use two-stage search for highly selective filters
Cache common query results if possible
Consider denormalizing data for faster queries
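Caching common query results can be sketched as a wrapper around any function that takes SQL and returns rows. The example below keeps results in memory for a fixed time-to-live, keyed by the SQL text. cachedQuery and queryFunc are hypothetical names, and the function signature is an assumption modeled on the QueryMetadata calls shown earlier. Cached rows go stale when vectors are added or removed, so the TTL should be short for data that changes often.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// queryFunc is the assumed shape of a metadata query call: SQL in, rows out.
type queryFunc func(sql string) ([]map[string]interface{}, error)

// cacheEntry holds cached rows and the time at which they expire.
type cacheEntry struct {
	rows    []map[string]interface{}
	expires time.Time
}

// cachedQuery wraps q with an in-memory cache keyed by the SQL text.
// Entries older than ttl are refreshed on the next call. The lock is held
// while q runs, which keeps the sketch simple but serializes queries.
// Hypothetical helper, not part of Quiver.
func cachedQuery(q queryFunc, ttl time.Duration) queryFunc {
	var mu sync.Mutex
	cache := make(map[string]cacheEntry)
	return func(sql string) ([]map[string]interface{}, error) {
		mu.Lock()
		defer mu.Unlock()
		if e, ok := cache[sql]; ok && time.Now().Before(e.expires) {
			return e.rows, nil
		}
		rows, err := q(sql)
		if err != nil {
			return nil, err // errors are not cached
		}
		cache[sql] = cacheEntry{rows: rows, expires: time.Now().Add(ttl)}
		return rows, nil
	}
}

func main() {
	calls := 0
	// fakeQuery stands in for a real metadata query and counts invocations.
	fakeQuery := func(sql string) ([]map[string]interface{}, error) {
		calls++
		return []map[string]interface{}{{"count": int64(42)}}, nil
	}
	query := cachedQuery(fakeQuery, time.Minute)
	query("SELECT COUNT(*) AS count FROM metadata")
	query("SELECT COUNT(*) AS count FROM metadata") // served from the cache
	fmt.Println("underlying calls:", calls)
}
```

If QueryMetadata has the signature assumed here, wrapping it would look like cachedQuery(idx.QueryMetadata, time.Minute).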

Example Metadata Schemas

Document Vectors:

metadata := map[string]interface{}{
    "category": "document",
    "title": "Quantum Computing Basics",
    "author": "Jane Smith",
    "tags": []string{"quantum", "computing", "tutorial"},
    "created_at": time.Now().Unix(),
    "word_count": 2500,
}

Product Vectors:

metadata := map[string]interface{}{
    "category": "product",
    "name": "Wireless Headphones",
    "brand": "AudioTech",
    "price": 99.99,
    "colors": []string{"black", "white", "blue"},
    "in_stock": true,
    "ratings": map[string]interface{}{
        "average": 4.7,
        "count": 253,
    },
}

Next Steps

Now that you've mastered metadata and filtering, check out:

Persistence & Backup - Keep your data safe
Security - Secure your vector database
HTTP API - Use Quiver as a service