Metadata & Filtering

One of Quiver's superpowers is its ability to combine vector search with rich metadata filtering. Let's dive into how you can use metadata to supercharge your vector searches! 🔍

Metadata Basics

What is Metadata?

In Quiver, metadata is structured information attached to each vector. It's stored as a map of string keys to arbitrary values:

metadata := map[string]interface{}{
    "category": "science",
    "name": "black hole",
    "tags": []string{"astronomy", "physics"},
    "created_at": time.Now().Unix(),
    "rating": 4.8,
    "is_featured": true,
}

Metadata can include:

  • Strings
  • Numbers
  • Booleans
  • Arrays
  • Nested objects

Adding Metadata

You add metadata when you add a vector:

idx.Add(1, vector, metadata)

Required Fields

Quiver requires at least a "category" field in the metadata. This helps with organization and filtering.
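Because a missing "category" is a hard requirement, it can help to check for it before adding a vector. The sketch below is a hypothetical pre-flight helper, not part of Quiver's API; it also assumes the category should be a non-empty string, which the text above does not state.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
)

// validateCategory is a hypothetical helper (not a Quiver function).
// It rejects metadata whose "category" is missing, not a string, or blank.
// The "non-empty string" rule is an assumption made for this sketch.
func validateCategory(metadata map[string]interface{}) error {
    value, ok := metadata["category"]
    if !ok {
        return errors.New(`metadata is missing the required "category" field`)
    }
    category, ok := value.(string)
    if !ok {
        return fmt.Errorf(`"category" must be a string, got %T`, value)
    }
    if strings.TrimSpace(category) == "" {
        return errors.New(`"category" must not be empty`)
    }
    return nil
}

func main() {
    good := map[string]interface{}{"category": "science", "name": "black hole"}
    bad := map[string]interface{}{"name": "black hole"}

    fmt.Println(validateCategory(good)) // <nil>
    fmt.Println(validateCategory(bad))  // metadata is missing the required "category" field
}
```

Calling a check like this right before idx.Add keeps the failure close to the code that built the metadata.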

Retrieving Metadata

When you search for vectors, the results include the metadata:

results, _ := idx.Search(queryVector, 10, 1, 10)
for _, result := range results {
    fmt.Printf("ID: %d, Name: %s, Category: %s\n",
        result.ID, 
        result.Metadata["name"],
        result.Metadata["category"])
}
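Metadata values are interface{} values, so formatting a missing key with %s prints %!s(<nil>) rather than an empty string. A small accessor with a fallback avoids that. The helper name stringField is hypothetical, and the map below stands in for result.Metadata.

```go
package main

import "fmt"

// stringField is a hypothetical helper. It returns metadata[key] as a string,
// or fallback when the key is absent or holds a non-string value.
func stringField(metadata map[string]interface{}, key, fallback string) string {
    if s, ok := metadata[key].(string); ok {
        return s
    }
    return fallback
}

func main() {
    // Stand-in for result.Metadata from a search result.
    metadata := map[string]interface{}{"name": "black hole", "rating": 4.8}

    fmt.Println(stringField(metadata, "name", "(unnamed)"))        // black hole
    fmt.Println(stringField(metadata, "category", "(none)"))       // (none)
    fmt.Println(stringField(metadata, "rating", "(not a string)")) // (not a string)
}
```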

Metadata Storage

Behind the scenes, Quiver uses DuckDB to store and query metadata. This gives you the power of SQL for filtering and organizing your vectors.

Schema

Metadata is stored in a table with the following schema:

CREATE TABLE metadata (
    id BIGINT PRIMARY KEY,
    json JSON
)

  • id: The vector ID
  • json: The metadata as a JSON object

This schema allows for flexible metadata while still enabling efficient queries.
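Storing metadata as JSON has a practical consequence for Go callers: types can change on the way back out. The sketch below uses only the standard encoding/json package to show the effect. Whether Quiver decodes stored metadata in exactly this way is an assumption; the round-trip behaviour of encoding/json itself is not.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// roundTrip encodes metadata to JSON and decodes it again, imitating a value
// that has been written to and read back from a JSON column.
func roundTrip(metadata map[string]interface{}) (map[string]interface{}, error) {
    encoded, err := json.Marshal(metadata)
    if err != nil {
        return nil, err
    }
    var decoded map[string]interface{}
    if err := json.Unmarshal(encoded, &decoded); err != nil {
        return nil, err
    }
    return decoded, nil
}

func main() {
    decoded, err := roundTrip(map[string]interface{}{
        "category":   "science",
        "tags":       []string{"astronomy", "physics"},
        "word_count": 2500,
    })
    if err != nil {
        panic(err)
    }
    // Print the Go type each field has after the round trip.
    for _, key := range []string{"category", "tags", "word_count"} {
        fmt.Printf("%s: %T\n", key, decoded[key])
    }
}
```

After the round trip, the integer comes back as float64 and the []string comes back as []interface{}, so type assertions on retrieved metadata should target those types.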

Filtering with SQL

Basic Filtering

The simplest way to filter is with the SearchWithFilter method:

results, _ := idx.SearchWithFilter(queryVector, 10, 
    "category = 'science'")

The filter is the condition of a SQL WHERE clause, written without the WHERE keyword, and it is evaluated against the metadata.
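Since the filter is passed as a plain string, any value that comes from user input needs its single quotes escaped before it is spliced in. A minimal sketch, assuming you build filters by string concatenation; quoteLiteral is a hypothetical helper, and doubling the quote is the standard SQL escape.

```go
package main

import (
    "fmt"
    "strings"
)

// quoteLiteral is a hypothetical helper. It wraps value in single quotes and
// doubles any embedded single quote, the standard SQL string-literal escape.
func quoteLiteral(value string) string {
    return "'" + strings.ReplaceAll(value, "'", "''") + "'"
}

func main() {
    filter := "category = " + quoteLiteral("children's books")
    fmt.Println(filter) // category = 'children''s books'
}
```

This only covers string values. Field names should come from a fixed list in your own code rather than from user input.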

Advanced Filtering

You can use any SQL expression that DuckDB supports:

// Complex filter with multiple conditions
filter := `
    category = 'science' 
    AND json_contains(tags, '"physics"') 
    AND rating > 4.0 
    AND created_at > 1609459200
`
results, _ := idx.SearchWithFilter(queryVector, 10, filter)

JSON Functions

DuckDB provides powerful JSON functions for querying nested data:

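// Query nested properties. json_extract_string returns the bare text;
// json_extract would return the JSON value with its double quotes.
filter := `json_extract_string(json, '$.details.publisher') = 'Nature'`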
results, _ := idx.SearchWithFilter(queryVector, 10, filter)

Array Operations

You can query arrays in metadata:

// Check if an array contains a value (json_contains expects a JSON needle,
// so the string is wrapped in double quotes)
filter := `json_contains(tags, '"quantum"')`
results, _ := idx.SearchWithFilter(queryVector, 10, filter)

// Check array length
filter = `json_array_length(tags) > 3`
results, _ = idx.SearchWithFilter(queryVector, 10, filter)

Direct Metadata Queries

Querying Without Vectors

You can query metadata directly without vector search:

// Query metadata using SQL
results, _ := idx.QueryMetadata(
    "SELECT * FROM metadata WHERE category = 'science' ORDER BY created_at DESC LIMIT 10")

This returns a slice of metadata maps.

Aggregations and Analytics

You can use SQL aggregations for analytics:

// Count vectors by category
counts, _ := idx.QueryMetadata(
    "SELECT json_extract(json, '$.category') as category, COUNT(*) as count " +
    "FROM metadata GROUP BY category ORDER BY count DESC")

Hybrid Search Strategies

For large datasets, a two-stage search can be more efficient:

  1. First, filter the metadata to get a subset of IDs
  2. Then, perform vector search only on those IDs

// Step 1: Get IDs matching the filter
idResults, _ := idx.QueryMetadata(
    "SELECT id FROM metadata WHERE category = 'science' AND rating > 4.5")

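// Extract IDs. A bare result["id"].(uint64) panics if the value arrives as
// another numeric type (a BIGINT column may come back as int64, and
// JSON-decoded numbers as float64), so switch on the type instead.
ids := make([]uint64, 0, len(idResults))
for _, result := range idResults {
    switch id := result["id"].(type) {
    case uint64:
        ids = append(ids, id)
    case int64:
        ids = append(ids, uint64(id))
    case float64:
        ids = append(ids, uint64(id))
    }
}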

// Step 2: Search only those IDs
results, _ := idx.SearchSubset(queryVector, ids, 10)
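To make the two stages concrete, here is a self-contained sketch that does not use Quiver at all. It filters an in-memory slice by metadata, then runs a brute-force nearest-neighbour ranking over the survivors only. The item type, squaredL2, and twoStageSearch are names invented for this illustration, and a real index would use an approximate search instead of a full sort.

```go
package main

import (
    "fmt"
    "sort"
)

// item pairs a vector with two metadata fields; a stand-in for an indexed entry.
type item struct {
    id       uint64
    vector   []float32
    category string
    rating   float64
}

// squaredL2 returns the squared Euclidean distance between equal-length vectors.
func squaredL2(a, b []float32) float32 {
    var sum float32
    for i := range a {
        d := a[i] - b[i]
        sum += d * d
    }
    return sum
}

// twoStageSearch keeps the items accepted by keep, ranks only those by
// distance to query, and returns the IDs of the k closest.
func twoStageSearch(items []item, query []float32, k int, keep func(item) bool) []uint64 {
    // Stage 1: metadata filter.
    var candidates []item
    for _, it := range items {
        if keep(it) {
            candidates = append(candidates, it)
        }
    }
    // Stage 2: brute-force ranking over the filtered subset only.
    sort.Slice(candidates, func(i, j int) bool {
        return squaredL2(candidates[i].vector, query) < squaredL2(candidates[j].vector, query)
    })
    if len(candidates) > k {
        candidates = candidates[:k]
    }
    ids := make([]uint64, len(candidates))
    for i, c := range candidates {
        ids[i] = c.id
    }
    return ids
}

func main() {
    items := []item{
        {1, []float32{0, 0}, "science", 4.8},
        {2, []float32{0.1, 0}, "history", 4.9},
        {3, []float32{1, 1}, "science", 4.6},
        {4, []float32{0.2, 0.2}, "science", 3.0},
    }
    ids := twoStageSearch(items, []float32{0, 0}, 2, func(it item) bool {
        return it.category == "science" && it.rating > 4.5
    })
    fmt.Println(ids) // [1 3]
}
```

Item 2 is the second-closest vector overall but never reaches stage 2, which is the point: the distance work is spent only on entries that already match the metadata.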

Faceted search allows filtering by specific metadata facets:

// Search with facets
results, _ := idx.FacetedSearch(queryVector, 10, map[string]string{
    "category": "science",
    "year": "2023",
})
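One way to think about a facet map is as a set of equality conditions joined with AND. The sketch below builds such a condition over the json column using DuckDB's json_extract_string. It is an illustration of the idea, not a description of how FacetedSearch is implemented; facetsToFilter is a hypothetical helper, and facet keys are assumed to be trusted field names.

```go
package main

import (
    "fmt"
    "sort"
    "strings"
)

// facetsToFilter is a hypothetical helper. It turns facet pairs into a SQL
// condition over the json column, one equality test per facet, joined by AND.
// Keys are sorted so the same facets always produce the same string.
// Keys are assumed to be trusted identifiers; values are quote-escaped.
func facetsToFilter(facets map[string]string) string {
    keys := make([]string, 0, len(facets))
    for key := range facets {
        keys = append(keys, key)
    }
    sort.Strings(keys)

    conditions := make([]string, 0, len(keys))
    for _, key := range keys {
        value := strings.ReplaceAll(facets[key], "'", "''")
        conditions = append(conditions,
            fmt.Sprintf("json_extract_string(json, '$.%s') = '%s'", key, value))
    }
    return strings.Join(conditions, " AND ")
}

func main() {
    fmt.Println(facetsToFilter(map[string]string{
        "category": "science",
        "year":     "2023",
    }))
}
```

For the facets shown above, this yields a condition that tests category and year in turn, which could then be passed to a filter-based search.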

Best Practices

Metadata Design

  • Keep metadata fields consistent across vectors
  • Use a schema validation function for consistency
  • Consider indexing frequently queried fields
  • Use meaningful categories for organization
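A schema validation function can be as small as a map from field name to expected kind. The following is a minimal sketch under that assumption; fieldKind, kindOf, and validateSchema are names made up for this example, and only strings, numbers, and booleans are covered.

```go
package main

import "fmt"

// fieldKind names the coarse value types this sketch distinguishes.
type fieldKind int

const (
    kindString fieldKind = iota
    kindNumber
    kindBool
)

// kindOf classifies a metadata value; ok is false for unsupported types.
func kindOf(value interface{}) (kind fieldKind, ok bool) {
    switch value.(type) {
    case string:
        return kindString, true
    case int, int64, float64:
        return kindNumber, true
    case bool:
        return kindBool, true
    }
    return 0, false
}

// validateSchema checks that every field named in schema is present in
// metadata and holds a value of the expected kind. Extra fields are allowed.
func validateSchema(metadata map[string]interface{}, schema map[string]fieldKind) error {
    for field, want := range schema {
        value, present := metadata[field]
        if !present {
            return fmt.Errorf("missing field %q", field)
        }
        if got, ok := kindOf(value); !ok || got != want {
            return fmt.Errorf("field %q has unexpected type %T", field, value)
        }
    }
    return nil
}

func main() {
    schema := map[string]fieldKind{
        "category":    kindString,
        "rating":      kindNumber,
        "is_featured": kindBool,
    }
    good := map[string]interface{}{"category": "science", "rating": 4.8, "is_featured": true}
    bad := map[string]interface{}{"category": "science", "rating": "high", "is_featured": true}

    fmt.Println(validateSchema(good, schema)) // <nil>
    fmt.Println(validateSchema(bad, schema))  // field "rating" has unexpected type string
}
```

Running every metadata map through one such function before it is added keeps field names and types uniform, which in turn keeps SQL filters predictable.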

Performance Tips

  • Avoid overly complex SQL queries on large datasets
  • Use two-stage search for highly selective filters
  • Cache common query results if possible
  • Consider denormalizing data for faster queries
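For the caching tip, a small wrapper keyed by the SQL text is often enough. This sketch wraps any function with the shape of a metadata query; queryFunc and cachedQuery are hypothetical names, and the wrapper has no invalidation, so cached rows go stale once new vectors are added.

```go
package main

import (
    "fmt"
    "sync"
)

// queryFunc stands in for a metadata query call that takes SQL text.
type queryFunc func(sql string) ([]map[string]interface{}, error)

// cachedQuery memoizes successful results by SQL text. It is safe for
// concurrent use, but holds its lock while the underlying query runs.
type cachedQuery struct {
    mu      sync.Mutex
    run     queryFunc
    results map[string][]map[string]interface{}
}

// newCachedQuery wraps run with an empty cache.
func newCachedQuery(run queryFunc) *cachedQuery {
    return &cachedQuery{run: run, results: make(map[string][]map[string]interface{})}
}

// Query returns the cached rows for sql, running the query on a cache miss.
func (c *cachedQuery) Query(sql string) ([]map[string]interface{}, error) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if rows, ok := c.results[sql]; ok {
        return rows, nil
    }
    rows, err := c.run(sql)
    if err != nil {
        return nil, err // errors are not cached
    }
    c.results[sql] = rows
    return rows, nil
}

func main() {
    calls := 0
    cache := newCachedQuery(func(sql string) ([]map[string]interface{}, error) {
        calls++ // counts how often the underlying query actually runs
        return []map[string]interface{}{{"category": "science"}}, nil
    })

    _, _ = cache.Query("SELECT 1")
    _, _ = cache.Query("SELECT 1")
    fmt.Println(calls) // 1
}
```

In real use the wrapped function would be a closure around the index's metadata query method, and the cache would be cleared or rebuilt whenever the index changes.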

Example Metadata Schemas

Document Vectors:

metadata := map[string]interface{}{
    "category": "document",
    "title": "Quantum Computing Basics",
    "author": "Jane Smith",
    "tags": []string{"quantum", "computing", "tutorial"},
    "created_at": time.Now().Unix(),
    "word_count": 2500,
}

Product Vectors:

metadata := map[string]interface{}{
    "category": "product",
    "name": "Wireless Headphones",
    "brand": "AudioTech",
    "price": 99.99,
    "colors": []string{"black", "white", "blue"},
    "in_stock": true,
    "ratings": map[string]interface{}{
        "average": 4.7,
        "count": 253,
    },
}

Next Steps

Now that you've mastered metadata and filtering, check out: