Metadata & Filtering
One of Quiver's superpowers is its ability to combine vector search with rich metadata filtering. Let's dive into how you can use metadata to supercharge your vector searches! 🔍
Metadata Basics
What is Metadata?
In Quiver, metadata is structured information attached to each vector. It's stored as a map of string keys to arbitrary values:
metadata := map[string]interface{}{
	"category":    "science",
	"name":        "black hole",
	"tags":        []string{"astronomy", "physics"},
	"created_at":  time.Now().Unix(),
	"rating":      4.8,
	"is_featured": true,
}
Metadata can include:
- Strings
- Numbers
- Booleans
- Arrays
- Nested objects
Adding Metadata
You add metadata when you add a vector:
Required Fields
Quiver requires at least a "category" field in the metadata. This helps with organization and filtering.
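As a minimal sketch of the rule above, the helper below rejects metadata that lacks a non-empty "category" before handing the vector to the index. The `vectorAdder` interface, its `Add(id, vector, meta)` signature, and the `memIndex` stand-in are assumptions made so the example runs on its own; check them against the Quiver version you use.

```go
package main

import (
	"errors"
	"fmt"
)

// vectorAdder is a hypothetical interface. The Add signature is an assumption,
// not taken from the Quiver API reference.
type vectorAdder interface {
	Add(id uint64, vector []float32, meta map[string]interface{}) error
}

// memIndex is an in-memory stand-in used only to make the sketch runnable.
type memIndex struct {
	meta map[uint64]map[string]interface{}
}

func (m *memIndex) Add(id uint64, _ []float32, meta map[string]interface{}) error {
	m.meta[id] = meta
	return nil
}

var errMissingCategory = errors.New(`metadata must contain a non-empty "category" string`)

// addWithCategory checks the required field before adding the vector.
func addWithCategory(idx vectorAdder, id uint64, vec []float32, meta map[string]interface{}) error {
	category, ok := meta["category"].(string)
	if !ok || category == "" {
		return errMissingCategory
	}
	return idx.Add(id, vec, meta)
}

func main() {
	idx := &memIndex{meta: map[uint64]map[string]interface{}{}}
	vec := []float32{0.1, 0.2, 0.3}

	err := addWithCategory(idx, 1, vec, map[string]interface{}{
		"category": "science",
		"name":     "black hole",
	})
	fmt.Println("with category:", err)

	err = addWithCategory(idx, 2, vec, map[string]interface{}{"name": "untitled"})
	fmt.Println("without category:", err)
}
```

Checking the field on the client side gives you a clear error at the call site instead of relying on whatever the index reports.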
Retrieving Metadata
When you search for vectors, the results include the metadata:
results, _ := idx.Search(queryVector, 10, 1, 10)
for _, result := range results {
	fmt.Printf("ID: %d, Name: %s, Category: %s\n",
		result.ID,
		result.Metadata["name"],
		result.Metadata["category"])
}
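Because metadata values are typed as `interface{}`, reading them safely takes a type assertion. The sketch below shows two small accessors; `stringField` and `stringSliceField` are hypothetical helper names, and the `[]interface{}` case assumes that arrays may come back in that form after a JSON round trip.

```go
package main

import "fmt"

// stringField returns meta[key] as a string, or fallback when the key is
// missing or holds a value of another type.
func stringField(meta map[string]interface{}, key, fallback string) string {
	if s, ok := meta[key].(string); ok {
		return s
	}
	return fallback
}

// stringSliceField accepts both []string (as written by the caller) and
// []interface{} (as produced by JSON decoding). Non-string items are skipped.
func stringSliceField(meta map[string]interface{}, key string) []string {
	switch v := meta[key].(type) {
	case []string:
		return v
	case []interface{}:
		out := make([]string, 0, len(v))
		for _, item := range v {
			if s, ok := item.(string); ok {
				out = append(out, s)
			}
		}
		return out
	}
	return nil
}

func main() {
	meta := map[string]interface{}{
		"name": "black hole",
		"tags": []interface{}{"astronomy", "physics"},
	}
	fmt.Println(stringField(meta, "name", "(unnamed)"))
	fmt.Println(stringField(meta, "category", "(none)"))
	fmt.Println(stringSliceField(meta, "tags"))
}
```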
Metadata Storage
Behind the scenes, Quiver uses DuckDB to store and query metadata. This gives you the power of SQL for filtering and organizing your vectors.
Schema
Metadata is stored in a table with the following schema:
- id: The vector ID
- json: The metadata as a JSON object
This schema allows for flexible metadata while still enabling efficient queries.
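To make the two-column layout concrete, here is a small sketch that serialises a metadata map into JSON text and parses it back. The `metadataRow` type and both helpers are hypothetical and use Go's standard `encoding/json`; how Quiver encodes rows internally is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadataRow is a hypothetical type mirroring the two columns described above.
type metadataRow struct {
	ID   uint64
	JSON string
}

// encodeRow serialises a metadata map into the JSON text stored next to the ID.
func encodeRow(id uint64, meta map[string]interface{}) (metadataRow, error) {
	b, err := json.Marshal(meta)
	if err != nil {
		return metadataRow{}, err
	}
	return metadataRow{ID: id, JSON: string(b)}, nil
}

// decodeRow parses the JSON text back into a map. With encoding/json, numbers
// come back as float64 and arrays as []interface{}.
func decodeRow(row metadataRow) (map[string]interface{}, error) {
	var meta map[string]interface{}
	err := json.Unmarshal([]byte(row.JSON), &meta)
	return meta, err
}

func main() {
	row, err := encodeRow(42, map[string]interface{}{
		"category": "science",
		"rating":   4.8,
		"tags":     []string{"astronomy", "physics"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("id=%d json=%s\n", row.ID, row.JSON)

	meta, err := decodeRow(row)
	if err != nil {
		panic(err)
	}
	fmt.Printf("rating is %T, tags is %T\n", meta["rating"], meta["tags"])
}
```

The type change on the way back is worth keeping in mind: an integer you stored may be read as a `float64` if the metadata passes through a generic JSON decoder.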
Filtering with SQL
Basic Filtering
The simplest way to filter is with the SearchWithFilter method:
The filter is a SQL WHERE clause that operates on the metadata.
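Since the filter is plain SQL text, any value that comes from user input should be quoted before it is spliced in. The sketch below builds a category filter with a hypothetical `sqlString` helper that doubles embedded single quotes, and it assumes fields are read from the `json` column described under Schema.

```go
package main

import (
	"fmt"
	"strings"
)

// sqlString renders s as a SQL string literal, doubling embedded single quotes.
func sqlString(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// categoryFilter builds a WHERE-clause fragment that matches one category.
// Reading the field with json_extract_string is an assumption based on the
// id/json table layout.
func categoryFilter(category string) string {
	return "json_extract_string(json, '$.category') = " + sqlString(category)
}

func main() {
	filter := categoryFilter("science")
	fmt.Println(filter)

	// The filter would then be passed to the method named in this guide:
	//   results, err := idx.SearchWithFilter(queryVector, 10, filter)

	fmt.Println(categoryFilter("children's books"))
}
```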
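Advanced Filtering
You can use any SQL expression that DuckDB supports:
// Complex filter with multiple conditions.
// Fields live inside the json column, so each one is extracted by path.
filter := `
	json_extract_string(json, '$.category') = 'science'
	AND json_contains(json_extract(json, '$.tags'), '"physics"')
	AND CAST(json_extract_string(json, '$.rating') AS DOUBLE) > 4.0
	AND CAST(json_extract_string(json, '$.created_at') AS BIGINT) > 1609459200
`
results, _ := idx.SearchWithFilter(queryVector, 10, filter)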
JSON Functions
DuckDB provides powerful JSON functions for querying nested data:
// Query nested properties. json_extract_string returns plain text, so the
// result can be compared with a SQL string literal.
filter := `json_extract_string(json, '$.details.publisher') = 'Nature'`
results, _ := idx.SearchWithFilter(queryVector, 10, filter)
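To see what a path such as `$.details.publisher` addresses, here is a Go-side equivalent that walks nested maps one key at a time. It is only an illustration of the lookup, with a hypothetical `lookupPath` helper; DuckDB performs the real extraction inside the SQL filter.

```go
package main

import "fmt"

// lookupPath follows the given keys through nested maps and reports whether
// the full path exists.
func lookupPath(meta map[string]interface{}, path ...string) (interface{}, bool) {
	var cur interface{} = meta
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false // an intermediate value is not an object
		}
		cur, ok = m[key]
		if !ok {
			return nil, false // the key is missing at this level
		}
	}
	return cur, true
}

func main() {
	meta := map[string]interface{}{
		"category": "science",
		"details": map[string]interface{}{
			"publisher": "Nature",
			"year":      2023,
		},
	}

	publisher, ok := lookupPath(meta, "details", "publisher")
	fmt.Println(publisher, ok)

	_, ok = lookupPath(meta, "details", "editor")
	fmt.Println(ok)
}
```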
Array Operations
You can query arrays in metadata:
// Check if an array contains a value
containsFilter := `json_contains(json_extract(json, '$.tags'), '"quantum"')`
results, _ := idx.SearchWithFilter(queryVector, 10, containsFilter)
// Check array length
lengthFilter := `json_array_length(json, '$.tags') > 3`
results, _ = idx.SearchWithFilter(queryVector, 10, lengthFilter)
Direct Metadata Queries
Querying Without Vectors
You can query metadata directly without vector search:
// Query metadata using SQL
results, _ := idx.QueryMetadata(
	"SELECT * FROM metadata " +
		"WHERE json_extract_string(json, '$.category') = 'science' " +
		"ORDER BY CAST(json_extract_string(json, '$.created_at') AS BIGINT) DESC LIMIT 10")
This returns a slice of metadata maps.
Aggregations and Analytics
You can use SQL aggregations for analytics:
// Count vectors by category
counts, _ := idx.QueryMetadata(
	"SELECT json_extract_string(json, '$.category') AS category, COUNT(*) AS count " +
		"FROM metadata GROUP BY category ORDER BY count DESC")
Hybrid Search Strategies
Two-Stage Search
For large datasets, a two-stage search can be more efficient:
- First, filter the metadata to get a subset of IDs
- Then, perform vector search only on those IDs
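// Step 1: Get IDs matching the filter
idResults, _ := idx.QueryMetadata(
	"SELECT id FROM metadata " +
		"WHERE json_extract_string(json, '$.category') = 'science' " +
		"AND CAST(json_extract_string(json, '$.rating') AS DOUBLE) > 4.5")
// Extract IDs. The driver may return a signed or unsigned integer type,
// so use a type switch rather than an unchecked assertion that could panic.
ids := make([]uint64, 0, len(idResults))
for _, result := range idResults {
	switch id := result["id"].(type) {
	case uint64:
		ids = append(ids, id)
	case int64:
		ids = append(ids, uint64(id))
	}
}
// Step 2: Search only those IDs
results, _ := idx.SearchSubset(queryVector, ids, 10)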
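The idea behind the two stages can be sketched without any database at all. The example below is an in-memory stand-in, not Quiver's implementation: stage one keeps the IDs whose metadata passes a predicate, and stage two runs a brute-force Euclidean nearest-neighbour search over only those IDs. The `item` type and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// item is a hypothetical in-memory record: an ID, its vector, and its metadata.
type item struct {
	ID     uint64
	Vector []float32
	Meta   map[string]interface{}
}

// filterIDs is stage one: keep the IDs whose metadata satisfies the predicate.
func filterIDs(items []item, keep func(map[string]interface{}) bool) map[uint64]bool {
	allowed := make(map[uint64]bool)
	for _, it := range items {
		if keep(it.Meta) {
			allowed[it.ID] = true
		}
	}
	return allowed
}

// distance returns the Euclidean distance; both vectors must have equal length.
func distance(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// searchSubset is stage two: brute-force nearest neighbours among allowed IDs.
func searchSubset(items []item, allowed map[uint64]bool, query []float32, k int) []uint64 {
	type scored struct {
		id   uint64
		dist float64
	}
	var candidates []scored
	for _, it := range items {
		if allowed[it.ID] {
			candidates = append(candidates, scored{it.ID, distance(it.Vector, query)})
		}
	}
	sort.Slice(candidates, func(i, j int) bool { return candidates[i].dist < candidates[j].dist })
	if k > len(candidates) {
		k = len(candidates)
	}
	ids := make([]uint64, 0, k)
	for _, c := range candidates[:k] {
		ids = append(ids, c.id)
	}
	return ids
}

func main() {
	items := []item{
		{1, []float32{1, 0}, map[string]interface{}{"category": "science", "rating": 4.9}},
		{2, []float32{0.9, 0.1}, map[string]interface{}{"category": "history", "rating": 4.9}},
		{3, []float32{0, 1}, map[string]interface{}{"category": "science", "rating": 4.7}},
		{4, []float32{1, 0.1}, map[string]interface{}{"category": "science", "rating": 3.0}},
	}
	allowed := filterIDs(items, func(m map[string]interface{}) bool {
		rating, ok := m["rating"].(float64)
		return ok && m["category"] == "science" && rating > 4.5
	})
	fmt.Println(searchSubset(items, allowed, []float32{1, 0}, 2))
}
```

Note that item 2 is close to the query but never scored, because the metadata filter removed it before any distance was computed. That is where the savings come from when the filter is selective.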
Faceted Search
Faceted search allows filtering by specific metadata facets:
// Search with facets
results, _ := idx.FacetedSearch(queryVector, 10, map[string]string{
	"category": "science",
	"year":     "2023",
})
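One way to think about facets is as a set of exact-match conditions that must all hold. The sketch below checks a metadata map against a facet map in plain Go; `matchesFacets` is a hypothetical helper, and treating every facet as an AND-ed equality test on the value's string form is an assumption about the semantics, not a description of how Quiver evaluates facets.

```go
package main

import "fmt"

// matchesFacets reports whether meta has every facet key and whether each
// value, rendered as text, equals the requested facet value.
func matchesFacets(meta map[string]interface{}, facets map[string]string) bool {
	for key, want := range facets {
		got, ok := meta[key]
		if !ok || fmt.Sprint(got) != want {
			return false
		}
	}
	return true
}

func main() {
	meta := map[string]interface{}{
		"category": "science",
		"year":     2023,
		"name":     "black hole",
	}

	fmt.Println(matchesFacets(meta, map[string]string{"category": "science", "year": "2023"}))
	fmt.Println(matchesFacets(meta, map[string]string{"category": "science", "year": "2024"}))
}
```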
Best Practices
Metadata Design
- Keep metadata fields consistent across vectors
- Use a schema validation function for consistency
- Consider indexing frequently queried fields
- Use meaningful categories for organization
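A schema validation function can be quite small. The sketch below is one possible shape, with hypothetical names throughout: a schema maps field names to an expected kind, and `validate` returns the first problem it finds. Fields that are not listed in the schema are allowed.

```go
package main

import (
	"fmt"
	"sort"
)

// kind names the value types this sketch distinguishes.
type kind string

const (
	kindString kind = "string"
	kindNumber kind = "number"
	kindBool   kind = "bool"
)

// hasKind reports whether v holds a value of the expected kind.
func hasKind(v interface{}, k kind) bool {
	switch v.(type) {
	case string:
		return k == kindString
	case bool:
		return k == kindBool
	case int, int64, float32, float64:
		return k == kindNumber
	}
	return false
}

// validate checks that every field named in schema is present in meta and
// holds a value of the right kind.
func validate(meta map[string]interface{}, schema map[string]kind) error {
	fields := make([]string, 0, len(schema))
	for f := range schema {
		fields = append(fields, f)
	}
	sort.Strings(fields) // report problems in a deterministic order
	for _, f := range fields {
		v, ok := meta[f]
		if !ok {
			return fmt.Errorf("missing field %q", f)
		}
		if !hasKind(v, schema[f]) {
			return fmt.Errorf("field %q: want %s, got %T", f, schema[f], v)
		}
	}
	return nil
}

func main() {
	productSchema := map[string]kind{
		"category": kindString,
		"price":    kindNumber,
		"in_stock": kindBool,
	}

	good := map[string]interface{}{"category": "product", "price": 99.99, "in_stock": true}
	bad := map[string]interface{}{"category": "product", "price": "99.99", "in_stock": true}

	fmt.Println(validate(good, productSchema))
	fmt.Println(validate(bad, productSchema))
}
```

Running the check before every add keeps field names and types consistent, which in turn keeps SQL filters predictable.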
Performance Tips
- Avoid overly complex SQL queries on large datasets
- Use two-stage search for highly selective filters
- Cache common query results if possible
- Consider denormalizing data for faster queries
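For the caching tip, a minimal sketch could wrap whatever function runs your metadata query and remember results by SQL text. Everything here is hypothetical, including the `queryFunc` shape, and the cache never expires entries, so you would need to clear it after writes that change the metadata.

```go
package main

import (
	"fmt"
	"sync"
)

// queryFunc is a hypothetical signature for a function that runs a metadata
// query and returns rows as maps.
type queryFunc func(sql string) ([]map[string]interface{}, error)

// queryCache memoises results by SQL text. It is safe for concurrent use but
// holds its lock while the backend query runs, which keeps the sketch simple.
type queryCache struct {
	mu      sync.Mutex
	results map[string][]map[string]interface{}
	query   queryFunc
}

func newQueryCache(query queryFunc) *queryCache {
	return &queryCache{
		results: make(map[string][]map[string]interface{}),
		query:   query,
	}
}

// Get returns cached rows when available and otherwise runs the query.
// Errors are not cached.
func (c *queryCache) Get(sql string) ([]map[string]interface{}, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if rows, ok := c.results[sql]; ok {
		return rows, nil
	}
	rows, err := c.query(sql)
	if err != nil {
		return nil, err
	}
	c.results[sql] = rows
	return rows, nil
}

func main() {
	calls := 0
	cache := newQueryCache(func(sql string) ([]map[string]interface{}, error) {
		calls++ // stands in for an expensive database round trip
		return []map[string]interface{}{{"category": "science", "count": int64(3)}}, nil
	})

	const q = "SELECT ... FROM metadata GROUP BY category"
	for i := 0; i < 3; i++ {
		if _, err := cache.Get(q); err != nil {
			panic(err)
		}
	}
	fmt.Println("backend calls:", calls)
}
```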
Example Metadata Schemas
Document Vectors:
metadata := map[string]interface{}{
	"category":   "document",
	"title":      "Quantum Computing Basics",
	"author":     "Jane Smith",
	"tags":       []string{"quantum", "computing", "tutorial"},
	"created_at": time.Now().Unix(),
	"word_count": 2500,
}
Product Vectors:
metadata := map[string]interface{}{
	"category": "product",
	"name":     "Wireless Headphones",
	"brand":    "AudioTech",
	"price":    99.99,
	"colors":   []string{"black", "white", "blue"},
	"in_stock": true,
	"ratings": map[string]interface{}{
		"average": 4.7,
		"count":   253,
	},
}
Next Steps
Now that you've mastered metadata and filtering, check out:
- Persistence & Backup - Keep your data safe
- Security - Secure your vector database
- HTTP API - Use Quiver as a service