Search Algorithms¶
This page explains the search algorithms used in HNSW, how they work, and how to optimize them for your specific use case.
Overview¶
HNSW (Hierarchical Navigable Small World) uses a multi-layered graph structure to enable efficient approximate nearest neighbor search. The search algorithm is a key component that allows HNSW to achieve logarithmic time complexity for search operations.
The core search algorithm in HNSW combines a greedy descent through the upper layers with a beam-style search in the bottom layer. It:
- Starts at the top layer of the graph (with fewest nodes)
- Performs a greedy search to find the closest node to the query
- Uses this node as an entry point to the next layer
- Repeats the process until reaching the bottom layer
- Returns the closest nodes found in the bottom layer
Basic Search Algorithm¶
The basic search algorithm in HNSW can be sketched as follows. The bottom-layer step keeps two queues: a min-queue of candidates still to expand and a bounded max-queue holding the k closest nodes found so far (`NewMinQueue` and `NewPriorityQueue` are illustrative helpers keyed by distance):
```go
func (g *Graph) Search(query Vector, k int) []Node {
    // Start from the entry point at the top layer.
    currentNode := g.entryPoint
    currentDistance := g.Distance(query, currentNode.Vector)

    // Traverse the layers from top to bottom, stopping above layer 0.
    for layer := g.maxLayer; layer > 0; layer-- {
        // Greedy search in the current layer: keep moving to a closer
        // neighbor until no neighbor improves on the current node.
        changed := true
        for changed {
            changed = false
            for _, neighbor := range currentNode.Neighbors[layer] {
                neighborDistance := g.Distance(query, neighbor.Vector)
                // If we found a closer node, move to it.
                if neighborDistance < currentDistance {
                    currentNode = neighbor
                    currentDistance = neighborDistance
                    changed = true
                    break
                }
            }
        }
    }

    // Final search in the bottom layer (layer 0). We keep two queues:
    // a min-queue of candidates still to expand, and a bounded max-queue
    // holding the k closest nodes found so far.
    candidates := NewMinQueue()    // closest candidate on top
    results := NewPriorityQueue(k) // furthest result on top, capacity k
    candidates.Push(currentNode, currentDistance)
    results.Push(currentNode, currentDistance)

    visited := make(map[NodeID]bool)
    visited[currentNode.ID] = true

    for candidates.Len() > 0 {
        // Expand the closest unexplored candidate.
        current, currentDist := candidates.Pop()

        // Stop once the closest remaining candidate is further away than
        // the furthest node already kept in the results.
        if results.Len() >= k && currentDist > results.MaxDistance() {
            break
        }

        // Check all neighbors of the current node in the bottom layer.
        for _, neighbor := range current.Neighbors[0] {
            if visited[neighbor.ID] {
                continue
            }
            visited[neighbor.ID] = true

            neighborDistance := g.Distance(query, neighbor.Vector)
            // Keep the neighbor if we have fewer than k results or it is
            // closer than the furthest result found so far.
            if results.Len() < k || neighborDistance < results.MaxDistance() {
                candidates.Push(neighbor, neighborDistance)
                results.Push(neighbor, neighborDistance) // evicts the furthest when full
            }
        }
    }

    // Return the k closest nodes, closest first.
    return results.Items()
}
```
Beam Search¶
HNSW uses a beam search approach in the bottom layer to improve the quality of search results. Instead of a simple greedy search, it maintains a priority queue of the best candidates found so far and explores their neighborhoods.
The beam search algorithm:
- Initializes a priority queue with the entry point
- Repeatedly extracts the closest node from the queue
- Explores its neighbors and adds promising ones to the queue
- Continues until the queue is empty or the closest remaining candidate is further away than the furthest result found so far
This approach helps avoid getting stuck in local minima and improves the recall of the search.
Search with ef Parameter¶
The `ef` parameter controls the size of the dynamic candidate list during search. A larger `ef` value results in more accurate but slower searches:
```go
func (g *Graph) SearchWithEF(query Vector, k int, ef int) []Node {
    // Similar to the basic search, but with a larger candidate pool
    // controlled by the ef parameter.

    // ... (greedy descent through the upper layers, as in Search)

    // Use a result queue with capacity ef instead of k.
    results := NewPriorityQueue(ef)

    // ... (rest of the bottom-layer search, returning the k closest of the ef results)
}
```
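The only differences from the basic `Search` are that `ef` should be at least `k` and that the bottom layer keeps `ef` candidates, of which only the `k` closest are returned. Below is a hedged sketch of that wrapper; `descendToLayer0` and `searchLayer0` are illustrative helper names that factor out the two phases shown above, not functions defined elsewhere in this page:

```go
func (g *Graph) SearchWithEF(query Vector, k int, ef int) []Node {
    if ef < k {
        ef = k // ef must be at least k to be able to return k results
    }

    // Greedy descent through the upper layers, as in Search (hypothetical helper).
    entry, entryDist := g.descendToLayer0(query)

    // Bottom-layer beam search with a result queue of capacity ef (hypothetical helper).
    results := g.searchLayer0(query, entry, entryDist, ef)

    // Return only the k closest of the ef candidates, closest first.
    items := results.Items()
    if len(items) > k {
        items = items[:k]
    }
    return items
}
```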
The `ef` parameter allows you to trade off between search speed and accuracy:
- Small `ef` (e.g., 10-50): Faster searches but potentially lower recall
- Large `ef` (e.g., 100-1000): Slower searches but higher recall
Concurrent Search¶
HNSW supports concurrent searches, which can significantly improve throughput when processing multiple queries:
```go
func (g *Graph) ConcurrentSearch(queries []Vector, k int) [][]Node {
    results := make([][]Node, len(queries))
    var wg sync.WaitGroup

    for i, query := range queries {
        wg.Add(1)
        // Each query gets its own goroutine; i and query are passed as
        // arguments so each goroutine captures its own copies.
        go func(i int, query Vector) {
            defer wg.Done()
            results[i] = g.Search(query, k)
        }(i, query)
    }

    wg.Wait()
    return results
}
```
The HNSW graph structure is treated as read-only during search, so multiple searches can safely run concurrently without locks, as long as no insertions or deletions happen at the same time.
Search with Filters¶
HNSW can be extended to support filtered search, where results must satisfy certain criteria:
```go
func (g *FilteredGraph) SearchWithFilter(query Vector, k int, filter Filter) []Node {
    // Similar to the basic search, but with an additional filter step.

    // ... (greedy descent through the upper layers)

    for candidates.Len() > 0 {
        current, _ := candidates.Pop()
        // Only consider neighbors that satisfy the filter.
        for _, neighbor := range current.Neighbors[0] {
            if visited[neighbor.ID] || !filter(neighbor) {
                continue
            }
            // ... (rest of the bottom-layer search)
        }
    }
}
```
Filtered search may require exploring more nodes to find k results that satisfy the filter, potentially increasing search time.
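The `Filter` type is not defined in the snippets above; a natural choice is a predicate over nodes. The sketch below assumes that, and the `Metadata` field is a hypothetical example of per-node attributes rather than a field of the structures used earlier:

```go
// Filter decides whether a node may appear in the search results.
type Filter func(Node) bool

// categoryFilter only accepts nodes tagged with the given category.
// Node.Metadata (map[string]string) is a hypothetical attribute store;
// adapt the lookup to however your nodes carry metadata.
func categoryFilter(category string) Filter {
    return func(n Node) bool {
        return n.Metadata["category"] == category
    }
}

// Usage:
// results := graph.SearchWithFilter(query, 10, categoryFilter("books"))
```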
Search with Negative Examples¶
HNSW can be extended to support search with negative examples, where the goal is to find vectors similar to positive examples but dissimilar to negative examples:
```go
func (g *Graph) SearchWithNegatives(positiveQuery, negativeQuery Vector, k int) []Node {
    // ... (greedy descent through the upper layers)

    for candidates.Len() > 0 {
        current, _ := candidates.Pop()
        for _, neighbor := range current.Neighbors[0] {
            if visited[neighbor.ID] {
                continue
            }
            visited[neighbor.ID] = true

            // Combined score: distance to the positive example minus
            // distance to the negative example (lower is better).
            positiveDistance := g.Distance(positiveQuery, neighbor.Vector)
            negativeDistance := g.Distance(negativeQuery, neighbor.Vector)
            combinedDistance := positiveDistance - negativeDistance

            // ... (rest of the bottom-layer search, ranking by combinedDistance)
        }
    }
}
```
This approach can be useful for recommendation systems and other applications where you want to find items similar to some examples but dissimilar to others.
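In practice there are often several positive and negative examples. One simple approach (an assumption for illustration, not something defined above) is to average each set into a single query vector before calling `SearchWithNegatives`. The sketch assumes `Vector` is a `float32` slice; adjust if your vector type differs:

```go
// meanVector averages a set of vectors into a single query vector.
// Assumes Vector is a []float32-like slice type, all vectors share the same
// dimensionality, and the input slice is non-empty.
func meanVector(vectors []Vector) Vector {
    dim := len(vectors[0])
    mean := make(Vector, dim)
    for _, v := range vectors {
        for i, x := range v {
            mean[i] += x
        }
    }
    for i := range mean {
        mean[i] /= float32(len(vectors))
    }
    return mean
}

// Usage (liked and disliked are []Vector of example embeddings):
// positive := meanVector(liked)
// negative := meanVector(disliked)
// recommendations := graph.SearchWithNegatives(positive, negative, 10)
```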
Performance Tuning¶
Optimizing ef Parameter¶
The `ef` parameter has a significant impact on search performance:
```go
// For high recall (>95%)
highRecall := graph.SearchWithEF(query, 10, 200)

// For balanced performance
balanced := graph.SearchWithEF(query, 10, 50)

// For high speed
fast := graph.SearchWithEF(query, 10, 20)
```
You can dynamically adjust the `ef` parameter based on your requirements:
```go
func adaptiveSearch(query Vector, k int, targetRecall float64) []Node {
    // Start with a small ef and grow it until the estimated recall is high enough.
    ef := k * 2
    for {
        results := graph.SearchWithEF(query, k, ef)
        // estimateRecall is application-specific; see the sketch below.
        recall := estimateRecall(query, results, k)
        if recall >= targetRecall {
            return results
        }
        ef *= 2
        if ef > 1000 {
            // Cap ef to avoid excessive computation.
            return graph.SearchWithEF(query, k, 1000)
        }
    }
}
```
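The recall estimate has to come from somewhere; HNSW itself cannot report it. One hedged approach, typically run offline on a small sample of queries to pick a good `ef` rather than in the serving path, is to compare the approximate results against an exhaustive scan. `exactSearch` here is the brute-force search also used by the hybrid search below (a sketch is given in that section):

```go
// estimateRecall compares approximate results against an exhaustive scan.
// It is O(n) per query, so use it offline on sample queries, not per request.
func estimateRecall(query Vector, approx []Node, k int) float64 {
    exact := exactSearch(query, k) // brute-force k nearest neighbors (sketched later)

    truth := make(map[NodeID]bool, len(exact))
    for _, n := range exact {
        truth[n.ID] = true
    }

    hits := 0
    for _, n := range approx {
        if truth[n.ID] {
            hits++
        }
    }
    return float64(hits) / float64(k)
}
```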
Batch Search¶
For processing multiple queries, batch search can be more efficient than individual searches:
```go
func (g *Graph) BatchSearch(queries []Vector, k int) [][]Node {
    results := make([][]Node, len(queries))

    // Process queries in batches to make better use of CPU caches.
    batchSize := 16
    for i := 0; i < len(queries); i += batchSize {
        end := i + batchSize
        if end > len(queries) {
            end = len(queries)
        }

        // Process this batch concurrently.
        var wg sync.WaitGroup
        for j := i; j < end; j++ {
            wg.Add(1)
            go func(j int) {
                defer wg.Done()
                results[j] = g.Search(queries[j], k)
            }(j)
        }
        wg.Wait()
    }

    return results
}
```
Batch processing can improve cache locality and overall throughput.
Advanced Search Techniques¶
Hybrid Search¶
HNSW can be combined with exact search methods for a hybrid approach:
```go
func hybridSearch(query Vector, k int, exactThreshold int) []Node {
    // Perform an approximate search with HNSW first.
    approximateResults := graph.Search(query, k)

    // If the approximate search returned fewer results than the threshold
    // (for example on a small or heavily filtered index), fall back to an exact scan.
    if len(approximateResults) < exactThreshold {
        return exactSearch(query, k)
    }
    return approximateResults
}
```
This approach can provide a good balance between speed and accuracy.
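`exactSearch` is not defined in the snippets above. A minimal brute-force version might look like the sketch below; `graph.Nodes()` is a hypothetical accessor returning all indexed nodes, and the bounded queue is the same illustrative helper used in the search code earlier:

```go
// exactSearch scans every node and keeps the k closest. It is O(n) per query,
// so it is only practical for small collections or as an offline reference.
func exactSearch(query Vector, k int) []Node {
    results := NewPriorityQueue(k) // bounded max-queue, furthest result on top
    for _, node := range graph.Nodes() { // hypothetical accessor over all nodes
        d := graph.Distance(query, node.Vector)
        if results.Len() < k || d < results.MaxDistance() {
            results.Push(node, d)
        }
    }
    return results.Items()
}
```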
Progressive Search¶
For interactive applications, progressive search can provide quick initial results that improve over time:
```go
func progressiveSearch(query Vector, k int, callback func([]Node)) {
    // Start with a small ef for quick initial results.
    initialResults := graph.SearchWithEF(query, k, 10)
    callback(initialResults)

    // Refine the results with progressively larger ef values.
    for ef := 20; ef <= 100; ef *= 2 {
        improvedResults := graph.SearchWithEF(query, k, ef)
        callback(improvedResults)
    }
}
```
This approach is useful for user interfaces where showing some results quickly is more important than waiting for the most accurate results.
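As a usage sketch, the callback is whatever your application does with each round of results; here it simply logs how many neighbors each pass returned (the `fmt` import is assumed):

```go
progressiveSearch(query, 10, func(results []Node) {
    // In a real UI this would refresh the displayed result list;
    // here we just log the intermediate result count.
    fmt.Printf("got %d results\n", len(results))
})
```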
Next Steps¶
Now that you understand the search algorithms in HNSW, you can explore:
- Graph Structure: Learn about the hierarchical structure of HNSW
- Distance Functions: Understand how to choose the right distance function
- Performance Tuning: Learn how to optimize HNSW for your specific use case