Hash Module

The Hash module computes file hashes for FlashFS. It supports multiple hashing algorithms, hashes files concurrently, and reuses read buffers through a buffer pool to keep memory usage low.

Overview

The Hash module is designed to:

Compute hashes for files, byte slices, strings, and readers
Support multiple hashing algorithms (BLAKE3, MD5, SHA1, SHA256)
Optimize memory usage with a buffer pool
Enable concurrent hashing of multiple files
Provide partial hashing for large files to improve performance

Key Components

Algorithms

The module supports the following hashing algorithms:

BLAKE3 (default): Fast and cryptographically secure
MD5: Provided for compatibility (not cryptographically secure)
SHA1: Provided for compatibility (not cryptographically secure)
SHA256: Cryptographically secure, but slower than BLAKE3
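
As a rough illustration of how an algorithm selector like this can be wired up, the sketch below maps an algorithm name to a standard-library hasher. It is not the module's implementation: the `Algorithm` string type, the constant values, and the helpers `newHasher` and `hexDigest` are assumptions made for the example. BLAKE3 is left out because it is not in the Go standard library and would need a third-party package.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// Algorithm names a digest. The type and constant values are illustrative
// stand-ins, not the module's actual definitions.
type Algorithm string

const (
	MD5    Algorithm = "md5"
	SHA1   Algorithm = "sha1"
	SHA256 Algorithm = "sha256"
)

// newHasher returns a fresh hash.Hash for the requested algorithm.
func newHasher(a Algorithm) (hash.Hash, error) {
	switch a {
	case MD5:
		return md5.New(), nil
	case SHA1:
		return sha1.New(), nil
	case SHA256:
		return sha256.New(), nil
	default:
		return nil, fmt.Errorf("unsupported algorithm %q", a)
	}
}

// hexDigest hashes data with the chosen algorithm and returns lowercase hex.
func hexDigest(a Algorithm, data []byte) (string, error) {
	h, err := newHasher(a)
	if err != nil {
		return "", err
	}
	h.Write(data) // hash.Hash.Write never returns an error
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	for _, a := range []Algorithm{MD5, SHA1, SHA256} {
		d, _ := hexDigest(a, []byte("hello"))
		fmt.Printf("%-6s %s\n", a, d)
	}
}
```

Keeping the selection in one switch means that adding an algorithm touches a single place, and unknown names fail early with an error instead of silently falling back to a default.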

Options

Hashing behavior can be configured using the Options struct:

type Options struct {
    Algorithm               Algorithm
    BufferSize              int
    SkipErrors              bool
    Concurrency             int
    UsePartialHashing       bool
    PartialHashingThreshold int64
}

Algorithm: The hashing algorithm to use
BufferSize: Size of the buffer used for reading files (default: 16MB)
SkipErrors: If true, a file that fails to hash yields an empty hash instead of an error; if false, the error is returned
Concurrency: Number of files to hash concurrently (0 = use all available CPUs)
UsePartialHashing: Enables partial hashing for large files
PartialHashingThreshold: File size threshold above which partial hashing is used (default: 10MB)
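
The defaults and fallbacks listed above can be sketched as follows. This is a minimal stand-alone illustration, not the module's code: the `Algorithm` string type and the helpers `defaultOptions`, `effectiveConcurrency`, and `usePartial` are hypothetical names, and treating a negative `Concurrency` like 0 is an assumption.

```go
package main

import (
	"fmt"
	"runtime"
)

// Algorithm is modelled as a string here purely for illustration.
type Algorithm string

const BLAKE3 Algorithm = "blake3"

// Options mirrors the struct documented above.
type Options struct {
	Algorithm               Algorithm
	BufferSize              int
	SkipErrors              bool
	Concurrency             int
	UsePartialHashing       bool
	PartialHashingThreshold int64
}

// defaultOptions fills in the defaults this page documents:
// BLAKE3, a 16MB read buffer, and a 10MB partial-hashing threshold.
func defaultOptions() Options {
	return Options{
		Algorithm:               BLAKE3,
		BufferSize:              16 * 1024 * 1024,
		PartialHashingThreshold: 10 * 1024 * 1024,
	}
}

// effectiveConcurrency resolves Concurrency == 0 to the number of CPUs.
// Negative values are treated the same way (an assumption).
func effectiveConcurrency(o Options) int {
	if o.Concurrency <= 0 {
		return runtime.NumCPU()
	}
	return o.Concurrency
}

// usePartial reports whether a file of the given size would be partially
// hashed: the feature must be enabled and the size must exceed the threshold.
func usePartial(o Options, size int64) bool {
	return o.UsePartialHashing && size > o.PartialHashingThreshold
}

func main() {
	o := defaultOptions()
	o.UsePartialHashing = true
	fmt.Println("workers:", effectiveConcurrency(o))
	fmt.Println("partial for 1GiB file:", usePartial(o, 1<<30))
}
```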

Result

The Result struct represents the outcome of a hashing operation:

type Result struct {
    Hash      string
    Error     error
    Algorithm Algorithm
    Size      int64
}

Core Functions

File Hashing

// Hash a file using the specified options
result := hash.File(path, options)

// Hash multiple files concurrently
results := hash.FilesConcurrent(paths, options)

// Hash a large file using partial hashing
result := hash.PartialFile(path, options)
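
Concurrent hashing of several files is commonly built as a worker pool that collects one result per path. The sketch below shows that pattern with the standard library only; `hashAll` and `hashFileSHA256` are hypothetical helpers, the trimmed-down `Result` type is local to the example, and nothing here is claimed to match how `hash.FilesConcurrent` is implemented.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"runtime"
	"sync"
)

// Result is a reduced version of the Result struct shown above.
type Result struct {
	Hash  string
	Error error
	Size  int64
}

// hashFileSHA256 streams one file through SHA-256.
func hashFileSHA256(path string) Result {
	f, err := os.Open(path)
	if err != nil {
		return Result{Error: err}
	}
	defer f.Close()
	h := sha256.New()
	n, err := io.Copy(h, f)
	if err != nil {
		return Result{Error: err}
	}
	return Result{Hash: hex.EncodeToString(h.Sum(nil)), Size: n}
}

// hashAll runs hashOne over every path using a fixed number of workers and
// returns the results keyed by path. workers <= 0 means "one per CPU".
func hashAll(paths []string, workers int, hashOne func(string) Result) map[string]Result {
	if workers <= 0 {
		workers = runtime.NumCPU()
	}
	jobs := make(chan string)
	results := make(map[string]Result, len(paths))
	var mu sync.Mutex // guards results
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				r := hashOne(p)
				mu.Lock()
				results[p] = r
				mu.Unlock()
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Hash every path given on the command line.
	for p, r := range hashAll(os.Args[1:], 0, hashFileSHA256) {
		if r.Error != nil {
			fmt.Printf("%s: error: %v\n", p, r.Error)
			continue
		}
		fmt.Printf("%s: %s (%d bytes)\n", p, r.Hash, r.Size)
	}
}
```

Storing the error inside each result, rather than aborting on the first failure, lets one unreadable file be reported without discarding the hashes of the others.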

Other Hashing Functions

// Hash a byte slice
result := hash.Bytes(data, algorithm)

// Hash a string
result := hash.String(data, algorithm)

// Hash data from an io.Reader
result := hash.Reader(reader, algorithm)

// Verify a file against an expected hash
match, err := hash.Verify(path, expectedHash, options)
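
To make the reader and verification helpers above more concrete, here is a small stand-alone sketch using SHA-256 from the standard library. The names `hashReader`, `hashString`, `verify`, and `verifyString` are hypothetical, and comparing hex digests case-insensitively after trimming whitespace is an assumption about what a verification step might reasonably do, not documented behavior of `hash.Verify`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// hashReader streams r through SHA-256 and returns the lowercase hex digest.
func hashReader(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashString hashes a string by wrapping it in a reader.
func hashString(s string) string {
	d, _ := hashReader(strings.NewReader(s)) // a strings.Reader cannot fail
	return d
}

// verify hashes r and compares the digest with expected, ignoring case and
// surrounding whitespace in the expected value.
func verify(r io.Reader, expected string) (bool, error) {
	got, err := hashReader(r)
	if err != nil {
		return false, err
	}
	return strings.EqualFold(got, strings.TrimSpace(expected)), nil
}

// verifyString is a convenience wrapper around verify for in-memory data.
func verifyString(s, expected string) bool {
	ok, _ := verify(strings.NewReader(s), expected)
	return ok
}

func main() {
	digest := hashString("hello")
	fmt.Println(digest)
	fmt.Println(verifyString("hello", strings.ToUpper(digest)))
	fmt.Println(verifyString("hello!", digest))
}
```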

Partial Hashing

For large files, the module provides a partial hashing feature that samples portions of the file rather than reading the entire content:

The first N bytes
The middle N bytes
The last N bytes

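These samples are combined into a single hash that represents the file. This is much faster for very large files, at the cost of completeness: a change confined to the unsampled regions does not alter the hash, so treat a partial hash as a fast change-detection heuristic rather than an integrity check.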

// Enable partial hashing in options
options := hash.DefaultOptions()
options.UsePartialHashing = true
options.PartialHashingThreshold = 100 * 1024 * 1024 // 100MB

// Hash a file (will use partial hashing for files > 100MB)
result := hash.File(path, options)
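
The first/middle/last sampling described in this section can be sketched as below. This is an illustration under stated assumptions, not the module's algorithm: it uses SHA-256 instead of BLAKE3, the helper names `partialHash` and `partialHashBytes` are hypothetical, mixing the total size into the hash is an added assumption, and inputs too small for three non-overlapping samples are simply hashed in full.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"io"
)

// partialHash hashes the first, middle, and last `sample` bytes of r,
// where size is the total length of the data behind r.
func partialHash(r io.ReaderAt, size, sample int64) (string, error) {
	h := sha256.New()
	if sample <= 0 || size <= 3*sample {
		// Too small to sample without overlap: hash everything.
		if _, err := io.Copy(h, io.NewSectionReader(r, 0, size)); err != nil {
			return "", err
		}
		return hex.EncodeToString(h.Sum(nil)), nil
	}
	// Assumption: include the size so that length changes alter the hash.
	var sz [8]byte
	binary.BigEndian.PutUint64(sz[:], uint64(size))
	h.Write(sz[:])
	offsets := []int64{0, (size - sample) / 2, size - sample}
	for _, off := range offsets {
		if _, err := io.Copy(h, io.NewSectionReader(r, off, sample)); err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// partialHashBytes applies partialHash to an in-memory slice.
func partialHashBytes(data []byte, sample int64) (string, error) {
	return partialHash(bytes.NewReader(data), int64(len(data)), sample)
}

func main() {
	data := make([]byte, 1000)
	base, _ := partialHashBytes(data, 100)

	data[500] = 1 // inside the middle sample [450, 550)
	sampled, _ := partialHashBytes(data, 100)
	fmt.Println("change in sampled region detected:", base != sampled)

	data[500] = 0
	data[200] = 1 // outside every sample
	unsampled, _ := partialHashBytes(data, 100)
	fmt.Println("change in unsampled region detected:", base != unsampled)
}
```

The second case in `main` shows the trade-off directly: a byte changed outside the three sampled windows leaves the partial hash unchanged.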

Performance Considerations

Buffer Size: Larger buffers generally improve throughput but consume more memory
Concurrency: Adjust based on available CPU cores and I/O capabilities
Partial Hashing: Consider for very large files where full hashing would be prohibitively expensive
Algorithm Choice: BLAKE3 offers the best performance among the secure algorithms
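
The buffer-size trade-off above is usually managed with a pool of reusable read buffers, so that memory grows with the number of concurrent hashes rather than the number of files. The sketch below shows one way to do that with `sync.Pool`; it is not taken from the module, the names `bufPool`, `hashPooled`, `hashBytesPooled`, and `hashDirect` are hypothetical, and the 1 MiB buffer size is chosen only to keep the example light.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"sync"
)

// bufSize is illustrative; the module documents a larger default.
const bufSize = 1 << 20

// bufPool hands out reusable read buffers. Pointers to slices are stored
// so that Put does not allocate.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, bufSize)
		return &b
	},
}

// hashPooled reads r through a pooled buffer and returns the SHA-256 digest.
// A manual read loop is used so the pooled buffer is always the one in use.
func hashPooled(r io.Reader) (string, error) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	buf := *bp

	h := sha256.New()
	for {
		n, err := r.Read(buf)
		if n > 0 {
			h.Write(buf[:n])
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashBytesPooled hashes an in-memory slice through the pooled path.
func hashBytesPooled(data []byte) (string, error) {
	return hashPooled(bytes.NewReader(data))
}

// hashDirect hashes data in one call, for comparison.
func hashDirect(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	data := bytes.Repeat([]byte("flashfs"), 500000) // about 3.3 MiB
	pooled, _ := hashBytesPooled(data)
	fmt.Println("pooled and direct digests match:", pooled == hashDirect(data))
}
```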

Example Usage

// Basic file hashing with default options
result := hash.File("path/to/file.txt", hash.DefaultOptions())
if result.Error != nil {
    log.Fatalf("Failed to hash file: %v", result.Error)
}
fmt.Printf("Hash: %s\n", result.Hash)

// Concurrent hashing of multiple files
files := []string{"file1.txt", "file2.txt", "file3.txt"}
options := hash.DefaultOptions()
options.Algorithm = hash.SHA256
results := hash.FilesConcurrent(files, options)
for path, result := range results {
    if result.Error != nil {
        fmt.Printf("Failed to hash %s: %v\n", path, result.Error)
        continue
    }
    fmt.Printf("%s: %s\n", path, result.Hash)
}