# Hash Module
The Hash module provides efficient and flexible file hashing capabilities for FlashFS. It supports multiple hashing algorithms, concurrent processing, and memory-efficient operations through a buffer pool.
## Overview
The Hash module is designed to:
- Compute hashes for files, byte slices, strings, and readers
- Support multiple hashing algorithms (BLAKE3, MD5, SHA1, SHA256)
- Optimize memory usage with a buffer pool
- Enable concurrent hashing of multiple files
- Provide partial hashing for large files to improve performance
## Key Components

### Algorithms
The module supports the following hashing algorithms:
- BLAKE3 (default): Fast and cryptographically secure
- MD5: Provided for compatibility (not cryptographically secure)
- SHA1: Provided for compatibility (not cryptographically secure)
- SHA256: Cryptographically secure, but slower than BLAKE3
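For illustration, the snippet below hashes the same string with each algorithm via `hash.String`. Only `hash.SHA256` is shown elsewhere on this page, so the other constant names (`hash.BLAKE3`, `hash.MD5`, `hash.SHA1`) are assumed here.

```go
// Sketch: hash the same input with each algorithm and compare the output.
// Constant names other than hash.SHA256 are assumed, not confirmed by this page.
for _, algo := range []hash.Algorithm{hash.BLAKE3, hash.MD5, hash.SHA1, hash.SHA256} {
	result := hash.String("hello, flashfs", algo)
	if result.Error != nil {
		log.Printf("hashing failed: %v", result.Error)
		continue
	}
	fmt.Printf("%v: %s\n", algo, result.Hash)
}
```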
### Options

Hashing behavior can be configured using the `Options` struct:
```go
type Options struct {
	Algorithm               Algorithm
	BufferSize              int
	SkipErrors              bool
	Concurrency             int
	UsePartialHashing       bool
	PartialHashingThreshold int64
}
```
- Algorithm: The hashing algorithm to use
- BufferSize: Size of the buffer used for reading files (default: 16MB)
- SkipErrors: If true, a failure produces an empty hash instead of an error
- Concurrency: Number of files to hash concurrently (0 = use all available CPUs)
- UsePartialHashing: Enables partial hashing for large files
- PartialHashingThreshold: File size threshold above which partial hashing is used (default: 10MB)
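A typical configuration starts from the defaults and overrides individual fields; for example:

```go
options := hash.DefaultOptions()     // BLAKE3 with a 16MB buffer by default
options.Algorithm = hash.SHA256      // switch to SHA-256
options.BufferSize = 4 * 1024 * 1024 // use a smaller 4MB read buffer
options.Concurrency = 4              // hash at most four files at a time
options.SkipErrors = true            // report failures as empty hashes rather than errors
```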
### Result

The `Result` struct represents the outcome of a hashing operation:
```go
type Result struct {
	Hash      string
	Error     error
	Algorithm Algorithm
	Size      int64
}
```
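Every hashing function returns a `Result`, so callers check `Error` before using the other fields. A short sketch:

```go
result := hash.File("path/to/file.txt", hash.DefaultOptions())
if result.Error != nil {
	log.Fatalf("Failed to hash file: %v", result.Error)
}
fmt.Printf("hash=%s size=%d algorithm=%v\n", result.Hash, result.Size, result.Algorithm)
```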
## Core Functions

### File Hashing
```go
// Hash a file using the specified options
result := hash.File(path, options)

// Hash multiple files concurrently
results := hash.FilesConcurrent(paths, options)

// Hash a large file using partial hashing
result = hash.PartialFile(path, options)
```
### Other Hashing Functions
```go
// Hash a byte slice
result := hash.Bytes(data, algorithm)

// Hash a string
result = hash.String(data, algorithm)

// Hash data from an io.Reader
result = hash.Reader(reader, algorithm)

// Verify a file against an expected hash
match, err := hash.Verify(path, expectedHash, options)
```
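For example, `Verify` can confirm that a file still matches a previously recorded checksum; the expected value below is a placeholder:

```go
expected := "<previously recorded hash>" // placeholder, e.g. loaded from a manifest
match, err := hash.Verify("path/to/file.bin", expected, hash.DefaultOptions())
if err != nil {
	log.Fatalf("Verification failed: %v", err)
}
if !match {
	log.Fatal("Hash mismatch: file contents have changed")
}
```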
## Partial Hashing
For large files, the module provides a partial hashing feature that samples portions of the file rather than reading the entire content:
- The first N bytes
- The middle N bytes
- The last N bytes
These samples are combined to create a representative hash of the file. This approach is significantly faster for very large files while still providing reliable change detection.
```go
// Enable partial hashing in options
options := hash.DefaultOptions()
options.UsePartialHashing = true
options.PartialHashingThreshold = 100 * 1024 * 1024 // 100MB

// Hash a file (will use partial hashing for files > 100MB)
result := hash.File(path, options)
```
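Conceptually, the sampling reads only three chunks and feeds them into a single hash state. The sketch below illustrates the idea; it is not the module's actual implementation, and the real chunk size, algorithm, and edge-case handling may differ (imports assumed: `crypto/sha256`, `encoding/hex`, `io`, `os`).

```go
// Illustrative only: hash the first, middle, and last sampleSize bytes of a file.
func partialHashSketch(path string, sampleSize int64) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return "", err
	}
	size := info.Size()

	h := sha256.New() // stand-in for whichever algorithm is configured
	if size <= 3*sampleSize {
		// Small file: hashing it in full is cheaper than sampling.
		if _, err := io.Copy(h, f); err != nil {
			return "", err
		}
		return hex.EncodeToString(h.Sum(nil)), nil
	}

	buf := make([]byte, sampleSize)
	for _, off := range []int64{0, (size - sampleSize) / 2, size - sampleSize} {
		n, err := f.ReadAt(buf, off)
		if err != nil && err != io.EOF {
			return "", err
		}
		h.Write(buf[:n])
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}
```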
## Performance Considerations
- Buffer Size: Larger buffers generally improve throughput but consume more memory
- Concurrency: Adjust based on available CPU cores and I/O capabilities
- Partial Hashing: Consider for very large files where full hashing would be prohibitively expensive
- Algorithm Choice: BLAKE3 offers the best performance among the secure algorithms
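As an illustration, two contrasting configurations might look like the following; the specific numbers are arbitrary examples, not recommendations from the module:

```go
// Many small files: lean buffers, rely on concurrency.
smallFiles := hash.DefaultOptions()
smallFiles.BufferSize = 1 * 1024 * 1024 // 1MB per-file buffer
smallFiles.Concurrency = 0              // 0 = use all available CPUs

// A few very large files: bigger buffer, sample instead of reading everything.
largeFiles := hash.DefaultOptions()
largeFiles.BufferSize = 32 * 1024 * 1024     // 32MB buffer for sequential throughput
largeFiles.UsePartialHashing = true          // enable sampling
largeFiles.PartialHashingThreshold = 1 << 30 // only sample files larger than 1GB
```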
## Example Usage
```go
// Basic file hashing with default options
result := hash.File("path/to/file.txt", hash.DefaultOptions())
if result.Error != nil {
	log.Fatalf("Failed to hash file: %v", result.Error)
}
fmt.Printf("Hash: %s\n", result.Hash)

// Concurrent hashing of multiple files
files := []string{"file1.txt", "file2.txt", "file3.txt"}
options := hash.DefaultOptions()
options.Algorithm = hash.SHA256
results := hash.FilesConcurrent(files, options)
for path, result := range results {
	if result.Error != nil {
		fmt.Printf("Failed to hash %s: %v\n", path, result.Error)
		continue
	}
	fmt.Printf("%s: %s\n", path, result.Hash)
}
```