Hash Module

The Hash module provides file hashing for FlashFS. It supports multiple hashing algorithms, concurrent hashing of multiple files, and a buffer pool that reuses read buffers to keep memory usage down.

Overview

The Hash module is designed to:

  1. Compute hashes for files, byte slices, strings, and readers
  2. Support multiple hashing algorithms (BLAKE3, MD5, SHA1, SHA256)
  3. Optimize memory usage with a buffer pool
  4. Enable concurrent hashing of multiple files
  5. Provide partial hashing for large files to improve performance

Key Components

Algorithms

The module supports the following hashing algorithms:

  • BLAKE3 (default): Fast and cryptographically secure
  • MD5: Provided for compatibility (not cryptographically secure)
  • SHA1: Provided for compatibility (not cryptographically secure)
  • SHA256: Cryptographically secure, but typically slower than BLAKE3
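As a rough illustration of selecting an algorithm by name, the sketch below maps names to constructors from the Go standard library. It is not the module's implementation: newHasher and hashHex are hypothetical helpers, and BLAKE3 is left out because the standard library does not ship it.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// newHasher is a hypothetical helper (not part of the FlashFS API).
// It returns a standard-library hasher for the given algorithm name.
// BLAKE3 is omitted because it requires a third-party package.
func newHasher(name string) (hash.Hash, error) {
	switch name {
	case "md5":
		return md5.New(), nil
	case "sha1":
		return sha1.New(), nil
	case "sha256":
		return sha256.New(), nil
	default:
		return nil, fmt.Errorf("unsupported algorithm %q", name)
	}
}

// hashHex returns the hex-encoded digest of data under the named algorithm.
func hashHex(name string, data []byte) (string, error) {
	h, err := newHasher(name)
	if err != nil {
		return "", err
	}
	h.Write(data) // hash.Hash documents that Write never returns an error
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	for _, name := range []string{"md5", "sha1", "sha256"} {
		sum, err := hashHex(name, []byte("abc"))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-6s %s\n", name, sum)
	}
}
```

Digest lengths differ by algorithm (16, 20, and 32 bytes here), so stored hashes should be kept together with the algorithm that produced them.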

Options

Hashing behavior can be configured using the Options struct:

type Options struct {
    Algorithm               Algorithm
    BufferSize              int
    SkipErrors              bool
    Concurrency             int
    UsePartialHashing       bool
    PartialHashingThreshold int64
}

  • Algorithm: The hashing algorithm to use
  • BufferSize: Size of the buffer used for reading files (default: 16MB)
  • SkipErrors: Controls failure handling: when set, a failure yields an empty hash instead of an error
  • Concurrency: Number of files to hash concurrently (0 = use all available CPUs)
  • UsePartialHashing: Enables partial hashing for large files
  • PartialHashingThreshold: File size threshold above which partial hashing is used (default: 10MB)
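The two rules stated above (a Concurrency of 0 means all CPUs, and partial hashing applies only above the threshold) could be resolved roughly as follows. This is a sketch under assumptions: settings is a hypothetical stand-in for a subset of Options, and treating negative Concurrency values like 0 is a choice made here, not documented behavior.

```go
package main

import (
	"fmt"
	"runtime"
)

// settings is a hypothetical stand-in for a subset of the Options fields.
type settings struct {
	Concurrency             int
	UsePartialHashing       bool
	PartialHashingThreshold int64
}

// effectiveConcurrency maps 0 to the number of logical CPUs.
// Assumption: negative values are treated the same way as 0.
func effectiveConcurrency(n int) int {
	if n <= 0 {
		return runtime.NumCPU()
	}
	return n
}

// usePartial reports whether a file of the given size would be sampled
// rather than read in full: the feature must be enabled and the size must
// be strictly above the threshold.
func usePartial(s settings, fileSize int64) bool {
	return s.UsePartialHashing && fileSize > s.PartialHashingThreshold
}

func main() {
	s := settings{
		Concurrency:             0,
		UsePartialHashing:       true,
		PartialHashingThreshold: 10 * 1024 * 1024,
	}
	fmt.Println("workers:", effectiveConcurrency(s.Concurrency))
	fmt.Println("1 MiB file sampled: ", usePartial(s, 1<<20))
	fmt.Println("64 MiB file sampled:", usePartial(s, 64<<20))
}
```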

Result

The Result struct represents the outcome of a hashing operation:

type Result struct {
    Hash      string
    Error     error
    Algorithm Algorithm
    Size      int64
}

Core Functions

File Hashing

// Hash a file using the specified options
result := hash.File(path, options)

// Hash multiple files concurrently
results := hash.FilesConcurrent(paths, options)

// Hash a large file using partial hashing
result := hash.PartialFile(path, options)
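To make the concurrent case concrete, here is a minimal worker-pool sketch built only on the standard library. It is not the module's code: hashFilesConcurrent, hashFile, and the result type are hypothetical, SHA-256 stands in for the configured algorithm, and returning a map keyed by path is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

// result is a hypothetical, reduced version of the Result struct.
type result struct {
	Hash string
	Err  error
	Size int64
}

// hashFile streams one file through SHA-256.
func hashFile(path string) result {
	f, err := os.Open(path)
	if err != nil {
		return result{Err: err}
	}
	defer f.Close()
	h := sha256.New()
	n, err := io.Copy(h, f)
	if err != nil {
		return result{Err: err, Size: n}
	}
	return result{Hash: hex.EncodeToString(h.Sum(nil)), Size: n}
}

// hashFilesConcurrent hashes paths with a fixed number of workers and
// returns one result per path, including paths that failed.
func hashFilesConcurrent(paths []string, workers int) map[string]result {
	if workers <= 0 {
		workers = runtime.NumCPU()
	}
	jobs := make(chan string)
	results := make(map[string]result, len(paths))
	var mu sync.Mutex // guards results
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				r := hashFile(p)
				mu.Lock()
				results[p] = r
				mu.Unlock()
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	dir, err := os.MkdirTemp("", "hashdemo")
	if err != nil {
		fmt.Println("temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	var paths []string
	for i, content := range []string{"alpha", "beta"} {
		p := filepath.Join(dir, fmt.Sprintf("file%d.txt", i))
		if err := os.WriteFile(p, []byte(content), 0o600); err != nil {
			fmt.Println("write:", err)
			return
		}
		paths = append(paths, p)
	}
	paths = append(paths, filepath.Join(dir, "missing.txt")) // exercises the error path

	for p, r := range hashFilesConcurrent(paths, 2) {
		if r.Err != nil {
			fmt.Printf("%s: error: %v\n", filepath.Base(p), r.Err)
			continue
		}
		fmt.Printf("%s: %s (%d bytes)\n", filepath.Base(p), r.Hash, r.Size)
	}
}
```

Per-file errors are recorded in the result rather than aborting the batch, so one unreadable file does not discard the hashes of the others.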

Other Hashing Functions

// Hash a byte slice
result := hash.Bytes(data, algorithm)

// Hash a string
result := hash.String(data, algorithm)

// Hash data from an io.Reader
result := hash.Reader(reader, algorithm)

// Verify a file against an expected hash
match, err := hash.Verify(path, expectedHash, options)
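Verification amounts to recomputing the hash and comparing it with the expected value. The sketch below shows one way to do that with the standard library. verifyReader and verifyString are hypothetical helpers, SHA-256 stands in for the configured algorithm, and accepting upper-case hex and surrounding whitespace is a choice made here rather than documented behavior of hash.Verify.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// verifyReader hashes everything read from r and compares the digest with
// expectedHex. Decoding the expected value first makes the comparison
// independent of hex letter case.
func verifyReader(r io.Reader, expectedHex string) (bool, error) {
	want, err := hex.DecodeString(strings.TrimSpace(expectedHex))
	if err != nil {
		return false, fmt.Errorf("invalid expected hash: %w", err)
	}
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return false, err
	}
	// ConstantTimeCompare returns 0 when the lengths differ.
	return subtle.ConstantTimeCompare(h.Sum(nil), want) == 1, nil
}

// verifyString is a convenience wrapper for in-memory data.
func verifyString(data, expectedHex string) (bool, error) {
	return verifyReader(strings.NewReader(data), expectedHex)
}

func main() {
	const want = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
	ok, err := verifyString("abc", want)
	fmt.Println("abc matches:", ok, err)
	ok, err = verifyString("abd", want)
	fmt.Println("abd matches:", ok, err)
}
```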

Partial Hashing

For large files, the module can hash a sample of the file instead of reading its entire content. Three regions are read:

  • The first N bytes
  • The middle N bytes
  • The last N bytes

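These samples are combined into a single hash that stands in for the full-file hash. This is significantly faster for very large files, but it only detects changes that affect a sampled region. A modification confined to unsampled bytes can go unnoticed, so a partial hash is suited to quick change detection rather than integrity verification.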
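The sampling scheme described above can be sketched as follows. This is an illustration, not the module's algorithm: partialHash and partialHashBytes are hypothetical, SHA-256 stands in for the configured algorithm, and two details are assumptions made here: inputs no larger than three samples are hashed in full, and the total size is mixed into the hash.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
)

// window is a byte range [off, off+n) to feed into the hash.
type window struct{ off, n int64 }

// partialHash hashes the first, middle, and last sampleSize bytes of r.
// Assumptions: small inputs are hashed in full, and the size is appended
// so that growth or truncation changes the result.
func partialHash(r io.ReaderAt, size, sampleSize int64) (string, error) {
	if size < 0 || sampleSize <= 0 {
		return "", errors.New("size must be >= 0 and sampleSize must be > 0")
	}
	windows := []window{{0, size}}
	if size > 3*sampleSize {
		windows = []window{
			{0, sampleSize},
			{(size - sampleSize) / 2, sampleSize},
			{size - sampleSize, sampleSize},
		}
	}
	h := sha256.New()
	for _, w := range windows {
		if _, err := io.Copy(h, io.NewSectionReader(r, w.off, w.n)); err != nil {
			return "", err
		}
	}
	var sz [8]byte
	binary.BigEndian.PutUint64(sz[:], uint64(size))
	h.Write(sz[:])
	return hex.EncodeToString(h.Sum(nil)), nil
}

// partialHashBytes is a convenience wrapper for in-memory data.
func partialHashBytes(data []byte, sampleSize int64) (string, error) {
	return partialHash(bytes.NewReader(data), int64(len(data)), sampleSize)
}

func main() {
	data := make([]byte, 100) // windows are [0,10), [45,55), [90,100)
	base, _ := partialHashBytes(data, 10)

	data[5] = 1 // inside the first window
	inside, _ := partialHashBytes(data, 10)
	data[5] = 0

	data[30] = 1 // between the first and middle windows
	outside, _ := partialHashBytes(data, 10)

	fmt.Println("change inside a window detected: ", base != inside)
	fmt.Println("change outside windows detected:", base != outside)
}
```

The second comparison in main shows the trade-off directly: a byte changed between two windows leaves the partial hash unchanged.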

// Enable partial hashing in options
options := hash.DefaultOptions()
options.UsePartialHashing = true
options.PartialHashingThreshold = 100 * 1024 * 1024 // 100MB

// Hash a file (will use partial hashing for files > 100MB)
result := hash.File(path, options)

Performance Considerations

  • Buffer Size: Larger buffers generally improve throughput but consume more memory
  • Concurrency: Adjust based on available CPU cores and I/O capabilities
  • Partial Hashing: Consider for very large files where full hashing would be prohibitively expensive
  • Algorithm Choice: BLAKE3 offers the best performance among the secure algorithms
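The buffer-size trade-off is easier to see next to a pooled read loop. The sketch below reuses read buffers through sync.Pool so that each call does not allocate a new one. It is an assumption about how a buffer pool might look, not the module's implementation: bufPool, hashReader, and hashString are hypothetical, the 64 KiB buffer is chosen for the example (the documented default is 16MB), and SHA-256 stands in for the configured algorithm.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
	"sync"
)

// bufferSize is an illustrative value, smaller than the documented default.
const bufferSize = 64 * 1024

// bufPool hands out reusable read buffers. Pointers to slices are stored
// so that Put does not allocate when boxing the value into an interface.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, bufferSize)
		return &b
	},
}

// hashReader reads r through a pooled buffer and returns the hex digest
// and the number of bytes hashed.
func hashReader(r io.Reader) (string, int64, error) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	buf := *bp

	h := sha256.New()
	var total int64
	for {
		n, err := r.Read(buf)
		if n > 0 {
			h.Write(buf[:n])
			total += int64(n)
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", total, err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), total, nil
}

// hashString is a convenience wrapper for in-memory data.
func hashString(s string) (string, int64, error) {
	return hashReader(strings.NewReader(s))
}

func main() {
	sum, n, err := hashString("abc")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s (%d bytes)\n", sum, n)
}
```

With a pool, peak memory is roughly the buffer size multiplied by the number of concurrent hashers, which is why BufferSize and Concurrency are best tuned together.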

Example Usage

// Basic file hashing with default options
result := hash.File("path/to/file.txt", hash.DefaultOptions())
if result.Error != nil {
    log.Fatalf("Failed to hash file: %v", result.Error)
}
fmt.Printf("Hash: %s\n", result.Hash)

// Concurrent hashing of multiple files
files := []string{"file1.txt", "file2.txt", "file3.txt"}
options := hash.DefaultOptions()
options.Algorithm = hash.SHA256
results := hash.FilesConcurrent(files, options)
for path, result := range results {
    if result.Error != nil {
        fmt.Printf("Failed to hash %s: %v\n", path, result.Error)
        continue
    }
    fmt.Printf("%s: %s\n", path, result.Hash)
}