FlashFS Documentation
FlashFS is a high-performance file system snapshot and diff tool for tracking and managing changes to file systems over time.
Overview
FlashFS allows you to:
- Create snapshots of your file system's state at a point in time
- Compare snapshots to identify exactly what has changed
- Generate and apply diffs between snapshots for efficient incremental backups
- Manage snapshot lifecycle with flexible expiry policies
- Query snapshots to find specific files or directories
FlashFS is optimized for performance, with features like parallel processing, efficient binary serialization, content-based deduplication, and intelligent caching.
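As a rough illustration of what comparing two snapshots involves, the sketch below models a snapshot as a map from file path to content hash and classifies each path as added, modified, or removed. The `Snapshot` and `Diff` types and the `diffSnapshots` function are hypothetical names chosen for this example; they are not FlashFS's actual API or snapshot format.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot maps a file path to a content hash. This is a simplified,
// hypothetical model, not the real FlashFS snapshot format.
type Snapshot map[string]string

// Diff lists the paths that differ between two snapshots.
type Diff struct {
	Added, Modified, Removed []string
}

// diffSnapshots compares base against target and classifies every path.
func diffSnapshots(base, target Snapshot) Diff {
	var d Diff
	for path, hash := range target {
		old, ok := base[path]
		switch {
		case !ok:
			d.Added = append(d.Added, path)
		case old != hash:
			d.Modified = append(d.Modified, path)
		}
	}
	for path := range base {
		if _, ok := target[path]; !ok {
			d.Removed = append(d.Removed, path)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.Added)
	sort.Strings(d.Modified)
	sort.Strings(d.Removed)
	return d
}

func main() {
	base := Snapshot{"a.txt": "h1", "b.txt": "h2", "c.txt": "h3"}
	target := Snapshot{"a.txt": "h1", "b.txt": "h9", "d.txt": "h4"}
	d := diffSnapshots(base, target)
	fmt.Printf("added=%v modified=%v removed=%v\n", d.Added, d.Modified, d.Removed)
}
```

Storing only the `Diff` plus the content of added and modified files is the basic idea behind an incremental backup: unchanged files never need to be copied again.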
Key Features
- Streaming Directory Walker: Process files as they're discovered for improved memory efficiency and responsiveness
- Multiple Hashing Algorithms: Support for BLAKE3, MD5, SHA-1, and SHA-256
- Partial Hashing: Efficiently hash large files by sampling portions of the content
- Concurrent Processing: Utilize multiple CPU cores for faster operations
- Progress Reporting: Real-time feedback during long-running operations
- Memory Efficiency: Designed to handle very large directory structures with minimal memory footprint
- Flexible Configuration: Extensive options for customizing behavior
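To make the partial hashing idea concrete, here is a minimal sketch that hashes the file size plus three fixed-size windows (start, middle, end) instead of the whole file. The window layout, the function names `partialHash` and `partialHashBytes`, and the use of SHA-256 are assumptions made for this example (SHA-256 is used because it ships with the Go standard library, while BLAKE3 needs a third-party module); FlashFS's actual sampling strategy may differ.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"io"
)

// partialHash hashes the size plus up to three sampleSize-byte windows of r.
// Inputs no larger than 3*sampleSize are hashed in full. It assumes
// sampleSize > 0. An *os.File satisfies io.ReaderAt, so it can be passed in.
func partialHash(r io.ReaderAt, size, sampleSize int64) (string, error) {
	h := sha256.New()

	// Mix in the size so inputs of different lengths never share a hash
	// just because their sampled windows match.
	var sizeBuf [8]byte
	binary.LittleEndian.PutUint64(sizeBuf[:], uint64(size))
	h.Write(sizeBuf[:])

	// Each window is an {offset, length} pair.
	windows := [][2]int64{{0, size}}
	if size > 3*sampleSize {
		windows = [][2]int64{
			{0, sampleSize},
			{size/2 - sampleSize/2, sampleSize},
			{size - sampleSize, sampleSize},
		}
	}
	for _, w := range windows {
		// SectionReader reads a bounded window via ReadAt.
		if _, err := io.Copy(h, io.NewSectionReader(r, w[0], w[1])); err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// partialHashBytes is a convenience wrapper for in-memory data.
func partialHashBytes(data []byte, sampleSize int64) (string, error) {
	return partialHash(bytes.NewReader(data), int64(len(data)), sampleSize)
}

func main() {
	data := bytes.Repeat([]byte("flashfs"), 100000) // 700,000 bytes of sample data
	sum, err := partialHashBytes(data, 4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sum)
}
```

The trade-off is visible in the design: a change that falls entirely outside the sampled windows and leaves the size unchanged goes undetected, so a partial hash is best treated as a fast pre-check rather than proof of identical content.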
Key Components
FlashFS consists of several modular components, each responsible for a specific aspect of the system:
| Component | Description |
|---|---|
| CLI | Command-line interface for interacting with FlashFS |
| Walker | Traverses file systems to collect metadata |
| Serializer | Converts metadata to efficient binary format |
| Schema | Defines data structures using FlatBuffers |
| Storage | Manages storage and retrieval of snapshots and diffs |
| Diff Computation | Computes and applies differences between snapshots |
| Expiry Policy | Manages snapshot lifecycle and cleanup |
| Hash | Provides efficient file hashing with multiple algorithms |
| Buffer Pool | Optimizes memory usage for I/O operations |
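The Buffer Pool row describes a common Go pattern: reusing byte slices across I/O calls instead of allocating a new one each time. The sketch below shows one way to build such a pool on top of the standard library's `sync.Pool`. The `BufferPool` type, its methods, and `copyWithPool` are hypothetical names for illustration and are not taken from the FlashFS code base.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// BufferPool hands out fixed-size byte slices and reuses them.
// Hypothetical type for illustration only.
type BufferPool struct {
	pool sync.Pool
	size int
}

// NewBufferPool creates a pool of buffers of the given size (size > 0).
func NewBufferPool(size int) *BufferPool {
	bp := &BufferPool{size: size}
	bp.pool.New = func() any {
		b := make([]byte, size)
		return &b // pooling a pointer avoids an extra allocation on Put
	}
	return bp
}

// Get returns a buffer of length bp.size.
func (bp *BufferPool) Get() *[]byte {
	return bp.pool.Get().(*[]byte)
}

// Put returns a buffer to the pool, restoring its full length first.
func (bp *BufferPool) Put(b *[]byte) {
	if cap(*b) != bp.size {
		return // drop buffers that did not come from this pool
	}
	*b = (*b)[:bp.size]
	bp.pool.Put(b)
}

// copyWithPool copies src to dst, lending io.CopyBuffer a pooled buffer.
// Note that io.CopyBuffer skips the buffer when src implements io.WriterTo
// or dst implements io.ReaderFrom.
func copyWithPool(bp *BufferPool, dst io.Writer, src io.Reader) (int64, error) {
	buf := bp.Get()
	defer bp.Put(buf)
	return io.CopyBuffer(dst, src, *buf)
}

func main() {
	bp := NewBufferPool(32 * 1024)
	n, err := copyWithPool(bp, io.Discard, strings.NewReader("hello, flashfs"))
	fmt.Println(n, err)
}
```

Because `sync.Pool` may discard idle items during garbage collection, a pool like this reduces allocation pressure but does not guarantee that a given buffer is ever reused.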
Getting Started
Installation
go install github.com/TFMV/flashfs@latest
Basic Usage
Create a snapshot:
flashfs snapshot create --source /path/to/directory --output my-snapshot.snap
Compare two snapshots:
flashfs diff --base snapshot1.snap --target snapshot2.snap --output changes.diff
Apply a diff to generate a new snapshot:
flashfs apply --base snapshot1.snap --diff changes.diff --output snapshot2.snap
List snapshots:
flashfs snapshot list --dir /path/to/snapshots
Advanced Features
FlashFS includes advanced features such as:
- Bloom filters for rapid change detection
- Content-based deduplication using BLAKE3 hashing
- Parallel processing for faster snapshot and diff operations
- Configurable compression using zstd
- Efficient caching for frequently accessed snapshots
- Query capabilities for finding specific files in snapshots
- Partial hashing for efficient processing of large files
- Memory-efficient buffer pools for optimized I/O operations
- Streaming serialization for handling very large snapshots
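To show how a Bloom filter can speed up change detection, the sketch below builds a small filter over entries from a base snapshot and then tests entries from a target snapshot against it. A negative answer is definitive (the entry is new or changed), while a positive answer only means "probably unchanged" and still needs an exact check. The `Bloom` type, the `path|hash` key format, and the FNV-based double hashing are assumptions made for this example, not FlashFS internals.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a minimal Bloom filter with m bits and k hash functions.
// Hypothetical type for illustration; m and k must both be > 0.
type Bloom struct {
	bits []uint64
	m, k uint64
}

func NewBloom(m, k uint64) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// indexes derives k bit positions from two base hashes (double hashing):
// position_i = (h1 + i*h2) mod m.
func (b *Bloom) indexes(key string) []uint64 {
	a := fnv.New64a()
	a.Write([]byte(key))
	c := fnv.New64()
	c.Write([]byte(key))
	h1, h2 := a.Sum64(), c.Sum64()|1 // force h2 odd so steps are never zero

	idx := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		idx[i] = (h1 + i*h2) % b.m
	}
	return idx
}

// Add records key in the filter.
func (b *Bloom) Add(key string) {
	for _, i := range b.indexes(key) {
		b.bits[i/64] |= 1 << (i % 64)
	}
}

// MayContain reports false only if key was definitely never added.
func (b *Bloom) MayContain(key string) bool {
	for _, i := range b.indexes(key) {
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	base := NewBloom(1<<16, 4)
	base.Add("docs/readme.md|h1")
	base.Add("src/main.go|h2")

	for _, entry := range []string{"docs/readme.md|h1", "src/main.go|h7"} {
		if base.MayContain(entry) {
			fmt.Println(entry, "-> possibly unchanged, verify exactly")
		} else {
			fmt.Println(entry, "-> definitely new or changed")
		}
	}
}
```

The filter never produces false negatives, so using it as a pre-filter cannot hide a real change; false positives only cost an extra exact comparison.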
Performance
FlashFS is designed for high performance:
- Fast snapshot creation with parallel file system traversal
- Efficient binary serialization using FlatBuffers
- Minimal memory usage with streaming processing
- Quick diff computation using Bloom filters for pre-filtering
- Configurable zstd compression to reduce snapshot storage size
- High-performance hashing with BLAKE3 and buffer pooling
- Memory-efficient I/O with reusable buffer pools
- Chunked streaming for processing very large datasets
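The combination of streaming traversal and concurrent processing listed above can be sketched with standard Go primitives: one goroutine walks the tree and sends paths over a channel as they are discovered, while a pool of workers collects metadata in parallel and hands each result to a callback. The names `FileMeta` and `walkParallel`, the channel sizes, and the choice to stop on the first traversal error are assumptions for this example rather than a description of the FlashFS walker.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sync"
)

// FileMeta is a hypothetical, minimal metadata record.
type FileMeta struct {
	Path string
	Size int64
}

// walkParallel streams regular files under root to handle. Paths are
// produced by one walker goroutine and stat'ed by a pool of workers, so no
// full file list is ever held in memory. handle runs on the caller's goroutine.
func walkParallel(root string, workers int, handle func(FileMeta)) error {
	if workers < 1 {
		workers = 1
	}
	paths := make(chan string, 256)
	walkErr := make(chan error, 1)

	go func() {
		defer close(paths)
		walkErr <- filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
			if err != nil {
				return err // abort on the first traversal error
			}
			if d.Type().IsRegular() {
				paths <- p
			}
			return nil
		})
	}()

	results := make(chan FileMeta, 256)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				info, err := os.Lstat(p)
				if err != nil {
					continue // file vanished between discovery and stat
				}
				results <- FileMeta{Path: p, Size: info.Size()}
			}
		}()
	}
	// Close results once every worker has drained the paths channel.
	go func() {
		wg.Wait()
		close(results)
	}()

	for m := range results {
		handle(m)
	}
	return <-walkErr
}

func main() {
	var total int64
	err := walkParallel(".", 4, func(m FileMeta) { total += m.Size })
	fmt.Println("total bytes:", total, "error:", err)
}
```

Bounded channels provide backpressure: if the consumer is slow, the walker pauses instead of queueing an unbounded number of paths.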
Use Cases
- Backup systems: Create incremental backups by storing only changes
- File synchronization: Identify differences between systems efficiently
- Change monitoring: Track file system changes over time
- Deployment verification: Ensure consistency across deployed systems
- Data migration: Track changes during migration processes
Documentation
For details on each component, see its dedicated documentation page, linked from the Key Components section above.
For more on advanced features and capabilities, see:
- Diff Computation: Learn how FlashFS efficiently computes and applies differences between snapshots
- Expiry Policy: Understand how to manage snapshot lifecycle and cleanup
- Streaming Processing: Discover how FlashFS handles very large datasets with minimal memory footprint
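As a simple illustration of the kind of rule an expiry policy can express, the sketch below always retains the newest `KeepLast` snapshots and marks any remaining snapshot older than `MaxAge` for deletion. The `SnapshotInfo` and `Policy` types, the `expired` function, and the `at` helper are hypothetical, and this particular retention rule is an assumption; the policy options FlashFS actually supports are described on its Expiry Policy page.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// SnapshotInfo is a hypothetical record describing one stored snapshot.
type SnapshotInfo struct {
	Name    string
	Created time.Time
}

// Policy is a hypothetical retention rule: always keep the newest KeepLast
// snapshots, and delete any other snapshot older than MaxAge.
type Policy struct {
	KeepLast int
	MaxAge   time.Duration
}

// at builds a timestamp from Unix seconds to keep the example short.
func at(unix int64) time.Time { return time.Unix(unix, 0) }

// expired returns the names of snapshots the policy would delete,
// ordered from newest to oldest. The input slice is not modified.
func expired(snaps []SnapshotInfo, p Policy, now time.Time) []string {
	sorted := append([]SnapshotInfo(nil), snaps...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Created.After(sorted[j].Created)
	})

	var out []string
	for i, s := range sorted {
		if i < p.KeepLast {
			continue // the newest KeepLast snapshots are always retained
		}
		if now.Sub(s.Created) > p.MaxAge {
			out = append(out, s.Name)
		}
	}
	return out
}

func main() {
	now := time.Now()
	snaps := []SnapshotInfo{
		{Name: "daily-1", Created: now.Add(-1 * time.Hour)},
		{Name: "daily-2", Created: now.Add(-48 * time.Hour)},
		{Name: "daily-3", Created: now.Add(-30 * 24 * time.Hour)},
	}
	policy := Policy{KeepLast: 1, MaxAge: 7 * 24 * time.Hour}
	fmt.Println("to delete:", expired(snaps, policy, now))
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside `expired`, keeps the rule deterministic and easy to test.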