FlashFS Documentation

FlashFS is a high-performance file system snapshot and diff tool for tracking and managing changes to file systems over time.

Overview

FlashFS allows you to:

  • Create snapshots of your file system's state at a point in time
  • Compare snapshots to identify exactly what has changed
  • Generate and apply diffs between snapshots for efficient incremental backups
  • Manage snapshot lifecycle with flexible expiry policies
  • Query snapshots to find specific files or directories

FlashFS is optimized for performance, with features like parallel processing, efficient binary serialization, content-based deduplication, and intelligent caching.
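To make the compare step listed above concrete, here is a minimal Go sketch that classifies paths as added, modified, or deleted between two snapshots. The `Snapshot`, `Diff`, and `Compare` names are hypothetical, and the in-memory map of path to content hash is an assumption for illustration; it is not the FlashFS on-disk format or API.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot maps a file path to a content hash. This is a hypothetical
// in-memory stand-in, not the FlashFS snapshot format.
type Snapshot map[string]string

// Diff lists the paths that differ between two snapshots.
type Diff struct {
	Added, Modified, Deleted []string
}

// Compare reports what changed going from base to target.
func Compare(base, target Snapshot) Diff {
	var d Diff
	for path, hash := range target {
		old, ok := base[path]
		switch {
		case !ok:
			d.Added = append(d.Added, path)
		case old != hash:
			d.Modified = append(d.Modified, path)
		}
	}
	for path := range base {
		if _, ok := target[path]; !ok {
			d.Deleted = append(d.Deleted, path)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.Added)
	sort.Strings(d.Modified)
	sort.Strings(d.Deleted)
	return d
}

func main() {
	base := Snapshot{"a.txt": "h1", "b.txt": "h2", "c.txt": "h3"}
	target := Snapshot{"a.txt": "h1", "b.txt": "h9", "d.txt": "h4"}
	d := Compare(base, target)
	fmt.Println("added:", d.Added)
	fmt.Println("modified:", d.Modified)
	fmt.Println("deleted:", d.Deleted)
}
```

Storing only the three lists, rather than a full second snapshot, is what makes a diff-based incremental backup smaller than a full copy.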

Key Features

  • Streaming Directory Walker: Process files as they're discovered for improved memory efficiency and responsiveness
  • Multiple Hashing Algorithms: Support for BLAKE3, MD5, SHA1, and SHA256
  • Partial Hashing: Efficiently hash large files by sampling portions of the content
  • Concurrent Processing: Utilize multiple CPU cores for faster operations
  • Progress Reporting: Real-time feedback during long-running operations
  • Memory Efficiency: Designed to handle very large directory structures with a minimal memory footprint
  • Flexible Configuration: Extensive options for customizing behavior

Key Components

FlashFS consists of several modular components, each responsible for a specific aspect of the system:

Component         Description
CLI               Command-line interface for interacting with FlashFS
Walker            Traverses file systems to collect metadata
Serializer        Converts metadata to efficient binary format
Schema            Defines data structures using FlatBuffers
Storage           Manages storage and retrieval of snapshots and diffs
Diff Computation  Computes and applies differences between snapshots
Expiry Policy     Manages snapshot lifecycle and cleanup
Hash              Provides efficient file hashing with multiple algorithms
Buffer Pool       Optimizes memory usage for I/O operations

Getting Started

Installation

go install github.com/TFMV/flashfs@latest

Basic Usage

Create a snapshot:

flashfs snapshot create --source /path/to/directory --output my-snapshot.snap

Compare two snapshots:

flashfs diff --base snapshot1.snap --target snapshot2.snap --output changes.diff

Apply a diff to generate a new snapshot:

flashfs apply --base snapshot1.snap --diff changes.diff --output snapshot2.snap

List snapshots:

flashfs snapshot list --dir /path/to/snapshots

Advanced Features

FlashFS includes advanced features like:

  • Bloom filters for rapid change detection
  • Content-based deduplication using BLAKE3 hashing
  • Parallel processing for faster snapshot and diff operations
  • Configurable compression using zstd
  • Efficient caching for frequently accessed snapshots
  • Query capabilities for finding specific files in snapshots
  • Partial hashing for efficient processing of large files
  • Memory-efficient buffer pools for optimized I/O operations
  • Streaming serialization for handling very large snapshots

Performance

FlashFS is designed for high performance:

  • Fast snapshot creation with parallel file system traversal
  • Efficient binary serialization using FlatBuffers
  • Minimal memory usage with streaming processing
  • Quick diff computation using Bloom filters for pre-filtering
  • Compact storage through configurable zstd compression
  • High-performance hashing with BLAKE3 and buffer pooling
  • Memory-efficient I/O with reusable buffer pools
  • Chunked streaming for processing very large datasets
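The Bloom filter pre-filtering mentioned above can be sketched as follows. This is a minimal illustration, not FlashFS's implementation: the `Bloom` type, its sizing, the FNV-based double hashing, and the "path:hash" key format are all assumptions. The idea is to load the base snapshot's keys into the filter, then probe each target key: a negative answer proves the entry is new or modified, while a positive answer still needs an exact comparison.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a fixed-size Bloom filter with k probes per key.
type Bloom struct {
	bits []uint64
	m    uint64 // number of bits, must be > 0
	k    uint64 // number of probes per key
}

// NewBloom allocates a filter with m bits and k probes.
func NewBloom(m, k uint64) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// probes derives two base hashes from one 64-bit FNV-1a digest
// (double hashing); the second is forced odd so it is never zero.
func (b *Bloom) probes(key string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	return sum & 0xffffffff, (sum >> 32) | 1
}

// Add sets the k bits selected for key.
func (b *Bloom) Add(key string) {
	h1, h2 := b.probes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// MayContain returns false only if key was definitely never added.
// A true result can be a false positive.
func (b *Bloom) MayContain(key string) bool {
	h1, h2 := b.probes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	// Keys combine path and content hash (hypothetical format).
	base := []string{"a.txt:h1", "b.txt:h2"}
	target := []string{"a.txt:h1", "b.txt:h9", "c.txt:h3"}

	f := NewBloom(1024, 3)
	for _, k := range base {
		f.Add(k)
	}
	for _, k := range target {
		if f.MayContain(k) {
			fmt.Println(k, "possibly unchanged, needs exact check")
		} else {
			fmt.Println(k, "definitely new or modified")
		}
	}
}
```

Since the filter never produces false negatives, skipping the exact comparison for entries it rejects cannot miss a change; false positives only cost an extra lookup.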

Use Cases

  • Backup systems: Create incremental backups by storing only changes
  • File synchronization: Identify differences between systems efficiently
  • Change monitoring: Track file system changes over time
  • Deployment verification: Ensure consistency across deployed systems
  • Data migration: Track changes during migration processes

Documentation

For detailed information on each component, please refer to the dedicated documentation pages linked in the Key Components section above.

For information on advanced features and capabilities:

  • Diff Computation: Learn how FlashFS efficiently computes and applies differences between snapshots
  • Expiry Policy: Understand how to manage snapshot lifecycle and cleanup
  • Streaming Processing: Discover how FlashFS handles very large datasets with a minimal memory footprint