Serializer

The serializer package provides efficient serialization and deserialization of FlashFS data structures, particularly snapshots and diffs. It uses FlatBuffers as the underlying serialization format, which offers memory-efficient zero-copy deserialization.

Features

  • Memory Efficiency: Uses FlatBuffers for zero-copy deserialization
  • Builder Reuse: Optimizes memory usage by reusing FlatBuffers builders
  • Streaming Support: Handles large datasets that don't fit in memory
  • Progress Tracking: Provides chunk-based progress reporting
  • Parallel Processing: Supports concurrent processing of chunks

Core Components

Standard Serialization

The standard serialization functions handle snapshots and diffs that fit entirely in memory (a round-trip sketch follows the list):

  • SerializeSnapshotFB: Serializes a snapshot to a byte array
  • DeserializeSnapshotFB: Deserializes a snapshot from a byte array
  • SerializeDiffFB: Serializes a diff to a byte array
  • DeserializeDiffFB: Deserializes a diff from a byte array
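
As a minimal round-trip sketch, the snapshot functions can be used like this (the entries value and the return shape of DeserializeSnapshotFB are assumptions for illustration, not confirmed signatures; diffs follow the same pattern with the diff functions):

// Serialize an in-memory snapshot; nil selects the builder pool (see below)
data, err := serializer.SerializeSnapshotFB(entries, nil)
if err != nil {
    return err
}

// Decode it back; the return shape here is assumed for illustration
decoded, err := serializer.DeserializeSnapshotFB(data)
if err != nil {
    return err
}
_ = decoded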

Streaming Serialization

For large datasets, the streaming serialization components break data into manageable chunks:

  • StreamingSerializer: Serializes and deserializes snapshots in chunks
  • StreamingDiffSerializer: Serializes and deserializes diffs in chunks

Memory Optimization

Builder Reuse

The serializer supports reusing FlatBuffers builders to reduce memory allocations:

// Create a builder once
builder := flatbuffers.NewBuilder(0)

// Reuse it for multiple serializations
for _, snapshot := range snapshots {
    data, err := serializer.SerializeSnapshotFB(snapshot, builder)
    if err != nil {
        return err
    }
    // Process data...
    _ = data
}

Builder Pool

When nil is passed as the builder parameter, the serializer automatically uses a builder pool:

// This will get a builder from the pool and return it when done
data, err := serializer.SerializeSnapshotFB(entries, nil)

Streaming Serialization

Configuration

The streaming serializer can be configured with options:

options := serializer.StreamingOptions{
    ChunkSize:  5000,       // entries per chunk
    BufferSize: 128 * 1024, // 128 KB I/O buffer
}
ss := serializer.NewStreamingSerializer(options)

Writing to Storage

// Serialize to any io.Writer (file, network connection, etc.)
err := ss.SerializeToWriter(entries, writer)
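
For example, streaming a snapshot to a file could look like the following sketch (the file name and surrounding error handling are illustrative):

// Write a snapshot stream to a local file
f, err := os.Create("snapshot.ffs") // hypothetical path
if err != nil {
    return err
}
defer f.Close()

if err := ss.SerializeToWriter(entries, f); err != nil {
    return err
}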

Reading from Storage

// Deserialize from a reader, invoking the callback once per chunk
err := ss.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    // Process chunk.Entries
    // Track progress with chunk.Index and chunk.Total
    return nil
})
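
Since the callback sees one chunk at a time, aggregate state lives outside it. A trivial sketch that counts entries without holding them all in memory:

// Count entries across all chunks
var total int
err := ss.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    total += len(chunk.Entries)
    return nil
})
if err != nil {
    return err
}
fmt.Printf("snapshot contains %d entries\n", total)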

Progress Tracking

The streaming serializer provides progress information through chunk metadata:

err := ss.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    // Update progress
    progress := float64(chunk.Index) / float64(chunk.Total) * 100
    fmt.Printf("Progress: %.2f%% (Chunk %d/%d)\n",
        progress, chunk.Index, chunk.Total)
    return nil
})

Parallel Processing

Chunks can be processed in parallel for improved performance:

// Create a worker pool
var wg sync.WaitGroup
chunkChan := make(chan serializer.SnapshotChunk, 10)

// Start one worker goroutine per CPU
for i := 0; i < runtime.NumCPU(); i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for chunk := range chunkChan {
            // Process chunk in parallel
            processChunk(chunk)
        }
    }()
}

// Feed chunks to the workers
err := ss.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    chunkChan <- chunk
    return nil
})
close(chunkChan)

// Wait for all workers to finish, then surface any read error
wg.Wait()
if err != nil {
    return err
}

Performance Considerations

  1. Builder Reuse: Always reuse builders for multiple serializations to reduce allocations.

  2. Chunk Size: For streaming serialization, adjust the chunk size based on your memory constraints and processing needs. Larger chunks are more efficient but use more memory.

  3. Buffer Size: The buffer size affects I/O performance. Larger buffers generally improve performance but use more memory.

  4. Parallel Processing: For large datasets, consider processing chunks in parallel during deserialization.

Use Cases

Efficient Storage

The serializer is optimized for efficient storage of filesystem metadata:

  • Snapshots: Store the state of a filesystem at a point in time
  • Diffs: Store the changes between two snapshots

Network Transfer

The streaming serializer is ideal for transferring large datasets over a network (see the sketch after this list):

  • Chunked Transfer: Send data in manageable chunks
  • Progress Reporting: Track transfer progress
  • Resumable Transfers: Potentially resume interrupted transfers
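
Because SerializeToWriter writes to any writer, a network connection can be the destination directly. A sketch, reusing the ss serializer configured above (the endpoint is illustrative):

// Stream a snapshot over TCP
conn, err := net.Dial("tcp", "backup.example.com:9000") // hypothetical endpoint
if err != nil {
    return err
}
defer conn.Close()

if err := ss.SerializeToWriter(entries, conn); err != nil {
    return err
}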

Large Dataset Processing

For very large filesystems, the streaming serializer enables processing that wouldn't fit in memory (a low-memory configuration sketch follows the list):

  • Incremental Processing: Process data as it becomes available
  • Parallel Processing: Process chunks concurrently
  • Memory Efficiency: Control memory usage with chunk size
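
As a sketch, a low-memory profile simply shrinks the options shown earlier; the values below are illustrative, not recommendations:

// Trade throughput for a smaller memory footprint
lowMem := serializer.StreamingOptions{
    ChunkSize:  500,       // fewer entries resident at once
    BufferSize: 32 * 1024, // smaller I/O buffer
}
ss := serializer.NewStreamingSerializer(lowMem)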