Serializer

The serializer package provides efficient serialization and deserialization of FlashFS data structures, particularly snapshots and diffs. It uses FlatBuffers as the underlying serialization format, which offers memory-efficient zero-copy deserialization.

Features

  • Memory Efficiency: Uses FlatBuffers for zero-copy deserialization
  • Builder Reuse: Optimizes memory usage by reusing FlatBuffers builders
  • Streaming Support: Handles large datasets that don't fit in memory
  • Progress Tracking: Provides chunk-based progress reporting
  • Parallel Processing: Supports concurrent processing of chunks

Core Components

Standard Serialization

The standard serialization functions handle snapshots and diffs that fit entirely in memory (a round-trip sketch follows the list):

  • SerializeSnapshotFB: Serializes a snapshot to a byte array
  • DeserializeSnapshotFB: Deserializes a snapshot from a byte array
  • SerializeDiffFB: Serializes a diff to a byte array
  • DeserializeDiffFB: Deserializes a diff from a byte array
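
As a minimal round-trip sketch, the snapshot functions can be used like this (the entries value and the return shape of DeserializeSnapshotFB are assumptions for illustration, not confirmed signatures; diffs follow the same pattern with the diff functions):

// Serialize an in-memory snapshot; nil selects the builder pool (see below)
data, err := serializer.SerializeSnapshotFB(entries, nil)
if err != nil {
    return err
}

// Decode it back; the return shape here is assumed for illustration
decoded, err := serializer.DeserializeSnapshotFB(data)
if err != nil {
    return err
}
_ = decoded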

Streaming Serialization

For large datasets, the streaming serialization components break data into manageable chunks:

  • StreamingSerializer: Serializes and deserializes snapshots in chunks
  • StreamingDiffSerializer: Serializes and deserializes diffs in chunks

Memory Optimization

Builder Reuse

The serializer supports reusing FlatBuffers builders to reduce memory allocations:

// Create a builder once
builder := flatbuffers.NewBuilder(0)

// Reuse it for multiple serializations
for _, snapshot := range snapshots {
    data, err := serializer.SerializeSnapshotFB(snapshot, builder)
    if err != nil {
        return err
    }
    // Process data...
    _ = data
}

Builder Pool

When nil is passed as the builder parameter, the serializer automatically uses a builder pool:

// This will get a builder from the pool and return it when done
data, err := serializer.SerializeSnapshotFB(entries, nil)

Streaming Serialization

Configuration

The streaming serializer can be configured with options:

options := serializer.StreamingOptions{
    ChunkSize:  5000,       // entries per chunk
    BufferSize: 128 * 1024, // 128 KB I/O buffer
}
ss := serializer.NewStreamingSerializer(options)

Writing to Storage

// Serialize to any io.Writer (file, network connection, etc.)
err := ss.SerializeToWriter(entries, writer)
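
For example, streaming a snapshot to a file could look like the following sketch (the file name and surrounding error handling are illustrative):

// Write a snapshot stream to a local file
f, err := os.Create("snapshot.ffs") // hypothetical path
if err != nil {
    return err
}
defer f.Close()

if err := ss.SerializeToWriter(entries, f); err != nil {
    return err
}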

Reading from Storage

// Deserialize from a reader, invoking the callback once per chunk
err := ss.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    // Process chunk.Entries
    // Track progress with chunk.Index and chunk.Total
    return nil
})
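
Since the callback sees one chunk at a time, aggregate state lives outside it. A trivial sketch that counts entries without holding them all in memory:

// Count entries across all chunks
var total int
err := ss.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    total += len(chunk.Entries)
    return nil
})
if err != nil {
    return err
}
fmt.Printf("snapshot contains %d entries\n", total)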

Progress Tracking

The streaming serializer provides progress information through chunk metadata:

err := ss.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    // Update progress
    progress := float64(chunk.Index) / float64(chunk.Total) * 100
    fmt.Printf("Progress: %.2f%% (Chunk %d/%d)\n",
        progress, chunk.Index, chunk.Total)
    return nil
})

Parallel Processing

Chunks can be processed in parallel for improved performance:

// Create a worker pool
var wg sync.WaitGroup
chunkChan := make(chan serializer.SnapshotChunk, 10)

// Start one worker goroutine per CPU
for i := 0; i < runtime.NumCPU(); i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for chunk := range chunkChan {
            // Process chunk in parallel
            processChunk(chunk)
        }
    }()
}

// Feed chunks to the workers
err := ss.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    chunkChan <- chunk
    return nil
})
close(chunkChan)

// Wait for all workers to finish, then surface any read error
wg.Wait()
if err != nil {
    return err
}

Performance Considerations

  1. Builder Reuse: Always reuse builders for multiple serializations to reduce allocations.

  2. Chunk Size: For streaming serialization, adjust the chunk size based on your memory constraints and processing needs. Larger chunks are more efficient but use more memory.

  3. Buffer Size: The buffer size affects I/O performance. Larger buffers generally improve performance but use more memory.

  4. Parallel Processing: For large datasets, consider processing chunks in parallel during deserialization.

Use Cases

Efficient Storage

The serializer is optimized for efficient storage of filesystem metadata:

  • Snapshots: Store the state of a filesystem at a point in time
  • Diffs: Store the changes between two snapshots

Network Transfer

The streaming serializer is ideal for transferring large datasets over a network (see the sketch after this list):

  • Chunked Transfer: Send data in manageable chunks
  • Progress Reporting: Track transfer progress
  • Resumable Transfers: Potentially resume interrupted transfers
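
Because SerializeToWriter writes to any writer, a network connection can be the destination directly. A sketch, reusing the ss serializer configured above (the endpoint is illustrative):

// Stream a snapshot over TCP
conn, err := net.Dial("tcp", "backup.example.com:9000") // hypothetical endpoint
if err != nil {
    return err
}
defer conn.Close()

if err := ss.SerializeToWriter(entries, conn); err != nil {
    return err
}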

Large Dataset Processing

For very large filesystems, the streaming serializer enables processing that wouldn't fit in memory (a low-memory configuration sketch follows the list):

  • Incremental Processing: Process data as it becomes available
  • Parallel Processing: Process chunks concurrently
  • Memory Efficiency: Control memory usage with chunk size
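
As a sketch, a low-memory profile simply shrinks the options shown earlier; the values below are illustrative, not recommendations:

// Trade throughput for a smaller memory footprint
lowMem := serializer.StreamingOptions{
    ChunkSize:  500,       // fewer entries resident at once
    BufferSize: 32 * 1024, // smaller I/O buffer
}
ss := serializer.NewStreamingSerializer(lowMem)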