Serializer¶
The serializer package provides efficient serialization and deserialization of FlashFS data structures, particularly snapshots and diffs. It uses FlatBuffers as the underlying serialization format, which offers memory-efficient zero-copy deserialization.
Features¶
- Memory Efficiency: Uses FlatBuffers for zero-copy deserialization
- Builder Reuse: Optimizes memory usage by reusing FlatBuffer builders
- Streaming Support: Handles large datasets that don't fit in memory
- Progress Tracking: Provides chunk-based progress reporting
- Parallel Processing: Supports concurrent processing of chunks
Core Components¶
Standard Serialization¶
The standard serialization functions handle snapshots and diffs that fit entirely in memory; a short round-trip sketch follows the list:
- SerializeSnapshotFB: Serializes a snapshot to a byte array
- DeserializeSnapshotFB: Deserializes a snapshot from a byte array
- SerializeDiffFB: Serializes a diff to a byte array
- DeserializeDiffFB: Deserializes a diff from a byte array
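A minimal round-trip sketch using these functions is shown below. It assumes that entries holds the in-memory snapshot entries produced elsewhere and that DeserializeSnapshotFB returns the decoded entries along with an error; treat the exact shapes as assumptions rather than a definitive API reference.

// Serialize in-memory entries (nil selects the internal builder pool).
data, err := serializer.SerializeSnapshotFB(entries, nil)
if err != nil {
    return err
}

// Deserialize back; the (entries, error) return shape is assumed here.
restored, err := serializer.DeserializeSnapshotFB(data)
if err != nil {
    return err
}
fmt.Printf("round-tripped %d entries\n", len(restored))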
Streaming Serialization¶
For large datasets, the streaming serialization components break data into manageable chunks:
- StreamingSerializer: Serializes and deserializes snapshots in chunks
- StreamingDiffSerializer: Serializes and deserializes diffs in chunks
Memory Optimization¶
Builder Reuse¶
The serializer supports reusing FlatBuffers builders to reduce memory allocations:
// Create a builder once
builder := flatbuffers.NewBuilder(0)

// Reuse it for multiple serializations
for _, snapshot := range snapshots {
    data, err := serializer.SerializeSnapshotFB(snapshot, builder)
    // Process data and handle err...
}
Builder Pool¶
When nil is passed as the builder parameter, the serializer automatically uses a builder pool:
// This will get a builder from the pool and return it when done
data, err := serializer.SerializeSnapshotFB(entries, nil)
Streaming Serialization¶
Configuration¶
The streaming serializer can be configured with options:
options := serializer.StreamingOptions{
    ChunkSize:  5000,       // 5000 entries per chunk
    BufferSize: 128 * 1024, // 128KB buffer
}
s := serializer.NewStreamingSerializer(options)
Writing to Storage¶
// Serialize to a writer (file, network, etc.)
err := s.SerializeToWriter(entries, writer)
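As a fuller sketch, the snapshot can be streamed into a file through a buffered writer. Here s is the streaming serializer created in the Configuration section, while entries, the buffer size, and the file name are placeholders.

f, err := os.Create("snapshot.ffs") // placeholder output path
if err != nil {
    return err
}
defer f.Close()

// Buffer the file writes; matching the configured BufferSize is a reasonable default.
w := bufio.NewWriterSize(f, 128*1024)
if err := s.SerializeToWriter(entries, w); err != nil {
    return err
}
return w.Flush()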
Reading from Storage¶
// Deserialize from a reader with a callback for each chunk
// Deserialize from a reader with a callback for each chunk
err := s.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    // Process chunk.Entries
    // Track progress with chunk.Index and chunk.Total
    return nil
})
Progress Tracking¶
The streaming serializer provides progress information through chunk metadata:
err := s.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    // Update progress
    progress := float64(chunk.Index) / float64(chunk.Total) * 100
    fmt.Printf("Progress: %.2f%% (Chunk %d/%d)\n",
        progress, chunk.Index, chunk.Total)
    return nil
})
Parallel Processing¶
Chunks can be processed in parallel for improved performance:
// Create a worker pool
var wg sync.WaitGroup
chunkChan := make(chan serializer.SnapshotChunk, 10)

// Start worker goroutines
for i := 0; i < runtime.NumCPU(); i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for chunk := range chunkChan {
            // Process chunk in parallel
            processChunk(chunk)
        }
    }()
}

// Feed chunks to workers
err := s.DeserializeFromReader(reader, func(chunk serializer.SnapshotChunk) error {
    chunkChan <- chunk
    return nil
})
close(chunkChan)

// Wait for all workers to finish
wg.Wait()
if err != nil {
    // Handle the deserialization error
}
Performance Considerations¶
- Builder Reuse: Always reuse builders across multiple serializations to reduce allocations.
- Chunk Size: For streaming serialization, adjust the chunk size to your memory constraints and processing needs. Larger chunks are more efficient but use more memory.
- Buffer Size: The buffer size affects I/O performance. Larger buffers generally improve performance but use more memory; example configurations are sketched after this list.
- Parallel Processing: For large datasets, consider processing chunks in parallel during deserialization.
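To make the chunk and buffer trade-off concrete, the sketch below contrasts two hypothetical configurations built from the StreamingOptions fields shown earlier; the numbers are placeholders, not tuned recommendations.

// Memory-constrained: smaller chunks and a smaller I/O buffer.
lowMem := serializer.NewStreamingSerializer(serializer.StreamingOptions{
    ChunkSize:  1000,      // fewer entries held in memory per chunk
    BufferSize: 32 * 1024, // 32KB buffer
})

// Throughput-oriented: larger chunks and a larger I/O buffer.
highThroughput := serializer.NewStreamingSerializer(serializer.StreamingOptions{
    ChunkSize:  20000,      // more entries per chunk, more memory
    BufferSize: 512 * 1024, // 512KB buffer
})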
Use Cases¶
Efficient Storage¶
The serializer is optimized for efficient storage of filesystem metadata:
- Snapshots: Store the state of a filesystem at a point in time
- Diffs: Store the changes between two snapshots (see the sketch below)
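As a sketch of the diff case, assuming SerializeDiffFB mirrors the (entries, builder) shape of SerializeSnapshotFB and with diffEntries standing in for the package's diff representation:

// Serialize the diff; nil draws a builder from the pool, as with snapshots.
data, err := serializer.SerializeDiffFB(diffEntries, nil)
if err != nil {
    return err
}
// Persist it alongside the base snapshot (the path is a placeholder).
return os.WriteFile("changes.diff", data, 0o644)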
Network Transfer¶
The streaming serializer is ideal for transferring large datasets over a network (see the sketch after the list):
- Chunked Transfer: Send data in manageable chunks
- Progress Reporting: Track transfer progress
- Resumable Transfers: Potentially resume interrupted transfers
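Because SerializeToWriter accepts any io.Writer, a network connection can be used directly on the sending side; the address, entries, and the s serializer instance below are placeholders. On the receiving side, the accepted connection can be passed to DeserializeFromReader exactly as in the reading example above.

// Sender side: stream a snapshot straight into a TCP connection.
conn, err := net.Dial("tcp", "backup-host:9000") // placeholder address
if err != nil {
    return err
}
defer conn.Close()
if err := s.SerializeToWriter(entries, conn); err != nil {
    return err
}
return nil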
Large Dataset Processing¶
For very large filesystems, the streaming serializer enables processing that wouldn't fit in memory:
- Incremental Processing: Process data as it becomes available
- Parallel Processing: Process chunks concurrently
- Memory Efficiency: Control memory usage with chunk size