> "fc is a lossless compressor for streams of IEEE-754 64-bit doubles."
The new OpenZL SDDL2 (Simple Data Description Language) supports several different floating-point types. It would be worthwhile to contribute some of the FC project's experience to OpenZL. Now the OpenZL supported types:
It splits the input into adaptively-sized blocks (quanta), runs a competition between many specialized codecs on each block, and emits the smallest result.
This is, for lack of a better term, a "metacompressor". It will be interesting to see which of the choices end up dominating; in my past experience with metacompression, one algorithm is usually consistently ahead.
loeg
The question is, how close can OpenZL come? (It is from the same people who develop zstd, but aimed at structured data in a generic way.)
abcd_f
The most interesting section - How It Works - could really elaborate on details a bit more.
Scaevolus
I see you have ALP, but have you tried Chimp128 or Arrow's byte stream split?
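For anyone unfamiliar with byte stream split: the idea is to regroup the bytes of the values so that byte 0 of every double sits in one stream, byte 1 in the next, and so on, which tends to help a general-purpose compressor that runs afterwards. A minimal sketch in C, assuming 8-byte doubles; the function names are made up for illustration and are not Arrow's or fc's API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scatter byte k of every value into stream k.
 * `out` must hold 8 * n bytes; stream k occupies out[k*n .. k*n + n - 1]. */
void byte_stream_split(const double *in, size_t n, uint8_t *out) {
    const uint8_t *src = (const uint8_t *)in;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < 8; k++)
            out[k * n + i] = src[i * 8 + k];
}

/* Hypothetical inverse: gather the 8 streams back into n doubles. */
void byte_stream_join(const uint8_t *in, size_t n, double *out) {
    uint8_t *dst = (uint8_t *)out;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < 8; k++)
            dst[i * 8 + k] = in[k * n + i];
}
```

The transform is a pure permutation of bytes, so it is lossless on its own and does not shrink anything; any gain comes from the entropy coder applied to the regrouped streams.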
KerrickStaley
Another library in this space is pcodec; I'd appreciate a comparison of the two.
enduku
I built "fc", a C library for compressing streams of 64-bit floating-point values without quantization.
It is not trying to replace zstd or lz4. The idea is narrower: take blocks of doubles, try a set of float-specific predictors/transforms/coders, and emit whichever representation is smallest for that block.
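To make the "try several, keep the smallest" idea concrete, here is a rough sketch in C. It is not fc's code: the three modes, their cost models, and every name below are assumptions chosen only to illustrate a per-block competition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mode tags; fc's real modes are not listed in this thread. */
enum mode { MODE_RAW = 0, MODE_CONST = 1, MODE_XOR = 2, MODE_COUNT = 3 };

#define NOT_APPLICABLE ((size_t)-1)

/* Reinterpret a double as its 64-bit pattern without aliasing issues. */
static uint64_t bits_of(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Raw storage: 8 bytes per value, always applicable. */
static size_t cost_raw(const double *v, size_t n) {
    (void)v;
    return 8 * n;
}

/* Constant block: one 8-byte value, only if all bit patterns match. */
static size_t cost_const(const double *v, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (bits_of(v[i]) != bits_of(v[0])) return NOT_APPLICABLE;
    return n ? 8 : 0;
}

/* XOR with the previous value, then store a 1-byte length plus the
 * bytes that remain after dropping high-order zero bytes. */
static size_t cost_xor(const double *v, size_t n) {
    uint64_t prev = 0;
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t x = bits_of(v[i]) ^ prev;
        size_t len = 0;
        while (x) { len++; x >>= 8; }
        total += 1 + len;
        prev = bits_of(v[i]);
    }
    return total;
}

/* Run the competition for one block and report the winner and its size. */
enum mode pick_mode(const double *v, size_t n, size_t *best_size) {
    size_t costs[MODE_COUNT];
    costs[MODE_RAW] = cost_raw(v, n);
    costs[MODE_CONST] = cost_const(v, n);
    costs[MODE_XOR] = cost_xor(v, n);

    enum mode best = MODE_RAW;
    for (int m = 1; m < MODE_COUNT; m++)
        if (costs[m] < costs[best]) best = (enum mode)m;
    if (best_size) *best_size = costs[best];
    return best;
}
```

With this toy cost model, a block of four identical values selects the constant mode at 8 bytes, while a block with unrelated exponents falls back to raw. A real encoder would also have to write a mode tag per block so the decoder knows which path to take.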
It is aimed at time-series, scientific, simulation, and analytics data where the numbers often have structure: smooth curves, repeated values, fixed increments, periodic signals, predictable deltas, or low-entropy mantissas.
The API is intentionally small: "fc_enc", "fc_dec", a config struct, and a few counters to inspect which modes won. Decode is parallel and meant to be fast; encode spends more CPU searching for a better representation.
Current caveats: x86-64 only for now, tuned for IEEE-754 doubles, research-grade rather than production-hardened.
Some OpenZL links:
- https://github.com/facebook/openzl/releases/tag/v0.2.0
- https://openzl.org/getting-started/introduction/
- https://openzl.org/sddl/sddl2-announcement/
- https://openzl.org/sddl/core-concepts/
Repo: https://github.com/xtellect/fc