TurboQuant: Redefining AI efficiency with extreme compression

amitport

This is a great development for KV cache compression. I did notice a missing citation in the related works regarding the core mathematical mechanism, though. The foundational technique of applying a geometric rotation prior to extreme quantization, specifically for managing the high-dimensional geometry and enabling proper bias correction, was introduced in our NeurIPS 2021 paper, "DRIVE" (https://proceedings.neurips.cc/paper/2021/hash/0397758f8990c...). We used this exact rotational approach and a similar bias correction mechanism to achieve optimal distributed mean estimation. I also presented this work and subsequent papers in a private invited talk at Google shortly after publication. Given the strong theoretical overlap with the mechanisms in TurboQuant and PolarQuant, I hope to see this prior art acknowledged in the upcoming camera-ready versions.
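For readers unfamiliar with the "rotate, then quantize" idea described above, here is a minimal sketch. It assumes a randomized Hadamard transform as the rotation and plain 1-bit sign quantization with the scale that minimizes squared error. It is not the DRIVE or TurboQuant algorithm and does not reproduce the bias correction mentioned in the comment; all function names are hypothetical. The one-hot input is a hand-picked extreme case chosen to make the effect easy to see.

```python
import math
import random

def fwht(v):
    # Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two.
    # The normalized matrix H is symmetric and orthogonal, so fwht is its own inverse.
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [x * s for x in v]

def one_bit(y):
    # Sign quantization; mean(|y|) minimizes ||y - scale * signs||^2.
    scale = sum(abs(t) for t in y) / len(y)
    signs = [1.0 if t >= 0 else -1.0 for t in y]
    return scale, signs

def compress_rotated(x, seed=0):
    rng = random.Random(seed)
    d = [rng.choice((-1.0, 1.0)) for _ in x]  # random diagonal sign flips D
    y = fwht([di * xi for di, xi in zip(d, x)])  # rotate: y = H D x
    scale, signs = one_bit(y)
    y_hat = [scale * s for s in signs]
    # Undo the rotation: (H D)^-1 = D H, since H*H = I and D*D = I.
    return [di * zi for di, zi in zip(d, fwht(y_hat))]

def compress_plain(x):
    # Same 1-bit quantizer applied directly, with no rotation.
    scale, signs = one_bit(x)
    return [scale * s for s in signs]

def sq_err(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

x = [10.0] + [0.0] * 7  # "spiky" vector: all energy in one coordinate
print(sq_err(x, compress_plain(x)))    # 87.5
print(sq_err(x, compress_rotated(x)))  # close to 0 (floating-point rounding only)
```

Without the rotation, the single large coordinate is smeared across all eight signs. After the rotation every coordinate carries an equal share of the energy, which is exactly what a sign code represents well. For general inputs the rotation only approximately evens out the coordinates, so the gain is smaller than in this best case.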

benob

This is the worst lay-people explanation of an AI component I have seen in a long time. It doesn't even seem AI generated.

bluequbit

I did not understand what PolarQuant is.

Is it something like pattern-based compression, where the algorithm finds repeating patterns and creates an index of those common symbols or numbers?

mskkm

Pied Piper vibes. As far as I can tell, this algorithm is hardly compatible with modern GPU architectures. My guess is that’s why the paper reports accuracy vs. space but conveniently avoids reporting inference wall-clock time. The baseline numbers also look seriously underreported. “Several orders of magnitude” speedups for vector search? Really? Has anyone actually reproduced these results?

lucrbvi

Sounds like Multi-Head Latent Attention (MLA) from DeepSeek

moktonar

Don’t polar coordinates still need n-1 angles plus 1 radius for an n-dim vector? If so, I understand that angles can be quantized better, but when the radius r is large, the error is large for highly quantized angles, right? What am I missing?
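To make the concern concrete, here is a 2-D sketch (the simplest case: one radius, one angle). It assumes the radius is stored exactly and only the angle is uniformly quantized to a hypothetical `bits` budget; the function names are illustrative. The Cartesian error is the chord 2r·sin(|θ - θ̂|/2), so it grows linearly with r while the error relative to r stays fixed.

```python
import math

def quantize_angle(theta, bits):
    # Uniform quantizer over [0, 2*pi): snap to the nearest of 2**bits levels.
    levels = 2 ** bits
    step = 2 * math.pi / levels
    return (round(theta / step) % levels) * step

def polar_roundtrip_error(r, theta, bits):
    # Keep the radius exact, quantize only the angle, measure the Cartesian error.
    tq = quantize_angle(theta, bits)
    dx = r * math.cos(theta) - r * math.cos(tq)
    dy = r * math.sin(theta) - r * math.sin(tq)
    return math.hypot(dx, dy)

theta = 0.3
for r in (1.0, 100.0):
    err = polar_roundtrip_error(r, theta, bits=3)
    print(r, err, err / r)  # absolute error grows with r; err / r does not
```

So the intuition in the question holds for absolute error; relative to the vector's length, the angular quantization error does not depend on r.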

maurelius2

I'm somewhat lost here beyond the fundamentals. Can someone explain how the compression impacts performance?
