```
We evaluated several precision pairings across our internal retrieval benchmark suite. Scores are NDCG@10 averaged across the suite, scaled to 0–100. NDCG@10 (Normalized Discounted Cumulative Gain at rank 10) measures how well the top 10 results are ordered against the ideal ranking, rewarding relevant documents more when they appear higher, with 100 being a perfect ranking. The full-precision baseline averages 90.26. Int8 query against binary documents averages 89.65, a 0.61 point drop, while reducing document-vector storage by 32x
```
Saying "Near lossless" to mean 90% accurate retrieval of saved vectors is simply a lie. Lossiness is binary, not something you can paper over by getting close enough. And 90% is not close. Sure, LLMs are all about gradient descent on noisy data sets, so I guess this is acceptable in this field, but that terminology usage still bothered me.
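For reference, here is a minimal sketch of how an NDCG@10 score on a 0–100 scale can be computed from the relevance labels of a ranked result list. It scores ranking quality, not how accurately stored vectors are recovered. The function names, the linear-gain variant, and computing the ideal ranking from the retrieved list alone are assumptions; the quoted passage does not say which variant the article uses.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: each relevance is divided by log2(position + 1),
    with positions starting at 1, so hits near the top count more."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """NDCG@k scaled to 0-100; 100 means the list is already in ideal order.

    Assumption: the ideal ranking is the same labels sorted descending.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return 100.0 * (dcg_at_k(relevances, k) / ideal)

# Hypothetical relevance labels for one query, in the order results were returned.
perfect_order = [3, 2, 1, 0]
swapped_order = [2, 3, 1, 0]
print(ndcg_at_k(perfect_order), ndcg_at_k(swapped_order))
```

Under this definition a score near 90 means the top 10 are slightly misordered relative to the ideal, averaged over queries, and the 0.61 point drop compares two such averages.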
elil17
I would love to see real examples of what reduced quality means in practice. Are you able to recover a document from the vector in a human readable format? If so, what sort of changes come up?
I could imagine a scenario where differences tend to be more substantive than you'd expect because of how less frequent words with fine distinctions in meaning - the very words that make the document special - may be embedded in the vector space.
purple-leafy
Hey breadislove; amazing article, I’ll be sending mixedbread an email in the morning that may interest you (email will be <5-characters>@pm.me)
I have also been working in compression and performance engineering, and managed to get a 99+% compression unlock versus conventional approaches (100+KB down to 1KB) in the scenario of 30-minute massively multiplayer game replays for a “game+engine” I’m developing.
I think there’s a synergy between these 2 concepts, and I’d love to chat some more.
derrickquinn
Asymmetry is clever. FWIW, this is very similar to the strategy employed by BitNet models (i.e., int8 activations with binary or ternary weights); I suspect retrieval is a little more amenable to this approach.
In principle, binary x binary should be pretty fast since it just requires bitwise XNOR and popcount/reduction, but in practice it's slow unless you've really optimized it. And, as stated in the article, you'd still be losing a lot of accuracy that way.
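A rough NumPy sketch of the two scoring modes mentioned above. The names (`binarize`, `binary_dot`, `int8_query_binary_docs`), the sign-threshold binarization, and the float32 baseline are assumptions for illustration, not the article's implementation. XOR plus popcount counts mismatching bits, which is equivalent to XNOR plus popcount counting matching bits.

```python
import numpy as np

# Lookup table: number of set bits in each possible byte value (a simple popcount).
POPCOUNT = np.array([bin(b).count("1") for b in range(256)], dtype=np.uint16)

def binarize(vectors: np.ndarray) -> np.ndarray:
    """Keep one bit per dimension (1 if the value is > 0), packed 8 bits per byte."""
    return np.packbits(vectors > 0, axis=-1)

def binary_dot(q_bits: np.ndarray, d_bits: np.ndarray, dim: int) -> np.ndarray:
    """Dot product of {-1, +1} vectors stored as packed bits.

    dot = matches - mismatches = dim - 2 * popcount(q XOR d).
    """
    mismatches = POPCOUNT[np.bitwise_xor(q_bits, d_bits)].sum(axis=-1)
    return dim - 2 * mismatches.astype(np.int64)

def int8_query_binary_docs(q_int8: np.ndarray, d_bits: np.ndarray, dim: int) -> np.ndarray:
    """Asymmetric scoring: int8 query against documents decoded to {-1, +1}."""
    signs = np.unpackbits(d_bits, axis=-1, count=dim).astype(np.int32) * 2 - 1
    return signs @ q_int8.astype(np.int32)

# Hypothetical data: 5 documents and 1 query with 64 dimensions each.
rng = np.random.default_rng(0)
docs = rng.standard_normal((5, 64)).astype(np.float32)
query = rng.standard_normal(64).astype(np.float32)

d_bits = binarize(docs)
# Assumed int8 quantization: scale by the largest magnitude, then round.
q_int8 = np.round(query / np.abs(query).max() * 127).astype(np.int8)

print(binary_dot(binarize(query), d_bits, dim=64))
print(int8_query_binary_docs(q_int8, d_bits, dim=64))
print(docs.nbytes // d_bits.nbytes)  # storage ratio, float32 vs 1 bit per dimension
```

The table-lookup popcount here is the unoptimized kind; a fast implementation would rely on hardware popcount and SIMD. The asymmetric version keeps the query's magnitudes while the documents stay at one bit per dimension.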
kaizenite
To people smarter than me, how impressive and/or revolutionary is this?
functionmouse
there is no such thing as "near lossless"
alfiedotwtf
If you squint hard enough, it sounds like their storage layer is a bloom filter
rq1
The Pi compression algorithm is better.
nathan_compton
" A single document produces more then one embedding, depending on the complexity of the document it can produce hundreds or thousands of vectors."
That typo up there is kind of endearing in the AI slop era.