aabhay

In Gemini at least, if you look at how they process PDFs, they run OCR and then feed the text + image to the model, without charging you for the text tokens (I believe).

So my guess is that Claude’s backend is doing the same. If Claude works the way Gemini does, this hack is probably more of a loophole in token accounting, and one that might get closed.

lpellis

I tried the same thing last year (with OpenAI models). Back then it worked to reduce prompt tokens, but you needed way more completion tokens, so it ended up more expensive (and slower): https://pagewatch.ai/blog/post/llm-text-as-image-tokens/
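To make that tradeoff concrete, here's a rough sketch of the cost arithmetic. The prices and token counts are made-up placeholders, not OpenAI's actual pricing and not numbers from the linked post. The only real point is that many hosted models price output tokens several times higher than input tokens, so a modest bump in completion tokens can wipe out a big saving on prompt tokens.

```python
# HYPOTHETICAL per-million-token prices, for illustration only (not real pricing).
PRICE_IN_PER_M = 2.50    # dollars per 1M prompt (input) tokens
PRICE_OUT_PER_M = 10.00  # dollars per 1M completion (output) tokens


def request_cost(prompt_tokens: int, completion_tokens: int,
                 price_in: float = PRICE_IN_PER_M,
                 price_out: float = PRICE_OUT_PER_M) -> float:
    """Dollar cost of one request given per-million-token prices."""
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000


def break_even_extra_completion(saved_prompt_tokens: int,
                                price_in: float = PRICE_IN_PER_M,
                                price_out: float = PRICE_OUT_PER_M) -> float:
    """How many extra completion tokens the image trick can afford
    before the prompt-token savings are used up."""
    return saved_prompt_tokens * price_in / price_out


# Placeholder token counts (NOT measured results): text prompt vs. image of that text.
text_cost = request_cost(prompt_tokens=10_000, completion_tokens=500)
image_cost = request_cost(prompt_tokens=3_000, completion_tokens=3_000)

print(f"text:  ${text_cost:.4f}")
print(f"image: ${image_cost:.4f}")
print(f"break-even extra completion tokens: {break_even_extra_completion(7_000):.0f}")
```

With these placeholder numbers, saving 7,000 prompt tokens only pays for about 1,750 extra completion tokens, so needing 2,500 more makes the image route cost more overall.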

aabhay

Ahhh, my eyes, the vibe-coded README

genxy

This seems like a pricing hack that burns resources. When the loophole gets closed, will the price of OCR have to rise?

dimitropoulos

There's also a DeepSeek paper on this technique, discussed here: https://www.seangoedecke.com/text-tokens-as-image-tokens

puppycodes

That is hilarious and an amazing find.

dippogriff

I want to see more text-free foundation models.