In Gemini at least, if you look at how they process PDFs, they run OCR and then feed the text plus the image to the model, without charging you for the text tokens (I believe). So my guess is that Claude’s backend is doing the same, which would make this hack more of a loophole in token accounting, one that might get closed if Claude is doing what Gemini does.

I tried the same thing last year (with OpenAI models). Back then it did reduce prompt tokens, but you needed way more completion tokens, so it was ultimately more expensive (and slower):
https://pagewatch.ai/blog/post/llm-text-as-image-tokens/
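
To make that trade-off concrete, here is a rough sketch of the arithmetic. All prices and token counts below are made-up numbers chosen only for illustration (they are not measurements from the linked post or any provider's real pricing), and `request_cost` is a hypothetical helper:

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price: int, completion_price: int) -> int:
    """Total cost of one request, in arbitrary price units."""
    return prompt_tokens * prompt_price + completion_tokens * completion_price

# Hypothetical per-token prices (arbitrary units); output priced 4x input.
PROMPT_PRICE = 1
COMPLETION_PRICE = 4

# Hypothetical token counts for the same task sent two ways:
# as plain text, and as an image with fewer prompt tokens but more output.
text_cost = request_cost(10_000, 500, PROMPT_PRICE, COMPLETION_PRICE)
image_cost = request_cost(4_000, 3_000, PROMPT_PRICE, COMPLETION_PRICE)

print(text_cost, image_cost)  # 12000 16000
```

With these assumed numbers the image version saves 60% of the prompt tokens and still costs more overall, because completion tokens are priced higher.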

What, you don't like your caveats to be honest?

Truly a picture is worth a thousand words.

Of course it isn't. A text encoding uses 8 bits per character on average, and tokenization compresses that further. A bitmap font would need 25 bits per character at 5x5, and most fonts are 12 pixels high. Of course it isn't efficient; this is a pricing inefficiency and a hack to exploit it (even the author describes it as an exploit).
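
The raw-bits comparison in that comment can be sketched as below. This only counts uncompressed storage at 1 bit per pixel; it says nothing about how a vision model actually tokenizes an image. The 7-pixel glyph width for the 12-pixel-high font is an assumption, and `bitmap_bits_per_char` is a hypothetical helper:

```python
# Raw bits per character, before any compression or tokenization.
TEXT_BITS_PER_CHAR = 8  # figure quoted in the comment

def bitmap_bits_per_char(width_px: int, height_px: int, bits_per_pixel: int = 1) -> int:
    """Bits needed to store one glyph as an uncompressed bitmap."""
    return width_px * height_px * bits_per_pixel

tiny = bitmap_bits_per_char(5, 5)       # the 5x5 font from the comment
typical = bitmap_bits_per_char(7, 12)   # assumed 7 px wide, 12 px high

print(tiny, tiny / TEXT_BITS_PER_CHAR)        # 25 3.125
print(typical, typical / TEXT_BITS_PER_CHAR)  # 84 10.5
```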

It’s not a loophole, it just happens that encoding information as optical tokens is much more efficient than text.

Not really. They aren't actually using more resources this way either. This might be a fundamental inefficiency that's being removed. It kinda makes sense too, because while people do read code word by word, we often "glance over" it and do rough pattern recognition to know what it does, only homing in on something when we need to answer a specific question. I think humans kinda naturally do this exploit anyway.

This seems like a pricing hack that burns resources. When the loophole gets closed, won't the price of OCR have to rise?

There's also a DeepSeek whitepaper on this technique: https://www.seangoedecke.com/text-tokens-as-image-tokens

That is hilarious and an amazing find.

I want to see more text-free foundation models

60% Fable cost cut by converting code to images and having the model OCR it