Denormals happen to be the way that Zero can even be represented at all?
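For anyone wanting to check this at the bit level, here is a minimal sketch, assuming IEEE 754 binary64 (what Python's `float` is on common platforms). The helper name `fields` is made up for illustration. It shows that zero and the denormals share the all-zero exponent field and differ only in the fraction bits:

```python
import struct
import sys

def fields(x: float):
    """Split a binary64 value into (sign, biased exponent, fraction) bit fields."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

smallest_normal = sys.float_info.min   # 2**-1022
smallest_denormal = 5e-324             # 2**-1074

print(fields(0.0))                 # (0, 0, 0): exponent field 0, fraction 0
print(fields(smallest_denormal))   # (0, 0, 1): exponent field 0, fraction nonzero
print(fields(smallest_normal))     # (0, 1, 0): first value with a nonzero exponent field
```

So zero uses the same exponent encoding as the denormals, but a flush-to-zero mode still leaves zero itself representable; only the values with a nonzero fraction are lost.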

> This is not the first time we can see Nvidia taking shortcuts to achieve maximum performance of their GPUs
Why is implementing it correctly not performant? For context, I have no idea how rounding is typically implemented anyway.

> CPU processing of denormals tends to be extremely slow - I vaguely recall running into something like a 10x slowdown a decade ago
That's Intel CPU processing, where the slowdown can be as bad as a couple hundred cycles. AMD CPUs penalize them much more mildly, usually single-digit cycles. (No idea about ARM.)

CPUs that aren't Intel are plenty fast on denormals. Intel is the only one where denormals are 100x slower. (And Intel has fixed that on their new CPUs, but only on their E-cores.)

More like 100x, but not sure how true that is nowadays.

Another thing to keep in mind is that CPU processing of denormals tends to be extremely slow - I vaguely recall running into something like a 10x slowdown a decade ago.
For a lot of applications the difference between a denormal and zero is small enough to be irrelevant, so if you expect near-zero values to be common, enabling a denormals-to-zero compiler flag might give you a pretty nice performance boost for free.
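To make "the difference between a denormal and zero" concrete, here is a rough sketch that emulates the flush in software. This is only an illustration of the semantics: the real thing is a hardware mode set through the floating-point control register (the FTZ/DAZ bits in MXCSR on x86, for example), and which compiler options turn it on depends on the toolchain. The function name `flush_to_zero` is hypothetical.

```python
import math
import sys

def flush_to_zero(x: float) -> float:
    """Emulate flush-to-zero in software: denormal values become a signed zero."""
    if x != 0.0 and abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x  # zeros, normal values, infinities, and NaN pass through unchanged

tiny = sys.float_info.min / 4     # 2**-1024, below the smallest normal value

print(tiny > 0.0)                 # True: gradual underflow keeps it nonzero
print(flush_to_zero(tiny))        # 0.0
print(flush_to_zero(1.0))         # 1.0
```

The absolute error introduced is at most `sys.float_info.min` (about 2.2e-308 for doubles), which is why it is harmless for many workloads, but the relative error on the flushed value is 100%.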

On the other hand, they (unexpectedly to the inventor, who intended them to be a debugging tool) underpin a few foundational results in correctly rounded computation, such as the Sterbenz lemma (https://en.wikipedia.org/wiki/Sterbenz_lemma).
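The Sterbenz lemma says that if y/2 <= x <= 2y, then x - y is computed exactly. A small sketch of why denormals matter for it, assuming binary64 with gradual underflow enabled (the default for Python floats on common platforms); `subtraction_is_exact` is a made-up helper that compares the float result against exact rational arithmetic:

```python
from fractions import Fraction
import sys

def subtraction_is_exact(x: float, y: float) -> bool:
    """Compare the rounded float result of x - y with the exact rational difference."""
    return Fraction(x - y) == Fraction(x) - Fraction(y)

m = sys.float_info.min      # smallest normal binary64 value, 2**-1022
x, y = 1.5 * m, m           # y/2 <= x <= 2*y, so the lemma applies

print(subtraction_is_exact(x, y))      # True: the difference 0.5*m is a denormal
print(x - y == 0.0)                    # False: x != y still implies x - y != 0
print(subtraction_is_exact(1.0, 1e-20))  # False: outside the lemma's range, the result is rounded
```

With denormals flushed to zero, the first case would return 0 instead of 0.5*m, so the lemma (and the guarantee that x != y implies x - y != 0) would no longer hold near the bottom of the range.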

> Even their inventor had trouble writing correct code in their presence
I didn't know that. Could you provide a more specific reference?

Flush denormals to zero. Even their inventor had trouble writing correct code in their presence - see the Appendix to that "what every programmer should know..." paper.

It's one of several issues with the design of IEEE floats, unfortunately. I wish we could start thinking more seriously about a new design, to complement if not replace IEEE in the long term. Posits are an example: https://github.com/andrepd/posit-rust

Floor and Ceil versus Denormals on CPU and GPU