There is <a href="https:&#x2F;&#x2F;ianbarber.blog&#x2F;feed" rel="nofollow">https:&#x2F;&#x2F;ianbarber.blog&#x2F;feed</a>

lol yeah I guess the best move right now is to fetch their &#x2F;feed and iterate through &lt;post&gt;s

- with all due respect, from a ux perspective, could you kindly add a page where i can see just the titles of all your blog posts- <a href="https:&#x2F;&#x2F;ianbarber.blog&#x2F;blogroll" rel="nofollow">https:&#x2F;&#x2F;ianbarber.blog&#x2F;blogroll</a>- <a href="https:&#x2F;&#x2F;ianbarber.blog&#x2F;archive" rel="nofollow">https:&#x2F;&#x2F;ianbarber.blog&#x2F;archive</a>- <a href="https:&#x2F;&#x2F;ianbarber.blog&#x2F;blog" rel="nofollow">https:&#x2F;&#x2F;ianbarber.blog&#x2F;blog</a>- <a href="https:&#x2F;&#x2F;ianbarber.blog&#x2F;posts" rel="nofollow">https:&#x2F;&#x2F;ianbarber.blog&#x2F;posts</a>- none of the above links work- i really dont want to scroll 200 pages just to see what your blog articles are

I got a very different message from this, actually much closer to the problem of incumbent advantage.The known-good thing has been heavily optimized for performance, making it much harder for new technologies to prove that they are better. This is similar to the problem of gas vs electric engines - we had a century of optimization and ecosystem development around gas engines, which creates an uphill battle for electric motors even though they are (eventually) superior on every way &#x2F;except&#x2F; having that massive ecosystem.The problem isn&#x27;t as bad here, because software is much more flexible than hardware, and scaling laws give a reasonable way to try things out at smaller scale before going whole hog.

I assume the choice of phrase &quot;bitter lesson&quot; is intentional irony (since the original concept is that you get better results by just scaling up and not trying to be clever with domain-specific knowledge)?

It&#x27;s the bitter-lesson to feature-engineering lifecycle.When a technique or technology is new people are making massive gains by just applying it to some use case, or gathering more data for training, or giving it more resources.As time goes on those &quot;bitter lesson&quot; gains start to hit the shallow part of the logistic curve and companies have to start investing more and more effort into engineering for each small, incremental gain.

Of course you can pass in your own state, but I always wondered about an LLM that has conversation context stay resident in GPU memory somehow.Or maybe this already effectively covered by context caching and the gains would be minimal (stateless, but if you pass in the same context or the same head context, it’s already in GPU memory and doesn’t need to be loaded?).

Maybe a charitable reading of the parent comment, but my interpretation of it was that while the _models_ are stateless, modern deployments of these models for inference rely on state.For example, tiered pricing for cached context relies on state, even if the models don’t.

This. lol. If you think state makes things easier you&#x27;re in for a big surprise.

If you think statefull LLMs would be easier to handle then stateless... Then I think you haven&#x27;t done a lot of software engineering

It costs tokens, so it helps the business model, so it’s not a bug but a feature.

That does not seem to be related to llms? It is more about the harness that utilizes them, right?

One thing that makes LLMs complicated in production is that they&#x27;re stateless — every call starts from zero. The complexity compounds when you need agents to maintain context across sessions and models. That&#x27;s a layer that&#x27;s largely missing from most stacks today.

indeed, there&#x27;s even a (pretty solid) custom server just for DS4 <a href="https:&#x2F;&#x2F;github.com&#x2F;antirez&#x2F;ds4" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;antirez&#x2F;ds4</a>-- works very well on high-RAM Macs

The author is correct, the model architecture is now much more complicated. You can see this if you use llama.cpp and follow the project. The earlier models were always fully implemented. Yet with more contributors, as of today tons of latest models only have partial implementation. DeepSeekv3.2 isn&#x27;t fully implemented, same with KimiK2.6, GLM5.2+, DeepSeekv4 has no implementation, MiniMaxM3 not supported yet, Hy3-preview no implementation. The latest models are just bare bones to run with lots of support missing for the advanced features.

&gt; Why didn&#x27;t this author compare Llama 3 with GLM 5.2 (released 1 week ago) which is a more standard attention based LLM? To compare 2 separate families of LLMs and then pointing out that they are different is not a surprising result and detracts from the point the author is trying to make.The entire point of the comparison is that LLMs look vastly different today than before. Comparing more similar LLMs would detract from the point I thought the author was trying to make.

Yeah, not a great apples-to-apples comparison.I think the point stands: MoE, a myriad of complex attention approaches, shared layers, you name it. And making it all work together well is a huge trial-and-error pain even for small models, never mind getting to efficient hardware utilization.

The source is the same in the original article too. He is using a different diagram from the same site on the right to justify his point on how much more complicated things have become.

&gt; If you look at it, the diagrams are very similar,The page links to the same site you do. No wonder it is similar -- the source is the same!

I am _very_ familiar with Claudish, and to some extent, the other AIs&#x27; writing styles. This article is human-written and features human writing quirks.The very first sentence&gt; Back in 2022 and 2023 there were two big branches of machine learning happening at Meta.is unmistakably human. That&#x27;s not how a LLM would phrase this sentence, and if it did, it would have put a comma after 2023.

[[citation needed]]I am a professional writer and have been for over 30 years. (I do not use any form of LLM ever.) This means I read a lot. This also means that I have 30+ years of experience of readers not understanding what I wrote, or not getting further than the title, or not getting the main message, or inverting it in their heads, or inserting their own message and then complaining when I diverge, and an endless list of Ways People Do Not Get It.I am also a trained TESOL teacher. Ability to capture gist is a skill we test for and measure, and many, maybe the majority, of native speakers don&#x27;t have it and don&#x27;t know.In recent years I constantly see people going &quot;this is written by AI&quot; and I have yet to see a single of of them able to coherently prove their point. It&#x27;s all just feelings and hunches.So I am calling you on this:How do you know? Show your working. Demonstrate your case.

Why didn&#x27;t this author compare Llama 3 with GLM 5.2 (released 1 week ago) which is a more standard attention based LLM? To compare 2 separate families of LLMs and then pointing out that they are different is not a surprising result and detracts from the point the author is trying to make.<a href="https:&#x2F;&#x2F;sebastianraschka.com&#x2F;llm-architecture-gallery&#x2F;?compare=llama-3-8b%2Cglm-5-2#architecture-diff-tool" rel="nofollow">https:&#x2F;&#x2F;sebastianraschka.com&#x2F;llm-architecture-gallery&#x2F;?compa...</a>If you look at it, the diagrams are very similar, but the main differences are that the feedforward is replaced with a MoE (router to multiple feedforwards) and the model has a different attention implementation.

LLMs Are Complicated Now