How Claude Code works in large codebases

177 points · 125 comments · 7 hours ago
jwilliams

> Claude Code navigates a codebase the way a software engineer would: it traverses the file system, reads files, uses grep to find exactly what it needs, and follows references across the codebase. It operates locally on the developer’s machine and doesn’t require a codebase index to be built, maintained, or uploaded to a server....

> Agentic search avoids those failure modes. There's no embedding pipeline or centralized index to maintain as thousands of engineers commit new code. Each developer's instance works from the live codebase.
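The quoted passage can be made concrete with a toy sketch of the agentic-search pattern (hypothetical, not Anthropic's implementation): the agent scans the live checkout with a grep-style query each time, so there is no index to build or keep in sync.

```python
import re
from pathlib import Path

def agentic_grep(root: str, pattern: str, max_hits: int = 20) -> list[tuple[str, int, str]]:
    """Scan the live working tree for a regex, like `grep -rn`.
    No index is built or maintained; results always reflect the
    current state of the checkout. Toy version: Python files only."""
    rx = re.compile(pattern)
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if rx.search(line):
                hits.append((str(path), lineno, line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits
```

The trade-off the article is gesturing at: this pays a fresh scan per query instead of paying to keep an embedding pipeline or symbol index consistent as the repo changes.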

The frame of "the way a software engineer would" and the conclusion seem at odds. I'd love to be schooled otherwise?

I use autocomplete/LSPs all the time and they're useful. Isn't that an index? Why wouldn't Claude be able to use one? Also, a "software engineer" remembers the codebase - that's effectively RAG. I have a lot of muscle memory for finding the file I need through an auto-completed CMD+P.

It doesn't need to particularly be real-time across thousands of engineers -- just the branch I'm on.

It's rare that I'd navigate a codebase via first-principles traversal. Usually that only happens in a new codebase, and in those cases it's definitely not what I'd call an optimal experience.

eithed

I ask Claude to fix a given test:

- runs the failing test | grep "x|failing" | tail -n 10

- runs the test again to get the message saying why it's failing | tail -n 10

- runs the test again, because tail -n 10 cut off the message, every fucking time.

I have a skill telling it not to do that. It ignores the skill. It's maddening.

wg0

> How Claude Code works in large codebases?

Simple: it eats up to 35% of the five-hour usage limit on the first prompt, even on small projects, and then there's a five-minute timeout for you to respond quickly, or the caches go bust and you'll pay another 12% to 15% on the next prompt.

sinsudo

Just an anecdote: I was designing a project for LLM onboarding and orchestration. Claude chose to read only the first 40 lines of each file. Later, in another session, while looking for the causes of a low-quality result, Claude detected the fault and changed the code to perform an AST analysis, so the analyzer now takes documentation lines and function signatures (input/output) as input.

Claude's initial approach was really poor. One has to wonder how many times Claude's code has to be modified/reviewed for improvement, or whether it is possible at all to produce good code with it.

Edited: Generalization: Claude can fix a localized, identifiable poor decision (e.g., "only reading first 40 lines") because the fault is discrete and traceable to one piece of code.

But real software quality problems often arise from many small, individually reasonable decisions that collectively produce bad outcomes. No single one is obviously "the fault." In that scenario, a tool that generates low-quality building blocks piecemeal may never converge on good code, because each piece seems fine in isolation.
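For context, the AST-based fix described above might look something like this minimal sketch (hypothetical reconstruction, assuming Python and the stdlib `ast` module): rather than reading the first 40 lines of a file, it summarizes a module by its docstrings and function signatures.

```python
import ast

def summarize_module(source: str) -> list[str]:
    """Summarize a module by docstrings and function signatures
    instead of reading only its first N lines."""
    tree = ast.parse(source)
    summary = []
    mod_doc = ast.get_docstring(tree)
    if mod_doc:
        summary.append(f"module: {mod_doc.splitlines()[0]}")
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sig = f"def {node.name}({args})"
            if node.returns:  # include the return annotation when present
                sig += f" -> {ast.unparse(node.returns)}"
            summary.append(sig)
            doc = ast.get_docstring(node)
            if doc:
                summary.append(f"  doc: {doc.splitlines()[0]}")
    return summary
```

This is a discrete, traceable fix of exactly the kind the comment says Claude can handle.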

jameson

Why can't Claude Code generate an effective harness for us by inspecting the codebase?

I tried defining CLAUDE.md (or AGENTS.md), skills, and plugins, but I'm not getting the effectiveness others claim. With the LSP plugin, for example, CC doesn't use the LSP's symbol renaming and instead edits files one by one, slowly; or it doesn't invoke a skill even when I explicitly ask it to remember to invoke it whenever the prompt contains a specific cue.

Am I using it wrong? Is there a robust example harness I can copy?

thinkindie

I don’t agree with the statement about indexing the codebase: indexing works pretty well in IDEs like PhpStorm and the other JetBrains IDEs.

lebski88

> That also includes codebases running on languages that teams don't always associate with AI coding tools, such as C, C++, C#, Java, PHP.

What a strange comment for them to make. Why wouldn't I expect CC to work well with those languages? What languages would I associate it with? Python and JavaScript?

Plywood1

Claude clearly wrote this. A lot of fluff, not much substance.

zihotki

I wonder if Anthropic tested their claims on Pro, 5x, or 20x subscriptions. When you have an infinite amount of free tokens, it sure makes sense: you just throw tokens at the problem. But in a limited-usage scenario it doesn't fly far.

belZaah

How very interesting. In an industry where things shift around in months, if not weeks, there's been not only enough time for clear patterns to emerge, but these patterns have also proven successful on large codebases. What are the success criteria? Didn't delete the production database? Team velocity has increased? Codebase TTL has increased? Operations guys are happier?

ufish235

How important are CLAUDE.md files when they don't even describe, in concrete terms, what should go into each one?

prymitive

What I’m curious about is how well LLMs do when they create something from scratch. So far my experience has been with letting them fix issues or add features in an existing codebase, where I had already shaped the general architecture and put in a lot of guardrails. But what if the architecture is unclear and there is nothing letting the agent know whether a change breaks something? My only experience with a tiny codebase where it did a lot of scaffolding was poor: it did what I asked for, not what I needed. If I had done more of the thinking myself, I would have realised it was code that works but doesn't solve the problem I'm after.

tex0

If the developer can have a local copy of the monorepo it's not a "large" codebase.

cdnsteve

Small plug for what I built:

You need a code dependency graph: https://github.com/roboticforce/remembrallmcp lets you ask "what breaks if I change this?"

It saves 98% of token usage and 95% of tool calls.

It runs as an MCP server and works for 8 languages.

It just works; you should try it.
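The "what breaks if I change this?" query is, at its core, a reverse lookup over an import graph. A minimal sketch of the idea (hypothetical; not the linked project's implementation):

```python
from collections import defaultdict

def reverse_deps(deps: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert an import graph: for each module, who imports it."""
    rdeps = defaultdict(set)
    for mod, imports in deps.items():
        for imp in imports:
            rdeps[imp].add(mod)
    return rdeps

def blast_radius(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Everything that transitively depends on `changed`."""
    rdeps = reverse_deps(deps)
    seen, stack = set(), [changed]
    while stack:
        mod = stack.pop()
        for dependent in rdeps.get(mod, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen
```

The token savings come from answering this question from a small precomputed graph instead of having the agent grep and read files to rediscover the edges each time.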

martypitt

I don't have any LSPs hooked up to CC yet (going to fix that today), nor particularly sophisticated CLAUDE.md files.

So, if I've read this post correctly, that means that CC is navigating my codebase today by sending lots of it up to a model, and building an understanding. Is that correct? Did I misunderstand it?

I kinda suspected there was more local inference going on somehow -- partly because the iteration times are fairly fast.

Tsarp

Wondering if enterprises have a modified version of CC that doesn't have to optimize to stop the bleeding on fixed-cost subscription plans.

The article really doesn't align with current sentiment. Everyone with a choice has mostly moved on to Codex (of course, in this world all it takes is a model or harness update to turn things around).

CC is great at a lot of things, but it repeatedly misses reading crucial parts of the codebase, hallucinates about the work that was done, and has a bunch of other issues.

hbarka

Interesting that MCP was mentioned over CLI. For production or controlled environments, I would not make MCP the deployment path. I would let MCP help generate or choose commands, but have the actual deployment go through CLI scripts, Git commits, and CI/CD approval.

wood_spirit

I’m super interested to know what the back and forth between models and tools really looks like in practice.

Are there any much more detailed walkthroughs of how it works and how it decides the tools to use and the grep to use etc and what the conversations actually look like?

In the UI you see just enough to know it’s doing something but you don’t really see the jumps it’s making offscreen.

nilirl

So ... the better you explain the codebase to the LLM the better it explains it to you?

whh

A lot of words for not much. The harness taxonomy is fine, but anyone using Claude Code already knows CLAUDE.md exists.

svara

I use Claude Code quite a bit and quite enjoy it, so I'm a bit confused by how often it's mentioned that you should have CLAUDE.md.

I mean: If there was something you could add to the prompt to consistently increase performance why isn't it in the system prompt already?

If it's all about clarifying a couple of local idiosyncrasies, shouldn't it be able to quickly get them by looking through the repo?

Does anyone have an example of a CLAUDE.md that really makes a difference for them?

In general, this article would have profited massively from examples of good applications of those patterns.

pouyaamreji

A long article about nothing; it seems written by Claude itself.

ares623

Lots of concepts. Release the harness that made it possible to port Bun to Rust in 9 days. That's what everyone really wants. Then everyone can go "do that but for this other goal".

prodigycorp

This is really a zero-information blog post. I want to know how they use the LSP to improve their understanding of the codebase. It would be great if it were open source for us to review.

A post like this should be providing people with some reassurance about Claude's ability to understand code at a large scale. It's mostly fluff.

Edit: I did some googling to dig around for thoughts on LSP performance and integration. The author of Bun has a tweet saying that LSPs are a big drag on performance for no real gain, and virtually all of the replies agree. Anyone else have any experience/thoughts?

https://xcancel.com/jarredsumner/status/2017704989540684176
