bsuvc
I love how the author thinks developers write commit messages.
All joking aside, it really is a chronic problem in the corporate world. Most codebases I encounter just have "changed stuff" or "hope this works now".
It's a small minority of developers (myself included) who consider the git commit log to be important enough to spend time writing something meaningful.
AI-generated commit messages help with this a lot, if developers would actually use them (I hope they will).
joshstrange
I ran these commands on a number of codebases I work on and I have to say they paint a very different picture than the reality I know to be true.
> git shortlog -sn --no-merges
Is the most egregious. In one codebase there is a developer's name at the top of the list who outpaced the number 2 by almost 3x the number of commits. That developer no longer works at the company? Crisis? Nope, the opposite. The developer was a net-negative to the team in more ways than one, didn't understand the codebase very well at all, and just happened to commit every time they turned around for some reason.
ramon156
> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”
The most changed file is the one people are afraid of touching?
mattrighetti
I have a summary alias that kind of does similar things
# summary: print a helpful summary of some typical metrics
summary = "!f() { \
printf \"Summary of this branch...\n\"; \
printf \"%s\n\" $(git rev-parse --abbrev-ref HEAD); \
printf \"%s first commit timestamp\n\" $(git log --date-order --format=%cI | tail -1); \
printf \"%s latest commit timestamp\n\" $(git log -1 --date-order --format=%cI); \
printf \"%d commit count\n\" $(git rev-list --count HEAD); \
printf \"%d date count\n\" $(git log --format=oneline --format=\"%ad\" --date=format:\"%Y-%m-%d\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
printf \"%d tag count\n\" $(git tag | wc -l); \
printf \"%d author count\n\" $(git log --format=oneline --format=\"%aE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
printf \"%d committer count\n\" $(git log --format=oneline --format=\"%cE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
printf \"%d local branch count\n\" $(git branch | grep -v \" -> \" | wc -l); \
printf \"%d remote branch count\n\" $(git branch -r | grep -v \" -> \" | wc -l); \
printf \"\nSummary of this directory...\n\"; \
printf \"%s\n\" $(pwd); \
printf \"%d file count via git ls-files\n\" $(git ls-files | wc -l); \
printf \"%d file count via find command\n\" $(find . | wc -l); \
printf \"%d disk usage\n\" $(du -s | awk '{print $1}'); \
printf \"\nMost-active authors, with commit count and %%...\n\"; git log-of-count-and-email | head -7; \
printf \"\nMost-active dates, with commit count and %%...\n\"; git log-of-count-and-day | head -7; \
printf \"\nMost-active files, with churn count\n\"; git churn | head -7; \
}; f"
fmbb
> One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.
Well isn't it typical that the person who wrote is also the person that merged? I have never worked in a place where that is not the norm for application code.
Even if you are one of those insane teams that do not squash merge because keeping everyone's spelling fixes and "try CI again" commits is important for some reason, you will still not see who _wrote_ the code, you will only see who committed the code. And if the person that wrote the code is not also the person that merges the code, I see no reason to trust that the person making commits is also the person writing the code.
whstl
> One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.
In my experience, when the team doesn't squash, this will reflect the messiest members of the team.
The top committer on the repository I maintain has 8x more commits than the second one. They were fired before I joined and nobody even remembers what they did. Git itself says: not much, just changing the same few files over and over.
Of course if nobody is making a mess in their own commits, this is not an issue. But if they are, squash can be quite more truthful.
kelnos
I really wanted to like this. The author presents a well-thought-out rationale for what conclusions to draw, but I'm skeptical. Commit counts aren't a great signal: yes, the person with the highest might be the person who built it or knows the most about it, but that could also be the person who is sloppy with commits (when they don't squash), or someone who makes a lot of mistakes and has to go back and fix them.
The grep for bugs is not particularly comprehensive: it will pick up some things that aren't bugs, and will miss a bunch of things too.
The "project accelerating or dying" seems odd to me. By definition, the bulk of commits/changes will be at the very beginning of history. And regardless, "stability" doesn't mean "dying".
icedchai
I wouldn't trust "commit counts." The quality and content of a "commit" can vary widely between developers. I have one guy on my team who commits only working code that has been thoroughly tested locally, another guy who commits one line changes that often don't work, only to be followed by fixes, and more fixes. His "commits" have about 1/100th of the value of the first guy.
ivanjermakov
When at work we migrated to monorepo, there was an implicit decision to drop commit history. I was the loudest one to make everyone understand how important it is.
blenderob
> Is This Project Accelerating or Dying
>
> git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c
If the commit frequency goes down, does it really mean that the project is dying? Maybe it is just becoming stable?
croemer
Rather than using an LLM to write fluffy paragraphs explaining what each command does and what it tells them, the author should have shown their output (truncated if necessary)
moritzwarhier
Interesting ideas, but some seem very overgeneralized to me, e.g.:
> Crisis patterns are easy to read. Either they’re there or they’re not.
I disagree with the last two quoted sentences, and also, they sound like an LLM.
jbethune
Saved. Very useful. Normally I just dig around the Github UI to see what I can glean from contributor graphs and issues but these git commands are a pretty elegant solution as well.
bullen
Dying or stabilizing?
Most good projects end up solving a problem permanently and if there is no salary to protect with bogus new features it is then to be considered final?
fzaninotto
Instead of focusing on the top 20 files, you can map the entire codebase with data taken from git log using ArcheoloGit [1].
pwr1
Solid list. I'd add git log --all --oneline --graph pretty early on. It gives you a quick sense of how active different branches are and whether this is a "one person commits everything" project or actually distributed. Helped me a ton on a job where I inherited a monolith with like 4 years of history.
The git blame tip is underrated. People treat it like a gotcha tool but it's maybe the fastest way to find the PR/ticket that explains a weird decision.
seba_dos1
> If the team squashes every PR into a single commit, this output reflects who merged, not who wrote.
Squash-merge workflows are stupid (you lose information without gaining anything in return as it was easily filterable at retrieval anyway) and only useful as a workaround for people not knowing how to use git, but git stores the author and committer names separately, so it doesn't matter who merged, but rather whether the squashed patchset consisted of commits with multiple authors (and even then you could store it with Co-authored-by trailers, but that's harder to use in such oneliners).
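A rough sketch of how the Co-authored-by trailers mentioned above could still be counted from the shell. The helper name coauthor_counts and the sample names and addresses are made up for illustration; it only assumes that trailers sit on their own line in the commit body.

```shell
# coauthor_counts: read commit message bodies on stdin and tally the
# people named in Co-authored-by trailers, most frequent first.
# (Hypothetical helper, not part of git.)
coauthor_counts() {
  grep -i '^Co-authored-by:' \
    | sed -E 's/^[^:]*:[[:space:]]*//' \
    | sort | uniq -c | sort -nr
}

# Made-up commit bodies standing in for real history.
printf '%s\n' \
  'Rework the importer' \
  'Co-authored-by: Ann Example <ann@example.com>' \
  '' \
  'Fix retry loop' \
  'Co-authored-by: Bob Example <bob@example.com>' \
  'Co-authored-by: Ann Example <ann@example.com>' \
  | coauthor_counts
```

On a real repository the input would come from something like `git log --format='%b' | coauthor_counts`. Separately, `git shortlog -sn --no-merges -c` groups by the committer field instead of the author field, which makes the author/committer split visible.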
Cthulhu_
For "what changes the most", in my project it's package.json / lock (because of automatic dependency updates) and translation / localization files; I'd argue that's pretty normal and healthy.
For the "bus factor", there's one guy and then there's me, but I stopped being a primary contributor to this project nearly two years ago, lol.
arthurjj
These were interesting but I don't know if they'd work on most or any of the places I've worked. Most places and teams I've worked at have 2-3 small repos per project. Are most places working with monorepos these days?
gherkinnn
These are some helpful heuristics, thanks.
This list is also one of many arguments for maintaining good Git discipline.
alaudet
This is good stuff. Why I never think of things like this is beyond me. Thanks
giancarlostoro
> One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.
I abhor squash merging for this and a few other reasons. I literally have to go out of my way to re-check out a branch. Someone who wants to use my current branch cannot do so if I merge my changes a month later, because the squash rewrites history, and now git is very confused. I don't get the obsession with "cleaning up the history" as if we're all always constantly running out of storage over 2 more commits.
pscanf
I just finished¹ building an experimental tool that tries to figure out if a repo is slopware or not just by looking at its git history (plus some GitHub activity data).
The takeaway from my experiment is that you can really tell a lot by how / when / what people commit, but conclusions are very hard to generalize.
For example, I've also stumbled upon the "merge vs squash" issue, where squashes compress and mostly hide big chunks of history, so drawing conclusions from a squashed commit is basically just wild guessing.
(The author of course has also flagged this. But I just wanted to add my voice: yeah, careful to generalize.)
These are actually fun to run. Just checked from work who makes most commits and found I have as many commits in the past 2 years as the next 3 people.
That probably isn’t a good sign
alkonaut
Trusting the messages to contain specific keywords seems optimistic. I don't think I used "emergency" or "hotfix" ever. "Revert" is sometimes automatically created by some tools (e.g. un-merging a PR).
Out of curiosity, I ran the 5 commands on my project's public git tree. The only informative one was #4 ("Is This Project Accelerating or Dying") - it showed cliffs when significant pieces of logic were decoupled and moved to other repos.
guilhermeasper
These commands are very useful, but adapting them to the codebase makes a huge difference.
For most, I added some filters and slightly changed the regex, and it showed the reality of the codebase (I already knew the reality, I just wanted to see if it matched, and it did).
traceroute66
> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about.
What a weird check and assumption.
I mean, surely most of the "20 most-changed files" will be README and docs, plus language-specific lock-files etc. ?
So if you're not accounting for those in your git/jj syntax you're going to end up with an awful lot of false-positive noise.
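One way to account for that noise is to filter the file list before counting. This is only a sketch: drop_noise is a made-up helper, and the pattern list (lock files, README.md, *.po, anything under docs/) is an assumption that would need adjusting per repository.

```shell
# drop_noise: remove paths that usually churn for mechanical reasons,
# plus the blank lines that `git log --name-only --format=''` can emit.
# The pattern list is an assumption; adjust it to the repository.
drop_noise() {
  grep -vE '(^|/)(package-lock\.json|yarn\.lock|Cargo\.lock|README\.md)$|\.po$|^docs/|^$'
}

# Hypothetical paths standing in for real `git log --name-only` output.
printf '%s\n' src/main.c package-lock.json po/de.po docs/guide.txt src/main.c \
  | drop_noise
```

Plugged into the "most changed files" pipeline it would look something like `git log --since="1 year ago" --name-only --format='' | drop_noise | sort | uniq -c | sort -nr | head -20`.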
yonatan8070
My team usually uses "Squash and merge" when we finish PRs, so I feel that would skew the results significantly as it hides 99% of the commit messages inside the long description of the single squashed merge commit.
md224
The last sentence of the article is "Here’s what the rest of the week looks like." and then it just stops. Am I missing something?
cratermoon
This is the premise of the excellent book Your Code as a Crime Scene. The history and structure of the codebase reveals a wealth of information.
therealdeal2020
superficial. If I have to unfuck the backend 10 times a week in our API adapter, then these commands will show me constantly changing the API adapter, although it's the backend team constantly fixing their own bugs
aidenn0
What are the Subversion equivalents to these commands?
tom-blk
Nice! Will probably adopt this, seems to give a great overview!
drob518
Nice timing. I was just today needing some of the info that these commands surface. Serendipitous!
xyst
might be useful if there’s an established commit message formatting. But for a majority of Fortune 500 to small businesses that I have worked for this is not the case. Usually you see shit like this:
On main:
2020-01-01: "Changes"
2020-01-05: "Changes"
2020-01-06: "merge <ref to jira/gh issue>"
2020-01-07: "revert <ref to unrelated jira/gh issue from 2 yrs ago>"
Then there’s the people that include merge commits despite agreeing on rebasing.
Occasionally see sprinkles of decent, consistently formatted commit messages.
I think this is only useful on medium to large _open source_ projects. Clearly established CONTRIBUTING.md/README.md and commit formatting/merging guide.
heliumtera
So you value more rushed descriptions of changes than actual changes. Nice
jayd16
No searching the codebase/commits for "fuck" and "shit"? That will give you an idea of what was put in under stressful circumstances like a late night during a crunch.
gpvos
Ah yes, good old
|sort |uniq -c |sort -nr |head -20
I use it often.
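For anyone who hasn't met the idiom, here is a small sketch of what it does, wrapped in a function (top_counts is a made-up name) and fed hypothetical file names instead of real git output:

```shell
# top_counts: count identical lines on stdin and list the 20 most
# frequent ones, highest count first. (Hypothetical helper name.)
top_counts() {
  sort | uniq -c | sort -nr | head -20
}

# Made-up file names standing in for `git log --name-only` output.
printf '%s\n' src/api.c src/db.c src/api.c README.md src/api.c src/db.c \
  | top_counts
```

The first sort groups identical lines so uniq -c can count them, and the second sort orders by that count. In the sample, src/api.c comes out on top with a count of 3.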
kittikitti
This is a great list of commands to quickly understand a repository. Thank you for sharing.
user20251219
thank you - these are useful
jlarocco
I'm so used to magit, it seems kind of primitive to pipe git output around like this.
Anyway, I can glean a lot of this information in a few minutes scrolling through and filtering the log in magit, and it doesn't require memorizing a bunch of command line arguments.
boxed
Just looking at how often a file changes without knowing how big the file is seems a bit silly. Surely it should be changes/line or something?
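A minimal sketch of that changes-per-line idea, assuming the commit count and the current line count per file have already been collected. The helper name churn_per_line, the input layout and the numbers are all hypothetical, and paths containing spaces would break it.

```shell
# churn_per_line: stdin has "<commit_count> <line_count> <path>" per line;
# prints "<commits per line> <path>", highest ratio first.
# Files with a line count of 0 are skipped to avoid dividing by zero.
churn_per_line() {
  awk '$2 > 0 { printf "%.3f %s\n", $1 / $2, $3 }' | LC_ALL=C sort -nr
}

# Hypothetical numbers: a big file changed often vs. a small file changed less.
printf '%s\n' '40 2000 src/big.c' '10 50 src/small.c' | churn_per_line
```

With these numbers the small file ranks first even though it has fewer commits. The triples could be gathered per file with `git rev-list --count HEAD -- "$f"` for the commit count and `wc -l < "$f"` for the size.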
atlgator
Step 6: grep the thread count on the squash-merge debate to determine if the team has unresolved interpersonal conflict.
yieldcrv
blog posts are just comments that would have been torn apart if only posted on a forum, now masquerading as important universal edicts
stackedinserter
This should be renamed to "Git commands that I run as a new hire to get metrics I'll forget on day 2".
TacticalCoder
> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”
I've got my Emacs set up to display next to every file that is versioned the number of commits that file has been modified in (for the curious: using a modified all-the-icons-ivy-rich + custom elisp code + custom Bash scripts I wrote and it's trickier than it seems to do in a way that doesn't slow everything down). For example in the menu to open a file or open a recently visited file etc.: basically in every file list, in addition to its size, owner, permissions, etc. I also add the number of commits if it's a versioned file.
I like the fix/bug/broken search in TFA to see where the bugs gather.
lpribis
I was curious what information I could glean from these for some popular repos. Caveat: I'm primarily a low-level embedded developer so I don't interface with large open source projects at the source level very often (other than occasionally the linux kernel). I chose some projects at random that I use.
*Mainline linux*
Most changed files: pretty much what I expected for 1 and 2... the "cutting edge" of Linux development over other OSes -- bpf and containers. The bpf verifier and AMD GPU driver might get a boost in this list due to sheer LoCs in those files (26K and 14K respectively). An intel equivalent of amdgpu_dm is #21 in the list (drivers/gpu/drm/i915/display/intel_display.c) and nvidia is nowhere to be seen (presumably due to out-of-tree modules/blobs?).
10399 Christoph Hellwig -> I only know his name because of drama last year regarding rust bindings to DMA subsystem
8481 Mauro Carvalho Chehab -> I also know his name from the classic "Mauro, shut the fuck up!" Linus rant
8413 Takashi Iwai -> Listed as maintainer for sound subsystem, I think he manages ALSA
8072 Al Viro -> His name is all over a bunch of filesystem code
Buggy files: Intel comes out on top of GPU drivers this time (twice). Along with KVM for x86(64), the main allocator, and BTRFS.
Buggy files: DWARF debuginfo generation, x86 heuristics tables, RS6000(?!) heuristic tables. I had to look up RS6000, it's an IBM instruction set from the 90s lol. cp-tree.h is an interesting file, it seems to be the main C(++) AST datastructures.
*xfwm4*
Most changed files: the list is dominated by *.po localizations. I filtered these out. Even after this, I discovered there is very little active development in the last few years. If I extend to 4 years ago, I get:
1. src/client.c - Realizing this project is too "small" to glean much from this. client.c is just the core X client management code. Makes sense.
2. src/placement.c - Other core window management code.
This has not told me much other than where most of the functionality of this project lies.
Bus factor: Pretty huge. Not really an issue in this case due to lack of development I guess.
Files with bug commits: Very similar distribution to most changed files. Not enough datapoints in this one to draw any big conclusions.
I think these massive open projects (excl xfwm) are generally pretty consistent code quality across the heavily trodden areas because of the amount of manpower available to refactor the pain points. I've yet to see an example of "god help you if you have to change that file" in e.g. linux, but I have of course seen that situation many times in large proprietary codebases.
Jujutsu equivalents, if anyone is curious:
What Changes the Most
Who Built This
Where Do Bugs Cluster
Is This Project Accelerating or Dying
How Often Is the Team Firefighting
Much more verbose, closer to programming than shell scripting. But fewer flags to remember.
I have a summary alias that kind of does similar things
EDIT: props to https://github.com/GitAlias/gitalias
Some nice ideas but the regexes should include word boundaries. For example:
git log -i -E --grep="\b(fix|fixed|fixes|bug|broken)\b" --name-only --format='' | sort | uniq -c | sort -nr | head -20
I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.
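A small sketch of the difference, using three made-up commit subjects and plain grep rather than git itself. The -w flag here stands in for the \b anchors: it only accepts matches that form a whole word.

```shell
# Hypothetical commit subjects; only the first and third describe a fix or bug.
subjects='Fix crash in parser
Refactor debugger package layout
Bug: wrong offset in reader'

# Substring match: "bug" inside "debugger" counts as a hit.
loose=$(printf '%s\n' "$subjects" | grep -ciE 'fix|bug|broken')

# Whole-word match (-w): "debugger" no longer matches.
strict=$(printf '%s\n' "$subjects" | grep -ciwE 'fix|bug|broken')

echo "loose=$loose strict=$strict"
```

The loose count is 3 and the strict count is 2, so on a codebase with a package called "debugger" the unanchored version would flag every commit that mentions it.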
Interesting ideas, but some seem very overgeneralized to me, e.g.:
> How Often Is the Team Firefighting
> git log --oneline --since="1 year ago" | grep -iE 'revert|hotfix|emergency|rollback'
> Crisis patterns are easy to read. Either they’re there or they’re not.
I disagree with the last two quoted sentences, and also, they sound like an LLM.
Biggest life changer for me has been:
git clone --depth 1 --branch $SomeReleaseTag $SomeRepoURL
If you only want to build something, it only downloads what you need to build it. I've probably saved a few terabytes at this point!
Instead of focusing on the top 20 files, you can map the entire codebase with data taken from git log using ArcheoloGit [1].
[1]: https://github.com/marmelab/ArcheoloGit
To me all of these are symptoms of the problem that I outlined in my recent blog post: https://news.ycombinator.com/item?id=47606192
and it touches in detail what exactly commit standards should be, and even how to automate this on CI level.
And then I also have idea/vision how to connect commits to actual product/technical/infra specs, and how to make it all granular and maintainable, and also IDE support.
I would love to see any feedback on my efforts. If you decide to go through my entire 3 posts I wrote, thank you
I just finished¹ building an experimental tool that tries to figure out if a repo is slopware or not just by looking at its git history (plus some GitHub activity data).
The takeaway from my experiment is that you can really tell a lot by how / when / what people commit, but conclusions are very hard to generalize.
For example, I've also stumbled upon the "merge vs squash" issue, where squashes compress and mostly hide big chunks of history, so drawing conclusions from a squashed commit is basically just wild guessing.
(The author of course has also flagged this. But I just wanted to add my voice: yeah, careful to generalize.)
¹ Nothing is ever finished.
Ages ago, google released an algorithm to identify hotspots in code by using commit messages. https://github.com/niedbalski/python-bugspots
Ages ago google wrote an algorithm to detect hotspots by using commit messages, https://github.com/niedbalski/python-bugspots
Can't resist making it as a git command https://github.com/zdk/git-critique
Nice set of commands! I would suggest using --all flag with git log though - scans through all branches and not just the current one
For more insights on Git, check out https://github.com/nolasoft/okgit
I created a small TUI based on the article https://github.com/mikaoelitiana/git-audit
I put it into a gist :)
https://gist.github.com/aeimer/8edc0b25f3197c0986d3f2618f036...
Thanks. What a great Skill for my Claude
Before, I ask AI "is this project maintained" done.
I'll try to use them in a hook and test them with Claude. Thank you!
Great tips, added to notes.txt for future use ..
Another one I do, is:
This way I can see right away which branches are 'ahead' of the pack, what 'the pack' looks like, and what is up and coming for future reference ... in fact I use the 'gss' alias to find out what's going on, regularly, i.e. "git fetch --all && gss" - doing this regularly, and even historically logging it to a file on login, helps see activity in the repo without too much digging. I just watch the hashes.
What are the Subversion equivalents of these commands?
Nice! Will probably adopt this, seems to give a great overview!
Nice timing. I was just today needing some of the info that these commands surface. Serendipitous!
might be useful if there’s an established commit message formatting. But for a majority of Fortune 500 to small businesses that I have worked for this is not the case. Usually you see shit like this:
On main:
2020-01-01: "Changes"
2020-01-05: "Changes"
2020-01-06: "merge <ref to jira/gh issue>"
2020-01-07: "revert <ref to unrelated jira/gh issue from 2 yrs ago>"
Then there’s the people that include merge commits despite agreeing on rebasing.
Occasionally see sprinkles of decent, consistently formatted commit messages.
I think this is only useful on medium to large _open source_ projects. Clearly established CONTRIBUTING.md/README.md and commit formatting/merging guide.
So you value more rushed descriptions of changes than actual changes. Nice
No searching the codebase/commits for "fuck" and "shit"? That will give you an idea of what was put in under stressful circumstances like a late night during a crunch.
Ah yes, good old
I use it often.
This is a great list of commands to quickly understand a repository. Thank you for sharing.
thank you - these are useful
I'm so used to magit, it seems kind of primitive to pipe git output around like this.
Anyway, I can glean a lot of this information in a few minutes scrolling through and filtering the log in magit, and it doesn't require memorizing a bunch of command line arguments.
Just looking at how often a file changes without knowing how big the file is seems a bit silly. Surely it should be changes/line or something?
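Something like the following would do that normalization. It is only a sketch: it assumes you already have a per-file change count (e.g. from the article's most-changed-files command) and a per-file line count (e.g. from `wc -l`), and `churn_per_line` is a hypothetical name.

```python
def churn_per_line(change_counts, line_counts):
    """Return (path, changes / lines) pairs, highest ratio first.

    change_counts: dict of path -> number of commits touching the file
    line_counts:   dict of path -> current number of lines in the file
    Files missing from line_counts (e.g. deleted since) or with zero
    lines are skipped, since the ratio is undefined for them.
    """
    ratios = []
    for path, changes in change_counts.items():
        lines = line_counts.get(path, 0)
        if lines > 0:
            ratios.append((path, changes / lines))
    return sorted(ratios, key=lambda item: item[1], reverse=True)
```

With this, a 100-line file changed 10 times ranks above a 10,000-line file changed 50 times, which is the opposite of what the raw count says.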
Step 6: grep the thread count on the squash-merge debate to determine if the team has unresolved interpersonal conflict.
blog posts are just comments that would have been torn apart if only posted on a forum, now masquerading as important universal edicts
This should be renamed to "Git commands that I run as a new hire to get metrics I'll forget on day 2".
> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”
I've got my Emacs set up to display next to every file that is versioned the number of commits that file has been modified in (for the curious: using a modified all-the-icons-ivy-rich + custom elisp code + custom Bash scripts I wrote and it's trickier than it seems to do in a way that doesn't slow everything down). For example in the menu to open a file or open a recently visited file etc.: basically in every file list, in addition to its size, owner, permissions, etc. I also add the number of commits if it's a versioned file.
I like the fix/bug/broken search in TFA to see where the bugs gather.
I was curious what information I could glean from these for some popular repos. Caveat: I'm primarily a low-level embedded developer so I don't interface with large open source projects at the source level very often (other than occasionally the linux kernel). I chose some projects at random that I use.
*Mainline linux*
Most changed files: pretty much what I expected for 1 and 2... the "cutting edge" of Linux development over other OSes -- bpf and containers. The bpf verifier and AMD GPU driver might get a boost in this list due to sheer LoCs in those files (26K and 14K respectively). An intel equivalent of amdgpu_dm is #21 in the list (drivers/gpu/drm/i915/display/intel_display.c) and nvidia is nowhere to be seen (presumably due to out-of-tree modules/blobs?).
Bus factor: obviously none. The top 4
Buggy files: Intel comes out on top of GPU drivers this time (twice). Along with KVM for x86(64), the main allocator, and BTRFS.
*GCC*
Most changed files: IR autovectorization code, riscv heuristics tables, and C++ template handling (pt.c is "parameterized types").
Buggy files: DWARF debuginfo generation, x86 heuristics tables, RS6000(?!) heuristic tables. I had to look up RS6000, it's an IBM instruction set from the 90s lol. cp-tree.h is an interesting file, it seems to be the main C(++) AST datastructures.
*xfwm4*
Most changed files: the list is dominated by *.po localizations. I filtered these out. Even after this, I discovered there is very little active development in the last few years. If I extend to 4 years ago, I get:
1. src/client.c - Realizing this project is too "small" to glean much from this. client.c is just the core X client management code. Makes sense.
2. src/placement.c - Other core window management code.
This has not told me much other than where most of the functionality of this project lies.
Bus factor: Pretty huge. Not really an issue in this case due to lack of development I guess.
Files with bug commits: Very similar distribution to most changed files. Not enough datapoints in this one to draw any big conclusions.
I think these massive open projects (excl xfwm) are generally pretty consistent code quality across the heavily trodden areas because of the amount of manpower available to refactor the pain points. I've yet to see an example of "god help you if you have to change that file" in e.g. linux, but I have of course seen that situation many times in large proprietary codebases.
More AI slop.
Wtf is happening to this website
git commands I run before reading any code:
git rm -rf .