Discussion
The Git Commands I Run Before Reading Any Code
gherkinnn: These are some helpful heuristics, thanks. This list is also one of many arguments for maintaining good Git discipline.
ramon156: > The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.” The most changed file is the one people are afraid of touching?
pzmarzly: Jujutsu equivalents, if anyone is curious:
  What Changes the Most
    jj log --no-graph -r 'ancestors(trunk()) & committer_date(after:"1 year ago")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20
  Who Built This
    jj log --no-graph -r 'ancestors(trunk()) & ~merges()' \
      -T 'self.author().name() ++ "\n"' \
      | sort | uniq -c | sort -nr
  Where Do Bugs Cluster
    jj log --no-graph -r 'ancestors(trunk()) & description(regex:"(?i)fix|bug|broken")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20
  Is This Project Accelerating or Dying
    jj log --no-graph -r 'ancestors(trunk())' \
      -T 'self.committer().timestamp().format("%Y-%m") ++ "\n"' \
      | sort | uniq -c
  How Often Is the Team Firefighting
    jj log --no-graph \
      -r 'ancestors(trunk()) & committer_date(after:"1 year ago") & description(regex:"(?i)revert|hotfix|emergency|rollback")'
  Much more verbose, closer to programming than shell scripting. But fewer flags to remember.
palata: To me, it makes jujutsu look like the Nix of VCSes. Not meaning to offend anyone: Nix is cool, but adds complexity. And as a disclaimer: I used jujutsu for a few months and went back to git. Mostly because git is wired in my fingers, and git is everywhere. Those examples of what jujutsu can do and not git sound nice, but in those few months I never remotely had a need for them, so it felt overkill for me.
JetSetIlly: Some nice ideas but the first example should include word boundaries in the regex.
    git log -i -E --grep="\b(fix|bug|broken)\b" --name-only --format='' | sort | uniq -c | sort -nr | head -20
  I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.
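A minimal sketch of the "debugger" problem described above, using plain grep on made-up commit subjects rather than a real repository. It uses grep's -w (whole-word) flag as a stand-in for the \b anchors; whether \b itself is honoured by git log --grep depends on the regex library git was built against, so that part is an assumption worth checking locally.

```shell
# Three made-up commit subjects; only the first and third describe fixes.
msgs=$(printf '%s\n' \
  'Fix crash in parser' \
  'Refactor debugger package' \
  'bug: off-by-one in pager')

# Substring match: "debugger" contains "bug", so all three lines count.
loose=$(printf '%s\n' "$msgs" | grep -ciE 'fix|bug|broken')

# Whole-word match (-w): "debugger" no longer counts.
strict=$(printf '%s\n' "$msgs" | grep -ciwE 'fix|bug|broken')

echo "substring matches:  $loose"
echo "whole-word matches: $strict"
```

The same idea applies to the file-level pipeline: every commit that merely mentions "debugger" otherwise adds its files to the "bug hotspot" tally.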
traceroute66: > The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. What a weird check and assumption. I mean, surely most of the "20 most-changed files" will be README and docs, plus language-specific lock-files etc.? So if you're not accounting for those in your git/jj syntax you're going to end up with an awful lot of false-positive noise.
seba_dos1: > If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Squash-merge workflows are stupid (you lose information without gaining anything in return as it was easily filterable at retrieval anyway), but git stores the author and committer names separately, so it doesn't matter who merged, but rather whether the squashed patchset consisted of commits with multiple authors (and even then you could store it with Co-authored-by trailers).
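A small sketch of the author/committer distinction mentioned above, run in a throwaway repository. The names Alice, Bob and Carol are made up, and the empty commit only stands in for a squash-merged PR; it is an illustration under those assumptions, not a description of how any particular forge performs its merges.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q

# One commit written by "Alice" but committed (for example, squash-merged) by "Bob".
GIT_AUTHOR_NAME='Alice' GIT_AUTHOR_EMAIL='alice@example.com' \
GIT_COMMITTER_NAME='Bob' GIT_COMMITTER_EMAIL='bob@example.com' \
git -c commit.gpgsign=false commit -q --allow-empty \
  -m 'Add tagging' -m 'Co-authored-by: Carol <carol@example.com>'

# %an is the author name, %cn is the committer name.
who=$(git log -1 --format='%an|%cn')
echo "$who"

# Additional authors survive as trailers in the message body.
coauthors=$(git log -1 --format='%b' | grep -c '^Co-authored-by:')
echo "co-authors: $coauthors"
```

A "who built this" count based on %an therefore credits the author, and only the extra authors of a multi-author squash are lost unless they are recorded as trailers.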
dewey: I've just tried this, and the most touched files are also the most irrelevant or boring files (auto generated, entry-point of the service etc.) in my tests.
mememememememo: Yes. Because the fear is buttressed with necessity. You have to edit the file, and so does everyone else, and that is a recipe for a lot of mess. I can think back over years of files like this. Usually kilolines of impossible-to-reason-about do-everything code.
filcuk: Having the tree easy to filter doesn't matter if it returns hundreds of commits you have to sift through for no reason.
traceroute66: > Why would you touch the README file hundreds of times a year? You're right, perhaps I should have said CHANGELOG etc.
Aachen: If you use git commits like the save function of your editor and don't write messages intended for reading by anyone else, it makes sense to want to hide them. For other cases, you lose the information about why things are this way. It's too verbose to //comment on every line with how it came to be this way, but on (non-rare in total, but rare per line) occasion it's useful to see what the change was that made the line be like this.
mchaver: Definitely not in my experience. The most changed are the change logs, files with version numbers and readmes. I don't think anyone is afraid of keeping those up to date.
rbonvall: Just like that place that's so crowded nobody goes there anymore.
Jenk: Tbf you wouldn't switch to jj because of those kinds of commands; they are quite the outlier in the grand list of reasons to use jj. However, the option to use the revset language in that manner is a high-ranking reason to use jj in my opinion. The most frequent "complex" command I use is to find commits in my name that are unsigned, and then sign them (this is owing to my workflow with agents that commit on my behalf but I'm not going to give agents my private key!)
    jj log -r 'mine() & ~signed()'
    # or if yolo mode...
    jj sign -r 'mine() & ~signed()'
  I hadn't even spared a moment to consider the git equivalent but I would humbly expect it to be quite obtuse.
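For comparison, a rough sketch of the listing half in plain git, run in a throwaway repository with a made-up identity ("Demo User"). It relies on the %G? pretty-format placeholder, which prints N for a commit with no signature. It covers only the "find my unsigned commits" step; re-signing them would still need a history rewrite, which this sketch does not attempt.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name 'Demo User'          # made-up identity for the throwaway repo
git config user.email 'demo@example.com'
git -c commit.gpgsign=false commit -q --allow-empty -m 'unsigned change'

# %G? prints a one-letter signature status; N means "no signature".
# awk keeps only the unsigned commits by the chosen author.
unsigned=$(git log --author='Demo User' --format='%G? %h %s' | awk '$1 == "N"')
echo "$unsigned"
```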
palata: Actually, signing was one of the annoying parts of jujutsu for me: I sign with a security key, and the way jujutsu handled signing was very painful to me (I know it can be configured and I tried a few different ways, but it felt inherent to how jujutsu handles commits (revisions?)).
seba_dos1: These commits reaching the reviewer are a sign of either not knowing how to use git or not respecting their time. You clean things up and split into logical chunks when you get ready to push into a shared place.
yokoprime: Haha, good luck working with a team with more than 2 people. A good reviewer looks at the end-state and does not care about individual commits. If I'm curious about a specific change I just look at the blame.
seba_dos1: I have no troubles working on big FLOSS projects where reviews usually happen at the commit level :)
mattrighetti: I have a summary alias that kind of does similar things
    # summary: print a helpful summary of some typical metrics
    summary = "!f() { \
      printf \"Summary of this branch...\n\"; \
      printf \"%s\n\" $(git rev-parse --abbrev-ref HEAD); \
      printf \"%s first commit timestamp\n\" $(git log --date-order --format=%cI | tail -1); \
      printf \"%s latest commit timestamp\n\" $(git log -1 --date-order --format=%cI); \
      printf \"%d commit count\n\" $(git rev-list --count HEAD); \
      printf \"%d date count\n\" $(git log --format=oneline --format=\"%ad\" --date=format:\"%Y-%m-%d\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
      printf \"%d tag count\n\" $(git tag | wc -l); \
      printf \"%d author count\n\" $(git log --format=oneline --format=\"%aE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
      printf \"%d committer count\n\" $(git log --format=oneline --format=\"%cE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
      printf \"%d local branch count\n\" $(git branch | grep -v \" -> \" | wc -l); \
      printf \"%d remote branch count\n\" $(git branch -r | grep -v \" -> \" | wc -l); \
      printf \"\nSummary of this directory...\n\"; \
      printf \"%s\n\" $(pwd); \
      printf \"%d file count via git ls-files\n\" $(git ls-files | wc -l); \
      printf \"%d file count via find command\n\" $(find . | wc -l); \
      printf \"%d disk usage\n\" $(du -s | awk '{print $1}'); \
      printf \"\nMost-active authors, with commit count and %%...\n\"; git log-of-count-and-email | head -7; \
      printf \"\nMost-active dates, with commit count and %%...\n\"; git log-of-count-and-day | head -7; \
      printf \"\nMost-active files, with churn count\n\"; git churn | head -7; \
    }; f"
duskdozer: Curious - why write it as a function in presumably .gitconfig and not just a git-summary script in your path? Just seems like a lot of extra escapes and quotes and stuff
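A sketch of the script-in-PATH alternative raised above: git runs any executable named git-<name> found on PATH as `git <name>`. The script name git-summary-demo and its two lines are hypothetical, a cut-down stand-in for the alias rather than a port of it, and the throwaway repository exists only so the script has something to report.

```shell
bindir=$(mktemp -d)

# An ordinary shell script: no alias-style escaping of quotes is needed.
cat > "$bindir/git-summary-demo" <<'EOF'
#!/bin/sh
printf '%s\n' "$(git rev-parse --abbrev-ref HEAD)"
printf '%d commit count\n' "$(git rev-list --count HEAD)"
EOF
chmod +x "$bindir/git-summary-demo"
PATH="$bindir:$PATH"

# Throwaway repo with a single commit.
repo=$(mktemp -d)
cd "$repo" && git init -q
git -c user.name='Demo' -c user.email='demo@example.com' -c commit.gpgsign=false \
  commit -q --allow-empty -m 'initial'

# git finds git-summary-demo on PATH and runs it as a subcommand.
summary=$(git summary-demo)
echo "$summary"
```

One trade-off is portability: an alias travels with the .gitconfig file, while a script has to be installed on PATH separately on each machine.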
raxxorraxor: Some readme files include changelogs. But aside from that I think this can still net some useful information. I like to look at the most recently changed files in a repo as well.
croemer: Rather than using an LLM to write fluffy paragraphs explaining what each command does and what it tells them, the author should have shown their output (truncated if necessary)
hhjinks: You review code not to verify the actual output of the code, but the code itself. For bugs, for maintainability. Commit hygiene is part of that.
arnorhs: The author is talking about the case where you have coherent commits, probably from multiple PRs/merges, that get merged into a main branch as a single commit. Yeah, I can imagine it being annoying that squashing in that case wipes the author attribution, when not everybody is doing PRs against the main branch. However, calling all squash-merge workflows "stupid" without any nuance... well, that's "stupid" :)
duskdozer: I think the point is that if you have to squash, the PR-maker was already gitting wrong. They should have "squashed" on their end to one or more smaller, logically coherent commits, and then submitted that result.
faangguyindia: I can't remember all of this, does anyone know of any LLM model trained on CLI which can be run locally?
lamasery: Copy those commands into a file and use that file to prompt the “sh” LLM.
jbjbjbjb: It’s easy enough to filter those out with grep. It still is relatively meaningless. If the team incrementally adds things then it’s just going to show what additions were made. It isn’t churn at all.
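A minimal sketch of that grep filter, run on a made-up list of paths standing in for the output of `git log --name-only --format=''`. The noise pattern is hypothetical and would need adjusting to whatever a given repository treats as docs, lock files, or generated code.

```shell
# Made-up stand-in for the file list printed by git log --name-only.
paths=$(printf '%s\n' \
  'src/app.py' 'README.md' 'package-lock.json' 'src/app.py' \
  'CHANGELOG.md' 'src/db.py' 'Cargo.lock' 'src/app.py' 'CHANGELOG.md')

# Hypothetical noise list: READMEs, changelogs, and lock files.
noise='(^|/)(README|CHANGELOG)[^/]*$|\.lock$|(^|/)package-lock\.json$'

# Drop the noise first, then count how often each remaining path appears.
churn=$(printf '%s\n' "$paths" | grep -vE "$noise" | sort | uniq -c | sort -nr)
echo "$churn"

# Path with the highest remaining count.
top=$(printf '%s\n' "$churn" | awk 'NR == 1 { print $2 }')
```

As the comment above notes, filtering only removes the obvious noise; what remains may still reflect where features were added rather than where bugs live.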
TonyStr: Looks nice. Unfortunately I don't have log-of-count-and-email, log-of-count-and-day or churn
petey283: +1 Same. Edit: https://github.com/mattrighetti/dotfiles/blob/master/.gitcon...
mcpherrinm: Squash merge is the only reasonable way to use GitHub: If you update a PR with review feedback, you shouldn’t change existing commits because GitHub’s tools for showing you what has changed since your last review assume you are pushing new commits. But then you don’t want those multiple commits addressing PR feedback to merge as they’re noise. So sure, there are workflows with Git that don’t need squashing. But they’re incompatible with GitHub, which is at least where I keep my code today. Is it perfect? No. But neither is git, and I live in the world I am given.
jbjbjbjb: This command needs a warning. Using this command and drawing too many conclusions from it, especially if you’re new, will make you look stupid in front of your teammates. I ran this on the repo I have open and after I filtered out the non-code files it really can only tell me which features we worked on in the last year. It says more about how we decided to split up the features into increments than anything to do with bugs and “churn”.
Pay08: Good thing that the article contains that warning, then.
tasuki: You gain the extra information by having reasonable commit messages rather than the ones you mentioned. To fix CI you force push. Can you explain to me what an avid squash-merger puts into the commit message of the squashed commit composed of commits "argh, let's see if this works", "crap, the CI is failing again, small fix to see if it works", and "pushing before leaving for vacation"?
theshrike79: The squashed commit from the PR -> main will have a clean title + description that says what was added. Usually pretty close to what the PR title + description are actually, just without the videos and screenshots. Example:
    feat(ui): Add support for tagging users
    * Users can be tagged via the user page
    * User tags visible in search results (configurable)
  etc. I don't need to spend extra time cleaning up my git commits and force-pushing on the PR branch, losing context for code reviews etc. Nor does anyone have to see my shitty angry commits when I tried to figure out why Playwright tests ran on my machine and failed in the CI for 10 commits.
seba_dos1: > If someone uses git commits like the save function of their editor. I use it like that too and yet the reviewers don't get to see these commits. Git has very powerful tools for manipulating the commit graph that many people just don't bother to learn. Imagine if I sent a patchset to the Linux Kernel Mailing List containing such "fix typo", "please work now", "wtf" patches - my shamelessness has its limits!
Aachen: Seems like a lot of extra effort (save, add, commit, come up with some message even if it's a prayer to work now) only to undo it again later and create a patch or alternate history out of the final version. Why bother with the intermediate commits if you're not planning for it to be part of the history?
markus_zhang: I also feel this reads like AI slop, but at least I learned 5 commands. Not too bad.
user20251219: thank you - these are useful
niedbalski: Ages ago Google wrote an algorithm to detect hotspots by using commit messages, https://github.com/niedbalski/python-bugspots
fusslo: our workflow makes git commits meaningless. Our 'bugfix'/'feature' are labelled as branches, making the only relevant commit the merge commit. The merge commit is automatically generated and only includes the author, branch name, and approvers list. So you have to traverse backwards to see any commit messages. Totally doable in git, EXCEPT since we have to put the 'relevant information' in the body of the PR (which doesn't make its way into the merge commit message...) nobody actually uses the commit messages. 'wip', 'latest'.. and the occasional JIRA task number were about as good as it gets. That is, until Claude. Now Claude generates commit messages that are almost novellas. Nobody is reading them except the business people's Claude. It's Claude writing and reading messages to itself. The number of commits has also ballooned from ~4/PR to about 12/PR
niedbalski: Ages ago, Google released an algorithm to identify hotspots in code by using commit messages. https://github.com/niedbalski/python-bugspots
pydry: I just tried it too and it basically just flagged a handful of 1500+ line files which probably ought to be broken up eventually but aren't causing any serious problems.
tasuki: > A good reviewer looks at the end-state and does not care about individual commits. Then I must be a bad reviewer. In a past job, I had a colleague who meticulously crafted his commits - his PRs were a joy to review because I could go commit by commit in logical chunks, rather than wading through a single 3k line diff. I tried to do the same for him and hope I succeeded.
KptMarchewa: Split the PR rather than force me to wade through your commit history. Use graphite or something else that allows you to stack PRs.
KptMarchewa: In my case, it's .github/CODEOWNERS. Nobody is afraid of changing it.
gib444: Hah someone really looked at jq (?) and thought: "yes, more of this everywhere". I feel jq is like marmite
zaphirplane: What are examples of better ones? I don’t get the "let me show the world my work" approach, and I’m not a fan of large PRs.
duskdozer: if you mean better messages, it's not really that. those junk messages should be rewritten and if the commits don't stand alone, merged together with rebase. it's the "logical chunks" the parent mentioned. it's hard to say fully, but unless a changeset is quite small or otherwise is basically 0% or 100%, there are usually smaller steps. like kind of contrived but say you have one function that uses a helper. if there's a bug in the function, and it turns out to fix that it makes a lot more sense to change the return type of the helper, you would make commit 1 to change the return type, then commit 2 fix the bug. would these be separate PRs? probably not to me but I guess it depends on your project workflow. keeping them in separate commits even if they're small lets you bisect more easily later on in case there was some unforeseen or untested problem that was introduced, leading you to smaller chunks of code to check for the cause.
orsorna: If the code base is idempotent, I don't think showing commit history is helpful. It also makes rebases more complex than needed down the line. Thus I'd rather squash on merge. I've never considered how an engineer approaches a problem. As long as I can understand the fundamental change and it passes preflights/CI I don't care if it was scried from a crystal ball. This does mean the onus is on the engineer to explain their change in natural language. In their own words of course.
jbjbjbjb: Not really strong enough in a post about what to do in a codebase you’re not familiar with. In that situation you’re probably new to the team and organisation and likely to get off on the wrong foot with people if you assume their code “hurts”.
fzaninotto: Instead of focusing on the top 20 files, you can map the entire codebase with data taken from git log using ArcheoloGit [1]. [1]: https://github.com/marmelab/ArcheoloGit
whstl: > One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions. In my experience, when the team doesn't squash, this will reflect the messiest members of the team. The top committer on the repository I maintain has 8x more commits than the second one. They were fired before I joined and nobody even remembers what they did. Git itself says: not much, just changing the same few files over and over. Of course if nobody is making a mess in their own commits, this is not an issue. But if they are, squash can be quite a bit more truthful.