Git Internals: What Happens When You Type git commit
A guided tour of the object model hiding under every commit

Most of us use Git the way we use a microwave: press the buttons, food gets warm, never think about the magnetron. git add, git commit, git push, repeat. That works right up until it doesn’t — a detached HEAD, a botched rebase, a “lost” commit — and suddenly the buttons stop making sense. The cure is understanding what Git actually does when you commit, because the underlying model is far simpler and more elegant than the porcelain commands suggest. Spend an afternoon with the plumbing and Git stops being magic.
1 Git is a content-addressed object store
Strip away the commands and Git is a tiny key-value database. It stores four kinds of objects, and the key for every object is the SHA-1 hash of its contents, prefixed with a short header giving the object’s type and size. (Git is migrating towards SHA-256, but the classic 40-character hex hashes you see everywhere are SHA-1, and the model is identical either way.) Because the key is derived from the content, the store is “content-addressed”: identical content always produces the identical hash, so Git stores it once and deduplication is automatic.
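The content-addressing idea fits in a few lines of Python. This is a toy, in-memory sketch. The `ObjectStore` class and its methods are hypothetical, and real loose objects are also zlib-compressed and written under .git/objects/. It does hash the same type-and-size header Git prepends to every object, so identical bytes collapse to one entry.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store (hypothetical, not Git's API).

    Each object's key is the SHA-1 of a '<type> <size>\\0' header followed by
    the content, which is how Git derives object ids.
    """

    def __init__(self) -> None:
        self._objects: dict[str, tuple[str, bytes]] = {}

    def put(self, obj_type: str, content: bytes) -> str:
        header = f"{obj_type} {len(content)}\0".encode()
        key = hashlib.sha1(header + content).hexdigest()
        # Same content -> same key, so storing it a second time changes nothing.
        self._objects.setdefault(key, (obj_type, content))
        return key

    def get(self, key: str) -> tuple[str, bytes]:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
first = store.put("blob", b"hello\n")
copy = store.put("blob", b"hello\n")  # e.g. the same file saved under another name
other = store.put("blob", b"goodbye\n")
print(first == copy, len(store))  # True 2
```

Three writes produce only two stored objects, because the duplicate's key already exists.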
The four object types are:
- blob: the raw bytes of a file. Just contents, no filename, no permissions.
- tree: a directory listing of names, modes, and the hashes of the blobs and sub-trees it contains.
- commit: a snapshot pointer plus metadata (the top-level tree, parent commit(s), author, committer, and message).
- tag: an annotated tag object, which points at another object (usually a commit) and records a tag name, tagger, date and message, optionally with a signature. Lightweight tags are plain refs and create no object at all.
Everything you think of as “a commit” is just a commit object pointing at a tree, which points at blobs and more trees. That’s the whole data model.
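To make the chain concrete, here is a sketch that serialises a blob, a one-entry tree and a root commit in Git's object format and hashes each one. The helper names are hypothetical and the author identity is a made-up placeholder. The sketch also skips work real Git does, such as building nested sub-trees from the index and reading parents from HEAD. Tree entries store the 20 raw bytes of each hash, not the hex string, and directories use the mode `40000`.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Git-style id: SHA-1 over '<type> <size>\\0' followed by the body."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialise (mode, name, hex id) entries as a tree object body.

    Each entry is '<mode> <name>\\0' plus the 20 raw bytes of the hash. Git
    sorts entries by name, comparing sub-tree names as if they ended in '/'.
    """
    def sort_key(entry: tuple[str, str, str]) -> str:
        mode, name, _ = entry
        return name + "/" if mode == "40000" else name

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return body


def commit_body(tree: str, parents: list[str], ident: str, message: str) -> bytes:
    """Serialise a commit: header lines, a blank line, then the message."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


blob = object_id("blob", b"hello\n")
tree = object_id("tree", tree_body([("100644", "greeting.txt", blob)]))
# Hypothetical identity; this is a root commit, so there are no parent lines.
me = "Example Author <author@example.com> 1718700000 +0100"
commit = object_id("commit", commit_body(tree, [], me, "Add a greeting"))
print(blob, tree, commit, sep="\n")
```

Change one byte of the file and all three ids change, because each object embeds the hash of the one beneath it. Conversely, an unchanged directory keeps its tree id, so a new commit can reuse it wholesale.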
2 The staging area is a real file
When you git add a file, Git does two concrete things. It writes the file’s contents as a blob into the object store, and it records that blob’s hash, path and mode in the index — a single binary file at .git/index. The index is the staging area; it’s not an abstraction, it’s a file you can inspect.
$ echo "hello" > greeting.txt
$ git add greeting.txt
# the index now lists the staged file and its blob hash
$ git ls-files --stage
100644 ce013625030ba8dba906f756967f9e9ca394464a 0 greeting.txt
# and that blob really exists in the store
$ git cat-file -p ce013625
hello
That hash, ce0136..., is the SHA-1 of the blob: a short header (the type and size, blob 6, followed by a NUL byte) plus the bytes hello\n. Note that Git stored the content under a hash, but nowhere in the blob is the name greeting.txt. The filename lives in the index now, and in a tree later.
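The header claim is easy to check. The sketch below uses a hypothetical helper built only on Python's `hashlib` and `zlib`. It rebuilds the blob for hello\n, hashes it, and shows where Git keeps such an object as a loose file. Python's zlib output isn't guaranteed to be byte-identical to Git's, and after git gc the object may live in a packfile instead. Only the id and the decompressed bytes should be expected to match.

```python
import hashlib
import zlib


def loose_blob(content: bytes) -> tuple[str, str, bytes]:
    """Return (object id, path under .git/, compressed bytes) for a blob.

    The stored form is '<type> <size>\\0' + content, zlib-deflated; the id is
    the SHA-1 of the uncompressed form. No filename is involved anywhere.
    """
    raw = b"blob " + str(len(content)).encode() + b"\0" + content
    oid = hashlib.sha1(raw).hexdigest()
    # Loose objects are fanned out by the first two hex characters of the id.
    path = f"objects/{oid[:2]}/{oid[2:]}"
    return oid, path, zlib.compress(raw)


# `echo "hello"` writes five letters plus a newline: 6 bytes.
oid, path, stored = loose_blob(b"hello\n")
print(oid)                      # ce013625030ba8dba906f756967f9e9ca394464a
print(path)                     # objects/ce/013625030ba8dba906f756967f9e9ca394464a
print(zlib.decompress(stored))  # b'blob 6\x00hello\n'
```

From the command line, git hash-object greeting.txt computes the same id without writing anything to the store.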
3 What commit actually does
Now the main event. When you run git commit, Git performs a tidy sequence of steps, all of which you can do by hand with plumbing commands.
# 1. Turn the current index into a tree object.
$ git write-tree
3c4e9cd789d88d8d89c1073707c3585e41b0e614
# 2. Create a commit object pointing at that tree,
# with the current HEAD as its parent.
$ echo "Add a greeting" | git commit-tree 3c4e9cd -p HEAD
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
# 3. Move the current branch to point at the new commit.
$ git update-ref refs/heads/main a1b2c3d4
That’s it. Conceptually, git commit is sugar over those three operations, plus hooks and a few safety checks. write-tree snapshots the index into a tree object (creating sub-trees for directories as needed). commit-tree wraps that tree with a parent pointer, your author/committer details and a message, hashing the lot into a commit object. Finally, the branch that HEAD points to (main here) is updated to the new commit. Inspect the result and the structure is laid bare:
$ git cat-file -p HEAD
tree 3c4e9cd789d88d8d89c1073707c3585e41b0e614
parent 9f8e7d6c5b4a3f2e1d0c9b8a7f6e5d4c3b2a1f0e
author Smarc <[email protected]> 1718700000 +0100
committer Smarc <[email protected]> 1718700000 +0100
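Add a greeting
A commit is a snapshot, not a diff. The “diffs” you see in git log -p are computed on demand by comparing a commit’s tree with its parent’s tree. Git stores whole snapshots and works out changes when asked, which is the opposite of how many people imagine it. (Packfiles do delta-compress objects on disk, but that is a storage optimisation underneath the model, not the model itself.)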
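Because unchanged files keep the same blob hash, working out a diff starts with comparing hashes, not contents. The sketch below simplifies by assuming each snapshot has already been flattened to a mapping from path to blob id. Real Git walks the trees recursively and can skip a whole sub-directory when its tree hashes match. The function name and the short ids are placeholders. The A/M/D letters mirror what git diff --name-status prints.

```python
def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Compare two flattened snapshots (path -> blob id).

    Returns (status, path) pairs: 'A' added, 'D' deleted, 'M' modified.
    Files whose blob id is unchanged are skipped without reading their contents.
    """
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes.append(("A", path))
        elif path not in new:
            changes.append(("D", path))
        elif old[path] != new[path]:
            changes.append(("M", path))
    return changes


# Placeholder ids standing in for real 40-character blob hashes.
parent = {"README": "aaa1", "greeting.txt": "ce01", "old.txt": "0ld0"}
child = {"README": "aaa1", "greeting.txt": "d1e2", "notes.txt": "f00d"}
print(diff_snapshots(parent, child))
# [('M', 'greeting.txt'), ('A', 'notes.txt'), ('D', 'old.txt')]
```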
4 Branches and HEAD are just pointers
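Here’s the realisation that demystifies most Git confusion: a branch is a 41-byte file containing a commit hash (40 hex characters plus a newline). Look inside .git (in a repository where git gc hasn’t yet moved the refs into .git/packed-refs):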
$ cat .git/refs/heads/main
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
$ cat .git/HEAD
ref: refs/heads/main
HEAD normally points at a branch ref, which points at a commit. Creating a branch writes one new tiny file containing the same hash, so it is instant regardless of repository size: nothing is copied. A “detached HEAD” simply means .git/HEAD holds a commit hash directly instead of ref: refs/heads/.... And a commit you’ve “lost” after a bad reset usually still exists in the object store; it just has no branch pointing at it. git reflog records where HEAD has been, so you can find that orphaned hash and git checkout or git reset back to it. This is why Git rarely truly loses your work. The objects linger long after the pointers have moved on, because by default gc keeps anything a reflog still mentions, and reflog entries for commits no longer on any branch only expire after 30 days.
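The pointer model, and why a reset commit isn't gone, can be sketched with two small functions over toy data. Everything here is hypothetical. Refs are modelled as a dict from ref name to file contents, commits as a dict from id to parent ids, and reachability counts only branch tips. Git's own gc also treats reflog entries and the index as roots, which is what keeps a reset commit alive for a while.

```python
def resolve(refs: dict[str, str], name: str = "HEAD") -> str:
    """Follow symbolic refs ('ref: <target>') until a commit id is reached."""
    value = refs[name].strip()
    while value.startswith("ref: "):
        value = refs[value[len("ref: "):]].strip()
    return value


def reachable(parents: dict[str, list[str]], tips: list[str]) -> set[str]:
    """Every commit reachable from the tips by following parent pointers."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# Toy history c1 <- c2 <- c3 (hypothetical ids). main was at c3, then
# `git reset --hard c2` rewrote the one-line ref file to point at c2.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
refs = {"HEAD": "ref: refs/heads/main\n", "refs/heads/main": "c2\n"}

branch_tips = [resolve(refs, "refs/heads/main")]
print(resolve(refs))                                             # c2
print(sorted(parents.keys() - reachable(parents, branch_tips)))  # ['c3']

# A detached HEAD is just HEAD holding an id instead of a 'ref:' line.
print(resolve({"HEAD": "c3\n"}))                                 # c3
```

Nothing deleted c3; it simply fell out of the reachable set. In a real repository it would still be listed by git reflog, which is how you find it again.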
5 So is this worth learning?
If Git is just add, commit, push and it never goes wrong, you can happily skip all this. But the moment you hit a confusing rebase, an interactive history rewrite, a detached HEAD or a panic about lost commits, this model is the difference between flailing and fixing it in thirty seconds. Knowing that commits are immutable snapshots, branches are throwaway pointers, and nothing is gone until gc runs turns Git from an anxiety machine into a tool you trust. You don’t need to use the plumbing day to day — but knowing it’s there, and roughly what commit is doing on your behalf, is one of the highest-return afternoons a developer can spend.




