2 posts tagged

git

What are git commits?

There seems to be some confusion about how git commits are stored internally. When we use git, it can look as if commits are stored as differences between versions of files. Most commands, like git diff and git show, present information as diffs and never as entire files. And storing only the changes sounds sensible too: we have large code bases but make only small changes at a time.

However, the awesome Pro Git book states that every commit is a snapshot of your current project. That means that when you run `git commit`, git doesn't just store the difference between the last commit and the new one; it stores the entire files! (Files that did not change are not stored again; the new snapshot simply points at the copy git already has.)
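You can check the snapshot claim yourself. The following is a minimal sketch on a throwaway repository; the temporary directory and the file names a.txt and b.txt are made up for the demo. It makes two commits, changes only one file in the second, and then looks at what the second commit actually points at.

```shell
# Create a throwaway repository (path and file names are made up for this demo).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'one\n' > a.txt
printf 'two\n' > b.txt
git add a.txt b.txt
git commit -q -m "first"

# Change only a.txt and commit again.
printf 'one changed\n' > a.txt
git commit -q -am "second"

# A commit object records a tree (the snapshot), a parent and metadata; no diff.
git cat-file -p HEAD

# The tree lists every tracked file, including the untouched b.txt.
git ls-tree HEAD

# b.txt resolves to the same blob in both commits, so its content exists only once.
b_first=$(git rev-parse HEAD~1:b.txt)
b_second=$(git rev-parse HEAD:b.txt)
echo "b.txt in first commit:  $b_first"
echo "b.txt in second commit: $b_second"
```

Both commits list b.txt, but under the same object id, which is why a full snapshot per commit is cheaper than it sounds.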

One of my projects takes 20MB and it has 1066 commits. But running du -ch .git I can see that the entire git history takes only 34MB! How is that possible? Every commit is supposed to store the entire project, why does the entire commit history only take 70% more storage than the project itself?

Turns out the answer is git gc. Git automatically packs older objects and stores similar versions of files as diffs (deltas). This saves a lot of space in exchange for decreased performance (e.g. when checking out old commits). It is a good trade-off, considering that most of the time older commits just sit there for redundancy without ever being used. However, if you need to, you can turn the automatic packing off by running

git config --global gc.auto 0
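To watch the packing happen, here is a small sketch on a throwaway repository. The file name notes.txt and the 20-commit loop are made up for the demo, and gc.auto is set only for this repository rather than globally. It compares git count-objects -v before and after a manual git gc.

```shell
# Throwaway repository; names and counts are made up for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config gc.auto 0   # local only: make sure nothing gets packed behind our back

# Make 20 commits, each appending one line to a growing file.
i=1
while [ "$i" -le 20 ]; do
  echo "line $i" >> notes.txt
  git add notes.txt
  git commit -q -m "commit $i"
  i=$((i + 1))
done

# Before packing: every commit, tree and blob is a separate "loose" object.
git count-objects -v
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Pack everything by hand.
git gc -q

# After packing: "count" (loose objects) drops and "in-pack" goes up.
git count-objects -v
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose before: $loose_before, loose after: $loose_after, packed: $packed_after"
```

The size-pack line in the second report is the number to compare against the size line in the first one if you want to see how much space the pack saves.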

Why should I care?
Understanding how git stores objects internally can help you make more educated decisions about how to organize your workflow. For example, binary files (images, library files, executables, etc.) usually benefit very little from diffs, so each version ends up stored more or less as the entire file. Knowing that, you can see the downsides of storing frequently updated binary files in git:

  • dramatic increase in repository size,
  • decrease in performance.
2018   git

Git bisect

Have you ever run into a bug and remembered for sure that in the previous release (or just some time ago) everything was working properly? Let's say you know that 800 commits ago the bug was not there, and you can't figure out why it's happening. What are you going to do?

Turns out git has a very smart and simple feature called git bisect. The docs explain the usage very well, but here's a small example:


$ git bisect start
$ git bisect bad          # mark current commit as bad
$ git bisect good 91dd441 # mark commit 91dd441 as good

Then git outputs


Bisecting: 800 revisions left to test after this (roughly 10 steps)
[a1ea503523ae328eff7b6ca00e54316ec7665c3e] Commit name

and your journey begins! The output shows that you will only need to test ~10 revisions to find the one where the bug was introduced, and that the first one to check is the commit displayed on the second line. At this point git has checked out your repository at that commit, and all you need to do is recompile (or whatever you need to do to get the current code running) and see if the bug is still there.

If it's broken, you type git bisect bad; if not, you type git bisect good. Either way, git will show you your current progress and the next commit to test:


Bisecting: 400 revisions left to test after this (roughly 9 steps)
[9a5caf8c0ec4b96f99f63186cc4fd50fda0aa242] Another commit name

Repeat this 9 more times and git will tell you exactly in which commit the bug was introduced! When you are done, run git bisect reset to get back to the commit you started from. If you follow good practices and keep commits short and compilable, it will now be much easier to fix the bug you were looking for.
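If the "is it broken?" check can be scripted, git bisect run can do the good/bad marking for you. Below is a minimal sketch on a throwaway repository: the file app.txt, the 16 commits and the grep-based check are all made up for the demo, and in a real project the check would be your own build or test command. The command passed to git bisect run must exit with 0 for a good commit and with a non-zero code (other than 125, which means "skip") for a bad one.

```shell
# Throwaway repository; file name, commit count and the "bug" are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# 16 commits; the "bug" (a line saying BUG in app.txt) sneaks in at commit 11.
i=1
while [ "$i" -le 16 ]; do
  if [ "$i" -eq 11 ]; then
    echo "BUG" >> app.txt
  else
    echo "feature $i" >> app.txt
  fi
  git add app.txt
  git commit -q -m "commit $i"
  i=$((i + 1))
done

first_commit=$(git rev-list --max-parents=0 HEAD)

git bisect start
git bisect bad                    # the current commit contains the bug
git bisect good "$first_commit"   # the very first commit did not

# Stand-in check: exit 0 (good) if app.txt has no BUG line, exit 1 (bad) if it does.
git bisect run sh -c '! grep -q BUG app.txt'

# After the run, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"

# Leave bisect mode and return to where we started.
git bisect reset
```

The manual good/bad loop described above is still the way to go when the bug can only be judged by eye.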

Make sure to check out the beautiful Pro Git docs to learn more.

2018   git