Git Squash and Rebase
As a codebase grows and more developers start collaborating, the branching in a project can turn into one big, complex pile of commits flying in and out of branches. Suddenly, the significance of a group of commits from a branch or from a developer becomes hard to see.
This is where the squash and rebase features shine: they make the version history a lot easier to develop against and to follow.
The basics
Out of the gate, the goal of both merging and rebasing is to take commits from a feature branch and put them onto another branch. Let’s start with how a quote-unquote “normal” merge makes that happen.
Merging
Say I have a graph that looks like this. As you can see, I split off my feature branch at commit 2, and have done a bit of work.
- If I run a merge, Git will create a special merge commit that ties the two histories together. It has two parents (the tip of master and the tip of my feature branch), and its snapshot contains ALL of my feature branch changes combined with what is on master.
- It will then place this special merge commit onto master. When this happens, the tree will show your feature branch as well as the master branch. Going further, if you imagine working on a team with other developers, your Git tree can become complex, displaying everybody else’s branches and merges.
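The merge described above can be replayed in a throwaway repository. This is a minimal sketch, not the author’s exact graph: the file names, the `commit_file` helper, and the extra "commit 3" on master are assumptions added for illustration. The extra commit makes the branches diverge, so Git has to create a real merge commit instead of fast-forwarding.

```shell
set -e
# Throwaway identity so the sketch does not depend on global Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this article

# Hypothetical helper: write the message into a file and commit it
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file base.txt "commit 1"
commit_file base.txt "commit 2"

# Split off the feature branch at commit 2 and do a bit of work
git checkout -q -b feature
commit_file feature.txt "feature work A"
commit_file feature.txt "feature work B"

# Assumption: master has moved on in the meantime
git checkout -q master
commit_file master.txt "commit 3"

# Merge the feature branch into master and inspect the result
git merge -q --no-edit feature
git --no-pager log --oneline --graph

# Count the parents of the commit now at the tip of master
parent_count=$(git show -s --format=%P HEAD | wc -w)
echo "parents of merge commit: $parent_count"
```

The graph printed by `git log --graph` shows both lines of history joined by the merge commit, and the individual feature commits are still listed.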
Rebasing
- Now let’s take a look at how rebase would handle this same situation. Instead of doing a git merge, I’ll do a git rebase.
- What rebase will do is take all of the commits on your feature branch and move them on top of the master commits. Behind the scenes, Git is actually abandoning the original feature branch commits and re-creating them as new commits, with new hashes, on top of the master branch (remember, under the hood, commit objects are immutable and immovable). What you get with this approach is a nice clean tree with all your commits laid out in a row, like a timeline.
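To see the rebase behaviour described above, here is a minimal sketch in a throwaway repository. The file names, the `commit_file` helper, and the extra "commit 3" on master are assumptions for illustration. It records the tip of the feature branch before and after the rebase to show that the commits are re-created rather than moved.

```shell
set -e
# Throwaway identity so the sketch does not depend on global Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this article

# Hypothetical helper: write the message into a file and commit it
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file base.txt "commit 1"
commit_file base.txt "commit 2"
git checkout -q -b feature
commit_file feature.txt "feature work A"
commit_file feature.txt "feature work B"

# Assumption: master has moved on in the meantime
git checkout -q master
commit_file master.txt "commit 3"

# Rebase the feature branch onto the current tip of master
git checkout -q feature
old_tip=$(git rev-parse HEAD)
git rebase -q master
new_tip=$(git rev-parse HEAD)

git --no-pager log --oneline --graph
echo "feature tip before rebase: $old_tip"
echo "feature tip after rebase:  $new_tip"
```

The log is a single straight line with no merge commit, and the two printed hashes differ because the feature commits were written again on top of master.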
Rebasing caveats
At this point, I think I’d better mention some caveats.
- Rebase doesn’t play super well with open-source projects and pull requests, since it can make changes hard to trace, especially small ones introduced to a codebase. This point is a bit nuanced, but here is an article that does a good job of explaining why.
- It can also be dangerous if you’re working on a shared branch with other developers because of how Git rewrites commits when rebasing; however, in the workflow example below, I’ll show you how to mitigate this risk.
In practice: the actual commands
On the development team I work with, we’ve successfully adopted the workflow I’m about to show you and it works well for us.
When I start development, I always make sure the code on my local machine is synced to the latest commit from remote master.
# With my local master branch checked out
git pull
Next, I’ll check out a new branch so I can write and commit code there, keeping my work separated from the master branch.
git checkout -b my_cool_feature
As I’m developing my feature, I’ll make a few commits…
git add .
git commit -m 'This is a new commit, yay!'
Note: while I’m developing it’s likely that my fellow developers will have shipped some of their own changes to remote master. That’s ok, we can deal with that later.
Now that I’m done developing my feature, I want to get my changes back into remote master. To begin this process, I’ll switch back to my local master branch and pull the latest changes. This ensures my local machine has any new commits submitted by my teammates.
git checkout master
git pull
What I want to do now is make sure my feature will jibe with any new changes from remote master. To do this, I’ll check out my feature branch and rebase against my local master. This will re-anchor my branch against the latest changes I just pulled from remote master. Additionally, at this point Git will let me know if I have any conflicts, and I can take care of them on my branch.
git checkout my_cool_feature
git rebase master
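Now that my feature branch doesn’t have any conflicts, I can switch back to my master branch and place my changes onto master. Because my_cool_feature now sits directly on top of master, this rebase simply fast-forwards master to the tip of the feature branch.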
git checkout master
git rebase my_cool_feature
Since I synced with remote master before doing the rebase, I should be able to push my changes up to remote master without issues.
git push
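To try the whole workflow without touching a real project, it can be replayed in a sandbox. This is a sketch under stated assumptions: a local bare repository (`remote.git`) stands in for the shared remote, a second clone (`teammate`) plays the fellow developer, and the file names are made up for illustration.

```shell
set -e
# Throwaway identity so the sketch does not depend on global Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"

# Assumption: a bare repository stands in for the shared remote
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# My clone: seed master with one commit and publish it
git clone -q remote.git me 2>/dev/null   # hide the "empty repository" warning
cd me
git symbolic-ref HEAD refs/heads/master
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git push -q -u origin master

# Start the feature branch and commit some work
git checkout -q -b my_cool_feature
echo "feature" > feature.txt
git add .
git commit -q -m "This is a new commit, yay!"

# Meanwhile, a teammate ships a change to remote master
git clone -q ../remote.git ../teammate
echo "teammate" > ../teammate/teammate.txt
git -C ../teammate add teammate.txt
git -C ../teammate commit -q -m "Teammate change"
git -C ../teammate push -q origin master

# Sync local master, then re-anchor the feature branch on top of it
git checkout -q master
git pull -q
git checkout -q my_cool_feature
git rebase -q master

# Fast-forward master to the rebased feature branch and publish
git checkout -q master
git rebase -q my_cool_feature
git push -q

git --no-pager log --oneline --graph
```

The final log shows three commits in a straight line, with the feature commit sitting on top of the teammate’s change, and local master matches the remote after the push.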
Rebase vs. Merge
ANALOGY - Writing a book
- Merging - Publishing the original draft of the book, showing the thought process from start to finish in its raw form.
- Rebasing - Publishing a more concise, edited and reviewed draft of the book, in the way that is best for future readers.
- Merging justification - A repository’s commit history is a record of what actually happened. It’s a historical document, valuable in its own right, and shouldn’t be tampered with. From this angle, changing the commit history is almost blasphemous; you’re lying about what actually transpired. So what if there was a messy series of merge commits? That’s how it happened, and the repository should preserve that for posterity.
- Rebasing justification - The opposing point of view is that the commit history is the story of how your project was made. You wouldn’t publish the first draft of a book, so why show your messy work? When you’re working on a project, you may need a record of all your missteps and dead-end paths, but when it’s time to show your work to the world, you may want to tell a more coherent story of how to get from A to B, in the way that’s best for future readers.
Squash - the basics
Say I have a graph that looks like this. You can see I split off some commits onto a bug fix branch:
- I’ll need to get my commits back onto mainline, and I can use merge or rebase to do that; however, with either solution, all my local branch commits would be preserved.
- Usually, preserving commits like this is a good idea, but let’s say you made a typo somewhere and had to create another commit to fix the spelling error. Or you have a bunch of local commits related to a bug fix, but you’d really rather just have all of those related commits under one roof. Enter squashing.
Squashing allows you to rewrite history and combine several commits into one.
In practice: the actual commands
Now that you know what squash is, let’s take a look at the actual commands. Again, we’ll say my starting point is my bug fix branch with 3 commits.
It would be nice if I didn’t have to preserve these extraneous commits as separate entities since they are all related to a bug fix. I’d rather combine them together into one clean commit.
With my bug fix branch checked out, I’ll start by running the interactive rebase command with HEAD~3. This lets Git know I want to operate on the last three commits back from HEAD.
git rebase -i HEAD~3
Git will open up your default terminal text editor (most likely vim) and present you with a list of commits:
pick 7f9d4bf Accessibility fix for frontpage bug
pick 3f8e810 Updated screenreader attributes
pick ec48d74 Added comments & updated README
# Rebase 4095f73..ec48d74 onto 4095f73 (3 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
...
There are a couple of options here, but we’ll go ahead and mark the commits we’d like to meld into their predecessor by changing pick to squash. (If you’re using Vim, type i to enter insert mode.)
pick 7f9d4bf Accessibility fix for frontpage bug
squash 3f8e810 Updated screenreader attributes
squash ec48d74 Added comments & updated README
Press ESC, then type :wq to save and exit the file (if you are using Vim).
At this point Git will open the editor again so you can edit the commit message of the new, larger squashed commit:
# This is a combination of 3 commits
# This is the 1st commit message:
Accessibility fix for frontpage bug
# This is the commit message for #1:
Updated screenreader attributes
# This is the commit message for #2:
Added comments & updated README
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit
...
Simply saving this file without making changes will result in a single commit with a commit message that is a concatenation of all 3 messages. If you’d rather rename your new commit entirely, comment out each commit’s message and write your own. Once you’re done, save and exit.
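When the editor round trip is inconvenient, for example in a script, a similar result can be reached with a different technique: `git reset --soft` followed by a fresh commit. This is a sketch of that alternative, not the interactive rebase shown above. The throwaway repository, the `bugfix` branch name, the file names, and the `commit_file` helper are assumptions for illustration.

```shell
set -e
# Throwaway identity so the sketch does not depend on global Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this article

# Hypothetical helper: write the message into a file and commit it
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file index.html "Initial commit"

# A bug fix branch with three related commits
git checkout -q -b bugfix
commit_file index.html "Accessibility fix for frontpage bug"
commit_file attrs.html "Updated screenreader attributes"
commit_file README "Added comments & updated README"

# Move the branch pointer back three commits but keep every change staged
git reset -q --soft HEAD~3

# Record all of the staged changes as one commit with a message of our choosing
git commit -q -m "Accessibility fix for frontpage bug"

git --no-pager log --oneline
```

Unlike the interactive rebase, this approach does not offer the original messages for editing, so the new message has to be supplied by hand.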
Afterthought
I think the merge approach is a more systematic way of logging and developing commit messages in large codebases. Nevertheless, the importance of viewing raw history can't be overlooked either. That's some food for thought. Until next time...