Orthogonal Git Workflow

30 Dec 2016 ∼ 16 minutes / 3400 words

Countless bytes have been wasted on tutorials and Internet debates (i.e. flame wars) on which Git workflow is the best, when to merge or rebase, how many lines of code per commit make for the best review experience, and so on.

What I’ll attempt to do in this post, in as few bytes as possible, is describe a simple, orthogonal Git workflow, designed for projects with a semi-regular release cadence, and built around a pre-release feature freeze.

This is not intended to be the end-all guide to workflow nirvana, but rather a collection of idioms that have been applied successfully throughout the lifecycles of various projects.

Additionally, the final two chapters contain advice and pointers on merging strategies and commit standards. If all that sounds interesting, please read on.

Permanent Branches

Using Gitflow as a starting point, we simplify certain concepts and entirely discard others. Thus, we specify two main, permanent branches:

The `master` Branch

The master branch represents code released to production (for releases with final release tags) and staging (for releases tagged -rc). We’ll talk more about release tags in a second, but it’s important to understand that the tip of master should always be pointed to by a tag, -rc or otherwise.
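One way to check this invariant is to ask Git whether a tag points exactly at the tip of master. The following is a minimal sketch, run in a scratch repository; the identity, commit message and tag value are illustrative assumptions. It relies on `git describe --exact-match`, which fails when no tag points at the given commit.

```shell
# Scratch repository, so the example can run anywhere.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # start on master regardless of local defaults
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Init: Create repository"
git tag -a v.2.10.0-rc1 -m "Release candidate 1 for v.2.10.0"

# Succeeds only if a tag points exactly at the tip of master.
if tip_tag=$(git describe --exact-match --tags master 2>/dev/null); then
    echo "master is at $tip_tag"
else
    echo "master has untagged commits" >&2
fi
```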

Commits are never made against master directly, but are rather made as part of other branches, and merged into master when we wish to deploy a new tag. Merge strategies are described below, and apply towards all code moved between branches.

The `develop` Branch

The develop branch is the integration point for all new features which will eventually make their way into master.

As with master, no commits should be made against develop directly, but should rather be part of ephemeral feature branches. The use of such branches is described below.

Ephemeral Branches

While the above two branches are permanent (i.e. should never be removed), they only serve as integration points for code built in ephemeral, or temporary branches. Of these we have two, each serving a distinctly different purpose, and having different semantics.

Feature Branches (`feature/XXX_yyy`)

Feature branches are where most of the work in a project happens, and are always opened against, and merged back into, the develop branch. What constitutes a feature is fairly broad, but essentially covers any code that is not a bugfix for an issue that exists in current master.

**Figure 2.0** - *Branching and merging feature branches.*

Feature branch names follow a naming convention of feature/XXX_yyy where XXX refers to the ticket number opened against the work (if any), while yyy is a short, all-lowercase, dash-separated description of the work done. A (perhaps contrived) example would be:

git checkout develop     # The checked-out branch becomes the base for our feature.
git pull origin develop  # Always a good idea to branch off the latest changes (assumes a remote named origin).
git checkout -b feature/45_implement-flux-capacitor

The rules behind merging of feature branches back into develop are project-specific, but most teams would have the code go through peer review and possibly a CI pass before merging. However, it is intended that projects implement a semi-regular (or at least predictable) release schedule, in which case features that are intended to appear in the upcoming release will have to be merged into develop before the feature freeze starts.

Once the feature freeze starts and develop is merged back into master and tagged as an -rc, the team is free to merge feature branches into develop again.

While most rules are meant to be broken, the ones described above (as loosely defined as they are) fit into the versioning strategies employed, and as such are best followed as closely as possible.

Certain fixes, however, cannot wait for the next release, or are designed to fix breaking issues present in the master branch. For those, we have the following.

Bugfix Branches (`bugfix/XXX_yyy`)

Bugfix branches are intended to contain the bare minimum amount of code required for fixing an issue present on the master branch, and as such are always opened against, and merged back into master.

**Figure 2.1** - *Merging bugfix branches between `master` and `develop`.*

Naming conventions and code acceptance rules are identical to those for feature branches, apart from the bugfix/ prefix applied. Bugfixes are not subject to feature freezes or release schedules.

For bugs that appear on both master and develop, the bugfix branch may, optionally, be merged into develop as well, which has the additional benefit of reducing divergence between the two branches. Why does this matter? Read below.
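The bugfix flow described above might look like the following sketch, run in a scratch repository. The ticket number, branch name, file and commit message are hypothetical.

```shell
# Scratch repository, so the example can run anywhere.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # start on master regardless of local defaults
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Init: Create repository"
git branch develop

# Open the bugfix against master, where the issue was found.
git checkout -q -b bugfix/46_fix-image-timeout master
echo "timeout=30" > image.conf
git add image.conf
git commit -q -m "Image: Fix timeout when fetching large files"

# Merge back into master, then (optionally) into develop as well.
git checkout -q master
git merge -q bugfix/46_fix-image-timeout
git checkout -q develop
git merge -q bugfix/46_fix-image-timeout
```

Since neither branch has moved in this toy example, both merges fast-forward and the two branches end up at the same commit.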

Versioning and Tagging Schemes

So now you have a bunch of code on develop waiting to be released. How do we go about doing that? Imagine the following, two-week (i.e. ten working day) release schedule.

Days 1 - 6: Feature Development

Cycle starts, with feature development commencing immediately. Features are opened against develop, peer-reviewed, tested, and eventually merged back into develop according to the release manager/team lead/maintainer’s directions. Large features ready for merging towards the end of the window may be left un-merged in order to better test and/or avoid any latent issues.

Days 7 - 9: Feature Freeze/Pre-Release Bugfixing

Merge window closes, with any features left unmerged making their way into next cycle’s release. This is also called a “feature freeze”.

The develop branch is merged into master, and an -rc tag corresponding to the next feature version is created. So, for instance, if the last version tagged against master was v.2.9.3, this tag is to be v.2.10.0-rc1. This tag is then pushed to a staging server and tested by all means available.
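As a rough sketch of the freeze itself, run in a scratch repository: the feature commit is a hypothetical stand-in for a cycle's worth of work, and the push at the end assumes a remote named origin, which is why it is left as a comment.

```shell
# Scratch repository, so the example can run anywhere.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # start on master regardless of local defaults
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Init: Create repository"
git tag -a v.2.9.3 -m "Release v.2.9.3"

# Stand-in for the features merged into develop during the cycle.
git checkout -q -b develop
git commit -q --allow-empty -m "Flux: Implement capacitor"

# Feature freeze: bring develop into master and tag the release candidate.
git checkout -q master
git merge -q develop
git tag -a v.2.10.0-rc1 -m "Release candidate 1 for v.2.10.0"

# With a real remote, the branch and tag would then be published, e.g.:
#   git push origin master v.2.10.0-rc1
git tag --list 'v.2.10.0*'
```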

Any bugs we inevitably find are fixed in bugfix branches opened against master, and merged as soon as the fixes have been verified on the branches themselves. A subsequent -rc (i.e. v.2.10.0-rc2) release is tagged whenever we wish to push a new, fixed version to staging.

Day 10: Release Day

Release day! Hopefully we’ve had enough time to thoroughly test the new version, and as such are ready to tag and push a final version of master, v.2.10.0, to production. We make another round of testing on production and get ready for the next cycle (or release drinks).
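Put together, the freeze and release day could be sketched as follows, in a scratch repository. The bugfix branch name and commit message are hypothetical; the point is only that the final tag lands on the same commit that was tested as the last release candidate.

```shell
# Scratch repository, so the example can run anywhere.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # start on master regardless of local defaults
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Init: Create repository"
git tag -a v.2.10.0-rc1 -m "Release candidate 1 for v.2.10.0"

# A bug found on staging is fixed in a bugfix branch and merged into master.
git checkout -q -b bugfix/47_fix-staging-regression
git commit -q --allow-empty -m "Image: Fix regression found on staging"
git checkout -q master
git merge -q bugfix/47_fix-staging-regression
git tag -a v.2.10.0-rc2 -m "Release candidate 2 for v.2.10.0"

# Release day: the commit tested as rc2 is tagged as the final version.
git tag -a v.2.10.0 -m "Release v.2.10.0" master
git tag --list 'v.2.10.0*'
```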

Post-release Maintenance

We will eventually find bugs in production that weren’t uncovered by our testing on either feature branches or master. The strategy we follow differs slightly depending on which phase of the next cycle we’re in.

Before the Feature Freeze

As master is still in a pristine state, merging bugfixes back into master is a simple matter of opening a bugfix branch, merging that in, and tagging a new bugfix release version (e.g. v.2.10.1 for the above example) as soon as we’re ready to push to production.

After the Feature Freeze

The situation is slightly complicated by the fact that master now contains code that we’re not ready to release to production, and as such cannot be tagged directly. However, the workflow for opening a bugfix branch remains the same, as the issue will most likely exist in master, even with the additions from develop.

The most elegant way of solving this issue is opening a new “release” branch against the last stable tag, which will serve as the integration point for all relevant bugfix branches. The naming convention we’ll use for this branch is the major and minor version of the release we’re branching off, i.e. v.2.10.

Once the branch has been created, we’re free to merge in all relevant bugfix branches, test locally, and tag the new version against this branch.
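A sketch of this procedure follows, run in a scratch repository with hypothetical branch names and commits. One detail the text leaves open is where the bugfix branch starts: this sketch assumes it is branched from the last stable tag, so that merging it into the release branch does not also drag in unreleased commits from master.

```shell
# Scratch repository, so the example can run anywhere.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # start on master regardless of local defaults
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Init: Create repository"
git tag -a v.2.10.0 -m "Release v.2.10.0"

# After the freeze, master carries work that is not ready for production.
git commit -q --allow-empty -m "Flux: Add unreleased feature work"

# Open the release branch, and the bugfix, against the last stable tag.
git checkout -q -b v.2.10 v.2.10.0
git checkout -q -b bugfix/52_fix-upload-crash
echo "fixed" > upload.txt
git add upload.txt
git commit -q -m "Upload: Fix crash on empty files"

# Integrate on the release branch and tag the bugfix release there.
git checkout -q v.2.10
git merge -q bugfix/52_fix-upload-crash
git tag -a v.2.10.1 -m "Release v.2.10.1"

# master needs the fix too; histories have diverged, so this creates a merge commit.
git checkout -q master
git merge -q --no-edit bugfix/52_fix-upload-crash
```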

Figure 3.1 - Tagging a bugfix after feature freeze. — **Figure 3.1** - *Tagging a bugfix after feature freeze.*

This is the only case where a branch other than master is tagged, and as such constitutes an extraordinary measure.

A Note About Versioning

Both situations above require us to tag new release versions. Normally, we’d tag an initial -rc version, after which we’d push to staging and test. Whether this is necessary or not for bugfixes is debatable, and is left to be decided on a case-by-case basis. However, the convention of -rc to staging, final version to production, remains constant in all situations.

Merge Strategies

Countless debates exist on rebasing vs. merging, and whether to squash commits or not. Realizing that, in most cases, personal preference plays the largest role in choosing a strategy, the following sections may appear debatable, so please, take them with a grain of salt and apply them as needed. However, we’ll try to provide as much rationale as possible, while exploring alternatives in order to better understand the reasons behind our choices.

It is also important to understand that the following sections only apply to public code, i.e. anything that has been pushed to a remote. Nobody but you knows whether you squashed your 15 commits into 1 just before you pushed your code to a public repository somewhere. Regardless of the above, it often pays off to use the same strategies both offline and online, for reasons explained below.

General rules that apply to all strategies: merge with fast-forward, avoid squashing, avoid rebasing.

Merging Between Branches

Merging code between develop and various feature branches, as well as between develop and master, is one of the most common day-to-day operations, so let’s cover each case individually.

Merging from `feature` to `develop`

Once a feature has been peer reviewed and tested as a unit, and provided the feature freeze window is still open, a feature may be merged back into develop.

Choosing to merge instead of rebasing is based on the following rationale: the state of the work in the feature branch is a direct result of the point in develop it was branched off. For long-lived feature branches, this meta-information is an important aspect of understanding the design choices behind the feature work.

Additionally, rebasing disrupts the chronological order of history, that is to say, commits may appear to be behind ones that were made further in the past, but which were rebased into develop afterwards. This makes reasoning about the history harder (for instance, when wanting to bisect based on the knowledge that develop was in a “good” state on some specific date).

Rebasing may also lead to the loss of information concerning how a feature evolved in time, especially when a feature had to be refactored in response to changes made in develop (more on how these changes are brought into the feature branch in the following chapter).

In most cases, we’re only really concerned with the latest version of a feature. A commit introducing some code that is superseded by a following commit in the same feature branch may appear to be irrelevant since it never really touches upon the state of develop at the time of merging.

However such information is important in a historical sense. It may be that the code was refactored in response to an outside event, such as a different feature being merged or product decisions being made behind the scenes. Such information may be relevant in the future, even if the code itself was never strictly part of any release.

The general idea is that anything done with intent, that is to say, manually, should be preserved in the state in which it was made.

Syncing from `develop` into a `feature`

In several cases, you may need to synchronize your feature branch with develop, for instance, when fixing merge conflicts.

Again, choosing to merge instead of rebasing is based on the general idea that actions with intent should be preserved. This is especially true when working on a public branch, but holds for private branches as well.

Imagine the following scenario: You’re working on a feature branch for implementing image-uploading functionality in a CMS product. The functionality is close to being complete, when a refactor of the underlying Image class is merged into develop.

You, of course, can no longer merge your code as-is, and will have to change it in response to the refactor. You’re given two choices: either merge develop into the feature branch, fix any conflicts, and continue to add commits for refactoring any remaining functionality, or rebase the feature branch on top of develop and make it appear as if you were working with the refactored code from the beginning.

There are several reasons why merging provides benefit, especially in the long-term. One is, of course, that your refactor may be relevant to someone (including yourself) in the future. It may be as a pointer for refactoring other, similar features, or may help when attempting to debug issues that did not exist prior to the refactor.

Another, perhaps more esoteric reason is: fixing merge conflicts can go wrong. You may accidentally choose the wrong part of a conflict, or not merge the changes correctly, or add a typo somewhere that would not exist otherwise. When rebasing, it will appear as if these errors were part of the original design. When merging, these errors will appear as part of the merge commit, and as such can be traced back with greater ease.

The proliferation of merge commits is the most common reason for choosing to rebase rather than merge, but cases like this demonstrate the value of preserving merge commits, both for their content and as meta-information: this was the point where you needed to refactor your feature; this is the point you merged your feature into develop.
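To make the scenario concrete, here is a small sketch in a scratch repository (branch names, files and commit messages are hypothetical) in which develop is merged into the feature branch rather than the feature being rebased. The resulting merge commit marks the point at which the refactor was brought in.

```shell
# Scratch repository, so the example can run anywhere.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # start on master regardless of local defaults
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Init: Create repository"
git checkout -q -b develop

# Work starts on the feature branch.
git checkout -q -b feature/45_implement-image-upload
echo "upload" > upload.txt && git add upload.txt
git commit -q -m "Upload: Add image upload handler"

# Meanwhile, a refactor lands on develop.
git checkout -q develop
echo "refactored" > image.txt && git add image.txt
git commit -q -m "Image: Refactor class internals"

# Bring the refactor into the feature with a merge rather than a rebase.
git checkout -q feature/45_implement-image-upload
git merge -q --no-edit develop

# The merge commit records exactly when the sync happened.
git log --merges --oneline
```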

Merging from `develop` to `master`

The same general rules for merging between feature branches and develop apply here as well: merge with fast-forward, do not squash.

Because bugfix branches may be applied to master alone and not develop, the two can diverge. This, however, should not complicate matters much, as in most cases develop is a strict superset of master. Merging bugfix branches into both master and develop can help alleviate any future problems, and is the preferred strategy.

Merging from `bugfix` to `master`

Again, the same rules as with merging between feature branches and develop apply. As stated above, we should also merge all bugfixes into develop, even when the bug no longer applies, in order to eliminate divergence.

Notes About Fast-Forwarding and Squashing Commits

When merging features or bugfixes, we choose to fast-forward our branch relative to the base branch, for various reasons, most notably, the fact that feature and bugfix branches are intended to be ephemeral, and can (and should) be pruned regularly. The reason why we treat these branches as such is related to how we treat commits, and is explained further below.

Choosing not to fast-forward makes bisecting and reasoning about the history harder, while providing dubious benefits, especially since pull requests (on, for example, GitHub or Bitbucket) continue to exist even if the underlying branch has been deleted.
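A short sketch of a fast-forward merge followed by pruning, run in a scratch repository with a hypothetical feature commit. Note that `--ff` is already Git's default behavior (fast-forward when possible, create a merge commit otherwise); it is spelled out here only for clarity.

```shell
# Scratch repository, so the example can run anywhere.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # start on master regardless of local defaults
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Init: Create repository"
git checkout -q -b develop
git checkout -q -b feature/45_implement-flux-capacitor
git commit -q --allow-empty -m "Flux: Implement capacitor"

# develop has not moved since branching, so this merge fast-forwards
# and no merge commit is created.
git checkout -q develop
git merge -q --ff feature/45_implement-flux-capacitor

# The branch is fully merged, so -d deletes it without complaint.
git branch -d feature/45_implement-flux-capacitor
# With a remote, stale remote-tracking branches can be pruned with:
#   git fetch --prune
```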

It may appear, from the above, that the most important aspects of our workflow lie within our branching and merging strategies. However, this is not entirely true.

The smallest monad in any Git repository is the commit, which also makes it the most important aspect of our workflow. Maintaining a clean history depends largely on the quality of each individual commit pushed, and keeping the quality consistent is hard and requires buy-in from every individual team member.

Squashing commits is the antithesis of maintaining consistent quality – why would you squash commits that have been prepared with such diligence? Several other reasons apply, as explained in the following sections.

Commit Standards

The following chapters outline several rules on creating good commits.

Naming and Messages

Perhaps the easiest rule to implement, and the one providing the most benefits for the least amount of effort, is standardizing on naming conventions for commit messages. The advice below echoes conventions followed by quite a few large repositories, including the Git repository for the Linux kernel itself, but is nevertheless worth repeating:

Commit titles should be prepended with the file name or subsystem they affect, be written in the imperative starting with a verb, and be up to 60 characters in length.

So, applying the aforementioned rules, we have two examples:

Bad example:

The Get method of the Image class now fetches files asynchronously

Good example:

Image: Refactor method “Get” for asynchronous operation

The reasons are manifold: prepending the name of the subsystem helps in understanding, at a glance, where the work is happening. The imperative, verb-first form is easy to check by reading the following words before every commit title: “applying this commit will…”. Lastly, the choice of limiting the title to 60 characters may appear archaic, but it encourages terseness.
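The mechanical parts of this rule can be checked automatically, for instance from a commit-msg hook. The helper below is a hypothetical sketch: `check_title` and its regular expression are assumptions, and it only verifies the length limit and the “Subsystem: Verb” shape, not whether the verb is actually in the imperative.

```shell
# Hypothetical helper: returns 0 when a commit title follows the convention.
check_title() {
    title=$1
    # Titles are limited to 60 characters.
    [ "${#title}" -le 60 ] || return 1
    # Expect "Subsystem: Word ..." with a capitalized word after the prefix.
    printf '%s\n' "$title" | grep -Eq '^[A-Za-z0-9_./-]+: [A-Z][a-z]+ ' || return 1
}

check_title 'Image: Refactor method "Get" for asynchronous operation' \
    && echo "accepted"
check_title 'The Get method of the Image class now fetches files asynchronously' \
    || echo "rejected"
```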

All commits should be accompanied by a commit message (separated from the title by two consecutive newlines), ideally containing the rationale behind the changes made within the commit, but minimally the name of the ticket this work is attached to, which will most certainly be useful to you at some point in the future. For example:

Image: Refactor method “Get” for asynchronous operation

Fetching images from the remote image repository is now asynchronous, in order to allow for multiple images to download concurrently. This change does not affect the user-facing API or functionality in any way.

Related-To: #123

Using a standard syntax for relating commits to ticket numbers helps with finding them using git log --grep.
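As an illustration, run in a scratch repository: each `-m` passed to `git commit` becomes its own paragraph, which gives us the title, the message body and the ticket reference in one go. The body is abbreviated from the example above.

```shell
# Scratch repository, so the example can run anywhere.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # start on master regardless of local defaults
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Init: Create repository"

# Title, body and ticket reference, separated by blank lines.
git commit -q --allow-empty \
    -m 'Image: Refactor method "Get" for asynchronous operation' \
    -m "Fetching images is now asynchronous, so that multiple images can download concurrently." \
    -m "Related-To: #123"

# Find the commits attached to ticket 123.
git log --oneline --grep='Related-To: #123'
```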

What to Commit and When

We don’t always have the ability or knowledge to foresee the final, completed state of work needed in order to implement a feature or fix a bug. As such, most work is driven by whatever idea we have about the code at the moment, and can therefore change rapidly.

The standard rule for choosing what to include in a commit is this: every commit should represent a single, individually reversible change to the codebase. That is to say, related work, work that builds on top of itself in the same branch, should be part of the same commit.

As an example, in the course of implementing the asynchronous image operations described above, you find a bug in the same file but a different, unrelated method.

This bugfix and the feature work done should appear in two separate commits, for the simple reason that we should be able to revert a buggy feature without sacrificing unrelated bugfixes made in the course of building that feature.
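The payoff is easy to demonstrate in a scratch repository. In the sketch below, the file contents and commit messages are hypothetical stand-ins for the Image class; the bugfix and the feature are committed separately, and reverting the feature leaves the bugfix intact. When both changes are already sitting in the working tree, `git add -p` can be used to stage them hunk by hunk.

```shell
# Scratch repository, so the example can run anywhere.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # start on master regardless of local defaults
git config user.name "Example" && git config user.email "example@example.com"
printf 'get: sync\nresize: broken\n' > image.txt
git add image.txt && git commit -q -m "Image: Add Get and Resize methods"

# Commit the unrelated bugfix on its own...
printf 'get: sync\nresize: fixed\n' > image.txt
git commit -q -am "Image: Fix off-by-one error in Resize"

# ...and the feature work separately.
printf 'get: async\nresize: fixed\n' > image.txt
git commit -q -am "Image: Refactor method Get for asynchronous operation"

# Reverting the feature leaves the bugfix in place.
git revert --no-edit HEAD
cat image.txt
```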

The tools we use will largely affect what our commits look like: GitHub now allows for better control over reviewing specific commits. Gerrit allows commits to be grouped into patch-sets, which can be reviewed and reworked as separate entities (which would usually either require a rebase or a new pull request). Other tools only allow reviewing the latest version of a branch as a whole.

Pushing for clear boundaries between commits may appear to be a losing battle, especially in the face of ever-changing requirements and the fact that, in most cases, you’d only ever revert an entire pull request/branch and not the individual commits themselves.

The easiest way to deal with these issues is at the time of review: if the commits are too big (over a couple hundred lines of code) and do not appear cohesive, reviewing the code is that much harder, and will eventually lead to inferior code quality and/or bugs falling through the cracks.

Closing Remarks

Various concepts have been presented, some harder to implement than others. If there is one take-away, please allow it to be this: it’s better to be consistent than to be correct, and it’s better to be simple than comprehensive.

Rules that are not orthogonal to one another are harder to implement and follow consistently, so keep that in mind when choosing which battles to fight.

The graphics in this post have been generated using Grawkit, an AWK script which generates git graphs from command-line descriptions.