Thursday, September 12, 2013

Git Tutorial: Branches and Workflow

Check out this story I read in Dr. Dobb's: Git Tutorial: Branches and Workflow.

Pulling changes, handling merges and conflicts, and building a productive workflow are activities that Git handles in its own productive but unique way.
This article is the second in a two-part tutorial on using Git. If you've never used Git, you should read the first installment, "Getting Started with Git: The Fundamentals," before starting on this one. In the previous article, I showed how to set up a Git project on GitHub, copy the project's files to a local repository, make changes locally, stage them, and finally push them to the remote repository.

Moving Forward

While we're browsing the GitHub interface, let's use it to create a change that you can fetch (or pull) into your local Git repository. This will emulate someone else accessing the remote repository and making a change. If you want your local copy of the repository to reflect what's stored in the remote repository, you need to keep it up to date by periodically fetching new changes. First, let's create a README.md file, which GitHub will automatically use to describe your project. GitHub provides an "Add a README" button for this, but let's do it the more generic way: click the "Add a file" button on the GitHub repository page.
Figure 1: Adding a file to the remote GitHub repository.

Type README.md for the name and a description that makes sense to you. (The "md" in the filename stands for "Markdown," a markup language that lets you augment your text with simple formatting. If you want details on how pretty you can make your README file, learn more about GitHub's version of Markdown.) After adding some text, click the "Commit new file" button. You've committed the file to the remote GitHub repository. Go back to your terminal window and type git status.
Figure 2: Status after a change to the remote repository.

Git tells you that there's nothing to commit. This is because the git status command does not do any network communication. Even typing "git log origin/master" won't show the change. Only Git's push, pull, and fetch commands do anything over the network. Let's talk about fetch, since pull is essentially a fetch followed by a merge.
When you track a remote branch, you do get a copy of that remote branch in your local repository. However, aside from those three aforementioned commands that talk over the network, Git treats these remote branches just like any other branches. (You can even have one local branch track another local branch.)
So, how do we update our local copies of the remote branches? git fetch will update all the local copies of the remote branches listed in your .git/config file. Here is what I get when I type git fetch:
Figure 3: Output from git fetch.

Now, you'll notice there's still no difference if you type git log, but let's type git log origin/master.
Figure 4: How changes appear in the master.

Now you see the remote change.
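You can see the same behavior without touching GitHub. The following bash script is a minimal sketch and requires Git 2.28 or later (for `git init -b`). A local bare repository, `github.git`, stands in for the GitHub remote. A second clone, `web`, plays the part of GitHub's "Add a file" page. Both names are hypothetical, and everything happens in a throwaway temporary directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity and directory; nothing here touches your real repositories.
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
cd "$work"

# github.git: a local bare repository standing in for GitHub (hypothetical).
git init -q --bare -b master github.git

# Your local repository, with one commit pushed to the "remote".
git init -q -b master local
cd local
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"
git remote add origin ../github.git
git push -q -u origin master

# A second clone plays GitHub's "Add a file" page and pushes a README.md.
git clone -q ../github.git ../web
echo "A test repository for learning git" > ../web/README.md
git -C ../web add README.md
git -C ../web commit -q -m "Create README.md"
git -C ../web push -q origin master

# status and log only read local data, so neither one sees the new commit yet.
git status
git log --oneline origin/master

# fetch talks to the remote and moves origin/master; master itself is untouched.
git fetch -q origin
git log --oneline origin/master
```

Notice that `origin/master` only moves when `git fetch` (or `git pull`) runs.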
Let's now merge the change we made on the remote repository into our local repository. In Git, a clean merge like this one is called a "fast-forward." It means there's no potential conflict. Specifically, it means that no new commits were made on the branch you're merging into, so Git can simply move that branch forward to the newer commit. I'll explain more later in the section on rebasing, but for now, we're going to pull these changes into our local repository. Type: git merge origin/master.
Figure 5: Pulling the remote changes from the master to our local branch.

Figure 5 shows there was one file inserted. Now if you typed git log, you'd see that you brought the change first from the master branch on your GitHub repository to your origin/master branch, and then from there to your local master branch. You could even have absolute proof of the change by looking in your current directory, where you'll see the README.md file.
There is a shortcut. It's too late now that we've done the merge, but you could have done everything in one fell swoop by typing git pull origin master.
That would have fetched the commits from the remote repository and done the merge. And once your current branch tracks a remote branch (as master tracks origin/master here), you can just type git pull. It fetches from that branch's remote and merges the tracked branch into the one you have checked out. You can be as trigger happy as you want.
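The following sketch shows both routes side by side and requires Git 2.28 or later. A local bare repository (hypothetical `github.git`) stands in for GitHub. One clone uses the long form, `git fetch` followed by `git merge`. Another clone uses the `git pull origin master` shortcut. The `--ff-only` flag on the merge is my addition: it makes Git refuse anything other than a fast-forward.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
cd "$work"

# Seed a bare "remote" (hypothetical stand-in for GitHub) with one commit.
git init -q --bare -b master github.git
git init -q -b master seed
echo "hello" > seed/hello.txt
git -C seed add hello.txt
git -C seed commit -q -m "Initial commit"
git -C seed push -q ../github.git master

# Two clones that will catch up in different ways.
git clone -q github.git local
git clone -q github.git local2

# Someone else adds README.md on the remote (simulated by a third clone).
git clone -q github.git web
echo "A test repository for learning git" > web/README.md
git -C web add README.md
git -C web commit -q -m "Create README.md"
git -C web push -q origin master

# Long form: fetch, then merge. --ff-only refuses anything but a fast-forward.
git -C local fetch -q origin
git -C local merge -q --ff-only origin/master

# Short form: pull does the fetch and the merge in one command.
git -C local2 pull -q origin master
```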

Merges and Conflicts

For the purpose of learning about merges, we're going to undo that last merge. Very carefully, type git reset HEAD~1 --hard.
Figure 6: Undoing a merge.

In Figure 6, "HEAD~1" refers to the first commit before the latest commit. The latest commit is referred to as the "HEAD" of the branch (currently master). By doing this hard reset, you're actually permanently erasing the last commit from your local master branch. As far as Git's concerned, the last link in the master branch's "chain" now is the commit that was previously second to last. Don't get in the habit of doing this. It's just for the purpose of this tutorial.
Your new README.md file is also safely committed to your local repository's cached version of the remote master branch, "origin/master." You could type git merge origin/master to remerge your changes, but don't do that right now.
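Here is a sketch of the reset, again using a local bare repository as a hypothetical stand-in for GitHub (Git 2.28+ assumed). For brevity, the README commit is made and pushed locally rather than through the web editor. The point is that `git reset --hard HEAD~1` moves only `master`, while `origin/master` keeps the commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
cd "$work"

# A bare stand-in for GitHub, and a local repo whose master and origin/master
# both end with the README commit.
git init -q --bare -b master github.git
git init -q -b master local
cd local
git remote add origin ../github.git
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"
echo "A test repository for learning git" > README.md
git add README.md
git commit -q -m "Create README.md"
git push -q -u origin master

# HEAD~1 is the commit just before the latest one.
git log --oneline -n 1 HEAD~1

# Move master back one commit and make the working tree match (README.md disappears).
git reset -q --hard HEAD~1

# origin/master still points at the README commit; "git merge origin/master" would restore it.
git log --oneline -n 1 origin/master
```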
Let's say someone else added that README.md, and you were unaware. You start to create a README.md in your local repository, with the intention of pushing it to the remote repository later. Because we undid our change, there is no longer a README.md file in your current directory. 
Let's quickly create a new README.md file: echo A test repository for learning git > README.md.
I used the cat command (on Windows, it'd be type) to display the contents of the simple file we created to make sure it's right. Now, let's stage and commit it. Type:
git add README.md
git commit -m "Created a simple readme file"
git status

Figure 7: Stage and commit.

Note that the file was committed and that Git has identified the divergence between the repositories: two different versions of README.md are now committed. Your origin/master branch is one commit ahead in one direction, and your master branch is one commit ahead in the other. What will happen when I try to update master from origin/master? Type git merge origin/master.
Figure 8: Error from a merge.

Just as you might think, Git is flummoxed. This is essentially Git saying "You fix it." Let's see what state we're in. Type git status.
Figure 9: Status after a failed merge.
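The following script reproduces the divergence and the failed merge. It requires Git 2.28 or later. A local bare repository, `github.git`, stands in for GitHub, and the clone names `local` and `web` are hypothetical. Two commands do the inspection:

- `git rev-list --left-right --count` prints how many commits each side has that the other lacks.
- `git diff --name-only --diff-filter=U` lists the files left in a conflicted state.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
cd "$work"

# Bare stand-in for GitHub plus your local repository, sharing one commit.
git init -q --bare -b master github.git
git init -q -b master local
cd local
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"
git remote add origin ../github.git
git push -q -u origin master

# "GitHub side": another clone adds its own README.md.
git clone -q ../github.git ../web
echo "Remote README" > ../web/README.md
git -C ../web add README.md
git -C ../web commit -q -m "Create README.md"
git -C ../web push -q origin master

# Your side: fetch, then commit a different README.md.
git fetch -q origin
echo "A test repository for learning git" > README.md
git add README.md
git commit -q -m "Created a simple readme file"

# Count commits only on master (left) and only on origin/master (right).
read -r ahead behind < <(git rev-list --left-right --count master...origin/master)
echo "master is $ahead ahead of and $behind behind origin/master"
git status -sb   # first line: "## master...origin/master [ahead 1, behind 1]"

# Both sides added README.md with different text, so the merge stops.
if git merge origin/master; then
  echo "merged cleanly"
else
  echo "unmerged paths:"
  git diff --name-only --diff-filter=U
fi
```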

This message can't be any clearer, except for one detail. You have two options at this point. You can either edit the local file to match the original, or you can have Git help you. Let's choose the latter path, which is what you'd always choose with complex conflicts. While still in your project directory, having just experienced a failed merge command, type git mergetool.
Figure 10: Output from running git mergetool.

Mergetool will guide you through each conflicted file, letting you choose which version of each conflicted line you'd like to use for the committed file. You can see, by default, it uses opendiff. Press enter to see what opendiff looks like:
Figure 11: Opendiff.

If this were a conflict of more than one line, you'd be able to say "use the left (or right) version for this conflicted line," or even "I don't want to use either line." In this case, we have only one conflicted line to choose from. Click on the "Actions" pull-down menu and choose "Choose right." You'll see nothing has changed. That's because the arrow in the middle was already pointing to the right. Try selecting "Choose left," then "Choose right" again, and you'll see what I mean. Opendiff doesn't give you the opportunity to put in your own custom line; you can do that later if you wish. From the menu at the top of the screen, select "File," then "Save Merge," then go back to the menu and select "Quit FileMerge." Now stage the new version of the README file by typing git add README.md.
Now you're all set to commit changes, just as if you had manually modified and staged (with git add) the files yourself. Type git commit -m "Merged remote version of readme with local version." and then git status.
Figure 12: Status after the new commit.
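If you'd rather skip the graphical tool, you can do the same resolution from the command line. In Git's opendiff integration, your version is passed as the left file and the incoming one as the right. Choosing "right" for every conflict therefore amounts to `git checkout --theirs`. The following sketch rebuilds the conflict and resolves it that way. It uses a local bare repository (hypothetical `github.git`) in place of GitHub and requires Git 2.28 or later. This is an equivalent route, not what mergetool itself runs.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
cd "$work"

# Recreate the conflict: both sides add a different README.md.
git init -q --bare -b master github.git
git init -q -b master local
cd local
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"
git remote add origin ../github.git
git push -q -u origin master
git clone -q ../github.git ../web
echo "Remote README" > ../web/README.md
git -C ../web add README.md
git -C ../web commit -q -m "Create README.md"
git -C ../web push -q origin master
git fetch -q origin
echo "A test repository for learning git" > README.md
git add README.md
git commit -q -m "Created a simple readme file"

# Start the merge; it stops on README.md (|| true keeps set -e from aborting).
git merge origin/master || true

# "Choose right" for every conflicted line: take the incoming ("theirs") version.
git checkout --theirs README.md
git add README.md
git commit -q -m "Merged remote version of readme with local version."
git status   # "Your branch is ahead of 'origin/master' by 2 commits."
```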

Before we go on, you may have noticed a lingering "README.md.orig" file. That is just a backup of the conflicted file that mergetool leaves behind. These "orig" files can be a pain to deal with. This time, you can move the file somewhere else or just delete it, but for future reference, there are several strategies you can use to deal with them.
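Here are two common ways to keep those backups from piling up, sketched in a throwaway repository. You can tell mergetool not to write them (`mergetool.keepBackup`), or you can keep them but have Git ignore them. These are general strategies, not necessarily the ones the author had in mind.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Strategy 1: tell git mergetool not to keep *.orig backups (this repository only;
# add --global to make it your default everywhere).
git config mergetool.keepBackup false

# Strategy 2: keep the backups but hide them from git status.
echo "*.orig" >> .gitignore

git config --get mergetool.keepBackup
```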
Back to the merge. Look! Your branch is "ahead" of "origin/master" by 2 commits. Let's see what those commits are. To show just the last two commits, type
git log -n 2.

Now, let's push our changes to origin/master and see what happens. Type git push origin master. Now, just to be sure, we're not going to look at the "local version" of the remote branch. Let's go right to Github to see what happened. View the commits in your repository:
Figure 13: GitHub page showing commits.

What might not make sense here is that you have first the GitHub-side readme commit, then your local readme commit, then the merge. It doesn't make sense for all of these commits to happen in sequence, since the first two conflict. What actually happened is that your local readme commit was made on a separate line of development that was then merged in. Let's demonstrate that graphically by clicking on the "Network" button on the right (circled in red in Figure 13).
Figure 14: GitHub's timeline on commits and merges.

Each dot in this diagram represents a commit. Later commits are on the right. The one that looks like it was committed to a separate branch (your local master branch) and then merged in is the commit of your local version of the readme file. Hover over this dot and see for yourself.
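This sketch gives a command-line view of the same history. It assumes Git 2.28 or later and uses a local bare repository (hypothetical `github.git`) in place of GitHub. This time the conflict is resolved the other way mentioned earlier: by simply writing the final README.md by hand. `git rev-list --count origin/master..master` reports the "ahead by 2" figure. `git log --graph` draws a text version of GitHub's network diagram.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
cd "$work"

# Recreate the diverged README.md commits on both sides.
git init -q --bare -b master github.git
git init -q -b master local
cd local
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"
git remote add origin ../github.git
git push -q -u origin master
git clone -q ../github.git ../web
echo "Remote README" > ../web/README.md
git -C ../web add README.md
git -C ../web commit -q -m "Create README.md"
git -C ../web push -q origin master
git fetch -q origin
echo "A test repository for learning git" > README.md
git add README.md
git commit -q -m "Created a simple readme file"
git merge origin/master || true

# Resolve by writing the final README.md yourself, then stage and commit the merge.
echo "A test repository for learning git" > README.md
git add README.md
git commit -q -m "Merged remote version of readme with local version."

ahead=$(git rev-list --count origin/master..master)   # your README commit + the merge
git log --oneline -n 2
git log --graph --oneline --all   # text-mode cousin of GitHub's network view
git push -q origin master
```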

Rebasing

Before heading into discussions of workflows, I want to touch on a feature that Git does uniquely well, and that's worth knowing about should the need ever arise. It's called "rebasing." With it, you can shape your commits the way you prefer before merging them to another branch. You can already do some preparation when you're staging your files. You can stage and unstage files repeatedly, getting a commit exactly how you want. But there are two main things that rebasing lets you do in addition to that.
Let's say you were working on branch A and you created branch B. Branch B is nothing more than a series of changes made to a specific version of branch A, starting with a specific commit in branch A. Rebasing lets you take those changes and reapply them on top of the latest commit in branch A. It's as though you had checked out the latest branch A and made the same changes. You can use rebasing to make your merges "fast-forward," so when you merge your changes into another branch, there's no "merge commit." Your changes are simply added as the next commits in the target branch, and the new latest commit of that branch is your last change. This is a powerful feature. The second thing rebasing lets you do, with git rebase -i (interactive rebasing), is combine, reorder, or reword your commits before they're merged, which you'll see in my workflow below.
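Here is a minimal sketch of that idea. It uses branches literally named `A` and `B` (hypothetical names) in a throwaway repository and requires Git 2.28 or later. After `git rebase A`, branch B sits on top of A's latest commit, so merging it back into A is a fast-forward with no merge commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
git init -q -b A "$work/repo"
cd "$work/repo"

# Branch A gets a first commit, and branch B starts from it.
echo "a1" > a1.txt
git add a1.txt
git commit -q -m "A: first change"
git checkout -q -b B
echo "b1" > b1.txt
git add b1.txt
git commit -q -m "B: my change"

# A moves on, so B no longer starts from A's latest commit.
git checkout -q A
echo "a2" > a2.txt
git add a2.txt
git commit -q -m "A: second change"

# Replay B's change on top of A's latest commit.
git checkout -q B
git rebase -q A

# Merging B into A is now a fast-forward: no merge commit is created.
git checkout -q A
git merge -q --ff-only B
git log --oneline --graph
```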

Git Workflows

One of the most common Git workflows is the pull request, which shows up a lot in open source projects. Commits are often grouped into "feature branches," each holding all the changes needed for one feature. Projects with designated maintainers often operate as follows:
  • You initially push your "feature branch" to a remote repository. This is often your fork of the main repository.
  • You create a "pull request" on Github for that branch, which tells the project maintainer that you want your branch merged into the master branch.
  • If the branch is recent enough or it can be rebased onto master without any conflicts, the maintainer can easily merge in your changes.
  • If there are conflicts, then it's generally up to the maintainer to do the merge or to reject the pull request and let you rebase and "de-conflict" the commits in your branch yourself.
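From the maintainer's side, "recent enough" has a precise test: a fast-forward is possible exactly when the target branch is an ancestor of the feature branch. `git merge-base --is-ancestor` checks this. The following sketch keeps everything in one throwaway repository and requires Git 2.28 or later. A hypothetical `feature` branch stands in for the contributor's fork. The check fails once master moves and passes again after a rebase.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
git init -q -b master "$work/project"
cd "$work/project"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# The contributor's feature branch (on GitHub it would live in their fork).
git checkout -q -b feature
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git checkout -q master

# Succeeds when master is an ancestor of feature, i.e. a fast-forward is possible.
can_fast_forward() { git merge-base --is-ancestor master feature; }

if can_fast_forward; then echo "fast-forward possible"; fi

# Another change lands on master first, so feature is no longer "recent enough".
echo "v2" > app.txt
git commit -q -am "Update app"
if ! can_fast_forward; then echo "feature needs a rebase (or a real merge)"; fi

# The contributor rebases (no conflicts here); the maintainer fast-forwards.
git checkout -q feature
git rebase -q master
git checkout -q master
if can_fast_forward; then git merge -q --ff-only feature; fi
```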

My Workflows

At New York Magazine, where I work, we generally have four main branches in each project: dev, qa, stg, and prod.
  • dev branch: While developers first test their code on their own computers, eventually they need to test changes on a server with shared resources. This exposes a bunch of integration issues and often requires multiple commits (multiple attempts to get it right) before the change is complete.
  • qa branch: This branch is for QA (quality assurance) testing of a new change. The branch is cleaner, consisting only of completed changes. While not everything is necessarily optimized (maybe debugging information is still being written to the log, for instance), it's much more controlled than dev.
  • stg branch: Changes approved by QA go to the "staging" environment. This environment is fully optimized, as if it were the production environment. Testing in a fully optimized environment can expose more issues, but usually doesn't. This is not to be confused with the much lower-level staging in Git, but the concept is the same: you're preparing a set of features that are slated to go public, rather than a bunch of file changes that are about to be committed.
  • prod branch: What your clients/customers/users ultimately see is deployed directly from this branch.
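The following sketch shows how such a four-branch layout could be created and published. It requires Git 2.28 or later. A local bare repository, `central.git`, stands in for GitHub. The repository and file names are hypothetical, and starting all four branches from one commit is an assumption for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
cd "$work"

# central.git: hypothetical local stand-in for the shared GitHub repository.
git init -q --bare -b dev central.git
git init -q -b dev app
cd app
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Every environment branch starts from the same commit.
for env in qa stg prod; do
  git branch "$env"
done

# Publish all four and make each local branch track its remote counterpart.
git remote add origin ../central.git
git push -q -u origin dev qa stg prod
git branch -vv
```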
We rely on the open-source continuous integration server Jenkins to monitor each branch. When any change is made, the project is built and redeployed to a computer/server dedicated to that environment. To manage the environment-specific configuration, including enabling optimizations and altering logging levels, we use Puppet. We also use Git to maintain our internal documentation, written as text files in GitHub's variety of Markdown, to allow ease of collaboration and code-friendly formatting.
Ideally, each commit message at the magazine should include a story number. A "story" is a description of a desired modification. If something should be changed in code, someone describes the change in a web interface provided by a story-tracking application such as Atlassian's JIRA, which we use. A developer can update the "status" of the story to reflect progress toward its resolution.
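One payoff of starting commit messages with the story number is that `git log --grep` can pull up every commit for a story. The following sketch runs in a throwaway repository. `LIST-42` and `LIST-17` are hypothetical JIRA-style keys; the real project key will differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
git init -q "$work/app"
cd "$work/app"

# Three commits, each message starting with a (hypothetical) story number.
echo "thumbnail: 100px" > app.conf
git add app.conf
git commit -q -m "LIST-42 Show larger pictures in listings"
echo "names: bold" >> app.conf
git commit -q -am "LIST-17 Bold restaurant names"
echo "thumbnail: 200px" >> app.conf
git commit -q -am "LIST-42 Tune picture sizes"

# Every commit whose message has a line starting with LIST-42.
git log --oneline --grep='^LIST-42'
```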
We use Atlassian Crucible for peer code reviews. It lets a developer send a series of commits out to fellow developers to look over, tracks who has reviewed the code, and gives reviewers the opportunity to make comments.
I'm often tasked with making a modification to a shared project hosted in a GitHub repository, as described above. On GitHub, I have a separate user, "scottdanzig," which allows clear separation of my personal projects from what I've done for the magazine. For my examples, I'll refer to a web application, written in Scala with the Play Framework, that provides restaurant listings for your mobile device. Let's say we realized that the listings load very fast, and we can afford to display larger pictures. Here is my preferred workflow:
  • The first thing I do is change the status of the JIRA story I'm going to work on to "In Progress."
  • If I don't yet have the project cloned onto my machine, I'll do that first: git clone https://GitHub.com/nymag/listings.git
  • Check out the dev branch: git checkout dev
  • Update my dev branch with the latest from the remote repository: git pull origin dev
  • Create and check out a branch off dev: git checkout -b larger-pics
  • Make my modifications and test as much as I can, staging and committing my changes after successfully testing each piece of the new functionality.
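  • Then update my dev branch again, so that when I merge back, it's hopefully a fast-forward merge. Because git pull merges into whichever branch is checked out, I switch to dev first: git checkout dev, then git pull origin dev, then git checkout larger-pics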
  • I'll interactively rebase my larger-pics branch onto my dev branch. This gives me an opportunity to combine all my commits into one big commit, applied on top of the latest commit on the dev branch: git rebase -i dev. I write one comprehensive commit message detailing my changes so far, making sure to start with the JIRA story number so people can review the motivation behind the change. I might not want to combine all my commits yet: if I'm not sure whether one of the incremental changes is necessary, I may keep it as a separate commit. To do that, leave it as a separate "pick" during the interactive rebase, or mark it "reword" if you also want Git to give you an opportunity to rewrite that commit's message separately.
  • Check out the dev branch: git checkout dev
  • Merge in my one commit: git merge larger-pics
  • Push it to Github: git push origin dev
  • If Git rejects my push, I may need to fetch the latest changes, rebase my dev branch onto origin/dev, and then try again. We're not going to combine any commits, so it doesn't need to be interactive: git fetch origin, git rebase origin/dev, then again git push origin dev
  • Jenkins will detect the commit and kick off a new build. I can log into the Jenkins Web interface and watch the progress of the build. It's possible the build will fail, and other developers will grumble at me until I fix the now broken dev environment. Let's say I did just that.
  • If I think it might be a while before I'm able to fix my change, I'll use git revert on my commit to undo it. Either way, I'll again check out my larger-pics branch, run git rebase dev, and make my changes. Then I update my dev branch as in the earlier step and run git rebase dev, git checkout dev, git merge larger-pics, and git push origin dev. Let's say Jenkins gives me the thumbs up now.
  • Next stage is the code review. I'll log into Crucible and advertise my list of commits in the dev branch for others to review. I can make modifications based on their feedback if necessary.
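The steps above can be approximated in a script. This is a hedged sketch, not the magazine's actual tooling. The assumptions are:

- A local bare repository, `listings.git`, stands in for `https://GitHub.com/nymag/listings.git`.
- A second clone named `colleague` plays a teammate whose push lands first.
- `LIST-42` is a hypothetical story number.
- Git 2.28 or later and a `sed` that accepts `-i.bak` (GNU and BSD sed both do).

Because `git rebase -i` normally opens an editor, the sketch drives it non-interactively. `GIT_SEQUENCE_EDITOR` rewrites every `pick` after the first into `squash`. `GIT_EDITOR=true` accepts the combined message, which `git commit --amend` then replaces.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
cd "$work"

# listings.git: hypothetical local stand-in for the shared GitHub repository.
git init -q --bare -b dev listings.git
git init -q -b dev seed
echo "thumbnail: 100px" > seed/listing.conf
git -C seed add listing.conf
git -C seed commit -q -m "Initial listings app"
git -C seed push -q ../listings.git dev

git clone -q listings.git listings    # the "git clone" step
git clone -q listings.git colleague   # a teammate's copy
cd listings
git checkout -q dev
git pull -q origin dev
git checkout -q -b larger-pics

# Incremental, individually tested commits.
echo "thumbnail: 200px" > listing.conf
git commit -q -am "Double thumbnail size"
echo "hero: 800px" >> listing.conf
git commit -q -am "Add hero image size"

# Refresh dev (git pull merges into the checked-out branch, so switch first).
git checkout -q dev
git pull -q origin dev
git checkout -q larger-pics

# Scripted "git rebase -i dev": keep the first pick, squash the rest into it.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$ s/^pick/squash/'" GIT_EDITOR=true git rebase -q -i dev
git commit -q --amend -m "LIST-42 Show larger pictures in listings"

git checkout -q dev
git merge -q larger-pics   # a fast-forward: dev gains exactly one commit

# Meanwhile the teammate pushes to dev first.
echo "Restaurant listings" > ../colleague/README.md
git -C ../colleague add README.md
git -C ../colleague commit -q -m "LIST-7 Add README"
git -C ../colleague push -q origin dev

# Our push is rejected as non-fast-forward: fetch, replay our commit, push again.
if ! git push -q origin dev; then
  git fetch -q origin
  git rebase -q origin/dev
  git push -q origin dev
fi
```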
Let's say both Jenkins and my fellow developers are happy. It's time to submit my code to QA. The QA branch is automatically deployed by Jenkins to the QA servers, a pristine environment meant to better reflect what actually is accessed by New York Magazine's readers. We have some dedicated QA experts who systematically test my functionality to make sure I didn't unintentionally break something. If there are no QA experts available, QA might be done by another developer if the feature is sufficiently urgent.
  • I need to update my local qa branch so I can rebase my changes onto it and push fast-forward commits. I first type git checkout qa, then git pull origin qa
  • Then I change to my larger-pics branch: git checkout larger-pics
  • It's time to rebase my commits onto the qa branch, rather than dev, which can be polluted by the works in progress of other developers. I type: git rebase -i qa, creating a combined commit message describing my entire set of changes. I now have a branch that is the same as QA, plus one commit that reflects all of my changes.
  • I add my branch to the remote repository: git push -u origin larger-pics
  • I go to the repository on Github and create a pull request, requesting my larger-pics branch be merged into the qa branch.
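The following sketch shows the hand-off to QA under stated assumptions:

- A local bare repository, `listings.git`, stands in for the GitHub repository.
- A clone named `other` supplies a change that reached qa in the meantime.
- `LIST-42` is a hypothetical story number, and Git 2.28 or later is required.

To keep the sketch short, `larger-pics` starts from qa, which simplifies the dev-based flow above. After the scripted squash-rebase, `larger-pics` is exactly qa plus one commit. `git push -u` then records `origin/larger-pics` as its upstream.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
cd "$work"

# listings.git: hypothetical local stand-in for the GitHub repository's qa branch.
git init -q --bare -b qa listings.git
git init -q -b qa seed
echo "thumbnail: 100px" > seed/listing.conf
git -C seed add listing.conf
git -C seed commit -q -m "Initial listings app"
git -C seed push -q ../listings.git qa

git clone -q listings.git listings
git clone -q listings.git other
cd listings

# Work on larger-pics in two incremental commits.
git checkout -q -b larger-pics
echo "thumbnail: 200px" > listing.conf
git commit -q -am "Double thumbnail size"
echo "hero: 800px" >> listing.conf
git commit -q -am "Add hero image size"

# Meanwhile another approved change reaches qa on the remote.
echo "Restaurant listings" > ../other/README.md
git -C ../other add README.md
git -C ../other commit -q -m "Add README"
git -C ../other push -q origin qa

# Update local qa (git pull merges into the checked-out branch, so switch first).
git checkout -q qa
git pull -q origin qa
git checkout -q larger-pics

# Non-interactive stand-in for "git rebase -i qa" that squashes into one commit.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$ s/^pick/squash/'" GIT_EDITOR=true git rebase -q -i qa
git commit -q --amend -m "LIST-42 Show larger pictures in listings"

# Publish the branch and record origin/larger-pics as its upstream.
git push -q -u origin larger-pics
```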
At this point, it's out of my hands, for the time being. However, the project has a "maintainer" assigned.
  • The maintainer can first use the GitHub interface to see the changes and give the code a last check.
  • If approved, the maintainer must merge the pull request's branch into the qa branch. If the merge won't produce conflicts, GitHub's interface is sufficient to merge the change. If it will, the maintainer can either reject the change, asking the original developer to rebase the branch again and resolve the conflict before creating a new pull request, or check out the branch locally and resolve the merge, rather than having the original developer do it.
  • The maintainer commits the merged change and updates the JIRA story to "Submitted to QA."
  • If QA finds a bug, they will change the JIRA status to "Failed QA." The maintainer will check out the qa branch and use git revert to roll back the change, then will reassign the JIRA ticket to the original developer.
  • If QA approves the change, however, they will change the JIRA status to "Passed QA."
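Suppose the maintainer merged the pull request with a merge commit, which GitHub's default "Create a merge commit" button produces. Rolling it back then means reverting a merge. `git revert` needs `-m 1` to say that the first parent, qa, is the line to keep. Here is a minimal local sketch with hypothetical branch content; it requires Git 2.28 or later. A plain, non-merge commit would need no `-m`.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
git init -q -b qa "$work/listings"
cd "$work/listings"

echo "thumbnail: 100px" > listing.conf
git add listing.conf
git commit -q -m "Initial listings app"

# The pull request's branch, already rebased onto qa by its author.
git checkout -q -b larger-pics
echo "thumbnail: 200px" > listing.conf
git commit -q -am "LIST-42 Show larger pictures in listings"

# The maintainer merges the pull request with an explicit merge commit.
git checkout -q qa
git merge -q --no-ff -m "Merge branch 'larger-pics' into qa" larger-pics
merge_commit=$(git rev-parse HEAD)

# QA fails the story: revert the merge, keeping qa (parent 1) as the mainline.
git revert --no-edit -m 1 "$merge_commit"
```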
At regular intervals, a development team will release a set of features that are ready. A release consists of:
  • A developer merging QA-approved changes from the qa branch to the staging branch.
  • Members of the team having a last look at the change's functionality in the staging environment.
  • The developer of a change, after confirming that it works correctly in staging, merges the change into the prod branch before a designated release cutoff time.
  • The developer changes the status of the JIRA story to "Resolved."
  • The system administrators deploy a build including the last commit before the cutoff time. For us, this entails a brief period of down-time, so the release is coordinated with the editors and others who potentially will be affected.
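Here is a sketch of a release under stated assumptions:

- The branches are named as above and live in a throwaway repository.
- Commit dates are fixed so the output is reproducible.
- The cutoff is a hypothetical 17:00 UTC on September 12, 2013.
- Git 2.28 or later is required.

The promotions are ordinary merges. They are fast-forwards here, because stg and prod have no commits of their own. `git rev-list -n 1 --before=...` picks the newest prod commit made before the cutoff, which is one way to identify what the system administrators would build.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
work=$(mktemp -d)
git init -q -b dev "$work/listings"
cd "$work/listings"

# commit_at DATE MESSAGE: commit tracked changes with a fixed author/committer date.
commit_at() {
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q -am "$2"
}

echo "thumbnail: 100px" > listing.conf
git add listing.conf
commit_at "2013-09-01 12:00:00 +0000" "Initial listings app"
for env in qa stg prod; do git branch "$env"; done

# Two QA-approved changes land on qa; the second one after the cutoff.
git checkout -q qa
echo "thumbnail: 200px" > listing.conf
commit_at "2013-09-10 12:00:00 +0000" "LIST-42 Show larger pictures"
before_cutoff=$(git rev-parse HEAD)
echo "hero: 800px" >> listing.conf
commit_at "2013-09-12 18:00:00 +0000" "LIST-43 Add hero image"

# Release: qa -> stg (checked in staging), then stg -> prod.
git checkout -q stg
git merge -q qa
git checkout -q prod
git merge -q stg

# What gets deployed: the newest prod commit made before the (hypothetical) cutoff.
cutoff="2013-09-12 17:00:00 +0000"
deploy=$(git rev-list -n 1 --before="$cutoff" prod)
git log -1 --oneline "$deploy"
```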

Further Thoughts

That's a summary of how I work, and although everything is sensible, it's a bit in flux. These are things which could be changed:
  • We could get rid of the staging environment and merge directly from qa to prod. I see the value in this extra level of testing, but I believe four stages is a bit cumbersome.
  • A project does not necessarily need a maintainer, and if we use Crucible, perhaps not even pull requests. A developer could merge their change directly into the qa branch and submit the story to QA on their own. I prefer to have a project maintainer.
  • We can get rid of Crucible, and just use the code review system in Github. It might not be as feature-filled, but if we use pull requests, it's readily available and could streamline the process. I like Crucible, although it might be worth exploring eliminating this redundancy.
After years of using many other version control systems, I've found Git to be the one that makes the most sense. It certainly doesn't depend on a reliable Internet connection. It's fast. It's very flexible. After more than twenty years of professional software development, I conclude that Git is an absolutely indispensable tool.

Scott Danzig has been programming for more than 20 years. His personal projects on Github can be found at https://Github.com/sdanzig.

Related Article

Getting Started with Git: The Fundamentals

