Thursday, September 12, 2013

Getting Started with Git: The Fundamentals

Check out this story I read from Dr. Dobbs: Getting Started with Git: The Fundamentals.

The distributed SCM system that's taking the world by storm has its own unique way of doing things. This tutorial explains how things work and the basic commands for getting started and checking in changes.
I was not particularly inspired by any SCM system until I dove into Git, created by Linus Torvalds, the creator of Linux. In this tutorial, I discuss what's unique about Git and I demonstrate how to set up a repository on GitHub, one of the main free Git hosting services. Then I explain how to make a local copy of the GitHub repository, make some changes locally, and push them back to GitHub. The second installment of this tutorial will build on this base, explain branching and merging, and discuss a workflow that I use, which might be of interest to you. As a side note, I learned much of what I know about Git from the book Pro Git, which is hosted free online. I recommend that you use the book to fill out the matter presented here and as a reference for later work with Git.

Why Git?

Git has numerous attractive benefits that, for me, make it my preferred DVCS:
  • When you create a new branch, Git doesn't copy all your files over. A branch is just a pointer to a commit, and it tracks only the changes (commits) specific to that branch. This makes it blazingly fast to create branches compared with approaches that copy files for each branch (in Subversion, for example, a branch is a copy of a directory tree).
  • Git lets you work on your own copy of a project, merging your commits into the central repository, often on GitHub.com, when you want your commits to be available to others. GitHub.com, by the way, will host your project for free as long as it's open source. (And cheaply, if it's not. Another alternative is Bitbucket, which allows unlimited private Git repositories.) This means you can reliably access your code from anywhere with an Internet connection. If you lose that Internet connection, you can continue to work locally and sync up your changes when you're able to reconnect.
  • When you screw up, you can usually undo your changes. You might need to call in an expert in serious cases, but there's always hope. This is the best "key benefit" a version control system can have.
  • Git also lets you keep your commit history very organized. If you have lots of little changes, it lets you easily rewrite history so you see it as one big change (via something called "rebasing"). You can add/remove files in each commit, and certainly change the descriptions of each.
  • It's open source, fast, and very flexible, so it's widely adopted and well-supported.
  • With Git, you can create "hooks," which enable actions to occur automatically when you work on your code. A common use case is to create a hook to check the description submitted with each commit to make sure it conforms to a particular format. Perhaps you have your bugs described in a bug tracking system, and each bug has an ID #. Git can ensure each message has an entry for "Bug: SomeNumber".
  • Another under-appreciated feature is how Git tracks files. It uses the SHA-1 algorithm to take the contents of files and produce a large hexadecimal number (hash code). The same file will always produce the same hash code. This way, if you move a file to a different folder, Git can detect that the file moved, and not think that you deleted one file and added another. This allows Git to avoid keeping two copies of the same file.
  • While Git is not necessarily the most intuitive version control system out there, once you get used to it, you're able to browse through its internal directories and it makes complete sense. Wondering where the file with the hash code "d482acb1302c49af36d5dabe0bccea04546496f7" is? Check out this file: "/.git/objects/d4/82acb1302c49af36d5dabe0bccea04546496f7". There are also lots of lower-level commands that let you build the operations you want, in case, for instance, Git's merge command doesn't work how you'd like it to.
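To make the hashing point concrete, here is a minimal sketch using one of those lower-level commands, git hash-object. The scratch repository, folder names, and file contents are invented for the demonstration. It assumes a repository using Git's default SHA-1 object format.

```shell
set -eu

# Scratch repository in a temporary directory (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Two files with identical contents in different folders.
mkdir -p a b
printf 'Potayto potahto!\n' > a/lyrics.txt
cp a/lyrics.txt b/copy.txt

# hash-object prints the hash Git computes for the contents;
# -w also writes the object into .git/objects.
hash_a=$(git hash-object -w a/lyrics.txt)
hash_b=$(git hash-object -w b/copy.txt)
echo "a/lyrics.txt: $hash_a"
echo "b/copy.txt:   $hash_b"

# A freshly written object is stored under
# .git/objects/<first two hex digits>/<remaining digits>.
prefix=$(printf '%s' "$hash_a" | cut -c1-2)
rest=$(printf '%s' "$hash_a" | cut -c3-)
object_path=".git/objects/$prefix/$rest"
ls -l "$object_path"
```

Both files report the same hash, and only one object file is written for them, which is why a moved or duplicated file costs Git nothing extra to store.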

Tutorial

Let's jump in. Say you're about to start a new project, in whatever programming language, and you want to use version control. I'm going to create a silly sample application in Scala that's very easy to understand for a demonstration. I'll assume you're familiar with your operating system's command-line interface, and that you're able to write something in the language of your choice.

Setup

GitHub is one of the go-to places to get your code hosted for free and it's what I'll use here. (Bitbucket, Google Code, and SourceForge are some of the other free repository hosts that support Git.) All these hosts give you a home for your code that you can access from anywhere. Initial steps:
  1. Go to http://GitHub.com and click "Sign up for GitHub"
  2. You'll need Git. Follow this step-by-step installation process
  3. Review how to create a new repository
  4. Finally, you're going to want to get used to viewing files that start with a "." These files are hidden by default, so at the command line, when you're listing the contents of a directory, you need to include an "a" option. That's "ls -a" in OS X and Linux, and "dir /a" in Windows. In your folder options, you can turn on "Show hidden files and folders" as well.

Once you get this far, there's nothing stopping you (outside of setting aside some time to explore what Git has to offer). Let's look at some of the typical actions.

Clone a Repository

Cloning a repository lets you grab the source code from an existing project (yours or someone else's) that you have access to. If it's not your project, you won't be able to push changes to it unless you first "fork" the project, which means creating your own copy of it under your own account, after which you can modify it to your heart's content. I keep all of my projects locally (on my computer) in a "projects" folder in my home directory, "/Users/sdanzig/projects", so I'm going to use "projects" for this demo.
First, I fork my repository…
I created a sample project, called potayto, on GitHub, as you should now know how to do. Let's get this project onto your hard drive so you can add comments to my source code for me. First, log into your GitHub account, then go to my repository at https://GitHub.com/sdanzig/potayto and click Fork:
Figure 1: Forking (cloning) a repository.

Then select your user account on GitHub and copy it there. When this is complete, it's as though it were your own repository and you can actually make changes to the code on GitHub. Now, let's copy the repository onto your local hard drive, so we can both edit and compile the code there.
Figure 2: Copying a repository.
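The copy step is a single git clone. Here is a rough sketch that runs without a network connection: a local bare repository stands in for your fork on GitHub, and every name and path in it is made up for the demonstration. With a real fork, you would pass the repository URL that GitHub shows you instead of the local path.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for the fork on GitHub: a local bare repository seeded
# with one commit. All names here are hypothetical.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
printf 'target\n' > seed/.gitignore
git -C seed add .gitignore
git -C seed -c user.name='Demo User' -c user.email='demo@example.com' \
    commit -q -m 'Initial commit'
git clone -q --bare seed potayto.git

# The actual step: clone the "remote" into a working copy named potayto.
git clone -q "$work/potayto.git" potayto
cd potayto

# The working copy contains the project files plus the hidden .git folder.
ls -a

# Git remembers where the clone came from under the name "origin".
origin_url=$(git config --get remote.origin.url)
echo "origin is $origin_url"
```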

Folder Structure

There are a few key things to know about what Git is doing with your files. Type: cd potayto. There are useful things to see here when you list the contents in the potayto folder, being careful to show the hidden files and folders (with the -a option):
Figure 3: Examining the contents of a Git repository.

The src folder contains the source code, and its structure conforms to the Maven standard directory structure. You'll also see a .git folder, which contains a complete record of all the changes that were made to the potayto project, as well as a .gitignore text file. We're not going to dive into the contents of .git in this tutorial, but it's easier to understand than you think. If you're curious, please refer to the free online book.

Git Log

A "commit" is a change recorded in your local repository. Type "git log"; you might have to press your space bar to scroll, and type "q" at the end to quit displaying the log:
Figure 4: Output from git log.

Git's log shows that the potayto project has 3 commits so far, from the oldest on the bottom to the most recent on top. You see the big hexadecimal numbers preceded by the word "commit"? Those are the SHA-1 codes I referred to earlier. Git also uses these SHA-1 codes to identify commits. They're big and scary, but you can just copy and paste them. Also, you need to type only enough letters and numbers for it to be uniquely identified (five is usually enough).
Let's see how my first commit started. To see the details of the first commit, type: git show bfaa. Figure 5 shows the results.
Figure 5: Contents of commit.

At the bottom of Figure 5, you can see that I initially checked in my Scala application as something that merely printed out "Tomayto tomahto," "Potayto potahto!" The main() method of Potayto is executed, and there are those two print lines.
Earlier in Figure 5, you can see the addition of the .gitignore I provided. I'm making Git ignore my Eclipse-specific dot-something files (for example, Eclipse's .project) and also the target directory, where my source code is compiled to. Git's show command is showing the changes in each file, not the entire files. The +'s before each line mean the lines were added. In this case, they were added because the file was previously nonexistent. That's why you see the /dev/null there.
Now type git show 963e to get the output in Figure 6.
Figure 6: Commit message.

Here you see my informative commit message about what changed. These commit messages should be concise but comprehensive, so you're able to find the change when you need it.
After that, you see that I did exactly what the message says. I changed the order of the lyrics. You see two lines beginning with "-", preceding the lines removed; and two lines beginning with "+", preceding the lines added. You get the idea.
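If you'd like to try abbreviated hashes without touching the potayto clone, the following sketch builds a throwaway repository with two commits and then resolves a shortened hash. The file name, contents, and commit messages are invented, and your hashes will differ on every run because they depend on the commit timestamps.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# Two commits to a hypothetical lyrics file.
printf 'Tomayto tomahto\nPotayto potahto!\n' > lyrics.txt
git add lyrics.txt
git commit -q -m 'Initial lyrics'
printf 'Potayto potahto!\nTomayto tomahto\nOne more line\n' > lyrics.txt
git commit -q -a -m 'Reordered the lyrics and added a line'

# One line per commit, newest on top, with abbreviated hashes.
git --no-pager log --oneline

# Any unambiguous prefix of a hash names the same commit.
full=$(git rev-parse HEAD)
short=$(printf '%s' "$full" | cut -c1-7)
resolved=$(git rev-parse --verify "$short")
echo "$short resolves to $resolved"

# Show the commit message and the line-by-line changes for that commit.
git --no-pager show "$short"
```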

The .gitignore File, and Git Status

View the .gitignore file, which was dumped in Figure 5.

.cache
.settings
.classpath
.project
target

This is a manually created file in which I tell Git what to ignore. If you don't want files tracked, you add them here. I use the Eclipse IDE to write my code, and it creates hidden project files, which Git will see and want to add in to the project. Why should you be confined to using not only the same software as me to mess with my code, but also the same settings? Some teams might want to conform to the same development environments and checking-in the project files might be a time saver, but these days, there are tools that let you easily generate such project files for popular IDEs. Therefore, I have Git ignore all the Eclipse-specific files, which all happen to start with a "."

There's also a "target" folder in .gitignore. I've configured Eclipse to put my compiled code into that folder. We don't want Git tracking the files generated upon compilation. Let developers grabbing your source code compile it themselves after they make their modifications. You're going to want to create a .gitignore file for your own projects. This .gitignore file gets checked in along with your project, so people who modify your code don't accidentally check in their generated code as well. Other developers might be using IntelliJ IDEA, which writes .idea folders and .ipr and .iws files, so they would add those to the .gitignore file.

Getting the Status

Now, let's try this. Type git status.
Figure 7: Status showing no new artifacts to commit.

It shows that there is nothing new to commit to your local repository. You also see in Figure 7 that you're on the main branch of your project, "master." Being "on a branch" means your commits are appended to that branch. Now create a text file named "deleteme.txt" using whatever editor you want in that potayto folder and type git status again:
Figure 8: Status with artifacts to commit.

Use that same text editor to add "deleteme.txt" as the last line of .gitignore and check this out (Figure 9).
Figure 9: Status with no changes to commit.

Other than its special treatment by Git, .gitignore is a file just like any other file in your repository, so if you want the new information saved, you have to commit the change just like you would commit a change to your code.
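The status experiment can also be replayed in a scratch repository. This sketch assumes a .gitignore with the same five entries listed earlier and uses git status --porcelain, a terse, script-friendly form of the same report; the scratch file's contents are arbitrary.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# Start from a committed .gitignore with the entries shown earlier.
printf '.cache\n.settings\n.classpath\n.project\ntarget\n' > .gitignore
git add .gitignore
git commit -q -m 'Add .gitignore'

# A new, untracked file shows up in the status ("??" means untracked).
echo 'scratch' > deleteme.txt
before=$(git status --porcelain)
echo "before: $before"

# Once .gitignore lists the file, only the edit to .gitignore itself
# is reported (" M" means modified but not yet staged).
echo 'deleteme.txt' >> .gitignore
after=$(git status --porcelain)
echo "after: $after"
```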

Staging Changes

One of Git's best features is that it offers a staging process. You can stage the modified files that you want to commit. Other version control systems wait for a single command and then change the files in the repository, generally the remote repository for the entire team. When you add files in Git, they are held in a staging area. You later commit everything in the staging area to your repository.
So, let's say you wanted to make a change involving files A and B. You changed file A. You then remembered something unrelated to do with file Z and you modified that. Then you went back to your initial change, modifying file B. Git allows you to add files A and B to staging, while leaving file Z "unstaged." Then you can commit only the staged files to your repository. But you don't! You realize you need to make a change to file C as well. You "add" it. Now files A, B, and C are staged, and Z is still unstaged. You commit the staged changes only.
Read that last paragraph repeatedly if you didn't follow it fully. It's important. See how Git lets you prepare your commit beforehand? With a version control system such as Subversion, you'd have to remember to make your change to file Z later, and your "commit history" would show that you changed files A and B, then, in another entry, that you changed file C later.
We won't be as intricate. Let's just stage our one file for now. Look at Figure 9. Git gives you instructions for what you can do while in the repository's current state. Git is not known for having intuitive commands, but it is known for helping you out. "git checkout -- .gitignore" to undo your change? It's strange, but at least it tells you exactly what to do.
To promote .gitignore to "staged" status, type git add .gitignore.
Figure 10: Promoting to staged status.

The important thing to note here is that now your file change is listed under "Changes to be committed" and Git is spoon-feeding you what you need to type if you want to undo this staging. Don't type this: git reset HEAD .gitignore.
You should strive to understand what's going on (check out the Pro Git book I linked to for those details), but in this situation, you simply are given means to an end when you might need it (in case you change your mind about what to stage).
By the way, it's often more convenient to type "git add" followed by a folder name to add all modifications of files in that folder (and its subfolders). It is also very common to type the shortcut "git add ." to stage all the changes in the current folder and below, which is the whole repository if you're at its top level. This is fine as long as you're certain that you're not accidentally adding a file such as Z that you don't want to be grouped into this change in your commit history.
It's also useful to know how to stage the deletion of a file. Use git rm for that.
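The A, B, C, and Z scenario described above can be played out in a scratch repository. This is only an illustration: the file names follow the story, while the contents and commit messages are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# Four tracked files to start from.
for f in A B C Z; do echo 'version 1' > "$f.txt"; done
git add .
git commit -q -m 'Add files A, B, C, and Z'

# Edit A, get distracted by Z, then return to B.
echo 'version 2 of the file' > A.txt
echo 'version 2 of the file' > Z.txt   # the unrelated change
echo 'version 2 of the file' > B.txt
git add A.txt B.txt

# Before committing, realize C belongs to the same change.
echo 'version 2 of the file' > C.txt
git add C.txt

# Only the staged files go into the commit.
git --no-pager diff --cached --name-only
git commit -q -m 'Change A, B, and C together'

# Files recorded in that commit, and what is still pending afterward.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
leftover=$(git status --porcelain)
echo "committed: $committed"
echo "leftover: $leftover"
```

The commit history shows A, B, and C changing together in one entry, while Z stays modified in the working folder, ready for its own commit later.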

Committing Changes to Your Repository

Time to do our first commit! To make the change in .gitignore official, type git commit -m "Added deleteme.txt to .gitignore" .
Figure 11: Committing with a commit message.

The -m option is followed by the commit message. You could just type git commit, but then Git would load up a text editor and you'd be required to type a commit message anyway. In Mac OS X and Linux, vim is typically the editor that would load up; and in Windows, you'd get an error. If you prefer a full screen editor in Windows, you can type this to configure it:
git config --global core.editor "notepad"
If you end up in vim and are unfamiliar with it, note that it's a very geeky and unintuitive but powerful editor to use. In general, pressing the escape key, and typing ":x" will save what you're writing and then exit. The same syntax will work to choose a new full screen editor in OS X and Linux, of course replacing notepad with the /full/path/and/filename of a different editor.
A full screen editor is the most convenient way to write a commit message with multiple lines, so if you hate vim, configure Git to use an editor you do like.
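Here is a small sketch of both points in a scratch repository. The editor name "nano" is only an example choice, and the setting is made per repository (no --global) so that it doesn't touch your real configuration. The sketch also shows that repeating -m is one way to get a multi-paragraph message without opening any editor; the second message is invented.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# Per-repository editor setting; "nano" is only an example.
git config core.editor nano
editor=$(git config --get core.editor)
echo "editor for this repository: $editor"

echo 'deleteme.txt' > .gitignore
git add .gitignore

# Each -m becomes its own paragraph of the message, so no editor opens.
git commit -q -m 'Added deleteme.txt to .gitignore' \
              -m 'Keeps the scratch file out of future commits.'

# %s is the subject (first paragraph), %b is the body (the rest).
subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "subject: $subject"
echo "body: $body"
```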
To see the commit you just made, type git log.
Figure 12: Log of the most recent commit.

The change on top is yours. Oh, what the heck, let's take a look at it with diff:
Figure 13: A diff output of the commit.

The +deleteme.txt is the change that was just committed. The way this diff works is that Git tries to show you three lines before and after each of your changes. Here, there were no lines below your addition. The -3,3 and +3,4 are ranges: - precedes the old file's range, and + is for the new file. The first number in each range is a starting line number. The second number is how many lines the displayed sample spans in that version of the file. So the sample starts at line 3 in both versions, and it covers 3 lines before your change and 4 lines after it.
Note that if you want to revert changes you made, the safest way is to use "git revert," which automatically creates a new commit that undoes the changes in another commit. If you wanted to undo that last commit, which has the SHA-1 starting with 0c22, you would type: git revert 0c22. (Don't actually do this if you are following along.)
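Both the range header and git revert are safe to try in a throwaway repository. The sketch below rebuilds the same five-line .gitignore, commits the deleteme.txt line, prints the range line of the diff, and then reverts the commit. The repository itself is hypothetical, and the hashes it produces will not match the ones in the figures.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# Five-line .gitignore, then a commit that appends a sixth line.
printf '.cache\n.settings\n.classpath\n.project\ntarget\n' > .gitignore
git add .gitignore
git commit -q -m 'Add .gitignore'
echo 'deleteme.txt' >> .gitignore
git commit -q -a -m 'Added deleteme.txt to .gitignore'

# Diff of the last commit with three lines of context (-U3 is the default).
git --no-pager diff -U3 HEAD~1 HEAD
hunk=$(git --no-pager diff -U3 HEAD~1 HEAD | grep '^@@')
echo "range line: $hunk"

# Undo the commit by adding a new commit; history is not rewritten.
git revert --no-edit HEAD >/dev/null
last_line=$(tail -n 1 .gitignore)
commits=$(git rev-list --count HEAD)
echo "last line of .gitignore is now: $last_line"
echo "commits in history: $commits"
```

After the revert, the file ends with "target" again, and the log has three commits: the original, the addition, and the commit that undoes the addition.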

Pushing Your Changes to Remote Repository

You cloned your repository from your GitHub account. Unless something went horribly wrong, the repository on GitHub should be: https://GitHub.com//potayto.git
Git automatically labels the location you cloned a repository from as "origin." Remember when I said the internals of a Git repository were easily accessible in that .git folder in your project? Look at the text file .git/config:
Figure 14: The contents of the .git/config file.

It's as simple as this.
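You can read the same settings with git config instead of opening the file. In this sketch, an empty local bare repository stands in for GitHub, so the URL it prints is a temporary path rather than a GitHub address; the names are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the GitHub repository.
git init -q --bare potayto.git

# Cloning an empty repository prints a warning, which is silenced here.
git clone -q potayto.git potayto 2>/dev/null
cd potayto

# The raw file, followed by the same values read through git config.
cat .git/config
remote_url=$(git config --get remote.origin.url)
fetch_spec=$(git config --get remote.origin.fetch)
echo "origin url:   $remote_url"
echo "origin fetch: $fetch_spec"
```

The fetch setting is what maps the branches on origin to names such as origin/master in your local repository.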
Before I explain how to make your changes on the version of your code stored on GitHub, I should first explain more about branches. I already noted how a branch is a separate version of your code. A change made to one branch does not affect the version of your repository represented by another branch, unless you explicitly merge the change into it. By default, Git will put your code on a "master" branch. When you clone a project from a remote repository ("remote" in this case means hosted by GitHub), it will automatically create a local branch that "tracks" a remote branch. Tracking a branch means that Git will help you to:
  • See the differences between commits made to the tracking branch (the local one) and the tracked branch (remote)
  • Add your new local commits to the remote branch
  • Put the new remote commits on your local branch
If you didn't have your local branch track the remote branch, you could still move changes from one to another, but it becomes more of a manual process. To see tracking at work, first type git status.
Figure 15: Showing that the local repository is ahead of the remote repository.

That deleteme.txt change you made in your local master branch is not yet on GitHub! You have one commit that the master branch on your GitHub remote, origin (denoted as origin/master), does not yet have.
Let's put the change on GitHub. Git's push command, if you don't provide arguments, will just push all the changes committed in your local branches to the remote branches they track. (That is the "matching" default of Git 1.x; since Git 2.0 the default, "simple," pushes only the current branch.) This can be dangerous, if you have commits in another local branch and you're not quite ready to push those out also. (I once accidentally erased a week of changes in New York Magazine's main repository doing this. We did manage to recover them, but don't ask.) It's better to be explicit. Type git push origin master.
Figure 16: Pushing commit.

You don't really need to concern yourself with the details of how Git does the upload. But as for the command you just typed, Git's push lets you specify the "remote" that you're pushing to, as well as the branch. By specifying the branch, you tell Git to take that particular branch ("master," in this case) and update the remote branch, on the origin (your GitHub potayto repository) with the same name (it will create a new remote "master" branch if it doesn't exist). If you don't specify "master," Git will try to push the changes in all your branches to branches of the same names on the origin (if they exist there).
If you type "git status" again, you'll see your branch now matches the remote repository's copy of it. You can also look at the changes to the origin, by typing git log origin/master.
Figure 17: Log of changes to the origin.

This is the syntax to see a log of the commits in the master branch on your "origin" remote. You can see the change is there. You can also see this list of commits by logging into GitHub, viewing your potayto repository, and clicking on the link in Figure 18.
Figure 18: Seeing changes to the origin.
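The whole push cycle can be rehearsed offline. In the sketch below, a local bare repository plays the part of the potayto repository on GitHub, and all names, paths, and commit messages are invented for the demonstration. It counts the commits that master has and origin/master lacks, before and after an explicit push.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the GitHub repository, seeded with one commit.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
printf 'target\n' > seed/.gitignore
git -C seed add .gitignore
git -C seed -c user.name='Demo User' -c user.email='demo@example.com' \
    commit -q -m 'Initial commit'
git clone -q --bare seed potayto.git

# Clone it; the local master branch tracks origin/master automatically.
git clone -q potayto.git potayto
cd potayto
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# Make one local commit.
echo 'deleteme.txt' >> .gitignore
git commit -q -a -m 'Added deleteme.txt to .gitignore'

# Short status with branch information, then the number of commits
# on master that origin/master does not have yet.
git status -sb
ahead_before=$(git rev-list --count origin/master..master)

# Push explicitly: the remote first, then the branch.
git push -q origin master

ahead_after=$(git rev-list --count origin/master..master)
echo "ahead before push: $ahead_before, after push: $ahead_after"
git --no-pager log --oneline origin/master
```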

In the next installment of this tutorial, we'll examine how to pull changes from the remote repository, how to handle merges and merge conflicts, and other workflow tasks that are part of standard SCM work with Git.

Scott Danzig has been programming for more than 20 years. His personal projects on GitHub can be found at https://Github.com/sdanzig.

