The Fugue Counterpoint by Hans Fugal

12 Feb 2009

Why I switched to Git

I love it when someone else writes what I've been meaning to write, so I don't have to write it.

This article covers the reasons why I switched from Mercurial to Git about as well as its author could possibly hope to without consulting me, reading my mind, or being me. It's a bit creepy.

He explains it very well (probably better than I would) and has more insight into the technical details than I do.

While we're on the subject, you should all go read git for computer scientists so you can think like a git. If you already have a CS background, it's quite painless, I assure you.

10 Nov 2008

git-push is worse than worthless

Ugh

Let's say you are a web developer, and you do development on your laptop, then when things are nice and shiny you want to push those changes to the webserver. Seems natural enough, right?

git clone server:/var/www/foo
# ...
git pull

# edit stuff
git commit -a -m 'i edited stuff'

Now, let's say you're a (possibly former) darcs/bzr/mercurial user and this time you're using git. Git has git-push. You read the man page like a good little code monkey, and it seems like it does the same or similar thing to darcs push or hg push. It seems like if you want to push your changes to the server, you'd do this:

git push

Am I off in left field or does this not seem 100% rational? But woe be unto the code monkey that utters this unfortunate incantation. Observe:

$ mkdir foo
$ cd foo
$ git init
Initialized empty Git repository in /private/tmp/foo/.git/
$ echo hello > foo.txt
$ git add foo.txt
$ git commit -m 'hello'
Created initial commit bee50da: hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ cd ..
$ git clone foo bar
Initialized empty Git repository in /private/tmp/bar/.git/
$ cd bar
$ echo goodbye >> foo.txt
$ git commit -a -m goodbye
Created commit 99c13c1: goodbye
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git push ../foo
Counting objects: 5, done.
Writing objects: 100% (3/3), 248 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To ../foo
   bee50da..99c13c1  master -> master
$ cd ../foo
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   foo.txt
#
$ git diff
$ git diff --cache
error: invalid option: --cache
$ git diff --cached
diff --git a/foo.txt b/foo.txt
index a32119c..ce01362 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1,2 +1 @@
 hello
-goodbye
$ git log
commit 99c13c1e60888ae2c0e221898411e1cd52ad3815
Author: Hans Fugal <hans@fugal.net>
Date:   Mon Nov 10 17:11:57 2008 -0700

    goodbye

commit bee50da72798edc47ddc36dbc4f559f141b1e28b
Author: Hans Fugal <hans@fugal.net>
Date:   Mon Nov 10 17:11:34 2008 -0700

    hello

I promise I didn't fake that. Yes, you saw that correctly: git wants to undo the changes you just committed. If you happen to have a clean working directory, all you need to do to return to sanity is git reset --hard HEAD (a plain git reset HEAD only fixes the index and leaves the stale files in the working directory). If not, heaven help you.

This is totally unacceptable. It's unforgivable on so many levels. At the very least, the manpage should warn you not to push to repositories with working copies. Git should warn you, before you push and screw up your repo, that the target has a working copy checked out. Ideally, git would behave like darcs and update the working copy. Suboptimally, it would behave like mercurial and make it a new revision that you have to manually check out. But this is simply ridiculous.
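
Newer Git releases changed this. To the best of my knowledge, Git 1.7.0 and later refuse a push into the checked-out branch of a non-bare repository by default (receive.denyCurrentBranch), and Git 2.3 added an updateInstead value that updates the working copy the way darcs does. A minimal sketch, assuming Git 2.3 or newer; the scratch directory and the demo identity are made up for illustration:

```shell
#!/bin/sh
# Sketch: let a push update the receiving working copy (assumes Git >= 2.3).
# The scratch directory and the demo identity are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"

# Throwaway identity so the commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q foo
cd foo
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
echo hello > foo.txt
git add foo.txt
git commit -q -m hello
# Opt in on the receiving side: a push to the checked-out branch also
# updates the working copy, provided that working copy is clean.
git config receive.denyCurrentBranch updateInstead
cd ..

git clone -q foo bar
cd bar
echo goodbye >> foo.txt
git commit -q -a -m goodbye
git push -q origin master

cd ../foo
git status --short   # prints nothing: no phantom staged changes
cat foo.txt          # both lines are present in the working copy
```

Without the git config line, a recent Git should simply reject the push with an explanation, which is the warning this post asks for.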

So what is the solution? They tell you to use pull. Hello! Is anyone home? My laptop is roving. It's often behind a NATing firewall. I'm supposed to find my public IP address and figure out how to subvert the evil firewalls of the world every time I want to push my changes to the server?

A workaround, and probably the best real-world workflow, is to have a second bare repository or a second branch on the server, push into that, then ssh into the server and pull the changes. I think this page describes how to do that with a second branch, though I'm short on time to actually try it out at the moment.

More of this sickening story in this thread, where you will learn that at least one other person out there has his head screwed on properly, that the developers are more interested in how hooks work (and fail to allow you to do this even if you grok them), and that they've discussed the problem before and decided the correct response is to RTFM (M for minds this time, since the manual was completely unhelpful).

Update: Some of you have been quick to defend git and the design choice of how push behaves. I want to clarify that I don't care so much that push updates the repository but not the working directory. Mercurial works this way too. Not the way I'd do it but it's a valid approach. The problem here is that git push seems like a natural thing to do but screws up your working directory on the remote side. Mercurial doesn't change the working directory, but neither does it silently rebase it and set you up to undo your changes if you're not careful. The problem here is a lack of safety and a lack of warning. They know it's a problem, they've fielded enough "morons". A few words of warning in the man page is all it would take to make me happy.

And now, I have had time to work out a more specific workaround. Here's what I did, and it seems to work well:

# Server setup (set up incoming branch)
server$ git branch incoming
server$ git branch
  incoming
* master
  origin

# Laptop setup (local master to remote incoming)
laptop$ git config remote.origin.push master:incoming

# Everyday usage
laptop$ git push
Counting objects: 5, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 279 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To server:/tmp/foo
   b108a07..a9d3282  master -> incoming

server$ git status
# On branch master
nothing to commit (working directory clean)
server$ git pull . incoming
From .
 * branch            incoming   -> FETCH_HEAD
Updating b108a07..a9d3282
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

You could use hooks to automatically do that git pull . incoming if you liked, making it more like darcs than mercurial.
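
A minimal sketch of such a hook, assuming it lives at .git/hooks/post-receive in the server repository. The scratch paths and demo identity are hypothetical. The fast-forward-only merge is a stand-in for git pull . incoming, and --ff-only needs a newer Git than the one in the transcripts above:

```shell
#!/bin/sh
# Sketch: merge the incoming branch on the server after every push.
# Paths and identity are hypothetical; this is not a captured transcript.
set -e
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Server side: master is checked out, incoming receives the pushes.
git init -q server
cd server
git symbolic-ref HEAD refs/heads/master
echo hello > foo.txt
git add foo.txt
git commit -q -m hello
git branch incoming

# The hook itself. Git runs it inside .git with GIT_DIR set, so it has to
# step back out to the working copy before it can merge.
mkdir -p .git/hooks
cat > .git/hooks/post-receive <<'EOF'
#!/bin/sh
while read oldrev newrev refname; do
    if [ "$refname" = refs/heads/incoming ]; then
        cd .. || exit 1
        unset GIT_DIR
        git merge -q --ff-only incoming
    fi
done
EOF
chmod +x .git/hooks/post-receive
cd ..

# Laptop side: same configuration as in the post.
git clone -q server laptop
cd laptop
git config remote.origin.push master:incoming
echo goodbye >> foo.txt
git commit -q -a -m goodbye
git push -q

cd ../server
cat foo.txt   # the pushed line is already in the server's working copy
```

Restricting the merge to fast-forwards means the hook declines to act, rather than creating a merge commit, if master on the server has moved on its own.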

Updated update: On further thought, the cleanest solution is probably to have a separate master (bare) repository, e.g.

$ mkdir master
$ cd master
$ git init
Initialized empty Git repository in /private/tmp/foo/master/.git/
$ git commit --allow-empty -m initial
Created initial commit 999755e: initial
$ cd ..
$ git clone master live
Initialized empty Git repository in /private/tmp/foo/live/.git/
$ cd live
$ git branch
* master
$ cd ..
$ git clone master laptop
Initialized empty Git repository in /private/tmp/foo/laptop/.git/
$ cd laptop
$ echo hello > foo.txt
$ git add foo.txt
$ git commit -m hello
Created commit 2297bcf: hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ git push
Counting objects: 4, done.
Writing objects: 100% (3/3), 239 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To /tmp/foo/master/.git
   999755e..2297bcf  master -> master
$ cd ../live
$ ls
$ git pull
remote: Counting objects: 4, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /tmp/foo/master/
   999755e..2297bcf  master     -> origin/master
Updating 999755e..2297bcf
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ echo goodbye >> foo.txt
$ git commit -a -m goodbye
Created commit 04f6702: goodbye
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git push
Counting objects: 5, done.
Writing objects: 100% (3/3), 248 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To /tmp/foo/master/.git
   2297bcf..04f6702  master -> master
$ cd ../laptop
$ git pull
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /tmp/foo/master/
   2297bcf..04f6702  master     -> origin/master
Updating 2297bcf..04f6702
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
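
Note that the master repository in the transcript above was created with a plain git init, so it still has a working copy of its own. A variation that makes it truly bare can be sketched with git init --bare. The directory names and demo identity are hypothetical, and this is a sketch rather than a captured transcript:

```shell
#!/bin/sh
# Sketch: a bare master repository with two ordinary clones around it.
set -e
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository has no working copy, so a push cannot disturb one.
git init -q --bare master.git
git --git-dir=master.git symbolic-ref HEAD refs/heads/master

# First clone: git warns about cloning an empty repository, which is fine.
git clone -q master.git laptop
cd laptop
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
echo hello > foo.txt
git add foo.txt
git commit -q -m hello
git push -q origin master
cd ..

# Second clone plays the role of the live site.
git clone -q master.git live
cd live
echo goodbye >> foo.txt
git commit -q -a -m goodbye
git push -q origin master

# Changes travel in either direction through the bare repository.
cd ../laptop
git pull -q origin master
cat foo.txt
```

Neither clone ever pushes into a repository with checked-out files, so the problem described in this post cannot arise.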

15 Dec 2007

Tailor, Mercurial, Leopard

I was getting this error when trying to use Tailor with Mercurial on Leopard:

Common base for tailor exceptions: 'hg' is not a known VCS kind: No module named mercurial

Mercurial was installed, so the error mystified me. It turns out to be a mismatch between Python versions. Leopard's python is version 2.5.1, but mercurial from MacPorts depends on python 2.4, so MacPorts installs python24. Running tailor with python 2.5 then tries to load the mercurial module, which is installed for 2.4 and not 2.5, and naturally fails. The workaround is to run tailor with python 2.4 instead, which works fine.

#! /bin/sh
# Save as ~/bin/tailor and chmod +x
tailor=$HOME/src/tailor/tailor
# Forward the command-line arguments to the real tailor script
exec python2.4 "$tailor" "$@"

11 Dec 2007

Darcs2 Prerelease

Levi referred me to a post entitled "How I stopped missing Darcs and started loving Git". It's an interesting read, but ironically the thing that interested me most was a comment mentioning that just yesterday darcs 2.0.0pre1 was released. It looks like some very exciting things are coming down the darcs pipe:

  1. It should no longer be possible to confuse darcs or freeze it indefinitely by merging conflicting changes.
  2. Identical primitive changes no longer conflict.
  3. Darcs get is now much faster, and always operates in a "lazy" fashion, meaning that patches are downloaded only when they are needed.
  4. Darcs now supports caching of patches and file contents to reduce bandwidth and save disk space.
  5. Speed improvements.

The most exciting change is, of course, the elimination of the exponential merge. This is very good news indeed, if it means what I think it means. The second change I listed is also very interesting to me. You'll remember I posited that exact situation in a previous post, and was teased incessantly as a result.

Do read the release notes. If darcs2 is released within a reasonable time frame, it will continue to be a strong contender.

21 Nov 2007

hgk with MacPorts

If you use mercurial from MacPorts, you need the following in your ~/.hgrc to enable the hgk extension (GUI tree browser, invoked with hg view):

[extensions]
hgk=

[hgk]
path=/opt/local/share/mercurial/contrib/hgk

While you're in there, enable the mq, fetch, and record extensions.
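
A sketch of what the combined stanza might look like. It assumes the short extension names mq, fetch and record resolve the same way hgk= does above, which may depend on the Mercurial version. The scratch file is hypothetical; review it and merge it into ~/.hgrc by hand:

```shell
#!/bin/sh
# Sketch: write the stanza to a scratch file instead of touching ~/.hgrc.
hgrc_snippet=$(mktemp)
cat > "$hgrc_snippet" <<'EOF'
[extensions]
hgk =
mq =
fetch =
record =

[hgk]
path = /opt/local/share/mercurial/contrib/hgk
EOF
cat "$hgrc_snippet"
```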

16 Nov 2007

Mercurial and Darcs

I'm a long-time distributed revision control user and advocate. I think it's the only way to go, but this article isn't about convincing you of that. I'll just take it for granted.

From almost the beginning, I have used darcs. I did try Tom's (ahem) Lame Arch, which I found to be completely unusable. Darcs hasn't been around as long as arch, but it is in roughly the same generation as other early systems like monotone and bazaar. I think of these as the second wave of distributed revision control (the first wave being, basically, tla). When I read about darcs I was impressed by the theory of patches. When I tried darcs I was very impressed by how easy it is to use. Soon it became second nature and I never looked back.

But I did look forward, and sideways. When collaborating on a project using svn I gave svk a whirl. I don't hesitate to recommend svk if you're otherwise stuck with svn.

When the third wave of distributed revision control came along, ushered in by the Great Linux/BitKeeper Fiasco, I took a cursory look at git. A couple years later (aka earlier this year) I took a closer look. I can see that it has come a long way (in user interface). You don't need "chrome" or whatever they used to call the wrappers like cogito anymore. It's fairly usable. But it's also fairly complicated (that's an understatement, if you left your understatement detector at home). It's perfectly tailored for what Linus and other kernel developers need, but that's not necessarily what the rest of us need. I wouldn't hesitate to nod my head in agreement if you moved from CVS or subversion to git.

Not entirely satisfied with git, and not being able to use darcs (because of the dreaded exponential merge, and general slowness), I looked at Mercurial (Hg from now on). I found that it was fast like git, but easy to use like darcs. The documentation is excellent, probably the best in the game. The underlying theory is easy to grok, and this is important. Darcs is the same way, although the two theories are definitely different. It has served me well in the application where I couldn't use darcs, but I remained a darcs user for other things.

But mercurial has been seeping into my consciousness of late. Rumors surface from time to time of darcs users migrating to mercurial (it seems to be the only one they migrate to). Other people praise it too. I find it plenty easy to use even as a second-class arrow in my quiver. Finally it gains enough ground that I decide to use it for a project where I'm actually exercising the version control system. I have really liked what I've seen. So the rest of this already-long post will be a comparison of darcs and mercurial. I'm not saying either one or even both together are "the best" by any objective measure, though I definitely prefer them over others myself. I don't know that I'll come to any conclusion over whether to use darcs or mercurial in the future (I haven't yet, anyway, though I'm leaning towards mercurial). I just want to get my comparison out there. Note also that I'm probably not going to mention every feature that I love that they both have in common. As such this isn't an evangelism for either one over the rest of the pack. Just a comparison of what distinguishes these two.

First, darcs. Darcs has excellent support for cherry picking. Its user interface is second to none. It is easy to mail a set of patches to another darcs user or to someone in general. The theory of patches is very interesting and flexible, and easy to follow (until strange things happen, when it gets really hard to follow). It's written in Haskell by a physicist, which has got to count for something. darcs record, which will ask you whether you want to check in each hunk that's changed in your working directory, rather than an all-or-nothing commit, is a joy to work with (especially for those of us who seem to lack the discipline to check in features and bugfixes in neat little packages).

On the other hand, darcs uses _darcs to store its metadata, which is probably for cross-platform reasons but is definitely an eyesore. (.darcs would be better). It's annoyingly inconvenient to grab a specific revision (partly because such a concept doesn't really exist) or a specific point of time. Darcs has no idea of branching. Or more precisely, every darcs repository is its own branch and so it leaves that up to you. That's theoretically ok, but I do prefer to keep the clutter of multiple working copies at bay. Combined with not having to rebuild the whole project from scratch and not wasting the disk space for a large working directory, in-place switching to certain branches/revisions like git and hg do is really nice. Darcs doesn't support symlinks or permissions, again for cross-platform reasons. The two real thorns in the side of darcs are the dreaded exponential merge (I have run into it, alas), and the fact that I have a devil of a time getting it to run everywhere I am. Haskell is not that common, and it's a big thing to have to build, e.g. with MacPorts. That's when it will build at all (but that MacPorts rant is for another post). There are binaries for windows and mac, but sometimes they don't work and versions don't match up... it can be a nightmare. It can usually work out, but it can be difficult.

I mentioned cherry picking. One UI flaw in darcs that greatly reduces the real-world utility of its excellent cherry picking support is that you can't tell it that you would prefer to refuse this patch for ever and ever. Every time you pull you have to tell it "no, I don't want that patch". This makes maintaining similar but subtly different branches impractical. A better tool for this job is quilt. In practice, I've done precious little cherry picking. It's always nice to know I can, but after some deep thought on the subject over the last couple days I'm not at all convinced that it's worth supporting in the revision control system. More on that in another post.

Now, mercurial. It's fast, easy to use, and easy to install. It is written in Python (with some C) and works well on all three platforms. The .hg directory is not an eyesore. hg is quick and easy to type. It supports branching. It has a powerful extension mechanism, with many interesting extensions available. The theory is easy to grok and follow, so you're not surprised very often. The implementation seems excellent, from the point of view of both a user and a system administrator. CVS-like abbreviations for minimal typing are handy (like hg ci for hg commit). I've already mentioned the excellent documentation. I might as well mention an excellent Google Tech Talk given by the author of said documentation. hg clone is fast and efficient and uses hardlinks where it can (for the metadata, not the working directory). It has a built in webserver for quick and easy web-based collaboration (for firewall reasons or for an interface to the repo for non-hg users). It's storage efficient, and has the novel and all-important optimization principle of avoiding disk seeks. That hg, written in python, gives git a run for its money shows that the devs know what they're doing when it comes to systems programming. Mercurial Queues is extremely cool and useful.

As for the cons, although the theory is easy to grok and doesn't surprise you much, it will surprise you as a newcomer unless you grok the theory. This is especially true of pull and push, which don't update the working directory. Some of the most useful stuff is provided by extensions that are disabled by default (at least they are distributed): fetch, record (for darcs-like hunk-by-hunk recording), mq, bisect, and transplant (for cherry picking). It doesn't have the cherry picking abilities of darcs, though there is the record extension (for cherry picking from your working directory) and the transplant extension (for cherry picking proper). I haven't used the latter yet so I don't know if it works well or not. .hgignore is easy to manage, but comes with almost no useful defaults. This is for performance reasons apparently, but I'd be willing to take the hit. Let me override with a minimal .hgignore if performance matters that much on this project. I don't like that I have to type ssh urls out, as ssh://example.com//path/to/repository, and it's too difficult to set the default push/pull location so I end up having to type it more than I'd like. Directories aren't first class, which makes me slightly uneasy. It hasn't caused me grief yet though.

I don't like that you don't get compatible repositories if you start from the same code. For example:

tar xzf foo-1.0.tar.gz
rsync -a foo-1.0 foo-1.0-2
cd foo-1.0
hg init
hg ci -Am 'initial'
cd ../foo-1.0-2
hg init
hg ci -Am 'initial'

At this point you cannot pull or push from one of these repositories to the other, although they are semantically identical. I do this from time to time with darcs.

Last and, if not least, certainly not the biggest deal: mercurial's website is ugly.

So there you have it. Look for a post on quilt and MQ and some thoughts on cherry picking soon.