The Fugue Counterpoint by Hans Fugal

12Feb/090

Why I switched to Git

I love it when someone else writes what I've been meaning to write, so I don't have to write it.

This article covers the reasons why I switched from mercurial to git, about as well as he could possibly hope to without consulting me, reading my mind, or being me. It's a bit creepy.

He explains it very well (probably better than I would) and has more insight into the technical details than I do.

While we're on the subject, you should all go read git for computer scientists so you can think like a git. If you already have a CS background, it's quite painless I assure you.

10Nov/0840

git-push is worse than worthless

Ugh

Let's say you are a web developer, and you do development on your laptop, then when things are nice and shiny you want to push those changes to the webserver. Seems natural enough, right?

git clone server:/var/www/foo
# ...
git pull

# edit stuff
git commit -a -m 'i edited stuff'

Now, let's say you're a (possibly former) darcs/bzr/mercurial user and this time you're using git. Git has git-push. You read the man page like a good little code monkey, and it seems like it does the same or similar thing to darcs push or hg push. It seems like if you want to push your changes to the server, you'd do this:

git push

Am I off in left field or does this not seem 100% rational? But woe be unto the code monkey that utters this unfortunate incantation. Observe:

$ mkdir foo
$ cd foo
$ git init
Initialized empty Git repository in /private/tmp/foo/.git/
$ echo hello > foo.txt
$ git add foo.txt
$ git commit -m 'hello'
Created initial commit bee50da: hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ cd ..
$ git clone foo bar
Initialized empty Git repository in /private/tmp/bar/.git/
$ cd bar
$ echo goodbye >> foo.txt
$ git commit -a -m goodbye
Created commit 99c13c1: goodbye
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git push ../foo
Counting objects: 5, done.
Writing objects: 100% (3/3), 248 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To ../foo
   bee50da..99c13c1  master -> master
$ cd ../foo
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   foo.txt
#
$ git diff
$ git diff --cache
error: invalid option: --cache
$ git diff --cached
diff --git a/foo.txt b/foo.txt
index a32119c..ce01362 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1,2 +1 @@
 hello
-goodbye
$ git log
commit 99c13c1e60888ae2c0e221898411e1cd52ad3815
Author: Hans Fugal <hans@fugal.net>
Date:   Mon Nov 10 17:11:57 2008 -0700

    goodbye

commit bee50da72798edc47ddc36dbc4f559f141b1e28b
Author: Hans Fugal <hans@fugal.net>
Date:   Mon Nov 10 17:11:34 2008 -0700

    hello

I promise I didn't fake that. Yes, you saw that correctly: git wants to undo the changes you just committed. If you happen to have a clean working directory, all you need to do to return to sanity is git reset --hard HEAD (a plain git reset HEAD only fixes the index and leaves the stale files in place). If not, heaven help you.
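For anyone who wants to see the stale state and the recovery for themselves, here is a minimal sketch. It makes a few assumptions that are not in the session above: it works in a scratch directory from mktemp, it sets a made-up commit identity, and it sets receive.denyCurrentBranch to ignore because newer gits refuse this kind of push unless you ask for the old behavior.

```shell
#!/bin/sh
# Sketch: reproduce the stale working copy after a push, then recover.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Throwaway identity so the commits work anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q foo
cd foo
echo hello > foo.txt
git add foo.txt
git commit -q -m hello
# Assumption: ask explicitly for the 2008 behavior, since newer gits
# reject a push to the checked-out branch by default.
git config receive.denyCurrentBranch ignore
cd ..

git clone -q foo bar
cd bar
echo goodbye >> foo.txt
git commit -q -a -m goodbye
git push -q origin HEAD

cd ../foo
# HEAD moved, but the index and the files did not,
# so git now sees a staged "undo" of the pushed commit.
if ! git diff --cached --quiet; then
    echo "index is behind HEAD"
fi
# Bring index and files up to HEAD. This also discards local edits!
git reset -q --hard HEAD
cat foo.txt
```

The --hard is what makes this destructive, which is why it only counts as a recovery when the working directory was clean to begin with.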

This is totally unacceptable. It's unforgivable on so many levels. At the very least, the manpage should warn you not to push to repositories with working copies. Git should warn you, before you push and screw up the repo, that it has a working copy checked out. Ideally, git would behave like darcs and update the working copy. Suboptimally, it would behave like mercurial and make it a new revision that you have to manually check out. But this is simply ridiculous.
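Later versions of git did grow a knob for the darcs-like behavior. The sketch below assumes Git 2.3 or newer, where receive.denyCurrentBranch accepts the value updateInstead; the scratch directory and the commit identity are made up for the demo.

```shell
#!/bin/sh
# Sketch: let a push update the checked-out branch and its working tree.
# Assumes Git 2.3+ (receive.denyCurrentBranch=updateInstead).
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Throwaway identity so the commits work anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q foo
cd foo
echo hello > foo.txt
git add foo.txt
git commit -q -m hello
# Opt in on the receiving side: accept pushes to the current branch
# and update the working tree to match (only if it is clean).
git config receive.denyCurrentBranch updateInstead
cd ..

git clone -q foo bar
cd bar
echo goodbye >> foo.txt
git commit -q -a -m goodbye
git push -q origin HEAD

cd ../foo
cat foo.txt           # the pushed line is already in the working copy
git status --short    # prints nothing: index and files match HEAD
```

The setting lives on the receiving repository, so it has to be configured once on the server rather than on every laptop that pushes to it.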

So what is the solution? They tell you to use pull. Hello! Is anyone home? My laptop is roving. It's often behind a NATing firewall. I'm supposed to find my public IP address and figure out how to subvert the evil firewalls of the world every time I want to push my changes to the server?

A workaround, and probably the best real-world workflow, is to have a second bare repository or a second branch on the server, push into that, then ssh into the server and pull the changes. I think this page describes how to do that with a second branch, though I'm short on time to actually try it out at the moment.

More of this sickening story in this thread, where you will learn that at least one other person out there has his head screwed on properly, that the developers are more interested in how hooks work (and fail to allow you to do this even if you grok them), and that they've discussed the problem before and decided the correct response is to RTFM (M for minds this time, since the manual was completely unhelpful).

Update: Some of you have been quick to defend git and the design choice of how push behaves. I want to clarify that I don't care so much that push updates the repository but not the working directory. Mercurial works this way too. Not the way I'd do it but it's a valid approach. The problem here is that git push seems like a natural thing to do but screws up your working directory on the remote side. Mercurial doesn't change the working directory, but neither does it silently rebase it and set you up to undo your changes if you're not careful. The problem here is a lack of safety and a lack of warning. They know it's a problem, they've fielded enough "morons". A few words of warning in the man page is all it would take to make me happy.

And now, I have had time to work out a more specific workaround. Here's what I did, and it seems to work well:

# Server setup (set up incoming branch)
server$ git branch incoming
server$ git branch
  incoming
* master
  origin

# Laptop setup (local master to remote incoming)
laptop$ git config remote.origin.push master:incoming

# Everyday usage
laptop$ git push
Counting objects: 5, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 279 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To server:/tmp/foo
   b108a07..a9d3282  master -> incoming

server$ git status
# On branch master
nothing to commit (working directory clean)
server$ git pull . incoming
From .
 * branch            incoming   -> FETCH_HEAD
Updating b108a07..a9d3282
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

You could use hooks to automatically do that git pull . incoming if you liked, making it more like darcs than mercurial.
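A rough sketch of such a hook follows, with the whole setup replayed in a scratch directory so it can be run as-is. The assumptions are mine: the hook is a post-receive hook, it uses git merge --ff-only incoming rather than git pull . incoming (so it refuses anything that is not a fast forward), and the directory names and commit identity are invented. Push hooks start inside .git with GIT_DIR set, hence the cd and unset.

```shell
#!/bin/sh
# Sketch: post-receive hook that fast-forwards master whenever
# somebody pushes to the incoming branch.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Throwaway identity so the commits work anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Server" side: a repository with a working copy and an incoming branch.
git init -q server
cd server
echo hello > foo.txt
git add foo.txt
git commit -q -m hello
git branch incoming

mkdir -p .git/hooks
cat > .git/hooks/post-receive <<'EOF'
#!/bin/sh
# stdin carries one "old new refname" line per updated ref
while read old new ref; do
    if [ "$ref" = refs/heads/incoming ]; then
        cd ..            # push hooks run inside .git
        unset GIT_DIR    # let git rediscover the repo and work tree
        git merge -q --ff-only incoming
    fi
done
EOF
chmod +x .git/hooks/post-receive
cd ..

# "Laptop" side: push the local branch to the remote incoming branch.
git clone -q server laptop
cd laptop
echo goodbye >> foo.txt
git commit -q -a -m goodbye
git push -q origin HEAD:incoming

cd ../server
cat foo.txt    # the working copy was updated by the hook
```

If the server's working copy has diverged, the merge refuses and the push still lands safely on incoming, which is about the level of safety I was asking for.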

Updated update: On further thought, the cleanest solution is probably to have a separate master repository that nobody works in directly (ideally a bare one), e.g.

$ mkdir master
$ cd master
$ git init
Initialized empty Git repository in /private/tmp/foo/master/.git/
$ git commit --allow-empty -m initial
Created initial commit 999755e: initial
$ cd ..
$ git clone master live
Initialized empty Git repository in /private/tmp/foo/live/.git/
$ cd live
$ git branch
* master
$ cd ..
$ git clone master laptop
Initialized empty Git repository in /private/tmp/foo/laptop/.git/
$ cd laptop
$ echo hello > foo.txt
$ git add foo.txt
$ git commit -m hello
Created commit 2297bcf: hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ git push
Counting objects: 4, done.
Writing objects: 100% (3/3), 239 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To /tmp/foo/master/.git
   999755e..2297bcf  master -> master
$ cd ../live
$ ls
$ git pull
remote: Counting objects: 4, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /tmp/foo/master/
   999755e..2297bcf  master     -> origin/master
Updating 999755e..2297bcf
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ echo goodbye >> foo.txt
$ git commit -a -m goodbye
Created commit 04f6702: goodbye
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git push
Counting objects: 5, done.
Writing objects: 100% (3/3), 248 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To /tmp/foo/master/.git
   2297bcf..04f6702  master -> master
$ cd ../laptop
$ git pull
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /tmp/foo/master/
   2297bcf..04f6702  master     -> origin/master
Updating 2297bcf..04f6702
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
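
The session above uses an ordinary repository as the master. With a truly bare one it might look like the sketch below. The names master.git, laptop and live, the scratch directory, and the commit identity are placeholders of mine, and the live clone is made after the first push so it has something to check out.

```shell
#!/bin/sh
# Sketch: bare master repository with a laptop clone and a live clone.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Throwaway identity so the commits work anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository has no working copy, so pushing to it is always safe.
git init -q --bare master.git

# Cloning an empty repository prints a warning; hide it.
git clone -q master.git laptop 2>/dev/null
cd laptop
echo hello > foo.txt
git add foo.txt
git commit -q -m hello
git push -q origin HEAD
cd ..

# The live checkout is just another clone of the bare master.
git clone -q master.git live

# Everyday usage: commit and push on the laptop ...
cd laptop
echo goodbye >> foo.txt
git commit -q -a -m goodbye
git push -q origin HEAD

# ... then pull on the live side.
cd ../live
git pull -q --ff-only
cat foo.txt
```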

11Dec/070

Darcs2 Prerelease

Levi referred me to a post entitled "How I stopped missing Darcs and started loving Git". It's an interesting read, but ironically the thing that interested me most was a comment mentioning that just yesterday darcs 2.0.0pre1 was released. It looks like some very exciting things are coming down the darcs pipe:

  1. It should no longer be possible to confuse darcs or freeze it indefinitely by merging conflicting changes.
  2. Identical primitive changes no longer conflict.
  3. Darcs get is now much faster, and always operates in a "lazy" fashion, meaning that patches are downloaded only when they are needed.
  4. Darcs now supports caching of patches and file contents to reduce bandwidth and save disk space.
  5. Speed improvements.

The most exciting change is, of course, the elimination of the exponential merge. This is very good news indeed, if it means what I think it means. The second change I listed is also very interesting to me. You'll remember I posited that exact situation in a previous post, and was teased incessantly as a result.

Do read the release notes. If darcs2 is released within a reasonable time frame, it will continue to be a strong contender.

20Nov/078

Darcs and Mercurial Redux

I've been getting a few comments on my other post, and I just got another excellent reply by email from Trent Buck. As I was writing my reply to him, it occurred to me that it might be better to reply here in order to clarify things for posterity.

Regarding darcs record, Trent said,

Analogous to hg record, new in 0.9.5

I was aware of this, as of that morning, but hadn't had much opportunity to try it out. I did mention it in passing later in the post. It bears repeating what I said then: many of the most useful behaviors of mercurial are in extensions that, while distributed, must be enabled in your .hgrc.
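For reference, enabling the bundled extensions is a matter of a few lines in the [extensions] section. This is only a sketch: it writes to a scratch file rather than the real ~/.hgrc, and the choice of extensions is simply the ones named in these posts. I left bisect out because, as far as I know, it later became a built-in command.

```shell
#!/bin/sh
# Sketch: the hgrc lines that turn on extensions shipped with Mercurial.
# Written to a scratch file here; for real use they belong in ~/.hgrc.
set -e
hgrc=$(mktemp)
cat > "$hgrc" <<'EOF'
[extensions]
# An empty value means "load the copy that ships with Mercurial".
record =
fetch =
mq =
transplant =
EOF
cat "$hgrc"
```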

On the other hand, darcs uses _darcs to store its metadata, which is
probably for cross-platform reasons but is definitely an
eyesore. (.darcs would be better).

I don't know why people complain about this. CVS doesn't use dotdirs
either, but nobody switching away from it says "wow, fantastic! At
last I need ls -a to see the metadata!".

I purposefully didn't get started on CVS. Nobody I know likes seeing CVS directories, and they're really of little use 90% of the time. One thing I love about both darcs and mercurial (and most other distributed RCSs) is that the repo is right there, not stashed away in some other place on the disk. CVS has the worst of both worlds. _darcs made me ecstatic, .hg made me that much more pleased.

It's annoyingly inconvenient to grab a specific revision (partly
because such a concept doesn't really exist) or a specific point of
time.

Important revisions (e.g. releases) can be named (darcs tag). You can
grab a specific subset of patches by pulling into a null repo with the
--match or --patch argument. While I haven't tried it, I assume you
can --match by date/time less than a particular date.

Yes, the way to do it is by --match or --patch (I don't remember which). It's certainly possible. I may have just got off on the wrong foot in this regard with darcs, but I've always found darcs to be mildly hostile to going back to the past. On the other hand, locating certain patches is very powerful in darcs, and by the nature of darcs you can do powerful things with them.

Darcs has no idea of branching. Or more precisely, every darcs
repository is its own branch and so it leaves that up to you. That's
theoretically ok, but I do prefer to keep the clutter of multiple
working copies at bay. Combined with not having to rebuild the whole
project from scratch and not wasting the disk space for a large
working directory,

Both hg and darcs use hard links automatically when working on
multiple one-branch-per-repo repos, if the filesystem supports hard
links.

Mea culpa. I was aware of this and the implication that darcs didn't do it slipped through in my sloppy writing.

in-place switching to certain branches/revisions like git and hg do
is really nice.

Agreed.

The two real thorns in the side of darcs are the dreaded exponential
merge (I have run into it, alas), and the fact that I have a devil
of a time getting it to run everywhere I am. Haskell is not that
common, and it's a big thing to have to build, e.g. with MacPorts.
That's when it will build at all (but that MacPorts rant is for
another post). There are binaries for windows and mac, but
sometimes they don't work and versions don't match up... it can be a
nightmare. It can usually work out, but it can be difficult.

Conceded.

I mentioned cherry picking. One UI flaw in darcs that greatly
reduces the real-world utility of its excellent cherry picking
support is that you can't tell it that you would prefer to refuse
this patch for ever and ever. Every time you pull you have to tell
it "no, I don't want that patch".

Conceded. I would quite like this feature, too. AFAICT the current
workaround is to pull the patch you don't want, then record an inverse
patch (darcs rollback). The other trick is to make sure you always
pull a restricted subset using --patch or --match. Perhaps you could
automate this with pull match not hash=XXX in _darcs/prefs/defaults,
but I have not tried this.

Interesting workarounds, thanks.

hg is quick and easy to type.

So do

alias d=darcs

or

x=`which darcs`  # not posix
ln -s "$x" "${x%arcs}"
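
In case the second variant looks like line noise: ${x%arcs} strips the trailing "arcs" from the path, leaving a path that ends in "d". A tiny sketch on a made-up install location, with command -v noted as the POSIX replacement for which:

```shell
#!/bin/sh
# Sketch of the suffix-stripping trick, on a hypothetical path.
x=/usr/local/bin/darcs    # made-up location; really: x=$(command -v darcs)
short=${x%arcs}           # remove the shortest trailing match of "arcs"
echo "$short"             # /usr/local/bin/d
```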

I concede that it's a weak argument, but it was just a stream-of-consciousness set of observations and not a structured argument. In practice I use tab completion and only really do this sort of thing with repetitive options (e.g. ll for ls -l) or where there is an annoying tab-completion ambiguity.

It has a powerful extension mechanism, with many interesting
extensions available.

Agreed, this is an important feature.

CVS-like abbreviations for minimal typing are handy.

Darcs similarly supports truncated commands—to least ambiguous
string, rather than explicit aliases. I'm told Hg and Bzr do not do
it this way because then people would be confused when activating a
new extension suddenly meant you had to type a longer command string,
because there was more ambiguity.

Also true, and fell victim to my not-quite-parallel treatment of darcs and mercurial. The CVS abbreviations are nice for people who 'grew up' during the CVS occupation.

It has a built in webserver for quick and easy web-based
collaboration (for firewall reasons or for an interface to the repo
for non-hg users).

Meh. HTTPds are cheap to set up, e.g.

thttpd -u $USER -d $PWD -p 8118

I assume you mean with a darcs cgi? In any case, hg serve is undeniably easier.

However it's worth noting that the built-in mercurial httpd supports
hg push (via POST), but does no access control, and does not support
other important commands (e.g. in-repo branching).

I was not aware push was supported at all, but according to the wiki, push support requires configuration, including telling it who can push (access control by http auth and using https).
In any case, for my own project I would either give developers ssh access or require they submit patches by email. But I can see where a fully-featured HTTP repo with access controls would come in handy for some development flows.

It's storage efficient, and has the novel and all-important
optimization principle of avoiding disk seeks.

Meh. Optimize later.

For the most part I agree with you. But it is the sort of thing that demonstrates that the authors have system programming experience and know what kinds of optimizations matter (disk seeks) and what kind don't (writing in Python instead of C). Premature optimization may be the root of all evil, but not all optimization is premature. As a case in point, see the thorn in darcs' side (exponential merge). If they said today that they fixed exponential merge, they would get some of their refugees back, but not all. And much of the damage of bad publicity is already done (and I regret that these posts probably contribute however small to that phenomenon).

This is especially true of pull and push which don't update the
working directory.

hg fetch does, but it does not (currently) support in-repo branching.

Another of those included-but-not-enabled features. I wasn't aware fetch doesn't support in-repo branching, but probably because I think of fetch more as a network action.

Some of the most useful stuff is provided by extensions that are
disabled by default (at least they are distributed). fetch, record
(for darcs-like hunk-by-hunk recording), mq, bisect, transplant (for
cherry picking).

Transplant doesn't actually cherry-pick a changeset; it commits a
different changeset with a different hash which happens to make
identical changes to the working tree.

Very true, and an important distinction. I intend to blog soon on quilt, patches, and cherry picking, so I didn't get into this much.

It doesn't have the cherry picking abilities of darcs, though there
is the record extension (for cherry picking from your working
directory) and the transplant extension (for cherry picking
proper).

While vastly better than bzr shelve, these are quick and dirty hacks
because darcs-style cherry picking isn't possible when each changeset
implicitly depends on its immediate ancestor (the case in hg and bzr).

Or they are simply different approaches. Again, look for that post Real Soon Now™. I might mention that to my (very limited) understanding, git cherry picking works in the same fashion. Darcs is quite unique in this respect, in my experience.

I haven't used the latter yet so I don't know if it works well or
not. .hgignore is easy to manage, but comes with almost no useful
defaults. This is for performance reasons apparently, but I'd be
willing to take the hit.

.hgignore and darcs' boring file are much of a muchness.

Absolutely. As are the equivalents in cvs, svn, git, etc. However, darcs' system-wide boring file is populated much more aggressively. You never have to add .o files or emacs/vim swap files, because they're already there.

I don't like that I have to type ssh urls out, as
ssh://example.com//path/to/repository

Agreed. Darcs uses scp notation, although it also has trouble with
tilde expansion. Note that darcs push to an ssh repository will apply
pushed patches to the working tree, which hg and bzr do not attempt to
do. Like rsync, darcs needs to be installed on both ends of the ssh
link for a push (but not a pull) to accomplish this.

I have run into the tilde expansion issue. The push behavior I described above as surprising.

I don't like that you don't get compatible repositories if you start
from the same code. For example:

tar xzf foo-1.0.tar.gz
rsync -a foo-1.0 foo-1.0-2
cd foo-1.0
hg init
hg ci -Am 'initial'
cd ../foo-1.0-2
hg init
hg ci -Am 'initial'

At this point you cannot pull or push from one of these repositories
to the other, although they are semantically identical. I do this
from time to time with darcs.

This can't be done (safely) in darcs either; it's only an accident
that it worked at all for the OP.

I've (rightfully) gotten the most flak over this bit. I agree it's bad form. Shame on me! The frustration arose out of a frustrating situation. I was mirroring the FlightGear data repository (.hg alone is 890MB) for three different branches (OSG and plib from FlightGear's CVS, and someone else's git repository). Due to limitations of CVS and tailor, I had to redownload the entire CVS repo, and the two had no common history. The fault here lies with CVS and tailor, really, but the above behavior would have allowed me to work around it.

Jamie said in a comment,

I can't claim I have much trouble setting default push/pull locations, and if you hg clone from a remote repository it sets it up automatically to be there for you.

In .hg/hgrc,

  [paths]
  default=ssh://host/path

I maintain that hg should figure it out after the first couple pushes/pulls without a set default (as darcs does), but thanks for the tip. I'm sure I would have found it when I got around to RTFMing the second time around, or when I got annoyed enough.

Someone else defended git as being not overly complicated. I do think git theory is slightly more complicated than hg or darcs, but what I was really referring to is the UI. git help -a returns a staggering 126 commands. Some may call that power and flexibility, but IMHO it's complication. Still, it has come a long long way in usability and I don't hesitate to nod my head in recommendation when someone chooses to use it.

Thanks to all for your comments. I enjoy the discussion.

16Nov/0713

Mercurial and Darcs

I'm a long-time distributed revision control user and advocate. I think it's the only way to go, but this article isn't about convincing you of that. I'll just take it for granted.

From almost the beginning, I have used darcs. I did try Tom's ahem Lame Arch, which I found to be completely unusable. Darcs hasn't been around as long as arch, but it is in roughly the same generation as other early systems like monotone and bazaar. I think of these as the second wave of distributed revision control (the first wave being, basically, tla). When I read about darcs I was impressed by the theory of patches. When I tried darcs I was very impressed by how easy it is to use. Soon it became second nature and I never looked back.

But I did look forward, and sideways. When collaborating on a project using svn I gave svk a whirl. I don't hesitate to recommend svk if you're otherwise stuck with svn.

When the third wave of distributed revision control came along, ushered in by the Great Linux/BitKeeper Fiasco, I took a cursory look at git. A couple years later (aka earlier this year) I took a closer look. I can see that it has come a long way (in user interface). You don't need "chrome" or whatever they used to call the wrappers like cogito anymore. It's fairly usable. But it's also fairly complicated (that's an understatement, if you left your understatement detector at home). It's perfectly tailored for what Linus and other kernel developers need, but that's not necessarily what the rest of us need. I wouldn't hesitate to nod my head in agreement if you moved from CVS or subversion to git.

Not entirely satisfied with git, and not being able to use darcs (because of the dreaded exponential merge, and general slowness), I looked at Mercurial (Hg from now on). I found that it was fast like git, but easy to use like darcs. The documentation is excellent, probably the best in the game. The underlying theory is easy to grok, and this is important. Darcs is the same way, although the two theories are definitely different. It has served me well in the application where I couldn't use darcs, but I remained a darcs user for other things.

But mercurial has been seeping into my consciousness of late. Rumors surface from time to time of darcs users migrating to mercurial (it seems to be the only one they migrate to). Other people praise it too. I find it plenty easy to use even as a second-class arrow in my quiver. Finally it gains enough ground that I decide to use it for a project where I'm actually exercising the version control system. I have really liked what I've seen. So the rest of this already-long post will be a comparison of darcs and mercurial. I'm not saying either one or even both together are "the best" by any objective measure, though I definitely prefer them over others myself. I don't know that I'll come to any conclusion over whether to use darcs or mercurial in the future (I haven't yet, anyway, though I'm leaning towards mercurial). I just want to get my comparison out there. Note also that I'm probably not going to mention every feature that I love that they both have in common. As such this isn't an evangelism for either one over the rest of the pack. Just a comparison of what distinguishes these two.

First, darcs. Darcs has excellent support for cherry picking. Its user interface is second to none. It is easy to mail a set of patches to another darcs user or to someone in general. The theory of patches is very interesting and flexible, and easy to follow (until strange things happen, when it gets really hard to follow). It's written in Haskell by a physicist, which has got to count for something. darcs record, which will ask you whether you want to check in each hunk that's changed in your working directory, rather than an all-or-nothing commit, is a joy to work with (especially for those of us who seem to lack the discipline to check in features and bugfixes in neat little packages).

On the other hand, darcs uses _darcs to store its metadata, which is probably for cross-platform reasons but is definitely an eyesore. (.darcs would be better). It's annoyingly inconvenient to grab a specific revision (partly because such a concept doesn't really exist) or a specific point of time. Darcs has no idea of branching. Or more precisely, every darcs repository is its own branch and so it leaves that up to you. That's theoretically ok, but I do prefer to keep the clutter of multiple working copies at bay. Combined with not having to rebuild the whole project from scratch and not wasting the disk space for a large working directory, in-place switching to certain branches/revisions like git and hg do is really nice. Darcs doesn't support symlinks or permissions, again for cross-platform reasons. The two real thorns in the side of darcs are the dreaded exponential merge (I have run into it, alas), and the fact that I have a devil of a time getting it to run everywhere I am. Haskell is not that common, and it's a big thing to have to build, e.g. with MacPorts. That's when it will build at all (but that MacPorts rant is for another post). There are binaries for windows and mac, but sometimes they don't work and versions don't match up... it can be a nightmare. It can usually work out, but it can be difficult.

I mentioned cherry picking. One UI flaw in darcs that greatly reduces the real-world utility of its excellent cherry picking support is that you can't tell it that you would prefer to refuse this patch for ever and ever. Every time you pull you have to tell it "no, I don't want that patch". This makes maintaining similar but subtly different branches impractical. A better tool for this job is quilt. In practice, I've done precious little cherry picking. It's always nice to know I can, but after some deep thought on the subject over the last couple days I'm not at all convinced that it's worth supporting in the revision control system. More on that in another post.

Now, mercurial. It's fast, easy to use, and easy to install. It is written in Python (with some C) and works well on all three platforms. The .hg directory is not an eyesore. hg is quick and easy to type. It supports branching. It has a powerful extension mechanism, with many interesting extensions available. The theory is easy to grok and follow, so you're not surprised very often. The implementation seems excellent, from the point of view of both a user and a system administrator. CVS-like abbreviations for minimal typing are handy (like hg ci for hg commit). I've already mentioned the excellent documentation. I might as well mention an excellent Google Tech Talk given by the author of said documentation. hg clone is fast and efficient and uses hardlinks where it can (for the metadata, not the working directory). It has a built in webserver for quick and easy web-based collaboration (for firewall reasons or for an interface to the repo for non-hg users). It's storage efficient, and has the novel and all-important optimization principle of avoiding disk seeks. That hg, written in python, gives git a run for its money shows that the devs know what they're doing when it comes to systems programming. Mercurial Queues is extremely cool and useful.

As for the cons, although the theory is easy to grok and doesn't surprise you much, it will surprise you as a newcomer unless you grok the theory. This is especially true of pull and push which don't update the working directory. Some of the most useful stuff is provided by extensions that are disabled by default (at least they are distributed). fetch, record (for darcs-like hunk-by-hunk recording), mq, bisect, transplant (for cherry picking). It doesn't have the cherry picking abilities of darcs, though there is the record extension (for cherry picking from your working directory) and the transplant extension (for cherry picking proper). I haven't used the latter yet so I don't know if it works well or not. .hgignore is easy to manage, but comes with almost no useful defaults. This is for performance reasons apparently, but I'd be willing to take the hit. Let me override with a minimal .hgignore if performance matters that much on this project. I don't like that I have to type ssh urls out, as ssh://example.com//path/to/repository, and it's too difficult to set the default push/pull location so I end up having to type it more than I'd like. Directories aren't first class, which makes me slightly uneasy. It hasn't caused me grief yet though.

I don't like that you don't get compatible repositories if you start from the same code. For example:

tar xzf foo-1.0.tar.gz
rsync -a foo-1.0 foo-1.0-2
cd foo-1.0
hg init
hg ci -Am 'initial'
cd ../foo-1.0-2
hg init
hg ci -Am 'initial'

At this point you cannot pull or push from one of these repositories to the
other, although they are semantically identical. I do this from time to time
with darcs.

Last, and while certainly not least it's certainly not the biggest deal, mercurial's website is ugly.

So there you have it. Look for a post on quilt and MQ and some thoughts on cherry picking soon.

1Jun/061

svk

The problem: Ardour uses Subversion, but I'm addicted to distributed revision
control systems. Actually, svn and I would have got along just fine if it
weren't for svn merge. What an embarrassment for svn lovers everywhere! You
have to manually dig up which revisions to merge with. svn doesn't keep track
of what's already been merged so you also have to be careful not to merge the
same stuff twice. To add insult to injury, you have to type those full, long
svn URLs too. So what would be darcs pull becomes something like:

svn merge -r 536:543 \
  svn+ssh://ardoursvn@ardour.org/ardour2/trunk \
  svn+ssh://ardoursvn@ardour.org/ardour2/branches/region-plugins

Ick.

So I started investigating gateways from svn to darcs or git, either of which I would have happily used. Tailor seems like a good way to do it, but I had a hard time wrapping my head around the bidirectional setup. git-svnimport looks promising for a git solution, but before I got a chance to try it I looked at svk. svk is perfect for this situation.

If you're not aware, svk is a distributed RCS front-end to svn. I knew about it
before but had always thought of it as a hack marrying the worst of two worlds.
Since my last look it has improved considerably, and my approach has also
changed as I'm now looking at it as a developer in an svn project, not the guy
setting up a repo and wondering which system to use.

svk is mostly like svn, except you mirror the repo on your hard disk and can do
disconnected development. To be honest I haven't looked at the truly
distributed aspects of svk (if they exist), but rather I have focused on what I
needed: disconnected operation with the ability to create local branches with
easy merging, and work with the existing svn repo. svk does these things very
well.

Here's my new workflow, from the point of installing svk:

# setup
svk mirror svn+ssh://ardoursvn@ardour.org/ardour2 //mirror/ardour2
svk sync //mirror/ardour2
cd ~/src
svk co //mirror/ardour2

# branch
cd ardour2
svk cp //mirror/ardour2 //local/region-plugins
svk switch //local/region-plugins

# edit stuff then check in
svk ci

# merge in trunk changes to my branch
svk pull

# merge my branch back into the trunk
svk push

Learn more about svk by reading "SVK, A Visual Guide", Jonathan Weiss' blog entry, and the Bieber Labs tutorial.

9Apr/060

git

I spent some time with git over the weekend, so
I finally have an opinion about it.

Distributed revision control systems are absolutely fabulous. If you haven't
given one a serious go yet, you really should. My favorite is
darcs. This post will address git from the
perspective of a darcs user, and I might throw some comparisons to CVS or
Subversion in, too.

git is really a lot easier to use than I had anticipated. I read lots of
warnings in the documentation about how git is stupid (by definition, this is
one of its goals) and how unless your needs are a lot like Linus' needs, it
won't be right for you. I've found that to be unnecessary modesty. git is very
usable as a distributed revision control system for normal people on any size
of project. It's not as nice as darcs, IMO, but it does have better performance
for large projects, and it doesn't trail far behind anyway. It's a lot nicer
to use than GNU Arch, even in its raw form.

git pretends not to be an SCM, but rather a "filesystem". Whatever. git was
written to do what Linus needed from an SCM, and it has never had any other
purpose. Although it is conceivable that git could be used for other things, as
it is quite general and flexible, that doesn't make it not an SCM. It certainly
is like no SCM you've ever seen (at the low level) but it's still basically an
SCM.

Nevertheless, if you look too close at git you'll be either overwhelmed by the
sheer volume of low-level filesystem-like commands, or thrilled by them.
However, you don't need most of them, and if you ignore them you can enjoy git's
usability, with the added security of knowing that should you need to doctor
your repository (or have a git guru do so) all the low-level commands you need
are there. That is a bonus in my book, but mostly theoretical.

I think originally git had fewer SCM-like commands and so people wrote
Porcelains, like Cogito.
Cogito is supposed to be easier to use and a more perfect SCM built on top of
git. Frankly, I can't tell the difference between Cogito and git core, except
that git core has more commands and options. Cogito does use slightly different
terminology and command names, which only confuses the game. I think I'd rather
learn the git commands and options that I need from well-written documentation
and ignore the rest, than confuse myself with Cogito. After only an hour or two
of experience, I may be really missing the boat, so you might want to check it
out anyway.

Compared to darcs, git feels very familiar. darcs' UI is more polished, but git
has a much richer set of commands. One primary difference is that git does
branching, whereas in darcs a branch is basically a new copy of the repository.
Both are valid approaches, but the git approach does take less disk space (and
network bandwidth) which is important for 300-400 megabyte repositories, like
the kernel. I think I will probably continue using darcs primarily, but I will
probably try out git more in earnest when I get a chance to see if it might be
time to switch.
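
To make the branching point concrete, here is a minimal sketch using a current git (command spellings have changed since this was written). The throwaway directory, the placeholder identity, and the branch name region-plugins are all made up for the demo. It shows a branch being created, committed to, and merged back, all inside a single repository with no second copy on disk.

```shell
#!/bin/sh
# Sketch: a git branch lives inside the one repository; no extra copy is made.
set -e

repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity (assumption)
git config user.name "Demo"

echo hello > foo.txt
git add foo.txt
git commit -q -m 'hello'
main_branch=$(git rev-parse --abbrev-ref HEAD)   # whatever the default branch is called

# Create a branch and switch to it; the branch name is hypothetical.
git checkout -q -b region-plugins
echo goodbye >> foo.txt
git commit -q -a -m 'goodbye'

# Switch back: the working tree returns to the state of the original branch.
git checkout -q "$main_branch"
lines_on_main=$(wc -l < foo.txt | tr -d ' ')

# Merge the branch back in.
git merge -q region-plugins
lines_after_merge=$(wc -l < foo.txt | tr -d ' ')

echo "lines before merge: $lines_on_main, after merge: $lines_after_merge"
```

In darcs the equivalent would be a second checkout of the repository and a pull between the two directories.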

One thing I do not like about git is the tendency people have for providing
their driver for hardware X in the Linux kernel as a git repository. Hello!
Nobody wants to download 350MB for your 10k of changes to the kernel. Nobody
wants to run your bleeding edge git repository just so they can get your
driver. That's idiotic. git makes it easy to make patches against whatever
revision you want, so make patches against the latest stable kernel, or
whatever RC version if necessary, and call it good. Providing access to your
git repo is fine for developers, but don't expect it of users.
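
For driver authors wondering what that looks like, here is a rough sketch with a current git. It builds a toy repository, so the file names and the tag v1.0 (standing in for the latest stable kernel release) are invented for illustration. The interesting part is the last few lines: git format-patch writes one patch file per commit made since the tag, and git diff produces a single combined patch.

```shell
#!/bin/sh
# Sketch: publish a small change as patches against a tagged release.
set -e

repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity (assumption)
git config user.name "Demo"

echo 'core' > kernel.c
git add kernel.c
git commit -q -m 'stable release'
git tag v1.0                               # hypothetical tag for a stable release

echo 'driver' > driver.c
git add driver.c
git commit -q -m 'add driver for hardware X'

# One patch file per commit since the tag, written to a separate directory.
outdir=$(mktemp -d)
git format-patch -q -o "$outdir" v1.0
patch_count=$(ls "$outdir" | wc -l | tr -d ' ')

# Or a single combined diff against the tag.
git diff v1.0 HEAD > "$outdir/driver.diff"

echo "wrote $patch_count patch file(s) to $outdir"
```

Users can then apply the result to their own tree with patch -p1 and never touch the repository.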

In summary, I will be using git for my kernels from now on. It will be easier
to move around between versions, save me disk space, and allow me to do minor
hacks without worrying about them surviving the next kernel upgrade patch. If I
didn't already know and love darcs, I'd start using git for my projects. If you
have wanted to investigate distributed revision control, check out darcs and
git and go with whichever your gut tells you to; I think either one will serve
your needs well. For heaven's sake, stop using CVS in any case!
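
As a sketch of what carrying a minor hack across an upgrade might look like with a current git: the toy repository, the tags v1 and v2 (standing in for kernel releases), and the branch name my-hacks are all assumptions for the demo, and rebasing is only one way to do it (merging the new tag into the branch would also work).

```shell
#!/bin/sh
# Sketch: keep a local hack on its own branch and replay it on a newer release.
set -e

repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity (assumption)
git config user.name "Demo"

echo 'release 1' > version.txt
git add version.txt
git commit -q -m 'release 1'
git tag v1                                 # hypothetical release tag
mainline=$(git rev-parse --abbrev-ref HEAD)

# The local hack goes on its own branch, started from the release tag.
git checkout -q -b my-hacks v1
echo 'tweak' > hack.txt
git add hack.txt
git commit -q -m 'my minor hack'

# Upstream moves on to the next release.
git checkout -q "$mainline"
echo 'release 2' > version.txt
git commit -q -a -m 'release 2'
git tag v2                                 # hypothetical release tag

# Replay the hack on top of the new release.
git checkout -q my-hacks
git rebase -q v2
version_now=$(cat version.txt)
hack_kept=$(cat hack.txt)

echo "now on '$version_now' with hack '$hack_kept'"
```

Moving between versions is the same idea without the rebase: git checkout v1 or git checkout v2 swaps the working tree in place.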
