Still no partial checkouts in Git

Update: Git 1.7.0 now supports "sparse" checkouts (quick overview).

Richard Fine has an excellent blog post discussing why it's time to stop using Subversion. I was struck by the similarities between his reasons for leaving SVN in the dust and mine:

  • committing == publishing
  • reliance on a central server
  • merges
  • filesystem operations

It's worth reading his comments on each, and I wondered if he and I ended up switching to the same VCS - and if so, whether we did so for the same reasons. Well, it turns out to be the case: Richard is now using Git, but was initially wary:

When I first heard about Git, I didn’t really get it. Distributed version control seemed like a really bad idea – whose copy of the code is authoritative? If bugs are logged against a particular build, how do you track down the code that was used to create it? The loss of things like sequential revision numbers seemed like a big drawback. And branching wasn’t something I ever used in Subversion, so I didn’t see why I should care about it in Git.

When I first picked up Git, I didn't understand how a distributed system worked. As unhappy as I was with relying on a central Subversion server, I didn't see how exchanging patches between peers would work well, and I didn't understand how an authoritative repository emerged from the network of peers. Well, I've gotten over that, and the other misgivings Richard mentions. But there was one issue I had which he doesn't mention: Subversion lets you check out part of the source tree; Git doesn't.

In the past, this capability has been really useful for setting up MediaWiki, and I imagine it would work for other web applications as well where all you really need is the tree of files. This makes upgrading trivial - svn up and run an update script and you're done, typically.

Git explicitly does not support partial checkouts. Not only that, its developers don't apologize for it, and they fail to see the utility. I hope this has changed recently - it looks like there might have been some movement in that direction with "shallow clone" and "sparse checkout", but it isn't clear to me that these can do what SVN used to do for this use case.

Git's interface has vastly improved since the early days, but the interface isn't the only thing that needs improvement. Explicitly supporting common use cases should be a priority as well.

Comment from Kent Fredric - April 5, 2011 at 11:20 pm

Being reasonably familiar with Git, I don't think it is technically very feasible ( at least without a *LOT* of work ) to make a clone of a subtree. At least, not in a way that can be committed to. ( This same caveat applies to shallow clones: being shallow adds limitations on what you can do with the repository, and basically limits its utility to being about as useful as getting a tar file of a revision or a set of revisions ).

At its core, Git is just a graph structure of objects. And objects have types which can only be known by inspecting that object, and those objects in turn refer to other objects.

Every branch is really just a named reference to a "commit" object. And committing more commits to a branch merely changes what commit the branch refers to.
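As a rough illustration of the point above, here is a toy Python model in which a branch is nothing more than a name pointing at a commit. The dictionaries and the `commit` helper are made up for this sketch; real Git names commits by hash and stores refs on disk.

```python
# Toy model only: ids like "c1" stand in for real commit hashes.
commits = {}   # commit id -> parent commit id (None for the first commit)
branches = {}  # branch name -> commit id the branch currently refers to

def commit(branch, commit_id):
    """Record a new commit on top of the branch's current commit, then move the branch."""
    commits[commit_id] = branches.get(branch)  # parent is wherever the branch pointed
    branches[branch] = commit_id               # "committing" just re-points the name

commit("master", "c1")
commit("master", "c2")
print(branches["master"])  # c2
print(commits["c2"])       # c1
```

Nothing about the older commit changes when the branch moves; only the reference does.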

Every commit object refers to its parent commit object(s) and to a "tree" object, and "tree" objects in turn refer to other tree objects or file objects.

Which means that the state of a given "subtree" at a given commit point requires a path from the branch reference, to the commit, to the tree, and through its parent trees in order to exist.

So you can't directly know which "tree" object describes the contents of the subtree at, say, commit "5": you first have to load its parent tree, and to find that you must have the commit object, and to reach that commit you must walk back through the commits that lead to it, starting from a branch head.
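The chain described above can be sketched in Python. This is a hypothetical in-memory object store, not Git's real storage format: the ids, the `objects` and `refs` dictionaries, and `resolve_path` are all invented for illustration.

```python
# Hypothetical object store; ids are readable labels, not real Git hashes.
objects = {
    "blob-readme": ("blob", b"hello"),
    "blob-main":   ("blob", b"print('hi')"),
    "tree-src":    ("tree", {"main.py": "blob-main"}),
    "tree-root":   ("tree", {"README": "blob-readme", "src": "tree-src"}),
    "commit-1":    ("commit", {"tree": "tree-root", "parents": []}),
}
refs = {"master": "commit-1"}

def resolve_path(branch, path):
    """Walk ref -> commit -> root tree -> nested trees and return the id found at path."""
    kind, commit = objects[refs[branch]]
    if kind != "commit":
        raise ValueError("branch does not point at a commit")
    current = commit["tree"]
    for name in [part for part in path.split("/") if part]:
        kind, entries = objects[current]
        if kind != "tree":
            raise NotADirectoryError(name)
        current = entries[name]  # each step needs the parent tree to be loaded
    return current

print(resolve_path("master", "src"))  # tree-src
```

There is no shortcut to `tree-src`: the only way to learn its id is to read the root tree, and the only way to find the root tree is through the commit.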

This is fine for shallow clones, because you can connect to the server, look up the reference for the head of the branch, and then walk backwards down the commit objects till you're happy, and fetch the respective trees/file objects.
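A minimal sketch of that backwards walk, assuming a simple linear history. The `history` table and `shallow_walk` function are hypothetical and only model which commits and trees a depth-limited fetch would ask for.

```python
# Hypothetical linear history: commit id -> (parent id or None, root tree id).
history = {
    "c1": (None, "t1"),
    "c2": ("c1", "t2"),
    "c3": ("c2", "t3"),
    "c4": ("c3", "t4"),
}

def shallow_walk(head, depth):
    """Return the (commit, tree) pairs needed for the newest `depth` commits, newest first."""
    wanted = []
    current = head
    while current is not None and len(wanted) < depth:
        parent, tree = history[current]
        wanted.append((current, tree))
        current = parent  # step back one commit
    return wanted

print(shallow_walk("c4", 2))  # [('c4', 't4'), ('c3', 't3')]
```

The walk can stop whenever it likes, which is why limiting depth is cheap.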

But in order to fetch a subtree, you must do the above, as far back as history goes, and then work out what commits touch the given folder, by downloading every commit, and then walking down the graph of tree objects till you find the right tree object(s) that represent your directory, and then make sure you have enough of them.
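To make the cost concrete, here is a toy Python sketch of finding the commits that touch one directory. The data and function names are assumptions made for this example; it handles only a linear history and top-level directories.

```python
# commit id -> (parent id or None, root tree id); trees map entry names to child ids.
commits = {
    "c1": (None, "r1"),
    "c2": ("c1", "r2"),  # changes docs only
    "c3": ("c2", "r3"),  # changes src
}
trees = {
    "r1": {"src": "s1", "docs": "d1"},
    "r2": {"src": "s1", "docs": "d2"},
    "r3": {"src": "s2", "docs": "d2"},
}

def subtree_id(commit_id, name):
    """Id of the top-level directory `name` at a commit, or None if it is absent."""
    _, root = commits[commit_id]
    return trees[root].get(name)

def commits_touching(head, name):
    """Visit every commit from head back to the start; keep those where the directory's id changed."""
    touched = []
    current = head
    while current is not None:
        parent, _ = commits[current]
        before = subtree_id(parent, name) if parent is not None else None
        if subtree_id(current, name) != before:
            touched.append(current)
        current = parent
    return touched

print(commits_touching("c3", "src"))   # ['c3', 'c1']
print(commits_touching("c3", "docs"))  # ['c2', 'c1']
```

Even though only two commits matter for `src`, the loop has to load every commit and every root tree to find that out.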

So quite simply, the result costs the same bandwidth as a shallow clone in the best case, and as much as a full clone in the worst case. And because of this design, you can't commit back to it, so that worst case may have to be repeated multiple times.

It's just simpler to mirror a Git repo as needed and then select which folders you want to process/use on the receiving side.

I've long since come to terms with this as an acceptable thing though, and I believe it's one of those things you only crave because "Subversion has it", and it's not really *that* useful.

If your argument involves wanting to have multiple projects inside the one tree, then you're doing your source control wrong. You should have a separate repository per project.

If you want one project to incorporate another project somehow as a "subtree", then you should use Git submodules to do this ( a bit like Subversion externals ).