Thomas Koch

Dubio Sapientiae Initium.

NoSQL summer at Lake Constance

Have you already started to try out the new storage and database systems commonly referred to as NoSQL? (For you, Googlebot: I'm talking about CouchDB, Cassandra, HBase, Hadoop, Hypertable, MongoDB, Tokyo Cabinet, etc.) Well, it's one thing to install and use them, but another to understand the computer science behind them.
Therefore many developers around the world decided to make this summer a NoSQL summer. Local meetings are held in many cities: London, Los Angeles, New York, Paris, Lake Constance ... :-)
If you'd like to meet for a beer and learn about and discuss some of the hottest stuff in computing, then come around! Please select the dates when you can join and the papers you'd like to discuss. You can subscribe to email announcements or iCal and RSS feeds at the Lake Constance NoSQL summer page.

udev ate my laptop today?

I finally got punished today for using unstable without knowing enough about my system. :-) Since I can't access my emails I'm hoping for help from planet-debian. Please excuse...

So the last words of my machine:

Loading, please wait...
  One or more specified logical volume(s) not found.
Unable to find LVM volume mylvm/root_crypt
  One or more specified logical volume(s) not found.
Unable to find LVM volume mylvm/swap_crypt
File descriptor 3 (/conf/conf.d/cryptroot) leaked on lvm invocation. Parent PID
352: /bin/sh
udevd-work[77]: kernel-provided name 'dm-0' and NAME= 'mapper/mylvm-swap'
disagree, please use SYMLINK+= or change the kernel to provide the proper name
_

Please, if you can help me, write a comment to this blogpost!
(Thank you, Hanno, for lending me your laptop!)

Update: It wasn't udev (this time). Sorry. The problem was that I updated libdevmapper without also updating dmsetup. I could boot into an older kernel and solve this. A bug against the LVM Debian package has already been filed.

tnt is not topgit

As I've already written, I'm working on an alternative to topgit. I made a first attempt in Perl some weeks ago, but gave up after some frustrating hours. Yesterday I started again in Python and had a very nice time putting together the groundwork and the first two commands.
Note that I have no previous programming experience in either Perl or Python!
By now, I can create a patchset branch and add a patch branch to it. There's still a lot to do. For my talk at the Debian Mini Conference in Berlin next month I'd like to be able to update patch branches, export patchsets and give a status summary.
Maybe I can already find somebody who's interested in joining me with this project? The code is in my github account, however the name will most probably change.
One reason that I've been much faster in Python is the fantastic python-git library. I can only recommend it!
In other news: I'm looking for a couch to surf in Berlin from June 7 to 12. I prefer couchsurfing over hotels, mostly to get to know nice people around the world. Please contact me if you'd like to host me for a night or two. (thomas at koch punkt ro)


Update: Slides of my talk at the debconf are available.

Design document for a patch management system on a DVCS

Dear friends of Debian,

this is my first post to Planet Debian, the planet with the most geeky registration procedure in the known universe!

I proposed an alternative to topgit some days ago on the vcs-pkg.org list. Martin asked (and encouraged) me to give a better explanation of the idea, which I'll try to do here. Sorry for not providing any drawings, but I'm totally incapable of anything graphical.

Hopefully, I'll manage to come to the Debian Miniconf in Berlin. Then we could discuss the idea further and maybe even start implementing it. (Somebody would need to help me with my first steps in Perl then...)

The following text is available on github. Please help me expand it!

Design document for a patch management system on a DVCS


Requirements

The system to implement manages patchsets. A patchset is a set of patches with a tree-ish dependency graph between the patches. There's one distinct root of this dependency graph.

Patches are managed as branches with each branch representing a patch. Modification of a patch is done by a commit to the respective branch. A branch representing a patch as part of a patchset is called a patchbranch.

The patch of a patchbranch is created as the diff between the root of the patchbranch and the head.
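
On Git, producing such a patch could look roughly like the following. This is only a minimal sketch: the function names `patch_diff_command` and `patch_of` are hypothetical and do not come from any existing tool, and the root and head are assumed to be given as ordinary Git revision names.

```python
import subprocess


def patch_diff_command(root, head):
    """Build the git command that diffs the patchbranch head against its root."""
    # Plain two-revision diff: everything that changed from root to head.
    return ["git", "diff", root, head]


def patch_of(root, head, repo_dir="."):
    """Run the diff inside repo_dir and return the patch text (needs git installed)."""
    result = subprocess.run(
        patch_diff_command(root, head),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

If the root is a branch that has moved on since the patchbranch was started, `git diff root...head` (three dots) compares against the merge base instead, which may be closer to what is wanted here.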

The most important management methods are:

  • Export a patchset in different formats
    • quilt
    • a merged commit of all patches
    • a line of commits with each commit representing one patch
  • Update a patchset against an updated root.
  • Copy a patchset
  • Delete a patchset from direct visibility while preserving all history about it
  • Hide and unhide a patchset from direct visibility
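
For the export formats listed above, the patches have to be brought into an order that respects the dependency graph. The following is a minimal sketch of that ordering step, using Python's standard `graphlib` module (Python 3.9 or newer). The function names and the way patch file names are derived from branch names are assumptions made for illustration only.

```python
from graphlib import TopologicalSorter


def export_order(dependencies):
    """Return patch names so that every patch comes after its dependencies.

    `dependencies` maps a patch name to the list of patches it depends on;
    patches that depend only on the root map to an empty list.
    """
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(dependencies).static_order())


def quilt_series(dependencies):
    """Render the order as the text of a quilt `series` file, one patch per line."""
    # Assumption: "/" in branch names is replaced by "_" to get flat file names.
    return "".join(
        name.replace("/", "_") + ".patch\n" for name in export_order(dependencies)
    )
```

The same order could be used for the line-of-commits export, applying one patch per commit.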

Additional requirements:

  • The system should be implementable on top of Git, Mercurial and possibly Bazaar.
  • The system must easily cope with multiple different and independent patchsets.
  • All information about a patchset must be encoded in one distinct branch. Publishing this one branch must be sufficient to allow somebody else to recreate the patchset with all of its patchbranches.
  • The system should not rely on the presence of hooks.
  • The system should not require the addition of management files in patch branches (like .topmsg and .topdeps in topgit)
  • The system must be easy to understand for a regular user of the underlying DVCS.
  • The implementation may allow a patchset to depend on one or more other patchsets.

Implementation

Patchset meta branch

A patchset meta branch holds all information about one patchset. First, it holds references to the top commits of all patch branches in the form of parent references of commits. Thus pushing the patchset meta branch automatically also pushes all commits of all patch branches.

Secondly, the patchset meta branch contains meta information about the patchset. This meta information consists of:

  • The names of all patch branches together with the most recent commit identifier of a particular patch branch. Let's save this information in a file called branches.
  • A message for each patch branch that explains the patch. These messages can be saved in the file tree as msg/${PATCH-BRANCH-NAME}
  • References to the dependencies of the patch (other patches of the same patchset or the root of the patchset). This is also encoded in the file branches.

Since the patchset meta branch holds all this information, it is possible to delete all patch branches and recreate them from it.
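
On Git, recreating the patch branches could be as simple as pointing each branch name back at its recorded commit. A minimal sketch, assuming the names and commit identifiers have already been read from the branches file into a dictionary; the function name `recreate_branch_commands` is hypothetical.

```python
def recreate_branch_commands(branches):
    """Build one `git branch -f` command per recorded patch branch.

    `branches` maps a branch name to the commit id recorded for it.
    """
    # `git branch -f <name> <commit>` creates the branch or resets it to <commit>.
    return [["git", "branch", "-f", name, commit] for name, commit in branches.items()]
```

Each returned command can be handed to `subprocess.run` inside the repository.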

Although the commits of the patchset meta branch hold references to the patch branches, its file tree does not need to contain any files from the referenced patches. This may confuse the underlying DVCS, but the patchset meta branch is not meant to be inspected directly.
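
With Git plumbing, such a commit could be created with `git commit-tree`, which accepts an arbitrary tree and any number of `-p` parents. The sketch below only builds the command line. Using the previous meta branch commit as the first parent is an assumption of this example, as is the function name `meta_commit_command`.

```python
def meta_commit_command(tree, previous_meta_commit, patch_heads, message):
    """Build a `git commit-tree` call for a new patchset meta branch commit."""
    command = ["git", "commit-tree"]
    # Assumed first parent: the previous commit of the meta branch itself.
    command += ["-p", previous_meta_commit]
    # Further parents: the current head of every patch branch, so that
    # pushing the meta branch also transfers the patch branch commits.
    for head in patch_heads:
        command += ["-p", head]
    command += ["-m", message]
    # The tree holds only the meta files (branches, msg/...), not the patched sources.
    command.append(tree)
    return command
```

`git commit-tree` prints the id of the new commit, which would then be stored as the new tip of the meta branch, for example with `git update-ref`.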


The branches file

A branches file for a fictitious patchset could look like:

# patch branches without an explicit dependency depend on the root of the
# patchset tree
# A root can be given as either a fixed commit (seen here), a branch or a tag.
# A fixed commit or tag is useful to maintain a patchset against an older
# upstream version
ROOT: 6a8589de32d490806ab86432a3181370e65953ca
# A tag as a dependency
#ROOT: upstream/0.1.2
# A branch as a dependency
#ROOT: upstream

# A regular patch with its name and last commit
BRANCH: debian/use-debian-jars-in-build-xml 4bab542c261ff1a1ae87151c3536f19ef02d7937

# two other regular patches
BRANCH: upstream-jira/HDFS-1234 a8e4af13106582ca1bfbbcaeb0537f73faf46d87
BRANCH: upstream-jira/MAP-REDUCE-007 e3426bcbcb2537478f851edcf6eb04b34f6c7106

# This patch depends on the above two patches
# The sha1 below the dependency patches references a merge commit of the two
# dependencies
BRANCH: upstream-jira/HDFS-008 517851aa829d77e09bc5e59985fed1b0aa339cc6
DEPENDENCIES:
  upstream-jira/HDFS-1234
  upstream-jira/MAP-REDUCE-007
    cc294f2e4773c4ff71efb83648a0e16835fca841

# A patch branch that belongs to the patchset, but won't get exported (yet)
BRANCH: upstream-jira/HDFS-9999 74257905azgsa4689bc5e59985fed1b0aa339cc6
BRANCH-FLAGS: noexport



ZooKeeper for web developers

Have you ever developed any kind of distributed system? When doing so for the first time, you're very likely to fall into the trap of the Fallacies of Distributed Computing. I've done so, you'll do so too.
ZooKeeper is an application that helps you implement many distributed protocols on top of it. The hard work of implementing fault tolerance, assuring consistency and that kind of stuff is done by ZooKeeper in the background.
