author | Alberto Bertogli
<albertito@gmail.com> 2004-10-10 23:58:15 UTC |
committer | Alberto Bertogli
<albertito@gmail.com> 2007-07-15 13:25:37 UTC |
parent | a5ee5df2a3021f0bb991c261c0feaf09328e866e |
doc/tids | +83 | -0 |
diff --git a/doc/tids b/doc/tids new file mode 100644 index 0000000..bf65f23 --- /dev/null +++ b/doc/tids @@ -0,0 +1,83 @@ + +Transaction ID assignment procedure +Alberto Bertogli (albertogli@telpin.com.ar) +4/October/2004 +--------------------------------------------- + +This brief document describes how libjio assigns an unique number to each +transaction that identifies it univocally during its lifetime. + +It is a very delicate issue, because the rest of the library depends on the +uniqueness of the ID; it has to be coherent across threads and procesess; and +it can't take long: it serializes transaction creation (and it's the only +contention point for independent non-overlapping transactions). + + +Description +----------- + +We have two functions: get_tid() and free_tid(), which return a new +transaction ID, and mark a given transaction ID as no longer in use, +respectively. + +The main piece of the mechanism is the lockfile: a file named "lock" which +holds the maximum transaction ID in use. This file gets opened and mmap()'ed +for faster use inside jopen(). That way, we can treat it directly as an +integer holding the max tid. + +To avoid paralell modifications, we will always lock the file with fcntl() +before accessing it. + +Let's begin by describing how get_tid() works, because it's quite simple: it +locks the lockfile, gets the max tid, adds 1 to it, unlock the file and return +that value. That way, the new tid is always the new max, and with the locking +we can be sure it's impossible to assign the same tid to two different +transactions. + +After a tid has been assigned, the commit process will create a file named +after it inside the journal directory. Then, it will operate on that file all +it wants, and when the moment comes, the transaction is no longer needed and +has to be freed. + +The first thing we do is to unlink that transaction file. And then, we call +free_tid(), which will update the lockfile to represent the new max tid, in +case it has changed. + +free_tid() begins by checking that if the transaction we're freeing is the +greatest, and if not, just returns. + +But if it is, we need to find out the new max tid. We do it by "walking" the +journal directory looking for the file with the greatest number, and that's +our new max tid. If there are no files, we use 0. + + +Things to notice +---------------- + +The following is a list of small things to notice about the mechanism. They're +useful because races tend to be subtle, and I _will_ forget about them. The +descriptions are not really detailed, just enough to give a general idea. + + +* It is possible that we get in free_tid() and the transaction we want to free +is greater than the max tid. In that case, we do nothing: it's a valid +situation. How to get there: two threads about to free two tids. The first one +calls unlink() and just after its return (before it gets a chance to call +free_tid()), another thread, the holder of the current max, steps in and +performs both the unlink() and free_tid(), which would force a lookup to find +a new tid, and as in the first thread we have removed the file, the max tid +could be lower (in particular, it could be 0). This is why we only test for +equalty. + +* Unlink after free_tid() is not desirable: in that case, it'd be normal for +the tid to increment even if we have only one thread writing. It overflows +quite easily. + +* The fact that new tids are always bigger than the current max is not only +because the code is cleaner and faster: that way when recovering we know the +order to apply transactions. A nice catch: this doesn't matter if we're +working with non-overlapping transactions, but if they overlap, we know that +it's impossible that transaction A and B (B gets committed after A) get +applied in the wrong order, because B will only begin to commit _after_ A has +been worked on. +