Transaction ID assignment procedure Alberto Bertogli (albertogli@telpin.com.ar) 4/October/2004 --------------------------------------------- This brief document describes how libjio assigns an unique number to each transaction that identifies it univocally during its lifetime. It is a very delicate issue, because the rest of the library depends on the uniqueness of the ID; it has to be coherent across threads and procesess; and it can't take long: it serializes transaction creation (and it's the only contention point for independent non-overlapping transactions). Description ----------- We have two functions: get_tid() and free_tid(), which return a new transaction ID, and mark a given transaction ID as no longer in use, respectively. The main piece of the mechanism is the lockfile: a file named "lock" which holds the maximum transaction ID in use. This file gets opened and mmap()'ed for faster use inside jopen(). That way, we can treat it directly as an integer holding the max tid. To avoid paralell modifications, we will always lock the file with fcntl() before accessing it. Let's begin by describing how get_tid() works, because it's quite simple: it locks the lockfile, gets the max tid, adds 1 to it, unlock the file and return that value. That way, the new tid is always the new max, and with the locking we can be sure it's impossible to assign the same tid to two different transactions. After a tid has been assigned, the commit process will create a file named after it inside the journal directory. Then, it will operate on that file all it wants, and when the moment comes, the transaction is no longer needed and has to be freed. The first thing we do is to unlink that transaction file. And then, we call free_tid(), which will update the lockfile to represent the new max tid, in case it has changed. free_tid() begins by checking that if the transaction we're freeing is the greatest, and if not, just returns. But if it is, we need to find out the new max tid. We do it by "walking" the journal directory looking for the file with the greatest number, and that's our new max tid. If there are no files, we use 0. Things to notice ---------------- The following is a list of small things to notice about the mechanism. They're useful because races tend to be subtle, and I _will_ forget about them. The descriptions are not really detailed, just enough to give a general idea. * It is possible that we get in free_tid() and the transaction we want to free is greater than the max tid. In that case, we do nothing: it's a valid situation. How to get there: two threads about to free two tids. The first one calls unlink() and just after its return (before it gets a chance to call free_tid()), another thread, the holder of the current max, steps in and performs both the unlink() and free_tid(), which would force a lookup to find a new tid, and as in the first thread we have removed the file, the max tid could be lower (in particular, it could be 0). This is why we only test for equalty. * Unlink after free_tid() is not desirable: in that case, it'd be normal for the tid to increment even if we have only one thread writing. It overflows quite easily. * The fact that new tids are always bigger than the current max is not only because the code is cleaner and faster: that way when recovering we know the order to apply transactions. A nice catch: this doesn't matter if we're working with non-overlapping transactions, but if they overlap, we know that it's impossible that transaction A and B (B gets committed after A) get applied in the wrong order, because B will only begin to commit _after_ A has been worked on.