author | Alberto Bertogli
<albertito@gmail.com> 2004-05-30 15:47:56 UTC |
committer | Alberto Bertogli
<albertito@gmail.com> 2007-07-15 12:49:42 UTC |
parent | 0d6147432920e05db5bff24f08bd63f755c0ccab |
doc/libjio.html | +444 | -0 |
diff --git a/doc/libjio.html b/doc/libjio.html new file mode 100644 index 0000000..c1f99d9 --- /dev/null +++ b/doc/libjio.html @@ -0,0 +1,444 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> + +<!--Converted with LaTeX2HTML 2002-2-1 (1.70) +original version by: Nikos Drakos, CBLU, University of Leeds +* revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan +* with significant contributions from: + Jens Lippmann, Marek Rouchal, Martin Wilck and others --> +<HTML> +<HEAD> +<TITLE>libjio - A library for journaled I/O </TITLE> +<META NAME="description" CONTENT="libjio - A library for journaled I/O "> +<META NAME="keywords" CONTENT="libjio"> +<META NAME="resource-type" CONTENT="document"> +<META NAME="distribution" CONTENT="global"> + +<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> +<META NAME="Generator" CONTENT="LaTeX2HTML v2002-2-1"> +<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css"> + +<LINK REL="STYLESHEET" HREF="libjio.css"> + +</HEAD> + +<BODY > +<!--Navigation Panel--> +<IMG WIDTH="81" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next_inactive" + SRC="file:/usr/lib/latex2html/icons/nx_grp_g.png"> +<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" + SRC="file:/usr/lib/latex2html/icons/up_g.png"> +<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" + SRC="file:/usr/lib/latex2html/icons/prev_g.png"> +<BR> +<BR> +<BR> +<!--End of Navigation Panel--> + +<P> + +<P> + +<P> +<H1 ALIGN="CENTER">libjio - A library for journaled I/O </H1> +<DIV> + +<P ALIGN="CENTER"><STRONG>Alberto Bertogli (albertogli@telpin.com.ar) </STRONG></P> +</DIV> +<BR> + +<H2><A NAME="SECTION00010000000000000000"> +Contents</A> +</H2> +<!--Table of Contents--> + +<UL> +<LI><A NAME="tex2html12" + HREF="libjio.html#SECTION00020000000000000000">1 Introduction</A> +<LI><A NAME="tex2html13" + HREF="libjio.html#SECTION00030000000000000000">2 General on-disk data organization</A> +<UL> +<LI><A NAME="tex2html14" + 
HREF="libjio.html#SECTION00031000000000000000">2.1 The transaction file</A> +</UL> +<BR> +<LI><A NAME="tex2html15" + HREF="libjio.html#SECTION00040000000000000000">3 The commit procedure</A> +<LI><A NAME="tex2html16" + HREF="libjio.html#SECTION00050000000000000000">4 The rollback procedure</A> +<LI><A NAME="tex2html17" + HREF="libjio.html#SECTION00060000000000000000">5 The recovery procedure</A> +<LI><A NAME="tex2html18" + HREF="libjio.html#SECTION00070000000000000000">6 High-level functions</A> +<LI><A NAME="tex2html19" + HREF="libjio.html#SECTION00080000000000000000">7 ACID (or How does libjio fit into theory)</A> +<LI><A NAME="tex2html20" + HREF="libjio.html#SECTION00090000000000000000">8 Working from outside</A> +</UL> +<!--End of Table of Contents--> + +<P> + +<H1><A NAME="SECTION00020000000000000000"> +1 Introduction</A> +</H1> + +<P> +<I>libjio</I> is a library for doing journaled transaction-oriented +I/O, providing atomicity guarantees and a simple to use but powerful +API. + +<P> +This document explains the design of the library, how it works internally +and why it works that way. You should read it even if you don't plan +to use the library in strange ways: it provides (or at least tries +to =) an inside view of how the library performs its job, which can +be very valuable knowledge when working with it. + +<P> +To the user, libjio provides two groups of functions: a UNIX-like one +that implements journaled versions of the classic functions (<I>open()</I>, +<I>read()</I>, <I>write()</I> and friends); and a lower-level one +that centers on transactions and allows the user to manipulate them +directly by providing means of committing and rolling back. The former, +as expected, is based on the latter and interacts safely with it. +Besides, the library is designed in a way that allows efficient and safe interaction +with I/O performed from outside the library, in case you want to do that. 
+ +<P> +The following sections describe the different concepts and procedures +that the library bases its work on. They are not intended to be a replacement +for reading the source code: please do so if you have any doubts, it's +not big at all (less than 800 lines, including comments) and I hope +it's readable enough. If you think that's not the case, please let +me know and I'll try to give you a hand. + +<P> + +<H1><A NAME="SECTION00030000000000000000"> +2 General on-disk data organization</A> +</H1> + +<P> +On the disk, the file you are working on will look exactly as you +expect, without a single bit different from what you would get +using the regular API. But, besides the working file, you will find +a directory named after it where the journaling information lives. + +<P> +Inside, there are two kinds of files: the lock file and transaction +files. There is only one lock file, which is used as a general lock and +holds the next transaction ID to assign. Each transaction file holds +one transaction, composed of a fixed-size header and a +variable-size payload, and there can be as many of them as there are in-flight transactions. + +<P> +This imposes some restrictions on the kind of operations you can perform +on a file while it's being used: you can't move it (because +the journal directory name depends on the filename) and you can't +unlink it (for similar reasons). + +<P> +These warnings are no different from those for normal simultaneous use under +classic UNIX environments, but they are here to remind you that even +though the library guarantees a lot and makes many things easier for its user +(especially in complex cases, like multiple threads using the file +at the same time), you should still be careful when doing strange +things with files while working on them. + +<P> + +<H2><A NAME="SECTION00031000000000000000"> +2.1 The transaction file</A> +</H2> + +<P> +The transaction file is composed of two main parts: the header and +the payload. 
+ +<P> +The header holds basic information about the transaction itself, including +the ID, some flags, the offset to commit to and the length of the +data. The payload holds the data, in three parts: user-defined data, +previous data, and real data. + +<P> +User-defined data is not used by the library itself, but it's a space +where the user can save private data that can be useful later. Previous +data is saved by the library prior to applying the commit, so transactions +can be rolled back. Real data is just the data to save to the disk, +and it is saved so that if a crash occurs while we are applying +the transaction we can recover gracefully. + +<P> + +<H1><A NAME="SECTION00040000000000000000"> +3 The commit procedure</A> +</H1> + +<P> +We call "commit" the action of <I>safely</I> +and <I>atomically</I> writing some given data to the disk. + +<P> +The former, <I>safely</I>, means that after a commit has been done +we can assume the data will not get lost and can be retrieved, unless +of course some major event happens (like a hardware failure). For +us, this means that the data was effectively written to the disk and +if a crash occurs after the commit operation has returned, the operation +will be complete and data will be available from the file. + +<P> +The latter, <I>atomically</I>, guarantees that the operation is either +completely done, or not done at all. This is a really common word, +especially if you have worked with multiprocessing, and should be quite +familiar. We implement atomicity by combining fine-grained locks and +journaling, which together assure us that we can recover from crashes +and that we have exclusive access to a portion of the file without +any other transaction overlapping it. 
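As a rough illustration of the fine-grained locking mentioned above, here is a minimal sketch of locking a byte range of a file with POSIX fcntl() advisory locks. The name sketch_range_lock() is made up for this example and is not part of libjio; how the library itself takes its locks is best checked in its source code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not a libjio function.
 * Locks or unlocks the byte range [offset, offset + len) of fd using a
 * POSIX advisory lock. type is F_RDLCK, F_WRLCK or F_UNLCK.
 * F_SETLKW blocks until a conflicting lock held by another process is
 * released. Returns 0 on success, -1 on error (errno is set by fcntl). */
int sketch_range_lock(int fd, short type, off_t offset, off_t len)
{
    /* l_len == 0 would mean "until the end of the file"; this sketch
     * only deals with explicit ranges. */
    assert(len > 0);

    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = offset,
        .l_len = len,
    };

    return fcntl(fd, F_SETLKW, &fl);
}
```

A writer would call it with F_WRLCK before touching the range and with F_UNLCK afterwards. Keep in mind that these locks are advisory and per-process: they only protect against other processes that also ask for the lock.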
+ +<P> +Well, so much for talking, now let's get real; libjio applies commits +in a very simple and straightforward way, inside <I>jtrans_commit()</I>: + +<P> + +<UL> +<LI>Lock the section where the commit takes place +</LI> +<LI>Open the transaction file +</LI> +<LI>Write the header +</LI> +<LI>Write the user data (if any) +</LI> +<LI>Read the previous data from the file +</LI> +<LI>Write the previous data in the transaction +</LI> +<LI>Write the data to the file +</LI> +<LI>Mark the transaction as committed by setting a flag in the header +</LI> +<LI>Unlink the transaction file +</LI> +<LI>Unlock the section where the commit takes place +</LI> +</UL> +This may look like a lot of steps, but they are not as many as it seems +inside the code, and they allow recovery from interruptions at +every step of the way (or even in the middle of a step). + +<P> + +<H1><A NAME="SECTION00050000000000000000"> +4 The rollback procedure</A> +</H1> + +<P> +First of all, rolling back is like an ``undo'' of a commit: it returns the +data to the state it had exactly before a given commit was applied. +Due to the way we handle commits, doing this operation becomes quite +simple and straightforward. + +<P> +Earlier we said that each transaction holds, besides +the data to commit to the disk, the data that was on it before committing. +That data is saved precisely to be able to roll back. So, to roll back +a transaction all that has to be done is recover that ``previous +data'' from the transaction we want to roll back, and save it to the +disk. In the end, this amounts to a new transaction with the previous +data as the new one, so we do that: create a new transaction structure, +fill in the data from the transaction we want to roll back, and commit +it. All this is performed by <I>jtrans_rollback()</I>. + +<P> +By doing this we can provide the same guarantees a commit has, it's +really fast, eases the recovery, and the code is simple and clean. +What a deal. 
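To make the "a rollback is just another commit" idea concrete, here is a minimal sketch in C. Everything in it (struct sketch_trans, sketch_commit(), sketch_rollback()) is hypothetical and much simpler than the real jtrans_commit() and jtrans_rollback(): there is no journal file, no locking and no syncing, and it assumes the written range lies entirely inside the existing file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical in-memory transaction, NOT libjio's real structure. */
struct sketch_trans {
    off_t offset;       /* where the data goes in the file */
    size_t len;         /* length of both data and prev */
    const char *data;   /* bytes to write */
    char *prev;         /* bytes overwritten by the commit (malloc'd) */
};

/* Save the bytes about to be overwritten, then write the new ones.
 * Assumes [offset, offset + len) lies inside the file. A real commit
 * would also lock the range and write everything to the journal first.
 * Returns 0 on success, -1 on error. */
int sketch_commit(int fd, struct sketch_trans *t)
{
    assert(t != NULL && t->data != NULL);

    t->prev = malloc(t->len);
    if (t->prev == NULL)
        return -1;

    if (pread(fd, t->prev, t->len, t->offset) != (ssize_t) t->len ||
        pwrite(fd, t->data, t->len, t->offset) != (ssize_t) t->len) {
        free(t->prev);
        t->prev = NULL;
        return -1;
    }
    return 0;
}

/* Roll back t by committing a new transaction whose data is t's
 * previous data. The caller still owns t->prev. */
int sketch_rollback(int fd, const struct sketch_trans *t)
{
    struct sketch_trans undo = { t->offset, t->len, t->prev, NULL };
    int r = sketch_commit(fd, &undo);

    free(undo.prev);    /* free(NULL) is fine if the commit failed */
    return r;
}
```

Note that sketch_rollback() never writes to the file directly: it builds a new transaction and hands it to sketch_commit(), so whatever guarantees the commit path offers would apply to the rollback as well.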
+ +<P> +But be aware that rolling back is dangerous. And I really mean it: +you should <B><I>only</I></B> do it if you're really sure it's ok. +Consider, for instance, that you commit transaction A, then B, and +then you roll back A. If A and B happen to touch the same portion of +the file, the rollback will, of course, not return the state previous +to B, but previous to A. If it's not done safely, this can lead to +major corruption. Now, if you add to this transactions that extend +the file (so that rolling back truncates it), you not only have +corruption but data loss. So, again, be aware, I can't stress this +enough, <B><I>rollback only if you really really know what +you are doing</I></B>. + +<P> + +<H1><A NAME="SECTION00060000000000000000"> +5 The recovery procedure</A> +</H1> + +<P> +Recovering from crashes is done by the <I>jfsck()</I> call (or the +program <I>jiofsck</I>, which is just a simple invocation of that function), +which opens the file and goes through all transactions in the journal +(remember that transactions are removed from the journal directory +after they're applied), loading them and rolling them back if necessary. +There are several steps where it can fail: there could be no journal, +a given transaction file might be corrupted, incomplete, and so on; +but in the end, there are two cases regarding each transaction: either +it's complete and can be rolled back, or not. + +<P> +If the transaction is not complete, there is no possibility +that it has been partially applied to the disk: remember that, in +the commit procedure, we only apply the transaction <I>after</I> saving +it in the journal, so there is really nothing left to be done. And +if the transaction is complete, we only need to roll it back. + +<P> +In any case, after making the recovery you can simply remove the journal +entirely and let the library create a new one, and you can be sure +that transaction atomicity was preserved. 
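The "complete or not" decision described above can be sketched as a small C function. The header layout, the field names and sketch_classify() are all assumptions made up for this example; the real on-disk format and the checks done by jfsck() may well differ. The sketch also assumes that the previous data has the same length as the real data, which would not hold for a transaction that extends the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size header, NOT libjio's real on-disk format. */
struct sketch_hdr {
    uint32_t id;        /* transaction ID */
    uint32_t flags;     /* some flags */
    uint32_t ulen;      /* length of the user-defined data */
    uint32_t len;       /* length of the real data */
    uint64_t offset;    /* offset to commit to */
};

enum sketch_action {
    SKETCH_IGNORE,      /* incomplete: it was never applied to the file */
    SKETCH_ROLLBACK     /* complete: it may have been applied, undo it */
};

/* Decide what to do with the raw contents of one transaction file.
 * buf holds size bytes read from it. */
enum sketch_action sketch_classify(const unsigned char *buf, size_t size)
{
    struct sketch_hdr h;

    assert(buf != NULL);

    /* Not even a full header made it to the disk. */
    if (size < sizeof(h))
        return SKETCH_IGNORE;
    memcpy(&h, buf, sizeof(h));

    /* Payload: user data, then previous data, then real data. */
    uint64_t need = sizeof(h) + (uint64_t) h.ulen + 2 * (uint64_t) h.len;

    return size >= need ? SKETCH_ROLLBACK : SKETCH_IGNORE;
}
```

A recovery loop in this style would read every transaction file in the journal directory, call the classifier on its contents, and roll back only the transactions it reports as complete.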
+ +<P> + +<H1><A NAME="SECTION00070000000000000000"> +6 High-level functions</A> +</H1> + +<P> +We call <I>high-level functions</I> the ones provided by the library +that emulate the good old UNIX file manipulation calls. Most of them +are just wrappers around commits, and implement proper locking when +operating in order to allow simultaneous operations (either across +threads or processes). They are described in detail in the manual +pages; we'll only list them here for completeness: + +<P> + +<UL> +<LI>jopen() +</LI> +<LI>jread(), jpread(), jreadv() +</LI> +<LI>jwrite(), jpwrite(), jwritev() +</LI> +<LI>jtruncate() +</LI> +<LI>jclose() +</LI> +</UL> + +<P> + +<H1><A NAME="SECTION00080000000000000000"> +7 ACID (or How does libjio fit into theory)</A> +</H1> + +<P> +I haven't read much theory about this, and the library was implemented +basically by common sense and not theoretical study. + +<P> +However, I'm aware that database people like ACID (well, that's not +news for anybody ;), which they say means "Atomicity, Consistency, +Isolation, Durability" (yeah, right!). + +<P> +So, even though libjio is not purely a database thing, it can be used to +achieve those attributes in a simple and efficient way. + +<P> +Let's take a look one by one: + +<P> + +<UL> +<LI>Atomicity: In a transaction involving two or more discrete pieces +of information, either all of the pieces are committed or none are. +This has been discussed before and we've seen how the library achieves +this point, mostly based on locks and relying on a commit procedure. +</LI> +<LI>Consistency: A transaction either creates a new and valid state of +data, or, if any failure occurs, returns all data to its state before +the transaction was started. This, like atomicity, has been discussed +before, especially in the recovery section, where we saw how in case +of a crash we end up with a fully applied transaction, or no transaction +applied at all. 
+</LI> +<LI>Isolation: A transaction in process and not yet committed must remain +isolated from any other transaction. This comes as a side effect of +doing proper locking on the sections each transaction affects, and +guarantees that there can't be two transactions working on the same +section at the same time. +</LI> +<LI>Durability: Committed data is saved by the system such that, even +in the event of a failure and system restart, the data is available +in its correct state. For this point we rely on the disk as a method +of permanent storage, and expect that when we do synchronous I/O, data +is safely written and can be recovered after a crash. +</LI> +</UL> + +<P> + +<H1><A NAME="SECTION00090000000000000000"> +8 Working from outside</A> +</H1> + +<P> +If you want, and are careful enough, you can safely do I/O without +using the library. Here I'll give you some general guidelines that +you need to follow in order to prevent corruption. Of course you can +bend or break them according to your use; this is just a general overview +of how to interact from outside. + +<P> + +<UL> +<LI>Lock the sections you want to use: the library, as we have already +explained, relies on fcntl locking; so, if you intend to operate on +parts of the file while using it, you should lock them. +</LI> +<LI>Don't truncate, unlink or rename: these operations have serious implications +when they're done while using the library, because the library itself +assumes that names don't change, and files don't disappear beneath +it. It could potentially lead to corruption, although most of the +time you would just get errors from every call. 
+</LI> +</UL> + +<P> + +<H1><A NAME="SECTION000100000000000000000"> +About this document ...</A> +</H1> + <STRONG>libjio - A library for journaled I/O </STRONG><P> +This document was generated using the +<A HREF="http://www.latex2html.org/"><STRONG>LaTeX</STRONG>2<tt>HTML</tt></A> translator Version 2002-2-1 (1.70) +<P> +Copyright © 1993, 1994, 1995, 1996, +<A HREF="http://cbl.leeds.ac.uk/nikos/personal.html">Nikos Drakos</A>, +Computer Based Learning Unit, University of Leeds. +<BR> +Copyright © 1997, 1998, 1999, +<A HREF="http://www.maths.mq.edu.au/~ross/">Ross Moore</A>, +Mathematics Department, Macquarie University, Sydney. +<P> +The command line arguments were: <BR> + <STRONG>latex2html</STRONG> <TT>-no_subdir -split 0 -show_section_numbers /tmp/lyx_tmpdir2441q5CgGo/lyx_tmpbuf0/libjio.tex</TT> +<P> +The translation was initiated by root on 2004-04-30<HR> +<!--Navigation Panel--> +<IMG WIDTH="81" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next_inactive" + SRC="file:/usr/lib/latex2html/icons/nx_grp_g.png"> +<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" + SRC="file:/usr/lib/latex2html/icons/up_g.png"> +<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" + SRC="file:/usr/lib/latex2html/icons/prev_g.png"> +<BR> +<!--End of Navigation Panel--> +<ADDRESS> +root +2004-04-30 +</ADDRESS> +</BODY> +</HTML>