author    Alberto Bertogli <albertito@blitiri.com.ar>  2009-03-02 05:48:40 UTC
committer Alberto Bertogli <albertito@blitiri.com.ar>  2009-03-02 05:51:46 UTC
parent    f4f54460bfe2fd3321da7ed0e3c51d167094066f
INSTALL                       |   +0   -4
README                        |   +0   -4
doc/big_transactions          |   +0   -9
doc/guide.lyx                 |   +0 -778
doc/guide.rst                 | +285   -0
doc/guide.txt                 |   +0 -378
doc/jiofsck                   |   +0  -12
doc/libjio.3                  | +162 -145
doc/libjio.html               |   +0 -444
doc/libjio.lyx                |   +0 -433
doc/libjio.rst                | +244   -0
doc/libjio.txt                |   +0 -305
doc/{layout => source_layout} |   +0   -0
doc/threads                   |   +0  -20
doc/tids                      |   +0  -83
doc/tids.rst                  |  +76   -0
diff --git a/INSTALL b/INSTALL
index 246dd15..69b49bc 100644
--- a/INSTALL
+++ b/INSTALL
@@ -36,7 +36,3 @@ instead.
 
 After installing, you're ready to use the library; you can see how by
 looking at the manpage with "man libjio".
-
-If you have any question, suggestion or comment, please send it to my email
-address, albertito@blitiri.com.ar.
-
diff --git a/README b/README
index c6136f6..1adc539 100644
--- a/README
+++ b/README
@@ -19,10 +19,6 @@ everything is isolated.
 There are more detailed documents: a programming guide, a brief introduction
 to the design and inner workings, and the manpage; all in the doc/ directory.
 
-The first two, called 'guide' and 'libjio' respectively, are both in txt and
-lyx formats, with HTML, Postscript and PDF versions in the website, which are
-not included in the package for space reasons.
-
 To see how to install it, please read the INSTALL file.
 
 
diff --git a/doc/big_transactions b/doc/big_transactions
deleted file mode 100644
index 3c5392d..0000000
--- a/doc/big_transactions
+++ /dev/null
@@ -1,9 +0,0 @@
-
-If you have to create big transactions, instead of creating a huge buffer you
-can mmap a temporary file and periodically sync it; and when you're done, just
-jtrans_commit() the whole thing.
-
-This would be a quite efficient way, without any performance penalty and a
-very simple approach; I originally thought of doing this on the journal, but
-it had many drawbacks that made it much expensive, slower and complex.
-
diff --git a/doc/guide.lyx b/doc/guide.lyx
deleted file mode 100644
index 364fe8c..0000000
--- a/doc/guide.lyx
+++ /dev/null
@@ -1,778 +0,0 @@
-#LyX 1.3 created this file.
For more info see http://www.lyx.org/ -\lyxformat 221 -\textclass article -\language english -\inputencoding auto -\fontscheme default -\graphics default -\paperfontsize default -\papersize Default -\paperpackage a4 -\use_geometry 0 -\use_amsmath 0 -\use_natbib 0 -\use_numerical_citations 0 -\paperorientation portrait -\secnumdepth 3 -\tocdepth 3 -\paragraph_separation indent -\defskip medskip -\quotes_language english -\quotes_times 2 -\papercolumns 1 -\papersides 1 -\paperpagestyle default - -\layout Title - -libjio Programmer's Guide -\layout Author - -Alberto Bertogli (albertito@blitiri.com.ar) -\layout Standard - - -\begin_inset LatexCommand \tableofcontents{} - -\end_inset - - -\layout Section - -Introduction -\layout Standard - -This small document attempts serve as a guide to the programmer who wants - to make use of the library. - It's not a replacement for the man page or reading the code; but it's a - good starting point for everyone who wants to get involved with it. -\layout Standard - -The library is not complex to use at all, and the interfaces were designed - to be as intuitive as possible, so the text is structured as a guide to - present the reader all the common structures and functions the way they're - normally used. -\layout Section - -Definitions -\layout Standard - -This is a library which provides a journaled transaction-oriented I/O API. - You've probably read this a hundred times already in the documents, and - if you haven't wondered yet what on earth does this mean you should be - reading something else! -\layout Standard - -We say this is a transaction-oriented API because we make transactions the - center of our operations, and journaled because we use a journal (which - takes the form of a directory with files on it) to guarantee coherency - even after a crash at any point. -\layout Standard - -Here we think a transaction as a list of -\emph on -(buffer, length, offset) -\emph default - to be applied to a file. 
- That triple is called an -\emph on -operation -\emph default -, so we can say that a transaction represent an ordered group of operations - on the same file -\emph on -. -\layout Standard - -The act of -\emph on -committing -\emph default - a transaction means writing all the elements of the list; and -\emph on -rollbacking -\emph default - means to undo a previous commit, and leave the data just as it was before - doing the commit. -\begin_inset Foot -collapsed false - -\layout Standard - -While all this definitions may seem obvious to some people, it requires - special attention because there are a lot of different definitions, and - it's not that common to see -\begin_inset Quotes eld -\end_inset - -transaction -\begin_inset Quotes erd -\end_inset - - applied to file I/O (it's a term used mostly on database stuff), so it's - important to clarify before continuing. -\end_inset - - -\layout Standard - -It's important to note that the library not only provides a convenient and - easy API to perform this kind of operations, but provides a lot of guarantees - while doing this. - The most relevant and useful is that at any point of time, even if we crash - horribly, a transaction will be either fully applied or not applied at - all. - You should not ever see partial transactions or any kind of data corruption. -\layout Standard - -To achieve this, the library uses what is called a -\emph on -journal -\emph default -, a very vague (and fashionable) term we use to describe a set of auxiliary - files that get created to store temporary data at several stages. - The proper definition and how we use them is outside the scope of this - document, and you as a programmer shouldn't need to deal with it. - In case you're curious, it's described in a bit more detail in another - text which talks about how the library works internally. - Now let's get real. 
-\layout Section - -The data types -\layout Standard - -To understand any library, it's essential to be confident in the knowledge - of their data structures and how they relate each other. - In libjio we have two basic structures which have a very strong relationship, - and represent the essential objects we deal with. - Note that you normally don't manipulate them directly, because they have - their own initializer functions, but they are the building blocks for the - rest of the text, which, once this is understood, is obvious and self-evident. -\layout Standard - -The first structure we face is -\family typewriter -struct\SpecialChar ~ -jfs -\family default -, called the -\emph on -file structure -\emph default -, and it represents an open file, just like a regular file descriptor or - a -\family typewriter -FILE\SpecialChar ~ -* -\family default -. -\layout Standard - -Then you find -\family typewriter -struct\SpecialChar ~ -jtrans -\family default -, called the -\emph on -transaction structure -\emph default -, which represents a single transaction. - You can have as many transactions as you want, and operate on all of them - simultaneously without problems; the library is entirely thread safe so - there's no need to worry about that. -\layout Section - -The basic functions -\layout Standard - -Now that we've described our data types, let's see how we can really operate - with the library. - -\layout Standard - -First of all, as with regular I/O, you need to open your files. - This is done with -\family typewriter -jopen() -\family default -, which looks a lot like -\family typewriter -open() -\family default - but takes a file structure instead of a file descriptor (this will be very - common among all the functions), and adds a new parameter -\emph on -jflags -\emph default - that can be used to modify some subtle library behaviour we'll see later, - and it's normally not used. 
-\layout Standard - -We have a happy file structure open now, and the next thing to do would - be to create a transaction. - This is what -\family typewriter -jtrans_init() -\family default - is for: it takes a file structure and a transaction structure and initializes - the latter, leaving it ready to use. -\layout Standard - -So we have our transaction, let's add a write operation to it; to do this - we use -\family typewriter -jtrans_add() -\family default -. - We could keep on adding operations to the transaction by keep on calling - -\family typewriter -jtrans_add() -\family default - as many times as we want. -\layout Standard - -Finally, we decide to apply our transaction to the file, that is, write - all the operations we've added. - And this is the easiest part: we call -\family typewriter -jtrans_commit() -\family default -, and that's it! -\layout Standard - -When we're done using the file, we call -\family typewriter -jclose() -\family default -, just like we call -\family typewriter -close() -\family default -. 
-\layout Standard - -Let's put it all together and code a nice -\begin_inset Quotes eld -\end_inset - -hello world -\begin_inset Quotes erd -\end_inset - - program (return values are ignored for simplicity): -\layout LyX-Code - -char buf[] = "Hello world!"; -\layout LyX-Code - -struct jfs file; -\layout LyX-Code - -struct jtrans trans; -\newline - -\newline - -\layout LyX-Code - -jopen(&file, "filename", O_RDWR | O_CREAT, 0600, 0); -\layout LyX-Code - -jtrans_init(&file, &trans); -\newline - -\newline - -\layout LyX-Code - -jtrans_add(&trans, buf, strlen(buf), 0); -\newline - -\newline - -\layout LyX-Code - -jtrans_commit(&trans); -\newline - -\newline - -\layout LyX-Code - -jclose(&file); -\layout Standard - -As we've seen, we open the file and initialize the structure with -\family typewriter -jopen() -\family default - (with the parameter -\emph on -jflags -\emph default - being the last 0)and -\family typewriter -jtrans_init() -\family default -, then add an operation with -\family typewriter -jtrans_add() -\family default - (the last 0 is the offset, in this case the beginning of the file), commit - the transaction with -\family typewriter -jtrans_commit() -\family default -, and finally close the file with -\family typewriter -jclose() -\family default -. -\layout Section - -Advanced functions -\layout Subsection - -Interaction with reads -\begin_inset LatexCommand \label{sub:Interaction-with-reads} - -\end_inset - - -\layout Standard - -So far we've seen how to use the library to perform writes, but what about - reads? The only and main issue with reads is that, because we provide transacti -on atomicity, a read must never be able to -\begin_inset Quotes eld -\end_inset - -see -\begin_inset Quotes erd -\end_inset - - a transaction partially applied. - This is achieved internally by using fine-grained file locks; but you shouldn't - mind about it if you use the functions the library gives you because they - take care of all the locking. 
-\layout Standard - -This set of functions are very similar to the UNIX ones ( -\family typewriter -read() -\family default -, -\family typewriter -readv() -\family default -, etc.); and in fact are named after them: they're called -\family typewriter -jread() -\family default -, -\family typewriter -jreadv() -\family default - and -\family typewriter -jpread() -\family default -; and have the same parameters except for the first one, which instead of - a file descriptor is a file structure -\begin_inset Foot -collapsed false - -\layout Standard - -In fact, this set of functions is a part of what is called the -\begin_inset Quotes eld -\end_inset - -UNIX API -\begin_inset Quotes erd -\end_inset - -, which is described below. -\end_inset - -. - Bear in mind that transactions are only visible by reads -\emph on -after -\emph default - you commit them with -\family typewriter -jtrans_commit() -\family default -. -\layout Subsection - -Rollback -\layout Standard - -There is a very nice and important feature in transactions, that allow them - to be -\begin_inset Quotes eld -\end_inset - -undone -\begin_inset Quotes erd -\end_inset - -, which means that you can undo a transaction and leave the file just as - it was the moment before applying it. - The action of undoing it is called to -\emph on -rollback -\emph default -, and the function is called -\family typewriter -jtrans_rollback() -\family default -, which takes the transaction as the only parameter. -\layout Standard - -Be aware that rollbacking a transaction can be dangerous if you're not careful - and cause you a lot of troubles. - For instance, consider you have two transactions (let's call them 1 and - 2, and assume they were applied in that order) that modify the same offset, - and you rollback transaction 1; then 2 would be lost. - It is not an dangerous operation itself, but its use requires care and - thought. 
-\layout Subsection - -Integrity checking and recovery -\layout Standard - -An essential part of the library is taking care of recovering from crashes - and be able to assure a file is consistent. - When you're working with the file, this is taking care of; but what when - you first open it? To answer that question, the library provides you with - a function named -\family typewriter -jfsck() -\family default -, which checks the integrity of a file and makes sure that everything is - consistent. - It must be called -\begin_inset Quotes eld -\end_inset - -offline -\begin_inset Quotes erd -\end_inset - -, that is when you are not actively committing and rollbacking; it is normally - done before calling -\family typewriter -jopen() -\family default -. - Another good practise is call jfsck_cleanup() after calling jfsck() to - make sure we're starting up with a fresh clean journal. - After both calls, it is safe to assume that the file is and ready to use. -\layout Standard - -You can also do this manually with an utility named -\emph on -jiofsck -\emph default -, which can be used from the shell to perform the checking and cleanup. -\layout Subsection - -Threads and locking -\layout Standard - -The library is completely safe to use in multithreaded applications; however, - there are some very basic and intuitive locking rules you have to bear - in mind. -\layout Standard - -Most is fully threadsafe so you don't need to worry about concurrency; in - fact, a lot of effort has been put in making parallel operation safe and - fast. -\layout Standard - -You need to care only when opening, closing and checking for integrity. - In practise, that means that you shouldn't call -\family typewriter -jopen() -\family default -, -\family typewriter -jclose() -\family default - in parallel with the same jfs structure, or in the middle of an I/O operation, - just like you do when using the normal UNIX calls. 
- In the case of -\family typewriter -jfsck() -\family default -, you shouldn't invoke it for the same file more than once at the time; - while it will cope with that situation, it's not recommended. -\layout Standard - -All other operations (commiting a transaction, rollbacking it, adding operations -, etc.) and all the wrappers are safe and don't require any special consideration -s. -\layout Subsection - -Lingering transactions -\layout Standard - -If you need to increase performance, you can use lingering transactions. - In this mode, transactions take up more disk space but allows you to do - the synchronous write only once, making commits much faster. - To use them, just add -\family typewriter -J_LINGER -\family default - to the jflags parameter in -\family typewriter -jopen() -\family default -. - It is very wise to call -\family typewriter -jsync() -\family default - frequently to avoid using up too much space. -\layout Section - -Disk layout -\layout Standard - -The library creates a single directory for each file opened, named after - it. - So if we open a file -\begin_inset Quotes eld -\end_inset - -output -\begin_inset Quotes erd -\end_inset - -, a directory named -\begin_inset Quotes eld -\end_inset - -.output.jio -\begin_inset Quotes erd -\end_inset - - will be created. - We call it the -\emph on -journal directory -\emph default -, and it's used internally by the library to save temporary data; you shouldn't - modify any of the files that are inside it, or move it while it's in use. - It doesn't grow much (it only uses space for transactions that are in the - process of committing) and gets automatically cleaned while working with - it so you can (and should) ignore it. - Besides that, the file you work with has no special modification and is - just like any other file, all the internal stuff is kept isolated on the - journal directory. 
-\layout Section - -Other APIs -\layout Standard - -We're all used to do things our way, and when we learn something new it's - often better if it looks alike what we already know. - With this in mind, the library comes with two sets of APIs that look a - lot like traditional, well known ones. - Bear in mind that they are not as powerful as the transaction API that - is described above, and they can't provide the same functionality in a - lot of cases; however for a lot of common and simple use patterns they - are good enough. -\layout Subsection - -UNIX API -\layout Standard - -There is a set of functions that emulate the UNIX API ( -\family typewriter -read() -\family default -, -\family typewriter -write() -\family default -, and so on) which make each operation a transaction. - This can be useful if you don't need to have the full power of the transactions - but only to provide guarantees between the different functions. - They are a lot like the normal UNIX functions, but instead of getting a - file descriptor as their first parameter, they get a file structure. - You can check out the manual page to see the details, but they work just - like their UNIX version, only that they preserve atomicity and thread-safety - -\emph on -within each call -\emph default -. -\layout Standard - -In particular, the group of functions related to reading (which was described - above in -\begin_inset LatexCommand \ref{sub:Interaction-with-reads} - -\end_inset - -) are extremely useful because they take care of the locking needed for - the library proper behaviour. - You should use them instead of the regular calls. -\layout Standard - -The full function list is available on the man page and I won't reproduce - it here; however the naming is quite simple: just prepend a 'j' to all - the names: -\family typewriter -jread() -\family default -, -\family typewriter -jwrite() -\family default -, etc. 
-\layout Subsection - -ANSI C API -\layout Standard - -Besides the UNIX API you can find an ANSI C API, which emulates the traditional - -\family typewriter -fread() -\family default -, -\family typewriter -fwrite() -\family default -, etc. - They're still in development and has not been tested carefully, so I won't - spend time documenting them. - Let me know if you need them. -\layout Section - -Compiling and linking -\layout Standard - -When you want to use your library, besides including the -\emph on - -\begin_inset Quotes eld -\end_inset - -libjio.h -\begin_inset Quotes erd -\end_inset - - -\emph default - header, you have to make sure your application uses the Large File Support - ( -\begin_inset Quotes eld -\end_inset - -LFS -\begin_inset Quotes erd -\end_inset - - from now on), to be able to handle large files properly. - This means that you will have to pass some special standard flags to the - compiler, so your C library uses the same data types as the library. - For instance, on 32-bit platforms (like x86), when using LFS, offsets are - usually 64 bits, as opposed to the usual 32. -\layout Standard - -The library is always built with LFS; however, link it against an application - without LFS support could lead to serious problems because this kind of - size differences and ABI compatibility. -\layout Standard - -The Single Unix Specification standard proposes a simple and practical way - to get the flags you need to pass your C compiler to tell you want to compile - your application with LFS: use a program called -\emph on - -\begin_inset Quotes eld -\end_inset - -getconf -\begin_inset Quotes erd -\end_inset - - -\emph default - which should be called like -\emph on - -\begin_inset Quotes eld -\end_inset - -getconf LFS_CFLAGS -\begin_inset Quotes erd -\end_inset - - -\emph default -, and it outputs the appropiate parameters. 
Sadly, not all platforms implement it, so it's also wise to pass -\emph on - -\begin_inset Quotes eld -\end_inset - --D_FILE_OFFSET_BITS=64 -\begin_inset Quotes erd -\end_inset - - -\emph default - just in case. -\layout Standard - -In the end, the command line would be something like: -\layout LyX-Code - -gcc `getconf LFS_CFLAGS` -D_FILE_OFFSET_BITS=64 -\backslash - -\layout LyX-Code - - app.c -ljio -lpthread -o app -\layout Standard - -If you want more detailed information or examples, you can check out how - the library and sample applications get built. -\layout Section - -Where to go from here -\layout Standard - -If you're still interested in learning more, you can find some small and - clean samples are in the -\begin_inset Quotes eld -\end_inset - -samples -\begin_inset Quotes erd -\end_inset - - directory (full.c is a simple and complete one), other more advanced examples - can be found in the web page, as well as modifications to well known software - to make use of the library. - For more information about the inner workings of the library, you can read - the -\begin_inset Quotes eld -\end_inset - -libjio -\begin_inset Quotes erd -\end_inset - - document, and the source code. -\the_end
diff --git a/doc/guide.rst b/doc/guide.rst
new file mode 100644
index 0000000..3f3bdef
--- /dev/null
+++ b/doc/guide.rst
@@ -0,0 +1,285 @@
+
+libjio Programmer's Guide
+=========================
+
+Introduction
+------------
+
+This small document attempts to serve as a guide to the programmer who wants
+to make use of the library. It's not a replacement for the man page or
+reading the code, but it's a good starting point for everyone who wants to get
+involved with it.
+
+The library is not complex to use at all, and the interfaces were designed to
+be as intuitive as possible, so the text is structured as a guide that
+presents all the common structures and functions to the reader in the order
+they're normally used.
+
+Definitions
+-----------
+
+This is a library which provides a journaled, transaction-oriented I/O API.
+You've probably read this a hundred times already in the documents, and if you
+haven't wondered yet what on earth this means, you should be reading
+something else!
+
+We say this is a transaction-oriented API because we make transactions the
+center of our operations, and journaled because we use a journal (which takes
+the form of a directory with files in it) to guarantee coherency even after a
+crash at any point.
+
+In this document, we think of a transaction as a list of *(buffer, length,
+offset)* triplets to be applied to a file. Each triplet is called an
+*operation*, so we can say that a transaction represents an ordered group of
+operations on the same file.
+
+The act of *committing* a transaction means writing all the elements of the
+list, and *rolling back* means undoing a previous commit, leaving the data just
+as it was before the commit. While all these definitions may seem obvious to
+some people, they require special attention because there are many different
+definitions, and it's not that common to see "transaction" applied to file
+I/O, since it's a term used mostly in the database world.
+
+The library provides several guarantees, the most relevant and useful being
+that at any point in time, even if the machine crashes horribly, a
+transaction will be either fully applied or not applied at all.
+
+To achieve this, the library uses what is called a journal, a very vague (and
+fashionable) term we use to describe a set of auxiliary files that get created
+to store temporary data at several stages. The proper definition and how we
+use them are outside the scope of this document, and you as a programmer
+shouldn't need to deal with it. In case you're curious, it's described in a
+bit more detail in another text which talks about how the library works
+internally. Now let's get real.
+
+
+The data types
+--------------
+
+To understand any library, it's essential to be confident in your knowledge
+of its data structures and how they relate to each other. libjio has two basic
+structures which have a very strong relationship, and represent the essential
+objects it deals with. Note that you normally don't manipulate them directly,
+because they have their own initializer functions, but they are the building
+blocks for the rest of the text, which, once this is understood, should be
+obvious and self-evident.
+
+The first structure we face is *struct jfs*, usually called the file
+structure, and it represents an open file, just like a regular file descriptor
+or a FILE \*.
+
+Then you find *struct jtrans*, usually called the transaction structure, which
+represents a single transaction.
+
+
+Basic operation
+---------------
+
+Now that we've described our data types, let's see how we can operate with the
+library.
+
+First of all, as with regular I/O, you need to open your files. This is done
+with *jopen()*, which looks a lot like *open()* but takes a file structure
+instead of a file descriptor (this will be very common among all the
+functions), and adds a new parameter *jflags* that can be used to modify some
+subtle library behaviour we'll see later, and is normally not used.
+
+We have a happy file structure open now, and the next thing to do would be to
+create a transaction. This is what *jtrans_init()* is for: it takes a file
+structure and a transaction structure and initializes the latter, leaving it
+ready to use.
+
+Now that we have our transaction, let's add a write operation to it; to do
+this we use *jtrans_add()*. We can keep adding operations to the
+transaction by calling *jtrans_add()* as many times as we want.
+Operations within a transaction may overlap, and will be applied in order.
+
+Finally, we decide to apply our transaction to the file, that is, write all
+the operations we've added. And this is the easiest part: we call
+*jtrans_commit()*, and that's it!
+
+When we're done using the file, we call *jclose()*, just like we would call
+*close()*.
+
+Let's put it all together and code a nice "hello world"
+program (return values are ignored for simplicity)::
+
+    char buf[] = "Hello world!";
+    struct jfs file;
+    struct jtrans trans;
+
+    jopen(&file, "filename", O_RDWR | O_CREAT, 0600, 0);
+
+    jtrans_init(&file, &trans);
+    jtrans_add(&trans, buf, strlen(buf), 0);
+    jtrans_commit(&trans);
+
+    jclose(&file);
+
+As we've seen, we open the file and initialize the structure with *jopen()*
+(with the parameter *jflags* being the last 0), create a new transaction with
+*jtrans_init()*, then add an operation with *jtrans_add()* (the last 0 is the
+offset, in this case the beginning of the file), commit the transaction with
+*jtrans_commit()*, and finally close the file with *jclose()*.
+
+Reading is much easier: the library provides three functions, *jread()*,
+*jpread()* and *jreadv()*, that behave exactly like *read()*, *pread()* and
+*readv()*, except that they never see a partially applied transaction. You
+should use them to read from files you're writing with libjio.
+
+
+Integrity checking and recovery
+-------------------------------
+
+An essential part of the library is recovering from crashes and being able to
+ensure a file is consistent. While you're working with the file, this is
+taken care of; but what about when you first open it? To answer that
+question, the library provides you with a function named *jfsck()*, which
+checks the integrity of a file and makes sure that everything is consistent.
+
+It must be called "offline", that is, when you are not actively committing and
+rolling back; it is normally done before calling *jopen()* and is **very, very
+important**. Another good practice is to call *jfsck_cleanup()* after calling
+*jfsck()*, to make sure we're starting up with a fresh clean journal. After
+both calls, it is safe to assume that the file is consistent and ready to use.
+
+You can also do this manually with a utility named *jiofsck*, which can be
+used from the shell to perform the checking and cleanup.
+
+
+Rollback
+--------
+
+Transactions have a very nice and important feature: they can be "undone",
+which means that you can leave the file just as it was the moment before the
+transaction was applied. The action of undoing it is called *rolling back*,
+and the function that does it is *jtrans_rollback()*, which takes the
+transaction as its only parameter.
+
+Be aware that rolling back a transaction can be dangerous and cause you a lot
+of trouble if you're not careful. For instance, suppose you have two
+transactions (let's call them 1 and 2, and assume they were applied in that
+order) that modify the same offset, and you roll back transaction 1; then the
+changes made by 2 would be lost. It is not a dangerous operation in itself,
+but its use requires care and thought.
+
+
+UNIX-alike API
+--------------
+
+There is a set of functions that emulate the UNIX API (*read()*, *write()*,
+and so on) which make each operation a transaction. This can be useful if you
+don't need the full power of transactions, but only want guarantees between
+the different calls. They are a lot like the normal UNIX functions, but
+instead of getting a file descriptor as their first parameter, they get a
+file structure. You can check out the manual page to see the details, but
+they work just like their UNIX versions, except that they preserve atomicity
+and thread-safety within each call.
+
+In particular, the group of functions related to reading (which was described
+above in `Basic operation`_) is extremely useful because it takes care of
+the locking the library needs to behave properly. You should use these
+functions instead of the regular calls.
+
+The full function list is available on the man page and I won't reproduce it
+here; however, the naming is quite simple: just prepend a 'j' to the names:
+*jread()*, *jwrite()*, etc.
+
+
+Threads and locking
+-------------------
+
+The library is completely safe to use in multithreaded applications; however,
+there are some very basic and intuitive locking rules you have to bear in
+mind.
+
+Most calls are fully thread-safe, so you don't need to worry about concurrency;
+a lot of effort has been put into making parallel operation safe and fast.
+
+You only need to be careful when opening, closing and checking for integrity.
+In practice, that means that you shouldn't call *jopen()* or *jclose()* in
+parallel with the same jfs structure, or in the middle of an I/O operation,
+just as you wouldn't with the normal UNIX calls. In the case of *jfsck()*, you
+shouldn't invoke it for the same file more than once at a time; while it
+will cope with that situation, it's not recommended.
+
+All other operations (committing a transaction, rolling it back, adding
+operations, etc.) and all the wrappers are safe and don't require any special
+considerations.
+
+
+Lingering transactions
+----------------------
+
+If you need to increase performance, you can use lingering transactions. In
+this mode, transactions take up more disk space but allow you to do the
+synchronous write only once, making commits much faster. To use them, just add
+*J_LINGER* to the *jflags* parameter in *jopen()*. You should call *jsync()*
+frequently to avoid using up too much space.
+
+
+Disk layout
+-----------
+
+The library creates a single directory for each file opened, named after it.
+So if we open a file *output*, a directory named *.output.jio* will be
+created. We call it the journal directory, and it's used internally by the
+library to save temporary data; **you shouldn't modify any of the files that
+are inside it, nor move it while it's in use**.
+ +It doesn't grow much (it only uses space for transactions that are in the +process of committing) and gets cleaned automatically as you work with the +file, so you can (and should) ignore it. Besides that, the file you work with +has no special modifications and is just like any other file; all the internal +data is kept isolated in the journal directory. + + +ANSI C alike API +---------------- + +Besides the UNIX-alike API you can find an ANSI C alike API, which emulates +the traditional *fread()*, *fwrite()*, etc. It's still in development and has +not been tested carefully, so I won't spend time documenting it. Let me know +if you need it. + + +Compiling and linking +--------------------- + +When you want to use the library, besides including the "libjio.h" header, +you have to make sure your application uses Large File Support ("LFS" from +now on), to be able to handle large files properly. This means that you will +have to pass some special standard flags to the compiler, so your C library +uses the same data types as the library. For instance, on 32-bit platforms +(like x86), offsets are usually 64 bits when using LFS, as opposed to 32 bits +without it. + +The library is always built with LFS; however, linking it against an +application without LFS support could lead to serious problems because of +these size differences and the resulting ABI incompatibility. + +The Single Unix Specification proposes a simple and practical way to get the +flags you need to pass to your C compiler to build your application with LFS: +run the "getconf" program as "getconf LFS_CFLAGS", and it outputs the +appropriate parameters. Sadly, not all platforms implement it, so it's also +wise to pass "-D_FILE_OFFSET_BITS=64" just in case.
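One way to catch a missing LFS flag early is a compile-time check in your own code. The sketch below is not part of libjio; it assumes a C11 compiler (for _Static_assert), and the file name lfs_check.c in the comment is only illustrative. It refuses to build when off_t is not 64 bits wide, which is what you would get on a 32-bit platform if the flags above were forgotten.

```c
/* Build with the LFS flags described above, for example:
 *   gcc `getconf LFS_CFLAGS` -D_FILE_OFFSET_BITS=64 -c lfs_check.c
 * On 64-bit platforms off_t is already 64 bits wide; on 32-bit ones it
 * only is when those flags are passed. */
#include <sys/types.h>

/* Refuse to compile if off_t is narrower than 64 bits, i.e. if this
 * translation unit would disagree with the (always LFS-enabled) library
 * about the size of file offsets. */
_Static_assert(sizeof(off_t) == 8, "compile with LFS: off_t must be 64 bits");

/* Width of off_t in bits, handy for logging the build configuration. */
int off_t_bits(void)
{
	return (int) (sizeof(off_t) * 8);
}
```

A build-time failure here is much easier to diagnose than the silent offset truncation an ABI mismatch would cause at run time.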
+ +In the end, the command line would be something like:: + + gcc `getconf LFS_CFLAGS` -D_FILE_OFFSET_BITS=64 app.c -ljio -o app + +If you want more detailed information or examples, you can check out how the +library and sample applications get built. + + +Where to go from here +--------------------- + +If you're still interested in learning more, you can find some small and clean +samples in the "samples" directory (full.c is a simple and complete one); more +advanced examples can be found on the web page, as well as modifications to +well-known software to make use of the library. For more information about the +inner workings of the library, you can read the "libjio" document and the +source code. + diff --git a/doc/guide.txt b/doc/guide.txt deleted file mode 100644 index 44ba7d5..0000000 --- a/doc/guide.txt +++ /dev/null @@ -1,378 +0,0 @@ -libjio Programmer's Guide - -Alberto Bertogli (albertito@blitiri.com.ar) - -Table of Contents - -1 Introduction -2 Definitions -3 The data types -4 The basic functions -5 Advanced functions - 5.1 Interaction with reads - 5.2 Rollback - 5.3 Integrity checking and recovery - 5.4 Threads and locking - 5.5 Lingering transactions -6 Disk layout -7 Other APIs - 7.1 UNIX API - 7.2 ANSI C API -8 Compiling and linking -9 Where to go from here - - - -1 Introduction - -This small document attempts serve as a guide to the -programmer who wants to make use of the library. It's -not a replacement for the man page or reading the code; -but it's a good starting point for everyone who wants -to get involved with it. - -The library is not complex to use at all, and the -interfaces were designed to be as intuitive as -possible, so the text is structured as a guide to -present the reader all the common structures and -functions the way they're normally used. - -2 Definitions - -This is a library which provides a journaled -transaction-oriented I/O API.
You've probably read this -a hundred times already in the documents, and if you -haven't wondered yet what on earth does this mean you -should be reading something else! - -We say this is a transaction-oriented API because we -make transactions the center of our operations, and -journaled because we use a journal (which takes the -form of a directory with files on it) to guarantee -coherency even after a crash at any point. - -Here we think a transaction as a list of (buffer, -length, offset) to be applied to a file. That triple is -called an operation, so we can say that a transaction -represent an ordered group of operations on the same file. - -The act of committing a transaction means writing all -the elements of the list; and rollbacking means to undo -a previous commit, and leave the data just as it was -before doing the commit.While all this definitions may seem obvious to some -people, it requires special attention because there are -a lot of different definitions, and it's not that -common to see "transaction" applied to file I/O (it's a -term used mostly on database stuff), so it's important -to clarify before continuing. - -It's important to note that the library not only -provides a convenient and easy API to perform this kind -of operations, but provides a lot of guarantees while -doing this. The most relevant and useful is that at any -point of time, even if we crash horribly, a transaction -will be either fully applied or not applied at all. You -should not ever see partial transactions or any kind of -data corruption. - -To achieve this, the library uses what is called a -journal, a very vague (and fashionable) term we use to -describe a set of auxiliary files that get created to -store temporary data at several stages. The proper -definition and how we use them is outside the scope of -this document, and you as a programmer shouldn't need -to deal with it. 
In case you're curious, it's described -in a bit more detail in another text which talks about -how the library works internally. Now let's get real. - -3 The data types - -To understand any library, it's essential to be -confident in the knowledge of their data structures and -how they relate each other. In libjio we have two basic -structures which have a very strong relationship, and -represent the essential objects we deal with. Note that -you normally don't manipulate them directly, because -they have their own initializer functions, but they are -the building blocks for the rest of the text, which, -once this is understood, is obvious and self-evident. - -The first structure we face is struct jfs, called the -file structure, and it represents an open file, just -like a regular file descriptor or a FILE *. - -Then you find struct jtrans, called the transaction -structure, which represents a single transaction. You -can have as many transactions as you want, and operate -on all of them simultaneously without problems; the -library is entirely thread safe so there's no need to -worry about that. - -4 The basic functions - -Now that we've described our data types, let's see how -we can really operate with the library. - -First of all, as with regular I/O, you need to open -your files. This is done with jopen(), which looks a -lot like open() but takes a file structure instead of a -file descriptor (this will be very common among all the -functions), and adds a new parameter jflags that can be -used to modify some subtle library behaviour we'll see -later, and it's normally not used. - -We have a happy file structure open now, and the next -thing to do would be to create a transaction. This is -what jtrans_init() is for: it takes a file structure -and a transaction structure and initializes the latter, -leaving it ready to use. - -So we have our transaction, let's add a write operation -to it; to do this we use jtrans_add(). 
We could keep on -adding operations to the transaction by keep on calling -jtrans_add() as many times as we want. - -Finally, we decide to apply our transaction to the -file, that is, write all the operations we've added. -And this is the easiest part: we call jtrans_commit(), -and that's it! - -When we're done using the file, we call jclose(), just -like we call close(). - -Let's put it all together and code a nice "hello world" -program (return values are ignored for simplicity): - -char buf[] = "Hello world!"; - -struct jfs file; - -struct jtrans trans; - - - -jopen(&file, "filename", O_RDWR | O_CREAT, 0600, 0); - -jtrans_init(&file, &trans); - - - -jtrans_add(&trans, buf, strlen(buf), 0); - - - -jtrans_commit(&trans); - - - -jclose(&file); - -As we've seen, we open the file and initialize the -structure with jopen() (with the parameter jflags being -the last 0)and jtrans_init(), then add an operation -with jtrans_add() (the last 0 is the offset, in this -case the beginning of the file), commit the transaction -with jtrans_commit(), and finally close the file with jclose(). - -5 Advanced functions - -5.1 Interaction with reads<sub:Interaction-with-reads> - -So far we've seen how to use the library to perform -writes, but what about reads? The only and main issue -with reads is that, because we provide transaction -atomicity, a read must never be able to "see" a -transaction partially applied. This is achieved -internally by using fine-grained file locks; but you -shouldn't mind about it if you use the functions the -library gives you because they take care of all the locking. - -This set of functions are very similar to the UNIX ones -(read(), readv(), etc.); and in fact are named after -them: they're called jread(), jreadv() and jpread(); -and have the same parameters except for the first one, -which instead of a file descriptor is a file structureIn fact, this set of functions is a part of what is -called the "UNIX API", which is described below. -. 
Bear in mind that transactions are only visible by -reads after you commit them with jtrans_commit(). - -5.2 Rollback - -There is a very nice and important feature in -transactions, that allow them to be "undone", which means -that you can undo a transaction and leave the file just -as it was the moment before applying it. The action of -undoing it is called to rollback, and the function is -called jtrans_rollback(), which takes the transaction -as the only parameter. - -Be aware that rollbacking a transaction can be -dangerous if you're not careful and cause you a lot of -troubles. For instance, consider you have two -transactions (let's call them 1 and 2, and assume they -were applied in that order) that modify the same -offset, and you rollback transaction 1; then 2 would be -lost. It is not an dangerous operation itself, but its -use requires care and thought. - -5.3 Integrity checking and recovery - -An essential part of the library is taking care of -recovering from crashes and be able to assure a file is -consistent. When you're working with the file, this is -taking care of; but what when you first open it? To -answer that question, the library provides you with a -function named jfsck(), which checks the integrity of a -file and makes sure that everything is consistent. It -must be called "offline", that is when you are not -actively committing and rollbacking; it is normally -done before calling jopen(). Another good practise is -call jfsck_cleanup() after calling jfsck() to make sure -we're starting up with a fresh clean journal. After -both calls, it is safe to assume that the file is and -ready to use. - -You can also do this manually with an utility named -jiofsck, which can be used from the shell to perform -the checking and cleanup. - -5.4 Threads and locking - -The library is completely safe to use in multithreaded -applications; however, there are some very basic and -intuitive locking rules you have to bear in mind. 
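The rollback hazard described in the Rollback section can be reproduced with a toy, in-memory model. This sketch is not how libjio stores transactions; struct toy_trans, toy_commit() and toy_rollback() are hypothetical names, and the model only assumes that a commit remembers the bytes it overwrote so that a rollback can put them back.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model, not libjio's on-disk format: a committed "transaction"
 * that overwrote file[offset .. offset+len) and remembers what was
 * there before, so it can be undone. */
struct toy_trans {
	size_t offset;
	size_t len;
	char *saved;	/* previous contents of the range */
};

/* Commit: save the old bytes, then write the new ones.
 * Assumes offset + len fits inside `file`. Returns len, or -1. */
int toy_commit(char *file, struct toy_trans *t, const char *buf,
		size_t len, size_t offset)
{
	t->saved = malloc(len);
	if (t->saved == NULL)
		return -1;
	memcpy(t->saved, file + offset, len);
	memcpy(file + offset, buf, len);
	t->offset = offset;
	t->len = len;
	return (int) len;
}

/* Rollback: blindly restore the saved bytes, whatever any later
 * transaction has written to the same range since. */
int toy_rollback(char *file, struct toy_trans *t)
{
	memcpy(file + t->offset, t->saved, t->len);
	free(t->saved);
	t->saved = NULL;
	return (int) t->len;
}
```

Starting from a buffer holding "AAAA", committing "BBBB" and then "CCCC" at offset 0, and finally rolling back the first transaction, leaves "AAAA" again: the second transaction's write is silently lost, which is exactly the situation the text warns about.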
- -Most is fully threadsafe so you don't need to worry -about concurrency; in fact, a lot of effort has been -put in making parallel operation safe and fast. - -You need to care only when opening, closing and -checking for integrity. In practise, that means that -you shouldn't call jopen(), jclose() in parallel with -the same jfs structure, or in the middle of an I/O -operation, just like you do when using the normal UNIX -calls. In the case of jfsck(), you shouldn't invoke it -for the same file more than once at the time; while it -will cope with that situation, it's not recommended. - -All other operations (commiting a transaction, -rollbacking it, adding operations, etc.) and all the -wrappers are safe and don't require any special considerations. - -5.5 Lingering transactions - -If you need to increase performance, you can use -lingering transactions. In this mode, transactions take -up more disk space but allows you to do the synchronous -write only once, making commits much faster. To use -them, just add J_LINGER to the jflags parameter in -jopen(). It is very wise to call jsync() frequently to -avoid using up too much space. - -6 Disk layout - -The library creates a single directory for each file -opened, named after it. So if we open a file "output", a -directory named ".output.jio" will be created. We call it -the journal directory, and it's used internally by the -library to save temporary data; you shouldn't modify -any of the files that are inside it, or move it while -it's in use. It doesn't grow much (it only uses space -for transactions that are in the process of committing) -and gets automatically cleaned while working with it so -you can (and should) ignore it. Besides that, the file -you work with has no special modification and is just -like any other file, all the internal stuff is kept -isolated on the journal directory. 
- -7 Other APIs - -We're all used to do things our way, and when we learn -something new it's often better if it looks alike what -we already know. With this in mind, the library comes -with two sets of APIs that look a lot like traditional, -well known ones. Bear in mind that they are not as -powerful as the transaction API that is described -above, and they can't provide the same functionality in -a lot of cases; however for a lot of common and simple -use patterns they are good enough. - -7.1 UNIX API - -There is a set of functions that emulate the UNIX API -(read(), write(), and so on) which make each operation -a transaction. This can be useful if you don't need to -have the full power of the transactions but only to -provide guarantees between the different functions. -They are a lot like the normal UNIX functions, but -instead of getting a file descriptor as their first -parameter, they get a file structure. You can check out -the manual page to see the details, but they work just -like their UNIX version, only that they preserve -atomicity and thread-safety within each call. - -In particular, the group of functions related to -reading (which was described above in [sub:Interaction-with-reads]) are extremely -useful because they take care of the locking needed for -the library proper behaviour. You should use them -instead of the regular calls. - -The full function list is available on the man page and -I won't reproduce it here; however the naming is quite -simple: just prepend a 'j' to all the names: jread(), -jwrite(), etc. - -7.2 ANSI C API - -Besides the UNIX API you can find an ANSI C API, which -emulates the traditional fread(), fwrite(), etc. -They're still in development and has not been tested -carefully, so I won't spend time documenting them. Let -me know if you need them. 
- -8 Compiling and linking - -When you want to use your library, besides including -the "libjio.h" header, you have to make sure your -application uses the Large File Support ("LFS" from now -on), to be able to handle large files properly. This -means that you will have to pass some special standard -flags to the compiler, so your C library uses the same -data types as the library. For instance, on 32-bit -platforms (like x86), when using LFS, offsets are -usually 64 bits, as opposed to the usual 32. - -The library is always built with LFS; however, link it -against an application without LFS support could lead -to serious problems because this kind of size -differences and ABI compatibility. - -The Single Unix Specification standard proposes a -simple and practical way to get the flags you need to -pass your C compiler to tell you want to compile your -application with LFS: use a program called "getconf" -which should be called like "getconf LFS_CFLAGS", and it -outputs the appropiate parameters. Sadly, not all -platforms implement it, so it's also wise to pass " --D_FILE_OFFSET_BITS=64" just in case. - -In the end, the command line would be something like: - -gcc `getconf LFS_CFLAGS` -D_FILE_OFFSET_BITS=64 \ - - app.c -ljio -lpthread -o app - -If you want more detailed information or examples, you -can check out how the library and sample applications -get built. - -9 Where to go from here - -If you're still interested in learning more, you can -find some small and clean samples are in the "samples" -directory (full.c is a simple and complete one), other -more advanced examples can be found in the web page, as -well as modifications to well known software to make -use of the library. For more information about the -inner workings of the library, you can read the "libjio" -document, and the source code. 
diff --git a/doc/jiofsck b/doc/jiofsck deleted file mode 100644 index b42aeb2..0000000 --- a/doc/jiofsck +++ /dev/null @@ -1,12 +0,0 @@ - -Note that jfsck does not warantee that all the transactions are fully -completed, it can only do so if you run it without any other process accessing -the journal. - -If you want to see this, you can take a look at the struct jfsck_result. It -include a field named in_progress which tell the number of transactions that -were in progress at the moment of checking, and as such weren't checked. - -Be aware that the counter is not atomic, as two checkers can be running at the -same time. - diff --git a/doc/libjio.3 b/doc/libjio.3 index 76b2e1e..414f47f 100644 --- a/doc/libjio.3 +++ b/doc/libjio.3 @@ -1,78 +1,19 @@ .TH libjio 3 "21/Feb/2004" .SH NAME libjio - A library for Journaled I/O - -.SH FUNCTIONS - +.SH SYNOPSIS +.nf .B #include <libjio.h> -.BI "int jopen(struct jfs *" fs ", const char *" name ", int " flags ", int " mode ", int " jflags " ); - -.BI "ssize_t jread(struct jfs *" fs ", void *" buf ", size_t " count " ); - -.BI "ssize_t jpread(struct jfs *" fs ", void *" buf ", size_t " count ", off_t " offset " ); - -.BI "ssize_t jreadv(struct jfs *" fs ", struct iovec *" vector ", int " count " ); - -.BI "ssize_t jwrite(struct jfs *" fs ", const void *" buf ", size_t " count " ); - -.BI "ssize_t jpwrite(struct jfs *" fs ", const void *" buf ", size_t " count ", off_t " offset " ); - -.BI "ssize_t jwritev(struct jfs *" fs ", const struct iovec *" vector ", int " count " ); - -.BI "int jtruncate(struct jfs *" fs ", off_t " lenght " ); +.B struct jfs; -.BI "int jclose(struct jfs *" fs " ); - -.BI "void jtrans_init(struct jfs *" fs " ,struct jtrans *" ts " ); - -.BI "int jtrans_commit(struct jtrans *" ts " ); - -.BI "int jtrans_add(struct jtrans *" ts ", const void * " buf ", size_t " count ", off_t " offset " ); - -.BI "int jtrans_rollback(struct jtrans *" ts " ); - -.BI "void jtrans_free(struct jtrans *" ts " ); - -.BI "int
jfsck(const char *" name ", struct jfsck_result *" res " ); - -.BI "int jfsck_cleanup(const char *" name" ); - -.SH STRUCTURES -.PP -.br -.nf -.in +2n -struct jfs { - int fd; /* main file descriptor */ - char *name; /* and its name */ - int jfd; /* journal's lock file descriptor */ - int flags; /* journal mode options used in jopen() */ - pthread_mutex_t lock; /* a soft lock used in some operations */ +.BR "struct jtrans " { ... -}; -.FI -.in -2n - -.PP -.br -.nf -.in +2n -struct jtrans { - struct jfs *fs; /* journal file structure to operate on */ - char *name; /* name of the transaction file */ - int id; /* transaction id */ - int flags; /* misc flags */ + unsigned int flags; ... }; -.FI -.in -2n -.PP -.br -.nf -.in +2n -struct jfsck_result { +.BR "struct jfsck_result" " {" int total; /* total transactions files we looked at */ int invalid; /* invalid files in the journal directory */ int in_progress; /* transactions in progress */ @@ -81,106 +22,175 @@ struct jfsck_result { int rollbacked; /* transactions that were rollbacked */ ... 
}; -.FI -.in -2n + +.BI "int jopen(struct jfs *" fs ", const char *" name "," +.BI " int " flags ", int " mode ", int " jflags ");" +.BI "ssize_t jread(struct jfs *" fs ", void *" buf ", size_t " count ");" +.BI "ssize_t jpread(struct jfs *" fs ", void *" buf ", size_t " count "," +.BI " off_t " offset ");" +.BI "ssize_t jreadv(struct jfs *" fs ", struct iovec *" vector "," +.BI " int " count ");" +.BI "ssize_t jwrite(struct jfs *" fs ", const void *" buf ", size_t " count ");" +.BI "ssize_t jpwrite(struct jfs *" fs ", const void *" buf ", size_t " count "," +.BI " off_t " offset ");" +.BI "ssize_t jwritev(struct jfs *" fs ", const struct iovec *" vector "," +.BI " int " count ");" +.BI "int jtruncate(struct jfs *" fs ", off_t " length ");" + +.BI "int jclose(struct jfs *" fs ");" + +.BI "void jtrans_init(struct jfs *" fs ", struct jtrans *" ts ");" +.BI "int jtrans_commit(struct jtrans *" ts ");" +.BI "int jtrans_add(struct jtrans *" ts ", const void * " buf "," +.BI " size_t " count ", off_t " offset ");" +.BI "int jtrans_rollback(struct jtrans *" ts ");" +.BI "void jtrans_free(struct jtrans *" ts ");" + +.BI "int jfsck(const char *" name ", struct jfsck_result *" res ");" +.BI "int jfsck_cleanup(const char *" name ");" .SH DESCRIPTION libjio is a library to do transaction-oriented journaled I/O. This manpage -describes it's C API very briefly, further information can be found in the +describes its C API very briefly; further information can be found in the documentation that comes along with the library itself, or on the web at http://blitiri.com.ar/p/libjio. -We can group the functions into three groups: the common functions, the basic -functions and the UNIX API. - -The common functions provide functionality common to the other two: jopen to -open files to use them with the library, and jfsck and jfsck_cleanup to -provide integrity checking. +The functions can be divided into three groups: the common functions, the +UNIX-alike API, and the basic functions.
-The basic functions consists of jtrans_commit, jtrans_add and jtrans_rollback. -They provide a method for manipulating transactions, which are defined in the -jtrans structure (described above). +The common functions provide functionality common to the other two: +.B jopen() +to open files in order to use them with the library, and +.BR "jfsck() " and " jfsck_cleanup()" +to provide integrity checking. The second group somewhat mimics the traditional UNIX API by providing similar interfaces to read(), write(), and their friends. -.SH Common functions +The basic functions consist of +.BR "jtrans_commit()" , " jtrans_add() " and " jtrans_rollback()" . +They provide a method for manipulating transactions, which are defined in the +.IR "jtrans structure" " (described above)." + +.SS STRUCTURES + +Both +.IR "struct jfs" " and " "struct jtrans" +are meant to be treated as opaque types, except for the fields documented +above, which you should treat as read-only. + +.B struct jfsck_result +holds the results of a +.B jfsck() +run; see below for details. + +.SS COMMON FUNCTIONS Most functions reference the structures described above, especially -struct jfs and struct jtrans. They represent a file to operate on and a single -transaction, respectively. To open a file, you should use the jopen() call, -which is just like the normal open() call but affects a pointer to a struct -jfs. To close a file, use jclose(). They're exactly like the open() and -close() functions but use a struct jfs instead of a file descriptor; take a -look at their manpages if you have any doubts about how to use them. - -There are two functions that differs from the rest, which are jfsck() and -jfsck_cleanup(). - -The first one, jfsck(), is used to perform journal checking and recovery in -case of a crash. It must be performed when nobody else is using the file (like -in the case of a filesystem which can't be mounted), and it returns 0 if -success or an error code != 0 in case of a failure.
If it succeeded, it will -fill jfsck_result summarizing the outcome of the operation. The error codes -can be either J_ENOENT (no such file), J_ENOJOURNAL (no journal associated -with that file) or J_ENOMEM (not enough free memory). There is also a program -named jiofsck which is just a simple human frontend to this function. - -The second, jfsck_cleanup(), is intended to be used after jfsck() by programs -wanting to remove all the stall transaction files and leave the journal -directory ready to use. After calling jfsck(), the transaction files will no -longer be needed, so by cleaning up the directory you make sure you're -starting over with a clean journal. It returns 0 if there was an error, or 1 -if it succeeded. - -.SH UNIX API - -The UNIX API, as explained before, consists of the functions jread(), -jpread(), jreadv(), jwrite(), jpwrite(), jwritev(), jtruncate(). In most cases -you will only need to use this, because they're simple and familiar. +.IR "struct jfs" " and " "struct jtrans" . +They represent a file to operate on and a single transaction, respectively. To +open a file, you should use the +.B jopen() +call, which is just like the normal +.B open(2) +call but takes a pointer to a +.IR "struct jfs" . +To close a file, use +.BR jclose() . +They're exactly like the +.BR open(2) " and " close(2) +functions but use a +.I struct jfs +instead of a file descriptor; take a look at their manpages if you have any +doubts about how to use them. + +There are two functions that differ from the rest, which are +.BR jfsck() " and " jfsck_cleanup() . + +The first one, +.BR jfsck() , +is used to perform journal checking and recovery in case of a crash. It must +be performed when nobody else is using the file (like in the case of a +filesystem which can't be mounted), and it returns 0 on success or a nonzero +error code on failure. If it succeeds, it fills the jfsck_result structure +summarizing the outcome of the operation.
The error codes can be either +.I J_ENOENT +(no such file), +.I J_ENOJOURNAL +(no journal associated with that file) or +.I J_ENOMEM +(not enough free memory). There is also a program named +.I jiofsck +which is just a simple human frontend to this function. + +The second, +.BR jfsck_cleanup() , +is intended to be used after +.B jfsck() +by programs wanting to remove all the stale transaction files and leave the +journal directory ready to use. After calling +.BR jfsck() , +the transaction files will no longer be needed, so by cleaning up the +directory you make sure you're starting over with a clean journal. It returns +0 if there was an error, or 1 if it succeeded. The aforementioned +.I jiofsck +can also optionally invoke this function after performing the regular checks. + +.SS UNIX-alike API + +The UNIX-alike API, as explained before, consists of the functions +.BR jread() ", " jpread() ", " jreadv() ", " jwrite() ", " jpwrite() ", " +.BR jwritev() ", " jtruncate() . They are all exactly like their UNIX equivalents (if you still don't get it, take the initial 'j' out), and behave the same way, with the only exception that -instead of a file descriptor you need to pass a pointer to a struct jfs (just -like jopen() and jclose()). Again, I will not duplicate the manpage for all -these functions, just refer to the regular UNIX versions to see how to use -them, they all have the same semantics and behave the same way. +instead of a file descriptor you need to pass a pointer to a +.IR "struct jfs" . +Again, I will not duplicate the manpage for all these functions; just refer to +the regular UNIX versions to see how to use them, as they all have the same +semantics and behave the same way. -.SH Basic functions +.SS BASIC FUNCTIONS The basic functions are the ones which manipulate transactions directly; they -are five: jtrans_init(), jtrans_add(), jtrans_commit(), jtrans_rollback() and -jtrans_free().
These are intended to be use in special situations where your -application needs direct control over the transactions. - -jtrans_init() and jtrans_free() just initialize and free a given transaction, -the former should be called prior any use, and the latter when you want to -destroy a transaction. Note that jtrans_free() is not a disk operation, but -only frees the pointers that were previously allocated by the library; all -disk operations are performed by the other two functions. They have no return -value. - -jtrans_add() is used to add operations to a transaction, and it takes the same -parameters as the pwrite() call. It gets a buffer, it's lenght and the offset -where it should be applied, and adds it to the transaction. You can add -multiple operations to a transaction, and they will be applied in order. -Operation within the same transaction must not overlap; if they do, commiting -the transaction will fail. - -jtrans_commit() is in charge of commiting the given transaction, and after its -return the data has been saved to the disk atomically. It returns the number -of bytes written or -1 if there was an error. - -jtrans_rollback() reverses a transaction that was applied with -jtrans_commit(), and leaves the file as it was before applying it. Be very -very careful with this function, it's quite dangerous if you don't know for -sure that you're doing the right thing. It returns as jtrans_commit(). - -.SH BUGS - -None that I'm aware of, but if you find one please let me know at -albertito@blitiri.com.ar. +are five: +.BR jtrans_init() ", " jtrans_add() ", " jtrans_commit() ", " jtrans_rollback() +and +.BR jtrans_free() . +These are intended to be used where your application requires direct control +over the transactions. + +.BR jtrans_init() " and " jtrans_free() +just initialize and free a given transaction; the former should be called +prior to any use, and the latter when you want to destroy a transaction.
Note +that +.B jtrans_free() +is not a disk operation, but only frees the pointers that were previously +allocated by the library; all disk operations are performed by the other two +functions. + +.B jtrans_add() +is used to add operations to a transaction, and it takes the same parameters +as +.BR pwrite() . +It gets a buffer, its length and the offset where it should be applied, and +adds it to the transaction. You can add multiple operations to a transaction, +and they will be applied in order. + +.B jtrans_commit() +commits the given transaction to disk. After it has returned, data has been +saved to the disk. It returns the number of bytes written or -1 if there was +an error. The commit operation is atomic with regard to other read or write +operations in other processes, as long as they all access the file via libjio. + +.B jtrans_rollback() +reverses a transaction that was applied with +.BR jtrans_commit() , +and leaves the file as it was before applying it. Be very careful with +this function; it's quite dangerous if you don't know for sure that you're +doing the right thing. It returns the number of bytes written or -1 if there +was an error. .SH SEE ALSO @@ -193,3 +203,10 @@ albertito@blitiri.com.ar. .BR pwrite (2), .BR ftruncate (2), .BR close (2) + +.SH BUGS + +None that I'm aware of. +If you want to report bugs, or have any questions or comments, just let me +know at albertito@blitiri.com.ar.
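The (buffer, count, offset) triple that jtrans_add() takes is the same one pwrite() uses. To show just that mapping, and the "bytes written or -1" return convention of jtrans_commit(), here is a sketch that applies an ordered list of such operations with plain pwrite(2). It has none of libjio's guarantees: there is no journal and no locking, so a crash or an error half-way leaves the file partially updated. struct plain_op and apply_ops() are hypothetical names, not libjio API.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>	/* open(2), for callers creating the file */
#include <sys/types.h>
#include <unistd.h>

/* One operation: the same (buffer, count, offset) triple that
 * jtrans_add() takes, mirroring pwrite(2). */
struct plain_op {
	const void *buf;
	size_t count;
	off_t offset;
};

/* Apply the operations in order with plain pwrite(2) and return the
 * total number of bytes written, or -1 on error. Unlike
 * jtrans_commit(), nothing here is journaled: a failure half-way
 * leaves the earlier operations applied and the later ones not. */
ssize_t apply_ops(int fd, const struct plain_op *ops, size_t nops)
{
	ssize_t total = 0;

	for (size_t i = 0; i < nops; i++) {
		ssize_t rv = pwrite(fd, ops[i].buf, ops[i].count, ops[i].offset);
		if (rv < 0 || (size_t) rv != ops[i].count)
			return -1;
		total += rv;
	}
	return total;
}
```

For example, applying { "Hello", 5, 0 } and then { "world", 5, 6 } to an empty file returns 10 and leaves an 11-byte file whose sixth byte is a zero-filled hole; closing exactly that gap between "applied in order" and "applied atomically" is what the journal is for.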
+ diff --git a/doc/libjio.html b/doc/libjio.html deleted file mode 100644 index e43e1d8..0000000 --- a/doc/libjio.html +++ /dev/null @@ -1,444 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> - -<!--Converted with LaTeX2HTML 2002-2-1 (1.70) -original version by: Nikos Drakos, CBLU, University of Leeds -* revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan -* with significant contributions from: - Jens Lippmann, Marek Rouchal, Martin Wilck and others --> -<HTML> -<HEAD> -<TITLE>libjio - A library for journaled I/O </TITLE> -<META NAME="description" CONTENT="libjio - A library for journaled I/O "> -<META NAME="keywords" CONTENT="libjio"> -<META NAME="resource-type" CONTENT="document"> -<META NAME="distribution" CONTENT="global"> - -<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> -<META NAME="Generator" CONTENT="LaTeX2HTML v2002-2-1"> -<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css"> - -<LINK REL="STYLESHEET" HREF="libjio.css"> - -</HEAD> - -<BODY > -<!--Navigation Panel--> -<IMG WIDTH="81" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next_inactive" - SRC="file:/usr/lib/latex2html/icons/nx_grp_g.png"> -<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" - SRC="file:/usr/lib/latex2html/icons/up_g.png"> -<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" - SRC="file:/usr/lib/latex2html/icons/prev_g.png"> -<BR> -<BR> -<BR> -<!--End of Navigation Panel--> - -<P> - -<P> - -<P> -<H1 ALIGN="CENTER">libjio - A library for journaled I/O </H1> -<DIV> - -<P ALIGN="CENTER"><STRONG>Alberto Bertogli (albertito@blitiri.com.ar) </STRONG></P> -</DIV> -<BR> - -<H2><A NAME="SECTION00010000000000000000"> -Contents</A> -</H2> -<!--Table of Contents--> - -<UL> -<LI><A NAME="tex2html12" - HREF="libjio.html#SECTION00020000000000000000">1 Introduction</A> -<LI><A NAME="tex2html13" - HREF="libjio.html#SECTION00030000000000000000">2 General on-disk data organization</A> -<UL> -<LI><A NAME="tex2html14" - 
HREF="libjio.html#SECTION00031000000000000000">2.1 The transaction file</A> -</UL> -<BR> -<LI><A NAME="tex2html15" - HREF="libjio.html#SECTION00040000000000000000">3 The commit procedure</A> -<LI><A NAME="tex2html16" - HREF="libjio.html#SECTION00050000000000000000">4 The rollback procedure</A> -<LI><A NAME="tex2html17" - HREF="libjio.html#SECTION00060000000000000000">5 The recovery procedure</A> -<LI><A NAME="tex2html18" - HREF="libjio.html#SECTION00070000000000000000">6 High-level functions</A> -<LI><A NAME="tex2html19" - HREF="libjio.html#SECTION00080000000000000000">7 ACID (or How does libjio fit into theory)</A> -<LI><A NAME="tex2html20" - HREF="libjio.html#SECTION00090000000000000000">8 Working from outside</A> -</UL> -<!--End of Table of Contents--> - -<P> - -<H1><A NAME="SECTION00020000000000000000"> -1 Introduction</A> -</H1> - -<P> -<I>libjio</I> is a library for doing journaled transaction-oriented -I/O, providing atomicity warantees and a simple to use but powerful -API. - -<P> -This document explains the design of the library, how it works internally -and why it works that way. You should read it even if you don't plan -to do use the library in strange ways, it provides (or at least tries -to =) an insight view on how the library performs its job, which can -be very valuable knowledge when working with it. - -<P> -To the user, libjio provides two groups of functions, one UNIX-alike -that implements the journaled versions of the classic functions (<I>open()</I>, -<I>read()</I>, <I>write()</I> and friends); and a lower-level one -that center on transactions and allows the user to manipulate them -directly by providing means of commiting and rollbacking. The former, -as expected, are based on the latter and interact safely with them. -Besides, it's designed in a way that allows efficient and safe interaction -with I/O performed from outside the library in case you want to. 
- -<P> -The following sections describe different concepts and procedures -that the library bases its work on. It's not intended to be a replace -to reading the source code: please do so if you have any doubts, it's -not big at all (less than 800 lines, including comments) and I hope -it's readable enough. If you think that's not the case, please let -me know and I'll try to give you a hand. - -<P> - -<H1><A NAME="SECTION00030000000000000000"> -2 General on-disk data organization</A> -</H1> - -<P> -On the disk, the file you are working on will look exactly as you -expect and hasn't got a single bit different that what you would get -using the regular API. But, besides the working file, you will find -a directory named after it where the journaling information lives. - -<P> -Inside, there are two kind of files: the lock file and transaction -files. The first one is used as a general lock and holds the next -transaction ID to assign, and there is only one; the second one holds -one transaction, which is composed by a header of fixed size and a -variable-size payload, and can be as many as in-flight transactions. - -<P> -This impose some restrictions to the kind of operations you can perform -over a file while it's currently being used: you can't move it (because -the journal directory name depends on the filename) and you can't -unlink it (for similar reasons). - -<P> -This warnings are no different from a normal simultaneous use under -classic UNIX environments, but they are here to remind you that even -tho the library warantees a lot and eases many things from its user -(specially from complex cases, like multiple threads using the file -at the same time), you should still be careful when doing strange -things with files while working on them. - -<P> - -<H2><A NAME="SECTION00031000000000000000"> -2.1 The transaction file</A> -</H2> - -<P> -The transaction file is composed of two main parts: the header and -the payload. 
- -<P> -The header holds basic information about the transaction itself, including -the ID, some flags, the offset to commit to and the lenght of the -data. The payload holds the data, in three parts: user-defined data, -previous data, and real data. - -<P> -User-defined data is not used by the library itself, but it's a space -where the user can save private data that can be useful later. Previous -data is saved by the library prior applying the commit, so transactions -can be rollbacked. Real data is just the data to save to the disk, -and it is saved because if a crash occurs when while we are applying -the transaction we can recover gracefuly. - -<P> - -<H1><A NAME="SECTION00040000000000000000"> -3 The commit procedure</A> -</H1> - -<P> -We call "commit" to the action of <I>safely</I> -and <I>atomically</I> write some given data to the disk. - -<P> -The former, <I>safely</I>, means that after a commit has been done -we can assume the data will not get lost and can be retrieved, unless -of course some major event happens (like a hardware failure). For -us, this means that the data was effectively written to the disk and -if a crash occurs after the commit operation has returned, the operation -will be complete and data will be available from the file. - -<P> -The latter, <I>atomically</I>, warantees that the operation is either -completely done, or not done at all. This is a really common word, -specially if you have worked with multiprocessing, and should be quite -familiar. We implement atomicity by combining fine-grained locks and -journaling, which can assure us both to be able to recover from crashes, -and to have exclusive access to a portion of the file without having -any other transaction overlap it. 
- -<P> -Well, so much for talking, now let's get real; libjio applies commits -in a very simple and straightforward way, inside <I>jtrans_commit()</I>: - -<P> - -<UL> -<LI>Lock the section where the commit takes place -</LI> -<LI>Open the transaction file -</LI> -<LI>Write the header -</LI> -<LI>Write the user data (if any) -</LI> -<LI>Read the previous data from the file -</LI> -<LI>Write the previous data in the transaction -</LI> -<LI>Write the data to the file -</LI> -<LI>Mark the transaction as commited by setting a flag in the header -</LI> -<LI>Unlink the transaction file -</LI> -<LI>Unlock the section where the commit takes place -</LI> -</UL> -This may look as a lot of steps, but they're not as much as it looks -like inside the code, and allows a recovery from interruptions in -every step of the way (or even in the middle of a step). - -<P> - -<H1><A NAME="SECTION00050000000000000000"> -4 The rollback procedure</A> -</H1> - -<P> -First of all, rollbacking is like ``undo'' a commit: return the -data to the state it had exactly before a given commit was applied. -Due to the way we handle commits, doing this operation becomes quite -simple and straightforward. - -<P> -In the previous section we said that each transaction held, besides -the data to commit to the disk, the data that was on it before commiting. -That data is saved precisely to be able to rollback. So, to rollback -a transaction all that has to be done is recover that ``previous -data'' from the transaction we want to rollback, and save it to the -disk. In the end, this ends up being a new transaction with the previous -data as the new one, so we do that: create a new transaction structure, -fill in the data from the transaction we want to rollback, and commit -it. All this is performed by <I>jtrans_rollback()</I>. - -<P> -By doing this we can provide the same warranties a commit has, it's -really fast, eases the recovery, and the code is simple and clean. -What a deal. 
- -<P> -But be aware that rollbacking is dangerous. And I really mean it: -you should <B><I>only</I></B> do it if you're really sure it's ok. -Consider, for instance, that you commit transaction A, then B, and -then you rollback A. If A and B happen to touch the same portion of -the file, the rollback will, of course, not return the state previous -to B, but previous to A. If it's not done safely, this can lead to -major corruption. Now, if you add to this transactions that extend -the file (and thus rollbacking truncates it back), you not only have -corruption but data loss. So, again, be aware, I can't stress this -enough, <B><I>rollback only if you really really know what -you are doing</I></B>. - -<P> - -<H1><A NAME="SECTION00060000000000000000"> -5 The recovery procedure</A> -</H1> - -<P> -Recovering from crashes is done by the <I>jfsck()</I> call (or the -program <I>jiofsck</I> which is just a simple invocation to that function), -which opens the file and goes through all transactions in the journal -(remember that transactions are removed from the journal directory -after they're applied), loading and rollbacking them if necessary. -There are several steps where it can fail: there could be no journal, -a given transaction file might be corrupted, incomplete, and so on; -but in the end, there are two cases regarding each transaction: either -it's complete and can be rollbacked, or not. - -<P> -In the case the transaction is not complete, there is no possibility -that it has been partially applied to the disk, remember that, from -the commit procedure, we only apply the transaction <I>after</I> saving -it in the journal, so there is really nothing left to be done. So -if the transaction is complete, we only need to rollback. - -<P> -In any case, after making the recovery you can simply remove the journal -entirely and let the library create a new one, and you can be sure -that transaction atomicity was preserved. 
- -<P> - -<H1><A NAME="SECTION00070000000000000000"> -6 High-level functions</A> -</H1> - -<P> -We call <I>high-level functions</I> to the ones provided by the library -that emulate the good old unix file manipulation calls. Most of them -are just wrappers around commits, and implement proper locking when -operating in order to allow simultaneous operations (either across -threads or processes). They are described in detail in the manual -pages, we'll only list them here for completion: - -<P> - -<UL> -<LI>jopen() -</LI> -<LI>jread(), jpread(), jreadv() -</LI> -<LI>jwrite(), jpwrite(), jwritev() -</LI> -<LI>jtruncate() -</LI> -<LI>jclose() -</LI> -</UL> - -<P> - -<H1><A NAME="SECTION00080000000000000000"> -7 ACID (or How does libjio fit into theory)</A> -</H1> - -<P> -I haven't read much theory about this, and the library was implemented -basically by common sense and not theorethical study. - -<P> -However, I'm aware that database people like ACID (well, that's not -news for anybody ;), which they say mean "Atomicity, Consistency, -Isolation, Durability" (yeah, right!). - -<P> -So, even libjio is not a purely database thing, it can be used to -achieve those attributes in a simple and efficient way. - -<P> -Let's take a look one by one: - -<P> - -<UL> -<LI>Atomicity: In a transaction involving two or more discrete pieces -of information, either all of the pieces are committed or none are. -This has been talked before and we've seen how the library achieves -this point, mostly based on locks and relying on a commit procedure. -</LI> -<LI>Consistency: A transaction either creates a new and valid state of -data, or, if any failure occurs, returns all data to its state before -the transaction was started. This, like atomicity, has been discussed -before, specially in the recovery section, when we saw how in case -of a crash we end up with a fully applied transaction, or no transaction -applied at all. 
-</LI> -<LI>Isolation: A transaction in process and not yet committed must remain -isolated from any other transaction. This comes as a side effect of -doing proper locking on the sections each transaction affect, and -guarantees that there can't be two transactions working on the same -section at the same time. -</LI> -<LI>Durability: Committed data is saved by the system such that, even -in the event of a failure and system restart, the data is available -in its correct state. For this point we rely on the disk as a method -of permanent storage, and expect that when we do syncronous I/O, data -is safely written and can be recovered after a crash. -</LI> -</UL> - -<P> - -<H1><A NAME="SECTION00090000000000000000"> -8 Working from outside</A> -</H1> - -<P> -If you want, and are careful enough, you can safely do I/O without -using the library. Here I'll give you some general guidelines that -you need to follow in order to prevent corruption. Of course you can -bend or break them according to your use, this is just a general overview -on how to interact from outside. - -<P> - -<UL> -<LI>Lock the sections you want to use: the library, as we have already -exposed, relies on fcntl locking; so, if you intend to operate on -parts on the file while using it, you should lock them. -</LI> -<LI>Don't tuncate, unlink or rename: these operations have serious implications -when they're done while using the library, because the library itself -assumes that names don't change, and files don't dissapear beneath -it. It could potentially lead to corruption, although most of the -time you would just get errors from every call. 
-</LI> -</UL> - -<P> - -<H1><A NAME="SECTION000100000000000000000"> -About this document ...</A> -</H1> - <STRONG>libjio - A library for journaled I/O </STRONG><P> -This document was generated using the -<A HREF="http://www.latex2html.org/"><STRONG>LaTeX</STRONG>2<tt>HTML</tt></A> translator Version 2002-2-1 (1.70) -<P> -Copyright © 1993, 1994, 1995, 1996, -<A HREF="http://cbl.leeds.ac.uk/nikos/personal.html">Nikos Drakos</A>, -Computer Based Learning Unit, University of Leeds. -<BR> -Copyright © 1997, 1998, 1999, -<A HREF="http://www.maths.mq.edu.au/~ross/">Ross Moore</A>, -Mathematics Department, Macquarie University, Sydney. -<P> -The command line arguments were: <BR> - <STRONG>latex2html</STRONG> <TT>-no_subdir -split 0 -show_section_numbers /tmp/lyx_tmpdir2441q5CgGo/lyx_tmpbuf0/libjio.tex</TT> -<P> -The translation was initiated by root on 2004-04-30<HR> -<!--Navigation Panel--> -<IMG WIDTH="81" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next_inactive" - SRC="file:/usr/lib/latex2html/icons/nx_grp_g.png"> -<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" - SRC="file:/usr/lib/latex2html/icons/up_g.png"> -<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" - SRC="file:/usr/lib/latex2html/icons/prev_g.png"> -<BR> -<!--End of Navigation Panel--> -<ADDRESS> -root -2004-04-30 -</ADDRESS> -</BODY> -</HTML> diff --git a/doc/libjio.lyx b/doc/libjio.lyx deleted file mode 100644 index cfa8df0..0000000 --- a/doc/libjio.lyx +++ /dev/null @@ -1,433 +0,0 @@ -#LyX 1.3 created this file. 
For more info see http://www.lyx.org/ -\lyxformat 221 -\textclass article -\language english -\inputencoding auto -\fontscheme default -\graphics default -\paperfontsize default -\papersize Default -\paperpackage a4 -\use_geometry 0 -\use_amsmath 0 -\use_natbib 0 -\use_numerical_citations 0 -\paperorientation portrait -\secnumdepth 3 -\tocdepth 3 -\paragraph_separation indent -\defskip medskip -\quotes_language english -\quotes_times 2 -\papercolumns 1 -\papersides 1 -\paperpagestyle default - -\layout Title - -libjio - A library for journaled I/O -\layout Author - -Alberto Bertogli (albertito@blitiri.com.ar) -\layout Standard - - -\begin_inset LatexCommand \tableofcontents{} - -\end_inset - - -\layout Section - -Introduction -\layout Standard - - -\emph on -libjio -\emph default - is a library for doing journaled transaction-oriented I/O, providing atomicity - warantees and a simple to use but powerful API. -\layout Standard - -This document explains the design of the library, how it works internally - and why it works that way. - You should read it even if you don't plan to do use the library in strange - ways, it provides (or at least tries to =) an insight view on how the library - performs its job, which can be very valuable knowledge when working with - it. - It assumes that there is some basic knowledge about how the library is - used, which can be found in the manpage or in the programmer's guide. -\layout Standard - -To the user, libjio provides two groups of functions, one UNIX-alike that - implements the journaled versions of the classic functions ( -\emph on -open() -\emph default -, -\emph on -read() -\emph default -, -\emph on -write() -\emph default - and friends); and a lower-level one that center on transactions and allows - the user to manipulate them directly by providing means of commiting and - rollbacking. - The former, as expected, are based on the latter and interact safely with - them. 
- Besides, it's designed in a way that allows efficient and safe interaction - with I/O performed from outside the library in case you want to. -\layout Standard - -The following sections describe different concepts and procedures that the - library bases its work on. - It's not intended to be a replace to reading the source code: please do - so if you have any doubts, it's not big at all (less than 1500 lines, including - comments) and I hope it's readable enough. - If you think that's not the case, please let me know and I'll try to give - you a hand. -\layout Section - -General on-disk data organization -\layout Standard - -On the disk, the file you are working on will look exactly as you expect - and hasn't got a single bit different that what you would get using the - regular API. - But, besides the working file, you will find a directory named after it - where the journaling information lives. - -\layout Standard - -Inside, there are two kind of files: the lock file and transaction files. - The first one is used as a general lock and holds the next transaction - ID to assign, and there is only one; the second one holds one transaction, - which is composed by a header of fixed size and a variable-size payload, - and can be as many as in-flight transactions. - -\layout Standard - -This impose some restrictions to the kind of operations you can perform - over a file while it's currently being used: you can't move it (because - the journal directory name depends on the filename) and you can't unlink - it (for similar reasons). - -\layout Standard - -This warnings are no different from a normal simultaneous use under classic - UNIX environments, but they are here to remind you that even tho the library - warantees a lot and eases many things from its user (specially from complex - cases, like multiple threads using the file at the same time), you should - still be careful when doing strange things with files while working on - them. 
- -\layout Subsection - -The transaction file -\layout Standard - -The transaction file is composed of two main parts: the header and the payload. -\layout Standard - -The header holds basic information about the transaction itself, including - the ID, some flags, and the amount of operations it includes. - Then the payload has all the operations one after the other, divided in - two parts: the first one includes static information about the operation - (the lenght of the data, the offset of the file where it should be applied, - etc.) and the data itself, which is saved by the library prior applying - the commit, so transactions can be rollbacked. -\layout Section - -The commit procedure -\layout Standard - -We call "commit" to the action of -\emph on -safely -\emph default - and -\emph on -atomically -\emph default - write some given data to the disk. -\layout Standard - -The former, -\emph on -safely -\emph default -, means that after a commit has been done we can assume the data will not - get lost and can be retrieved, unless of course some major event happens - (like a physical hard disk crash). - For us, this means that the data was effectively written to the disk and - if a crash occurs after the commit operation has returned, the operation - will be complete and data will be available from the file. -\layout Standard - -The latter, -\emph on -atomically -\emph default -, guarantees that the operation is either completely done, or not done at - all. - This is a really common word, specially if you have worked with multiprocessing -, and should be quite familiar. - We implement atomicity by combining fine-grained locks and journaling, - which can assure us both to be able to recover from crashes, and to have - exclusive access to a portion of the file without having any other transaction - overlap it. 
-\layout Standard - -Well, so much for talking, now let's get real; libjio applies commits in - a very simple and straightforward way, inside -\emph on -jtrans_commit() -\emph default -: -\layout Itemize - -Lock the file offsets where the commit takes place -\layout Itemize - -Open the transaction file -\layout Itemize - -Write the header -\layout Itemize - -Read all the previous data from the file -\layout Itemize - -Write the previous data in the transaction -\layout Itemize - -Write the data to the file -\layout Itemize - -Mark the transaction as commited by setting a flag in the header -\layout Itemize - -Unlink the transaction file -\layout Itemize - -Unlock the offsets where the commit takes place -\layout Standard - -This may look as a lot of steps, but they're not as much as it looks like - inside the code, and allows a recovery from interruptions in every step - of the way (or even in the middle of a step). -\layout Section - -The rollback procedure -\layout Standard - -First of all, rollbacking is like -\begin_inset Quotes eld -\end_inset - -undo -\begin_inset Quotes erd -\end_inset - - a commit: return the data to the state it had exactly before a given commit - was applied. - Due to the way we handle commits, doing this operation becomes quite simple - and straightforward. -\layout Standard - -In the previous section we said that each transaction held the data that - was on it before commiting. - That data is saved precisely to be able to rollback. - So, to rollback a transaction all that has to be done is recover that -\begin_inset Quotes eld -\end_inset - -previous data -\begin_inset Quotes erd -\end_inset - - from the transaction we want to rollback, and save it to the disk. - In the end, this ends up being a new transaction with the previous data - as the new one, so we do that: create a new transaction structure, fill - in the data from the transaction we want to rollback, and commit it. 
- All this is performed by -\emph on -jtrans_rollback() -\emph default -. -\layout Standard - -By doing this we can provide the same warranties a commit has, it's really - fast, eases the recovery, and the code is simple and clean. - What a deal. -\layout Standard - -But be aware that rollbacking is dangerous. - And I really mean it: you should -\series bold -\emph on -only -\series default -\emph default - do it if you're really sure it's ok. - Consider, for instance, that you commit transaction A, then B, and then - you rollback A. - If A and B happen to touch the same portion of the file, the rollback will, - of course, not return the state previous to B, but previous to A. - If it's not done safely, this can lead to major corruption. - Now, if you add to this transactions that extend the file (and thus rollbacking - truncates it back), you not only have corruption but data loss. - So, again, be aware, I can't stress this enough, -\series bold -\emph on -rollback only if you really really know what you are doing -\series default -\emph default -. -\layout Section - -The recovery procedure -\layout Standard - -Recovering from crashes is done by the -\emph on -jfsck() -\emph default - call (or the program -\emph on -jiofsck -\emph default - which is just a simple invocation to that function), which opens the file - and goes through all transactions in the journal (remember that transactions - are removed from the journal directory after they're applied), loading - and rollbacking them if necessary. - There are several steps where it can fail: there could be no journal, a - given transaction file might be corrupted, incomplete, and so on; but in - the end, there are two cases regarding each transaction: either it's complete - and can be rollbacked, or not. 
-\layout Standard - -In the case the transaction is not complete, there is no possibility that - it has been partially applied to the disk, remember that, from the commit - procedure, we only apply the transaction -\emph on -after -\emph default - saving it in the journal, so there is really nothing left to be done. - So if the transaction is complete, we only need to rollback. -\layout Standard - -In any case, after making the recovery you can simply remove the journal - entirely and let the library create a new one, and you can be sure that - transaction atomicity was preserved. -\layout Section - -UNIX API -\layout Standard - -We call -\emph on -UNIX API -\emph default - to the functions provided by the library that emulate the good old UNIX - file manipulation calls. - Most of them are just wrappers around commits, and implement proper locking - when operating in order to allow simultaneous operations (either across - threads or processes). - They are described in detail in the manual pages, we'll only list them - here for completion: -\layout Itemize - -jopen() -\layout Itemize - -jread(), jpread(), jreadv() -\layout Itemize - -jwrite(), jpwrite(), jwritev() -\layout Itemize - -jtruncate() -\layout Itemize - -jclose() -\layout Section - -ACID (or How does libjio fit into theory) -\layout Standard - -I haven't read much theory about this, and the library was implemented basically - by common sense and not theorethical study. - -\layout Standard - -However, I'm aware that database people like ACID (well, that's not news - for anybody ;), which they say mean "Atomicity, Consistency, Isolation, - Durability" (yeah, right!). - -\layout Standard - -So, even libjio is not a purely database thing, it can be used to achieve - those attributes in a simple and efficient way. 
- -\layout Standard - -Let's take a look one by one: -\layout Itemize - -Atomicity: In a transaction involving two or more discrete pieces of information -, either all of the pieces are committed or none are. - This has been talked before and we've seen how the library achieves this - point, mostly based on locks and relying on a commit procedure. -\layout Itemize - -Consistency: A transaction either creates a new and valid state of data, - or, if any failure occurs, returns all data to its state before the transaction - was started. - This, like atomicity, has been discussed before, specially in the recovery - section, when we saw how in case of a crash we end up with a fully applied - transaction, or no transaction applied at all. -\layout Itemize - -Isolation: A transaction in process and not yet committed must remain isolated - from any other transaction. - This comes as a side effect of doing proper locking on the sections each - transaction affect, and guarantees that there can't be two transactions - working on the same section at the same time. -\layout Itemize - -Durability: Committed data is saved by the system such that, even in the - event of a failure and system restart, the data is available in its correct - state. - For this point we rely on the disk as a method of permanent storage, and - expect that when we do syncronous I/O, data is safely written and can be - recovered after a crash. -\layout Section - -Working from outside -\layout Standard - -If you want, and are careful enough, you can safely do I/O without using - the library. - Here I'll give you some general guidelines that you need to follow in order - to prevent corruption. - Of course you can bend or break them according to your use, this is just - a general overview on how to interact from outside. 
- -\layout Itemize - -Lock the sections you want to use: the library, as we have already exposed, - relies on fcntl locking; so, if you intend to operate on parts on the file - while using it, you should lock them. - -\layout Itemize - -Don't tuncate, unlink or rename: these operations have serious implications - when they're done while using the library, because the library itself assumes - that names don't change, and files don't dissapear beneath it. - It could potentially lead to corruption, although most of the time you - would just get errors from every call. -\the_end diff --git a/doc/libjio.rst b/doc/libjio.rst new file mode 100644 index 0000000..32de123 --- /dev/null +++ b/doc/libjio.rst @@ -0,0 +1,244 @@ + +libjio - A library for journaled I/O +====================================== + +Introduction +------------ + +libjio is a library for doing journaled, transaction-oriented I/O, providing +atomicity guarantees and a simple-to-use but powerful API. + +This document explains the design of the library, how it works internally and +why it works that way. You should read it even if you don't plan to use the +library in strange ways: it provides (or at least tries to provide) an inside view of +how the library performs its job, which can be very valuable knowledge when +working with it. It assumes some basic knowledge of how the +library is used, which can be found in the manpage or in the programmer's +guide. + +To the user, libjio provides two groups of functions: a UNIX-alike one that +implements the journaled versions of the classic functions (open(), read(), +write() and friends), and a lower-level one that centers on transactions and +allows the user to manipulate them directly by providing means of committing +and rolling back. The former, as expected, are based on the latter and interact +safely with them.
Besides, it's designed in a way that allows efficient and +safe interaction with I/O performed from outside the library in case you want +to. + +The following sections describe different concepts and procedures that the +library bases its work on. It's not intended to replace reading the +source code: please do so if you have any doubts; it's not big at all (less +than 1500 lines, including comments) and I hope it's readable enough. If you +think that's not the case, please let me know and I'll try to give you a hand. + + +General on-disk data organization +--------------------------------- + +On the disk, the file you are working on will look exactly as you expect and +won't have a single bit different from what you would get using the regular +UNIX API. But, besides the working file, you will find a directory named after +it where the journaling information lives. + +Inside, there are two kinds of files: the lock file and transaction files. The +first one is used as a general lock and holds the next transaction ID to +assign, and there is only one; each of the others holds one transaction, which is +composed of a header of fixed size and a variable-size payload, and there are as +many of them as in-flight transactions. + +This imposes some restrictions on the kind of operations you can perform over a +file while it's currently being used: you can't move it (because the journal +directory name depends on the filename) and you can't unlink it (for similar +reasons). + +These warnings are no different from those for normal simultaneous use under classic +UNIX environments, but they are here to remind you that even though the library +guarantees a lot and eases many things for its user, you should still be +careful when doing strange things with files while working on them. + +The transaction file +~~~~~~~~~~~~~~~~~~~~ + +The transaction file is composed of two main parts: the header and the +payload.
+ +The header holds basic information about the transaction itself, including the +ID, some flags, and the number of operations it includes. Then the payload has +all the operations one after the other, divided into two parts: the first one +includes static information about the operation (the length of the data, the +offset in the file where it should be applied, etc.) and the data itself, +which is saved by the library prior to applying the commit, so transactions can +be rolled back. + +At the end of the transaction file, a checksum is stored to detect journal +corruption. + + +The commit procedure +-------------------- + +We call *commit* the action of safely and atomically writing some given data +to the disk. + +The former, "safely", means that after a commit has been done we can assume +the data will not get lost and can be retrieved, unless of course some major +event happens (like a physical hard disk crash). For us, this means that the +data was effectively written to the disk and if a crash occurs after the +commit operation has returned, the operation will be complete and data will be +available from the file. + +The latter, "atomically", guarantees that the operation is either completely +done, or not done at all. This is a really common term, especially if you have +worked with multiprocessing, and should be quite familiar. We implement +atomicity by combining fine-grained locks and journaling, which ensure both +that we can recover from crashes and that we have exclusive access to a +portion of the file without any other transaction overlapping it.
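The combination of journaling and saved previous data described above can be illustrated with a deliberately simplified, single-operation undo journal in C. This is a sketch under stated assumptions, not libjio's implementation. The journal path, the header layout (`struct jhdr`), the FNV-1a checksum and the function names `journal_save()`, `undo_commit()` and `undo_recover()` are all hypothetical. Locking, the "committed" flag and the fsync of the journal's directory are omitted.

What the sketch does show is the ordering that makes recovery possible. The previous data is synced to the journal before the real file is touched. As a result, a complete journal can always be rolled back, and an incomplete one can simply be discarded.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical journal header: NOT libjio's real on-disk format. */
struct jhdr {
    uint64_t off;      /* where the new data goes */
    uint64_t len;      /* length of the new data */
    uint64_t old_size; /* file size before the commit */
    uint64_t prev_len; /* bytes of previous data stored after the header */
};

/* FNV-1a, used here as a stand-in for the journal checksum. */
static uint32_t fnv1a(uint32_t h, const void *p, size_t n)
{
    const unsigned char *b = p;
    while (n--) { h ^= *b++; h *= 16777619u; }
    return h;
}

/* Save the previous contents of [off, off+len) plus a checksum, then fsync. */
static int journal_save(int fd, const char *jpath, size_t len, off_t off)
{
    struct stat st;
    struct jhdr h = { (uint64_t)off, len, 0, 0 };
    int jfd = -1, ret = -1;

    if (fstat(fd, &st) < 0) return -1;
    h.old_size = (uint64_t)st.st_size;
    if (st.st_size > off) {
        uint64_t avail = (uint64_t)(st.st_size - off);
        h.prev_len = avail < len ? avail : len;
    }
    unsigned char *prev = malloc(h.prev_len + 1);
    if (!prev) return -1;
    if (pread(fd, prev, h.prev_len, off) == (ssize_t)h.prev_len &&
        (jfd = open(jpath, O_WRONLY | O_CREAT | O_TRUNC, 0600)) >= 0) {
        uint32_t sum = fnv1a(fnv1a(2166136261u, &h, sizeof h), prev, h.prev_len);
        if (write(jfd, &h, sizeof h) == (ssize_t)sizeof h &&
            write(jfd, prev, h.prev_len) == (ssize_t)h.prev_len &&
            write(jfd, &sum, sizeof sum) == (ssize_t)sizeof sum &&
            fsync(jfd) == 0)
            ret = 0;
        close(jfd);
    }
    free(prev);
    return ret;
}

/* Journal first, then the real file, then drop the journal.
 * Returns bytes written or -1; a leftover journal is handled by undo_recover(). */
ssize_t undo_commit(int fd, const char *jpath, const void *buf, size_t len, off_t off)
{
    if (journal_save(fd, jpath, len, off) < 0) return -1;
    if (pwrite(fd, buf, len, off) != (ssize_t)len || fsync(fd) < 0) return -1;
    return unlink(jpath) < 0 ? -1 : (ssize_t)len;
}

/* Returns 0 if there was no journal, 1 if one was handled, -1 on error. */
int undo_recover(int fd, const char *jpath)
{
    struct stat st;
    struct jhdr h;
    uint32_t sum;
    unsigned char *prev = NULL;
    int ret = -1, jfd = open(jpath, O_RDONLY);

    if (jfd < 0) return 0;
    if (fstat(jfd, &st) < 0) goto out;
    if (read(jfd, &h, sizeof h) != (ssize_t)sizeof h ||
        (uint64_t)st.st_size != sizeof h + h.prev_len + sizeof sum)
        goto discard;
    if (!(prev = malloc(h.prev_len + 1))) goto out;
    if (read(jfd, prev, h.prev_len) != (ssize_t)h.prev_len ||
        read(jfd, &sum, sizeof sum) != (ssize_t)sizeof sum ||
        sum != fnv1a(fnv1a(2166136261u, &h, sizeof h), prev, h.prev_len))
        goto discard;
    /* Complete journal: restore the previous data and the old file size. */
    if (pwrite(fd, prev, h.prev_len, (off_t)h.off) != (ssize_t)h.prev_len) goto out;
    if (h.old_size < h.off + h.len && ftruncate(fd, (off_t)h.old_size) < 0) goto out;
    if (fsync(fd) < 0) goto out;
discard:
    /* Jumped to directly only for incomplete/corrupt journals: the real file
     * is written after the journal is synced, so it was never modified. */
    ret = unlink(jpath) < 0 ? -1 : 1;
out:
    close(jfd);
    free(prev);
    return ret;
}
```

Restoring the previous data is idempotent. If a crash interrupts undo_recover() itself, running it again yields the same file contents.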
+ +Well, so much for talking, now let's get real; libjio applies commits in a +very simple and straightforward way, inside jtrans_commit(): + + - Lock the file offsets where the commit takes place + - Open the transaction file + - Write the header + - Read all the previous data from the file + - Write the previous data in the transaction + - Write the data to the file + - Mark the transaction as committed by setting a flag in the header + - Unlink the transaction file + - Unlock the offsets where the commit takes place + +This may seem like a lot of steps, but they take less code than it looks +like, and they allow recovery from interruptions at every step of the +way, and even in the middle of a step. + + +The rollback procedure +---------------------- + +First of all, rolling back is like undoing a commit: it returns the data to the +state it had exactly before a given commit was applied. Due to the way we +handle commits, this operation becomes quite simple and straightforward. + +In the previous section we said that each transaction holds the data that was +in the affected region before committing. That saved data is precisely what we +need to be able to roll back. + +So, to roll back a transaction all that has to be done is to recover the +previous data from the transaction we want to roll back, and save it to the +disk. In the end, this is just a new transaction whose new data is the previous +data of the original one, and that's how it's done: create a new transaction +structure, fill in the data from the transaction we want to roll back, and +commit it. All this is performed by jtrans_rollback(). + +By doing this we can provide the same guarantees a commit has, it's really +fast, it eases the recovery, and the code is simple and clean. What a deal. + +But be aware that rolling back is dangerous. And I really mean it: you should +only do it if you're really sure it's ok. Consider, for instance, that you +commit transaction A, then B, and then you roll back A.
If A and B happen to +touch the same portion of the file, the rollback will, of course, not return +the state previous to B, but previous to A, discarding B's changes there too. + +If it's not done safely, this can lead to major corruption. Now, if you add to +this transactions that extend the file (and thus rolling back truncates it +back), it gets even worse. So, again, be aware, I can't stress this enough: +roll back only if you really really know what you are doing. + + +The recovery procedure +---------------------- + +Recovering from crashes is done by the jfsck() call (or the program *jiofsck*, +which is just a simple invocation of that function), which opens the file and +goes through all transactions in the journal (remember that transactions are +removed from the journal directory after they're applied), loading them and +rolling them back if necessary. There are several steps where it can fail: +there could be no journal, a given transaction file might be corrupted, +incomplete, and so on; but in the end, there are two cases regarding each +transaction: either it's complete and can be rolled back, or not. + +If the transaction file was not completely written, there is no +possibility that it has been partially applied to the disk: remember that, +as seen in the commit procedure, we only apply the transaction after saving it +in the journal, so there is really nothing left to be done. If the transaction +is complete, we only need to roll it back. + +In any case, after making the recovery you can simply remove the journal +entirely and let the library create a new one, and you can be sure that +transaction atomicity was preserved. You can use jfsck_cleanup() for that +purpose. + + +UNIX-alike API +-------------- + +We call UNIX-alike API the set of functions provided by the library that emulate +the good old UNIX file manipulation calls.
Most of them are just wrappers +around commits, and implement proper locking when operating in order to allow +simultaneous operations (either across threads or processes). They are +described in detail in the manual pages; we'll only list them here for +completeness: + + - jopen() + - jread(), jpread(), jreadv() + - jwrite(), jpwrite(), jwritev() + - jtruncate() + - jclose() + + +ACID guarantees +--------------- + +Database people like ACID (well, that's not news for anybody), which they say +stands for "Atomicity, Consistency, Isolation, Durability". + +So, even though libjio is not a purely database thing, its transactions provide +those properties. Let's take a look at them one by one: + +Atomicity +  In a transaction involving two or more discrete pieces of information, +  either all of the pieces are committed or none are. This has been discussed +  before and we've seen how the library achieves this point, mostly by relying +  on locks and on the commit procedure. + +Consistency +  A transaction either creates a new and valid state of data, or, if any +  failure occurs, returns all data to its state before the transaction was +  started. This, like atomicity, has been discussed before, especially in the +  recovery section, when we saw how in case of a crash we end up with a fully +  applied transaction, or no transaction applied at all. + +Isolation +  A transaction in progress and not yet committed must remain isolated from any +  other transaction. This comes as a side effect of doing proper locking on +  the sections each transaction affects, and guarantees that there can't be two +  transactions working on the same section at the same time. + +Durability +  Committed data is saved by the system such that, even in the event of a +  failure, the data is available in a correct state. To provide this, libjio +  relies on the disk as a method of permanent storage, and expects that when +  it does synchronous I/O, data is safely written and can be recovered after a +  crash.
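The Durability item assumes synchronous I/O. As a minimal POSIX sketch of what that means (not necessarily how libjio does it internally; opening the file with O_SYNC is another common approach), a write can only be considered durable once pwrite() has written every byte and fdatasync() has returned successfully. The helper name *durable_pwrite()* is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all of buf at offset, then flush it to stable
 * storage. Returns 0 on success, -1 on error (errno is set).
 */
int durable_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
	const char *p = buf;

	assert(buf != NULL || count == 0);
	while (count > 0) {
		/* pwrite() does not move the file pointer, so it is thread-friendly. */
		ssize_t n = pwrite(fd, p, count, offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;
		}
		/* Short writes are possible; continue with the remainder. */
		p += n;
		count -= (size_t)n;
		offset += n;
	}

	/* Only after this succeeds can the data be assumed to survive a crash. */
	return fdatasync(fd);
}
```

fdatasync() is used instead of fsync() because it only has to flush the data and the metadata needed to read it back, which is usually cheaper.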
+ + +Working from outside +-------------------- + +If you want, and are careful enough, you can safely use the library and still +do I/O using the regular UNIX calls. + +This section provides some general guidelines that you need to follow in order +to prevent corruption. Of course you can bend or break them according to your +use; this is just a general overview of how to interact from outside. + + - Lock the sections you want to use: the library, as mentioned before, +   relies on fcntl() locking; so, if you intend to operate on parts of the +   file while using it, you should lock them. + - Don't truncate, unlink or rename: these operations have serious +   implications when they're done while using the library, because the library +   itself assumes that names don't change, and files don't disappear from +   underneath it. It could potentially lead to corruption, although most of +   the time you would just get errors from every call. + + diff --git a/doc/libjio.txt b/doc/libjio.txt deleted file mode 100644 index 10797f8..0000000 --- a/doc/libjio.txt +++ /dev/null @@ -1,305 +0,0 @@ -libjio - A library for journaled I/O - -Alberto Bertogli (albertito@blitiri.com.ar) - -Table of Contents - -1 Introduction -2 General on-disk data organization - 2.1 The transaction file -3 The commit procedure -4 The rollback procedure -5 The recovery procedure -6 UNIX API -7 ACID (or How does libjio fit into theory) -8 Working from outside - - - -1 Introduction - -libjio is a library for doing journaled -transaction-oriented I/O, providing atomicity warantees -and a simple to use but powerful API. - -This document explains the design of the library, how -it works internally and why it works that way. You -should read it even if you don't plan to do use the -library in strange ways, it provides (or at least tries -to =) an insight view on how the library performs its -job, which can be very valuable knowledge when working -with it.
It assumes that there is some basic knowledge -about how the library is used, which can be found in -the manpage or in the programmer's guide. - -To the user, libjio provides two groups of functions, -one UNIX-alike that implements the journaled versions -of the classic functions (open(), read(), write() and -friends); and a lower-level one that center on -transactions and allows the user to manipulate them -directly by providing means of commiting and -rollbacking. The former, as expected, are based on the -latter and interact safely with them. Besides, it's -designed in a way that allows efficient and safe -interaction with I/O performed from outside the library -in case you want to. - -The following sections describe different concepts and -procedures that the library bases its work on. It's not -intended to be a replace to reading the source code: -please do so if you have any doubts, it's not big at -all (less than 1500 lines, including comments) and I -hope it's readable enough. If you think that's not the -case, please let me know and I'll try to give you a hand. - -2 General on-disk data organization - -On the disk, the file you are working on will look -exactly as you expect and hasn't got a single bit -different that what you would get using the regular -API. But, besides the working file, you will find a -directory named after it where the journaling -information lives. - -Inside, there are two kind of files: the lock file and -transaction files. The first one is used as a general -lock and holds the next transaction ID to assign, and -there is only one; the second one holds one -transaction, which is composed by a header of fixed -size and a variable-size payload, and can be as many as -in-flight transactions. 
- -This impose some restrictions to the kind of operations -you can perform over a file while it's currently being -used: you can't move it (because the journal directory -name depends on the filename) and you can't unlink it -(for similar reasons). - -This warnings are no different from a normal -simultaneous use under classic UNIX environments, but -they are here to remind you that even tho the library -warantees a lot and eases many things from its user -(specially from complex cases, like multiple threads -using the file at the same time), you should still be -careful when doing strange things with files while -working on them. - -2.1 The transaction file - -The transaction file is composed of two main parts: the -header and the payload. - -The header holds basic information about the -transaction itself, including the ID, some flags, and -the amount of operations it includes. Then the payload -has all the operations one after the other, divided in -two parts: the first one includes static information -about the operation (the lenght of the data, the offset -of the file where it should be applied, etc.) and the -data itself, which is saved by the library prior -applying the commit, so transactions can be rollbacked. - -3 The commit procedure - -We call "commit" to the action of safely and atomically -write some given data to the disk. - -The former, safely, means that after a commit has been -done we can assume the data will not get lost and can -be retrieved, unless of course some major event happens -(like a physical hard disk crash). For us, this means -that the data was effectively written to the disk and -if a crash occurs after the commit operation has -returned, the operation will be complete and data will -be available from the file. - -The latter, atomically, guarantees that the operation -is either completely done, or not done at all. This is -a really common word, specially if you have worked with -multiprocessing, and should be quite familiar. 
We -implement atomicity by combining fine-grained locks and -journaling, which can assure us both to be able to -recover from crashes, and to have exclusive access to a -portion of the file without having any other -transaction overlap it. - -Well, so much for talking, now let's get real; libjio -applies commits in a very simple and straightforward -way, inside jtrans_commit(): - -* Lock the file offsets where the commit takes place - -* Open the transaction file - -* Write the header - -* Read all the previous data from the file - -* Write the previous data in the transaction - -* Write the data to the file - -* Mark the transaction as commited by setting a flag in - the header - -* Unlink the transaction file - -* Unlock the offsets where the commit takes place - -This may look as a lot of steps, but they're not as -much as it looks like inside the code, and allows a -recovery from interruptions in every step of the way -(or even in the middle of a step). - -4 The rollback procedure - -First of all, rollbacking is like "undo" a commit: return -the data to the state it had exactly before a given -commit was applied. Due to the way we handle commits, -doing this operation becomes quite simple and straightforward. - -In the previous section we said that each transaction -held the data that was on it before commiting. That -data is saved precisely to be able to rollback. So, to -rollback a transaction all that has to be done is -recover that "previous data" from the transaction we want -to rollback, and save it to the disk. In the end, this -ends up being a new transaction with the previous data -as the new one, so we do that: create a new transaction -structure, fill in the data from the transaction we -want to rollback, and commit it. All this is performed -by jtrans_rollback(). - -By doing this we can provide the same warranties a -commit has, it's really fast, eases the recovery, and -the code is simple and clean. What a deal. 
- -But be aware that rollbacking is dangerous. And I -really mean it: you should only do it if you're really -sure it's ok. Consider, for instance, that you commit -transaction A, then B, and then you rollback A. If A -and B happen to touch the same portion of the file, the -rollback will, of course, not return the state previous -to B, but previous to A. If it's not done safely, this -can lead to major corruption. Now, if you add to this -transactions that extend the file (and thus rollbacking -truncates it back), you not only have corruption but -data loss. So, again, be aware, I can't stress this -enough, rollback only if you really really know what -you are doing. - -5 The recovery procedure - -Recovering from crashes is done by the jfsck() call (or -the program jiofsck which is just a simple invocation -to that function), which opens the file and goes -through all transactions in the journal (remember that -transactions are removed from the journal directory -after they're applied), loading and rollbacking them if -necessary. There are several steps where it can fail: -there could be no journal, a given transaction file -might be corrupted, incomplete, and so on; but in the -end, there are two cases regarding each transaction: -either it's complete and can be rollbacked, or not. - -In the case the transaction is not complete, there is -no possibility that it has been partially applied to -the disk, remember that, from the commit procedure, we -only apply the transaction after saving it in the -journal, so there is really nothing left to be done. So -if the transaction is complete, we only need to rollback. - -In any case, after making the recovery you can simply -remove the journal entirely and let the library create -a new one, and you can be sure that transaction -atomicity was preserved. - -6 UNIX API - -We call UNIX API to the functions provided by the -library that emulate the good old UNIX file -manipulation calls. 
Most of them are just wrappers -around commits, and implement proper locking when -operating in order to allow simultaneous operations -(either across threads or processes). They are -described in detail in the manual pages, we'll only -list them here for completion: - -* jopen() - -* jread(), jpread(), jreadv() - -* jwrite(), jpwrite(), jwritev() - -* jtruncate() - -* jclose() - -7 ACID (or How does libjio fit into theory) - -I haven't read much theory about this, and the library -was implemented basically by common sense and not -theorethical study. - -However, I'm aware that database people like ACID -(well, that's not news for anybody ;), which they say -mean "Atomicity, Consistency, Isolation, Durability" -(yeah, right!). - -So, even libjio is not a purely database thing, it can -be used to achieve those attributes in a simple and -efficient way. - -Let's take a look one by one: - -* Atomicity: In a transaction involving two or more - discrete pieces of information, either all of the - pieces are committed or none are. This has been - talked before and we've seen how the library achieves - this point, mostly based on locks and relying on a - commit procedure. - -* Consistency: A transaction either creates a new and - valid state of data, or, if any failure occurs, - returns all data to its state before the transaction - was started. This, like atomicity, has been discussed - before, specially in the recovery section, when we - saw how in case of a crash we end up with a fully - applied transaction, or no transaction applied at all. - -* Isolation: A transaction in process and not yet - committed must remain isolated from any other - transaction. This comes as a side effect of doing - proper locking on the sections each transaction - affect, and guarantees that there can't be two - transactions working on the same section at the same time. 
- -* Durability: Committed data is saved by the system - such that, even in the event of a failure and system - restart, the data is available in its correct state. - For this point we rely on the disk as a method of - permanent storage, and expect that when we do - syncronous I/O, data is safely written and can be - recovered after a crash. - -8 Working from outside - -If you want, and are careful enough, you can safely do -I/O without using the library. Here I'll give you some -general guidelines that you need to follow in order to -prevent corruption. Of course you can bend or break -them according to your use, this is just a general -overview on how to interact from outside. - -* Lock the sections you want to use: the library, as we - have already exposed, relies on fcntl locking; so, if - you intend to operate on parts on the file while - using it, you should lock them. - -* Don't tuncate, unlink or rename: these operations - have serious implications when they're done while - using the library, because the library itself assumes - that names don't change, and files don't dissapear - beneath it. It could potentially lead to corruption, - although most of the time you would just get errors - from every call. diff --git a/doc/layout b/doc/source_layout similarity index 100% rename from doc/layout rename to doc/source_layout diff --git a/doc/threads b/doc/threads deleted file mode 100644 index bf4b6fb..0000000 --- a/doc/threads +++ /dev/null @@ -1,20 +0,0 @@ - -The library is entirely threadsafe. - -This will make some people who worked with threads a bit concerned, because -everybody knows that if a file descriptor is shared among threads, and two -threads decide to read/write/perform any op that moves the file pointer, a -mess is waiting to happen. And almost operations do touch the file pointer. 
- -But don't worry, the library is _truly_ threadsafe: it uses pread/pwrite, -which do not touch the file pointer, and allows working on the same file -simultaneously without concerns. Besides, it slightly improves performance by -having less locking, less system calls, lower overhead and less calculation to -perform the operation. - - -Still, bear in mind that if you decide to work on the file outside libjio you -need to lockf() the sections you're going to work on, because libjio relies on -lockf() locking to warantee atomicity. - - diff --git a/doc/tids b/doc/tids deleted file mode 100644 index 43f831f..0000000 --- a/doc/tids +++ /dev/null @@ -1,83 +0,0 @@ - -Transaction ID assignment procedure -Alberto Bertogli (albertito@blitiri.com.ar) -4/October/2004 ---------------------------------------------- - -This brief document describes how libjio assigns an unique number to each -transaction that identifies it univocally during its lifetime. - -It is a very delicate issue, because the rest of the library depends on the -uniqueness of the ID; it has to be coherent across threads and procesess; and -it can't take long: it serializes transaction creation (and it's the only -contention point for independent non-overlapping transactions). - - -Description ------------ - -We have two functions: get_tid() and free_tid(), which return a new -transaction ID, and mark a given transaction ID as no longer in use, -respectively. - -The main piece of the mechanism is the lockfile: a file named "lock" which -holds the maximum transaction ID in use. This file gets opened and mmap()'ed -for faster use inside jopen(). That way, we can treat it directly as an -integer holding the max tid. - -To avoid parallel modifications, we will always lock the file with fcntl() -before accessing it. - -Let's begin by describing how get_tid() works, because it's quite simple: it -locks the lockfile, gets the max tid, adds 1 to it, unlock the file and return -that value. 
That way, the new tid is always the new max, and with the locking -we can be sure it's impossible to assign the same tid to two different -transactions. - -After a tid has been assigned, the commit process will create a file named -after it inside the journal directory. Then, it will operate on that file all -it wants, and when the moment comes, the transaction is no longer needed and -has to be freed. - -The first thing we do is to unlink that transaction file. And then, we call -free_tid(), which will update the lockfile to represent the new max tid, in -case it has changed. - -free_tid() begins by checking that if the transaction we're freeing is the -greatest, and if not, just returns. - -But if it is, we need to find out the new max tid. We do it by "walking" the -journal directory looking for the file with the greatest number, and that's -our new max tid. If there are no files, we use 0. - - -Things to notice ----------------- - -The following is a list of small things to notice about the mechanism. They're -useful because races tend to be subtle, and I _will_ forget about them. The -descriptions are not really detailed, just enough to give a general idea. - - -* It is possible that we get in free_tid() and the transaction we want to free -is greater than the max tid. In that case, we do nothing: it's a valid -situation. How to get there: two threads about to free two tids. The first one -calls unlink() and just after its return (before it gets a chance to call -free_tid()), another thread, the holder of the current max, steps in and -performs both the unlink() and free_tid(), which would force a lookup to find -a new tid, and as in the first thread we have removed the file, the max tid -could be lower (in particular, it could be 0). This is why we only test for -equalty. - -* Unlink after free_tid() is not desirable: in that case, it'd be normal for -the tid to increment even if we have only one thread writing. It overflows -quite easily. 
- -* The fact that new tids are always bigger than the current max is not only -because the code is cleaner and faster: that way when recovering we know the -order to apply transactions. A nice catch: this doesn't matter if we're -working with non-overlapping transactions, but if they overlap, we know that -it's impossible that transaction A and B (B gets committed after A) get -applied in the wrong order, because B will only begin to commit _after_ A has -been worked on. - diff --git a/doc/tids.rst b/doc/tids.rst new file mode 100644 index 0000000..4b21416 --- /dev/null +++ b/doc/tids.rst @@ -0,0 +1,76 @@ +Transaction ID assignment procedure +=================================== + +This brief document describes how libjio assigns a unique number to each +transaction, which identifies it during its whole lifetime. + +It is a very delicate issue, because the rest of the library depends on the +uniqueness of the ID. An ID has to be coherent across threads and processes, +and choosing one can't take long: it serializes transaction creation (and +it's the only contention point for independent non-overlapping transactions). + + +Description +----------- + +We have two functions: *get_tid()* and *free_tid()*, which respectively return +a new transaction ID, and mark a given transaction ID as no longer in use. + +The main piece of the mechanism is the lockfile: a file named *lock* which +holds the maximum transaction ID in use. This file gets opened and mmap()'ed +for faster use inside *jopen()*. That way, we can treat it directly as an +integer holding the max tid. + +To avoid parallel modifications, we will always lock the file with *fcntl()* +before accessing it. + +Let's begin by describing how *get_tid()* works, because it's quite simple: it +locks the lockfile, gets the max tid, adds 1 to it, stores the result as the +new max, unlocks the file and returns that value.
That way, the new tid is always the new max, and with the locking +we can be sure it's impossible to assign the same tid to two different +transactions. + +After a tid has been assigned, the commit process will create a file named +after it inside the journal directory. Then, it will operate on that file all +it wants, and when the transaction is no longer needed, it has to be +freed. + +The first thing we do is to unlink that transaction file. And then, we call +*free_tid()*, which will update the lockfile to represent the new max tid, in +case it has changed. + +*free_tid()* begins by checking if the transaction we're freeing is the +greatest, and if it isn't, it just returns. + +But if it is, we need to find out the new max tid. We do it by "walking" the +journal directory looking for the file with the greatest number, and that's +our new max tid. If there are no files, we use 0. + + +Things to notice +---------------- + +The following is a list of small things to notice about the mechanism. They're +useful because races tend to be subtle, and I *will* forget about them. The +descriptions are not really detailed, just enough to give a general idea. + + - It is possible that we get into *free_tid()* and the transaction we want to +   free is greater than the max tid. In that case, we do nothing: it's a valid +   situation. How to get there: two threads about to free two tids. The first +   one calls *unlink()* and just after its return (before it gets a chance to +   call *free_tid()*), another thread, the holder of the current max, steps in +   and performs both the *unlink()* and *free_tid()*, which forces a lookup to +   find the new max tid; since the first thread has already removed its file, +   the new max could be lower than its tid (in particular, it could be 0). +   This is why we only test for equality. + - Unlinking after *free_tid()* is not desirable: in that case, it'd be normal +   for the tid to increment even if we have only one thread writing.
It +   would overflow quite easily. + - The fact that new tids are always bigger than the current max is not only +   because the code is cleaner and faster: that way, when recovering, we know +   the order in which to apply transactions. A nice catch: this doesn't matter +   if we're working with non-overlapping transactions, but if they overlap, we +   know that it's impossible for transactions A and B (B gets committed after +   A) to be applied in the wrong order, because B will only begin to commit +   *after* A has been worked on. +
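The *get_tid()*/*free_tid()* mechanism described in tids.rst can be sketched in C as follows. This is a minimal illustration under stated assumptions, not libjio's actual code: the *tid_state* structure and *tid_init()* are hypothetical, the lock file is assumed to hold a single native-endian unsigned int, and transaction files are assumed to be named with the bare decimal tid. Note also that fcntl() record locks belong to the process, so they serialize different processes but not threads of the same process; a real implementation would need an additional in-process lock (for example a pthread mutex), which is left out here.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical per-file state; names are illustrative, not libjio's. */
struct tid_state {
	int lockfd;              /* the journal's "lock" file */
	unsigned int *max_tid;   /* lock file contents, mmap()'ed */
	const char *journal_dir; /* directory holding transaction files */
};

/* Take (F_WRLCK) or release (F_UNLCK) a lock on the whole lock file. */
static int lock_file(int fd, short type)
{
	struct flock fl = { 0 };

	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;            /* 0 means "up to the end of the file" */
	return fcntl(fd, F_SETLKW, &fl);
}

/* Open and map the lock file, roughly what the text says jopen() does. */
int tid_init(struct tid_state *ts, const char *journal_dir,
		const char *lockpath)
{
	void *p = MAP_FAILED;

	ts->journal_dir = journal_dir;
	ts->lockfd = open(lockpath, O_RDWR | O_CREAT, 0600);
	if (ts->lockfd < 0)
		return -1;
	/* A new lock file is zero-filled, so the initial max tid is 0. */
	if (ftruncate(ts->lockfd, sizeof(unsigned int)) == 0)
		p = mmap(NULL, sizeof(unsigned int), PROT_READ | PROT_WRITE,
				MAP_SHARED, ts->lockfd, 0);
	if (p == MAP_FAILED) {
		close(ts->lockfd);
		return -1;
	}
	ts->max_tid = p;
	return 0;
}

/* New tid = old max + 1, which also becomes the new max. 0 means error. */
unsigned int get_tid(struct tid_state *ts)
{
	unsigned int tid;

	if (lock_file(ts->lockfd, F_WRLCK) != 0)
		return 0;
	tid = *ts->max_tid + 1;
	*ts->max_tid = tid;
	lock_file(ts->lockfd, F_UNLCK);
	return tid;
}

/* Call after unlinking the transaction file named after tid. */
void free_tid(struct tid_state *ts, unsigned int tid)
{
	if (lock_file(ts->lockfd, F_WRLCK) != 0)
		return;

	/* Equality test only: tid may legitimately be above the max. */
	if (tid == *ts->max_tid) {
		unsigned int new_max = 0;
		DIR *dir = opendir(ts->journal_dir);
		struct dirent *de;

		/* Walk the journal looking for the greatest numeric name. */
		while (dir != NULL && (de = readdir(dir)) != NULL) {
			char *end;
			unsigned long n = strtoul(de->d_name, &end, 10);

			/* Skip "lock", "." and ".." (not fully numeric). */
			if (end == de->d_name || *end != '\0')
				continue;
			if (n > new_max)
				new_max = (unsigned int)n;
		}
		if (dir != NULL)
			closedir(dir);
		*ts->max_tid = new_max;
	}

	lock_file(ts->lockfd, F_UNLCK);
}
```

With this scheme, freeing a tid that is not the current maximum never touches the lock file, which is why the race described in the first item of "Things to notice" is harmless.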