author | Alberto Bertogli
<albertito@gmail.com> 2007-01-09 02:50:30 UTC |
committer | Alberto Bertogli
<albertito@gmail.com> 2007-01-09 02:50:30 UTC |
parent | 17865b5ffa24fba22db6ceca63b99decedb01498 |
doc/guide.rst | +408 | -0 |
diff --git a/doc/guide.rst b/doc/guide.rst
new file mode 100644
index 0000000..f0de916
--- /dev/null
+++ b/doc/guide.rst
@@ -0,0 +1,408 @@
+
+================
+nmdb User Guide
+================
+:Author: Alberto Bertogli (albertito@gmail.com)
+
+
+Introduction
+============
+
+nmdb_ is a simple and fast cache and database for TIPC_ clusters. It allows
+applications in the cluster to use a centralized, shared cache and database in
+a very easy way. It stores *(key, value)* pairs, with each key having only one
+associated value.
+
+This document explains how to set up nmdb and gives a simple guide to writing
+clients. It also includes a "quick start" section for the anxious.
+
+
+Installing nmdb
+===============
+
+If you installed nmdb using your Linux distribution package system, you can
+skip this section entirely.
+
+
+Prerequisites
+-------------
+
+Before you install nmdb, you will need the following software:
+
+- libevent_, a library for fast event handling.
+- `Linux kernel`_ 2.6.16 or newer, compiled with TIPC_ support.
+- QDBM_, for the database backend.
+
+
+Compiling and installing
+------------------------
+
+There are three components of the nmdb tarball: the server in the *nmdb/*
+directory, the C library in *libnmdb/*, and the Python module in *python/*.
+
+- To install the server, run ``cd nmdb; make install``.
+- To install the C library, run ``cd libnmdb; make install; ldconfig``.
+- To install the Python module, run ``cd python; python setup.py install``.
+
+
+Quick start
+===========
+
+For a very quick start, using a single host, you can do the following::
+
+  # dpmgr create /tmp/nmdb-db      # create the backend database
+  # nmdb -d /tmp/nmdb-db           # start the server
+
+At this point you have created a database and started the server.
+An easy and simple way to test it is to use the Python module, like this::
+
+  # python
+  Python 2.5 (r25:51908, Sep 21 2006, 20:38:23)
+  [GCC 4.1.1 (Gentoo 4.1.1)] on linux2
+  Type "help", "copyright", "credits" or "license" for more information.
+  >>> import nmdb                     # import the module
+  >>> db = nmdb.DB()                  # connect to the server
+  >>> db['x'] = 1                     # store some data
+  >>> db[(1, 2)] = (2, 6)
+  >>> print db['x'], db[(1, 2)]       # retrieve the values
+  1 (2, 6)
+  >>> del db['x']                     # delete from the database
+
+Everything should have worked as shown, and you are now ready to use an
+nmdb application, or develop your own.
+
+If you want to use this with several machines, read the next section to find
+out how to set up a simple TIPC cluster.
+
+
+TIPC setup
+==========
+
+If you want to use the server and the clients on different machines, you need
+to set up your TIPC network. If you just want to run everything on one machine,
+or you already have a TIPC network set up, you can skip this section.
+
+Before we begin, all the machines should already be connected in an Ethernet
+LAN, and have the tipc-config application, which should come with your Linux
+distribution in a package named "tipcutils" or similar (if it doesn't, you
+can find it at http://tipc.sourceforge.net/download.html).
+
+The only thing you will need to do is assign each machine a TIPC address and
+specify which interface to use for the network connection. You do it like
+this::
+
+  # tipc-config -a=1.1.10 -be=eth:eth0
+
+The *-a* parameter specifies the address, and *-be* the type and name of the
+interface to use.
+
+Addresses are composed of three integers. They represent the zone number, the
+cluster number, and the node number respectively. The zone number and cluster
+number should be the same for all nodes in your network, so you should change
+only the last one for each machine. Each machine can have only one address.
+
+That should be enough to get you started for a small network.
+If you have a very big network, or want to use some of the advanced TIPC
+features like link redundancy, you should read TIPC's docs.
+
+
+Example
+-------
+
+If you have five machines, you can assign each one its address like this::
+
+  box1# tipc-config -a=1.1.1 -be=eth:eth0
+  box2# tipc-config -a=1.1.2 -be=eth:eth0
+  box3# tipc-config -a=1.1.3 -be=eth:eth0
+  box4# tipc-config -a=1.1.4 -be=eth:eth0
+  box5# tipc-config -a=1.1.5 -be=eth:eth0
+
+
+Starting the server
+===================
+
+Before starting the server, there are some things you need to know about it:
+
+Port numbers
+  Each server instance in your network (even the ones running on the same
+  machine) should get a **unique** port to listen to requests. Ports identify
+  an application instance inside the whole network, not just the machine as in
+  TCP/IP.
+
+  The port space is very large, and it's private to nmdb, so you can
+  choose numbers without fear of colliding with other TIPC applications. The
+  default port is 10.
+
+  So, if you are going to start more than one nmdb server, **be careful**. If
+  you assign two active servers the same port you will get no error, but
+  everything will behave unpredictably.
+
+Cache size
+  nmdb's cache is a main component of the server. In fact you can use it
+  exclusively for caching purposes, like memcached_. So the size becomes an
+  important issue if you have performance requirements.
+
+  The cache size can only be limited by the maximum number of objects
+  in the cache.
+
+Backend database
+  You will need to create a backend database using QDBM_'s utilities. This is
+  quite simple: just run ``dpmgr create /path/to/the/database`` and you're
+  done.
+
+  If for some reason (hardware failure, for instance) the database becomes
+  corrupt, you should use QDBM's utilities to fix it. It shouldn't happen, so
+  it's a good idea to report it if it does.
+
+  QDBM databases are not meant to be shared among processes, so avoid having
+  other processes using them.
+
+Database redundancy
+  If you want to have redundancy over the database, you can start a "passive
+  server" alongside a normal one using the same port number. It will listen to
+  database requests and act upon them, but it will not send any replies.
+
+  It is only useful to keep a live mirror of the database. Note that it does
+  not do replication or failure detection; it's just a mirror.
+
+  This is the only case where you want to start two servers with the same port.
+
+Distributed queries
+  If you have more than one server in the network, the library can distribute
+  the queries among them. This is entirely done on the client side and the
+  server doesn't know about it.
+
+
+Now that you know all that, starting a server should be quite simple: first
+create the database as explained above, and then run the daemon with
+``nmdb -d /path/to/the/database``.
+
+To change the port, use ``-l port``; to change the cache size, use ``-c nobj``
+(where *nobj* is the number of objects in thousands); to make the server
+passive, use ``-p``. Of course you won't remember all that (I know I don't),
+that's why ``-h`` is your friend.
+
+Nothing prevents you from starting more than one server on the same machine,
+so be careful to select different ports and databases for each one.
+
+
+Example
+-------
+
+Following the previous example, if you want to start three servers you can do
+it like this::
+
+  box1# nmdb -d /var/lib/nmdb/db-1 -l 11
+  box2# nmdb -d /var/lib/nmdb/db-2 -l 12
+  box3# nmdb -d /var/lib/nmdb/db-3 -l 13
+
+
+Writing clients
+===============
+
+At the moment you can write clients in C (documented in the *libnmdb*'s
+manpage) and in Python (documented using Python docstrings). In this guide we
+will give some examples of common use as an introduction; you should consult
+the appropriate documentation when doing serious development.
+
+Before we begin, you should know about the following things:
+
+Thread safety
+  While the library itself is thread safe, neither the C library connections
+  nor the Python objects are. So don't share *nmdb_t* variables (C) or
+  *nmdb.** objects (Python) among threads; instead, create one for each thread
+  that needs it.
+
+Available operations
+  You can request the server to do three operations: *set* a value to a key,
+  *get* the value associated with the given key, and *delete* a given key
+  (with its associated value).
+
+Request modes
+  For each operation, you will have three different modes available:
+
+  - A *normal mode*, which makes the operation impact on the database
+    asynchronously (i.e. the functions return right after the operation was
+    queued, there is no completion notification).
+  - A *synchronous mode* similar to the previous one, but when the functions
+    return, the operation has hit the disk.
+  - A *cache-only mode* where the operations do not impact the database, only
+    the cache, and can be used to implement distributed caching in a similar
+    way to memcached_.
+
+  Be careful with the last one, because mixing cache-only with database
+  operations is a recipe for disaster.
+
+Atomicity and coherence
+  All operations are atomic, and synchronous and asynchronous operations are
+  fully coherent.
+
+Distributed queries
+  You can distribute your queries among several servers, and this is entirely
+  done on the client side. To do this, you should add each server (identified
+  by its port number) to the connection **before beginning to interact with
+  them**.
+
+
+For all examples we will assume that you have three servers running in your
+network, on ports 11, 12 and 13.
+
+
+The Python module
+------------------
+
+The Python module is quite easy to use, because its interface is very
+similar to a dictionary.
+It has similar limitations regarding the key (it must
+be an object you can use as a key in a dictionary), and the values must be
+picklable objects (see the *pickle* module documentation for more information).
+In short, you should only use numbers, strings or tuples as keys, and simple
+objects as values, unless you know what you are doing.
+
+To start a connection to the servers, you must first decide which mode you are
+going to use: the normal database-backed mode, database-backed with
+synchronous access, or cache only. Let's say you want to use the normal mode
+and connect to the server at port 11, and then add the other two servers::
+
+  import nmdb
+  db = nmdb.DB(11)
+  db.add_server(12)
+  db.add_server(13)
+
+Now you're ready to use it. Let's suppose you want to write a recursive
+function to calculate the factorial of a number. Before doing the
+calculation, you can check whether the result is already in the
+database to avoid recalculating it::
+
+  def fact(n):
+      if n == 1:
+          return 1
+      if n in db:
+          return db[n]
+
+      result = n * fact(n - 1)
+      db[n] = result
+      return result
+
+That was easy, wasn't it? You can use the same trick for SQL queries, complex
+distributed calculations, geographical data processing, whatever you want.
+
+Now let's have some fun and do something a little advanced: a decorator for a
+distributed function cache. If Python magic scares you, look away and skip to
+the next section.
+
+Some functions (usually the mathematical ones) have the property that the
+value they return depends only on the parameters, and not on the context. So
+they can be cached, using the parameters as keys, with the function's result
+as their associated values. Applying this technique is commonly known as
+*memoization*, and when we apply it to a function we say we're *memoizing* it.
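To make the idea concrete, here is a minimal sketch of memoization with a plain local dictionary. It is not part of nmdb: the names ``memoize``, ``fib`` and ``calls`` are made up for this illustration, and the sketch only handles positional, hashable arguments.

```python
def memoize(f):
    cache = {}                      # private to this process
    def newf(*args):
        if args not in cache:       # first call with these arguments
            cache[args] = f(*args)
        return cache[args]
    return newf

calls = []                          # records every real invocation of fib

@memoize
def fib(n):
    calls.append(n)
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```

After ``fib(30)`` returns, ``calls`` holds one entry per distinct argument instead of one per recursive call, which is the whole point of the technique.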
+
+We can use a local dictionary to cache the data, but that would mean we would
+have to write some cache management code to avoid using too much memory, and,
+worst of all, each instance of the code running in the network would have its
+own private cache and could not reuse calculations performed by other instances.
+Instead, we can use nmdb to make a cache that is shared across the network.
+
+The functions are usually restricted to using simple types as input, like
+numbers, strings, tuples or dictionaries. We will take advantage of this by
+using as a key to the cache the string ``<function module>-<function
+name>-<positional arguments>-<keyword arguments>``, where the arguments are
+given by their string representation. So to cache an invocation
+like ``mod.f(1, (2, 6))`` that returns ``26``, we want to have the following
+association in the database: ``mod-f-(1, (2, 6))-{} = 26``.
+
+We will use nmdb in cache-only mode, where the things we store are not saved
+permanently to a database, but live in the server's memory. This is very
+similar to what we did before, and has the advantage of not having to write
+our own cache management routines::
+
+  import nmdb
+  db = nmdb.Cache(11)
+  db.add_server(12)
+  db.add_server(13)
+
+Let's write the decorator::
+
+  def shared_memoize(f):
+      def newf(*args, **kwargs):
+          key = '%s-%s-%s-%s' % (f.__module__, f.__name__,
+                  repr(args), repr(kwargs))
+          if key in db:
+              return db[key]
+          r = f(*args, **kwargs)
+          db[key] = r
+          return r
+      return newf
+
+Now we can use it with a normal implementation of the recursive factorial
+function like we did before, and a function that calculates tetrations_::
+
+  @shared_memoize
+  def fact(n):
+      if n == 1:
+          return 1
+      return n * fact(n - 1)
+
+  @shared_memoize
+  def tetration(a, b):
+      if b == 1:
+          return a
+      return pow(a, tetration(a, b - 1))
+
+As you can see, the module is very easy to use, but you can do useful things
+with it. For more information you can read the module's built-in
+documentation.
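One detail of the decorator is worth a closer look: the key embeds ``repr(kwargs)``, so the same call written with its keyword arguments in a different order may end up under a different key. The sketch below shows one possible way to build an order-independent key. It is only an illustration under stated assumptions: ``make_key`` and ``area`` are hypothetical names, not part of nmdb, and a plain dictionary stands in for the ``nmdb.Cache`` object so the example runs without a server.

```python
def make_key(f, args, kwargs):
    # Sort the keyword arguments by name so that f(a=1, b=2) and
    # f(b=2, a=1) produce the same key.
    return '%s-%s-%s-%s' % (f.__module__, f.__name__,
                            repr(args), repr(sorted(kwargs.items())))

db = {}  # stand-in for nmdb.Cache(...); an assumption for offline testing

def shared_memoize(f):
    def newf(*args, **kwargs):
        key = make_key(f, args, kwargs)
        if key in db:
            return db[key]
        r = f(*args, **kwargs)
        db[key] = r
        return r
    return newf

@shared_memoize
def area(width, height):
    return width * height
```

With this variant, ``area(width=2, height=3)`` and ``area(height=3, width=2)`` share a single cache entry. A positional call such as ``area(2, 3)`` still gets its own key, because the sketch does not try to map positional arguments to parameter names.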
+
+
+The C library
+-------------
+
+The C library is in essence similar to the Python module, so we won't make a
+very long example here, only a brief display of the available functions.
+
+Let's begin by creating an "nmdb descriptor", which is of type *nmdb_t*, and
+connecting it to your three servers::
+
+  unsigned char *key, *val;
+  size_t ksize, vsize;
+  nmdb_t *db;
+
+  db = nmdb_init(11);
+  nmdb_add_server(db, 12);
+  nmdb_add_server(db, 13);
+
+Now you can do some operations (allocations and checks are not shown for
+brevity)::
+
+  r = nmdb_set(db, key, ksize, val, vsize);
+  ...
+  r = nmdb_get(db, key, ksize, val, vsize);
+  ...
+  r = nmdb_del(db, key, ksize);
+
+And finally close and free the connection::
+
+  nmdb_free(db);
+
+The operation functions have variants for cache-only (*nmdb_cache_**) and
+synchronous operation (*nmdb_sync_**). For more information you should check
+the manpage.
+
+
+Where to go from here
+=====================
+
+The best place to go from here is to your text editor, to start writing some
+simple clients to play with.
+
+If you are in doubt about something, you can consult the manpages or the
+documentation inside the *doc/* directory. And if you can't find an answer to
+your question there, you can ask me, Alberto Bertogli, at
+*albertito@gmail.com*.
+
+
+
+.. _nmdb: http://auriga.wearlab.de/~alb/nmdb/
+.. _libevent: http://www.monkey.org/~provos/libevent/
+.. _TIPC: http://tipc.sf.net
+.. _memcached: http://www.danga.com/memcached/
+.. _QDBM: http://qdbm.sf.net
+.. _`Linux kernel`: http://kernel.org
+.. _tetrations: http://en.wikipedia.org/wiki/Tetration
+