This is a reimplementation of the netatalk CNID database support that attempts to put all current functionality into a separate daemon called cnid_dbd. There is one such daemon per netatalk volume. The underlying database structure is still based on Berkeley DB and the database format is the same as in the current CNID implementation, so this can be used as a drop-in replacement.

Advantages:

- No locking issues or leftover locks due to crashed afpd daemons any more. Since there is only one thread of control accessing the database, no locking is needed and changes appear atomic.

- Berkeley DB transactions are difficult to get right with several processes attempting to access the CNID database simultaneously. This is much easier with a single process, and the database can be made nearly crashproof this way (at a performance cost).

- No problems with user permissions and access to underlying database files. The cnid_dbd process runs under a configurable user ID that normally also owns the underlying database and can be contacted by whatever afpd daemon accesses a volume.

- If an afpd process crashes, the CNID database is unaffected. If the process was making changes to the database at the time of the crash, those changes will either be completed (no transactions) or rolled back entirely (transactions). If the process was not using the database at the time of the crash, no corrective action is necessary. In both cases, database consistency is assured.

Disadvantages:

- Performance in an environment of processes sharing the database (files) is potentially better for two reasons:

  i) There is no IPC overhead.

  ii) r/o access to database pages is possible by more than one process at once, and r/w access is possible for nonoverlapping regions.

The current implementation of cnid_dbd uses unix domain sockets as the IPC mechanism.
While this is not the fastest possible method, it is very portable and the cnid_dbd IPC mechanisms can be extended to use faster IPC (like mmap) on architectures where it is supported. As a ballpark figure, 20000 requests/replies to the cnid_dbd daemon take about 0.6 seconds on a Pentium III 733 MHz running Linux Kernel 2.4.18 using unix domain sockets. The requests are "empty" (no database lookups/changes), so this is just the IPC overhead. I have not measured the effects of the advantages of simultaneous database access.

Installation and configuration

cnid_dbd is now part of a new CNID framework whereby various CNID backends (including the ones present so far) can be selected for afpd as a runtime option for a given volume. The default is to compile support for all these backends, so normally there is no need to specify other options to configure. The only exception is transactional support, which is enabled by default. If you want to turn it off, use --with-cnid-dbd-txn=no.

In order to turn on cnid_dbd backend support for a given volume, set the option -cnidscheme:dbd in your AppleVolumes.default file or equivalent. The default for this parameter is -cnidscheme:cdb.

There are two executables that will be built in etc/cnid_dbd and installed into the system binaries directory of netatalk (e.g. /usr/local/netatalk/sbin or whatever you specify with --sbindir to configure): cnid_metad and cnid_dbd.

cnid_metad should run all the time with root permissions. It will be notified when an instance of afpd starts up and will in turn make sure that a cnid_dbd daemon is started for the volume that afpd wishes to access. The daemon runs as long as necessary (see the idle_timeout option below) and services any other instances of afpd that access the volume. You can safely kill it with SIGTERM; it will be restarted automatically by cnid_metad as soon as the volume is accessed again.

cnid_metad needs one command line argument, the name of the cnid_dbd executable.
You can either specify "cnid_dbd" if it is in the path for cnid_metad or otherwise use the fully qualified pathname to cnid_dbd, e.g. /usr/local/netatalk/sbin/cnid_dbd. cnid_metad also uses a unix domain socket to receive requests from afpd. The pathname for that socket is /tmp/cnid_meta. It should not be deleted while cnid_metad runs.

cnid_dbd changes to the Berkeley DB directory on startup and sets effective UID and GID to owner and group of that directory. Database and supporting files should therefore be writable by that user/group.

cnid_dbd reads configuration information from the file db_param in the database directory on startup. If the file does not exist, suitable default values for all parameters are used. The format for a single parameter is the parameter name, followed by one or more spaces, followed by the parameter value, followed by a newline. The following parameters are currently recognized:

  Name               Default

  backlog            20
  cachesize          1024
  nosync             0
  flush_frequency    100
  flush_interval     30
  usock_file         /usock
  fd_table_size      16
  idle_timeout       600

"backlog" specifies the maximum number of connection requests that can be pending on the unix domain socket cnid_dbd uses. listen(2) has more information about this value on your system.

"cachesize" determines the size of the Berkeley DB cache in kilobytes. Each cnid_dbd process grabs that much memory on top of its normal memory footprint. It can be used to tune database performance. The db_stat utility with the "-m" option that comes with Berkeley DB can help you determine whether you need to change this value. The default is pretty conservative so that a large percentage of requests should be satisfied from the cache directly. If memory is not a bottleneck on your system you might want to leave it at that value. The Berkeley DB Tutorial and Reference Guide has a section "Selecting a cache size" that gives more detailed information.

"nosync" is only valid if transactional support is enabled.
If it is set to 1, transactional changes to the database are not synchronously written to disk when the transaction completes. This will increase performance considerably at the risk of recent changes getting lost in case of a crash. The database will still be consistent, though. See "Transaction throughput" in the Berkeley DB Tutorial for more information.

"flush_frequency" and "flush_interval" control how often changes to the database are written to the underlying database files if no transactions are used, or how often the transaction system is checkpointed if transactions are used. The save or checkpoint is performed if either i) more than flush_frequency requests have been received or ii) more than flush_interval seconds have elapsed since the last save/checkpoint. If you use transactions with nosync set to zero, these parameters only influence how long recovery takes after a crash; there should never be any lost data. If nosync is 1, changes might be lost, but only since the last checkpoint. Be careful to check your hard disk configuration for on-disk cache settings. Many IDE disks cache writes by default, so even flushing database files to disk will not have the desired effect.

"usock_file" gives the pathname of the unix domain socket file that this instance of cnid_dbd will use for receiving requests. You might have to use another value if the total length of the default pathname exceeds the limit for unix domain socket files on your system; usually this is something like 100 bytes.

"fd_table_size" is the maximum number of connections (file descriptors) that can be open for afpd client processes in cnid_dbd. If this number is exceeded, one of the existing connections is closed and reused. The affected afpd process will transparently reconnect later, which causes slight overhead.
On the other hand, setting this parameter too high could affect performance in cnid_dbd since all descriptors have to be checked in a select() system call, or worse, you might exceed the per-process limit of open file descriptors on your system. It is safe to set the value to 1 on volumes where only one afpd client process is expected to run, e.g. home directories.

"idle_timeout" is the number of seconds of inactivity before an idle cnid_dbd exits. Set this to 0 to disable the timeout.

Current shortcomings:

- The implementation for cnid_dbd duplicates code from libatalk/cnid, making it more difficult to keep changes to the library interface and the semantics of database requests in sync. If cnid_dbd takes over the world, this problem will eventually disappear. Otherwise it should be possible to merge things again if transactional support is eliminated from libatalk/cnid. I do not think it can work reliably in the current form anyway.

- It would be more flexible to have transaction support as a run time option as well.

- mmap for IPC would be nice as an alternative.

- The parameter file parsing of db_param is very simple-minded. It is easy to cause buffer overruns and the like. Also, there is no support for blanks (or weird characters) in filenames for the usock_file parameter.

- There is no protection against a malicious user connecting to the cnid_dbd socket and changing the database. I will address this problem soon.

Please feel free to grep the source in etc/cnid_dbd and the file libatalk/cnid/dbd/cnid_dbd.c for the string TODO, which indicates comments that address other, less important points.

Joerg Lenneis, lenneis@wu-wien.ac.at
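
Appendix: illustration of the db_param format

The following is a minimal sketch in Python, not code from cnid_dbd, of the db_param format and the flush rule described in the configuration section. It assumes the parameter names and defaults listed in the table there. The function names parse_db_param and flush_due are hypothetical, and so is the handling of malformed lines and unknown names (both are skipped here). The real daemon may treat them differently.

```python
# Illustrative sketch only; names and error handling are assumptions,
# not the cnid_dbd implementation.

# Defaults as listed in the parameter table of this document.
DEFAULTS = {
    "backlog": 20,
    "cachesize": 1024,        # kilobytes
    "nosync": 0,
    "flush_frequency": 100,   # requests
    "flush_interval": 30,     # seconds
    "usock_file": "/usock",   # copied verbatim from the table
    "fd_table_size": 16,
    "idle_timeout": 600,      # seconds, 0 disables the timeout
}


def parse_db_param(text):
    """Return a dict of parameters, starting from DEFAULTS.

    Each line is: name, one or more spaces, value, newline.
    """
    params = dict(DEFAULTS)
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # assumption: skip blank or malformed lines
        name, value = fields[0], fields[1].strip()
        if name not in DEFAULTS:
            continue  # assumption: ignore unrecognized names
        if isinstance(DEFAULTS[name], int):
            params[name] = int(value)  # raises ValueError on bad numbers
        else:
            params[name] = value
    return params


def flush_due(requests_since_flush, seconds_since_flush, params):
    """True if a save/checkpoint is due under the documented rule:
    more than flush_frequency requests, or more than flush_interval
    seconds, since the last save/checkpoint."""
    return (requests_since_flush > params["flush_frequency"]
            or seconds_since_flush > params["flush_interval"])
```

For example, parse_db_param("cachesize 2048\nnosync 1\n") overrides only those two values and leaves all other parameters at their defaults.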