File and Directory IDs explained.

What are File and Directory IDs?

On a Mac the file system stores all file and directory information for each
volume in one big file called the catalogue file.  Inside the catalogue, all
files and directories are accessed using a key, central to which is a number
called the ID.  For a file it's a file ID (FID); for a directory, a
directory ID (DID).

How many IDs can there be?

The ID in a catalogue key is stored using 4 bytes, which means it can only
be between 0 and 4,294,967,295 (FF FF FF FF in hex).  However, the first 16
IDs are reserved, so you don't have quite that many.  Each ID is unique
forever on any given volume.  Once all 4 billion have been used, you cannot
create new files, so you will need to reformat the volume to continue using
it.
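
The arithmetic above can be checked with a few lines of Python.  This is
only a sketch; it assumes the 16 reserved IDs are the lowest 16 values
(0 to 15), and the variable names are made up for illustration.

```python
ID_BYTES = 4        # size of the ID field in a catalogue key
RESERVED_IDS = 16   # assumption: the reserved IDs are 0 through 15

max_id = 2 ** (8 * ID_BYTES) - 1               # largest value 4 bytes can hold
usable_ids = 2 ** (8 * ID_BYTES) - RESERVED_IDS

print(f"largest ID: {max_id:,} (0x{max_id:08X})")  # 4,294,967,295 (0xFFFFFFFF)
print(f"usable IDs: {usable_ids:,}")               # 4,294,967,280
```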

Why are IDs so important?

Most system calls relating to files on a Mac (whether via a network or a
hard disk) can refer to files or directories by ID.  This makes the ID a
powerful piece of information.

So what's the problem?

The problem lies in file servers that don't use IDs.  The protocol used by
Macs to share files over a network is called the Apple Filing Protocol (AFP)
and it requires the use of IDs.  So to support AFP fully, an AFP server
must adopt its own system for storing and maintaining the link between each
file or directory and its respective ID.  This is most critical when
accessing directories.

So why does this matter on a non-Mac server like netatalk?

The three big stumbling blocks that crop up with AFP servers that don't 
fully support IDs are 1) aliases, 2) the trash can and 3) linked documents.

Alias problems.

An alias on a Mac is quite special.  Rather than just storing the path to
the original file (like a unix symlink), it stores the ID of that file and
a special identifier for the volume (and the server it's on).  Ideally this
is great.  If the file is moved or renamed, the alias still works.  However,
if the file (or directory) ID changes, or the volume identifier (or server
identifier) changes, then the alias will break.  The Mac will report that
the file the alias points to has been removed.
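
The difference between a path-based link and an ID-based link can be
sketched as follows.  This is a toy model, not how the Mac or netatalk
stores aliases; the dictionary, the ID values and the helper functions are
all hypothetical, and the volume and server identifiers are left out.

```python
# Hypothetical in-memory "volume": maps ID -> current path.
volume = {17: "Reports/draft.txt"}

symlink_target = "Reports/draft.txt"   # a unix symlink remembers the path
alias_target = 17                      # a Mac alias remembers the ID

def resolve_path(path):
    """Return the path if some file currently lives there, else None."""
    return path if path in volume.values() else None

def resolve_id(file_id):
    """Return the current path of the file with this ID, else None."""
    return volume.get(file_id)

# Rename the file: the ID stays the same, only the path changes.
volume[17] = "Reports/final.txt"
print(resolve_path(symlink_target))  # None: the symlink is broken
print(resolve_id(alias_target))      # Reports/final.txt: the alias works

# Now suppose the server hands the same file a different ID (say 42).
volume = {42: "Reports/final.txt"}
print(resolve_id(alias_target))      # None: the alias is broken
```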

Trash can (accidentally deleted file) problems.

The trash can has similar problems.  Files that have been moved to the trash
are represented by their IDs.  When you empty the trash, every file whose ID
is listed is deleted.  However, if the ID of a file that was in the trash is
reallocated to an ordinary file, then that ordinary file will be deleted
when the trash is emptied.
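
A minimal sketch of this failure, using made-up file names and IDs and a
hypothetical empty_trash() helper (this is an illustration of the idea, not
the real trash implementation):

```python
files = {20: "old-notes.txt", 21: "budget.xls"}   # ID -> file name
trash = [20]   # the trash remembers the ID of old-notes.txt

# The server loses track of its IDs and reallocates them:
# old-notes.txt now has ID 35, and ID 20 belongs to an ordinary file.
files = {35: "old-notes.txt", 20: "thesis.doc", 21: "budget.xls"}

def empty_trash(files, trash):
    """Delete every file whose ID is listed in the trash."""
    deleted = [files.pop(file_id) for file_id in trash if file_id in files]
    trash.clear()
    return deleted

deleted = empty_trash(files, trash)
print(deleted)  # ['thesis.doc']: the wrong file has been deleted
```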

Linked document problems.

Finally, linked documents: these are documents that contain hidden links to
other documents.  Typesetting and layout applications (such as Quark) use
this technique.  Sometimes these documents contain IDs linking to their
embedded documents.  These links can break in the same way as aliases.

So how does netatalk approach the problem?

Netatalk has two different methods of allocating IDs: last and cnid.

DID = last.

This uses a running number to allocate IDs.  When an ID is allocated, the
server remembers it by adding it to a table.  When an ID is referenced, the
server looks it up in the table.  When the server is restarted, the table is
lost.  This is the simplest method, but it is unreliable.  If you stick to
the Mac features which don't rely heavily on IDs, it works fine.  If you try
to use IDs much, things break.
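
The behaviour described above can be modelled with a toy allocator.  This
is not netatalk's code; the class and method names are invented, and the
sketch assumes that the running number starts again from the same value
after a restart and that the first 16 IDs are skipped.

```python
class LastScheme:
    """Toy running-number ID allocator with an in-memory table."""

    FIRST_ID = 16  # assumption: skip the 16 reserved IDs

    def __init__(self):
        self.next_id = self.FIRST_ID
        self.table = {}   # ID -> path; lost when the server restarts

    def id_for(self, path):
        """Return the ID already given to path, or allocate the next one."""
        for known_id, known_path in self.table.items():
            if known_path == path:
                return known_id
        new_id = self.next_id
        self.next_id += 1
        self.table[new_id] = path
        return new_id

    def lookup(self, file_id):
        """Return the path for an ID, or None if the ID is unknown."""
        return self.table.get(file_id)


server = LastScheme()
docs_id = server.id_for("docs")   # 16
pics_id = server.id_for("pics")   # 17

server = LastScheme()             # simulate a restart: the table is gone
print(server.lookup(pics_id))     # None: the client's old ID is unknown
print(server.id_for("pics"))      # 16: "pics" now has the ID "docs" had
```

A client that still holds ID 16 from before the restart would now be
talking about a different directory, which is why aliases and the trash
misbehave under this scheme.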

DID = cnid. 

This uses a Berkeley database to store and maintain a directory of IDs
similar to that of a catalogue file on a Mac.  Consequently it is the most
reliable method.  Unfortunately there seem to be serious multi-user problems
that lead to database corruption.  These are being worked on, but cnid
remains the safest and most reliable DID scheme.  See README.cnid for more
details.
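
The idea of keeping the ID table on disk can be sketched as below.  Note
the assumptions: this uses Python's built-in sqlite3 module as a stand-in
for the Berkeley database, the table layout and function names are
invented, and it says nothing about how the real cnid code is organised
(see README.cnid for that).

```python
import os
import sqlite3
import tempfile

def open_store(db_path):
    """Open (or create) the on-disk ID table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ids ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"   # IDs are never reused
        " path TEXT UNIQUE NOT NULL)"
    )
    return conn

def id_for(conn, path):
    """Return the stored ID for path, allocating a new one if needed."""
    row = conn.execute("SELECT id FROM ids WHERE path = ?", (path,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute("INSERT INTO ids (path) VALUES (?)", (path,))
    conn.commit()
    return cur.lastrowid

db_path = os.path.join(tempfile.mkdtemp(), "ids.sqlite")

conn = open_store(db_path)
before = id_for(conn, "docs")
conn.close()                      # simulate a server restart

conn = open_store(db_path)
after = id_for(conn, "docs")
conn.close()

print(before == after)  # True: the ID survives the restart
```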