doc/README.ids

File and Directory IDs explained.

What are File and Directory IDs?

On a Mac, the file system stores all file and directory information for each
volume in a big file called the catalogue file.  Inside the catalogue, all
files and directories are accessed using a key, central to which is a number
called the ID.  For a file it is a file ID (FID); for a directory, it is a
directory ID (DID).

How many IDs can there be?

The ID in a catalogue key is stored in 4 bytes, which means it can only
be between 0 and 4,294,967,295 (FF FF FF FF in hex).  However, the first 16
IDs are reserved, so you don't have quite that many.  Each ID is unique
forever on any given volume: once used, it is never handed out again.  Once
all 4 billion have been used, you cannot create new files, and will need to
reformat the volume to continue using it.
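
The arithmetic above can be checked with a few lines of Python.  This is only
a sketch of the numbers quoted in this section; the variable names are made up
for illustration.

```python
# Sketch of the ID arithmetic described above (names are illustrative).
ID_BYTES = 4        # size of the ID field in a catalogue key
RESERVED_IDS = 16   # the first 16 IDs are reserved

max_id = 2 ** (8 * ID_BYTES) - 1         # largest value 4 bytes can hold
usable_ids = max_id + 1 - RESERVED_IDS   # IDs left for files and directories

print(f"largest ID: {max_id:,} (0x{max_id:08X})")
print(f"usable IDs: {usable_ids:,}")
```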

Why are IDs so important?

Most system calls relating to files on a Mac (either via a network or a
hard disk) can refer to files or directories by ID.  This makes the ID a
powerful piece of information.

So what's the problem?

The problem lies in file servers that don't use IDs.  The protocol used by
Macs to share files over a network is called the Apple Filing Protocol (AFP),
and it requires the use of IDs.  So to support AFP fully, any AFP server
must adopt its own system for storing and maintaining a link between each
file or directory and its respective ID.  This is most critical when
accessing directories.
  33
  34 So why does this matter on a non mac server like netatalk?
  35
  36 The three big stumbling blocks that crop up with AFP servers that don't
  37 fully support IDs are 1) aliases, 2) the trash can and 3) linked documents.

Alias problems.

An alias on a Mac is quite special.  Rather than just storing the path to
the original file (like a UNIX symlink), it stores the ID of that file and
a special identifier for the volume (and the server it's on).  Ideally this
is great.  If the file moves or is renamed, the alias still works.  However,
if the file (or directory) ID, the volume identifier or the server
identifier changes, then the alias will break.  The file it points to will
appear to have been removed.
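
The difference between a path-based link and an ID-based alias can be
sketched in Python.  This is a toy model, not how a Mac or Netatalk stores
anything: the catalogue is a plain dictionary, the IDs and paths are invented,
and the volume and server identifiers are left out for brevity.

```python
# Toy catalogue: maps an ID to the current path of a file (hypothetical data).
catalogue = {17: "/Reports/draft.txt"}

symlink_target = "/Reports/draft.txt"   # a symlink remembers the path
alias_target = 17                       # an alias remembers the ID


def path_exists(path):
    return path in catalogue.values()


def resolve_alias(file_id):
    # Returns None when the ID is unknown, i.e. the alias is broken.
    return catalogue.get(file_id)


# Rename the file: the path changes, the ID does not.
catalogue[17] = "/Reports/final.txt"
symlink_ok_after_rename = path_exists(symlink_target)
alias_after_rename = resolve_alias(alias_target)

# Now the server forgets its IDs and hands the same file a new one.
catalogue = {42: "/Reports/final.txt"}
alias_after_renumber = resolve_alias(alias_target)

print(symlink_ok_after_rename, alias_after_rename, alias_after_renumber)
```

The rename breaks only the path-based link, while the change of ID breaks the
alias even though the file is still there.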

Trash can (accidentally deleted file) problems.

The trash can has similar problems.  Files that have been moved to the trash
are represented by their IDs.  When you empty the trash, all IDs listed are
deleted.  However, if the ID of a file that was in the trash is reallocated
to an ordinary file, then when the trash is emptied that file will be deleted.
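
The failure described above can be played through in a small Python sketch.
The file names, IDs and the empty_trash() helper are all hypothetical; the
point is only that the trash holds IDs, not names.

```python
# Hypothetical server state: ID -> file name.
files = {20: "old-notes.txt", 21: "budget.xls"}

# The user drags old-notes.txt to the trash; only its ID is remembered.
trash = [20]

# The server reallocates its IDs, so ID 20 now belongs to a
# different, ordinary file.
files = {20: "budget.xls", 21: "old-notes.txt"}


def empty_trash(files, trash):
    """Delete every ID listed in the trash and return the deleted names."""
    deleted = [files.pop(file_id) for file_id in trash if file_id in files]
    trash.clear()
    return deleted


deleted = empty_trash(files, trash)
print("deleted:", deleted)
print("left:   ", files)
```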

Linked document problems.

Finally, linked documents.  Linked documents are documents that contain hidden
links to other documents.  Print setting and layout applications (such as
Quark) use this technique.  Sometimes these documents contain IDs linking to
their embedded documents.  These can break in the same way as aliases.

So how does Netatalk approach the problem?

Netatalk has two different methods of allocating IDs: last and cnid.

DID = last.

This uses a running number to allocate IDs.  When an ID is allocated, the
server remembers it by adding it to a table.  If an ID is referenced, the
server looks it up in the table.  When the server is restarted, the table is
lost.  This is the simplest method, but it is unreliable.  If you stick to
the Mac features which don't rely heavily on IDs, it works fine.  If you try
to use IDs much, things break.
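
A minimal Python sketch of a running-number allocator with an in-memory
table shows why a restart is a problem.  It is not Netatalk's code: the
LastAllocator class, its method names and the starting value of 16 (chosen
here to skip the reserved IDs) are assumptions made for illustration.

```python
class LastAllocator:
    """Running-number ID allocation with an in-memory table (illustrative)."""

    FIRST_ID = 16  # assumption: start after the 16 reserved IDs

    def __init__(self):
        self.next_id = self.FIRST_ID
        self.table = {}   # ID -> path; lives only in memory

    def allocate(self, path):
        new_id = self.next_id
        self.next_id += 1
        self.table[new_id] = path
        return new_id

    def lookup(self, file_id):
        return self.table.get(file_id)   # None if the ID is unknown


server = LastAllocator()
docs_id = server.allocate("/Documents")
pics_id = server.allocate("/Pictures")
before_restart = server.lookup(docs_id)

# A restart throws the table away.  If the directories are then
# referenced in a different order, the same numbers mean different things.
server = LastAllocator()
server.allocate("/Pictures")
server.allocate("/Documents")
after_restart = server.lookup(docs_id)

print(before_restart, "->", after_restart)
```

An alias that stored docs_id before the restart would now resolve to the
wrong directory.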

DID = cnid.

The CNID scheme in Netatalk attempts to assign a unique ID to each file and
directory, then keep those IDs persistent across mounts of the volume.  This
way, cross-volume aliases will work, and users are less likely to encounter
duplicate CNID errors.  Prior to Netatalk 1.6.0, the CNID calculation
scheme was not persistent, and IDs were assigned based on the UNIX device and
inode number of a given file or directory (see DID = last above).  This was
fine for the most part, but due to limitations, not all available CNIDs could
be used.  In addition, these IDs could change independently of Netatalk, and
thus were not persistent.  As of Netatalk 1.6.0, the CNID scheme is the
default.  On top of that, Netatalk uses the Berkeley DB Concurrent Datastore
method to avoid the need for database locking and transactions.
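
The idea of persistent IDs can be sketched in Python as follows.  This is not
the CNID implementation: sqlite3 from the Python standard library stands in
for Berkeley DB, the PersistentIds class is hypothetical, and keying the
table on the path is a simplification chosen for this example.  The sketch
also ignores the reserved ID range.

```python
import os
import sqlite3
import tempfile


class PersistentIds:
    """Path -> ID mapping kept on disk (sqlite3 stands in for Berkeley DB)."""

    def __init__(self, db_path):
        self.db = sqlite3.connect(db_path)
        # AUTOINCREMENT makes sqlite hand out ever-increasing IDs
        # that are never reused.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cnid ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, path TEXT UNIQUE)"
        )
        self.db.commit()

    def get_id(self, path):
        """Return the stored ID for path, allocating one on first sight."""
        row = self.db.execute(
            "SELECT id FROM cnid WHERE path = ?", (path,)
        ).fetchone()
        if row is not None:
            return row[0]
        cur = self.db.execute("INSERT INTO cnid (path) VALUES (?)", (path,))
        self.db.commit()
        return cur.lastrowid

    def close(self):
        self.db.close()


db_file = os.path.join(tempfile.mkdtemp(), "cnid-sketch.db")

ids = PersistentIds(db_file)
docs_id = ids.get_id("/Documents")
pics_id = ids.get_id("/Pictures")
ids.close()   # simulate a server restart

ids = PersistentIds(db_file)
pics_again = ids.get_id("/Pictures")   # asked for in a different order
docs_again = ids.get_id("/Documents")
ids.close()

print(docs_id == docs_again, pics_id == pics_again)
```

Because the table survives the restart, each directory gets the same ID back
regardless of the order in which it is referenced.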

The CNID scheme requires Berkeley DB (BDB).  Currently, Netatalk supports
BDB 3.1.17, 3.2.9, 3.3.11, 4.0.14, and 4.1.25.  The recommended version is
3.3.11, as that is the version on which most testing has been done.

CNID has seen many contributors over the years.  It was conceived by
Adrian Sun <asun@zoology.washington.edu>.  His developer notes can be found
in the libatalk/cnid/README file.  It was later picked up and modernized by
Uwe Hees <uwe.hees@rz-online.de>.  Then, Joe Marcus Clarke
<marcus@marcuscom.com> started fixing bugs and adding features.  The
Concurrent Datastore support was subsequently added by Dan Wilga
<dwilga@mtholyoke.edu>.  The CNID code is currently maintained by Joe Marcus
Clarke.