This file describes the design, layouts, and file formats of a
libsvn_fs_fs repository.

In FSFS, each committed revision is represented as an immutable file
containing the new node-revisions, contents, and changed-path
information for the revision, plus a second, changeable file
containing the revision properties.

In contrast to the BDB back end, the contents of recent revisions of
files are stored as deltas against earlier revisions, instead of the
other way around. This is less efficient for common-case checkouts,
but brings greater simplicity and robustness, as well as the
flexibility to make commits work without write access to existing
revisions. Skip-deltas and delta combination mitigate the checkout
cost.

In-progress transactions are represented with a prototype rev file
containing only the new text representations of files (appended to as
changed file contents come in), along with a separate file for each
node-revision, directory representation, or property representation
which has been changed or added in the transaction. During the final
stage of the commit, these separate files are marshalled onto the end
of the prototype rev file to form the immutable revision file.

Layout of the FS directory
--------------------------

The layout of the FS directory (the "db" subdirectory of the
repository) is:

  revs/                 Subdirectory containing revs
    <shard>/            Shard directory, if sharding is in use (see below)
      <revnum>          File containing rev <revnum>
    <shard>.pack/       Pack directory, if the repo has been packed (see below)
      pack              Pack file, if the repository has been packed (see below)
      manifest          Pack manifest file, if a pack file exists (see below)
  revprops/             Subdirectory containing rev-props
    <shard>/            Shard directory, if sharding is in use (see below)
      <revnum>          File containing rev-props for <revnum>
    <shard>.pack/       Pack directory, if the repo has been packed (see below)
      <rev>.<count>     Pack file, if the repository has been packed (see below)
      manifest          Pack manifest file, if a pack file exists (see below)
    revprops.db         SQLite database of the packed revprops (format 5 only)
  transactions/         Subdirectory containing transactions
    <txnid>.txn/        Directory containing transaction <txnid>
  txn-protorevs/        Subdirectory containing transaction proto-revision files
    <txnid>.rev         Proto-revision file for transaction <txnid>
    <txnid>.rev-lock    Write lock for proto-rev file
  txn-current           File containing the next transaction key
  locks/                Subdirectory containing locks
    <partial-digest>/   Subdirectory named for first 3 letters of an MD5 digest
      <digest>          File containing locks/children for path with <digest>
  node-origins/         Lazy cache of origin noderevs for nodes
    <partial-nodeid>    File containing noderev ID of origins of nodes
  current               File specifying current revision and next node/copy id
  fs-type               File identifying this filesystem as an FSFS filesystem
  write-lock            Empty file, locked to serialise writers
  pack-lock             Empty file, locked to serialise 'svnadmin pack' (f. 7+)
  txn-current-lock      Empty file, locked to serialise 'txn-current'
  uuid                  File containing the repository IDs
  format                File containing the format number of this filesystem
  fsfs.conf             Configuration file
  min-unpacked-rev      File containing the oldest revision not in a pack file
  min-unpacked-revprop  Same for revision properties (format 5 only)
  rep-cache.db          SQLite database mapping rep checksums to locations

Files in the revprops directory are in the hash dump format used by

The format of the "current" file is:

 * Format 3 and above: a single line of the form
   "<youngest-revision>\n" giving the youngest revision for the
   filesystem.

 * Format 2 and below: a single line of the form "<youngest-revision>
   <next-node-id> <next-copy-id>\n" giving the youngest revision, the
   next unique node-ID, and the next unique copy-ID for the
   filesystem.
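
As an illustration only, the two line forms of "current" described above
could be read as follows. This is a minimal sketch, not the libsvn_fs_fs
code; the function name parse_current and the returned dictionary keys
are hypothetical.

```python
def parse_current(text, fs_format):
    """Parse the single line of a "current" file (illustrative sketch).

    fs_format is the filesystem format number taken from the "format" file.
    """
    fields = text.rstrip("\n").split(" ")
    if fs_format >= 3:
        # Format 3 and above: "<youngest-revision>\n"
        if len(fields) != 1:
            raise ValueError("expected a single field in 'current'")
        return {"youngest": int(fields[0])}
    # Format 2 and below:
    # "<youngest-revision> <next-node-id> <next-copy-id>\n"
    if len(fields) != 3:
        raise ValueError("expected three fields in 'current'")
    return {
        "youngest": int(fields[0]),
        "next_node_id": fields[1],   # kept as text (base36 counter)
        "next_copy_id": fields[2],   # kept as text (base36 counter)
    }
```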

The "write-lock" file is an empty file which is locked before the
final stage of a commit and unlocked after the new "current" file has
been moved into place to indicate that a new revision is present. It
is also locked during a revprop propchange while the revprop file is
read in, mutated, and written out again. Furthermore, it will be used
to serialize the repository structure changes during 'svnadmin pack'
(see also next section). Note that readers are never blocked by any
operation - writers must ensure that the filesystem is always in a
consistent state.

The "pack-lock" file is an empty file which is locked before an 'svnadmin
pack' operation commences. Thus, only one process may attempt to modify
the repository structure at a time while other processes may still read
and write (commit) to the repository during most of the pack procedure.
It is only available with format 7 and newer repositories. Older formats
use the global write-lock instead which disables commits completely
for the duration of the pack process.

The "txn-current" file is a file with a single line of text that
contains only a base-36 number. The current value will be used in the
next transaction name, along with the revision number the transaction
is based on. This sequence number ensures that transaction names are
not reused, even if the transaction is aborted and a new transaction
based on the same revision is begun. The only operation that FSFS
performs on this file is "get and increment"; the "txn-current-lock"
file is locked during this operation.
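
The "get and increment" step on "txn-current" can be sketched as below.
This is an illustration under assumptions: the helper names are
hypothetical, lowercase digits are assumed for the base-36 encoding, and
the locking of "txn-current-lock" around the operation is left out.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed digit set

def to_base36(n):
    """Render a non-negative integer in base 36."""
    if n == 0:
        return "0"
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = DIGITS[r] + out
    return out

def get_and_increment(txn_current_text):
    """Return (key to use now, new file content for "txn-current").

    The caller is expected to hold "txn-current-lock" while reading the
    old content and writing the new content back.
    """
    key = txn_current_text.strip()
    value = int(key, 36)
    return key, to_base36(value + 1) + "\n"
```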

"fsfs.conf" is a configuration file in the standard Subversion/Python
config format. It is automatically generated when you create a new
repository; read the generated file for details on what it controls.

When representation sharing is enabled, the filesystem tracks
representation checksum and location mappings using a SQLite database in
"rep-cache.db". The database has a single table, which stores the sha1
hash text as the primary key, mapped to the representation revision, offset,
size and expanded size. This file is only consulted during writes and never
during reads. Consequently, it is not required, and may be removed at an
arbitrary time, with the subsequent loss of rep-sharing capabilities for
revisions written thereafter.

The "format" file defines what features are permitted within the
filesystem, and indicates changes that are not backward-compatible.
It serves the same purpose as the repository file of the same name.

The filesystem format file was introduced in Subversion 1.2, and so
will not be present if the repository was created with an older
version of Subversion. An absent format file should be interpreted as
indicating a format 1 filesystem.

The format file is a single line of the form "<format number>\n",
followed by any number of lines specifying 'format options' -
additional information about the filesystem's format. Each format
option line is of the form "<option>\n" or "<option> <parameters>\n".

Clients should raise an error if they encounter an option not
permitted by the format number in use.
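
A reader of the "format" file might look roughly like the sketch below.
The function names and the MIN_FORMAT_FOR_OPTION table are hypothetical;
the table only encodes the two options this document names ("layout"
from format 3, "addressing" from format 7).

```python
# Lowest format number that permits each option (from this document).
MIN_FORMAT_FOR_OPTION = {"layout": 3, "addressing": 7}

def parse_format(text):
    """Return (format_number, {option: [parameters]}) for a format file.

    Pass None for an absent file; that is treated as format 1.
    """
    if text is None:
        return 1, {}
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty format file")
    number = int(lines[0])
    options = {}
    for line in lines[1:]:
        # "<option>" or "<option> <parameters>"
        name, _, params = line.partition(" ")
        options[name] = params.split()
    return number, options

def check_options(number, options):
    """Raise if an option is not permitted by the format number in use."""
    for name in options:
        minimum = MIN_FORMAT_FOR_OPTION.get(name)
        if minimum is None or number < minimum:
            raise ValueError(f"option {name!r} not permitted in format {number}")
```

For example, parse_format("7\nlayout sharded 1000\naddressing logical\n")
yields format 7 with both options, and check_options accepts it, while
the same "layout" line under format 2 is rejected.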

  Format 1, understood by Subversion 1.1+
  Format 2, understood by Subversion 1.4+
  Format 3, understood by Subversion 1.5+
  Format 4, understood by Subversion 1.6+
  Format 5, understood by Subversion 1.7-dev, never released
  Format 6, understood by Subversion 1.8
  Format 7, understood by Subversion 1.9

The differences between the formats are:

Delta representation in revision files
  Format 1: svndiff0 only
  Formats 2+: svndiff0 or svndiff1

Format options
  Formats 1-2: none permitted
  Format 3+: "layout" option
  Format 7+: "addressing" option

Transaction name reuse
  Formats 1-2: transaction names may be reused
  Format 3+: transaction names generated using txn-current file

Location of proto-rev file and its lock
  Formats 1-2: transactions/<txnid>/rev and
    transactions/<txnid>/rev-lock.
  Format 3+: txn-protorevs/<txnid>.rev and
    txn-protorevs/<txnid>.rev-lock.

Node-ID and copy-ID generation
  Formats 1-2: Node-IDs and copy-IDs are guaranteed to form a
    monotonically increasing base36 sequence using the "current"
    file.
  Format 3+: Node-IDs and copy-IDs use the new revision number to
    ensure uniqueness and the "current" file just contains the
    current revision.

Mergeinfo metadata:
  Format 1-2: minfo-here and minfo-count node-revision fields are not
    stored. svn_fs_get_mergeinfo returns an error.
  Format 3+: minfo-here and minfo-count node-revision fields are
    maintained. svn_fs_get_mergeinfo works.

Revision changed paths list:
  Format 1-3: Does not contain the node's kind.
  Format 4+: Contains the node's kind.
  Format 7+: Contains the mergeinfo-mod flag.

Shard packing:
  Format 4: Applied to revision data only.
  Format 5: Revprops would be packed independently of revision data.
  Format 6+: Applied equally to revision data and revprop data
    (i.e. same min packed revision)

Addressing:
  Format 1-6: Physical addressing; uses fixed positions within a rev file
  Format 7+: Logical addressing; uses item index that will be translated
    on-the-fly to the actual rev / pack file location

Repository IDs:
  Format 1+: The first line of db/uuid contains the repository UUID
  Format 7+: The second line contains the instance ID (in UUID formatting)

# Incomplete list. See SVN_FS_FS__MIN_*_FORMAT


Filesystem format options
-------------------------

Currently, the only recognised format options are "layout" and "addressing".
The first specifies the paths that will be used to store the revision
files and revision property files. The second specifies that logical to
physical address translation is required.

The "layout" option is followed by the name of the filesystem layout
and any required parameters. The default layout, if no "layout"
keyword is specified, is the 'linear' layout.

The known layouts, and the parameters they require, are as follows:

"linear"
  Revision files and rev-prop files are named after the revision they
  represent, and are placed directly in the revs/ and revprops/
  directories. r1234 will be represented by the revision file
  revs/1234 and the rev-prop file revprops/1234.

"sharded <max-files-per-directory>"
  Revision files and rev-prop files are named after the revision they
  represent, and are placed in a subdirectory of the revs/ and
  revprops/ directories named according to the 'shard' they belong to.

  Shards are numbered from zero and contain between one and the
  maximum number of files per directory specified in the layout's
  parameters.

  For the "sharded 1000" layout, r1234 will be represented by the
  revision file revs/1/1234 and rev-prop file revprops/1/1234. The
  revs/0/ directory will contain revisions 0-999, revs/1/ will contain
  1000-1999, and so on.
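
The mapping from a revision number to its non-packed file names under the
two layouts can be sketched as follows. The function name rev_paths and
its parameters are hypothetical; paths are relative to the FS directory.

```python
def rev_paths(rev, layout="linear", max_files_per_dir=None):
    """Return (revision file, rev-prop file) for a non-packed revision."""
    if layout == "linear":
        return f"revs/{rev}", f"revprops/{rev}"
    if layout == "sharded":
        if not max_files_per_dir or max_files_per_dir < 1:
            raise ValueError("sharded layout needs <max-files-per-directory>")
        # Shards are numbered from zero; integer division picks the shard.
        shard = rev // max_files_per_dir
        return f"revs/{shard}/{rev}", f"revprops/{shard}/{rev}"
    raise ValueError(f"unknown layout {layout!r}")
```

With "sharded 1000", rev_paths(1234, "sharded", 1000) gives
revs/1/1234 and revprops/1/1234, matching the example above.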

The "addressing" option is followed by the name of the addressing mode
and any required parameters. The default addressing, if no "addressing"
keyword is specified, is the 'physical' addressing.

The supported modes, and the parameters they require, are as follows:

"physical"
  All existing and future revision files will use the traditional
  physical addressing scheme. All references are given as rev/offset
  pairs with "offset" being the byte offset relative to the beginning of
  the revision in the respective rev or pack file.

"logical"
  All existing and future revision files will use logical
  addressing. It is illegal to use logical addressing on non-sharded
  repositories.

Two addressing modes are supported in format 7: physical and logical
addressing. Both use the same address format but apply a different
interpretation to it. Older formats only support physical addressing.

All items are addressed using <rev> <item_index> pairs. In physical
addressing mode, item_index is the (ASCII decimal) number of bytes from
the start of the revision file to the start of the respective item. For
non-packed files that is also the absolute file offset. Revision pack
files simply concatenate multiple rev files, i.e. the absolute file offset
is

  absolute offset = rev offset taken from manifest + item_index

This simple addressing scheme makes it hard to change the location of
any item since that may break references from later revisions.
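
The offset arithmetic for physical addressing in a pack file can be
sketched as below. It assumes, for illustration, that the manifest holds
one ASCII decimal offset per revision in ascending revision order
starting with the first revision of the shard; the function name and
parameters are hypothetical.

```python
def absolute_offset(manifest_text, rev, item_index, revs_per_shard):
    """absolute offset = rev offset taken from manifest + item_index."""
    # One offset per revision in the pack file, in revision order.
    offsets = [int(field) for field in manifest_text.split()]
    # Position of this revision within its shard.
    rev_offset = offsets[rev % revs_per_shard]
    return rev_offset + item_index
```

For a manifest containing the offsets 0, 120 and 450 and a shard size of
1000, item 17 of r1001 would be found at absolute offset 120 + 17 = 137.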

Logical addressing uses an index file to translate the rev / item_index
pairs into absolute file offsets. There is one such index for every rev /
pack file using logical addressing and both are created in sync. That
makes it possible to reorder items during pack file creation, particularly
to mix items from different revisions.

Some item_index values are pre-defined and apply to every revision:

  0 ... not used / invalid
  1 ... changed path list
  2 ... root node revision

A reverse index (phys-to-log) is being created as well that allows for
translating arbitrary file locations into item descriptions (type, rev,
item_index, on-disk length). Known item types:

  0 ... unused / empty section
  1 ... file representation
  2 ... directory representation
  3 ... file property representation
  4 ... directory property representation
  5 ... node revision
  6 ... changed paths list

The various representation types all share the same morphology. The
distinction is only made to allow for more effective reordering heuristics.
Zero-length items are allowed.

A filesystem can optionally be "packed" to conserve space on disk. The
packing process concatenates all the revision files in each full shard to
create a pack file. The original shard is removed, and reads are
redirected to the pack file.

With physical addressing, a manifest file is created for each shard which
records the indexes of the corresponding revision files in the pack file.
The manifest file consists of a list of offsets, one for each revision in
the pack file. The offsets are stored as ASCII decimal, and separated by
newline characters.

Revision pack files using logical addressing do not use manifest files but
append index data to the revision contents. The revisions inside a pack
file will also get interleaved to reduce I/O for typical access patterns.
There is no structural difference between packed and non-packed revision
files.


Packing revision properties (format 5: SQLite)
---------------------------

This was supported by 1.7-dev builds but never included in a blessed release.

See r1143829 of this file:
http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/structure?view=markup&pathrev=1143829


Packing revision properties (format 6+)
---------------------------

Similarly to the revision data, packing will concatenate multiple
revprops into a single file. Since they are mutable data, we put an
upper limit to the size of these files: We will concatenate the data
up to the limit and then use a new file for the following revisions.

The limit can be set and changed at will in the configuration file.
It is 64kB by default. Because a pack file must contain at least one
complete property list, files containing just one revision may exceed
that limit.

Furthermore, pack files can be compressed which saves about 75% of
disk space. A configuration file flag enables the compression; it is
off by default and may be switched on and off at will. The pack size
limit is always applied to the uncompressed data. For this reason,
the default is 256kB when compression is enabled.

Files are named after their start revision as "<rev>.<counter>" where
counter will be increased whenever we rewrite a pack file due to a
revprop change. The manifest file contains the list of pack file
names, one line for each revision.

Many tools track repository global data in revision properties at
revision 0. To minimize I/O overhead for those applications, we
will never pack that revision, i.e. its data is always being kept
unpacked.

  Top level: <packed container>

  We always apply data compression to the pack file - using the
  SVN_DELTA_COMPRESSION_LEVEL_NONE level if compression is disabled.
  (Note that compression at SVN_DELTA_COMPRESSION_LEVEL_NONE is not
  a no-op stream transformation although most of the data will remain
  human readable.)

  container := header '\n' (revprops)+
  header := start_rev '\n' rev_count '\n' (size '\n')+

  All numbers in the header are given as ASCII decimals. rev_count
  is the number of revisions packed into this container. There must
  be exactly as many "size" and serialized "revprops". The "size"
  values in the list are the length in bytes of the serialized
  revprops of the respective revision.
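
A sketch of how the container grammar above could be split into
per-revision blobs. It operates on the container as it looks after the
compression layer has been undone (that step is not shown), and the
function name is hypothetical.

```python
def parse_revprop_container(data):
    """Split a decoded packed-revprops container into {rev: bytes}.

    container := header '\\n' (revprops)+
    header    := start_rev '\\n' rev_count '\\n' (size '\\n')+
    """
    pos = 0

    def read_number():
        # Read one ASCII decimal number terminated by '\n'.
        nonlocal pos
        end = data.index(b"\n", pos)
        value = int(data[pos:end])
        pos = end + 1
        return value

    start_rev = read_number()
    rev_count = read_number()
    sizes = [read_number() for _ in range(rev_count)]

    # The header is followed by one more '\n' before the revprops start.
    if data[pos:pos + 1] != b"\n":
        raise ValueError("missing newline after header")
    pos += 1

    revprops = {}
    for i, size in enumerate(sizes):
        revprops[start_rev + i] = data[pos:pos + size]
        pos += size
    if pos != len(data):
        raise ValueError("sizes do not match container length")
    return revprops
```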

Writing to packed revprops

  The old pack file is being read and the new revprops serialized.
  If they fit into the same pack file, a temp file with the new
  content gets written and moved into place just like a non-packed
  revprop file would. No name change or manifest update required.

  If they don't fit into the same pack file, i.e. exceed the pack
  size limit, the pack will be split into 2 or 3 new packs just
  before and / or after the modified revision.

  In the current implementation, they will never be merged again.
  To minimize fragmentation, the initial packing process will only
  use about 90% of the limit, i.e. leave some room for growth.

  When a pack file gets split, its counter is being increased
  creating a new file and leaving the old content in place and
  available for concurrent readers. Only after the new manifest
  file has been moved into place will the old pack files be deleted.

  Write access to revprops is being serialized by the global
  filesystem write lock. We only need to build a few retries into
  the reader code to gracefully handle manifest changes and pack
  file splits.

A node-rev ID consists of the following three fields:

    node_revision_id ::= node_id '.' copy_id '.' txn_id

At this level, the form of the ID is the same as for BDB - see the
section called "ID's" in <../libsvn_fs_base/notes/structure>.

In order to support efficient lookup of node-revisions by their IDs
and to simplify the allocation of fresh node-IDs during a transaction,
we treat the fields of a node-rev ID in new and interesting ways.

Within a new transaction:

  New node-revision IDs assigned within a transaction have a txn-id
  field of the form "t<txnid>".

  When a new node-id or copy-id is assigned in a transaction, the ID
  used is a "_" followed by a base36 number unique to the transaction.

Within a revision:

  Within a revision file, node-revs have a txn-id field of the form
  "r<rev>/<item_index>", to support easy lookup. See addressing modes
  for details.

  During the final phase of a commit, node-revision IDs are rewritten
  to have repository-wide unique node-ID and copy-ID fields, and to have
  "r<rev>/<item_index>" txn-id fields.

  In Format 3 and above, this uniqueness is done by changing a temporary
  id of "_<base36>" to "<base36>-<rev>". Note that this means that the
  originating revision of a line of history or a copy can be determined
  by looking at the node ID.

  In Format 2 and below, the "current" file contains global base36
  node-ID and copy-ID counters; during the commit, the counter value is
  added to the transaction-specific base36 ID, and the value in
  "current" is adjusted.

  (It is legal for Format 3 repositories to contain Format 2-style IDs;
  this just prevents I/O-less node-origin-rev lookup for those nodes.)
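
To illustrate the ID forms described above, the sketch below splits a
node-rev ID into its fields and, for Format 3-style node-IDs, reads the
originating revision from the node ID. The function name and the result
keys are hypothetical, and the sample IDs in the usage note are made up.

```python
def parse_noderev_id(text):
    """Split "node_id.copy_id.txn_id" and interpret the txn-id field."""
    node_id, copy_id, txn_field = text.split(".", 2)
    result = {"node_id": node_id, "copy_id": copy_id}

    if txn_field.startswith("t"):
        # Inside a transaction: "t<txnid>"
        result["txn"] = txn_field[1:]
    elif txn_field.startswith("r"):
        # Inside a revision file: "r<rev>/<item_index>"
        rev, item_index = txn_field[1:].split("/")
        result["rev"] = int(rev)
        result["item_index"] = int(item_index)
    else:
        raise ValueError(f"unrecognized txn-id field {txn_field!r}")

    # Temporary IDs start with "_" and are only unique per transaction.
    result["temporary_node_id"] = node_id.startswith("_")

    # Format 3+ permanent node-IDs look like "<base36>-<rev>"; older
    # style IDs carry no revision, so the origin stays unknown here.
    if not result["temporary_node_id"] and "-" in node_id:
        result["origin_rev"] = int(node_id.split("-", 1)[1])
    else:
        result["origin_rev"] = None
    return result
```

For example, "5-12.0.r14/357" parses to node-ID "5-12" originating in
r12 and stored as item 357 of r14, while "_1.0.t13-4" is a temporary ID
that is only meaningful inside transaction "13-4".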

The temporary assignment of node-ID and copy-ID fields has
implications for svn_fs_compare_ids and svn_fs_check_related. The ID
_1.0.t1 is not related to the ID _1.0.t2 even though they have the
same node-ID, because temporary node-IDs are restricted in scope to
the transactions they belong to.

There is a lazily created cache mapping from node-IDs to the full
node-revision ID where they are created. This is in the node-origins
directory; the file name is the node-ID without its last character (or
"0" for single-character node IDs) and the contents is a serialized
hash mapping from node-ID to node-revision ID. This cache is only
used for node-IDs of the pre-Format 3 style.

Copy-IDs and copy roots
-----------------------

Copy-IDs are assigned in the same manner as they are in the BDB
implementation:

  * A node-rev resulting from a creation operation (with no copy
    history) receives the copy-ID of its parent directory.

  * A node-rev resulting from a copy operation receives a fresh
    copy-ID, as one would expect.

  * A node-rev resulting from a modification operation receives a
    copy-ID depending on whether its predecessor derives from a
    copy operation or whether it derives from a creation operation
    with no intervening copies:

    - If the predecessor does not derive from a copy, the new
      node-rev receives the copy-ID of its parent directory. If the
      node-rev is being modified through its created-path, this will
      be the same copy-ID as the predecessor node-rev has; however,
      if the node-rev is being modified through a copied ancestor
      directory (i.e. we are performing a "lazy copy"), this will be
      a different copy-ID.

    - If the predecessor derives from a copy and the node-rev is
      being modified through its created-path, the new node-rev
      receives the copy-ID of the predecessor.

    - If the predecessor derives from a copy and the node-rev is not
      being modified through its created path, the new node-rev
      receives a fresh copy-ID. This is called a "soft copy"
      operation, as distinct from a "true copy" operation which was
      actually requested through the svn_fs interface. Soft copies
      exist to ensure that the same <node-ID,copy-ID> pair is not
      used twice within a transaction.

Unlike the BDB implementation, we do not have a "copies" table.
Instead, each node-revision record contains a "copyroot" field
identifying the node-rev resulting from the true copy operation most
proximal to the node-rev. If the node-rev does not itself derive from
a copy operation, then the copyroot field identifies the copy of an
ancestor directory; if no ancestor directories derive from a copy
operation, then the copyroot field identifies the root directory of
revision 0.

A revision file contains a concatenation of various kinds of data:

  * Text and property representations
  * The node-revisions
  * The changed-path data
  * Index data (logical addressing only)
  * Revision / pack file footer (logical addressing only)

A representation begins with a line containing either "PLAIN\n" or
"DELTA\n" or "DELTA <rev> <item_index> <length>\n", where <rev>,
<item_index>, and <length> give the location of the delta base of the
representation and the amount of data it contains (not counting the header
or trailer). If no base location is given for a delta, the base is the
empty stream. After the initial line comes raw svndiff data, followed
by a cosmetic trailer "ENDREP\n".
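
The three header forms can be told apart with a few lines of code. The
sketch below is illustrative only; the function name and result keys are
hypothetical.

```python
def parse_rep_header(line):
    """Classify the first line of a representation.

    Accepts "PLAIN\\n", "DELTA\\n" or "DELTA <rev> <item_index> <length>\\n".
    """
    if not line.endswith("\n"):
        raise ValueError("representation header must end with a newline")
    fields = line[:-1].split(" ")
    if fields == ["PLAIN"]:
        return {"kind": "plain"}
    if fields == ["DELTA"]:
        # No base location given: the delta base is the empty stream.
        return {"kind": "delta", "base": None}
    if len(fields) == 4 and fields[0] == "DELTA":
        rev, item_index, length = (int(f) for f in fields[1:])
        return {
            "kind": "delta",
            "base": {"rev": rev, "item_index": item_index, "length": length},
        }
    raise ValueError(f"malformed representation header {line!r}")
```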

If the representation is for the text contents of a directory node,
the expanded contents are in hash dump format mapping entry names to
"<type> <id>" pairs, where <type> is "file" or "dir" and <id> gives
the ID of the child node-rev.

If a representation is for a property list, the expanded contents are
in the form of a dumped hash map mapping property names to property
values.

The marshalling syntax for node-revs is a series of fields terminated
by a blank line. Fields have the syntax "<name>: <value>\n", where
<name> is a symbolic field name (each symbolic name is used only once
in a given node-rev) and <value> is the value data. Unrecognized
fields are ignored, for extensibility. The following fields are
defined:

  id          The ID of the node-rev
  pred        The ID of the predecessor node-rev
  count       Count of node-revs since the base of the node
  text        "<rev> <item_index> <length> <size> <digest>" for text rep
  props       "<rev> <item_index> <length> <size> <digest>" for props rep
              <rev> and <item_index> give location of rep
              <length> gives length of rep, sans header and trailer
              <size> gives size of expanded rep (*)
              <digest> gives hex MD5 digest of expanded rep
              ### in formats >=4, also present:
              <sha1-digest> gives hex SHA1 digest of expanded rep
              <uniquifier> see representation_t->uniquifier in fs.h
  cpath       FS pathname node was created at
  copyfrom    "<rev> <path>" of copyfrom data
  copyroot    "<rev> <created-path>" of the root of this copy
  minfo-cnt   The number of nodes under (and including) this node
              which have svn:mergeinfo.
  minfo-here  Exists if this node itself has svn:mergeinfo.

(*) Earlier versions of this document would state that <size> may be 0
if the actual value matches <length>. This is only true for property
and directory representations and should be avoided in general. File
representations may not be handled correctly by SVN before 1.7.20,
1.8.12 and 1.9.0, if they have 0 <size> fields for non-empty contents.
Releases 1.8.0 through 1.8.11 may have falsely created instances of
that (see issue #4554). Finally, 0 <size> fields are only ever legal
for DELTA representations if the reconstructed full-text is actually
empty.

The predecessor of a node-rev crosses both soft and true copies;
together with the count field, it allows efficient determination of
the base for skip-deltas. The first node-rev of a node contains no
"pred" field. A node-revision with no properties may omit the "props"
field. A node-revision with no contents (a zero-length file or an
empty directory) may omit the "text" field. In a node-revision
resulting from a true copy operation, the "copyfrom" field gives the
copyfrom data. The "copyroot" field identifies the root node-revision
of the copy; it may be omitted if the node-rev is its own copy root
(as is the case for node-revs with copy history, and for the root node
of revision 0). Copy roots are identified by revision and
created-path, not by node-rev ID, because a copy root may be a
node-rev which exists later on within the same revision file, meaning
its location is not yet known.
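
A minimal sketch of reading the "<name>: <value>" fields of a node-rev
and of splitting a "text" or "props" value into its parts. Function
names and result keys are hypothetical, and any sample field values used
with it are made up for illustration.

```python
def parse_noderev(text):
    """Read fields up to the terminating blank line into a dict."""
    fields = {}
    for line in text.split("\n"):
        if line == "":
            break  # a blank line terminates the node-rev
        name, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"malformed field line {line!r}")
        fields[name] = value  # unknown names are simply carried along
    return fields

def parse_rep_field(value):
    """Split "<rev> <item_index> <length> <size> <digest> [...]"."""
    parts = value.split(" ")
    if len(parts) < 5:
        raise ValueError(f"malformed representation field {value!r}")
    rev, item_index, length, size = (int(p) for p in parts[:4])
    return {
        "rev": rev,
        "item_index": item_index,
        "length": length,        # on-disk length, sans header and trailer
        "size": size,            # expanded size
        "md5": parts[4],
        "extra": parts[5:],      # e.g. SHA1 digest and uniquifier (f4+)
    }
```

Callers would treat a missing "pred", "props" or "text" key as described
above (first node-rev of a node, no properties, no contents).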

The changed-path data is represented as a series of changed-path
items, each consisting of two lines. The first line has the format
"<id> <action> <text-mod> <prop-mod> <mergeinfo-mod> <path>\n",
where <id> is the node-rev ID of the new node-rev, <action> is "add",
"delete", "replace", or "modify", <text-mod>, <prop-mod>, and
<mergeinfo-mod> are "true" or "false" indicating whether the text,
properties and/or mergeinfo changed, and <path> is the changed pathname.
For deletes, <id> is the node-rev ID of the deleted node-rev, and
<text-mod> and <prop-mod> are always "false". The second line has the
format "<rev> <path>\n" containing the node-rev's copyfrom information
if it has any; if it does not, the second line is blank.

Starting with FS format 4, <action> may contain the kind ("file" or
"dir") of the node, after a hyphen; for example, an added directory
may be represented as "add-dir".

Prior to FS format 7, <mergeinfo-mod> flag is not available. It may
also be missing in revisions upgraded from pre-f7 formats.

In physical addressing mode, at the very end of a rev file is a pair of
lines containing "\n<root-offset> <cp-offset>\n", where <root-offset> is
the offset of the root directory node revision and <cp-offset> is the
offset of the changed-path data.
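
Locating the root node-rev and the changed-path data in a physically
addressed rev file then amounts to reading the final line. A sketch,
with a hypothetical function name, operating on the complete content of
a single non-packed rev file:

```python
def parse_physical_trailer(data):
    """Return (root_offset, cp_offset) from the last line of a rev file."""
    if not data.endswith(b"\n"):
        raise ValueError("rev file must end with a newline")
    # Find the newline that precedes the final line.
    start = data.rfind(b"\n", 0, len(data) - 1)
    fields = data[start + 1:-1].split(b" ")
    if len(fields) != 2:
        raise ValueError("trailer must hold exactly two offsets")
    root_offset, cp_offset = (int(f) for f in fields)
    return root_offset, cp_offset
```

A real reader would only fetch the tail of the file rather than the
whole content; that optimization is omitted here.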

In logical addressing mode, the revision footer has the form

  <l2p offset> <l2p checksum> <p2l offset> <p2l checksum><terminal byte>

The terminal byte contains the length (as plain 8 bit value) of the footer
excluding that length byte. The first offset is the start of the log-to-
phys index, followed by the digest of the MD5 checksum over its content.
The other pair gives the same for the phys-to-log index.
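
The footer can be picked apart starting from the terminal byte, as in
the sketch below. The function name is hypothetical, and the sketch
assumes the four footer fields are separated by single spaces and that
the checksums are stored as text.

```python
def parse_logical_footer(data):
    """Return (l2p_offset, l2p_checksum, p2l_offset, p2l_checksum)."""
    if not data:
        raise ValueError("empty file")
    footer_len = data[-1]            # plain 8 bit length value
    if footer_len + 1 > len(data):
        raise ValueError("footer length exceeds file size")
    # The footer ends right before the terminal length byte.
    footer = data[len(data) - 1 - footer_len:len(data) - 1]
    fields = footer.split(b" ")
    if len(fields) != 4:
        raise ValueError("footer must hold four fields")
    l2p_offset, l2p_checksum, p2l_offset, p2l_checksum = fields
    return (int(l2p_offset), l2p_checksum.decode("ascii"),
            int(p2l_offset), p2l_checksum.decode("ascii"))
```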

All numbers in the rev file format are unsigned and are represented as
ASCII decimal.

A transaction directory has the following layout:

  props                      Transaction props
  props-final                Final transaction props (optional)
  next-ids                   Next temporary node-ID and copy-ID
  changes                    Changed-path information so far
  node.<nid>.<cid>           New node-rev data for node
  node.<nid>.<cid>.props     Props for new node-rev, if changed
  node.<nid>.<cid>.children  Directory contents for node-rev
  <sha1>                     Text representation of that sha1

In FS formats 1 and 2, it also contains:

  rev                        Prototype rev file with new text reps
  rev-lock                   Lockfile for writing to the above

(In newer formats, these files are in the txn-protorevs/ directory.)

In format 7+ logical addressing mode, it contains two additional index
files (see structure-indexes for a detailed description) and one more
file:

  itemidx                    Next item_index value as decimal integer
  index.l2p                  Log-to-phys proto-index
  index.p2l                  Phys-to-log proto-index

The prototype rev file is used to store the text representations as
they are received from the client. To ensure that only one client is
writing to the file at a given time, the "rev-lock" file is locked for
the duration of each write.

The three kinds of props files are all in hash dump format. The "props"
file will always be present. The "node.<nid>.<cid>.props" file will
only be present if the node-rev properties have been changed. The
"props-final" only exists while converting the transaction into a revision.

The <sha1> files have been introduced in FS format 6. Their content
is that of text rep references: "<rev> <item_offset> <length> <size> <digest>"
They will be written for text reps in the current transaction and be
used to eliminate duplicate reps within that transaction.

The "next-ids" file contains a single line "<next-temp-node-id>
<next-temp-copy-id>\n" giving the next temporary node-ID and copy-ID
assignments (without the leading underscores). The next node-ID is
also used as a uniquifier for representations which may share the same
text.

The "children" file for a node-revision begins with a copy of the hash
dump representation of the directory entries from the old node-rev (or
a dump of the empty hash for new directories), and then an incremental
hash dump entry for each change made to the directory.

The "changes" file contains changed-path entries in the same form as
the changed-path entries in a rev file, except that <id> and <action>
may both be "reset" (in which case <text-mod> and <prop-mod> are both
always "false") to indicate that all changes to a path should be
considered undone. Reset entries are only used during the final merge
phase of a transaction. Actions in the "changes" file always contain
a node kind, even if the FS format is older than format 4.

The node-rev files have the same format as node-revs in a revision
file, except that the "text" and "props" fields are augmented as
follows:

  * The "props" field may have the value "-1" if properties have
    been changed and are contained in a "props" file within the
    node-rev subdirectory.

  * For directory node-revs, the "text" field may have the value
    "-1" if entries have been changed and are contained in a
    "contents" file in the node-rev subdirectory.

  * For the directory node-rev representing the root of the
    transaction, the "is-fresh-txn-root" field indicates that it has
    not been made mutable yet (see Issue #2608).

  * For file node-revs, the "text" field may have the value "-1
    <offset> <length> <size> <digest>" if the text representation is
    within the prototype rev file.

  * The "copyroot" field may have the value "-1 <created-path>" if the
    copy root of the node-rev is part of the transaction in process.

Locks in FSFS are stored in serialized hash format in files whose
names are MD5 digests of the FS path which the lock is associated
with. For the purposes of keeping directory inode usage down, these
digest files live in subdirectories of the main lock directory whose
names are the first 3 characters of the digest filename.

Also stored in the digest file for a given FS path are pointers to
other digest files which contain information associated with other FS
paths that are beneath our path (an immediate child thereof, or a
grandchild, or a great-grandchild, ...).

To answer the question, "Does path FOO have a lock associated with
it?", one need only generate the MD5 digest of FOO's
absolute-in-the-FS path (say, 3b1b011fed614a263986b5c4869604e8), look
for a file located like so:

   /path/to/repos/locks/3b1/3b1b011fed614a263986b5c4869604e8

And then see if that file contains lock information.

To inquire about locks on children of the path FOO, you would
reference the same path as above, but look for a list of children in
that file (instead of lock information). Children are listed as MD5
digests, too, so you would simply iterate over those digests and
consult the files they reference for lock information.
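
The lookup path for a lock digest file can be derived as in the sketch
below. The function name and the locks_dir parameter are hypothetical,
and encoding the FS path as UTF-8 before hashing is an assumption made
for this example.

```python
import hashlib
import posixpath

def lock_digest_path(fs_path, locks_dir="locks"):
    """Return "<locks_dir>/<first 3 hex chars>/<md5 hex digest>"."""
    # Assumption: the absolute-in-the-FS path is hashed as UTF-8 bytes.
    digest = hashlib.md5(fs_path.encode("utf-8")).hexdigest()
    return posixpath.join(locks_dir, digest[:3], digest)
```

Whether the resulting file holds lock information, a list of child
digests, or both still has to be determined by reading it.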

Format 7 introduces logical addressing that requires item indexes
to be translated / mapped to physical rev / pack file offsets.
These indexes are appended to the respective rev / pack file.

Details of the binary format used by these index files can be
found in structure-indexes.