share/doc/psd/01.cacm/p2

   1 .\" This module is believed to contain source code proprietary to AT&T.
   2 .\" Use and redistribution is subject to the Berkeley Software License
   3 .\" Agreement and your Software Agreement with AT&T (Western Electric).
   4 .\"
   5 .\"     @(#)p2  8.1 (Berkeley) 6/8/93
   6 .\"
   7 .\" $FreeBSD$
   8 .SH
   9 3.6 I/O calls
  10 .PP
  11 The system calls to do I/O are designed to eliminate
  12 the differences between the various devices and styles of
  13 access.
  14 There is no distinction between ``random''
  15 and ``sequential'' I/O, nor is any logical record size imposed
  16 by the system.
  17 The size of an ordinary file is determined
  18 by the number of bytes written on it;
  19 no predetermination of the size of a file is necessary
  20 or possible.
  21 .PP
  22 To illustrate the essentials of I/O,
  23 some of the basic calls are
  24 summarized below
  25 in an anonymous language that will
  26 indicate the required parameters without getting into the
  27 underlying
  28 complexities.
  29 Each call to the system may potentially result in an error
  30 return, which for simplicity is not represented
  31 in the calling sequence.
  32 .PP
  33 To read or write a file assumed to exist already, it must
  34 be opened by the following call:
  35 .P1
  36 filep = open\|(\|name, flag\|)
  37 .P2
  38 where
  39 .UL name
  40 indicates the name of the file.
  41 An arbitrary path name may be given.
  42 The
  43 .UL flag
  44 argument indicates whether the file is to be read, written,
  45 or ``updated,'' that is, read and written simultaneously.
  46 .PP
  47 The returned value
  48 .UL filep
  49 is called a
  50 .IT "file descriptor" .
  51 It is a small integer used to identify the file
  52 in subsequent calls to read, write,
  53 or otherwise manipulate the file.
  54 .PP
  55 To create a new file or completely rewrite an old one,
  56 there is a
  57 .UL create
  58 system call that
  59 creates the given file if it does not exist,
  60 or truncates it to zero length
  61 if it does exist;
  62 .UL create
  63 also opens the new file for writing
  64 and, like
  65 .UL open ,
  66 returns a file descriptor.
  67 .PP
  68 The file system maintains no locks visible to the user, nor is there any
  69 restriction on the number of users who may have a file
  70 open for reading or writing.
  71 Although it is possible for the contents of a file
  72 to become scrambled when two users write on it simultaneously,
  73 in practice difficulties do not arise.
  74 We take the view that locks are neither
  75 necessary nor sufficient, in our environment,
  76 to prevent interference between users of the same file.
  77 They are unnecessary because we are not
  78 faced with large, single-file data bases
  79 maintained by independent processes.
  80 They are insufficient because
  81 locks in the ordinary sense, whereby
  82 one user is prevented from writing on a file that another
  83 user is reading,
  84 cannot prevent confusion
  85 when, for example, both users are editing
  86 a file with an editor that makes
  87 a copy of the file being edited.
  88 .PP
  89 There are, however,
  90 sufficient internal interlocks to maintain
  91 the logical consistency of the file system
  92 when two users engage simultaneously in
  93 activities such as writing on
  94 the same file,
  95 creating files in the same directory,
  96 or deleting each other's open files.
  97 .PP
  98 Except as indicated below, reading and writing
  99 are sequential.
 100 This means that if a particular
 101 byte in the file was the last byte written (or read),
 102 the next I/O call implicitly refers to the
 103 immediately following byte.
 104 For each open file there is a pointer, maintained
 105 inside the system,
 106 that indicates the next byte to be read
 107 or written.
 108 If
 109 .IT n
 110 bytes are read or written, the pointer advances
 111 by
 112 .IT n
 113 bytes.
 114 .PP
 115 Once a file is open, the following calls
 116 may be used:
 117 .P1
 118 n = read\|(\|filep, buffer, count\|)
 119 n = write\|(\|filep, buffer, count\|)
 120 .P2
 121 Up to
 122 .UL count
 123 bytes are transmitted between the file specified
 124 by
 125 .UL filep
 126 and the byte array
 127 specified by
 128 .UL buffer .
 129 The returned value
 130 .UL n
 131 is the number of bytes actually transmitted.
 132 In the
 133 .UL write
 134 case,
 135 .UL n
 136 is the same as
 137 .UL count
 138 except under exceptional conditions, such as I/O errors or
 139 end of physical medium on special files;
 140 in a
 141 .UL read ,
 142 however,
 143 .UL n
 144 may without error be less than
 145 .UL count .
 146 If the read pointer is so near the end of the
 147 file that reading
 148 .UL count
 149 characters
 150 would cause reading beyond the end, only sufficient
 151 bytes are transmitted to reach the end of the
 152 file;
 153 also, typewriter-like terminals
 154 never return more than one line of input.
 155 When a
 156 .UL read
 157 call returns with
 158 .UL n
 159 equal
 160 to zero, the end of the file has been reached.
 161 For disk files this occurs when the read pointer
 162 becomes equal to the current
 163 size of the file.
 164 It is possible to generate an end-of-file
 165 from a terminal by use of an escape
 166 sequence that depends on the device used.
 167 .PP
 168 Bytes written affect only those parts of a file implied by
 169 the position of the write pointer and the
 170 count; no other part of the file
 171 is changed.
 172 If the last byte lies beyond the end of the file, the
 173 file is made to grow as needed.
 174 .PP
 175 To do random (direct-access) I/O
 176 it is only necessary to move the read or write pointer
 177 to the appropriate location in the file.
 178 .P1
 179 location = lseek\|(\|filep, offset, base\|)
 180 .P2
 181 The pointer
 182 associated with
 183 .UL filep
 184 is moved to a position
 185 .UL offset
 186 bytes from the beginning of the file, from the current position
 187 of the pointer, or from the end of the file,
 188 depending on
 189 .UL base.
 190 .UL \&offset
 191 may be negative.
 192 For some devices (e.g., paper
 193 tape and
 194 terminals) seek calls are
 195 ignored.
 196 The actual offset from the beginning of the file
 197 to which the pointer was moved is returned
 198 in
 199 .UL location .
 200 .PP
 201 There are several additional system entries
 202 having to do with I/O and with the file
 203 system that will not be discussed.
 204 For example:
 205 close a file,
 206 get the status of a file,
 207 change the protection mode or the owner
 208 of a file,
 209 create a directory,
 210 make a link to an existing file,
 211 delete a file.
 212 .SH
 213 IV. IMPLEMENTATION OF THE FILE SYSTEM
 214 .PP
 215 As mentioned in Section 3.2 above, a directory entry contains
 216 only a name for the associated file and a pointer to the
 217 file itself.
 218 This pointer is an integer called the
 219 .IT i-number
 220 (for index number)
 221 of the file.
 222 When the file is accessed,
 223 its i-number is used as an index into
 224 a system table (the
 225 .IT i-list \|)
 226 stored in a known
 227 part of the device on which
 228 the directory resides.
 229 The entry found thereby (the file's
 230 .IT i-node \|)
 231 contains
 232 the description of the file:
 233 .IP i
 234 the user and group-\*sID\*n of its owner
 235 .IP ii
 236 its protection bits
 237 .IP iii
 238 the physical disk or tape addresses for the file contents
 239 .IP iv
 240 its size
 241 .IP v
 242 time of creation, last use, and last modification
 243 .IP vi
 244 the number of links to the file, that is, the number of times it appears in a directory
 245 .IP vii
 246 a code indicating whether the file is a directory, an ordinary file, or a special file.
 247 .LP
 248 The purpose of an
 249 .UL open
 250 or
 251 .UL create
 252 system call is to turn the path name given by the user
 253 into an i-number
 254 by searching the explicitly or implicitly named directories.
 255 Once a file is open,
 256 its device, i-number, and read/write pointer are stored in a system table
 257 indexed by the file descriptor returned by the
 258 .UL open
 259 or
 260 .UL create .
 261 Thus, during a subsequent
 262 call to read or write the
 263 file,
 264 the descriptor
 265 may be easily related to the information necessary to access the file.
 266 .PP
 267 When a new file is created,
 268 an i-node is allocated for it and a directory entry is made
 269 that contains the name of the file and the i-node
 270 number.
 271 Making a link to an existing file involves
 272 creating a directory entry with the new name,
 273 copying the i-number from the original file entry,
 274 and incrementing the link-count field of the i-node.
 275 Removing (deleting) a file is done by
 276 decrementing the
 277 link-count of the i-node specified by its directory entry
 278 and erasing the directory entry.
 279 If the link-count drops to 0,
 280 any disk blocks in the file
 281 are freed and the i-node is de-allocated.
 282 .PP
 283 The space on all disks that
 284 contain a file system is divided into a number of
 285 512-byte
 286 blocks logically addressed from 0 up to a limit that
 287 depends on the device.
 288 There is space in the i-node of each file for 13 device addresses.
 289 For nonspecial files,
 290 the first 10 device addresses point at the first
 291 10 blocks of the file.
 292 If the file is larger than 10 blocks,
 293 the 11 device address points to an
 294 indirect block containing up to 128 addresses
 295 of additional blocks in the file.
 296 Still larger files use the twelfth device address
 297 of the i-node to point to
 298 a double-indirect block naming
 299 128 indirect blocks,
 300 each
 301 pointing to 128 blocks of the file.
 302 If required,
 303 the thirteenth device address is
 304 a triple-indirect block.
 305 Thus files may conceptually grow to
 306 [\|(10+128+128\u\s62\s0\d+128\u\s63\s0\d)\*m512\|] bytes.
 307 Once opened,
 308 bytes numbered below 5120 can be read with a single
 309 disk access;
 310 bytes in the range 5120 to 70,656
 311 require two accesses;
 312 bytes in the range 70,656
 313 to 8,459,264
 314 require three accesses;
 315 bytes from there to the
 316 largest file
 317 (1,082,201,088)
 318 require four accesses.
 319 In practice,
 320 a device cache mechanism
 321 (see below)
 322 proves effective in eliminating
 323 most of the indirect fetches.
 324 .PP
 325 The foregoing discussion applies to ordinary files.
 326 When an I/O request is made to a file whose i-node indicates that it
 327 is special,
 328 the last 12 device address words are immaterial,
 329 and the first specifies
 330 an internal
 331 .IT "device name" ,
 332 which is interpreted as a pair of numbers
 333 representing,
 334 respectively, a device type
 335 and subdevice number.
 336 The device type indicates which
 337 system routine will deal with I/O on that device;
 338 the subdevice number selects, for example, a disk drive
 339 attached to a particular controller or one of several
 340 similar terminal interfaces.
 341 .PP
 342 In this environment, the implementation of the
 343 .UL mount
 344 system call (Section 3.4) is quite straightforward.
 345 .UL \&mount
 346 maintains a system table whose
 347 argument is the i-number and device name of the
 348 ordinary file specified
 349 during the
 350 .UL mount ,
 351 and whose corresponding value is the
 352 device name of the indicated special file.
 353 This table is searched for each i-number/device pair
 354 that turns up while a path name is being scanned
 355 during an
 356 .UL open
 357 or
 358 .UL create ;
 359 if a match is found,
 360 the i-number is replaced by the i-number of the root
 361 directory
 362 and the device name is replaced by the table value.
 363 .PP
 364 To the user, both reading and writing of files appear to
 365 be synchronous and unbuffered.
 366 That is, immediately after
 367 return from a
 368 .UL read
 369 call the data are available; conversely,
 370 after a
 371 .UL write
 372 the user's workspace may be reused.
 373 In fact, the system maintains a rather complicated
 374 buffering mechanism that reduces greatly the number
 375 of I/O operations required to access a file.
 376 Suppose a
 377 .UL write
 378 call is made specifying transmission
 379 of a single byte.
 380 The system
 381 will search its buffers to see
 382 whether the affected disk block currently resides in main memory;
 383 if not, it will be read in from the device.
 384 Then the affected byte is replaced in the buffer and an
 385 entry is made in a list of blocks to be written.
 386 The return from the
 387 .UL write
 388 call may then take place,
 389 although the actual I/O may not be completed until a later time.
 390 Conversely, if a single byte is read, the system determines
 391 whether the secondary storage block in which the byte is located is already
 392 in one of the system's buffers; if so, the byte can be returned immediately.
 393 If not, the block is read into a buffer and the byte picked out.
 394 .PP
 395 The system recognizes when
 396 a program has
 397 made accesses to
 398 sequential blocks of a file,
 399 and asynchronously
 400 pre-reads the next block.
 401 This significantly reduces
 402 the running time of most programs
 403 while adding little to
 404 system overhead.
 405 .PP
 406 A program that reads or writes files in units of 512 bytes
 407 has an advantage over a program that reads or writes
 408 a single byte at a time, but the gain is not immense;
 409 it comes mainly from the avoidance of system overhead.
 410 If a program is used rarely or does
 411 no great volume of I/O, it may quite reasonably
 412 read and write in units as small as it wishes.
 413 .PP
 414 The notion of the i-list is an unusual feature
 415 of
 416 .UX .
 417 In practice, this method of organizing the file system
 418 has proved quite reliable and easy to deal with.
 419 To the system itself, one of its strengths is
 420 the fact that each file has a short, unambiguous name
 421 related in a simple way to the protection, addressing,
 422 and other information needed to access the file.
 423 It also permits a quite simple and rapid
 424 algorithm for checking the consistency of a file system,
 425 for example, verification
 426 that the portions of each device containing useful information
 427 and those free to be allocated are disjoint and together
 428 exhaust the space on the device.
 429 This algorithm is independent
 430 of the directory hierarchy, because it need only scan
 431 the linearly organized i-list.
 432 At the same time the notion of the i-list induces certain
 433 peculiarities not found in other file system organizations.
 434 For example, there is the question of who is to be charged
 435 for the space a file occupies,
 436 because all directory entries for a file have equal status.
 437 Charging the owner of a file is unfair in general,
 438 for one user may create a file, another may link to
 439 it, and the first user may delete the file.
 440 The first user is still the owner of the
 441 file, but it should be charged
 442 to the second user.
 443 The simplest reasonably fair algorithm
 444 seems to be to spread the charges
 445 equally among users who have links to a file.
 446 Many installations
 447 avoid the
 448 issue by not charging any fees at all.