share/doc/papers/jail/paper.ms

   1 .\"
   2 .\" $FreeBSD$
   3 .\"
   4 .if n .ftr C R
   5 .ig TL
   6 .ds CH "
   7 .nr PI 2n
   8 .nr PS 12
   9 .nr LL 15c
  10 .nr PO 3c
  11 .nr FM 3.5c
  12 .po 3c
  13 .TL
  14 Jails: Confining the omnipotent root.
  15 .FS
  16 This paper was presented at the 2nd International System Administration and Networking Conference ``SANE 2000'', May 22-25, 2000, in Maastricht, The Netherlands, and is published in the proceedings.
  17 .FE
  18 .AU
  19 Poul-Henning Kamp <phk@FreeBSD.org>
  20 .AU
  21 Robert N. M. Watson <rwatson@FreeBSD.org>
  22 .AI
  23 The FreeBSD Project
  24 .FS
  25 This work was sponsored by \fChttp://www.servetheweb.com/\fP and
  26 donated to the FreeBSD Project for inclusion in the FreeBSD
  27 OS.  FreeBSD 4.0-RELEASE was the first release including this
  28 code.
  29 Follow-on work was sponsored by Safeport Network Services,
  30 \fChttp://www.safeport.com/\fP
  31 .FE
  32 .AB
  33 The traditional UNIX security model is simple but inexpressive.
  34 Adding fine-grained access control improves the expressiveness,
  35 but often dramatically increases both the cost of system management
  36 and implementation complexity.
  37 In environments with a more complex management model, with delegation
  38 of some management functions to parties under varying degrees of trust,
  39 the base UNIX model and most natural
  40 extensions are inappropriate at best.
  41 Where multiple mutually un-trusting parties are introduced,
  42 ``inappropriate'' rapidly transitions to ``nightmarish'', especially
  43 with regard to data integrity and privacy protection.
  44 .PP
  45 The FreeBSD ``Jail'' facility provides the ability to partition
  46 the operating system environment, while maintaining the simplicity
  47 of the UNIX ``root'' model.
  48 Within a jail, privileged users find that the scope of their requests
  49 is limited to the jail, allowing system administrators to delegate
  50 management capabilities for each virtual machine
  51 environment.
  52 Creating virtual machines in this manner has many potential uses; the
  53 most popular thus far has been for providing virtual machine services
  54 in Internet Service Provider environments.
  55 .AE
  56 .NH
  57 Introduction
  58 .PP
  59 The UNIX access control mechanism is designed for an environment with two
  60 types of users: those with and those without administrative privilege.
  61 Within this framework, every attempt is made to provide an open
  62 system, allowing easy sharing of files and inter-process communication.
  63 As a member of the UNIX family, FreeBSD inherits these
  64 security properties.
  65 Users of FreeBSD in non-traditional UNIX environments must balance
  66 their need for strong application support, high network performance
  67 and functionality, and low total cost of ownership with the need
  68 for alternative security models that are difficult or impossible to
  69 implement with the UNIX security mechanisms.
  70 .PP
  71 One such consideration is the desire to delegate some (but not all)
  72 administrative functions to untrusted or less trusted parties, and
  73 simultaneously impose system-wide mandatory policies on process
  74 interaction and sharing.
  75 Attempting to create such an environment in the current-day FreeBSD
  76 security environment is both difficult and costly: in many cases,
  77 the burden of implementing these policies falls on user
  78 applications, which means an increase in the size and complexity
  79 of the code base, in turn translating to higher development
  80 and maintenance cost, as well as less overall flexibility.
  81 .PP
  82 This abstract risk becomes clearer when applied to a practical,
  83 real-world example:
  84 many web service providers turn to the FreeBSD
  85 operating system to host customer web sites, as it provides a
  86 high-performance, network-centric server environment.
  87 However, these providers face a number of concerns: they must protect
  88 the integrity and confidentiality of their own files and services from
  89 their customers, and they must protect the files and services of each
  90 customer from (accidental or
  91 intentional) access by any other customer.
  92 At the same time, a provider would like to provide
  93 substantial autonomy to customers, allowing them to install and
  94 maintain their own software, and to manage their own services,
  95 such as web servers and other content-related daemon programs.
  96 .PP
  97 This problem space points strongly in the direction of a partitioning
  98 solution, in which customer processes and storage are isolated from those of
  99 other customers, both in terms of accidental disclosure of data or process
 100 information and in terms of the ability to modify files or processes
 101 outside of a compartment.
 102 Delegation of management functions within the system must
 103 be possible, but not at the cost of system-wide requirements, including
 104 integrity and privacy protection between partitions.
 105 .PP
 106 However, UNIX-style access control makes it notoriously difficult to
 107 compartmentalise functionality.
 108 While mechanisms such as chroot(2) provide a modest
 109 level of compartmentalisation, it is well known
 110 that these mechanisms have serious shortcomings, both in terms of the
 111 scope of their functionality and their effectiveness at what they do provide \s-2[CHROOT]\s+2.
 112 .PP
 113 In the case of the chroot(2) call, a process's visibility of
 114 the file system name-space is limited to a single subtree.
 115 However, the compartmentalisation does not extend to the process
 116 or networking spaces and therefore both observation of and interference
 117 with processes outside their compartment are possible.
 118 .PP
 119 To this end, we describe the new FreeBSD ``Jail'' facility, which
 120 provides a strong partitioning solution, extending existing
 121 mechanisms, such as chroot(2), into what effectively amounts to a
 122 virtual machine environment.  Processes in a jail are provided
 123 full access to the files that they may manipulate, processes they
 124 may influence, and network services they can make use of, but have neither
 125 access to nor visibility of files, processes or network services outside
 126 their partition.
 127 .PP
 128 Unlike fine-grained security solutions, Jail does not
 129 substantially increase the policy management requirements for the
 130 system administrator, as each Jail is a virtual FreeBSD environment
 131 permitting local policy to be independently managed, with much the
 132 same properties as the main system itself, making Jail easy to use
 133 for the administrator, and far more compatible with applications.
 134 .NH
 135 Traditional UNIX Security, or, ``God, root, what difference?'' \s-2[UF]\s+2.
 136 .PP
 137 The traditional UNIX access model assigns numeric uids to each user of the
 138 system. In turn, each process ``owned'' by a user will be tagged with that
 139 user's uid in an unforgeable manner.  The uids serve two purposes: first,
 140 they determine how discretionary access control mechanisms will be applied, and
 141 second, they are used to determine whether special privileges are accorded.
 142 .PP
 143 In the case of discretionary access controls, the primary object protected is
 144 a file.  The uid (together with the gids indicating group membership) is mapped to
 145 a set of rights for each object, courtesy of the UNIX file mode, in effect acting
 146 as a limited form of access control list.  Jail is, in general, not concerned
 147 with modifying the semantics of discretionary access control mechanisms,
 148 although there are important implications from a management perspective.
 149 .PP
 150 For the purposes of determining whether special privileges are accorded to a
 151 process, the check is simple: ``is the numeric uid equal to 0 ?''.
 152 If so, the
 153 process is acting with ``super-user privileges'', and all access checks are
 154 granted, in effect allowing the process the ability to do whatever it wants
 155 to \**.
 156 .FS
 157 \&... no matter how patently stupid it may be.
 158 .FE
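The discretionary check and the uid 0 override described above can be sketched in C. This is a simplified user-space model, not FreeBSD kernel code: `struct cred`, `is_superuser()`, `may_access()` and the `WANT_*` constants are hypothetical names, a credential carries a single gid rather than a group list, and real kernels keep a few exceptions even for root.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Requested access bits; same values as R_OK, W_OK and X_OK. */
enum { WANT_READ = 4, WANT_WRITE = 2, WANT_EXEC = 1 };

/* Hypothetical, simplified credential: one uid and one gid. */
struct cred {
    uid_t uid;
    gid_t gid;
};

/* The traditional privilege test is nothing more than a comparison. */
static bool is_superuser(const struct cred *c)
{
    return c->uid == 0;
}

/*
 * Discretionary check against a file's owner, group and mode bits.
 * uid 0 bypasses the mode bits entirely in this sketch; real kernels
 * still require an execute bit before root may execute a file.
 */
static bool may_access(const struct cred *c, uid_t owner, gid_t group,
                       mode_t mode, int want)
{
    int granted;

    if (is_superuser(c))
        return true;
    if (c->uid == owner)
        granted = (mode >> 6) & 7;      /* owner rwx bits */
    else if (c->gid == group)
        granted = (mode >> 3) & 7;      /* group rwx bits */
    else
        granted = mode & 7;             /* other rwx bits */
    return (granted & want) == want;
}
```

Note that only the owner, group or other class that matches first is consulted, while uid 0 never reaches the mode bits at all; that single early return is the whole of the ``omnipotent root'' rule.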
 159 .PP
 160 For the purposes of human convenience, uid 0 is canonically allocated
 161 to the ``root'' user \s-2[ROOT]\s+2.
 162 For the purposes of jail, this behaviour is extremely relevant: many of
 163 the operations reserved for uid 0 are used to manage system hardware and
 164 configuration, the file system name-space, and special network operations.
 165 .PP
 166 Many limitations to this model are immediately clear: the root user is a
 167 single, concentrated source of privilege that is exposed to many pieces of
 168 software, and as such an immediate target for attacks.  In the event of a
 169 compromise of the root capability set, the attacker has complete control over
 170 the system.  Even without an attacker, the risks of a single administrative
 171 account are serious: delegating a narrow scope of capability to an
 172 inexperienced administrator is difficult, as the granularity of delegation is
 173 that of all system management abilities.  These features make the omnipotent
 174 root account a sharp, efficient and extremely dangerous tool.
 175 .PP
 176 The BSD family of operating systems has implemented the ``securelevel''
 177 mechanism which allows the administrator to block certain configuration
 178 and management functions from being performed by root,
 179 until the system is restarted and brought up into single-user mode.
 180 While this does provide some amount of protection in the case of a root
 181 compromise of the machine, it does nothing to address the need for
 182 delegation of certain root abilities.
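The securelevel behaviour can be modelled as a value that may only be raised while the system runs. The sketch below is hypothetical: the variable and function names are invented, lowering the level (which in practice requires a restart) is simply refused, and only one restricted operation is modelled, namely turning off the system immutable and append-only file flags, which init(8) documents as forbidden from securelevel 1 upwards.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's securelevel; -1 is permanently insecure mode. */
static int securelevel = -1;

/*
 * While the system is running the level may only be raised.  Lowering it
 * needs a restart, so this model simply refuses the request.
 */
static bool set_securelevel(int level)
{
    if (level < securelevel)
        return false;
    securelevel = level;
    return true;
}

/* From securelevel 1 upwards even root may not turn off the system
 * immutable and append-only file flags. */
static bool may_clear_system_flags(void)
{
    return securelevel <= 0;
}
```

In this model even a root process cannot undo `set_securelevel(1)`, which is the protection against a compromised root; nothing in it limits which root abilities a particular administrator may use, which is the delegation gap.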
 183 .NH
 184 Other Solutions to the Root Problem
 185 .PP
 186 Many operating systems attempt to address these limitations by providing
 187 fine-grained access controls for system resources \s-2[BIBA]\s+2.
 188 These efforts vary in
 189 their degree of success, but almost all suffer from at least three serious
 190 limitations:
 191 .PP
 192 First, increasing the granularity of security controls increases the
 193 complexity of the administration process, in turn increasing both the
 194 opportunity for incorrect configuration, as well as the demand on
 195 administrator time and resources.  In many cases, the increased complexity
 196 results in significant frustration for the administrator, which may result
 197 in two
 198 disastrous types of policy: ``all doors open as it's too much trouble'', and
 199 ``trust that the system is secure, when in fact it isn't''.
 200 .PP
 201 The extent of the trouble is best illustrated by the fact that an entire
 202 niche industry has emerged providing tools to manage fine grained security
 203 controls \s-2[UAS]\s+2.
 204 .PP
 205 Second, usefully segregating capabilities and assigning them to running code
 206 and users is very difficult.  Many privileged operations in UNIX seem
 207 independent, but are in fact closely related, and granting one
 208 privilege may, in effect, confer many others.  For example, in
 209 some trusted operating systems, a system capability may be assigned to a
 210 running process to allow it to read any file, for the purposes of backup.
 211 However, this capability is, in effect, equivalent to the ability to switch to
 212 any other account, as the ability to access any file provides access to system
 213 keying material, which in turn provides the ability to authenticate as any
 214 user.  Similarly, many operating systems attempt to segregate management
 215 capabilities from auditing capabilities.  In a number of these operating
 216 systems, however, ``management capabilities'' permit the administrator to
 217 assign ``auditing capabilities'' to itself, or another account, circumventing
 218 the segregation of capability.
 219 .PP
 220 Finally, introducing new security features often involves introducing new
 221 security management APIs.  When fine-grained capabilities are introduced to
 222 replace the setuid mechanism in UNIX-like operating systems, applications that
 223 previously did an ``appropriateness check'' to see if they were running as
 224 root before executing must now be changed to know that they need not run as
 225 root.  In the case of applications running with privilege and executing other
 226 programs, there is now a new set of privileges that must be voluntarily given
 227 up before executing another program.  These changes can introduce significant
 228 incompatibility for existing applications, and make life more difficult for
 229 application developers who may not be aware of differing security semantics on
 230 different systems \s-2[POSIX1e]\s+2.
 231 .NH
 232 The Jail Partitioning Solution
 233 .PP
 234 Jail neatly side-steps the majority of these problems through partitioning.
 235 Rather
 236 than introduce additional fine-grained access control mechanisms, we partition
 237 a FreeBSD environment (processes, file system, network resources) into a
 238 management environment and, optionally, a number of subset Jail environments.  In doing so,
 239 we simultaneously maintain the existing UNIX security model, allowing
 240 multiple users and a privileged root user in each jail, while
 241 limiting the scope of root's activities to that jail.
 242 Consequently the administrator of a
 243 FreeBSD machine can partition the machine into separate jails, and provide
 244 access to the super-user account in each of these without losing control of
 245 the over-all environment.
 246 .PP
 247 A process in a partition is referred to as ``in jail''.  When a FreeBSD
 248 system is booted up after a fresh install, no processes will be in jail.
 249 When
 250 a process is placed in a jail, it, and any descendants of the process created
 251 after the jail creation, will be in that jail.  A process may be in only one
 252 jail, and after creation, it can not leave the jail.
 253 Jails are created when a
 254 privileged process calls the jail(2) syscall, with a description of the jail as an
 255 argument to the call.  Each call to jail(2) creates a new jail; the only way
 256 for a new process to enter the jail is by inheriting access to the jail from
 257 another process already in that jail.
 258 Processes may never
 259 leave the jail they created, or were created in.
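The membership rules above (creation only by a privileged process, inheritance by descendants, no way out) can be captured in a small user-space model. Everything here is hypothetical: `struct jail` is only loosely modelled on the description passed to jail(2) and the real structure differs, `proc_fork()` and `proc_enter_new_jail()` are invented names, and refusing a second jail(2) call from an already-jailed process is an assumption of this sketch that reflects ``it can not leave the jail''.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical jail description; the real jail(2) argument differs. */
struct jail {
    const char *path;           /* file system root of the jail */
    uint32_t ip;                /* the jail's single IPv4 address, host byte order */
};

/* Hypothetical process record: only the fields this model needs. */
struct proc {
    uid_t uid;
    const struct jail *jail;    /* NULL: not in any jail */
};

/* A child inherits its parent's jail; there is no call that clears it. */
static struct proc proc_fork(const struct proc *parent)
{
    return *parent;
}

/*
 * Model of a jail(2) call: only a privileged process may place itself in
 * a new jail, and a process that is already jailed is refused, since it
 * may never leave its jail.
 */
static bool proc_enter_new_jail(struct proc *p, const struct jail *j)
{
    if (p->uid != 0 || p->jail != NULL)
        return false;
    p->jail = j;
    return true;
}
```

Because the jail pointer is copied on fork and nothing resets it, every process started inside the jail, including a root shell, stays there.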
 260 .KF
 261 .if t .PSPIC jail01.eps 4i
 262 .ce 1
 263 Fig. 1 \(em Schematic diagram of machine with two configured jails
 264 .sp
 265 .KE
 266 .PP
 267 Membership in a jail involves a number of restrictions: access to the file
 268 name-space is restricted in the style of chroot(2), the ability to bind network
 269 resources is limited to a specific IP address, the ability to manipulate
 270 system resources and perform privileged operations is sharply curtailed, and
 271 the ability to interact with other processes is limited to only processes
 272 inside the same jail.
 273 .PP
 274 Jail takes advantage of the existing chroot(2) behaviour to limit access to the
 275 file system name-space for jailed processes.  When a jail is created, it is
 276 bound to a particular file system root.
 277 Processes are unable to manipulate files that they cannot address,
 278 and as such the integrity and confidentiality of files outside of the jail
 279 file system root are protected.  Traditional mechanisms for breaking out of
 280 chroot(2) have been blocked.
 281 In the expected and documented configuration, each jail is provided
 282 with its own exclusive file system root and a standard FreeBSD directory layout,
 283 but this is not mandated by the implementation.
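The name-space boundary can be illustrated with a path-prefix test. This is only an illustration under stated assumptions: `path_in_jail()` is a hypothetical helper, paths are assumed to be absolute and already normalised (no `..` or symbolic links), and the kernel actually confines lookups by the jail's root directory (a vnode), not by comparing path strings.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Does an absolute, already-normalised host path lie inside a jail whose
 * root is `root` (given without a trailing slash, except for "/" itself)?
 */
static bool path_in_jail(const char *root, const char *path)
{
    size_t n = strlen(root);

    if (strncmp(root, path, n) != 0)
        return false;
    if (n > 0 && root[n - 1] == '/')
        return true;                    /* root is "/" */
    /* Require a component boundary: "/jails/a" must not match "/jails/ab". */
    return path[n] == '\0' || path[n] == '/';
}
```

The component-boundary check matters: a naive prefix test would let a jail rooted at `/jails/www` reach files under a sibling such as `/jails/www2`.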
 284 .PP
 285 Each jail is bound to a single IP address: processes within the jail may not
 286 make use of any other IP address for outgoing or incoming connections; this
 287 also makes it possible to restrict what network services a particular jail may
 288 offer.  As FreeBSD distinguishes attempts to bind all IP addresses from
 289 attempts to bind a particular address, bind requests for all IP addresses are
 290 redirected to the individual Jail address.  Some network functionality
 291 associated with privileged calls is disabled wholesale due to the nature of the
 292 functionality offered; in particular, facilities which would allow ``spoofing''
 293 of IP numbers or the generation of disruptive traffic have been disabled.
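The bind-address rule can be sketched as a pure function. This is a hypothetical model, not the kernel code: `jail_bind_addr()` is an invented name, addresses are IPv4 in host byte order, and returning `EADDRNOTAVAIL` for a foreign address is an assumption of the sketch rather than a documented FreeBSD result.

```c
#include <errno.h>
#include <stdint.h>
#include <netinet/in.h>

/*
 * Decide which local address a jailed socket actually binds.  On success
 * the address is stored in *bound and 0 is returned; otherwise an errno
 * value is returned and *bound is left untouched.
 */
static int jail_bind_addr(uint32_t jail_ip, uint32_t requested, uint32_t *bound)
{
    if (requested == INADDR_ANY) {
        *bound = jail_ip;       /* "all addresses" becomes the jail's address */
        return 0;
    }
    if (requested == jail_ip) {
        *bound = requested;     /* explicitly binding the jail's own address */
        return 0;
    }
    return EADDRNOTAVAIL;       /* any other local address is refused */
}
```

Redirecting `INADDR_ANY` rather than rejecting it is what lets unmodified daemons, which typically bind all addresses, run inside a jail.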
 294 .PP
 295 Processes running without root privileges will notice few, if any, differences
 296 between a jailed environment and an un-jailed environment.  Processes running with
 297 root privileges will find that many restrictions apply to the privileged calls
 298 they may make.  Some calls will now return an access error \(em for example, an
 299 attempt to create a device node will now fail.  Others will have a more
 300 limited scope than normal \(em attempts to bind a reserved port number on all
 301 available addresses will result in binding only the address associated with
 302 the jail.  Other calls will succeed as normal: root may read a file owned by
 303 any uid, as long as it is accessible through the jail file system name-space.
 304 .PP
 305 Processes within the jail will find that they are unable to interact with, or
 306 even verify the existence of,
 307 processes outside the jail \(em processes within the jail are
 308 prevented from delivering signals to processes outside the jail, from
 309 connecting to those processes with debuggers, and even from seeing them in the
 310 sysctl or process file system monitoring mechanisms.  Jail does not prevent,
 311 nor is it intended to prevent, the use of covert channels or communications
 312 mechanisms via accepted interfaces \(em for example, two processes may communicate
 313 via sockets over the IP network interface.  Nor does it attempt to provide
 314 scheduling services based on the partition; however, it does prevent calls
 315 that interfere with normal process operation.
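The visibility rule reduces to one comparison, which signal delivery, debugger attachment and process listings would all consult. The sketch below is a hypothetical model: `struct jail`, `struct proc` and `proc_can_see()` are invented for illustration, and the real kernel combines this test with the ordinary uid-based checks.

```c
#include <stdbool.h>
#include <stddef.h>

struct jail {
    int id;                     /* placeholder contents for this sketch */
};

struct proc {
    int pid;
    const struct jail *jail;    /* NULL: not jailed */
};

/*
 * An unjailed process may see every process; a jailed process may see
 * only processes in its own jail, and never processes on the host.
 */
static bool proc_can_see(const struct proc *observer, const struct proc *target)
{
    if (observer->jail == NULL)
        return true;
    return observer->jail == target->jail;
}
```

The check is deliberately asymmetric: the host administrator keeps full visibility into every jail, while a jailed process cannot even see the host's own processes.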
 316 .PP
 317 As a result of these attempts to retain the standard FreeBSD API and
 318 framework, almost all applications will run unaffected.  Standard system
 319 services such as Telnet, FTP, and SSH all behave normally, as do most third
 320 party applications, including the popular Apache web server.
 321 .NH
 322 Jail Implementation
 323 .PP
 324 Processes running with root privileges in the jail find that there are serious
 325 restrictions on what they are capable of doing \(em in particular, activities that
 326 would extend outside of the jail:
 327 .IP "" 5n
 328 \(bu Modifying the running kernel by direct access and loading kernel
 329 modules is prohibited.
 330 .IP
 331 \(bu Modifying any of the network configuration, interfaces, addresses, and
 332 routing table is prohibited.
 333 .IP
 334 \(bu Mounting and unmounting file systems is prohibited.
 335 .IP
 336 \(bu Creating device nodes is prohibited.
 337 .IP
 338 \(bu Accessing raw, divert, or routing sockets is prohibited.
 339 .IP
 340 \(bu Modifying kernel runtime parameters, such as most sysctl settings, is
 341 prohibited.
 342 .IP
 343 \(bu Changing securelevel-related file flags is prohibited.
 344 .IP
 345 \(bu Accessing network resources not associated with the jail is prohibited.
 346 .PP
 347 Other privileged activities are permitted as long as they are limited to the
 348 scope of the jail:
 349 .IP "" 5n
 350 \(bu Signalling any process within the jail is permitted.
 351 .IP
 352 \(bu Changing the ownership and mode of any file within the jail is permitted, as
 353 long as the file flags permit this.
 354 .IP
 355 \(bu Deleting any file within the jail is permitted, as long as the file flags
 356 permit this.
 357 .IP
 358 \(bu Binding reserved TCP and UDP port numbers on the jail's IP address is
 359 permitted.  (Attempts to bind TCP and UDP ports using INADDR_ANY will be
 360 redirected to the jail's IP address.)
 361 .IP
 362 \(bu Functions which operate on the uid/gid space are all permitted, since uids and gids
 363 act as labels for file system objects and processes,
 364 which are partitioned off by other mechanisms.
 365 .PP
 366 These restrictions on root access limit the scope of root processes, enabling
 367 most applications to run un-hindered, but preventing calls that might allow an
 368 application to reach beyond the jail and influence other processes or
 369 system-wide configuration.
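The two lists above amount to a privilege policy that depends on whether the caller is jailed. The sketch below encodes them as a hypothetical decision function: the `OP_*` codes loosely follow the bullets but are invented names, and ``allowed'' here only means the privilege check passes, since scoping to the jail (which files, processes and address are reachable) is enforced by the other mechanisms.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical operation codes loosely following the lists above. */
enum priv_op {
    OP_LOAD_KLD,        /* modify the running kernel or load modules */
    OP_NET_CONFIG,      /* interfaces, addresses, routing table */
    OP_MOUNT,           /* mount and unmount file systems */
    OP_MKNOD,           /* create device nodes */
    OP_RAW_SOCKET,      /* raw, divert or routing sockets */
    OP_SYSCTL_WRITE,    /* most kernel runtime parameters */
    OP_CHFLAGS_SYSTEM,  /* securelevel-related file flags */
    OP_SIGNAL,          /* signal another process */
    OP_CHOWN,           /* change ownership or mode of a file */
    OP_BIND_RESERVED,   /* bind a reserved TCP or UDP port */
    OP_SETUID           /* uid/gid manipulation */
};

/* Returns 0 if the privileged operation is allowed, otherwise EPERM. */
static int priv_check(uid_t uid, bool jailed, enum priv_op op)
{
    if (uid != 0)
        return EPERM;           /* non-root callers get no special privilege */
    if (!jailed)
        return 0;               /* host root: every check is granted */
    switch (op) {
    case OP_SIGNAL:
    case OP_CHOWN:
    case OP_BIND_RESERVED:
    case OP_SETUID:
        return 0;               /* allowed; jail scoping is enforced elsewhere */
    default:
        return EPERM;           /* would reach beyond the jail */
    }
}
```

Defaulting to `EPERM` for jailed root means any privileged operation not explicitly judged safe stays denied, which matches the conservative stance of the lists.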
 370 .PP
 371 .so implementation.ms
 372 .so mgt.ms
 373 .so future.ms
 374 .NH
 375 Conclusion
 376 .PP
 377 The jail facility provides FreeBSD with a conceptually simple security
 378 partitioning mechanism, allowing the delegation of administrative rights
 379 within virtual machine partitions.
 380 .PP
 381 The implementation relies on
 382 restricting access within the jail environment to a well-defined subset
 383 of the overall host environment.  This includes limiting interaction
 384 between processes, and to files, network resources, and privileged
 385 operations.  Administrative overhead is reduced through avoiding
 386 fine-grained access control mechanisms, and maintaining a consistent
 387 administrative interface across partitions and the host environment.
 388 .PP
 389 The jail facility has already seen widespread deployment, in particular as
 390 a vehicle for delivering ``virtual private server'' services.
 391 .PP
 392 The jail code is included in the base system as part of FreeBSD 4.0-RELEASE,
 393 and fully documented in the jail(2) and jail(8) man-pages.
 394 .bp
 395 .SH
 396 Notes & References
 397 .IP \s-2[BIBA]\s+2 .5i
 398 K. J. Biba, Integrity Considerations for Secure
 399 Computer Systems, USAF Electronic Systems Division, 1977
 400 .IP \s-2[CHROOT]\s+2 .5i
 401 Dr. Marshall Kirk McKusick, private communication:
 402 ``According to the SCCS logs, the chroot call was added by Bill Joy
 403 on March 18, 1982 approximately 1.5 years before 4.2BSD was released.
 404 That was well before we had ftp servers of any sort (ftp did not
 405 show up in the source tree until January 1983).  My best guess as
 406 to its purpose was to allow Bill to chroot into the /4.2BSD build
 407 directory and build a system using only the files, include files,
 408 etc contained in that tree.  That was the only use of chroot that
 409 I remember from the early days.''
 410 .IP \s-2[LOTTERY1]\s+2 .5i
 411 David Petrou and John Milford. Proportional-Share Scheduling:
 412 Implementation and Evaluation in a Widely-Deployed Operating System,
 413 December 1997.
 414 .nf
 415 \s-2\fChttp://www.cs.cmu.edu/~dpetrou/papers/freebsd_lottery_writeup98.ps\fP\s+2
 416 \s-2\fChttp://www.cs.cmu.edu/~dpetrou/code/freebsd_lottery_code.tar.gz\fP\s+2
 417 .IP \s-2[LOTTERY2]\s+2 .5i
 418 Carl A. Waldspurger and William E. Weihl. Lottery Scheduling: Flexible Proportional-Share Resource Management, Proceedings of the First Symposium on Operating Systems Design and Implementation (OSDI '94), pages 1-11, Monterey, California, November 1994.
 419 .nf
 420 \s-2\fChttp://www.research.digital.com/SRC/personal/caw/papers.html\fP\s+2
 421 .IP \s-2[POSIX1e]\s+2 .5i
 422 Draft Standard for Information Technology \(em
 423 Portable Operating System Interface (POSIX) \(em
 424 Part 1: System Application Program Interface (API) \(em Amendment:
 425 Protection, Audit and Control Interfaces [C Language]
 426 IEEE Std 1003.1e Draft 17, Casey Schaufler (editor).
 427 .IP \s-2[ROOT]\s+2 .5i
 428 Historically, other names have been used at times; Zilog, for instance,
 429 called the super-user account ``zeus''.
 430 .IP \s-2[UAS]\s+2 .5i
 431 One such niche product is the ``UAS'' system to maintain and audit
 432 RACF configurations on MVS systems.
 433 .nf
 434 \s-2\fChttp://www.entactinfo.com/products/uas/\fP\s+2
 435 .IP \s-2[UF]\s+2 .5i
 436 Quote from the User-Friendly cartoon by Illiad.
 437 .nf
 438 \s-2\fChttp://www.userfriendly.org/cartoons/archives/98nov/19981111.html\fP\s+2