contrib/apr/docs/APRDesign.html

   1 <HTML>
   2 <HEAD><TITLE>APR Design Document</TITLE></HEAD>
   3 <BODY>
   4 <h1>Design of APR</h1>
   5
   6 <p>The Apache Portable Run-time libraries have been designed to provide a common
   7 interface to low level routines across any platform.  The original goal of APR
   8 was to combine all code in Apache to one common code base.  This is not the
   9 correct approach, however, so the goal of APR has changed.  There are places
  10 where common code is not a good thing.  For example, how to map requests
  11 to either threads or processes should be platform specific.  APR's place
  12 is now to combine any code that can be safely combined without sacrificing
  13 performance.</p>
  14
  15 <p>To this end we have created a set of operations that are required for cross
  16 platform development.  There may be other types that are desired and those
  17 will be implemented in the future.</p>
  18
  19 <p>This document will discuss the structure of APR, and how best to contribute
  20 code to the effort.</p>
  21
  22 <h2>APR On Windows and Netware</h2>
  23
  24 <p>APR on Windows and Netware is different from APR on all other systems,
  25 because those platforms don't use autoconf. On Unix, apr_private.h (private to
  26 APR) and apr.h (public, used by applications that use APR) are generated by
  27 autoconf from acconfig.h and apr.h.in respectively. On Windows (and Netware),
  28 apr_private.h and apr.h are created from apr_private.hw (apr_private.hwn)
  29 and apr.hw (apr.hwn) respectively.</p>
  30
  31 <p> <strong>
  32         If you add code to acconfig.h or tests to configure.in or aclocal.m4,
  33         please give some thought to whether or not Windows and Netware need
  34         these additions as well.  A general rule of thumb is that if it is
  35         a feature macro, such as APR_HAS_THREADS, Windows and Netware need it.
  36         In other words, if the definition is going to be used in a public APR
  37         header file, such as apr_general.h, Windows and Netware need it.
  38
  39         The only time it is safe to add a macro or test without also adding
  40         the macro to apr*.hw[n] is if the macro tells APR how to build.  For
  41         example, a test for a header file does not need to be added to Windows.
  42 </strong></p>
  43
  44 <h2>APR Features</h2>
  45
  46 <p>One of the goals of APR is to provide a common set of features across all
  47 platforms.  This is an admirable goal, but it is not realistic.  We cannot
  48 expect to be able to implement ALL features on ALL platforms, so we are
  49 going to do the next best thing: provide a common interface to ALL APR
  50 features on MOST platforms.</p>
  51
  52 <p>APR developers should create FEATURE MACROS for any feature that is not
  53 available on ALL platforms.  This should be a simple definition which has
  54 the form:</p>
  55
  56 <code>APR_HAS_FEATURE</code>
  57
  58 <p>This macro should evaluate to true if APR has this feature on this platform.
  59 For example, Linux and Windows have mmap'ed files, and APR provides an
  60 interface for mmap'ing a file.  On both Linux and Windows, APR_HAS_MMAP
  61 should evaluate to one, and the apr_mmap_* functions should map files into
  62 memory and return the appropriate status codes.</p>
  63
  64 <p>If your OS of choice does not have mmap'ed files, APR_HAS_MMAP should
  65 evaluate to zero, and none of the apr_mmap_* functions should be defined.  The
  66 second step is a precaution that will allow us to break at compile time if a
  67 programmer tries to use unsupported functions.</p>
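<p>As a minimal sketch of how calling code might respect a feature macro, the
fragment below picks a strategy at compile time.  The helper
<code>choose_read_strategy</code> and its enum are hypothetical application
names, and the fallback definition of APR_HAS_MMAP is an assumption made only so
that the fragment compiles without apr.h.</p>

```c
/* Assumption: a real build gets APR_HAS_MMAP from apr.h.  The fallback
 * below exists only so this sketch compiles on its own. */
#ifndef APR_HAS_MMAP
#define APR_HAS_MMAP 0
#endif

/* Hypothetical application-side names, not part of APR. */
enum read_strategy { READ_WITH_MMAP, READ_WITH_BUFFER };

enum read_strategy choose_read_strategy(void)
{
#if APR_HAS_MMAP
    /* The mmap functions exist on this platform, so they may be called. */
    return READ_WITH_MMAP;
#else
    /* The mmap functions are not declared here; referencing them would
     * fail to compile, so fall back to ordinary buffered reads. */
    return READ_WITH_BUFFER;
#endif
}
```

<p>Because the test is made by the preprocessor, code in the disabled branch is
never seen by the compiler, which is what lets the missing functions stay
undefined.</p>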
  68
  69 <h2>APR types</h2>
  70
  71 <p>The base types in APR</p>
  72
  73 <ul>
  74 <li>dso<br>
  75         Shared library routines
  76 <li>mmap<br>
  77         Memory-mapped files
  78 <li>poll<br>
  79         Polling I/O
  80 <li>time<br>
  81         Time
  82 <li>user<br>
  83         Users and groups
  84 <li>locks<br>
  85         Process and thread locks (critical sections)
  86 <li>shmem<br>
  87         Shared memory
  88 <li>file_io<br>
  89         File I/O, including pipes
  90 <li>atomic<br>
  91         Atomic integer operations
  92 <li>strings<br>
  93         String handling routines
  94 <li>memory<br>
  95         Pool-based memory allocation
  96 <li>passwd<br>
  97         Reading passwords from the terminal
  98 <li>tables<br>
  99         Tables and hashes
 100 <li>network_io<br>
 101         Network I/O
 102 <li>threadproc<br>
 103         Threads and processes
 104 <li>misc<br>
 105         Any APR type which doesn't have any other place to belong.  This
 106         should be used sparingly.
 107 <li>support<br>
 108         Functions meant to be used across multiple APR types.  This area
 109         is for internal functions only.  If a function is exposed, it should
 110         not be put here.
 111 </ul>
 112
 113 <h2>Directory Structure</h2>
 114
 115 <p>Each type has a base directory.  Inside this base directory are
 116 subdirectories, which contain the actual code.  These subdirectories are named
 117 after the platforms they are compiled on.  Unix is also used as a common
 118 directory.  If the code you are writing is POSIX based, you should look at the
 119 code in the unix directory.  A good rule of thumb is that if more than half
 120 your code needs to be ifdef'ed out, and the structures required for your code
 121 are substantively different from the POSIX code, you should create a new
 122 directory.</p>
 123
 124 <p>Currently, the APR code is written for Unix, BeOS, Windows, and OS/2.  An
 125 example of the directory structure is the file I/O directory:</p>
 126
 127 <pre>
 128 apr
 129   |
 130    ->  file_io
 131           |
 132            -> unix            The Unix and common base code
 133           |
 134            -> win32           The Windows code
 135           |
 136            -> os2             The OS/2 code
 137 </pre>
 138
 139 <p>Note that BeOS does not have a directory.  This is because BeOS is currently
 140 using the Unix directory for its file_io.</p>
 141
 142 <p>There are a few special top level directories.  These are test and include.
 143 Test is a directory which stores all test programs.  It is expected
 144 that if a new type is developed, there will also be a new test program, to
 145 help people port this new type to different platforms.  A small document
 146 describing how to create new tests that integrate with the test suite can be
 147 found in the test/ directory.  Include is a directory which stores all
 148 required APR header files for external use.</p>
 149
 150 <h2>Creating an APR Type</h2>
 151
 152 <p>The current design of APR requires that most APR types be incomplete.
 153 It is not possible to write flexible portable code if programs can access
 154 the internals of APR types.  This is because different platforms are
 155 likely to define different native types.  There are only two exceptions to
 156 this rule:</p>
 157
 158 <ul>
 159 <li>The first exception to this rule is if the type can only reasonably be
 160 implemented one way.  For example, time is a complete type because there
 161 is only one reasonable time implementation.
 162
 163 <li>The second exception to the incomplete type rule can be found in
 164 apr_portable.h.  This file defines the native types for each platform.
 165 Using these types, it is possible to extract native types for any APR type.
 166 </ul>
 167
 168 <p>For this reason, each platform defines a structure in its own directory.
 169 Those structures are then typedef'ed in an external header file.  For example,
 170 in file_io/unix/fileio.h:</p>
 171
 172 <pre>
 173     struct apr_file_t {
 174         apr_pool_t *cntxt;
 175         int filedes;
 176         FILE *filehand;
 177         ...
 178     };
 179 </pre>
 180
 181 <p>In include/apr_file_io.h:</p>
 182     <pre>
 183     typedef struct apr_file_t    apr_file_t;
 184     </pre>
 185
 186 <p>This will cause a compiler error if somebody tries to access the filedes
 187 field in this structure.  Windows does not have a filedes field, so
 188 it is important that programs not be able to access it.</p>
 189
 190 <p>You may notice the apr_pool_t field.  Most APR types have this field.  This
 191 type is used to allocate memory within APR.  Because these types carry a pool,
 192 any APR function that operates on them can allocate memory if it needs to.  This
 193 is very important, and it is one of the reasons that APR works.  If you create a
 194 new type, you must add a pool to it.  If you do not, then all functions that
 195 operate on that type will need a pool argument.</p>
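<p>The incomplete-type pattern can be sketched in a single file as follows.
All <code>demo_*</code> names are hypothetical stand-ins rather than APR
functions, and <code>malloc</code> is used in place of a pool allocator, which
is an assumption made to keep the sketch self-contained.  In a real tree the
first part would live in a public header and the second in a platform
directory, and that split is what keeps callers away from the fields.</p>

```c
#include <stdlib.h>

/* ---- public header part (all demo_* names are hypothetical) ---- */
typedef struct demo_pool_t demo_pool_t;  /* stand-in for apr_pool_t */
typedef struct demo_file_t demo_file_t;  /* incomplete: fields are hidden */

int demo_file_open(demo_file_t **new_file, int fd, demo_pool_t *pool);
demo_pool_t *demo_file_pool_get(const demo_file_t *file);

/* ---- platform-private part, e.g. a source file under unix/ ---- */
struct demo_file_t {
    demo_pool_t *pool;   /* kept so later calls need no pool argument */
    int filedes;         /* platform detail that callers never see */
};

int demo_file_open(demo_file_t **new_file, int fd, demo_pool_t *pool)
{
    /* A real implementation would allocate from the pool; malloc keeps
     * this sketch free of any pool implementation. */
    demo_file_t *file = malloc(sizeof(*file));
    if (file == NULL) {
        return 1;        /* hypothetical non-zero failure code */
    }
    file->pool = pool;
    file->filedes = fd;
    *new_file = file;
    return 0;
}

demo_pool_t *demo_file_pool_get(const demo_file_t *file)
{
    return file->pool;
}
```

<p>A caller that sees only the first part can hold a
<code>demo_file_t *</code> and pass it around, but any attempt to write
<code>file-&gt;filedes</code> fails to compile because the structure is
incomplete at that point.</p>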
 196
 197 <h2>New Function</h2>
 198
 199 <p>When creating a new function, please try to adhere to these rules.</p>
 200
 201 <ul>
 202 <li>  Result arguments should be the first arguments.
 203 <li>  If a function needs a pool, it should be the last argument.
 204 <li>  These rules are flexible, especially if it makes the code easier
 205       to understand because it mimics a standard function.
 206 </ul>
 207
 208 <h2>Documentation</h2>
 209
 210 <p>Whenever a new function is added to APR, it MUST be documented.  New
 211 functions will not be committed unless there are docs to go along with them.
 212 The documentation should be a comment block above the function in the header
 213 file.</p>
 214
 215 <p>The format for the comment block is:</p>
 216
 217 <pre>
 218     /**
 219      * Brief description of the function
 220      * @param param_1_name explanation
 221      * @param param_2_name explanation
 222      * @param param_n_name explanation
 223      * @tip Any extra information people should know.
 224      * @deffunc function prototype if required
 225      */
 226 </pre>
 227
 228 <p>For an actual example, look at any file in the include directory.  The
 229 reason the docs are in the header files is to ensure that the docs always
 230 reflect the current code.  If you change parameters or return values for a
 231 function, please be sure to update the documentation.</p>
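<p>To make the comment format and the argument-order rules concrete, here is a
small sketch of a documented function.  The function
<code>demo_strnlen</code>, the type <code>demo_status_t</code>, and both status
values are hypothetical and are not part of APR.  The function takes no pool,
so only the rule that result arguments come first applies.</p>

```c
#include <stddef.h>

typedef int demo_status_t;        /* stand-in for apr_status_t */
#define DEMO_SUCCESS 0
#define DEMO_EBADARG 1            /* hypothetical error code */

/**
 * Measure a string without reading past a caller-supplied limit.
 * @param len Receives the number of bytes before the terminating NUL
 * @param str The string to measure
 * @param max The largest number of bytes to examine
 * @tip If no NUL is found within max bytes, len is set to max.
 */
demo_status_t demo_strnlen(size_t *len, const char *str, size_t max)
{
    size_t i;

    if (len == NULL || str == NULL) {
        return DEMO_EBADARG;
    }
    for (i = 0; i < max && str[i] != '\0'; i++) {
        /* scanning only; the loop condition does the work */
    }
    *len = i;
    return DEMO_SUCCESS;
}
```

<p>The result is delivered through the first argument so that the return value
stays free to carry the status code.</p>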
 232
 233 <h2>APR Error reporting</h2>
 234
 235 <p>Most APR functions should return an apr_status_t type.  The only time an
 236 APR function does not return an apr_status_t is if it absolutely CANNOT
 237 fail.  An example of this would be filling out an array when you know you are
 238 not beyond the array's range.  If it cannot fail on your platform, but it
 239 could conceivably fail on another platform, it should return an apr_status_t.
 240 Unless you are sure, return an apr_status_t.</p>
 241
 242 <strong>
 243         This includes functions that return TRUE/FALSE values.  How that
 244         is handled is discussed below.
 245 </strong>
 246
 247 <p>All platforms return errno values unchanged.  Each platform can also have
 248 one system error type, which can be returned after an offset is added.
 249 There are five types of error values in APR, each with its own offset.</p>
 250
 251 <!--  This should be turned into a table, but I am lazy today -->
 252 <pre>
 253     Name                        Purpose
 254 0)                      This is 0 for all platforms and isn't really defined
 255                         anywhere, but it is the offset for errno values.
 256                         (This has no name because it isn't actually defined,
 257                         but for completeness we are discussing it here).
 258
 259 1) APR_OS_START_ERROR   This is platform dependent, and is the offset at which
 260                         APR errors start to be defined.  Error values are
 261                         defined as anything which caused the APR function to
 262                         fail.  APR errors in this range should be named
 263                         APR_E* (e.g. APR_ENOSOCKET)
 264
 265 2) APR_OS_START_STATUS  This is platform dependent, and is the offset at which
 266                         APR status values start.  Status values do not indicate
 267                         success or failure, and should be returned if
 268                         APR_SUCCESS does not make sense.  APR status codes in
 269                         this range should be named APR_* (e.g. APR_DETACH)
 270
 271 3) APR_OS_START_USEERR  This is platform dependent, and is the offset at which
 272                         APR apps can begin to add their own error codes.
 273
 274 4) APR_OS_START_SYSERR  This is platform dependent, and is the offset at which
 275                         system error values begin.
 276 </pre>
 277
 278 <strong>The difference in naming between APR_OS_START_ERROR and
 279 APR_OS_START_STATUS mentioned above allows programmers to easily determine if
 280 the error code indicates an error condition or a status condition.</strong>
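<p>One way to picture the ranges is a function that reports which range a code
falls in.  This is only a sketch: the numeric offsets, the enum, and
<code>demo_classify</code> are all hypothetical, since the real offsets are
platform dependent and come from apr_errno.h.  It also assumes the offsets
increase in the order they are listed here and that codes are never
negative.</p>

```c
/* Hypothetical offsets chosen only for illustration. */
#define DEMO_OS_START_ERROR   20000
#define DEMO_OS_START_STATUS  70000
#define DEMO_OS_START_USEERR 120000
#define DEMO_OS_START_SYSERR 720000

enum demo_status_class {
    DEMO_CLASS_SUCCESS,      /* zero */
    DEMO_CLASS_ERRNO,        /* unchanged errno value */
    DEMO_CLASS_APR_ERROR,    /* APR_E* style error */
    DEMO_CLASS_APR_STATUS,   /* APR_* style status */
    DEMO_CLASS_APP_ERROR,    /* reserved for applications */
    DEMO_CLASS_SYSTEM_ERROR  /* native system error plus offset */
};

/* Assumes status is non-negative. */
enum demo_status_class demo_classify(int status)
{
    if (status == 0) {
        return DEMO_CLASS_SUCCESS;
    }
    if (status < DEMO_OS_START_ERROR) {
        return DEMO_CLASS_ERRNO;
    }
    if (status < DEMO_OS_START_STATUS) {
        return DEMO_CLASS_APR_ERROR;
    }
    if (status < DEMO_OS_START_USEERR) {
        return DEMO_CLASS_APR_STATUS;
    }
    if (status < DEMO_OS_START_SYSERR) {
        return DEMO_CLASS_APP_ERROR;
    }
    return DEMO_CLASS_SYSTEM_ERROR;
}
```

<p>With these made-up offsets, a native error code of 5 would be returned as
720005 and would classify as a system error, while a plain errno value of 5
would classify as an errno value.</p>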
 281
 282 <p>If your function has multiple return codes that all indicate success, but
 283 with different results, or if your function can only return PASS/FAIL, you
 284 should still return an apr_status_t.  In the first case, define one
 285 APR status code for each return value.  An example of this is
 286 <code>apr_proc_wait</code>, which can only return APR_CHILDDONE,
 287 APR_CHILDNOTDONE, or an error code.  In the second case, please return
 288 APR_SUCCESS for PASS, and define a new APR status code for failure.  An
 289 example of this is <code>apr_compare_users</code>, which can only return
 290 APR_SUCCESS, APR_EMISMATCH, or an error code.</p>
 291
 292 <p>All of these definitions can be found in apr_errno.h for all platforms.  When
 293 an error occurs in an APR function, the function must return an error code.
 294 If the error occurred in a system call and that system call uses errno to
 295 report an error, then the code is returned unchanged.  For example: </p>
 296
 297 <pre>
 298     if (open(fname, oflags, 0777) < 0)
 299         return errno;
 300 </pre>
 301
 302 <p>The next place an error can occur is a system call that uses some error value
 303 other than the primary error value on a platform.  This can also be handled
 304 by APR applications.  For example:</p>
 305
 306 <pre>
 307     if (CreateFile(fname, oflags, sharemod, NULL,
 308                    createflags, attributes, 0) == INVALID_HANDLE_VALUE)
 309         return (GetLastError() + APR_OS_START_SYSERR);
 310 </pre>
 311
 312 <p>These two examples implement the same function for two different platforms.
 313 Even if the underlying problem is the same on both platforms, this
 314 will result in two different error codes being returned.  This is OKAY, and
 315 is correct for APR.  APR relies on the fact that most of the time an error
 316 occurs, the program logs the error and continues; it does not try to
 317 programmatically solve the problem.  This does not mean we have not provided
 318 support for programmatically solving the problem; it just isn't the default
 319 case.  We'll get to how this problem is solved in a little while.</p>
 320
 321 <p>If the error occurs in an APR function and is not due to a system call,
 322 but is instead an APR error or just a status code from APR, then the
 323 appropriate code should be returned.  These codes are defined in apr_errno.h
 324 and should be self explanatory.</p>
 325
 326 <p>No APR code should ever return a code between APR_OS_START_USEERR and
 327 APR_OS_START_SYSERR; those codes are reserved for APR applications.</p>
 328
 329 <p>To programmatically correct an error in a running application, the error
 330 codes need to be consistent across platforms.  This should make sense.  APR
 331 has provided macros to test for status code equivalency.  For example, to
 332 determine if the code that you received from the APR function means EOF, you
 333 would use the macro APR_STATUS_IS_EOF().</p>
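<p>A status-test macro of this kind can be sketched as a comparison against
every platform-specific value that carries the same meaning.  The macro
<code>DEMO_STATUS_IS_ENOENT</code>, the offset, and the native code below are
hypothetical and are not taken from apr_errno.h.</p>

```c
#include <errno.h>

/* Hypothetical values, for illustration only. */
#define DEMO_OS_START_SYSERR       720000  /* offset added to native codes */
#define DEMO_NATIVE_FILE_NOT_FOUND 2       /* stand-in native "not found" code */

/* True when the status means "no such file", whether it arrived as an
 * unchanged errno value or as an offset native system error. */
#define DEMO_STATUS_IS_ENOENT(s) \
    ((s) == ENOENT || \
     (s) == DEMO_OS_START_SYSERR + DEMO_NATIVE_FILE_NOT_FOUND)
```

<p>The caller writes one test and gets the same answer on every platform, and
the original platform-specific code is still available for logging.</p>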
 334
 335 <p>Why did APR take this approach?  There are two ways to deal with error
 336 codes portably.</p>
 337
 338 <ol type=1>
 339 <li>  Return the same error code across all platforms.
 340 <li>  Return platform specific error codes and convert them when necessary.
 341 </ol>
 342
 343 <p>The problem with option number one is that it takes time to convert error
 344 codes to a common code, and most of the time programs want to just output
 345 an error string.  If we convert all errors to a common subset, we have four
 346 steps to output an error string:</p>
 347
 348 <pre>
 349     make syscall that fails
 350         convert to common error code                 step 1
 351         return common error code
 352             check for success
 353             call error output function               step 2
 354                 convert back to system error         step 3
 355                 output error string                  step 4
 356 </pre>
 357
 358 <p>The second problem with option 1 is that it is a lossy conversion.  For
 359 example, Windows and OS/2 have a couple hundred error codes, but POSIX errno
 360 only defines about 50 errno values.  This means that if we convert to a
 361 canonical error value immediately, there is no way for the programmer to
 362 get the actual system error.</p>
 363
 364 <p>By keeping the errors platform specific, we can output error strings in two
 365 steps.</p>
 366
 367 <pre>
 368     make syscall that fails
 369         return error code
 370             check for success
 371             call error output function               step 1
 372                 output error string                  step 2
 373 </pre>
 374
 375 <p>Less often, programs change their execution based on what error was returned.
 376 This is no more expensive using option 2 than it is using option 1, but we
 377 put the onus of converting the error code on the programmer themselves.
 378 For example, using option 1:</p>
 379
 380 <pre>
 381     make syscall that fails
 382         convert to common error code
 383         return common error code
 384             decide execution based on common error code
 385 </pre>
 386
 387 <p>Using option 2:</p>
 388
 389 <pre>
 390     make syscall that fails
 391         return error code
 392             convert to common error code (using ap_canonical_error)
 393             decide execution based on common error code
 394 </pre>
 395
 396 <p>Finally, there is one more operation on error codes.  You can get a string
 397 that explains in human readable form what has happened.  To do this using
 398 APR, call apr_strerror().</p>
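<p>The idea behind an error-string function can be sketched as a dispatch on
the range the code falls in.  This is not the APR implementation:
<code>demo_strerror</code>, the offset, and the single APR-style code below are
hypothetical, and the sketch handles only success, errno values, and one
made-up error.</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical values, for illustration only. */
#define DEMO_OS_START_ERROR 20000
#define DEMO_ENOSOCKET      (DEMO_OS_START_ERROR + 1)

/* Write a human readable description of status into buf and return buf. */
char *demo_strerror(int status, char *buf, size_t bufsize)
{
    const char *msg;

    if (status == 0) {
        msg = "success";
    }
    else if (status < DEMO_OS_START_ERROR) {
        msg = strerror(status);      /* errno values are passed through */
    }
    else if (status == DEMO_ENOSOCKET) {
        msg = "no socket was provided";
    }
    else {
        msg = "unrecognized status code";
    }
    snprintf(buf, bufsize, "%s", msg);   /* truncates safely if needed */
    return buf;
}
```

<p>Because the code was never converted to a common value, no conversion back
is needed before the message is looked up, which is the two-step path
described earlier.</p>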
 399