NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell. It is provided anyway
since it provides helpful information for how the shell is structured,
but be warned that things have changed -- the current shell is
still under development.

================================================================

Copyright 1989 by Kenneth Almquist.

DIRECTORIES: The subdirectory bltin contains commands which can
be compiled stand-alone. The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS: Files whose names begin with "mk" are
programs that generate source code. A complete list of these
programs is:

        program         input files         generates
        -------         -----------         ---------
        mkbuiltins      builtins.def        builtins.h builtins.c
        mknodes         nodetypes           nodes.h nodes.c
        mksyntax            -               syntax.h syntax.c

There are undoubtedly too many of these.

EXCEPTIONS: Code for dealing with exceptions appears in
exceptions.c. The C language doesn't include exception handling,
so I implement it using setjmp and longjmp. The global variable
exception contains the type of exception. EXERROR is raised by
calling error or errorwithstatus. EXINT is an interrupt.
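
The setjmp/longjmp technique can be sketched as follows. This is a
minimal illustration, not the code in ash: apart from EXINT, EXERROR
and the variable exception, every name here (jmploc, handler,
raise_exception, risky, try_risky) is an assumption made for the
example.

```c
#include <setjmp.h>
#include <stddef.h>

/* Only EXINT, EXERROR and "exception" come from the text; the rest is assumed. */
enum { EXINT = 0, EXERROR = 1 };

struct jmploc {
    jmp_buf loc;
};

static struct jmploc *handler;     /* innermost active handler, or NULL */
static volatile int exception;     /* type of the exception being raised */

/* Raise an exception: record its type and unwind to the handler. */
static void raise_exception(int type)
{
    exception = type;
    longjmp(handler->loc, 1);
}

/* Stand-in for a routine that may fail part way through its work. */
static void risky(int fail)
{
    if (fail)
        raise_exception(EXERROR);
}

/*
 * Run risky() under a handler.  Returns -1 on normal completion,
 * otherwise the type of the exception that was caught.
 */
static int try_risky(int fail)
{
    struct jmploc here;
    struct jmploc *volatile saved = handler;

    if (setjmp(here.loc) != 0) {
        handler = saved;           /* unwinding: restore the outer handler */
        return exception;
    }
    handler = &here;
    risky(fail);
    handler = saved;
    return -1;
}
```

Saving and restoring the previous handler is what lets handlers nest:
each routine that needs to clean up installs its own jump location and
puts the outer one back before returning.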

INTERRUPTS: In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop. (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.) The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections. Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery. INTOFF and INTON can be nested.
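
One way to get nestable critical sections is a depth counter plus a
pending flag. The sketch below shows that idea only; the macro bodies
and the names suppressint, intpending, delivered, deliver_interrupt
and on_interrupt are assumptions, not the definitions in exception.h.

```c
#include <signal.h>

/* All names and macro bodies here are assumed for illustration. */
static volatile sig_atomic_t suppressint;  /* nesting depth of INTOFF */
static volatile sig_atomic_t intpending;   /* interrupt arrived while held */
static int delivered;                      /* interrupts actually acted on */

static void deliver_interrupt(void)
{
    intpending = 0;
    delivered++;               /* a real shell would raise EXINT here */
}

#define INTOFF  (suppressint++)
#define INTON   do { \
        if (--suppressint == 0 && intpending) \
            deliver_interrupt(); \
    } while (0)

/* What the interrupt handler would do when the signal arrives. */
static void on_interrupt(void)
{
    if (suppressint > 0)
        intpending = 1;        /* hold for later delivery */
    else
        deliver_interrupt();
}
```

Only the outermost INTON delivers a held interrupt, which is what makes
nesting safe.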

MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left. It also defines a
stack-oriented memory allocation scheme. Allocating off a stack
is probably more efficient than allocating with malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer. The stack is implemented using a
linked list of blocks.
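
A stack allocator built from a linked list of blocks might look like
the sketch below. It is an illustration under assumptions: the names
(block, stackmark, stack_alloc, stack_mark, stack_restore), the block
size and the 16-byte rounding are all invented here, and abort stands
in for the call to error.

```c
#include <stdlib.h>
#include <stddef.h>

#define BLOCK_SIZE 512          /* assumed size of an ordinary block */

struct block {
    struct block *prev;         /* block that was current before this one */
    char *data;
    size_t size;
};

struct stackmark {              /* a saved "stack pointer" */
    struct block *blk;
    size_t used;
};

static struct block *top;       /* current block */
static size_t used;             /* bytes used in the current block */
static int live_blocks;         /* bookkeeping for the example only */

static void *stack_alloc(size_t n)
{
    void *p;

    n = (n + 15) & ~(size_t)15;             /* keep allocations aligned */
    if (top == NULL || top->size - used < n) {
        struct block *b = malloc(sizeof *b);
        size_t size = n > BLOCK_SIZE ? n : BLOCK_SIZE;

        if (b == NULL || (b->data = malloc(size)) == NULL)
            abort();                        /* ash would call error here */
        b->size = size;
        b->prev = top;
        top = b;
        used = 0;
        live_blocks++;
    }
    p = top->data + used;
    used += n;
    return p;
}

static struct stackmark stack_mark(void)
{
    struct stackmark m;

    m.blk = top;
    m.used = used;
    return m;
}

/* Free everything allocated since the mark was taken. */
static void stack_restore(struct stackmark m)
{
    while (top != m.blk) {
        struct block *b = top;

        top = b->prev;
        free(b->data);
        free(b);
        live_blocks--;
    }
    used = m.used;
}
```

An exception handler only needs the mark taken before the work began;
it does not have to know what was allocated in between.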

STPUTC: If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:

        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;

The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:

        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);
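
The "grow if you run off the end" check can be illustrated with a
plain heap buffer. This is a sketch of the idea rather than the
memalloc.h macros: PUT_CHAR, str_start, grow and build are
hypothetical names, and realloc stands in for growing the stack.

```c
#include <stdlib.h>
#include <string.h>

static char *buf;          /* start of the string being built */
static size_t bufsize;     /* bytes allocated for buf */

/* Make sure a buffer exists and return a write pointer to its start. */
static char *str_start(void)
{
    if (buf == NULL) {
        bufsize = 16;
        buf = malloc(bufsize);
        if (buf == NULL)
            abort();
    }
    return buf;
}

/* Double the buffer; return the write pointer relocated into the new one. */
static char *grow(char *p)
{
    size_t len = (size_t)(p - buf);
    size_t newsize = bufsize * 2;
    char *nbuf = realloc(buf, newsize);

    if (nbuf == NULL)
        abort();
    buf = nbuf;
    bufsize = newsize;
    return buf + len;
}

/* Store one character, growing the buffer first if it is full. */
#define PUT_CHAR(c, p) do { \
        if ((p) == buf + bufsize) \
            (p) = grow(p); \
        *(p)++ = (c); \
    } while (0)

/* Copy src into the buffer one character at a time. */
static const char *build(const char *src)
{
    char *p = str_start();

    while (*src != '\0')
        PUT_CHAR(*src++, p);
    PUT_CHAR('\0', p);
    return buf;
}
```

The caller keeps the write pointer in a local variable, so the common
case costs one comparison and one store per character.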

We now start a top-down look at the code:

MAIN.C: The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop. Cmdloop
repeatedly parses and executes commands.

OPTIONS.C: This file contains the option processing code. It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin. The -i and -m
options (the latter turns on job control) require changes in signal
handling. The routines setjobctl (in jobs.c) and setinteractive
(in trap.c) are called to handle changes to these options.

PARSING: The parser code is all in parser.c. A recursive descent
parser is used. Syntax tables (generated by mksyntax) are
used to classify characters during lexical analysis. There are
four tables: one for normal use, one for use when inside single
quotes and dollar single quotes, one for use when inside double
quotes and one for use in arithmetic. The tables are machine
dependent because they are indexed by character variables and
the range of a char varies from machine to machine.
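
The machine dependence comes from char being signed on some machines
and unsigned on others. A table can cope with both by offsetting the
index by CHAR_MIN, as in this sketch. The class names, SYNBASE and the
run-time initialization are assumptions; a generator such as mksyntax
would emit a static table instead.

```c
#include <limits.h>

/* Hypothetical character classes; CWORD is 0 so it is the default. */
enum { CWORD, CSPACE, CNL, CQUOTE };

#define SYNBASE     (-CHAR_MIN)                 /* index of character 0 */
#define TABLE_SIZE  (CHAR_MAX - CHAR_MIN + 1)

static unsigned char syntax[TABLE_SIZE];        /* zero-filled: all CWORD */

static void init_syntax(void)
{
    syntax[SYNBASE + ' ']  = CSPACE;
    syntax[SYNBASE + '\t'] = CSPACE;
    syntax[SYNBASE + '\n'] = CNL;
    syntax[SYNBASE + '\''] = CQUOTE;
    syntax[SYNBASE + '"']  = CQUOTE;
}

/* Valid for every char value, whether char is signed or unsigned. */
static int classify(char c)
{
    return syntax[SYNBASE + c];
}
```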

PARSE OUTPUT: The output of the parser consists of a tree of
nodes. The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents. An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance. It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word. The text consists of ordinary characters and a number of
special codes defined in parser.h. The special codes are:

        CTLVAR              Parameter expansion
        CTLENDVAR           End of parameter expansion
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLARI              Arithmetic expansion
        CTLENDARI           End of arithmetic expansion
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution. The possible types are:

        VSMINUS|VSNUL       ${var:-text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN|VSNUL      ${var:=text}
        VSTRIMLEFT          ${var#text}
        VSTRIMLEFTMAX       ${var##text}
        VSTRIMRIGHT         ${var%text}
        VSTRIMRIGHTMAX      ${var%%text}
        VSERROR             delayed error

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes and the VSLINENO flag if
LINENO is being expanded (the parameter name is the decimal line
number). The parameter's name comes next, terminated by an equals
sign. If the type is not VSNORMAL (including when it is VSLENGTH),
then the text field in the substitution follows, terminated by a
CTLENDVAR byte.

The type VSERROR is used to allow parsing bad substitutions like
${var[7]} and generate an error when they are expanded.

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

Arithmetic expansion starts with CTLARI and ends with CTLENDARI.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently. CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.
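
The escaping scheme can be sketched as a pair of routines, one that
inserts CTLESC and one that strips it again. The byte values chosen
for CTLESC and CTLVAR and the names needs_escape, add_escapes and
rm_escapes are assumptions for the example; only two of the CTL codes
are checked, to keep it short.

```c
#include <stddef.h>
#include <string.h>

#define CTLESC 0x81     /* assumed value */
#define CTLVAR 0x82     /* assumed value */

/* Does this input byte need a CTLESC in front of it? */
static int needs_escape(unsigned char c, int quoted)
{
    if (c == CTLESC || c == CTLVAR)
        return 1;       /* looks like a control code: pass it through */
    return quoted && (c == '*' || c == '?' || c == '[' || c == '!');
}

/* dst must have room for 2 * strlen(src) + 1 bytes. */
static size_t add_escapes(const char *src, char *dst, int quoted)
{
    char *d = dst;

    for (; *src != '\0'; src++) {
        if (needs_escape((unsigned char)*src, quoted))
            *d++ = (char)CTLESC;
        *d++ = *src;
    }
    *d = '\0';
    return (size_t)(d - dst);
}

/* Remove the CTLESC bytes in place, keeping the characters they protect. */
static void rm_escapes(char *s)
{
    char *d = s;

    while (*s != '\0') {
        if ((unsigned char)*s == CTLESC && s[1] != '\0')
            s++;
        *d++ = *s++;
    }
    *d = '\0';
}
```

With quoted set, "a*b" becomes 'a', CTLESC, '*', 'b', so a later file
name generation pass can tell that the '*' is literal.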

CTLESC characters have proved to be particularly tricky to get
right. In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing). Other here
documents, and words which are not subject to file name generation,
have the CTLESC characters removed during the variable and command
substitution phase. Words which are subject to file name
generation have the CTLESC characters removed as part of the file
name generation phase.

EXECUTION: Command execution is handled by the following files:

        eval.c      The top level routines.
        redir.c     Code to handle redirection of input and output.
        jobs.c      Code to handle forking, waiting, and job control.
        exec.c      Code to do path searches and the actual exec sys call.
        expand.c    Code to evaluate arguments.
        var.c       Maintains the variable symbol table. Called from expand.c.

EVAL.C: Evaltree recursively executes a parse tree. The exit
status is returned in the global variable exitstatus. The
alternative entry evalbackcmd is called to evaluate commands in back
quotes. It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command and
connects the standard output of the child to a pipe.

JOBS.C: To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process. Waitforjob waits for a job
to complete. These routines take care of process groups if job
control is defined.

REDIR.C: Ash allows file descriptors to be redirected and then
restored without forking off a child process. This is
accomplished by duplicating the original file descriptors. The
redirtab structure records where the file descriptors have been
duplicated to.
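
Saving a descriptor by duplicating it, and restoring it afterwards,
can be sketched with fcntl and dup2. The struct saved_fd and the
routines redirect_fd and restore_fd are hypothetical, and keeping the
copy at descriptor 10 or above is an assumption made so that the copy
stays clear of the low-numbered descriptors scripts normally use.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical record of one saved descriptor. */
struct saved_fd {
    int fd;         /* the descriptor that was redirected */
    int copy;       /* where the original was duplicated to */
};

/* Make fd refer to whatever target refers to, remembering the original. */
static int redirect_fd(int fd, int target, struct saved_fd *save)
{
    save->fd = fd;
    save->copy = fcntl(fd, F_DUPFD, 10);    /* copy at 10 or above */
    if (save->copy < 0)
        return -1;
    if (dup2(target, fd) < 0) {
        close(save->copy);
        return -1;
    }
    return 0;
}

/* Put the original descriptor back and discard the copy. */
static void restore_fd(const struct saved_fd *save)
{
    dup2(save->copy, save->fd);
    close(save->copy);
}
```

Because nothing is forked, a builtin run with a redirection can change
shell state and still have its descriptors put back afterwards.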

EXEC.C: The routine find_command locates a command, and enters
the command in the hash table if it is not already there. The
third argument specifies whether it is to print an error message
if the command is not found. (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the hash
table of the parent process. But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C: As the routine argstr generates words by parameter
expansion, command substitution and arithmetic expansion, it
performs word splitting on the result. As each word is output,
the routine expandmeta performs file name generation (if enabled).

VAR.C: Variables are stored in a hash table. Probably we should
switch to extensible hashing. The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command. Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.
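
Storing each variable as a single "name=value" string means that an
environment vector is just an array of pointers into the table. The
sketch below illustrates this; the table size, the hash function and
the names var, vartab, setvar_text, lookup and environment are
assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39                 /* assumed table size */

struct var {
    struct var *next;
    char *text;                     /* "name=value" */
};

static struct var *vartab[VTABSIZE];

static size_t namelen(const char *s)
{
    return strcspn(s, "=");
}

static unsigned hash(const char *name, size_t len)
{
    unsigned h = 0;

    while (len-- > 0)
        h = h * 31 + (unsigned char)*name++;
    return h % VTABSIZE;
}

/* text must have the form "name=value" and must outlive the entry. */
static void setvar_text(char *text)
{
    size_t len = namelen(text);
    struct var **bucket = &vartab[hash(text, len)];
    struct var *v;

    for (v = *bucket; v != NULL; v = v->next) {
        if (namelen(v->text) == len && memcmp(v->text, text, len) == 0) {
            v->text = text;         /* replace the old definition */
            return;
        }
    }
    v = malloc(sizeof *v);
    if (v == NULL)
        abort();
    v->text = text;
    v->next = *bucket;
    *bucket = v;
}

/* Return the value part, or NULL if the variable is not set. */
static const char *lookup(const char *name)
{
    size_t len = strlen(name);
    struct var *v;

    for (v = vartab[hash(name, len)]; v != NULL; v = v->next)
        if (namelen(v->text) == len && memcmp(v->text, name, len) == 0)
            return v->text + len + 1;
    return NULL;
}

/* Build an envp-style vector; only pointers are copied, never strings. */
static char **environment(void)
{
    size_t n = 0, i;
    struct var *v;
    char **env, **ep;

    for (i = 0; i < VTABSIZE; i++)
        for (v = vartab[i]; v != NULL; v = v->next)
            n++;
    ep = env = malloc((n + 1) * sizeof *env);
    if (env == NULL)
        abort();
    for (i = 0; i < VTABSIZE; i++)
        for (v = vartab[i]; v != NULL; v = v->next)
            *ep++ = v->text;
    *ep = NULL;
    return env;
}
```

Setting a name a second time replaces the table entry, which is also
what strips duplicates when temporary assignments are added.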

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.

BUILTIN COMMANDS: The procedures for handling these are
scattered throughout the code, depending on which location appears
most appropriate. They can be recognized because their names
always end in "cmd". The mapping from names to procedures is
specified in the file builtins.def, which is processed by the
mkbuiltins program.

A builtin command is invoked with argc and argv set up like a
normal program. A builtin command is allowed to overwrite its
arguments. Builtin routines can call nextopt to do option
parsing. This is kind of like getopt, but you don't pass argc and
argv to it. Builtin routines can also call error. This routine
normally terminates the shell (or returns to the main command
loop if the shell is interactive), but when called from a
non-special builtin command it causes the builtin command to
terminate with an exit status of 2.
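
The mapping from names to procedures, and the argc/argv calling
convention, can be sketched with a small table. The three example
builtins, the table layout and run_builtin are assumptions for
illustration, as is the use of 127 for "not a builtin"; the generated
builtins.c may look quite different.

```c
#include <stddef.h>
#include <string.h>

typedef int (*builtin_fn)(int argc, char **argv);

/* Example builtins; the names end in "cmd" as the text describes. */
static int truecmd(int argc, char **argv)
{
    (void)argc; (void)argv;
    return 0;
}

static int falsecmd(int argc, char **argv)
{
    (void)argc; (void)argv;
    return 1;
}

/* Returns the number of operands, to show argc is set up as for main. */
static int countcmd(int argc, char **argv)
{
    (void)argv;
    return argc - 1;
}

static const struct {
    const char *name;
    builtin_fn fn;
} builtintab[] = {
    { "true",  truecmd },
    { "false", falsecmd },
    { "count", countcmd },
};

/* Look up argv[0] and call the procedure with a normal argc and argv. */
static int run_builtin(char **argv)
{
    int argc = 0;
    size_t i;

    while (argv[argc] != NULL)
        argc++;
    if (argc == 0)
        return 127;
    for (i = 0; i < sizeof builtintab / sizeof builtintab[0]; i++)
        if (strcmp(argv[0], builtintab[i].name) == 0)
            return builtintab[i].fn(argc, argv);
    return 127;                     /* not a builtin */
}
```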

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons. The header file bltin.h takes care of most of the
differences between the ash and the stand-alone environment.
The user should call the main routine "main", and #define main to
be the name of the routine to use when the program is linked into
ash. This #define should appear before bltin.h is included;
bltin.h will #undef main if the program is to be compiled
stand-alone. A similar approach is used for a few utilities from

CD.C: This file defines the cd and pwd builtins.

SIGNALS: Trap.c implements the trap command. The routine
setsignal figures out what action should be taken when a signal is
received and invokes the signal system call to set the signal
action appropriately. When a signal that a user has set a trap for
is caught, the routine "onsig" sets a flag. The routine dotrap
is called at appropriate points to actually handle the signal.
When an interrupt is caught and no trap has been set for that
signal, the routine "onint" in error.c is called.
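
The split between a handler that only sets a flag and a routine that
acts on it later can be sketched like this. The names onsig and dotrap
follow the text, but their bodies, MAXSIG, gotsig, pendingsigs and
trap_runs are assumptions; counting in trap_runs stands in for running
the user's trap command.

```c
#include <signal.h>

#define MAXSIG 64                       /* assumed upper bound */

static volatile sig_atomic_t gotsig[MAXSIG];    /* per-signal flags */
static volatile sig_atomic_t pendingsigs;       /* any flag set? */
static int trap_runs[MAXSIG];           /* stand-in for the trap action */

/* Signal handler: do nothing except record that the signal arrived. */
static void onsig(int signo)
{
    if (signo > 0 && signo < MAXSIG) {
        gotsig[signo] = 1;
        pendingsigs = 1;
    }
}

/* Called at safe points: act on every signal recorded since last time. */
static void dotrap(void)
{
    int i;

    pendingsigs = 0;
    for (i = 1; i < MAXSIG; i++) {
        if (gotsig[i]) {
            gotsig[i] = 0;
            trap_runs[i]++;
        }
    }
}
```

Deferring the work keeps the handler itself trivial, so the trap
action never runs in the middle of some other operation.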

OUTPUT: Ash uses its own output routines. There are three
output structures allocated. "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory. This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating system.
The variables out1 and out2 normally point to output and errout,
respectively, but they are set to point to memout when
appropriate inside backquotes.
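
The idea of redirecting a builtin's output into memory by swapping a
pointer can be sketched as follows. The names output, memout and out1
come from the text; the structure fields, the MEM_OUT marker and the
routines flushout, outstr and echo_builtin are assumptions.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MEM_OUT (-1)    /* hypothetical marker: keep the output in memory */

struct output {
    char *buf;
    size_t len;         /* bytes currently buffered */
    size_t size;        /* bytes allocated */
    int fd;             /* descriptor to write to, or MEM_OUT */
};

static struct output output = { NULL, 0, 0, 1 };
static struct output memout = { NULL, 0, 0, MEM_OUT };
static struct output *out1 = &output;

/* Write buffered bytes to the descriptor; a memory buffer is left alone. */
static void flushout(struct output *o)
{
    size_t done = 0;

    if (o->fd == MEM_OUT)
        return;
    while (done < o->len) {
        ssize_t n = write(o->fd, o->buf + done, o->len - done);

        if (n <= 0)
            break;
        done += (size_t)n;
    }
    o->len = 0;
}

static void outstr(const char *s, struct output *o)
{
    size_t n = strlen(s);

    if (n == 0)
        return;
    if (o->len + n > o->size) {
        flushout(o);                    /* no effect on a memory buffer */
        if (o->len + n > o->size) {
            size_t newsize = (o->len + n) * 2;
            char *nb = realloc(o->buf, newsize);

            if (nb == NULL)
                abort();
            o->buf = nb;
            o->size = newsize;
        }
    }
    memcpy(o->buf + o->len, s, n);
    o->len += n;
}

/* A builtin writes through out1 and does not care where it points. */
static void echo_builtin(const char *arg)
{
    outstr(arg, out1);
    outstr("\n", out1);
}
```

To collect the output of a builtin, the caller points out1 at memout,
runs the builtin, and then points out1 back at output.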

INPUT: The basic input routine is pgetc, which reads from the
current input file. There is a stack of input files; the current
input file is the top file on this stack. The code allows the
input to come from a string rather than a file. (This is for the
-c option and the "." and eval builtin commands.) The global
variable plinno is saved and restored when files are pushed and
popped from the stack. The parser routines store the number of
the current line in this variable.
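
An input stack that saves and restores plinno can be sketched with
string inputs only. The names pgetc and plinno come from the text;
PEOF, MAXDEPTH, the input structure, push_string and pop_input are
assumptions, and for brevity pgetc counts lines itself here instead of
leaving that to the parser.

```c
#include <stddef.h>

#define PEOF     (-1)       /* assumed end-of-input value */
#define MAXDEPTH 16         /* assumed limit on nesting */

struct input {
    const char *next;       /* next character to read */
    int saved_plinno;       /* line number of the input underneath */
};

static struct input stack[MAXDEPTH];
static int depth;           /* number of inputs on the stack */
static int plinno = 1;      /* current line number */

/* Start reading from a string, as for -c, "." or eval. */
static int push_string(const char *s)
{
    if (depth == MAXDEPTH)
        return -1;
    stack[depth].next = s;
    stack[depth].saved_plinno = plinno;
    depth++;
    plinno = 1;
    return 0;
}

/* Go back to the previous input and its line number. */
static void pop_input(void)
{
    if (depth > 0) {
        depth--;
        plinno = stack[depth].saved_plinno;
    }
}

/* Read one character from the input on top of the stack. */
static int pgetc(void)
{
    struct input *in;

    if (depth == 0)
        return PEOF;
    in = &stack[depth - 1];
    if (*in->next == '\0')
        return PEOF;
    if (*in->next == '\n')
        plinno++;
    return (unsigned char)*in->next++;
}
```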

DEBUGGING: If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace. Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses. Example:
"TRACE(("n=%d\n", n))". The double parentheses are necessary
because the preprocessor can't handle functions with a variable
number of arguments. Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal. The tracing