1 .\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved.
3 .\" Redistribution and use in source and binary forms, with or without
4 .\" modification, are permitted provided that the following conditions are
7 .\" Redistributions of source code and documentation must retain the above
8 .\" copyright notice, this list of conditions and the following
11 .\" Redistributions in binary form must reproduce the above copyright
12 .\" notice, this list of conditions and the following disclaimer in the
13 .\" documentation and/or other materials provided with the distribution.
15 .\" All advertising materials mentioning features or use of this software
16 .\" must display the following acknowledgement:
18 .\" This product includes software developed or owned by Caldera
19 .\" International, Inc. Neither the name of Caldera International, Inc.
20 .\" nor the names of other contributors may be used to endorse or promote
21 .\" products derived from this software without specific prior written
24 .\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA
25 .\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR
26 .\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
27 .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
28 .\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE
29 .\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR
30 .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
31 .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
32 .\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
33 .\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
34 .\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
35 .\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
37 .\" @(#)ss4 8.1 (Berkeley) 6/8/93
41 4: How the Parser Works
43 Yacc turns the specification file into a C program, which
44 parses the input according to the specification given.
45 The algorithm used to go from the
46 specification to the parser is complex, and will not be discussed
48 the references for more information).
49 The parser itself, however, is relatively simple,
50 and understanding how it works, while
51 not strictly necessary, will nevertheless make
52 treatment of error recovery and ambiguities much more
55 The parser produced by Yacc consists
56 of a finite state machine with a stack.
57 The parser is also capable of reading and remembering the next
58 input token (called the
63 is always the one on the top of the stack.
64 The states of the finite state machine are given
65 small integer labels; initially, the machine is in state 0,
66 the stack contains only state 0, and no lookahead token has been read.
68 The machine has only four actions available to it, called
74 A move of the parser is done as follows:
76 Based on its current state, the parser decides
77 whether it needs a lookahead token to decide
78 what action should be done; if it needs one, and does
79 not have one, it calls
81 to obtain the next token.
83 Using the current state, and the lookahead token
84 if needed, the parser decides on its next action, and
86 This may result in states being pushed onto the stack, or popped off of
87 the stack, and in the lookahead token being processed
92 action is the most common action the parser takes.
93 Whenever a shift action is taken, there is always
95 For example, in state 56 there may be
100 which says, in state 56, if the lookahead token is IF,
101 the current state (56) is pushed down on the stack,
102 and state 34 becomes the current state (on the
104 The lookahead token is cleared.
108 action keeps the stack from growing without
110 Reduce actions are appropriate when the parser has seen
111 the right hand side of a grammar rule,
112 and is prepared to announce that it has seen
113 an instance of the rule, replacing the right hand side
114 by the left hand side.
115 It may be necessary to consult the lookahead token
116 to decide whether to reduce, but
117 usually it is not; in fact, the default
118 action (represented by a ``.'') is often a reduce action.
120 Reduce actions are associated with individual grammar rules.
121 Grammar rules are also given small integer
122 numbers, leading to some confusion.
137 Suppose the rule being reduced is
141 The reduce action depends on the
142 left hand symbol (A in this case), and the number of
143 symbols on the right hand side (three in this case).
144 To reduce, first pop off the top three states
146 (In general, the number of states popped equals the number of symbols on the
147 right side of the rule).
148 In effect, these states were the ones
149 put on the stack while recognizing
154 and no longer serve any useful purpose.
155 After popping these states, a state is uncovered
156 which was the state the parser was in before beginning to
158 Using this uncovered state, and the symbol
159 on the left side of the rule, perform what is in
161 A new state is obtained, pushed onto the stack, and parsing continues.
162 There are significant differences between the processing of
163 the left hand symbol and an ordinary shift of a token,
164 however, so this action is called a
167 In particular, the lookahead token is cleared by a shift, and
168 is not affected by a goto.
169 In any case, the uncovered state contains an entry such as:
173 causing state 20 to be pushed
174 onto the stack, and become the current state.
176 In effect, the reduce action ``turns back the clock'' in the parse,
177 popping the states off the stack to go back to the
178 state where the right hand side of the rule was first seen.
179 The parser then behaves as if it had seen the left side at that time.
180 If the right hand side of the rule is empty,
181 no states are popped off of the stack: the uncovered state
182 is in fact the current state.
184 The reduce action is also important in the treatment of user-supplied
186 When a rule is reduced, the code supplied with the rule is executed
187 before the stack is adjusted.
188 In addition to the stack holding the states, another stack,
189 running in parallel with it, holds the values returned
190 from the lexical analyzer and the actions.
191 When a shift takes place, the external variable
193 is copied onto the value stack.
194 After the return from the user code, the reduction is carried out.
197 action is done, the external variable
199 is copied onto the value stack.
200 The pseudo-variables $1, $2, etc., refer to the value stack.
202 The other two parser actions are conceptually much simpler.
205 action indicates that the entire input has been seen and
206 that it matches the specification.
207 This action appears only when the lookahead token is
208 the endmarker, and indicates that the parser has successfully
212 action, on the other hand, represents a place where the parser
213 can no longer continue parsing according to the specification.
214 The input tokens it has seen, together with the lookahead token,
215 cannot be followed by anything that would result
217 The parser reports an error, and attempts to recover the situation and
218 resume parsing: the error recovery (as opposed to the detection of error)
219 will be covered in Section 7.
221 It is time for an example!
222 Consider the specification
224 %token DING DONG DELL
234 When Yacc is invoked with the
236 option, a file called
238 is produced, with a human-readable description of the parser.
241 file corresponding to the above grammar (with some statistics
242 stripped off the end) is:
245 $accept : \_rhyme $end
254 $accept : rhyme\_$end
274 rhyme : sound place\_ (1)
284 sound : DING DONG\_ (2)
288 Notice that, in addition to the actions for each state, there is a
289 description of the parsing rules being processed in each
290 state. The \_ character is used to indicate
291 what has been seen, and what is yet to come, in each rule.
296 It is instructive to follow the steps of the parser while
297 processing this input.
299 Initially, the current state is state 0.
300 The parser needs to refer to the input in order to decide
301 between the actions available in state 0, so
304 is read, becoming the lookahead token.
305 The action in state 0 on
308 is ``shift 3'', so state 3 is pushed onto the stack,
309 and the lookahead token is cleared.
310 State 3 becomes the current state.
313 is read, becoming the lookahead token.
314 The action in state 3 on the token
317 so state 6 is pushed onto the stack, and the lookahead is cleared.
318 The stack now contains 0, 3, and 6.
319 In state 6, without even consulting the lookahead,
320 the parser reduces by rule 2.
324 This rule has two symbols on the right hand side, so
325 two states, 6 and 3, are popped off of the stack, uncovering state 0.
326 Consulting the description of state 0, looking for a goto on
331 is obtained; thus state 2 is pushed onto the stack,
332 becoming the current state.
334 In state 2, the next token,
337 The action is ``shift 5'', so state 5 is pushed onto the stack, which now has
338 0, 2, and 5 on it, and the lookahead token is cleared.
339 In state 5, the only action is to reduce by rule 3.
340 This has one symbol on the right hand side, so one state, 5,
341 is popped off, and state 2 is uncovered.
342 The goto in state 2 on
344 the left side of rule 3,
346 Now, the stack contains 0, 2, and 4.
347 In state 4, the only action is to reduce by rule 1.
348 There are two symbols on the right, so the top two states are popped off,
349 uncovering state 0 again.
350 In state 0, there is a goto on
352 causing the parser to enter state 1.
353 In state 1, the input is read; the endmarker is obtained,
354 indicated by ``$end'' in the
357 The action in state 1 when the endmarker is seen is to accept,
358 successfully ending the parse.
360 The reader is urged to consider how the parser works
361 when confronted with such incorrect strings as
362 .I "DING DONG DONG" ,
364 .I "DING DONG DELL DELL" ,
366 A few minutes spend with this and other simple examples will
367 probably be repaid when problems arise in more complicated contexts.