Yacc - A parser generator

Why Yacc?

It is possible to create a simple parser using Lex alone, by making extensive use of the user-defined states (i.e. start-conditions). However, such a parser quickly becomes unmaintainable, as the number of user-defined states tends to explode.

Once our input file syntax contains complex structures, such as "balanced" brackets, or contains elements which are context-sensitive, we should be considering yacc.

"Context-sensitive" in this case means that a word or symbol can have different interpretations, depending on where it appears in the input language. For example in C, the '*' character is used for both multiplication, and to specify indirection (ie to dereference a pointer to a piece of memory). It's meaning is "context-sensitive".

The Yacc Specification

Like lex, yacc has its own specification language. A yacc specification is structured along the same lines as a Lex specification.

   %{
   /* C declarations and includes */
   %}
   /* Yacc token and type declarations */
   %%
   /* Yacc specification, in the form of grammar rules like this: */
   symbol    :    symbols tokens
                      { $$ = my_c_code($1); }
             ;
   %%
   /* C language program (the rest) */

The Yacc Specification rules are the place where you "glue" the various tokens together that lex has conveniently provided to you.
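For instance, a minimal but complete specification in this layout might look like this (the token names are just placeholders for now):

	%{
	#include <stdio.h>
	int yylex(void);
	int yyerror(char *err);
	%}
	%token LABEL EXEC
	%%
	menu_item	:	LABEL EXEC
				{ printf("found one menu item\n"); }
		;
	%%
	/* main(), yyerror() and the lexer (or a #include of it) go here */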

Each grammar rule defines a symbol in terms of a sequence of other symbols and tokens.

Each rule can have an associated action, which is executed after all the component symbols of the rule have been parsed. Actions are basically C-program statements surrounded by curly braces.

A Yacc Example - olvwm2dtwmrc menu converter

Instead of going into a detailed discussion of the yacc syntax, I'll introduce the concepts one-by-one, by building an example program.

Olvwm Menu Syntax

Our example program will read an olvwm menu file, with the intent of afterwards writing out an equivalent menu, for a different window manager.

Let's review a typical openwin-menu file:

# comments
Workspace		TITLE
Editors		MENU	$OPENWINHOME/lib/openwin-menu-e
Properties		PROPERTIES
Netscape		exec netscape
</usr/X11R6/icons/mini.netscape.xpm> exec netscape
"Long command"		exec netscape \
"Meditation"    MENU
        "Coffee"        TITLE
        "drift"         exec xlock -remote -nolock -mode drift
        "flame"         exec xlock -remote -nolock -mode flame
"Meditation"    END PIN
"Optional Menu"	INCLUDE maybe-existant-menu
"Exit"		EXIT

Most entries are single-line entries, beginning with a label or icon filename. We also have various keywords to take into account, plus the more complex structure of a sub-menu.

The Lexer

The lexer will provide us with a stream of tokens: LABEL, EXEC, INT, plus one token for each keyword (TITLE, MENU, DEFAULT, END, and so on).

The attached file, olmenu.l contains a suitable Lex specification.

The key aspects of the lexer are:

The EXEC token is constructed using the exclusive start-condition CMD together with yymore(). These rules allow the command to be extended across multiple lines, if the line(s) end in a backslash. Using yymore() in this fashion ensures that our arbitrarily long command-string does not override the LABEL or keyword tokens by virtue of lex's "longest match" rule.

The last newline after the EXEC token is not appended to the command string, but returned separately. Note that the rules
    <CMD>\\\n  { ... yymore(); }
    <CMD>.$    { ... return(EXEC); }
are treated as being the same length, so it is important that they appear in the correct order.
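A sketch of how these rules might fit together (the definitive version is in olmenu.l; this fragment is purely illustrative):

	%x CMD
	%%
	exec		{ BEGIN(CMD); yymore(); }
	<CMD>\\\n	{ yymore(); /* line ends in backslash: keep accumulating */ }
	<CMD>.$		{ BEGIN(INITIAL); return(EXEC); /* the '\n' stays in the input */ }
	<CMD>.		{ yymore(); /* accumulate the command text */ }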

Please ignore the variables yylval and yylloc for now. Their meaning will only become clear after we've started looking at yacc in detail.

Likewise, the lex rules associated with the start-conditions ENV, ENV1, ENV2 are not part of the basic scanner, and will be covered later.

At this point, it's probably a good idea to test the lexer "stand-alone". However, there's still one thing missing: tokens.


We have included a lot of return(XXX) statements, but have not defined the tokens in the brackets. A token is simply a unique integer which yacc associates with an item that the lexer has found. Within yacc, all tokens must be declared in the "Yacc Declarations" section of our yacc specification, like this:
%token	TITLE
%token	MENU

Yacc will choose suitable integer values (>=256) for lex to use (see also: "Token Types"). Bison will also let you choose a value manually, like this:

%token	END 999

Normally, we let the parser choose the values, and that's one less thing to worry about.

Part of the lex-yacc integration is that yacc will generate a suitable set of token-definitions for lex to use. Yacc does so by generating a file y.tab.h (bison generates basename.tab.h), which contains the token-definitions, and looks like this:

#define TITLE   258
#define MENU    259

This also keeps the lex-yacc communications automatically in step with each other.
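On the lex side, all that is needed to pick up these definitions is to include the generated header in the C-declarations section of the lex specification:

	%{
	#include "y.tab.h"	/* token definitions generated by yacc */
	%}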

We don't need the whole yacc specification to do this. We can get by with something like the file olmenu_tokens.y

Testing the Lexer

Now we are ready to test our lexer, stand-alone, like this:

	bison -d olmenu_tokens.y
	mv olmenu_tokens.tab.h olmenu.tab.h
	flex -d -t olmenu.l > olmenu_lex_test.c
	cc -g -DTEST_LEX olmenu_lex_test.c  -o olmenu_lex_test -ll
	./olmenu_lex_test < /usr/X11R6/lib/openwin-menu |& more

The -d option to lex turns on the "debug" mode. This writes a message to the stderr output for each rule which has been matched.
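The output looks something like this (the exact wording varies between lex and flex versions; the line numbers refer to the rules in the .l file):

	--accepting rule at line 24 ("Workspace")
	--accepting rule at line 57 ("TITLE")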

We are specifically looking for tokens which are returned at the wrong point, and rules which match input that they shouldn't.

See the section "Lexical" in the Bison documentation for more information.


Yacc Grammar Rules

Simple Rule

Yacc rules define what is a legal sequence of tokens in our specification language. In our case, let's look at the rule for a simple, executable menu-command:

menu_item	:	LABEL  EXEC

This rule defines a non-terminal symbol, menu_item in terms of the two tokens LABEL and EXEC. Tokens are also known as "terminal symbols", because the parser does not need to expand them any further. Conversely, menu_item is a "non-terminal symbol" because it can be expanded into LABEL and EXEC.

You may notice that I'm using UPPER CASE for terminal symbols (tokens), and lower-case for non-terminal symbols. This is not a strict requirement of yacc, but just a convention that has been established. We will follow this convention throughout our discussion.

Alternate Rules

We've just hit our first complication: Any given menu-item may also have the keyword DEFAULT appear between the label and the executable command. Yacc allows us to have multiple alternate definitions of menu_item, like this:

menu_item	:	LABEL  EXEC
		|	LABEL  DEFAULT  EXEC
		;

Note that the colon (:), semi-colon (;) and or-symbol (|) are part of the yacc syntax - they are not part of our menu-file definition. All yacc rules follow the basic syntax shown above, and must end in a semi-colon. We've put the semi-colon on the next line for clarity, so that it does not get confused with our syntax-definitions. This is not a strict requirement either, but another convention of style that we will adhere to.

Note also that the word DEFAULT appears literally, not because it is a keyword in our input-language, but because we have defined a %token called DEFAULT, and the lexer returns this token when it finds a certain piece of text.

Literal Characters in a Rule

There is a way to include literal text within a rule, but it requires that the lexer pass the characters to the parser one-by-one, as tokens.

For example, remember that menu-items may have an icon-file instead of a label, like this:

</usr/X11R6/icons/mini.netscape.xpm> exec netscape

When our lexer encounters a < or > it returns the character itself as a token.
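On the lex side, the rule for this is about as simple as they come; the character's own ASCII value serves as the token number:

	[<>]		{ return(yytext[0]); }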

We can include literal characters in a grammar rule, like this:

menu_item	:	LABEL  EXEC  '\n'
		|	'<' LABEL  '>'  EXEC  '\n'
		;

Where the second form of the menu_item is used when specifying an icon-file instead of a text-label.

This explains why yacc allocates token-numbers starting above 255: the values 1-255 are reserved for literal characters (and remember, 0 is reserved for the end-of-file indication).

Recursive Rules

So far, we've defined a single menu-item, whereas our menu-file may contain any number of such menu-items. Yacc handles this by allowing recursive rules, like this:

menu_items	:	menu_item
		|	menu_items menu_item
		;

By defining menu_items in terms of itself, we now have a rule which means "one or more menu items".

Note that we could also have written our recursive definition the other way round, as:

menu_items	:	menu_item
		|	menu_item menu_items
		;

but, due to the internals of yacc, this builds a less memory-efficient parser. Refer to the section "Recursion" in the Yacc/Bison documentation for the reasons behind this.

Empty Rules

Referring back to the rule for a single menu_item, there is another way we could accommodate the optional DEFAULT keyword: by defining an empty rule, like this:

menu_item	:	LABEL  default  EXEC  '\n'
		;
default		:	/* empty */
		|	DEFAULT
		;

The comment /* empty */ is ignored by yacc, and can be omitted, but again, it is conventional to include it for any empty rules.

Strange as it may seem, the absence of the keyword DEFAULT is also a valid rule! Yacc acknowledges the empty rule for default when it sees that its current look-ahead token is EXEC, and not DEFAULT. See the section "Look-Ahead" in the Bison documentation for more information about "look-ahead".

To understand why this 2nd approach might be considered better than our earlier one, we need to explore Yacc Actions.

Yacc Actions

Actions within yacc rules take the form:

menu_item	:	LABEL  EXEC  '\n'
	{ C-program statements }

So far, we have only considered the tokens LABEL and EXEC as single-valued integers which are passed from the lexer to the parser. What we really need, is access to the text-strings associated with these tokens (ie their semantic value).

We could do this using a global variable (like token_txt in our spam-checking program), except that yacc executes the action after it has read all the tokens up to that point. Hence the string value for EXEC would overwrite the one for LABEL before we had a chance to use it. We could use separate global variables for the LABEL and EXEC strings, but this won't always work, because sometimes yacc has to read a token in advance before it can decide which rule to use.

Consider the MENU keyword, in our case. Yacc has to check whether it is followed by another string or a newline, before it can decide whether it is being used to introduce a sub-menu within the same file, or an external menu-file.

In any case, yacc provides a formal method for dealing with the semantic value of tokens. It begins with the lexer. Every time the lexer returns a value, it should also set the external variable yylval to the value of the token. Yacc will then retain the association between the token and the corresponding value of yylval.

In order to accommodate a variety of different token-types, yylval is declared as a union of different types.

Token Types

Token types are declared in yacc using the yacc declaration %union, like this:

%union {
	char *str;
	int  num;
}

This defines yylval as being a union of the types (char*) and (int). This is a classical C-program union, so any number of types may be defined, and the union may even contain struct types, etc. For now, we'll just have these two types.
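Behind the scenes, yacc turns this %union into a C typedef; the generated header ends up containing something along these lines:

	typedef union {
		char *str;
		int  num;
	} YYSTYPE;
	extern YYSTYPE yylval;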

We also need to tell yacc which type is associated with which token. This is done by modifying our %token declarations to include a type, like this:

%token <str> LABEL
%token <str> EXEC
%token <num> INT

We do not need to modify any other %token declarations, because they are all for keywords, which do not require any associated value.

Now we need to modify the lexer. This is done by including the line:

	yylval.str = strdup(yytext);

just before the lexer returns the LABEL and EXEC tokens. We'll also include the line:

	yylval.num = atoi(yytext);

just before the lexer returns the INT token.
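In context, the lexer actions end up looking something like this (the patterns shown here are simplified stand-ins for the real rules in olmenu.l):

	\"[^\"\n]*\"	{ yylval.str = strdup(yytext); return(LABEL); }
	[0-9]+		{ yylval.num = atoi(yytext);   return(INT);   }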

Using Token Types in Yacc Actions

Now that we have the token value, we want to make use of it. Yacc lets us refer to the value of a given token using a syntax similar to that of awk and perl. $1 is the value of the 1st token, $2 is the 2nd, and so on. Here is a typical example of an action:

menu_item:	LABEL  EXEC
	    { itemptr = new_item(); itemptr->label=$1;
	      itemptr->command=$2; }


Let's consider what happens when we want to accommodate the DEFAULT keyword. We'll assume that the structure itemptr contains an itemptr->default variable for storing a simple indicator of whether the DEFAULT keyword was used or not.

If we use our first approach, we would have:

menu_item:	LABEL  EXEC
	    { itemptr = new_item(); itemptr->label=$1;
	      itemptr->command=$2;   itemptr->default=0; }
	|	LABEL  DEFAULT  EXEC
	    { itemptr = new_item(); itemptr->label=$1;
	      itemptr->command=$3;   itemptr->default=1; }
	;

Which is OK, but there's a lot of almost-identical code in the two actions. So let's try it using the 2nd approach we used above.

menu_item:	LABEL  default  EXEC
	{ itemptr = new_item(); itemptr->label=$1;
	  itemptr->command=$3; }
	;
default	:	/* empty */
	    { itemptr->default=0; /*segv*/ }
	|	DEFAULT
	    { itemptr->default=1; /*segv*/ }
	;

This is nicer, because we've removed the repetition in our actions. However, I've added the comment /*segv*/, because that's exactly what we would get. This is because of the way yacc works. Here's what would happen:

  1. Yacc gets the token LABEL, and, since it doesn't have the complete rule yet, it just "saves it for later" (by pushing it onto a stack).
  2. Yacc sees the non-terminal symbol default, and starts processing the rule for it.
  3. The default rule is simply: DEFAULT, or nothing. There are only two possibilities:
     The next token is, say, EXEC. It's not DEFAULT, but that's OK (syntactically), because empty is a valid rule in this case. Yacc saves the EXEC token for later. Since the current rule (empty) is both valid and complete, yacc executes the action for it.
     Alternatively, the next token is DEFAULT. This is all that yacc needs to complete the rule default, so it executes the action for it.
     In either case, we execute an action in the default rule.
  4. Now we get to our EXEC token. We have all the elements of our menu_item rule (including the non-terminal default rule), so we can now execute the action for that.
     But WAIT! We're just about to allocate itemptr using the function new_item(), but we've already used itemptr in an action for the previous rule, default. It's too late, we've already crashed.

Yacc provides a simple yet elegant solution to this dilemma, by extending the concept of the "value" of a token to non-terminal symbols, like default.

First, we have to declare default in our Yacc Declarations section, like this:

%type <num> default

Note that this is the same approach as we used for %token definitions. We can even use types which are not used by the lexer, but we must add them to our %union declaration.

We assign the value of the left-hand-side of the rule by assigning a value to $$.

Our complete rule-set for menu_item, using this approach would be:

menu_item:	LABEL  default  EXEC
	{ itemptr = new_item(); itemptr->label=$1;
	  itemptr->default=$2; itemptr->command=$3; }
	;
default	:	/* empty */
	    { $$=0; }
	|	DEFAULT
	    { $$=1; }
	;

The storage space for our $$ is just another value attached to a symbol, and this is handled automatically by yacc. So no crash, and we've removed the unwanted duplication of code in our actions.

Type-casting tokens

Just as you might do with C-program variables, it is possible to typecast a value which is being accessed via the $ mechanism, by writing it as $<type>n instead of $n, where type is a member of our %union declaration. For example:

	    { printf("addr: %x\n",  $<num>1  ); }

Normally, you would not need to type-cast token-values in this fashion.

The Yacc Default Action

Until now, we have assumed that if we don't specify an action, yacc does nothing. This is not true. In the absence of an explicit action, yacc applies a default action of:

	    { $$=$1; }

Or, put simply, the left-hand-side inherits the value of the 1st symbol on the right-hand-side (or inherits the "absence of a value", as the case may be).

This can be a problem if the two symbols have different types: yacc will complain about an
    error: type clash (`num' `') on default action
or similar. This means that we should be fussy about the way we assign types to symbols in yacc, and take the same care as we would when writing normal C-program code.

In the early stages of development, you can get rid of these errors by adding an explicit action, like
    { }
which will override the default action.

The topic of tokens and their associated semantic values is covered in the section "Semantic Values" in the Bison documentation.

Our Prototype Parser

Now that we understand the hows and whys of yacc rules, we can try our hand at writing a basic parser. Before we can successfully compile it, there are a few more things we'll need.

Yacc generates yyparse()

Yacc generates a single function called yyparse(). This function requires no parameters, and returns either 0 on success, or 1 on failure. "Failure" in the case of the parser means "it encountered a syntax error".
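Yacc does not generate a main() for us; unless we link against the yacc library (-ly), which supplies a default one, we provide it ourselves in the third section of the specification. A minimal sketch:

	int main(int argc, char **argv) {
		if (yyparse() != 0) {
			fprintf(stderr, "parse failed\n");
			return 1;
		}
		return 0;
	}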

The yyerror() function

The yyerror() function is called when yacc encounters an invalid syntax. yyerror() is passed a single string (char*) argument. Unfortunately, this string usually just says "parse error", so on its own, it's pretty useless. Error recovery is an important topic which we will cover in more detail later on. For now, we just want a basic yyerror() function like this:
int yyerror(char *err) {
	fprintf(stderr, "%s\n",err);
}

See the section "Interface" in the Bison documentation for more information.

Debugging Options

In addition to the usual debugging techniques, there are a few things we can do to assist debugging.
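One such option is yacc's built-in trace mode: if the generated parser is compiled with YYDEBUG defined, setting the global variable yydebug to a non-zero value makes yyparse() print a trace of every shift and reduce it performs. A minimal sketch:

	extern int yydebug;

	int main(int argc, char **argv) {
		yydebug = 1;	/* no effect unless the parser was compiled with -DYYDEBUG */
		return yyparse();
	}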

Building the Prototype: olmenu-proto1.y

This is a prototype parser because, as you may notice, it does not contain any actions. If you compile it and run it with a suitable openwin-menu file, you get exactly: nothing.

But that's OK for now, as we just want to check that we have understood our input-syntax properly, and that the parser works as expected.

We'll build the prototype using make. I like to use something like this Makefile.

As with any compiled source, you never get it right the first time, so you have to contend with the usual syntax errors which need fixing. In addition, you may also encounter some type clashes, which we mentioned above. Once you are over these hurdles, yacc will generate C-source code that compiles.

Debugging the Prototype

It is more than likely that even though yacc has generated a working parser, you got a couple of messages like:

olmenu-proto1.y contains 3 shift/reduce conflicts
olmenu-proto1.y contains 2 reduce/reduce conflicts

A Shift operation is what the parser does when it saves a token for later use. (Actually, it pushes the token onto a stack)

A Reduce operation is what the parser does when it resolves a set of tokens into a single, complete rule. (The corresponding tokens are removed from the stack and replaced with a single token representing the rule. The stack has been "reduced".)

It is not strictly necessary to eliminate these warnings, as yacc will still generate an operational parser. However, it is important to understand these conflicts, to be sure that we get the parser we wanted, and not just the parser we asked for :-).

Both of these warnings mean that there is an ambiguity in our ruleset.

Our prototype parser does not contain any such situations. Refer to the section "Algorithm" in the Bison documentation for examples and detailed information.

Shift/Reduce Conflicts

A shift/reduce conflict occurs when there would be enough tokens shifted (saved) to make up a complete rule, but the next token may allow a longer rule to be applied.

In the event of a shift/reduce conflict, the parser will opt for the shift operation, and hence try to build the longer rule. You can think of this as being analogous to lex's "longest match" rule.

If your grammar has "optional" structures, such as an optional "else" following an "if" statement, then it may not be possible to eliminate all shift/reduce conflicts from the grammar rules.
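The textbook case is the "dangling else". Given hypothetical rules like:

	stmt	:	IF expr THEN stmt
		|	IF expr THEN stmt ELSE stmt
		;

when the parser has consumed "IF expr THEN stmt" and the look-ahead token is ELSE, it could either reduce what it has (the first alternative) or shift the ELSE in the hope of completing the second. That is precisely a shift/reduce conflict. Yacc shifts, which attaches each else to the innermost if, and that is usually what we want.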

See the section "Shift/Reduce" in the Bison documentation for further explanation.

Reduce/Reduce Conflicts

A reduce/reduce conflict occurs when the same set of tokens can be used to form two different rules.

In the event of a reduce/reduce conflict, the parser will use the first rule that appears in the grammar. You can think of this as being analogous to lex's "first match" rule.

Reduce/reduce conflicts are usually an indication of an error in the way the grammar rules have been defined, as the whole point of having a grammar is to avoid such blatant ambiguities. It is usually possible (and desirable) to eliminate all reduce/reduce conflicts from your grammar rules, either by rewriting some rules, or redefining the grammar (if possible).

See the section "Reduce/Reduce" in the Bison documentation for further explanation.

Resolving Shift/Reduce and Reduce/Reduce Conflicts

In order to find out which rules are affected by these conflicts, you will need to refer to the *.output file generated by running yacc with the -v flag, for example:

	bison -v olmenu-proto1.y

This will generate the usual *.tab.c file, plus an additional olmenu-proto1.output file. This file will tell you which rules are causing the conflicts mentioned.

Running the Prototype Parser

The prototype parser, olmenu-proto1 should be invoked with the -v option.

	olmenu-proto1 -v /usr/openwin/lib/openwin-menu

This will print out a lot of messages about what tokens were encountered, and which rules they were used in. This information can be instrumental in determining "where the parser went wrong", if you feed it an input file which is known to be correct, but get a "parse error" from the parser.

See the section "Debugging" in the Bison documentation for further explanation.

Getting Fancy

Adding Line-number information

As it stands, when our parser encounters an incorrect syntax, it will simply print the message "parse error" and exit. At the very least, we would like an indication of the line-number at which the error occurred. To do this, we will need the co-operation of the lexer, since the parser is often "unaware" of newline characters. More often than not, newlines are not considered significant from the point of view of the grammar.

The lexer must set an additional variable, yylloc, every time it encounters a newline. This variable must be declared in the lexer like this:

	extern YYLTYPE yylloc;

This variable is a structure of type YYLTYPE. We can use any of its members to store relevant information, but we are not required to use all of them. One is usually enough.

We should also initialise our line-counter. To this end, we can use the lex macro YY_USER_INIT in the lex declarations section:

	#define YY_USER_INIT yylloc.first_line=1;

Lex should increment this variable every time it encounters a newline in the input stream. Be careful of using yyless() and REJECT in your lex actions, because they can confuse your line-counter.
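The counting itself is a one-line lex rule; a sketch (in our scanner, the newline is also returned as a token in its own right):

	\n		{ yylloc.first_line++; return('\n'); }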

Once we have set up the lexer to provide line-number information, we can use it within any yacc action. We refer to a token's line-number by using @n, in the same way that $n is used to refer to a token's value. For example:

	{ fprintf(stderr,"Line %d\n",@1.first_line); }

This requires additional code to be generated by yacc. The appropriate source-code is generated automatically if you make use of the @n notation within a yacc action. (at least, this is true of bison - I'm not sure of how yacc deals with this).

Once we have coaxed yacc into producing the necessary code, we can also use yylloc within yyerror(), as follows:

int yyerror(char *err) {
    fprintf(stderr, "%s around line no %d\n",
	    err,  yylloc.first_line  );
}

See the section "Token Positions" in the Bison documentation for further explanation.

Error Handling

As it stands, when our parser encounters an incorrect syntax, it will simply print the message "parse error" and exit.

We've just added line-number information, but even this is a bit vague. Also, we may not want our parser to simply give up at the first syntax error; the C-compiler, for example, gives you more than one error message at a single invocation.

Yacc provides a special symbol for handling errors. The symbol is called error and it should appear within a grammar-rule. For example, we could have:

menu_item:   menu_command
	|    option       '\n'
	|    SEPARATOR    '\n'
	|    menu_file    '\n'
	|    submenu      '\n'
	|                 '\n'
	|    error
		{ printf("Invalid menu item at line %d\n",
			 @1.first_line); }
	;

Now, when the parser encounters an invalid syntax while processing the rule menu_item, it will call yyerror(), reduce the error rule, and execute the action attached to it (printing our message), before attempting to resume normal parsing.

Unfortunately, we cannot vouch for the next token that the parser gets, so typically this error rule will generate several error messages.

Yacc is also capable of more sophisticated error-handling. For example, we can tell the parser to discard some of the subsequent tokens, too, simply by putting a token after the error token.

menu_command :    label  default  EXEC          '\n'
             |    label  default  olvwm_cmd     '\n'
             |    label  default  olvwm_cmd_arg '\n'
             |    label  error                  '\n'
		      { printf("Invalid menu item (%s) at line %d\n",
			       $1, @1.first_line); }
	     ;

Here, the parser will discard all tokens after the label, up to and including the next newline, before reducing the error rule and executing its action.

In this case, the rule

	label error '\n'

is not much different to just having

	error '\n'

except that the former lets us make use of the value of label to give a more informative error message.

This approach is often used when there a "balanced" symbols in the syntax. Consider the rule:

label	:   LABEL
	|   '<'  LABEL  '>'
		{ $$=$2; }
	|   '<'  error  '>'
		{ printf("Bad icon-label at line %d\n",
			 @1.first_line); }
	;

If the parser encounters something other than a LABEL after a '<', it will discard all tokens up to the next '>'. This technique can be useful for keeping brackets balanced during error-recovery.

In our case, keep in mind that any unidentifiable text after a valid LABEL is converted to an EXEC token by the lexer, and the EXEC token would swallow any subsequent '>'. So this rule is unlikely to trap any real errors, except something like "<keyword >".

In fact, it is likely to work against us, because we may not see another '>' at all, so we will miss out on a lot of input!

Error recovery is a tricky business, and we don't always get the results we really wanted. It doesn't pay to be too fussy about this aspect of the parser.

See the sections "Error Recovery" and "Action Features" in the Bison documentation for further explanations and examples.

Building the 2nd Prototype: olmenu-proto2.y

This prototype contains the necessary rules and actions for handling errors in a reasonable way. Building it should be as simple as typing:

	make olmenu-proto2

We want to test it with both correct input and incorrect input. We need to check that it generates meaningful error messages, and does not cause any segv errors or the like.

At this stage, our grammar rules should be complete, so that we can start populating them with actions. Hopefully, we will not need to change them from here on.

Adding Actions

Up till now, we have been concentrating purely on the grammar rules. This has been quite deliberate, because we should get the grammar right first, before we spend too much time writing (and re-writing) our actions.

The basic goal of our parser will be to create an in-memory representation of the input data. Or, in other words, a set of structures, linked lists, arrays, etc, which we can use and manipulate. Once we have our memory-representation, we can generate our new menu-file in the CDE format, using conventional techniques and statements such as printf.

We are free to use any technique we like to build up our memory structures, however yacc lends itself particularly well to a specific style of doing so.

Yacc gives us the ability to assign values to non-terminal symbols, as if they were yylval values supplied by the lexer. This allows us to pass values and pointers along through the grammar rules themselves. We can use this feature to build up linked-lists and other structures "on the fly". We looked at this technique once before, in the section titled "Using Token Types in Yacc Actions", where we passed along an integer value representing the presence or absence of the keyword DEFAULT.

Building Memory Structures

Our menu-file format has two basic structures: the menu (or sub-menu), and the individual menu-item.

If we allow a menu-item to include a pointer to another menu, then the tree-like nature of the menus will be catered for, so we do not need to keep a separate structure to define the menu-tree.

Our target format, the dtwmrc format, does not use "inline" submenus like olvwm. Instead, each menu is referred to by its name, and its contents are defined after the end of the current menu. For example:

Menu DtRootMenu
  "Games "             f.menu "Games"
  "Utilities "         f.menu "Utilities"

Menu "Games"
  "xgalaga"            f.exec exec xgalaga
# ... etc

This means that we should keep a linear list of menus, in addition to the menu-tree structure defined by menus and items. So our structures will look like this:

[diagram of linked lists]
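A sketch of what these two structures might look like in C. The field names here are assumptions based on how they are used in the text; note also that a real program cannot call a member "default", since that is a C keyword:

	struct menu_item {
		char *label;		/* text label or icon filename   */
		char *command;		/* the EXEC string               */
		int   is_default;	/* was the DEFAULT keyword used? */
		struct menu *submenu;	/* non-NULL for sub-menu items   */
		struct menu_item *next;	/* next item in this menu        */
	};

	struct menu {
		char *title;			/* from the TITLE option  */
		struct menu_item *first_item;
		struct menu_item *last_item;	/* for appending in O(1)  */
		struct menu *next;		/* the linear menu_list   */
	};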


The first thing we are likely to encounter in our menu-file is a menu item. We can tell if it's the first menu-item, because it will be processed by the rule:

	menu_items	:	menu_item ;

and not

	menu_items	:	menu_items menu_item ;

This makes the first menu_item rule a good place to allocate our struct menu. We do this mainly so we can get access to the variables first_item and last_item.

We can "hold onto" our struct menu without resorting to global variables by circulating it within the menu_items rule. So both alternatives of the rule return a struct menu * (though only the first version allocates it).

When there are no more menu items in the current menu or sub-menu, the rule for menu_items is complete, and we store the struct menu * pointer returned by menu_items.

So now our rule for menu_items looks like this:

menu_items:  menu_item
	         { menuptr=new_menu(); add_item(menuptr,$1);
	           $$=menuptr; }
	  |  menu_items menu_item
	         { add_item($1,$2); $$=$1; }
	  ;
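The helper functions need not be anything fancy. A minimal sketch, using the structures assumed earlier:

	#include <stdlib.h>	/* for calloc() */

	struct menu *new_menu(void) {
		return calloc(1, sizeof(struct menu));	/* all fields start out zero/NULL */
	}

	void add_item(struct menu *m, struct menu_item *item) {
		if (m->last_item)
			m->last_item->next = item;	/* append after the current tail */
		else
			m->first_item = item;		/* very first item in this menu  */
		m->last_item = item;
	}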


The only other place we would want to call new_menu() is from within the rule for submenu.

submenu	:  label default MENU '\n' menu_items end '\n'

This rule contains the symbol menu_items, which allocates a struct menu and returns it complete with a linked list of menu-items. So there's not much else to do, other than to create an item using new_item(), and use it to store the struct menu pointer which menu_items returns for us:

submenu	:  label default MENU '\n' menu_items end '\n'
		{ itemptr=new_item($1,$2,SUBMENU, $5 ); }

menu_items / submenu interaction

Our rule for menu_items works properly, and returns the right values in all cases. However, one peculiarity does arise from the way this rule interacts with the submenu rule.

If our input contains a submenu before the first ordinary item, we call new_menu() for the submenu before the parent-menu. This is not strictly a problem, but it would be nicer to have the menu_list in the order-of-appearance. We can fix this by re-writing the rule for menu_items as:

menu_items:  /* empty */
	         { $$=new_menu(); }
	  |  menu_items menu_item
	         { add_item($1,$2); $$=$1; }
	  ;

The problem goes away, because we now call new_menu() before we process the first menu-item.


Lastly, there is the issue of the options rule. These items are not really true menu items at all, just some options we can set, like title and columns. These have been defined as members within struct menu. However, at the time we are parsing the options, we do not have access to the correct struct menu. We could use a global variable to store the current menu, but it might get tricky to restore the correct menu when we get to the end of a submenu.

It is easier to let the parser do the work, and pass down values in the manner to which we are now accustomed.

We will use the struct menuitem as a vehicle to transfer the TITLE and COLUMNS options. We will then make the add_item() function treat these items as "special", and free the structure when done.

Mid-Rule Actions

We are not going to use mid-rule actions, but I'll mention them anyway, because they can be useful.

Consider the problem of the rule: submenu.

Let's say we defined the rule for menu_items, such that it returned the start of a linked-list of menu-items, instead of a struct menu pointer. We would then link the struct menu_item list to the struct menu in actions for the rules menufile and submenu. The problem then arises: how do we build the linked-list? We could either search the whole list from the start every time we want to append an item, or keep a last_item pointer and append new items there directly.

Clearly, the last option is the nicest, but we cannot use a simple global variable as our last_item pointer. If we did, we would get to end of the first submenu, and we would want to restore last_item to the last item of the parent menu. But we've already lost this pointer, because we've been using the same global-variable last_item to process the submenu.

So we need to put last_item somewhere other than a global variable. Our struct menu does nicely. Except that our existing rule for a submenu will read all the menu_items before the action for the submenu rule is executed (remembering what happened when processing our keyword DEFAULT ).

submenu : label default MENU '\n' menu_items end '\n'

There is a way we can allocate our struct menu before we start processing menu_items, and that is to use a mid-rule action, like this:

submenu : label default MENU '\n' 
		{ $<mnu>$ = new_menu(); }
	                          menu_items end '\n'
		{ $<mnu>5->first_item=$6; $<mnu>5->pinnable=$7; }

Notice how the mid-rule action counts as a symbol in its own right, so menu_items becomes $6 and end becomes $7. Notice also that a mid-rule action has no declared type, so we must use the $<mnu>$ and $<mnu>5 type-cast notation to store and retrieve its value.

Note that the above example is not intended to be a "working example", it is just illustrative.

See the section "Mid-Rule Actions" in the Bison documentation for further explanations and examples.

Multiple Input Files

In order to generate a complete dtwmrc file, we should really be reading the files nominated in the rule menu_file. It is technically feasible to open a submenu file as soon as we parse the menu_file line, but due to the use of "look-ahead" in both the lexer and parser, it is not simply a matter of changing yyin.

Flex provides a set of functions and macros to handle this, and these are described in the section "Multiple Input Buffers" in the flex man page.

In addition, we should be aware that the parser may also be one token in front, since it uses a "look-ahead" token to decide which rule to apply.

So, while it is possible to change input-streams on-the-fly, it adds another dimension of complexity to our program.

In our case, a much simpler solution is to read a whole file at a time, calling yyparse() once per file, and to stitch the new memory-structures in manually. The function read_menu_files() does just this: it opens each nominated file in turn, points the lexer at it, calls yyparse(), and links the resulting menus into our existing structures.

As luck would have it, menus which are read from other files are just appended to the end of our menu_list, so we don't need to do anything else to our menu_list.

We process the menus in the order they appear in menu_list, and this list can be appended to while we are processing the current menu. Hence, this simple solution also caters for files nested several levels deep.

See the section "Pure Calling" in the Bison documentation for further information.

Pure calling is not required in our case, as we are not calling the parser from within the parser. We are calling yyparse() repeatedly, but we allow it to complete before calling it again.

Environment Variables in File Names - Lex to the rescue

The openwin-menu file syntax also allows the filenames of nested menu files to be referred to using environment variables. For example, a filename like

	/usr/openwin/lib/openwin-menu-programs

could be written

	$OPENWINHOME/lib/openwin-menu-programs

or even

	${OPENWINHOME}/lib/openwin-menu-programs

Traditionally, this variable expansion would be done using C-library calls like strspn() and strtok(), or maybe using a page or so of hand-written code.

However, this seems quite tedious after what we've been doing with lex. After all, this kind of thing is what lex is best at.

The GNU lexer, flex, provides several functions to do exactly what we want: scan a string variable, just like it would a text-file.

The details of the flex functions required to scan a string are described in the flex man page.

Our lexer, olmenu.l contains a function expand_env() which uses this feature of lex. It takes the filename string which was parsed earlier, and processes it, using lex to break the string into several components. If the component is an environment variable, the lexer returns the result of the corresponding getenv(). If the component isn't an environment variable, it is returned verbatim.

The function expand_env() calls yylex() repeatedly, and concatenates the components into the desired result string.

In order that our string-scanner does not interfere with our regular text-scanner, we will set an "exclusive" start condition, ENV, before we call yylex().

We need to remember that yylex() returns 0 when it reaches the end-of-input, but we can return any other integer values we like. In this case, we'll return yytext, or the result of a getenv(yytext). Keep in mind that getenv() returns 0 if the variable doesn't exist, so we have to be careful not to return the getenv() result without checking it.
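Putting it together, expand_env() might look something like this sketch. It must live in olmenu.l itself, so that BEGIN() is available; yy_scan_string() and yy_delete_buffer() are the flex buffer functions mentioned above:

	char *expand_env(const char *name) {
		static char result[1024];
		int  component;
		YY_BUFFER_STATE buf = yy_scan_string(name);	/* scan the string as if it were a file */

		result[0] = '\0';
		BEGIN(ENV);		/* use the string-scanning rules only */
		while ((component = yylex()) != 0)
			strncat(result, (char *)component,	/* cast back from the (int)yytext trick */
				sizeof(result) - strlen(result) - 1);
		BEGIN(INITIAL);
		yy_delete_buffer(buf);	/* a real version would also switch back to the
					   previous buffer with yy_switch_to_buffer() */
		return result;
	}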

The really good thing about using flex to do this, is the amount of control we have over the resulting string-scanner. For example, if we wanted to allow backslash quoting to suppress the environment-variable substitution, we could change the rule:

<ENV>[^$]+		return((int)yytext);
to become:
<ENV>[^$\\]+		return((int)yytext);
<ENV>\\.		return((int)yytext+1);

Generating the dtwmrc output file

There's nothing really special about the output part of our program. We just traverse our linked lists, and run a lot of fprintf() statements.

Look at the file dtwmrc.c if you are interested.

The Final Version: olmenu.y

Finally, our menu-converter is complete. The finished result, including grammar rules, actions, and support functions like main() yyerror() and yywrap(), is the file olmenu.y.

You can build it using the command:

	make olmenu

Yacc and Make

Make contains implicit rules for building programs from yacc source files. The source file must end in .y for the implicit rules to work.

The default rule for yacc is:

.y.c :
	$(YACC) $(YFLAGS) $<
	mv -f y.tab.c $@

I like to call my parser files something.l and something.y. Unfortunately, this introduces a conflict in make's default rules, which would try to generate something.c from both the lex and yacc source. Hence, I like to "brew my own" implicit rules, which generate something_lex.c and something_yacc.c instead.
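With GNU make, pattern rules along these lines do the trick (a sketch, with suffixes matching the naming scheme just described):

	%_yacc.c: %.y
		$(YACC) $(YFLAGS) $<
		mv -f y.tab.c $@

	%_lex.c: %.l
		$(LEX) -t $< > $@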

See the Makefile included with the examples for the details.

Note that there are some incompatibilities at this level between yacc and bison. Yacc likes to call its output files y.tab.c for the parser and y.tab.h for the token definitions. Bison prefers to use basename.tab.c and basename.tab.h (respectively). Bison will generate y.tab.c and y.tab.h if it is invoked with the -y flag.

This document contains numerous hyper-references to the Bison documentation. In order to use these, you must install the perl script info2www on a web-server on your local machine.

The Bison documentation is in "Info" format, and the info2www gateway is arguably the most convenient way of accessing this type of documentation.

Author: George_Hansper@apana.org.au

Last modified: $Date: 2000/07/04 07:57:50 $

Previous: Lex - a text scanner Next: Conclusions