What is Lex? What is Yacc?

What is Lex?

Lex is officially known as a "Lexical Analyser".

Its main job is to break up an input stream into more usable elements.

Or, in other words, to identify the "interesting bits" in a text file.

For example, if you are writing a compiler for the C programming language, the symbols { } ( ) ; all have significance on their own. The letter a usually appears as part of a keyword or variable name, and is not interesting on its own. Instead, we are interested in the whole word. Spaces and newlines are uninteresting, and we want to ignore them completely, unless they appear within quotes "like this".

All of these things are handled by the Lexical Analyser.
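
As a rough illustration of the rules above, here is a minimal hand-written sketch in C of the kind of work a lexical analyser does. The names next_token, token_kind and the TOK_* constants are invented for this example and are not part of Lex; a scanner generated by Lex would look quite different internally.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not part of Lex itself. */
enum token_kind { TOK_END, TOK_SYMBOL, TOK_WORD, TOK_STRING, TOK_OTHER };

/*
 * Scan one token starting at *cursor, copy its text into buf
 * (truncating if buf is too small), and advance *cursor past it.
 * Returns the kind of token that was found.
 */
enum token_kind next_token(const char **cursor, char *buf, size_t bufsize)
{
    const char *p;
    const char *start;
    size_t len;
    enum token_kind kind;

    assert(cursor != NULL && *cursor != NULL && buf != NULL && bufsize > 0);
    p = *cursor;

    /* Spaces and newlines outside quotes are uninteresting: skip them. */
    while (isspace((unsigned char)*p))
        p++;

    start = p;
    if (*p == '\0') {
        kind = TOK_END;
    } else if (strchr("{}();", *p) != NULL) {
        /* These symbols are significant on their own. */
        p++;
        kind = TOK_SYMBOL;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* A single letter is not interesting; collect the whole word. */
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        kind = TOK_WORD;
    } else if (*p == '"') {
        /* Inside quotes everything is kept, including spaces. */
        p++;
        while (*p != '\0' && *p != '"') {
            if (*p == '\\' && p[1] != '\0')
                p++;            /* step over the escaped character */
            p++;
        }
        if (*p == '"')
            p++;                /* consume the closing quote */
        kind = TOK_STRING;
    } else {
        /* Anything else is passed through one character at a time. */
        p++;
        kind = TOK_OTHER;
    }

    len = (size_t)(p - start);
    if (len >= bufsize)
        len = bufsize - 1;
    memcpy(buf, start, len);
    buf[len] = '\0';
    *cursor = p;
    return kind;
}
```

Calling next_token repeatedly on the text foo ( "a b" ) ; would yield the word foo, the symbol (, the string "a b" with its space intact, the symbol ) and the symbol ;, followed by TOK_END. With Lex, you describe these patterns and the scanning code is generated for you.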

What is Yacc?

Yacc is officially known as a "parser".

Its job is to analyse the structure of the input stream, and operate on the "big picture".

In the course of its normal work, the parser also verifies that the input is syntactically sound.

Consider again the example of a C compiler. In the C language, a word can be a function name or a variable, depending on whether it is followed by a ( or a =. There should be exactly one } for each { in the program.
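
The two checks just described can be sketched in C as follows. This is only an illustration, assuming the input has already been split into an array of token strings; braces_balanced, classify_word and word_role are hypothetical names, and a real Yacc parser works from grammar rules rather than hand-written checks like these.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if every "}" closes an earlier "{"
 * and no "{" is left open at the end, otherwise 0.
 */
int braces_balanced(const char *const tokens[], size_t count)
{
    size_t i;
    size_t depth = 0;

    assert(tokens != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (strcmp(tokens[i], "{") == 0) {
            depth++;
        } else if (strcmp(tokens[i], "}") == 0) {
            if (depth == 0)
                return 0;       /* a "}" with no matching "{" */
            depth--;
        }
    }
    return depth == 0;
}

/* Hypothetical roles a word can play, judged by the token after it. */
enum word_role { ROLE_FUNCTION, ROLE_VARIABLE, ROLE_UNKNOWN };

/* Decide a word's role by looking one token ahead. */
enum word_role classify_word(const char *next_token_text)
{
    if (next_token_text == NULL)
        return ROLE_UNKNOWN;
    if (strcmp(next_token_text, "(") == 0)
        return ROLE_FUNCTION;   /* word ( ...  looks like a call */
    if (strcmp(next_token_text, "=") == 0)
        return ROLE_VARIABLE;   /* word = ...  looks like an assignment */
    return ROLE_UNKNOWN;
}
```

Both functions look only at whole tokens, never at individual characters. That division of labour is the point: the lexical analyser produces the tokens, and the parser reasons about how they fit together.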

YACC stands for "Yet Another Compiler Compiler". This is because this kind of analysis of text files is normally associated with writing compilers.

However, as we will see, it can be applied to almost any situation where text-based input is being used.

For example, a C program may contain something like:

		int int;
		int = 33;
		printf("int: %d\n",int);

In this case, the lexical analyser would have broken the input stream into a series of "tokens", like this:

	int  int  ;
	int  =  33  ;
	printf  (  "int: %d\n"  ,  int  )  ;

Note that the lexical analyser has already determined that where the keyword int appears within quotes, it is really just part of a literal string. It is up to the parser to decide if the token int is being used as a keyword or variable. Or it may choose to reject the use of the name int as a variable name. The parser also ensures that each statement ends with a ; and that the brackets balance.

Flex and Bison

Lex and Yacc are part of BSD Unix. GNU has its own enhanced versions, called Flex and Bison. I'll keep referring to "Lex" and "Yacc", but you can use Flex and Bison as "drop-in" replacements in most cases. In fact, the additional features of Flex and Bison make them an irresistible choice.
