parser.y Details




This file contains the C grammar used to parse a C language source file and build an AST (Abstract Syntax Tree) of data for further processing. The information below assumes the reader has basic knowledge of Flex and Bison.




Near the beginning of the file, debug printing can be turned on or off by defining or not defining options such as DEBUG_GRAMMAR and DEBUG_CONST.



These options are used throughout the grammar rules, which are found further down in this file. When DEBUG_GRAMMAR is defined, the printf() statements in those rules run, which helps with debugging at run time.
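
The effect of such a switch can be sketched in C as follows. This is a minimal illustration, not the code from parser.y: the helper trace_rule() and its message format are assumptions made for this example, and only the macro name DEBUG_GRAMMAR comes from the file.

```c
#include <stdio.h>

/* Comment this line out to silence the grammar trace. */
#define DEBUG_GRAMMAR

/* Hypothetical helper (not from parser.y): reports that a rule was
   matched, but only when DEBUG_GRAMMAR is defined.
   Returns 1 if a message was printed, 0 otherwise. */
static int trace_rule(const char *rule_name)
{
#ifdef DEBUG_GRAMMAR
    printf("grammar: matched %s\n", rule_name);
    return 1;
#else
    (void)rule_name;   /* avoid an unused-parameter warning */
    return 0;
#endif
}
```

Because the check happens in the preprocessor, a build without DEBUG_GRAMMAR contains no printf() call at all, so the tracing costs nothing at run time.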


Tokens and Associativity

Next come the token declarations and their associativity information. The tokens are defined by scanner.l.




YYSTYPE union and Types

Next comes the %union, which defines the members of YYSTYPE (the Bison type of yylval), followed by the %type declarations.


A %type declaration defines the type of data that a particular grammar rule returns. For example:


%type <id_ptr> IDENTIFIER


%type <tptr> primary_expression


Thus IDENTIFIER is of type id_ptr, which is "char *" data, so IDENTIFIER returns "char *" data when used in a grammar rule.


The primary_expression grammar rule is of type tptr which is "struct exp_tree *" data. Thus the primary_expression grammar rule returns data of type "struct exp_tree*tptr" as seen in the %union above.


In the grammar rules shown below for primary_expression, notice the $$. $$ holds the value the rule returns, and its type is the one defined by %type for primary_expression, which is "struct exp_tree*tptr" data.


Note that for the IDENTIFIER rule, the "struct exp_tree*tptr" data is returned by the function node_operand(), whose first parameter is $1, which is "char *id_ptr" type data.
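
The type flow described above can be modelled in plain C. The sketch below is an assumption-laden illustration rather than the project's code: the layout of struct exp_tree, the one-argument node_operand_sketch(), and the names yystype_sketch and reduce_identifier() are all hypothetical. Only the member names id_ptr and tptr are taken from the text.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed minimal AST node; the real struct exp_tree may differ. */
struct exp_tree {
    char *name;
    struct exp_tree *left;
    struct exp_tree *right;
};

/* C view of a %union with the two members named in the text. */
union yystype_sketch {
    char *id_ptr;             /* value carried by IDENTIFIER         */
    struct exp_tree *tptr;    /* value carried by primary_expression */
};

/* Hypothetical stand-in for node_operand(): wraps a name in a leaf node.
   Returns NULL if memory cannot be allocated. */
static struct exp_tree *node_operand_sketch(const char *id)
{
    struct exp_tree *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->name = malloc(strlen(id) + 1);
    if (node->name == NULL) {
        free(node);
        return NULL;
    }
    strcpy(node->name, id);
    node->left = NULL;
    node->right = NULL;
    return node;
}

/* Mimics an action of the form "$$ = node_operand($1, ...)":
   $1 is read through id_ptr and $$ is written through tptr. */
static union yystype_sketch reduce_identifier(union yystype_sketch dollar1)
{
    union yystype_sketch result;
    result.tptr = node_operand_sketch(dollar1.id_ptr);
    return result;
}
```

In a real Bison grammar the member selection is done automatically from the %type declarations, which is why a rule without one cannot have its $$ or $n values typed.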


Any time a new rule is added, it must have a corresponding %type.




Note: yylval, of type YYSTYPE, is declared extern in scanner.l as follows:




Grammar Rules

In the primary_expression rule, seen just above, the IDENTIFIER token is defined in scanner.l as follows:



Notice on line 180 that yylval.id_ptr (see also id_ptr in the %union shown earlier) holds data of type "char *", a simple character pointer to a string that is saved by the function save_id(). Thus, after the string is saved, the grammar rule for IDENTIFIER passes it to node_operand($1, ...), where $1 is of the "char *" data type.
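
One reason a scanner typically saves the string before handing it to the parser is that Flex reuses its yytext buffer for each new token. The following sketch shows the idea with a hypothetical save_id_sketch(); the real save_id() in this project may store identifiers differently, for example in a table.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for save_id(): returns a private heap copy of
   the token text, or NULL if memory cannot be allocated. The caller
   owns the copy and is responsible for freeing it. */
static char *save_id_sketch(const char *token_text)
{
    size_t length = strlen(token_text) + 1;   /* include the '\0' */
    char *copy = malloc(length);
    if (copy == NULL)
        return NULL;
    memcpy(copy, token_text, length);
    return copy;
}
```

In a scanner action, the result would be assigned to yylval.id_ptr before returning IDENTIFIER, so the pointer the parser sees as $1 stays valid after the scanner moves on to the next token.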


Currently only a limited set of C grammar rules is used; more will be added and made functional as time allows. Most of the C rules are in parser.y, but those not yet implemented are commented out, as shown below.



Where do we go from here?


The grammar rules process data from a C file in order to build an AST (Abstract Syntax Tree) that the compiler will eventually use to generate assembly language and, finally, binary code that the CPU can execute. The parser.y grammar rules do this by calling various node_ functions, such as node_operand(), node_assign(), node_post_inc(), etc., to build the AST. These functions are implemented in the file node_functions.c. Once the C file has been processed according to the rules in parser.y, the function gen_parsed() is called, since "translation_unit" is the top level of the AST for a processed C file. gen_parsed() is found in the file gencode.c.
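
The build-then-walk pattern described above can be sketched in C for a statement such as "a = b". Everything in this example is hypothetical: the node layout, the *_sketch constructors, and the count_nodes() walker are stand-ins chosen for illustration, not the actual contents of node_functions.c or gencode.c.

```c
#include <stdlib.h>

enum node_kind { NODE_OPERAND, NODE_ASSIGN };

/* Assumed node layout for this sketch only. */
struct ast_node {
    enum node_kind kind;
    const char *name;               /* operand name, NULL for operators */
    struct ast_node *left;
    struct ast_node *right;
};

static struct ast_node *make_node(enum node_kind kind, const char *name,
                                  struct ast_node *left,
                                  struct ast_node *right)
{
    struct ast_node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->kind = kind;
    node->name = name;
    node->left = left;
    node->right = right;
    return node;
}

/* Stand-ins for node_operand() and node_assign(). */
static struct ast_node *operand_sketch(const char *name)
{
    return make_node(NODE_OPERAND, name, NULL, NULL);
}

static struct ast_node *assign_sketch(struct ast_node *target,
                                      struct ast_node *value)
{
    return make_node(NODE_ASSIGN, NULL, target, value);
}

/* Stand-in for a later pass: a post-order walk from the root that
   visits children before their parent and counts the nodes. */
static int count_nodes(const struct ast_node *node)
{
    if (node == NULL)
        return 0;
    return count_nodes(node->left) + count_nodes(node->right) + 1;
}

static void free_tree(struct ast_node *node)
{
    if (node == NULL)
        return;
    free_tree(node->left);
    free_tree(node->right);
    free(node);
}
```

Calling assign_sketch(operand_sketch("a"), operand_sketch("b")) yields a three-node tree whose root is the assignment. A code generator would walk the tree in the same way, starting at the root, but emit instructions at each node instead of counting.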