Rohit's Realm

September 24, 2010

Fun with Flex, Bison, and Friends

The past month has been particularly prolific for me, if this (worthless) site is any indication. I wrote more regularly in the last three weeks than I had in the past three years.

But the all-consuming dragon of existential angst that is a mainstay of my most trivial and farcical condition in this miserable life is not easy to slay. So in addition to readin' and writin', I have also been spending some quality time with a dear old friend of mine from a past life: the C programming language (not quite arithmetic, but close enough). It's hard to contemplate killing oneself when all your ire is directed at gcc.

In this post, I discuss a particularly pernicious flex/bison bug that has plagued me into the wee hours for two days in a row. (Most of you, therefore, can safely skip this entry. But I warn you: nothing invokes the disaffection, misanthropy, or self-loathing for which this site is renowned quite like computer science.) The problem and (hopefully) its solution after the jump.

I assume if you're still with me, you know a little something about flex, bison, and friends, as well as compiler theory, so I won't bother with my usual pedantic ways. Suffice it to say that I am using flex and bison to write a compiler for a personal project (more on which, later). Despite it having been seven years since I took compilers at Berkeley (CS 164 what what!), I had no real trouble getting the lexer to tokenize my input, and I had a subset of the grammar for the parser implemented when I decided to add my first bison action. This is when I started getting this shit:

rohit@tyrant ~/src/parser/ % make
bison -y -d -v -t parser.y
gcc -g -c y.tab.c
flex  scanner.l
gcc -g -c lex.yy.c
scanner.l: In function 'yylex':
scanner.l:63: warning: assignment from incompatible pointer type
gcc -g -Wall -Werror -O0 y.tab.o lex.yy.o  -o parser
  

Here are the relevant code snippets from parser.y:

%{

#include <stdio.h>
#include <stdlib.h>

#define YYSTYPE char *

%}

...

%token WORD
  

And scanner.l:

%{

#include <stdio.h>
#include <string.h>
#include "y.tab.h"

extern YYSTYPE yylval;

%}

%option pointer
%option noyywrap

%%

...

[A-Za-z0-9]+  { 
                yylval = strdup(yytext);
                return(WORD); 
              }
  

Simple enough, right? The function strdup returns a char *, yylval is of type YYSTYPE, which is set to char * with a #define, and yytext is specified as a char * by flex. It should work! So, what the fuck is this incompatible pointer type bullshit, huh, gcc (you asshole!)?

The first four or five hours of the bug hunt yielded almost nothing. Although there are a ton of resources on lex/yacc coding on the web, many if not most are quite dated. I mean, one of the manuals I was using was actually written by dudes at Bell Labs—you know, the ones who came up with lex/yacc (and UNIX) in the first place. And flex/bison (GNU's implementations of the same) are subtly different, making things pretty fucking infuriating.

Around hour six or seven, I went down the wrong path. In particular, I discovered an option in flex—%option bison-bridge—supposedly for connecting up flex and bison. But the same warning persisted even with the bridge allegedly built.

A few hours later, I discovered that the header file—y.tab.h—generated by bison did not include my #define for YYSTYPE and instead had this:

#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
# define yystype YYSTYPE /* obsolescent; will be withdrawn */
# define YYSTYPE_IS_DECLARED 1
# define YYSTYPE_IS_TRIVIAL 1
#endif
  

So that explained the pointer incompatibility. The scanner was never seeing my #define for YYSTYPE, since it never made it into y.tab.h, and so YYSTYPE was defaulting to int. Why? I have no idea. All the manuals say to put the #define in the C declarations section of the yacc file, but I guess either the manuals are wrong, bison doesn't follow yacc's behavior on this point, or something else entirely is awry.
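One plausible explanation, offered as an assumption rather than gospel: bison copies the %{ ... %} block only into y.tab.c, not into y.tab.h, so any file that includes just the header hits the fallback above. The sketch below is a stand-in, not real bison output. It repeats a trimmed copy of that guard in a file that never defines YYSTYPE, and the helper yystype_is_int is a hypothetical name made up purely to check which type won (it needs a C11 compiler for _Generic).

```c
/* Stand-in for what scanner.l sees: no "#define YYSTYPE char *" anywhere,
   because that line lives only in the prologue of parser.y. */

/* Trimmed copy of the guard found in y.tab.h. */
#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
# define YYSTYPE_IS_DECLARED 1
#endif

/* Hypothetical helper: returns 1 if YYSTYPE ended up as int, 0 otherwise. */
int yystype_is_int(void)
{
    return _Generic((YYSTYPE)0, int: 1, default: 0);
}
```

If that assumption holds, the scanner really is compiled with an int yylval, no matter what parser.y says in its own prologue.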

Regardless, the fix was (or should have been) simple: make a separate header file, say parser.h with the #define for YYSTYPE and #include "parser.h" in both scanner.l and parser.y. Which I did, but . . . it still didn't work!
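A minimal sketch of that arrangement, squashed into one C file so it can be compiled on its own. The parser.h contents, the include guard name PARSER_H, and the function store_word are all illustrative assumptions; in the real project the first chunk would sit in parser.h and the last in the WORD action of scanner.l. The one detail that matters is ordering: parser.h has to come before y.tab.h, or the guard has already picked int.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- hypothetical parser.h, included by both scanner.l and parser.y --- */
#ifndef PARSER_H
#define PARSER_H
#define YYSTYPE char *
#endif

/* --- trimmed stand-in for y.tab.h, included AFTER parser.h --- */
#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;              /* skipped: YYSTYPE is already defined */
# define YYSTYPE_IS_DECLARED 1
#endif

/* Stand-in for the global that bison defines in y.tab.c. */
YYSTYPE yylval;

/* Stand-in for the WORD action: char * assigned to char *, no warning. */
void store_word(const char *yytext)
{
    assert(yytext != NULL);
    yylval = strdup(yytext);
}
```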

At this point, I had dedicated almost ten hours to this problem, and gotten absolutely nowhere. I thought everything through again; how could the scanner not know the proper type of YYSTYPE? It was in the damn header file!

Finally, for no real reason, I did a man flex and saw an option, --bison-bridge, with the description "scanner for bison pure parser." Bison pure parser? As in a reentrant one? But clearly I didn't have that; yylval was a global variable, for crying out loud! So I hastily stripped out %option bison-bridge and voilà: it compiled. I almost wept with joy.
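Why would the bridge produce that particular warning? My reading of the flex manual, which you should treat as an assumption, is that with %option bison-bridge the scanner stops using a global yylval; yylex instead takes a YYSTYPE * parameter, and yylval inside the actions is that pointer. The sketch below imitates both shapes in plain C. The names yylval_global, action_plain, and action_bridge are hypothetical stand-ins, not anything flex generates.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *YYSTYPE;  /* same effect here as "#define YYSTYPE char *" */

/* No bridge: yylval is a global of type YYSTYPE. */
YYSTYPE yylval_global;

void action_plain(const char *yytext)
{
    assert(yytext != NULL);
    yylval_global = strdup(yytext);  /* char * = char *: fine */
}

/* Bridge (as assumed above): the action sees a pointer to YYSTYPE. */
void action_bridge(YYSTYPE *yylval, const char *yytext)
{
    assert(yylval != NULL && yytext != NULL);
    /* "yylval = strdup(yytext);" here would assign a char * to a char **,
       which is an incompatible pointer type. The bridge-style spelling
       dereferences first: */
    *yylval = strdup(yytext);
}
```

So with the bridge left on, the action would have needed *yylval; with it off, the original yylval = strdup(yytext) is the right spelling.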

What was the final solution? Separate header file for the #define and no bison bridge. Simple, right? So why didn't anyone say so?!

* * *

Maybe for old hands at compiler writing, this might seem like a trivial bug. But for a newb like myself, it was no joke sorting through the morass of arcane, deceptive, or flat out wrong documents on the web regarding flex/bison. So, at worst this entry will collect dust like so many of its brethren on this site. But if it helps even one person avoid the bullshit I have encountered in the past two days, I will consider it a great success.

A larger question to ponder: why the fuck does bison not output #define statements into y.tab.h, and why the fuck do all the yacc manuals suggest it should? Answer that question for me, and I'll buy you a beer.

Comments

Do you put your code projects on your CV, like Posner's inclusion of his judicial opinions? See http://www.law.uchicago.edu/files/cv/Posner,%20Richard%20CV.pdf at *48-*170.

No, but obviously, I should. Then again, I'm not sure I will ever reach the level of awesomeness necessary to fill up 100 pages on a CV, so what's the point of even trying? Go big or go home!

This was helpful, saved me a few hours. Thanks.

Slay that mighty dragon.
