September 24, 2010
Fun with Flex, Bison, and Friends
The past month has been particularly prolific for me, if this (worthless) site is any indication. I have written more regularly in the last three weeks than I had in the previous three years.
But the all-consuming dragon of existential angst that is a mainstay of my most trivial and farcical condition in this miserable life is not easy to slay. So in addition to readin' and writin', I have also been spending some quality time with a dear old friend of mine from a past life: the C programming language (not quite arithmetic, but close enough). It's hard to contemplate killing oneself when all your ire is directed at gcc.
In this post, I discuss a particularly pernicious flex/bison bug that has plagued me into the wee hours for two days in a row. (Most of you, therefore, can safely skip this entry. But I warn you: nothing invokes the disaffection, misanthropy, or self-loathing for which this site is renowned quite like computer science.) The problem and (hopefully) its solution after the jump.
I assume that if you're still with me, you know a little something about flex, bison, and friends, as well as compiler theory, so I won't bother with my usual pedantic ways. Suffice it to say that I am using flex and bison to write a compiler for a personal project (more on which later). Despite it having been seven years since I took compilers at Berkeley (CS 164 what what!), I had no real trouble getting the lexer to tokenize my input, and I had a subset of the grammar for the parser implemented when I decided to add my first bison action. This is when I started getting this shit:
rohit@tyrant ~/src/parser/ % make
bison -y -d -v -t parser.y
gcc -g -c y.tab.c
flex scanner.l
gcc -g -c lex.yy.c
scanner.l: In function 'yylex':
scanner.l:63: warning: assignment from incompatible pointer type
gcc -g -Wall -Werror -O0 y.tab.o lex.yy.o -o parser
Here are the relevant code snippets from parser.y:
%{
#include <stdio.h>
#include <stdlib.h>
#define YYSTYPE char *
%}
...
%token WORD
And scanner.l:
%{
#include <stdio.h>
#include <string.h>
#include "y.tab.h"
extern YYSTYPE yylval;
%}
%option pointer
%option noyywrap
%%
...
[A-Za-z0-9] { yylval = strdup(yytext); return(WORD); }
Simple enough, right? The function strdup returns a char *, yylval is of type YYSTYPE, which is set to char * with a #define, and yytext is specified as a char * by flex. It should work! So, what the fuck is this incompatible pointer type bullshit, huh, gcc (you asshole!)?
The first four or five hours of the bug hunt yielded almost nothing. Although there are a ton of resources on lex/yacc coding on the web, many if not most are quite dated. I mean, one of the manuals I was using was actually written by dudes at Bell Labs—you know, the ones who came up with lex/yacc (and UNIX) in the first place. And flex/bison (GNU's implementations of the same) are subtly different, making things pretty fucking infuriating.
Around hour six or seven, I went down the wrong path. In particular, I discovered an option in flex, %option bison-bridge, supposedly for connecting up flex and bison. But the same warning persisted even with the bridge allegedly built.
A few hours later, I discovered that the header file generated by bison, y.tab.h, did not include my #define for YYSTYPE and instead had this:
#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
# define yystype YYSTYPE /* obsolescent; will be withdrawn */
# define YYSTYPE_IS_DECLARED 1
# define YYSTYPE_IS_TRIVIAL 1
#endif
So that explained the pointer incompatibility. The scanner was never seeing my #define for YYSTYPE, since it never made it into y.tab.h, so YYSTYPE was defaulting to int. Why? I have no idea. All the manuals say to put the #define in the C declarations section of the yacc file, but I guess either the manuals are wrong, bison doesn't follow yacc's behavior on this point, or something else entirely is awry.
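For what it's worth, the likeliest explanation is that bison copies the %{ ... %} prologue only into the generated parser source (y.tab.c) and never into y.tab.h, so any file that includes just the header falls through to the fallback guard quoted above. Newer bison releases also document a %code requires { ... } block for code that must land in the header as well; whether the bison version used here supports it is an assumption. The following is a minimal C sketch of the guard's behavior, not bison's actual output: store_word is a hypothetical helper standing in for the scanner action, and the #define at the top plays the role of a shared header included before y.tab.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: this line stands in for a shared header that every file
 * includes BEFORE y.tab.h. Without it, YYSTYPE falls back to int below. */
#define YYSTYPE char *

/* Same guard shape as the generated y.tab.h: the int fallback is only
 * used when nobody has defined YYSTYPE yet. */
#if !defined YYSTYPE && !defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
#define YYSTYPE_IS_DECLARED 1
#endif

/* Stand-in for the global semantic value the generated parser defines. */
YYSTYPE yylval;

/* Hypothetical helper mirroring the WORD action: copies the matched text
 * into yylval. Uses malloc/memcpy instead of strdup so the sketch builds
 * under strict ISO C. Returns 1 on success, 0 if allocation fails. */
int store_word(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return 0;
    memcpy(copy, text, len);
    yylval = copy; /* char * assigned to char *: types agree */
    return 1;
}
```

Deleting the #define at the top turns YYSTYPE back into an int, and the compiler then complains about the assignment in store_word, a type mismatch much like the one the scanner ran into.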
Regardless, the fix was (or should have been) simple: make a separate header file, say parser.h, with the #define for YYSTYPE, and #include "parser.h" in both scanner.l and parser.y. Which I did, but . . . it still didn't work!
At this point, I had dedicated almost ten hours to this problem and gotten absolutely nowhere. I thought everything through again: how could the scanner not know the proper type of YYSTYPE? It was in the damn header file!
Finally, for no real reason, I did a man flex and saw the option --bison-bridge, with the description "scanner for bison pure parser."
Bison pure parser? As in a reentrant one? But clearly I didn't have that; yylval was a global variable, for crying out loud! So I hastily stripped out %option bison-bridge and voilà: it compiled. I almost wept with joy.
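A hedged guess at why the bridge kept the warning alive: the flex manual says that with %option bison-bridge the scanner's yylex takes the semantic value as a parameter and yylval becomes a pointer to YYSTYPE, so an action has to write through it (*yylval = ...) rather than assign to it. With YYSTYPE as char *, assigning a char * directly to that pointer would be exactly an incompatible-pointer assignment. The sketch below is plain C, not generated flex code; copy_text, lex_word_global, lex_word_bridge, and yylval_global are hypothetical names used only to contrast the two styles.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *YYSTYPE;

/* Hypothetical helper: heap-allocated copy of text, or NULL on failure. */
static char *copy_text(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}

/* Classic (non-pure) style: the semantic value is a global YYSTYPE,
 * so the action assigns to it directly. */
YYSTYPE yylval_global;

int lex_word_global(const char *text)
{
    yylval_global = copy_text(text);
    return yylval_global != NULL;
}

/* Bridge (pure parser) style: the caller passes a pointer to its own
 * YYSTYPE, so the action must dereference before assigning. Writing
 * "yylval = copy_text(text)" here would assign char * to char **. */
int lex_word_bridge(YYSTYPE *yylval, const char *text)
{
    *yylval = copy_text(text);
    return *yylval != NULL;
}
```

In the bridge style the caller owns the storage, for example by declaring YYSTYPE value; and calling lex_word_bridge(&value, "word"). That is what lets a pure parser avoid the global.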
What was the final solution? Separate header file for the #define and no bison bridge. Simple, right? So why didn't anyone say so?!
* * *
For old hands at compiler writing, this might seem like a trivial bug. But for a newb like myself, it was no joke sorting through the morass of arcane, deceptive, or flat-out wrong documents on the web regarding flex/bison. So, at worst, this entry will collect dust like so many of its brethren on this site. But if it helps even one person avoid the bullshit I have encountered in the past two days, I will consider it a great success.
A larger question to ponder: why the fuck does bison not output #define statements into y.tab.h, and why the fuck do all the yacc manuals suggest it should? Answer that question for me, and I'll buy you a beer.
Do you put your code projects on your CV, like Posner's inclusion of his judicial opinions? See http://www.law.uchicago.edu/files/cv/Posner,%20Richard%20CV.pdf at *48-*170.
Posted by James | September 24, 2010 18:21:26 -0700 | Permalink
No, but obviously, I should. Then again, I'm not sure I will ever reach the level of awesomeness necessary to fill up 100 pages on a CV, so what's the point of even trying? Go big or go home!
Posted by Rohit | September 27, 2010 09:28:51 -0700 | Permalink
This was helpful, saved me a few hours. Thanks.
Posted by Mumbler | January 25, 2013 19:24:50 -0800 | Permalink
Slay that mighty dragon.
Posted by Steve | July 02, 2014 11:12:13 -0700 | Permalink