Adding Token Pasting to a non-ANSI Compiler, A Stack Data Type

Programming in C

By Allen Holub

Most Mac compilers claim to support the ANSI C Standard, but in fact they support only a subset of it. Among the features most often omitted from that subset are the new preprocessor features. The program described in this article corrects this problem to some extent. It is a C preprocessor that augments the standard preprocessor used by your compiler, expanding macros itself in order to support token pasting and the five predefined macros specified in the Standard. 

What is Token Pasting?

Token pasting provides a method for concatenating tokens in a macro expansion. The concatenation operator is ##, and it is simply removed from the macro body when the macro is expanded, so the text on either side of it is joined. For example, a definition like: 

#define CONCAT(a,b) a##b

when invoked with:

CONCAT(com, pound)

evaluates to:

compound

Note that the concatenation is done when the macro is expanded, not when it’s defined.

An Example: A Stack Abstract Data Type

Token pasting is particularly useful when you’re trying to do data abstraction in a C program, because it lets you hide the details of variable declarations in a macro. As a simple example, I’ll build a set of macros to implement a stack data type. You will declare a stack with: 

stack_dcl(stack, type, size)

Where stack is the stack name, type is its type, and size is the number of elements. A stack of 128 character pointers called cp_stack could be declared as follows: 

stack_dcl( cp_stack, char *, 128 );

Once the stack is declared, you can perform various operations (see Listing 1). 

There are two problems here, both of which can be solved with token pasting: (1) two data structures are required to implement a stack (an array and a pointer), and (2) it must be possible to declare more than one stack at the same scoping level. See Listing 2 for the stack_dcl macro.

When a definition like the earlier:

stack_dcl( cp_stack, char *, 128 );

is processed, the normal argument-substitution mechanism will be used, but the ## operators will be removed from the expansion, so the following declarations will result: 

typedef char * t_cp_stack;
stack_cls t_cp_stack cp_stack[128];
stack_cls t_cp_stack (*p_cp_stack) 
	= cp_stack + (128);

The macro has transparently created a new type, the name of which is generated by appending the string t_ to the front of the stack name. It has also allocated an array of that type for use as the stack, and a stack pointer (whose name has a p_ added to the front of it), which is initialized to point just past the end of the stack. See Listing 3 for the remainder of the stack macros.

stack_full just tests whether the stack pointer has reached the bottom of the stack array, and stack_empty tests whether it points at or past the end of the array. Because the stack size is not maintained in a variable, the compiler must recompute it with: 

sizeof(stack) / sizeof(*stack)

This is a compile-time computation, however, and it saves a variable.

The stack_ele macro on the next line computes the number of elements in use by subtracting the current offset of the stack pointer from the base of the array (p_##stack-stack) from the stack size. The stack_err() macro is used internally by push() and pop() when an error occurs. The argument is 1 if the error is an overflow, 0 if it’s an underflow. A sequence operator (the comma) is used here to print an error message and then exit the program. The push() and pop() macros will evaluate to whatever stack_err() evaluates to if stack_err() is redefined not to exit the program. For example, if you wanted push() and pop() to evaluate to -1 on an error, you could redefine stack_err() as follows: 

#define stack_err(o) -1

The last two macros are push() and pop(). They both start out by checking for an error, and invoking stack_err() if one is found. The casts on the second line of both macros are just making sure that the compiler won’t kick out an error message. You have to cast to long before casting to the stack type because many compilers won’t let you cast an int into a pointer without printing a warning. The cast to long usually suppresses this warning. The third line of both macros actually does the push or pop. (I’m using a downward-growing stack, so a push is a predecrement and a pop is a postincrement.)

PP: A C Preprocessor

PP is a preprocessor that adds token pasting to non-ANSI compilers. Unfortunately, to do simple token pasting, you must build almost all of a C preprocessor, because the ## must be removed when the macro is expanded, not when it is defined. Since I had to do all the macro stuff anyway, I added support for the five predefined macros specified in the ANSI standard:

__DATE__	Current date (“Mmm dd yyyy”)
__TIME__	Current time (“hh:mm:ss”)
__STDC__	Always 0 (Should be true only if
	full ANSI C is supported)
__FILE__	Current input file name
__LINE__	Current line of current input file

Not all of the normal preprocessor directives are supported, however. I’m assuming that you’ll run the output of PP through a normal C preprocessor before it is given to the compiler. In particular, normal macro definition and expansion is supported, but #undef is not. #include is supported, but only if the file name is surrounded by quotes (as compared to angle brackets). In addition comments are replaced by white space in the output (in order to preserve line numbers and formatting). 

PP generates a few directives of its own. __DATE__, __TIME__, and __STDC__ are supported by generating #defines for them. In addition, #line directives are generated in order to make compiler error messages reference the input rather than the output file. (A #line N “file” tells the compiler to assume that we are on line N of the indicated file.) 

All other preprocessor directives are just passed through to the output without modification. This can cause problems with #ifdef and #if, because these directives are ignored by PP. The problem is solved, somewhat, by the introduction of several #pragma directives. Issuing:

#pragma pp 0

forces PP into a transparent mode in which macro definitions and expansions are not performed—they are just passed to the output.

#pragma pp 1

puts PP back in normal mode.

Two other pragmas are also supported by PP.

#pragma pm

causes PP to print a list of all the previously defined macros to standard error. A final #pragma is dangerous but occasionally useful. Issuing:

#pragma ac 0

causes PP to stop checking that the number of arguments to a macro matches the definition. It allows you to have a macro with a variable number of arguments, but at the cost of letting errors slip through undetected in macros that should take a fixed number of arguments. Unspecified arguments evaluate to empty strings. For example: 

#pragma ac 0
#define foo(a,b,c) /a/b/c/

foo() 	/* evaluates to ////	*/
foo(1)	/* evaluates to /1///	*/
foo(1,2)	/* evaluates to /1/2//	*/
foo(1,2,3)	/* evaluates to /1/2/3/	*/
foo(1,,3)	/* evaluates to /1//3/	*/
foo(,,3)	/* evaluates to ///3/	*/

The last two of these will actually work even if #pragma ac isn’t issued, but it’s not portable so be careful. Empty argument lists like: 

#define getchar() getc(stdin)

are, of course, permitted.

All macro definitions and invocations must be on a single line. Use a backslash at the end of the line if you must have a long definition: 

#define long_def xxxxxxxxxxxxxxxxxxxxxxxx\
	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\
	xxxxxxxxxxxxx

MACRO( a, b, c, 	dddddddddddddddddddddddddddd\
	ddddddddddddddddddddd e, f );

The maximum input line length, including all lines that are concatenated to the previous ones with a backslash, is 2047 characters. Note that, like many C preprocessors, PP expands macro arguments in strings. For example:

#define print( type, object ) printf("%type\n", object)

when invoked with:

print(d,x);

expands to:

printf("%d\n", x);

Recursion is not permitted in a macro definition. The preprocessor catches direct recursion immediately. It also catches this:

#define FOO BAR
#define BAR FOO

but it does it by limiting the depth of macro nesting to 16 levels. Recursion is permitted in a macro expansion, however. That is, a macro can be passed to itself as an argument. The following input: 

#define one 1
#define two 2
#define twelve one##two
#define suffix(x) twelve##x
#define cat(a,b) a##b

cat( suffix(3), cat( cat(4,5), 6) )

correctly expands to 123456

The Source Code

The entire source code for PP follows. It is commented well enough that additional comments aren’t required here. (All the code in this article is available electronically in a straight ASCII file from Software Engineering Consultants, P.O. Box 5679, Berkeley, California 94705. The cost is $29.95 by a check or money order drawn on a U.S. bank; include sales tax if shipping to California.)

About the Author

Allen Holub is a C programmer and consultant in the San Francisco Bay area. He was formerly the technical editor and C columnist for Dr. Dobbs’ Programming Journal. He teaches C and compiler design at the University of California-Berkeley Extension. His scotch is not Dewer’s.

Listing 1

push(cp_stack,x); 	/* Push x onto the stack 	*/
x = pop(cp_stack); 	/* pop an item from the stack to x 	*/
stack_full(cp_stack); 	/* true if the stack is full	 */
stack_empty(cp_stack); 	/* true if the stack is empty 	*/
stack_ele(cp_stack); 	/* number of elements on the stack	 */

Listing 2

#define stack_dcl(stack,type,size)	\
	typedef type t_##stack; 	\
	stack_cls t_##stack stack[size];	\
	stack_cls t_##stack (*p_##stack) = stack + (size)

Listing 3

#define stack_full(stack)	( (p_##stack) <= stack )
#define stack_empty(stack)	( (p_##stack) >= (stack + sizeof(stack) /	\
						sizeof(*stack)) )

#define stack_ele(stack)	((sizeof(stack)/sizeof(*stack)) - (p_##stack-stack))

#define stack_err(o)	((o) ? (fprintf(stderr,"Stack overflow\n" ),exit(1))	\
		: (fprintf(stderr,"Stack underflow\n"),exit(1))	)

#define push(stack,x)	( stack_full(stack)	\
			? ((t_##stack)(long)(stack_err(1)))	\
			: (*--p_##stack = (x))	)

#define pop(stack) 	( stack_empty(stack)	\
			? ((t_##stack)(long)(stack_err(0)))	\
			: *p_##stack++	)

PP Source Code

/*	PP.C A C preprocessor that does token pasting.
 *	(c) 1989, Allen I. Holub. All rights reserved.
 *
 *	All of the following are standard ANSI include files. If your
 *	compiler doesn’t have one of them, then it is doing something in
 *	a nonstandard way, and you’ll have to work harder to port the
 *	code, sorry.
 */

#include	<stdio.h> 	/* standard-I/O-system definitions	 */
#include	<stdlib.h> 	/* prototypes for other run-time library functs.	*/
#include	<ctype.h> 	/* isspace(), isalpha(), etc.		*/
#include	<time.h> 	/* ANSI-time-functions: struct tm & time_t 	*/
#include	<stdarg.h> 	/* ANSI-variable-argument lists, see err() 	*/
#include	<string.h> 	/* strlen() and strcmp()			*/

/*—————————————————————————————————*/

#define	ARGMAX	16 	/* maximum number of macro arguments 	*/
#define	NAMEMAX	32	/* maximum length of a macro name + 1 	*/
#define	BODYMAX	512	/* maximum number of characters in macro body 	*/
#define	MAXDEPTH	16	/* maximum macro-nesting depth	*/
#define	OLINEMAX	2048	/* max. input line length, including \ lines 	*/

/*——————————————————————————————————
 *	The symbol table used to hold macro definitions is a binary tree of
 *	the following structures:
 */

typedef struct macro
{
	char 	name [ NAMEMAX ];		/* macro name 		*/
	char 	arg [ ARGMAX ][ NAMEMAX ];	/* argument names	*/
	int	nargs;			/* number of arguments 	*/
	char 	body[ BODYMAX ];		/* body of macro	*/
	struct macro *left;			/* left child in tree 	*/
	struct macro *right;		/* right child in tree 	*/
}
macro;
macro	*Root = NULL ;		/* Root of symbol-table tree 	*/

/*—————————————————————————————————*/

int Lineno 		= 	0;	/* Input line number	*/
char *Inp_file 	= 	""; 	/* Input file name 	*/
int Check_args 	= 	1;	/* Check for correct number of macro args 	*/
int Preprocess 	= 	1;	/* Preprocess only if true. Else input is 	*/
			 		/* transparent.	*/

			 		/* IS_START_NAME(c) evaluates true if c 	*/
			 		/* can be the first character of an ident- 	*/
			 		/* ifier. ISNAME(c) is true if c can be an 	*/
			 		/* internal character in an identifier. 	*/

#define IS_START_NAME(c) (isalpha(c)	 || (c)=='_' )
#define ISNAME(c) 	 (IS_START_NAME(c) || isdigit(c) )

/*—————————————————————————————————
 *	Tokens. Input that is not part of a preprocessor directive is
 *	divided into a stream of the following tokens.
 */

#define	NAME	1 /* an identifier   	 		*/
#define	PASTE	2 /* ## 		  	*/
#define	COMMA	3 /* ,  				*/
#define 	WHITE	4 /* 1 or more white space characters except NL 	*/
#define 	NL	5 /* newline   			*/
#define 	LP	6 /* (  				*/
#define 	RP	7 /* )  				*/
#define 	OTHER	8 /* anything else	 		*/

/*—————————————————————————————————
 *	Values returned from the is_pragma() subroutine. 0 is returned if
 *	a #pragma not recognized by pp is encountered (in which case, the
 *	#pragma is passed to the output).
 */

#define	PM 1	/* #pragma pm	*/
#define	AC0 2	/* #pragma ac 0	*/
#define	AC1 3	/* #pragma ac 1	*/
#define	PP0 4	/* #pragma pp 0 	*/
#define 	PP1 5	/* #pragma pp 1 	*/

/*—————————————————————————————————
 *	Function prototypes for local and external functions.
 */

#define	PROTOTYPES		/* Remove this definition to turn function
			 	* prototypes into normal extern statements.
			 	* A definition for “void” is also provided
			 	* in this case.
			 	*/
#ifdef PROTOTYPES
#	define P(x) x
#else
#	define P(x) ()
#	define void int
#endif

/*—————————————————————————————————*/

int		generate_predefined_macros	P((void ));
int		getline		P((char *src, int n, FILE *stream));
int		do_file		P((char *file_name, FILE *fp));
int		expand_macro		P((char *body, macro *cur_mac, int nargs, \
						char **args));
int 		is_an_arg		P((char *lex, int len, char arg[][NAMEMAX],\
						int nargs));
int 		extract_args		P((char **srcp, int argc_max, char **argv));
int 		get_token		P((char **srcp, char **lexemep, int *lenp));
int 		is_include		P((char *src, char *inc_file, int maxname));
int 		is_define		P((char *src));
int 		add_macro		P((char *src));
macro 	*is_macro		P((char *name, int len));
int 		print_macs		P((FILE *stream, macro *root));
char 	*extract_name		P((char *dst, char *src));
int 		lexcmp		P((char *s1, char *s2, int s1_len));
void 	err		P((char *fmt, ... ));

char 	*skipspace		P(( char *src ));

/*————————————————————————————————*/

main( argc, argv )
char	**argv;
{
	FILE *fp;

 	fprintf(stderr,"PP: (c) 1989 Allen I. Holub.\n");
 	generate_predefined_macros();

 	++argv;
 	if( --argc == 0 )
		do_file( "stdin", stdin );
 	else
 	{
		for(; --argc >= 0 ; ++argv )
		{
	 		if( !(fp = fopen( *argv, "r" )) )
	 		{
				perror( *argv );
				exit( 1 );
	 		}
	 		else
	 		{
				do_file( *argv, fp );
				fclose ( fp );
	 		}
		}
 	}
}

/*————————————————————————————————*/

generate_predefined_macros()
{
	/* Kick out macro definitions for __TIME__, __DATE__, and
 	* __STDC__. The other ANSI predefined macros, such as __FILE__
 	* (which is likely to change), are output when macros are
 	* expanded, below. The standard ANSI/UNIX time functions are
 	* used here to get the time and date. If your compiler doesn’t
 	* support these functions, just remove this subroutine and the
 	* call to it, below.
 	*/

 	time_t	clock;
 	struct tm	*sclock;
 	static char	*strmon[] = { "Jan", "Feb", "Mar", "Apr", "May",
			"Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", };

 	time		( &clock );
 	sclock = localtime ( &clock );

 	printf("#define __TIME__ \"%02d:%02d:%02d\"\n",
				sclock->tm_hour,
				sclock->tm_min,
				sclock->tm_sec );

 	printf("#define __DATE__ \"%3s %2d %4d\"\n",
				strmon[sclock->tm_mon],
				sclock->tm_mday,
				sclock->tm_year + 1900);
 	printf("#define __STDC__ 0\n");
}

/*—————————————————————————————————*/

int getline( src, n, stream )
char *src;
int n;
FILE *stream;
{
	/* Works like fgets() but concatenates strings that end with \
 	* and returns the number of source lines that made up the
 	* combined input line (0 at EOF). Replaces all comments with the
 	* equivalent amount of white space (space characters and blank
 	* lines, as appropriate).
 	*/

 	int i;
 	char *p			= src;
 	int nlines		= 1;	/* Number of continued lines	*/
 	static int in_comment	= 0;	/* True when processing a comment. 	*/
					/* Can be active between successive 	*/
					/* calls.		*/

 	if( !fgets(p, n, stream) )	/* Get the first line 		*/
	return 0;

 	++Lineno;

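 	/* Compute i before indexing so that a very short line can't cause
 	 * a read in front of the buffer.
 	 */
 	while( (i = (int)strlen(p) - 2) > 0 && p[i] == '\\' )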
 	{
		/* get any continuation lines (those that follow lines
	 	* ending with a backslash.
	 	*/

 		if( !fgets(p += i, n -= i, stream) )
	 		break;

		++Lineno;
		++nlines;
 	}

 	for( p = src; *p ; ++p ) /* replace comments with spaces */
 	{
		if( p[0]=='/' && p[1]=='*' )	 /* start-of-comment token */
		{
	 		in_comment = 1;
	 		p[0] = p[1] = ' ';
	 		++p;
		}
		else if( p[0]=='*' && p[1]=='/' ) /* end-of-comment token */
		{
	 		in_comment = 0;
	 		p[0] = p[1] = ' ';
	 		++p;
		}
		else if( in_comment && !isspace(*p) )
		{
	 		/* Replace nonspace with ' ' if in a comment. Existing
	 		* whitespace is not modified so that newlines will be
	 		* preserved.
	 		*/

	 		*p = ' ';
		}
 	}

 	return( nlines );
}

/*----------------------------------------------------------------*/

do_file( file_name, fp )
char *file_name;			/* input file name	*/
FILE *fp;				/* FILE pointer for input file. 	*/
{
 	int 	lineno ;			/* Place to remember current line # 	*/
					/* across recursive call	*/
 	char 	inc_file[80];		/* Name of an #included file	*/
 	FILE 	*inc_fp;			/* FILE pointer for #included file 	*/
 	static	char buf[OLINEMAX];	/* Input buffer.	*/
 	int 	nlines, i;
 	static	int nskip;

	/* Process one file. Note that this routine calls itself
	* recursively to process #include statements.
	*/

 	printf( "#line 1 \"%s\"\n", file_name );

 	Lineno = 0;	 		/* Line # and file name for error msgs */
 	Inp_file = file_name ;

 	while( nlines = getline(buf, sizeof(buf), fp) )
 	{
		if( is_include( buf, inc_file, sizeof(inc_file) ) )
		{
	 		if( !(inc_fp = fopen( inc_file, "r" ) ) )
				perror( inc_file );
	 		else
	 		{
				lineno = Lineno; 	/* Preserve current input line 	*/
				 	/* number. (The file name is on 	*/
				 	/* the stack, so is preserved 	*/
				 	/* automatically).	*/

				do_file( inc_file, inc_fp );

				Lineno = lineno;
				Inp_file = file_name ;

				printf( "#line %d \"%s\"\n", Lineno+1, file_name );
				continue;
	 		}
		}

		if( (i = is_pragma(buf)) == PM )
		{
	 		fprintf	(stderr, "macros:\n");
	 		print_macs	(stderr, Root );
	 		fprintf	(stderr, "\n");
		}
		else if( i == AC0 )
	 		Check_args = 0;

		else if( i == AC1 )
	 		Check_args = 1;

		else if( i == PP0 )
	 		Preprocess = 0;

		else if( i == PP1 )
	 		Preprocess = 1;

		else if( !Preprocess )
	 		fputs( buf, stdout );	/* buf already ends in a newline, */
						/* so puts() would add a second  */

		else if( nskip = is_define( buf ) )
		{
	 		add_macro( buf + nskip );

	 		/* replace macro defintion with blank lines to
	 		* preserve line numbering.
	 		*/

	 		for( i = nlines; --nlines >= 0; printf("\n") )
				;
		}
		else
		{
		 	expand_macro( buf, NULL, 0, NULL );
	 		if( nlines != 1 )
				printf( "#line %d \"%s\"\n", Lineno+1, file_name );
		}
 	}
}

/*—————————————————————————————————*/

expand_macro( body, mac, nargs, args )
char *body;				/* string to print.				 */
macro *mac;				/* pointer to macro structure or NULL if none 	*/
int nargs;				/* number of arguments in the args[] array	*/
char **args;			/* array of pointers to strings, one for each	*/
					/* argument to the macro whose name is in “body”.	*/
{
	/* This is the main output routine. It prints the string in
	* “body”, expanding any macro invocations as they are found.
	* Since a macro can contain other macros, the expansion is done
	* with a recursive call. If the string being expanded is not,
	* itself, a macro name, then all but the first argument should
	* be NULL or 0. Otherwise, the “body” argument is assumed to be
	* the name of a macro, and the other arguments are used to access
	* the associated “macro” structure and the macro’s arguments.
	* For example, if the “body” array contains the text:
	* “foo macro_invocation(a,b,c)”, the “foo “ is just passed to
	* the output. The “macro_invocation” is recognized as a macro
	* name, so the arguments (a, b, anc c) will be loaded into the
	* first three elements of the “argv” array (below), and then
	* expand_macro will be called recursively, being passed
	* “macro_invocation” as a “body” argument, and the “argv”
	* array as the “args” argument.
	*
	* Note that __LINE__ and __FILE__ are recognized and expanded
	* here. They are not in the normal macro table.
 	*/

 	static int	depth = 0; 	/* Current recursive nesting depth	*/
 	int 	argc;	 	/* # of arguments to macro at next lev	 */
 	macro 	*macp;	 	/* pointer to macro at next level	*/
 	char 	*argv[ARGMAX+1]; 	/* arguments to macro at next level “	*/
 	int 	tok;		/* input token		*/
 	char 	*lexeme;	/* Lexeme associated with current token	 */
 	int 	len;		/* Number of characters in lexeme	*/
 	char 	*olex;	 	/* Remember lexeme for later pushback 	*/
 	int 	i;
 	char 	*p;

 	if( ++depth > MAXDEPTH )
 	{
		err("%s: Macro nesting too deep (indirect recursion?)\n",
					mac ? mac->name : "(input)" );	/* mac can be NULL */
		exit( 1 );
 	}

 	while( tok = get_token( &body, &lexeme, &len ) )
 	{
		if( tok == NAME )
		{
	 		if( mac &&
				(i=is_an_arg(lexeme, len, mac->arg, mac->nargs)) >= 0)
	 		{
				/* Must expand the argument (rather than printing it),
		 		* because it could be a macro itself.
		 		*/

				expand_macro( args[i], NULL, 0, NULL );
	 		}
	 		else if( mac && !lexcmp(lexeme, mac->name, len) )
	 		{
				/* lexeme isn't '\0' terminated, so print the name
				* stored in the macro structure instead.
				*/
				err("Illegal recursion in macro %s\n", mac->name);
	 		}
	 		else if( !lexcmp(lexeme, "__LINE__", len) )
	 		{
				printf("%d", Lineno );
	 		}
	 		else if( !lexcmp(lexeme, "__FILE__", len) )
	 		{
				printf("%s", Inp_file );
	 		}
	 		else if( !(macp = is_macro(lexeme,len)) )
	 		{
				printf("%1.*s", len, lexeme );
	 		}
	 		else
	 		{
				p = body;
				i = len;
				olex = lexeme;

				if( (tok = get_token(&body, &lexeme, &len)) == LP )
				{
		 			argc = extract_args( &body, ARGMAX +1, argv );
				}
				else
				{
		 			argc = 0; 	/* no arguments	*/
		 			*argv = NULL;
		 			body = p; 	/* push back lookahead token */
		 			len = i;
		 			lexeme = olex;
				}

				if( argc==macp->nargs || !Check_args )
		 			expand_macro( macp->body, macp, argc, argv );
				else
		 			err(“Wrong number of arguments (%d) to macro %s\n”,
					argc, macp->name);
	 		}
		}
		else if( tok != PASTE )
		{
	 		printf( "%1.*s", len, lexeme );
		}
		/* else ( tok == PASTE ) Ignore it. */
 	}

 	--depth;
}

/*—————————————————————————————————*/

int is_an_arg( lexeme, len, arg, nargs )
char *lexeme;
int len;
char arg[ ARGMAX ][ NAMEMAX ];
{
 	/* If the string "lexeme", which is "len" characters long, is one
 	* of the strings in the "arg" array (which is a list of
 	* macro-argument names), return its index in the array; otherwise
 	* return -1. Note that p (below) is a pointer to an array, so
 	* incrementing it causes the compiler to skip the ENTIRE array,
 	* not just the first element.
 	*/

 	char (*p)[ NAMEMAX ];
	for( p = arg; --nargs >= 0; ++p )
		if( !lexcmp(lexeme, (char *)p, len) )
			return( p - arg );

	return -1;
}

/*—————————————————————————————————*/

int extract_args(srcp, argc_max, argv)
char **srcp;
int argc_max;
char **argv;
{
 	/* extract macro arguments and load them into “argv”. “argc_max”
 	* is the maximum size. Up to argc_max-1 arguments are loaded,
 	* the last element in “argv” is always NULL. *srcp is advanced
 	* past the rightmost right parenthesis in the argument list,
 	* and the number of arguments (the number of elements loaded
 	* into argv) is returned (0 if there are no arguments).
 	*
 	* You should already have skipped past the leading left
 	* parentheses of the argument list before calling this
 	* subroutine.
 	*/

 int tok;			/* current token	 */
 char *lexeme;			/* current lexeme	 */
 int len;			/* current lexeme length */
 int nest_lev;			/* parentheses nesting level */
 char **argv_start = argv;	/* address of argv[0].	 */
 int rval;			/* return value.	 */

 while( (tok=get_token(srcp, &lexeme, &len)) != RP && tok != NL)
 {
	if( tok == WHITE )
		tok = get_token(srcp, &lexeme, &len);

	if( --argc_max <= 0 )
	{
		err("Too many arguments in macro definition.\n");
		break;
	}

	if( tok == COMMA || tok == NL || tok == RP )
	 	*argv++ = "";
	else
	{
	 	*argv++ = lexeme;

	 	nest_lev = 0;
	 	while( tok!=NL && !(nest_lev==0 && (tok==COMMA||tok==RP)))
	 	{
			/* While we're not at end of line, and the current
		 	* token isn't a comma that's outside of a
		 	* parenthesized list. That is, a comma found inside
		 	* parentheses, like this:
		 	* 		MACRO( printf("%d", i), x );
		 	* should not be treated as an argument separator.
		 	*/

			if( tok == LP )	 /* Take care of (..(,)...) */
				++nest_lev;

			if( tok == RP && --nest_lev < 0 )
			{
				err( "Missing ( in macro expansion\n" ); /* ) */
				break;
			}

			tok = get_token( srcp, &lexeme, &len );
	 	}

		*lexeme = '\0'; /* overwrite comma, NL, or RP with */
			 /* terminator			 */
	}

	if( tok != COMMA )
			break;
}

	rval = argv - argv_start;	/* compute number of valid args. */

	while( --argc_max > 0 )		/* all but last argv entry = "" */
		*argv++ = "";		/* in case number-of-argument */
					/* checking is turned off.	 */

 	*argv = NULL;			/* last one gets NULL 	 */

 	if( tok == NL )
		err("Macro invocation not on single line. Truncating.\n");

 	return rval;
}

/*—————————————————————————————————*/

int get_token( srcp, lexemep, lenp )
char **srcp;
char **lexemep;
int *lenp;
{
 	/* Mini lexical analyzer. Return a token from the input string
 	* (*srcp) and advance *srcp to point past the associated lexeme.
 	* 0 is returned at end of string.
 	*
 	* srcp:	on input: *srcp points at the input string.
 	*	 on output: *srcp points just past the end of the
 	*			lexeme.
 	* lexemep:	on output: *lexemep points to lexeme (not \0
 	*			terminated).
 	* lenp:	on output: *lenp = lexeme length.
 	*
 	*/

	char *start = *srcp; /* start of input string 	*/
	char *p = start; /* get rid of a level of indirection*/
	int token;	 /* token to return.			*/

	if( !*start )
		return 0;

 	else if( *p==','	)	{ token = COMMA; ++p;	}
 	else if( *p=='('	)	{ token = LP; ++p; 		}
 	else if( *p==')'	)	{ token = RP; ++p; 		}
 	else if( *p=='\n'	)	{ token = NL; ++p; 		}
 	else if( *p=='#'	)
 	{
 		if( *++p != '#' )		/* # without second # */
	 		token = OTHER;
		else
		{
	 		token = PASTE;		/* ## */
	 		++p;
		}

 	}
 	else if( isspace(*p) )		/* sequence of white space chars */
 	{
		token = WHITE;
		while( isspace(*p) )
	 	++p;
 	}
 	else if( IS_START_NAME(*p) )	/* identifier (NAME) */
 	{
		token = NAME;
		while( ISNAME(*p) )
	 	++p;
 	}
 	else				/* anything else */
 	{
		token = OTHER;
		++p;
 	}

 	*srcp = p;
 	*lexemep = start;
 	*lenp = p - start;

 	return token;
}

/*—————————————————————————————————*/

char *skipspace( p )
char *p;
{
 	/* Return a pointer to the first character following any white
 	* space at the start of p.
 	*/

 	while( isspace(*p) )
		++p;

 	return p;
}

/*—————————————————————————————————*/

int is_include( src, inc_file, maxname )
char *src;
char *inc_file;
int maxname;
{
 	/* Return true if the “src” string contains a #include directive.
 	* If so, up to the first maxname-1 characters of the
 	* associated file name are copied into "inc_file". Only
 	* current-directory ("name") forms of the #include are recognized
 	* ("#include <name>" is ignored here, so will be passed to the
 	* output unchanged).
 	*/

 	src = skipspace(src);

 	if( *src != '#' )
		return 0;

 	src = skipspace(++src);

 	if( !( src[0]=='i' && src[1]=='n' && src[2]=='c' && src[3]=='l' &&
	 	src[4]=='u' && src[5]=='d' && src[6]=='e' ))
		return 0;

 	src += 7;

 	src = skipspace(src);

 	if( *src == '<' )		/* ignore #include <...> */
 	{
		return 0;
 	}
 	else if( *src != '"' )
 	{
		err( "Syntax error in #include %s\n", src );
		return 0;
 	}
 	else
 	{
		++src;
		while( *src && *src != '"' && *src != '\n' && --maxname > 0 )
	 		*inc_file++ = *src++ ;

		*inc_file = '\0';
		return 1;
 	}
}

/*—————————————————————————————————*/

int is_define( src )
char *src;
{
 	/* Return false if "src" doesn't start with a "#define,"
 	* otherwise return the width of the #define statement (including
 	* any whitespace surrounding the “#” and the “define”).
 	*/

 	char *start = src;

 	src = skipspace(src);
 	if( *src == '#' )
 	{
		src = skipspace(++src);

		if( src[0]=='d' && src[1]=='e' && src[2]=='f' && src[3]=='i'
				 && src[4]=='n' && src[5]=='e')
		{
	 		src = skipspace(src += 6);
	 		return src - start;
		}
	}

 	return 0;
}

/*—————————————————————————————————*/

is_pragma( src )
char *src;
{
 	/* If “src” starts with a “#pragma,” return one of the pragma
 	* types defined at the top of this file:
 	*
 	*		PM 	#pragma pm
 	*		AC0	#pragma ac 0
 	*		AC1	#pragma ac 1
 	*		PP0	#pragma pp 0
 	*		PP1	#pragma pp 1
 	*
 	* or 0 if something other than “pm”, “ac”, or “pp” follows
 	* the “pragma”.
 	*/

 	char *start = src;
 	int i;

 	src = skipspace(src);
 	if( *src == '#' )
 	{
		src = skipspace(++src);

		if( src[0]=='p' && src[1]=='r' && src[2]=='a' && src[3]=='g'
				 && src[4]=='m' && src[5]=='a')
		{
	 		src = skipspace(src += 6);

	 		if( src[0]=='p' && src[1]=='m' &&
					(!src[2] || isspace(src[2])) )
	 			return PM;

			if( src[0]=='a' && src[1]=='c' &&
					(!src[2] || isspace(src[2])) )
			{
				i = atoi( skipspace(src += 2) );
				return i ? AC1 : AC0 ;
			}

	 		if( src[0]=='p' && src[1]=='p' &&
					(!src[2] || isspace(src[2])) )
	 		{
				i = atoi( skipspace(src += 2) );
				return i ? PP1 : PP0 ;
	 		}

		}
	}

 	return 0;
}

/*—————————————————————————————————*/

add_macro( src )
char *src;
{
 	/* Add a macro definition to the macro tree. “src” points at the
 	* first character in the name, not at the #define.
 	*/

 	int i;
 	char (*argp)[NAMEMAX];
 	char *p;
 	macro *mp, *root, **next;

 	if( !ISNAME( *src ) )
 	{
		err( "Illegal macro name: %s\n", src );
		return;
 	}

 	if( !(mp = calloc(1, sizeof(macro))) )
 	{
		err( “Out of macro definition space\n” );
		exit( 1 );
 	}

 	src = extract_name( mp->name, src );

 	if( *src == '(' )
 	{
		/* process any arguments, adding their names to the macro
	 	* structure
	 	*/

		++src;

		for( argp = mp->arg, i = ARGMAX; --i >= 0 ; ++argp )
		{
	 		if( ISNAME(*src) )
	 		{
				++mp->nargs;
				src = extract_name( (char *)argp, src );
				src = skipspace(src);
	 		}

	 		if( *src == ',' )
				src = skipspace( ++src);
	 		else
				break;
		
		}
		if( i < 0 )
		{
	 		err( "Too many arguments to macro %s\n", mp->name );
	 		exit( 1 );
		}
		if( *src != ')' )
		{
	 		err( "Macro %s: Malformed argument list\n", mp->name );
	 		exit( 1 );
		}
		++src;
	}

 		/* Copy the body into the macro structure.
 		*/

 	src = skipspace( src);
 	for( p = mp->body, i = BODYMAX-1; *src && *src != '\n' && i > 0 ;)
 	{
		*p++ = *src++;
		--i;
 	}
 	*p = '\0';

 	/* add the new macro to the tree
 	*/

 	if( !Root )
		Root = mp;
 	else
	{
		for( root = Root; root ;)
		{
	 		if( !(i = strcmp( mp->name, root->name )) )
	 		{
				err( "Ignoring redefinition of macro %s\n", mp->name );
				free( mp );
				break;
	 		}

	 		next = (i < 0) ? &root->left : &root->right ;
	 		if( !(root = *next) )
				*next = mp;
		}
	}
}

/*—————————————————————————————————*/

macro	*is_macro( name, len )
char	*name;
int	len;
{
 	/* If name is in the macro tree, return a pointer to it,
 	* otherwise return NULL. “name” need not be ‘\0’ terminated,
 	* “len” is the number of characters in it.
 	*/

 	macro *root;
 	int  i;
 	int  term;

 	term = name[ len ];
 	name[len] = '\0';

 	for( root = Root; root ;)
 	{
		if( !(i = strcmp( name, root->name )) )
	 		break;

		root = (i < 0) ? root->left : root->right ;
 	}

 	name[len] = term;
 	return root;	/* NULL if name is not in tree */
}

/*—————————————————————————————————*/

int lexcmp( s1, s2, s1_len )
char *s1, *s2;
int s1_len;
{
 	/* Like strcmp() but comparison of s1 stops at the “s1_len”th
 	* character
 	*/

 	int term, rval;

 	term		= s1[ s1_len ];
 	s1[s1_len]	= '\0';
 	rval		= strcmp( s1, s2 );
 	s1[s1_len]	= term;
 	return rval;
}

/*—————————————————————————————————*/

print_macs( stream, root )
FILE	*stream;
macro	*root;
{
 	/* Print out all the macros in the tree using a standard inorder
 	* traversal.
 	*/

 	char (*argp)[NAMEMAX];
 	char *p;
 	int i;

 	if( !root )
		return;

 		print_macs( stream, root->left );

 	fprintf(stream, "\t------>%s(" /*)*/, root->name );
 	for( argp = root->arg, i = ARGMAX; --i >= 0 ;)
 	{
		if( !**argp )
			break;
		else
		{
	 		fprintf(stream, "%s", (char *)argp );
	 		if( **++argp )
				fprintf(stream, ", ");
	 		else
				break;
		}

 	}
 	fprintf(stream, /*(*/ ") [%d args]\n", root->nargs );
 	fprintf(stream, "\t%s\n", root->body );

 	print_macs( stream, root->right );
}

/*—————————————————————————————————*/

char	*extract_name( dst, src )
char	*dst, *src;
{
 	/* Copy a name from src to dst. At most NAMEMAX-1 characters are
 	* copied (any excess is skipped) and dst is '\0' terminated.
 	* Return a pointer to the character following the name.
 	*/

 	int i;

 	i = NAMEMAX;
 	while( ISNAME(*src) )
 	{
 		if( --i > 0 )		/* leave room for the terminator */
	 		*dst++ = *src++;
		else
	 		++src;
 	}
 	*dst = '\0';

 	return src;
}

/*—————————————————————————————————*/

void	err( char *fmt, ... )	/* "..." requires an ANSI-style definition */
{
 	/* Print an error message. This routine works just like printf(),
 	* except that a line number and input file name are appended to
 	* the left of the string, and output goes to stderr rather than
 	* stdout. The ANSI variable-argument mechanism is used here. If
 	* your compiler doesn’t support this mechanism, you’ll have to
 	* replace all the err() calls with:
 	*
 	* fprintf( stderr, “%s, line %d: <your message goes here>”,
 	*			 Inp_file, Lineno, <your arguments> );
 	*
 	* A discussion of how to write a subroutine with a variable
 	* number of arguments is in Allen Holub, The C Companion
 	* (Englewood Cliffs: Prentice Hall, 1987) p. 213f.
 	*/

 	va_list args;

 	va_start( args, fmt );

 	fprintf ( stderr, “%s, line %d: “, Inp_file, Lineno );
 	vfprintf( stderr, fmt, args );
 	va_end( args );
}
/*————————————————————————————————*/

--End of Listing--