Apple II Basic Structure

by Steve Wozniak

Apple Computer, Inc.
(Reprinted from Dr. Dobb's Journal of Computer Calisthenics and Orthodontia, Box E, Menlo Park, Ca. 94025, Issue No. 23)

An understanding of the internal representation of a BASIC program is necessary in order to develop RENUMBER and APPEND algorithms. Fig. 1 illustrates the significant pointers for a program in memory. Variable and symbol table assignment begins at the location whose address is contained in the pointer LOMEM ($4A and $4B, where ‘$’ stands for hex). This is $800 (2048) on the Apple II unless changed by the user with the LOMEM: command. A second pointer, PV (Variable Pointer, at $CC and $CD), contains the address of the location immediately following the last location allocated to variables. PV is equal to LOMEM if no variables are actively assigned, as is the case after a NEW, CLR, or LOMEM: command. As variables are assigned, PV increases.
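The pointer arithmetic above can be tried out on a memory image. The following is a minimal Python sketch, not part of the original article: it assumes a 64K memory dump held in a bytearray and assumes each pointer is stored low-order byte first, as is usual on the 6502. The names read_pointer and variable_bytes_in_use are illustrative only.

```python
LOMEM_ADDR = 0x4A  # LOMEM pointer at $4A and $4B
PV_ADDR = 0xCC     # PV (Variable Pointer) at $CC and $CD

def read_pointer(memory, addr):
    """Return the 16-bit address held at addr (low byte) and addr + 1 (high byte)."""
    return memory[addr] | (memory[addr + 1] << 8)

def variable_bytes_in_use(memory):
    """Bytes currently allocated to variables: PV minus LOMEM."""
    return read_pointer(memory, PV_ADDR) - read_pointer(memory, LOMEM_ADDR)

# Hypothetical memory image in the state left by NEW: PV equals LOMEM.
memory = bytearray(0x10000)
memory[0x4A:0x4C] = bytes([0x00, 0x08])  # LOMEM = $0800
memory[0xCC:0xCE] = bytes([0x00, 0x08])  # PV = $0800, no variables assigned
```

With this image, read_pointer(memory, LOMEM_ADDR) gives $800 (2048) and variable_bytes_in_use(memory) gives 0.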

The BASIC program is stored beginning with the lowest numbered line at the location whose address is contained in the pointer PP (Program Pointer, at $CA and $CB). The pointer HIMEM ($4C and $4D) contains the address of the location immediately following the last byte of the last line of the program. This is normally the top of memory unless changed by the user with the HIMEM: command. As the program grows, PP decreases. PP is equal to HIMEM if there is no program in memory. Adequate checks in BASIC ensure that PV never exceeds PP. In essence, variables and programs are not permitted to overlap.
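The relationships among the four pointers can be sketched as follows. This is an illustrative Python sketch under the same assumptions as before (a 64K memory image, pointers stored low-order byte first); the function names and the example addresses are hypothetical and not taken from the BASIC itself.

```python
LOMEM_ADDR, HIMEM_ADDR = 0x4A, 0x4C  # LOMEM at $4A/$4B, HIMEM at $4C/$4D
PP_ADDR, PV_ADDR = 0xCA, 0xCC        # PP at $CA/$CB, PV at $CC/$CD

def read_pointer(memory, addr):
    """Return the 16-bit address held at addr (low byte) and addr + 1 (high byte)."""
    return memory[addr] | (memory[addr + 1] << 8)

def write_pointer(memory, addr, value):
    """Store a 16-bit address at addr, low-order byte first."""
    memory[addr] = value & 0xFF
    memory[addr + 1] = value >> 8

def memory_report(memory):
    """Summarize how memory is split between variables, free space, and program."""
    lomem = read_pointer(memory, LOMEM_ADDR)
    pv = read_pointer(memory, PV_ADDR)
    pp = read_pointer(memory, PP_ADDR)
    himem = read_pointer(memory, HIMEM_ADDR)
    if pv > pp:
        # The article states that BASIC never lets this happen.
        raise ValueError("variables overlap the program")
    return {
        "variable_bytes": pv - lomem,  # grows upward from LOMEM
        "free_bytes": pp - pv,         # gap between variables and program
        "program_bytes": himem - pp,   # grows downward from HIMEM
    }

# Hypothetical example: HIMEM at $4000, a 16-byte program, 16 bytes of variables.
memory = bytearray(0x10000)
write_pointer(memory, LOMEM_ADDR, 0x0800)
write_pointer(memory, PV_ADDR, 0x0810)
write_pointer(memory, PP_ADDR, 0x3FF0)
write_pointer(memory, HIMEM_ADDR, 0x4000)
report = memory_report(memory)
```

An empty program shows up as program_bytes equal to 0, since PP then equals HIMEM.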

Lines of a BASIC program are not stored as they were originally entered (in ASCII) on the Apple II, due to a pre-translation stage. Internally, each line begins with a length byte which may serve as a link to the next line. The length byte is immediately followed by a two-byte line number stored in binary, low-order byte first. Line numbers range from 0 to 32767. The line number is followed by items of various types, the last of which is an ‘end-of-line’ token ($01). Refer to Figure 2.
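A RENUMBER or APPEND routine needs to step from line to line using the length byte. The Python sketch below shows one way to do that. It assumes the length byte counts the whole line, including the length byte itself and the end-of-line token, so that adding it to a line's address gives the address of the next line. The name iter_lines and the sample bytes are illustrative.

```python
END_OF_LINE = 0x01

def iter_lines(program):
    """Yield (line_number, items) for each stored line.

    program holds the bytes from PP up to, but not including, HIMEM.
    items excludes the length byte, the line number, and the end-of-line token.
    """
    offset = 0
    while offset < len(program):
        length = program[offset]
        # Shortest possible line: length byte, two line-number bytes, end token.
        if length < 4 or offset + length > len(program):
            raise ValueError(f"bad length byte at offset {offset}")
        line = program[offset:offset + length]
        if line[-1] != END_OF_LINE:
            raise ValueError(f"missing end-of-line token at offset {offset}")
        number = line[1] | (line[2] << 8)  # low-order byte first
        yield number, bytes(line[3:-1])
        offset += length                   # the length byte links to the next line

# Two sample lines: "10 REM HI" and "20 REM" (REM token $5D, text with high bits set).
program = bytes([0x07, 0x0A, 0x00, 0x5D, 0xC8, 0xC9, 0x01,
                 0x05, 0x14, 0x00, 0x5D, 0x01])
lines = list(iter_lines(program))
```

Here lines holds the line numbers 10 and 20 together with their item bytes.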

Single bytes of value less than $80 (128) are ‘tokens’ generated by the translator. Each token stands for a fixed unit of text as required by the syntax of the language BASIC. Some stand for keywords such as PRINT or THEN, while others stand for punctuation or operators such as ‘,’ or ‘+’. Integer constants are stored as three consecutive bytes. The first contains $B0-$B9 (ASCII ‘0’-‘9’), signifying that the next two contain a binary constant stored low-order byte first. The line number itself is not preceded by $B0-$B9. All constants are in this form, including line number references such as 500 in the statement GOTO 500. Constants are always followed by a token. Although one or both bytes of a constant may be positive (less than $80), they are not tokens.
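The three-byte constant form can be decoded as in the following Python sketch. The function name read_constant is illustrative, and the marker byte $B5 in the example is simply one value in the $B0-$B9 range chosen for the demonstration.

```python
def read_constant(items, i):
    """Decode the integer constant starting at items[i].

    Returns (value, next_index). The byte at items[i] must be in $B0-$B9;
    the two bytes after it hold the value, low-order byte first.
    """
    marker = items[i]
    if not 0xB0 <= marker <= 0xB9:
        raise ValueError("not the start of an integer constant")
    value = items[i + 1] | (items[i + 2] << 8)
    return value, i + 3

# The constant 500 is $01F4, stored as $F4 then $01.
# Its high byte happens to equal the end-of-line token value, yet it is data,
# not a token, because it sits inside the three-byte constant.
value, next_index = read_constant(bytes([0xB5, 0xF4, 0x01]), 0)
```

This is why a scanner such as RENUMBER has to skip both data bytes after a $B0-$B9 marker rather than inspect them as tokens.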

Variable names are stored as consecutive ASCII characters with the high-order bit set. The first character is between $C1 and $DA (ASCII ‘A’-‘Z’), distinguishing names from constants. All names are terminated by a token, which is recognizable by a clear high-order bit. The ‘$’ in string names such as A$ is treated as a token.
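Reading a name back out of a stored line follows directly from these rules. The Python sketch below is illustrative only (read_name is a hypothetical helper): it checks the first byte against $C1-$DA and then collects bytes until one with a clear high-order bit appears.

```python
def read_name(items, i):
    """Read the variable name starting at items[i].

    Returns (name, next_index), where next_index points at the terminating
    token (the first byte with a clear high-order bit).
    """
    if not 0xC1 <= items[i] <= 0xDA:
        raise ValueError("a name must start with a letter, $C1-$DA")
    end = i
    while end < len(items) and items[end] & 0x80:
        end += 1
    # Clear the high-order bit of each byte to recover plain ASCII.
    name = bytes(b & 0x7F for b in items[i:end]).decode("ascii")
    return name, end

# "COUNT" with high bits set, terminated here by the end-of-line token $01.
name, next_index = read_name(bytes([0xC3, 0xCF, 0xD5, 0xCE, 0xD4, 0x01]), 0)
```

Because the first byte of a constant is $B0-$B9 and the first byte of a name is $C1-$DA, the two cases cannot be confused.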

String constants are stored as a token of value $28 followed by ASCII text (with high-order bits set) followed by a token of value $29. REM statements begin with the REM token ($5D) followed by ASCII text (with high-order bits set) followed by the ‘end-of-line’ token.
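Both text forms can be recovered with a search for the closing token, since every text byte has its high-order bit set and so cannot be mistaken for $29 or $01. The following Python sketch illustrates this; the helper names are hypothetical.

```python
STRING_OPEN = 0x28   # token that opens a string constant
STRING_CLOSE = 0x29  # token that closes a string constant
REM = 0x5D           # REM token
END_OF_LINE = 0x01   # end-of-line token

def strip_high_bits(data):
    """Clear the high-order bit of each byte and return plain ASCII text."""
    return bytes(b & 0x7F for b in data).decode("ascii")

def read_string(items, i):
    """Read the string constant whose opening token is at items[i].

    Returns (text, next_index), where next_index is just past the $29 token.
    """
    if items[i] != STRING_OPEN:
        raise ValueError("expected the $28 token")
    end = items.index(STRING_CLOSE, i + 1)
    return strip_high_bits(items[i + 1:end]), end + 1

def read_rem(items, i):
    """Read the remark whose REM token is at items[i].

    Returns (text, index_of_end_of_line_token).
    """
    if items[i] != REM:
        raise ValueError("expected the REM token")
    end = items.index(END_OF_LINE, i + 1)
    return strip_high_bits(items[i + 1:end]), end

text, after = read_string(bytes([0x28, 0xC8, 0xC9, 0x29, 0x01]), 0)
remark, eol = read_rem(bytes([0x5D, 0xC8, 0xC9, 0x01]), 0)
```

In this example both text and remark come back as "HI".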
