Bright Star’s Talking Heads – Behind the Scenes with HyperAnimator

Elon Gaspar
President of Bright Star Technology, and developer of HyperAnimator

Joseph Matthews
HyperAnimator Programmer

The future evolution of the Macintosh interface may well include something Apple calls an “anthropomorphic agent” — a software-based lifeform that would reside in your computer, talking and listening to you, acting for you at your behest. Such an agent would be a kind of artificially intelligent alter ego, a software version of yourself that would know something about your style, your passions, your work habits, that could guess the kind of information you need to complete a project and respond intelligently to your desires almost before you’ve expressed them. HyperAnimator, from Bright Star Technology, is a first attempt to create the interface for such an agent.
— Editor

While the Knowledge Navigator is still a dream of the future, HyperCard is reality and has already made a significant impact in the computing community. Bright Star Technology’s development tool, HyperAnimator, is a HyperCard product that supplements your creation of an “agent” interface.

Early applications of computer-augmented text editing and storage were limited to the linear organizational methods of paper-based systems. Hypertext can free the user from having to use words only in a sequential manner. Animation technology has followed a similar path: from celluloid strips to videotape and on to desktop presentation software, sequential methods have prevailed. HyperAnimation is the first general-purpose system for random access and display of images on a frame-by-frame basis, a system that is organized and synchronized with sounds. Like hypertext, HyperAnimation technology enables its users to transcend the limitations of linearity to create a whole new realm of possibilities.

HyperAnimation synchronizes animation and sound, its two main components. They are produced automatically and in real-time. (See Figure 1.)

Images ∆

Each synthetic actor is made up of sixteen images drawn using HyperCard’s paint tools, digitizations of live subjects, videotape recordings, or claymation-type sculptures. Of the sixteen images, eight are devoted to speaking and eight to animated expressions. While the technology can handle a virtually infinite number of images, sixteen enables use on systems with one megabyte of memory. Actors with sixteen images use a relatively small amount of storage and are easy to create and modify.

The eight speaking images correspond to distinct speech articulations and create acceptably realistic synthetic speaking actors. The other eight images allow the actors to display life-like expressions. Smiles, frowns, and head turns can all be incorporated into the actor’s appearance.

One of the real powers behind HyperAnimation is its ability to display any image at any time. This is necessary if your actor is required to speak any word. Suppose you created an actor to say “New York” using sequential animation techniques (frame by frame, as in a Disney movie). While the actor might appear to say “New York” properly, the sequence that you just created could never say “California.” A separate sequence for “California” would have to be created. Within HyperAnimation, once the images have been created, the animation is automatically presented and synchronized with sound, enabling your actor to say anything you wish.

Sound ∆

Sound is the second key component of HyperAnimation technology. In the Macintosh environment, there are two approaches to sound: synthetic speech and digitized recording. HyperAnimation works with both.

Speech synthesizers such as Apple’s Macintalk offer unlimited vocabulary at a very low cost of memory. When using Macintalk with an actor, simply type in the text to be processed: the animation and sound synchronization are automatic. The text is first broken down into its phonetic components. From there, the sound corresponding to each phoneme is generated through the speaker, and the image corresponding to that same phoneme is simultaneously presented on the computer screen. The only drawback with speech synthesizers is that the sound produced is of low quality and has limited expressiveness.
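The steps just described (text to phonemes, then one sound and one image per phoneme) can be pictured with a small sketch. It is written in Python purely for illustration; it is neither HyperTalk nor the RAVE driver's code. The phoneme spellings, the tiny pronunciation table, and the image numbering are all invented for the example.

```python
# Toy illustration of phoneme-driven lip sync. All tables are hypothetical.

# Hypothetical pronunciation table: word -> list of phoneme symbols.
PRONUNCIATIONS = {
    "new": ["N", "UW"],
    "york": ["Y", "AO", "R", "K"],
    "wax": ["W", "AE", "K", "S"],
}

# Hypothetical mapping from phoneme to one of eight speaking images (0-7).
# Several phonemes share an image because their mouth shapes look alike.
PHONEME_TO_IMAGE = {
    "N": 1, "K": 1, "S": 1,
    "R": 2, "Y": 2,
    "UW": 3, "W": 3,
    "AO": 4,
    "AE": 5,
}

REST_IMAGE = 0  # mouth closed, shown once the speech is finished


def phonemes_for(text):
    """Break text into phonemes, word by word, using the toy table."""
    result = []
    for word in text.lower().split():
        result.extend(PRONUNCIATIONS[word])
    return result


def sync_events(text):
    """Pair every phoneme with the image to show while it sounds."""
    return [(p, PHONEME_TO_IMAGE[p]) for p in phonemes_for(text)]


def speak(text, play_sound, show_image):
    """Drive sound and picture from the same phoneme list."""
    for phoneme, image in sync_events(text):
        show_image(image)    # picture for this phoneme
        play_sound(phoneme)  # sound for the same phoneme
    show_image(REST_IMAGE)
```

Because the images are looked up per phoneme rather than stored as a fixed sequence, teaching this toy actor a new word means adding a table entry, not drawing new frames.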

Digitization is the recording of actual sound as numbers, so that it can be used on a computer. The Macintosh has virtually pioneered the use of digitized sound on personal computers. Sounds can be recorded on the Mac with digitizers such as MacRecorder from Farallon. With HyperAnimator, actors can speak with any digitized sound. To accomplish this, a synchronization lab was created for HyperAnimator and will be discussed in more detail later in this article. With digitized sound, your computer actor can speak with John Wayne’s voice, a dog’s growl, or even your own voice. The major drawback with digitized sound, however, is that it takes up more memory and disk space.
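A rough calculation shows the scale of the memory difference. The sampling figures below (about 22,000 eight-bit samples per second, one channel, no compression) are assumptions chosen as plausible for Macintosh digitizers of the period; they are not taken from HyperAnimator or MacRecorder documentation.

```python
def digitized_bytes(seconds, samples_per_second=22_000, bytes_per_sample=1):
    """Storage for an uncompressed mono recording (assumed format)."""
    return int(seconds * samples_per_second * bytes_per_sample)


def text_bytes(sentence):
    """Storage for the same sentence as plain text for a synthesizer."""
    return len(sentence.encode("ascii"))


sentence = "Welcome to the Dressing Room."
print(text_bytes(sentence), "bytes as text")
print(digitized_bytes(3), "bytes as a three-second recording")
```

Under these assumptions, three seconds of recorded speech occupies 66,000 bytes, while the same sentence typed for a synthesizer is 29 characters.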

Because of the features and drawbacks of the two types of computer speech, it is important that actors be able to work with both. Users often prototype stacks using Macintalk and later replace the Macintalk with digitized recordings. The animation and sound synchronization that HyperAnimator provides is automatic and easy to use.

Creating an Interface ∆

Like the technology itself, the interface to HyperAnimator had to be new and exciting. Making it easy to use was a primary guideline during its creation. (See Figure 2.)

Because the central focus of HyperAnimator was synthetic talking faces, it seemed only natural to introduce a stage or acting metaphor into HyperAnimator. It’s interesting to note that a movie or cinema metaphor fails because it lends itself better to a sequential type of animation. The metaphor was also important because HyperAnimator was a new idea to be conveyed to users.

With the basic functionality of the stack described, we were able to concentrate on adding detail to each area. The primary function of the stack was the creation of the animated images that make up actors. We needed a place to test and save these actors. Another important feature was the ability to synchronize actors with digitized sound. We also wanted to offer an easy method for moving both sounds and actors between stacks.

Figure 2: HyperAnimator Stack Design Layout

The Dressing Room ∆

The first place to apply our new theme was the Dressing Room, where actors in HyperAnimator are created. (See Figure 3.) The Dressing Room is the most important part of HyperAnimator because users and actors spend most of their time there. It contains 16 cards: one for each image of an actor. Sixteen buttons are provided to navigate among these cards, and each button contains the name of the image it represents. In HyperAnimator, an actor contains eight images for speaking and eight images for expressions. The names of the speaking images are the actual phonemes that the image represents. For example, one image is called “W” and corresponds to the sound “wuh” as in “wax,” the sound created by pursing your lips together. Expressions are numbered from A1 to A8. The Dressing Room buttons are useful because they reinforce the parts that make up an actor and how one is created.

Within the Dressing Room, the images of the actor have to be placed in a common area. This area is a 128 by 160 pixel rectangle. A simple rectangle could have been used to mark off the area, but it would have been too dull. Instead, we introduced a painting easel. People understand what the easel represents and how to use it, but the easel has a minor flaw: it doesn’t fit in with the acting metaphor. (Nobody’s perfect.)
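The size of the easel area makes it easy to estimate what an actor costs in memory. The sketch below assumes one bit per pixel (HyperCard cards are black and white) and no compression. How HyperAnimator actually stores an actor's images is not described here, so the totals are only an estimate.

```python
WIDTH, HEIGHT = 128, 160   # easel area in pixels, from the text
IMAGES_PER_ACTOR = 16      # eight speaking images plus eight expressions
BITS_PER_PIXEL = 1         # assumption: black-and-white bitmap


def image_bytes(width=WIDTH, height=HEIGHT, bits_per_pixel=BITS_PER_PIXEL):
    """Bytes needed for one uncompressed image."""
    return width * height * bits_per_pixel // 8


def actor_bytes(images=IMAGES_PER_ACTOR):
    """Bytes needed for all of an actor's images."""
    return images * image_bytes()


print(image_bytes(), "bytes per image")
print(actor_bytes(), "bytes per actor")
```

On these assumptions each image takes 2,560 bytes and a sixteen-image actor takes 40,960 bytes (40K), a small fraction of a one-megabyte machine.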

One of the advantages of working in HyperCard is that we can use the existing paint tools for the creation of actors. Not only did this save us time (we didn’t have to create our own tools), but HyperAnimator is also able to use the high-quality tools created by Bill Atkinson himself. The only drawback with paint tools is that some people don’t like to draw. For these people, a Face Clip Art section is accessible through the Dressing Room.

The Face Clip Art ∆

The Face Clip Art section contains just about everything you need to create a synthetic person. (See Figure 4.) Eyes, ears, noses, and even mouths for all of the lip positions are included within the clip art. The original drawback with clip art was that it was slightly cumbersome to use. First you select the lasso tool. Then select the art. Copy the picture. Select the browse tool. Click on a button to return to the Dressing Room. Paste the picture. Move the picture. And finally choose the browse tool to complete the process.

Figure 4: Face Clip Art — the eye, ear, nose, and mouth depository.

To make this procedure a lot easier, a script was created that automatically copies the art when it is clicked on and pastes it into the Dressing Room at the location where the user indicates. The script relies on hidden fields large enough to include the piece of art. The coordinates of the field are then used for copying the clip art. The automatic clip art copying procedure is a good example of how powerful scripts can become.
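The idea of using hidden fields as invisible frames around each piece of art can be sketched as follows. The sketch is in Python rather than HyperTalk, and the field names and rectangle values are hypothetical. It shows only the lookup step and the placement arithmetic, not the copy and paste themselves.

```python
# Each hidden field is represented by its rectangle: (left, top, right, bottom).
# Names and coordinates are made up for the example.
CLIP_ART_FIELDS = {
    "left eye": (20, 30, 60, 50),
    "nose": (70, 40, 95, 80),
    "mouth W": (20, 100, 80, 130),
}


def contains(rect, point):
    """True if the point lies inside the rectangle."""
    left, top, right, bottom = rect
    x, y = point
    return left <= x < right and top <= y < bottom


def field_at(point, fields=CLIP_ART_FIELDS):
    """Find the hidden field under a click; its rectangle frames the art."""
    for name, rect in fields.items():
        if contains(rect, point):
            return name, rect
    return None


def paste_rect(rect, drop_point):
    """Rectangle of the same size with its top-left corner at drop_point."""
    left, top, right, bottom = rect
    x, y = drop_point
    return (x, y, x + (right - left), y + (bottom - top))
```

A click at (75, 50) falls inside the hypothetical "nose" field, so that field's rectangle is the area to copy. The same width and height are then reused at the point the user picks in the Dressing Room.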

XCMDs ∆

Scripts can be powerful, but you can’t do everything with HyperTalk and HyperCard. For this reason, various XCMDs had to be developed to actually create the actors from the Dressing Room.

One such XCMD was ScrapHandle, used in HyperAnimator to help take images off a card and place them into the actor’s resource file. With HyperCard’s paint tools, we can copy an area of a card and place the image into the clipboard. ScrapHandle is an assembly language XCMD that uses Toolbox calls to remember each image’s position on the application heap and then resets the clipboard. It passes the hex address of that memory location back to HyperCard as an ASCII string. The script copies the string into a variable where it is saved and later passed to the RAVE driver. Because of ScrapHandle, the driver knows where the images are in memory and can easily create the resource file which is the actor.
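Passing an address as text is simple to picture. XCMDs exchange values with HyperCard as strings, so a number such as a memory address has to be written out as characters and parsed again on the other side. The sketch below shows that round trip in Python. The eight-digit format and the sample address are assumptions for illustration, not details of ScrapHandle.

```python
def address_to_text(address):
    """Format a 32-bit address as eight uppercase hex digits."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address does not fit in 32 bits")
    return format(address, "08X")


def text_to_address(text):
    """Parse the hex string back into a number on the receiving side."""
    return int(text, 16)


saved = address_to_text(0x0001A2F4)   # what a script would hold in a variable
print(saved)
print(text_to_address(saved) == 0x0001A2F4)
```

The script never needs to interpret the string. It only stores it and hands it on, which is why a plain text container is enough.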

Other XCMDs similar to ScrapHandle are incorporated in HyperAnimator. Each allows for the transfer of information between HyperCard and the RAVE driver.

Another use for XCMDs was modal dialog boxes. An interesting design question was where to put the interface for selecting stacks and actors so that the boundary between HyperCard and the Macintosh Operating System would be seamless. Although that interface could have been adapted and placed within cards in HyperCard, it would not have been a smooth integration. The dialog boxes allow users to select stacks and actors as easily as they would files in any Macintosh application.

The Stage ∆

With the creation of these XCMDs, all that is required of the user is to click a single button to build an actor. HyperAnimator goes through each of the 16 images and builds the actor into memory. Once the actor is built, the user is taken to a Stage where the lip-sync animation of the actor can be observed.

Figure 5: The Stage is where you take a look at the animation you’ve created.

The Stage contains a field where the user can type in any sentence and watch the actor speak. The process uses Macintalk to pronounce the sentence and the animation is automatic. If the actor needs some work, the user can return to the Dressing Room for some touch-up. If the actor looks terrific, it can be saved to a stack for use.

The Casting Call ∆

Copying, Editing, and Deleting actors from stacks takes place in HyperAnimator’s Casting Call.

Figure 6: Copy, Edit, and Delete actors in HyperAnimator’s Casting Call.

Actors are manipulated like documents in any Macintosh application. Casting Call continues the acting metaphor by showing a little stagehand at the side of a stage. He’s responsible for getting actors on and off the stage. The clipboard the stagehand holds is the same clipboard the user sees on screen, and it carries the three buttons for manipulating actors: Copy Actor, Place Actor in Dressing Room, and Delete Actor.

The Sound Booth ∆

Because digitized sounds are an integral part of HyperAnimation, a feature for copying and deleting sound resources is also included. The Sound Booth parallels Casting Call, using the same smooth integration to manipulate digitized sound. Digitized sound resources are used in HyperAnimator’s Speech Sync Lab.

Copying and deleting sounds requires special XCMDs because we wanted to use standard Macintosh features. It was important to seamlessly integrate dialog boxes and HyperCard. The power of XCMDs allowed us to accomplish this goal.

The Speech Sync Lab ∆

Digitized sound is important in HyperAnimator because the technology is terrific at synchronizing sound with animation. While synchronization to digitized sound is not as automatic as with Macintalk, the process has been made nearly so. We envisioned a little laboratory where digitized sound resources could be synchronized.

The Speech Sync Lab examines the sound and automatically creates a phonetic string used to create the animation and sound synchronization. The end result of the lab is a RECITE command that tells the RAVE driver the correct sound resource to use, and the phonetic string and associated timing values that produce the animation. To produce this value string, the user need only enter the representative text for the recorded sound and click three buttons.

First, the Speech Sync Lab converts the representative text into phonetics for the user to see. Second, the user selects a sound resource, and the RAVE driver compares the text with the sound file and creates an approximate RECITE command. Finally, clicking on the third button shows the sound-synced animation produced by the RECITE command.

Because the sync process is not 100 percent accurate, the user can modify the RECITE command manually. Once the RECITE command is finished, it can be copied and pasted directly into any stack’s script. The elegant design of the Speech Sync Lab is a tribute to HyperCard and the design possibilities it offers. The Speech Sync Lab is truly a new advance in sound/animation technology.
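How the RAVE driver arrives at its approximate timings is not described here. As a stand-in, the sketch below uses the crudest possible rule: divide the length of the recording evenly among the phonemes. The phoneme spellings, the use of sixtieths of a second as the timing unit, and the output format are all assumptions for illustration. This is not RECITE syntax.

```python
def even_timings(phonemes, duration_seconds, ticks_per_second=60):
    """Give each phoneme an equal share of the recording, in ticks.

    A naive first guess only; ticks of 1/60 second are an assumed unit.
    """
    count = len(phonemes)
    if count == 0:
        return []
    total = round(duration_seconds * ticks_per_second)
    base, extra = divmod(total, count)
    # Hand the leftover ticks to the first `extra` phonemes so the
    # timings add up exactly to the length of the recording.
    return [(p, base + (1 if i < extra else 0))
            for i, p in enumerate(phonemes)]


print(even_timings(["HH", "EH", "L", "OW"], 1.0))
```

A first guess of this kind is wrong wherever the speaker lingers on a vowel or pauses between words, which is the sort of error that hand adjustment of the timing values is there to correct.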

Tying it All Together ∆

With all the features implemented in HyperCard via art, scripts, and XCMDs, the final task was tying them together with a central menu screen. The Menu Screen is very simple. It contains six buttons for easy navigation to HyperAnimator’s features. At the center of the screen is HyperAnimator’s Navigator. The Navigator serves as both a guide in HyperAnimator and an example of how synthetic agents can be used to enhance applications in HyperCard.

Throughout the design of the stack, special considerations were made for HyperCard and its world. We didn’t want to lock out the scripts of HyperAnimator because they themselves were examples of RAVE scripting. At the same time, though, we had to protect users from accidentally modifying elements of the stack. For instance, in the Dressing Room the user needs to have access to the paint tools, or user level four. In Speech Sync, the user needs access to the Edit menu item: user level three. These are the only two sections in HyperAnimator that involve the menu bar and its features. The rest of the time, we want to minimize the user level so that accidents do not occur. For those individuals who want to modify the scripts, we placed global variables that represent the different user level settings. If a user wants HyperAnimator to be at user level five throughout its execution, she can change the variables to five in the stack script.
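The arrangement can be sketched as a small table of levels with a single override. The sketch is in Python rather than HyperTalk, and the names are hypothetical. The level numbers for the two sections are the ones given above, and the default of one for everywhere else is an assumption.

```python
# Levels needed by the two sections that use the menu bar (from the text).
SECTION_LEVELS = {
    "Dressing Room": 4,
    "Speech Sync Lab": 3,
}
DEFAULT_LEVEL = 1      # assumption: lowest level everywhere else
OVERRIDE_LEVEL = None  # set to 5 to stay at one level throughout


def user_level_for(section, override=OVERRIDE_LEVEL):
    """Pick the user level to apply when the user enters a section."""
    if override is not None:
        return override
    return SECTION_LEVELS.get(section, DEFAULT_LEVEL)


print(user_level_for("Dressing Room"))
print(user_level_for("Stage"))
print(user_level_for("Stage", override=5))
```

Keeping the numbers in one place means a scripter changes a single value instead of hunting through every card's script.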

The RAVE Driver and Its Scripting Language ∆

The magic that gives life to actors in HyperAnimator is contained in the RAVE driver. Placed in the system folder, the driver is responsible for animation and sound synchronization. It is accessed by HyperCard through its own XCMD.

Once an actor is created, it is controlled in a stack by scripts through the RAVE scripting language. The RAVE XCMD passes information between the scripts and the RAVE driver. Fifteen separate commands let scripts open, close, move, hide, and show actors and make them speak. All of the on-screen animation is controlled by scripts in HyperCard through the RAVE language.

The RAVE scripting language was designed to work seamlessly with existing HyperTalk objects. Containers, such as the message box and variables, can be attached to RAVE commands. HyperTalk is by nature a simple programming language. The scripting commands for RAVE are simple as well.

Example Script ∆

It’s easy to create scripts that use the RAVE scripting language. Let’s create a button that opens an actor and a field that has a script that makes the actor say whatever is in the field. Although this example is fairly simple, much more complicated scripts can be created taking advantage of the entire RAVE scripting language. The button script would look like this:

on mouseUp
  RAVE "{ACTOR ELON at 192,36}"
  RAVE "{SHOW ELON}"
end mouseUp

The button’s script loads the actor ELON into memory and specifies that he’s to be placed at the left, top coordinate: 192,36. The second line uses the SHOW command to make the actor visible. The field script would look like this:

on returnInField
  RAVE me
end returnInField

Notice that the field’s script is very short and simple. The reserved word me refers to the information contained in our field. The script line, RAVE me, will pass this information to the RAVE driver and the actor ELON will automatically pronounce it.

Click on the button — the actor ELON pops onto the screen. Type a sentence into the field, hit RETURN, and the actor speaks the field’s text using Macintalk.

That’s all there is to it!
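Each RAVE command in the button script is a string wrapped in braces, while the field script passes plain text to be spoken. To make that structure explicit, the sketch below pulls apart the two commands used above. It is written in Python and handles only these two forms. The parsing rules are inferred from the example and are not a description of the RAVE language.

```python
import re

# Patterns inferred from the two example commands only.
ACTOR_PATTERN = re.compile(r"^\{ACTOR (\w+) at (\d+),(\d+)\}$")
SHOW_PATTERN = re.compile(r"^\{SHOW (\w+)\}$")


def parse_rave(text):
    """Classify a string passed to RAVE (illustrative, not exhaustive)."""
    match = ACTOR_PATTERN.match(text)
    if match:
        return {
            "kind": "ACTOR",
            "actor": match.group(1),
            "left": int(match.group(2)),
            "top": int(match.group(3)),
        }
    match = SHOW_PATTERN.match(text)
    if match:
        return {"kind": "SHOW", "actor": match.group(1)}
    # Anything else is treated here as text for the actor to speak.
    return {"kind": "text", "text": text}


print(parse_rave("{ACTOR ELON at 192,36}"))
print(parse_rave("{SHOW ELON}"))
print(parse_rave("Hello from the field."))
```

The "text" label is this sketch's own name for the fallback case, not a RAVE command.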

HyperAnimator Applied ∆

The success of any tool can be measured directly from the accomplishments of its users. Already, many individuals, businesses, and universities are utilizing actors to address education, demonstration, and training needs.

To date, the most visible of these applications is Albert, the co-star of Disney’s remake of the Absent-Minded Professor (see the picture of Albert on this page). Albert was created by Harry Anderson and Jay Johnson, using HyperAnimator. Harry Anderson plays the part of the Absent-Minded Professor in the show. Albert serves as his sidekick and native guide to a mythical database reminiscent of Apple’s Knowledge Navigator. The animation behind Albert on the Walt Disney series is controlled by HyperAnimator’s RAVE driver. Everything that is shown on the show exists today: the Macintosh, HyperCard, and HyperAnimator (OK, so Flubber isn’t exactly real).

Codex, a division of Motorola, commissioned a training stack that trains new employees using interactive talking agents in simulated office scenarios. The agents add to the realism of the role-playing scenarios and, as a result, enhance the stack’s effectiveness and make learning fun.

Now and The Future ∆

Agents can provide a natural interface between people and computers. A young child smiles as her favorite cartoon character teaches her reading skills on a computer. A worker with minimal reading skills learns from a computer about how to use new equipment on the job without having to read a manual. A shopper at the mall is taken through an on-line database of sales and specials by a sales agent. All of this is made possible through the human interaction provided by agents and HyperAnimator.

While Apple’s Knowledge Navigator is a vision of the future, HyperAnimator is a technology of the present. The ability to see and use a bit of this vision in your own HyperCard creations is both exciting and enticing. The successful applications already accomplished using HyperAnimator lend credibility to synthetic agents and to John Sculley’s prophetic vision.

The future is now. It’s time to weave together existing multimedia technologies (speech recognition, vast storage media such as CD-ROM and videodisc players, increased CPU speed and power, and HyperAnimation) and create a truly user-friendly interface, one that brings the power of the microcomputer to all people, equally.

About the Authors ∆

Elon Gaspar is the president of Bright Star Technology. He is a specialist in interactive graphics and real-time programming for small computers. Elon previously was software development manager for a UCLA audiovisual department, and he has taught in the California State University System.

Joseph Matthews has been working as a programmer at Bright Star since its inception, and did a substantial amount of the programming for HyperAnimator. His most recent project was to develop SuperAnimator, a multi-window, color version of Bright Star’s program, developed in Silicon Beach Software’s SuperCard. Joseph recently graduated from the University of Illinois at Urbana-Champaign.