The basics of keyframe animation in VRML are described. We explore the aesthetic and technical considerations that an animator proficient with a commercial animation system must take into account when animating for a VRML browser. The various sensor mechanisms that VRML provides for capturing user input are discussed, and simple uses of Script nodes are shown to convert user input into events suitable for triggering animation. More complex scripts can be used to make transitions between animation riffs for responsive character animation. Finally, we survey the possibilities for fully runtime-computed animations.
As VRML worlds become ubiquitous on the World Wide Web, there will be great demand for professional animators to use their skills to create compelling content. Those who work today with any of the commercial animation systems to produce film and video will be able to transfer most of their expertise to the new environment. But there are important differences between rendered animation and interactive animation: new challenges to be met, and new opportunities to create animation that responds directly to the user's input.
In this paper we consider how traditional keyframe animation can be represented in VRML, explore the different user experience of animation in an interactive browser and how it affects an animator's tasks, and look at how to use VRML's advanced capabilities to create more dynamic, interactive content.
We assume the reader has some familiarity with VRML syntax; consult the VRML 2.0 specification [VAG96] and The VRML 2.0 Handbook [HART96] for more information.
All of the example worlds mentioned in square brackets (e.g. [oneshot.wrl]) can be found in full at the web site accompanying this paper.
Straightforward keyframe animation in VRML can be represented with nodes and routes such as those in Figure 1. The sensor detects some input in the scene, such as a click on an object. The optional Script node handles any logic or processing needed to determine when or how to start the animation. The TimeSensor starts the animation when it is triggered by a change to its startTime input, and begins producing a fraction output, which is 0.0 at the beginning, and 1.0 when cycleInterval seconds have elapsed. The fraction is fanned out to a collection of interpolators which store the keyframes, and produce output values of the appropriate type depending on what type of field is to be animated (for example, a PositionInterpolator stores and outputs SFVec3f values suitable for animating the translation or scale fields of a Transform).
Conceptually, it helps to think of an animation as a triggering and logic unit, a single TimeSensor, and the set of interpolators and objects driven from that TimeSensor.
Fig. 1 - VRML representation of keyframe animation
In Figure 2, we see the simplest possible VRML keyframe animation following the pattern of Figure 1, without the optional Script. A Box is animated through three positions over a two-second interval, triggered by a click on its own geometry:
#VRML V2.0 utf8
DEF Cube Transform {
  translation 0 0 -5
  children [
    DEF CubeTouch TouchSensor { }
    DEF CubeTimer TimeSensor { cycleInterval 2.0 }
    DEF CubePositionInterp PositionInterpolator {
      key      [ 0, .5, 1 ]
      keyValue [ 0 0 -5, 5 0 -5, 5 5 -5 ]
    }
    Shape { geometry Box { } }
  ]
}
ROUTE CubeTouch.touchTime TO CubeTimer.set_startTime
ROUTE CubeTimer.fraction_changed TO CubePositionInterp.set_fraction
ROUTE CubePositionInterp.value_changed TO Cube.set_translation
Fig. 2 - simplest possible keyframe animation [cube.wrl]
Although the most common targets for animation are the fields of a Transform, any eventIn or exposedField in any type of node can be animated, e.g. the coordinates of an IndexedFaceSet, any of the colors of a Material, the position and orientation of a Viewpoint or any of the Light types, or even the pitch of an AudioClip.
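For example, a minimal sketch (node names here are illustrative) that cycles the diffuseColor of a Material between red and blue using a ColorInterpolator might look like this:

DEF ColorTimer TimeSensor { cycleInterval 4.0 loop TRUE }
DEF ColorInterp ColorInterpolator {
  key      [ 0, .5, 1 ]
  keyValue [ 1 0 0,  0 0 1,  1 0 0 ]   # red -> blue -> red
}
Shape {
  appearance Appearance {
    material DEF Mat Material { diffuseColor 1 0 0 }
  }
  geometry Sphere { }
}
ROUTE ColorTimer.fraction_changed TO ColorInterp.set_fraction
ROUTE ColorInterp.value_changed TO Mat.set_diffuseColor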
All of the VRML interpolator nodes do linear interpolation between keyframes. However, a clever authoring system will provide splined interpolation [KOCH84], and translate it to a collection of linear keyframes in the published VRML file.
Animation for film and video production results in final frames where the animator has precise control over the viewing parameters. Animators often take advantage of this control to reduce the amount and precision of animation that must be done, or to alter an effect so it looks great from the known viewpoint (this is known as "cheating" an effect). For example, in a fight scene between two characters, the action might be staged so that the knockout punch swerves directly towards the camera, and the arm might be moved in a way that isn't even physically realistic but looks good from that particular point of view.
In contrast, when a world is viewed with a VRML browser, the user normally has complete freedom to roam the camera around the scene and view the animation from any angle. It is difficult to do a complex character animation that is equally artistically successful from all points of view. That punch which was cheated before might look very bad when viewed from a different angle.
VRML ameliorates this problem by providing a set of Viewpoints that the user can select from a list in the browser, but the user may continue to move once they are at a preferred viewpoint, so ultimately the control is not in the author's hands. Often walls, floors and ceilings, and other obscuring objects can be used to limit the range of views.
One other view-related detail to keep in mind: any world that will ever be viewed as a full-page plugin, as opposed to embedded in an HTML page, will have its aspect ratio changed when the user resizes the window. Such resizing does not non-uniformly scale the view, but it may expose more or less of the breadth or height of the scene than was planned for. Using a wrapper HTML page that gives the plugin a fixed size avoids this issue.
Animators have always been able to count on animating to a particular frame rate, generally either 24 frames per second (fps) for film, or 30 fps for video work. Particular poses, such as the blink of an eye, the contact of a foot with the ground, or a flash of light, often last only for a single frame.
VRML animation is a very different beast. There are no guarantees about the frame rate that a particular browser will achieve with a particular world on a particular platform. There is not even a guarantee that the frame rate will remain constant over the course of a particular animation. (Some authoring systems may provide the illusion of frames as a convenient gridding mechanism, but those frame boundaries will not be respected at playback time in a browser.)
The browser processes events and then renders each frame as quickly as it is able to. If interpolators are driven by a TimeSensor, they are guaranteed to complete in a fixed period of time, the cycleInterval. The advantage of this definition is that animation can be synchronized with audio or other fixed time-base media. But it puts additional artistic constraints on the animation. Here are some tips for working within those constraints:
For an animation that absolutely must be sampled at fixed points, the following technique is useful. Make a TimeSensor with loop set to TRUE; this will output a time event at every rendered frame. Route that to a script which simply counts frames, knows the total number of frames in the animation, and outputs a suitable fraction, stopping the TimeSensor at the end. Remember, this only works when the length of time it takes to play the animation, varying frame rates, and synchronization with other media are not important.
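A minimal sketch of such a frame counter might look like the following (the 30-frame length, the node names, and the destination interpolator SomeInterp are assumptions for illustration):

DEF FrameTimer TimeSensor { loop TRUE }
DEF FrameCounter Script {
  eventIn  SFTime  tick
  eventOut SFFloat fraction
  eventOut SFTime  stop
  field    SFFloat frame       0
  field    SFFloat totalFrames 30
  url "vrmlscript:
    function tick(value, time) {
      if (frame <= totalFrames) {
        fraction = frame / totalFrames;
        frame = frame + 1;
      } else {
        stop = time;   // halt the TimeSensor after the last frame
      }
    }
  "
}
ROUTE FrameTimer.time TO FrameCounter.tick
ROUTE FrameCounter.fraction TO SomeInterp.set_fraction   # SomeInterp: your interpolator
ROUTE FrameCounter.stop TO FrameTimer.set_stopTime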
Instead of the linear story line of a film or video, interactive worlds are usually designed to have multiple possible paths which the user can follow. There may be many different animations which can be triggered at different times. In section 5, we discuss some of the techniques which can be used to compose animations which operate sequentially or in parallel on the same fields of an object. But the larger issue is that the author must be responsible for the semantic correctness of all possible states that a user can experience in their world; testing a complex VRML world is similar to testing a video game.
VRML offers the possibility of combining keyframe animation techniques with interactive input. This can range from triggering a canned animation from a simple user input to combining, layering, and sequencing animations, realtime simulations, or even sophisticated autonomous behaviors for animated characters. All of these techniques require use of a Script node, which must be programmed in a language acceptable to the target browser. All of the examples below are in VrmlScript [no printed documentation is available as of this writing, but http://vrml.sgi.com/moving-worlds/spec/vrmlscript.html has the official proposal]. As the behaviors become more complex, there may be a need for collaboration between animators and programmers to achieve the desired results.
An animation can be started by routing any SFTime eventOut to the startTime field of the TimeSensor. The following native VRML node types have such an output: TouchSensor (touchTime), ProximitySensor and VisibilitySensor (enterTime and exitTime), Collision (collideTime), and TimeSensor itself (time and cycleTime).
Although it is possible to ROUTE any of the native trigger events directly to a TimeSensor, it is usually necessary to use at least a simple Script to get the desired behavior. For example, consider a TouchSensor directly wired to start the animation of a door opening when a doorbell is pressed. If the user presses on the button a second time after the animation completes, it will jump to the beginning and play over again! (Nothing will happen if they click again during the first animation because active TimeSensors ignore startTime events). If this is not the intended behavior, one can use a simple script to make the door open only once. (This is commonly referred to as "one-shot" behavior, after similar functionality in logic gates).
...
DEF DoorbellTouch TouchSensor { }
DEF DoorOpenTimer TimeSensor { cycleInterval 3 }
DEF OneShot Script {
  eventOut SFTime startTime
  eventIn  SFTime touchTime
  field    SFBool fired FALSE
  url "vrmlscript:
    function touchTime(value, time) {
      if (! fired) {
        startTime = value;
        fired = TRUE;
      }
    }
  "
}
...
ROUTE DoorbellTouch.touchTime TO OneShot.touchTime
ROUTE OneShot.startTime TO DoorOpenTimer.set_startTime
ROUTE DoorOpenTimer.fraction_changed TO DoorRot.set_fraction
ROUTE DoorRot.value_changed TO Door.set_rotation
Fig. 3 - using a script for simple trigger logic [oneshot.wrl]
(For compactness, the examples will show only the most interesting nodes and routes in the scene; consult the online examples for full detail). The OneShot script simply remembers whether it has already been triggered, using a boolean field named fired, and uses that information to decide whether to pass the TouchSensor's touchTime on through to DoorOpenTimer to start the animation.
If more than one OneShot is needed in a given scene, there must be a separate copy for each use, or the script must be encapsulated in a PROTO definition. It will not work to DEF and USE the same Script node, since that instantiates the same object, just at different places in the scene graph. The result would be that a click on any of the TouchSensors would start all of the animations at the same time, but only once!
Another common use of this script is to start an animation when the world is initially loaded. A TimeSensor with stopTime <= startTime and loop set to TRUE will begin firing immediately; wire its output to a one-shot script to create a world-entry detector (a sketch of this idiom appears below). For efficiency, the script should also disable the TimeSensor so it doesn't continue to send an event every frame. Here are some other possibilities for using simple trigger logic:
Some browsers do a short (~2 second) animation from the current Viewpoint to the next Viewpoint when the user selects from a dashboard list or Anchor. Unfortunately, the bindTime event is sent immediately, so if it is used to trigger an animation of the fields in the new Viewpoint, it will be composed with the automatic animation, and the result will probably not be what was intended. The suggested solution is to use the bindTime event in conjunction with a small ProximitySensor to determine when the user actually ends up at the position of the new Viewpoint, and a script to keep the sensor from triggering again until the next bind to the same Viewpoint. A PROTO can be used to encapsulate this common idiom.
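A minimal sketch of the world-entry detector mentioned above might look like this (EntryAnim stands for the animation's TimeSensor; all names are illustrative):

DEF EntryDetect TimeSensor { loop TRUE }    # fires as soon as the world loads
DEF EntryLogic Script {
  eventIn  SFTime tick
  eventOut SFTime startTime
  eventOut SFBool disable
  field    SFBool fired FALSE
  url "vrmlscript:
    function tick(value, time) {
      if (! fired) {
        startTime = time;     // kick off the entry animation
        disable = FALSE;      // turn the detector off for efficiency
        fired = TRUE;
      }
    }
  "
}
ROUTE EntryDetect.time TO EntryLogic.tick
ROUTE EntryLogic.startTime TO EntryAnim.set_startTime   # EntryAnim: the animation's TimeSensor
ROUTE EntryLogic.disable TO EntryDetect.set_enabled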
The triggers we discussed above all send out discrete events when some condition in the scene becomes true. But VRML also offers three additional sensor types which can be used to get continuous input as the user clicks and drags the mouse over their sibling geometry: the PlaneSensor (which maps the drag into a translation in a plane), the CylinderSensor (rotation about an axis), and the SphereSensor (free rotation).
These sensors cannot directly trigger a TimeSensor, because they do not output an SFTime event. But they can be used to build interactive gadgets in a scene, such as doorknobs, levers or sliders that the user can actually operate. The example below shows an excerpt from a world with a doorknob that the user can twist; the door opens when the handle passes a certain rotation angle, and closes when it goes back the other way:
...
DEF Door Transform {
  children [
    DEF DoorLogic Script {
      eventOut SFTime startClose
      eventOut SFTime startOpen
      eventIn  SFRotation handleRotation
      field    SFBool isOpen FALSE
      url "vrmlscript:
        function handleRotation(value, time) {
          // If the handle is rotated past 60 degrees, open door
          if (value[3] > 1.04 && ! isOpen) {
            startOpen = time;
            isOpen = TRUE;
          // If the handle is rotated back near 0 degrees, close door
          } else if (value[3] < .1 && isOpen) {
            startClose = time;
            isOpen = FALSE;
          }
        }
      "
    }
    DEF HandleGroup Transform {
      children [
        DEF DoorSensor CylinderSensor { minAngle 0 maxAngle 1.57 }
        DEF Handle Transform {
          ... geometry for door handle ...
        }
      ]
    }
    ... rest of door geometry ...
  ]
}
ROUTE DoorSensor.rotation_changed TO Handle.set_rotation
ROUTE Handle.rotation_changed TO DoorLogic.handleRotation
ROUTE DoorLogic.startOpen TO DoorOpenTimer.set_startTime
ROUTE DoorLogic.startClose TO DoorCloseTimer.set_startTime
Fig. 4 - Script to simulate a doorknob [door.wrl]
One consideration when using the CylinderSensor is that it maps the user's dragging into a rotation on a virtual cylinder centered at the local origin and oriented along the local Y axis. The best way to work with it is to build the sensing geometry at the origin, with its intended axis of rotation aligned with Y, so that its local Transform (Handle in the example) is zeroed out; group in the CylinderSensor and route its rotation_changed output to the local Transform's set_rotation field; then add a Transform (HandleGroup) above the whole unit to position, scale, and orient it in the scene. Similar considerations apply when using the other drag sensors.
Here are some other ideas for ways to use drag sensors in combination with keyframe animation:
For continuously looping animation, such as character idle cycles or background ambient animation, unvarying repetition can become boringly mechanical. It is often desirable to introduce pseudo-random variations in the speed or amplitude of these cycles. One way to do this is to insert a Script node between the TimeSensor and the interpolator(s). This script can alter the fraction by adding a random component, or synthesize the fraction from scratch with a mathematical function of time. With clever programming, many effects such as pauses in the cycle or chaotic behavior can be achieved.
For example, a frog character might be expected to breathe at somewhat irregular intervals, and to expand its chest more fully on some breaths than others. A CoordinateInterpolator with the exhale keyframe at fraction = 0.0 and a maximally expanded chest at fraction = 1.0, driven by an out-of-phase sum of sine and cosine waves clamped to the range 0-1, works well; the frog pauses nicely between breaths while the waveform travels below 0.
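A sketch of such a fraction-synthesizing script might look like the following, assuming the scripting language exposes sine and cosine through a JavaScript-style Math object; the particular frequencies, amplitudes, and node names are illustrative:

DEF BreathClock TimeSensor { loop TRUE }
DEF BreathLogic Script {
  eventIn  SFTime  tick
  eventOut SFFloat fraction
  url "vrmlscript:
    function tick(value, time) {
      // Out-of-phase sum of two waves; dips below 0 produce pauses
      f = 0.7 * Math.sin(time * 0.9) + 0.5 * Math.cos(time * 0.37);
      if (f < 0)  f = 0;     // clamp to the 0-1 range
      if (f > 1)  f = 1;
      fraction = f;
    }
  "
}
ROUTE BreathClock.time TO BreathLogic.tick
ROUTE BreathLogic.fraction TO BreathInterp.set_fraction   # BreathInterp: the chest CoordinateInterpolator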
For character or avatar animation, a common technique is to have a number of canned animations ("riffs") to express different moods or actions. Let's assume we already have a script that can determine which animation or animations should be active, based on whatever collection of state information is available. The next question is how to make smooth transitions between riffs, and possibly how to combine multiple riffs at the same time. A range of strategies is available, discussed below in order of implementation complexity. For simplicity, we will use the example of a robot with the following riffs: Walk, Run, Kick, Nod, and LookLeft. Each gesture returns whatever Transforms it controls to the rest position at the end of each cycle. Walk and Run are really in-place motions; some other animation or script is assumed to be controlling the robot's overall position and orientation in the world.
The simplest case is when the animations aren't really for the same node. Although both Walk and Nod affect parts of the robot hierarchy, they are really completely different Transforms that can be run either in sequence or even at the same time, and no additional logic is needed to reconcile them. (Assuming the robot doesn't bend at the neck while walking).
Next consider Nod and LookLeft. They both affect the neck joint, so if they are to be played at the same time, something must be done to add them together. The easiest solution is to simply add an extra Transform layer at the next level. The parent Transform will be animated for LookLeft, and the child Transform for Nod. That way, Nod will operate about the correct local axis, even if the head is turned.
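A minimal sketch of this layering might look like the following, where LookInterp and NodInterp stand for the two riffs' OrientationInterpolators (all names are illustrative):

DEF LookLeftJoint Transform {          # parent: animated by LookLeft
  children [
    DEF NodJoint Transform {           # child: animated by Nod
      children [
        ... head geometry ...
      ]
    }
  ]
}
ROUTE LookInterp.value_changed TO LookLeftJoint.set_rotation
ROUTE NodInterp.value_changed  TO NodJoint.set_rotation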
Now consider Walk, Run, and Kick. They certainly all affect the leg joints, and adding them together will not produce the desired effect: the robot can't kick while in the middle of a dead run. A simple but low-quality solution is to do a jump cut: send a stopTime event to the current animation and a startTime to the next, and live with the discontinuities. This may be acceptable in some situations. ([robot.wrl] demonstrates this and the previous two techniques.)
A better effect can be achieved by having the script remember what the desired new state is, wait until the current animation finishes a cycle and returns to the rest position, and then perform the jump cut. The cycleTime output from the TimeSensor can be used to determine when to make the switch. The discontinuity is avoided, but the reaction is delayed until the cycle ends; when the cycle is short, this may work quite well. This solution is often used in video fighting games. (See [state.wrl].)
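A sketch of such a cycle-boundary switch, reduced to just two riffs, might look like this (the node names and the integer riff codes are assumptions for illustration; pending holds the requested riff until the current Walk cycle ends, and switching back from Run to Walk would be handled symmetrically):

DEF SwitchLogic Script {
  eventIn  SFInt32 requestRiff     # 0 = Walk, 1 = Run (sent by the state logic)
  eventIn  SFTime  walkCycleDone   # wired from WalkTimer.cycleTime
  eventOut SFTime  stopWalk
  eventOut SFTime  startRun
  field    SFInt32 pending -1
  url "vrmlscript:
    function requestRiff(value, time) {
      pending = value;             // remember the request; don't switch yet
    }
    function walkCycleDone(value, time) {
      if (pending == 1) {          // switch only at the rest pose
        stopWalk = time;
        startRun = time;
        pending = -1;
      }
    }
  "
}
ROUTE WalkTimer.cycleTime TO SwitchLogic.walkCycleDone
ROUTE SwitchLogic.stopWalk TO WalkTimer.set_stopTime
ROUTE SwitchLogic.startRun TO RunTimer.set_startTime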
Better still, but much more difficult to implement, is transitional interpolation. Assume the animation is part way through a Walk cycle, and the script wants to switch to Run. The script starts Run using the same startTime as it did for Walk, and uses the rest of the cycle to interpolate from 100% of the Walk interpolator output values and 0% of Run, to 100% of Run and 0% of Walk. At the end of the cycle, it switches completely to Run. The reason this is painful to implement for multi-jointed animations is that the script must intercept all of the interpolator outputs from each animation, take a weighted average of them, and route the results to the destination fields. See [interp.wrl]. Note that this technique might not be very appropriate for a transition from Walk to Kick (for example), because the Kick may only be visually correct if it is played starting fully from the rest position.
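For a single joint, a sketch of the weighted-average script might look like the following, blending the translation outputs of the two riffs. All names are illustrative; we assume the script can construct SFVec3f values and read their components, as the VrmlScript proposal describes, and the setWeight input would be ramped from 0 to 1 over the remainder of the cycle by the switching logic:

DEF BlendJoint Script {
  eventIn  SFVec3f fromWalk      # Walk interpolator output for this joint
  eventIn  SFVec3f fromRun       # Run interpolator output for this joint
  eventIn  SFFloat setWeight     # 0 = all Walk, 1 = all Run
  eventOut SFVec3f blended
  field    SFVec3f walkValue 0 0 0
  field    SFVec3f runValue  0 0 0
  field    SFFloat weight    0
  url "vrmlscript:
    function setWeight(value, time) { weight = value; }
    function fromWalk(value, time)  { walkValue = value; blend(); }
    function fromRun(value, time)   { runValue  = value; blend(); }
    function blend() {
      // weighted average of the two riffs' outputs for this joint
      blended = new SFVec3f(
        walkValue.x * (1 - weight) + runValue.x * weight,
        walkValue.y * (1 - weight) + runValue.y * weight,
        walkValue.z * (1 - weight) + runValue.z * weight);
    }
  "
}
ROUTE WalkHipInterp.value_changed TO BlendJoint.fromWalk
ROUTE RunHipInterp.value_changed  TO BlendJoint.fromRun
ROUTE BlendJoint.blended          TO HipJoint.set_translation

The same pattern must be repeated (or generalized inside one larger script) for every animated field of every joint, which is what makes the technique laborious for multi-jointed characters.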
Building on the previous idea, one could also do a continuous interpolation between different animations. Instead of switching between Walk and Run strides at a fixed threshold, the script could continuously interpolate between the two based on velocity. For example, it could do 100% Walk when moving at or below 5 meters/second, 100% Run when moving over 15 meters/second, and a smooth combination of the two at intermediate speeds. See [cinterp.wrl]. The goal of such an approach is to integrate artistic, expressively posed animation with the parametric, program-driven performance.
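The weight computation itself is simple. Here is a sketch using the thresholds from the example above (the node names are illustrative, and the speed is assumed to arrive from whatever script tracks the robot's velocity):

DEF GaitBlend Script {
  eventIn  SFFloat setSpeed     # current speed in meters/second
  eventOut SFFloat weight       # 0 = all Walk, 1 = all Run
  url "vrmlscript:
    function setSpeed(speed, time) {
      if (speed <= 5)        weight = 0;                 // 100% Walk
      else if (speed >= 15)  weight = 1;                 // 100% Run
      else                   weight = (speed - 5) / 10;  // linear blend in between
    }
  "
}
ROUTE GaitBlend.weight TO BlendJoint.setWeight   # feed the blending script sketched above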
In addition to keyframe animation, it is possible to do entirely runtime-computed animation in VRML. With a sufficiently clever script, the only limitations are imagination, programming ability, and, most importantly, execution time. Starting points include particle systems for smoke, water, and fire [REEV83, ROBE93]; flocking and other rule-based behavioral animation [REYN87, HAUM88]; physically based simulation and dynamic constraints [POTT73, BARZ88, TERZ87, BARA96]; cloth [WEIL86]; and inverse kinematics for articulated figures [ZHAO89, KUDE92, PHIL91].
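As a trivial taste of runtime-computed motion, the following sketch moves an object along a circle whose radius varies on the fly rather than being keyframed (names are illustrative, and we again assume a JavaScript-style Math object and an SFVec3f constructor):

DEF Clock TimeSensor { loop TRUE }
DEF Orbit Script {
  eventIn  SFTime  tick
  eventOut SFVec3f position
  url "vrmlscript:
    function tick(value, time) {
      r = 2 + Math.sin(time * 0.5);    // slowly varying radius
      position = new SFVec3f(r * Math.cos(time), 0, r * Math.sin(time) - 5);
    }
  "
}
ROUTE Clock.time TO Orbit.tick
ROUTE Orbit.position TO OrbitingObject.set_translation   # OrbitingObject: any Transform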
VRML 2.0 is in its infancy as a standard on the web. As it matures, many new animation styles and techniques will be developed. Because VRML supports a wide range of media, and sophisticated scripting and interaction, there is the potential for unprecedented collaboration between graphic and user interface designers, directors, modelers, animators and programmers. We hope the ideas in the paper will be a useful starting point for people who have been working in traditional production, and we look forward to seeing amazing new worlds.
The author would like to thank Rob Myers, who contributed whole topics to this paper, Paul Strauss and Rich Gossweiler for the helpful discussions, David Story, Rick Pasetto and Daniel Woods for the critical reading, and the entire Starfish / Cosmo group at Silicon Graphics for developing great VRML tools.
[BARA96] Baraff, David, "Linear-Time Dynamics using Lagrange Multipliers", SIGGRAPH '96, pp. 137-146.
[BARZ88] Barzel, Ronen and Barr, A. H., "A Modeling System Based on Dynamic Constraints", SIGGRAPH '88, pp. 179-188.
[HART96] Hartman, Jed and Wernecke, Josie, The VRML 2.0 Handbook, Addison-Wesley, Reading, MA, 1996, 412 pp.
[HAUM88] Haumann, D. and Parent, R., "The Behavioral Test-Bed: Obtaining Complex Behavior from Simple Rules", The Visual Computer, Vol. 4, No.6, 1988, pp. 332-347.
[KOCH84] Kochanek, D. and Bartels, R., "Interpolating Splines with Local Tension, Continuity and Bias Control," SIGGRAPH '84, pp. 33-42.
[KUDE92] Kuder, Karen, "Using Inverse Kinematics to Position Articulated Figures", Proceedings of the 1992 Western Computer Graphics Symposium, p. 121.
[PHIL91] Phillips, Cary and Badler, Norman I., "Interactive Behaviors for Bipedal Articulated Figures", SIGGRAPH '91, pp. 359-362.
[POTT73] Potter, David, Computational Physics, John Wiley & Sons, New York, 1973.
[REEV83] Reeves, W., "Particle Systems - A Technique for Modeling a Class of Fuzzy Objects", SIGGRAPH '83, pp. 359-376.
[REYN87] Reynolds, C., "Flocks, Herds, and Schools: A Distributed Behavioral Model," SIGGRAPH '87, pp. 25-34.
[ROBE93] Robertson, Barbara, "Powerful Particles," Computer Graphics World, Vol. 16, No. 7, July 1993, pp. 40-48.
[TERZ87] Terzopoulos, D., Platt, Isaacs, P. and Cohen, M., "Controlling Dynamic Simulation with Kinematic Constraints, Behavior Functions and Inverse Dynamics", SIGGRAPH '87, pp. 215-224.
[WEIL86] Weil, Jerry, "The Synthesis of Cloth Objects", SIGGRAPH '86, pp. 49-54.
[VAG96] VRML Architecture Group, "VRML 2.0, The Virtual Reality Modeling Language Specification, ISO/IEC CD 14772", available at http://vag.vrml.org/VRML2.0/FINAL/.
[ZHAO89] Zhao, Jianmin and Badler, Norman I., "Real Time Inverse Kinematics with Joint Limits and Spatial Constraints", Technical Report MS-CIS-89-09, Dept. of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 1989.