Behavior Research Methods, Instruments, & Computers 1989, 21 (4), 455-462

Facilities for the graphical computer simulation of head and body movements

GARY BENTE
University of Duisburg, Duisburg, Federal Republic of Germany

An indispensable precondition for graphical computer simulations of human body movement is the availability of an efficient coding and command language that can function as a link between man and machine. A high-resolution notation system, the Bernese System for Time-Series Notation of Human Movement Behavior, is described, and its capacity to generate scripts for graphical computer animations is demonstrated on the basis of recent software developments. Three different 3-D animation programs are introduced, which have been developed under MS-DOS using an interactive BASIC compiler as the software tool. The programs have been set up to serve different purposes in current nonverbal research: visual feedback during time-series notation, dynamic face-validity check of completed time-series protocols, and experimental simulations of dyadic interactions.

In the last decade, computer animation has developed into a powerful tool for the modeling and simulation of complex processes in science, technology, and art. The possibility of retranslating numeric data protocols into a visible reality of changing, moving, and interacting objects has not only given a new impetus to many areas of creative work, but has also provided engineers and researchers with many useful instruments for fast feedback and face-validity testing in measurement, design, and construction. The possibilities that result from the continuous improvement of processor technology, however, seem to go far beyond the level of working facilities. Particularly in science, computer animation can be said to open the door for new types of investigation. Hut and Sussman (1987) have put it this way: "Computational experiments are enriching scientific investigation. They are now becoming as important as theory, observation, and laboratory experiments" (p. 145).
Within the behavioral sciences, it seems quite natural to expect computer graphics animation to be of particular interest for researchers investigating the visual aspects of human behavior, especially its dynamic features, such as locomotion, gestures, body movements, and facial activities. And indeed, the new possibilities in this field, which has traditionally been referred to as "nonverbal communication research," seem to be very promising. Computer animations of nonverbal behavior provide the unique chance to model even very complex and subtle communicative stimuli in digital worlds and to study their perceptual effects while exerting precise and reliable experimental control over the intensities, contingencies, and covariations of their constituent elements.

Unfortunately, actual practice in nonverbal communication research does not draw heavily upon these resources. Of the few systematic endeavors, most deal with the perceptual aspects of moving stimuli from a more general perspective (Cutting, 1978; Girard & Maciejewski, 1985; Johansson, 1973; Ramachandran, 1985). The reason for this neglect is most often discussed in terms of restrictions in the technical equipment, such as lack of object realism, speed and adequacy of animation algorithms, absence of motion parallax, aliasing effects, and so forth (see Foley, 1987; Proffitt & Kaiser, 1986). Because technology in this field is developing rapidly, it seems worthwhile to focus on the impediments that lie in the domain of nonverbal communication research itself. A problem that has long been virulent in the investigation of nonverbal behavior, and whose solution is crucial for the success of computer animation in this field, is movement notation.

Author note: This research was supported by Deutsche Forschungsgemeinschaft (DFG) Grant Fr 697/1-1. Special hardware support received from IBM Germany is gratefully acknowledged. Correspondence may be addressed to Gary Bente, FB 2, Psychologie, Universität Duisburg, D-4100 Duisburg, Federal Republic of Germany.

Copyright 1989 Psychonomic Society, Inc.

MOVEMENT NOTATION: THE MISSING LINK BETWEEN MAN AND MACHINE

We have little information about significant cues in nonverbal interaction or about the implicit grammars that rule their occurrence; therefore, simulations in this field can hardly be based on a "dictionary" of nonverbal expressions. Computer animations have to start from detailed descriptive data gathered in real-life situations. But only insofar as a raw-data collection arrives at a complete and accurate protocol of the observed behavior can the generated code be used as an instruction set for a realistic computer animation. Most coding strategies in nonverbal communication research fail to meet this aim. Frey, Hirsbrunner, Florin, Daw, and Crawford (1983) have concluded that in view of the enormous complexity and subtlety of nonverbal phenomena, "investigators have been more concerned with discarding behavioral information than with collecting it" (p. 148). As Hirsbrunner, Frey, and Crawford (1987) point out, three major coding strategies have been adopted by most researchers in this field (p. 101):

1. Classification of the vast number of visually different movement patterns into a few global categories whose definitions correspond to what the investigator believes to be relevant (generic coding).
2. Restriction of behavioral assessment to a small number of movements that are well defined, easy to observe, and difficult to mistake (restrictive coding).
3. Avoidance of behavioral notation through direct transformation of observations into psychological dimensions, or through ascription of a functional meaning to them (direct evaluation).

It is quite evident that data resulting from any of these coding techniques could hardly serve as an instruction set for a graphical computer animation. Either they would leave the behavioral model underdetermined, or they would deliver ambiguous information that could not be interpreted by even the most "intelligent" machine. Alternatively, some authors propose to base movement notation on the biomechanical principles according to which motor activities are governed (see Wilhelms, 1987). Such genotypical coding procedures start with hierarchical models of 3-D objects (body parts), including definitions of joints, the degrees of freedom for positional changes of the various parts, and the forces and torques (muscle activities) acting on them. Animation programs based on biomechanical principles (Armstrong & Green, 1985; Wilhelms & Barsky, 1985) are gaining importance for applications in sports, ergonomics, and robotics (Lee, Gonzalez, & Fu, 1983; Rohmert & Schaub, 1988; Singh, Beatty, & Ryman, 1983), but their usefulness in nonverbal communication research is rather limited. By providing universal construction rules, biomechanical principles may offer an efficient basis for the formulation of algorithms in animation.
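A genotypical model of this kind can be sketched as a hierarchy of jointed segments. The sketch below is illustrative only: the segment names and degree-of-freedom counts are assumptions, not taken from any of the cited systems.

```python
# Minimal sketch of a genotypical (biomechanical) body model: a hierarchy
# of 3-D segments, each attached to its parent at a joint with a declared
# number of rotational degrees of freedom. Segment names and DOF counts
# are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class Segment:
    name: str
    degrees_of_freedom: int              # rotational DOF at the joint
    children: list = field(default_factory=list)

    def attach(self, child):
        """Add a child segment and return it, for chained construction."""
        self.children.append(child)
        return child

    def total_dof(self):
        """DOF of this segment plus all segments below it in the hierarchy."""
        return self.degrees_of_freedom + sum(c.total_dof() for c in self.children)

# Build a fragment of the hierarchy: trunk -> head, trunk -> upper arm -> lower arm.
trunk = Segment("trunk", 3)
trunk.attach(Segment("head", 3))
upper_arm = trunk.attach(Segment("upper_arm", 3))
upper_arm.attach(Segment("lower_arm", 1))    # elbow: one rotational DOF
```

Animating such a model means supplying joint angles (or forces and torques) over time, which is exactly the information a human observer of a video record cannot report directly.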
Their descriptive potential in the phase of raw-data collection, however, is minimal (see Wilhelms, 1987). No observer can decide from viewing a video record of a human interaction which muscles of a communicator are active, or to which angle a particular limb is rotated. Nor is it possible in most communication studies to place electrodes or optical markers on the subjects, which could serve as references for direct measurement. On the other hand, technologies that can transform video information directly into data protocols of 3-D actions are not yet available. An expedient escape from this dilemma can be seen in the use of phenotypical coding systems that represent a compromise among human observation capacities, a high-resolution description level, and computer readability. Phenotypical means that the code refers not to implicit construction principles of movements, but to perceivable aspects of the motor activity that can be documented on video tape. An elaborate phenotypical notation system, which has already proved its descriptive power in movement analysis and could close the gap between the human observer and the computer in simulation experiments, has been developed by Frey and collaborators (Frey et al., 1983; Frey, Hirsbrunner, Pool, & Daw, 1981; Frey, Jorns, & Daw, 1980; Frey & Pool, 1976; Hirsbrunner et al., 1987). Since this system, the Bernese System for Time-Series Notation of Movement Behavior, forms the basis of our undertakings, a brief description is provided.

TIME-SERIES NOTATION OF BODY MOVEMENT

The Bernese system might be called an alphabet designed to spell body language. Indeed, the system owes its efficiency and high-resolution capacities to a principle that is used in alphabetic speech transcription. Hirsbrunner et al. (1987) describe this principle as follows: "The methodological basis on which alphabetic notation achieves a highly sophisticated protocol from just a few symbols is the principle of time-series notation. Instead of assigning a label to a complex vocal-temporal pattern, alphabetic writing systems resolve the stream of verbal behavior into two dimensions, a temporal dimension and a phonetic dimension" (p. 102). The Bernese research group successfully adapted this bivariate coding principle to the transcription of movement behavior. Just as speech can be conceptualized as a time series of sounds, movement can be conceptualized as a time series of positions; and just as speech can be transcribed as a sequence of sound symbols (letters), movement can be transcribed as a sequence of position codes. In contrast to the speech signal, however, which is emitted from one source only (the voice organ system), movement behavior is a multichannel activity, with various independent subsystems (body parts) that may fire virtually simultaneously. Consequently, the Bernese system is constructed as a kind of multidimensional alphabet, by means of which each source of variation is "spelled" separately in a position-time-series protocol.
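Such a multichannel position-time-series protocol can be sketched as a set of parallel integer series on a shared time base, one series per coded dimension. The dimension names and code values below are illustrative, not the official Bernese inventory.

```python
# Minimal sketch of a multichannel position-time-series protocol in the
# spirit of the Bernese system: each coded dimension is "spelled" as its
# own series of ordinal position codes on a shared time base.
# Dimension names and code values are illustrative only.

SAMPLING_INTERVAL = 0.5  # seconds per coding interval

protocol = {
    "head_sagittal":   [0, 0, 1, 2, 2, 1, 0],   # one code per interval
    "head_rotational": [0, 1, 1, 1, 0, 0, 0],
    "head_lateral":    [0, 0, 0, 2, 2, 0, 0],
}

def position_at(protocol, dimension, t):
    """Return the coded position of one dimension at time t (seconds)."""
    series = protocol[dimension]
    index = min(int(t / SAMPLING_INTERVAL), len(series) - 1)
    return series[index]

# All channels can fire "simultaneously": reading every dimension at the
# same instant reconstructs the full configuration at that moment.
configuration = {dim: position_at(protocol, dim, 1.0) for dim in protocol}
```

Reading across the series at one instant yields a complete position description; reading one series over time yields the trajectory of a single body part.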
Figure 1 illustrates the code construction, containing information about the various subsystems (body parts) included in the notation system, the number of dimensions (spatial degrees of freedom) that are distinguished within the subsystems, the scaling level on which coding is performed, and the number of scale units that are available to discriminate the positions. Except for the few dimensions that account for "touch" events, the coding is done on an ordinal scale level, assigning integer numbers to displacements or "flexions" from a predefined position. Figure 2 illustrates this principle for the coding of the head, differentiating among the three movement dimensions: sagittal (up/down tilt), rotational (right/left turn), and lateral (right/left tilt). For purposes of demonstration, the code resolution in this example is limited to five positional states for each dimension. As can be seen from Figure 1, actual coding practice uses a higher resolution (at least 7 scale units per dimension). For practical reasons only, the labeling of the positions assigns odd and even numbers to indicate flexions in one or the other direction.

BODY PART         N OF CODED   DIMENSION        TYPE OF SCALE   TYPE OF MOVEMENT DEFINED BY DIMENSION
                  DIMENSIONS                    / N OF UNITS
Head              3            Sagittal         Ordinal / 7     Up/down tilt of head
                               Rotational       Ordinal / 7     Left/right rotation of head
                               Lateral          Ordinal / 7     Left/right tilt of head
Shoulders         2x2          Vertical         Ordinal / 3     Up/down shift of shoulder
                               Depth            Ordinal / 3     Forward/backward shift of shoulder
Trunk             3            Sagittal         Ordinal / 7     Forward/backward tilt of trunk
                               Rotational       Ordinal / 7     Left/right rotation of trunk
                               Lateral          Ordinal / 7     Left/right tilt of trunk
Upper arms        2x3          Vertical         Ordinal / 8     Up/down lift of upper arm
                               Depth            Ordinal / 8     Forward/backward shift of upper arm
                               Touch            Nominal / 7     Upper arm contact with chair/body areas
Lower arms        2x3          Vertical         Ordinal / 14    Up/down shift of hand
                               Horizontal       Ordinal / 9     Left/right shift of hand
                               Depth            Ordinal / 8     Forward/backward shift of hand
Hands             2x8          X/Y orient.      Ordinal / 9     Angle of hand in vertical plane
                               Z orient.        Ordinal / 5     Outward/inward sway of hand
                               Turn             Ordinal / 5     Up/down turn of palm
                               Closure: Thumb   Ordinal / 3     Bending of thumb
                               Forefinger       Ordinal / 3     Bending of forefinger
                               Middle finger    Ordinal / 3     Bending of middle finger
                               Ring/little f.   Ordinal / 3     Bending of ring/little finger
                               Touch            Nominal / 52    Hand contact with objects or body areas
Upper legs        2x2+1        Vertical         Ordinal / 5     Up/down shift of upper leg
                               Horizontal       Ordinal / 5     Left/right shift of upper leg
                               Touch            Ordinal / 3     Contact between knees
Lower legs        2x3          Vertical         Ordinal / 9     Up/down shift of foot
                               Horizontal       Ordinal / 7     Left/right shift of foot
                               Depth            Ordinal / 7     Forward/backward shift of foot
Feet              2x4          Sagittal         Ordinal / 7     Up/down tilt from ankle
                               Rotational       Ordinal / 7     Left/right rotation from ankle
                               Lateral          Ordinal / 7     Left/right tilt from ankle
                               Touch            Nominal / 10    Foot contact with objects/floor/body areas
Seating position  2            Horizontal       Ordinal / 3     Left/right position on chair
                               Depth            Ordinal / 3     Front/back position on chair

Figure 1. Summary of the coding scheme for the Bernese System for Time-Series Notation of Movement Behavior (Frey, Bente, Fuchs, Preiswerk, Glatt, & Imhof, 1989).

For computer analysis, these numbers are reordinalized as positive or negative deviations from zero and translated into angle degrees when fed into an animation process. The temporal resolution of the Bernese coding system is limited only by the resolution of the medium that carries the visual information. Since this is normally a video tape, the maximum resolution in the case of frame-by-frame analysis is 1/60 sec in the US (NTSC standard) and 1/50 sec in Europe (PAL standard); when using only full frames (every second picture in the interlacing mode), it is 1/30 and 1/25 sec. Current coding practice, however, does not fully exhaust this resolution capacity. In most studies, position coding has been done in 0.5-sec intervals, which seems to be sufficient to cover the frequency spectrum of most motor activities (Hirsbrunner et al., 1987). Assuming that only in a few cases do movements change direction or speed within a 0.5-sec interval, the in-between positions can easily be recalculated, for purposes of "key framing," for example. Since the

Figure 2. Coding scheme for head positions in the sagittal, rotational, and lateral dimensions.

coding is done successively, dimension by dimension, the video material is prepared with a discrete time code, assigning a noninterchangeable label to each frame. This guarantees accurate resynchronization of simultaneous movement activities in the time-series protocol. Detailed descriptions of the Bernese system, including the notation principle, coding procedure, reliability and validity measures, and application examples, are published in Frey et al. (1983) and in Hirsbrunner et al. (1987). Further empirical applications are described in Fisch, Frey, and Hirsbrunner (1983) and in Frey, Bente, Fuchs, Preiswerk, Glatt, and Imhof (1989).

The descriptive accuracy of the Bernese coding system has been impressively demonstrated by Frey and Pool (1976) in a static reconstruction task. The authors used a human model to place the various body parts into specific positions, which were read back from the data protocols of 40 different original subjects. The models were photographed, and their positions were coded again. Two thousand four hundred points (40 subjects x 60 coding dimensions) were compared on this basis. Over 98% of the positional codes were identical for originals and models. Because positions are the constituent elements of movement, it has long been a central question whether the Bernese code could also serve as a reliable data base for dynamic reconstruction tasks (animations). Three different 3-D graphics programs are presented here that use Bernese time-series protocols as scripts for human body animation.

3-D FACILITIES FOR CODING AND SIMULATION OF HEAD AND BODY MOVEMENTS

This first step in software development focused on the animation of head movements. One of the programs presented here includes movements of the upper body. The lower body and the extremities will be added in the next step. All programs have been developed on the IBM AT02 and tested for speed on the COMPAQ 386/16. True BASIC was used as the major software tool. True BASIC is an interactive BASIC compiler with comfortable graphics handling on all common graphics adapters and screen resolutions (CGA: color 320/200, 640/200; HGC: mono 720/348; EGA: color 640/350, mono 640/350). It provides direct source-code portability to other computer systems (Apple Macintosh, Atari ST, Commodore AMIGA). True BASIC also offers a special 3-D graphics library containing routines for windowing, scaling, camera setting, projection, and so forth. Together with built-in commands for matrix algebra and fast graphical block operations, this programming environment provides a powerful tool kit for the development of 3-D animation programs. To increase calculation speed during on-line animation routines, the source code of the first program was converted to Microsoft BASIC and recompiled using a standard BASIC compiler that generated somewhat faster code.

All of our programs use the same wire-frame model of a human head, defined by 80 spatial coordinates and 105 line connections, combined in 17 polygons. Program No. 3 additionally uses 40 3-D points, 52 line connections, and 14 polygons to display the upper body. The algorithms for animation, however, differ among the individual programs. They have been matched to the necessities of particular applications in our current research, which are: improvement of coder training and facilitation of time-series notation by means of visual feedback; dynamic face-validity checks of time-series protocols; and masking and script modification in person-perception experiments.

Program No. 1 uses real-time animation to give direct visual feedback to the human coders during the processes

Figure 3. A sequence of three hard copies taken from a program for direct feedback during time-series notation of head movements. Data are entered according to the Bernese coding system in the editing window on the left. Within 1/4 sec, the wire-frame model of the head is brought into the corresponding position.

of training and notation. Figure 3 shows a series of three display hard copies taken at different moments in a coding process. On the left side of the screen, the program provides an editing window with a prepared time code. Onset and resolution of the time code can be interactively

determined when starting the program and thus matched to the timer onset of the video record. The data in the editing window can be scrolled forward and backward. Using a resolution of 0.5-sec intervals, up to 10 min can be stored in RAM. The right side of


the screen contains the command window with different function-key options. The first two, LOAD and SAVE, control the data exchange with disk or hard disk. The command SHOW permits a dynamic display of all the positional states in the editing window as a series. The command TOGG provides a toggle key influencing the mode of cursor movement. It switches between the "normal" mode (i.e., automatic advance after a digit entry) and the "explicit" mode (i.e., the cursor rests on a position until moved by the arrow keys on the cursor pad). QUIT ends the session. The display window in the middle of the screen shows the wire-frame model of the head and a 2-D overlay of a static body, which has been generated with a graphics scanner. Whenever a number is typed in by the coder according to the Bernese coding conventions, the head is directly moved to the corresponding position. Ordinal scaling is transformed into angle degrees (the ratio of scale units to angle degrees can be interactively determined when starting the program). Entries for the sagittal, rotational, and lateral displacements are used as information for X-Y-Z rotations of all points in the object's coordinate matrix. The rotated matrix is passed to a 3-D drawing routine. The whole process takes about 250 msec on the COMPAQ 386 and about 500 msec on the IBM AT02, both with an 80287 arithmetic coprocessor, if an EGA graphics adapter is used in the high-resolution color mode. The calculation without drawing takes only 100 msec on the COMPAQ 386. A faster monochrome graphics adapter (e.g., HGC) can speed up the process remarkably.

Program No. 1 is mainly used for feedback during time-series notation of head movements, enabling the coder to check the validity of an assigned label by comparing the induced movement of the wire-frame model with the original's activity on the video screen.
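The pipeline from ordinal code to rotated coordinate matrix can be sketched as follows. The odd/even sign convention, the 15-degrees-per-unit ratio (set interactively in the actual program), and the assignment of the three head dimensions to the X, Y, and Z axes are all assumptions made for illustration, not details confirmed by the text.

```python
# Sketch of the Program No. 1 pipeline: an ordinal position code is turned
# into a signed deviation from the neutral position, scaled to degrees, and
# the three resulting angles drive X-Y-Z rotations of every point in the
# head model's coordinate matrix. Sign convention, degrees-per-unit ratio,
# and axis assignment are assumptions for illustration.

import math

def code_to_degrees(code, degrees_per_unit=15.0):
    """Assumed convention: 0 is neutral, odd codes flex one way, even the other."""
    if code == 0:
        return 0.0
    signed = (code + 1) // 2 if code % 2 else -(code // 2)
    return signed * degrees_per_unit

def rotation_matrix(sagittal_deg, rotational_deg, lateral_deg):
    """Compose X (sagittal), Y (rotational), and Z (lateral) rotations."""
    ax, ay, az = (math.radians(a) for a in (sagittal_deg, rotational_deg, lateral_deg))
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    def mul(m, n):
        return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    return mul(rz, mul(ry, rx))

def rotate_model(points, codes):
    """Apply the rotation derived from three Bernese head codes to all points."""
    r = rotation_matrix(*(code_to_degrees(c) for c in codes))
    return [[sum(r[i][k] * p[k] for k in range(3)) for i in range(3)]
            for p in points]
```

The rotated point list would then be handed to a projection and drawing routine, which is where the bulk of the 250-500 msec reported above is spent.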
Since the animation window can be split to show two independent heads, it can also be used for standardized coder training, with one head performing predefined movements while the second has to be matched by the coder's numerical entries.

Program No. 2 uses an off-line animation logic. It consists of two independent modules. The first module performs the calculation, drawing, and storing of the frames that are to be used in the animation process. The second loads these pictures from hard disk following the script of a time-series protocol and displays them on the screen. Because the display time is independent of the duration of the calculation routine, the 3-D model can have any complexity. Because the pictures are not "drawn" line by line in the animation phase, but rather "shown" by fast graphical block operations, no flickering occurs, and movement appears smooth and continuous. Frames, however, require mass storage. Working, for example, with a resolution of 7 positions per head dimension, 343 (7 x 7 x 7) frames have to be stored. Using a display window size of 3 x 3 in. and the color EGA mode, this whole picture database requires about 3 MB on a hard disk. Refining the resolution or adding other body parts would increase this number exponentially. However, since this

program is used especially for the reanimation of completed protocols, the frame base can be created anew for each data set, taking into account only those configurations that actually occur. Thus the number of frames need be only as large as the number of observations, even when using a high resolution and coding all body parts. Standard situations in well-designed experiments based on the Bernese system last about 3 min. Using a coding interval of 0.5 sec, the maximum number of frames would be 360. Since the visualization of the frames via block moves can be sped up to 25 pictures/sec (monochrome), the dynamic display timing depends only on the access time of the mass storage. Fast hard-disk drives (mean access time < 40 msec) can be used directly when working with display intervals of 0.5 sec. Smoother animations with "in-betweening" of the original codes would require the installation of RAM disks or special hard-disk controllers.

Figure 4 shows a hard copy of the program's display during an animation phase. In this example, dyadic interactions between doctors and patients in psychotherapeutic sessions were analyzed, focusing on the effects of different therapeutic interventions on the nonverbal behavior of the patient. The left model represents the doctor, the right one the patient. As in the original video document, the interlocutors are both shown in a direct, frontal view, in a split-screen mode. At the bottom of each window, three digits are displayed, which represent the Bernese code for the head dimensions sagittal, rotational, and lateral. In the middle of this line, a timer is faded in, which can be synchronized with the video timer on the tape recordings. Thus, each phenomenon can easily be traced back to the original material. The animation can be stopped in any phase, and the dynamic display can be moved back and forth interactively to study specific movement patterns.
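The frame-base logic can be sketched as a cache keyed by position configuration: each distinct code combination is rendered once, so the base never exceeds the number of observations. The `render_frame` callback below is a hypothetical stand-in for the drawing-and-storing step.

```python
# Sketch of the frame-base idea for off-line reanimation: rather than
# precompute every head configuration (7 x 7 x 7 = 343 frames at 7 scale
# units per dimension), only configurations that actually occur in a
# protocol are rendered and stored. render_frame is a hypothetical
# stand-in for the drawing and block-store step.

def build_frame_base(protocol, render_frame):
    """Render each distinct (sagittal, rotational, lateral) code tuple once."""
    frame_base = {}
    for position in protocol:
        key = tuple(position)
        if key not in frame_base:        # render each configuration only once
            frame_base[key] = render_frame(key)
    return frame_base

# A 3-min session coded at 0.5-sec intervals yields at most 360 entries,
# and usually far fewer once duplicates collapse.
protocol = [(0, 0, 0), (1, 0, 0), (1, 0, 0), (0, 0, 0)]
frames = build_frame_base(protocol, render_frame=lambda key: f"frame{key}")
```

During playback, the display module would simply look up each protocol entry in `frames` and blit the stored picture, which is why display timing depends only on storage access time.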
At present, this program is used in social perception studies, in which observers have to identify the social role and status of computer-animated interlocutors.

Program No. 3 is a buffered on-line animation routine including head and trunk. The two body parts are or-

Figure 4. Screen hard copy of a program for the reanimation of time-series protocols of dyadic interactions.
