Designing Synthetic Memory Systems for Supporting Autonomous Embodied Agent Behaviour

Christopher Peters

Abstract— We present a method for the creation of autonomous agent memory systems that support real-time social behaviours relating to perception and attention. Our memory design methodology is modular in nature: modules are connected by transformation operators that represent different forms of information processing and fall into two general categories: information filters and integrators. Filters are selective and only retain relevant information, while integrators process information into more compact and higher-level representations. We demonstrate the flexibility and feasibility of our methodology by elaborating two examples which have been implemented and shown to be successful for supporting fundamental agent social behaviours in a virtual environment.

This work has been funded by the Network of Excellence Humaine (Human-Machine Interaction Network on Emotion), IST-2002-2.3.1.6, Contract no. 507422, http://emotion-research.net/. C. Peters is with the LINC (Laboratoire d'Informatique, de genie iNdustriel et de Communication), IUT of Montreuil, University of Paris 8, 93100 Montreuil, France. [email protected]

I. INTRODUCTION

In order to behave in a human-like manner, the next generation of truly autonomous virtual humanoid agents will have to be endowed with more sophisticated internal algorithms, encompassing capabilities such as perception, emotion, attention and memory. In doing so, many lessons can be learned and models adapted from the field of robotics (see, for example, [1]), as is the case in this paper. We feel that if sensing and other processes do not parallel, even in a very broad sense, those of humans, then no matter how sophisticated those algorithms are, the resulting behaviour will not be human-like. While most agents have memory buffers of some description, these are usually defined implicitly; explicit, light-weight computational memory systems have not been widely addressed in the virtual human community. Yet, when designing an agent that must be capable of a wide range of behaviours in a complex environment, the issue of memory structuring becomes important; if processes such as attention and emotion are to be considered (processes critical for social interaction), then, in our experience, a well structured and compatible memory system becomes vital ([2], [3], [4]). In this paper, we do not present a single memory system, since such a system will often depend on the requirements of the intended application and the variety of its input. Rather, we make a first attempt at presenting a methodology for the creation of memory systems that are compatible in a general manner with computational attention and emotion systems. In this way, we hope that computational memory, emotion and attention processes can be made complementary and capable of conducting important interactions of great behavioural consequence, as is the case in the real world.

The systems created using our methodology are geared towards real-time embodied agents in virtual environment settings.

II. RELATED WORK

Of the papers concerning memory systems for embodied agents, most consider spatial memory based on visual input in order to aid navigation in virtual environments [5], [6], [7], [8], [9]. Evers and Musse [10] have used hierarchical finite state machines to provide an agent with a memory of its past behaviours. Strassner and Langer [11] present what is probably the most complete model of virtual human memory so far. As well as representing spatial, semantic and situated knowledge, they allow for the representation of multiple levels of knowledge detail. The work in this paper focuses on event-based memory that corresponds in a rough manner to the contents of Tulving's episodic memory [12], with the difference that we do not restrict the information to long-term memory.

III. DESIGN CONCEPTS AND COMPONENTS

We highlight four important concepts that are reflected throughout our memory design methodology.
• Modular design: it should be possible for items to be partitioned in memory in order for efficient and specialised processing to take place. Different areas of memory will have different attributes, and a functioning memory system may consist of many different specialised modules that do not fit easily into traditional categories such as short-term and long-term.
• Progressive levels of filtering: as information moves through memory, it should be filtered and selected so that only relevant information is made available to planning processes. In the design section of this paper (see Section III-A), we do not deal with the selection processes themselves (i.e. attention), but instead present a system where such processes can be easily plugged in, as demonstrated in the examples in Section IV.
• Increasing levels of representation: later stages of memory should contain sparser, but more persistent, memories representing a broader amount of information over larger time intervals. This is in contrast with, for example, sensory memory, which necessarily contains a large amount of highly volatile information for the current perceptual update.
• On-demand processing: the movement of information should not necessarily have to be staged so that information movement is controlled in a bottom-up manner

(although our methodology can be used to create such systems). Instead, higher-level modules should have the ability to request operations, such as filtering and integration, on demand from their lower-level input modules, so as to be able to make the most up-to-date representations available to control processes. These concepts are of high importance in ensuring a well structured and useful real-time memory and processing system that can provide relevant information to control processes when required to do so. Next, we detail the basic components that are used to construct different memory configurations.





A. Design Components

There are three main constituents to a memory system, which can be thought of as building blocks for creating agent memory architectures. Memory entries make reference to the memorised information itself and also store management information. Collections of these entries are stored in memory modules, which specify common rules for contained entries and may also be used to segregate entries into certain categories, e.g. memory entries for facial expression may be stored in one module, entries for gaze direction in another. Finally, memory transformation operators connect memory modules and define how memory entries move between the modules.

B. Entries

Memories are represented in our system as discrete constructs encapsulated by memory entries. A memory entry encapsulates two types of information: firstly, information about the content of the memory itself, for example, the time and place at which an event occurred. Secondly, it stores meta-memory information for management purposes. This meta-information may be used, for example, for deciding when the entry should be deleted from the system. A memory entry contains six important attributes for management purposes:
• Activation and decay factor are used to determine when a memory entry should be removed or forgotten. These attributes are used with a decay function, presented in Section III-E, to provide an estimation of the endurance of a memory entry. In the model presented here, activation is the amount of time since a memory entry was last accessed and the decay factor determines how quickly a memory entry will decay once in memory; it can be thought of as the slope of memory decay.
• Trace strength is the overall indication of the current strength of a memory entry and is a function of the decay factor and the activation of the memory entry.
• Uncertainty is a value between 0.0 and 1.0 representing the completeness or totality of the agent's internal representation of the stimulus or event in question. Essentially, it is a simplified measurement representing how much is known about the stimulus, where a value of zero indicates that the agent has a full internal representation of the event.

• Emotional valence is a metric indicating the emotion associated with the memory entry and has an influence on the encoding of the entry as well as its retrieval, e.g. entries with the maximum distance from the neutral valence (value of 0) decay less rapidly than neutral-valence entries.
• Rehearsal represents the amount that an existing representation has been practised by the agent. It may be used to determine when a memory should be moved from one memory module to the next. Unlike uncertainty, rehearsal represents how much an existing memory has been practised rather than the totality of representation regarding that object. This means that an agent may rehearse memories without needing to be currently sensing them.
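As a concrete illustration of this bookkeeping, the following sketch renders a memory entry as a small Python structure. The class name, field names, numeric defaults and the use of wall-clock time are assumptions made for illustration, and the trace strength simply anticipates the exponential decay function introduced later in Section III-E; this is not the paper's implementation.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Any

@dataclass
class MemoryEntry:
    """One memorised item plus the meta-memory attributes listed above (illustrative sketch)."""
    content: Any                    # the memorised information, e.g. an observed event
    decay_factor: float = 0.2       # slope of memory decay; higher values decay faster
    uncertainty: float = 1.0        # 1.0 = nothing known, 0.0 = full internal representation
    emotional_valence: float = 0.0  # 0 = neutral; strongly valenced entries decay less rapidly
    rehearsal: float = 0.0          # how much the representation has been practised
    last_access: float = field(default_factory=time.time)

    @property
    def activation(self) -> float:
        """Activation: the amount of time since the entry was last accessed."""
        return time.time() - self.last_access

    @property
    def trace_strength(self) -> float:
        """Overall strength as a function of activation and decay factor (see Section III-E)."""
        return math.exp(-self.decay_factor * self.activation)

    def touch(self) -> None:
        """Accessing the entry resets its activation."""
        self.last_access = time.time()
```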

C. Modules

A memory system is composed of multiple interconnected modules that store memory entries. The purpose of a modular design is to enable (a) partitioning of entries according to category and (b) hierarchical design regarding the temporal aspects of storage. (a) can be used, for example, in constructing a specialised person memory, to which only information regarding other social entities in the environment will be sent. (b) can be used for hierarchical structuring of modules: those at the base of the hierarchy, related to sensing, store information for only one perceptual update, while those higher up the hierarchy may store information representing a longer duration. Each module is associated with a number of parameters defining how memory entry management takes place within that module:





• Capacity refers to the maximum number of memory entries that the module can store at any one time.
• Removal threshold is used to decide when memory entries should be removed from a memory module, i.e. forgotten. While the trace strength of a memory entry is greater than this threshold, the item remains in memory and its recall probability is presumed to be 1. Memory entries are removed from a module when their trace strength falls below the removal threshold.
• Rehearsal threshold is used, under certain circumstances, to decide when an entry should be moved from one module to another.
• Maximum duration is an indication of the maximum amount of time that memory entries can persist for in a module, regardless of their trace strength. Its primary use is to ensure that a module is cleared in time for an expected update, something of particular importance for a sensory buffer module which must take a large amount of data input at regular intervals (synchronised, for example, with the perceptual updating of the agent). Duration can be disabled so that the persistence of entries is based solely on their trace strength.
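A sketch of how these parameters might govern a module is given below. It reuses a minimal entry type with the decay of Section III-E; the class layout, the choice to displace the weakest entry when the module is full, and the numeric defaults are assumptions for illustration rather than the paper's implementation.

```python
import math
import time
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entry:                                   # minimal stand-in for a memory entry (Section III-B)
    content: object
    decay_factor: float = 0.2
    rehearsal: float = 0.0
    created: float = field(default_factory=time.time)
    last_access: float = field(default_factory=time.time)

    def trace_strength(self, now: float) -> float:
        return math.exp(-self.decay_factor * (now - self.last_access))

@dataclass
class MemoryModule:
    capacity: int = 7                          # maximum number of entries stored at any one time
    removal_threshold: float = 0.1             # entries below this trace strength are forgotten
    rehearsal_threshold: float = 3.0           # entries rehearsed past this may move onwards
    max_duration: Optional[float] = None       # None disables the duration limit
    entries: List[Entry] = field(default_factory=list)

    def add(self, entry: Entry) -> None:
        if len(self.entries) >= self.capacity:
            # module is full: displace an existing entry (here, the weakest one;
            # which entry to displace is an assumption of this sketch)
            now = time.time()
            self.entries.remove(min(self.entries, key=lambda e: e.trace_strength(now)))
        self.entries.append(entry)

    def forget(self) -> None:
        """Drop entries whose trace strength fell below the removal threshold,
        or which exceeded the module's maximum duration."""
        now = time.time()
        self.entries = [e for e in self.entries
                        if e.trace_strength(now) >= self.removal_threshold
                        and (self.max_duration is None or now - e.created <= self.max_duration)]

    def promotion_candidates(self) -> List[Entry]:
        """Entries rehearsed beyond the rehearsal threshold may move to the next module."""
        return [e for e in self.entries if e.rehearsal >= self.rehearsal_threshold]
```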

D. Transformation Operators

Memory transformation operators connect modules together and determine how entries move between two modules, referred to as the source and destination modules. There are two general transformation operators:
• The Filter operator is selective: it only allows certain categories of objects to move from the source module to the destination. A filter operator is analogous to some form of attention selection mechanism, the exact implementation of which may be unique to that particular operator (see [2] for an example attention system).
• The Integrate operator amalgamates a group of memory entries from the source module in order to create a single entry in the destination. This operator is important in providing higher levels of representation between interconnected modules. An example use of this operator would be to integrate all of the entries in a source module relating to perceived gaze direction from time t1 to t3 in order to add a single event (e.g. "other looked at me") into the destination module.
Both of these operators are vital for the evolution of memory from a large amount of low-level sensory data at the input to a sparser set of higher-level representations, indicative of events over a larger timescale, for use by planning processes.
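The following sketch shows how the two operators might be expressed over plain lists of entries. The predicate and combine functions stand in for an attention mechanism and an integration rule respectively, and the apply() interface is an assumption of this sketch, not the paper's API.

```python
from typing import Any, Callable, Iterable, List

class FilterOperator:
    """Selective: copies only entries that pass the predicate from source to destination."""
    def __init__(self, source: List[Any], destination: List[Any],
                 predicate: Callable[[Any], bool]):
        self.source, self.destination, self.predicate = source, destination, predicate

    def apply(self) -> None:
        self.destination.extend(e for e in self.source if self.predicate(e))

class IntegrateOperator:
    """Amalgamates the current source entries into a single higher-level destination entry."""
    def __init__(self, source: List[Any], destination: List[Any],
                 combine: Callable[[Iterable[Any]], Any]):
        self.source, self.destination, self.combine = source, destination, combine

    def apply(self) -> None:
        if self.source:
            self.destination.append(self.combine(self.source))

# Example: integrating a run of perceived gaze directions into one event.
gaze_directions = ["towards me", "towards me", "towards me"]
events: List[str] = []
IntegrateOperator(gaze_directions, events,
                  lambda es: "other looked at me"
                  if all(e == "towards me" for e in es) else "other looked away").apply()
print(events)  # ['other looked at me']
```

Because both operators are invoked explicitly through apply(), a higher-level module could trigger them whenever it needs refreshed representations, matching the on-demand processing concept of Section III.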

Fig. 1. Diagram illustrating the use of the design system to create the stage theory of memory. Information enters the STSS from the synthetic vision module. Objects are filtered from the STSS into the STM. From there, a select group may be further filtered into the LTM, where objects persist for a longer period of time. In the diagram, boxes are modules, arrows are processes operating on the stored data and circles contain memory transformation operators (in this case, all operators are filters).


E. Forgetting

This work simplifies many issues regarding forgetting: first of all, if an item is in memory at all, then it is presumed that the probability of recalling it is always 1. That is, an item that is in memory will always be successfully recalled. In this way, trace decay is used only to determine when a memory entry should be forgotten, as opposed to how successfully information is recalled and to what extent. Memory entries are presumed to decay according to an exponential decay function, as used in the short-term store of Atkinson and Shiffrin's buffer model [13]. An entry is removed when its trace strength falls below the module's removal threshold. The exponential function is written as:

Y = Ae^(-BX), B > 0    (1)

where Y is the recall probability (used here as the trace strength of the entry), X is the retention interval and A and B are constants that define the shape of the curve.
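A minimal sketch of Eq. (1) as used for forgetting follows; the function and parameter names are illustrative assumptions.

```python
import math

def trace_strength(initial_strength: float, decay_factor: float, retention_interval: float) -> float:
    """Eq. (1): Y = A * exp(-B * X), with A = initial_strength, B = decay_factor, X = retention interval."""
    return initial_strength * math.exp(-decay_factor * retention_interval)

def is_forgotten(strength: float, removal_threshold: float) -> bool:
    """An entry is removed once its trace strength falls below the module's removal threshold."""
    return strength < removal_threshold

# With A = 1.0 and B = 0.2, the trace falls below a threshold of 0.1 after roughly
# 11.5 time units, since exp(-0.2 * 11.5) is approximately 0.1.
print(is_forgotten(trace_strength(1.0, 0.2, 5.0), 0.1))    # False
print(is_forgotten(trace_strength(1.0, 0.2, 12.0), 0.1))   # True
```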

IV. DESIGN AND APPLICATION EXAMPLES

In this section, we describe how two models involving memory and perception have been implemented using the design components presented in Section III-A. In both of these examples, we use a synthetic vision module that provides sensory input into the first module of memory, a buffer analogous to visual sensory memory. The synthetic vision module is updated in a snapshot manner and outputs a list of object identifiers for those scene objects visible to the agent. Our implementation of the Atkinson and Shiffrin model provides a good example of filtering operations between stages, while our Baron-Cohen implementation is a good example of the integration of memories between memory stages to provide higher levels of representation.

A. Atkinson and Shiffrin's Modal Model

Atkinson and Shiffrin's stage theory of memory [13] has been particularly influential and, although it has been superseded by more detailed and accurate models of human memory, it still proves to be a popular and useful high-level framework, something of great utility in creating real-time autonomous character memory. According to stage theory, memory is composed of a number of separate memory stores. Various cognitive processes act on the items in each stage of memory, causing them either to be moved on to the next stage or to be forgotten. Environmental input arrives at a transient sensory register, called the short-term sensory storage, or STSS. The STSS contains memory entries relating to objects that are in the agent's view-frustum and that are not totally occluded by other objects. Selective attention is applied to items in this area, causing a number of them to be transferred into the next stage, called short-term memory, or STM. The STM is limited in both duration and capacity and relates to our thoughts at any time. Items are forgotten here due to both trace decay and capacity limitations. If items in short-term memory are rehearsed enough times, then they may enter the final stage of memory, long-term memory (LTM). This provides storage for information over a long period of time and its capacity is thought to be infinite.

1) Event Representation: In this example, we restrict the definition of an event to the act of perceiving an object in the environment. There are two types of memory entries, representing two different levels of elaboration of perceived stimuli. The first form of memory entry is called a proximal stimulus and stores only information derived from the visual rendering of the scene: no information is retrieved yet from the object database about object type, geometry and so on (see Table I). Memory entries may also contain more detailed information about stimuli. Such observations (Table II) elaborate on proximal stimulus information by also storing information from the scene database about the object at the time it was perceived, and provide a reference into the scene database in order to allow access to object type information. These are more complete representations of stimuli.

TABLE I
THE PROXIMAL STIMULUS STRUCTURE

Member          | Meaning
false-colour    | The stimulus false colour
bounding box    | 2D bounding box of the stimulus
size            | Size of the stimulus in pixels
average colour  | The average colour of the stimulus

TABLE II
THE OBSERVATION STRUCTURE

Member             | Meaning
object ID          | Unique identifier of the object
object             | Pointer to the object in the scene database
wsTransform        | World-space transformation of the object
time               | Time at which the observation was made
proximal stimulus  | Perceived stimulus information
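Read as data structures, Tables I and II might look as follows; the field names follow the tables, while the concrete types are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass
class ProximalStimulus:
    """Table I: information derived purely from the visual rendering of the scene."""
    false_colour: Tuple[int, int, int]        # the stimulus false colour
    bounding_box: Tuple[int, int, int, int]   # 2D bounding box of the stimulus
    size: int                                 # size of the stimulus in pixels
    average_colour: Tuple[int, int, int]      # the average colour of the stimulus

@dataclass
class Observation:
    """Table II: elaborates a proximal stimulus with scene-database information."""
    object_id: int                       # unique identifier of the object
    scene_object: Any                    # reference to the object in the scene database
    ws_transform: Any                    # world-space transformation of the object
    time: float                          # time at which the observation was made
    proximal_stimulus: ProximalStimulus  # perceived stimulus information
```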

2) Model Overview and Dynamics: The short-term sensory storage, or STSS, is the first storage stage for stimuli that have passed through initial visual filtering (based on the field of view of the agent). Operationally, the STSS consists of a list of proximal stimuli. These proximal stimuli do not persist for very long between visual system updates, so objects in the STSS correspond to those currently in view, or in view very recently. The entries in the STSS correspond only to the 2D optical images deemed to be on the retina of the agent (see Fig. 1). Memory entries are removed, or forgotten, from the STM under two conditions: they are displaced by newer memories when the STM is full, or they decay over time. Rehearsal occurs when attention is paid to a specific object over a period of time and is necessary for an entry to remain in the STM as well as to move into the next memory stage. The rate of decay of information in the STM is lower than in the STSS. Because the STM relates to our awareness of an object, memory entry rehearsal may only take place when an entry is in the STM. Similarly, object uncertainty may only be reduced when the object is in both the STSS, i.e. being looked at, and the STM, i.e. also being attended to. For the purposes of the current model, memories that can no longer be retrieved due to lack of sufficient cues are considered to be the same as decayed entries. Items are presumed to move from the STM to the LTM when they have been rehearsed to a sufficient degree. Unlike the other memory stores, there are no strict upper bounds on trace strength in the LTM. If an item is frequently rehearsed in the STM, then it can remain easily accessible in the LTM.


Depending on whether an object is represented in the STSS, STM and/or LTM, different processes may occur. We highlight a few of these in order to demonstrate the dynamics of memory:





• A memory entry that exists only in the STSS signifies that an object was recently within the agent's view field, but is not being attended to and has not been rehearsed before: the trace strength of the memory entry in the STSS therefore decays.
• When an item is present both in the STSS and in the STM, it is or was recently in the view of the agent and is also being attended to. In this case, the trace strength and rehearsal values for the entry in the STM are increased. The uncertainty value for the entry is also decreased in the STM to represent new incoming sensory information that contributes to an improved internal representation of the item.
• When an entry is present in the STM but the object is no longer represented in the STSS, uncertainty is not reduced since no new sensory information is available.
• Memory entries that exist only in the LTM are subject to decay according to the decay function. Rehearsal and uncertainty parameters do not change, since the agent is neither receiving new information about the object from the STSS nor paying attention to it through the STM.
• When memories exist for an object in all three stages, the entry's trace strength and rehearsal are increased and object uncertainty is reduced. This case represents a scenario where the agent is looking at or has recently looked at a stimulus, is paying attention to it, and has also previously rehearsed and attended to it.
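The cases above can be collected into a single per-update rule. The sketch below is a paraphrase over a plain dictionary; the increment sizes are arbitrary placeholders rather than the values used in the implementation.

```python
def update_entry(in_stss: bool, in_stm: bool, in_ltm: bool, entry: dict) -> None:
    """Apply the stage-dependent dynamics to one object's entry
    ('trace', 'rehearsal' and 'uncertainty' are illustrative keys)."""
    if in_stss and in_stm:
        # in view (STSS) and attended to (STM): strengthen, rehearse, reduce uncertainty
        entry["trace"] += 0.1
        entry["rehearsal"] += 1.0
        entry["uncertainty"] = max(0.0, entry["uncertainty"] - 0.1)
    elif in_stm:
        # attended to but no longer in view: no new sensory information,
        # so uncertainty is not reduced (rehearsal may still continue)
        entry["rehearsal"] += 1.0
    elif in_stss:
        # seen but not attended to or rehearsed: the STSS trace simply decays
        entry["trace"] -= 0.1
    elif in_ltm:
        # only in long-term memory: decays according to the decay function;
        # rehearsal and uncertainty are unchanged
        entry["trace"] -= 0.01

entry = {"trace": 0.5, "rehearsal": 0.0, "uncertainty": 1.0}
update_entry(in_stss=True, in_stm=True, in_ltm=False, entry=entry)
print(entry)  # trace and rehearsal increased, uncertainty decreased
```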

3) Application to Visual Attention Modelling: Visual attention and its associated gaze behaviour are important for social agents: they help to convey a sense of presence and add to the sense of realism of an agent, since most intelligent living creatures tend to have eyes that may be directed. An important factor in visual attention models is inhibition of return (IOR), where previous foci of attention may be suppressed so as to allow attention to focus on new locations of potential importance. This can be troublesome for spatial attention models operating in a dynamic environment where the viewpoint can change. We have used an object-based memory designed around Atkinson and Shiffrin's modal model in order to facilitate IOR for a bottom-up visual attention system. The basic attention system directs the gaze of the agent based on image processing techniques that simulate early visual processing [14] in order to calculate regions of pop-out in the scene, encoded as a 2D saliency map. Memory is an important addition to this model, as it allows the agent to keep track of which objects it has looked at and to what extent. We use the object uncertainty factor (Section III-B) to keep track of how much each object has been attended to: the uncertainty of an object falls when it is present in both the STSS and STM, which can only

occur in our model when the object is being attended to. A retinotopic memory map is created from the uncertainty factors of all objects in the visual scene and is merged with the saliency map in order to create a final attention map, which drives the gaze behaviour of the agent. Thus, where an agent looks in the scene is based on those areas that have both high uncertainty and high salience: when the agent looks at an object for long enough, its uncertainty drops, modulating the attention map and causing a switch of attention to the next most salient and uncertain region of the scene.

B. Baron-Cohen's Theory of Mind Model

Theory of mind research is concerned with the mechanisms involved in creating theories of the beliefs, goals and intentions of others. Baron-Cohen [15] has proposed a theory of mind model that relies heavily on perception and emphasises the evolutionary importance of gaze detection (the model has been extended by [16]; we include their extensions in our model). The main components of the model include an intentionality detector for detecting objects in the environment that are deemed to be moving under their own volition; direction of attention detectors that search for eyes in the environment and establish their direction, as well as the direction of the head, body and locomotion with respect to the agent; a mutual attention mechanism for detecting eye contact between the observed and the agent; and a theory of mind module for storing higher-level theories about the intentions of the other based on interpretation of their behaviour.

1) Event Representation: Events in this scenario consist, at the lowest level of representation, of observations of the eye, head, body and locomotion directions of other agents. The next level of representation is an amalgamation of these into a single metric that corresponds to the perceived attention level at each perceptual update. The highest level of representation is a metric that encapsulates the perceived attention levels over a time interval.

2) Model Overview and Dynamics: Agents are partitioned into eye, head and body segments. As in the previous example, a short-term sensory storage is linked to synthetic vision updates. The sensory storage module is connected to a person memory by a filter operator which allows only those objects relating to other agents to pass into subsequent modules. This filter operator could be viewed as implementing the intentionality detector (ID). The first person memory module is called the intentionality detector storage, or IDS, and is a filtered version of the STSS containing only those objects that are parts of agent objects. It is cleared and updated with each perceptual snapshot and is connected to the direction of attention detector storage, or DADS: this has a relatively short-term duration, storing entries over the previous 10 seconds. It also transforms entries that arrive from the IDS by an integrate operator: in this case, by first calculating the direction in which the other is oriented with respect to the self and then integrating these relative directions into a metric called the attention level. Thus, the memory transformation operator implements the process of the DAD in Baron-Cohen's model.
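As a rough illustration of the DADS integration step, the sketch below combines the relative orientations of the observed agent's segments into a single attention level in [0, 1]. The use of cosine similarity and the particular segment weights are assumptions of this sketch, not the formula used in the implementation.

```python
from typing import Tuple

Vec2 = Tuple[float, float]

def attention_level(eye: Vec2, head: Vec2, body: Vec2, locomotion: Vec2, to_self: Vec2,
                    weights: Tuple[float, float, float, float] = (0.5, 0.25, 0.15, 0.1)) -> float:
    """Integrate one perceptual update's relative directions into a scalar attention level.
    Each argument is a 2D unit vector; a segment oriented towards the observer contributes
    its weight, with the eyes weighted most heavily as the strongest attention cue."""
    def towards(direction: Vec2) -> float:
        # cosine similarity between the segment's facing direction and the
        # direction from the other agent towards the observing agent
        return max(0.0, direction[0] * to_self[0] + direction[1] * to_self[1])

    return sum(w * towards(d) for w, d in zip(weights, (eye, head, body, locomotion)))

# Eyes and head face the observer, body and locomotion point away:
print(attention_level((1, 0), (1, 0), (-1, 0), (-1, 0), to_self=(1, 0)))  # 0.75
```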

Fig. 2. Diagram of the memory design for supporting agent theory of mind for interaction initiation in virtual environments. Information enters the STSS from the synthetic vision module, which is itself a filter since it considers only objects in the agent's visual field. It is filtered in order to extract agent-related entities into the IDS for further processing. The IDS information is integrated in order to provide a higher-level representation in the DADS. This data is integrated over a time interval and stored as a high-level representation in the ToMM.

Fig. 3. Diagram illustrating higher levels of representation. On the left, sensory information regarding the eyes, head and body is filtered from sensory memory into the IDS, which accumulates the information for times t0, t1 and t2. At each time, the information is integrated into an attention level and stored in the DADS module. Integration of the attention levels over a time interval (in this case, t0 → t2) provides the interest metric, stored in the ToMM module.

Attention level is an example of a higher-level representation, since it amalgamates several previous entries into one new entry. There is now one attention level associated with the configuration of an observed agent at each perceptual update. In order to derive a higher-level representation that considers the motion of the other over a time interval, the DADS is connected to the final module, the ToMM, by an integrate operator. This operation integrates, on demand, multiple attention levels over a time interval into a single metric called the level of interest. The ToMM has a long-term duration and contains the highest-level representations: in this example, the level of interest is a compact, persistent representation of the directions of multiple segments of a perceived agent over a time interval.

3) Application to Social Memory of Gaze Direction: When designing social agents, an important factor is the perception of the direction of attention of others. If one intends to start an interaction, the amount of attention that the other is paying can be very important when making a decision as to whether they wish to interact or not. For example, the other may look away as a signal that they have no interest in interacting. When considering the amount of interest that the other is deemed to have, one must account for their attentive behaviours over a period of time, as consideration of their segment orientations at only a single time frame may be misleading. Because of the potentially large amount of sensory data in the environment, this model first filters the data so as to partition memory for separate social processing. Since there may be a lot of low-level sensory data, the data does not persist for long in the system, but is instead integrated into higher-level representations that persist for longer in subsequent memory modules.
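A minimal sketch of the on-demand ToMM integration follows: attention levels stored in the DADS over a time interval are collapsed into a single level-of-interest entry. Averaging is an assumed choice of integration function; the paper does not prescribe one.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class InterestEntry:
    """Highest-level representation stored in the ToMM module."""
    start: float
    end: float
    level_of_interest: float

def integrate_interest(dads: List[Tuple[float, float]], start: float, end: float) -> InterestEntry:
    """On demand, integrate the (timestamp, attention_level) samples within [start, end]
    into one persistent level-of-interest entry."""
    samples = [level for t, level in dads if start <= t <= end]
    mean = sum(samples) / len(samples) if samples else 0.0
    return InterestEntry(start, end, mean)

# Attention levels recorded at t0, t1 and t2 as the other agent gradually turns towards us:
dads = [(0.0, 0.2), (1.0, 0.6), (2.0, 0.9)]
print(integrate_interest(dads, 0.0, 2.0))  # level_of_interest is roughly 0.57
```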

Fig. 4. Social memory for gaze direction in a virtual environment. Memory for eye, head and body directions is shown in the bottom left corner.

V. CONCLUSIONS AND FUTURE WORK

A. Conclusions

We have presented a design methodology for the construction of real-time autonomous agent memory systems. Key features of the methodology include successive filtering and integration between memory modules in order to provide sparser, higher-level representations from large amounts of low-level data. We have demonstrated the use of this methodology in designing two fully implemented memory systems: the first as a modulator for a computational attention model, ensuring that characters continue to explore new parts of the scene, and the second as a way of allowing a character to have a social memory of the direction of attention behaviours of other agents in the environment in order to interpret their attention and interest.

B. Future Work

An important addition to the system presented here is an associative network for storing the semantics of, and relationships between, object, event and action types. At higher levels of representation, object instances could be bound into the associative network to provide semantic and relational information. The model presented here would fit in well with such a system as a dynamic memory area, with entries having links to the more static knowledge base or ontology. Another important part of the memory system is attention, which is complementary; for example, memory can be used to modulate the focus of attention, while attention, in turn, can be used as a control process for selecting entries to filter, rehearse or integrate.

VI. ACKNOWLEDGMENTS

The author gratefully acknowledges the reviewers' helpful comments. This work has been funded by the Network of Excellence Humaine, IST-2002-2.3.1.6, Contract no. 507422, http://emotion-research.net/

REFERENCES

[1] B. Scassellati, Investigating models of social development using a humanoid robot, in Biorobotics, ed. Barbara Webb and Thomas Consi, M.I.T. Press, 2000.
[2] C. Peters and C. O'Sullivan, Bottom-up Visual Attention for Virtual Human Animation, Proceedings of Computer Animation for Social Agents (CASA) 2003, Rutgers University, May 2003.
[3] C. Peters, Direction of Attention Perception for Conversation Initiation in Virtual Environments, Proceedings of Intelligent Virtual Agents (IVA) 2005, Kos, Greece, pp. 215-228, September 2005.
[4] C. Peters, C. Pelachaud, E. Bevacqua, M. Mancini and I. Poggi, A Model of Attention and Interest Using Gaze Behavior, Proceedings of Intelligent Virtual Agents (IVA) 2005, Kos, Greece, pp. 229-240, September 2005.
[5] H. Noser, O. Renault, D. Thalmann and N. Thalmann, Navigation for digital actors based on synthetic vision, memory, and learning, Computers and Graphics, vol. 19(1), 1995, pp. 7-19.
[6] M. Lozano, R. Lucia, F. Barber, F. Grimaldo, A. Lucas and A.F. Bisquerra, An efficient synthetic vision system for 3D multi-character systems, in Proceedings of Intelligent Virtual Agents, 4th International Workshop, IVA, Kloster Irsee, Germany, September 2003, pp. 356-357.
[7] R. Thomas and S. Donikian, A model of hierarchical cognitive map and human-like memory designed for reactive and planned navigation, Proceedings of the 4th International Space Syntax Symposium, London, 2003.
[8] J. Kuffner and J.-C. Latombe, Fast Synthetic Vision, Memory, and Learning for Virtual Humans, in Proceedings of CA '99: IEEE International Conference on Computer Animation, Geneva, Switzerland, May 1999.
[9] D. Isla and B. Blumberg, Object persistence for synthetic creatures, Proceedings of the 1st International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 1356-1363, 2002.
[10] T. Evers and S. Musse, Building artificial memory to autonomous agents using dynamic and hierarchical finite state machine, Computer Animation 2002, 2002, pp. 164-170, IEEE Press.
[11] J. Strassner and M. Langer, Virtual humans with personalized perception and dynamic levels of knowledge, Computer Animation and Virtual Worlds (CASA 2005 Special Issue), 16 (3-4), pp. 331-342, 2005.
[12] E. Tulving, Episodic and Semantic Memory, in Organization of Memory, New York: Academic Press, pp. 381-403, 1972.
[13] R. Atkinson and R. Shiffrin, Human Memory: A proposed system and its control processes, in K.W. Spence and J.T. Spence (Eds.), The Psychology of Learning and Motivation: Advances in Research and Theory, Vol. 2, New York: Academic Press, 1968.
[14] L. Itti, C. Koch and E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 20, No. 11, pp. 1254-1259, 1998.
[15] S. Baron-Cohen, How to build a baby that can read minds: cognitive mechanisms in mind reading, Cahiers de Psychologie Cognitive, 13 (1994), pp. 513-552.
[16] D.I. Perrett and N.J. Emery, Understanding the intentions of others from visual signals: neurophysiological evidence, Current Psychology of Cognition, 13 (1994), pp. 683-694.