A STRUCTURED NETWORK ARCHITECTURE FOR ADAPTIVE LANGUAGE ACQUISITION

Laura G. Miller and Allen L. Gorin

AT&T Bell Laboratories, Murray Hill, New Jersey 07974

ABSTRACT

In this paper we report on progress in understanding how to build devices which adaptively acquire the language for their task. The generic device is an information-theoretic connectionist network embedded in a feedback control system. We investigate the capability of the network to learn associations between messages and meaningful responses to them as a task increases in size and complexity. Specifically, we consider how one might reflect task structure in the network architecture in order to provide improved generalization capability in language acquisition. We propose a product network which provides improved generalization by factoring the associations between words and actions through semantic primitives. The product network is being evaluated in several experimental systems, including a 1000-action Almanac data retrieval system. We describe these systems and provide details on two preliminary experiments.

1. INTRODUCTION

At present, automatic speech recognition technology is based upon constructing models of the various levels of linguistic structure assumed to compose spoken language. These models are either constructed manually or automatically trained by example. A major impediment is the cost, or even the feasibility, of producing models of sufficient fidelity to enable the desired level of performance. The proposed alternative is to build a device capable of acquiring the necessary linguistic skills during the course of performing its task. The basic principles and mechanisms underlying this research program were detailed in [1]. For completeness, we briefly review them here.

A first principle is that the primary function of language is to communicate. A consequence of this principle is that language acquisition involves gaining the capability of decoding the message, i.e., of extracting the intended meaning. This first principle underlies an investigation of a language acquisition mechanism based on connectionist methods, in which the network builds associations between input stimuli and meaningful responses to them.

A second principle is that language is acquired by interacting with a complex environment. A consequence of this principle is that the interaction involves feedback as to the appropriateness of a device’s response to a particular

input. This second principle underlies an investigation of a mechanism for human-machine interaction based on control-theory methods, where the system's input is a message and the error signal is a measure of the appropriateness of the device's response.

The generic mechanism inspired by these two principles is an information-theoretic connectionist network embedded in a feedback control system. The connections are strengthened or weakened depending upon feedback as to the appropriateness of the device's response to input stimuli. We are satisfied that the device understands if it learns to respond appropriately to our commands. The basic network architecture is described in [1]; its novel feature is that the connection weights are defined to be the mutual information between words and actions. The basic architecture can be extended to include an intermediate layer of phrase detector nodes or linguistic non-terminals [1,2]. The network is embedded in a feedback control system as shown in Figure 1, where the semantic-level error signal is supplied by the user. In this work, the error signal is quantized to a single bit, i.e., the action is either acceptable or not.

The utility of these principles and mechanisms was first evaluated in the text-based 3-action Inward Call Manager. Those experiments are described in detail in [1]. Several extensions of this work are currently being investigated [2,3,4,5].

In this paper, we investigate the capability of the proposed network to learn the associations between messages and meaningful responses to them as the task grows in size and complexity. Specifically, we consider how one might reflect task structure in the network architecture in order to provide improved generalization capability in language acquisition. We propose a product network, which provides improved generalization by factoring the associations between words and actions through semantic primitives. The product network is evaluated in several experimental systems, including a 1000-action Almanac data retrieval application. We describe these systems and provide results from preliminary experiments on a 15-action and a 100-action system.
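Before turning to larger tasks, the following minimal sketch illustrates the information-theoretic weighting just described. It is our own illustration, not the authors' implementation: the add-one smoothing, the log-prior term in the activation, and all class and method names are assumptions.

import math
from collections import defaultdict

class InfoTheoreticNet:
    # Sketch of an information-theoretic network: each connection weight is an
    # estimate of I(word; action) = log [ P(action | word) / P(action) ].

    def __init__(self, actions):
        self.actions = list(actions)
        self.word_action = defaultdict(lambda: defaultdict(int))  # count(word, action)
        self.action_count = defaultdict(int)                      # count(action)
        self.total = 0                                            # total training inputs

    def weight(self, word, action):
        # Add-one smoothed estimate of log [ P(action | word) / P(action) ].
        cwa = self.word_action[word][action] + 1
        cw = sum(self.word_action[word].values()) + len(self.actions)
        ca = self.action_count[action] + 1
        n = self.total + len(self.actions)
        return math.log((cwa / cw) / (ca / n))

    def activations(self, words):
        # Activation of an action: log prior plus the summed weights of the words present.
        n = self.total + len(self.actions)
        return {a: math.log((self.action_count[a] + 1) / n)
                   + sum(self.weight(w, a) for w in words)
                for a in self.actions}

    def reinforce(self, words, action):
        # A positive (single-bit) error signal strengthens the word/action connections.
        for w in words:
            self.word_action[w][action] += 1
        self.action_count[action] += 1
        self.total += 1

In this sketch, the device "understands" a command when the most active action is the appropriate response, and a positive single-bit error signal drives reinforce().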


2. LARGE AND COMPLEX TASKS

As a task increases in complexity, so does the mapping from message to meaning. It is then reasonable to question the capability of a proposed network to learn such complex mappings. A body of theory addressing such issues exists [6]; however, the results tend to be asymptotic in nature, requiring large numbers of examples for learning to occur. In contrast, a striking feature of human language acquisition is our ability to make sweeping generalizations from small numbers of observations. A homogeneous network architecture might, given sufficient data, be capable of learning the associations between messages and meaningful responses to them in large and complex tasks. However, we consider an alternative, namely to reflect our knowledge of the task and language structure in the network architecture. We thus propose a third principle, that a language acquisition device should be well-matched to its environment and I/O periphery, as measured by its ability to rapidly adapt and generalize. Given a device whose range of actions can be formally described, this third principle leads us to investigate methods to reflect the structure of that action space in the network architecture.

3. PRODUCT NETWORK

We consider tasks where the set of actions the device can execute is specified by n parameters. We view the device's selection of each parameter as a hidden subaction, which we refer to as a semantic primitive. The semantic primitives combine to yield the set of possible semantic actions. We propose a product network for these multidimensional tasks. The product network comprises n networks, one for each semantic primitive. The semantic actions, which correspond to n-tuples of semantic primitives, are represented via an n-fold Cartesian product of semantic primitives. The activation vector for the semantic action set is acquired by taking the outer product of the n activation vectors. The product network architecture reflects task structure and provides improved generalization capability as the associations between words and actions are factored through the semantic primitives.
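As an illustration of the combination step for n = 2, the following sketch (our own, with nonnegative per-primitive scores assumed) forms the action scores as the outer product of the per-primitive activation vectors and selects the most active semantic action:

import numpy as np

def combine(attribute_scores, object_scores):
    # Outer product yields an I x J matrix of scores for all (attribute, object)
    # pairs; the selected semantic action is the highest-scoring entry.
    action_matrix = np.outer(attribute_scores, object_scores)
    i, j = np.unravel_index(np.argmax(action_matrix), action_matrix.shape)
    return action_matrix, (i, j)

# Example: 5 attributes and 3 objects yield 15 candidate semantic actions.
attribute_scores = np.array([0.10, 0.70, 0.05, 0.10, 0.05])
object_scores = np.array([0.20, 0.60, 0.20])
scores, (best_attribute, best_object) = combine(attribute_scores, object_scores)

Because each word contributes to a primitive score rather than to every individual action, evidence learned for an attribute in the context of one object transfers automatically to that attribute paired with any other object.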

4. EXPERIMENTAL SYSTEMS

We have investigated applications where there are two independent semantic primitives, requiring a two-parameter product network. In particular, consider data retrieval applications for queries on attributes of objects. The semantic primitives comprising the semantic actions are the selection of the attribute and the selection of the object. The semantic primitives combine to form the set of semantic actions, denoted (ATTRIBUTE_i, OBJECT_j), where 1 ≤ i ≤ I, I being the number of attributes, and 1 ≤ j ≤ J, J being the number of objects. The semantic action (ATTRIBUTE_i, OBJECT_j) represents the device responding to a query on attribute i of object j.

Our first experiment involved a data retrieval application based on a subset of the DARPA Resource Management Task [5,7]. The system responds to queries on 5 attributes of 3 ships, resulting in 15 semantic actions. A typical query is "What is the fuel level of the Kirk?" Our second experiment involves an Almanac data retrieval application which responds to queries on 20 attributes of the 50 states, resulting in 1000 semantic actions. A typical query is "Who's currently the governor of Iowa?" The product network for Almanac is shown in Figure 2, where the activation matrix is the outer product of the individual activation vectors.
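To make the generalization argument concrete with our own arithmetic (an illustration, not a figure reported in the paper): the flat Almanac action set contains I × J = 20 × 50 = 1000 semantic actions, whereas the two sub-networks of the product network maintain only I + J = 20 + 50 = 70 primitive outputs, so each vocabulary word needs associations to 70 targets rather than 1000.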

Three sample conversations with the Almanac system follow, where S denotes the system and U the user. A conversation begins with a user query and ends when the user and the system converge on the semantic action associated with the query. If the conversation seems to be diverging, it can be terminated and the user connected to a human.
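A minimal sketch of this dialogue loop follows. It is our own formulation, not the authors' code: the callbacks get_user_input, confirm, respond, and escalate are hypothetical, MAX_TURNS is an assumed cutoff, and the activations()/reinforce() interface matches the sketch in the introduction.

MAX_TURNS = 5  # hand a seemingly diverging conversation to a human after this many turns

def converse(network, get_user_input, confirm, respond, escalate):
    words = get_user_input()                  # word tokens of the initial query
    for _ in range(MAX_TURNS):
        scores = network.activations(words)
        action = max(scores, key=scores.get)  # most active semantic action
        if confirm(action):                   # single-bit semantic-level error signal
            network.reinforce(words, action)  # strengthen the word/action connections
            respond(action)
            return action
        words = words + get_user_input()      # the clarification augments the query
    escalate()                                # connect the user to a human
    return None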

In Conversation 1 the system interprets the query correctly after one user input. The user confirms the interpretation, and the system responds. Conversation 2 illustrates the feedback control loop when the system partially misinterprets the initial query. The state selection was interpreted correctly, but the attribute specification was not. The second user input clarifies the attribute specification, and the system recovers from its initial error. Conversation 3 shows the ability of the system to generalize previously learned semantic associations via the product network architecture by factoring through semantic primitives. That is, rather than building an association directly from the word "nickname" to the action giving the nickname of Ohio, the network factors the association through the attribute semantic primitive.

CONVERSATION 1

U: How many people live in Iowa?
S: YOU WANT TO KNOW THE POPULATION OF IOWA.
U: Yes.
S: IOWA'S TOTAL STATE POPULATION IS 2,840,000.

CONVERSATION 2

U: Where did the name Ohio come from?
S: DO YOU WANT TO KNOW OHIO'S STATE NICKNAME?
U: No, the origin.
S: YOU WANT TO KNOW THE ORIGIN OF "OHIO".
U: Yes.
S: "OHIO" COMES FROM AN IROQUOIAN WORD MEANING "GREAT RIVER".




CONVERSATION 3

U: Where did New Jersey come from?
S: YOU WANT TO KNOW THE ORIGIN OF "NEW JERSEY".
U: Yes.
S: "NEW JERSEY" COMES FROM THE CHANNEL ISLE OF JERSEY.

5. EXPERIMENTAL RESULTS

We conducted an on-line evaluation of the DARPA and Almanac systems. Two individual experiments were performed using the 15-action DARPA system (referred to as DARPA 1 and DARPA 2), and one experiment was performed using a 100-action Almanac subsystem. A set of six users interacted with each system. For DARPA 1 and DARPA 2, 752 and 773 typed conversations (1525 total) were collected, respectively. A total of 161 typed conversations were collected in the Almanac experiment.

5.1 VOCABULARY

DARPA 1 and DARPA 2 were initially trained with 200 and 15 words, respectively. Each system acquired over 400 new words and their meanings in the context of the task during the course of over 750 conversations. We observed that the words introduced during the final stages of each experiment were, on average, semantically null. The rank-frequency distribution for the combined vocabulary from DARPA 1 and 2 is shown in Figure 3, on a log-log scale. The nth most frequently occurring word has rank n. The linear behavior of the log-log plot is known as Zipf's law. We observed that most high-frequency words are semantically null for the task, and most moderate- and low-frequency words are semantically relevant. This indicates the importance of continually being in learning mode. The Almanac system was initially trained with 117 words and acquired over 175 new words and their meanings in the context of the task during the course of 161 conversations. We observed that the vocabulary showed no sign of leveling off, increasing at a rate of approximately 1 word per conversation.
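For reference, the linear log-log behavior referred to above corresponds to the usual statement of Zipf's law (our formulation; the paper does not give the formula explicitly): if f(n) denotes the frequency of the word of rank n, then

\[
f(n) \approx \frac{C}{n^{s}}, \qquad \text{so} \qquad \log f(n) \approx \log C - s \log n,
\]

with the exponent s close to 1, i.e., a straight line of slope approximately -1 on log-log axes, as in Figure 3.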

5.2 CONVERSATION LENGTH

We analyze the DARPA systems' behavior during learning by plotting the probability of error after 1, 2, and 3 user inputs. Figure 4 shows the average probability of error (using bins of 100 conversations) for DARPA 1 and 2. We observe a significant probability of error after 1 user input, although there is a decrease as the system is exposed to more conversations (in learning mode). The network is learning associations between words and actions; therefore, the probability that the initial input queries are interpreted incorrectly is decreasing. We also observe "learning peaks" which indicate that the system is being exposed to unknown but semantically significant words. The probability of error after 2 and 3 inputs is low, approaching 0% by the end of the experiments. This demonstrates the error-recovery capability of the feedback control loop. The system recovers from its initial error by receiving additional user inputs to clarify the request.

The probabilities of a converging conversation after L inputs for both DARPA 1 and DARPA 2 were 70%, 92%, 95%, 97%, and 98% for lengths L = 1, 2, 3, 4, 5, respectively. The probabilities of a converging conversation for the Almanac experiment after L inputs were 57%, 86%, 90%, and 95% for lengths L = 1, 2, 3, 4, respectively. The increase in probability indicates the error-recovery capability of the feedback control loop. One can consider the probability that such a dialogue converges in k steps. A simple convergence model for such dialogues was derived in [1], which predicts that the probability of a dialogue lasting k steps decreases exponentially with k. Figure 5 demonstrates this exponential decay (linear on a log scale) for the DARPA sentences. The small conversation set collected thus far in the Almanac experiment limits the analysis that we can perform. As more data is collected for the Almanac experiment, we will perform analyses similar to Figures 4 and 5.
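One simple reading of this exponential decay, under our own assumption of a geometric model (consistent in form with, but not quoted from, the derivation in [1]): if each additional user input independently resolves the query with a fixed probability p, then the probability that a dialogue converges at exactly the k-th input is

\[
P(K = k) = p\,(1 - p)^{k - 1},
\]

so log P(K = k) decreases linearly in k, as in Figure 5, and the probability of convergence within L inputs is 1 - (1 - p)^L. For example, p ≈ 0.7 gives cumulative values of roughly 70%, 91%, and 97% for L = 1, 2, 3, close to the DARPA figures above.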

5.3 STABILITY

We also measured the stability of learning, i.e., how well each system remembered what it learned. The final networks of each experiment were tested, without additional adaptation, on the first input sentence from each converging conversation from the corresponding experiment. For the combined DARPA and Almanac experiments, we observed 91% and 89% stability in learning, respectively. In learning mode, the DARPA and Almanac systems classified 70% and 57% of the first sentences correctly, respectively. The difference is indicative of the learning of unknown words and their meanings with respect to the task, and of the ability of the system to retain that information.
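A sketch of how such a stability figure can be computed (our own formulation; the conversation record fields and the activations() interface are assumptions carried over from the earlier sketch):

def stability(network, conversations):
    # Replay the first input of every converging conversation through the frozen
    # final network (no reinforce() calls) and report the fraction classified correctly.
    converging = [c for c in conversations if c.converged]
    correct = 0
    for conv in converging:
        scores = network.activations(conv.first_input_words)
        predicted = max(scores, key=scores.get)
        if predicted == conv.target_action:
            correct += 1
    return correct / len(converging) if converging else 0.0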

6. FUTURE WORK

We are currently performing an on-line evaluation of the 1000-action Almanac system. For this system, we are investigating methods to maneuver the conversation in a more intelligent manner, e.g., to specifically request from the user the information the system needs, or to immediately respond with the answer when the network's interpretation is highly confident. Other investigations include a teleconferencing application which uses a product network comprising 3 dependent parameters [8], where only a subset of the tuples are allowed. An alternative method for estimating word/action associations is being investigated which circumvents small-sample issues [3]. Finally, the integration of the 1000-action Almanac with a speech-based system accepting spoken rather than typed input is being investigated [4].


[Figure 1: Feedback Control — the network, its input and action, the user and environment, and the semantic-level error signal.]

[Figure 2: Product Network for Almanac — the activation matrix is the outer product of the individual activation vectors.]

[Figure 3: Rank-Frequency Distribution (DARPA 1 and 2) — word frequency versus rank on log-log axes.]

[Figure 4: Probability of Error (DARPA 1 and 2) — probability of error after 1 and 2 inputs versus number of conversations (bins of 100).]

[Figure 5: Dialogue Convergence (DARPA 1 and 2) — log probability versus k (number of user inputs).]

REFERENCES

[1] A. L. Gorin, S. E. Levinson, A. Gertner, and E. Goldman, "On Adaptive Acquisition of Language," Computer Speech and Language, pp. 101-132, April 1991.

[2] A. Gertner and A. L. Gorin, "Adaptive Language Acquisition for an Airline Information Subsystem," AT&T Bell Laboratories Technical Memorandum, unpublished, November 1991.

[3] N. Tishby and A. L. Gorin, "Algebraic Learning of Statistical Associations for Language Acquisition," submitted for publication, September 1991.

[4] A. L. Gorin, S. E. Levinson, and A. Gertner, "Adaptive Acquisition of Spoken Language," Proc. of ICASSP, pp. 805-808, May 1991.

[5] L. G. Miller, A. L. Gorin, and S. E. Levinson, "Adaptive Language Acquisition for a Database Query Task," AT&T Bell Laboratories Technical Memorandum, unpublished, December 1989.

[6] G. Cybenko, "Approximation by Superpositions of a Sigmoidal Function," Mathematics of Control, Signals, and Systems, vol. 2, pp. 303-314, 1989.

[7] P. Price, W. Fisher, J. Bernstein, and D. Pallett, "The DARPA 1000-Word Resource Management Database for Continuous Speech Recognition," Proc. of ICASSP, New York, New York, April 1988.

[8] M. McGee, private communication, August 1991.
