ECML-97 MLNet Workshop Notes

Case-Based Learning: Beyond Classification of Feature Vectors*

Dietrich Wettschereck (1) and David W. Aha (2)

(1) GMD (German National Research Center for Information Technology), Artificial Intelligence Research Division, Schloss Birlinghoven, 53754 Sankt Augustin, Germany, [email protected]
(2) Navy Center for Applied Research in Artificial Intelligence, Naval Research Laboratory, Washington, D.C., USA, [email protected]

Abstract. This collection contains the ten papers presented at the 1997 European Conference on Machine Learning MLNet Workshop entitled Case-Based Learning: Beyond Classification of Feature Vectors. This workshop took place on April 26, 1997 in Prague, Czech Republic. Information on this workshop's objectives and other details can be found at either of the following two World Wide Web pages:
1. http://www.aic.nrl.navy.mil/aha/ecml97-wkshp/
2. http://nathan.gmd.de/persons/dietrich.wettschereck/ecml97ws.html

* NCARAI Technical Note AIC-97-005

Table of Contents

Workshop Schedule ................................................ 1
Introduction ..................................................... 2
Invited Talk #1 (E. Plaza) ....................................... 3
Invited Talk #2 (L. K. Branting) ................................. 5
Invited Talk #3 (A. Aamodt) ...................................... 7

Learning to Refine Case Libraries: Initial Results
D. W. Aha and Leonard A. Breslow ................................. 9
  1 Introduction ................................................. 9
  2 Revising Conversational Case Libraries ....................... 11
  3 Initial Empirical Evaluation ................................. 12
  4 Discussion, Related and Future Research ...................... 14
  5 Conclusion ................................................... 15

Case Composition needs Adaptation Knowledge: a view on EBMT
M. Carl .......................................................... 17
  1 Introduction ................................................. 17
  2 Terminology .................................................. 18
  3 The case of Machine Translation .............................. 20
    3.1 Decomposition/Adaptation ................................. 21
    3.2 Discussion ............................................... 22
  4 Conclusion ................................................... 23

Distributed Representations for Analogical Mapping
B. K. Ellingsen .................................................. 25
  1 Introduction ................................................. 25
  2 Distributed Representations and Analogical Mapping ........... 26
    2.1 Mapping Algorithm ........................................ 28
  3 Experiments .................................................. 29
  4 Conclusion ................................................... 31

Learning from Sequential Examples: Initial Results with Instance-Based Learning
S. L. Epstein and J. Shih ........................................ 33
  1 Introduction ................................................. 33
  2 Sequential Dependency ........................................ 34
  3 Bridge ....................................................... 34
  4 Representation ............................................... 35
  5 SIBL ......................................................... 36
  6 Experimental Design and Results .............................. 37
  7 Related and Future Work ...................................... 39

Looking at Features within a Context from a Planning Perspective
H. Muñoz-Avila and F. Weberskirch ................................ 41
  1 Introduction ................................................. 41
  2 Motivation ................................................... 42
  3 Feature Weighting in Case-based Planning ..................... 43
  4 Planning Theory .............................................. 44
  5 Feature Context and Trivial Serializability .................. 44
  6 Related Work ................................................. 46
  7 Conclusion ................................................... 46

A Theory of the Acquisition of Episodic Memory
C. Ramirez and R. Cooley ......................................... 48
  1 Introduction ................................................. 48
  2 Acquisition of Events ........................................ 48
  3 Dynamic Memory Weaknesses .................................... 50
  4 An Enhanced Learning Model after Dynamic Memory .............. 50

A Similarity Measure for Aggregation Taxonomies
J. Surma ......................................................... 56
  1 Introduction ................................................. 56
  2 Aggregation .................................................. 57
  3 Similarity Measure ........................................... 57
    3.1 Set Interpretation ....................................... 57
    3.2 Final Formula ............................................ 58
  4 Experiment ................................................... 59
  5 Final Remarks ................................................ 59

Genetic algorithms for analogical mapping
B. Tessem ........................................................ 61
  1 Introduction ................................................. 61
  2 Genetic Algorithms and Analogical Mapping .................... 62
  3 Experiments .................................................. 65
  4 Discussion ................................................... 66

Using Knowledge Containers to Model a Framework for Learning Adaptation Knowledge
W. Wilke, I. Vollrath, and R. Bergmann ........................... 68
  1 Introduction ................................................. 68
  2 Different Sources of Knowledge in a CBR System ............... 69
  3 A Framework for Learning Adaptation Knowledge ................ 70
    3.1 Learning Adaptation Knowledge from Knowledge Containers .. 70
    3.2 A Case Study ............................................. 71
  4 Further Directions and Discussion ............................ 72

Instance-Based Classification of Cancer Cells
C. Wisotzki and Peter Hufnagl .................................... 76
  1 Introduction ................................................. 76
  2 Comparative Genomic Hybridization Method (CGH) ............... 77
  3 Pre-processing ............................................... 78
    3.1 Approximation ............................................ 78
    3.2 Processing of Symbol Strings ............................. 79
  4 Classification Methods for Curves ............................ 81
    4.1 The Nearest Neighbor Method and Modifications ............ 81
    4.2 Prototype Methods (PM) ................................... 82
  5 Further work ................................................. 82

Workshop Schedule
26 April 1997

9:00-9:15   Welcome: D. Wettschereck
9:15-10:00  Invited talk #1: Cases as episodic models in problem solving, E. Plaza
10:00-10:15 Learning from sequential examples: Initial results with instance-based learning, J. Shih & S. L. Epstein
10:15-10:30 Looking at features within a context from a planning perspective, H. Muñoz-Avila & F. Weberskirch
10:30-10:45 Coffee Break
10:45-11:30 Invited talk #2: The role of case explanations in adaptation for problem solving, L. K. Branting
11:30-11:45 Instance-based classification of cancer cells, C. Wisotzki & P. Hufnagl
11:45-12:00 Learning to refine case libraries: Initial results, D. W. Aha & L. Breslow
12:00-1:30  Poster Session/Lunch
1:30-2:15   Invited talk #3: Explanation-driven learning of case-specific knowledge, A. Aamodt
2:15-3:00   Discussion period #1: Transfer of learned knowledge between containers, W. Wilke (Leader)
3:00-3:15   Coffee Break
3:15-4:00   Discussion period #2: First-order representations, D. Wettschereck (Leader)
4:00-4:45   Discussion period #3: Learning & Planning, D. Borrajo (Leader)
4:45-5:00   Coffee Break
5:00-5:45   Discussion period #4: Integrating Case-Based Learning with Other Machine Learning Techniques, D. W. Aha (Leader)
5:45-6:00   Workshop Summary: S. Matwin

Introduction

The focus of this workshop is on case-based learning. Machine learning research on case-based learning is typically restricted to classification using feature-vector representations. However, several other opportunities for case-based learning exist, and our workshop is dedicated to forwarding these topics.

We organized our workshop so that it was not a mini-conference. Instead, it was more faithful to the traditional workshop objectives of encouraging discussion. Towards this goal, we reserved time for four 45-minute discussion periods, where the discussion topics were selected according to attendee votes prior to the workshop. Each discussion period was led by an attendee with particular interests in the discussion topic. We also invited three researchers to discuss topics relating to this workshop's theme.

This volume includes the abstracts for the invited talks and the ten papers presented at this workshop. Additional information about this workshop can be obtained from one of the following WWW locations:

http://www-fit-ki.gmd.de/persons/dietrich.wettschereck/ecml97ws.html
http://www.aic.nrl.navy.mil/aha/ecml97-wkshp/

We would like to thank the other members of the organizing committee of this workshop: Daniel Borrajo (Universidad Carlos III de Madrid, Spain), Karl Branting (University of Wyoming, USA), Hector Muñoz-Avila (University of Kaiserslautern, Germany), Francesco Ricci (IRST, Italy), Jerzy Surma (University of Economics at Wroclaw, Poland), and Henry Tirri (University of Helsinki, Finland). Their thorough reviews were much appreciated by the authors of the contributions. We also thank the invited speakers, Agnar Aamodt (Norwegian University of Science and Technology), L. Karl Branting, and Enric Plaza i Cervera (Spanish Scientific Research Council), and the authors of the papers printed in this volume. Finally, thanks to the co-chairs of ECML-97, Maarten van Someren and Gerhard Widmer, for providing the opportunity to hold this workshop, and to the ESPRIT network of excellence (MLNet) for its financial support.

Prague, Czech Republic
26 April 1997

Dietrich Wettschereck
David W. Aha


Invited Talk #1

Cases as Episodic Models in Problem Solving
Enric Plaza i Cervera

Institut d'Investigacio en Intelligencia Artificial, Spanish Council for Scientific Research, Campus UAB, Bellaterra, Spain, [email protected]

Abstract. Case-based reasoning systems are very diverse, but we can think of them in two classes, P and R. Class P systems are close to nearest neighbour algorithms and propositional ML techniques. They are closed in the basic core notion of using feature vectors and in the kind of domains in which they are applied. Class R systems are closely related to classical AI reasoning and problem solving systems like those for planning, scheduling, and design. Class R systems are also related to relational ML techniques. Complex (or structured) representations of cases belong to this second class R. I will talk about complex representations of cases and their relation to relational ML techniques and AI problem solving systems. The issues of "snippets" (subcases), levels of abstraction, and domain knowledge representation are the most relevant here, essentially because they determine which retrieval and adaptation mechanisms a system may have. These issues will be explained in the framework of our research at IIIA with structured representation of cases, reasoning about symbolic similitudes, and integration of eager (inductive) and lazy (case-based) learning. The talk will mainly use as an example the SAXEX system, which learns musical expressivity in saxophone performance.

Brief Biography

Enric Plaza i Cervera is a Researcher at the IIIA (Artificial Intelligence Research Institute) of the CSIC (Spanish Scientific Research Council). He received his Ph.D. in Computer Science in 1987 at the Universitat Politecnica de Catalunya (UPC) and was awarded the Ciutat de Barcelona Research Prize in Cognitive Science in 1988. The topic of the Ph.D. was in the area of knowledge acquisition for expert systems, where he used fuzzy logic in the framework of personal construct psychology theory. Later, he developed ARC, one of the first case-based systems in Europe. ARC performed diagnosis on pneumonias, and fuzzy logic was used to model plausible information. He was project leader for IIIA on the ESPRIT II project VALID: Validation methods and tools for knowledge-based systems. The VALID project was a pioneer in the area of KBS validation, and Enric Plaza was co-chairman of the first European Workshop on Verification and Validation of KBS (EuroVAV). He has also organized a workshop on Integrated Learning Architectures inside the European Conference on Machine Learning.

Enric Plaza has been project leader on two projects on Case-Based Reasoning funded by the CICYT (Spanish Commission for Science and Technology): the Massive Memory Architecture project and (currently) the ANALOG project. He directed a Ph.D. thesis on Case-Based Reasoning applied to learning control knowledge in pneumonia diagnosis, performed by Beatriz Lopez at the Universitat Politecnica de Catalunya (UPC), and is currently directing two Ph.D. theses on integrating case-based reasoning and machine learning methods. He has been a reviewer on two ESPRIT projects, has been a member of the program committee of the European Conference on Artificial Intelligence and several European and international workshops, and has served as a reviewer for a dozen international conferences. He has authored or co-authored over twenty articles published in journals and books. He is currently president of the Catalan Association for Artificial Intelligence (ACIA) and co-chair of the Second International Conference on Case-Based Reasoning.

Related Publications

Plaza, E., Lopez de Mantaras, R., & Armengol, E. (1996). On the importance of similitude: An entropy-based assessment. Proceedings of the Third European Workshop on Case-Based Reasoning (pp. 324-338). Lausanne, Switzerland: Springer-Verlag.

Plaza, E. (1995). Cases as terms: A feature term approach to the structured representation of cases. Proceedings of the First International Conference on Case-Based Reasoning (pp. 265-276). Sesimbra, Portugal: Springer-Verlag.

Plaza, E., & Arcos, J. L. (1994). Flexible integration of multiple learning methods into a problem solving architecture. Proceedings of the European Conference on Machine Learning (pp. 403-406). Catania, Italy: Springer.


Invited Talk #2

The Role of Case Explanations in Adaptation for Problem Solving
L. Karl Branting

Department of Computer Science, University of Wyoming, Laramie, WY, USA, [email protected]

Abstract. Case adaptation is typically unnecessary when CBR is used for classification. However, when CBR is applied to problem-solving tasks, adaptation can be very complex. This talk argues that the explanation underlying a case solution is often required for effective case adaptation in problem solving contexts, regardless of whether the adaptation method is derivational or transformational. The nature and importance of solution explanations for case adaptation will be illustrated in a variety of domains, including document drafting, argument creation, and route finding.

Brief Biography

Karl Branting is an Assistant Professor of Computer Science at the University of Wyoming. His research interests include: integration of case-based reasoning with other computational paradigms, such as model-based reasoning, hierarchical problem solving, and rule-based reasoning; machine learning; cognitive modeling; and applications of artificial intelligence to natural resources management, ecology, and law. Dr. Branting was on the program committees of IJCAI-95, ICCBR-95, ICCBR-97, two AAAI workshops on CBR, ICAIL-93 and ICAIL-95, is program chair of the Sixth International Conference on AI and Law (ICAIL-97), and is on the editorial board of the Journal of Artificial Intelligence and Law. Dr. Branting's CBR projects include CARMA, a fielded advisory system for ranchers that integrates CBR with model-based reasoning, and GREBE, a legal analysis system that integrates precedents with legal and common-sense rules. Dr. Branting is the recipient of an NSF CAREER grant for research on abstraction and hierarchical problem solving in case-based reasoning.

Related Publications

Branting, L. K., & Lester, J. C. (1996). Justification structures for document reuse. Proceedings of the Third European Workshop on Case-Based Reasoning (pp. 76-90). Lausanne, Switzerland: Springer-Verlag.

Branting, L. K., & Aha, D. W. (1995). Stratified case-based reasoning: Reusing hierarchical problem solving episodes. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (pp. 384-390). Montreal, Canada: Morgan Kaufmann.

Branting, L. K. (1994). A computational model of Ratio Decidendi. Artificial Intelligence and Law, 2, 1-31.


Invited Talk #3

Explanation-Driven Learning of Case-Specific Knowledge
Agnar Aamodt

Department of Informatics, Norwegian University of Science and Technology, Trondheim, Norway, [email protected]

Abstract. The main message of the talk is that learning by retaining new cases should be a *knowledge integration* process rather than a mere *case addition* process. This relates to all aspects of case learning, such as determining what to retain from a problem just solved, how to index, etc. In this approach case knowledge is treated as one type of knowledge, the other main type being general domain knowledge, from which explanations to guide case utilization and learning are derived. The talk will give a state-of-the-art overview of this type of learning, followed by a discussion of how the main issues are dealt with in our CREEK system.

Brief Biography

Agnar Aamodt is Professor of Computer Science and Artificial Intelligence at the Norwegian University of Science and Technology (formerly the University of Trondheim), Department of Computer and Information Science, and head of the department's AI group. His general focus is on improving the construction and continuous maintenance of knowledge-based decision support systems, emphasizing knowledge acquisition, knowledge modeling, and machine learning methods. His particular focus is on case-based methods for problem solving and learning, and above all the integration of case-specific and general domain knowledge. For many years he pursued his research as a research scientist at SINTEF, and has also worked as a visiting scholar at the University of Texas at Austin, AI Laboratory (1987-88), and as a researcher in the AI Laboratory of the Free University of Brussels (1991-92). In his research Dr. Aamodt cooperates with national and international universities and institutions. He has spent time as an invited guest professor at the University of Kaiserslautern, the University of Freiburg, and the University of Sao Paulo (Sao Carlos Campus), Brazil. He has published about 20 scientific papers and a similar number of research reports. He co-chaired ICCBR-95, the First International Conference on Case-Based Reasoning, and SCAI'95, the Fifth Scandinavian AI Conference. He is a program committee member of the series of European CBR workshops. He was a reviewer of the EU-funded Esprit III project Inreca (A method and tool for integration of induction and CBR), and the Applicus project (A CBR help desk trial application).

Related Publications

Aamodt, A. (1994). Explanation-driven case-based reasoning. In S. Wess, K. Althoff, & M. Richter (Eds.), Topics in Case-Based Reasoning. Berlin: Springer-Verlag.

Aamodt, A. (1995). Knowledge acquisition and learning from experience: The role of case-specific knowledge. In G. Tecuci & Y. Kodratoff (Eds.), Machine Learning and Knowledge Acquisition: Integrated Approaches. London: Academic Press.

Aamodt, A., & Nygård, M. (1995). Different roles and mutual dependencies of data, information, and knowledge: An AI perspective on their integration. Data and Knowledge Engineering, 16, 191-222.


Learning to Refine Case Libraries: Initial Results

David W. Aha and Leonard A. Breslow
Navy Center for Applied Research in Artificial Intelligence, Naval Research Laboratory, Washington, DC USA, {aha,breslow}@aic.nrl.navy.mil

Abstract. Conversational case-based reasoning (CBR) systems, which incrementally extract a query description through a user-directed conversation, are advertised for their ease of use. However, designing large case libraries that have good performance (i.e., precision and querying efficiency) is difficult. CBR vendors provide guidelines for designing these libraries manually, but the guidelines are difficult to apply. We describe an automated inductive approach that revises conversational case libraries to increase their conformance with design guidelines. Revision increased performance on three conversational case libraries.

1 Introduction

In the context of the ECML-97 Workshop entitled Case-Based Learning: Beyond Classification of Feature Vectors, this paper's contribution focuses on using machine learning methods to assist in the design of case libraries. These libraries are designed for solution retrieval rather than classification tasks, and each case might contain a unique solution. Cases are defined using a feature vector representation, but there is typically little intersection between the set of features defined for two arbitrary cases. More specifically, this paper focuses on knowledge refinement (Ourston & Mooney, 1990; Wogulis & Pazzani, 1993). However, unlike most previous efforts, we focus on revising a knowledge base of cases rather than rules.

The specific context of our research is conversational CBR (CCBR) (e.g., Inference's CBR Express, Primus's SolutionBuilder), which is a commercially successful approach for supporting help-desk tasks. These systems conduct conversations with a user, during which the user answers user-selected questions (queries and case descriptions are represented as ⟨question, answer⟩ pairs).[1] A conversation ends when the user selects a ranked case, with the hope that its solution (i.e., a sequence of actions) solves the problem denoted by the query. Figure 1 displays two cases from Inference's Printing case library.

During a conversation, the CCBR engine maintains and displays a ranked list of cases whose descriptions are most similar to the query, and a ranked list of their unanswered questions. User selections are limited to these short lists, and the ranked lists are updated each time the user selects and answers a question. Figure 2 displays these two lists after one question has been answered: that the

[1] In this context, question and answer are synonyms for feature and value, respectively.


Case 1: Incorrect Interface Cable
Questions & Answers:
1. Can your printer print a self test? Yes
2. What is the display message? 03 I/O Problem
3. Is the ON LINE indicator lit? Yes
4. Is the printed configuration correct? Yes
5. Are you using the correct interface cable? No
Actions:
1. Replace incorrect cable with correct one

Case 2: Printing on Wrong Side of Paper
Questions & Answers:
1. Are you having print quality problems? Yes
2. What does the print quality look like? White Spots
3. Are you printing on the correct side of the paper? No
Actions:
1. Turn paper over and print on other side

Fig. 1. Two Cases from the Printing Case Library

Ranked List of Questions:
1. Is the ON LINE indicator lit?
2. What is the display message?
3. Can you print data from the computer?
4. Is the printed configuration correct?
5. Does the computer's I/O port work with other devices?
6. Are you using the correct interface cable?

Ranked List of Cases:
1. Printer is not on line
2. Incorrect configuration
3. Computer I/O port is set up incorrectly
4. Incorrect interface cable

Fig. 2. Ranked Lists of Questions and Cases for the Query {⟨Can the printer print a self test?, Yes⟩}, taken from the Target Case Incorrect Interface Cable

printer can print a self test. In this example, if we suppose that the problem is an incorrect interface cable (Case 1 in Figure 1), then we can see that the list of ranked questions contains several that are relevant to our problem, and that the targeted case is also ranked. However, if neither the relevant questions nor the target case is displayed, then it would be difficult to retrieve the correct action for this problem. Thus, conversational case libraries must be carefully designed to yield high precision (i.e., the user retrieves a case whose solution solves their problem) with high querying efficiency (i.e., few questions need be answered before the correct case is retrieved).
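To make the ranking mechanics concrete, the following sketch (my own illustration, not the authors' engine; all function names and the data encoding are assumptions) ranks cases with the similarity measure used later in the paper's evaluation, matches minus mismatches divided by case size, and derives the question list from the unanswered questions of the top-ranked cases:

```python
# Illustrative sketch of one CCBR ranking step (not the authors' code).
# A case description and a query are dicts mapping question text to answers.

def similarity(query, case):
    """(matches - mismatches) / case size, per the paper's evaluation section."""
    matches = sum(1 for q, a in query.items() if case.get(q) == a)
    mismatches = sum(1 for q, a in query.items() if q in case and case[q] != a)
    return (matches - mismatches) / len(case)

def rank(query, library, k=4, q=6):
    """Return the top-k most similar cases and top-q unanswered questions."""
    cases = sorted(library, key=lambda c: similarity(query, c["questions"]),
                   reverse=True)[:k]
    questions = []
    for case in cases:                       # questions drawn from top cases,
        for question in case["questions"]:   # in rank order, without duplicates
            if question not in query and question not in questions:
                questions.append(question)
    return cases, questions[:q]

library = [
    {"name": "Incorrect Interface Cable",
     "questions": {"Can your printer print a self test?": "Yes",
                   "Is the ON LINE indicator lit?": "Yes",
                   "Are you using the correct interface cable?": "No"}},
    {"name": "Printing on Wrong Side of Paper",
     "questions": {"Are you having print quality problems?": "Yes",
                   "Are you printing on the correct side of the paper?": "No"}},
]
query = {"Can your printer print a self test?": "Yes"}
cases, questions = rank(query, library)
print(cases[0]["name"])   # the cable case ranks first on this query
```

Each time the simulated user answers one of the displayed questions, the answer is added to `query` and `rank` is re-run, mirroring the conversation loop described above.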

[Figure 3 diagram: Case Library and Design Guidelines feed a Hierarchy Inducer; its Hierarchies pass through a Hierarchy Editor to produce Edited Hierarchies, from which a Case Extractor yields the Revised Case Library.]

Fig. 3. The Case Library Revision Process

Commercial vendors supply guidelines for designing cases (e.g., reuse questions, use few questions per case, use appropriate question types) to ensure good CCBR performance, but these guidelines are difficult to implement for complex libraries. This has caused several companies to invest in costly consulting services or to try alternative technologies. We describe and evaluate a previously untested approach (Aha, 1997), implemented in a system named Clire (Case LIbrary REvisor), for assisting case authors. Clire revises a given library to improve its conformance with a given set of design guidelines, with the intention of improving its CCBR performance. Section 2 describes our approach and summarizes Clire, while Section 3 describes its initial application. We discuss the results, related research, and future research needs in Section 4.

2 Revising Conversational Case Libraries

Figure 3 summarizes Clire's approach for revising case libraries. It induces hierarchies to reveal library structure and enable case editing operations, which can assist in conforming to guidelines. We designed this first version of Clire to induce a single tree using a top-down approach (see Figure 4). It does not use a splitting criterion (e.g., gain ratio (Quinlan, 1993)) that assumes the cases are clustered (e.g., by class). Instead, it selects a most frequently occurring question q among a node n's cases C to split them, provided that q isn't used to split n's ancestors. C is partitioned into one subset Ci per answer qi among C, plus an "unknown" subset Cu for cases in C that do not contain an answer for q.

Cases passed to an "unknown" node Cu must be processed carefully because they are not distinguishable from the cases given to its siblings Ci. Thus, queries containing a subset of the ⟨question, answer⟩ pairs shared among these cases cannot distinguish cases in Cu from the others. This can prevent good question rankings, and, subsequently, good case rankings. Clire prevents this problem by promoting question reuse between the cases in Cu and their siblings in each Ci. It does this by passing the Ci cases as inactive cases to Cu, whose active cases must be recursively partitioned until they are both distinguished from each other and from the inactive cases. (This process can cause cases to appear at multiple leaves (i.e., in both Cu and Ci subtrees).)

Clire's current implementation edits each case cj using a feature selection process reminiscent of Cardie's (1993), but operates on a case-specific basis (e.g., Domingos, 1997). Specifically, it deletes any question answered in cj that does not appear on any path P between the root node and a leaf containing cj.

Key:
  L:          Set of cases in the case library
  Actives:    Set of cases to be distinguished from each other
  Inactives:  Set of cases to be distinguished from Actives
  Q:          Questions used in ancestors of this node (initially ∅)
  N:          A new internal node (N_q is its selected question)

Top level call: induce_tree(L, ∅, ∅)

induce_tree(Actives, Inactives, Q) =
 1. IF stop_splitting(Q, Actives, Inactives)
 2. THEN RETURN make_leaf(Actives ∪ Inactives)
 3. ELSE N_q = select_question(Q, Actives)                // N_q ∉ Q
 4.   FOREACH a ∈ answers(N_q)
 5.     Actives_a = {c | c ∈ Actives, c_q = a}
 6.     IF Actives_a ≠ ∅
 7.     THEN Inactives_a = {c | c ∈ Inactives, c_q = a}
 8.          N_{q=a} = induce_tree(Actives_a, Inactives_a, Q ∪ {N_q})
 9.   Actives_? = {c | c ∈ Actives, c_q = ?}              // "?" means "unknown"
10.   IF Actives_? = ∅
11.   THEN Inactives_? = {c | c ∈ Inactives, c_q = ?}
12.   ELSE Inactives_? = Actives ∪ Inactives - Actives_?
13.   IF Actives_? ≠ ∅ ∨ Inactives_? ≠ ∅
14.   THEN N_{q=?} = induce_tree(Actives_?, Inactives_?, Q ∪ {N_q})
15.   RETURN N

Fig. 4. Clire's Top-Level Pseudocode

Case extraction records the questions and answers appearing in paths P and orders questions by their average node depth among all P. This three-step process addresses the following library design guidelines:

1. distinguish and rank questions according to their abstraction level,
2. eliminate questions that do not distinguish a case,
3. minimize the number of questions per case, and
4. promote common/shared questions among cases.

Guideline 1 distinguishes context questions, which partition cases into logical topics, from confirmation questions, which are typically answered in few cases. We expect users to first answer context questions (e.g., Can your printer print a self test?), and only later answer confirmation questions (e.g., Are you printing on the correct side of the paper?). Clire supports Guideline 1 by assuming context questions have higher usage frequencies, and thus selects them at higher nodes. Thus, context questions tend to be ranked before confirmation questions within cases. This satisfies Guideline 4 and increases querying efficiency, since context questions are the most discriminating. Clire's case-specific feature selection mechanism addresses both Guidelines 2 and 3.
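The case-editing and extraction steps above can be sketched as follows; the dict-based tree encoding and all helper names are my own assumptions for illustration, not Clire's actual implementation:

```python
# Illustrative sketch only: the induced tree is encoded as nested dicts, with
# internal nodes holding a question and children, and leaves holding case ids.

def paths_to_case(node, case_id, depth=0, prefix=()):
    """Yield the (question, depth) sequence of every root-leaf path to case_id."""
    if node.get("leaf"):
        if case_id in node["cases"]:
            yield prefix
        return
    for child in node["children"].values():
        yield from paths_to_case(child, case_id, depth + 1,
                                 prefix + ((node["question"], depth),))

def edit_case(tree, case_id, case_questions):
    """Delete questions that appear on no root-leaf path containing the case
    (guidelines 2-3), then order the survivors by average node depth so that
    higher-level 'context' questions precede 'confirmation' questions
    (guideline 1)."""
    depths = {}                                 # question -> depths across paths
    for path in paths_to_case(tree, case_id):
        for question, depth in path:
            depths.setdefault(question, []).append(depth)
    kept = [q for q in case_questions if q in depths]
    return sorted(kept, key=lambda q: sum(depths[q]) / len(depths[q]))

# A toy tree: Q1 splits the root; the "?" branch re-asks Q2, so case 1 appears
# at two leaves, as the paper notes can happen.
tree = {"question": "Q1", "children": {
    "Yes": {"leaf": True, "cases": {1}},
    "?":   {"question": "Q2", "children": {
        "No": {"leaf": True, "cases": {1, 2}}}}}}
print(edit_case(tree, 1, ["Q2", "Q3", "Q1"]))  # Q3 is dropped; Q1 precedes Q2
```

The ordering by average depth reflects the paper's assumption that questions selected near the root are the more frequently used, more discriminating context questions.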

3 Initial Empirical Evaluation

We evaluated this simple implementation of Clire to determine whether its revision process can improve a library's CCBR performance, namely precision

Table 1. Case Libraries Used in the Experiments (Q=Questions, A=Answers)

                               Original        Revised
Name      #Cases  #Actions    #Q     #A      #Q     #A
Printing      25        28    27     70      16     55
VMLS         114       227   597    710      83    395
ACDEV       3334      1670  2011  28200    1266  26827

and querying efficiency. In our evaluation, querying was guided by selecting and answering questions, one at a time, based on a pre-selected target case description in the library. The performance task involved selecting a most similar case's solution. Precision is the percentage of conversations that yield a case whose solution (i.e., action sequence) solves the target problem. Retrievals occurred either when none of the ranked questions were included in the target case, when all of the target case's questions were answered, or when a most similar case's similarity (i.e., number of matches minus mismatches divided by case size) exceeded a pre-determined similarity threshold. Querying efficiency is the number of questions that were asked before the retrieval occurred, where lower numbers denote higher efficiency.

We tested Clire on the three case libraries summarized in Table 1. Printing is used to diagnose printer failures. It is a simple library provided with Inference's products. VMLS, obtained from NSWC Port Hueneme personnel, provides technical assistance for maintaining a vertical missile launch system. ACDEV, from Circuit City's Answer City product, was designed to support branch store personnel. The first library is fairly well designed, while the latter two are known to be problematic and are proprietary. Although Clire reduced the number of questions by between 36% and 86%, it is not clear whether it sacrificed precision or efficiency. Therefore, we tested the original and revised libraries using our CCBR engine, which resembles a primitive CBR Express, and Rover, which simulates a human user. Rover cycled through the Printing and VMLS libraries for its queries, selecting each case in turn. For the much larger ACDEV, we instead randomly selected a single set of 100 cases as queries, which Rover used in each experiment while the CCBR engine retrieved from the entire case library.
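A minimal sketch of one such simulated conversation follows; the helper names, data encoding, and the similarity-threshold value are my assumptions (Rover itself is not published code). It answers one target question per cycle until one of the three retrieval conditions described above fires, then reports which case was retrieved (for precision) and how many questions were asked (for querying efficiency):

```python
# Hedged illustration of a Rover-style simulated conversation, not the
# authors' implementation. Cases are dicts with a name and a "qa" dict.

def similarity(query, case):
    """(matches - mismatches) / case size, as defined in the evaluation."""
    matches = sum(1 for q, a in query.items() if case.get(q) == a)
    mismatches = sum(1 for q, a in query.items() if q in case and case[q] != a)
    return (matches - mismatches) / len(case)

def converse(target, library, threshold=0.9, seed=1):
    """Simulate one conversation toward a target case.
    Returns (retrieved case, number of questions asked after seeding)."""
    query = dict(list(target["qa"].items())[:seed])   # conversation seeding
    asked = 0
    while True:
        ranked = sorted(library, key=lambda c: similarity(query, c["qa"]),
                        reverse=True)
        if similarity(query, ranked[0]["qa"]) >= threshold:
            break                     # similarity threshold exceeded
        open_qs = [q for c in ranked for q in c["qa"]
                   if q not in query and q in target["qa"]]
        if not open_qs:
            break                     # no ranked question occurs in the target
        query[open_qs[0]] = target["qa"][open_qs[0]]  # answer top question
        asked += 1
        if len(query) == len(target["qa"]):
            break                     # every target question answered
    ranked = sorted(library, key=lambda c: similarity(query, c["qa"]),
                    reverse=True)
    return ranked[0], asked
```

Precision over a library is then the fraction of conversations whose returned case solves the target problem, and querying efficiency is the mean of the `asked` counts.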
In our evaluation, we varied both K, the number of highest-ranking cases displayed, and Q, the number of highest-ranking questions displayed. When these values are set to the total number of cases and questions in a library, precision will be 100% (i.e., assuming no two cases are identical). However, this can decrease efficiency. Alternatively, small values for these variables can greatly reduce precision because they reduce the probability that relevant cases and questions will be highly ranked, displayed, and selected. This will then prematurely terminate retrieval, and a case might be selected whose actions do not solve the query problem. Therefore, we set K and Q to intermediate values in this investigation. Rover can simulate conversation seeding, as is done in CBR Express with textual description fields, by a priori supplying the top n answers from a query. Thus, when n > 0, the initial question ranking in a conversation reflects having already answered n questions from the targeted case. We tested n = {0, 1},
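The similarity and retrieval-termination rule described above can be sketched in a few lines. This is our own illustration, not the authors' CCBR engine: cases and queries are modeled as question-to-answer dictionaries, and the threshold value is illustrative.

```python
# Sketch of the similarity rule described in the text:
# similarity = (matches - mismatches) / case size, with retrieval triggered
# once the most similar case clears a pre-determined threshold.

def similarity(case_qa, answered):
    """(#matching answers - #mismatching answers) / number of <q,a> pairs in the case."""
    matches = sum(1 for q, a in answered.items() if case_qa.get(q) == a)
    mismatches = sum(1 for q, a in answered.items() if q in case_qa and case_qa[q] != a)
    return (matches - mismatches) / len(case_qa)

def retrieve(library, answered, threshold=0.8):
    """Return the most similar case if it clears the threshold, else None."""
    best = max(library, key=lambda c: similarity(c["qa"], answered))
    return best if similarity(best["qa"], answered) >= threshold else None
```

With a fully matching conversation the target case clears the threshold immediately; a conversation of mismatches never triggers retrieval.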

Table 2. Performance Results and Standard Deviations (Seeding: n = 1)

                        Original                        Revised
Library   K   Q    Precision       Efficiency     Precision       Efficiency
Printing  4   6    90.4% (3.2%)    1.5 (0.08)     98.0% (2.0%)    1.2 (0.02)
Printing  6  12   100.0% (0.0%)    1.8 (0.00)    100.0% (0.0%)    1.2 (0.00)
VMLS      4   6    83.7% (2.2%)    4.3 (0.06)     89.3% (0.5%)    2.5 (0.01)
VMLS      6  12    87.1% (0.3%)    4.5 (0.04)     96.8% (0.9%)    2.8 (0.01)
ACDEV     4   6    82.1% (0.7%)    6.6 (0.05)     85.5% (0.5%)    6.4 (0.00)
ACDEV     6  12    80.3% (0.5%)    6.6 (0.00)     85.7% (0.5%)    6.4 (0.00)

but report only n = 1 because, while seeding always increased precision and efficiency, it did not affect relative performance. The results, summarized in Table 2, are averaged over ten runs because ties in the ranked list orderings are randomly broken. We report standard deviations (in parentheses) rather than significance test results because we believe they are more informative under these circumstances.

4 Discussion, Related and Future Research

This evaluation tested whether revising case libraries by inducing, editing, and extracting revised cases from a hierarchy can improve case library performance. Our results show that, under some circumstances, the revised libraries will improve precision and querying efficiency. VMLS and ACDEV represent serious efforts; VMLS is undergoing extensive and prolonged manual revision in a two-year project, while ACDEV has been abandoned due to the complexities of library design and maintenance. We shared the revised case libraries with their owners and have received strong encouragement to continue this research.

Our evaluation did not investigate why Clire delivered performance improvements for these three case libraries. We recently investigated this issue in (Aha & Breslow, 1997). Clire modifies a case library in only two ways. First, it re-orders the ⟨question, answer⟩ pairs in each case. Second, it performs the case-specific feature selection process described in Section 2. In an ablation study that isolated these two processes, we learned that case-specific feature selection was the primary cause of the performance difference, and that question re-ordering had little effect. This most probably occurred because Rover's approach to evaluating case library performance does not (yet) sufficiently mimic human behavior (e.g., it does not incorporate answer "noise", nor select questions not answered in the target case); enhanced versions of Rover might better demonstrate the utility of good ⟨question, answer⟩ orderings within cases.

We have not found publications that discuss how CCBR libraries can be revised so as to improve retrieval performance. However, some researchers have described contexts in which queries are derived incrementally.
For example, Tan and Schlimmer's (1990) CS-IBL incrementally evaluated features in a robotics task where every feature has non-zero cost, and where the goals were to minimize feature cost and maximize classification accuracy. Their approach selected features to evaluate that maximized the ratio of expected match success to cost.

Smyth and Cunningham (1994) instead described a two-stage incremental CBR approach in a context where feature evaluation had either zero cost or a fixed nonzero cost. The first step used the zero-cost features to retrieve a subset of cases that match a given query. In the second step, a dynamically generated decision tree was induced that selects, at each step, the feature that maximizes information gain for distinguishing the remaining cases. Unlike our approach, the question-answering processes used in these two approaches are not user-driven, and they do not use trees to revise case indices. However, Clire could benefit from these approaches for tasks where feature evaluation has nonzero cost.

Other approaches for reindexing cases exist. For example, Fox and Leake (1995) described a failure-driven approach that selects alternative indices for cases in a planning task. Their research focused on introspective analysis to revise case indices so that more easily adaptable cases are retrieved; efficiency was measured in the context of a traditional rather than a conversational CBR context. In contrast, our approach works on the entire library simultaneously, without knowledge of specific retrieval failures. Clire would probably benefit from using a failure-driven process to modify case indices.

The problem of processing unknown values when using decision trees for CBR was studied by Manago et al. (1993) in the context of the INRECA project. They describe the integration of CBR and tree induction methodologies whereby questions are automatically selected by reading the tree, induced from the cases, in a top-down manner. This process defaults to a CBR approach when a selected question's answer is unknown. While they focused on using decision trees to guide conversations, we instead use decision trees to revise case indices, and conversations with our CCBR engine are user-driven.
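The second stage described above, dynamically choosing the question that maximizes information gain over the remaining cases, can be sketched as follows. This is our own illustration, not Smyth and Cunningham's implementation; the case representation is invented for the example.

```python
# Sketch: among the cases still consistent with the query, pick the question
# whose answers maximize information gain (entropy reduction) with respect to
# the cases' solutions.
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_question(cases, questions):
    """Return the question that maximizes information gain over the cases' solutions."""
    base = entropy([c["solution"] for c in cases])
    def gain(q):
        split = {}
        for c in cases:
            split.setdefault(c["qa"].get(q), []).append(c["solution"])
        remainder = sum(len(g) / len(cases) * entropy(g) for g in split.values())
        return base - remainder
    return max(questions, key=gain)
```

A question that perfectly separates the remaining solutions gets the full entropy of the solution set as its gain and is asked first.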
Our CCBR engine needs further development before Clire's revisions can be expected to yield similar performance improvements for commercial CCBR systems. For example, methods for computing feature weights and splitting functions that evaluate a question's discrimination power should be examined. Also, Rover has many parameters for simulating human users that have not yet been systematically varied (e.g., it currently always selects the top-ranking case, although users will frequently select others). Finally, our current approach is knowledge-poor; if domain-specific knowledge is available, it can be used to tune Clire, Rover, and the CCBR engine. Thus, we plan to test these systems under conditions that yield more realistic CCBR conversations.

5 Conclusion

This is the second of three publications that describe the evolution of a methodology for revising conversational case libraries. Aha (1997) introduced the general approach. This paper describes its initial evaluation in a system named Clire (Case LIbrary REvisor). Finally, Aha and Breslow (1997) describe its extended evaluation, including an ablation study that isolates the reasons why it improves retrieval performance on a set of three case libraries.

Research on case authoring is motivated by a commercial need: it is difficult to build complex case libraries, even when given case authoring guidelines. Approaches that simplify this task, especially automated or semi-automated software assistants, are valuable. Clire is a first example of this approach. We anticipate that future research will reveal more elaborate and effective means for revising case libraries so as to support the case authoring process for conversational CBR systems.

Acknowledgements

Thanks to Ralph Barletta, John Grahovac, Kirt Pulaski, Scott Sackin, and Gene Scampone for providing the libraries used in our experiments and feedback on our results, and to Sally Jo Cunningham for her feedback. Thanks also to the organizing committee for their attentive comments in their reviews, which greatly improved this paper. This research was supported by the Office of Naval Research.

References

Aha, D. W. (1997). A proposal for refining case libraries. In R. Bergmann & W. Wilke (Eds.), Proceedings of the Fifth German Workshop on CBR (TR LSA-97-01E). U. Kaiserslautern, Department of Computer Science.
Aha, D. W., & Breslow, L. A. (1997). Refining conversational case libraries (Technical Report AIC-97-004). Washington, DC: Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence.
Cardie, C. (1993). Using decision trees to improve case-based learning. In Proceedings of the Tenth ICML (pp. 25-32). Amherst, MA: Morgan Kaufmann.
Domingos, P. (1997). Context-sensitive feature selection for lazy learners. To appear in Artificial Intelligence Review.
Fox, S., & Leake, D. B. (1995). Using introspective reasoning to refine indexing. Proceedings of the Fourteenth IJCAI (pp. 391-397). Montreal: Morgan Kaufmann.
Manago, M., Althoff, K.-D., Auriol, E., Traphöner, R., Wess, S., Conruyt, N., & Maurer, F. (1993). Induction and reasoning from cases. Proceedings of the First European Workshop on CBR (pp. 313-318). Kaiserslautern, Germany: Springer-Verlag.
Ourston, D., & Mooney, R. (1990). Changing the rules: A comprehensive approach to theory refinement. Proceedings of the Eighth NCAI (pp. 815-820). Boston, MA: AAAI Press.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Smyth, B., & Cunningham, P. (1994). A comparison of incremental CBR and inductive learning. In M. Keane, J. P. Haton, & M. Manago (Eds.), Working Papers of the Second European Workshop on CBR. Chantilly, France: Unpublished.
Tan, M., & Schlimmer, J. C. (1990). Two case studies in cost-sensitive concept acquisition. Proceedings of the Eighth NCAI (pp. 854-860). Boston, MA: AAAI Press.
Wogulis, J., & Pazzani, M. (1993). A methodology for evaluating theory revision systems: Results with Audrey II. Proceedings of the Thirteenth IJCAI (pp. 1128-1134). Chambery, France: Morgan Kaufmann.


Case Composition needs Adaptation Knowledge: a view on EBMT

Michael Carl
Institut für angewandte Informationsforschung, Martin-Luther-Straße 14, 66111 Saarbrücken, Germany, [email protected]

In this paper the problem of case composition in case based reasoning (CBR) systems will be addressed, with examples from the field of Example Based Machine Translation (EBMT). In all sorts of textual domains, it is difficult to isolate independent text units because they are related to varying degrees to the context from which they were taken. Case based reasoning systems (just like all other computing systems) may only treat units of finite length. The question of how to decompose a text of potentially infinite length such that it can be recomposed is therefore crucial to many areas of application that have sequential input. We will show that decomposition is closely related to adaptation and illustrate this with examples from Example Based Machine Translation.

1 Introduction

In this paper the problem of case composition in case based reasoning (CBR) systems will be addressed. CBR systems solve new problems by retrieving from a case base solutions for similar problems which have previously occurred, and performing case adaptation to fit the retrieved cases to the new situation. The future performance of the system improves as new solutions are stored together with the problem in the case base. To minimize the memory for the case base, to reduce retrieval time, and to increase the coverage of the system, a compositional storing of the cases would be strongly preferred. However, it is far from obvious whether or not a case is compositional and, if so, how to decompose it. Therefore, in this paper we examine the interdependency of case decomposition and adaptation knowledge. We say a case is decomposable if it can be entirely divided into a sequence of non-overlapping substrings, i.e. chunks, where each chunk can be reused to compose other cases. The case is non-decomposable if it cannot be divided into a sequence of chunks. While in many CBR systems retrieval is described as consisting of a case base and a distance measure, we introduce a decomposition component as a third element of retrieval. The case base contains chunks of cases and composed cases; the decomposition component divides a new problem into a sequence of chunks, and the distance measure yields for each chunk a (set of) best matching case(s). The decomposition component heavily depends on the case base since

it can only divide a problem into a sequence of chunks that may be retrieved from the case base. We claim that a successful decomposition of a case depends on the knowledge of the adaptation mechanism. Both knowledge about the right choice of the case and the way it is decomposed in the retrieval phase is knowledge about the searched target concept. From Adaptation Guided Retrieval (AGR) (c.f. Smyth and Keane, 1995) it is known that the most similar cases are not always the most suitable for adaptation. By the same token, the easiest decomposition of a case in the retrieval phase is not necessarily the most suitable for adaptation. AGR asks how adaptation can cooperate with/influence retrieval in order to produce optimal results. This raises two closely related questions: i) how must retrieval be designed such that the retrieved cases are suitable for adaptation, and ii) how must adaptation be designed such that it can cope with the retrieved cases. We focus here on the relation of case decomposition and adaptation knowledge and examine the interdependency of the contents of the case base and the adaptation knowledge. We thus reformulate question i) above into: how must the case base be designed such that the retrieved cases are suitable for adaptation?
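The decomposition component just described, which can only split a problem into chunks retrievable from the case base, can be sketched as a greedy longest-match splitter. The token representation and the greedy policy are our own illustrative choices, not an algorithm from this paper.

```python
# Sketch of a decomposition component: greedily split a problem into the
# longest chunks present in the case base; fail if some part is uncovered.

def decompose(problem_tokens, case_base_chunks):
    """Greedy longest-match split; returns None if the problem is not decomposable."""
    chunks, i = [], 0
    while i < len(problem_tokens):
        for j in range(len(problem_tokens), i, -1):
            piece = tuple(problem_tokens[i:j])
            if piece in case_base_chunks:
                chunks.append(piece)
                i = j
                break
        else:
            return None  # no chunk in the case base covers position i
    return chunks
```

Note how the result depends entirely on which chunks the case base happens to contain, which is exactly the dependency the text emphasizes.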

2 Terminology

We now give a more formal definition of the terminology.

Concept Retrieval

According to Globig and Wess (1995), CBR systems implicitly describe a set of concepts C. A pair of a case base CB and a distance measure dist classifies a set of concepts C: C : (CB, dist).

Distance Measure

The distance measure is designed such that the distance between the same cases equals zero. If the distance between a case s and a case t equals zero, then both are instantiations of the same concept C .

dist(s, s) = 0
dist(s, t) = 0 ⇒ C(s) ≡ C(t)    (1)

While Globig and Wess introduce two properties of the distance measure (it can be informed or universal), we give a slightly different definition: a distance measure dist1 is better informed than a distance measure dist2 if both classify the same set of concepts C and for all i > 1, CB1 is a subset of CBi:

∀i, i > 1 : C : (CB1, dist1), C : (CBi, dist2) ∧ CB1 ⊆ CBi    (2)

The better informed a distance measure is, the more knowledge about the searched concept C it codes. The informed measure allows a minimal case base because each concept requires only one instantiation in the case base:

if C(s) ≡ C(t) then dist(s, t) = 0 else dist(s, t) = 1

The least informed distance measure implies the maximal case base because each distinct case corresponds to a different concept:

if dist(s, t) = 0 then C(s) ≡ C(t) else C(s) ≢ C(t)

Case-Base

A case base CB covers a set of examples S iff there exists for each example si ∈ S at least one case tj in the case base where both si and tj are instantiations of the same concept C:

cover(S, CB) ⇔ ∀i, si ∈ S, ∃j, tj ∈ CB : C(si) ≡ C(tj)    (3)

A case is atomic if it cannot further be decomposed; otherwise it is decomposable. A case base is atomic if it contains only atomic cases. Notice that atomicity of the case base and informativity of the distance measure are not related. An atomic case base can go along with the maximally informed distance measure and vice versa. We will see some examples of these settings in the next section. But the number of concepts that are classified by a maximally composed case base is likely to be much bigger (it tends to be infinite) than the number of concepts that are classified by an atomic case base.
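The cover relation (3) can be made concrete in a few lines. Modeling the concept function C as a plain labeling of cases is our simplification for illustration; the paper leaves C abstract.

```python
# Sketch of cover(S, CB) from (3): every example's concept must have at least
# one instantiation in the case base. C is passed in as a labeling function.

def covers(examples, case_base, C):
    """cover(S, CB) holds iff each s in S shares its concept with some t in CB."""
    return all(any(C(s) == C(t) for t in case_base) for s in examples)
```

For instance, with cases labeled by a toy concept function, coverage holds exactly when every concept occurring in the examples also occurs in the case base.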

Case Decomposition

A case s is decomposable iff it can entirely be divided into a sequence of non-overlapping chunks s1 … sn where the intersection of the chunks' concepts equals the concept of the case s:

dcomp(s) = s1 … sn ⇔ C(s) ≡ ⋂(i=1..n) C(si)    (4)

The granularity of a decomposition dcomp1 is finer than the granularity of a decomposition dcomp2 with respect to a set of cases S if for all cases si ∈ S the number of chunks produced by dcomp1 is greater than the number of chunks produced by dcomp2:

∀i, si ∈ S : |dcomp1(si)| > |dcomp2(si)|    (5)

The granularity of a decomposition is driven by the case base. The more the case base tends to be atomic, the finer the granularity of the decomposition will be. An atomic case base implies the finest decomposition. If the case base only contains maximally composed cases, the decomposition can be coarse in the sense that it yields for each case only one chunk, the case itself (i.e. there is no decomposition).

Case Adaptation

A sequence of chunks t1 … tn is adaptable iff all chunks can be composed into one case t where the intersection of the chunks' concepts equals the concept of the case t:

adapt(t1 … tn) = t ⇔ ⋂(i=1..n) C(ti) ≡ C(t)    (6)

An adaptation mechanism adapt1 is more complex than an adaptation mechanism adapt2 with respect to a set of cases S if adapt1 composes a finer decomposition for all cases si ∈ S than adapt2:

∀i, si ∈ S : C(adapt1(dcomp1(si))) ≡ C(adapt2(dcomp2(si))) ∧ dcomp1 is finer grained than dcomp2    (7)

From (5) and (7) it follows that the more atomic the case base is, the more complex is the adaptation mechanism.

Compositional Cases

A case s is compositional if 1) it is decomposable into a sequence of chunks, 2) the case base covers all decomposed chunks, and 3) the retrieved cases are adaptable. We thus obtain a chain of conceptual equivalence during all processing steps:

C(s) ≡ ⋂i C(si) ≡ ⋂i C(ti) ≡ C(t)    (8)

The first equivalence in (8) is achieved by the decomposition function (4); the second equivalence is due to retrieval (1) if the case base covers all chunks as in (3). Adaptation (6) yields the third equivalence. Given that the case base covers all decomposed chunks (i.e. for all decomposed chunks si the equivalence ⋂i C(si) ≡ ⋂i C(ti) in (8) holds), the question we will focus on in this paper is the relation between the granularity of decomposition and the complexity of adaptation. The relation will become clearer in the next section, where we examine MT systems.

3 The case of Machine Translation

In Machine Translation (MT), compositionality is crucial to attaining a reasonable coverage. One of the main problems in MT is due to transfer strategies: how and when to translate a unit of the source language into a unit of the target language. Interdependency of two or more units does not prevent these units from being independently translated. For instance, agreement between the subject and the predicate of a sentence does not necessarily imply non-compositionality of the sentence. But if they are compositionally translated, the adaptation mechanism must be powerful enough to reconstruct the agreement requirements of the target language.

3.1 Decomposition/Adaptation

The underlying hypothesis of all MT systems is that units derived from the source language string can be mapped onto units from which the target language string can be computed. Word-for-word translation rarely yields understandable target language strings and thus is not an appropriate basis for translation. On the other hand, language is claimed to be compositional; the problem then is how to decompose a source language string into units that can be mapped onto a correct target language string. We claim that whether or not the respective source and target language strings are considered to be compositional translations of each other depends on the adaptation knowledge of the system. Several degrees of decomposition are considered together with the required adaptation knowledge.

If the sentences consist of only one chunk, i.e. no decomposition takes place as in (9)¹, no adaptation is required. This is typical for Translation Memories (TM) (e.g. TRADOS, TRANSIT), which have only a case base and a retrieval component².

(9)  [The proposal will not now be implemented]
     [Les propositions ne seront pas mises en application maintenant]

Two mapping patterns can be achieved when decomposing the sentence into the two chunks /The proposal/ and /will not now be implemented/ as in (10). Notice that the English subject (the first chunk) is in singular, while its French translation is in plural. Because subject and predicate agree in number and person, the (auxiliary) verbs in the second chunk take the respective same features. The adaptation mechanism therefore has to be able to reconstruct these agreement requirements.

(10)  [The proposal] [will not now be implemented]
      [Les propositions] [ne seront pas mises en application maintenant]

If the sentence is divided into the three chunks /The proposal/, /will not now be/ and /implemented/ as in (11), adaptation turns out to be much more complicated. It has to reconstruct agreement between the first and the second chunk (in the French translation for the third chunk too). Further, adaptation must take into account the discontinuity of the second chunk. Thus, although now and maintenant are translations of each other, they are separated by the interposed chunk 3 on the French side. The adaptation mechanism has to be able to integrate a discontinuous chunk into a continuous one when translating from French to English, and the inverse when translating from English to French.

¹ Example taken from (Brown et al., 1990)
² Some TMs, however, propose as an extra an interactive or a batch MT system

(11)  [The proposal] [will not now be] [implemented]
      [Les propositions] [ne seront pas … maintenant] [mises en application]
      (the second French chunk is discontinuous; chunk 3 is interposed within it)

In (12) almost each word corresponds to a proper chunk. The case base may be atomic; in reverse, the adaptation mechanism needs to be very powerful and would require a parser or some similar mechanism. Many MT systems can be found that fit into this approach, but they are not actually CBR systems.

(12)  The | proposal | will | not | now | be | implemented
      Les propositions | ne seront pas | mises en application | maintenant
      (in the original figure, crossing links align each English word chunk with its French counterpart)
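The agreement problem discussed for (10) can be illustrated with a toy chunk-based translator. The mini-lexicon and the string-rewriting "adaptation" below are invented for illustration and are not part of any system described here.

```python
# Toy sketch: translate two chunks independently, then let an adaptation step
# restore subject-predicate agreement (singular -> plural) on the French side.

CHUNKS = {
    "The proposal": ("Les propositions", "plural"),
    "will not now be implemented": ("ne sera pas mise en application maintenant", "singular"),
}

def adapt_agreement(predicate, number):
    """Rewrite the singular predicate chunk when the subject chunk is plural."""
    if number == "plural":
        predicate = predicate.replace("sera", "seront").replace("mise", "mises")
    return predicate

def translate(src_subject, src_predicate):
    subj, number = CHUNKS[src_subject]
    pred, _ = CHUNKS[src_predicate]
    return subj + " " + adapt_agreement(pred, number)
```

Without the adaptation step, the composed output would keep the singular verb forms stored with the predicate chunk, which is exactly the failure mode the text attributes to composition without adaptation knowledge.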

3.2 Discussion

In practice, there are many ways to decompose a sentence, depending on the settings of the system as discussed in Section 2. A few of the possible settings will be considered in the remainder of this section.

Many traditional MT systems have an atomic case base and a least informed distance measure. According to schema (12), sentences are most fine grained and the whole translation is carried out by the `adaptation' mechanism. All three generations of MT systems (c.f. Whitelock and Kilby, 1995) share these settings: the direct approach seeks to map lexical items of the source language onto lexical items of the target language and then tries to rearrange the target text. The interlingual approach (c.f. Dorr, 1993) tries to calculate a language independent meaning representation from which the target text is generated. The transfer approach (c.f. Streiter, 1996) is situated in between the two: abstractions of the source language string are computed and then transferred (mapped) into target units from which the target language string is computed. Traditional MT systems do not make use of large chunks that could facilitate the adaptation mechanism. They thus fail to account for what computers can most easily do: memorization and retrieval.

In contrast to traditional MT systems, there are Translation Memories (e.g. TRADOS (Heyn, 1996), TRANSIT) that do not decompose cases at all, according to schema (9). They are likely to have rather long decomposable cases stored in the case base, but have a well informed distance measure to return similar cases from the case base. However, due to lack of adaptation, the less the retrieved

case matches the initial example, the more incomplete and incorrect the results are. A major shortcoming is the linear growth of the case base while coverage still remains very restricted. Translation Memories are designed for repetitive texts and fail to exploit the inherent generativity of CBR systems.

In between these two settings, EBMT is located in the proper sense. Some EBMT systems (c.f. Sato and Nagao, 1990; Collins and Cunningham, 1995, 1996a,b) do not decompose cases, to minimize adaptation requirements of the target chunks. Instead, by means of linguistic analysis, major constituents are localized in the cases, which then are related to each other in the source and target part of the case. A mappability score of cases is recursively calculated through the mappability of the constituents. Sato and Nagao (1990) propose two criteria to choose among the possible intersentential chunkings:

– prefer large chunks
– prefer chunks where the environment of each chunk fits best into the environment of the retrieved cases

While these measures yield a real-valued chunking score, ReVerb (Collins and Cunningham, 1996a,b) uses a mappability scale of 0 to 3 to classify correspondence between source and target language chunks. A mappability of 3 is attributed to chunks that map in a 1:1 fashion. A mappability of 2 indicates a difference of some words in the chunk but the syntactic functions are the same. Mappability of 1 indicates different syntactic functions but lexical correspondence in the chunk. A mappability of 0 is attributed to all other chunks. Adaptation in these systems is essentially a matter of replacing target language words inside the appropriate chunk; linguistic analysis serves to better determine the location and the appropriateness of potential words to be replaced in the target language case.

Other EBMT systems decompose and adapt cases by means of different knowledge resources.
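The 0-3 mappability scale just summarized can be sketched as follows. The chunk representation (a word list plus a syntactic-function tag) is an assumption of ours for illustration, not ReVerb's actual data structure.

```python
# Hedged sketch of a 0-3 mappability classification between a source and a
# target chunk, following the scale described in the text.

def mappability(src, tgt):
    """3: 1:1 mapping; 2: same function, some words differ; 1: lexical overlap only; 0: otherwise."""
    same_function = src["function"] == tgt["function"]
    shared_words = bool(set(src["words"]) & set(tgt["words"]))
    if same_function and src["words"] == tgt["words"]:
        return 3
    if same_function:
        return 2
    if shared_words:
        return 1
    return 0
```

A recursive case-level score, as in the systems cited above, would then aggregate these chunk-level values.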
In the Pangloss EBMT (Brown, 1996; Nirenburg et al., 1994) cases are selected from the case base that contain the problem case (or parts of it) as a substring. By means of a thesaurus and a bi-lingual lexicon the translation of the problem case (or its respective part) is extracted from the retrieved cases. Adaptation of the target language chunks is left for a statistical language model outside the Pangloss EBMT system. Pangloss EBMT only supports word to word and chunk to chunk mapping. Adaptation of the chunks, which is supposed to require the actual knowledge resources, is not described inside the system.

4 Conclusion

In this paper, we discussed the problem of case composition in case based reasoning (CBR) systems. We claimed that case decomposition is linked on the one hand to the granularity of the case base and on the other to the adaptation knowledge that triggers the recomposition of the decomposed cases. We gave a formal definition of the terminology and showed that during all processing steps conceptual equivalence must be preserved. A set of prototypical MT systems is classified according to the terminology introduced. These include traditional MT systems (transfer based, interlingua and direct approach), Translation Memories (TM) and Example Based Machine Translation. We believe that EBMT closes the gap between purely memory based systems such as TM and purely rule based systems such as many traditional MT systems.

References

Peter F. Brown, J. Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, F. Jelinek, Robert L. Mercer, and P. S. Roossin (1990). A statistical approach to machine translation. Computational Linguistics, 16:79-85.
Ralf D. Brown (1996). Example-Based Machine Translation in the Pangloss System. In COLING-96.
Brona Collins and Padraig Cunningham (1995). A methodology for example based machine translation. In Proceedings of CSNLP-95, Ireland.
Brona Collins and Padraig Cunningham (1996). Adaptation guided retrieval in EBMT: A case-based approach to machine translation. In Advances in CBR, LNAI 1168, pages 91-104. Springer.
Brona Collins and Padraig Cunningham (1996). Translating software documents by example: An EBMT approach to Machine Translation. In International ECAI Workshop: Multilinguality in the Software Industry, Budapest.
Bonnie Jean Dorr (1993). Machine Translation: A View from the Lexicon. MIT Press, Cambridge, Massachusetts; London, England.
Christoph Globig and Stefan Wess (1995). Learning in case-based classification algorithms. In LNAI 1168, Berlin, New York. Springer.
Matthias Heyn (1996). Integrating machine translation into translation memory systems. In European Association for Machine Translation - Workshop Proceedings, pages 111-123, Geneva. ISSCO.
Sergei Nirenburg, S. Beale, and C. Domashnev (1994). A full-text experiment in example-based Machine Translation. In International Conference on New Methods in Language Processing (NeMLaP) 94, pages 78-87, Manchester.
Barry Smyth and Mark T. Keane (1995). Experiments on adaptation-guided retrieval in case-based design. In CBR Research and Development. Springer Verlag.
S. Sato and M. Nagao (1990). Towards memory based translation. In COLING-90.
Oliver Streiter (1996). Linguistic Reference Manual of the CAT2 Machine Translation System, Version 0.2. Martin-Luther-Straße 14, 66111 Saarbrücken, Germany.
Peter Whitelock and Kieran Kilby (1995). Linguistic and computational techniques in Machine Translation system design. Computational Linguistics. UCL Press, London.


Distributed Representations for Analogical Mapping

Barry Kristian Ellingsen
Department of Information Science, University of Bergen, 5020 Bergen, Norway
[email protected]

Abstract. Mapping is an important process of analogical reasoning, where correspondences between elements of a previous solution and a new problem are established. After recalling potential analogs from a long-term memory based on surface similarities, the following mapping process captures deeper similarities by establishing correspondences between elements based on common relational structures. An analogical mapping process is also constrained by a semantic component, which increases the probability for establishing correspondences between elements in the two analogs with similar meaning. A technique for applying neural networks in the analogical mapping process that includes both structural and semantic constraints is described in this paper. An experiment from object-oriented software engineering is presented, and reports promising results.

1 Introduction

There are several computational models of analogical reasoning (AR), where SME (Structure Mapping Engine) (Falkenhainer et al., 1989) and ACME (Analogical Constraint Mapping Engine) (Holyoak and Thagard, 1989) are the dominating. AR encompasses four major phases: retrieval, mapping, transfer, and learning. Although all phases are important for obtaining a successful analogy, both ACME and SME emphasize mapping as the challenging process of AR. Computational models of AR rely on a model of knowledge representation. ACME and SME rely on symbolic models, where the structural relationships and the semantic description of the elements in the analogs have a discrete and atomic representation. Typically, symbolic models use some kind of concatenative compositionality to process the structures (van Gelder, 1990). An alternative model is the distributed representation, where knowledge is spread as activities over the many units in a neural network. Here, the representation is continuous and a functional compositionality is used to process the structures. This means that a distributed representation does not maintain an internal syntactic structure of the analogs, which is the case for symbolic models. Distributed connectionist models have been criticized for lacking the ability to represent compositional structures (Fodor and Pylyshyn, 1988). Others, e.g., (van Gelder, 1990; Chalmers, 1990), claim that neural networks are in fact capable of processing compositional structures, such as lists, trees, and directed graphs.

In the ROSA project (Reuse of Object-Oriented Specifications through Analogy) (Tessem et al., 1994) the aim is to apply AR in the reuse of object-oriented analysis components. The analysis components are considered to be more abstract than design and implementation components, and it is expected that these have a high potential for reuse. For analysis specification the OOram (Object-Oriented Role Analysis and Modeling) role model concept (Reenskaug et al., 1996) is selected. A reuse activity based on AR therefore includes a mapping of OOram role model components from a previous solution to a new problem domain. The aim of this paper is to propose a model for a connectionist-based analogical mapping of OOram role models and report experimental results. By combining the distributed representations of the role models with a measure of semantic similarities between the role model components, an analogical mapping model that includes both structural and semantic constraints is obtained.

2 Distributed Representations and Analogical Mapping

The kinds of problems contemplated here are OOram role models of object-oriented analysis specifications that have compositional structures. In the symbolic paradigm the role models obtain a discrete representation, such as lists in a LISP program, and their operations are, in essence, concatenative. An adequate mapping between two analogs is heavily constrained by both structural and semantic similarities. A symbolic model of analogical reasoning, such as SME, approaches the mapping by computing match hypotheses guided by a set of rules. An alternative model of analogical mapping is to compute distributed representations from a neural network, from which a mapping can be found. In contrast to symbolic representations, distributed representations of a structure are spread among several units in a neural network. Importantly, the distributed representations preserve the compositionality of the structures over a fixed set of processing units in a continuous vector space. A generalization is obtained in that the network computes similar distributed representations for similar input patterns. Given that two distributed representations are close in the Euclidean space, we may assume that they have similar features and, thus, establish a mapping between the two. The Labeling Recursive Auto-Associative Memory (LRAAM) (Sperduti, 1993) is a neural network that can encode variable-size arbitrary structures, such as directed cyclic graphs, into fixed-size distributed representations. Figure 1 shows the basic LRAAM architecture. The objective is to learn the identity function x = F(x, w), where x is an input vector of a graph vertex and w is the set of connection weights in the network. The LRAAM network is an extension of Pollack's (1990) RAAM network, which is limited to encoding tree-like structures.

2 Distributed Representations and Analogical Mapping

The kinds of problems contemplated here are OOram role models of object-oriented analysis specifications that have compositional structures. In the symbolic paradigm the role models obtain a discrete representation, such as lists in a LISP program, and the operations on them are, in essence, concatenative. An adequate mapping between two analogs is heavily constrained by both structural and semantic similarities. A symbolic model of analogical reasoning, such as SME, approaches the mapping by computing match hypotheses guided by a set of rules. An alternative model of analogical mapping is to compute distributed representations with a neural network, from which a mapping can be found. In contrast to symbolic representations, distributed representations of a structure are spread among several units in a neural network. Importantly, the distributed representations preserve the compositionality of the structures over a fixed set of processing units in a continuous vector space. A generalization is obtained in that the network computes similar distributed representations for similar input patterns. Given that two distributed representations are close in Euclidean space, we may assume that they have similar features and thus establish a mapping between the two. The Labeling Recursive Auto-Associative Memory (LRAAM) (Sperduti, 1993) is a neural network that can encode variable-size arbitrary structures, such as directed cyclic graphs, into fixed-size distributed representations. Figure 1 shows the basic LRAAM architecture. The objective is to learn the identity function x = F(x, w), where x is the input vector of a graph vertex and w is the set of connection weights in the network. The LRAAM network is an extension of Pollack's (1990) RAAM network, which is limited to encoding tree-like structures.
Each vertex in the graph is coded in the data set as a vector with a unique label r of c binary elements and n pointers of m real numbers to its connected vertices.

Fig. 1. Basic LRAAM architecture: an input layer of c label units and n blocks of m pointer units (branch_1, ..., branch_n) is encoded into an m-unit hidden layer and decoded back to an identical output layer.

For a graph with g vertices, the data set is defined as X = [x_1, x_2, ..., x_g]^T. Special "nil" symbols are used as pointers if the number of outgoing edges for a vertex is less than n = max{outdeg(v)}, where v is a vertex in the graph. Trained by backpropagation, the LRAAM learns the identity function F by encoding each vertex into a distributed representation in the m-dimensional vector space and decoding it back to its original constituents. After every presentation of the entire data set, the activation values of the hidden-layer units are used to update the pointers in X. Assuming that the LRAAM network is able to learn the data set perfectly, that is, that the sum of squared errors approaches 0, the encoder network, denoted by the function z_i = F_e(x_i, w_e), is applied to compute the distributed representation z_i for the graph vertices i = 1, 2, ..., g. Vector w_e denotes the weights from the input to the hidden-layer units. An OOram role model, such as the one in Figure 3, can be represented as a directed graph where the roles are vertices and the messages sent through the ports are edges in the graph. Given two role models, one base and one target, if vertices have similar relational structures in the two graphs, the LRAAM network computes similar distributed representations for them. Analyzed in the m-dimensional Euclidean space, if the distance between two vertices, one in the base and the other in the target, is low, they have similar features, which strengthens the probability of establishing a mapping between the two. The distributed representations computed by the LRAAM network tend to be of relatively low dimension (typically m < 10). Non-connectionist models for computing distributed representations of compositional structures, such as tensor products (Smolensky, 1990) and holographic reduced representations (Plate,

1994), tend to be of very high dimension. Low-dimensional representations are preferred primarily because simple distance measures can then be computed to impose structural constraints; at the same time, the representations must preserve the arbitrary compositionality, such as cycles, of the structures.
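To make the data layout concrete, the following sketch builds the vertex vectors x_i described above and a sqrt(m)-scaled Euclidean distance over m-dimensional representations. The configuration, pointer values, and the 2-dimensional representations are invented placeholders standing in for the activations a trained LRAAM would produce.

```python
import math

# Toy configuration: label size c = 3, pointer size m = 2, max out-degree n = 2.
c, m, n = 3, 2, 2
NIL = [-1.0] * m                      # special "nil" pointer symbol

def vertex_vector(label_idx, pointers):
    """Concatenate a binary (here one-hot) label code with n pointer vectors,
    nil-padded when a vertex has fewer than n outgoing edges."""
    label = [1.0 if i == label_idx else 0.0 for i in range(c)]
    pads = [NIL] * (n - len(pointers))
    return label + [v for p in list(pointers) + pads for v in p]

# Pointer values would be the hidden-layer activations learned by the LRAAM;
# fixed placeholders here just illustrate the layout X = [x_1, ..., x_g]^T.
X = [
    vertex_vector(0, [[0.2, 0.8]]),   # vertex with one outgoing edge
    vertex_vector(1, []),             # terminal vertex (all pointers nil)
]
assert all(len(x) == c + n * m for x in X)

def distance(z1, z2):
    """Euclidean distance between m-dimensional representations, scaled by sqrt(m)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z1, z2))) / math.sqrt(m)

# With a trained encoder z_i = F_e(x_i, w_e) these would be hidden activations;
# placeholder 2-dimensional representations keep the example self-contained.
Z = [[0.31, 0.90], [0.29, 0.88]]
D = [[distance(zi, zj) for zj in Z] for zi in Z]
```

Vertices whose entries in D are small would be considered structurally similar and thus candidates for a mapping.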

2.1 Mapping Algorithm

The OOram role model provides a collaborational view of the objects in a system, where an object can play different roles in various contexts. The roles in a model collaborate with each other by message passing through ports to complete an activity. Each component in a role model, that is, role, port, and message, is described with terms, which give them a "meaning." The ports define the cardinality of the collaboration: a single-circle port asserts that the role knows about one occurrence of the collaborating role, a double-circle port asserts that the role knows about any number of occurrences of the collaborating role, and no port indicates that the role has no knowledge about the remote role. Thus, a role model is characterized by its collaboration structure and the semantic descriptions of its components. The degree of similar meaning of two terms is obtained from lexical analysis by measuring their "closeness" as a function of the weighted relations between the terms in a semantic network. Given directed graphs of two role models, where G_b is the base and G_t is the target, the matrix S represents the semantic similarities between the vertices in the two graphs, where 0 <= S_ij <= 1. A value close to 1 indicates a high degree of semantic similarity, whereas a value close to 0 indicates a low degree of semantic similarity. Three components provide the input to the mapping algorithm: two graphs, one representing the base role model and the other representing the target role model, and a matrix representing the semantic similarities between vertices in the graphs. The mapping algorithm is conceptually illustrated in Figure 2. Distributed representations of the two graphs are computed by the LRAAM networks, from which a matrix D of Euclidean distances is derived. From combinatorial optimization, the Hungarian assignment algorithm (Papadimitriou and Steiglitz, 1982) is applied to obtain a one-to-one mapping on matrix S.
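The term-closeness measure can be sketched as follows. The paper does not specify how the weighted relations in the semantic network are combined, so the edge weights and the choice of a best path-product as the "closeness" function are illustrative assumptions:

```python
# Hypothetical term network: edge weights in (0, 1] express relatedness.
# As one possible "closeness" function, the similarity of two terms is taken
# to be the maximum product of edge weights over any connecting path.
edges = {
    ("borrower", "customer"): 0.9,
    ("customer", "client"): 0.8,
    ("library", "wholesaler"): 0.4,
}

def similarity(a, b, seen=frozenset()):
    """Best path-product between terms a and b in the weighted network."""
    if a == b:
        return 1.0
    best = 0.0
    for (u, v), w in edges.items():
        for s, t in ((u, v), (v, u)):          # treat edges as undirected
            if s == a and t not in seen:
                best = max(best, w * similarity(t, b, seen | {a}))
    return best
```

Values computed this way would fill the semantic matrix S, one entry per (target vertex, base vertex) pair.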
The underlying assumption of the mapping algorithm is that we have a tentative mapping derived from matrix S and want to impose structural constraints from homomorphic structures in the two graphs. In each iteration of the mapping algorithm, the Euclidean distances between mapped pairs of vertices are used to update matrix S. If m(i) = j denotes a mapping derived from matrix S, where i is a vertex in the target graph and j is a vertex in the base graph, and the Euclidean distance D_ij between the mapped pair reports a high value, the entry S_ij is "punished" by subtracting a certain value to decrease the probability of this mapping occurring. Then, if a sufficient amount of "punishment" has taken place in S, a new assignment derived from the Hungarian algorithm results in a relabeling of the base graph vertices.

Fig. 2. Conceptual view of the connectionist-based analogical mapping model.

The labeling technique used in the LRAAM network implies that the binary label code is embedded in the distributed representations. A consequence of this is that the formation of distributed representations may become biased towards similar label codes rather than similar structures. A previous study of the LRAAM network¹ indicates that the network's tendency to compute similar distributed representations for vertices in similar structures is strengthened if the label coding is identical. Therefore, a relabeling technique is applied in the mapping algorithm, where the label coding of the base graph vertices is modified based on the assignments on S. Given the mapping m(i) = j derived from S, where i denotes a vertex in the target graph and j denotes a vertex in the base graph, the label code for base graph vertex j is updated by r_j = r_i. If a relabeling of the base graph vertices takes place, the LRAAM network computes new distributed representations for the graph, and the matrix D of Euclidean distances is recalculated. Throughout the iterations of the mapping algorithm the magnitude of the update rate ("punishment"), defined as 1/m^{(i/m)+1}, where m is the dimension of the distributed representations and i is the iteration step, decreases asymptotically until no more updates occur. The final mapping is given by the assignment on the updated matrix S.

3 Experiments

Figure 3 shows two similar, but not isomorphic, OOram role models from two different domains, one library and one wholesaler. Given that the library model is the existing base and the wholesaler model is the new target, we want to find a mapping from the target to the base. The dashed roles denote the stimuli roles, which initiate the activity in the two models. The role models are represented as directed graphs, where G_b = (V, E) is the library base graph and G_t = (V, E) is the wholesaler target graph. The set of vertices for the base graph is V(G_b) = {borrower, borrower data, borrower db, dep, ext library, library, library ass, ref data, ref db} and for the target graph V(G_t) = {customer, customer data, customer reg, factory, part data, part archive, sold part, store ass, wholesaler}.

¹ Reported in an unpublished PhD thesis: Ellingsen, B. K., Distributed Representations for Analogical Mapping, Department of Information Science, University of Bergen.

Fig. 3. Two analogous OOram role models from two different domains. The left model shows a loan activity for a university library, whereas the right model shows a sales activity for a wholesaler of marine outboard parts.

Table 1 lists the results of applying the proposed mapping algorithm to the role models, with an initial mapping derived from matrix S only and a final mapping derived from the iterative process constrained by the distributed representations. The target roles are listed in the leftmost column, and the rightmost column lists a one-to-one mapping scheme defined manually as a performance indicator. The column S_ij in the initial and final mapping lists the values in S for the mapping m(i) = j, and column D_ij lists the corresponding Euclidean distance in D for the same mapping. Note that D_ij is scaled by sqrt(m) to obtain values in the range [0, 1]. Both LRAAM networks, one for the target and one for the base graph, have identical configurations, where the maximum branching for the graphs is n = 6, with m = 4 real numbers per pointer and c = 6 binary numbers per label. Thus, the topology of the networks is 34-4-34, which yields 4-dimensional distributed representations. A learning rate of 0.3 was set for both networks. The results in Table 1 list five initial mappings which are inconsistent with the desired scheme, i.e., m(customer reg) = dep, m(part data) = ref db, m(part archive) = library, m(sold part) = ref data, and m(wholesaler) = borrower db. All the incorrect mappings report relatively high Euclidean distances between the distributed representations of the mapped graph vertices. After ten iterations no more updates were reported in matrix S, and the mapping algorithm terminated. In nine of the ten iterations the base LRAAM network was invoked to compute new distributed representations after relabeling of the base graph vertices. The total number of learning epochs for the LRAAM networks (target invoked once and base invoked nine times) was 11363. In the final mapping we see that values in matrix S have been updated to impose constraints derived from the distributed representations. Three of the five incorrect mappings are now consistent with the scheme. For mapped vertices that are consistent with respect to identical label coding and structure, low Euclidean distances are reported. For example, the correct mapping m(wholesaler) = library reports the low Euclidean distance 0.013. The remaining incorrect mappings, m(part data) = dep and m(sold part) = ref data, are interchanged, as both ref data and dep are terminal vertices and similar distributed representations are computed for the two.

target        | initial base   S_ij  D_ij  | final base     S_ij  D_ij  | scheme base
--------------|----------------------------|----------------------------|--------------
customer      | borrower       0.92  0.301 | borrower       0.84  0.087 | borrower
customer data | borrower data  0.90  0.230 | borrower data  0.84  0.231 | borrower data
customer reg  | dep            0.53  0.488 | borrower db    0.73  0.070 | borrower db
factory       | ext library    0.63  0.111 | ext library    0.63  0.086 | ext library
part data     | ref db         0.61  0.531 | dep            0.47  0.225 | ref data
part archive  | library        0.47  0.474 | ref db         0.07  0.479 | ref db
sold part     | ref data       0.73  0.249 | ref data       0.70  0.275 | dep
store ass     | library ass    0.64  0.015 | library ass    0.64  0.043 | library ass
wholesaler    | borrower db    0.85  0.566 | library        0.41  0.013 | library

Table 1. Results of the mapping of the library/wholesaler role models.

4 Conclusion

The mapping phase of AR is strongly dominated by the symbolic paradigm. This paper presented an algorithm whose underlying model of representation is distributed rather than symbolic. Directed graphs representing structures from the domain of object-oriented software engineering were used to test the mapping algorithm. Both structural and semantic constraints were taken into consideration in the mapping. The results show that Euclidean distances between the vertices in a graph can be used to constrain the mapping structurally. The mapping algorithm has, however, several limitations. First, the balancing of the two types of constraints is non-deterministic and must be set by the analogist. Second, the Hungarian algorithm implies a one-to-one mapping, which, in some cases, can be too restrictive; many-to-one mappings are frequently desired in analogical mapping. Third, the computational intensity of the proposed mapping algorithm depends on the LRAAM network; frequent changes in the semantic matrix imply many invocations of the LRAAM network due to recoding of the base graph labels. Experiments have shown that if certain vertices, especially the root vertices, are mapped initially, the mapping algorithm tends to be less computationally intensive. Otherwise, if a mapping derived from the semantic matrix does not reflect the structural similarity between the two graphs, the mapping algorithm tends to be more computationally intensive. This issue also relates to the scalability of the mapping algorithm: the higher the number of vertices in the graphs, the more important it becomes that the matrix of semantic similarities preserve some of the structural similarities of the graphs. Sensitivity analysis of the mapping algorithm indicates that if this cannot be obtained, mapping of large graphs will become difficult.

References

Chalmers, D. J. (1990). Why Fodor and Pylyshyn were wrong: The simplest refutation, Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 340-347.
Falkenhainer, B., Forbus, K. D. and Gentner, D. (1989). The structure-mapping engine: Algorithm and examples, Artificial Intelligence 41: 1-63.
Fodor, J. A. and Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis, Cognition 28: 3-71.
Holyoak, K. J. and Thagard, P. (1989). Analogical mapping by constraint satisfaction, Cognitive Science 13: 295-355.
Papadimitriou, C. H. and Steiglitz, K. (1982). Combinatorial Optimization: Algorithms and Complexity, Prentice Hall, Englewood Cliffs, NJ.
Plate, T. A. (1994). Distributed Representations and Nested Compositional Structure, PhD thesis, University of Toronto.
Pollack, J. B. (1990). Recursive distributed representations, Artificial Intelligence 46(1-2): 77-105.
Reenskaug, T., Wold, P. and Lehne, O. A. (1996). Working With Objects: The OOram Software Engineering Method, Manning Publications Co.
Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems, Artificial Intelligence 46(1-2): 159-216.
Sperduti, A. (1993). Labeling RAAM, Technical Report TR-93-029, International Computer Science Institute, Berkeley, CA.
Tessem, B., Bjørnestad, S., Tornes, K. and Steine-Eriksen, G. (1994). ROSA = Reuse of Object-oriented Specifications through Analogy: A project framework, Technical Report No. 16, ISSN 0803-6489, Dept. of Information Science, University of Bergen.
van Gelder, T. (1990). Compositionality: A connectionist variation on a classical theme, Cognitive Science 14: 355-384.

32

Learning from Sequential Examples: Initial Results with Instance-Based Learning

Susan L. Epstein¹ and Jenngang Shih²

¹ Department of Computer Science, Hunter College and The Graduate School of The City University of New York, New York, NY. [email protected]
² Department of Computer Science, The Graduate School of The City University of New York, New York, NY. [email protected]

Abstract. This paper postulates an approach to planning from a sequence of instances. Sequential instance-based learning (SIBL) generates a sequential hierarchy of planning knowledge from which to formulate plans and make decisions. We report here on the application of SIBL to the game of bridge. Initial results indicate that examples applied in a sequentially dependent manner select correct actions more often than if the examples were used independently. SIBL suggests how empirical learners for classification problems may be extended to learn to plan. The contributions of this paper are the formulation of planning as a sequence of related instances, and a demonstration of the efficacy of majority vote with SIBL in the domain of bridge.

1 Introduction

The thesis of this work is that an empirical learning algorithm for classification can be systematically extended to learn knowledge for planning, where a planning problem is viewed as a sequence of related classification problems. We call the application of this idea to IBL sequential instance-based learning (SIBL), and show here how it can learn to plan from instances, also known as cases or exemplars. In a typical implementation, case-based reasoning (CBR) retrieves a set of cases relevant to the current problem, reuses the most relevant case, revises the retrieved case according to the result required by the application, and retains the problem as a new case (Aamodt and Plaza, 1994). Although CBR has been applied to planning before, most CBR planners rely heavily on domain knowledge to manage the inherent complexity (Alterman, 1988; Hammond, 1989; Marks et al., 1988; Turner, 1988). Manual construction and maintenance of such a knowledge-intensive system is both costly and difficult. An alternative is empirical learning, where the system accepts examples as input and produces concept descriptions (Michalski, 1983; Quinlan, 1986). Most empirical learners, however, are intended for classification problems. In the context of CBR, for example, instance-based learning (IBL) represents both the input examples and the output concept descriptions as feature-value pairs, and most IBL algorithms retain the prototypical examples for reuse (Aha, 1992; Aha et al., 1991). The focus of our work is to extend an empirical learning paradigm for classification, such as IBL, to learn concept descriptions for planning. The next sections introduce sequential dependency, the domain of bridge, and SIBL. Subsequent sections describe the experiments, their results, and related and future work.

2 Sequential Dependency

In classical AI planning, a planning problem is specified by an initial state, one or more goal states, and a set of operators that transform one state into another. A plan is a sequence of operator instantiations (actions) that transforms the world from the initial state to a goal state (Hendler et al., 1990). Most planning algorithms rely mainly on the current state description to select an action; the dependency among the sequential events is implicit. SIBL views sequential events as dependent, rather than independent. In a sequence of n instances (s_1, ..., s_i, ..., s_j, ..., s_n), we say that s_j sequentially depends on s_i for all 1 <= i < j <= n. Here s_i is called the predecessor and s_j the successor. Because it has in part formulated the current state and the actions available in it, a predecessor influences its successor. Similarly, a successor is influenced by its predecessors to take an action. Thus, while the dimension of a classical AI planner's state description is fixed, the dimension of SIBL's state description is variable, depending on how far back one looks at the predecessors. Given this information, one way to plan is to select a correct action based on a sequence of past events of a certain length. If the selected action is based on too short a sequence, there may not be enough information. If the selected action is based on too long a sequence, the decision may overfit a unique sequence. One resolution of this issue is to use the majority vote, which selects the action that is in the majority among a set of similar sequences of various lengths. Let s_i ... s_j -> a_k denote that the sequence of events from s_i to s_j recommends the action a_k. Then, for example, under majority vote the set of recommendations {s_a s_b s_c -> a_1, s_x s_y -> a_1, s_z -> a_2} would select a_1. Majority vote has been one of the primary parameters for the IBL family of algorithms (Aha, 1992; Aha et al., 1991; Cost and Salzberg, 1993; Dasarathy, 1991).
It is an intuitive approach: if the majority of the sequences under consideration point to the same action, the likelihood that the action is correct should be high. This paper applies majority vote in SIBL to the game of bridge.
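Majority vote over a set of retrieved recommendations can be sketched directly. The sequence and action names mirror the example above; the reference-count tiebreak reflects the strategy SIBL uses, described later in the paper:

```python
from collections import Counter

def majority_action(recommendations, reference_count=None):
    """Pick the action recommended by most of the retrieved sequences.
    `recommendations` maps each matched partial sequence to the action it
    recommends; ties fall back to a reference count when one is supplied."""
    votes = Counter(recommendations.values())
    top = max(votes.values())
    tied = [a for a, v in votes.items() if v == top]
    if len(tied) == 1 or reference_count is None:
        return tied[0]
    return max(tied, key=lambda a: reference_count.get(a, 0))

# The example from the text: three sequences, two of which recommend a1.
recs = {("sa", "sb", "sc"): "a1", ("sx", "sy"): "a1", ("sz",): "a2"}
```

With these recommendations, `majority_action(recs)` selects `"a1"`, as in the text.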

3 Bridge

Bridge is a four-player planning domain. To begin, the 52 distinct cards are dealt out, that is, distributed equally among the players. The game then has two phases: bidding and play. During bidding, a specific number of tricks for winning (the contract) is determined. During play, one of the contestants (identified during bidding) is the declarer and another is the dummy. The declarer tries to achieve the contract, controlling both his or her cards and the dummy's, while the other two contestants try to defeat it. (After the first card is played, the dummy's cards are exposed on the table for all to see.) Play consists of 13 sequential tricks; a trick is constructed when each contestant in turn plays a single card. The problem addressed here is to design a sequence of actions (card plays) that guides a bridge player to reach a specific goal. Play is viewed here as a planning task.

The search space for bridge is large. There are 13 cards in each of four suits, and during a trick each player must play a card matching the suit of the first card (the lead) in the trick whenever possible. Thus the branching factor is approximately n(n/4)^3 for each trick, where n is the number of cards each player holds at the beginning of the trick, and n/4 is the average number of choices the other players have in the suit. For example, on the first trick, when all players still have 13 cards, the branching factor is roughly 13(13/4)^3 ≈ 446. For a complete deal, the number of possible plays will be ∏_{n=1}^{13} n(n/4)^3 ≈ 5 × 10^15. Since there are

(52 choose 13)(39 choose 13)(26 choose 13)(13 choose 13) ≈ 5 × 10^28

possible ways to deal the cards, there are roughly 3 × 10^44 possible decision states in bridge. As humans see it, however, the correct decisions in all those states are not independent of each other. How one decides which card to play in a given state depends in part on what cards have been played previously. In so large a space, domain knowledge is required to play well. The knowledge that we propose to learn is how to link decisions sequentially.
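The arithmetic above can be checked directly:

```python
from math import comb, prod

n_cards = 13

# Branching factor for the first trick: the leader has 13 choices and each
# of the other three players about 13/4 cards in the led suit.
first_trick = n_cards * (n_cards / 4) ** 3
assert 440 < first_trick < 450                      # roughly 446

# Possible plays of a complete deal: product over tricks as hands shrink.
plays = prod(n * (n / 4) ** 3 for n in range(1, n_cards + 1))
assert 4e15 < plays < 6e15                          # roughly 5 * 10^15

# Ways to deal 52 cards into four hands of 13.
deals = comb(52, 13) * comb(39, 13) * comb(26, 13) * comb(13, 13)
assert 5e28 < deals < 6e28                          # roughly 5 * 10^28

# deals * plays ≈ 2.7e44, matching the "roughly 3 * 10^44" decision states.
assert 2e44 < deals * plays < 3.5e44
```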

4 Representation

We represent each input example as a set of feature-value pairs. The features represent the state of the world (the situation in which one is expected to play a card); the associated action is the card selected. State features describe which players hold which cards, as well as bookkeeping information, such as the trick number and how many tricks have been won by each side. Action features represent the associated action. In the work on three no trump deals described here, the highest card in the suit led takes the trick. (Card precedence is 2, 3, 4, ..., Queen, King, Ace.) Since the cards most likely to win tricks are the high cards, such as Ace, King, Queen, Jack, and 10, those below 10 are considered indistinguishable and represented by a common symbol.

The output concept descriptions can be structured in an AND/OR tree-like sequential hierarchy as in Figure 1. Figure 1 shows a contract of "three no trump" planned in three components: four tricks in which spades are led, three club tricks, and two heart tricks. (The other tricks are disregarded in this plan.) In turn, each trick has components that describe behavior. A sequential hierarchy is a set of sequentially ordered partial sequences organized in a hierarchical structure. A partial sequence, such as "four spades" in Figure 1, is a partial solution that may contain other partial sequences or, at the lowest level, single actions. Partial sequences, such as "four spades," that are higher in the hierarchy are said to be strategic because they concern the general direction of a problem solver; those lower in the hierarchy are said to be tactical because they involve maneuvering on a smaller scale.

Fig. 1. A Sequential Hierarchy in Bridge. (Three No Trump decomposes into Four Spades, Three Clubs, and Two Hearts; these decompose into tactics such as win, finesse, cash, and cross; the leaves are single card plays.)

5 SIBL

Formally, an instance σ is a pair that includes the description of a state s and an action a, such that action a is taken in state s. A sequence (σ_1 ... σ_n) is a set of ordered instances of length n. The empty sequence e is a sequence with no instances. A sequence v is a partial sequence of z if and only if there are sequences x and y such that z = xvy, where x, y, or both can be e. An instance spawns the set of all the partial sequences involving its predecessors in reverse sequential order. For example, the instance σ_3 in the sequence (σ_1 σ_2 σ_3) spawns {(σ_3), (σ_2 σ_3), (σ_1 σ_2 σ_3)}. These partial sequences are used as additional input examples for SIBL. Like most IBL algorithms, SIBL learns a set of prototypical instances from the examples. SIBL is currently based on IB4, which is used to compute similarity between instances. IB4's variable attribute weights produce better instance selection than IB3 does. Similarly, IB4's instance reference count retains more relevant instances than either IB2 or IB1 does (Aha, 1992). Although SIBL is based on IB4, we see no reason why other empirical learning algorithms could not be adapted to learn from sequences. If an instance σ has f features and p predecessors, the number of features for the instance after spawning is f(p + 1). This linear increase in the number of features translates into an exponential increase in the size of the instance space: if S is the instance space for σ and σ has p predecessors, the instance space after spawning will be S^{p+1}. To control this, SIBL has a window, a constant number of predecessors it learns from, instead of the fully spawned set. For example, for the sequence (σ_1 σ_2 σ_3 σ_4), the instance σ_4 with window w = 3 produces the set {(σ_4), (σ_3 σ_4), (σ_2 σ_3 σ_4)}. In bridge, a short sequence of steps has been routinely used by experts to explain playing techniques (Goren, 1963).
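The windowed spawning step can be sketched as follows (the instance names are hypothetical placeholders):

```python
def spawn(sequence, index, window):
    """Partial sequences ending at `index`, looking back at most
    `window - 1` predecessors: the SIBL windowed alternative to the
    fully spawned set, in reverse sequential order."""
    start = max(0, index - window + 1)
    return [tuple(sequence[k:index + 1]) for k in range(index, start - 1, -1)]

seq = ["i1", "i2", "i3", "i4"]
# Instance i4 with window w = 3 yields (i4), (i3 i4), (i2 i3 i4),
# mirroring the example in the text.
```

A large window recovers the fully spawned set, while a small window caps the exponential growth of the instance space.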

Set window constant w
Store the first example
While there are more examples in the training set, do
    For each instance, i, in the example, do
        Let a, the current action, be the action of i
        Compute P from the set of all partial sequences
        If a = FindMajorityAction(P), then
            Update the reference count of the partial sequences referenced in P
        Otherwise, Store i

Fig. 2. Main Routine that Processes Input Examples, each of which Contains a Sequence of Instances

FindMajorityAction(P):
    Initialize the list of majority actions, M, to the empty list
    For each partial sequence, p, in P, do
        For each stored partial sequence, q, of the same length as p, do
            Calculate the similarity of p to q
        Let m, the majority action, be the action most frequently recommended by the most similar q's
        Append m to M
    Return the majority action in M
Fig. 3. The Function that Returns the Majority Action

Pseudocode for SIBL appears in Figures 2 and 3. Figure 2 outlines the processing of input examples. For each partial sequence, SIBL maintains a reference count, the number of times the sequence was retrieved and recommended the correct action. Figure 3 shows the computation of the majority action. If no majority exists, the tie is broken by the reference count or, failing that, by random selection. This is similar to the strategy used by IB4.
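A runnable rendering of the majority-action computation follows, under simplifying assumptions: a positional-match similarity stands in for IB4's weighted similarity, and the stored sequences and actions are invented.

```python
from collections import Counter

def similarity(p, q):
    """Number of positions with matching states between equal-length
    sequences: a crude stand-in for IB4's weighted similarity."""
    return sum(1 for a, b in zip(p, q) if a == b)

def find_majority_action(P, store):
    """For each spawned partial sequence in P, find the most similar stored
    sequences of the same length and collect their most frequent action;
    then return the majority action over all lengths."""
    M = []                                   # one candidate action per length
    for p in P:
        same_len = [(q, act) for q, act in store.items() if len(q) == len(p)]
        if not same_len:
            continue
        best = max(similarity(p, q) for q, _ in same_len)
        closest = [act for q, act in same_len if similarity(p, q) == best]
        M.append(Counter(closest).most_common(1)[0][0])
    return Counter(M).most_common(1)[0][0] if M else None

# Stored partial sequences (state tuple -> recommended action), all invented.
store = {("s3",): "a2", ("s2", "s3"): "a1", ("s1", "s2", "s3"): "a1"}
P = [("s3",), ("s2", "s3"), ("s1", "s2", "s3")]      # spawned partial sequences
```

Here the one-state sequence votes for `"a2"` while the two longer sequences vote for `"a1"`, so the majority action is `"a1"`.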

6 Experimental Design and Results

The goal of the three experiments described here was to test the hypothesis that sequential instances can produce better action selection knowledge than non-sequential ones for planning bridge play. To demonstrate this, the number of correct action selections in a sequence of events is measured, both before and after learning. After learning, more action selections in a sequence of events should be correct. For q correct action selections in a sequential problem with n sequences, the fraction correct is calculated as r = q/n. Before learning, q, and hence r, will be small; after learning, r should increase.

Learning Curves

0 10 20 30 40 50 60 70 80 90 100 Percent training set presented

Probability of correct selection, r 0.00 0.10 0.20 0.30 0.40

Probability of correct selection, r 0.00 0.10 0.20 0.30 0.40

Twenty-four three no trump bridge deals were used in these experiments. All twenty-four deals were fully played by experts who successfully made the contract. Each deal consisted of 26 instances, times when the declarer had to play a card from the dummy or from his or her own hand. Two thirds (16) of the deals were used for training and one third (8) for testing. Every experiment used three-fold cross validation, each with 10 trials (Kibler and Langley, 1988). The rst experiment established a baseline for performance. A random selector chose an action to perform in a sequential problem based on the action's probability distribution. That is, for each possible action a, an empirical mass function f (a) is de ned toPbe the proportion of all actions in the training set that are equal to a, where 1 i=1 f (ai ) = 1. The second experiment tested the e ectiveness of selection based on xed sequences. A xed-sequence selector was trained with instances of a xed sequence length using IB4 (Aha, 1992). The minimum xed-sequence selector used instances of length one, i.e., non-sequential instances. The maximum xed-sequence selector used instances of length de ned by the window constant w = 5. The third experiment tested SIBL as described above. The variable-sequence selector chose among instances of varying lengths from one to the window constant w. It selected an instance using majority vote and the reference count. All three selectors broke ties with random selection. Figures 4 and 5 depict the portion of the time that the correct action was selected during learning and testing, respectively. In Figure 4, during training the maximum xed-sequence selector chose the correct action with a higher frequency than the minimum xed-sequence selector did. In Figure 5, however, the performance of the maximum xed-sequence selector is actually worse during testing. 
This anomaly occurred because, although both had the same number of input examples, the maximum fixed-sequence selector has a much larger instance space (S^{p+1}).

Fig. 4. Learning and Performance Curves (legend: Variable, Minimum, Maximum, Random; x-axis: percent of training set presented; y-axis: probability of correct selection, r)

As Figures 4 and 5 indicate, the variable-sequence selector was consistently better than both the minimum fixed-sequence selector and the maximum fixed-sequence selector during learning (r = 0.36) and during testing (r = 0.25). It also substantially outperformed the baseline random selector (r = 0.09). The performance gain, however, comes at the price of additional computation space and time. On average, the resulting instance base is about w (the size of the window) times larger than that of the fixed-sequence selectors; each run took about w times longer on a Cray 6400.
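The core of the variable-sequence selector can be sketched as follows. This is a simplified, hypothetical rendering: it uses exact matching of sequence tails and plain majority voting, omitting IB4's similarity computation and the reference-count tie handling mentioned above; all names are illustrative.

```python
import random
from collections import Counter

def variable_sequence_select(history, instance_base, w=5, rng=random):
    """Sketch of variable-length sequence selection.  `instance_base`
    holds (state_sequence, action) pairs with sequences of length 1..w.
    Every stored sequence that matches the tail of the current history
    casts a vote for its action; the majority action wins, with ties
    (and the no-match case) broken by random selection."""
    votes = Counter()
    for seq, action in instance_base:
        k = len(seq)
        if k <= w and tuple(history[-k:]) == tuple(seq):
            votes[action] += 1
    if not votes:
        return rng.choice([a for _, a in instance_base])
    top = max(votes.values())
    return rng.choice([a for a, v in votes.items() if v == top])
```

Because sequences of every length from 1 to w are stored per decision point, the instance base grows roughly w-fold relative to a fixed-length selector, which matches the space cost noted above.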

7 Related and Future Work

PRODIGY/ANALOGY retains plans from prior planning problems, and uses derivational analogy (Carbonell, 1986) to guide the search of similar planning problems (Veloso, 1994). When PRODIGY's general planning method (Carbonell et al., 1992) was enhanced by derivational analogy, it could solve more problems and solve them with less effort. PRODIGY/ANALOGY, however, relied heavily on domain-specific planning heuristics and annotated planning examples. As indicated earlier, our work is based on a weak method (e.g., IBL) that facilitates initial adaptation to a problem domain.

OBSERVER (Wang, 1996) learned the description of an operator (action) as a set of literals with a variation of the Version Space algorithm (Mitchell, 1977). OBSERVER could both create and repair plans with a STRIPS-like representation. It relied on explicit specification of the states in the search space, including the goal states. It is not clear, even to professional bridge players, however, just what features would exhaustively catalogue the 10^44 states in bridge play.

Moore's system (Moore, 1990) uses a form of reinforcement learning on a sequence of situation-action pairs to learn the effects of an action for robot arm control. It divides the search space into hierarchical segments to limit the search. It matches only the current state description to select an action, whereas SIBL matches a sequence of state descriptions to select an action. In a domain, such as bridge, which is sensitive to sequential relationships, we demonstrate here that a sequence of states is more advantageous.

GINA (DeJong and Schultz, 1988) was a program that learned to play Othello from experience, with a sequential list of situations encountered, actions taken, and final outcome. The representation of its experience base, however, is a subtree of the min-max game tree.
Because the average branching factor in bridge is so much higher than Othello's, this approach seems impractical here.

This is work in progress. These initial experiments demonstrate that sequential instances produce action selection knowledge, and that variable-length sequence selection outperforms minimum or maximum fixed-sequence selection. Our current research includes examining the impact of more than 24 examples on performance, and replacing majority voting with a relevance selection scheme that is based on decision theory (Russell and Norvig, 1995).

Acknowledgements

Thanks to David Aha and Cullen Schaffer for their comments on earlier drafts of this paper. This research was partly supported by the Bank of America Corporation.

References

Aamodt, A., and Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1).
Aha, D. W. (1992). Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies, 36, 267-287.
Aha, D. W., Kibler, D., and Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6, 37-66.
Alterman, R. (1988). Adaptive planning. Cognitive Science, 12, 393-421.
Carbonell, J. G., Blythe, J., Etzioni, O., Gil, Y., Joseph, R., Kahn, D., Knoblock, C., Minton, S., Pérez, A., Reilly, S., Veloso, M., and Wang, X. (1992). PRODIGY4.0: The manual and tutorial (TR CMU-CS-92-150). Carnegie Mellon University.
Carbonell, J. G. (1986). Derivational analogy: A theory of reconstructive problem solving and expertise acquisition. In Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. (Eds.), Machine Learning: An AI Approach, Vol. II, 371-392. Morgan Kaufmann.
Cost, S., and Salzberg, S. (1993). A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning, 10, 57-78.
Dasarathy, B. V. (1991). Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press.
DeJong, K. A., and Schultz, A. C. (1988). Using experience-based learning in game playing. Fifth International Conference on Machine Learning, Ann Arbor, MI, 284-290.
Goren, C. H. (1963). Goren's Bridge Complete: A major revision of the standard work for all bridge players. London: Barrie and Rockliff.
Hammond, K. J. (1989). Case-Based Planning: Viewing planning as a memory task. San Diego, CA: Academic Press.
Hendler, J., Tate, A., and Drummond, M. (1990). AI planning: Systems and techniques. AI Magazine, 11(2), 61-77.
Kibler, D., and Langley, P. (1988). Machine learning as an experimental science. Machine Learning, 3(1), 5-8.
Marks, M., Hammond, K. J., and Converse, T. (1988). Planning in an open world: A pluralistic approach. Workshop on CBR, Pensacola Beach, FL, 271-285.
Michalski, R. S. (1983). A theory and methodology of inductive learning. Artificial Intelligence, 20(2), 111-162.
Mitchell, T. M. (1977). Version spaces: A candidate elimination approach to rule learning. Fifth International Joint Conference on AI, Cambridge, MA, 305-310.
Moore, A. W. (1990). Acquisition of dynamic control knowledge for a robotic manipulator. Seventh International Conference on Machine Learning, Austin, TX, 244-252.
Quinlan, J. R. (1986). Induction of decision trees. In Shavlik, J. W., and Dietterich, T. G. (Eds.), Readings in Machine Learning. San Mateo, CA: Morgan Kaufmann.
Russell, S., and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall.
Turner, R. M. (1988). Opportunistic use of schemata for medical diagnosis. Tenth Annual Conference of the Cognitive Science Society.
Veloso, M. M. (1994). Flexible strategy learning: Analogical replay of problem solving episodes. Proceedings of the Twelfth National Conference on Artificial Intelligence, 595-600. AAAI Press/MIT Press.
Wang, X. (1996). Planning while learning operators. Third International Conference on Artificial Intelligence Planning Systems, Edinburgh, Scotland.


Looking at Features within a Context from a Planning Perspective

Hector Muñoz-Avila and Frank Weberskirch
Centre for Learning Systems and Applications (LSA)
University of Kaiserslautern, Dept. of Computer Science
P.O. Box 3049, D-67653 Kaiserslautern, Germany
E-mail: [email protected]

Abstract. Determining the context of a feature (i.e., the factors affecting the ranking of a feature within a case) has been the subject of several studies for classification tasks. However, this problem has not been studied for synthetic tasks such as planning until now. In this paper we will address this problem and explain how the domain theory plays a key role in determining the context of a feature. We provide a characterization of the domain theory and show that in domains meeting this characterization, the context can be simplified.

1 Introduction

Determining the context of a feature (i.e., the factors affecting the ranking of a feature within a case) is important because cases usually match new problems only partially. Thus, whereas the absence of a certain feature in the case in one context may not be very important for reusing the case, the absence of the same feature in another context may make it fundamentally difficult to reuse the case. Related to the question of the context of a feature is the question of its relevance. Aha and Goldstone (1990) pointed out that the relevance of a feature is a context-specific property. Traditional approaches for classification tasks, e.g. (Turney, 1996), have defined the relevance and context of a feature in terms of statistical information such as the distribution of their values relative to the values of other features.

In contrast to classification tasks, in synthetic tasks such as planning, other elements affect the context and relevance of a feature: first, the same problem may have several solutions, and second, there is a domain theory available that partially models the world. The first factor was already observed in (Veloso, 1994), where the particular solution is used to classify the features as relevant or non-relevant. In this work we are interested in studying how the domain theory affects the context of a feature.

In Section 2 we will motivate how the domain theory affects the context of a feature. Section 3 overviews an approach towards feature weighting in case-based planning. A brief discussion of a planning theory that will be used to characterize the context of a feature is given in Section 4. In the next section a characterization of the context of a feature is presented. Section 6 discusses related work and Section 7 makes concluding remarks.

2 Motivation

Consider the initial situation in the logistics transportation domain (Veloso, 1994) illustrated in Figure 1 (a). In this situation there are three post offices A, B and C. In post office A there is a package p1 and a truck. In post office B there is a package p2. In the final situation both packages, p1 and p2, must be located at C. There are basic restrictions that any solution must meet: (1) only trucks can move between post offices, (2) to load a package in a truck, both have to be located at the same post office, and (3) to unload a package from a truck in a certain office, the truck must be located at that post office. A possible solution is to load package p1 at A, move the truck from A to B, load package p2 in the truck, move the truck from B to C and unload both packages (the arcs show the path followed by the truck, the numbers indicate the order). Suppose that this problem and solution are stored as a case.


Fig. 1. Initial situation of (a) the case and (b) the new problem

Consider a new problem with the initial situation illustrated in Figure 1 (b). In the final situation the three packages must be located at C. If the case is used to solve the new problem, the truck follows the path illustrated by the arcs, collecting at each post office the corresponding package, leaving the packages p1 and p2 in C as indicated in the case. Finally, package p3 is loaded and moved to C. In this situation, the retrieval of the case is considered to be successful because steps taken from the case (2 and 3 in the new problem) could be extended to solve the new problem (Muñoz-Avila and Hullen, 1996; Ihrig and Kambhampati, 1996).

The problem solved in the case was not totally contained in the new problem: in the case, the truck is located in the same post office as a package, whereas in the new problem, the truck is located in a post office with no packages. Technically, this means that some initial features (i.e., the features describing the initial situation) were unmatched by the initial features of the new problem. If we take the unmatched and matched features of the case as input for a weighting model, the weight of the unmatched features is decreased relative to the weight of the other features in the case because their absence did not affect the reusability of the case. Only initial features are in consideration for updating the weights.

Now consider the same case and problem as before, but suppose that additional restrictions have been added: (4) trucks should not be moved into the same

post office more than once, and (5) problem-specific restrictions such as not allowing the truck to move from D to A in Figure 1 (b). These restrictions are made to improve the quality of the plan. Clearly, the path illustrated in Figure 1 (b) violates restriction (4) because the truck is moved to post office C twice. This means that the solution of the case must be revised. In particular, moving the truck from B to C is revised and instead the truck must be moved from B to D, where package p3 is loaded. Finally, the truck is moved from D to C, where the three packages are unloaded.

In this situation, the retrieval of the case is considered to be a failure and the weight of the unmatched features is increased relative to the weight of the other features in the case. However, this does not reflect the real reasons for the failure: even if the truck is located at A, the plan must still be revised. The real reason is that in solving the additional goal, to locate p3 in C, a conflict with the solution of the case occurs. This means that there are factors different from the initial features that affect the effectiveness of reusing cases. As a result, the strategy of updating the weights of the features based solely on the matched and unmatched features of the case becomes questionable. We will now see that depending on the characteristics of the domain theory, we can decide whether this strategy is adequate or not.
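Restriction (4) can be stated operationally. The following is a hypothetical sketch, not part of the original system: it checks a truck's move sequence against the "no office entered twice" restriction, representing moves as (source, destination) pairs.

```python
def violates_revisit_restriction(moves):
    """Restriction (4): a truck must not be moved into the same post
    office more than once.  `moves` is an ordered list of (src, dst)
    truck movements; only destinations count as entries."""
    entered = set()
    for src, dst in moves:
        if dst in entered:
            return True
        entered.add(dst)
    return False
```

The replayed path of Figure 1 (b) enters C twice and violates the restriction, while the revised path through D does not.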

3 Feature Weighting in Case-based Planning

In (Muñoz-Avila and Hullen, 1996) an algorithm is presented that analyzes the contribution of the initial features of the case during a reuse episode and updates the weights of these features accordingly. The similarity metric used is an extension of the foot-printed similarity metric (Veloso, 1994), called the weighted foot-printed similarity metric. Similarity is measured according to feature weights, which are case-specific. Thus, a local similarity (Ricci and Avesani, 1995) for each case is defined. Feature weights are updated following a reinforcement/punishment algorithm.

The adaptation strategy followed (Ihrig and Kambhampati, 1994) is known as eager replay. Eager replay is done in two phases: in the first phase, each plan step contained in the retrieved cases is replayed in the new situation if replaying the step does not introduce any inconsistency in the new solution. Once this phase is finished, a partial solution is obtained. In the second phase, the partial solution is completed by first-principles planning. Decisions replayed from the cases are only revised if no completion is possible. If decisions replayed from the cases are revised, the retrieval is considered to be a failure; otherwise it is considered to be a success.

As stated before, the same problem may have many different solutions. This can make a feature weighting approach particularly difficult to apply because a solution obtained in a reuse episode may never occur again. However, because of the definition of success and failure, the weights of certain features are increased if any completion of the partial solution is found or decreased if no completion is found. Thus, reinforcement or punishment does not depend on the particular solution found.
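The reinforcement/punishment idea can be illustrated with a minimal sketch. This is not the published algorithm: the additive rule, the learning rate, and all names are illustrative assumptions; only the direction of the update (punish unmatched features after a success, reinforce them after a failure) comes from the text.

```python
def update_weights(weights, unmatched, success, rate=0.1):
    """Sketch of a reinforcement/punishment update on a case's local
    feature weights.  After a successful reuse episode the unmatched
    initial features are punished (their absence did not hurt reuse);
    after a failure they are reinforced.  Weights stay in [0, 1]."""
    new = dict(weights)
    for f in unmatched:
        if success:
            new[f] = max(0.0, new[f] - rate)  # absence was harmless
        else:
            new[f] = min(1.0, new[f] + rate)  # absence may explain failure
    return new
```

In the example of Section 2, a feature such as "truck at A" that was unmatched in a successful episode would lose weight, lowering its influence in the weighted foot-printed similarity.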

4 Planning Theory

A plan P achieving a set of goals G is serially extensible with respect to an additional goal g if P can be extended to a plan achieving G ∪ {g}. For example, in the case and problem illustrated in Figure 1, the plan locating the two packages p1 and p2 at C is serially extensible with respect to the goal of locating p3 at C if restrictions (4) and (5) are not considered. In contrast, this plan is not serially extensible with respect to this goal if they are considered, because moving the truck from B to C needs to be revised (i.e., arc 3 in the new problem).

If any plan achieving a goal g1 is serially extensible with respect to a second goal g2, then the order g1, g2 is called a serialization order. This definition is extended to sets of goals of any size in a natural way. Serial extensibility is not a commutative property: g1, g2 might be a serialization order but not g2, g1. If any permutation of a set of goals is a serialization order, the goals are said to be trivially serializable. For example, the three goals of the problem depicted in Figure 1 (b) are trivially serializable if condition (4) is not taken into account and not trivially serializable if condition (4) is considered. Trivial serializability of goals in a domain depends on the particular planner being used; Barrett and Weld (1994) give examples of domains where goals are trivially serializable for SNLP but not for TOCL.

Trivial serializability does not imply that planning in the domain is "easy". Finding the adequate extension might require a significant search effort. However, having this property says that the work performed to achieve a set of goals will not be undone when planning for an additional goal. This theory, which classifies goal interactions, was conceived to explain the possible advantages of partial-order planners such as SNLP over state-space planners such as TOCL (Kambhampati et al., 1996; Barrett and Weld, 1994).
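The definitions above can be turned into a brute-force check. This sketch is an assumption-laden illustration: it abstracts serial extensibility into a user-supplied predicate (so the "for any plan" quantifier of the real definition is approximated by whatever the predicate encodes) and simply tries every permutation of the goal set.

```python
from itertools import permutations

def trivially_serializable(goals, serially_extensible):
    """Goals are trivially serializable iff every permutation is a
    serialization order, i.e. after achieving any prefix of goals,
    planning can be serially extended by the next goal.
    `serially_extensible(achieved_goals, next_goal) -> bool` is a
    user-supplied predicate standing in for the planner."""
    for order in permutations(goals):
        achieved = []
        for g in order:
            if not serially_extensible(frozenset(achieved), g):
                return False
            achieved.append(g)
    return True
```

With a predicate that always succeeds (the unrestricted transportation domain), the goals are trivially serializable; a predicate encoding an interaction like restriction (4) can make some permutation fail.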

5 Feature Context and Trivial Serializability

As we saw before, different factors may affect the effectiveness of a case's reuse. In the feature weighting approach presented in (Muñoz-Avila and Hullen, 1996) (see Section 3), emphasis is given to the initial features. In another approach, presented in (Ihrig and Kambhampati, 1996), explanation-based learning techniques are instead used to generate rules explaining retrieval failures. These rules are used as a censor to the case. They are conceived to detect goals occurring in the problem but not in the case that interact negatively with the goals of the case. Thus, emphasis is given to the additional goals in the case. For example, in Figure 1 (b), the goal to locate p3 at C interacts negatively with the goals of the case when restrictions (4) and (5) are considered.

Even though these are two different approaches towards improving retrieval in case-based planning, both report positive results when tested in experiments with different domains. In particular, in (Muñoz-Avila and Hullen, 1996) the original version of the logistics transportation domain is used (i.e., as defined in (Veloso, 1994)), whereas in (Ihrig and Kambhampati, 1996) restriction (4) is added. Goals in the logistics transportation domain are trivially serializable for SNLP.[1] However, as we saw, when condition (4) is added, goals might not be trivially serializable. This motivates the following claim, which is the main result of this paper:

Claim. In domains where goals are trivially serializable, the factors influencing the effectiveness of the reusability are the initial features of the problem and of the case, the goals common to the problem and the case, and the solution of the case.

This claim essentially says that the additional goals do not affect the effectiveness of the reuse. As a result, weighting models on initial features can be used. To show this, let G_Ca and G_Pb denote the goals of the case and of the problem respectively; then the subplan achieving G_Ca ∩ G_Pb in the case is taken and extended relative to the initial situation of the new problem (in the example, this extension corresponds to moving the truck from C to A, i.e., arc 1). Once the plan achieving G_Ca ∩ G_Pb has been generated, it can be extended to a plan achieving G_Pb because the domain is trivially serializable (i.e., arcs 4 and 5). Of course, retrieval failures will still occur if the subplan achieving G_Ca ∩ G_Pb in the case cannot be extended to solve these goals relative to the initial situation of the new problem. But the point is that such a failure is due to the initial features and not to the additional goals.

Goals in domain are          Context
trivially serializable       Sol_Ca + I_Ca + I_Pb + (G_Ca ∩ G_Pb)
not trivially serializable   Sol_Ca + I_Ca + I_Pb + G_Ca + G_Pb

Table 1. Context according to the characteristics of the goal interactions in the domain.

Table 1 summarizes these results (Sol_Ca represents the solution of Ca; I_Ca and I_Pb denote the initial features of the case and of the problem). If goals are trivially serializable, only goals that are common to the problem and the case need to be considered. However, if goals are not trivially serializable, additional goals in the problem and the case need to be considered because they might affect the effectiveness of the reusability of the case. This result establishes a direct relation between the concept of trivial serializability and feature context, and shows the feasibility of previous research on feature weighting in case-based planning (Muñoz-Avila and Hullen, 1996).

[1] This affirmation has never been reported in the literature. Intuitively, plans in the transportation domain can always be extended. For example, if a plan locates an object at a certain post office, this plan can always be extended to relocate the object at another office provided that the transportation means are available.
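Table 1 amounts to a simple rule for assembling the context. The following sketch is illustrative (the set-based representation and function name are assumptions); it only encodes the table's two rows.

```python
def reuse_context(I_case, I_prob, G_case, G_prob, sol_case,
                  trivially_serializable):
    """Sketch of Table 1: the features influencing reuse effectiveness.
    With trivially serializable goals, only goals common to case and
    problem matter; otherwise all goals of both must be considered."""
    if trivially_serializable:
        goals = G_case & G_prob   # (G_Ca ∩ G_Pb)
    else:
        goals = G_case | G_prob   # G_Ca + G_Pb
    return I_case | I_prob | goals | sol_case
```

The practical consequence is the one stated above: in trivially serializable domains, a weighting model restricted to initial features (plus the shared goals and the case solution) already captures everything that can make reuse fail.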


6 Related Work

As we saw, the question of determining the relevance of a feature is closely related to the question of determining its context. These questions have been studied for classification tasks (Aha and Goldstone, 1990; Turney, 1996), but this work cannot be applied directly to planning tasks because these tasks involve two new elements: the same problem may have several solutions, and the domain theory describes the world only partially. Veloso (1994) answers the first question by taking into account the particular solution. In this work we provide a framework to answer the second question by using the domain theory.

7 Conclusion

We have seen that the domain theory is an important factor determining how goals may affect the success of a retrieval episode. We have also seen that there is a large collection of domains for which the context can be simplified, in that additional goals not occurring in the cases do not affect the success of a retrieval episode.

Acknowledgements

The authors want to thank David W. Aha for the helpful discussions and comments on earlier versions of this paper, as well as the reviewers.

References

Aha, D. W., and Goldstone, R. L. (1990). Learning attribute relevance in context in instance-based learning algorithms. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, 141-148. Cambridge, MA: Lawrence Erlbaum.
Barrett, A., and Weld, D. (1994). Partial-order planning: Evaluating possible efficiency gains. Artificial Intelligence, 67(1), 71-112.
Ihrig, L., and Kambhampati, S. (1994). Derivational replay for partial-order planning. In Proceedings of AAAI-94, 116-125.
Ihrig, L., and Kambhampati, S. (1996). Design and implementation of a replay framework based on a partial order planner. In Weld, D. (Ed.), Proceedings of AAAI-96. IOS Press.
Kambhampati, S., Ihrig, L., and Srivastava, B. (1996). A candidate set based analysis of subgoal interactions in conjunctive goal planning. In Proceedings of the 3rd International Conference on AI Planning Systems (AIPS-96), 125-133.
McAllester, D., and Rosenblitt, D. (1991). Systematic nonlinear planning. In Proceedings of AAAI-91, 634-639.
Muñoz-Avila, H., and Hullen, J. (1996). Feature weighting by explaining case-based planning episodes. In Third European Workshop (EWCBR-96), Lecture Notes in Artificial Intelligence 1168. Springer.


Ricci, F., and Avesani, P. (1995). Learning a local similarity metric for case-based reasoning. In Case-Based Reasoning Research and Development: Proceedings of the 1st International Conference (ICCBR-95), Sesimbra, Portugal. Springer Verlag.
Turney, P. D. (1996). The identification of context-sensitive features: A formal definition of context for concept learning. In Proceedings of the ECML-96 Workshop on Learning in Context-Sensitive Domains.
Veloso, M. (1994). Planning and Learning by Analogical Reasoning. Lecture Notes in Artificial Intelligence 886. Springer Verlag.


A Theory of the Acquisition of Episodic Memory

Carlos Ramirez and Roger Cooley
University of Kent at Canterbury, Computing Laboratory, Canterbury, Kent CT2 7NF, UK
fcr10, [email protected]

Abstract. Case-based reasoning (CBR) has been viewed by many as just a methodology for building systems, but the foundations of CBR are psychological theories. Dynamic Memory (Schank, 1982) was the first attempt to describe a theory for learning in computers and people, based on particular forms of data structures and processes that nowadays are widely used in a variety of forms in CBR. In addition to being useful for system building, CBR provides a way of discussing a range of issues concerned with cognition. This focus on the practical uses of CBR has deflected attention from the need to develop the underlying theory further. In particular, the issue of knowledge acquisition is not adequately handled by the existing theory. This paper discusses this theoretical weakness and then proposes an enhanced model of learning which is compatible with the CBR paradigm.

1 Introduction

In recent years, CBR has been gaining ground in the machine learning arena. Unfortunately, the interest has been mostly concentrated on categorisation tasks; several very successful CBR programs have been developed to date under this line of research. One example is PROTOS (Bareiss, 1989), an exemplar-based learner (see Gentner, 1989; Redmond, 1989) based on psychological theories of concept learning and classification (see Medin and Smith, 1984; Van Mechelen et al., 1993; Rosch, 1978). However, there are additional possibilities for learning within CBR (Schank et al., 1986; Burstein, 1986), and there are many avenues for research.

2 Acquisition of Events

One issue of CBR that should receive more attention concerns the acquisition of knowledge, because little is actually known about it. There are only a few known theories on the acquisition of knowledge, and none of them is completely satisfactory (see below). Variations of schemata-like structures are widely used to represent acquired knowledge (e.g., plans by Abelson, 1973; scripts by Schank, 1975; frames by Minsky, 1975; schemata by Bobrow, 1975, and Rumelhart, 1980). Some theorists, working in cognitive and computer sciences, assume that some form of induction is used, but do not provide satisfactory accounts of it (either because the accounts are incomplete or too loose, or because more details

need to be specified and proven). For example, Holyoak's (1985) and Keane's (1988) accounts of schemata acquisition are limited to the creation of schemata by analogy; many other theorists remain suspiciously silent. Rumelhart (1980) elaborated one of the most complete accounts of schema acquisition (see also Rumelhart and Norman, 1981). Rumelhart and Norman's account is based on three basic forms of schemata acquisition, which can be described as follows:

1. Accretion is the accumulation of `memory traces' or `traces of the comprehension process', upon having perceived some event or understood some text, into the repertoire of knowledge.
2. Tuning involves the elaboration and refinement of concepts in a schema through continued experience. There are three kinds of tuning: (a) systematic adjustment of variable constraints and default values, (b) concept generalisation, and (c) concept specialisation.
3. Restructuring involves the creation of new schemata either by induction (through the repetition of a spatio-temporal configuration of schemata) or by analogy (mapping some aspects of an existing schema onto a novel situation, noticing differences and changing some of its attributes). This form involves the actual development of new concepts.

These accounts of schemata learning work well once a schema is discretionarily determined (already existent in memory), or elements or aspects of a schema are identified; but what is involved in learning new schemata from scratch, i.e., when no similar schema is already in memory? The only form of learning that can deal with such a condition is schema induction. However, current schemata theories (such as those mentioned above) have problems dealing properly with induction because they make no provision for recognising recurrent configurations for which a schema does not already exist in the system. Is this the reason why most theorists prefer to remain silent on this issue?
Practitioners tend to implement some form of ad hoc or ill-defined method of induction when trying to tackle this situation, which sometimes works; but obviously those are specific solutions to very limited domains. For example, a typical approach is to support the `inductive' mechanism of a given system with some form of background knowledge or with training examples.

Dynamic memory theory (Schank, 1982) is the starting point of case-based reasoning, and is the foundation of this paradigm of cognition. In this theory, scripts are one of the main structures used to explain the organisation of episodic memory. However, other knowledge structures are also proposed, including scenes, MOPs, Meta-MOPs, and TOPs. The acquisition of knowledge is a more intricate process than is allowed for in plain schema theories. Dynamic memory is an elaborate theory, intertwining several cognitive processes. Such a theory inevitably leaves room for interpretation; more work is needed to articulate this paradigm. Furthermore, some theorists consider that dynamic memory theory is incomplete and underspecified (Eysenck and Keane, 1995). This paper tackles some of the problems encountered with dynamic memory as a learning theory.

3 Dynamic Memory Weaknesses

There are some problems concerning conceptual aspects of dynamic memory theory. Firstly, Schank argues that during the act of trying to understand an experience (event), we are inevitably reminded of similar events, because in order to recognise the closest previous experience, we have to retrieve related memory structures, sometimes closely related to the event at hand, sometimes only related by context. However, there is evidence (Seifert et al., 1986; Seifert and Hammond, 1989) that people do not always remember and utilise prior experiences that are only abstractly related to the current situation in such a "simple memory model of episode retrieval". People frequently fail to recall specific memories at relevant times, and further, people commonly fail to be reminded of the closest or the most useful event when it is needed to solve a problem, especially novices. It seems that the determining factor in effective retrieval of events is the quality of the original encoding, and that, additionally, a great deal of inference is required to fully understand an experience containing abstract relations, which is required to improve the encoding. These findings are in line with previous predictions by Craik and Lockhart (1972) and Hyde and Jenkins (1973) (both studies in Eysenck and Keane, 1995), who proposed that:

1. The level or depth of processing of a stimulus has substantial effects on its memorability.
2. Deeper levels of analysis produce more elaborate, longer lasting, and stronger memory traces than do shallow levels of analysis.

Craik and Tulving's (1975) experiments suggest that elaboration of processing of some kind and the amount of elaboration are also important factors in determining long-term memory. Another criticism of dynamic memory theory stems from Schank's proposition of "automatic reminding". Schank argues that during the act of processing an event, we are inevitably reminded of similar events.
However, experimentation (Seifert et al., 1985) reported that when subjects experience an event, they are not usually reminded of close events automatically. According to Seifert et al.'s experiments, it seems that intentionality in recalling is a required ingredient in the process of bringing up analogs from memory. Intentionality depends on subjects' strategies and task difficulty.

4 An Enhanced Learning Model after Dynamic Memory

Due to limitations of space, it will be assumed that the reader has an understanding of the memory structures mentioned above (scripts, scenes, MOPs, Meta-MOPs, and TOPs; for details see Ramirez, 1997a). The theory presented below is based on dynamic memory theory (DMT); several modifications and enhancements have been carried out. The resultant theory is presented as follows:

(a) Recognition and recall. When an event is experienced, we try to recognise similar situations we have experienced in the past by noticing some similarities with the current event. Recognition is a stage not considered in DMT at all, although many cognitive scientists make a distinction here (see Watkins and Gardiner, 1979, for a review; see also Tulving, 1982, 1983). The point is that memories are encoded together with a context, and this context is relevant to the recognition process. After recognition comes the recall of the best match, as proposed in DMT; however, it is important to notice that the effectiveness of the recall process depends on individual strategies (e.g., which features of the event are observed: salient or distinctive features usually make the best indices or 'memory traces'; the kinds of associations among elements of the events; etc.), and on the form of retention that was used for the encoding and storing (depth, elaboration, and distinctiveness). These two, "individual strategies" and "form of retention", are factors that Schank overlooked. In more practical terms, if it is assumed that similar events are stored in a specific 'neighbourhood', then recognition means the localisation of that neighbourhood, taking advantage of common context. Recall is the process of retrieving the closest event.
(b) Recollection and reminding. During the process of recalling, we might be reminded of particular experiences, as pointed out in DMT, because the structures we use to process the new experience are the same structures we use to organise memories. However, Schank assumed that we are inevitably reminded of similar events, whereas the evidence discussed above shows that reminding depends on intentionality, which is concerned with analogical strategies. What is remarkable here is that the memory structures for storage and the processing structures for the analysis of inputs are the same ones.
Therefore, it is not surprising that we may be "reminded" of similar events when processing a new one.
(c) Reconstructing and understanding. Several cognitive processes are deeply intertwined in this theory: recalling and understanding are actually part of the same process. Understanding an event (being able to process an event according to an expected outcome) begins when we start trying to recall previous memory events similar to the one at hand. Finding the 'right' one (i.e., the closest) means getting closer to the understanding of the experience. Schank makes an attractive remark that fits well at this point (1982, p. 110): "A great deal of our ability to be creative and novel in our understanding is due to our ability to see connections between events and to draw parallels between events". This process of 'mapping events' is particularly interesting when it is done at the highest level (i.e., drawing analogies among TOPs or MOPs), because analogies can be made between domains or can simply be more significant within the same domain.

Therefore, we try to understand the current experience by reconstructing the recalled similar experience, or at least a significant part of it (the part that allows the understanding of the current part of the current event), by accessing the corresponding memory structures that organised it (i.e., scripts, scenes, or MOPs). An event is composed of sequences of situations (separable elements of the event) organised by those structures. If a close enough event does not exist in memory, then we may have to resort to analogous events from foreign domains, by accessing TOPs, a process that involves additional mechanisms. The above explanation of the understanding process may differ considerably from Schank's, since his presentation is at a higher level and does not explain well how recalled events are processed.
(d) Expectations. Once an event has been reconstructed, expectations about subsequent situations of the recalled event are automatically brought up. Therefore, it is possible to bring up situations, from the old event, that are likely to occur in the current one. This can be used to predict situations as the current event progresses, that is, situations that have not yet taken place but that are part of the old event.
(e) Expectation evaluation. As the event progresses, some of the predicted situations may not take place (since the current event may differ from the recalled one); however, if most of the recalled expectations match the situations of the event at hand, then no modifications (or minor ones) to memory structures are carried out; clearly, very little or nothing is then learnt from the experience.
(f) Explanation of expectation failures. If some of the expectations (of situations) are not satisfied because the current experience differs too much from the old one, then the possibility of explaining those failures brings new opportunities for learning by modifying memory structures or creating new ones. Memory is then organised in terms of explanations that are created to help to understand the differences between what is experienced and what is expected. In DMT, Schank does not explain what it means for an experience to differ by "too much" from one stored in memory. What is proposed here is that those differences are concerned with aspects of beliefs, and hence with confidence factors. The confidence factors are attached to the attributes of the schemata related to the events, and they are evaluated through the application of a similarity function. The confidence factors are modified on the basis of the frequency of use and the effectiveness of the associated attributes: the higher the usage (provided that the outcome was positive), the higher the confidence of the attribute.
(g) Learning. Having had an expectation failure, and having explained it, the individual is then in a position to modify his memory structures in some way. But in what way? The question is not an easy one: it is not clear why every individual encodes his or her experiences differently. Schank again does not discuss this point. Here, it is suggested

again that the beliefs of an individual are the grounds for his or her memory organisation, because memory modifications are determined by the explanations that an individual may provide as a response to encountering expectation failures during the processing of any event. As beliefs are non-consensual and have associated confidences, the most obvious implication is that every individual, although exposed to the same experience, will encode it differently. Thus, learning occurs when memory structures are modified, either by adding new structures when new experiences are encountered, or by changing old ones when similar experiences are met. It is now possible to see how several cognitive processes are deeply intertwined: recalling, understanding, learning, and explaining cannot be separated from each other because they are part of the same process. See Figure 1 for an illustration of the process. Schank makes a final interesting remark on understanding: "we understand in terms of the structures that we have available, and those structures reflect how we have understood things in the past. Then, we see things in terms of what we have already experienced". This remark is an evident conclusion after points (c) and (g). To conclude this paper, it should be mentioned that work is being carried out on the implementation of a CBR system applied to information retrieval. This is, in part, an evaluation of the theory proposed here (see Ramirez, 1997b, for details of the system). In particular, the elements of the system most strongly influenced by the theory are: recognition, recollection and reminding, and learning.
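The cycle sketched in points (a) through (g) can be illustrated very schematically in code. The following toy is purely illustrative and is not the paper's CBR system: events are hypothetical lists of "situations", and the similarity function, threshold, and return labels are assumptions of the sketch, not the author's design.

```python
def process_event(event, memory, similarity, threshold=0.5):
    """Toy sketch of the cycle: recognise/recall the closest stored event,
    generate expectations from it, evaluate them, and modify memory on
    expectation failure. All names and values are illustrative assumptions."""
    # Recognition and recall: find the closest stored event, if any.
    scored = [(similarity(event, old), old) for old in memory]
    best_score, best = max(scored, default=(0.0, None))
    if best is None or best_score < threshold:
        memory.append(list(event))      # nothing close enough: new structure
        return "new structure"
    # Expectations: situations of the recalled event predict the current one.
    failures = [s for s in best if s not in event]
    if not failures:                    # expectations met: little is learnt
        return "no modification"
    # Explanation of expectation failures drives modification of structures.
    memory.append(list(event))
    return "modified structures"
```

With a simple overlap (Jaccard) similarity, an event matching a stored one returns "no modification", a partially matching event triggers structure modification, and an unrelated event is stored as a new structure.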

Acknowledgements

This work was supported by the University of Technology of Monterrey, Campus Queretaro (ITESM), and the National Council of Science and Technology of Mexico (CONACYT).

References

Abelson, R.P. (1973). Concepts for representing mundane reality in plans. In D. Bobrow and A. Collins (Eds.), Representation and Understanding: Studies in Cognitive Science. Academic Press.
Bareiss, R. (1989). Exemplar-Based Knowledge Acquisition: A Unified Approach to Concept Representation, Classification, and Learning. Academic Press.
Bobrow, D.G. (1975). Some principles of memory schemata. In D. Bobrow and A. Collins (Eds.), Representation and Understanding: Studies in Cognitive Science. Academic Press.
Burstein, M.H. (1986). Concept formation by incremental analogical reasoning and debugging. In Machine Learning: An Artificial Intelligence Approach, Vol. II. San Mateo, CA: Morgan Kaufmann.
Craik, F.I.M. and Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268-294.
Eysenck, M.W. and Keane, M.T. (1995). Cognitive Psychology. Hove, UK: Lawrence Erlbaum.


[Figure 1 (flowchart). Inputs (events), together with intentionality and strategies, feed the recognition of events, recollection and reminding, and the selection of the closest event. The understanding process covers the reconstruction of the closest event, the generation of expectations, and the predictive processing of the current event. Expectation evaluation leads, on success, to no or minor modification of memory structures and, on failure, to explanations and to the modification of memory structures (learning).]

Fig. 1. A Conceptual Model of Episodic Memory Learning

Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou and A. Ortony (Eds.), Similarity and Analogical Reasoning. Cambridge: Cambridge University Press.
Kolodner, J.L. (1993). Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann.
Medin, D.L. and Smith, E.E. (1984). Concepts and concept formation. Annual Review of Psychology, 35, 113-138.
Minsky, M. (1975). A framework for representing knowledge. In P. Winston (Ed.), The Psychology of Computer Vision. New York: McGraw-Hill.


Ramirez, C. (1997a). Schemata and Dynamic Memory Structures. Technical Report No. 7-97, Computing Laboratory, University of Kent at Canterbury, UK.
Ramirez, C. (1997b). Enhancing Information Retrieval with Case-Based Reasoning. Submitted to the Third UK Case-Based Reasoning Workshop, Manchester, UK.
Redmond, M.A. (1989). Learning from others' experience: Creating cases from examples. In K. Hammond (Ed.), Proceedings: Case-Based Reasoning Workshop / DARPA, Pensacola Beach, Florida. San Mateo, CA: Morgan Kaufmann.
Rosch, E. (1978). Principles of categorisation. In E. Rosch and B.B. Lloyd (Eds.), Cognition and Categorisation. Hillsdale, NJ: Lawrence Erlbaum.
Rumelhart, D.E. (1980). Schemata: The basic building blocks of cognition. In R. Spiro, B. Bruce, and B. Brewer (Eds.), Theoretical Issues in Reading Comprehension. Hillsdale, NJ: Lawrence Erlbaum.
Rumelhart, D.E. and Norman, D.A. (1981). Analogical processes in learning. In J.R. Anderson (Ed.), Cognitive Skills and their Acquisition. Hillsdale, NJ: Lawrence Erlbaum.
Seifert, C.M., McKoon, G., Abelson, R.P., and Ratcliff, R. (1986). Memory connections between thematically similar episodes. Journal of Experimental Psychology: Learning, Memory and Cognition, 12, 220-231.
Seifert, C.M. and Hammond, K. (1989). In Proceedings of the Case-Based Reasoning Workshop. San Mateo, CA: Morgan Kaufmann.
Schank, R.C. (1975). The structure of episodes in memory. In D. Bobrow and A. Collins (Eds.), Representation and Understanding: Studies in Cognitive Science. New York: Academic Press.
Schank, R.C. (1982). Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge: Cambridge University Press.
Schank, R.C. and Abelson, R. (1977). Scripts, Plans, Goals and Understanding: An Enquiry into Human Knowledge Structures. Hillsdale, NJ: Lawrence Erlbaum.
Schank, R.C., Collins, G., and Hunter, L. (1986). Transcending inductive category formation in learning. The Behavioral and Brain Sciences, 9, 639-686.
Tulving, E. (1982). Synergistic ecphory in recall and recognition. Canadian Journal of Psychology, 36, 130-147.
Tulving, E. (1983). Elements of Episodic Memory. Oxford: Oxford University Press.
Van Mechelen, I., Michalski, R.S., Hampton, J.A., and Theuns, P. (1993). Concepts and Categories. London: Academic Press.
Watkins, M.J. and Gardiner, J.M. (1979). An appreciation of generate-recognise theory of recall. Journal of Verbal Learning and Verbal Behavior, 18, 687-704.


A Similarity Measure for Aggregation Taxonomies Jerzy Surma Department of Computer Science, University of Economics, ul.Komandorska 118/120, 53-345 Wroclaw, Poland, [email protected]

Abstract. Data mining and/or case-based retrieval in object-oriented databases seems to be one of the crucial techniques in real-life engineering applications. The goal of this paper is to present a similarity measure for aggregation taxonomies. The computational complexity of the proposed measure is linear due to a domain-oriented matching heuristic. A detailed description of the similarity measure and an evaluation on a real design problem are given. The experimental results and the expert evaluation show the usefulness of this approach. Finally, the constraints and the importance of knowledge acquisition in this approach are discussed.

1 Introduction

Object-oriented modeling is at present one of the standard ways of representing real-world concepts. The influence of this approach on programming languages, database technology, and knowledge systems is crucial. From this point of view, the shortage of techniques for intelligent object-oriented data analysis is significant. The aim of this paper is to contribute to this research field by introducing a similarity measure for aggregations, which are one of the standard relationships in the object-oriented approach. Aggregation, as a way of modeling complex assemblies, is popular in engineering applications, e.g., CAD. The proposed measure can be used in data mining and/or case-based retrieval in object-oriented databases. At present there is considerable growing interest in the use of structural representations, including research in analogical reasoning (Holyoak, 1989; Falkenhainer, 1989), conceptual graphs (Myaeng, 1992; Maher, 1993), and object-oriented representation (Bisson, 1995). In case-based reasoning the FABEL project investigates several approaches for defining the similarity of structures (Voss et al., 1994). In contrast to most of these approaches, the proposed similarity measure is computationally acceptable due to a domain-oriented matching heuristic. In Section 2 we define the aggregation taxonomy. Then in Section 3 the similarity measure for aggregation taxonomies is introduced. In Section 4 the empirical evaluation is presented. The paper concludes with a discussion concerning the importance of knowledge acquisition in this approach.

2 Aggregation

Aggregation is the kind of relationship (aPartOf) in which objects representing the components of something are associated with an object representing the entire assembly (Rumbaugh, 1991). The most important properties of aggregation are transitivity (if A is part of B and B is part of C, then A is part of C) and antisymmetry (if A is part of B, then B is not part of A). By definition, an aggregation relationship relates an assembly to one component instance: aPartOf(component instance, assembly instance). An assembly with many kinds of components (corresponding to many aggregation relationships) will be called an aggregation taxonomy, and will be denoted by Δ. For example, in Figure 1 the aggregation taxonomy Assembly-1 consists of the components i1, i2, and i3.

[Figure 1 (class diagram). The classes Component-A and Component-B are subclasses (isA links) of Component, and both are parts (aPartOf links) of the class Assembly. The instances i1 and i2 of Component-A and i3 of Component-B are parts of the instance Assembly-1; the instances r1 and r2 of Component-A are parts of the instance Assembly-2. The legend distinguishes classes, instances, isA links, and aPartOf links.]

Fig. 1. The aggregation taxonomies example
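The transitivity of aPartOf can be illustrated with a small sketch. The representation is a hypothetical one chosen for the example (a dictionary mapping each assembly to its direct components), not a structure from the paper.

```python
def a_part_of(component, assembly, parts):
    """True if component is a direct or transitive part of assembly.
    parts: dict mapping an assembly to the list of its direct components
    (an illustrative representation; antisymmetry is assumed to hold)."""
    direct = parts.get(assembly, [])
    return component in direct or any(
        a_part_of(component, sub, parts) for sub in direct)
```

For example, with A a part of B and B a part of C, the function reports that A is (transitively) a part of C, while C is not a part of A.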

3 Similarity Measure

The task is to establish a similarity measure between two aggregation taxonomies Δ_I and Δ_R. In order to fulfil this task, some additional information concerning the component class objects is defined: a root slot, which holds the class name that refers to the given object's type (e.g., in Figure 1, root(i1) = root(i2) = Component-A), and an index slot, which holds a reference to a matching object.

3.1 Set Interpretation

The aggregation taxonomy might be interpreted as a set of components. Let X be the set of all instances in the component taxonomy. Then for Δ_I and Δ_R the following sets are defined:

Γ_I = {x_i ∈ X | aPartOf(x_i, Δ_I)}    (1)
Γ_R = {x_r ∈ X | aPartOf(x_r, Δ_R)}    (2)

Based on this interpretation, we can apply the classical similarity measure between the sets Γ_I and Γ_R:

SIM(Γ_I, Γ_R) = Card(Γ_I ∩ Γ_R) / Card(Γ_I ∪ Γ_R)    (3)

The main computational challenge with this formula is in calculating the intersection Γ_I ∩ Γ_R. Formally this intersection can be established only if the match between components is one-to-one and exact. In reality it is possible to have several inexact matches. In the next subsection this problem will be taken into account by introducing a modified intersection.

3.2 Final Formula

In this section the extension of the proposed measure is introduced. The number of all possible matches can be narrowed to components of the same type (this information is stored in the root slot). For instance, in comparing computer systems, a monitor will not be compared with a keyboard. Based on this constraint we still cannot avoid the problem of many matches (e.g., comparing computers with several hard disks). This problem might be solved by using an internal component description and/or domain knowledge. In this paper, for obtaining the one-to-one match, we adopt the following heuristic: the one-to-one matching pair of components (x_i, x_r) is the pair of the most similar components. When this match is achieved the selected pair is excluded from further consideration, and the heuristic is performed on the remaining components. For each selected pair the same distinctive value is put in the index slot. The similarity between components is computed based on their internal (attributional) description. This heuristic creates the set of matching pairs. This set will be called a modified intersection Ω:

Ω = {(x_i ∈ Γ_I, x_r ∈ Γ_R) | root(x_i) = root(x_r) ∧ index(x_i) = index(x_r)}    (4)

It should be emphasized that the match between the elements of a given pair might not be exact. The proposed approach for generating the set Ω should be used very carefully. In the experiment presented in the next section this heuristic was not sufficient for a comparison of the chemical flowsheets. In this particular domain additional background knowledge was necessary to avoid semantically incorrect matches (Surma, 1996a). Based on the set Ω and an internal similarity between components, the aggregation similarity between the aggregation taxonomies Δ_I and Δ_R is:

SIM^A(Δ_I, Δ_R) = Σ_{(x_i, x_r) ∈ Ω} sim(x_i, x_r) / (Card(Γ_I) + Card(Γ_R) − Card(Ω))    (5)

where sim(x_i, x_r) ∈ [0, 1] is the similarity between x_i and x_r computed on the slot (attribute) level. This formula is equivalent to measure (3) if, for all (x_i, x_r) ∈ Ω, sim(x_i, x_r) = 1 (exact match), and it has the following properties: SIM^A(Δ_I, Δ_R) = SIM^A(Δ_R, Δ_I); SIM^A(Δ_I, Δ_R) = 1 if Δ_I is identical to Δ_R; SIM^A(Δ_I, Δ_R) = 0 if Ω is an empty set. Thanks to the normalization, this measure can be used recursively for nested aggregation taxonomies. A weighted and antisymmetric version of this measure is defined in the REPRO system (Surma, 1996a,b), which was especially designed for chemical flowsheet retrieval. The complexity of this approach is linear, O(NM), where N and M are the numbers of components in the first and the second aggregation respectively. The following example shows practically how to calculate the similarity between the aggregation taxonomies from Figure 1. Δ_I (Assembly-1): Γ_I = {i1, i2, i3}; Δ_R (Assembly-2): Γ_R = {r1, r2}; root(i1) = root(i2) = root(r1) = root(r2), root(i3) is different. Let us assume that the following similarities were computed: sim(i1, r1) = 0.7, sim(i1, r2) = 0.8, sim(i2, r1) = 0.6, sim(i2, r2) = 0.7; consequently Ω = {(i1, r2), (i2, r1)}. Finally, SIM^A(Δ_I, Δ_R) = (0.8 + 0.6)/(2 + 3 − 2) = 0.47.
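The greedy matching heuristic and formula (5) can be sketched as follows. This is a minimal reading of the description in this section; the function and argument names are the sketch's own, and the component-level similarity sim is supplied by the caller.

```python
def aggregation_similarity(gamma_i, gamma_r, root, sim):
    """Greedy one-to-one matching of same-type components, then formula (5).
    gamma_i, gamma_r: component identifiers of the two taxonomies;
    root(x): the component's type (root slot);
    sim(x, y): internal similarity of two components, in [0, 1]."""
    # Candidate pairs are restricted to components of the same type.
    pairs = [(sim(x, y), x, y) for x in gamma_i for y in gamma_r
             if root(x) == root(y)]
    pairs.sort(reverse=True)            # most similar pairs first
    matched, used_i, used_r = [], set(), set()
    for s, x, y in pairs:               # greedily take the best remaining pair
        if x not in used_i and y not in used_r:
            matched.append(s)
            used_i.add(x)
            used_r.add(y)
    omega = len(matched)                # Card(Omega)
    denom = len(gamma_i) + len(gamma_r) - omega
    return sum(matched) / denom if denom else 1.0
```

On the example above this yields the matches (i1, r2) and (i2, r1) and a similarity of (0.8 + 0.6)/3 ≈ 0.47, as in the paper's calculation.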

4 Experiment

The introduced structural similarity measure has been implemented in the REPRO system (Surma, 1996c). The main task of REPRO is the case-based retrieval of flowsheets representing chemical processes. The system has been implemented on a SUN 10 Sparc workstation with the G2 expert-system development environment. The measure was tested on flowsheets for the "hydrogenation C3" (13 cases) and "hydrogenation C6-C8" (14 cases) processes. Each flowsheet can be interpreted as an assembly of pieces of chemical equipment, e.g., reactors, pumps, etc. In the real flowsheets the components are connected by pipes, but this topological dimension was not taken into account in this experiment. The average numbers of components in the "hydrogenation C3" and "hydrogenation C6-C8" flowsheets are 8 and 34 respectively. All the available flowsheets were specially modified by experts in order to obtain semantically correct matches. Based on these data, REPRO was tested by means of a "leave-one-out" method. For the "hydrogenation C3" process the system retrieved the proper cases without error. The accuracy for "hydrogenation C6-C8" was 0.62 (for the most similar case). This result is quite good if we realize that for complex flowsheets like "hydrogenation C6-C8" the topology of connections between components is important too.
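The "leave-one-out" protocol used here can be sketched generically: each case in turn is used as a query against all remaining cases, and a retrieval counts as correct when the most similar remaining case carries the same label. The cases, labels, and similarity function below are placeholders for illustration, not REPRO's actual data structures.

```python
def leave_one_out_accuracy(cases, labels, simfn):
    """Fraction of cases whose most similar other case has the same label."""
    hits = 0
    for i, probe in enumerate(cases):
        # Score the probe against every remaining case.
        rest = [(simfn(probe, c), labels[j])
                for j, c in enumerate(cases) if j != i]
        hits += max(rest)[1] == labels[i]   # label of the most similar case
    return hits / len(cases)
```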

5 Final Remarks

As was mentioned, this approach is computationally tractable. The importance of domain knowledge in this approach should be clearly underlined. The lack of exponential complexity and the correctness of the matches have their roots in the knowledge acquisition process: firstly, in establishing an internal similarity measure between components (each kind of component may require a specific similarity function); secondly, in that it might sometimes be necessary to establish a set of rules for controlling the semantics of the matches, as was done in REPRO.

Acknowledgements

We would like to thank Bertrand Braunschweig and Alan Charon from the Institut Français du Pétrole for many discussions on the work reported in this paper. Special thanks to the anonymous reviewers for their excellent comments.

References

Bisson, G. (1995). Why and how to define a similarity measure for object-based representation systems. In Towards Very Large Knowledge Bases. Amsterdam: IOS Press, 236-246.
Falkenhainer, B., Forbus, K., and Gentner, D. (1989). The structure-mapping engine: Algorithms and examples. Artificial Intelligence, 41(1), 1-64.
Holyoak, K. and Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13, 293-355.
Maher, P. (1993). A similarity measure for conceptual graphs. International Journal of Intelligent Systems, 8, 819-837.
Myaeng, S. and Lopez-Lopez, A. (1992). Conceptual graph matching: A flexible algorithm and experiments. Journal of Experimental and Theoretical Artificial Intelligence, 4, 107-126.
Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., and Lorensen, W. (1991). Object-Oriented Modeling and Design. Prentice-Hall International.
Surma, J. and Braunschweig, B. (1996a). Case-based retrieval in process engineering: Supporting design by reusing flowsheets. Engineering Applications of Artificial Intelligence, Special Issue: AI in Design Applications, 9(4).
Surma, J. and Braunschweig, B. (1996b). REPRO: Supporting flowsheet design by case-based retrieval. In I. Smith and B. Faltings (Eds.), Advances in Case-Based Reasoning: Proceedings of the Third European Workshop, EWCBR-96, Lausanne, Switzerland. Springer Verlag, 400-412.
Surma, J. (1996c). REPRO ver. 1.3: User Manual and Implementation. IFP report, July 1996.
Voss, A. et al. (1994). Similarity concepts and retrieval methods. FABEL Report No. 13, Gesellschaft für Mathematik und Datenverarbeitung mbH, Sankt Augustin.


Genetic algorithms for analogical mapping
Bjørnar Tessem
Department of Information Science, University of Bergen, 5020 Bergen, Norway. Email: Bjornar.Tessem@ifi.uib.no

Abstract. The mapping phase of analogical reasoning is constrained not only by semantic knowledge, but also by deeper relations between the entities of a case. This paper describes some experiments performed on the application of genetic algorithms to the mapping problem. We view the mapping problem as an optimization problem where the objective function must encompass both the structural and semantic constraints of analogical mapping. The experiments are done in the domain of software modeling, and the preliminary results are very promising.

1 Introduction

The process of analogical problem solving can be split into the four phases of retrieval, mapping, transfer, and learning. Such steps are described both in the analogy literature (Gentner, 1983; Kedar-Cabelli, 1988) and in case-based reasoning (Aamodt and Plaza, 1994). Much of the research on analogical reasoning has concentrated on the mapping phase. However, most approaches are symbolic (Falkenhainer et al., 1989; Owen, 1990), and do not consider techniques like genetic algorithms and neural networks in the mapping phase. An exception is Holyoak and Thagard's ACME program, which is a localist connectionist approach (Holyoak and Thagard, 1989). In the ROSA project (Tessem et al., 1994) we investigate the use of analogical reasoning to support the process of building object-oriented analysis models for software systems. The goal is to facilitate software component reuse by identifying reuse potential already detectable in the preliminaries of a software project. As a part of the project we approach the problem of mapping models in several ways, including the use of neural networks (Ellingsen and Tessem, 1997) and, as indicated in this paper, genetic algorithms (Goldberg, 1989). It would be interesting to see whether subsymbolic approaches could give results similar to symbolic methods on the analogical mapping problem, for which subsymbolic methods have previously not been considered suitable. In this paper we will show how the mapping of software analysis models transforms into a problem that can be attacked by genetic algorithms, and show some initial experimental results.

2 Genetic Algorithms and Analogical Mapping

Genetic algorithms are an optimization technique based on the application of evolutionary principles (Goldberg, 1989). The idea is to maintain a population of solutions to an optimization problem, and to make this population evolve through mating and mutation of solutions. An individual is built from a set of genes that each indicate some property related to the individual. The genes can be any type of data, but are often set to binary values. Furthermore, the individuals are often vectors of binary values, even though other data structures are also possible. The basic procedure is to run the evolution process for a number of generations, let the individuals mate according to their fitness with respect to some objective function that we are to optimize, and create offspring that then become part of a new population. The individual that fits the objective function best after all generations have been run is returned as the solution. The software models that we try to map are given in a modeling language called OOram (Reenskaug et al., 1995). Every model describes some activity in the software system, and indicates which objects or roles (as they are called in OOram) are communicating with each other in this particular activity. Figure 1 contains two role models.

[Figure 1 (role models). Model (a) connects the roles Traveler, Authorizer, Bookkeeper, and Paymaster. Model (b) connects the roles Authorizer, Authorizer Tool, Plan service, Account service, and Budget service.]

Fig. 1. Role models for activities related to authorizing travels in a business enterprise (picked from Reenskaug et al. (1995)).

The first model (a) shows the activity of authorizing a travel within a business enterprise. The traveler requests an authorizer, who makes a decision and returns a confirmation or a rejection to the traveler. If the travel is accepted, the authorizer sends a message to the bookkeeper, who in turn informs a paymaster to transfer money to the traveler's bank account. The second model

(b) shows the activity of the authorizer. He checks with the enterprise's plans, accounts, and budgets using an authorizer tool. (On the syntax of the models: the small circles attached to the roles indicate that there is message passing from a role to the connected role.) The graphs of OOram models are easily transformed into labeled directed graphs, as every role can be mapped to a node in a directed graph, and every communication path between two roles can be mapped to edges. If there is message passing from one role to another, there is a directed edge between the roles. The problem of finding analogies between graphs involves finding a one-to-approximately-one mapping between the nodes of two graphs which also preserves, to a large extent, the structure of the graphs. We do not require an exact one-to-one map, as several roles may match into the same role of an analogical model, which is consistent with Owen's (1990) discussion of the partial homomorphism constraint. In addition to structure, the mapping must take into consideration semantic similarity between the node labels. It is obvious that such a mapping can be represented by an n × m binary matrix, where n is the number of nodes in the target graph and m is the number of nodes in the base graph. A 1 (0) in an entry of the matrix indicates a map (non-map) between the nodes. Figure 2 shows two analogous graphs with their mapping represented as a binary matrix.

[Figure 2: two analogous directed graphs, one with nodes a1-a5 and one with nodes b1-b5, together with their mapping matrix:]

        b1  b2  b3  b4  b5
   a1    0   0   0   1   0
   a2    0   1   0   0   0
   a3    0   0   0   0   1
   a4    1   0   0   0   0
   a5    0   0   1   0   0

Fig. 2. Two analogous graphs and their mapping represented as a binary matrix.

As we now have a representation of the problem suitable for genetic algorithms, the next step is to identify an objective (fitness) function to be maximized. This is very much a trial-and-error process, but some aspects must be taken into consideration:
1. The semantic similarity of two nodes that map. We denote the similarity between node i of the target and node j of the base by sim(i, j) ∈ [0, 1]. In a real application of this approach this similarity is found by computing a semantic distance between the names of the roles, the names of the messages they send, and other relevant information connected to the roles, and aggregating this into an overall measure of similarity between every pair of roles. Roles with higher similarity values will thus tend to map.
2. We do not want many ones in the matrix. We denote the number of ones in the matrix by o. The ideal would be around max(n, m), and we will have to subtract some amount if o is far from this number.
3. We want approximately one 1 in each row and column to ensure the partial homomorphism constraint. We denote the number of ones in row i by o_{r_i} and the number of ones in column j by o_{c_j}. We reduce fitness if these are larger than 1.
4. We want structural similarity. That is, for each pair of mapped nodes, their neighbours should also map. We may call this property a local isomorphism. We denote the number of local isomorphisms by s. Figure 3 shows a local isomorphism for the mappings a1→b4 and a2→b2.

[Figure 3: the target edge from a1 to a2 corresponds to the base edge from b4 to b2 under the mappings a1→b4 and a2→b2.]

Fig. 3. A local isomorphism between parts of two graphs.

How these aspects should be combined is a matter of experimentation. One possibility is to use a linear combination. An objective function for the mapping represented by the matrix A could then be

obj(A) = α Σ_{i,j} A_{i,j} sim(i, j) − β |o − max(n, m)| − γ (Σ_i max(0, o_{r_i} − 1) + Σ_j max(0, o_{c_j} − 1)) + δ s

If we choose this form for our objective function, our problem is now to find:
1. Suitable values for the parameters α, β, γ, and δ.
2. A suitable genetic algorithm for the population of possible mappings, including survival rate, mutation rate, and mating strategy.
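The linear combination described in this section can be sketched with NumPy. This is an illustrative reading of the four aspects, not the ROSA implementation: A is the n × m binary mapping matrix, sim the node-similarity matrix, target_edges/base_edges are lists of directed edges as index pairs, and the default parameter values are the ones reported in the experiments below.

```python
import numpy as np

def objective(A, sim, target_edges, base_edges,
              alpha=1.0, beta=0.5, gamma=0.4, delta=0.2):
    """Fitness of a candidate mapping A (n x m binary matrix)."""
    n, m = A.shape
    o = A.sum()                                   # number of ones in the matrix
    fit = alpha * (A * sim).sum()                 # semantic similarity of mapped pairs
    fit -= beta * abs(o - max(n, m))              # keep o close to max(n, m)
    fit -= gamma * (np.maximum(A.sum(axis=1) - 1, 0).sum()     # row counts o_{r_i}
                    + np.maximum(A.sum(axis=0) - 1, 0).sum())  # column counts o_{c_j}
    # s: local isomorphisms -- a target edge (i, k) and a base edge (j, l)
    # whose endpoints map onto each other (A[i, j] = A[k, l] = 1).
    s = sum(int(A[i, j] and A[k, l])
            for (i, k) in target_edges for (j, l) in base_edges)
    return fit + delta * s
```

For a tiny 2-node example with an exact mapping and one corresponding edge pair, the score is the semantic term plus one rewarded local isomorphism, with no penalties.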

3 Experiments

In order to perform experiments with large amounts of data, we have in the ROSA project devised an algorithm that generates random graphs that are similar to software models. In addition we make copies of the graphs, which are changed slightly by adding and deleting edges. The graphs in Figure 2 are typical examples of such graphs. The modified copy represents an analogue of the original graph, but with renumbered nodes. The mapping that we have between the original and the copy is preserved even though we add and delete edges. This mapping is considered to be the ideal mapping for the graphs. Finally, we generate a matrix of random numbers in the interval [0, 1] which are used to represent semantic similarity between roles. Nodes that map in the ideal mapping are given similarity 1 with a random triangular number subtracted, whereas nodes that do not map are given similarity 0 plus some noise. This type of generated data represents realistic examples for experiments. The goal for the genetic algorithm is to establish the ideal mapping on the basis of the two graphs and the node similarities. A randomly generated similarity matrix for our example graphs is given in Figure 4.

        b1    b2    b3    b4    b5
   a1  .170  .472  .107  .984  .241
   a2  .636  .599  .309  .177  .893
   a3  .297  .453  .362  .070  .746
   a4  .775  .589  .314  .063  .128
   a5  .567  .322  .714  .724  .868

Fig. 4. Semantic similarity matrix for the graphs of Figure 2.

In the experiments so far we have been using a simple genetic algorithm where the whole population is replaced in each generation, except for the best individual. We have used a population of 60 individuals and run the algorithm for 800 generations. The mutation probability is 0.001 for each binary value, while the crossover probability for two selected individuals is 0.9. We use a roulette-wheel strategy for selecting individuals for mating, that is, they are chosen with a probability proportional to the fitness score of the individual. Mating is done with a single crossover. For the example graphs the genetic algorithm usually finds the ideal mapping. However, it occasionally mixes up terminal

nodes of the graphs, that is, nodes like a3 and a5. In the more elaborate runs we have used random graphs with about ten nodes, with three changes (edge deletions or additions) applied to each copy. The results so far indicate that with a proper choice of values for the parameters of the objective function, the algorithm will find about 75% of the correct mappings for random analogues with about ten nodes. The mappings are compared to the ideal mapping obtained in the process of generating the random analogues. The following values have been used for the parameters: α = 1.0, β = 0.5, γ = 0.4, δ = 0.2. The value of δ works fine when the two graphs have approximately ten nodes, but the number of ones gets too large when the number of nodes increases, and it gets too low when the number of nodes is below ten. This indicates that this parameter, and perhaps also the other parameters, should vary with the size of the graphs. More experiments should be performed to investigate the parameters of the objective function. And even though the parameters and strategies chosen for the genetic algorithm seem to be quite satisfactory, other representations and strategies should be tested.
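The GA configuration just described (generational replacement with elitism, roulette-wheel selection, single-point crossover, bitwise mutation) can be sketched as follows. This is a generic stand-in, not the GAlib-based code used in the experiments, and the default population size and generation count here are illustrative:

```python
import random

def evolve(fitness, n_bits, pop_size=60, generations=100,
           p_mut=0.001, p_cross=0.9):
    """Generational GA: roulette-wheel selection, single-point crossover,
    bitwise mutation, and elitism (the best individual always survives).
    `fitness` maps a bit list to a non-negative score."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        best = pop[scores.index(max(scores))]
        total = sum(scores) or 1.0

        def pick():  # roulette wheel: probability proportional to fitness
            r, acc = random.uniform(0, total), 0.0
            for ind, score in zip(pop, scores):
                acc += score
                if acc >= r:
                    return ind
            return pop[-1]

        nxt = [best[:]]  # elitism: carry the best individual over
        while len(nxt) < pop_size:
            a, b = pick()[:], pick()[:]
            if random.random() < p_cross:  # single crossover point
                cut = random.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                if len(nxt) < pop_size:
                    nxt.append([bit ^ (random.random() < p_mut)
                                for bit in child])
        pop = nxt
    scores = [fitness(ind) for ind in pop]
    return pop[scores.index(max(scores))]
```

For the mapping problem, the bit string would encode the n×m matrix A row by row and `fitness` would evaluate the objective function above.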

4 Discussion

We have described how one can obtain analogical mappings for software analysis models by using genetic algorithms. The models are transformed into labeled directed graphs and mapped by using a genetic algorithm to optimize a score on potential mappings. The experiments are promising and indicate a potential for the approach. The use of genetic algorithms on this problem has a computational complexity of at most O(n^4) (for very dense graphs) if we keep the number of individuals and generations constant. The complexity lies in the computation of the objective function for each individual. This contrasts with the structure mapping engine (SME) of Falkenhainer et al. (1989) and Holyoak and Thagard's (1989) ACME, which are both exponential. However, experiments indicate that the proportion of correct mappings decreases as the number of nodes grows. This suggests that the number of generations and individuals should be increased when the size of the graphs to be mapped increases. Still, it is likely that we could get an algorithm that has sub-exponential complexity. As for both SME and ACME, this method focuses on the mapping phase, and we assume that a retrieval process has been run that has chosen a good analogue. We also assume that all semantic information is already aggregated and available in matrix form as in Figure 4. It is reasonable to believe that the approach is transferable to other domains for analogical reasoning or case-based reasoning. In many problems it is possible to model both the entities and their relations by graph structures, and also to estimate the semantic similarity between the entities of cases. To be able to use

the approach in other settings, one has to establish more knowledge on how to build fitness functions for different situations.

Acknowledgements

The software for this work used the GAlib genetic algorithm package, written by Matthew Wall at the Massachusetts Institute of Technology.

References

Aamodt, A. and Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications 7(1): 39-59.
Ellingsen, B. and Tessem, B. (1997). A hybrid model for combining structural and semantic constraints on analogical mapping. Working paper.
Falkenhainer, B., Forbus, K. D. and Gentner, D. (1989). The structure-mapping engine: Algorithm and examples. Artificial Intelligence 41: 1-63.
Gentner, D. (1983). Structure mapping: A theoretical framework for analogy. Cognitive Science 7(2): 155-170.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
Holyoak, K. J. and Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science 13: 295-355.
Kedar-Cabelli, S. (1988). Analogy - from a unified perspective. In D. Helman (ed.), Analogical Reasoning, Kluwer Academic Publishers, pp. 65-103.
Owen, S. (1990). Analogy for Automated Reasoning. Academic Press, London.
Reenskaug, T., Wold, P. and Lehne, O. A. (1996). Working With Objects: The OOram Software Engineering Method. Manning Publications Co.
Tessem, B., Bjørnestad, S., Tornes, K. and Steine-Eriksen, G. (1994). ROSA = Reuse of Object-oriented Specifications through Analogy: A project framework. Technical Report No. 16, ISSN 0803-6489, Dept. of Information Science, University of Bergen.


Using Knowledge Containers to Model a Framework for Learning Adaptation Knowledge

Wolfgang Wilke, Ivo Vollrath, Ralph Bergmann
University of Kaiserslautern
Centre for Learning Systems and Applications (LSA)
Department of Computer Science
P.O. Box 3049, D-67653 Kaiserslautern, Germany
{wilke, vollrath, bergmann}@informatik.uni-kl.de

Abstract. In this paper we present a framework for learning adaptation knowledge with knowledge light approaches for case-based reasoning (CBR) systems. Knowledge light means that these approaches use knowledge that has already been acquired inside the CBR system. Therefore, we describe the sources of knowledge inside a CBR system along the different knowledge containers. After that we present our framework in terms of them. Furthermore, we apply our framework in a case study to one knowledge light approach for learning adaptation knowledge. Finally, in the discussion we point out some issues which should be addressed during the design or the use of such algorithms for learning adaptation knowledge. From our point of view, many of these issues should be the topic of further research.

1 Introduction

Until now there have been only a few investigations into learning adaptation knowledge. Some approaches for learning adaptation knowledge can be found in DIAL (Leake, 1993, 1995b,a) and also in CHEF (Hammond, 1986, 1989). These systems use knowledge intensive derivational analogy approaches (Carbonell, 1986; Veloso and Carbonell, 1993) to learn adaptation knowledge. Knowledge intensive means that these approaches require a lot of background and problem solving knowledge. For example, in DIAL and CHEF adaptation strategies for special problem fields are acquired based on general domain knowledge. So a reduction of the knowledge acquisition cost is not necessarily achieved, because the adaptation knowledge engineering effort is costly. In this paper we want to focus on what we call knowledge light approaches for learning adaptation knowledge [1]. Knowledge light means that these algorithms don't presume a lot of knowledge acquisition work before learning; they use previously acquired knowledge inside the system for learning adaptation knowledge. The benefit of these approaches is to overcome the knowledge acquisition bottleneck (Feigenbaum and McCorduck, 1983) which arises when the acquisition and explicit representation of general domain and problem solving knowledge is necessary to solve a problem. Some examples for these approaches are:

[1] In the following we always use the term learning adaptation knowledge for knowledge light approaches.

- The learning of parameters used during adaptation from previously acquired knowledge. An example for this is learning the best k for k-NN retrieval. The feature weight learning algorithms VSM (Lowe, 1995) and k-NN_vsm (Wettschereck and Aha, 1995) calculate the optimal k in a simple search over all possible values for k.
- CARMA (Hastings et al., 1995) learns featural adaptation weights from the case base with a hill-climbing algorithm. It is possible to learn global adaptation weights or local weights for each prototypical case. The weights are used during adaptation to determine the influence of each feature on the target value.
- A first, more complex work, where adaptation rules are derived from already acquired knowledge, was done by (Hanney, 1996; Hanney and Keane, 1996). They use an inductive learning algorithm to extract adaptation knowledge from (the cases in) the case base. We will have a closer look at this approach in our case study in Section 3.2.

In this paper we begin by categorizing the knowledge inside a CBR system with the knowledge containers first described by (Richter, 1995). Based upon this, we sketch a framework for learning adaptation knowledge from these containers. This could be seen as a starting point for the design of adaptation learning algorithms as well as an early starting point for a knowledge modeling methodology for adaptation learning approaches. Since we focus here on knowledge light approaches, the knowledge elicitation process is almost done and the effort mainly comes in the knowledge modeling task.
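The "simple search over all possible values for k" mentioned in the first example above can be sketched as a leave-one-out selection over the case base. This is an illustrative helper under our own assumed data layout, not the VSM or k-NN_vsm code itself:

```python
def best_k(cases, distance, k_values):
    """Pick k for k-NN retrieval by leave-one-out accuracy over the
    case base.  `cases` is a list of (vector, label) pairs and
    `distance` compares two vectors."""
    def predict(i, k):
        # sort the other cases by distance and take the k nearest
        neighbours = sorted((j for j in range(len(cases)) if j != i),
                            key=lambda j: distance(cases[i][0],
                                                   cases[j][0]))[:k]
        votes = [cases[j][1] for j in neighbours]
        return max(set(votes), key=votes.count)  # majority vote

    def loo_accuracy(k):
        return sum(predict(i, k) == cases[i][1]
                   for i in range(len(cases))) / len(cases)

    return max(k_values, key=loo_accuracy)
```

Each candidate k is scored by how often the k nearest other cases predict a held-out case's own label, and the best-scoring k is returned.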

2 Different Sources of Knowledge in a CBR System

Richter (Richter, 1995; Althoff et al., 1997) described four containers in which a CBR system can store knowledge. Knowledge means here domain knowledge as well as problem solving knowledge that describes the "method of application" of the domain knowledge inside the container. These four containers are:

1. the vocabulary used to describe the domain,
2. the case base,
3. the similarity measure used for retrieval, and
4. the solution transformation used during the adaptation.

In general, each container can hold all the available knowledge, but this is not advisable. The first three containers include compiled knowledge. By "compile time" we mean the development time before actual problem solving, and "compilation" is taken in a general sense including human knowledge engineering activities. The case base consists of case specific knowledge that is interpreted at run time, i.e. during the process of problem solving. For compiled knowledge, especially if manually compiled by human beings, the acquisition and maintenance task is as difficult as for knowledge-based systems in general. However, for

interpreted knowledge the acquisition and maintenance task is potentially easier because it requires updating only the case base [2]. Part of the attractiveness of CBR comes from the flexibility to pragmatically decide which container includes which knowledge and therefore to choose the degree of compilation versus case interpretation. When developing a CBR system, a general aim should be to manually compile as little knowledge as possible and as much as absolutely necessary. This results from the fact that the compilation of knowledge is costly during the development of a CBR system. More precisely, a CBR system developer has to decide how the knowledge is distributed to the different containers depending on the availability of knowledge and the engineering effort. If there is not enough knowledge available to fill one container as requested, there is a need for knowledge transformation from some containers to others. An example for such a transformation can be found in (Globig and Weß, 1993), where an improvement of the similarity measure is learned by knowledge transfer from the case base into the similarity container. If knowledge is transferred into the adaptation container, this is a knowledge light approach for learning adaptation knowledge, because the knowledge used is already available and coded in some of the other containers. We will focus here on learning adaptation knowledge by knowledge transfer from other containers into the adaptation container, or on improving the adaptation container itself with already acquired adaptation knowledge. We will provide a general framework that permits the comparison and evaluation of different approaches for solving this problem.
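As a toy illustration of such a knowledge transfer, the following sketch derives feature weights for the similarity container from the cases in the case base. The weighting heuristic (class-mean separation relative to overall spread) is our own illustrative choice; Globig and Weß use a version-space technique instead:

```python
import statistics

def learn_feature_weights(cases):
    """Transfer knowledge from the case-base container into the
    similarity container: weight each feature by how far apart its
    per-class means lie relative to its overall spread.  `cases` is a
    list of (feature_tuple, label) pairs; two-class illustration."""
    labels = sorted({label for _, label in cases})
    n_feat = len(cases[0][0])
    weights = []
    for f in range(n_feat):
        col = [x[f] for x, _ in cases]
        spread = statistics.pstdev(col) or 1.0   # avoid division by zero
        means = [statistics.mean(x[f] for x, c in cases if c == lab)
                 for lab in labels]
        weights.append(abs(means[0] - means[-1]) / spread)
    return weights
```

A discriminative feature receives a high weight; a feature that is identically distributed across classes receives weight zero, so the learned similarity measure ignores it.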

3 A Framework for Learning Adaptation Knowledge

In this section we describe our framework for learning adaptation knowledge in the light of the aforementioned knowledge containers. After that we present a case study which classifies an approach within our framework.

3.1 Learning Adaptation Knowledge from Knowledge Containers

Figure 1 is an abstract view of the process for learning adaptation knowledge with an inductive algorithm. The sources of knowledge are the previously described knowledge containers: the vocabulary, the similarity measure, the case base and the adaptation containers. This knowledge is transformed into adaptation knowledge using a learning algorithm. Here we focus on inductive algorithms because they generate general knowledge from examples. Often a CBR system consists of examples because the available model of the real world problem is incomplete. At first, there must be a selection of knowledge from the containers to learn from. Depending on the kind of selected knowledge and the inductive algorithm

[2] Adding new cases to the case base could also cause the need for re-engineering some of the compiled knowledge, but this should not happen very often while maintaining a CBR system.


[Figure: the knowledge containers (vocabulary, similarity measure, case base, adaptation) feed a preprocessor that produces examples for an inductive learning algorithm, which outputs improved adaptation knowledge.]

Fig. 1. How adaptation knowledge is learned from the knowledge containers.

used, this data has to be preprocessed into a suitable representation. The result is a set of examples. Every example is characterized by a set of attributes [3]. These attributes are derived from the knowledge in the containers. It is also necessary to integrate the output of the learning process with the old adaptation knowledge to form an adaptation container with the improved knowledge. Improved means here that the learned adaptation knowledge leads to a better overall accuracy of the CBR system. This can easily be evaluated by comparing the CBR system with the adaptation knowledge before and after learning.

3.2 A Case Study

Let us now apply our framework to the rule learning approach already mentioned in the introduction. This approach towards learning adaptation knowledge has been described by (Hanney and Keane, 1996). Their algorithm builds pairs of cases and uses the feature differences of these case pairs to build adaptation rules. We will briefly describe this algorithm in terms of our framework:

- The preprocessor builds pairs from all possible cases and extends them by noting the feature differences and the target difference of these cases. Information from the containers 'case base' and 'vocabulary' is needed. Hanney and Keane also suggest constraining the case pairs by limiting their number or by taking advantage of the similarity measure to select pairs suitable for learning. Here the preprocessor also needs knowledge from the similarity container.
- The example input for the learning algorithm is the set of case pairs computed by the preprocessor.
- In a first step, the learning algorithm builds adaptation rules by taking a case pair's feature differences as preconditions and the target difference as the conclusion. These rules are subsequently refined and generalized to extend the coverage of the rule base. For complexity reasons, these generalizations are only performed at adaptation time.

[3] Attributes are the descriptions of the learning examples for the inductive algorithm.


Hanney and Keane also describe how the learning of rules may be constrained and guided by explicitly known domain knowledge which has been manually compiled into the adaptation container before the automatic rule learning takes place. If this kind of information is to be exploited, the preprocessor must also use the adaptation container as a source of input.
- The dynamic part of the adaptation container consists of a set of adaptation rules along with some additional information such as confidence ratings for each rule. These confidence ratings indicate the reliability of a rule's information. They are calculated by the learning algorithm based on the degree of generalization that has been applied to generate the associated rule. The algorithm that controls the rule application and the strategy for resolving conflicts is the non-dynamic part of the adaptation container (see (Hanney, 1996; Hanney and Keane, 1996) for details). They also describe an approach for integrating the learned rules into already known rules. Here learning is the determination and the generalization of the rules and the improvement of the confidence ratings.
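The core of the pairing idea can be sketched as follows. This is a bare-bones illustration, not Hanney and Keane's implementation: the refinement, generalization and confidence-rating steps are omitted, and the data layout (cases as feature dictionaries with numeric targets) is our own assumption:

```python
from itertools import combinations

def learn_adaptation_rules(cases):
    """Pair up cases and turn each pair's feature differences into a
    rule precondition, with the target difference as the conclusion.
    `cases` is a list of (feature_dict, numeric_target) pairs."""
    rules = []
    for (fa, ta), (fb, tb) in combinations(cases, 2):
        diff = {k: fb[k] - fa[k] for k in fa if fb[k] != fa[k]}
        if diff:
            rules.append((diff, tb - ta))
    return rules

def adapt(rules, retrieved, query):
    """Apply the first rule whose precondition matches the feature
    differences between the query and the retrieved case; fall back
    to the retrieved target if no rule fires."""
    features, target = retrieved
    diff = {k: query[k] - features[k] for k in features
            if query[k] != features[k]}
    for precondition, delta in rules:
        if precondition == diff:
            return target + delta
    return target
```

In a real system the exact-match condition in `adapt` would be replaced by the refined and generalized rules, ranked by their confidence ratings.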

4 Further Directions and Discussion

The main point of this paper is to provide a starting point for a framework for learning adaptation knowledge with knowledge light approaches. We now point out some issues which should be addressed during the design or the use of an algorithm for learning adaptation knowledge. One should ask oneself some questions and be aware of some aspects that have significant influence on the applicability of the final algorithm and the quality of its results. We now categorize some of these issues according to the terms of our framework. From our point of view, many of these open questions should be the topic of further research. In terms of our framework, this could mean:

- General Issues for Designing a Learning Algorithm
  In general, the first question should be: Is a knowledge light approach promising for my learning task? This depends mainly on the learning goal and on the available knowledge. In particular, the knowledge in the containers must be sufficient to solve the learning task. The learning goal and the available knowledge constrain the possible inductive learning algorithms which might be useful for the learning task.

- Issues about the Preprocessing of the Knowledge
  After the availability of the knowledge is approved, a learning algorithm is selected and the learning goal is well defined, the designer has to focus on the selection and preprocessing of the information from the knowledge containers. Here arise many "how to" questions, such as: how to select the knowledge to learn from, measure the quality of knowledge for learning, construct the learning examples, handle noisy or unknown data, derive additional knowledge from containers for the learning algorithm, etc. If the different sources of knowledge are contradictory inside a knowledge container or between different containers, there might be the need for a conflict resolution strategy during the selection of the knowledge.

- Issues for Choosing a Learning Algorithm
  Concerning the selected learning algorithm there are also some interesting points to inquire into. It might be useful to know how many learning examples the algorithm needs and how to resolve conflicts between contradictory examples during learning. The designer also has to show that the learning algorithm is appropriate for the learning task and the given examples. There are many other issues concerning learning algorithms which are addressed in the research area of algorithmic learning theory; this is currently an active research topic. For example, this year's "International Conference on Algorithmic Learning Theory" will have a special track on learning in the area of CBR (ALT97, 1997). The designer also has to take care of the transformation or integration of the output from the learning algorithm into the already known adaptation knowledge.

- Issues concerning the Integration of the Learned Knowledge
  After the learning step, the learned knowledge has to be integrated into the already known adaptation knowledge. The designer has to show that this is possible and how to manage that task. The improved adaptation container should also be evaluated against the original adaptation knowledge. Possible criteria for the evaluation are: quality, correctness, or coverage of the old and the new knowledge. The problem might also occur here that the learned knowledge contradicts the old knowledge, so conflict resolution might also be a task during this process.

However, we believe that the discussed issues are mostly unanswered today, and we see here many tasks for further research. Finding answers to some of the addressed issues could lead to a methodology for the design, classification and comparison of knowledge light adaptation learning algorithms. The classification of adaptation learning approaches could lead to a more detailed view of the problem and could improve the framework. An extension of the framework to knowledge intensive approaches could also enrich it. However, complex adaptation knowledge is hard to acquire with such techniques. Nevertheless, there are a number of problems where knowledge light approaches might be useful, like classification tasks with symbolic targets or regression tasks. Also, the learning of parameters for more sophisticated adaptation solutions might be possible with such learning algorithms. Further, the design of an abstract learning algorithm with high flexibility on crucial points may help to identify and hopefully to solve some current problems. Later, the work on this topic could lead to a methodology for designing adaptation knowledge learning approaches. To find a broader methodology for the design and the maintenance of a CBR system is also one of the major points of the INRECA-II project.

Acknowledgments

The authors would like to thank Prof. Michael M. Richter for helpful discussions and suggestions. This work was funded by the Commission of the European Communities (ESPRIT contract P22196, the INRECA II project: Information and Knowledge Reengineering for Reasoning from Cases) with the partners: AcknoSoft (prime contractor, France), Daimler Benz (Germany), tecInno (Germany), Irish Medical Systems (Ireland) and the University of Kaiserslautern (Germany).

References

ALT97 (1997). International Conference on Algorithmic Learning Theory - Call for Papers. http://www.maruoka.ecei.tohoku.ac.jp/~alt97/cfp.html.
Althoff, K.-D., Richter, M. M. and Wilke, W. Case-Based Reasoning: A New Technology for Experienced Based Construction of Knowledge Systems. Forthcoming.
Carbonell, J. G. (1986). Derivational analogy: a theory of reconstructive problem solving and expertise acquisition. In Michalski, R., Carbonell, J., and Mitchell, T., editors, Machine Learning: An Artificial Intelligence Approach, volume 2, pages 371-392. Morgan Kaufmann, Los Altos, CA.
Feigenbaum, E. and McCorduck, P. (1983). The Fifth Generation. Addison Wesley.
Globig, C. and Weß, S. (1993). Case-based and symbolic classification algorithms - a case study using version space. In Richter, M. M., Weß, S., Althoff, K.-D., and Maurer, F., editors, Proceedings First European Workshop on Case-Based Reasoning, Lecture Notes in Artificial Intelligence, 837, pages 133-138. Springer Verlag.
Hammond, K. J. (1986). Learning to anticipate and avoid planning problems through the explanation of failures. In Proceedings of the 5th Annual National Conference on Artificial Intelligence AAAI-86, pages 556-560, Philadelphia, Pennsylvania, USA. Morgan Kaufmann Publishers.
Hammond, K. J. (1989). Case-Based Planning: Viewing Planning as a Memory Task. Academic Press, Boston, Massachusetts.
Hanney, K. (1996). Learning Adaptation Rules From Cases. Diploma thesis, Trinity College, Dublin.
Hanney, K. and Keane, M. (1996). Learning adaptation rules from a case base. In Smith, I. and Faltings, B., editors, Advances in Case-Based Reasoning (EWCBR-96), pages 179-192. Springer Verlag.
Hastings, J. D., Branting, L. K., and Lockwood, J. A. (1995). Case adaptation using an incomplete causal model. In Veloso, M. and Aamodt, A., editors, Case-Based Reasoning - Research and Development, pages 181-192. Springer Verlag.
Leake, D. B. (1993). Learning adaptation strategies by introspective reasoning about memory search. In Leake, D., editor, Proceedings AAAI-93 Case-Based Reasoning Workshop, volume WS-93-01, pages 57-63, Menlo Park, CA. AAAI Press. ftp://ftp.cs.indiana.edu/pub/leake/INDEX.html.
Leake, D. B. (1995a). Becoming an expert case-based reasoner: Learning to adapt prior cases. In Proceedings of the Eighth Annual Florida Artificial Intelligence Research Symposium, pages 112-116. ftp://ftp.cs.indiana.edu/pub/leake/INDEX.html.
Leake, D. B. (1995b). Combining rules and cases to learn case adaptation. In Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society. ftp://ftp.cs.indiana.edu/pub/leake/INDEX.html.


Lowe, D. (1995). Similarity metric learning for a variable-kernel classifier. Neural Computation, 7:72-85.
Richter, M. M. (1995). The knowledge contained in similarity measures. Invited talk at ICCBR-95. http://wwwagr.informatik.uni-kl.de/~lsa/CBR/Richtericcbr95remarks.html.
Veloso, M. M. and Carbonell, J. G. (1993). Toward scaling up machine learning: A case study with derivational analogy in PRODIGY. In Minton, S., editor, Machine Learning Methods for Planning, chapter 8, pages 233-272. Morgan Kaufmann, San Mateo.
Wettschereck, D. and Aha, D. W. (1995). Weighting features. In Veloso, M. and Aamodt, A., editors, Case-Based Reasoning Research and Development, pages 347-358. Springer.


Instance-Based Classification of Cancer Cells

Christel Wisotzki (1) and Peter Hufnagl (2)
(1) Fraunhofer-IITB, EPO-Berlin, Kurstraße 33, D-10117 Berlin, Germany
[email protected]
(2) Humboldt-Universität, Schumannstraße 20/21, D-10117 Berlin

Abstract. Classification methods for measurement curve processing are presented in this paper. These methods are holistic in the following sense: distance or similarity measures applied to whole curves are used in place of descriptions of these curves by feature vectors of fixed length. The presented methods work in two steps. The first one is a pre-processing step, where the curves are approximated by piecewise linear functions. The approximation procedure eliminates noise from the measurement curves. In the second step, different instance-based classification methods are applied to the pre-processed curves. A potential application of curve classification methods to medicine - the classification of cancer cells - is shown. Comparative genomic hybridization (CGH) is a molecular genetic method which makes the genetic alteration of cancer cells visible as so-called ratio-profile lines. The presented curve classification methods can help to answer different questions about the connection between genetic alteration and tumor behavior.

1 Introduction

Most research on classification learning has focused on methods for objects represented as feature vectors. Yet in different fields, whole curves as well as feature vectors are given for solving forecasting and classification problems. There are a lot of useful, efficient generic methods for classification learning (see Michie et al. 1994) like decision tree methods, decision surface methods, inductive logic programming methods and others. Most of them are based on feature vectors of a fixed dimension. To apply these methods to measurement curves, corresponding feature vectors must be generated. In some cases there are definite curve values (e.g. values, number and location of local extremes, absolute extremes) containing the information of the curve relevant for the decision. The present paper does not consider this case, where additional information is supposed. In general, a fixed number of points and a fixed location are selected for a feature vector. But the number and location of the points building the feature vector must be determined in advance, without knowing which points or segments contain relevant information about the curve. Therefore, "holistic" classification methods (in the sense that distance or similarity measures applied to whole curves are used in place of descriptions of the curves by feature vectors of fixed length) are needed.

Most instance-based classification methods were also designed for feature vectors with fixed dimensions, but they really need only a similarity measure and, in some cases, linear operations. Thus, many instance-based classification methods can be used for object sets with a metric, like graph sets, sets of symbol strings, or functional sets. In Section 4 we present algorithms for classification learning of measurement curves based on similarity measures. The prototype method, which needs linear operations, is similar to Kohonen's (1989) LVQ and Bradshaw's (1987) instance averaging technique. These methods were developed for feature vectors of a fixed dimension. According to the method of Bradshaw, the feature vectors can have a weighted influence on the reference object. In each learning step of Kohonen's LVQ the nearest prototype is updated. For a new learning object X the nearest prototype P(X) will be moved nearer to X if X and P(X) belong to the same class. In the opposite case, P(X) moves away from X. However, the number of reference objects is not adapted to the data but must be defined in advance. Furthermore, the potential application of curve classification to medicine - the classification of cancer cells with the help of CGH ratio-profile lines - will be described in Section 2. Measured curves are noisy in all probability. Furthermore, the number of measurements can be large. Therefore, a pre-processing step is necessary for noise elimination and information compression. In Section 3, different pre-processing methods for data reduction are briefly described. A detailed description of the algorithm is given in Wisotzki and Wysotzki (1994a and 1994b).
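Kohonen's update rule as described above can be sketched in a few lines. This is a schematic LVQ1 step under the assumption of real-valued vectors and a learning rate eta; it is not the authors' prototype method itself:

```python
def lvq_update(prototypes, x, label, eta=0.1):
    """One LVQ1-style step (after Kohonen): move the nearest prototype
    toward x if the classes agree, away from x otherwise.
    `prototypes` is a list of ([vector], label) pairs, modified in place."""
    def sq_dist(proto):
        return sum((p - xi) ** 2 for p, xi in zip(proto[0], x))

    nearest = min(prototypes, key=sq_dist)
    step = eta if nearest[1] == label else -eta   # attract or repel
    nearest[0][:] = [p + step * (xi - p)
                     for p, xi in zip(nearest[0], x)]
```

Only the single nearest prototype moves in each step; the number of prototypes stays fixed, which is exactly the limitation noted in the text.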

2 Comparative Genomic Hybridization Method (CGH)

CGH is a molecular genetic method for the detection of chromosomal imbalances between a tumor and a normal genome. In order to get quantitative results, an image analysis program was developed that can detect genetic alterations and map them onto a CGH sum karyogram. A CGH sum karyogram documents the genetic changes as color-coded chromosomes. In a further processing step the sum karyograms can be transformed to ratio-profile lines which make the quantity of the genetic alteration visible as a curve. Figure 1 shows the ratio-profile lines of all chromosomes of a genome. Different questions about the connection between the tumor behavior and the corresponding CGH ratio-profile lines exist, including:

- Is it possible to perform a genetic "tumor grading"? Medics grade tumors with respect to their malignant potential. The number of grades is not uniform. In general, there are three classes:
  - not malignant,
  - malignant,
  - extremely malignant.
- Is there any connection between morphologically defined tumor stages and progression on the one hand and the appearance of genetic alterations on the other hand?

Possible morphologically defined stages can be:
  - large cell carcinoma and
  - small cell carcinoma.
- Is it possible to infer probable metastasis formation from the genetic alterations of special tumor cells? There are two classes in the corresponding classification problem:
  - metastasis and
  - no metastasis.

The problem is to be solved for different kinds of tumors. Along with statistical methods (e.g. cluster analysis), the different curve classification methods applied to the ratio-profile lines are supposed to help answer these questions. A corresponding project is planned for the near future. At the time of completing this paper, only a small number of test data (CGH profile lines) were available, i.e., test results can only be presented in the future.

[Figure 1: ratio-profile lines of all chromosomes of a genome.]
3 Pre-processing

3.1 Approximation

A piecewise linear spline approximation procedure can be used as the first pre-processing step for noise elimination and information compression. Let X =

fxi j i = 0; 1;    ; m , 1g be the set of m equidistant measurement points of a

curve. The task is to approximate this curve by n linear regression functions in corresponding subintervals. This task corresponds to n simple linear regression problems for a given number n of subintervals and their locations. But in order to get an approximation similar to the measured curve the joints should be adapted to the structure of the curve. Wisotzki and Wysotzki (1994a) de ned a suitable approximation algorithm. It di ers from other spline approximation methods by the fact that the subintervals (number and location) where the curve is smoothly replaced are de ned by the algorithm optimally with respect to the approximation property of the corresponding spline function. The algorithm is based on a clustering technique for the measurements, i. e., the resulting clusters are subsets of the set of measurement points that are approximated by one straight line. The sum of all quadratic di erences between the measured points and the corresponding values on the regression line is used as the criterion function. The clustering algorithm consists of two stages. In the rst stage the number of joints is de ned with the help of an agglomerative method. The result depends on the target approximation accuracy and serves as an initial solution for the exchange method in the second stage, which optimizes the clustering resulting from the rst stage. This approximation procedure eliminates noise from the measurement curves. If the curves are given over the same measurement interval then a similarity measure in their set can be de ned as in Wisotzki and Wysotzki (1994a, 1994b) by the help of functional distances (e. g. the integral over the squared di erence of the functions).

3.2 Processing of Symbol Strings

If this assumption is not fulfilled, then another pre-processing step seems necessary. If the measurement curves come from different time intervals, it is impossible to know which pieces of one curve correspond to which pieces of another. The approximation functions can instead be mapped onto symbol strings. Such a mapping can be obtained in different ways, depending on the concrete application. If it is known that the curves consist of segments of definite shape (e.g., certain peaks, pieces with constant slope, etc.), then a symbol can be assigned to every segment. In this way a series of symbols (a string) is generated. An example in Wisotzki and Wysotzki (1994b) shows how symbol strings can be obtained with the help of expert rules. Another method of mapping spline functions onto symbol strings is to discretize the spline parameters. The problem of defining a similarity measure on the set Γ of symbol strings of finite length whose components belong to the same alphabet can be solved by considering the so-called compatibility graph. The compatibility graph is defined in the following way:

Nodes: All pairs of identical symbols (one from each string) form its nodes.

Fig. 2. Compatibility graphs. (The natural numbers stand for the positions of the symbols in the strings; e.g., 1|3 denotes the node consisting of the first symbol of the first string and the third symbol of the second string.)

Two nodes are compatible if the corresponding symbols have the same relation (distance) within their strings.
Arcs: Two compatible nodes are joined by an arc.
The following simple example demonstrates the formation of compatibility graphs from symbol strings.

Example

We consider the four strings

S = (a b c a b), S1 = (a b a b c), S2 = (a b c a g), and S3 = (a b c d a b). Figure 2 shows the compatibility graphs of the string pairs (S, S1), (S, S2), and (S, S3). From the heuristic point of view, two strings S and T have a large similarity if their compatibility graph has large connected subareas. Indeed, a similarity measure can be defined using the cardinality n(S, T) of the maximal induced connected subgraph (maximal clique). The graph metric

d(S, T) = max[n(S), n(T)] − n(S, T),   S, T ∈ Γ,

is based on this value. It was introduced by Zelinka (1975) and generalized by Kaden (1982) and Sobik (1982) to the set of all finite directed labeled graphs (here n(S) denotes the cardinality of S). Wisotzki and Wysotzki (1994b) showed that the Zelinka metric can be used for symbol strings without the problems arising in

the general case. For general graphs the calculation of n(S, T) is an NP-complete problem. Compatibility graphs generated from one-dimensional symbol strings, however, have a useful property: they consist of disjoint cliques. Figure 2 shows the disjoint cliques of the compatibility graphs of the example, and the cardinalities of the maximal cliques are clearly visible. Thus, the search for the maximal clique is considerably simpler. Now we return to the example from the beginning of this section. Let us calculate the Zelinka metric of the string pairs (S, S1), (S, S2), and (S, S3):

d(S, S1) = max[5, 5] − n(S, S1) = 5 − 3 = 2,
d(S, S2) = max[5, 5] − n(S, S2) = 5 − 4 = 1,
d(S, S3) = max[5, 6] − n(S, S3) = 6 − 3 = 3.

With respect to the Zelinka metric, S2 is most similar to S.
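The disjoint-clique property makes this computation straightforward for strings: two nodes are compatible exactly when they share the same position offset, so every clique lies on one "diagonal" of the compatibility graph, and n(S, T) is simply the best match count over all offsets. The following sketch is our own illustration (function names are not from the cited papers) and reproduces the example values:

```python
def max_clique_size(s, t):
    """n(S,T): size of a maximal clique in the compatibility graph.
    Nodes (i, j) with s[i] == t[j] are pairwise compatible exactly when
    they share the same offset j - i, so each clique is one diagonal."""
    best = 0
    for d in range(-(len(s) - 1), len(t)):
        matches = sum(1 for i in range(len(s))
                      if 0 <= i + d < len(t) and s[i] == t[i + d])
        best = max(best, matches)
    return best

def zelinka_distance(s, t):
    """d(S,T) = max[n(S), n(T)] - n(S,T)."""
    return max(len(s), len(t)) - max_clique_size(s, t)
```

Applied to the strings above, zelinka_distance("abcab", "ababc"), ("abcab", "abcag"), and ("abcab", "abcdab") yield 2, 1, and 3, matching the hand calculation.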

4 Classification Methods for Curves

Instance-based classification methods are founded on similarity or distance measures between the objects to be classified and certain reference objects, which generalize the classes. The reference objects are generated in the learning phase from a training set of pre-classified curves. In the test phase an unseen object is assigned to the class of the most similar or nearest reference object. These methods differ from each other in how they generate the set of reference objects. Section 3 described the construction of a similarity (or distance) measure; thus, instance-based methods can classify curves using these measures.

4.1 The Nearest Neighbor Method and Modifications

For the simple nearest neighbor method (NNM) of Cover and Hart (1967), all learning objects are stored as reference objects. Its performance is excellent in many cases, but the computational cost in the test phase is proportional to the number of training examples. The different modifications of the NNM (e.g., IB2 and IB3 of Aha et al. (1991), CNN of Hart (1968), and RNN of Gates (1972)) reduce the number of reference objects while often achieving equal or improved classification accuracy. All of them select the reference objects from the training set, so that no operations (such as addition or multiplication by a scalar) are needed. Thus, these methods can be applied to the string representation of curves with the Zelinka metric as distance measure. IB2 is identical to the NNM except that it stores only misclassified objects in the set of reference objects. It attempts to store only those objects that approximate the boundaries of the class regions (whereas the NNM approximates the whole class regions). The computational expenditure in the test phase is reduced to an amount proportional to the number of stored training examples. IB3, CNN, and RNN work similarly.
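A minimal sketch of the NNM test phase and the IB2 reduction, written for an arbitrary distance function so that it works equally for vectors or for strings under the Zelinka metric (the function names and the toy one-dimensional data below are our own; IB3's accuracy bookkeeping is omitted):

```python
def nn_classify(query, references, dist):
    """NNM test phase: return the label of the nearest reference object."""
    return min(references, key=lambda ref: dist(query, ref[0]))[1]

def ib2_train(training, dist):
    """IB2: store a training object only if the references collected so
    far would misclassify it, keeping objects near class boundaries."""
    references = [training[0]]
    for obj, label in training[1:]:
        if nn_classify(obj, references, dist) != label:
            references.append((obj, label))
    return references
```

On the toy set [(0, "low"), (1, "low"), (10, "high"), (11, "high")] with dist(a, b) = |a − b|, IB2 retains only two reference objects (one per class region) yet classifies the remaining points like the full NNM would.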

4.2 Prototype Methods (PM)

If piecewise linear functions on the same interval are obtained as the result of pre-processing, then linear operations in the set of approximating functions are possible. In contrast to the methods described in Section 4.1, prototype methods use linear operations for generating the reference objects. The simplest way to generate a prototype for a class is to take the arithmetic mean of all objects of that class. According to the simple prototype method in Wisotzki and Wysotzki (1994a), one prototype is generated and stored as reference object for each class. Obviously, this method is very time-effective in the test phase. Its further merits include a compact representation of the training data and the interpretability of the prototypes. However, its classification accuracy is insufficient if the class regions consist of non-connected subregions. For this reason a generalization (GPM) was developed by Wisotzki and Wysotzki (1995). In GPM the prototype generation is adapted to the training data. The algorithm starts with an initialization step: for every class, the approximating function of one object belonging to this class is chosen as a class prototype. Let Π = {P_1^0, P_2^0, …, P_L^0} denote the initial set of reference objects. Now the prototypes are trained incrementally on the remaining training objects. For any new training object X the approximating function z_X is calculated. Let P ∈ Π be the prototype nearest to z_X. If P represents the same class as X,

- then P is updated by the running-mean formula P := (n_P · P + z_X)/(n_P + 1), where n_P denotes the number of objects that have influenced P;
- else a new prototype is opened by z_X, i.e., Π := Π ∪ {z_X}.

Thus, the number of prototypes per class is adapted to the structure of the class regions.
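A sketch of the GPM training loop for approximating functions sampled on a common grid; the Euclidean distance between the sampled vectors stands in for the functional distance, the initialization is simplified to seeding from the first object seen (a later class mismatch opens a prototype for every remaining class anyway), and all names are our own:

```python
import numpy as np

def gpm_train(training):
    """Incrementally build prototypes: the nearest prototype, if of the
    same class, absorbs the object by a running mean; otherwise the
    object opens a new prototype.  `training` is a list of
    (vector, label) pairs."""
    prototypes = []  # entries: [mean_vector, label, n_absorbed]
    for x, label in training:
        x = np.asarray(x, dtype=float)
        if prototypes:
            p = min(prototypes, key=lambda q: np.linalg.norm(x - q[0]))
            if p[1] == label:
                p[0] = (p[2] * p[0] + x) / (p[2] + 1)  # running mean
                p[2] += 1
                continue
        prototypes.append([x.copy(), label, 1])
    return prototypes

def gpm_classify(x, prototypes):
    """Assign x to the class of the nearest prototype."""
    x = np.asarray(x, dtype=float)
    return min(prototypes, key=lambda q: np.linalg.norm(x - q[0]))[1]
```

On well-separated data each class region ends up with one prototype; when a class region consists of non-connected subregions, the mismatch branch automatically opens additional prototypes for it.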

5 Further Work

We have shown that a large number of instance-based methods can be used for curve classification. Thus, we have the theoretical basis for the classification of CGH ratio-profile lines. Unfortunately, at the time of completing this paper a sufficiently large number of CGH profile lines was not yet available, i.e., an extensive comparison of the different methods can only be carried out in the future. The interesting question of how to transform the ratio-profile lines into strings must be discussed with the partners working in the field of CGH.

Acknowledgments

We thank Karl Roth from the Charité of the Humboldt-Universität Berlin for the graphical presentation of the ratio-profile lines and the anonymous referees for their helpful comments on our presentation.

References

Aha, D. W., Kibler, D., Albert, M. K.: Instance-Based Learning Algorithms. Machine Learning 6 (1991) 37–66.
Bradshaw, G. L.: Learning about speech sounds: The NEXUS project. In: Proceedings of the Fourth International Workshop on Machine Learning, Irvine, CA: Morgan Kaufmann (1987) 1–11.
Cover, T. M., Hart, P. E.: Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory 13 (1967) 21–27.
Gates, G. W.: The Reduced Nearest Neighbor Rule. IEEE Transactions on Information Theory 18 (1972) 431–433.
Hart, P. E.: The Condensed Nearest Neighbor Rule. IEEE Transactions on Information Theory 14 (1968) 515–516.
Houldsworth, J., Chaganti, R.: Comparative Genomic Hybridization: an Overview. American Journal of Pathology 145 (1994) 1253–1260.
Kaden, F.: Graphmetriken und Distanzgraphen. In: Beiträge zur angewandten Graphentheorie, Teil 1, ZKI-Informationen, Akademie der Wissenschaften der DDR 2/82 (1982) 1–62.
Kibler, D., Aha, D. W.: Comparing instance-averaging with instance-filtering algorithms. In: Proceedings of the Third European Working Session on Learning, Glasgow, Scotland: Pitman (1988) 63–68.
Kohonen, T.: Self-Organization and Associative Memory. Springer-Verlag, Berlin Heidelberg New York (1989).
Michie, D., Spiegelhalter, D., Taylor, C. (eds.): Machine Learning, Neural and Statistical Classification. Ellis Horwood, Hertfordshire, UK (1994).
Salzberg, S. L.: A nearest hyperrectangle learning method. Machine Learning 6 (1991) 251–276.
Sobik, F.: Graphmetriken und Klassifikation strukturierter Objekte. In: Beiträge zur angewandten Graphentheorie, ZKI-Informationen, Akademie der Wissenschaften der DDR 2/82 (1982) 63–122.
Wisotzki, C., Wysotzki, F.: Feature Generation and Classification of Time Series. In: Bock, H. H., Lenski, W., Richter, M. M. (eds.): Information Systems and Data Analysis, Studies in Classification, Data Analysis, and Knowledge Organization, Springer-Verlag, Heidelberg (1994a) 294–297.
Wisotzki, C., Wysotzki, F.: Lernfähige Klassifikation von Zeitreihen. In: 39. IWK, Band 3, Technische Universität Ilmenau (1994b) 137–147.
Wisotzki, C., Wysotzki, F.: Prototype, Nearest Neighbor and Hybrid Algorithms for Time Series Classification. In: Lavrač, N., Wrobel, S. (eds.): Machine Learning: ECML-95 (Proc. European Conference on Machine Learning), Lecture Notes in Artificial Intelligence 914, Springer-Verlag, Berlin Heidelberg New York (1995) 364–367.
Zelinka, B.: On a Certain Distance between Isomorphism Classes of Graphs. Časopis pěst. mat. 100 (1975) 371–373.


