Lexical semantic approaches to terminology

John Benjamins Publishing Company

This is a contribution from Terminology 20:2 © 2014. John Benjamins Publishing Company.

Lexical semantic approaches to terminology
An introduction

Pamela Faber and Marie-Claude L’Homme

Terminology 20:2 (2014), 143–150. doi 10.1075/term.20.2.01int. issn 0929–9971 / e-issn 1569–9994. © John Benjamins Publishing Company

The importance of lexical semantics is increasing in terminology work. This is in consonance with the fact that meaning is now in the spotlight. Various applications (such as Natural Language Processing and information retrieval, along with more traditional applications of terminology, i.e. specialized dictionary compilation) require that the meaning of terms be represented in a way that accounts for their behavior in text.

In the initial years of Terminology, meaning (viewed as an intrinsic property of terms) was not given its due importance. Although the concept was the starting point of terminological analysis (and still is in specific applications, such as ontology development), it was defined with reference to extra-linguistic reality. In fact, terms were not even regarded as linguistic units but merely as labels for concepts. In other words, knowledge structures were built (often as a result of a consensus on how knowledge should be represented), and linguistic labels were subsequently superimposed on them. Hence, little attention was paid to the semantics of specialized knowledge units. Definition formulation and analysis, if considered at all, were relegated to the background and regarded as a task for experts in the specialized field.

However, methods in terminology have changed considerably since the beginning of the 1990s, and this has led to the proposal of new approaches. Descriptive terminology approaches, as well as the advent of corpus linguistics and corpus pattern analysis, have all expanded horizons and opened the door to semantic analysis in Terminology (as reflected in the contributions in this volume). These methods require that the linguistic facets of terms be taken into account. They also raise many questions that, prior to the 1990s, Terminology was not in a position to answer, such as those related to (i) corpus pattern analysis; (ii) polysemy; and (iii) multidimensionality (among others).

As is well known, corpus pattern analysis can be applied to investigate syntagmatic criteria for distinguishing different meanings of a polysemous term. This is particularly helpful for specifying the meaning of terms from a multidimensional perspective. In fact, semantic analysis has become a necessity since part of the knowledge conveyed in specialized knowledge units is reflected in the way that they are distributed in the text as well as in the knowledge patterns that they participate in. In this regard, one of the premises underlying the cognitive dimension of terminology is that language structure and lexical meaning are a manifestation of conceptual structure (Talmy 2000). Accordingly, both general and specialized lexical items can be regarded as conceptual categories of distinct yet related meanings that exhibit typicality effects. Ontology building and conceptual modeling can thus benefit from the semantic analysis of linguistic concepts, based on sound theoretical principles.

When terms are activated in texts, they set in motion a wide variety of underlying conceptual relations and knowledge structures. Indeed, contexts are triggering mechanisms that foreground certain relations over others (Faber and San Martín 2011; Faber 2012). For example, lexical semantic relations such as hyponymy and meronymy reflect the conceptual relations of is-a and part-of, respectively.

In Terminology, semantic relations are generally studied by analyzing knowledge patterns (KPs). As conceived by Meyer (2001), KPs are the lexico-syntactic patterns that link the terms encoded in a proposition in texts. As reflected in recent research as well as the contributions in this volume, methods for their extraction and analysis are now a fertile area of inquiry in Terminology (Condamines 2002; Marshman et al. 2002; Barrière 2004; Barrière and Agbago 2006; Cimiano and Staab 2005). Nevertheless, despite the current popularity of knowledge patterns, Bowker (2004) states that there are still major problems with regard to noise and silence, pattern variation, anaphora, domain and language dependency, etc. It is also true that not all relations have been analyzed to the same degree.
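
The notion of a knowledge pattern can be made concrete with a minimal sketch. The two regular expressions below are illustrative assumptions, not patterns taken from Meyer (2001) or from any of the studies cited here; the function name `extract_relations` and the example sentences are likewise hypothetical. Each pattern maps one surface construction to one conceptual relation (is-a or part-of).

```python
import re

# Hypothetical knowledge patterns: each regex is an illustrative assumption,
# pairing one lexico-syntactic construction with one conceptual relation.
KNOWLEDGE_PATTERNS = [
    # Hyponymy -> is-a, e.g. "a groyne is a type of coastal defence structure"
    ("is-a", re.compile(
        r"\b(?:an?|the)\s+(?P<source>[a-z\- ]+?)\s+is\s+a\s+(?:type|kind)\s+of\s+"
        r"(?:an?\s+|the\s+)?(?P<target>[a-z\- ]+)")),
    # Meronymy -> part-of, e.g. "the jetty head is part of a jetty"
    ("part-of", re.compile(
        r"\b(?:an?|the)\s+(?P<source>[a-z\- ]+?)\s+is\s+(?:a\s+)?part\s+of\s+"
        r"(?:an?\s+|the\s+)?(?P<target>[a-z\- ]+)")),
]


def extract_relations(sentence):
    """Return (source term, relation, target term) triples found in a sentence."""
    # Normalize case and drop the final period so the target group ends cleanly.
    text = sentence.lower().rstrip(". ")
    triples = []
    for relation, pattern in KNOWLEDGE_PATTERNS:
        for match in pattern.finditer(text):
            source = match.group("source").strip()
            target = match.group("target").strip()
            triples.append((source, relation, target))
    return triples


if __name__ == "__main__":
    # Invented example sentences, used only to exercise the patterns.
    for example in (
        "A groyne is a type of coastal defence structure.",
        "The jetty head is part of a jetty.",
        "Waves erode the beach.",
    ):
        print(example, "->", extract_relations(example))
```

Hand-written patterns of this kind match only the exact constructions they encode. A paraphrase or a pronoun in place of the term goes undetected, which is one way to see the problems of silence, pattern variation, and anaphora noted by Bowker (2004).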
Non-hierarchical semantic relations such as agency, cause, result, and location need to be analyzed in greater depth since they have not as yet been systematically implemented in research (Aussenac-Gilles and Séguela 2000, 181). Up until now, knowledge patterns conveying hyponymic relations have been the most commonly studied since they play an important role in categorization and property inheritance (Barrière 2004, 244). Nevertheless, KPs reflecting non-hierarchical semantic relations are also crucial to the conceptual representation of specialized knowledge concepts, which they enrich.

Another neglected area of study in Terminology involves the specialized knowledge units that are designated by parts of speech other than nouns, such as adjectives or verbs. Verbs play an important role in both the comprehension and the structure of specialized discourse across languages (Condamines 1993; L’Homme 2002, 2003; Lerat 2002). This is because a considerable part of our knowledge is composed of events and states, many of which are linguistically represented by verbs (this is especially true in fields such as the environment, where events are ubiquitous, or technical fields, where activities represent an important part of the knowledge expressed). In this sense, it can be said that verbs set the scene for specialized concepts, which appear in the form of terms that fill the argument slots of semantic predicates. In fact, the semantic features of these specialized knowledge units interact with and constrain the meaning of the general language verb to reduce polysemy and restrict it to one sense. This means that terminologists must now consider how to represent predicative structures. This also entails a major shift in representation models for terms since predicative terms do not interact with other lexical units the way nouns denoting entities do.

In order to account for these phenomena, terminologists now refer to alternative frameworks. Since the beginning of the 1990s, terminologists have applied lexical semantics to deal with specific facets of terminological units. In 1993, Condamines proposed a methodology to represent the syntactic and semantic properties of verbs in a corpus of banking. Lerat (2002) referred to classes of objects (“classes d’objets”) as defined by Gross (1994) to represent the argument structure of verbs. Computational terminologists saw the potential of distributional semantics (Harris 1968) to capture semantic regularities in specialized corpora automatically. Finally, Meyer (2001) referred to Cruse (1986) to define and later identify knowledge patterns in specialized corpora.

Currently, there are a number of meaning-based linguistic frameworks that can be and have been usefully applied to Terminology. These include (but are not limited to) the following:

– Cognitive Semantics (Talmy 2000; Taylor 1995)
– Frame Semantics (Fillmore 1977)
– Generative Lexicon (Pustejovsky 1991)
– Lexical Grammar Model (Faber and Mairal 1999)
– Meaning-Text Theory, or more specifically Explanatory Combinatorial Lexicology (Mel’čuk et al. 1995)

The contributions in this volume show that these frameworks, when adapted to terminology, can provide useful insights into the linguistic behavior of terms: to better characterize terms in the field of law (K. Peruzzo), to understand the emergence of neologisms (M. Sánchez Ibáñez and J. García Palacios), to discover semantic relations in specialized corpora (A.-K. Schumann; E. Lefever, M. Van de Kauter and V. Hoste), to analyze monosemy, polysemy and vagueness (A. Bertels and D. Speelman), and finally, to study the implementation of semantic relations in terminological resources (E. Marshman). The authors in this volume have applied these frameworks to process corpora in fields such as finance, law, mechanical engineering, and medicine. They have analyzed terminological data in languages such as Dutch, French, German, Russian, and Spanish.


As such, lexical semantics intersects with lexicography, phraseology, corpus linguistics, pragmatics, and knowledge representation, all of which are of vital importance for Terminology.

Contributions in this volume

The articles in this special volume present innovative research work on lexical semantic approaches to Terminology and Specialized Languages.

Katia Peruzzo uses a lexical semantics framework in order to describe terminology in the field of law. More specifically, she devised an event template based on Frame-based Terminology, FBT (Faber et al. 2006), in order to assist terminologists when extracting and analyzing terms related to victims of crime and their rights (e.g., victim, pain and suffering). The template, designed with the assistance of an expert, represents a prototypical event where a crime victim (the Patient) is subjected to an action performed by an offender (an Agent), which is classified as criminal conduct according to the relevant legal system and has consequences (e.g., harm, suffering, damage to property) for the victim. Rights and remedies are also present in the event template. The template then guides terminologists when exploring the corpus, since they will look for terms that are relevant with regard to the template and collect information on these terms.

Miguel Sánchez Ibáñez and Joaquín García Palacios explore a phenomenon they call terminological dependency, whereby a language (in this case Spanish) imports denominations from another one (English) in specialized fields. The authors propose a semantic characterization in order to identify traces of this dependency in formal neologisms extracted from a corpus of texts on Alzheimer’s disease. The authors first divide neologisms according to the concepts they designate (based on Sager’s (1990) and Kageura’s (2002) classifications). The semantic features of neologisms are then analyzed. This part of the study is based on Pustejovsky’s (1991) Generative Lexicon model: the authors analyzed the argument, event, and qualia structures associated with neologisms, as well as inheritance.
This analysis allows the authors to establish links between some semantic properties of terms associated with specific conceptual classes and cases of terminological dependency.

In her article, Anne-Kathrin Schumann explores the relationship between knowledge and meaning, and more specifically how knowledge is expressed in running text. Her analysis contributes to methods for identifying contexts that are useful for terminology work. “Knowledge-rich contexts” (KRCs), an expression coined by Meyer (2001), correspond to those statements in which a semantic relation between terms is expressed or definitional information is provided on a specific term. The objective sought by the author is to characterize the linguistic properties of KRCs in German and Russian corpora, assuming that a cross-linguistic analysis will provide evidence to generalize at least part of these properties. Properties such as the parts of speech typically found in KRCs, typical and non-typical lexical units, and the syntactic features of verbs found in patterns are investigated both qualitatively and quantitatively. This work can help refine existing typologies of patterns used to locate knowledge-rich contexts and perhaps facilitate their manual or automatic extraction.

Elizabeth Marshman is also interested in relations between terms, but from the point of view of their integration in terminological resources and users’ reactions to the addition of information on terminological relations. The author developed a prototype resource called CREATerminal that presents information on relations in the form of knowledge-rich contexts, KRCs (Meyer 2001), in English and in French. The contexts were extracted from corpora dealing with breast cancer; contexts expressing relations of cause-effect, generic-specific, part-whole and entity-function were collected. The contexts, the terms that share one of these relations, and the pattern linking them are stored in a database that users can explore. CREATerminal was then submitted to an evaluation by students in translation. Results indicate that students react positively to this kind of addition to terminological resources.

Els Lefever, Marjan Van de Kauter, and Véronique Hoste explore semantic relations (more specifically hypernymy) and test different methods to identify instances of the relations in Dutch and English specialized corpora automatically. The method, called HypoTerm, consists in extracting terms and named entities, then identifying hypernymic relations for these linguistic entities.
The detection of relations can be performed using three techniques: a pattern-based approach (similar to the well-known technique developed by Hearst in 1992), a morpho-syntactic analyzer, and a distributional model. These relation identification techniques were evaluated against a manually annotated corpus, and results show that they perform satisfactorily (with some variation between Dutch and English and according to the approach tested). When fully automated, the HypoTerm method could help enrich terminological resources, such as term banks, and increase their coverage.

Ann Bertels and Dirk Speelman apply distributional semantics (based on the hypothesis that words with similar meanings will have the same distribution (Harris 1968)) to investigate semantic similarity (monosemy, polysemy and vagueness) in specialized corpora. They developed a technique that exploits co-occurrence patterns with statistical measures and represents distances from and proximities to a node using Multidimensional Scaling (MDS). In their article, the authors apply the technique to a French corpus of machining. They present and discuss results that were obtained when applying these techniques to corpora of different sizes and to word forms vs. lemmas. Results indicate that, even within a specific field of knowledge, subcorpora can display differences. They also indicate that lemmas lead to more coherent semantic interpretation as far as the MDS representation is concerned.

The research carried out by the authors in this volume opens up promising avenues of inquiry. Their innovative proposals address complex problems in terminology, and provide the groundwork for further studies linking lexical semantics and terminology.

To conclude, we would like to thank the members of the scientific committee who spent valuable time reviewing the papers that were submitted to this special issue:

– Guadalupe Aguado, Universidad Politécnica de Madrid, Spain
– Pierrette Bouillon, École de traduction et d’interprétation, Université de Genève, Switzerland
– Beatrice Daille, LINA, Université de Nantes, France
– Kyoko Kanzaki, Toyohashi University of Technology, Japan
– François Lareau, Observatoire de linguistique Sens-Texte (OLST), Montréal, Canada
– Pilar León-Arauz, Universidad de Granada, Spain
– Patrick Leroyer, Centlex, Aarhus University, Denmark
– Ricardo Mairal, UNED, Madrid, Spain
– François Maniez, CRTT, Université de Lyon, France
– Elizabeth Marshman, University of Ottawa, Canada
– Silvia Montero, University of Granada, Spain
– Janine Pimentel, Departamento de Letras, Universidade Pontifícia Católica do Rio de Janeiro (PUC-Rio), Brazil
– Alain Polguère, Université de Lorraine & ATILF CNRS
– Margaret Rogers, University of Surrey, UK
– Juan Sager, Manchester, UK
– Zuoyan Song, Beijing Normal University, China
– Carlos Subirats, Universidad Autónoma de Barcelona, Spain
– Rita Temmerman, Applied Linguistics Department of Vrije Universiteit Brussel, Belgium
– Maribel Tercedor, University of Granada, Spain


References

Aussenac-Gilles, N., and P. Séguela. 2000. “Les relations sémantiques: du linguistique au formel.” Cahiers de Grammaire 25, Sémantique et Corpus, 175–198.
Barrière, C., and A. Agbago. 2006. “TerminoWeb: A Software Environment for Term Study in Rich Contexts.” In Proceedings of the International Conference on Terminology, Standardisation and Technology Transfer (TSTT 2006, Beijing), National Research Council of Canada.
Barrière, C. 2004. “Building a Concept Hierarchy from Corpus Analysis.” Terminology 10 (2): 241–263. DOI: 10.1075/term.10.2.05bar
Bowker, L. 2004. “Lexical Knowledge Patterns, Semantic Relations, and Language Varieties: Exploring the Possibilities for Refining Information Retrieval in an International Context.” Cataloging and Classification Quarterly 37 (1): 153–171. DOI: 10.1300/J104v37n01_11
Cimiano, P., and S. Staab. 2005. “Learning Concept Hierarchies from Text with a Guided Agglomerative Clustering Algorithm.” In Workshop on Learning and Extending Lexical Ontologies with Machine Learning Methods, Bonn.
Condamines, A. 1993. “Un exemple d’utilisation de connaissances de sémantique lexicale: acquisition semi-automatique d’un vocabulaire de spécialité.” Cahiers de lexicologie 62: 25–65.
Condamines, A. 2002. “Corpus Analysis and Conceptual Relation Patterns.” Terminology 8 (1): 141–162. DOI: 10.1075/term.8.1.07con
Cruse, D.A. 1986. Lexical Semantics. Cambridge: Cambridge University Press.
Faber, P. 2012. A Cognitive Linguistics View of Terminology and Specialized Language. Berlin/New York: Mouton de Gruyter. DOI: 10.1515/9783110277203
Faber, P., and R. Mairal. 1999. Constructing a Lexicon for English Verbs. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110800623
Faber, P., S. Montero Martínez, M.R. Castro Prieto, J. Senso Ruiz, J.A. Prieto Velasco, P. León Araúz, C. Márquez Linares, and M. Vega Expósito. 2006. “Process-Oriented Terminology Management in the Domain of Coastal Engineering.” Terminology 12 (2): 189–213. DOI: 10.1075/term.12.2.03fab
Faber, P., and A. San Martín. 2011. “Linking Specialized Knowledge and General Knowledge in EcoLexicon.” In TOTh 2011. Actes de la Cinquième Conférence TOTh, 47–61. Annecy: Institut Porphyre.
Fillmore, C.J. 1977. “Scenes-and-Frames Semantics.” In Linguistic Structures Processing, ed. by A. Zampolli, 55–81. Amsterdam/New York: North Holland.
Gross, G. 1994. “Classes d’objets et descriptions des verbes.” Langages 115: 15–30. DOI: 10.3406/lgge.1994.1684
Harris, Z. 1968. Mathematical Structures of Language. New York: Wiley.
Hearst, M. 1992. “Automatic Acquisition of Hyponyms from Large Text Corpora.” In Proceedings of the International Conference on Computational Linguistics, 539–545. Nantes, France.
Kageura, K. 2002. The Dynamics of Terminology: A Descriptive Theory of Term Formation and Terminological Growth. Amsterdam: John Benjamins. DOI: 10.1075/tlrp.5
Lerat, P. 2002. “Qu’est-ce que le verbe spécialisé? Le cas du droit.” Cahiers de Lexicologie 80: 201–211.
L’Homme, M.C. 2002. “What Can Verbs and Adjectives Tell Us about Terms?” In TKE 2002 Terminology and Knowledge Engineering. Proceedings. 6th International Conference, Nancy, France, ed. by INRIA, 65–70. Le Chesnay Cedex: INRIA.
L’Homme, M.C. 2003. “Capturing the Lexical Structure in Special Subject Fields with Verbs and Verbal Derivatives. A Model for Specialized Lexicography.” International Journal of Lexicography 16 (4): 403–422. DOI: 10.1093/ijl/16.4.403
Marshman, E., T. Morgan, and I. Meyer. 2002. “French Patterns for Expressing Concept Relations.” Terminology 8 (1): 1–29. DOI: 10.1075/term.8.1.02mar
Mel’čuk, I., A. Clas, and A. Polguère. 1995. Introduction à la lexicologie explicative et combinatoire. Louvain-la-Neuve: Duculot/Aupelf-UREF.
Meyer, I. 2001. “Extracting Knowledge-rich Contexts for Terminography: A Conceptual and Methodological Framework.” In Recent Advances in Computational Terminology (Natural Language Processing 2), ed. by D. Bourigault, C. Jacquemin, and M.C. L’Homme, 279–302. Amsterdam: John Benjamins. DOI: 10.1075/nlp.2.15mey
Pustejovsky, J. 1991. “The Generative Lexicon.” Computational Linguistics 17 (4): 409–441.
Sager, J.C. 1990. A Practical Course in Terminology Processing. Amsterdam: John Benjamins. DOI: 10.1075/z.44
Talmy, L. 2000. Toward a Cognitive Semantics, Vols. I and II. Cambridge, MA: MIT Press.
Taylor, J.R. 1995. Linguistic Categorization: Prototypes in Linguistic Theory. Oxford: Oxford University Press.