
Named Entity Recognition from Greek Texts: the GIE Project

Vangelis Karkaletsis, Constantine D. Spyropoulos and George Petasis
Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, N.C.S.R. «Demokritos»
Tel: +301-6503196-7, Fax: +301-6532175
e-mail: {vangelis, costass, petasis}@iit.demokritos.gr

1. Introduction

Today's overload of information, particularly through the World Wide Web, makes it difficult for users to reach the right information. The situation is made harder by the fact that much of this information is in different languages. It is therefore important to apply an information process that extracts from this volume only the facts that match the user's interests, and that allows the user to access facts written in a different language. Information Extraction (IE) technology can meet these requirements: unlike information retrieval and filtering technology, IE focuses on specific facts extracted from the documents and not on the documents themselves. Some documents may contain the requested keywords yet be irrelevant to the user's interests. Working with specific facts instead of documents gives users information that is more relevant to their domain of interest.

The IE systems developed so far extract, in most cases, fixed information from documents in a fixed language. However, for IE technology to be truly applicable in real-life applications and to meet the above requirements, IE systems need to be easily adaptable (customisable) to new domains and user interests, as well as to multiple languages.

During the last decade, substantial progress has been made in developing reliable Information Extraction (IE) technology.
IE technology is currently exploited in real applications, such as the extraction of information on company acquisitions [1],[2],[3], stock exchanges [4], company profits and losses [5], joint ventures and management succession events [6],[7],[8], as well as the understanding of military messages [9] and police reports [10],[11],[12]. However, existing IE technology concerns widely spoken languages, mainly English. To our knowledge, there is as yet no IE system for the Greek language, although there is activity in the area of language engineering in Greece. Our laboratory is currently participating in two IE projects: the EU-funded project ECRAN (footnote 1) and the bilateral (English-Greek) project GIE (Greek IE; footnote 2). In GIE we cooperate with the University of Sheffield, aiming to adapt the Sheffield IE system to the Greek language.

Footnote 1. ECRAN (Extraction of Content: Research at Near Market) is a Language Engineering project (LE-2110) funded partially by the European Commission (EC), which involves Thomson (FR), SIS (GE), Univ. of Sheffield (UK), NCSR “Demokritos” (GR), Univ. of Ancona (IT), Univ. of Tor Vergata (IT), and Univ. of Fribourg (SU).

An IE task involves mainly two sub-tasks: the recognition of the named entities (e.g. persons, organisations, locations, dates) involved in an event, and the recognition of the relationships holding between named entities in that event (e.g. personnel joining and leaving companies in management succession events). A named entity (NE) is a phrase that serves as a name for something or someone. According to this definition, the phrase in question must be a noun phrase (NP). Clearly, not all NPs are named entities. An important feature of NPs is that they may contain or be contained in other NPs. In general, NEs are short NPs, i.e. they contain a small number of words. Named-entity recognition (NERC) involves two tasks: recognition of the NPs that are NEs, and classification of NEs into different types, such as organisations and person names.

In this chapter, we present the prototype named entity recogniser (NERC) we are currently developing for the Greek language in the context of GIE. The GIE prototype is being developed over the language engineering platform GATE of the University of Sheffield. More specifically, in section 2 of the chapter we discuss the significance of the named entity recognition task in IE, providing results from the MUC conferences as well as information on existing NERC systems in English and in other languages. Section 3 presents the prototype NERC system we are currently developing in the context of GIE. Information is provided on the platform, the corpus, and the modules developed or under development. Some first evaluation results are presented and the major problems are discussed.
We conclude the chapter by discussing the significance of a NERC system for the Greek language and the need for customisation tools to facilitate adaptation to new domains.

2. Named Entity Recognition Task in IE

The progress in Information Extraction (IE) technology is due to the increase in available resources such as machine-readable dictionaries and text corpora, to the increase in computational power and processing volume, and to the development of Language Technology techniques that can be applied in practice. This progress is demonstrated by the results of the Message Understanding Conferences (MUCs), where several IE systems are evaluated (see the MUC website at http://www.muc.saic.com/). Named entity recognition is one of the evaluation tasks and the one that yields the best results (see section 2.1), showing that this technology can be applied in practice. The identification of named entities in a corpus, along with their classification as persons, organisations, etc., can be useful not only as the first stage of a complete IE system but also for other tasks, such as the indexing of documents and the maintenance of databases containing information on the identified persons, organisations, etc. This is why our laboratory emphasises the need to develop a NERC system for the Greek language.

Footnote 2. GIE (Greek Information Extraction) is a bilateral project between NCSR “Demokritos” (GR) and Univ. of Sheffield (GB), funded by the Greek General Secretariat of R&T and the British Council.

2.1 NERC in Message Understanding Conferences (MUC)

The systems participating in MUCs must process texts, identify the texts that are relevant to the domain, and fill templates that contain slots for the events to be extracted and the entities involved. Information analysts design the template structure, that is, the information that needs to be extracted from domain-specific texts. The domain areas examined so far in MUCs are the following:

− MUCK (1987): Navy messages
− MUCK-II (1989): Navy messages
− MUC-3 (1991): News on terrorist attacks
− MUC-4 (1992): News on terrorist attacks
− MUC-5 (1993): Company news (joint ventures, micro-electronics products)
− MUC-6 (1995): Company news (management succession)
− MUC-7 (1998): Orders of airline companies

NERC was one of the tasks evaluated in MUC-6 and MUC-7 (the other tasks in MUC-6 were coreference resolution, template element filling and scenario template filling, whereas in MUC-7 the task of template relationship filling was also added). The main measures used for the evaluation of MUC tasks are recall and precision. The recall measure counts the number of words/phrases that are assigned the correct tag, out of the total number of words/phrases in the corpus. On the other hand, precision counts the number of words/phrases assigned the correct tag, out of the total number of words/phrases that are assigned a tag (either correct or wrong). NERC modules represent the most mature IE technology. The best score obtained in the NERC task in MUC-6 was 96% recall and 97% precision. The MUC-6 best overall raw scores are shown in Table 1 [13]:

Task                            Recall (%)   Precision (%)
Named Entity                        96            97
Coreference (High Recall)           63            63
Coreference (High Precision)        59            72
Template Element                    74            87
Scenario Template                   47            70

Table 1. Best Overall Scores in MUC-6 Tasks
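The recall and precision measures defined above can be illustrated with a minimal Python sketch. It assumes that annotations are represented as a mapping from word position to entity tag; this representation, the function name and the toy data are hypothetical and are not part of the MUC scoring software.

```python
def recall_precision(gold, predicted):
    """Compute (recall, precision) for word-level entity tags.

    gold, predicted: dicts mapping a word position to its entity tag.
    Only words that carry a tag appear in the dicts (assumed layout).
    """
    # A predicted tag is correct when the gold annotation has the same tag
    # at the same position.
    correct = sum(1 for pos, tag in predicted.items() if gold.get(pos) == tag)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical toy annotations: four gold tags, three predicted, two correct.
gold = {0: "PERSON", 1: "PERSON", 5: "ORGANISATION", 9: "LOCATION"}
predicted = {0: "PERSON", 1: "PERSON", 5: "LOCATION"}
recall, precision = recall_precision(gold, predicted)
```

Here recall divides by the number of words that should have been tagged, whereas precision divides by the number of words the system actually tagged, so a system that tags very little can score high precision with low recall.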

2.2 NERC in the ECRAN Project

The ECRAN project is developing a new generation of techniques aiming to bring IE nearer to the market by facilitating customisation to new application domains as well as to users' requirements. The language engineering platform GATE [14] of the University of Sheffield is used in ECRAN for developing and integrating the modules produced (see Fig. 1).

Figure 1. The GATE language engineering platform

During the first two years of ECRAN (the project ends in December 1998), a stable version of the English-language IE system has been established for the company news domain. A stable version of the French system has also been established to handle information about films/videos in French. An Italian IE application has been developed, which performs named entity recognition in the domain of financial news. Within the GATE framework, modules have been either ported to new languages (e.g. named entity recognition in French and Italian), imported from already existing resources, or redesigned.

During the last year of ECRAN, we are examining techniques for the rapid customisation of the English IE system to new domains. One of the customisation tasks currently examined is NERC. NERC usually exploits a grammar of named entity rules. These rules specify when a sequence of words is a named entity and also the type of this entity. Usually the NERC grammar is built manually by experts in a particular domain. Manual grammar construction is clearly problematic when we want to port the NERC system to a new domain. For this reason, learning from examples can be useful. The aim of our research on this customisation task in ECRAN is to obtain a NERC system that adapts to new domains by exploiting machine learning techniques.

3. NERC in GIE

The named entity recogniser in GIE is based on the corresponding English recogniser developed at the University of Sheffield over the GATE language engineering platform. It involves the following modules: tokeniser, sentence splitter, part-of-speech tagger, gazetteer lookup, named-entity parser, and name matcher. The following sub-sections present the work done so far in GIE on the development of the Greek NERC system, covering the platform, the corpus, and the modules developed or customised so far. Fig. 2 shows the modules of GIE over the GATE platform.

Figure 2. Modules of GIE Named Entity Recogniser

3.1 The GATE Platform

The language engineering platform GATE (v1.5.0) was installed according to the guidelines of the Sheffield team. The platform was provided by Sheffield together with the English IE system VIE. The installation was done first on Linux and then on SunOS and Solaris. The NERC prototype system is implemented on Solaris (v2.5.1).

3.2 The Corpus

We used corpora from three different sources in order to train and then evaluate our system.

− A text corpus on “management succession events” was provided by the company “Advertising Week” (DIAFHMISTIKH EBDOMADA) (http://www.addweek.gr). The corpus contains texts on personnel leaving or joining companies for the period from 1/96 until 6/98. The corpus size is about 50,000 words. A part of this corpus (about 15,000 words) was hand-tagged in order to train our part-of-speech tagger. The remainder is used for testing our system.
− The second text corpus, on “stock market news”, was provided by the company “Kapa-TEL” (http://www.kapatel.gr). The corpus contains news for the period from 1/97 until 4/98. The corpus size is about 85,000 words and it has not yet been used in our experiments.
− The third text corpus is a general-theme hand-tagged corpus, which was provided by the WCL (footnote 3) of Patras University for training our part-of-speech tagger. The size of that corpus is about 125,000 words.

Footnote 3. Wired Communications Laboratory, Dept. of Electrical and Computer Engineering, University of Patras, Greece

3.3 Tokeniser and Sentence Splitter

This module was implemented in Sicstus Prolog. The tokeniser accepts raw text as input and produces a list of tokens and their boundaries (byte offsets). An identifier is assigned to each token. The sentence splitter uses the tokens produced to generate a list of sentences. Each sentence is described by its span and the list of its constituents (identifiers of tokens). An identifier is also assigned to each sentence.

The tokeniser uses a set of rules in order to identify the tokens. Examples of these rules are presented below:
- a character string represents a token when one of the following characters occurs: , ‘, “, (, ), [, ], «, », , .
- a character string represents a token if after «.», either «’» or «”» occurs.

The sentence splitter also uses a set of rules. Examples of such rules are the following:
- the characters «!», «;», «?» always mark the end of a sentence.
- the occurrence of «.» or «:» marks the end of a sentence in certain cases.
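The data structures described above (tokens with identifiers and offsets, sentences with a span and a list of token identifiers) can be sketched in Python as follows. This is only an illustration and not the Sicstus Prolog module: the regular expression, the use of character offsets instead of byte offsets, and the assumption that «.» ends a sentence when it is the last token or is followed by a capitalised token are all simplifications introduced here, since the text does not specify the "certain cases".

```python
import re

# Assumed token rule: a run of word characters, or one punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
# Characters that always end a sentence according to the rules above.
ALWAYS_END = {"!", ";", "?"}


def tokenise(text):
    """Return tokens with an identifier and character offsets."""
    return [
        {"id": i, "text": m.group(), "start": m.start(), "end": m.end()}
        for i, m in enumerate(TOKEN_RE.finditer(text))
    ]


def split_sentences(tokens):
    """Group tokens into sentences described by span and token identifiers."""
    groups, current = [], []
    for pos, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[pos + 1]["text"] if pos + 1 < len(tokens) else None
        # Assumption: "." closes a sentence at end of text or before a capital.
        full_stop_ends = tok["text"] == "." and (nxt is None or nxt[0].isupper())
        if tok["text"] in ALWAYS_END or full_stop_ends:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return [
        {"id": i, "span": (g[0]["start"], g[-1]["end"]), "tokens": [t["id"] for t in g]}
        for i, g in enumerate(groups)
    ]


text = "Ο Σίμος έφυγε. Πού πήγε; Στην Ολλανδία!"
tokens = tokenise(text)
sentences = split_sentences(tokens)
```

Note that this simple token rule would split a name such as "Flash 96,1" at the comma; a real tokeniser would need further rules for numbers and abbreviations.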

3.4 Part-of-Speech Tagger: the Brill tagger

The Brill tagger is a rule-based part-of-speech tagger. It works by first assigning each word its most likely tag, and then changing word taggings based on contextual cues. There are two stages in training (see the Brill tagger README file at http://www.cs.jhu.edu/~brill/):

− Rules are learned to predict the most likely tag for unknown words (example: if a word ends in "ed", it is probably a past tense verb). These rules operate on word types: if the outcome of applying these rules is that a word should be tagged with a particular tag, this holds for all occurrences of the word in the corpus.
− Rules are learned to use contextual cues to improve tagging accuracy (example: change the tag of a word from verb to noun if the previous word is tagged as a determiner). These rules operate on individual word tokens.

A new set of part-of-speech tags was specified for the Greek language, containing 61 tags (the initial tagset for the English language contains 48 tags). We had to define new tags in order to take into account features such as gender for nouns and adjectives, number for adjectives and verbs, etc. We also had to decide whether to use more tags in order to represent more features of Greek words (e.g. cases for nouns, adjectives and verbs, mood for verbs, etc.). We finally decided to use a rather limited tagset for the Greek language (although larger than the English one) for efficiency reasons. Our intention is to combine the results of the Brill tagger with a Greek morphological analyser, in a way similar to the one used for the Italian NERC system in the ECRAN project, where the Brill tagger was used after the morphological analyser in order to resolve any ambiguities produced. We should also note that we decided to train the Brill tagger for the Greek language because we could not find a Greek lexicon that we could use. A rich lexicon would give better results without the need for hand tagging.
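How the two kinds of rules interact at tagging time can be sketched in Python using the two English examples given above. The sketch only applies hand-written rules; it does not learn them and it is not the Brill tagger code. The small lexicon, the tag names (DT, VB, VBD, NN, JJ) and the default tag for unknown words are assumptions made for illustration.

```python
# Hypothetical lexicon giving the most likely tag for known words.
LEXICON = {"the": "DT", "run": "VB", "was": "VBD", "long": "JJ"}


def initial_tag(word):
    """Stage 1: most likely tag, with a lexical rule for unknown words."""
    if word in LEXICON:
        return LEXICON[word]
    if word.endswith("ed"):
        return "VBD"  # unknown word ending in "ed": probably a past tense verb
    return "NN"  # assumed default for any other unknown word


def apply_contextual_rules(tags):
    """Stage 2: change verb to noun when the previous word is a determiner."""
    result = list(tags)
    for i in range(1, len(result)):
        if result[i] == "VB" and result[i - 1] == "DT":
            result[i] = "NN"
    return result


def tag(words):
    return apply_contextual_rules([initial_tag(w.lower()) for w in words])


tagged = tag(["The", "run", "was", "long"])
```

In the example, "run" is first tagged as a verb from the lexicon and is then retagged as a noun because it follows a determiner.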

The Brill tagger was trained on a part of the corpus on management succession events (about 15,000 words) that was hand-tagged with the specified Greek tags, and on the larger hand-tagged corpus provided by WCL (125,000 words). For that second training we had to create a “translator” in order to convert the tags used in the WCL corpus to our tagset. More specifically, the whole tagged corpus (15,000 + 125,000 = 140,000 words) was split into two corpora of equal size. In the first stage of training, the lexicon and the lexical rules were learned from the first corpus. The contextual cues were learned during the second stage.

A first evaluation was performed on a part of the corpus on “management succession events”. More specifically, the corpora for May 1998 and June 1998 (1/6 – 12/6/98) were processed by the trained Brill tagger. The number of words tagged correctly by the tagger over the total number of words in the corpus was computed (i.e. the recall of the tagger). The results for each of the two corpora and in total are presented in Table 2.

Corpus     Size (words)   Words wrongly tagged   Recall (%)
May 98         1211                54               95.5
June 98         544                29               94.7
Total          1755                83               95.3

Table 2. Evaluation of Greek Brill tagger
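The recall figures of Table 2 follow directly from the corpus sizes and error counts, as the short Python check below shows. The function and variable names are introduced here only for illustration.

```python
# (size in words, words wrongly tagged) per test corpus, taken from Table 2.
table2 = {"May 98": (1211, 54), "June 98": (544, 29)}


def tagging_recall(size, wrong):
    """Percentage of words tagged correctly out of all words in the corpus."""
    return 100.0 * (size - wrong) / size


recalls = {name: round(tagging_recall(*counts), 1) for name, counts in table2.items()}

# The total row is obtained by summing the two corpora.
total_size = sum(size for size, _ in table2.values())
total_wrong = sum(wrong for _, wrong in table2.values())
recalls["Total"] = round(tagging_recall(total_size, total_wrong), 1)
```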

The result is not very satisfactory, given that the tagger was trained on a rather large corpus. We still have to test whether we would obtain similar results if we used as training corpus only the 15,000-word corpus on “management succession” (i.e. the same domain as the test corpus). In other words, we have to examine how the behaviour of the Greek Brill tagger depends on the domain (of training and testing). In any case, these are just our first results. We still have to evaluate the Brill tagger on a larger corpus in order to obtain more reliable results.

Another test that should be performed is related to the training with the large general-theme WCL corpus. According to the training strategy followed, the corpus was split into two without taking into account its structure. The corpus covers many domains and the documents are grouped according to domain. A better splitting strategy would be to split each domain-specific group of documents into two, so that the Brill tagger takes into account all the domains covered.

3.5 Gazetteer Lookup

This module attempts to identify phrases and keywords related to named entities, as defined for the management succession task (persons, organisations, locations, dates). This is done by searching a series of pre-stored lists (gazetteers) of organisations, locations, date forms, currency names, etc. In order to use this module for the Greek language, we have to create Greek gazetteers. The gazetteers used so far contain some of the proper nouns identified by the Brill tagger, which were then classified by hand into the different types of named entities (mainly persons and organisations). More specifically, the current status of the gazetteer lists is the following (in number of entries): Persons: 842, Organisations: 475, Locations: 154, Titles: 107, Dates: 34, Company Designators: 19. Table 3 contains indicative samples of these lists.

Persons: Φραγκίσκος; Τράγκας; Τονιά; Σόφη; Σόνια; Σωτήρης; Σίσσυ; Σίμος; Σίλια; Σέργιος; Λία; Λένα; Ιφιγένεια; Ισίδωρος; Ι.; Ελισάβετ; Ελεονόρα; Ελεάνας; Ελεάνα

Organisations: Σκάι 100,4 FM; ΣΚΑΪ 100,4 FM; ΣΚΑΙ; Ποπ-Κορν; Ποπ&Ροκ; Ποπ Κορν; Μελωδία FM 100; Μελωδία; ΜΕΛΩΔΙΑ FM 100; Ι.Γ. Δραγούνης & Υιοί AE; Ι.Γ. Δραγούνης & Υιοί; Ι.Γ. Δραγούνης; Ι. Γ. Δραγούνης & Υιοί ΑΕ; Ι. Γ. Δραγούνης; Flash 96.1; Flash 96,1 FM; Flash 96,1; Flash 9,61 FM; Flash 9,61

Locations: Ουκρανίας; Ουγγαρίας; Ουγγαρία; Ολλανδίας; Ολλανδία; Νοτίου Ελλάδος

Titles: Σύμβουλος Media; Σύμβουλος Marketing; Πρόεδρος Διοικητικού Συμβουλίου; Διεύθυνση Στρατηγικού Σχεδιασμού; Διεύθυνση Επικοινωνίας; Διεύθυνση Διαφήμισης; Διεύθυνση Marketing; Senior Product Manager; Senior Media Planner

Table 3. Samples of Gazetteers
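A gazetteer lookup of this kind can be sketched in Python as a longest-match search over the token sequence. The sketch is an assumption about how such a lookup might work, not a description of the GATE module; the dictionary layout, the function names and the example sentence are hypothetical, while the list entries are taken from Table 3.

```python
# A few entries from Table 3, grouped by entity type.
GAZETTEERS = {
    "person": ["Σίμος", "Λένα"],
    "organisation": ["Μελωδία", "Μελωδία FM 100"],
    "location": ["Ολλανδία"],
    "title": ["Σύμβουλος Marketing"],
}


def build_index(gazetteers):
    """Map each entry, as a tuple of tokens, to its entity type."""
    index = {}
    for entity_type, names in gazetteers.items():
        for name in names:
            index[tuple(name.split())] = entity_type
    return index


def lookup(tokens, index):
    """Return (start, end, type) for the longest entry found at each position."""
    max_len = max(len(key) for key in index)
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                matches.append((i, i + n, index[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return matches


# Hypothetical example sentence, split on whitespace for simplicity.
tokens = "Η Λένα πήγε στον Μελωδία FM 100".split()
matches = lookup(tokens, build_index(GAZETTEERS))
```

Trying the longest candidate first means that "Μελωδία FM 100" is tagged as a single organisation instead of stopping at the shorter entry "Μελωδία". The match is exact, so any variation in spelling, script or inflection makes the lookup fail.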

The development of such lists for the domain of management succession presented a lot of problems. The names of persons and organisations in the relevant corpus are actually bilingual, that is, there are several English names, especially in the case of organisations (see Table 3). This means that we actually have to maintain two sets of gazetteers: one for Greek and one for English names. Unfortunately for our tests, this is not the only problem. We found that several of the names are bilingual in themselves, that is, they are composed of both English and Greek characters: Greek characters are used in English names and vice versa. Thus, although a name exists in a gazetteer, it is possible (and this occurs frequently) that it will not be found and therefore will not be tagged. Another problem concerns the different spellings of the same name, such as the name of the radio station “Flash” in Table 3, which occurs in 5 different forms in our training corpus. There are also the problems of the declension of Greek proper nouns (“Ελεάνα” in the nominative and “Ελεάνας” in the genitive) and of accented characters, which are not always used. We plan to develop a module that will be responsible for identifying and resolving these cases.

We also hope that we will soon be able to obtain gazetteers for Greek locations, names and organisations from the WCL laboratory. These gazetteers are the result of the EU-funded project ONOMASTICA, in which WCL participated. The integration of those gazetteers will significantly improve the results of the NERC system.

3.6 Named-Entity Chart Parser

The parser is a modification of the Gazdar and Mellish bottom-up chart parser [15]. It applies a named entity grammar to construct proper noun phrases. In the named entity grammar of the English IE system, there are 189 rules for organisations, persons, locations, temporal and number expressions. The following information is taken into account by the grammar: part-of-speech tags of the words in the NE and close to it, gazetteer tags for the words in the NE and close to it, and punctuation.

We had to create new rules for the Greek language, excluding most of the English rules (Fig. 3 shows the resulting parse tree for the named entities appearing in a sentence from the Greek corpus). This was due to the nature of the Greek language and of the specific corpus. For instance, several of the English rules for organisations are based on the existence of a company designator (e.g. Ltd., Co.), which is not used in the Greek corpus. There was only one case in the test corpus where such a designator was found (A.E., which means S.A.); unfortunately, although it was in the gazetteer list, it was written with English characters in the corpus and thus could not be matched against the list. Another example is the case of person names, where there are English rules based on a person title (e.g. Mr., Mrs). However, no such titles occur in the Greek corpus.

It seems that, in the case of the Greek corpus, the named entity rules should be based mainly on the existing gazetteer tags. A rich gazetteer is thus required in order to improve the results of the named entity parser. However, it is difficult to create and maintain such rich gazetteers. New empirical rules should be included in order to identify unclassified proper nouns as named entities and to classify them as persons, organisations, etc. To achieve this, we have to take into account the context of the proper nouns. For this purpose we performed some first tests, the results of which are presented below, introducing some new contextual rules. In most cases, as these first evaluation results also show, such rules can prove effective. However, we have to note that these rules actually reflect the writing style of the specific technical writer of the corpus.
A new technical writer may introduce a new style, cancelling those rules in practice or even introducing erroneous results. We need to run many tests implementing different scenarios with different sets of rules (this is actually the task of the second year of GIE).
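The mixed-script and accent problems described above, such as the designator A.E. typed with English characters, could be reduced by normalising both the gazetteer entries and the text before matching. The Python sketch below shows one possible normalisation; it is a hypothetical illustration and not the module planned in GIE. The lookalike table is an assumption, and the approach does not address declension or the different numeric spellings of a name.

```python
import unicodedata

# Assumed table of Latin capitals that look like Greek capitals.
# Escapes are used because the two scripts are visually indistinguishable.
LATIN_TO_GREEK = str.maketrans({
    "A": "\u0391", "B": "\u0392", "E": "\u0395", "Z": "\u0396",
    "H": "\u0397", "I": "\u0399", "K": "\u039a", "M": "\u039c",
    "N": "\u039d", "O": "\u039f", "P": "\u03a1", "T": "\u03a4",
    "Y": "\u03a5", "X": "\u03a7",
})


def normalise(name):
    """Return a matching key: upper case, no accents, lookalikes unified."""
    decomposed = unicodedata.normalize("NFD", name.upper())
    # Drop combining marks (accents, diaeresis) left by the decomposition.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.translate(LATIN_TO_GREEK)


# The same designator typed with Latin and with Greek characters.
latin_form = "A.E."
greek_form = "\u0391.\u0395."
same_key = normalise(latin_form) == normalise(greek_form)
```

Because the key maps every lookalike Latin capital to its Greek counterpart, genuinely English names are also rewritten. This is harmless as long as the same function is applied to both the gazetteer and the text, but the key should be used only for matching and not for display.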

Figure 3. The parse tree for the named entities

We present below the results of two evaluations for two types of named entities (persons, organisations). These evaluations were performed on parts of the corpus on management succession events that are different from the parts used for training (i.e. evaluation on unseen data). In the first evaluation (Table 4(a,b,c)), all the named-entity rules depend on the gazetteer tags, whereas in the second evaluation (Table 5) a few contextual rules for persons were added in two stages (Rule 1, Rule 2). In both evaluations, we compute the recall and precision figures.

(a) Named entities
Corpus     Named Entities (words)   Tagged (words)   Correctly Tagged (words)   Recall (%)   Precision (%)
Jan. 98            187                   117                  112                   60            96
May 98             144                   126                  121                   84            96
June 98             79                    68                   68                   86           100
Total              410                   311                  301                   73            97

(b) Persons
Corpus     Persons (words)   Tagged (words)   Correctly Tagged (words)   Recall (%)   Precision (%)
Total            156              118                  109                   70            92

(c) Organisations
Corpus     Organisations (words)   Tagged (words)   Correctly Tagged (words)   Recall (%)   Precision (%)
Total              254                  193                  192                   76            99

Table 4(a),(b),(c). First evaluation results of NERC

Corpus     Persons (words)   Tagged (words)   Correctly Tagged (words)   Recall (%)   Precision (%)
Rule 1           156              129                  120                   77            93
Rule 2           156              145                  136                   87            94

Table 5. Second evaluation results of NERC

As we can see in Table 4(a), there are differences in the recall figures for named entities in the three corpora examined. In particular, the low recall for the January 98 corpus is due to the large number of names of persons and organisations that are not covered by the gazetteer lists. On the other hand, the precision figures are high, since the parser is based mainly on the gazetteer tags and the tags for proper nouns, which reduces the number of noun phrases wrongly tagged as named entities.

Concerning the second evaluation for persons (Table 5), there is a significant increase in recall with the addition of two sets of rules in two stages. The first set of rules is not corpus dependent but rather language dependent. However, the larger increase (10 percentage points) occurs in the second stage, where the added set of rules is corpus dependent. The second set of rules represents a writing style that seems common for this type of company news (management succession events). However, it is possible that a new technical writer would use a different style, reducing the positive effect of these rules in practice. In any case, these figures are just our first results. We believe that by the end of GIE we will be able to present more reliable results from evaluations on larger corpora and using more extensive resources (i.e. lexicon, gazetteer lists).

4. Concluding Remarks

The GIE project represents the first effort to develop an IE system (actually a named-entity recogniser) for the Greek language. We strongly believe that IE technology is important for the Greek market too, a view supported by our discussions with people working in Greek industry. There is a need for reliable Greek IE technology in several domains, such as the stock market and company news. This small project is the first step towards the development of such a technology for the Greek language.

Several actions still need to be taken concerning Greek linguistic resources. A rich lexicon is an essential part of an IE task, in order to support not only the morphological analysis task but also the parsing and discourse processing tasks in a complete IE system.

Customisation to new domains is another issue; in fact, it is the major issue in IE technology. Research effort is mainly devoted to making IE more adaptable to new domains without the huge human labour overhead of constructing new templates by hand for each change of domain. The experience of our laboratory in the development of customisation tools for the needs of the ECRAN project can also be useful for the Greek language [16]. For this purpose we submitted, together with the French company CDC-Informatique, a proposal in the context of the French-Greek bilateral scientific cooperation programme for a project aiming at the development of customisation tools for IE for the French and Greek languages. The funding of such a project will allow us to better exploit the results of the GIE project towards the development of a more complete IE system and a set of customisation tools for the Greek language.

References

1. Cowie J., Wakao T., Jin W., Pustejovsky J. and Waterman S. The Diderot information extraction system. In Proceedings of the First Conference of the Pacific Association for Computational Linguistics (PACLING 93), Vancouver, Canada, 1993.
2. Jacobs P.S. and Rau L.F. Scisor: Extracting information from on-line news. Communications of the ACM, 33(11):88-97, 1990.
3. Wilks Y. Diderot: a text extraction system. In DARPA Speech and Natural Language Workshop. Morgan Kaufmann, San Mateo, CA, 1991.
4. Vichot F., Wolinski F., Tomeh J., Guennou S., Dillet B. and Aydjian S. High Precision Hypertext Navigation Based on NLP Automatic Extractions. Hypertext, Information Retrieval, Multimedia (HIM'97), Dortmund, Germany, (30):161-174, October 1997.
5. Andersen P.M., Hayes P.J., Huettner A.K., Nirenburg I.B., Schmandt L.M. and Weinstein S.P. Automatic extraction of facts from press releases to generate news stories. In Proceedings of the Third Conference on Applied Natural Language Processing, pages 170-177. ACL, 1992.
6. ECRAN: Extraction of Content: Research at Near Market, http://www2.echo.lu/langeneg/en/le1/ecran/ecran.html
7. MUC5, 1993. Proceedings of the Fifth Message Understanding Conference, San Francisco, Calif.: Morgan Kaufmann.
8. MUC6, 1995. Proceedings of the Sixth Message Understanding Conference, San Francisco, Calif.: Morgan Kaufmann.
9. DARPA Speech and Natural Language Workshop, Harriman, NY, 1992.
10. AVENTINUS: Advanced Information System for Multinational Drug Enforcement. http://www2.echo.lu/langeneg/en/le1/aventinus/aventinus.html
11. Evans R. and Hartley A.F. The traffic information collator. Expert Systems: The International Journal of Knowledge Engineering, 7(4):209-214, 1990.
12. Gaizauskas R., Evans R., Cahill L.J., Richardson J. and Walker J. Poetic: A system for gathering and disseminating traffic information. In S.G. Ritchie and G.T. Hendrickson, editors, Conference Preprints of the International Conference on Artificial Intelligence Applications in Transportation Engineering, pages 79-98, San Buenaventura, California, 1992.
13. Gaizauskas R. and Wilks Y. «Information Extraction beyond Document Retrieval». University of Sheffield, Dept. of Computer Science, CS-97-10, 1997.
14. Cunningham H., Wilks Y. and Gaizauskas R. GATE - a General Architecture for Text Engineering. 16th Conference on Computational Linguistics (COLING'96), 274-279, 1996.
15. Gazdar G. and Mellish C. Natural Language Processing in Prolog. Addison-Wesley, 1989.
16. Paliouras G., Karkaletsis V. and Spyropoulos C.D. "Machine Learning for Domain-Adaptive Word Sense Disambiguation". Proceedings of the LREC Workshop on "Adapting Lexical and Corpus Resources to Sublanguages and Applications", Granada, Spain, May 26, 1998.