ECAI 2015 - International Conference, 7th Edition: Electronics, Computers and Artificial Intelligence, 25-27 June 2015, Bucharest, Romania

Automatic lexico-syntactic classification of noun-adjective relations for Romanian language

Marilena Lazăr
Military Equipment and Technologies Research Agency, Bucharest, Romania
[email protected]

Diana Militaru
University Politehnica of Bucharest, Bucharest, Romania
[email protected]

Abstract – Natural language processing has become one of the most important fields of artificial intelligence because it underlies human-computer interaction in human languages (natural language generation, question answering, machine translation, etc.) and speech understanding (language modeling). Modeling the relations between words requires finding the syntactic and semantic relations between them. Starting from the property/attribute-holder relation between nouns and adjectives, extracted from the Romanian translation of Orwell's novel "Nineteen Eighty-Four" (part of MULTEXT-East [1]), this paper presents the results of automatic lexico-syntactic classification using three classification methods: decision trees, k nearest neighbors and naïve Bayes.

Keywords: lexico-syntactic pattern, decision trees, naïve Bayes, k-NN, Romanian language

I. INTRODUCTION

As the amount of electronic documents (corpora, dictionaries, newspapers, newswires, etc.) grows and diversifies, there is a need to extract information from these texts automatically. Natural language is complex: the number of perfectly intelligible sentences and phrases is infinite. Interpreting these sentences and phrases correctly requires various types of linguistic information, combining syntactic, semantic and pragmatic knowledge to understand the grammatical constructions. In the context of the semantic interpretation of sentences and phrases within a large text corpus, the syntactic and semantic properties of their components play an important role. The meaning of a collocation is determined both by the meaning of the component words and by the way the words are ordered and grouped together. Thus, discovering semantic roles and relations between word meanings is needed for deep textual understanding.

Recently, a wide variety of research has focused on methods for automatically detecting semantic roles in open text for English [2], [3], [4], [5], [6], [7], [8], [9]. Research has also addressed the identification and detection of semantic relations. For example, methods used for the discovery of the part-whole relation include pure probabilistic models [10], decision trees [11], and hand-coded rules or constraints [12]. Methods for detecting semantic relations based on pattern extraction and matching have been used to extract hypernymy relations [13], meronymic relations [14], noun compounds [15] and other semantic relations. In [16] the syntactic and semantic properties of prepositions are investigated in the context of the semantic interpretation of nominal phrases (of the type Noun Preposition Noun) and compounds (Noun Noun) for English and the Romance languages (French, Spanish, Italian, Portuguese, and Romanian). Many methods have been developed for the automatic labeling of semantic relations in English noun phrases; in [23] the authors found that the most frequently occurring noun relations were part-whole, attribute-holder, possession, location, source, topic, and theme.

In this paper we focus on the detection of relations between nouns and adjectives encoded by lexico-syntactic patterns in the Romanian language. For their classification we used decision trees, k nearest neighbors (k-NN) and naïve Bayes with different morphosyntactic feature sets.

II. CLASSIFICATION METHODS

A. Decision Trees

Decision trees are tools frequently used in decision making and categorization problems. They learn by the divide-and-conquer method, constructing a tree from a training dataset. The goal of decision tree learning is to create a hierarchical tree structure that predicts the value of a target variable from several input variables. Decision trees are frequently used in natural language processing because they require little training data preparation, handle both categorical and numerical data, and perform well on large datasets in a short time. Decision trees divide the given dataset into successively smaller subsets. They are based on conditional probabilities and generate rules that can easily be understood and interpreted by humans.
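A split criterion makes the divide-and-conquer idea concrete. The sketch below computes the entropy-based information gain of each candidate attribute on a toy dataset; the features and class labels are invented purely for illustration.

```python
# A minimal sketch of the information-gain criterion used to pick the
# splitting attribute in entropy-based decision-tree learning.
# The tiny dataset below is invented for illustration.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    base = entropy(labels)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return base - remainder

# Toy data: (gender, position of adjective) -> relation class.
rows = [("f", "after"), ("f", "after"), ("m", "before"), ("m", "after")]
labels = ["property", "property", "location", "property"]

# Gain for each attribute; the learner would split on the higher one.
g0 = information_gain(rows, labels, 0)  # gender
g1 = information_gain(rows, labels, 1)  # position
print(round(g0, 3), round(g1, 3))  # -> 0.311 0.811
```

Here "position" separates the classes perfectly, so its gain equals the full entropy of the label set and it would be chosen as the split attribute.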


The best-known decision tree algorithm is C4.5 [17], which builds trees by choosing, at each step, the best variable on which to split the dataset. C4.5 selects this variable using information gain, a measure based on the concept of entropy.

B. K Nearest Neighbors

K nearest neighbors (k-NN) is one of the most widely used learning and classification methods in machine learning. Introduced by Fix and Hodges [18], k-NN is known as a powerful yet simple classification algorithm. The k-NN classifier is an instance-based learning algorithm that has proved effective for problems such as pattern recognition. To classify a new instance, k-NN finds its k nearest neighbors in the training dataset using a similarity metric (Euclidean distance, Minkowski distance, etc.). The Euclidean distance between two points p and q, commonly used by k-NN as a similarity measure, is given by:

d(p, q) = √(∑_{i=1}^{n} (q_i − p_i)²)   (1)
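The neighbor vote based on Eq. (1) can be sketched from scratch as follows; the feature vectors and class labels below are invented for illustration.

```python
# A from-scratch sketch of k-NN classification using the Euclidean
# distance of Eq. (1); the numeric feature vectors are invented.
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train, query, k=3):
    """Return the majority class among the k training points nearest to query."""
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy training set: (feature vector, class label).
train = [
    ((0.0, 0.0), "property"),
    ((0.1, 0.2), "property"),
    ((1.0, 1.0), "location"),
    ((0.9, 1.1), "location"),
]

print(knn_predict(train, (0.2, 0.1), k=3))  # -> property
```

With k = 3 the two nearest "property" points outvote the single "location" neighbor, which is exactly the majority rule the classifier applies.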

The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features [20].

C. Naïve Bayes

The naïve Bayes method is based on Bayesian inference [19] and assumes that, given the class, the attributes are independent. Although the independence assumption is not realistic, this classifier works very effectively when tested on actual datasets. Its performance improves when attribute selection procedures are used to eliminate redundant attributes. The results degrade if a particular attribute value never occurs in the training set in conjunction with some class value [21]. Bayes classifiers make decisions based on Bayes' theorem: they assign to a given instance d the most probable class c_NB from a finite set of classes C = (c₁, c₂, ..., c_n) according to:

c_NB = argmax_{c_j ∈ C} P(c_j) ∏_{i=1}^{n} P(d_i | c_j)   (2)

Naïve Bayes classifiers have several advantages: they are fast to train and to classify with, they are relatively insensitive to irrelevant features, and they handle both real-valued and discrete data well. Their major disadvantage is precisely the assumption of independence between attributes (features).
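Eq. (2) can be sketched directly in code. The toy instances below are invented; add-one (Laplace) smoothing is our addition here, one standard remedy for the zero-count problem noted above.

```python
# A minimal sketch of Eq. (2): pick the class maximizing
# P(c) * prod_i P(d_i | c), with add-one smoothing so that an unseen
# attribute value does not zero out the product. Toy data only.
from collections import Counter, defaultdict

def train_nb(instances, labels):
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for feats, lab in zip(instances, labels):
        for i, v in enumerate(feats):
            feat_counts[(lab, i)][v] += 1
    return class_counts, feat_counts

def predict_nb(class_counts, feat_counts, feats, vocab_size=2):
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for c, n in class_counts.items():
        p = n / total  # prior P(c)
        for i, v in enumerate(feats):
            p *= (feat_counts[(c, i)][v] + 1) / (n + vocab_size)  # smoothed P(d_i|c)
        if p > best_p:
            best, best_p = c, p
    return best

instances = [("f", "after"), ("f", "after"), ("m", "before"), ("m", "after")]
labels = ["property", "property", "location", "property"]
model = train_nb(instances, labels)
print(predict_nb(*model, ("f", "after")))  # -> property
```

The independence assumption shows up as the plain product over features; no joint counts of feature combinations are ever stored.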

III. ROMANIAN NOUN-ADJECTIVE CONSTRUCTIONS

Understanding natural language relies on considerable knowledge about the structure of the language: identifying words, the way they combine into sentences and phrases, the meanings of words, how they contribute to the meaning of phrases, what kind of action the speaker intends, and so on. The constituents of a sentence do not always reveal its meaning: a sentence can be correctly built syntactically and semantically and still be ambiguous. The meaning of a sentence depends on the context in which it is used. Natural language contains a very large number of words and collocations that change their meaning in specific contexts. Many such relations are encoded as semantic relations between a verb and its associated noun, between nouns and their determining adjectives, and so on.

Noun-adjective constructions encode one of the most frequently used semantic relations in the Romanian language: the property/attribute-holder relation. For this paper we chose the property and attribute-holder semantic relation as defined by Levi [22]: X is a characteristic/quality of an entity/event/state Y. We tried to identify, for Romanian, the lexico-syntactic patterns that encode this relation.

Romanian adjectives behave differently from English adjectives. In Romanian, adjectives agree with nouns in gender, number and case. Their typical position in a sentence is after the determined noun, not in front of it as in English. Whenever the adjective is placed in front of the noun, the intention is purely emphatic; in that case adjectives can be declined in the nominative/accusative and genitive/dative cases and can take the enclitic definite article. Because most Romanian adjectives are variable, having different forms according to the determined noun and the context, a large number of lexico-syntactic patterns encode a property/attribute-holder relation. In general, semantic relations are encoded by ambiguous lexico-syntactic patterns; when one pattern can encode more than one semantic relation, its disambiguation is made using the context. For example, the expression "podișul dobrogean" (the Dobrogea plateau) contains both a property and a location relation.
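Agreement-based patterns like those above can be matched directly on MULTEXT-East MSD tags (e.g. "Ncfsrn Afpfsrn" in Table I). The sketch below checks that a common noun is followed by a qualifying adjective agreeing in gender and number; the fixed tag offsets follow the MULTEXT-East positional encoding, and the check is a deliberate simplification of full Romanian declension.

```python
# A simplified sketch of matching a noun-adjective lexico-syntactic
# pattern on MULTEXT-East MSD tags: common noun (Nc...) followed by a
# qualifying positive adjective (Afp...) agreeing in gender and number.
# Gender/number sit at offsets 2-3 for nouns and 3-5 for adjectives in
# this encoding; real patterns would also check case and the article.

def is_noun_adj_pattern(msd_noun, msd_adj):
    if not (msd_noun.startswith("Nc") and msd_adj.startswith("Afp")):
        return False
    # Compare the gender+number slice of each tag.
    return msd_noun[2:4] == msd_adj[3:5]

# Examples from Table I: "zi senină" agrees; a masculine adjective does not.
print(is_noun_adj_pattern("Ncfsrn", "Afpfsrn"))  # -> True
print(is_noun_adj_pattern("Ncfsrn", "Afpms-n"))  # -> False
```

Running such a matcher over a tagged corpus is one way to collect candidate constructions before grouping them by lexical similarity, as done in the next section.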

IV. EXPERIMENTS AND RESULTS

Our first goal was to detect the lexico-syntactic patterns that encode a property/attribute-holder relation in Romanian. To discover these patterns we used the C4.5 decision tree, naïve Bayes, and k-NN classifiers implemented in WEKA [21]. The patterns were extracted from the Romanian translation of Orwell's novel "Nineteen Eighty-Four" [1]. The noun-adjective constructions extracted from the training corpus were grouped according to the similarity of their lexical material. Some of these patterns are presented in Table I. In total, 75 types of lexico-syntactic patterns that encode a property/attribute-holder relation were identified.

TABLE I. EXAMPLES OF LEXICO-SYNTACTIC PATTERNS BETWEEN NOUNS AND ADJECTIVES THAT ENCODE A PROPERTY RELATION (FROM MULTEXT-EAST [1])

Noun-adjective example | English translation | MSDs
zi senină | a bright day | Ncfsrn Afpfsrn
apa rece | the cold water | Ncfsry Afpfsrn
trăsături frumoase | handsome features | Ncfp-n Afpfp-n
lumii întregi | the entire world | Ncfsoy Afpfson
întregii lumi | the entire world | Afpfsoy Ncfson
imense câmpii | endless plain | Afpfson Ncfson
fum negru | black smoke | Ncms-n Afpms-n
paharul gol | an empty glass | Ncmsry Afpms-n
ochii albaştri | blue eyes | Ncmpry Afpmp-n
vânt mai rece | colder wind | Ncms-n Rp Afpfsrn
cel mai mare duşman | the worst enemy | Tdmsr Rp Afpms-n Ncms-n

TABLE II. EXPERIMENTAL RESULTS FOR THE CLASSIFICATION OF NOUN-ADJECTIVE RELATIONS

Classifier | Evaluation method | Correctly classified instances (%)
C4.5 | Training set (1 min. instance/leaf) | 99.882
C4.5 | Training set (2 min. instances/leaf) | 99.881
C4.5 | Cross validation (1 min. instance/leaf) | 99.881
C4.5 | Cross validation (2 min. instances/leaf) | 99.822
k-NN | Training set (1 nearest neighbor) | 100.000
k-NN | Training set (3 nearest neighbors) | 99.674
k-NN | Cross validation (1 nearest neighbor) | 99.644
k-NN | Cross validation (3 nearest neighbors) | 99.496
Naïve Bayes | Training set | 99.496
Naïve Bayes | Cross validation | 98.786

The second goal of our work was to determine which combination of linguistic features obtains the best classification result. Because the Romanian adjective has different forms for gender, number and case, we wanted to compare classification results for different feature combinations. Our first approach was to use a set of linguistic features (Table II) formed by:

- the number, gender, morphosyntactic descriptions, and instances of the nouns and adjectives, and
- the morphosyntactic descriptions and instances of the link words.

In this case the best accuracy, 99.882%, was obtained with the C4.5 algorithm evaluated on the training set with a minimum of one instance per leaf. The resulting decision tree had 167 leaves and a size of 194.

The second approach was to observe how adding different frequencies changes the classification results. For this case we selected a base feature set and then formed two additional feature sets using the normalized frequency of words (nouns, adjectives and link words) and, respectively, the normalized frequency of morphosyntactic descriptions. These feature sets were:

- base feature set: morphosyntactic descriptions and instances for nouns, adjectives and link words;
- additional feature set 1: normalized frequency of words; and
- additional feature set 2: normalized frequency of words and normalized frequency of morphosyntactic descriptions for nouns and adjectives.

We imposed restrictions on the textual distance between the noun and the adjective and extracted only the relations that contain no more than two link words. We then grouped these patterns into 48 classes based on semantic criteria, manually annotated. The total number of instances was 3,378.

For the base feature set, the decision tree obtained has a size of 152 and 130 leaves. Adding feature set 1 to the base feature set yields a tree of size 158 with 136 leaves. In the last case, adding feature set 2 to the base feature set and feature set 1, the tree is the largest, with a size of 166 and 140 leaves. The results obtained for these feature sets are listed in Table III.
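The "Cross validation" figures in Tables II and III come from WEKA's k-fold protocol (10 folds by default; the paper does not state the fold count). A from-scratch sketch of that protocol follows, with a trivial majority-class "classifier" standing in for C4.5, k-NN and naïve Bayes; all names and data below are placeholders for illustration.

```python
# A from-scratch sketch of k-fold cross-validation: each fold is held
# out in turn, the model is trained on the rest, and accuracy is the
# fraction of held-out instances classified correctly.
from collections import Counter

def cross_val_accuracy(instances, labels, train_fn, predict_fn, folds=10):
    n = len(instances)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))  # every `folds`-th instance held out
        train_X = [x for i, x in enumerate(instances) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_fn(train_X, train_y)
        for i in test_idx:
            correct += predict_fn(model, instances[i]) == labels[i]
    return correct / n

# Majority-class baseline standing in for a real classifier.
train_fn = lambda X, y: Counter(y).most_common(1)[0][0]
predict_fn = lambda model, x: model

labels = ["property"] * 9 + ["location"]
instances = list(range(10))
print(cross_val_accuracy(instances, labels, train_fn, predict_fn, folds=10))  # -> 0.9
```

Every instance is used for testing exactly once, which is why cross-validation accuracies in the tables sit below the corresponding training-set accuracies.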

TABLE III. EXPERIMENTAL RESULTS FOR THE CLASSIFICATION WITH THE SELECTED FEATURE SETS
Correctly classified instances (%)

Classifier | Evaluation method | Base features | Base + feature set 1 | Base + feature set 2
C4.5 | Training set (1 instance/leaf) | 99.881 | 99.911 | 99.940
C4.5 | Training set (2 instances/leaf) | 99.881 | 99.911 | 99.940
C4.5 | Cross validation (1 instance/leaf) | 99.881 | 99.851 | 99.911
C4.5 | Cross validation (2 instances/leaf) | 99.822 | 99.851 | 99.792
k-NN | Training set (1 nearest neighbor) | 100.000 | 100.000 | 100.000
k-NN | Training set (3 nearest neighbors) | 99.585 | 98.540 | 99.289
k-NN | Cross validation (1 nearest neighbor) | 99.526 | 96.653 | 98.193
k-NN | Cross validation (3 nearest neighbors) | 99.348 | 97.246 | 98.637
Naïve Bayes | Training set | 99.407 | 98.104 | 97.216
Naïve Bayes | Cross validation | 98.371 | 96.150 | 95.943


A. The decision tree classifier

With cross validation, we started from an accuracy of 99.822% for the base feature set and obtained 99.851% for feature set 1 and 99.792% for feature set 2. The best result in this case is therefore the one for feature set 1.

B. The naïve Bayes classifier

For the naïve Bayes classifier with cross validation, the best result, 98.371%, was obtained for the base feature set. The results obtained for feature set 1 (96.150%) and feature set 2 (95.943%) are lower than those for the base feature set.

C. The k nearest neighbors classifier

For the k nearest neighbors classifier the results differ between k = 1 and k = 3. For k = 1 the best result was the one for the base feature set: adding frequencies decreased the cross-validation accuracy from 99.526% for the base feature set to 96.653% for feature set 1 and 98.193% for feature set 2. For k = 3 the situation is the same: adding frequencies decreased the accuracy from 99.348% for the base feature set to 97.246% for feature set 1 and 98.637% for feature set 2.

V. CONCLUSIONS

In this paper we presented classification results for noun-adjective constructions that encode a property/attribute-holder semantic relation, extracted from the Romanian translation of Orwell's novel "Nineteen Eighty-Four". Initially, we identified 75 types of lexico-syntactic patterns that encode a property/attribute-holder relation. These patterns were then grouped into 48 manually annotated semantic classes; the total number of training instances was 3,378. For their classification we used decision trees, k nearest neighbors and naïve Bayes.

The best cross-validation results were obtained with decision trees, specifically with the base feature set plus feature set 1 (normalized frequency of words): 99.851%. For the other classifiers (k-NN and naïve Bayes) the best results were those for the base feature set: 99.526% and 98.371%, respectively. The normalized frequency of the morphosyntactic descriptions did not improve the results for any classification method.

To evaluate our results we compared them with [23], which presents a method for the automatic labeling of semantic relations in noun phrases. For the semantic classification of English noun phrase patterns formed by adjective-noun constructions, extracted from the Wall Street Journal corpus, the authors obtained 33.33% accuracy using decision trees and 40% using naïve Bayes.

In the future we will study other syntactic patterns, such as noun compounds, to observe what kinds of semantic relations they encode, and we will try other feature sets to determine which are most important for classifying the patterns studied.

REFERENCES

[1] T. Erjavec, "MULTEXT-East Version 4: Multilingual Morphosyntactic Specifications, Lexicons and Corpora", Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Malta, 2010.
[2] D. Gildea, D. Jurafsky, "Automatic labeling of semantic roles", Computational Linguistics, 28(3):245-288, 2002.
[3] D. Gildea, M. Palmer, "The necessity of syntactic parsing for predicate argument recognition", Proceedings of ACL-02, pp. 239-246, Philadelphia, PA, 2002.
[4] D. Gildea, J. Hockenmaier, "Identifying semantic roles using combinatory categorial grammar", Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 57-64, Sapporo, Japan, 2003.
[5] M. Fleischman, N. Kwon, E. Hovy, "Maximum entropy models for FrameNet classification", Proceedings of EMNLP 2003, Sapporo, Japan, 2003.
[6] J.H. Lim, Y.S. Hwang, S.Y. Park, H.C. Rim, "Semantic role labeling using maximum entropy model", CoNLL-2004, 2004.
[7] D. Moldovan, R. Girju, M. Olteanu, O. Fortu, "SVM classification for FrameNet semantic roles", Senseval-3, 2004.
[8] M. Surdeanu, S. Harabagiu, J. Williams, P. Aarseth, "Using predicate-argument structures for information extraction", Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 8-15, 2003.
[9] U. Baldewein, K. Erk, S. Pado, D. Prescher, "Semantic role labelling with chunk sequences", CoNLL-2004, 2004.
[10] M. Berland, E. Charniak, "Finding parts in very large corpora", Proceedings of ACL, 1999.
[11] R. Girju, A. Badulescu, D. Moldovan, "Learning semantic constraints for the automatic discovery of part-whole relations", Proceedings of the Human Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, 2003.
[12] R. Girju, "Text Mining for Semantic Relations", PhD dissertation, University of Texas at Dallas, May 2002.
[13] M.A. Hearst, "Automatic acquisition of hyponyms from large text corpora", Proceedings of the 14th International Conference on Computational Linguistics, pp. 539-545, Nantes, France, 1992.
[14] R. Girju, A. Badulescu, D. Moldovan, "Automatic discovery of part-whole relations", Computational Linguistics, 32(1):83-135, 2006.
[15] B. Rosario, M. Hearst, "Classifying the semantic relations in noun compounds via a domain-specific lexical hierarchy", Proceedings of EMNLP, 2001.
[16] R. Girju, "The syntax and semantics of prepositions in the task of automatic interpretation of nominal phrases and compounds: a cross-linguistic study", Computational Linguistics, 35(2):185-228, 2008.
[17] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[18] E. Fix, J. Hodges, "Discriminatory analysis. Nonparametric discrimination: consistency properties", Technical Report 4, USAF School of Aviation Medicine, Randolph Field, Texas, 1951.
[19] T. Bayes, "An essay towards solving a problem in the doctrine of chances", Vol. 53; reprinted in Facsimiles of Two Papers by Bayes, Hafner Publishing Company, New York, 1963.
[20] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[21] I.H. Witten, E. Frank, M.A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., Morgan Kaufmann, 2011.
[22] J. Levi, The Syntax and Semantics of Complex Nominals, Academic Press, New York, 1979.
[23] D. Moldovan, A. Badulescu, M. Tatu, D. Antohe, R. Girju, "Models for the semantic classification of noun phrases", Computational Lexical Semantics Workshop, Human Language Technology Conference, Boston, USA, May 2004.
539-545, Nantes, France, 1992 R. Girju, A. Badulescu, D. Moldovan, “Automatic Discovery of Part-Whole Relations“, in Computational Linguistics, Volume 32, No. 1, pp. 83-135, 2006 B. Rosario, M. Hearst, “Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy“, Proceedings of EMNLP, 2001 R. Girju, “The Syntax and Semantics of Prepositions in the Task of Automatic Interpretation of Nominal Phrases and Compounds: A Cross-Linguistic Study“, in Computational Linguistics, Volume 35, Number 2, pp. 185-228, 2008 J.R. Quinlan, C4.5: Programs for machine learning. Morgan Kaufmann Publishers, 1993 E. Fix, J. Hodges, “Discriminatory analysis. Nonparametric discrimination: Consistency properties“, Technical Report 4, USAF School of Aviation Medicine, Randolph Field, Texas, 1951 T. Bayes, “An essay toward solving a problem in the doctrine of chances“, Vol. 53. Reprinted in Facsimiles of two papers by Bayes, Hafner Publishing Company, New York, 1963 R. O. Duda, P. E. Hart, Pattern classification and scene analysis, New York: Wiley & Sons, 1973 I. H. Witten, E. Frank, A. M. Hall, Data mining: Practical machine learning tools and techniques, third edition. Morgan Kaufmann Publishers, 2011 J. Levi, The syntax and semantics of complex nominals. New York: Academic Press, 1979 D. Moldovan, A. Badulescu, M. Tatu, D. Antohe, R. Girju, “Models for the Semantic Classification of Noun Phrases“, Computational Lexical Semantics Workshop, Human language Technology Conference, Boston, USA, May 2004