Recognition of Personality Traits using Meta Classifiers

Firoj Alam, Giuseppe Riccardi, Shammur Absar Chowdhury
Department of Information Engineering and Computer Science, University of Trento
{alam,riccardi,sachowdhury}@disi.unitn.it

Abstract
In this paper we address the recognition of human personality traits using meta classifiers. We use SMO (Sequential Minimal Optimization for Support Vector Machines), RF (Random Forest) and Adaboost as the three main algorithms to design our meta classifiers. Following the Interspeech 2012 Speaker Trait Challenge guidelines, we evaluate the system in terms of weighted and unweighted average measures. The challenge organizers provided the Speaker Personality Corpus (SPC), which we used to design our meta classifiers and to measure the accuracy of the system.
Index Terms: personality trait prediction, meta classifier.

1. Introduction
Personality is among the most complex of human attributes, and it characterizes the uniqueness of a person. Understanding human personality and its impact on human behavior has been a long-term goal for psychologists. Behavior involves an interaction between a person's underlying personality and situational variables: the situation that a person finds himself or herself in plays a major role in how the person reacts, yet in most cases people respond in line with their underlying personality traits. Over time, this area has attracted researchers from different fields, especially human-machine interaction and behavioral analytics. Progress in understanding human personality offers a clear advantage in the design of human-machine spoken interaction: Nass [1] and Bickmore [2] suggest that the naturalness and efficiency of interaction with a machine, as perceived by a user, increase when the machine matches the user's personality. Psychologists have studied this topic for a very long time, while speech and language scientists have recently obtained some interesting results [3, 4, 13]. Studies have shown how the style of communication in emails, blog entries [11], etc. depends on the author's personality; even the choice of particular parts of speech depends on personality [12]. The role of linguistic cues in spoken and textual communication has also been compared [3]. Apart from recognizing personality traits, there is also a need to synthesize personable voices in rich human-computer interactive systems, which can be achieved by synthesizing voices with personality traits. Compared to the psychological studies, very little work has been done on recognizing personality traits from speech, which is another motivation for this work. Most machine learning algorithms do not perform well at recognizing multiple personality traits: some

perform well on one trait but poorly on others. Our approach is therefore to combine classifiers. Combining classifiers is a well-known technique in data mining, text categorization [15, 16], information extraction, named entity recognition [14] and spoken language understanding [10]. Since it has been successful in these fields, we experimented with two different approaches: 1) majority voting and 2) meta classifiers. This paper describes an attempt to determine human personality traits from the acoustic data provided by the challenge organizers [5]. We use meta classifiers and majority voting for classification, and the openSMILE tool [6] for feature extraction. For classification, we use SMO (Sequential Minimal Optimization for Support Vector Machines), RF (Random Forest) and Adaboost as base and meta classifiers. Details of the architectural design of the system are given later in the paper. The system is evaluated on the SPC corpus. The paper is organized as follows: Section 2 contains the task description; Section 3 describes the design of the system; details of the experiments and results are given in Section 4, followed by conclusions and future work in Section 5.

2. The Task Description
The goal of the Interspeech 2012 Speaker Trait Challenge [5] is to determine the personality traits of a speaker along the five dimensions of OCEAN [2], the most widely used personality trait model. OCEAN describes human personality as a vector of five values corresponding to bipolar traits. It is a popular model among language and computer science researchers, as it serves as a framework for both personality trait identification and simulation. OCEAN is the abbreviated form of the Big-Five personality traits, which can be summarized as follows:
Openness to experience: An appreciation for art, emotion, adventure and varied experience. It estimates the degree to which a person considers new ideas and integrates new experiences into everyday life. High scorers are presumed to be visionary and curious, while low scorers are generally conservative.
Conscientiousness: A tendency to show self-discipline, aim for achievement, and plan behavior rather than act spontaneously. People with high scores are considered accurate, careful, reliable and well organized, while people with low scores are presumed to be careless and not thoughtful.
Extraversion: Extraverted people are energetic, seek the company of others and have an outgoing attitude, while

introverted personalities are presumed to be rather conservative, reserved and contemplative.
Agreeableness: Compassionate and cooperative, as opposed to suspicious. Agreeable people trust others and are helpful; non-agreeable personalities are presumed to be egocentric, competitive and distrustful.
Neuroticism: A tendency to experience mood swings and to be easily influenced by negative emotions such as anger and depression.
For this task, the organizers of the Interspeech 2012 Speaker Trait Challenge [5] provided the SPC, which consists of three data sets: the class labels are known for the training and development sets and unknown for the test set. The training and development sets carry OCEAN tags, where each trait is mapped into two classes, positive and negative. For example, for openness we have two tags: O for openness and NO for not-openness. The corpus contains around 640 audio clips, randomly collected from French news bulletins broadcast in February 2005. Each clip contains a single speaker, and there are 322 individual speakers in total: 61.0% of the speakers appear in one clip, 20.2% in two, and the remaining 18.8% appear more frequently, in up to 16 clips. Each clip is at most 10 seconds long. The corpus was judged by 11 judges, who listened to all the clips and individually evaluated them using the BFI-10 [17], a commonly used personality trait questionnaire. The organizers provided the speech files, acoustic features extracted from those files, and a tool, openSMILE [6], to extract further acoustic features if needed. The acoustic features consist of different low level descriptors (LLDs) to which different functionals (e.g. mean, standard deviation and so on) are applied. Details of the features are given in Appendix A, and the partitioning and distribution of the corpus are given in Appendix B. We ran different experiments using different machine learning algorithms with varying feature sets to improve the accuracy of the system.

3. System Design
We use the SPC to train and evaluate our system. We extract acoustic features using TUM's open-source openSMILE tool [6] with the configuration file provided by the organizers [5]. The feature set contains 6125 features, which we feed to our base classifiers. For classification, we design meta classifiers: in the first step we use SMO, RF and Adaboost as base classifiers; the outputs of these base classifiers, labels and scores, are then used as features for the meta classifiers. We use SMO and RF as the final meta classifiers to assess the performance of the system.
SMO (Sequential Minimal Optimization for Support Vector Machines) [7] is an optimization technique for solving the quadratic optimization problem that arises during the training of an SVM. An SVM constructs a hyper-plane or a set of hyper-planes in a high- or infinite-dimensional space by maximizing the distance (functional margin) to the nearest data points, called support vectors; it separates a set of positive examples from negative examples with maximum margin.
RFs (Random Forests) [8] are a combination of tree predictors. An RF builds a series of classification trees, each of which makes its own prediction; these predictions then vote to form the RF prediction.
Adaptive Boosting (Adaboost) [13] is an algorithm for constructing a strong classifier as a linear combination of weak classifiers by overweighting the examples that are misclassified by each classifier. The predictions of all the classifiers are then combined through a weighted majority vote to produce the final prediction.
We use weka1 for SMO and RF; for boosting we use icsiboost2 [9], an open-source implementation of an Adaboost-based classifier. A diagram of our system is given in Figure 1.

[Figure 1: Personality traits recognition system. Pipeline: training/development/test corpus -> feature extraction (baseline features) -> base classifiers (SMO, RF, Adaboost) -> base classifier outputs (label and score) used as features -> meta classifier (SMO/RF) -> classifier output.]
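Feature extraction with openSMILE is typically invoked along the lines of SMILExtract -C <config> -I clip.wav -O features.arff. For the classification stage, the following is a minimal sketch using scikit-learn stand-ins for the tools named above (SVC for weka's SMO-trained SVM, RandomForestClassifier for RF, AdaBoostClassifier for icsiboost); all function and variable names here are ours, not from the paper:

```python
# Sketch of the base-classifier stage (assumption: features were already
# extracted with openSMILE into numeric arrays X_train / X; scikit-learn
# classifiers stand in for weka's SMO/RF and for icsiboost).
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

def base_classifier_outputs(X_train, y_train, X):
    """Train the three base classifiers and return, for each example in X,
    the predicted label and confidence score of every classifier --
    the quantities later fed to the meta classifier."""
    bases = [
        SVC(kernel="linear", probability=True),    # SMO-trained SVM stand-in
        RandomForestClassifier(n_estimators=100),
        AdaBoostClassifier(n_estimators=100),
    ]
    outputs = []
    for clf in bases:
        clf.fit(X_train, y_train)
        labels = clf.predict(X)                    # predicted class label
        scores = clf.predict_proba(X).max(axis=1)  # confidence score
        outputs.append(np.column_stack([labels, scores]))
    return np.hstack(outputs)  # shape: (n_examples, 6)
```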

4. Experiments and Results
We experimented with different classifier-combination methods, namely majority voting and meta classifiers, and measured the performance of the system in terms of weighted average (WA) and unweighted average (UA). We optimized the parameters of the different classifiers on the development set. For the evaluation on the test set, we re-trained the model on the combined training and development sets, using the parameters found optimal on the development set. Our initial experiment tuned the system on the development set. Table 1 shows the baseline results using SMO and RF, and Table 2 the tuned results using SMO, RF and Adaboost (a sketch of the UA/WA computation follows Table 2).

1 http://www.cs.waikato.ac.nz/ml/weka/
2 http://code.google.com/p/icsiboost/

Class                RF              SMO
                     UA      WA      UA      WA
Openness             64.35   69.40   60.40   62.84
Conscientiousness    77.00   77.05   74.53   74.86
Extraversion         84.16   84.15   80.88   80.87
Agreeableness        70.27   67.76   67.58   65.57
Neuroticism          69.88   69.95   68.01   68.31

Table 1: Baseline results (%) of RF and SMO on the development set, using the organizer's RF and SMO parameter settings and the baseline features.

CL    RF              SMO             Adaboost
      UA      WA      UA      WA      UA      WA
O     65.87   69.94   62.54   64.48   63.67   67.21
C     78.72   78.68   74.52   74.86   72.57   72.68
E     85.78   85.79   82.52   82.51   80.73   80.75
A     68.51   66.12   67.58   65.57   60.44   59.02
N     71.50   71.58   70.23   70.49   68.47   68.31

Table 2: Tuned results (%) of RF, SMO and Adaboost on the development set, using the baseline features. CL - class label, O - openness, C - conscientiousness, E - extraversion, A - agreeableness, N - neuroticism.
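The paper does not spell out the UA/WA definitions; the following is a minimal sketch under the usual challenge reading, in which UA is the mean of per-class recalls and WA weights each class recall by its share of the examples (i.e. accuracy). This is our interpretation, not code from the paper:

```python
# Sketch of unweighted (UA) and weighted (WA) average recall for a
# two-class trait, assuming the standard challenge definitions.
import numpy as np

def ua_wa(y_true, y_pred):
    classes, counts = np.unique(y_true, return_counts=True)
    recalls = np.array([
        np.mean(y_pred[y_true == c] == c) for c in classes
    ])
    ua = recalls.mean()                       # unweighted average
    wa = np.average(recalls, weights=counts)  # weighted average
    return ua, wa

# Example: positive class recognized well, negative class poorly.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 1, 1, 0, 0])
print(ua_wa(y_true, y_pred))  # UA = (1.0 + 0.4) / 2 = 0.7, WA = 0.625
```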

4.1. Majority vote of classifiers
Majority voting is one of the most commonly used strategies when multiple classifiers produce multiple labels. In this experiment we deployed three classifiers, each producing its own decision for an example; the example is assigned to the class on which the majority agrees. The majority vote label is computed as

$$ y_{mv} = \begin{cases} 1 & \text{if } \frac{1}{C}\sum_{i=1}^{C} y_i > 0.5 \\ 0 & \text{if } \frac{1}{C}\sum_{i=1}^{C} y_i < 0.5 \end{cases} $$

where $y_{mv}$ is the majority vote label, $y_i$ is the decision of the $i$-th classifier, and $C$ is the number of classifiers; here $C = 3$. The results of the majority vote on the development set, using RF, SMO and Adaboost, are given in Table 3.

Class                Majority vote
                     UA      WA
Openness             77.05   77.00
Conscientiousness    74.32   74.16
Extraversion         84.70   84.70
Agreeableness        66.12   68.52
Neuroticism          71.04   70.93
Mean                 74.64   75.06

Table 3: Results of majority voting (RF, SMO and Adaboost) on the development set. These results can be compared with the tuned results in Table 2.
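A minimal sketch of this voting rule over the three base classifiers' binary decisions (the label arrays are hypothetical; with C = 3 and binary labels a strict majority always exists, so the two cases of the equation cover everything):

```python
# Sketch of majority voting over C binary classifier decisions,
# following the equation above (labels are 0/1).
import numpy as np

def majority_vote(decisions):
    """decisions: array of shape (C, n_examples) with 0/1 labels."""
    return (decisions.mean(axis=0) > 0.5).astype(int)

# Hypothetical decisions of RF, SMO and Adaboost on four examples.
rf       = np.array([1, 0, 1, 1])
smo      = np.array([1, 1, 0, 1])
adaboost = np.array([0, 0, 1, 1])
print(majority_vote(np.stack([rf, smo, adaboost])))  # [1 0 1 1]
```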

4.2. Meta classifiers
A study on spoken language understanding [10] suggests that combining classifiers gives better results for understanding spoken conversations, which is our first motivation for applying meta classifiers to personality trait recognition. We used RF, SMO and Adaboost as base classifiers and ran them over the training, development and test sets. Their outputs, class label and confidence score, were combined with the baseline features, i.e. the features extracted from the speech files with the openSMILE tool. These features were then fed into the final classifier. We built the final classifier with each of SMO, RF and Adaboost in turn to compare the algorithms. The results of the three algorithms as meta classifiers are given in Table 4; they show that the algorithms behave differently on each OCEAN trait. Among the three, Adaboost and RF perform better in most cases. Based on these results, we chose Adaboost and RF alternately as the final meta classifiers used to classify the development and test sets. (A sketch of the meta-classifier construction follows Table 5.)

CL    RF              SMO             Adaboost
      UA      WA      UA      WA      UA      WA
O     70.30   71.04   68.87   69.95   69.95   78.72
C     75.05   74.32   74.16   74.32   78.69   78.72
E     85.23   85.25   84.70   84.70   85.79   85.79
A     68.54   66.67   67.89   65.57   66.12   68.52
N     67.00   67.21   71.42   71.58   71.58   71.50
Mean  73.22   72.89   73.40   73.22   74.42   76.65

Table 4: Performance of the meta classifiers on the development set using only the baseline features. CL - class label, O - openness, C - conscientiousness, E - extraversion, A - agreeableness, N - neuroticism.

Our experiments show that the performance of the system improves with RF when the base classifier outputs are combined with the baseline features; with SMO, however, performance decreased on some traits (e.g. openness). The results are given in Table 5.

CL    RF              SMO             Adaboost
      UA      WA      UA      WA      UA      WA
O     71.32   74.31   66.28   67.75   65.88   69.95
C     79.70   79.78   74.92   74.31   78.72   78.69
E     86.89   86.88   85.24   85.24   85.79   85.79
A     70.92   68.85   68.89   67.75   66.12   68.52
N     73.69   73.77   73.52   73.77   71.50   71.58
Mean  76.50   76.71   73.77   73.76   73.60   74.90

Table 5: Performance of the meta classifiers on the development set using the baseline features plus the base classifier outputs as features. CL - class label, O - openness, C - conscientiousness, E - extraversion, A - agreeableness, N - neuroticism.
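A minimal sketch of this construction, reusing the base_classifier_outputs helper sketched in Section 3 (names and library choices are ours; scikit-learn stands in for weka/icsiboost):

```python
# Sketch of the meta-classifier stage: base classifier outputs
# (label + score) are appended to the baseline openSMILE features,
# and a final classifier is trained on the augmented representation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X_* are openSMILE baseline features, y_train the trait labels (0/1);
# base_classifier_outputs is the helper sketched in Section 3.
def train_meta(X_train, y_train, X_dev):
    # As in the paper, the base classifiers also annotate their own
    # training set; their outputs become extra features.
    meta_train = np.hstack(
        [X_train, base_classifier_outputs(X_train, y_train, X_train)]
    )
    meta_dev = np.hstack(
        [X_dev, base_classifier_outputs(X_train, y_train, X_dev)]
    )
    meta = RandomForestClassifier(n_estimators=100)  # RF as meta classifier
    meta.fit(meta_train, y_train)
    return meta.predict(meta_dev)
```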


4.3. Official results
The scores achieved by the first run of our submission on the test data are shown in Table 6.


Class                Official results (first run)
                     UA      WA
Openness             58.23   61.19
Conscientiousness    78.56   78.60
Extraversion         73.36   73.63
Agreeableness        61.86   61.69
Neuroticism          62.22   62.68
Mean                 66.84   67.56

Table 6: Official results of the first run of our submission. The training data is the SPC training plus development set; the test data is the SPC test set.

For conscientiousness, the first-run result is close to the one we obtained on the development set; for the remaining trait categories, the results are around 10-13% lower than on the development set. The reason is that the classifiers we generated do not generalize well. Some feature engineering to reduce the feature dimensionality could improve performance on the test set.

5. Conclusions and Future Study
In this paper we investigated an automatic system for recognizing Big-Five personality traits from speech. We experimented with several techniques and arrived at a system based on meta classifiers, which gives better results for recognizing personality traits on the development set but does not outperform the baseline results on the test set. The main reason is the high-dimensional feature space, which leads to over-fitting. Future directions of this study are to experiment with feature selection and to integrate textual and linguistic information with the spoken data, which may provide more information for recognizing personality traits.

6. References
[1] C. Nass and S. Brave, "Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship", MIT Press, 2005.
[2] T. W. Bickmore and R. W. Picard, "Establishing and maintaining long-term human-computer relationships", ACM Trans. Comput.-Hum. Interact., 12:293-327, June 2005.
[3] F. Mairesse, M. A. Walker, M. R. Mehl, and R. K. Moore, "Using linguistic cues for the automatic recognition of personality in conversation and text", J. Artificial Intelligence Res., 30:457-500, 2007.
[4] T. Polzehl, S. Moller, and F. Metze, "Automatically assessing acoustic manifestations of personality in speech", in Spoken Language Technology Workshop, IEEE, pp. 7-12, 2010.
[5] B. Schuller, S. Steidl, A. Batliner, E. Nöth, A. Vinciarelli, F. Burkhardt, R. van Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi, B. Weiss, "The Interspeech 2012 Speaker Trait Challenge", Proc. Interspeech 2012, ISCA, Portland, OR, USA, 2012.

[6] F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor", in Proc. ACM Multimedia, Florence, Italy, ACM, pp. 1459-1462, 2010.
[7] J. C. Platt, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines", Microsoft Research, Technical Report, April 21, 1998.
[8] L. Breiman, "Random Forests", Statistics Department, University of California, Berkeley, CA 94720, January 2001.
[9] B. Favre, D. Hakkani-Tür and S. Cuendet, "Icsiboost", http://code.google.com/p/icsiboost, 2007.
[10] M. Karahan, D. Hakkani-Tür, G. Riccardi, G. Tur, "Combining classifiers for spoken language understanding", in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003.
[11] A. J. Gill and R. M. French, "Level of representation and semantic distance: Rating author personality from texts", in Proc. of the 2nd European Cognitive Science Conference (EuroCogSci07), 2007.
[12] J. Oberlander and A. J. Gill, "Individual differences and implicit language: Personality, parts-of-speech and pervasiveness", in Proc. of the 26th Annual Conference of the Cognitive Science Society, Chicago, IL, USA, 2004.
[13] Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, 55:119-139, 1997.
[14] F. Alam, "Named Entity Recognition on Transcription Using Cascaded Classifiers", in Working Notes of EVALITA 2011, Rome, Italy, 23-24 January 2012, ISSN 2240-5186.
[15] F. Sebastiani, "Machine learning in automated text categorization", ACM Computing Surveys (CSUR), Vol. 34, No. 1, March 2002.
[16] D. Morariu, R. Cretulescu, L. Vintan, "Improving a SVM Meta-classifier for Text Documents by using Naïve Bayes", Int. J. of Computers, Communications & Control, Vol. V, No. 3, pp. 351-361, 2010.
[17] B. Rammstedt and O. P. John, "Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German", Journal of Research in Personality, 41, pp. 203-212, 2007.

Appendix A
The feature set contains only functionals applied to LLDs (Low Level Descriptors) and delta LLDs. A summary of the features is given in Table 7: 35 functionals applied to LLDs/delta LLDs, 23 functionals applied to LLDs only, 5 functionals applied to the F0 voicing related LLD, and 3 functionals applied to the smoothed delta coefficient of all LLDs.

A.1 Summary of the features:

LLD                        Processing (sma/de)    Functionals    Number of features
4 energy related LLDs      sma, de (2)            35             4x2x35 = 280
                           sma (1)                23             4x23 = 92
                           sma-de (1)             3              4x1x3 = 12
54 spectral related LLDs   sma, de (2)            35             54x2x35 = 3780
                           sma (1)                23             54x23 = 1242
                           sma-de (1)             3              54x1x3 = 162
6 voicing related LLDs     sma, de (2)            33             6x2x33 = 396
                           sma (1)                23             6x23 = 138
                           sma-de (1)             3              6x1x3 = 18
F0 voicing related LLD     sma (1)                5              1x5 = 5
Total                                                            6125

Table 7: Summary of the features. sma, de (2) indicates that the functionals are applied after (i) moving average filter smoothing and (ii) the first delta coefficient, separately. sma (1) means only moving average filter smoothing is applied before the functionals. sma-de (1) means moving average filter smoothing and the first delta coefficient are applied together before the functionals. The 35, 33, 5, 23 and 3 functionals refer to Tables 8, 9, 10, 11 and 12, respectively.

A.2 Low Level Descriptors (LLDs)

4 energy related LLDs
1. Sum of auditory spectrum (loudness)
2. Sum of rasta-style filtered auditory spectrum
3. RMS energy
4. Zero-crossing rate

54 spectral LLDs
1. Rasta-style auditory spectrum: bands 1-26 (0-8 kHz), 26 LLDs
2. Spectral energy: 250-650 Hz, 1k-4k Hz, 2 LLDs
3. Spectral roll-off point: 0.25, 0.5, 0.75, 0.90, 4 LLDs
4. Spectral flux, entropy, variance, skewness, kurtosis, slope, psychoacoustic sharpness, harmonicity, 8 LLDs
5. MFCC 1-14, 14 LLDs

6 voicing related LLDs
1. F0 by SHS + Viterbi smoothing
2. Probability of voicing
3. Jitter-local
4. Jitter-delta
5. Shimmer-local
6. Logarithmic HNR

A.3 Functionals

35 Functionals applied to LLD/delta LLD
1. Range: range
2. Position of max: maxPos
3. Position of min: minPos
4. Quartiles: quartile1
5. Quartiles: quartile2
6. Quartiles: quartile3
7. Inter-quartiles: iqr1-2
8. Inter-quartiles: iqr2-3
9. Inter-quartiles: iqr1-3
10. 1% percentile: percentile1.0
11. 99% percentile: percentile99.0
12. Percentile range: pctlrange0-1
13. Standard deviation: stddev
14. Skewness: skewness
15. Kurtosis: kurtosis
16. Mean segment length: meanSegLen
17. Max segment length: maxSegLen
18. Min segment length: minSegLen
19. Standard deviation of segment length: segLenStddev
20. Relative duration LLD is above 25: upleveltime25
21. Relative duration LLD is below 25: downleveltime25
22. Relative duration LLD is above 50: upleveltime50
23. Relative duration LLD is below 50: downleveltime50
24. Relative duration LLD is above 75: upleveltime75
25. Relative duration LLD is below 75: downleveltime75
26. Relative duration LLD is above 90: upleveltime90
27. Relative duration LLD is below 90: downleveltime90
28. Relative duration LLD is rising: risetime
29. Relative duration LLD is falling: falltime
30. Gain of linear prediction: lpgain
31. Linear prediction coefficients: lpc0
32. Linear prediction coefficients: lpc1
33. Linear prediction coefficients: lpc2
34. Linear prediction coefficients: lpc3
35. Linear prediction coefficients: lpc4

Table 8: These 35 functionals are applied to the 58 LLDs/delta LLDs (4 energy related and 54 spectral LLDs) after applying (i) the moving average filter and (ii) the first delta coefficient, which yields 58x2x35 = 4060 features.

33 Functionals applied to LLD/delta LLD
1. Range: range
2. Position of max: maxPos
3. Position of min: minPos
4. Quartiles: quartile1
5. Quartiles: quartile2
6. Quartiles: quartile3
7. Inter-quartiles: iqr1-2
8. Inter-quartiles: iqr2-3
9. Inter-quartiles: iqr1-3
10. 1% percentile: percentile1.0
11. 99% percentile: percentile99.0
12. Percentile range: pctlrange0-1
13. Standard deviation: stddev
14. Skewness: skewness
15. Kurtosis: kurtosis
16. Relative duration LLD is above 25: upleveltime25
17. Relative duration LLD is below 25: downleveltime25
18. Relative duration LLD is above 50: upleveltime50
19. Relative duration LLD is below 50: downleveltime50
20. Relative duration LLD is above 75: upleveltime75
21. Relative duration LLD is below 75: downleveltime75
22. Relative duration LLD is above 90: upleveltime90
23. Relative duration LLD is below 90: downleveltime90
24. Relative duration LLD is rising: risetime
25. Relative duration LLD is falling: falltime
26. Relative duration left curvature: leftctime
27. Relative duration right curvature: rightctime
28. Gain of linear prediction: lpgain
29. Linear prediction coefficients: lpc0
30. Linear prediction coefficients: lpc1
31. Linear prediction coefficients: lpc2
32. Linear prediction coefficients: lpc3
33. Linear prediction coefficients: lpc4

Table 9: These 33 functionals are applied to the 6 voicing related LLDs/delta LLDs after applying (i) the moving average filter and (ii) the first delta coefficient, which yields 6x2x33 = 396 features.

5 Functionals applied to the F0 voicing related LLD
1. Percentage of non-zero frames: nnz
2. Mean segment length: meanSegLen
3. Max segment length: maxSegLen
4. Min segment length: minSegLen
5. Standard deviation of segment length: segLenStddev

Table 10: These 5 functionals are applied to the F0-based voicing related LLD, which yields 5 features.

23 Functionals applied to LLD only
1. Arithmetic mean: amean
2. Flatness: flatness
3. Root quadratic mean: rqmean
4. Mean of peak distances: meanPeakDist
5. Standard deviation of peak distances: peakDistStddev
6. Absolute peak range: peakRangeAbs
7. Relative peak range: peakRangeRel
8. Absolute peak mean: peakMeanAbs
9. Mean distance of peak means: peakMeanMeanDist
10. Relative peak mean: peakMeanRel
11. Relative min range: minRangeRel
12. Mean of rising slope: meanRisingSlope
13. Standard deviation of rising slope: stddevRisingSlope
14. Mean of falling slope: meanFallingSlope
15. Standard deviation of falling slope: stddevFallingSlope
16. Linear regression: linregc1
17. Linear regression: linregc2
18. Linear regression quadratic error: linregerrQ
19. Quadratic regression: qregc1
20. Quadratic regression: qregc2
21. Quadratic regression: qregc3
22. Quadratic regression quadratic error: qregerrQ
23. Centroid: centroid

Table 11: These 23 functionals are applied to the 64 LLDs (4 energy, 54 spectral and 6 voicing related) after moving average filter smoothing, which yields 64x23 = 1472 features.

3 Functionals applied to the smoothed delta coefficient of all LLDs
1. Flatness: flatness
2. Position of arithmetic mean: posamean
3. Root quadratic mean: rqmean

Table 12: These 3 functionals are applied to the 64 LLDs (4 energy, 54 spectral and 6 voicing related) after applying the moving average filter and the first delta coefficient together, which yields 64x3 = 192 features.

A.4 Details of the Low Level Descriptors (LLDs)

4 energy related LLDs

LLD                                             Processing       Functionals    Number of features
Sum of auditory spectrum (loudness)             (i) sma (ii) de  35, 23, 3      (i) 35+23+3 = 61, (ii) 35; 61+35 = 96
Sum of rasta-style filtered auditory spectrum   (i) sma (ii) de  35, 23, 3      (i) 61, (ii) 35; 96
RMS energy                                      (i) sma (ii) de  35, 23, 3      (i) 61, (ii) 35; 96
Zero-crossing rate                              (i) sma (ii) de  35, 23, 3      (i) 61, (ii) 35; 96
Total                                                                           384

Table 13: The 4 energy related LLDs. The 35, 23 and 3 functionals refer to Tables 8, 11 and 12.

54 spectral LLDs

LLD                                                  Processing       Functionals    Number of features
Rasta-style auditory spectrum:                       (i) sma (ii) de  35, 23, 3      (i) 35x26+23x26+3x26 = 1586, (ii) 35x26 = 910; 2496
bands 1-26 (0-8 kHz), 26 LLDs
Spectral energy: 250-650 Hz, 1k-4k Hz, 2 LLDs        (i) sma (ii) de  35, 23, 3      (i) 35x2+23x2+3x2 = 122, (ii) 35x2 = 70; 192
Spectral roll-off point:                             (i) sma (ii) de  35, 23, 3      (i) 35x4+23x4+3x4 = 244, (ii) 35x4 = 140; 384
0.25, 0.5, 0.75, 0.90, 4 LLDs
Spectral flux, entropy, variance, skewness,          (i) sma (ii) de  35, 23, 3      (i) 35x8+23x8+3x8 = 488, (ii) 35x8 = 280; 768
kurtosis, slope, psychoacoustic sharpness,
harmonicity, 8 LLDs
MFCC 1-14, 14 LLDs                                   (i) sma (ii) de  35, 23, 3      (i) 35x14+23x14+3x14 = 854, (ii) 35x14 = 490; 1344
Total                                                                                5184

Table 14: The 54 spectral related LLDs. The 35, 23 and 3 functionals refer to Tables 8, 11 and 12.

6 voicing related LLDs

LLD                             Processing       Functionals    Number of features
F0 by SHS + Viterbi smoothing   (i) sma (ii) de  33, 23, 3      (i) 33+23+3 = 59, (ii) 33; 59+33 = 92
Probability of voicing          (i) sma (ii) de  33, 23, 3      (i) 59, (ii) 33; 92
Jitter-local                    (i) sma (ii) de  33, 23, 3      (i) 59, (ii) 33; 92
Jitter-delta                    (i) sma (ii) de  33, 23, 3      (i) 59, (ii) 33; 92
Shimmer-local                   (i) sma (ii) de  33, 23, 3      (i) 59, (ii) 33; 92
Logarithmic HNR                 (i) sma (ii) de  33, 23, 3      (i) 59, (ii) 33; 92
Total                                                           552

Table 15: The 6 voicing related LLDs. The 33, 23 and 3 functionals refer to Tables 9, 11 and 12.
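As a sanity check on the totals (a small script of ours, not from the paper), the group totals in Tables 13-15 plus the 5 F0 functionals reproduce the 6125 features reported in Table 7:

```python
# Sanity check of the feature counts in Tables 7 and 13-15.
energy   = 4  * (2 * 35 + 23 + 3)  # 4 energy LLDs   -> 384
spectral = 54 * (2 * 35 + 23 + 3)  # 54 spectral LLDs -> 5184
voicing  = 6  * (2 * 33 + 23 + 3)  # 6 voicing LLDs   -> 552
f0       = 1  * 5                  # F0 voicing LLD   -> 5
print(energy, spectral, voicing, f0, energy + spectral + voicing + f0)
# 384 5184 552 5 6125
```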

A.5 Details of the functionals:

Name of the functionals (66 functionals)    Short form          Number of features
Range                                       range               128
Position of max                             maxPos              128
Position of min                             minPos              128
Quartiles                                   quartile1           128
Quartiles                                   quartile2           128
Quartiles                                   quartile3           128
Inter-quartiles                             iqr1-2              128
Inter-quartiles                             iqr2-3              128
Inter-quartiles                             iqr1-3              128
1% percentile                               percentile1.0       128
99% percentile                              percentile99.0      128
Percentile range                            pctlrange0-1        128
Standard deviation                          stddev              128
Skewness                                    skewness            128
Kurtosis                                    kurtosis            128
Mean segment length                         meanSegLen          116
Max segment length                          maxSegLen           116
Min segment length                          minSegLen           116
Standard deviation of segment length        segLenStddev        116
Relative duration LLD is above 25           upleveltime25       128
Relative duration LLD is below 25           downleveltime25     128
Relative duration LLD is above 50           upleveltime50       128
Relative duration LLD is below 50           downleveltime50     128
Relative duration LLD is above 75           upleveltime75       128
Relative duration LLD is below 75           downleveltime75     128
Relative duration LLD is above 90           upleveltime90       128
Relative duration LLD is below 90           downleveltime90     128
Relative duration LLD is rising             risetime            128
Relative duration LLD is falling            falltime            128
Gain of linear prediction                   lpgain              128
Linear prediction coefficients              lpc0                128
Linear prediction coefficients              lpc1                128
Linear prediction coefficients              lpc2                128
Linear prediction coefficients              lpc3                128
Linear prediction coefficients              lpc4                128
Relative duration left curvature            leftctime           12
Relative duration right curvature           rightctime          12
Percentage of non-zero frames               ff0nnz              1
Mean segment length                         ff0meanSegLen       1
Max segment length                          ff0maxSegLen        1
Min segment length                          ff0minSegLen        1
Standard deviation of segment length        ff0segLenStddev     1
Arithmetic mean                             amean               64
Root quadratic mean                         rqmean              128
Mean of peak distances                      meanPeakDist        64
Standard deviation of peak distances        peakDistStddev      64
Absolute peak range                         peakRangeAbs        64
Relative peak range                         peakRangeRel        64
Absolute peak mean                          peakMeanAbs         64
Mean distance of peak means                 peakMeanMeanDist    64
Relative peak mean                          peakMeanRel         64
Relative min range                          minRangeRel         64
Mean of rising slope                        meanRisingSlope     64
Standard deviation of rising slope          stddevRisingSlope   64
Mean of falling slope                       meanFallingSlope    64
Standard deviation of falling slope         stddevFallingSlope  64
Linear regression                           linregc1            64
Linear regression                           linregc2            64
Linear regression quadratic error           linregerrQ          64
Quadratic regression                        qregc1              64
Quadratic regression                        qregc2              64
Quadratic regression                        qregc3              64
Quadratic regression quadratic error        qregerrQ            64
Centroid                                    centroid            64
Flatness                                    flatness            128
Position of arithmetic mean                 posamean            64
Total                                                           6125

Table 16: Number of features based on functional counts.

Appendix B

Corpus partition

CL     Train    Dev     Test    Sum
O      97       70      80      247
NO     159      113     121     393
C      110      81      99      290
NC     146      102     102     350
E      121      92      107     320
NE     135      91      94      320
A      139      79      105     323
NA     117      104     96      317
N      140      88      90      318
NN     116      95      111     322
Sum    256      183     201     640

Table 17: Partition of the SPC corpus into the train, development and test sets used in the experiments. CL - class label, O - openness, C - conscientiousness, E - extraversion, A - agreeableness, N - neuroticism. N before a class label means negative, e.g. NO - not openness.

Corpus distribution

CL     Train    Dev     Test    Train-dev
O      0.38     0.38    0.40    0.38
NO     0.62     0.62    0.60    0.62
C      0.43     0.44    0.49    0.44
NC     0.57     0.56    0.51    0.56
E      0.47     0.50    0.53    0.49
NE     0.53     0.50    0.47    0.51
A      0.54     0.43    0.52    0.50
NA     0.46     0.57    0.48    0.50
N      0.55     0.48    0.45    0.52
NN     0.45     0.52    0.55    0.48

Table 18: Distribution of the corpus over the train, development and test sets and over the combined train and development set, based on the partition in Table 17. Each entry is the fraction of clips in that partition carrying the label, e.g. for O in the train set, 97/256 ≈ 0.38. CL - class label, O - openness, C - conscientiousness, E - extraversion, A - agreeableness, N - neuroticism.