Proceedings of the Human Factors and Ergonomics Society 2017 Annual Meeting


Analyzing Semantic Structure: Testing Simple Metrics from Conceptual Recurrence Analysis using Language Models

Michael T. Tolston¹, Victor Finomore², Gregory J. Funke²
¹Oak Ridge Institute for Science and Education, Wright-Patterson AFB, OH; ²Air Force Research Laboratory, Wright-Patterson AFB, OH

Not subject to U.S. copyright restrictions. DOI 10.1177/1541931213601785

Team communication analyses can provide insight into critical team processes. However, such methods often rely on time-consuming subjective rater evaluations, or on techniques that need extensive preparation and yield results that may be difficult to interpret. As part of an effort to identify reliable automated techniques that can be used on small datasets, this research explores the use of Conceptual Recurrence Analysis (CRA) to detect changes in conceptual structure in simulated data. We discuss several metrics that quantify conceptual alignment and test the sensitivity of these metrics to changes in relational structure among groups of words generated from language models. We show that CRA summary statistics are sensitive to changes in relational structure among terms and to other manipulations in ways consistent with expectations, and that they give insight into the changing structure of word distributions as constraints are relaxed. We conclude that CRA presents a sensitive and customizable framework for evaluating linguistic exchanges.

Conceptual Recurrence Analysis (CRA) is a technique that evaluates the semantic similarity of written or spoken communicative exchanges (i.e., utterances; Angus, Smith, & Wiles, 2012a; Angus, Watson, Smith, Gallois, & Wiles, 2012b). CRA, conducted using the software package Discursis (Angus, Smith, & Wiles, 2012a), has shown promise in qualitative (Angus, Smith, & Wiles, 2012a; Angus, Watson, et al., 2012b) and quantitative (Watson, Angus, Gore, & Farmer, 2015) analyses of relatively small data sets. However, despite the apparent utility of the method, a detailed investigation of the summary statistics that quantify global semantic coordination in the CRA framework has not yet been conducted. In an earlier study, Tolston and colleagues (2017) presented a set of global CRA metrics and demonstrated their sensitivity to the effects of biasing information on team communication patterns. Specifically, the authors found that when biasing information was presented to teams late in a distribution queue, the overall proportion of similar to non-similar utterances increased, though there was a trend in the opposite direction with respect to the degree of similarity between similar utterances. The authors speculated that receiving biasing information late in the queue may have partially reduced the constraints among the concepts spanning the conversations. An open question, then, is whether the proposed CRA metrics are sensitive to changes in semantic structure in the ways Tolston and colleagues (2017) suggested.

The present study tested the sensitivity of a set of CRA metrics to known changes in semantic structure. Specifically, artificial statements were generated from a language model based on underlying conditional probabilities of word co-occurrences in a dataset (Jurafsky & Martin, 2008), with varying levels of constraint among the words, and the generated data were submitted to CRA.
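As a concrete illustration, a generator of this kind can be sketched as an add-k smoothed bigram model, in which the smoothing constant k controls how strongly observed co-occurrences constrain the next word. The bigram formulation and all names below are illustrative assumptions; the paper specifies only a language model over conditional word co-occurrence probabilities with a smoothing parameter.

```python
import random
from collections import Counter, defaultdict

def train_bigram(corpus_sentences, k):
    """Add-k smoothed bigram model: P(w2 | w1) proportional to count(w1, w2) + k.

    Larger k pushes each conditional distribution toward uniform, so the
    words carry less information about one another (relaxed constraints).
    """
    vocab = sorted({w for s in corpus_sentences for w in s})
    counts = defaultdict(Counter)
    for s in corpus_sentences:
        for w1, w2 in zip(s, s[1:]):
            counts[w1][w2] += 1

    def prob(w1):
        total = sum(counts[w1].values()) + k * len(vocab)
        return {w2: (counts[w1][w2] + k) / total for w2 in vocab}

    return vocab, prob

def generate(prob, length, seed_word, rng=None):
    """Sample one artificial utterance by walking the bigram chain."""
    rng = rng or random.Random(0)
    out = [seed_word]
    for _ in range(length - 1):
        p = prob(out[-1])
        words, weights = zip(*p.items())
        out.append(rng.choices(words, weights=weights)[0])
    return out
```

As k grows from near zero toward large values, the sampled utterances drift from closely mimicking the training co-occurrences toward near-random word sequences, which is the manipulation the study exploits.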
METHODS

To generate artificial utterances, this study used a language model based on word co-occurrences in a set of logic statements given to participants in the Experimental Laboratory for Investigating Collaboration Information-sharing and Trust (ELICIT; Ruddy & Nissen, 2008). To simulate altered semantic constraints, the language-model probabilities were varied over a range of smoothing parameter values to yield increasingly uniform conditional distributions (i.e., as the uniformity increased, the words in the model contained less information about each other). We generated 10 sets of documents, each with 250 utterances, using values of the smoothing parameter ranging between 10⁻⁵ and 1, yielding a total of 470 documents. These documents were submitted to Discursis for semantic analysis under two parameter settings that limit the number of key words used to assess similarity between utterances (i.e., the size of the semantic space): automatic identification of the upper limit of key words (AUTO), and a fixed upper limit of 100 key words (N100), with the AUTO setting resulting in uniformly smaller semantic spaces. The potential importance of the number of key concepts lies in the granularity of the constructed semantic space: adding more concepts typically results in an increase in specific terms (Smith & Humphreys, 2006), which may yield a finer-grained separation between otherwise similar utterances.

The metrics tested capture various aspects of the density of similarity in the conceptual recurrence plots generated by CRA: Proportion Recurrence (PREC), the ratio of similar utterances to utterances that are not similar; Overall Similarity (OS), a measure of the average amount of similarity over the entire set of utterances, including zero cells; and Mean Similarity (MS), a measure of how closely aligned in the semantic space, on average, similar utterances tend to be. We expected that increasingly relaxed constraints in the relationships between words would result in more generated utterances being at least somewhat related, meaning that PREC should increase monotonically up to some saturation value.
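The three summary statistics just defined can be computed from an utterance-by-utterance similarity matrix roughly as follows. This is a sketch only: the threshold for counting a pair as "similar" and the exact normalizations used by Discursis are assumptions here, not the published implementation.

```python
import numpy as np

def cra_metrics(sim, threshold=0.0):
    """PREC, OS, and MS from a conceptual recurrence (similarity) matrix.

    sim[i, j] holds the semantic similarity of utterances i and j; only
    the off-diagonal upper triangle is used, so each pair counts once.
    """
    n = sim.shape[0]
    vals = sim[np.triu_indices(n, k=1)]
    similar = vals > threshold               # pairs counted as recurrent
    # PREC: ratio of similar to non-similar utterance pairs
    prec = similar.sum() / max((~similar).sum(), 1)
    # OS: average similarity over all pairs, zero cells included
    os_ = vals.mean()
    # MS: average similarity among the similar pairs only
    ms = vals[similar].mean() if similar.any() else 0.0
    return prec, os_, ms
```

Under this formulation, relaxing the constraints among words should push more pairs above the threshold (raising PREC) while diluting the average strength of those pairs (lowering MS), which is the pattern of expectations developed in the text.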
Second, we expected mean similarity to decrease along with increasingly relaxed constraints, as reduced constraints among words should increase the probability of utterances being only tangentially related. Finally, as OS captures aspects of the data that both MS and PREC are sensitive to (namely, the density of similarity in the complete set of utterances), we expected OS to increase along with decreasing constraints, as previously unrelated words become connected, but at a lower rate than PREC. Further, we expected OS to decrease after reaching some saturation value, since the similarity of similar utterances should decrease with highly relaxed constraints.

The generated documents were submitted to Discursis for semantic analysis. The results were used to calculate mean values and 95% confidence intervals. Further tests of statistical inference were not conducted.

RESULTS

Results from the analyses can be seen in Figure 1.


Figure 1. OS (a), MS (b) and PREC (c) of simulated data as a function of the smoothing parameter (k). Results are shown for both AUTO and N100 settings governing concept extraction in Discursis. The error bars represent 95% confidence intervals.

DISCUSSION AND CONCLUSIONS

Obtained results were consistent with our expectations within the intermediate and high ranges of the smoothing parameter. Specifically, with moderate amounts of smoothing, both PREC and OS decreased, while MS increased. This difference was more pronounced in the smaller semantic spaces for PREC and MS. With higher levels of smoothing, PREC approached a saturation value, while both OS and MS decreased. With respect to the influence of the number of key concepts, overall, MS was most sensitive to changes in the number of basis concepts, being uniformly greater in the AUTO condition, while OS and PREC only showed differences between conditions in subsets of the smoothing parameter settings. Further, for the N100 condition, MS showed little variation in the intermediate values of the smoothing parameter.

In summary, we found the proposed metrics of semantic similarity to be sensitive to changes in semantic structure in comparable but complementary ways. Of the three metrics, MS and PREC are most sensitive to changes in the size of the reconstructed semantic space. Further, all three show responses to changing semantic structure that follow expectations, for at least some values of the smoothing parameter. However, though all three can detect variations in semantic structure, MS appears to be less sensitive in larger than in smaller spaces, at least when the underlying corpus is small. In conclusion, we take this effort as a demonstration that CRA and the proposed metrics are sensitive to changes in semantic structure in predictable and meaningful ways.

REFERENCES

Angus, D., Smith, A. E., & Wiles, J. (2012a). Human communication as coupled time series: Quantifying multi-participant recurrence. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1795-1807.

Angus, D., Watson, B., Smith, A., Gallois, C., & Wiles, J. (2012b). Visualising conversation structure across time: Insights into effective doctor-patient consultations. PLoS ONE, 7(6), e38014.

Jurafsky, D., & Martin, J. (2008). Speech and language processing (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Ruddy, M., & Nissen, M. (2008, June). New software platform capabilities and experimentation campaign for ELICIT. Paper presented at the 13th International Command and Control Research and Technology Symposium, Seattle, WA. Retrieved from http://www.dtic.mil/dtic/tr/fulltext/u2/a487140.pdf

Smith, A. E., & Humphreys, M. S. (2006). Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping. Behavior Research Methods, 38(2), 262-279.

Tolston, M. T., Finomore, V., Funke, G. J., Mancuso, V., Brown, R., Menke, L., & Riley, M. A. (2017). Effects of biasing information on the conceptual structure of team communications. In Advances in Neuroergonomics and Cognitive Engineering (pp. 433-445). Springer International Publishing.

Watson, B. M., Angus, D., Gore, L., & Farmer, J. (2015). Communication in open disclosure conversations about adverse events in hospitals. Language & Communication, 41, 57-70.