Deliverable nr D 5.2 - DSpace Open Universiteit

Language Technologies for Lifelong Learning LTfLL -2008-212578

Project Deliverable Report

Deliverable nr: D 5.2
Work Package: 5
Task: Learning Support and Feedback
Date of delivery: Contractual: 01-11-2009; Actual: 15-12-2009
Code name: Version 2.0
Type of deliverable: Report
Security (distribution level): Public
Draft / Final

Contributors / Authors (Partner):

Stefan Trausan-Matu, Philippe Dessus, Traian Rebedea, Sonia Mandin, Emmanuelle Villiot-Leclercq, Mihai Dascalu, Alexandru Gartner, Costin Chiru, Dan Banica, Dan Mihaila, Benoît Lemaire, Virginie Zampa, Eugène Graziani

Contact Person: Stefan Trausan-Matu

WP/Task responsible: Stefan Trausan-Matu

EC Project Officer: Ms. M. Csap

Abstract (for dissemination)

This report presents Version 1 of the support and feedback services (delivering recommendations based on interaction analysis and on students’ textual production) that can be integrated within an e-learning environment. Further steps toward the implementation of Version 2 of these services and their future integration with all the LTfLL services are also suggested.

Keywords List

Individual and Collaborative Knowledge Building, Social Network Analysis, Feedback, Written Synthesis, Latent Semantic Analysis, Bakhtin

Acknowledgements

We wish to thank Dale Gerdemann, Bernhard Hoisl and Kiril Simov for valuable comments on an earlier version of this report.

LTfLL Project Coordination at: Open University of the Netherlands Valkenburgerweg 177, 6419 AT Heerlen, The Netherlands Tel: +31 45 5762624 – Fax: +31 45 5762800


D 5.2 Learning Support and Feedback

Table of Contents

Executive Summary
1. Introduction
   1.1. Previous Work on WP 5
   1.2. Educational Theory
   1.3. Design/Scenarios
2. Implementation of Version 1 of the Services
   2.1. Overall Presentation
   2.2. Technical Description of T 5.1 Service
   2.3. Technical Description of T 5.2 Service
3. Integration and Validation of Services
   3.1 WP2 Integration
   3.2 WP3 threads
   3.3 Collaboration with WP 4 and WP 6
   3.4 WP7 Validation Plans
4. Conclusions: Tools and Resources for Second Cycle of LTfLL
5. Appendices
   Appendix 1 — Description of our Services as Fostering Self-regulated Learning
   Appendix 2 – The extended pattern language
   Appendix 3 – Identification of Lexical Chains
   Appendix 4 – Details on the Evaluation of Interactions
6. References


Executive Summary

According to the DoW, this report presents Version 1 of the support and feedback services (delivering recommendations based on interaction analysis and on students’ textual production) that can be integrated within an e-learning environment. It therefore contains details about the design and implementation of the services. For each of the two services of WP5, four issues were considered: challenges, methods, results and conclusions. Details of how the services will be integrated with the other services of LTfLL, as well as a brief introduction to how the services will be validated, are also provided. This report also attempts to answer the following important questions:

— How is it possible to implement the ideas of polyphony and dialogism introduced by Bakhtin (1981, 1984) and recognized as a paradigm for Computer-Supported Collaborative Learning (Koschmann, 1999)?

— Is it possible to provide a tool that measures the contribution and degree of collaboration of each participant in a chat?

— How is it possible to foster individual and collective knowledge-building processes with computer-based artifacts? As van Aalst (2009, p. 262) puts it: “Knowledge construction involves a range of cognitive processes, including the use of explanation-seeking questions and problems, interpreting and evaluating new information, sharing, critiquing, and testing ideas at different levels […] and efforts to rise above current levels of explanation, including summarization, synthesis, and the creation of new concepts”.

— How can self-regulated learning processes in Personalized Learning Environments be promoted?

— What kind of Instructional Design prescriptions can be used to predict and support the students’ whole learning workflow using the LTfLL services?

The remainder of this report is organized as follows. The first part is introductory and sets up the theoretical background of our research. The second part describes Version 1 of the WP 5 services in a fourfold argumentation (challenges, methods, results and conclusions). The third and last part sheds some light on the integration and validation of the WP 5 services.


1. Introduction

In a lifelong learning context (van Merriënboer, Kirschner, Paas, Sloep, & Caniëls, 2009), learners have to perform many writing-based activities that are seldom assessed by teachers or tutors, because such assessment is time-consuming. One of the goals of the LTfLL (Language Technologies for Lifelong Learning) project is to develop a set of services that help learners manage communication through texts (that they read or write, either as essays, posts in forums or utterances in chats) in order to learn, either in a formal or an informal way. This management mainly operates at the semantic and pragmatic levels and includes text retrieval, learner positioning, essay or chat assessment, etc.

The outline of this Deliverable is as follows. First, we describe the theoretical background upon which our work is based. Second, we describe from a technical perspective the services that we have devised. Third, we elaborate some paths toward integration and briefly discuss validation.

1.1. Previous Work on WP 5

Since writing is one of the most important ways to acquire information and to communicate with each other, we first investigated the relations between two forms of written communication, writing and chatting, and their possible effects on learning. As unifying theories we referred to Stahl’s (2006) knowledge-building cycles, as well as to Bakhtin’s dialogism. These theories enable us to show that in both cases the learner is engaged in a two-cycle process which individually and collectively generates knowledge from beliefs and utterances, and leads to a more elaborated discourse that is in turn re-injected into the process as cultural and cognitive artifacts. Since lifelong learners have limited access to teachers and tutors, the services we are designing can help them get feedback on their written productions. In turn, this feedback—together with that of peers—can be compared to the learners’ own self-assessment, and this comparison is at the core of the learning process (Ross, 2006). To sum up, our goal is to provide cognitively grounded feedback to learners on their written productions (either individual or collective) so that they can build knowledge from texts they have read and from discussions with peers.

1.2. Educational Theory

This deliverable refines some points that aim at integrating the two main tasks of WP5 in the future, as well as some of the other WPs. One of our main claims, which also aims at unifying our research efforts, is that dialogical meaning building should drive the student experience in a distance learning environment. As Wegerif (2006) mentions, such an environment is a dialogical space in which all the stakeholders’ activities are located. This space is defined by the spatial, temporal and social characteristics in which learning and teaching take place (Code & Zaparyniuk, 2009). This idea is supported by a variety of evidence showing that argumentation (both as input, read texts, and as output, written texts) leads to a more profound understanding than monological or even narrative modes of expression. The learners immersed in a learning space can direct their attention


in two ways: a cognitive attention, directed toward learning material (e.g., textbooks), and a social attention, displayed to others and interpreted from others (Jones, 2005). Our research purpose is to devise tools that can support these two forms of attention. The services developed in WP5 play the role of mediators between the two ways of understanding: individual and collective (Stahl, 2006, p. 210 et sq.).

The activities of teaching and learning are seen as a joint activity (Lorino, 2009) in which each author (producer) has multiple co-authors. When students have to write an essay, they necessarily take into account the view of their teacher, who implicitly becomes a co-author. Conversely, every teacher produces texts (e.g., task sheets) that are tightly related to the students: in writing the text the teacher foresees the kinds of problems, reactions or questions that the students may have, so the document is “co-authored” too. This joint activity is not circumscribed to formal instructional tasks, but also relates to the modes of coordination and the roles the stakeholders have and manage within the educational situation. A step further, for this joint activity to work and be successful, a semiotic mediation by signs and tools is necessary. Signs or words are not the mere representation of something, but a mediation between people (Lorino, 2009). As Wegerif (2006, p. 144) puts it: “Any sign taken to be a mediation between self and other, a word or a facial expression, must pre-suppose the prior opening of a space of dialogue (an opening of a difference between voices) within which such a sign can be taken to mean something.” Keeping all the previous points in mind, we can now claim that writing (essays, notes, utterances) in a distance learning platform implies participation in curricular conversations (Newell, 2006; Sfard, 1998), and in a community of practice in a curricular domain (Wenger, 1998).

All these different reasons let us envisage the possibility of devising computer-based environments that foster students’ knowledge building, either individually or collaboratively. There are very few computer-based environments that aim at helping students work from these two viewpoints in the same environment (Moreno, 2009). However, we are aware that other forms of writing (e.g., answers to more formal questions for learning, see Deliverable 4.2; search queries, see Deliverable 6.2) are also to be taken into account in our LTfLL project.

As already expressed in the DoW (section B.1.1) and in the previous Deliverable (see Deliverable 5.1, section 1.1, p. 5), the goal of our consortium is to design and implement services in Personal Learning Environments (PLEs) that allow lifelong learning. PLEs are mostly designed and implemented to support two lines of core activities for learning:
– self-regulated learning, or SRL (Puustinen & Pulkkinen, 2001);
– summary writing, and, by extension, multiple-source writing (Thiede & Anderson, 2003).

We have argued that Stahl’s (2006) model is of great interest for integrating both individual and collective knowledge building in the same framework. Nonetheless, this model does not account well for the individual learner’s activity, mainly through writing, and does not mention tasks and the way to process them on the individual side. Moreover,


the exact role artifacts play in the model is rather vague. We thus additionally have to choose an SRL model compatible with Stahl’s, SRL being an activity not often supported in current PLEs (Vovides, Sanchez-Alonso, Mitropoulou, & Nickmans, 2007). These authors proposed an SRL model especially dedicated to PLEs, which allows students to perform the following activity: “The key to the success in their design was to have students experience these strategies in their own learning, explicitly compare their own performance with that of the model, and take action to revise ineffective learning approaches.” (Lin, 2001, p. 26)

Figure 1 — Metacognitive approach to design e-learning (Vovides et al., 2007, p. 68).

Let us provide a short description of this model. First, students work at the object level, preparing their activity according to the ongoing task. They can also apply cognitive strategies for these activities to be performed. Then, students perform a first rough assessment of their production (its adequacy, its relation to their knowledge, etc.). Third, a reflection at the meta level allows them to compare this assessment with the object level, a comparison often supported by artifacts (computer-based services, prompts, etc.), so that they can compare their perceived level of learning with that proposed by the artifacts. Eventually, students can make some adaptations to their work, which in turn fuels a possible update of the object level and can be re-enacted in a further loop.

The previous Deliverable (D 5.1) was mainly focused on chat and summary writing, i.e., the shortening of isolated pieces of text (either utterances or even summaries) related to a course domain. Although reading and writing can be examined as separate processes, many academic tasks can be considered hybrid tasks, as students are asked to demonstrate their understanding of source texts they read by composing a new, abridged text (Spivey, 1997). Such hybrid tasks can range from summary writing of single texts to discourse synthesis from multiple source texts. Discourse synthesis is a cognitively demanding activity requiring students to transform knowledge (Bereiter & Scardamalia, 1987) rather than simply reproduce information from a single source. It entails writing a new text and constructing meaning by using three key operations: selecting information according to the writer’s goals, connecting this information to achieve a cohesive text,


and organizing it according to the intended goals (Segev-Miller, 2004; Spivey, 1997). While it is possible to keep the same textual pattern when summarising a single text, synthesizing from multiple texts requires students to decide on a new text structure (macrostructure) to use in their written text in order to integrate their own ideas about the text contents (Brassart, 1993; Flower et al., 1990; Segev-Miller, 2004).

1.3. Design/Scenarios

The functionality of the service developed by T 5.1 may be understood from the following scenario, which describes the experience of a hypothetical learner, Ulysses. In the Natural Language Processing (NLP) course, a forum and a chat system are used for collaboration with classmates, and the evaluation of these activities constitutes an important part of each student’s final grade. The teacher, Dr. Smith, starts discussion topics on the forum after each course session. The tutors have to moderate and solve possible conflicts by offering explanations. In addition, the teacher gives topics to be discussed in small groups using the chat system. As preparation for a chat, the tutors group students in small teams of 4-7 participants, each being assigned a topic to study and then to defend in debates. Ulysses reads the most interesting materials about his topic in order to understand the subject in detail. During the discussions, his peers present other points of view, debate and inter-animate, all of which improves his own and the others’ understanding of the domain. After concluding a chat session, Ulysses can launch several web widgets from the Chat & Forum Analysis and Feedback System (C&F-AFS), which provides graphical and textual feedback and preliminary scores both for him and for the group as a whole. Ulysses knows that the tutors also use C&F-AFS, as it gives them better insight for writing detailed feedback and assigning grades. When Ulysses uses C&F-AFS on a forum, it provides him with threads and/or posts that are related to a concept, recommends peer learners that have a good coverage of particular topics, and offers preliminary feedback about self-reflection activities.

In turn, Ulysses can use Pensum, the T 5.2 service, which he launches as a Web service. He selects the NLP course domain and starts to express, in a dedicated notepad, the main questions, problems and notions he has or wants to tackle in this course. Then he begins to write a synthesis of the most important ideas of the course as he understood it. Whenever Ulysses is uncertain about whether he grasps the most important notions of a text, he asks Pensum for support. The system gives Ulysses feedback on his written synthesis, e.g., the relevance of its sentences or its inter-sentence coherence. Ulysses is in control of his own learning process: he requests feedback whenever he wants, and can update his notepad according to the main points he understood and go further in the writing of the same synthesis or of one related to another topic.
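The kind of feedback Pensum computes (e.g., the relevance of a synthesis sentence to the course text) rests on semantic similarity. The project uses Latent Semantic Analysis for this, but the idea can be illustrated with a plain term-frequency cosine; all names below are hypothetical and the code is only a sketch, not Pensum's implementation.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def sentence_relevance(sentence: str, source_text: str) -> float:
    """Relevance of a synthesis sentence to the source text, in [0, 1]."""
    return cosine(Counter(sentence.lower().split()),
                  Counter(source_text.lower().split()))

source = "latent semantic analysis builds a vector space from a corpus"
print(round(sentence_relevance("semantic analysis uses a vector space", source), 2))
```

In the real service the term vectors would first be projected into an LSA space built from the course corpus, so that synonymous wordings also score as relevant.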


2. Implementation of Version 1 of the Services

2.1. Overall Presentation

Every lifelong learner performs a wide range of learning activities based on the use of language (retrieving pieces of text, reading, taking notes, discussing with peers or other stakeholders). In order to study and support these activities, different research domains are currently investigated (psychology of writing, distance learning, instructional psychology, natural language processing). In the preceding D 5.1, a state of the art presented these activities from two viewpoints, writing (mostly summaries) and chatting, together with their relations with learning. In this report we want to go further in some new domains. This deliverable is the occasion to pinpoint the relations between individual and collaborative knowledge building and the following points:
– learners are better able to build knowledge from argumentative writing tasks than from other kinds of discourse (e.g., narrative) (Wiley & Voss, 1999), especially when the learner’s opinion is asked;
– learners are able to manage their self-understanding of the course;
– learners are able to write out syntheses of the course that capture their ongoing understanding of the course.

The remainder of this section documents the implementation of each of the WP 5 services in a fourfold argumentation: the challenges at stake, the methods carried out in the implementation of the services, the main results we obtained and the conclusions we draw.

2.2. Technical Description of T 5.1 Service

Challenges

Educational institutions have largely embraced the use of the Internet, web technologies and their collaborative environments to supplement standard learning practices. Learners’ interactions show their (individual and group) knowledge regarding the course materials as well as their capacity to apply this knowledge when solving (practical) problems. However, what happens in these interactions is now generally beyond the control of the teachers, who only focus on the results of the collaboration processes. Deeper involvement (assessing individual contributions, moderating, or providing relevant feedback concerning the quality of web interactions, regarding both content and collaboration) is time-consuming and demands a high cognitive load.

The development of the service for T 5.1 is challenging because it is based on a new vision of supporting learning: using Natural Language Processing (NLP) tools for analyzing dialogic knowledge creation in chat conversations and forum discussions. It is theoretically challenging from several points of view. As mentioned in D 5.1, the service is based on Bakhtin’s ideas that dialog is in any text, that the echoes of many voices exist


in any word and that they weave together in polyphony and inter-animation (Bakhtin, 1981, 1984). However, even if many (e.g., Koschmann, 1999, or Stahl, 2006) consider Bakhtin’s theories a paradigm for Computer-Supported Collaborative Learning (CSCL), some important details remain to be elaborated (e.g., what can be considered a voice, and how to develop a computational system from these theories). From another perspective, NLP is known to be hard and is often considered unreliable in real settings. Furthermore, interaction analysis and pragmatics are among the most difficult issues for NLP. Moreover, it is not clear whether complex metrics of collaboration can be computed. Another challenging theoretical point is the degree to which using CSCL may enhance knowledge building in lifelong learning. The validation activities for T 5.1 were carefully designed to analyze this challenge. In addition, there are technological challenges, because NLP has hardly been used until now for analyzing multi-user chat conversations. Moreover, discourse analysis in chats, and in text in general, has not previously been based on a multithreaded, inter-animation perspective, as is the case for us.

Methods

One main idea behind our approach is to encourage, in lifelong learning, the usage of conversations, dialogue, debates and inter-animation as premises for understanding, studying and creative thinking. The achievement of these goals may be supported by tools that analyze chats and forums and provide insight and feedback. The implemented analysis method integrates results from NLP (content and discourse analysis), Social Network Analysis (SNA) and, as a novel idea (Trausan-Matu & Rebedea, 2009), the identification of polyphonic threading in chats. The results are textual and graphical feedback and evaluations of the contributions of the participants.

Architecture of T 5.1 Service

The goals of the T 5.1 service are to provide feedback and recommendations and to propose grades for learners that participate in an assignment in a chat conversation or a discussion forum. Although the parameters taken into consideration are slightly different for the chat and forum cases, the main steps are identical, as described below. The T 5.1 service is implemented by the Chat & Forum Analysis and Feedback System (C&F-AFS). First, the data is processed by an NLP pipe (spelling correction, stemmer, tokenizer, POS tagger and parser). In the semantic sub-layer, concepts are searched for in a collection of key concepts and their inter-relations for the subject, provided by the teacher. These concepts may also be obtained from ontologies, either provided by experts or automatically extracted from various sources (e.g., using Wikipedia and Wiktionary, an alternative that will be investigated for Version 2). Synonyms are obtained from the lexical database WordNet (http://wordnet.princeton.edu) for English; for other languages, e.g., Romanian, Version 2 of the T 5.1 service will use other lexical resources, like dexonline.ro or particular wordnets, e.g., BalkaNet or EuroWordNet, if available.


Advanced NLP and discourse analysis techniques are used for the identification of speech acts, rhetorical schemas, lexical chains and co-references, in order to find interactions between the participants. Discourse analysis techniques are further used for identifying adjacency pairs, other implicit links, discussion threads, argumentation and transactivity. The polyphony sub-layer uses the interactions and advanced discourse structures to look for inter-animation, convergence and divergence. In addition, for the computation of several metrics regarding participation in the community of learners, Social Network Analysis is used, which takes into account the social graph induced by the participants and the interactions that have been discovered. The results of the previous sub-layers are combined to offer textual and graphical feedback and grade suggestions for each participant in an assignment. The above steps are performed in the four successive layers of the architecture of the system depicted in Figure 2 (note that each upper layer uses the information computed by the layers below it, as presented in the next paragraph).

Figure 2 — Layers of T 5.1 service
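As an illustration of this layering, the following minimal sketch chains four functions, each consuming the output of the layer below. All function and key names are invented for the example; this is not the service's actual API.

```python
# Hypothetical sketch of the four-layer flow: each layer consumes the
# structures computed by the layer below it.
def nlp_layer(log):
    # stands in for spelling correction, tokenization, POS tagging, parsing
    return {"utterances": [u.lower().split() for u in log]}

def semantic_layer(out, key_concepts):
    # concept spotting against the teacher-provided key-concept collection
    hits = [w for utt in out["utterances"] for w in utt if w in key_concepts]
    return {**out, "concepts": hits}

def discourse_layer(out):
    # adjacency pairs, threads and implicit links would be computed here;
    # as a placeholder, link each utterance to the next one
    links = [(i, i + 1) for i in range(len(out["utterances"]) - 1)]
    return {**out, "links": links}

def polyphony_layer(out):
    # inter-animation metrics are built on the discourse structures
    return {**out, "interanimation": len(out["links"])}

result = polyphony_layer(discourse_layer(
    semantic_layer(nlp_layer(["Hello all", "Let us discuss ontologies"]),
                   {"ontologies"})))
print(result["concepts"], result["interanimation"])
```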

Modules in T 5.1 Service

The modules of the T 5.1 service may be grouped into several functional components around its main goal, the contribution analyzer, a module that provides textual feedback and visualization of the interaction and proposes grades for the


participants. The input of the service is the interaction (chat or forum) log. For the content analyzer, the NLP pipe is needed for pre-processing the log text. The inter-animation analyzer processes the threads in the chat, which are built upon the explicit and implicit links in the conversation; the latter are provided by a specialized module that uses the results of the NLP pipe plus discourse analysis (see Figure 3). In Figure 5, the main blocks from Figure 3 are broken down into the specific functional modules designed to be included in each component; only the most important interactions between them are shown, to avoid overcomplicating the figure. The majority of the modules are already implemented in the first version; the few remaining ones (collocation determination, rhetorical schema identification) will be included in the second version of the service. The next sections provide more details about these components.

Figure 3 — Main blocks of C&F-AFS

The Format of the Input Data

The chat environment used in the experimentation is the one used in the NSF Virtual Math Teams (VMT) project (Stahl, 2009), which offers a whiteboard and referencing facilities. This environment is also available as the open-source system ConcertChat (Holmer, Kienle, & Wessner, 2006; http://sourceforge.net/projects/concertchat/). It allows the user to explicitly reference a previous utterance or an object on the whiteboard. This facility is extremely important in chat conversations with more than two participants because it allows several discussion threads to exist in parallel, a feature that cannot be achieved in face-to-face conversations, usual chat systems or phone conversations. An XML schema was designed for encoding chat conversations and discussion forums. In Figure 4, an example fragment of such a chat is presented. Each utterance has a unique identifier (‘genid’) and the existing explicit references (‘ref’) to previous utterances. In


addition to annotating the elements of a chat, the schema also includes data generated by the system, as will be presented later. In forums, an additional ‘thread’ XML element is added. The input data may be in different formats besides the above XML schema; a pre-processing module transforms these formats into an XML document that respects this schema. The supported formats are saved chats from Yahoo Messenger in text format, VMT (ConcertChat) HTML format and older VMT formats. [The XML sample of Figure 4 is not reproduced here; it encoded three greeting utterances (“hello all”, “hi”, “Hello Alex”) with their ‘genid’ identifiers, ‘ref’ references, and speech act and implicature annotations.]

Figure 4 — XML encoding of chats
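The exact element names of the schema are not shown above; only the 'genid' and 'ref' attributes are documented. The following sketch therefore assumes a hypothetical <utterance> element and shows how the explicit links could be read with Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the chat encoding; element names and
# structure are assumed, only 'genid' and 'ref' come from the text above.
sample = """<chat>
  <utterance genid="1" nickname="alex">hello all</utterance>
  <utterance genid="2" nickname="maria" ref="1">hi</utterance>
  <utterance genid="3" nickname="dan" ref="1">Hello Alex</utterance>
</chat>"""

root = ET.fromstring(sample)
# collect the explicit links: (utterance id, id of the referenced utterance)
explicit_links = [(u.get("genid"), u.get("ref"))
                  for u in root.iter("utterance") if u.get("ref")]
print(explicit_links)
```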

The NLP Pipe

The NLP pipe takes as input the chat log or the text of the discussion forum, in the above XML format, and has the following component modules:
– Spelling correction, which tries to correct the spelling errors in the text;
– Tokenizer, which splits the text into textual units (these are not always simple words);
– Named Entity Recognizer, which identifies and classifies names of persons, places, brands, companies, etc. It uses a gazetteer which has to be loaded with specific names for the considered teaching domain;
– Stemmer (Lemmatizer), which stems each word to identify words from the same word family;
– POS tagger and parser, which tags each word with its POS and constructs dependencies between the words;
– NP-chunker, which is used to structure the noun phrases in the text.
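A toy version of such a pipe can be written as a sequence of stages, each transforming the previous stage's output. The real service wires in Stanford NLP components; the stand-in functions below are invented for illustration only.

```python
# Toy pipe: each stage transforms the token stream produced by the
# previous one. These stages are crude stand-ins for real NLP modules.
STOPWORDS = {"the", "a", "an", "to", "of"}

def tokenize(text):
    return text.lower().split()

def strip_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def stem(tokens):
    # naive suffix stripping standing in for a real stemmer/lemmatizer
    return [t[:-1] if t.endswith("s") else t for t in tokens]

def run_pipe(text, stages=(tokenize, strip_stopwords, stem)):
    data = text
    for stage in stages:
        data = stage(data)
    return data

print(run_pipe("The parsers of the pipeline"))
```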


The modules of the NLP pipe are those provided by the Stanford NLP software (http://nlp.stanford.edu/software), with the exception of the spellchecker (implemented with Jazzy; see http://jazzy.sourceforge.net/ and http://www.ibm.com/developerworks/java/library/j-jazzy/). Two alternative NLP pipes are under experimentation, integrating modules from GATE (http://gate.ac.uk) and LingPipe (http://alias-i.com/lingpipe/). Version 1 of the T 5.1 service includes only modules for the English language. However, the design and implementation allow other languages to be considered as well; for a new language, of course, the modules have to be replaced with ones for that language.

Figure 5 — The main components of the system. Components that are part of the same module share the same background color (for example, the NLP pipe is colored light blue).

Cue-Phrase Identification

Because important parts of the processing in C&F-AFS are based on patterns identified by cue phrases, a module called “PatternSearch” was implemented to search a chat or forum log for occurrences that match expressions specified by the user. In


addition to a simple regular expression search, the module allows considering not only words, but also their synonyms, stems and parts of speech (POS). Another novel facility is the consideration of utterances as a search unit: for example, one can specify that a word should be searched for in the previous n utterances, or that two expressions should occur in two distinct utterances. For example, the expression #[*] cube searches for pairs of utterances that have a synonym of “convergence” in the first utterance and “cube” in the second. One result from a particular chat is the pair of utterances 1103 and 1107:

1103 # 1107. overlap # cube
[that would stil have to acount for the overlap that way] # [an idea: Each cube is assigned to 3 edges. Then add the edges on the diagonalish face.]
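PatternSearch's actual expression syntax is only partially legible in this copy, so the following is a hedged sketch of the idea: search for a term (or one of its synonyms) in one utterance, followed within a window by a second term. The synonym table and all names are invented stand-ins for the WordNet lookup the module uses.

```python
# Sketch of utterance-level pattern search with synonym expansion.
# The synonym table is an illustrative stand-in for a WordNet lookup.
SYNONYMS = {"convergence": {"convergence", "overlap"}}

def search_pairs(utterances, first, second, window=5):
    """Return (id, id) pairs: a synonym of `first`, then `second`,
    within `window` following utterances."""
    first_set = SYNONYMS.get(first, {first})
    matches = []
    for i, u in enumerate(utterances):
        if any(w in first_set for w in u["text"].lower().split()):
            for j in range(i + 1, min(i + 1 + window, len(utterances))):
                if second in utterances[j]["text"].lower().split():
                    matches.append((u["id"], utterances[j]["id"]))
    return matches

chat = [
    {"id": 1103, "text": "that would still have to account for the overlap that way"},
    {"id": 1105, "text": "ok"},
    {"id": 1107, "text": "an idea: each cube is assigned to 3 edges"},
]
print(search_pairs(chat, "convergence", "cube"))
```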

The search is made at utterance level. The program checks the utterances one by one and, if there is a match between a part of the utterance and the searched expression, both the utterance and the specific text that matched are indicated. PatternSearch is used in several other modules: cue-phrase identification, implicit-link identification and adjacency-pair identification. A complete description of the module is presented in Appendix 2.

Content Analysis

The content analysis identifies the main concepts of the chat or forum using the NLP Pipe, cue phrases and graph algorithms. It also identifies speech acts and argumentation types of utterances (as in Toulmin's theory: Warrant, Concession, Rebuttal and Qualifiers (Toulmin, 1958)). The first step in finding the chat subjects is to strip the text of irrelevant words (stopwords), text emoticons (like ":)" or ":P"), special abbreviations used while chatting (e.g., "brb," "np" and "thx") and other words considered irrelevant at this stage. The next step is the tokenization of the chat text. Recurrent tokens and their synonyms are considered as candidate concepts in the analysis. Synonyms are retrieved from the WordNet lexical ontology (http://wordnet.princeton.edu). If a concept is not found in WordNet, likely misspellings are searched for; if this is successful, the synonyms of the suggested word are retrieved. The last stage in identifying the chat topics consists of a unification of the candidate concepts discovered in the chat. This is done by using the synonym list of every concept: if a concept in the chat appears in the list of synonyms of another concept, then the two concepts' synonym lists are joined (in version 2 of the service, some of the processing done in WP4 will also be considered for concept identification). At this point, the frequency of the resulting concept is the sum of the frequencies of the two unified concepts. This process continues until there are no more concepts to be unified. At this point, the list of resulting concepts is taken as the list of topics for the chat conversation, ordered by their frequency.
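The unification step can be sketched as follows. This is an illustrative toy implementation, not the project code: the concept names, frequencies and synonym sets are invented for the example.

```python
# Sketch of concept unification: if a candidate concept appears in another
# concept's synonym list, merge the two (joining synonym sets and adding
# frequencies), and repeat until a fixed point is reached.
def unify_concepts(freq, synonyms):
    merged = True
    while merged:
        merged = False
        for a in list(freq):
            for b in list(freq):
                if (a != b and a in freq and b in freq
                        and b in synonyms.get(a, set())):
                    freq[a] += freq.pop(b)
                    synonyms[a] = synonyms.get(a, set()) | synonyms.pop(b, set())
                    merged = True
    # Topics = unified concepts ordered by frequency.
    return sorted(freq.items(), key=lambda kv: -kv[1])

freq = {"talk": 4, "speak": 3, "cube": 5}
syn = {"talk": {"speak", "converse"}, "speak": {"talk"}, "cube": {"block"}}
print(unify_concepts(freq, syn))  # -> [('talk', 7), ('cube', 5)]
```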

LTfLL -2008-212578

12

D 5.2 Learning Support and Feedback

In addition to the above method for determining the chat topics, we used an alternative surface-analysis technique to infer them. Observing that new topics are generally introduced into a conversation using standard expressions such as "let's talk about email" or "what about wikis," we constructed a simple and efficient method for deducing the topics in a conversation by searching for patterns containing specific cue phrases. The topics of the chat may also be detected starting from the connected components of the interaction graph constructed from the explicit and implicit links described in the next section.

Speech acts were introduced by Austin and then elaborated by Searle and others (Jurafsky & Martin, 2009). They are classifications of utterances according to the action they fulfill. The list of speech acts considered in the T 5.1 system is derived from DAMSL (http://www.cs.rochester.edu/research/cisd/resources/damsl/RevisedManual/): Statement, Info Request, Declarative Question, Wh-Question, Action Directive, Conventional, Agreement, Accept, Reject, Partial Accept, Partial Reject, Maybe, Understanding, Answer, Thanks, Sorry, Opinion, Greeting, Noise, Continuation, Exclamation.
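The cue-phrase technique for topic introduction described above can be sketched with a few regular expressions. The pattern list here is a toy assumption for illustration; the actual service uses the PatternSearch module with a richer cue-phrase inventory.

```python
import re

# Topics are often introduced with fixed expressions; a capture group in
# each cue pattern recovers the candidate topic word.
CUE_PATTERNS = [
    r"let'?s talk about (\w+)",
    r"what about (\w+)",
    r"let'?s move on to (\w+)",
]

def cue_phrase_topics(utterances):
    topics = []
    for text in utterances:
        for pat in CUE_PATTERNS:
            m = re.search(pat, text, re.I)
            if m:
                topics.append(m.group(1).lower())
    return topics

chat = ["hi all", "let's talk about email", "ok", "what about wikis?"]
print(cue_phrase_topics(chat))  # -> ['email', 'wikis']
```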

Implicit Links Identification

In addition to explicit links, stated in chats by the referencing facility of the VMT environment and in forums by the reply link, implicit links are also identified. The advanced NLP and basic discourse analysis sub-layer uses the results of the previous two sub-layers to identify various types of implicit links:
- Repetitions (of ordinary words or Named Entities);
- Lexical chains, which identify relations among the words in the same post or utterance or in different ones, by using semantic similarities (the semantic sub-layer);
- Adjacency pairs (Jurafsky & Martin, 2009) — pairs of specific speech acts: answers to a single question in a limited window of time (in which the echo of the "voice" of the question remains), greeting-greeting, etc.;
- Co-references.
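Adjacency-pair detection within a time window can be sketched as follows. The utterance layout (dicts with `id`, `time`, `speaker`, `act` fields), the acts matched and the 60-second window are illustrative assumptions, not the project's data model.

```python
# Sketch: link a question ("Info Request") to answers that follow within a
# limited time window, while the echo of the question's voice is assumed
# to remain; same-speaker answers are skipped.
def adjacency_pairs(utterances, window_seconds=60):
    pairs = []
    for i, u in enumerate(utterances):
        if u["act"] != "Info Request":
            continue
        for v in utterances[i + 1:]:
            if v["time"] - u["time"] > window_seconds:
                break  # utterances are assumed time-ordered
            if v["act"] == "Answer" and v["speaker"] != u["speaker"]:
                pairs.append((u["id"], v["id"]))
    return pairs

chat = [
    {"id": 1, "time": 0,  "speaker": "ana", "act": "Info Request"},
    {"id": 2, "time": 20, "speaker": "bob", "act": "Answer"},
    {"id": 3, "time": 90, "speaker": "cai", "act": "Answer"},  # too late
]
print(adjacency_pairs(chat))  # -> [(1, 2)]
```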

Implicit links, with the exception of lexical chains (Appendix 3 contains more details about the detection of lexical chains) and co-references (detected using the BART system, see http://bart-coref.org/), are detected using the cue-phrase identification system


(PatternSearch) and LSA. For version 1 of the services, LSA has been considered as an alternative to computing semantic similarities using a domain ontology.

Words, Key Concepts, Voices, Threads

In addition to existing approaches to analyzing chats, which are mainly based on analyzing pairs of utterances (Dong, 2006; Joshi & Rosé, 2007), we use a thread-based analysis, starting from Mikhail Bakhtin's ideas of multivocality (heteroglossia), polyphony and inter-animation (Bakhtin 1981, 1984). The polyphony-based theoretical framework founded on Bakhtin's theories is centered on the idea of a co-presence of multiple voices, which may be considered as particular positions taken by one or more persons when they emit an utterance; each utterance has explicit and implicit links to, or influences on, the other voices. In the implementation of our analysis tool, we start from the key concepts and associated features [1] that have to be discussed. Each participant is assigned a position to support, which corresponds to a key concept. Implicitly, that corresponds to a voice emitting that concept and the associated features. We identify additional voices in the conversation by detecting recurrent themes and new concepts. Therefore, a first, simple perspective is a word-based approach to voices: we consider that a repeated (non-stop) word becomes a voice. The number of repetitions and some additional factors (e.g., presence in some specific patterns) are used to compute the strength of that voice (word). This perspective is also consonant with Vygotsky's (1978) ideas that words are socially constructed artifacts, or tools for group knowledge construction. An example of an artifact in a CSCL chat for solving a geometry problem is the phrase "60/90/60", the degrees of the angles of a triangle, which is used many times by the participants of a VMT chat.
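The word-as-voice view can be sketched as below. The stopword list, tokenization and the repetition threshold are toy assumptions; the additional pattern-based strength factors mentioned above are omitted.

```python
from collections import Counter
import re

# Sketch: a repeated non-stopword becomes a "voice" whose strength is its
# repetition count. The stopword list here is a tiny illustrative one.
STOPWORDS = {"the", "a", "of", "to", "and", "is", "that", "we", "it", "so"}

def voice_strength(utterances, min_repetitions=2):
    counts = Counter(
        w for text in utterances
        for w in re.findall(r"[a-z0-9/]+", text.lower())
        if w not in STOPWORDS
    )
    return {w: c for w, c in counts.items() if c >= min_repetitions}

chat = ["the angles are 60/90/60", "so 60/90/60 works", "60/90/60 again",
        "cube edges", "cube faces"]
print(voice_strength(chat))  # -> {'60/90/60': 3, 'cube': 2}
```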
We use voices to keep track of the position that each participant has to support, in order to identify divergences and conjunctions. This position is, as mentioned above, an implicit voice. For a given small period of time, the last utterances are echo-like voices; for example, answers may be associated with questions in a given time window. Voices influence each other through explicit or implicit links. In this perspective, voices correspond to threads. A thread may be a reasoning or argumentation chain (Toulmin, 1958), a chain of rhetorical schemas, chains of co-references, lexical chains or even just chains of repeated words, in the sense of Tannen (1989). The identification of argumentation chains, rhetorical schemas or co-references in texts and conversations is a very difficult task for Natural Language Processing. Chains of repeated words, however, are very easy to detect, the sole problem being the elimination of irrelevant repeated words such as stop-words. Lexical chains can also be detected very easily, but their

[1] In the second version, a domain ontology will also be used.


construction is more difficult and the results are greatly influenced by the choice of semantic similarity measures.

Polyphony, Inter-animation and Collaboration

In polyphony, the most advanced kind of musical composition, a number of melodic lines (or "voices," in an extended, non-acoustical perspective) jointly construct a harmonious musical piece, generating variations on one or several themes. Dissonances should be resolved, even if several themes (melodies) or theme variations are played simultaneously, and even if the voices sometimes situate themselves in opposing positions. Voices in polyphonic music have two dimensions: the sequential threading of utterances or words, and the transversal one implicitly generated by the coincidence of multiple voices (Trausan-Matu & Rebedea, 2009). In addition, another dichotomy, the unity-difference (or centrifugal-centripetal; Bakhtin, 1981) opposition, may also be observed. Bakhtin (1981) considers that multiple voices are also present in texts, and that sometimes they inter-animate, constituting a polyphonic framework (Bakhtin, 1984). Extrapolating this idea, we observe that inter-animation of voices following polyphonic patterns can be identified in dialogs in general and in chats in particular. A polyphonic collaboration involves several participants who play several themes and their variations in a game of sequential succession and differing positions. The existence of different voices introduces "dissonances": unsound, rickety stories or solutions. Wegerif advocates the use of a dialogic framework for teaching thinking skills by stressing inter-animation: "meaning-making requires the inter-animation of more than one perspective" (Wegerif, 2006). He proposes that "questions like 'What do you think?' and 'Why do you think that?' in the right place can have a profound effect on learning" (Wegerif, 2007). However, he does not develop the polyphonic feature of inter-animation.
Multivocality means that several voices permanently enter into competition. Each utterance is filled with "overtones" of other utterances. A first problem is to detect these overtones. To this end we can start from implicit or explicit links. From these links a graph is constructed, connecting utterances and, in some cases, words. In this graph, threads may be identified as sequences of implicit or explicit links that constitute voices. Simple examples of threads are repetitions of words or lexical chains. The same utterance may, of course, be included in several threads. There are always several voices that interact, for example the writer, the potential reader, and the echoes of the voices present in each word. Moreover, from this multivocality perspective, texts become meaning-generation mechanisms, facilitating understanding and creative thought, as Lotman stated (Wertsch, 1991; Dysthe, 1996). A consequence is that in education, "the interaction of oral and written discourse increased dialogicality and multivoicesness and therefore provided more chances for students to learn than did


talking or writing alone" (Dysthe, 1996). The dialogic and multivoiced features of any utterance, even a written one, may be unifying factors for the integration of the modules in the language-centered LTfLL project. Therefore, starting from the ideas of T 5.1, an integrated framework may be provided for analysing all the textual learning activities, such as searching documents, reading, writing summaries or forum posts, and chatting.

Contribution Analyser

The evaluation of the contributions of each learner considers several features, such as the coverage of the expected concepts, readability measures (see Appendix 4) and the degree to which they have influenced the conversation or contributed to the inter-animation. In terms of our polyphonic model, we evaluate to what degree learners have emitted sound and strong utterances that influenced the following discussion or, in other words, to what degree their utterances became strong voices by generating new and long threads. The automatic analysis also considers the inter-animation patterns among threads in the chat. It uses several criteria, such as the presence in the chat of questions, agreement, disagreement or explicit and implicit referencing. In addition, the strength of a voice (of an utterance) depends on the strength of the utterances that refer to it. If an utterance is referenced by other utterances that are considered important, that utterance obviously also becomes important. By computing importance in this way, the utterances that started an important conversation within the chat, as well as those that began new topics or marked the passage between topics, are more easily emphasized. If the explicit relationships were always used and as many as possible of the implicit ones could be correctly determined, then this method of calculating the contribution of a participant could be considered successful (Trausan-Matu, Rebedea, Dragan, & Alexandru, 2007).
The above ideas of polyphonic assessment are combined with the results of two additional sets of evaluations (more details are provided in Appendix 4):

1. From the utterance assessment perspective, the following three steps are performed after the NLP Pipe:
1.1. Evaluate each utterance individually, taking into consideration the following features:
- the effective length of the initial utterance;
- the words that remain after eliminating stop words, spell-checking and stemming, and their number of occurrences;
- the level at which the current utterance is situated in the overall thread;
- the correlation/similarity with the overall chat;
- the correlation/similarity with a predefined set of topics of discussion.
1.2. Augment the importance of utterance marks in the middle of threads using a Gaussian distribution;
1.3. Determine the final grade for each utterance in the current thread using only explicit links, thus obtaining a relative mark for each utterance in the corresponding thread.
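Steps 1.2 and 1.3 can be sketched as follows. The Gaussian width, the damping weight and the backward propagation of importance along explicit links are all illustrative assumptions, not the actual formula used by the service.

```python
import math

# Sketch: boost mid-thread marks with a Gaussian weight, then propagate
# importance backwards along explicit links (an utterance referenced by an
# important utterance becomes more important).
def thread_marks(base_marks, links, sigma=1.0, damping=0.5):
    n = len(base_marks)
    mid = (n - 1) / 2.0
    boosted = [m * math.exp(-((i - mid) ** 2) / (2 * sigma ** 2))
               for i, m in enumerate(base_marks)]
    final = boosted[:]
    for src in range(n - 1, -1, -1):      # walk backwards in time
        for dst in links.get(src, []):    # src explicitly refers to dst
            final[dst] += damping * final[src]
    return [round(x, 3) for x in final]

# Utterance 2 refers back to utterance 0, so utterance 0 gains importance.
print(thread_marks([1.0, 1.0, 1.0], {2: [0]}, sigma=10.0))
```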


2. From the participants' assessment perspective, evaluation is made at two different levels:
2.1. At a surface level, consisting of the following:
- readability of all the utterances regarded as a document;
- proxes derived from Page's essay grading techniques (see Appendix 4), consisting of the following aspects: fluency, spelling, diction and utterance structure;
- Social Network specific metrics computed on the matrix of interchanged utterances, based entirely on explicit links: degree (in-degree, out-degree), centrality (eigen-centrality, closeness, graph centrality) and user ranking based on the well-known Google PageRank algorithm.
2.2. At a semantic level, using semantic similarity based on LSA and social network analysis on a matrix obtained from the previously assessed utterances and their corresponding marks. For instance, an edge in the network is now represented not by the number of utterances exchanged (as in the surface-level analysis), but by the sum of the marks determined for each utterance.

The main difference between the two levels is that the first assesses the actual involvement of each participant (gregariousness, openness to the community), whereas the second performs an actual knowledge interpretation; therefore, participant competency in the specified domain is evaluated.

Results

C&F-AFS supports the analysis of collaboration among learners: it produces different kinds of information about chat and forum discussions, both quantitative and qualitative, such as various metrics or statistics, and content analysis data, e.g., the coverage of the key concepts related to executing a task, the understanding of the course topics, or the inter-threaded structure of the discussion. In addition, C&F-AFS provides feedback, both directive and facilitative (see Deliverable 5.1), telling the learners what was good or wrong in their interactions and facilitating their understanding.
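The surface-level SNA metrics listed in point 2.1 can be sketched as below. The damping factor (0.85), iteration count and the small example matrix are illustrative assumptions; a real implementation would use a graph library, this is only a stand-in to show the computation.

```python
# Sketch: in/out-degree plus a plain power-iteration PageRank computed on
# the matrix of interchanged utterances (matrix[i][j] = number of
# utterances participant i addressed to participant j).
def sna_metrics(matrix, damping=0.85, iters=50):
    n = len(matrix)
    out_deg = [sum(row) for row in matrix]
    in_deg = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            flow = sum(rank[i] * matrix[i][j] / out_deg[i]
                       for i in range(n) if out_deg[i] and matrix[i][j])
            new.append((1 - damping) / n + damping * flow)
        rank = new
    return out_deg, in_deg, [round(r, 3) for r in rank]

m = [[0, 3, 1],
     [2, 0, 0],
     [1, 0, 0]]
out_deg, in_deg, rank = sna_metrics(m)
print(out_deg, in_deg, rank)
```

Here participant 0, who is addressed by both others, ends up with the highest ranking.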
It provides data about the involvement of each learner, generates a preliminary assessment and visualizes the interactions and the social participation. Finally, the system identifies and visually highlights the most important chat utterances or forum posts (those that express different opinions, missing topics/concepts, misleading posts, misconceptions or wrong relations between concepts). The results of the contribution analyzer are annotated in the XML file of the chat or forum. The annotations concern utterances, e.g., a numeric mark such as 8.15 together with dialog and argumentation acts such as Continuation, Info Request, Statement and Claim, and participants, as a set of numeric scores per participant (e.g., 20.07, 13.16, 25.63, ..., 17.46).

These values are used for generating textual feedback, which includes, besides the above numerical values:
- the list of the most important (used, discussed) concepts in a chat/forum;
- the coverage of the important concepts specified by the tutor;
- the most important utterances of each participant (the ones with the largest scores; the score of an utterance uses a complex formula that takes into account the concepts used, dialog acts, the links between utterances and SNA factors);
- a score for each participant in the conversation;
- the areas of the conversation with important collaboration (inter-animation, argumentation, convergence and divergence);
- a grade for the collaboration in the whole conversation;
- other indicators and statistics that will be added as the service/system develops.

As graphical feedback, the service provides an interactive visualization and analysis of the conversation graph, with filtering enabled. The graphical representation of chats was designed to facilitate an analysis based on Bakhtin's polyphony theory and to permit the best visualization of the conversation. For each participant in the chat there is a separate horizontal line in the representation, and each utterance is placed on the line corresponding to the issuer of that utterance, taking into account its position in the original chat file and using the timeline as a horizontal axis (see Figure 6). Each utterance is represented as a rectangular node whose horizontal length is proportional to the textual length of the utterance. The distance between two different utterances is


proportional to the time between the utterances (Trausan-Matu et al., 2007). The interface is implemented via widgets (see section 3.1).

Figure 6 — Chat visualization widget

The explicit and implicit references between utterances are depicted using connecting lines of different colors. The user may explore the visualization in several ways. First, there are several options for changing the colors of the threads or other features of the diagram, such as zooming or scaling. Clicking on the rectangle of an utterance displays the threads in which it is included (Figure 7).

Figure 7 — Visualization of a conversation thread


Threads of repeated words and patterns may be visualized with the "special threads" option, each thread being represented with a different color (Figure 8). This visualization allows seeing the inter-animation among the topics supposed to be discussed (the "voices" and their polyphonic weaving). For example, in a chat with good collaboration the threads often pass from one participant to another.

Figure 8 — Visualization of specific threads, indicated by the user

Conclusion for T 5.1

The T 5.1 service is aimed at supporting learners in chats and discussion forums. It provides textual and graphical feedback, both directive and facilitative, at several levels: utterance (post), participant and conversation. The service uses Natural Language Processing, Social Network Analysis and polyphony-based analysis for generating quantitative and qualitative data. It also offers a graphical visualization and exploration of the interactions, which allows a quick glimpse of the inter-animation and, therefore, of the collaboration in chats and forums.

2.3. Technical Description of T 5.2 Service

Challenges for T 5.2 Service

As expressed in section 1.2, the aim of the T 5.2 service is to give learners assessments of their written productions, and thus to make the teachers' assessment process smoother. The validation of the T 5.2 Showcase was the first step to test and validate some features of the service. The T 5.2 Showcase used a reading and writing cycle to assist students undertaking text revisions. The reading loop uses LSA to identify texts on selected topics. During the writing loop, students put their understanding of topics in writing, and


LSA is used to provide feedback on how well the student has understood the text. This service received quite good evaluations from its users, though it would benefit from refinement. Its validation (see D 7.2, Appendix 7, pp. 78 et sq. for more details) highlighted the following points:
– Students not only have to summarize separate pieces of text, but also to write syntheses from multiple sources after reading them (e.g., different parts of a course or articles).
– A key benefit of the service is that it supports learners in a familiar situation: reading texts and then summarizing them. They frequently read texts from the web to understand and refine concepts and then make notes.
– Teaching managers stated that the system is ready to be used for teaching, an advantage being that the system only provides texts relevant to the course, whereas the internet can provide large amounts of only slightly relevant material.
– The system would benefit from a more adaptive approach to identifying texts, based on how well the learner performed in the writing cycle.
– Transparency in how the system came to its conclusions is needed, so that learners can identify the specific areas that need more attention.

These previous achievements led us to implement Version 1 of the T 5.2 service with the following main improvements. First, the prototype allows text summarization not only from single texts (e.g., an article summary) but also from a set of documents (e.g., a synthesis). We propose to ask students to synthesize the courses, an activity which promotes and assesses their own understanding (Palincsar & Brown, 1984). Feedback is provided on the learning and the synthesis of courses, which is a common task in several teaching domains (Kirkpatrick & Klein, 2009). We propose several kinds of feedback that help students understand both the texts they read and the pieces of text they write as a synthesis.
This feedback helps students focus on the information they need to understand in the source texts. Moreover, it directly indicates the parts of the source texts lacking in the synthesis and the parts of the synthesis to revise. The Solution Scenario we designed (see D 3.2, section 3.3.2, pp. 25 et sq.) describes the final version (v. 2) with two kinds of feedback and support. The first kind of feedback is a "reflexive feedback" (Butler & Winne, 1995; Hattie & Timperley, 2007) that fosters self-directed learning and knowledge-building processes. Its main goal is to help students formulate questions on the course documents before reading them and starting the synthesis activity. It relies on an inquiry activity in which students formulate a focus question and reflect on their prior knowledge and ideas on the topic. Formulating learning questions about what they will read helps students collect relevant information in the texts and organize it in the synthesis to be written. The second kind of feedback is an "object feedback" on the pieces of text produced by the student. It aims at supporting students in their synthesis task and focuses on the semantic content of the synthesis. This feedback is delivered in two ways: first, as immediate, computer-based feedback, as many times as necessary; second, as delayed


feedback, given by teachers and tutors via the system. To help tutors or teachers deliver adequate feedback on students' syntheses, a list of recurrent understanding and writing problems based on the categorization of Thibaudeau (2000) is provided. This categorization of problems is built according to a design-patterns approach (Hübscher & Frizell, 2002). The visualization of these pieces of feedback (texts, graphics, etc.) and the conditions and modalities of their display rely on the model of Dufresne et al. (2003), which precisely defines the specifications of feedback to be displayed in an online educational context. The challenge we face is twofold: to devise and implement cognitive models of written assessment, and to use these models to build a comprehensive set of feedback to foster students' knowledge building. We now describe the methods used in the implementation of Version 1 of the T 5.2 service, which tests the effect of some of the feedback presented above.

Methods carried out for version 1

In this first version of the T 5.2 service, we focus only on the immediate feedback based on LSA-based computational cognitive models. LSA has been argued to be useful for modeling human semantic memory (Landauer & Dumais, 1997). This model can be used to simulate the understanding of texts and to analyze summaries of short explanatory texts. For example, Foltz, Kintsch and Landauer (1998) showed that LSA can be used to measure the coherence of a text and that the understanding of a text depends on this coherence. This capability of LSA has also been used in other systems, such as Apex (Lemaire & Dessus, 2001), which uses the same measure of coherence to give feedback, or Summary Street, which sends feedback about the relevance and redundancy of sentences (E. Kintsch et al., 2000; Wade-Stein & Kintsch, 2004).
Several types of immediate feedback have been implemented in Version 1: on the coherence of the synthesis, the relevance of its sentences, its completeness, and the generation of an outline of the synthesis.
– The coherence assessment model (Foltz et al., 1998) was used (two consecutive sentences are coherent if their semantic proximity is above a threshold). This model can assess whether two given sentences (in the same paragraph) are coherent. Moreover, a coherence gap usually appears between the last sentence of a paragraph and the first sentence of the next. In that case we nevertheless chose to indicate a lack of coherence, for two reasons: first, a coherence lack may indicate a bad transition between paragraphs; second, students may insert wrong paragraph returns.
– The relevance assessment model reused Summary Street's model (a relevant sentence is a sentence similar to at least one sentence of the source texts).
– For the completeness assessment model (does the synthesis cover all the course text topics?), we propose two alternative models. The first is a measure with respect to the course text topics (a topic is a keyword representing the gist of a text), i.e., the semantic proximity between the synthesis and the block of sentences linked to the topic in the course texts. A sentence of a course text is in this block if the semantic proximity between the sentence and the topic is high enough. If the semantic proximity between




the block of sentences and the topic is high, then the topic is covered; if not, the student is prompted that the topic is not covered. The second is a measure with regard to the semantic relation between the sentences of the course text and the sentences of the synthesis (a sentence of the course text is indicated as not summarized in the synthesis if there is not enough semantic proximity between this sentence and any sentence of the synthesis).
– The outline generation model, which prompts the student with a picture (e.g., a map or a diagram) representing the course topics as they appear in the synthesis.
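The coherence, relevance and (second-alternative) completeness checks can be sketched with cosine comparisons over precomputed sentence vectors. The 2-dimensional vectors and the 0.3 thresholds below are toy stand-ins for real LSA vectors and calibrated thresholds.

```python
import math

def cos(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u)) *
           math.sqrt(sum(b * b for b in v)))
    return num / den if den else 0.0

# Sketch of the three checks; each returns the indices to flag.
def assess(synthesis, sources, coh_t=0.3, rel_t=0.3, comp_t=0.3):
    # Coherence: flag consecutive synthesis sentences that are too far apart.
    coherence_gaps = [i for i in range(len(synthesis) - 1)
                      if cos(synthesis[i], synthesis[i + 1]) < coh_t]
    # Relevance: flag synthesis sentences similar to no source sentence.
    irrelevant = [i for i, s in enumerate(synthesis)
                  if max(cos(s, src) for src in sources) < rel_t]
    # Completeness: flag source sentences covered by no synthesis sentence.
    not_covered = [j for j, src in enumerate(sources)
                   if max(cos(src, s) for s in synthesis) < comp_t]
    return coherence_gaps, irrelevant, not_covered

synth = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
srcs = [[1.0, 0.1], [0.2, 1.0]]
print(assess(synth, srcs))  # -> ([1], [], [])
```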

We implemented the non-topic-based feedback in version 1 (coherence, relevance, and the second alternative of the completeness model), and worked in parallel on the detection of topics. Since we have two types of feedback based on topics (synthesis completeness, outline), we have to propose and test models that generate such topics (keywords). Several methods for extracting keywords were tested. The first one considers that good keywords are words semantically related to the general meaning of the text. The method therefore consists in computing the LSA vector of the text by means of a classical sum of the vectors of its words. Then all the words of the corpus are successively considered in order to find those whose vectors are closest to the text vector. The closest ones are considered as keywords because they are semantically close to the meaning of the text. It is worth noting that this method can produce keywords from the corpus that are not present in the analyzed text, although this is quite unlikely. In order to assess the relevance of this method, we asked 30 participants (2nd-year Master students) to provide 5 keywords for each of 6 texts. These French texts were between 570 and 2,532 words long and were about the internet and networks. We compared human and model keywords and unfortunately found that correlations between the cosines of automatically extracted keywords and the frequencies of human keywords were very low. Although the extracted keywords seem relevant, they were quite different from those provided by participants. We believe two facts may explain this finding: first, participants tend to select domain-specific keywords, whereas the method often outputs general words (such as content, technique, users, etc.); second, participants mostly provide nouns, whereas the method may produce verbs or adjectives, since any word of the corpus may be chosen as a keyword, provided that it is close enough to the text vector.
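The first keyword-extraction method can be sketched as below. The 2-dimensional vectors are toy stand-ins for LSA vectors; the vocabulary and values are invented for the example.

```python
import math

def cos(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u)) *
           math.sqrt(sum(b * b for b in v)))
    return num / den if den else 0.0

# Sketch: the text vector is the sum of its word vectors; the corpus words
# closest (by cosine) to that vector are proposed as keywords. Note that a
# keyword may come from the corpus even if absent from the text itself.
def keywords(text_words, corpus_vectors, k=2):
    dim = len(next(iter(corpus_vectors.values())))
    text_vec = [sum(corpus_vectors[w][d] for w in text_words
                    if w in corpus_vectors)
                for d in range(dim)]
    ranked = sorted(corpus_vectors,
                    key=lambda w: -cos(corpus_vectors[w], text_vec))
    return ranked[:k]

vecs = {"network": [1.0, 0.2], "internet": [0.9, 0.3],
        "router": [0.8, 0.1], "cooking": [0.0, 1.0]}
print(keywords(["network", "router", "network"], vecs))
```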
We will further investigate this problem, in particular by filtering keywords based on their specificity and their POS tags. We also tried another method, which is promising but much more complicated. It is based on the integration of a cognitive model of text comprehension and a model of word meaning, combining the Construction-Integration model (W. Kintsch, 1998) and LSA. An implementation is presented in Lemaire, Denhière, Bellissens and Jhean-Larose (2006). Each sentence of the text is successively considered by the model, which first retrieves semantic neighbors for each word and then keeps only those relevant with respect to the general meaning of the portion of the text analyzed so far. Said differently, neighbors that are close to a word but not to the text vector are ruled out. For example, if the sentence is "how planes fly", the model may retrieve the words "bird, wing, airplane" as neighbors of "fly", but "bird" will be removed when compared to


the vector of the sentence. At the end, each word and sentence is given an activation value proportional to its contribution to the general construction of the text's meaning. This model has been shown to account well for the comprehension of very well controlled material. However, when applied to raw texts, it did not provide satisfactory results: only very general words are highly activated. We need to investigate that issue further. Let us now describe how the feedback, based on the LSA cognitive models described in the previous section, can be displayed to students. We chose to design the feedback (see Figure 11, part 2) in only one format: the sentence with a detected problem is underlined and a tooltip warns the student of the problem. For example, in Figure 9, there are two detected problems in the underlined sentence: a coherence problem (the two contiguous sentences are semantically far from each other) and a pertinence problem (the sentence doesn't appear to be very important). We believe this format is not cognitively demanding.

Figure 9 — Feedback on a given sentence of the synthesis, prompted in a tooltip. “Sentence 2: — lacks coherence: this sentence and the previous one [and the following one] are semantically far from each other. — lacks pertinence: this sentence appears to be not so important.”

The service also delivers feedback about the completeness of the synthesis. This feedback is not currently related to synthesis topics (because further validation tests are necessary). However, for each sentence of the course text, the feedback indicates whether it is semantically linked to a sentence of the synthesis (see Figure 10).


Figure 10 — The tooltip delivers feedback on the extent to which a sentence of the synthesis covers parts of the source text: “No sentence of your synthesis is related to this one”

No reflexive feedback is implemented in version 1 yet. We propose the use of a notepad in which learners can write questions and notes about their learning process (Hadwin & Winne, 2001). The notepad is divided into four parts, according to the types of utterance Scardamalia and Bereiter (1996) identified in their analysis of CSCL chat discussions. The four panes of the notepad are: issues addressed in the read texts; ideas from the read texts to report and summarize in a synthesis; ideas from the participants' individual knowledge; and encountered problems, questions and points to detail. This notepad allows students to monitor their learning and the writing of the synthesis. It also allows the collection of data to formalize and test a future computer-based reflexive feedback to be implemented in version 2.

This service delivers feedback to students about the quality of their course synthesis, on a semantic basis. The students log on and select a course domain to synthesize. Then they can freely choose among the following learning activities: read course texts, search for additional texts, write a synthesis, ask for feedback on the synthesis, or fill their notepad with possible research questions, ideas to put in the synthesis, and their difficulties or questions. The search for additional texts according to the texts already read and understood, and the effect of readings, has already been tested with the Showcase (see Deliverable 3.2). During the next Version 1 validation process we won't test this part of the feedback, even though it will later be integrated in Version 2.

Results: Architecture of T 5.2 Service

Layers

The service is organized in four layers: client, service, application logic and storage.

Client layer. Clients (students) use a web interface that lets them read course texts and synthesize them. The interface code is in HTML/Javascript.

LTfLL -2008-212578


D 5.2 Learning Support and Feedback

Service layer. This layer links the interface with LSA and/or the database, using an AJAX approach. Depending on the user's actions, parameters are sent to the server with the XMLHttpRequest object and processed by a PHP script, which in turn invokes the C programs (for using LSA) or issues MySQL queries.
Application logic layer. C programs are invoked to run LSA with the passed parameters, or to recover data directly from files. The LSA application returns a result file containing the required semantic proximities; the service layer transforms this file to communicate the data to the client layer. We use a 24-million-word corpus gathered from newspaper articles, to allow a transfer of experimental results to domains other than the tested one. We currently use a version based on the original Bellcore application (maintained by WP 5.2 staff), and a version using the R programming language (http://www.r-project.org/) is under development in collaboration with WP 2. Our goal is to successively implement each of the initial feedback types as web services using R-LSA.
Storage layer. LSA needs a semantic space to function. We compute it from a text corpus, which depends on the user's knowledge level and the studied domain. Since computing a semantic space is time-consuming, the semantic spaces are computed in advance. When LSA must make comparisons involving a new document (in addition to the corpus), we do not compute a new semantic space but use specific LSA functions (tplus and syn, i.e., the "fold in" technique). Users' data and the texts are stored in a MySQL database. For the integrated version of the services, we plan to replace the user/password authentication process with login through an openID account.
Interface Description
Currently, the pilot is not yet turned into a widget but is a web interface. After being authenticated, the student selects a course domain; the main page is then displayed (see Figure 11).
The main page is split into two main areas. At the top, the student can select and read a text (a course text or an understood additional text) (part 3); at the bottom, the student can write a synthesis (part 1). On the right, one button gives access to a search engine for additional texts (part 4; this button is deactivated in the test of Version 1), another button requests feedback (part 2), and a third displays the notepad (part 5).
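The "fold in" technique mentioned in the storage-layer description above can be sketched numerically: a new document is projected into a precomputed semantic space without recomputing the SVD. The toy term-document matrix, its size and the rank are illustrative assumptions; the actual service uses the Bellcore and R implementations:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
A = np.array([
    [2., 0., 1.],
    [1., 1., 0.],
    [0., 2., 1.],
    [1., 0., 2.],
])

# Precompute the semantic space once (truncated SVD, rank k).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, Sk = U[:, :k], np.diag(s[:k])

def fold_in(doc_vec):
    """Project a new document (term-count vector over the same terms)
    into the existing k-dimensional semantic space."""
    return doc_vec @ Uk @ np.linalg.inv(Sk)

new_doc = np.array([1., 0., 1., 1.])  # counts over the same 4 terms
print(fold_in(new_doc))               # 2-d coordinates in the semantic space
```

Folding in an existing corpus document recovers its original coordinates, which is why the technique is safe for comparing new texts against the precomputed space.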


Figure 11 — Flow diagram of version 1 of T 5.2 service.

During the next weeks (as part of D 2.3), we will transform the current web interface into a web widget. The implementation of a five-part widget is planned:
– synthesis;
– feedback;
– management of course reading and additional-text understanding;
– additional-text search engine;
– notepad.
Data Conceptual Description
The first version of the system needs a database to store the course and additional texts, as well as users' data (name, password, synthesis, read texts). We represent the relations between data in the Merise formalism below: the Data Conceptual Model gives a formal representation of the data, while the relational tables describe the tables used in the MySQL database.


Data Conceptual Model

Figure 12 – Data Conceptual Model of T 5.2 Version 1 Service.


Relational Tables
USERS = { username ; password }
In the current version of the service, this table is used to log a user on. When widgetizing the service, we will adapt it to store openID identifiers (openID authentication).
READTEXT = { read_id ; addtext_id ; username ; read_understood }
This table records which additional texts the user has read and whether the user understood them.
ADDTEXT = { addtext_id ; addtext_title ; addtext_author }
This table stores the title and author of each additional text.
LINESADDTEXT = { linesaddtext_id ; addtext_id ; linesaddtext_nb ; linesaddtext_sentence ; linesaddtext_endparag ; linesaddtext_bold ; linesaddtext_italic ; linesaddtext_underline ; linesaddtext_color ; linesaddtext_highlight }
This table stores the content of the additional texts.
HOLD2 = { hold2_id ; addtext_id ; domain_id }
This table links additional texts to the related course domain, as indicated by teachers.
DOMAIN = { domain_id ; domain_title }
This table stores the titles of the course domains.
SYNTHESIS = { synthesis_id ; notepad_section1 ; notepad_section2 ; notepad_section3 ; notepad_section4 }
This table stores the users' syntheses and the content of the notepad.
LINESSYNTHESIS = { linessynthesis_id ; synthesis_id ; linessynthesis_nb ; linessynthesis_sentence ; linessynthesis_endparag ; linessynthesis_bold ; linessynthesis_italic ; linessynthesis_underline ; linessynthesis_color ; linessynthesis_highlight }
This table stores the content of the syntheses.
WORK = { work_id ; username ; domain_id ; synthesis_id ; work_begin ; work_last }
This table records which synthesis is written by which user for a given course domain.
COURSE = { course_id ; domain_id ; course_title }
This table stores the title of each course and the linked domain.
LINESCOURSE = { linescourse_id ; course_id ; linescourse_nb ; linescourse_sentence ; linescourse_endparag ; linescourse_bold ; linescourse_italic ; linescourse_underline ; linescourse_color ; linescourse_highlight }
This table stores the content of the course texts.
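As an illustration, two of the tables above can be created and queried as follows. The column types, the sample row and the use of SQLite are assumptions for self-containment; the service itself uses MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two representative tables from the relational model above (types assumed).
cur.execute("""CREATE TABLE users (
    username TEXT PRIMARY KEY,
    password TEXT NOT NULL)""")
cur.execute("""CREATE TABLE readtext (
    read_id INTEGER PRIMARY KEY,
    addtext_id INTEGER,
    username TEXT REFERENCES users(username),
    read_understood INTEGER)""")

cur.execute("INSERT INTO users VALUES (?, ?)", ("maria", "secret"))
cur.execute("INSERT INTO readtext VALUES (1, 42, 'maria', 1)")

# Which additional texts has this user read and understood?
cur.execute("""SELECT addtext_id FROM readtext
               WHERE username = ? AND read_understood = 1""", ("maria",))
print(cur.fetchall())  # [(42,)]
```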


Conclusion for T 5.2
Version 1 of the T 5.2 service presented above allows learners to be assessed during their written production, through immediate computer-based feedback on the coherence, relevance and completeness of the written synthesis. This feedback engages learners in a reflexive process concerning the way they build knowledge, and relies on cognitive models currently under validation. After validation of this service, we plan to propose a more integrated version, handling cognitively and pedagogically sound computational models for assessing students' written production.


3. Integration and Validation of Services

3.1 WP2 Integration
The entire system is based on a client-server architecture. The server side is implemented as web services that generate support and feedback information. The client side consumes these web services, processes the data it receives and presents the useful information in a user-friendly interface. To make the integration between the e-learning environment and the user interface of the provided services smoother, the latter is designed to be rendered as a set of widgets inside the environment.
Server side. The server side of the system is implemented in Java, and its interface to the rest of the world is implemented as web services. These are built on top of the Apache Axis2 web-services framework (version 1.5, http://ws.apache.org/axis2/), which in turn runs in a Java servlet container, in our case Apache Tomcat (version 6.0). The more lightweight RESTful web services offered by the framework are employed in the system. Although all communication relies on the HTTP protocol, the Axis2 framework allows for a lot of flexibility, especially concerning the output formats of the implemented web services. Since Axis2 comes with several implementations of message formatters, we can, for example, use both the default Simple Object Access Protocol (SOAP) message format and the Javascript Object Notation (JSON) format at the same time, without any code or configuration changes. The format of the response is resolved based on the content type of the HTTP request.
Client side. The user interface of the system uses the default web building blocks, HTML and Javascript, together with AJAX, to call web services whose outputs are rendered using widgets.
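The content-type-based format resolution described above can be sketched as follows. This is an illustrative stand-in, not the actual Axis2 formatter code, and the simplified SOAP-style envelope is an assumption:

```python
import json

def format_response(payload: dict, content_type: str) -> str:
    """Serialize the same payload as JSON or as a minimal SOAP-like XML
    envelope, depending on the request's content type."""
    if "json" in content_type:
        return json.dumps(payload)
    # Fallback: a drastically simplified SOAP-style envelope (illustrative).
    body = "".join(f"<{k}>{v}</{k}>" for k, v in payload.items())
    return f"<soap:Envelope><soap:Body>{body}</soap:Body></soap:Envelope>"

print(format_response({"score": 0.82}, "application/json"))  # {"score": 0.82}
```

In Axis2 this dispatch is handled by the registered message formatters, so the service code itself stays format-agnostic.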
The widget system allows for easy integration in any learning environment, while the AJAX technology is used to fetch and present the feedback information inside the widget without any page refresh. The widget system is based on the Wookie framework (http://cwiki.apache.org/WOOKIE/). Wookie is an open-source implementation of the W3C Widgets Candidate Recommendation (http://www.w3.org/TR/widgets/), which allows small web applications to be embedded into web pages and standardizes their packaging format. Wookie is an Incubator project at the Apache Software Foundation. Using the Wookie framework, the client side of the system actually consists of two components:
1. The Widget Container is the web application that can include widgets in its pages, in our case the learning environment. It takes care of the authentication process and can set properties for the widgets that run inside it. In order to communicate with the Widget Engine, the web application needs some specific code, a plugin.
2. The Widget Engine is the core of the Wookie framework and acts as the widget repository. It provides functionality for adding, editing, removing and, in general, for


managing the widgets that it hosts. The Widget Engine is implemented as Java servlets and runs in the Apache Tomcat container.
The widget itself uses AJAX to consume the provided web services. Here, the JSON message format that the web services can generate comes in very handy, since it is easy to transform this type of response into Javascript objects that can then be further processed. The Yahoo User Interface library (http://developer.yahoo.com/yui/) that we are using assists us both in managing the AJAX requests and in processing the web-service responses, and it integrates flawlessly with Wookie.
AJAX calls towards external services (external servers) are not permitted by the user's browser, since this would constitute a violation of the Same Origin Policy (http://en.wikipedia.org/w/index.php?title=Same_origin_policy&oldid=314530069). Therefore, all such requests need to be forwarded through a proxy server, a role played here by the Widget Engine. Wookie provides this functionality by creating, for every web-service URL, another URL that actually points to the Widget Engine server, while the original domain name is passed as a parameter.
Concerning WP 5.2, we plan to transform the system into several widgets to be used in a learning environment together with services from other WPs. The first version of the system is developed using AJAX to exchange data between the client and the server. This makes the transformation into widgets smoother; we have also divided the system into different learning parts (see Figure 13) and maintained a common look and feel. Since WP 5 shares some needs with other WPs (such as user authentication or a text management system), we plan to adapt our source code using the source of other widgets (WP 4.1 for course management, WP 4.2 for versioning). This will make the data of the services more exchangeable between WPs.
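The URL-rewriting proxy mechanism just described can be sketched as below. The proxy endpoint and the parameter names are assumptions for illustration, not Wookie's actual ones:

```python
from urllib.parse import urlencode

WIDGET_ENGINE = "http://localhost:8080/wookie/proxy"  # assumed proxy endpoint

def proxify(service_url: str) -> str:
    """Rewrite an external web-service URL so that the AJAX call targets
    the widget engine, which forwards the request server-side; the browser
    then never violates the Same Origin Policy."""
    return WIDGET_ENGINE + "?" + urlencode({"instanceid_key": "demo",
                                            "url": service_url})

print(proxify("http://example.org/ltfll/feedback?lang=en"))
```

The widget's Javascript would call only such rewritten URLs, leaving the original service URL URL-encoded inside the query string.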

Figure 13 — Architecture of widgets.


3.2 WP3 threads
The aim of this section is to explore a bit further how to integrate the WP 5 services in two main instructional settings: informal and formal learning.
Informal Learning Scenarios
In informal learning settings, students can freely explore the working possibilities of the WP 5 services, without being directed by explicit pedagogical scenarios. They design their learning environment from scratch and use some widgets taken from our services to facilitate their workflow. Table 1 represents the possible students' paths across the different LTfLL services. Some features are noteworthy: first, the third to fifth services require texts (e.g., course notes, essays, summaries, syntheses) as input, in order to diagnose the student's position or understanding; second, the first services focus on the assessment of the student's pre-knowledge, so they are likely to be used at the beginning of learning. A sample workflow of a student across our services could be as depicted in Figure 14.
Let us present Maria Smith (see the "additional integration report", pp. 20–21). She is enrolled in several degree courses at her college to get a qualification in the newest trends in IT (e.g., Web 2.0). At the beginning of a week, Maria decides to start learning one of the topics of the course: "LSA-based systems for learning".
She connects to the LTfLL web platform, reflects on the work to come and manages the set of widgets she plans to use accordingly:
– a common PDF reader/annotator widget, for reading and annotating course documents;
– several widgets for analyzing chat and forum sessions (WP 5.1) performed with a chat system or in a discussion forum, for getting feedback about her own performance and that of the teams in whose chats or forums she participated;
– a common word-processing window, for writing the summaries and syntheses related to her understanding of the course content;
– a learning material searcher widget (WP 6.1), because she is not very sure of being able to understand all the notions of this difficult course (so her peers said);
– a level "determinator" widget (WP 4.1), because she quickly has to grasp some core notions of the course topic, and this widget is very useful to indicate the adequate level (or section, chapter) at which to start a course;
– an overall understanding assessment (WP 5.2), because she quickly has to figure out whether she adequately understood the first pieces of the course.
This first widget selection is rather time-consuming. Since her teacher and the tutors can be informed of her choices, Maria thinks this time is a good investment, because each widget smartly interacts with the others in order to help her learn. Table 1 below depicts the following use path. She first uses the level determinator (4.1 Service) to assess her initial knowledge, and types some keywords to find parts of courses of interest. She then uses the PDF reader & annotator widget and takes some notes that could later compose her course synthesis. These notes, in turn, are entered as input into the learning material searcher widget (6.1 Service), which gives some


information on the possible course level at which to actually start the course. She then sets up, in her web browser, the four following widgets:
– the PDF reader/annotator, for reading further course content;
– a word processor, for writing the synthesis of the course, gathering the main pieces of the course she understood;
– a chat analysis widget window;
– the overall understanding assessment widget (5.2 Service), for requesting intermediate feedback on the written syntheses.

Table 1 – Possible Paths for a Student's Workflow using our Services. Read: when students are using the 6.1 service (first row), they are most likely to use the 6.2, 4.1 and 5.2 services (resp. the 2nd, 3rd and 5th columns).
The rows (and, in the same order, the columns 1–6) of the table, with each row's marks listed in order:
1. Learning Material Searcher (6.1): X, XX, XX
2. Assess S Pre-Knowledge & Connect to other people (6.2): XX, X, X
3. S Level Determination. Text as input (4.1): XX, X, X, XX, XX
4. S Conceptual Understanding. Text as input (4.2): XX, X, XX, X, XX, X
5. S Overall Individual Understanding. Text as input (5.2): XX, XX, XX, X, XX
6. Collective Knowledge Building. Discussion as input (5.1): XX, XX, X, XX, XX, XX
Legend: S: Student; X: likely succession; XX: most likely succession.

Figure 14 – A likely student’s workflow across LTfLL services.

Formal Learning Scenarios
In more formal learning settings, students are immersed in specific learning scenarios designed by teachers. The WP 5.2 task focuses on summary and synthesis writing, which are core activities in distance learning. This activity is not only involved in very common situations, like note taking during courses, but is also at hand in more sophisticated


learning activities. Bonk and Dennen (2003, p. 340) listed four main kinds of pedagogical activities in distance learning. It is noteworthy that most of the activities they describe involve summarizing and chatting at one or more steps of their process:
– motivational and ice-breaking activities, allowing participants to introduce themselves to each other; these activities are mainly performed through chat or forum;
– creative-thinking activities, like brainstorming, role-play, topical discussions, web-based explorations and readings, performed through chat, forum, and also web-based searches;
– critical-thinking activities, like electronic polling, the Delphi technique, summary writing, case analyses, web-resource evaluation and virtual debates; like the previous activities, these involve the use of chat, forum and a word processor (summary writing);
– collaborative learning activities, like structured controversy, expert panels, problem-solving activities and publishing work; these latter activities need strong group organization and the use of combined pieces of software like chats, collaborative writing processors, etc.

3.3 Collaboration with WP 4 and WP 6
As expressed in the integration report, we plan to strengthen our collaboration with WP 4 and WP 6. The way a document can be annotated to support learning will be jointly determined (by WP 4, 5 and 6, since all of them use or plan to use the formulation of questions for learning). We plan to devise a set of annotation templates that fit the most common pedagogical or learning intents (e.g., "Introduction, related work, our ideas, evaluation, conclusions", or "New information, New idea, I need to understand, Further Explanation, My theory"). These templates can also depend on the subject domain taught (see http://kbtn.cite.hku.hk and Scardamalia, Bereiter and Lamon, 1994) and will be integrated in our notepad (T 5.2).
Task T 5.1 could benefit in its analysis from some of the modules developed for positioning the learner in WP 4, for example topic detection and the suffix-array algorithm for the identification of repeating phrases. Regarding WP 6, in Version 2 we intend to develop a module that allows the use of ontologies in the conversation processing. From another perspective, WP 5 could also use annotated chats and forums for peer search.

3.4 WP7 Validation Plans
The validation goals for the T 5.1 service are to investigate the extent to which:
– learners get useful feedback immediately after they finish a chat discussion, and just-in-time for forums;
– C&F-AFS offers a graphical visualization that improves the understanding of the conversations;
– the time needed to provide final feedback and grading is reduced;
– it increases the quality of the feedback resulting from analyzing collaborative chat sessions and discussion forums;
– it is easier to maintain consistency of feedback between different tutors;


– the system offers formative feedback to adapt and improve the course by harvesting the large volume of data produced by the learners;
– using C&F-AFS-mediated collaboration improves the learning outcomes of the learners.

The feedback service for chat conversations will be validated during the Human-Computer Interaction (HCI) course, running for 14 weeks in the first semester (ending in February 2010) at the Computer Science Department, "Politehnica" University of Bucharest. The validation will involve the following participants:
– at least 8 undergraduate students, year 4 (senior year);
– 4 tutors / teaching assistants for the HCI course;
– 1 professor for the HCI course.
The forum feedback service will be validated at the University of Manchester, using a forum from the medical domain on the topic "professional behaviour for medical students". The participants are 8 students, 1 student-facilitator and 2 program managers.
The validation procedure concerning the T 5.2 service is under way. It consists of proposing Pensum to 60 first-year students engaged in a distance-learning course provided by CNED (Centre National d'Enseignement à Distance, a French open university), through a WebCT platform. These students have to perform a case study; their task is to read a set of papers carefully and to collectively write a synthesis. In order to understand the content of the papers, each student is invited to freely use Pensum as an external and individual help. Qualitative and quantitative analyses, along the same lines as those performed for the Showcase validation, will be carried out. In parallel, we plan to design and test more cognitive models of topic extraction.


4. Conclusions: Tools and Resources for the Second Cycle of LTfLL
We presented in this report two lines of services that focus on written production (chat or forum conversations, and course syntheses) and ways to assess them. Although these two lines appear to be separate (the first being social-centered and the second essay-centered), they nonetheless share a number of assumptions:
– written productions can be viewed as voices that populate the classroom, among others those of the teacher, tutors, handbook authors, peers, etc.; the inter-animation between these voices can be uncovered with NLP techniques, revealing their relations;
– self-regulated learning, an important characteristic of lifelong learning since students have irregular contacts with their tutors and teacher, can be considered as a loop in which artifacts and peers help students make explicit what knowledge is built and how this knowledge is built;
– ways to highlight the importance of some parts of the text produced by the students (i.e., utterances, paragraphs) are crucial to foster students' understanding and knowledge building, since they direct attention to the most important pieces of text;
– providing students with computer-based artifacts that analyse and display the main features of their written production helps them build knowledge.
The next round of research will be dedicated to the following points. First, to design and manage experiments for validating the services in educational settings. Second, to refine some of the latter experiments to explore two main research paths: social-psychology-based hypotheses (e.g., to what extent our services can help students not feel isolated) and cognitive ones (e.g., to what extent reference to the self and dialogicity help students understand the content of courses).
Third, to explore ways to operationalize Bakhtin's theory for lifelong learning, by providing a comprehensive guide to account for the core notions of this theory (e.g., the "dialogicity" of the interactions in an environment to assess the quality of learning/teaching, the utterance and its boundaries, the inter-animation of voices with cohesion-based measures, etc.). Finally, to provide graphically oriented interfaces that give students a comprehensive perspective on the knowledge built (like that of O'Rourke & Calvo, 2009). Taking these points into account would allow us to design and implement Social and Knowledge Building software (Code & Zaparyniuk, 2009), whose purpose fully suits our own research goals and can be summarized as providing students with:
– shared spaces representing collective contributions;
– ways to link and reference ideas and their development;
– ways to represent higher-order organizations of ideas;
– ways for the same idea to be worked on in various contexts;
– different kinds of feedback systems to enhance self- and group-monitoring;


– opportunistic linking of persons and groups;
– ways for different user groups to customize the environment.


5. Appendices
Appendix 1 — Description of our Services as Fostering Self-regulated Learning
Fostering self-regulated learning for lifelong learning is one of our main goals in this project (see also Deliverable 4.2). Our previous state of the art on feedback (see Deliverable 5.1, section 2.3) underlined the need to promote alternative ways to deliver feedback on writing (Lindblom-Ylänne, Pihlajamäki, & Kotkas, 2006). In doing so, we plan to blend two forms of feedback, verification and elaboration, to improve its effectiveness (Kulhavy & Stock, 1989) and to support students in their knowledge building. We can now briefly elaborate on how our services can support personal and collaborative learning (more information in the "Additional integration report", pp. 8 et sq.).
First of all, each of our services can foster learning in a self-regulated way. Vovides et al.'s (2007) loop represents the tiers in which students are involved during learning. For T 5.1 this loop is presented in Figure 15. The student discusses in a chat or forum. She evaluates whether her utterances fit her goals. The T 5.1 service provides feedback, which the student can compare with her own image of what she uttered.

Figure 15 — Self-regulated learning loop for students using T 5.1 service.

Let us show such a loop also for the T 5.2 service. Individually, each student can read and write texts concerning a given topic (see Figure 16). At any moment (the "stop and think" strategy described by Vovides et al., 2007), the student can write a specific piece of text on a subject he or she wants to understand (or reuse an already-made one, like course notes) and considers important. This is the object level, and this piece of text can be close to a synthesis. The second step is a first kind of assessment, either by the learner or a peer, in which the student assesses the purpose and the quality of the text


(related to the course). Third, the student can be prompted with some information on the relevance of the topics mentioned, the coherence of the synthesis, and its completeness. Then the 'meta' level starts, in which the student is asked to monitor his or her written production (e.g., the synthesis), that is, to compare the service's feedback with his or her own assessment. In turn, the initial text can be modified in light of the last two steps, for a new round.

Figure 16 — Self-regulated learning loop for students using T 5.2 service.


Appendix 2 – The Extended Pattern Language
Elementary Expressions
The elementary expressions that may be used in a search are:
– word: matches any occurrence of 'word';
– "text": matches any occurrence of 'text';
– *: matches any string of 0 or more characters;
– the stem tag: matches any word which has the stem 'stem';
– the synonym tag: matches any synonym of "word";
– the recurrence tags: match any word that also appeared in the last 10 (or last x) utterances;
– the part-of-speech tags (noun, verb, etc.): match any word that was annotated by the POS-tagger with that label.
The tags can be combined in expressions such as: a synonym of a word that appeared in the last 10 utterances; a word with the same stem as a word that appeared in the last 10 utterances; or a synonym of a word that has the stem of "word" (equivalently, that is declined from it). However, not every combination of tags is computable. For example, it cannot be decided for a word "X" whether it is declined from a noun or not, because this version of the program only uses the POS-tagger information about the words in the chat: when saying "X" is declined from "Y", we can no longer tell whether "Y" is a noun, since "Y" is not a word from the chat. This is not the only incomputable expression that can be obtained by combining tags; the program announces when such a combination occurs.
Operators
Three operators are provided in "PatternSearch" for constructing more complex expressions: concatenation, AND and OR.
– Concatenation: Expression1 Expression2. By joining two expressions, a new expression is obtained that matches a text if there exist text1 and text2 such that text1 matches Expression1, text2 matches Expression2 and text = text1 + whitespace + text2.
– AND operator: Expression1 & Expression2 matches a text if both Expression1 and Expression2 match it.
– OR operator: Expression1 | Expression2


Expression1 | Expression2 matches a text if Expression1 matches it or Expression2 matches it.
Normal parentheses may be used for grouping expressions. Expressions are evaluated from left to right by default, except for parentheses, which are evaluated first. For example, Expression1 Expression2 & Expression3 is equivalent to (Expression1 Expression2) & Expression3, but different from Expression1 (Expression2 & Expression3).
Composite Expressions
Composite expressions can be used for finding links between utterances. Their syntax is: Expression1 # Expression2. Just one "#" can appear in a composite expression. Two consecutive utterances, utterance1 and utterance2, match Expression1 # Expression2 if utterance1 contains a text that matches Expression1 and utterance2 contains a text that matches Expression2. The following forms are also accepted:
– Expression1 #[k] Expression2: utterance1 and utterance2 must be at distance k;
– Expression1 #[*] Expression2: utterance1 and utterance2 can be at any distance (at most 10).
An example of a composite expression: (What do you think) | (What is your opinion) # I*
Variables
The program allows the definition of variables. If a line of the input contains the assignment string ":=", the line is interpreted as a variable definition. The name of a variable must begin with the character '$'. The syntax is: $variable := Expression
Filters
Sometimes several occurrences match an expression. We can specify that only the longest (or shortest) one at the level of the whole chat should be selected: Expression @ max (or Expression @ min). Also, a single utterance might contain multiple matches of an expression, so it can be useful to select only the longest (or shortest) match from each utterance: Expression @ max_r (or Expression @ min_r).
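A reduced sketch of the composite-expression matching described above, restricted to plain-word patterns with '*' (the stem, synonym and POS tags of the full PatternSearch language are omitted, and the function names are ours):

```python
import re

def matches(expression: str, utterance: str) -> bool:
    """True if the utterance contains the expression, with '*' matching
    any string of 0 or more characters (plain-word subset only)."""
    pattern = ".*".join(re.escape(part) for part in expression.split("*"))
    return re.search(pattern, utterance, re.IGNORECASE) is not None

def composite(expr1: str, expr2: str, utterances, k: int = 1):
    """Find pairs (i, j) with j - i == k where utterance i matches expr1
    and utterance j matches expr2 (the Expression1 #[k] Expression2 form)."""
    return [(i, i + k) for i in range(len(utterances) - k)
            if matches(expr1, utterances[i]) and matches(expr2, utterances[i + k])]

chat = ["What do you think about LSA?", "I think it works well."]
print(composite("What do you think", "I*think", chat))  # [(0, 1)]
```

The "#[*]" form would simply try every k from 1 up to the maximum distance of 10.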


Appendix 3 – Identification of Lexical Chains
Algorithm
As shown in Carthy & Stokes (2001), a lexical chain is a collection of semantically related words spread throughout a text. In order to determine the lexical chains of a text, we first need to know the semantic distances between the words found in that text. The process evaluates each word found in the text and places it in the lexical chain where it fits best. If the word does not fit in any of the existing lexical chains, a new chain is created and the word is placed in it. In order to evaluate how well a word fits in a lexical chain, we establish a relationship between that chain and the word, which is evaluated to decide whether the word should be added to the chain. We considered that a word can be added to a chain if most of the words of that chain are semantically close to the given word.
Next, we present the algorithm that we used for this task. Besides the text, the algorithm receives two variables, threshold1 and threshold2, in order to build the lexical chains. Threshold1 represents the maximum value of the semantic distance between two words such that they can still be considered semantically connected, while threshold2 represents the minimum percentage of the words in the chain that should be semantically connected to the given word in order to place the word in that chain:

Lexical_chains(text, threshold1, threshold2)
  FOR EVERY word w in the text
    FOR EVERY existing chain c
      IF c contains w THEN continue with the next word from the text
      ELSE continue with the next chain
    FOR EVERY existing chain c
      FOR EVERY word w1 in chain c
        IF sem_dist(w1, w) <= threshold1 THEN count w1 as semantically related to w
      IF no_sem_related(c, w) / no_words(c) >= threshold2 THEN
        introduce word w in chain c
        stop trying the rest of the chains and continue with the next word from the text
      ELSE continue with the next chain
    IF w has not been placed in any chain THEN create a new chain and introduce w in it
End.
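The algorithm above can be sketched as runnable Python. The semantic distance here is a toy shared-prefix stub standing in for the shortest-path and Jiang-Conrath metrics discussed later; the threshold roles follow the pseudocode:

```python
def sem_dist(a: str, b: str) -> int:
    """Toy semantic distance: 0 for identical words, 1 for words sharing
    a 4-letter prefix, large otherwise (stand-in for WordNet-based metrics)."""
    if a == b:
        return 0
    if a[:4] == b[:4]:
        return 1
    return 10

def lexical_chains(words, threshold1=3, threshold2=0.9):
    """Greedy chaining: place each word in the first chain where at least
    threshold2 of the members are within threshold1 of it."""
    chains = []
    for w in words:
        if any(w in c for c in chains):   # word already placed in a chain
            continue
        for c in chains:
            related = sum(1 for w1 in c if sem_dist(w1, w) <= threshold1)
            if related / len(c) >= threshold2:
                c.append(w)
                break
        else:
            chains.append([w])            # no chain fits: start a new one
    return chains

print(lexical_chains(["computers", "computing", "banana", "computer"]))
# [['computers', 'computing', 'computer'], ['banana']]
```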

In the above algorithm, sem_dist(a, b) denotes the semantic distance between a and b, no_sem_related(c, w) represents the number of words from chain c that are semantically related to word w, and no_words(c) represents the total number of words found in chain c. We used two metrics for the semantic distances: shortest path length and Jiang-Conrath. Further information about these distances is presented in the Semantic Distances subsection.
In the case of the shortest path length, the thresholds used in practice were 90% for the percentage of the words in the chain that should be semantically connected to the given


word (threshold2), and 3 for the maximum semantic distance between two related words. The value of 90% for threshold2 has also been kept for the Jiang-Conrath metric, but this time, threshold1 had to be broken in two limits (inferior and superior), since this metric could also have negative values. Therefore, instead of testing if sem_dist(w1,w)