Service-Oriented Collective Intelligence for Multicultural Society

Rieko Inaba 1, Toru Ishida 1, 2
1 National Institute of Information and Communications Technology
2 Department of Social Informatics, Kyoto University

1. Introduction

Globalization has brought multiple cultures into coexistence. Although promoting English as a common language has its advantages, making the effort to learn other languages greatly helps in understanding different cultures. Unfortunately, hundreds of languages are spoken around the world, and learning them all by ourselves is simply too daunting. Machine translation can be useful when it is customized to suit the communities involved. The question, however, is whether machine translation is really that simple to use.

Several websites already offer translation services. Suppose a Japanese teacher enters the line “You have cleanup duty today” in Japanese and translates it into Korean. The sentence “오늘은너가청소당번이야” appears on the screen. Japanese teachers who do not understand Korean cannot tell whether the translation is correct, so they perform back translation: they translate the Korean back into Japanese, and the sentence “You should clean the classroom today!” is displayed. This sentence seems a little rude when spoken; however, it may be acceptable if it is delivered with a smile. If we translate the same line into Chinese in the same manner, the Chinese sentence “今天你是扫除值日哟” appears on the screen. When we back translate the Chinese sentence into Japanese, we obtain a very strange sentence: “Today, you remove something to do your duty.” It appears that the word “掃除当番,” which means duty to clean the classroom, was not registered in the dictionary of this machine translator.

Therefore, elementary schools need a multilingual dictionary of words that are frequently used in schools. It is also useful to create a multilingual list of frequently used expressions: 500 sentences for math, 800 sentences for national language, 300 sentences for school activities, etc.
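The back translation check described above can be sketched in a few lines. This is only a minimal illustration, not the behavior of any particular translation website: the `Translator` type, `back_translate`, and `toy_translate` are hypothetical names, and the lookup table merely replays the example sentences from the text (with the English glosses standing in for the Japanese originals, as in the text).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


@dataclass
class BackTranslation:
    original: str  # sentence typed by the user
    forward: str   # machine translation into the target language
    back: str      # forward result translated back into the source language


def back_translate(text: str, source: str, target: str,
                   translate: Translator) -> BackTranslation:
    """Translate text into the target language and back again."""
    forward = translate(text, source, target)
    back = translate(forward, target, source)
    return BackTranslation(text, forward, back)


# Toy data replaying the example in the text. English glosses stand in for
# the Japanese sentences; a real system would call a translation service.
TOY_TABLE = {
    ("ja", "ko", "You have cleanup duty today"): "오늘은너가청소당번이야",
    ("ko", "ja", "오늘은너가청소당번이야"): "You should clean the classroom today!",
    ("ja", "zh", "You have cleanup duty today"): "今天你是扫除值日哟",
    ("zh", "ja", "今天你是扫除值日哟"): "Today, you remove something to do your duty.",
}


def toy_translate(text: str, source: str, target: str) -> str:
    """Stand-in translator backed by the toy table above."""
    return TOY_TABLE[(source, target, text)]


if __name__ == "__main__":
    for target in ("ko", "zh"):
        result = back_translate("You have cleanup duty today", "ja", target,
                                toy_translate)
        print(f"{target}: {result.forward} -> {result.back}")
```

The user who cannot read the target language inspects only the `back` field; a back translation that drifts far from the original, as in the Chinese case, signals a likely dictionary gap.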
We can then try to replace words or expressions output by machine translators with more accurate ones from local dictionaries. To this end, morphological analyzers are needed to divide the input and output sentences into their constituent parts. Morphological analyzers are often developed in research institutes or universities and are used for research purposes. Their websites do not state whether they can be used in schools, hospitals, etc. If an elementary school wants to use one, the school must obtain permission from the provider by post or by e-mail. It appears that multilingual rooms cannot be easily established without large budgets and sophisticated technologies. In reality, most schools have neither. These schools require service-oriented collective intelligence that connects language services across the world to support their local multicultural life.

In this paper, Section 2 discusses the Language Grid as service-oriented collective intelligence, and Section 3 discusses multilingual communication using the Language Grid. Section 4 concludes the paper.

2. The Language Grid: Service-Oriented Collective Intelligence

In 2002, we performed an intercultural collaboration experiment that involved more than 40 professors and students from China, Japan, Korea, and Malaysia. The experiment aimed at developing open source software, with the participants working in their mother tongues through machine translation. From this experience, we concluded that machine translation is a half-finished product. To use it more effectively, we should utilize local dictionaries that cover the specialized words of each user’s community and culture.

In 2006, the Language Grid project (http://langrid.nict.go.jp/) was launched at NICT, one of the major Japanese government institutes, in order to connect language resources worldwide, including machine translators and dictionaries. There are four types of stakeholders in the Language Grid: the Language Resource Provider, the Computation Resource Provider, the Language Service User, and the Language Grid Operator, who coordinates the other stakeholders. Though various operation models are possible for the Language Grid, we first created a non-profit operation model, which limits the usage of language services solely to non-profit ends. Unlike Wikipedia, the model tries to match the incentives of stakeholders and manages issues associated with intellectual property rights, user privacy, and operation costs.

Language Service Users can use the language and computation resources that are provided. Elementary schools, for example, are Language Service Users. Those schools can also set up their own server and connect it to the Language Grid; they then become Computation Resource Providers. They can further become Language Resource Providers if they register a multilingual dictionary to share with other schools. Thus a school, for example, can assume three roles at the same time.

The Language Grid is a service-oriented collective intelligence platform. Its software has been under continuous development since April 2006, and Kyoto University started its non-profit operation in December 2007. Seventy groups worldwide have signed the agreement to participate in this initiative. Resources registered by participants in the Language Grid include machine translators that cover Chinese, English, French, German, Italian, Japanese, Korean, Spanish, and Portuguese.
Morphological analyzers, dependency parsers, concept dictionaries, and specialized dictionaries for disaster management, tourism, life sciences, etc., have also been registered. Language service providers implement their language service as a Web service with a standard interface, deploy it on the Language Grid, and register its WSDL description and profile. They can also create a new composite language service by defining a workflow over existing services and then deploy it. Information on language services is shared through the Language Grid Service Manager (http://langrid.org/operation/service_manager/). Users can easily use the language services registered in the Language Grid by developing application programs. The Language Grid Playground (http://langrid.org/playground/) was created by university students to show how to develop such programs. Since all of its source code is open, the Playground can serve as a building block for Language Grid applications. Various application activities are ongoing in the Language Grid Association (http://langrid.org/association/), which is a user group of the Language Grid; NPOs and universities have started supporting intercultural collaboration in hospitals, schools, etc.
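To make the idea of a composite language service concrete, the following sketch combines a morphological analyzer, a community dictionary, and a machine translator so that dictionary terms survive translation. It is an illustration under stated assumptions, not the Language Grid's actual interface or workflow: all function names are hypothetical, the placeholder technique is one possible way to combine the services, the whitespace tokenizer stands in for a real morphological analyzer, and the lookup tables are toy data.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical service signatures; the real services are Web services
# whose interfaces are not reproduced here.
Tokenizer = Callable[[str], List[str]]
Translator = Callable[[str, str, str], str]
# Dictionary: (source_lang, term) -> {target_lang: preferred translation}
Dictionary = Dict[Tuple[str, str], Dict[str, str]]


def dictionary_assisted_translate(text: str, source: str, target: str,
                                  tokenize: Tokenizer, translate: Translator,
                                  dictionary: Dictionary) -> str:
    """Composite service: protect dictionary terms, translate, restore terms."""
    placeholders: Dict[str, str] = {}
    protected: List[str] = []
    for token in tokenize(text):
        entry = dictionary.get((source, token))
        if entry and target in entry:
            # Swap the specialized term for a code the translator passes through.
            code = f"TERM{len(placeholders)}"
            placeholders[code] = entry[target]
            protected.append(code)
        else:
            protected.append(token)
    translated = translate(" ".join(protected), source, target)
    # Put the community's preferred translation back in place of each code.
    for code, term in placeholders.items():
        translated = translated.replace(code, term)
    return translated


# --- Toy stand-ins, for illustration only ---
def toy_tokenize(text: str) -> List[str]:
    """Whitespace split; a real morphological analyzer would segment raw text."""
    return text.split()


TOY_MT = {("ja", "en", "今日 は TERM0 です"): "Today is TERM0."}


def toy_translate(text: str, source: str, target: str) -> str:
    return TOY_MT[(source, target, text)]


SCHOOL_DICTIONARY: Dictionary = {("ja", "掃除当番"): {"en": "cleanup duty"}}

if __name__ == "__main__":
    print(dictionary_assisted_translate("今日 は 掃除当番 です", "ja", "en",
                                        toy_tokenize, toy_translate,
                                        SCHOOL_DICTIONARY))
```

Because each component is passed in as a plain callable, a school could swap in its own dictionary without touching the translator, which mirrors the separation of provider roles described in this section.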

3. Multilingual Communication in Multicultural Society

In this section, we discuss the usage environment of NPO Pangaea as a case study. NPO Pangaea (http://www.pangaean.org/) creates a universal playground that brings children around the world closer together. Activities are held regularly in Japan, Korea, Austria, and Kenya. The organizers faced language barriers in communication among volunteer staff of different language groups. Pangaea now uses a multilingual chat system and language input for correspondence and communication among staff in Kyoto, Seoul, and Vienna who do not speak English. To facilitate communication among staff members who use different languages, NPO Pangaea developed the Pangaea Community Site. When a member enters a phrase in his/her native language on the online bulletin board, it is translated into Japanese, Korean, English, and German by the multilingual translation system of the Language Grid.

We interviewed members of the NPO, which has been using a chat system with embedded machine translation to manage its overseas offices for almost two years. From these interviews, we found that they faced difficulties when conducting multiparty group meetings. All of the interviewees mentioned that it was virtually impossible to conduct a group meeting when more than two languages were used within the group.

Why is machine-translation-mediated conversation so difficult when the number of languages is more than two? Building on previous research [4], we expanded the experiment on referential communication from pairs to triads and considered ways of supporting machine-translation-mediated collaboration for group work. We used the multilingual chat system “Langrid Chat,” which is also used by the NPO. Langrid Chat translates each message into the other languages while providing awareness information on the typing of others.
Langrid Chat is equipped with a back translation function: when a user types a sentence, the system automatically translates it into the other languages, translates the results back into the original language, and displays them to the user in real time. Users can edit their messages before sending them to others.

Thirteen triads from different language communities (China, Korea, and Japan) participated in the experiment. Nine triads performed a referential communication task (Fig. 1) in their native languages through machine translation; four triads performed the same task in a common language (English, which is not their native language). In this task, one participant (the Director) describes figures so that the other two (the Matchers) can identify them. The experiment was designed to compare referential communication under these two language conditions. We observed the following phenomena.

1) Places for Identifying Referents

When two Matchers do not receive the same rendering of a Director’s utterance, they may not be able to identify the referents on the basis of the same utterance. We found many cases in which the Matchers, when using machine translation, identified the referents at different places in the conversation and had to clarify with one another. As shown in Figure 2, the Matchers identified the referents at different points in the conversation more frequently in machine translation communication than in English.

From further observation, we found that referential communication using machine translation was even more inefficient because the Matchers were not aware of whether they shared the same utterance from the Director. In regular conversations, when Matcher B accepts Director A’s proposal on a referent faster than Matcher C, B often learns why C did not accept A’s proposal at the same time by following the subsequent conversation between A and C. B makes use of such knowledge to coordinate his/her own utterances on the referent upon becoming the next Director. However, such coordination was rarely observed in referential communication using machine translation. The Directors coordinated their utterances more with the previously slow Matcher when using English (Avg.: 48.4%) than when using machine translation (Avg.: 78.8%). A t-test showed a significant difference between the two language conditions (t(8) = 2.63, p < 0.05). Since the previously slow Matchers often required further explanation when the Directors did not coordinate their utterances with them, we infer that this lack of coordination was one reason for the faulty communication, which required a large number of utterances to match the figures.

2) Referring Expressions

We compared the lengths of the referring expressions used by the same Director in the first and second trials and classified the change for each referent. Studies using referential communication tasks have shown that once a pair of communicators has entrained on a particular referring expression for a referent, they tend to abbreviate this expression on subsequent trials. In our experiment, however, the participants appeared to have trouble finding referring expressions that could be shared by all three members.
Even when a Director’s reference was smoothly accepted by the Matchers in the first cycle, the Director sometimes lengthened the referring expression in the second cycle because the reference could not be used between the two Matchers. Table 1 shows this tendency.

3) Efficiency of Mutual Acceptance Process

We hypothesized that participants would be less able to improve their efficiency in formulating appropriate references when using machine translation than when using English, because they cannot distinguish between the information that they share and do not share with others. To see how much the Directors improved in making appropriate references over trials, we calculated, for each trial, the rate of participants matching the figures through basic exchange. As shown in Figure 3, the participants were able to match the figures more efficiently in English than in machine translation (F(1, 48) = 76.9, p < 0.001). We found that the Directors using machine translation had difficulty improving their references so that both Matchers could identify them immediately, and they were reluctant to shorten them because they were not aware of which references could be shared among all members.

This study was carried out to clarify the factors that make machine-translation-mediated conversations difficult when the number of group members is more than two. We found that, compared to using English, (1) Directors were less likely to coordinate their utterances with the previously slow Matcher, (2) Directors were less likely to abbreviate their referring expressions over trials, and (3) the participants’ mutual acceptance process was inefficient and did not improve much. We suggest two recommendations for the design of future machine-translation-embedded communication systems to support group work: provide speakers with an awareness of how their utterances are translated for each addressee, and provide addressees with an awareness of how a speaker’s utterance is translated for other addressees who use a different language.
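The two recommendations can be read as a small data-flow requirement on a multilingual chat system. The sketch below is one possible interpretation, not a description of Langrid Chat: `awareness_views` and `tagging_translate` are hypothetical names, and the stand-in translator only tags each hop so that the flow of text between languages is visible.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def awareness_views(message: str, speaker_lang: str,
                    addressee_langs: List[str], translate: Translator
                    ) -> Tuple[Dict[str, str], Dict[str, Dict[str, object]]]:
    """Build awareness information for one multiparty chat message.

    Returns (speaker_view, addressee_views):
    - speaker_view[lang]: back translation of what the addressee using
      `lang` receives, shown to the speaker (first recommendation).
    - addressee_views[lang]: the message as received in `lang`, plus what
      the other addressees received, rendered into `lang` so that this
      addressee can read it (second recommendation).
    """
    # What each addressee actually receives.
    forward = {lang: translate(message, speaker_lang, lang)
               for lang in addressee_langs}
    # Speaker-side awareness: one back translation per addressee language.
    speaker_view = {lang: translate(text, lang, speaker_lang)
                    for lang, text in forward.items()}
    # Addressee-side awareness: how the utterance reached the others.
    addressee_views: Dict[str, Dict[str, object]] = {}
    for lang in addressee_langs:
        others = {other: translate(text, other, lang)
                  for other, text in forward.items() if other != lang}
        addressee_views[lang] = {"received": forward[lang],
                                 "others_received": others}
    return speaker_view, addressee_views


def tagging_translate(text: str, source: str, target: str) -> str:
    """Stand-in translator that records each hop instead of translating."""
    return f"[{source}>{target}] {text}"


if __name__ == "__main__":
    speaker, addressees = awareness_views("hello", "ja", ["zh", "ko"],
                                          tagging_translate)
    print(speaker)
    print(addressees)
```

Note that the addressee-side view requires an extra translation hop per pair of addressee languages, so the number of translation calls grows quadratically with the number of languages in the group.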

4. Conclusion

It is necessary to develop service-oriented collective intelligence to support a multicultural and global society. To address language problems, the Language Grid has been proposed as a new infrastructure on the Web; it enables users to combine language resources on the basis of Web service technology. In addition, once service-oriented collective intelligence has been developed, it is important to create a user-centered usage environment. From long-term observation of a community that provides language services, we found that this usage environment must be customized in order to establish service-oriented collective intelligence.

As we cannot solve all problems through translation, we have to increase our knowledge of different cultures in order to arrive at a mutual understanding. For instance, we can literally translate the term “掃除当番,” which means cleanup duty, into Portuguese; however, this translation still puzzles Brazilian students in Japan because there is no such concept in Brazil. Translating one language into another is only the first step in understanding one another. Therefore, it is necessary to extend the Language Grid so that machine translation results are associated with knowledge that allows a better understanding of different cultures. It is hoped that service-oriented collective intelligence can support both daily and research activities in a multicultural society.

Acknowledgement

We would like to express our sincere gratitude to Naomi Yamashita and Hideaki Kuzuoka for their careful comments on this paper. The authors would like to thank the staff members of the Language Grid project. This research was supported by the Kyoto University Global COE Program: Informatics Education and Research Center for Knowledge-Circulating Society.

References

[1] T. Ishida. Language Grid: An Infrastructure for Intercultural Collaboration. IEEE/IPSJ Symposium on Applications and the Internet, pp. 96–100, 2006.
[2] T. Ishida. Communicating Culture. IEEE Intelligent Systems, Vol. 21, No. 3, pp. 62–63, 2006.
[3] T. Ishida, S. R. Fussell and P. T. J. M. Vossen, Eds. Intercultural Collaboration, LNCS 4568, Springer-Verlag, 2007.
[4] N. Yamashita and T. Ishida. Effects of Machine Translation on Collaborative Work. International Conference on Computer Supported Cooperative Work, pp. 515–523, 2006.
[5] S. Sakai, M. Gotou, R. Inaba, Y. Murakami, T. Yoshino, Y. Naya, Y. Hayashi, M. Tanaka, T. Takasaki, S. Matsubara, Y. Kitamura and T. Ishida. Supporting Multicultural Society with the Language Grid. International Conference on Informatics Education and Research for Knowledge-Circulating Society (ICKS'08), 2008.