Visualizing Trends in Knowledge Management

Maria R. Lee1 and Tsung Teng Chen2

1 Shih Chien University, Department of Information Management, 104 Taipei, Taiwan
[email protected]
2 National Taipei University, Graduate School of Information Management, 104 Taipei, Taiwan
[email protected]

Abstract. Knowledge visualization is a creative process, but difficult to formalize. This paper presents a system that is capable of analyzing voluminous citation data and visualizing the result. The system offers visualizations of trends by clustering scientific papers taken from the web (CiteSeer papers). Two methods are implemented: factor analysis and PFNET. An experiment has been carried out with the literature in knowledge management. A deep analysis of current trends in KM is then performed to check the relevance of these results. While the topical content is specific to knowledge engineering, semantic web, and related sub-areas, the approach could be applied to any general topic area in AI.

Keywords: Knowledge Management, Knowledge Visualization, Factor Analysis, PFNET.

1 Introduction

Knowledge visualization is a creative process, but difficult to formalize. A massive number of scientific papers are published every year. It would be useful to comprehend the entire body of scientific knowledge and to track the latest developments in specific science and technology fields. However, effective and efficient comprehension of such vast knowledge is a challenging task.

Knowledge Management (KM) is a fast-growing field with great potential. However, researchers disagree about what constitutes the content and context of the KM research area [11]. It would be instrumental if the intellectual structure of the KM domain could be constructed using Knowledge Domain Visualization (KDV) techniques. Previous studies of the intellectual structure of the KM domain were conducted by researchers drawing predominantly on information systems and management oriented factors [18, 19]. In contrast, our study drew primarily on the voluminous science and engineering literature, which has given us some interesting results.

This paper is organized as follows. Section 2 introduces the state of the art of knowledge domain visualization. Section 3 introduces a scientific analysis system, which offers visualizations of trends by clustering scientific papers taken from the web (CiteSeer papers). Two methods are implemented: factor analysis and Pearson correlation coefficients in PFNET. An experiment has been carried out with the literature in knowledge management in section 3. A deep analysis of current trends in KM is then carried out in section 4 to check the relevance of these results.

Z. Zhang and J. Siekmann (Eds.): KSEM 2007, LNAI 4798, pp. 362–371, 2007. © Springer-Verlag Berlin Heidelberg 2007


2 Knowledge Domain Visualization

Visual exploration of large data sets provides insight by visualizing the data [16]. However, without the ability to adequately explore the large amounts of data being collected, the data become useless despite their potential usefulness. Fortunately, the task of knowledge comprehension can be facilitated by an emerging field of study, Knowledge Domain Visualization (KDV), which tries to depict the structure and evolution of scientific fields [3].

A knowledge domain is represented collectively by the research papers in a research area and their inter-relationships. A knowledge domain’s intellectual structure can be discerned by studying the citation relationships and analyzing the seminal literature of that knowledge domain. Researchers in information science began studying the intellectual structure of disciplines in the early eighties [21]. One of the pioneering approaches, Author Co-citation Analysis (ACA), is used to present the intellectual structure of a knowledge domain. Recent studies in knowledge visualization adopt ACA as their underlying methodology and outfit the intellectual structure with visual cues and effects [4,5,6,7]. In addition, some recent work on knowledge discovery and data mining systems carries out analyses and visualizations of a scientific domain [29, 30, 31].

We proposed an approach, comparable to ACA, to derive knowledge visualization [8]. Figure 1 shows the process of the proposed approach. The approach constructs a full citation graph from data drawn from the online citation database CiteSeer [2]. The procedure leverages the CiteSeer citation index by using key phrases to query the index and retrieve all matching documents from it. The documents retrieved by the query are then used as the initial seed set to retrieve papers that cite or are cited by the literature in the initial seed set [9, 19]. The full citation graph is built by linking all articles retrieved, and it includes more documents than the other schemes reviewed earlier. Factor analysis and pathfinder network (PFNET) scaling [21, 22, 23] are used to analyze the citation network.

[Figure 1: flow diagram; box labels: Retrieve Data, Citation Network Analysis, Co-citation Matrix, Factor Analysis, Pathfinder Network]

Fig. 1. Citation Analysis Process
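The seed-set expansion described in this section can be sketched in a few lines. This is only an illustration under stated assumptions: the citation index is represented as a plain in-memory dictionary (the hypothetical `cites` mapping), not the CiteSeer interface, and the function name `build_citation_graph` is ours rather than the authors'.

```python
def build_citation_graph(seed_ids, cites):
    """Expand a seed set by one citation hop and link everything retrieved.

    seed_ids: iterable of paper ids matched by the key-phrase query.
    cites:    dict mapping a paper id to the set of paper ids it cites
              (hypothetical stand-in for a citation index).
    Returns (nodes, arcs) where arcs are (citing, cited) pairs.
    """
    # Invert the index so we can also find papers that cite a seed paper.
    cited_by = {}
    for src, targets in cites.items():
        for dst in targets:
            cited_by.setdefault(dst, set()).add(src)

    nodes = set(seed_ids)
    for pid in seed_ids:
        nodes |= cites.get(pid, set())     # papers cited by the seed paper
        nodes |= cited_by.get(pid, set())  # papers citing the seed paper

    # Keep every citation arc whose two endpoints were both retrieved.
    arcs = {(s, d) for s in nodes for d in cites.get(s, set()) if d in nodes}
    return nodes, arcs


# Toy index: A cites B, C cites A, B cites E, D cites E.
toy_index = {"A": {"B"}, "C": {"A"}, "B": {"E"}, "D": {"E"}}
nodes, arcs = build_citation_graph({"A"}, toy_index)
```

With the toy index, seeding with paper A retrieves A, B, and C; paper E is left out because it is two hops away from the seed.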

3 Research Trends in Knowledge Management

Many knowledge visualization studies draw their citation data by using a key phrase to query citation indexes. However, retrieving citation data by a simple query of citation indexes is rather limited. Here, factor analysis and pathfinder network (PFNET) scaling are combined to perform citation analysis. The resulting citation graph was built from the literature and citation information retrieved by querying CiteSeer for the term “Knowledge Management” in March 2006. The complete citation graph contains 599,692 document nodes and 1,701,081 citation arcs. To retain the highly cited papers and keep the literature to a manageable size, we pruned papers that were cited fewer than 150 times. The resultant citation graph contains 199 papers and 640 citation arcs.
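The pruning step can be illustrated as follows. The sketch assumes that a paper's citation count is its in-degree within the retrieved graph; the paper does not say whether counts come from the graph or from CiteSeer itself, so this is an assumption, and `prune_by_citations` is a hypothetical helper name.

```python
from collections import Counter


def prune_by_citations(arcs, threshold):
    """Keep papers cited at least `threshold` times, plus the arcs among them.

    arcs: iterable of (citing, cited) pairs.
    Assumption: the citation count is the in-degree in this graph.
    """
    arcs = set(arcs)
    counts = Counter(cited for _, cited in arcs)
    kept = {paper for paper, count in counts.items() if count >= threshold}
    kept_arcs = {(s, d) for s, d in arcs if s in kept and d in kept}
    return kept, kept_arcs


toy_arcs = {("a", "x"), ("b", "x"), ("c", "x"), ("x", "y"), ("a", "y"), ("b", "z")}
kept, kept_arcs = prune_by_citations(toy_arcs, threshold=2)
```

In the toy graph, x is cited three times and y twice, so both survive a threshold of 2, and only the arc from x to y connects two surviving papers.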


In addition, we searched the KM literature published during the eight years between 1998 and 2005. We divided these eight years into five consecutive, overlapping time slots of four years each. Following the same citation processing procedure described above, the citation graph for each time slot is tabulated in Table 1. A threshold value is applied to prune papers that were cited fewer times than the threshold. The sizes of the resulting citation graphs are listed in the two rightmost columns.

Table 1. Citation Graphs Data

Time span   No. Papers   No. Links   Threshold   No. Papers (pruned)   No. Links (pruned)
1  98-01    33,836       78,176      20          666                   2,182
2  99-02    27,115       55,819      15          621                   1,978
3  00-03    20,178       36,303      10          548                   1,680
4  01-04    13,852       21,520      10          265                   680
5  02-05    8,506        11,003      5           305                   658
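The five overlapping time slots follow directly from sliding a four-year window over 1998 to 2005 one year at a time. A minimal sketch (the helper name `sliding_windows` is ours):

```python
def sliding_windows(first_year, last_year, width):
    """Return inclusive (start, end) year pairs for a window slid one year at a time."""
    return [(start, start + width - 1)
            for start in range(first_year, last_year - width + 2)]


slots = sliding_windows(1998, 2005, 4)
```

For 1998 to 2005 with a width of four years, this yields the slots 1998-2001, 1999-2002, 2000-2003, 2001-2004, and 2002-2005, matching the time spans in Table 1.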

3.1 Factor Analysis

The factors and the variance explained by each factor are listed in Table 2. Eighteen factors were extracted, which collectively explained 56.028% of the variance.

Table 2. KM Main Factors and Variances Explained

Factor   Descriptive Name                                    Variance Explained
1        Semi-structured/Object Databases                    6.406
2        Inductive Logic Programming and Learning            4.694
3        Logic Programs                                      4.498
4        Machine Learning and Classifiers                    4.312
5        Knowledge in a Distributed Environment              4.046
6        Data Structures for Spatial Searching               3.591
7        AI Concept Symbols                                  3.303
8        Data Mining                                         2.871
9        Information Integration of Varying Sources          2.771
10       Functional Languages and Development Environment    2.568
11       Planning and Problem Solving                        2.477
12       Distributed Agents Cooperation                      2.407
13       Modal and Temporal Logic                            2.251
14       Inductive Logic Programming                         2.077
15       Views Maintenance                                   2.042
16       Probabilistic Reasoning                             2.005
17       Computational Geometry                              1.930
18       Search Engines                                      1.781
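To make the "variance explained" column concrete, the sketch below computes the percentage of variance attributable to each component of a correlation matrix. It assumes a principal-component style extraction, which the paper does not state, and it uses synthetic random data rather than the CiteSeer co-citation matrix; `variance_explained` is a hypothetical helper name.

```python
import numpy as np


def variance_explained(data):
    """Percent of total variance carried by each component, largest first.

    data: 2-D array, rows are observations and columns are items (papers).
    Assumption: components are eigenvectors of the item correlation matrix.
    """
    corr = np.corrcoef(data, rowvar=False)    # item-by-item correlations
    eigenvalues = np.linalg.eigvalsh(corr)    # returned in ascending order
    eigenvalues = eigenvalues[::-1]           # largest component first
    return 100.0 * eigenvalues / eigenvalues.sum()


# Synthetic stand-in for a co-citation profile matrix (not the paper's data).
rng = np.random.default_rng(0)
profiles = rng.normal(size=(50, 6))
percentages = variance_explained(profiles)
cumulative = np.cumsum(percentages)
```

Summing the first few entries of `percentages` gives a cumulative figure of the kind reported above for the eighteen extracted factors.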


We identified eight to ten top research themes in each time period and consolidated them into Table 3. We took the ten themes found in the earliest time span (1998-2001) as the basis and merged new themes or their variants uncovered in the later periods. Forty-eight themes found in the five periods are merged into eighteen main trends.

Table 3. Top New KM Research Trends 1998-2001

Theme (Trend)  Descriptive Name                                        Found Period  Average Ranking
1 (1)          Search Engine and Web Information Categorization        3             4.7
2              Materialized Views and Queries Using Views              1             2
3              Information Extraction Using Machine Learning Approach  1             3
4 (2)          P2P Issues                                              4             3.9
5              Agent-Oriented System Analysis and Design               4             3
6 (3)          Auctions Protocols and Algorithms                       5             4.6
7              Web Usage and Web Mining                                2             5
8              Semi-Structured Data Query and Schema Integration       3             4.3
9              Collaborative Filtering                                 1             9
10 (4)         Time Series Data Mining                                 2             10
11 (5)         Web Annotation and Knowledge Embedding                  4             5.3
12 (6)         Continuous Query and Query Over Streaming Data          4             4
13 (7)         Schema Reconciliation and Ontology Merging              2             8.5
14             Privacy Preserving Data Mining                          2             9
15             Dialogs Modeling of Agents                              3             9
16 (8)         Automatic Programs Spec. Generation and Checking        3             7
17             Automated Trust Mgmt.                                   4             9
18             Context and Semantic Rich HCI                           5             7

As Table 3 shows, eight themes appear consistently across these periods. Themes 1, 4, 6, 10, 11, 12, 13, and 16 appeared in the earlier periods and lasted well into the last period. We therefore regard these eight themes as research trends that will continue to gather interest, evolve, and last for some time. Themes 2, 3, and 9 appear only once, in period one, which may imply that they are transient. Alternatively, they may have converged with other themes; for example, theme 3 (information extraction using a machine learning approach) may have converged with data mining or web mining. The studies of agent-oriented software engineering, web usage mining, XML data processing, and privacy preserving data mining (themes 5, 7, 8, and 14) appear to be insignificant in the last period, implying that these studies may be maturing and so are gathering less interest. Themes 15, 17, and 18 appear once in the later periods, which may indicate that they are either ephemeral or emerging studies [1,10, 12, 13, 17, 18, 20, 24, 25].


4 Relationships Between Trends and KM Studies

The trends and themes found in the last section seem predominantly related to the Internet. However, further analysis reveals the links between them and the KM studies discussed in section three.

The first trend deals with search engine related issues, including Web information categorization and ranking as well as focused searches for particular information. A focused search engine uses reinforcement learning to optimize its sequential decision making.

The second trend is characterized by studies of P2P infrastructure and semantic-based P2P systems. These studies try to provide the conceptual modeling, mechanisms, and data models needed to integrate heterogeneous data models or semantics that are local to each peer. The P2P paradigm is very useful for sharing files over the Internet; however, the more powerful knowledge sharing paradigm and semantic-based retrieval in P2P are hindered by the lack of common semantics shared by the peers. The research in this trend tries to lay down the infrastructure that facilitates effective knowledge sharing and exchange in the P2P paradigm.

The third trend includes studies of auction protocols and algorithms, combinatorial auctions, and preference elicitation using learning algorithms. The parallels between the preference elicitation problem in combinatorial auctions and the problem of learning an unknown function from learning theory have been discussed [26].

The fifth trend deals with semantic annotation [27], knowledge embedding, and data mapping for the semantic web [28]. The semantic web promises to provide a common framework that allows data to be shared and reused across applications. However, most existing data is in the form of XML, not RDF, so a mapping mechanism is required to render the data sources interoperable.

The seventh trend encompasses research on ontology merging, schema reconciliation, and the integration of knowledge sources. Separate ontology development has led to a large number of ontologies covering overlapping domains. For these ontologies to be reused, tools and methods that facilitate merging or aligning them need to be developed [17]. A key hurdle in building a data-integration system that uniformly accesses multiple sources of data is the acquisition of semantic mappings. Researchers have proposed a system that uses current machine learning techniques to find such mappings semi-automatically.

4.1 Pathfinder Network

The Pearson correlation coefficients between items (papers) were calculated when factor analysis was applied. The correlation coefficients are used as the basis for PFNET scaling [23]. The value of the Pearson correlation coefficient falls in the range from -1 to 1. The coefficient approaches one when two items correlate completely. Items that are closely related, i.e., highly correlated, should be placed close together spatially. The distance between nodes is normalized by taking d = 1/(1 + r), where r is the correlation coefficient. The distance between items therefore decreases as the correlation coefficient increases, which places less correlated items apart and highly correlated items spatially adjacent. As mentioned earlier, the nodes located close to the center of a PFNET graph represent papers that contribute to a fundamental concept and are frequently referenced by the peripheral literature positioned in the outer branches. Figure 2 shows PFNET scaling with papers under the same factors close to each other.

Fig. 2. PFNET Scaling with Papers under Same Factors Close to Each Other. Nodes that belong to the same factor are painted with the corresponding color of the factor. Nodes that do not belong to any factor are painted with the color numbered 0 in the palette.
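The correlation-to-distance mapping d = 1/(1 + r) from Section 4.1 can be sketched with NumPy. The input here is a small hand-made matrix of item profiles, not the paper's co-citation data, and `correlation_distances` is a hypothetical helper name.

```python
import numpy as np


def correlation_distances(profiles):
    """Pearson correlations between rows, mapped to distances d = 1 / (1 + r).

    profiles: 2-D array with one row per item (paper).
    Note: d is undefined at r = -1; NumPy yields inf there, and the
    division warning is suppressed below.
    """
    r = np.corrcoef(profiles)  # rows are treated as variables
    with np.errstate(divide="ignore"):
        d = 1.0 / (1.0 + r)
    return r, d


toy_profiles = np.array([
    [1.0, 2.0, 3.0],   # item 0
    [2.0, 4.0, 6.0],   # item 1: perfectly correlated with item 0
    [1.0, -2.0, 1.0],  # item 2: uncorrelated with item 0
])
r, d = correlation_distances(toy_profiles)
```

Under this formula, perfectly correlated items (r = 1) sit at distance 0.5 and uncorrelated items (r = 0) at distance 1, so the distance shrinks as the correlation grows.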

The cluster that corresponds to the logic programs and AI concept symbols papers (colored with palette entries 3 and 7) seems to play a key role in the KM intellectual structure, as shown in the PFNET. AI concept symbols include a paper discussing the philosophical problems of AI, a paper addressing the “knowledge level” of computer systems, and a book discussing the logical foundations of Artificial Intelligence. One paper in factor seven dates back as early as 1969.

5 Discussions

Based on the research trends and changes in knowledge management communities over the past 20 years, the proposed trends in knowledge management are rivaling [14]. Of particular interest is that the proposed approach could be re-used in a number of ways and could possibly be shared across different domains. One potential direction for future work is to use the proposed approach to develop niche knowledge. The current approach provides a broad view of the trends, but we would like to work more in depth. Figure 3 shows the mathematical formulation of Pathfinder Network Scaling [5]. Our current algorithm increases the value of parameter q or r, which limits the construction of the network. One advantage is that the most significant paths and nodes are retrieved. However, depth around individual nodes is lost.

$$W(P) = \left[ \sum_{i=1}^{k} w_i^{\,r} \right]^{1/r}$$

$$w_{n_1 n_k} \le \left[ \sum_{i=1}^{k-1} w_{n_i n_{i+1}}^{\,r} \right]^{1/r} \quad \forall\, k \le q$$

Fig. 3. Pathfinder Network Scaling Formula
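The effect of the parameter r can be illustrated with a small sketch of Pathfinder scaling. To keep it short, the sketch fixes q at its maximum value n - 1 (paths of any length are considered), which is an assumption for illustration and not necessarily the setting used in this study; the function name `pfnet` is ours. A link survives only if no alternative path has a smaller Minkowski-r weight.

```python
import math


def pfnet(weights, r=math.inf):
    """Links of PFNET(r, q = n - 1) for a symmetric distance matrix.

    weights: square list of lists; weights[i][j] is the distance between
             items i and j, with zeros on the diagonal.
    r:       Minkowski exponent; math.inf means a path weighs as much as
             its heaviest link.
    """
    n = len(weights)

    def combine(a, b):
        # Weight of two sub-paths joined end to end under the r-metric.
        if math.isinf(r):
            return max(a, b)
        return (a ** r + b ** r) ** (1.0 / r)

    # Floyd-Warshall style relaxation: dist[i][j] becomes the smallest
    # path weight between i and j over paths of any length.
    dist = [row[:] for row in weights]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = combine(dist[i][k], dist[k][j])
                if via < dist[i][j]:
                    dist[i][j] = via

    # A direct link is kept only when no indirect path is strictly lighter.
    return [(i, j)
            for i in range(n) for j in range(i + 1, n)
            if math.isfinite(weights[i][j])
            and weights[i][j] <= dist[i][j] + 1e-12]


toy = [[0.0, 1.0, 1.5],
       [1.0, 0.0, 1.0],
       [1.5, 1.0, 0.0]]
links_r1 = pfnet(toy, r=1)
links_rinf = pfnet(toy, r=math.inf)
```

On the toy matrix, r = 1 keeps all three links because the two-step path weighs 2, more than the direct distance of 1.5. With r set to infinity the two-step path weighs only 1, so the direct link between items 0 and 2 is dropped. This mirrors the remark above that raising r yields a sparser network.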

Why do we prefer breadth to depth now? We may argue that constructing breadth knowledge is easier than constructing depth knowledge. Trends can be provided by breadth knowledge, but trends are fleeting. Niche knowledge is less about trends and more about vision: it is about what is possible, not what is popular. Niche knowledge also helps to create the Medici effect, which leads to innovation [15].

6 Conclusions

This paper has provided a survey of trends in knowledge management, developed through analysis of citations and the relationships among citations in CiteSeer. Factor analysis and Pearson correlation coefficients in PFNET have been implemented, and results have been obtained in the domain of knowledge management. The research themes that play the central role, according to the layout of the PFNET, are knowledge representation, information integration and query related studies, modal and temporal logic, data mining, text categorization, and constraint logic programming. The studies of inductive logic programming, machine learning, planning, and active networks and mobile agents seem to fall to the sideline or play only a peripheral role in KM related studies.

Ten of the most important current research trends of KM were summarized. Semantic web, semantic-based P2P and agent systems, E-commerce related topics, ontology, and human computer/robot interaction are recent popular research trends in the KM domain. Distributed knowledge representation and reasoning systems are also of research interest due to the proliferation of the World Wide Web. In addition, classifiers and pattern learning, especially in the area of the Web and hidden databases with Web front ends, are active research areas too. The fusion of disparate research areas such as computer science, economics, and law is an interesting new trend; other research areas that leverage the Internet as a standard platform are expected to follow this cross-disciplinary trend. Privacy and security related issues are getting more attention due to burgeoning E-commerce activities. Research that intertwines the World Wide Web, P2P systems, and intelligent agents with classical AI topics seems to be the new direction at large. However, we have seen only limited new research that tries to leverage the rich AI tradition of the past to pursue Web and Internet related fields.

One direction for future work is to extend and apply the proposed approach to other disciplines. We would also like to extend the approach to work in depth to uncover the niche knowledge of a domain. Niche knowledge adds more value. Niche knowledge creates the Medici effect: when you step into an intersection of fields, disciplines, or cultures, you can combine existing concepts into a large number of extraordinary new ideas. More and more, innovation arises not from within particular industries or disciplines, but rather across them [15].

References

1. Boella, G., Torre, L.: Permissions and Obligations in Hierarchical Normative Systems. In: Proceedings of the 9th International Conference on Artificial Intelligence and Law, Scotland, United Kingdom. ACM Press (2003)
2. Bollacker, K.D., Lawrence, S., Giles, C.L.: CiteSeer: an Autonomous Web Agent for Automatic Retrieval and Identification of Interesting Publications. In: Proceedings of the Second International Conference on Autonomous Agents, Minneapolis, Minnesota, United States. ACM Press (1998)
3. Börner, K., Chen, C., Boyack, K.: Visualizing Knowledge Domains. In: Cronin, B. (ed.) Annual Review of Information Science and Technology, vol. 37, pp. 179–255. American Society for Information Science and Technology, Medford, New Jersey (2002)
4. Chen, C.: Visualization of Knowledge Structures. In: Chang, S.K. (ed.) Handbook of Software Engineering and Knowledge Engineering, vol. 2, p. 700. World Scientific Publishing Co, River Edge, NJ (2002)
5. Chen, C.: Searching for Intellectual Turning Points: Progressive Knowledge Domain Visualization. PNAS 101, 5303–5310 (2004)
6. Chen, C., Kuljis, J., Paul, R.J.: Visualizing Latent Domain Knowledge. Systems, Man and Cybernetics, Part C, IEEE Transactions 31, 518–529 (2001)
7. Chen, C., Paul, R.J.: Visualizing a Knowledge Domain’s Intellectual Structure. Computer 34, 65–71 (2001)
8. Chen, C., Steven, M.: Visualizing Evolving Networks: Minimum Spanning Trees versus Pathfinder Networks. In: IEEE Symposium on Information Visualization, pp. 67–74 (2003)
9. Chen, T.T., Lee, M.: Revealing Themes and Trends in the Knowledge Domain’s Intellectual Structure. In: Hoffmann, A., Kang, B.-H., Richards, D., Tsumoto, S. (eds.) PKAW 2006. LNCS (LNAI), vol. 4303, pp. 99–107. Springer, Heidelberg (2006)
10. Chen, T.T., Xie, L.Q.: Identifying Critical Focuses in Research Domains. In: IV 2005. Proceedings of the Information Visualisation, Ninth International Conference, London, pp. 135–142 (2005)
11. Dahl, T.S., Mataric, M.J., Sukhatme, G.S.: Adaptive Spatio-Temporal Organization in Groups of Robots. In: Intelligent Robots and System, 2002. IEEE/RSJ International Conference, vol. 1, pp. 1044–1049 (2002)
12. Earl, M.: Knowledge Management Strategies: Toward a Taxonomy. Journal of Management Information Systems 18, 215–242 (2001)
13. Feigenbaum, J., Shenker, S.: Distributed Algorithmic Mechanism Design: Recent Results and Future Directions. In: Proceedings of the 6th International Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications, Atlanta, Georgia. ACM Press (2002)
14. Guarino, N., Welty, C.: Evaluating Ontological Decisions with OntoClean. Communications of the ACM 45, 61–65 (2002)
15. Hoffmann, A., Kang, B.-H., Richards, D., Tsumoto, S. (eds.): PKAW 2006. LNCS (LNAI), vol. 4303. Springer, Heidelberg (2006)
16. Johansson, F.: The Medici Effect: Breakthrough Insights at the Intersection of Ideas, Concepts & Cultures. Harvard Business School Publishing, Boston (2004)
17. Keim, D.: Visual Exploration of Large Data Sets. Communications of the ACM 44(8), 39–44 (2001)
18. Noy, N.F., Musen, M.A.: The PROMPT Suite: Interactive Tools for Ontology Merging and Mapping. Int. J. Hum.-Comput. Stud. 59, 983–1024 (2003)
19. Ponzi, L.J.: The Intellectual Structure and Interdisciplinary Breadth of Knowledge Management: a Bibliometric Study of Its Early Stage of Development. Scientometrics 55, 259–272 (2002)
20. Subramani, M., Nerur, S.P., Mahapatra, R.: Examining the Intellectual Structure of Knowledge Management, 1990-2002 - An Author Co-citation Analysis. Management Information Systems Research Center, Carlson School of Management, University of Minnesota, p. 23 (2003)
21. Tews, A.D., Mataric, M.J., Sukhatme, G.S.: A Scalable Approach to Human-Robot Interaction. In: ICRA 2003. Robotics and Automation, 2003. Proceedings. IEEE International Conference, pp. 1665–1670 (2003)
22. White, H.D., Griffith, B.C.: Author Cocitation: A Literature Measure of Intellectual Structure. Journal of the American Society for Information Science 32, 163–171 (1981)
23. White, H.D.: Pathfinder Networks and Author Cocitation Analysis: A Remapping of Paradigmatic Information Scientists. Journal of the American Society for Information Science & Technology 54, 423–434 (2003)
24. White, H.D.: Author Cocitation Analysis and Pearson’s r. Journal of the American Society for Information Science & Technology 54, 1250–1259 (2003)
25. Vaidya, J., Clifton, C.: Privacy Preserving Association Rule Mining in Vertically Partitioned Data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada. ACM Press (2002)
26. Vaughan, R.T., Stoy, K., Sukhatme, G.S., Mataric, M.J.: LOST: Localization-Space Trails for Robot Teams. Robotics and Automation, IEEE Transactions 18, 796–812 (2002)
27. Lahaie, S., Parkes, D.: Applying Learning Algorithms to Preference Elicitation. In: Proceedings of the 5th ACM Conference on Electronic Commerce. ACM Press, New York (2004)
28. Erdmann, M., Maedche, A., et al.: From manual to semi-automation tools. In: Proceedings of the COLING 2000 Workshop on Semantic Annotation and Intelligent Content (2000)
29. Doan, A., Madhavan, J., et al.: Learning to map between ontologies on the semantic web. In: Proceedings of the 11th International Conference on World Wide Web. ACM Press, New York (2002)
30. Mothe, J., Chrisment, C., Dkaki, T., Dousset, B., Karouach, S.: Combining Mining and Visualization Tools to Discover the Geographic Structure of a Domain. Computers, Environment and Urban Systems 30, 460–484 (2006)
31. Crimmins, F., Smeaton, A., Dkaki, T., Mothe, J.: TetraFusion: information discovery on the Internet. Intelligent Systems and Their Applications, IEEE Intelligent Systems 14, 55–62 (1999)
32. Mothe, J., Dousset, B.: Mining document contents in order to analyze a scientific domain. In: Sixth International Conference on Social Science Methodology, Amsterdam, The Netherlands (2004)