Memorial University of Newfoundland

Harnessing collective intelligence: techno-ontological model

Prepared by: Roman Lukyanenko

2010

Executive Summary
The progress of information technology provides researchers with new methods of data acquisition and analysis. The last decade has seen both the rise of Web 2.0 technologies and an increased availability of geospatial data. This research presents a techno-ontological model for integrating Web 2.0 Geoweb technology with knowledge from “citizen sensors” in order to facilitate public engagement in scientific issues. The project is based on an action research approach and uses the social networking website www.NLNature.com, a Newfoundland and Labrador online atlas of wildlife, as the primary testing environment. The research provides an overview of the creation of a Geoweb online community, including application design and continuous design adjustments based on user feedback. It describes the identification of target audiences and addresses the specific needs of the audience that will be fulfilled through the project. The research deals with design aspects of the online community, with a focus on accessibility and interactivity. It attempts to find a balance between maximizing data acquisition and preventing user dissatisfaction and abandonment. The research explores the technical and ontological challenges of integrating citizen scientist data. It also addresses the issue of data quality and investigates ways to encourage higher quality of user-supplied information through interface and process manipulation.


Table of Contents
Executive Summary
Table of Contents
List of Illustrations
1.0 Introduction
  1.1 Research Purpose
  1.2 Background
  1.3 Research Scope
  1.4 Research Methodology
2.0 Basic Elements of NL Nature’s Interface
  2.1 Project Design Strategies
  2.2 Fostering User Participation
    2.2.1 User Profile
    2.2.2 Addressing Participatory Challenges and Biases
    2.2.3 Exogenous Efforts of Inducing Participation
3.0 Model of Facilitating Citizen Science Projects
  3.1 Rationale behind the New Information Model
    3.1.1 Limits of Rigid Classification
    3.1.2 Uniqueness, Information and Utility Loss
    3.1.3 Systemic Incompatibility
  3.2 Architecture and Implementation of Instance Model
    3.2.1 Ontological Foundations of Instance Model
    3.2.2 Implementation of Instance Model
    3.2.3 Organization of Properties / Attributes
    3.2.4 Instance Model and Information Retrieval
4.0 Conclusions and Recommendations
References
Appendix A. Schedule of Outreach Activities Accomplished To-Date
Appendix B. User Activity in March 2010
Appendix C. NL Nature’s Newsletter

List of Illustrations
Figure 1. NL Nature page showing user registration and login form
Figure 2. NL Nature Membership
Figure 3. NL Nature's interface to record observation


APPLICATION OF GEOWEB TECHNOLOGY

1.0 Introduction
The progress of information technology provides researchers with opportunities to collect an abundance of information, but it also carries challenges related to the use of that information. The last decade has seen a rapid rise of Web 2.0 technologies, a new approach to the Internet that is based on user-supplied information and increased user interaction. The proliferation of Web 2.0 projects coincides with the increased availability of geospatial data, resulting in the rise of Volunteered Geographic Information (VGI), which combines Web 2.0 and satellite mapping. This research presents a technical model of a Web 2.0 technology for collecting volunteered citizen science information. This paper also explores a notable consequence of the new technology: the challenge of reconciling the ontological frameworks of science and of the common, everyday world. It constructs a preliminary reconciliatory model, called the instance model.

The primary tool for engaging the public is Web 2.0, an emerging Internet technology with over a billion users (Anderson, 2007). The key components of the new technology include openness, scalability, user-generated content, a special architecture of user participation, the power of the crowd, and the management of information in epic volumes (Anderson, 2007). It is a set of technologies and human experiences that keep people excited and engaged on an unprecedented scale. Successful examples of Web 2.0, including Facebook, Wikipedia, YouTube, and Twitter, have tens of millions of users (refer to Appendix A for detailed figures).

The almost limitless proportions of Web 2.0 offer vast opportunities for research in every field of human knowledge. According to Goodchild (2007), for example, humans are the best sensors of biological and ecological change. Members of the general public are much closer to wildlife than scientists are: sometimes wildlife occurs in their own back yard. There are also more members of the public than there are scientific specialists. Coupled with the current availability of inexpensive photo and video equipment, harnessing the power of ordinary people to provide data and observations about the natural world can lead to major advances in the natural sciences and can assist in the vital area of wildlife conservation and preservation. Developing methods for public engagement is a true scientific frontier of modern times.


1.1 Research Purpose
The purpose of this research is to develop a model for integrating Web 2.0 technology and knowledge from “citizen sensors” to facilitate increased public engagement in scientific issues, with a specific focus on wild plants and animals in the province of Newfoundland and Labrador. This model can form a standard framework for similar research in other areas of the natural sciences and in other citizen science projects.

1.2 Background
This project originated as part of a larger cross-Canada research effort on the Participatory Geoweb (funded by the GEOIDE network), which is being led by Dr. Renee Sieber of McGill University. The larger project aims to “examine what defines effective participation through the Geoweb, contextualize observations and opinions on environmental change, and develop a technical and policy infrastructure to support climate change response and adaptation” (“The Participatory Geoweb,” 2008). In the context of this research, the Geoweb can be defined as an online merger of spatial data and user-supplied content. The Newfoundland and Labrador “node” of the larger Geoweb project is the www.NLNature.com project. The intent of NLNature.com is to investigate how the public uses Web 2.0 technologies in the context of wildlife, and to provide researchers with user-supplied ecological data that can be used in further research (Wiersma, 2010).

This research project is an attempt to convert the enthusiasm of the Web 2.0 community into a scientifically viable and valuable asset. For the majority of its discoveries and tests, the project will use a real-time Web 2.0 / Geoweb environment (www.NLNature.com), which promotes wildlife and nature conservation efforts in Newfoundland and Labrador. The dynamism of NL Nature allows for dynamic hypothesis testing and presents a unique form of research implementation.
1.3 Research Scope
The scope of this research includes an overview of the implementation of a Geoweb online community, including website design and continuous design adjustments based on user feedback, for the Newfoundland and Labrador node of the larger Geoweb project. This proposal will describe the target audience for NLNature.com and address the specific needs of the various audiences that will be fulfilled through the project. The research will deal with design aspects of the online community, with a focus on participation and interactivity. The research will attempt to find a balance between maximizing data acquisition and preventing user dissatisfaction and abandonment. A distinct section of the research will be dedicated to examining the new opportunities that Geoweb projects open up for data integration between the ontological frameworks of ordinary people and scientists. A model of such integration, called the instance model, will be presented. It is important to note that the research will be limited to a high-level discussion of programming and coding techniques. While it does touch on some elements of technical implementation, a step-by-step programming guide to the creation of a Geoweb project is beyond the scope of this project.

1.4 Research Methodology
The methodology of the current research has two components: technological and ontological. The technological component covers the approaches to interface design, interactivity, and accessibility, and uses action research as its primary method. The ontological component mostly addresses issues of user engagement, information gathering, and knowledge sharing. The two approaches are interdependent: the ontological approach often determines the technological implementation, and the available technology often refines the ontological theory. The dynamic nature of the Web 2.0 environment necessitates a flexible approach to the research methodology.

The technological paradigm of the project is action research, which was developed by Kurt Lewin in the 1940s and has been used extensively in the social sciences and, more recently, in information technology (O'Brien, 2001). Action research is applicable to the challenges of engaging the public because it calls for continuous theory testing based on a dynamically developing research environment.
www.NLNature.com is the main testing environment, where experiments will be conducted to determine the optimum design, information balance, and features that maximize user involvement. User interaction with NL Nature will produce a set of data that can be analyzed empirically. The findings from that data will lead to further adjustments to the user environment of the NL Nature online community. Thus, the broad basis for the project is an empirical one with continuous feedback adjustments.


In addition to the action research method, this research also relies on the best practices of Web 2.0 development. In recent years, the research community has been paying increased attention to the Geoweb and Web 2.0 technologies (see Elwood, 2008). Research in this direction has even resulted in the birth of new scientific disciplines, such as neogeography (see The Participatory Geoweb, 2010, and Hudson-Smith and Crooks, 2008). Many findings of this prior research have been incorporated in the design of the NL Nature project, which allows theories to be tested in a real environment.

The ontological basis for the project is the philosophical work of Mario Bunge (1977). This paper takes a careful approach to Bunge’s ontology and sees it as a set of guiding principles rather than a prescribed solution. Throughout history, thinkers have offered many visions of the universe and of the ways information can be organized. None has been unanimously accepted. For this research we borrow those elements of Bunge’s ontology that we believe have the most universal appeal and the greatest applicability to a Geoweb project. This research also tests the implementation of the ontological framework as part of its action research.

Finally, the techno-ontological framework is supported by relevant studies in the fields of psychology and information systems. Psychological theories of cognition and similarity provide useful assistance for both interface design and information modelling. Building on Bunge’s ontology, Parsons and Wand (2000) developed an instance model of data storage and organization. Working within the framework of the instance model, we attempt to extend it to issues of identification, data presentation, and citizen science.


2.0 Basic Elements of NL Nature’s Interface
This section deals with a set of design challenges typical of Web 2.0 participatory projects and serves as a preamble to the following section, which develops a novel techno-ontological model of data acquisition and analysis.

2.1 Project Design Strategies
The user interface plays an important role in the success of an online technology. Most research on interface design acknowledges the centrality of the target user in making design choices (O'Reilly, 2002; Naaman et al, 2006; Komarkova et al, 2007; Noyle & Bouwman, 2009). Given that the primary objective of NL Nature is to serve the general public, user centrality dictates that the application should be easy to use and simple to understand. The website needs to be "visually balanced with enough contrast, typographically correct, readable, and should be using familiar presentations" (Komarkova et al, 2007, p. 266). Accordingly, NL Nature uses the two colors with the most contrast: black font on a white background. Its layout is typical: the main navigation menu is on the left-hand side of the page (Figure 1). The overall presentation is static, with few multimedia effects. Simplicity is promoted in order to maintain focus on the user-supplied information, the key to a Web 2.0 project's success (Anderson, 2007).

The design of the user registration form is an important, and often overlooked, element of user participation. As a potential barrier to becoming a registered member, a registration form can either hinder or foster site participation. A typical Web 2.0 website needs as many users as possible in order to harness the power of interactivity (O'Reilly, 2002). Thus, converting visitors into members should be a high priority. Since non-registered visitors have little vested interest in the website, clarity and reduction of effort are important. It follows that the less information is requested at the point of registration, the more enticing registration will appear.
NL Nature requires a user name, a display name, a password, and a captcha: a random common-sense question that prevents computer scripts from registering. The entire form is visible to the user at once (Figure 1), which shows that the process is quick and simple. One of the key aspects of NL Nature’s registration form is its user name field, which asks for an email address. In doing so, the website asks for a unique piece of information that most people know and remember. At the same time, the project gains a valid email address that can be used to contact members when deemed appropriate.
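
The registration requirements described above can be sketched as a small validation routine. This is only a minimal illustration in Python, not NL Nature's actual code: the function name, the captcha questions, the loose email pattern, and the minimum password length are all assumptions introduced for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical question bank; the real NL Nature questions are not given in the text.
CAPTCHA_QUESTIONS = {
    "What colour is snow?": "white",
    "How many legs does a cat have?": "4",
}

# Deliberately loose pattern (something@something.tld); an assumption, not a standard.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class Registration:
    user_name: str          # doubles as the member's contact email address
    display_name: str
    password: str
    captcha_question: str
    captcha_answer: str


def validate_registration(form: Registration) -> list:
    """Return a list of problems; an empty list means the form is acceptable."""
    problems = []
    if not EMAIL_PATTERN.match(form.user_name.strip()):
        problems.append("user name must be an email address")
    if not form.display_name.strip():
        problems.append("display name is required")
    if len(form.password) < 8:  # assumed minimum length
        problems.append("password is too short")
    expected = CAPTCHA_QUESTIONS.get(form.captcha_question)
    if expected is None or form.captcha_answer.strip().lower() != expected:
        problems.append("captcha answer is incorrect")
    return problems
```

Only four fields are checked, which mirrors the point made in the text: the less that is asked at registration, the lower the barrier to becoming a member.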

Figure 1. NL Nature page showing user registration and login form

Strategies implemented in the registration form are designed to maximize the network effect. The network effect is a social and economic concept describing the value added by each new user (Anderson, 2007). It is based on the assumption that existing members benefit when a new member joins, because the marginal member creates a new point of contact for every existing one. If the site has 100 members, the 101st will produce 100 new potential connection points. Every new user reenergizes the existing ones. This helps explain the exponential growth of both the Internet and Web 2.0.

Other network effect-related strategies include new member notices, live information updates, and periodic newsletters. Highlighting the arrival of a new member on the front page of the website shows both existing members and the general public that a new connection is possible. The same concept is behind the live updates that show new content on various pages of the website. By refreshing the content, NL Nature strives to match new information with the interests of its members. Finally, periodic newsletters are emailed to all registered members. Newsletters contain information on key developments of the project and are designed to reengage existing site members.

Interface design can also be used to measure the effect of various biases that are typical of an online project. Such biases include geographic, demographic, technological, and others. While the primary purpose of the registration form is to convert a visitor into a member and retain a valid email address, another form is needed to acquire basic user information. Users are known to refrain from giving personal data. To overcome that resistance, NL Nature’s profile form is designed with every field being optional. Presenting the user profile as a choice appears fruitful: of the 89 members of NL Nature (as of July 20, 2010), over 56% volunteered personal information, including their city/town (56%), gender (56%), province of residence (when not in NL) (56%), occupation (43%), year of birth (38%), and approximate hours spent outdoors (38%) (Wiersma and Lukyanenko, 2010). As the differing rates of volunteered information show, users have different degrees of sensitivity about their private data. It is important to note that capturing private information requires design compliance with the Personal Information Protection and Electronic Documents Act of 2000. NL Nature’s compliance with the Act is ensured through a variety of security measures, as well as by refraining from publicly displaying detailed profile records.

Interactivity is a major attraction of Web 2.0 applications. Interactivity is the user's perception of a project as a live and dynamic environment. Web 2.0 facilitates interactivity through user-supplied content that, if added regularly, gives an impression of dynamism. Yet successful projects should not rely only on users as a source of interactivity. As O’Reilly famously warned, “software will cease to perform unless it is maintained on a daily basis” (2002). NL Nature’s content requires continuous upkeep. Slight modifications to the design layout can also enhance interactivity by giving an impression that the project is alive.

2.2 Fostering User Participation

2.2.1 User Profile
Development of any interactive technology should be user focused. This project’s main audience is citizen scientists.
The definition that fits best with the objectives of this research describes citizen scientists as members of the general public who are enlisted to collect the large amounts of data needed for scientific purposes (Bonney et al, 2009). Many citizen scientists are capable of producing high-quality scientific information. In fact, according to Goodchild, the major distinction between scientists and amateurs is that amateurs generally lack the ability to reason beyond observations, the distinguishing quality of the scientific domain (2008, p. 12). In most other aspects, however, Goodchild concludes, the distinctions between professional and amateur are "quickly blurring," and this new reality simply needs to be formalized by better semantics in the English language. This broad definition covers a large number of people, and it is difficult to create a typical user profile, since anyone technically capable of interfacing with NL Nature can participate. As a result, the design of the website needs to accommodate different levels of both scientific and technical expertise. The project also needs to address a set of challenges typical of technology-driven participatory projects.

2.2.2 Addressing Participatory Challenges and Biases
There is a set of challenges that tends to be common to any new technology with an explicit public component, including public apathy, uneven participation, and the need to maximize interconnectedness. Several design-level approaches have been developed to deal with each of these challenges.

While NL Nature proponents may be excited about the scientific value of the new technology, citizen scientists themselves may feel apathetic toward it. The prevalence and significance of “public apathy and…low expectations” need to be considered when designing participatory projects (Holmes, 2001, p. 2). On the subject of apathy, Holmes (an authority on eGovernment) states that the more members of the public feel that their actions have a direct impact, the more likely they are to be involved (2001, p. 285). This challenge is implicitly addressed by the nature of Web 2.0, which relies on user-supplied information for the majority of its content. NL Nature is shaped by user participation. As new species are added, the engine responds by dynamically updating graphs, counts, and lists. Each set of user-supplied data on NL Nature can be commented on by other site members, which increases the feeling that a volunteered contribution has a direct impact. Users shape the way NL Nature looks and performs.
A second set of challenges deals with the different levels of participation typical of a citizen-science project. A few users tend to produce most of the content on a website. User participation tends to follow Pareto or long-tail distributions. This is true of NL Nature: of 195 observations made on NL Nature as of July 14, 2010, 74 were made by a single member (Wiersma and Lukyanenko, 2010). While such user activity is beneficial overall, it is also known to cause adverse effects, including user apathy and decreased diversity of ideas. Compared to frequent contributors, ordinary members may feel inferior and their impact insignificant, which can lead to user apathy. To prevent that, a website needs to feature members using various filters (geography, random, user of the day) so that low-contributing members are also showcased. At the same time, various incentives need to exist to promote participation, such as top 10 lists and participation prizes.

In order to take advantage of the interconnectedness that keeps people excited, it is important to understand the participatory patterns of site members. NL Nature follows a fundamental principle of human communication that Thoreau summarized in Walden in 1854: "We are in great haste to construct a magnetic telegraph from Maine to Texas; but Maine and Texas, it may be, have nothing important to communicate" (Briscoe et al, 2006). Since members differ in their interests and goals, the focus should be on connecting like-minded users. On NL Nature this is achieved through an algorithm that determines a set of similar observations (by location, species, etc.) and presents the results on maps that also contain links to member profiles. A member can contact another member by posting a comment on an observation, which triggers an email notification to the author of the sighting.

2.2.3 Exogenous Efforts of Inducing Participation
Any web project that has a social component requires a certain number of visitors in order to remain enticing for others to join. This is called the “critical mass” or “tipping point”: the point after which it is truly the masses that shape the virtual landscape, and the website creators/authors become mere observers. Once the critical mass is reached, the size of the network effect alone is adequate to create self-sufficient user activity. While effective design facilitates participation, it does not by itself attract new members. For that, a series of exogenous activities is needed. As part of the investigation into the participatory aspects of a Geoweb project, we conducted five types of activities:

- project presentations to target audiences,
- workshops and public displays,
- media attention,
- information brochures, and
- search engine submissions and online promotion.


Spaced two to three weeks apart, each activity was monitored to determine its effectiveness and to allow for design and tactical adjustments. Functional adjustments were made on the basis of the data analysis and evaluation. For example, during the presentation at Newfoundland’s Natural History Society on February 18, 2010, several audience members suggested broadening the species categorization. The suggestion was deemed appropriate, and the resulting adjustment has allowed a variety of new species to be added.

The analysis of outreach activities and participation levels (number of new members and number of observations) indicates that participation can be exogenously induced. March 2010 saw the most outreach activities of various types: meetings, presentations, search engine submissions, etc. (Appendix A). At the same time, 38 of 89 members joined during that month (Figure 2). A closer look at the process reveals that media promotions were the most effective at recruiting new members. The majority of these new members, 20, joined during the first week of March (Appendix B). This was the time of an intensive media campaign, which included newspaper articles and TV and radio interviews. Close to 10 percent of 189 observations were made during the same period, many of them by members who had recently joined.

[Chart: number of members by month, October 2009 to July 2010; vertical axis from 0 to 40]

Figure 2. NL Nature Membership


3.0 Model of Facilitating Citizen Science Projects

3.1 Rationale behind the New Information Model
The abundance of information on the Internet in general, and on Web 2.0 websites in particular, creates a need for good organization. Living in an “overcommunicated society” (a term coined by marketing gurus Al Ries and Jack Trout in 1972), a user values the efficiency and relevancy of desired information and is highly sensitive to information “noise.” Creating an easy-to-understand website structure improves the user experience and increases participation.

The desirability of clear and simple navigation and information presentation is rooted in basic cognitive principles of classification. Making sense of things by creating cognitive order is a fundamental physiological need. Classifying objects as “good” or “bad” was critical for human survival. “Without the ability to categorize, we could not function at all, either in the physical world or in our social and intellectual lives” (Lakoff, 1987, p. 6).

Traditionally, information systems architecture has relied on the application domain for guidance in information organization. Organizational structure has been passed down from the application domain and has shaped such website elements as navigational menus, drop-down lists, site maps, text headings, meta tags, etc. According to Noah and Lloyd-Williams (1998), for example, the steps of domain modeling include representing the application domain, resolving any inconsistencies within it, and transforming the model into programming logic (p. 199). Similarly, Fowler (2002) recommends developing application elements “that mimic the data in the business and objects that capture the rules the business uses” (p. 116).

The substance of the NL Nature project as a citizen science initiative poses a challenge that traditional domain modeling techniques cannot resolve.
The logic of the traditional approach suggests that NL Nature needs to choose an application domain and try to produce an accurate representation of it. Yet what is the application domain of the project, and what is its structure? NL Nature is a synergy of at least two domains: the scientific world of biology and the terra incognita of biology amateurs. Each has its own set of objects, concepts, and rules. How can a traditional website with one menu, one navigation map, and one overall structure accommodate the infinite diversity of common perceptions of wildlife held by the public? In this research we argue that not only are attempts to find a unified structure for scientists and the general public futile, but any unified structure, even one serving a single isolated domain, is a limiting solution. A perfect unified model is impossible because no application domain has a perfect unified structure.

3.1.1 Limits of Rigid Classification
Responding to a fundamental need to organize information, scientific disciplines have strived to create a perfect way to present their respective domains. Trying to liberate their own disciplines from “chaos,” scientists proposed systematic ways to organize knowledge. Attracted to the simplicity of a perfect unified solution, prominent thinkers packed the infinity of knowledge into such influential schemas as the periodic table of elements, the Napoleonic Code, Marxist economic models, and the laws of classical physics. In biology, the target domain of this research, Carl Linnaeus brought “simplicity” into an area of “total chaos” (Mayr, 2009, p. 173). Like Mendeleev’s periodic table, Linnaean taxonomy attempted to provide a unified structure for natural history. Linnaeus proposed grouping organisms based on their shared properties. Later, under the influence of Darwinism, common descent was also factored into biological classification. While specific propositions of Linnaean taxonomy have been the subject of ongoing debate (see The Oxford Handbook of Philosophy of Biology, 1998, pp. 160-167), a unified classification of species has become a universal paradigm.

When modeling a biology domain, the biological taxonomy of species, genus, family, order, etc. would be an appropriate solution according to the traditional modeling framework. We contend that this approach is limiting because it is based on the flawed assumption that the application domain can be organized in a static, rigid way.
3.1.2 Uniqueness, Information and Utility Loss

A unified static structure of any body of knowledge is impossible because the elements of the real world are diverse and unique. Both philosophy and the cognitive perception of reality dictate that “there are no two identical entities” (Bunge, 1977, p. 90). Even the principle of quantum physics that atoms and particles of the same element are identical can be challenged once state and time are added as dimensions. No two things, however similar, “can occupy the same state at the same time” (Bunge, 1977, p. 256). Let us call this the uniqueness theorem.

It follows from the uniqueness theorem that even the most similar entities will each possess a combination of properties unique to that entity. For example, two lichens that belong to exactly the same species can grow on different trees and be at different stages of their life cycles. If we replicate life using a traditional model, we should have an object for every individual organism. Since that is unrealistic, we may go one level higher and suggest an SQL table, housing information behind the scenes, for every species. That is not realistic either, since Gros Morne National Park alone claims to have “over 400 species of lichens” (Gros Morne National Park, 2006, p. 2). If we go one more level higher and have a category for each group of species, such as lichens, birds, or large mammals, we may easily create over 20 tables, which is also not an elegant solution. It is important to note that with every higher level of abstraction, valuable properties of the originally observed individual are inevitably lost. Thus, an optimum domain design can only be achieved at the cost of information and representation. Any system designed using the traditional model has that inherent constraint. The cognitive purpose of classification, the ability to infer, is encumbered by systemic property loss (Parsons & Wand, 2000, pp. 233, 238). Adopting any given taxonomy means that potentially useful inferences that could be made from an infinite number of alternative taxonomies are always out of reach for the application domain.

Unified classification is also impossible because each time we classify a phenomenon, we do so for a specific utility, which varies with context. A tiger in the woods represents imminent danger, while the very same tiger in a photograph carries aesthetic value. The process of classifying phenomena is documenting “observations about relationships among properties of phenomena” (Parsons and Wand, 2008, p. 1040).
Yet, since every object is unique, there is an unlimited number of possible relationships, each reflecting specific classification objectives, and a theoretically infinite number of classifications exists. Modeling the domain with a given taxonomy or structure forgoes the potential utility of an infinite number of alternative classifications.

Being the primary carrier of information, language is a powerful tool in mapping the world. While the discussion of the relative salience of language and ontology is beyond the scope of this research, we believe that the two exist in a complex and interdependent relationship. Further, we believe that this complexity adds to the implausibility of a rigid classification. In 1979 Frank Keil presented a theory of one-to-one mapping between the syntactic structure of language and the general ontology of knowledge. Keil believed that language adjusted to such an ontology by creating sentences with a rigid predicate-term structure that exactly matches human perception of the world (1979, pp. 42-44). To test Keil’s theory, Gerard and Mandler (1983) created an experiment in which participants were instructed to match predicates with their syntactic subjects. Based on the experiment, the authors concluded that “the flexibility of language provides a range of … interpretation that escapes the boundaries of a rigid hierarchy” (1983, pp. 119-120). Therefore, taxonomic disagreements can be caused by individual interpretations of common terms.

3.1.3 Systemic Incompatibility

Considering that a single domain is incapable of producing a perfect unified structure, it is even more challenging to create a unified structure that represents multiple domains. One of the key objectives of the NL Nature project is enriching traditional scientific domains with citizen scientists’ data. This objective creates at least two major classifications of species: a scientifically accepted Linnaean taxonomy and a structure of natural history that is easy for an ordinary person to understand. The structure of the former poses few implementation challenges for the IS domain, with one-to-many relationships between database objects and application-level classes mimicking the rigid hierarchical model envisioned by Carl Linnaeus. The user interface that follows this classification model uses a traditional flow of drop-down lists that narrow the user's selection from biological kingdom through phylum, class, order, family, and genus to, finally, species. Yet such an approach is incompatible with the needs of the website’s target audience: ordinary people. Traditional domain modeling produces a unique data storage and quality challenge for citizen-scientist projects.
Volunteered information is often vague and imprecise, yet it needs to be stored within the rigid confines of a structured system. Consider a real example that was observed on NL Nature in May 2010. Marnhull, a member of the website, made an observation and, using the rigid classification, picked the species Old Man's Beard, while the only things the user truly knew were the observable visual appearance and the habitat. Following the user’s choice, the record was stored in the NL Nature database as Old Man's Beard. Hours later, Mac Pitcher, an industry expert, indicated that the observation was not likely to be Old Man's Beard. Thus, the validity of the stored information was questioned, yet no alternative was proposed. By keeping this record as Old Man's Beard (Usnea)
we knowingly store potentially invalid information. The only alternative that the traditional approach suggests is classifying the observation as "Unknown lichen," a category that houses dissimilar unknown species and carries little inferential value. As the above example indicates, traditional domain modeling produces a variety of inherent challenges and incompatibilities that it struggles to resolve.

Incompatibility 1: Classification misalignment. To understand this weakness, imagine that you have observed what you think was a coyote and are now ready to share your observation on NL Nature. You know a coyote is an animal, and thus you select “animal” in the kingdom-level drop-down list. You are then presented with a number of choices: Chordata, Echinodermata, Hemichordata, Xenoturbellida. Few would intuitively choose Chordata without supporting information. Clearly, the scientific classification misaligns with our own perception of a coyote as a dog-like animal. Classification misalignment is an inherent consequence of the infinite number of classifications reflecting unlimited relationships between phenomena.

Incompatibility 2: Overreliance on structure. This weakness becomes clear when we have observed a bird but don’t know what species it belongs to. It reflects an underlying principle of cognition: we first learn about the phenomenon and then establish links between that phenomenon and the rest of our knowledge base. The links we create can subsequently evolve into relationships and classifications. We know we observed a bird, but without knowing its species we cannot use the traditional NL Nature interface (Figure 3) without considering which features of the unknown bird are shared with the birds on NL Nature’s drop-down list. Even for a professional biologist it is not Linnaean taxonomy but attribute similarity that allows classifying a black and white bird with a long bill and pointed wings as a gannet.

Figure 3. NL Nature's interface to record observation

Incompatibility 3: Semantic ambiguity. A language gap between the scientific community and the general public produces information ambiguity. For example, to some visitors the category “shore birds” may be the same as the category “sea birds,” while other NL Nature members may view them as two separate classes. If, according to Gerard and Mandler (1983), a unified classification is incapable of satisfying one individual, it will fail to satisfy the needs of an environment as diverse as NL Nature, where people with vastly different backgrounds converge.

3.2 Architecture and Implementation of Instance Model

The traditional static, structure-driven model of storing and displaying information does not fully satisfy the requirements of an application domain, a task that an information system is expected to perform. A new approach to classification and data representation is required. This research tests and investigates a new model of collecting, storing and displaying information that was termed the “instance model” by Parsons and Wand (2000).

3.2.1 Ontological Foundations of Instance Model

The ontological basis of the instance model is the philosophical work of Mario Bunge (1977). According to Bunge’s ontology, every object, called a “thing,” possesses a unique set of properties: “what makes a thing what it is, i.e. a distinct individual, is the totality of its properties: different individuals fail to share some of their properties” (p. 111). Classes are then formed based on the required inferential utility and are defined
as a set of overlapping properties. Since a multitude of properties is defined for each object (instance), there can be an infinite number of potential classes depending on context and utility. Thus, Bunge’s ontology is consistent with the assumption of uniqueness and supports infinite classifications.

3.2.2 Implementation of Instance Model

By shifting the focus from a predefined classification to the object itself and its properties, we can solve both the issue of describing the application domain and the integration of multiple domains into one application. With this approach we do not need to model a domain. It is sufficient to ensure that the application has a comprehensive collection of objects (instances) and that each object contains a set of well-defined properties. When required, a user can assemble a dynamic classification based on the collection of properties that are of interest at a given moment. Thus, if one property is given, such as “behaviour,” then the system will create at least two classes: one for nocturnal and one for diurnal animals. The same system can also use a property that connects each species with a biological taxonomy to reproduce the scientific biological classification. Thus, the instance model is capable of achieving the objectives of a traditional classification without its inherent limitations.

Attribute-based design will facilitate efficient and effective data collection from citizen scientists. The data collection interface will be designed based on Bunge’s postulate of the primacy of the phenomenon and its attributes over classification. The primary object of an observation, a species, will be identified via a critical mass of user-supplied attributes, rather than from a checklist of species that is only usable by more expert amateurs. A user will be asked to identify those attributes (e.g., size, colour, appearance, behaviour, location, sound) of a species that he or she was able to observe.
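The idea of assembling a class on demand from a chosen property can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instance identifiers, the property names and the `classify` helper are hypothetical and are not taken from the NL Nature implementation.

```python
from collections import defaultdict

# Hypothetical instances: each one is only a bag of property -> value pairs.
# No class or species is required up front.
INSTANCES = {
    "obs-1": {"taxon": "Canis latrans", "behaviour": "nocturnal", "colour": "grey"},
    "obs-2": {"taxon": "Morus bassanus", "behaviour": "diurnal", "colour": "white"},
    "obs-3": {"behaviour": "nocturnal", "colour": "brown"},  # species unknown
}


def classify(instances, prop):
    """Group instance ids by the value each one holds for `prop`.

    Instances that lack the property are left out of the result
    rather than being forced into a class.
    """
    classes = defaultdict(list)
    for ident, props in instances.items():
        if prop in props:
            classes[props[prop]].append(ident)
    return dict(classes)


# The same data yields different classifications depending on the property asked for.
by_behaviour = classify(INSTANCES, "behaviour")
by_taxon = classify(INSTANCES, "taxon")
```

In this sketch the observation with no identified species still takes part in the behaviour-based grouping, while the taxon-based grouping simply omits it instead of placing it in an "unknown" class.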
Once a number of attributes are chosen, the system will match them against a pre-existing set and either infer a species or ask for additional attributes, some of which could also be inferred automatically from those previously supplied. The final attribute set can potentially match multiple species. This solution represents a realistic approach to a citizen-scientist project. Non-experts do not always know exactly which phenomenon they observed. It is more realistic to expect a volunteer to remember some features of an unknown species than to expect a precise classification and identification. The key activity of identification therefore shifts from the development of a perfect classification to the development of effective attribute management. The more the
system can guide the choice of attributes, the higher the inferential value each record holds, and the easier object-to-attribute-set matching becomes.

3.2.3 Organization of Properties / Attributes

Because the instance design is attribute-centred, its core element is an efficient attribute management system. The foundation of an attribute management system is the cognitive principles of using attributes. The fundamental research on systematicity, the cognitive process of “mapping of systems of mutually constraining relations, such as causal chains or chains of implication,” carried out by Gentner and Toupin (1986), compared the relative importance of attributes and relationships between domains (p. 277). The research concluded that overreliance on attributes alone produced surface-level links and that, where subjects had a choice, systematicity overrode information matching based on attributes (p. 282).

The systematicity theory has direct implications for the attribute management design of NL Nature. If NL Nature were to present an array of attributes without organizing them in a systematic way, we could expect vague and misleading attribute-based observations. For example, consider a scenario where a member has observed a lichen and is faced with the following set of attributes to match against the observed ones: dark, nocturnal, grey, grows on trees, grows on rock, has red dots, solitary. Some of these (dark, nocturnal, grey, solitary) are generic attributes that belong to a wide range of species, while the others (grows on trees, grows on rock, has red dots) are lichen-specific. In terms of Gentner and Toupin’s theory, no overarching system is presented, and the user may tag several generic attributes (dark, grey, solitary) in addition to the lichen ones. This will cause the NL Nature attribute system to record misleading and irrelevant information. It is unrealistic to expect users to create a unifying attribute system themselves and filter out the non-applicable attributes.
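One way to read the lichen scenario is that each attribute carries a scope, and the interface offers only the attributes whose scope covers what the user is describing. The sketch below assumes a hypothetical catalogue (`ATTRIBUTE_SCOPE`) and hypothetical group names; neither comes from the NL Nature system.

```python
# Hypothetical catalogue: attribute -> broad groups it can meaningfully describe.
ATTRIBUTE_SCOPE = {
    "grows on trees": {"lichen"},
    "grows on rock": {"lichen"},
    "has red dots": {"lichen"},
    "dark": {"mammal", "bird"},
    "grey": {"mammal", "bird"},
    "nocturnal": {"mammal", "bird"},
    "solitary": {"mammal", "bird"},
}


def applicable_attributes(group, scope=ATTRIBUTE_SCOPE):
    """Return, in alphabetical order, only the attributes scoped to `group`."""
    return sorted(attr for attr, groups in scope.items() if group in groups)


# A member describing a lichen is offered the lichen attributes only.
lichen_choices = applicable_attributes("lichen")
```

How the scopes themselves are assigned is left open here; the sketch only shows the filtering step once some organizing system exists.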
To take advantage of the cognitive power of systematicity, NL Nature’s interface should filter out as many of the non-applicable attributes as possible.

3.2.4 Instance Model and Information Retrieval

The abundance of information in a typical Web 2.0 application makes the task of information retrieval especially important. The objective of the system is to deliver a precise and concise result set that matches the information requested by the user. The key concepts of information retrieval are precision and recall. Precision is the degree of exactness and recall is the measure of completeness of an information query. The following are the classic formulae of each concept:
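In standard set notation, writing R for the set of records relevant to a query and A for the set of records the system returns (both symbols are labels assumed here, not the author's notation), precision and recall are usually defined as:

```latex
\[
\text{precision} = \frac{|R \cap A|}{|A|},
\qquad
\text{recall} = \frac{|R \cap A|}{|R|}
\]
```

Precision therefore falls when irrelevant records are returned, and recall falls when relevant records are missed.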

The information systems domain has a variety of techniques to improve the precision of search results. Boolean logic as used in SQL, the standard language of database retrieval, allows information architects to deliver relatively accurate results. For example, if we assume an NL Nature user wants information about coyotes around Corner Brook, NL, a system can query the database for the keyword “coyote” in the title AND (Boolean operator) a geographic area around Corner Brook. If the query returns n records, most likely the majority of them would be correct, resulting in near 100% precision. However, such a search would ignore observations with coyote-like attributes that are not explicitly tagged as “coyote,” thus resulting in lower recall. It is largely agreed that “the central problem of information retrieval is the improvement of recall, without reducing precision” (Dynamic Taxonomies and Faceted Search…, 2009, p. 43). By employing properties for information retrieval, we can maximize recall. Since NL Nature allows users to post an observation without identifying the species, we can assemble a number of attributes that overlap with the typical attribute set of a coyote and present the resulting observations for the user to consider. As a result, observations where users were not sure whether the species was indeed a coyote, but described their sightings with coyote-like attributes, can be added when needed to the result set that contains explicitly identified coyotes.
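The coyote example can be made concrete with a toy data set. Everything in the sketch below is assumed for illustration: the observations, the attribute names, the `COYOTE_PROFILE` set and the overlap threshold are hypothetical, and the geographic filter is left out for brevity.

```python
# Hypothetical "typical" attribute set for a coyote.
COYOTE_PROFILE = {"dog-like", "grey-brown fur", "bushy tail", "howls"}

# Hypothetical observations; `is_coyote` is ground truth used only for scoring.
OBSERVATIONS = [
    {"id": 1, "title": "Coyote", "attributes": {"dog-like", "bushy tail"}, "is_coyote": True},
    {"id": 2, "title": "Coyote", "attributes": {"dog-like", "howls"}, "is_coyote": True},
    {"id": 3, "title": "Unknown animal",
     "attributes": {"dog-like", "grey-brown fur", "howls"}, "is_coyote": True},
    {"id": 4, "title": "Unknown bird",
     "attributes": {"long bill", "pointed wings"}, "is_coyote": False},
    {"id": 5, "title": "Unknown animal",
     "attributes": {"dog-like", "red fur", "bushy tail"}, "is_coyote": False},
]


def keyword_search(observations, keyword):
    """Ids of observations whose title contains the keyword (case-insensitive)."""
    return {o["id"] for o in observations if keyword.lower() in o["title"].lower()}


def attribute_search(observations, profile, min_overlap=2):
    """Ids of observations sharing at least `min_overlap` attributes with the profile."""
    return {o["id"] for o in observations if len(o["attributes"] & profile) >= min_overlap}


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {o["id"] for o in OBSERVATIONS if o["is_coyote"]}
by_keyword = keyword_search(OBSERVATIONS, "coyote")
widened = by_keyword | attribute_search(OBSERVATIONS, COYOTE_PROFILE)
```

On this toy data the keyword query returns only the two explicitly tagged records, while the widened query also picks up the untagged coyote and one look-alike. Recall rises, and the look-alike illustrates why the attribute-matched records are best presented as candidates for the user to consider rather than as confirmed sightings.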

4.0 Conclusions and Recommendations

The progress of information technology provides researchers with new tools for acquiring and organizing information. The rise of Web 2.0 technologies, together with the increased availability of geospatial data, has created a fertile environment for citizen-scientist projects. In addition to opportunities, new technology and methods produce a number of challenges. Through experiments we have determined that many typical participatory issues can be addressed through efficient interface design. Action research cycles demonstrate that participation can be exogenously induced and endogenously maintained through a series of design strategies.

A ubiquitous challenge of a citizen-scientist project is integrating the knowledge of ordinary people and that of the scientific community. The conventional wisdom of integration requires merging scientific and folk classifications into a sufficient global structure. Equipped with philosophical and cognitive principles, we demonstrated the limitations of this approach and proposed a new techno-ontological model, which we called the “instance model” after a similar one used in information systems design. We contended that by shifting the focus from a predefined classification to the object and its properties, we can solve both the issue of accurate application domain modeling and the integration of multiple domains into one application, which is the essence of citizen-scientist initiatives.

The relative novelty of the key concepts of our research has potentially profound implications for science and technology. A quasi-experimental study is necessary to measure the impact of the instance model on user participation, data quality, and usefulness to scientists. A test of the techno-ontological model can lead to conclusions that challenge traditional frameworks of both science and technology. It is quite possible that discoveries will be made in the traditional scientific domains.
Determining the degree to which unstructured data organization and presentation can solve problems is an open-ended objective of very broad scope. In addition to our own experiments, we recommend developing projects freed from rigid classifications in other areas. These basic ontological and cognitive principles can be applied to medicine (diagnosis, illness treatment, patient development), criminal law (decodification), astronomy (planet and galaxy classifications), biology (taxonomy), and other disciplines. Areas of technology such as web design can implement the instance model in interface design, site menus, site maps, and overall structure.

This research touched on a number of challenges of a rapidly developing stream of science: citizen science. Harnessing the power of ordinary people to provide data and observations about the natural world can lead to major advances in the natural sciences as well as assist in the vital area of wildlife conservation and preservation. Developing methods for public engagement is a true scientific frontier of modern times.

References

Anderson, P. (2007). What is Web 2.0? Ideas, technologies and implications in education. Retrieved January 22, 2010 from http://www.jisc.ac.uk/media/documents/techwatch/tsw0701b.pdf

Bonney, R., Cooper, C.B., Dickinson, J., Kelling, S., Phillips, T., Rosenberg, K.V. & Shirk, J. (2009). Citizen Science: A Developing Tool for Expanding Science Knowledge and Scientific Literacy. BioScience, 59(11), 977-984.

Briscoe, B., Odlyzko, A. & Tilly, B. (2006). Metcalfe’s Law is wrong. IEEE Spectrum, July 2006. Retrieved January 25, 2010 from http://spectrum.ieee.org/jul06/4109

Bunge, M. (1977). Treatise on Basic Philosophy (Volume 3), Ontology I: The Furniture of the World. Boston: Reidel.

De Man, E. (2007). Beyond Spatial Data Infrastructures there are no SDIs – so what. International Journal of Spatial Data Infrastructures Research, 2, 1-23. Retrieved January 22, 2010 from http://ies.jrc.ec.europa.eu/uploads/SDI/ABSTRACT%20%20Beyond%20Spatial%20Data%20Infrastructures%20there%20are%20no%20SDIs%20%20%20ERIK%20DE%20MAN.pdf

Elwood, S. (2008). Volunteered geographic information: key questions, concepts and methods to guide emerging research and practice. GeoJournal, 72(3-4), 133-135.

Gros Morne National Park Newfoundland & Labrador, Canada. (2006). United Nations Environment Programme, World Conservation Monitoring Centre. Retrieved May 22, 2010 from http://www.unepwcmc.org/sites/wh/pdf/Gros%20Morne.pdf

Dynamic Taxonomies and Faceted Search: Theory, Practice, and Experience (The Information Retrieval Series) (1st ed.). (2009). New York: Springer.

Fowler, M. (2002). Patterns of Enterprise Application Architecture (1st ed.). New York: Addison-Wesley Professional.

Gentner, D. & Toupin, C. (1986). Systematicity and Surface Similarity in the Development of Analogy. Cognitive Science, 10, 277-300.

Gerard, A.B. & Mandler, G.M. (1983). Ontological knowledge and sentence anomaly. Journal of Verbal Learning and Verbal Behavior, 22, 105-120.

Goodchild, M.F. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69, 211-221.

Holmes, D. (2001). eGov: E-Business Strategies for Government (1st ed.). London: Nicholas Brealey Publishing.

Hoover, N. (2007). Microsoft gives peek at next version of SQL Server. InformationWeek | Business Technology News, Reviews and Blogs. Retrieved March 13, 2010 from http://www.informationweek.com/news/windows/microsoft_news/showArticle.jhtml?articleID=199500164

Hudson-Smith, A. & Crooks, A. (2008). The renaissance of geographic information: neogeography, Gaming and Second Life. UCL Working Papers Series, Paper 142, Aug 08. Retrieved January 22, 2010 from http://www.casa.ucl.ac.uk/working_papers/paper142.pdf

Keil, F.C. (1979). Semantic and conceptual development: an ontological perspective. Cambridge, Mass.: Harvard University Press.

Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago: University of Chicago Press.

Mayr, E. (1985). The Growth of Biological Thought: Diversity, Evolution, and Inheritance. Cambridge: Belknap Press.

Noah, S.A. & Lloyd-Williams, M. (1998). An evaluation of two approaches to exploiting real-world knowledge by intelligent database design tools. In Ling, T.W., Ram, S. & Lee, M.L. (Eds.), Proceedings of the 17th International Conference on Conceptual Modelling (pp. 197-210). Berlin: Springer-Verlag.

O'Brien, R. (2001). Um exame da abordagem metodológica da pesquisa ação [An overview of the methodological approach of action research]. In Roberto Richardson (Ed.), Teoria e Prática da Pesquisa Ação [Theory and practice of action research]. João Pessoa, Brazil: Universidade Federal da Paraíba. (English version) Retrieved January 22, 2010 from http://www.web.ca/~robrien/papers/arfinal.html

Definitions and terms used for the Participatory Geoweb. (2010). The Participatory Geoweb. Retrieved January 22, 2010 from http://rose.geog.mcgill.ca/geoide/glossary

Komarkova, J., Ondrej, V. & Martin, N. (2007). Heuristic Evaluation of Usability of GeoWeb Sites. Web and Wireless Geographical Information Systems: 7th International Symposium, W2GIS 2007, Cardiff, UK, November 28-29, 2007, Proceedings (Lecture Notes ... Applications, incl. Internet/Web, and HCI) (1st ed., pp. 264-278). New York: Springer. Retrieved January 22, 2010 from http://www.springerlink.com/content/b3177r34j105v281/fulltext.pdf

Naaman, M., Song, Y.J., Paepcke, A. & Garcia-Molina, H. (2006). Assigning textual names to sets of geographic coordinates. Computers, Environment and Urban Systems, 30(4), 418-435.

Noyle, B. & Bouwman, D. (2009). Remember the User. ArcUser, Fall, 26-28.

O'Reilly, T. (2002). Design Patterns and Business Models for the Next Generation of Software. Retrieved February 3, 2010 from http://www.oreillynet.com/lpt/a/6228

The Participatory Geoweb for Engaging the Public on Global Environmental Change. (2008). The Participatory Geoweb. Retrieved April 2, 2010 from http://rose.geog.mcgill.ca/geoide/node/73

Parsons, J. & Wand, Y. (2008). A question of class. Nature, 455(7216), 1040-1041.

Parsons, J. & Wand, Y. (2000). Emancipating instances from the tyranny of classes in information modeling. ACM Transactions on Database Systems, 25(2), 228-268.

Ries, A. & Trout, J. (1972). The Positioning Era Cometh. Retrieved February 6, 2010 from http://www.ries.com/articlespositioningera.php

The Oxford Handbook of Philosophy of Biology (Oxford Handbooks) (1st ed.). (2008). New York: Oxford University Press, USA.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Wiersma, Y.F. & Lukyanenko, R. (Eds.). (2010). NL Nature. Electronic atlas of wildlife and species-at-risk in Newfoundland and Labrador [nlnature.com]. Landscape Ecology and Spatial Analysis Lab, Memorial University of Newfoundland, St. John’s. Retrieved March 22, 2010 from http://www.nlnature.com

Appendix A. Schedule of Outreach Activities Accomplished to-date

- Feb. 18, 2010: presentation at NL Natural History Society (approx. 15 people)
- Feb. 24, 2010: meeting at Botanical Gardens w/ 3 staff members
- Feb. 25, 2010: article in the Gazette (http://www.mun.ca/gazette/issues/vol42no10/nature.php)
- March 3, 2010: article in The Telegram
- March 3, 2010: interview on CBC-Corner Brook and CBC-Gander
- March 3, 2010: interview on Out of the Fog (TV)
- March 5, 2010: short note in The Telegram
- March 9, 2010: GIS Day at MUN (approx. 30 people visited table)
- April 4, 2010: member newsletter to emails on NLNature.com
- April 23, 2010 (upcoming): article in The Osprey (newsletter of the NL Natural History Society)
- April – May 2010: distribution of pamphlet at strategic locations (wildlife and outdoors shops and businesses, tourism outlets)
- April – July 2010: search engine optimization

Appendix B. User Activity in March 2010

Period       No. of Observations    Period       Members Joined
01-Mar-10    1                      01-Mar-10    1
02-Mar-10    1                      02-Mar-10    5
03-Mar-10    7                      03-Mar-10    6
04-Mar-10    1                      04-Mar-10    5
05-Mar-10    1                      05-Mar-10    4
06-Mar-10    1                      06-Mar-10    1
07-Mar-10    3                      07-Mar-10    1
08-Mar-10    1                      09-Mar-10    1
09-Mar-10    2                      11-Mar-10    4
11-Mar-10    6                      12-Mar-10    1
12-Mar-10    2                      13-Mar-10    1
14-Mar-10    1                      15-Mar-10    2
15-Mar-10    2                      17-Mar-10    1
17-Mar-10    1                      21-Mar-10    1
21-Mar-10    1                      22-Mar-10    1
25-Mar-10    1                      25-Mar-10    1
26-Mar-10    2                      27-Mar-10    1
27-Mar-10    1                      30-Mar-10    1
28-Mar-10    1
Total:       36                     Total:       38

Appendix C. NL Nature’s Newsletter
