Towards Meaningful Mathematical Expressions in E-Learning

Iyad Abu Doush

Department of Computer Sciences Yarmouk University Irbid, Jordan

[email protected]

Faisal Alkhateeb

Department of Computer Sciences Yarmouk University Irbid, Jordan

[email protected]

ABSTRACT This paper presents a new framework for adding semantics to mathematical expressions in the context of e-learning. The proposed system converts presentation MathML into content MathML with RDFa annotations. The objective is to add meaning to mathematical content and to create a framework that facilitates searching for mathematical content on the web. The proposed approach relies on two principles. The first is automatic semantic addition by converting presentation MathML into content MathML. The second is the ability to search for a mathematical expression and find exactly where it is located in the e-learning web page. This is accomplished by embedding RDFa annotations in the resulting MathML.

Categories and Subject Descriptors K.3.1 [Computers and Education]: Computer Uses in Education

General Terms Algorithms

Keywords Semantic Web, E-learning, MathML

1. INTRODUCTION

The study of mathematics is crucial in the preparation of students to enter careers in science, technology, engineering and other disciplines, such as the social and behavioral sciences. The advent of the Internet has significantly enhanced availability of technical content, by making millions of documents available on-line. The introduction of sophisticated Learning Management Systems (LMSs), such as Blackboard and Moodle, has widely

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISWSA 10 June 14 to 16, 2010, Amman, Jordan. Copyright 2010 ACM 978-1-4503-0475-7/09/2010 ...$10.00.

Eslam Al Maghayreh

Department of Computer Sciences Yarmouk University Irbid, Jordan

[email protected]

increased the opportunities for students to attend schools. Recent statistics [14] by the Sloan Consortium indicate that more than 3.2 million people are involved in some form of online education.

Mathematical content (e.g., formulas, mathematical symbols, and abbreviated function names) poses a particular challenge for searching e-learning material. The W3C recommendation for encoding mathematics on the web is MathML [1]. There are two types of MathML: presentation MathML and content MathML. Presentation MathML describes the visual appearance of a mathematical expression through its two-dimensional layout and formatting. Content MathML, on the other hand, encodes the meaning, or mathematical semantics, of the expression.

Many equation editors (e.g., MathPlayer, Maple, and MathFlow) generate mathematical expressions encoded in presentation MathML. Presentation MathML can be rendered visually by machines, but it cannot be understood by them [8], and it does not provide adequate semantic information [11]. Kohlhase and Sucan [11] noted that math web search usually uses content MathML as its basis. Encoding mathematical expressions in content MathML helps capture their conceptual structure and removes the ambiguity and inconsistency that arise when presentation MathML is used. This provides a common notation that can be used for the underlying search of mathematical content.

Adding semantics to the mathematical expressions on the web can provide several benefits to users:

• Provide more accessible mathematical expressions for blind and visually impaired users, as the mathematical encoding can be read by screen readers.
• Easier searching for technical and educational mathematical materials. Youssef [15] noted that math knowledge is embedded in math symbols, notations, and structures. To achieve better math search, these symbols and structures need to be recognized.
• The ability to evaluate mathematical expressions automatically.
• In the case of e-learning, the mathematical notations and symbols can be explained (e.g., using content dictionaries in OpenMath [4]).
• Help people with learning disabilities navigate mathematical content (e.g., by providing information about a term in the navigated mathematical expression as a tooltip).

Users need easy and effective search for fine-grained mathematical patterns (e.g., equations, functions, and structures). Youssef [15] noted that math metadata helps improve math search and is important for managing math knowledge. In this paper we propose a novel framework for converting presentation MathML into content MathML with RDFa annotations. The framework takes advantage of the encoding commonly used to display mathematical expressions on the web (i.e., presentation MathML). The resulting encoding can be searched to identify the requested mathematical expression accurately (i.e., at the level of its position in the web page).

2. BACKGROUND

A review of related work shows that developing searchable mathematical expressions imposes a number of requirements. An extensive literature deals with various aspects of this problem; some of the relevant studies are discussed next.

Munavalli and Miner [12] introduced a math-aware search engine for mathematical content. The system breaks MathML mathematical expressions down into text math fragments, and the user enters the math query using a graphical equation editor. According to Youssef [15], the purpose of mathematical search is to 1) allow the user a fine-grained search for mathematical data and 2) allow users to enter the math query naturally and easily using the symbols and notations applied by mathematicians and scientists.

Guidi and Schena [9] introduced MathQL, a math query language for Resource Description Framework (RDF) metadata repositories. Asperti et al. [8] presented HELM, a framework that uses XML technology for building structured content in a logical manner. Its purpose is to serve as a library for indexing and retrieving mathematical documents.

Altamimi and Youssef [7] presented a math query language that enables users to express their information needs intuitively yet precisely. The language offers an alternative way to describe mathematical expressions that is more consistent and less ambiguous than conventional mathematical notation. In addition, it goes beyond the Boolean and proximity query syntax found in standard text search systems: it defines a powerful set of wildcards that are deemed important for math search. These wildcards provide for more precise structural search and multiple levels of abstraction.

Hijikata et al. [10] presented a search engine for MathML objects that uses the structure of mathematical formulas. The system builds inverted indices from the Document Object Model (DOM) structure of the MathML object. It proposes three types of indices: the first is constructed from paths of the DOM structure expressed in XPath, the second is constructed by encoding the nodes at the same level of the DOM structure, and the third is a hybrid of the other two.

3. METHODOLOGY

We have developed the first stage of the project: a MathML conversion module. The proposed module automatically converts the e-learning content from presentation MathML into content MathML by applying prefix notation to the presentation MathML tree. The converter then maps the presentation MathML tags to their equivalent content MathML tags, and RDFa annotations are added to the content MathML. The goal of this work is to develop a system that provides the user with semantically encoded mathematical content. This can lead to a more accurate search of the mathematical content in e-learning. Instead of searching for the content tags, the system searches for the agreed vocabulary used in RDFa. The proposed methodology consists of the following major phases (see Figure 1):

• To add semantics to the mathematical expression, the presentation MathML is converted into content MathML.
• To allow the user to do a fine-grained search at the page level, the content MathML is annotated using RDFa.
• RDFa extractors are then used to extract the information about the mathematical expression along with the URI of the mathematical expression.
• The extracted RDFa annotations can then be compared with the user query.

3.1 Use Scenario

To picture how the system works, consider a student who wants to access an online course in Moodle to review the lecture notes of a math course. The student wants to find a mathematical formula in a lesson. Because the lessons are mapped from presentation MathML into content MathML and then annotated using RDFa, the formula can be found using the extracted RDFa annotations. The student's query is used to search for an equation and retrieve the set of lessons in which it appears. The system points the user to exactly where the searched formula is located in the e-learning content.
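As a rough illustration of the lookup in this scenario, the following Python sketch matches a query against annotations that have already been extracted from several lessons. The lesson URLs, the fragment identifiers, and the use of a prefix-notation string as the annotation value are all hypothetical choices made for the example; this paper does not fix a concrete query format.

```python
from collections import defaultdict


def build_index(annotations):
    """Map each annotation value to the expression URIs that carry it."""
    index = defaultdict(list)
    for uri, value in annotations:
        index[value].append(uri)
    return index


def search(index, query):
    """Return {lesson_url: [fragment ids]} for expressions equal to the query."""
    hits = defaultdict(list)
    for uri in index.get(query, []):
        lesson, _, fragment = uri.partition("#")
        hits[lesson].append(fragment)
    return dict(hits)


# Hypothetical (URI, value) pairs, as an RDFa extractor might return them.
EXTRACTED = [
    ("http://moodle.example.org/lesson1.html#expr-1", "power(plus(x,3),2)"),
    ("http://moodle.example.org/lesson1.html#expr-2", "plus(x,3)"),
    ("http://moodle.example.org/lesson2.html#expr-1", "plus(x,3)"),
]

index = build_index(EXTRACTED)
results = search(index, "plus(x,3)")
for lesson, fragments in results.items():
    print(lesson, fragments)
```

Because each annotation carries the URI of the expression, a hit identifies both the lesson and the position of the formula inside it.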

4. SYSTEM DESIGN

The system is composed of three components: the MathML parser, the MathML converter, and the RDFa annotator. The user requests a web page using Moodle, and the source code of the web page is made available to our components in the form of a DOM tree. The DOM tree is parsed, and when the math tag is encountered the presentation MathML parser uses the MathML converter to produce the equivalent content MathML. The RDFa annotator then adds the RDF vocabulary to the resulting content MathML encoding.

4.1 Presentation MathML Parser

The web page is parsed using a DOM parser. When the math element is encountered, the whole DOM tree is sent to the MathML converter. The parser matches each presentation MathML tag in the current web page with the corresponding content MathML tag in the converter. When a specific presentation MathML element is reached, the system starts the mapping between presentation MathML and content MathML.

Figure 1: The Steps of the Proposed Algorithm.

Figure 2: The System Architecture.
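The parsing step can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration under the assumption that the page is well-formed XHTML with namespaced MathML; the sample page and the helper names find_math_elements and local_name are hypothetical and are not part of the described system.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def find_math_elements(xhtml_source):
    """Parse an XHTML page and return its MathML <math> elements in document order."""
    root = ET.fromstring(xhtml_source)
    return list(root.iter(f"{{{MATHML_NS}}}math"))


def local_name(element):
    """Strip the namespace from an element tag, e.g. '{ns}mi' becomes 'mi'."""
    return element.tag.rsplit("}", 1)[-1]


# Hypothetical lesson page containing (x + 3)^2 in presentation MathML.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Expand the square:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <msup>
        <mrow><mo>(</mo><mi>x</mi><mo>+</mo><mn>3</mn><mo>)</mo></mrow>
        <mn>2</mn>
      </msup>
    </math>
  </body>
</html>"""

formulas = find_math_elements(PAGE)
# Presentation tags of the first formula, in the order a converter would visit them.
tags = [local_name(node) for node in formulas[0].iter()]
print(tags)
```

Each element returned by find_math_elements is the root of one presentation MathML tree, which is the unit a converter would then process.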

4.2 MathML Converter

The MathML converter component takes the encoding in presentation MathML, processes it, maps the tags, and outputs the equivalent of the mathematical expression in content MathML. It converts the encoding from infix notation in presentation MathML into prefix notation in content MathML.

In content MathML, all identifiers, numerals, and symbols are enclosed within an environment: <cn>...</cn> for numerals, <ci>...</ci> for identifiers, and <csymbol>...</csymbol> for more advanced mathematical symbols. Content MathML is usually composed using prefix notation. The operator and its arguments are delimited using the tags <apply>...</apply>. The scope of the operator or function is specified by the opening and closing tags of apply [13, 1].

The conversion process includes the following steps:

• Pass through the expression tree of the presentation MathML and output the tags in prefix notation.
• Map each tag, or tag and text, in presentation MathML into its equivalent in content MathML.

The following is a sample of the applied transformation rules:

<mn>num</mn> ⇒ <cn>num</cn>
<mi>ID</mi> ⇒ <ci>ID</ci>
<mo>+</mo> ⇒ <plus/>
<mo>*</mo> ⇒ <times/>
<mrow>ABC</mrow> ⇒ <apply>ABC</apply>
<mo>(</mo> ABC <mo>)</mo> ⇒ <apply>ABC</apply>

Some operators in presentation MathML do not have a one-to-one mapping into content MathML. For example, the "plus-or-minus" operator does not have a content MathML equivalent. When this operator is converted from presentation MathML, the expression needs to be split into two forms, one with minus and the other with plus.

The following is an example of the conversion from presentation MathML into content MathML. Assume the web page contains the formula (x + 3)² encoded in presentation MathML:

<msup>
  <mrow><mo>(</mo><mi>x</mi><mo>+</mo><mn>3</mn><mo>)</mo></mrow>
  <mn>2</mn>
</msup>

Then the equivalent content MathML for the same expression is:

<apply>
  <power/>
  <apply><plus/><ci>x</ci><cn>3</cn></apply>
  <cn>2</cn>
</apply>

A set of tags and attributes is not yet covered by the converter. Some content MathML tags support the use of attributes; for example, the mathematical term [x,y] is represented in content MathML using the following encoding:

<interval closure="closed"><ci>x</ci><ci>y</ci></interval>

Some attributes in content MathML are also not covered by the converter because they are used to associate the elements with Cascading Style Sheets (CSS) stylesheets (e.g., class, id, style, href, xref, and other attributes). The tags that are used to associate content MathML with presentation MathML are also not covered (i.e., , , , , and elements).
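The two conversion steps (traversing the presentation tree and emitting mapped tags in prefix notation) can be sketched as follows. This is a minimal Python illustration and not the converter described in this paper: it assumes un-namespaced input, supports only mn, mi, mrow, msup, parentheses, and the binary operators +, - and *, and does not handle the plus-or-minus split. The function names and the precedence table are assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET

# Presentation operator text -> content MathML operator element name.
OPERATORS = {"+": "plus", "-": "minus", "*": "times"}
# A higher value binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2}


def text_of(node):
    """Return the trimmed text of a token element such as <mo>, <mi> or <mn>."""
    return (node.text or "").strip()


def make_apply(operator_name, *operands):
    """Build <apply><operator/>operands</apply>, i.e. prefix notation."""
    element = ET.Element("apply")
    ET.SubElement(element, operator_name)
    element.extend(operands)
    return element


def is_operator(token):
    return token.tag == "mo" and text_of(token) in OPERATORS


def parse_operand(tokens, pos):
    """Parse a parenthesised group or a single element; return (tree, next position)."""
    if pos >= len(tokens):
        raise ValueError("operand expected")
    token = tokens[pos]
    if token.tag == "mo" and text_of(token) == "(":
        inner, pos = parse_infix(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos].tag != "mo" or text_of(tokens[pos]) != ")":
            raise ValueError("missing closing parenthesis")
        return inner, pos + 1
    return convert(token), pos + 1


def parse_infix(tokens, pos, min_precedence):
    """Precedence climbing over the children of a row; operators are left-associative."""
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and is_operator(tokens[pos]):
        symbol = text_of(tokens[pos])
        if PRECEDENCE[symbol] < min_precedence:
            break
        right, pos = parse_infix(tokens, pos + 1, PRECEDENCE[symbol] + 1)
        left = make_apply(OPERATORS[symbol], left, right)
    return left, pos


def convert(node):
    """Convert one presentation MathML element into a content MathML element."""
    if node.tag in ("mn", "mi"):
        leaf = ET.Element("cn" if node.tag == "mn" else "ci")
        leaf.text = text_of(node)
        return leaf
    if node.tag == "msup":
        base, exponent = list(node)
        return make_apply("power", convert(base), convert(exponent))
    if node.tag in ("math", "mrow"):
        tree, pos = parse_infix(list(node), 0, 1)
        if pos != len(node):
            raise ValueError("unexpected token in row")
        return tree
    raise ValueError(f"unsupported presentation element: {node.tag}")


SOURCE = ("<math><msup><mrow><mo>(</mo><mi>x</mi><mo>+</mo><mn>3</mn><mo>)</mo></mrow>"
          "<mn>2</mn></msup></math>")
content = convert(ET.fromstring(SOURCE))
print(ET.tostring(content, encoding="unicode"))
```

For the (x + 3)² example, the resulting tree has power as the outer operator, with the plus application and the numeral 2 as its two arguments.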

4.3 RDFa Annotator

RDF data can be embedded inside XHTML as RDFa [2]. RDFa annotations allow parts of the web page to be unfolded into more detailed information (i.e., according to the vocabulary and the relations of the RDF used). Standard RDFa extractors can be used to retrieve the annotations in the web page (e.g., [6, 5]). The mathematical vocabulary (i.e., classes and properties) will be published at a dedicated location, and this vocabulary is used to annotate the resulting content MathML. The content MathML resulting from the previous example can be annotated as follows:
x 3 2
Using this annotation scheme, the user can find the exact position of the required mathematical expression in the web page.
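One possible shape for such annotations is sketched below in Python. The mv: prefix, the mv:Expression class, the mv:operator property, the expr-N fragment identifiers, and the page URL are all hypothetical, since the actual vocabulary is not reproduced here. The extract function is only a simplified stand-in for a real RDFa extractor such as those cited in [6, 5].

```python
import xml.etree.ElementTree as ET


def annotate(content_root, page_url):
    """Attach RDFa attributes to every <apply> element of a content MathML tree."""
    for index, node in enumerate(content_root.iter("apply"), start=1):
        fragment = f"expr-{index}"
        node.set("id", fragment)
        node.set("about", f"{page_url}#{fragment}")  # subject: URI of this expression
        node.set("typeof", "mv:Expression")          # hypothetical vocabulary class
        node.set("property", "mv:operator")          # hypothetical vocabulary property
        node.set("content", node[0].tag)             # literal value: the operator name
    return content_root


def extract(annotated_root):
    """Simplified stand-in for an RDFa extractor: collect (URI, operator) pairs."""
    return [(node.get("about"), node.get("content"))
            for node in annotated_root.iter() if node.get("about")]


CONTENT = ("<apply><power/><apply><plus/><ci>x</ci><cn>3</cn></apply>"
           "<cn>2</cn></apply>")
annotated = annotate(ET.fromstring(CONTENT), "http://example.org/lesson1.html")
pairs = extract(annotated)
for uri, operator in pairs:
    print(uri, operator)
```

Because every annotated expression carries its own URI with a fragment identifier, a match on the extracted pairs leads directly to the position of the expression in the page.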

5. CONCLUSIONS AND FUTURE WORK

In this paper, we presented a preliminary investigation aimed at improving the process of searching for mathematical content within e-learning. The goal of the proposed solution is to identify exactly where a mathematical expression is located in the web page by using RDFa annotations. With the proposed system, the presentation MathML in the e-learning content is automatically converted into content MathML, and RDFa annotations are then added to embed the semantics of the mathematical expression. The user query is then matched against the extracted RDFa annotations, and the user is pointed to the list of URLs that contain the matching mathematical formula.

Future work will include a comparison between search using our proposed mathematical encoding and regular text search. The meanings of mathematical symbols are defined in an ontology called OpenMath [3]. A new version of MathML (i.e., MathML 3.0) brings Content MathML closer to OpenMath. In the future, OpenMath content dictionaries will be used to categorize the mathematical information and add it as metadata for the mathematical content (e.g., cosine is a trigonometric function).

6. REFERENCES

[1] Mathematical Markup Language. World Wide Web electronic publication. http://www.w3.org/Math/, 2010.
[2] RDFa Primer. World Wide Web electronic publication. http://www.w3.org/TR/xhtml-rdfa-primer/, 2010.
[3] MathML3.0. Candidate Recommendation. World Wide Web electronic publication. http://www.w3.org/TR/MathML3/, 2010.
[4] OpenMath Content Dictionaries. World Wide Web electronic publication. http://www.openmath.org/cd/index.html, 2010.
[5] RDFa Distiller and Parser. World Wide Web electronic publication. http://www.w3.org/2007/08/pyRdfa/, 2010.
[6] rdfquery, RDF processing in your browser. World Wide Web electronic publication. http://code.google.com/p/rdfquery/, 2010.
[7] Moody Ebrahem Altamimi and Abdou Youssef. A math query language with an expanded set of wildcards. Mathematics in Computer Science, 2(2):305–331, 2008.
[8] Andrea Asperti, Luca Padovani, Claudio Sacerdoti Coen, and Irene Schena. Helm and the semantic math-web. In TPHOLs '01: Proceedings of the 14th International Conference on Theorem Proving in Higher Order Logics, pages 59–74, London, UK, 2001. Springer-Verlag.
[9] F. Guidi and I. Schena. A query language for a metadata framework about mathematical resources. In The 2nd International Conf. Mathematical Knowledge Management, pages 105–118. Springer Berlin, 2003.
[10] Yoshinori Hijikata, Hideki Hashimoto, and Shogo Nishida. Search mathematical formulas by mathematical formulas. In Human Interface and the Management of Information. Designing Information Environments, pages 404–411. Springer Berlin, 2009.
[11] Michael Kohlhase and Ioan A. Şucan. A search engine for mathematical formulae. In Proceedings of Artificial Intelligence and Symbolic Computation, AISC'2006, pages 241–253. Springer Verlag, 2006.
[12] Rajesh Munavalli and Robert Miner. Mathfind: a math-aware search engine. In SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 735–735, New York, NY, USA, 2006. ACM.
[13] Pavi Sandhu. The MathML Handbook. Charles River Media, 2003.
[14] Jesse Whitehead. Challenges of online education. http://www.articlesnatch.com/Article/Challenges-Of-Online-Education/115265, 2009.
[15] Abdou Youssef. Roles of math search in mathematics. In Mathematical Knowledge Management, pages 2–16. Springer Berlin, 2006.