Mobile Parallel Computing
Daniel C. Doolan, Member IEEE, Sabin Tabirca
Department of Computer Science, University College Cork, Ireland
{d.doolan, tabirca}@cs.ucc.ie

Abstract
This paper outlines how the Mobile Message Passing Interface (MMPI) may be used for parallel computation. MMPI allows parallel programming of mobile devices over a Bluetooth network. This paper gives an overview of the MMPI library and demonstrates that mobile devices are capable of parallel computation. Matrix multiplication, an $O(n^3)$ operation, is used as the example.

1. Introduction
The smartphones of today are powerful computing devices. Many have processor speeds well in excess of 200 MHz. The Nokia 6630 and 6680 are typical examples: both run at 220 MHz on an ARM9 architecture. The next generation of phones will see processors of up to 500 MHz using the ARM11 core. Mobile devices are clearly becoming serious sources of computing power. Within a few years they will have processors running at clock speeds similar to those of desktop computers of only a few years ago.

1.1. The Future is Already Here
The announcement by ARM in October 2005 of its 1 GHz Cortex-A8 processor is a clear indication of the future of mobile computation [1] [18]. Unlike previous ARM chips, the new Cortex-A8 has a superscalar architecture [19]. It is capable of high-end multimedia processing with ARM NEON technology [3]. This unit allows for single instruction, multiple data (SIMD) processing. NEON supports 8, 16, 32 and 64-bit integer and single precision floating point SIMD instructions. This allows for the handling of audio and video applications as well as graphics and gaming processing. Jazelle RCT (Runtime Compilation Target) is also implemented, providing hardware support for Java applications [2] [12].

Proceedings of The Fifth International Symposium on Parallel and Distributed Computing (ISPDC'06) 0-7695-2638-1/06 $20.00 © 2006

Laurence T. Yang Department of Computer Science St. Francis Xavier University Antigonish, NS B2G 2W5, Canada [email protected]

1.2. Mobile Phone Usage
Consumer demand for mobile devices seems to be insatiable. In 2005 alone a record number of mobile phones were shipped (in excess of 800 million) [13] [11]. This is an increase of 19 percent on the previous year. Sales of mobile devices far outstrip those of personal computers (just over 200 million units for 2005) [15]. It seems that everybody wants to own mobile technology. One example of this interest is Ireland, which had a market penetration of just 29 percent in 1999 [7]. The last quarter of 2005 saw Ireland reach 100 percent penetration [8]. The story is similar in many other European countries: Spain, Finland, the Netherlands and Austria have all now reached 100 percent penetration. Perhaps the idea of the average household having 1.5 children is creeping into mobile phone usage. Luxembourg, for example, currently has the highest mobile phone usage with 156 percent market penetration. It is expected that Western Europe will exceed 100 percent usage by 2007 [21].

1.3. Bluetooth and Mobiles
In all parallel computing systems, nodes communicate with each other via a network. This network usually uses fibre optic cabling running at gigabit speeds. If we wish to develop a similar parallel computing system using mobile phones, the immediate problem is creating the network. We cannot just connect a bunch of mobiles together with looms of cable. This is where Bluetooth technology becomes a necessity. Bluetooth is a short range, low power consumption communications system that is ideally suited to mobile phones, as their power source is finite. The word "Bluetooth" is derived from the English translation of Harald Blåtand, the 10th century king of Denmark [22]. He was well known for getting people to talk to one another, hence the technology for short range communication being named after him.

The majority of mobile devices conform to the Bluetooth 1.2 specification. It defines the maximum data rate to be 1 Mbit/s (723 kbit/s actual throughput) [6]. An updated version of the specification, Bluetooth 2.0, is now becoming prevalent in devices. One typical example is its integration into the new Intel powered iMacs from Apple. This new standard, called Enhanced Data Rate (EDR), was ratified in November 2004. It is capable of delivering a maximum of 3 Mbit/s, or 2.1 Mbit/s actual throughput.

The number of Bluetooth enabled devices is continually growing. Some of the well known devices include GPS receivers and headsets for hands free conversation. For a complete list of the Bluetooth enabled devices available see the Products page at Bluetooth.com [5]. Development of Bluetooth enabled applications for Java enabled devices requires the use of the optional Bluetooth package (JSR-82). There are, however, several phones that support Bluetooth but do not support the optional Bluetooth package. One example of this is the Nokia 6165 Series 40 3rd Edition. All the Series 60 3rd Edition phones have support, as do the majority of Series 60 2nd Edition phones.

Bluetooth enabled mobile phones are capable of omni-directional communication with devices up to ten meters away. This is carried out in a frequency band at 2.4 GHz. This band is divided into 79 channels 1 MHz apart (from 2.402 to 2.480 GHz). The system also uses frequency hopping at a rate of 1,600 times per second.
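The figures quoted above can be turned into a few back-of-envelope numbers. The following is a minimal sketch in plain Java SE (not J2ME); the class name `BluetoothLinkSketch` and its methods are hypothetical, and the transfer estimate assumes 1 kbit = 1000 bits and ignores all protocol overhead.

```java
/** Back-of-envelope figures for the Bluetooth link described in the text. */
final class BluetoothLinkSketch {
    static final int BASE_MHZ = 2402;        // lowest channel: 2.402 GHz
    static final int CHANNELS = 79;          // channels spaced 1 MHz apart
    static final int HOPS_PER_SECOND = 1600; // frequency hopping rate

    /** Centre frequency in MHz of channel k, for k in 0..78. */
    static int channelFrequencyMHz(int k) {
        if (k < 0 || k >= CHANNELS) {
            throw new IllegalArgumentException("channel out of range: " + k);
        }
        return BASE_MHZ + k;
    }

    /** Time spent on one frequency between hops, in microseconds. */
    static double slotMicros() {
        return 1000000.0 / HOPS_PER_SECOND;
    }

    /** Idealised time in seconds to move the given number of bytes at the given rate. */
    static double transferSeconds(long bytes, double kbitPerSecond) {
        return (bytes * 8.0) / (kbitPerSecond * 1000.0);
    }

    public static void main(String[] args) {
        System.out.println("Channel 0:  " + channelFrequencyMHz(0) + " MHz");
        System.out.println("Channel 78: " + channelFrequencyMHz(78) + " MHz");
        System.out.println("Slot length: " + slotMicros() + " microseconds");
        // A 100 x 100 matrix of 4-byte ints is 40,000 bytes.
        System.out.println("100 x 100 int matrix at 723 kbit/s: "
                + transferSeconds(100L * 100L * 4L, 723.0) + " s");
    }
}
```

Under these assumptions a single 100 × 100 integer matrix takes a little under half a second to cross a Bluetooth 1.2 link, which suggests that communication cost is likely to matter for any parallel algorithm run over such a network.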

1.4. Parallel Computing in the Small
Why do we wish to have the ability to perform parallel computation on a set of mobiles? The need is the same as the need for supercomputers. Many tasks would simply take too long to process on a single system, hence the need for multi-node systems to carry out the computation in a reasonable time period. The tasks that could be carried out on a network of mobile phones cannot compare to tasks that would be carried out on a normal cluster. However, because mobile devices have limited resources, there are still tasks that would require a long time to process on a single mobile phone. Parallel computing with mobile phones is one way to process tasks that are very large for a single mobile.

The Message Passing Interface (MPI) was introduced in 1992 and greatly simplified the programmer's task of writing a parallel program. MPI is a library specification for the passing of messages between nodes of a multi-nodal system. MPICH is a well known and freely available implementation of the library [16]. MPI implementations are designed for both C and Fortran. Two Java-based systems are currently available. The first, mpiJava, is an object-oriented Java interface to an underlying native MPI system, developed using Java Native Interface wrappers. The second, Message Passing in Java (MPJ), is based entirely on Java [4] [10].

1.5. Motivation
Writing any Bluetooth application in J2ME intrinsically requires the development of a significant amount of Bluetooth specific code. Essentially the same code is required for any type of Bluetooth application you may write. One example of this is device discovery.

2. Mobile Message Passing Interface
The MMPI system consists of three classes (Figure 2). The MMPI class is the main feature of the structure. It requires two other classes to operate effectively: the BTServer class and the BTClient class. Both of these classes carry out the operations of establishing the Bluetooth connections between the nodes. MMPI works with Bluetooth piconets, which limits the system to at most eight nodes (Figure 1).


Figure 1. Piconet Structure

Only one instance of either the BTServer or the BTClient is created for each node. Which of the two is created depends on a parameter passed to the constructor of the MMPI object, indicating whether the device should be registered as a client or a server. Hence every MMPI application must provide a simple user interface that lets the user choose which type of node to start. The creation of BTServer and BTClient objects is otherwise of no real concern to the programmer using the MMPI system. All they need to know is that a single parameter must be passed to the constructor to indicate the type of node to create.

Figure 2. General MMPI Structure

The underlying basis for how the entire system works is the Bluetooth Client / Server model. The process of device discovery allows the root node to discover all active clients. This means that the root node can communicate with all the clients and vice versa. However, clients cannot communicate with one another. Hence it is necessary to establish further connections to interconnect all devices. To ensure the correct establishment of communication channels between clients, the interconnect must be created in a synchronized manner. This requires the creation of several Server connections and the corresponding number of Client connections. Whether a node creates a Server or a Client connection is based on the rank of the current node.

All nodes in the system maintain an array of DataInputStreams and an array of DataOutputStreams. To ensure that each node communicates with the correct peer, the positioning of the connections within these arrays is critical. Take for example the process of sending a message: one parameter is an integer (id) representing the node to which the message should be sent. This id is used as an index into the array of DataOutputStreams. All client nodes maintain their connection to the root node at index zero of the arrays. For any indices of the arrays greater than the rank of the current node, Server connections will be established. For any indices greater than zero and less than the rank, Client connections will be established.

A Server connection must be established before a Client connection can be made to it. So when a client creates a Server connection it sends a message to the root node to say so. This message is in turn forwarded to the appropriate client, which then establishes a Client connection with the Server. This process is carried out to establish the complete interconnect between all nodes in the system (Figure 3). As each new Server connection is established, the communication channel on which it sends data is incremented. Hence it is necessary not only to create connections between the correct Client and Server, but also to make sure that they are both operating on the same channel number.

Figure 3. Complete Interconnect for Five Nodes

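The rank-based rule for choosing between Server and Client connections can be sketched as follows. This is an illustration only, written in plain Java SE rather than J2ME; the class `InterconnectSketch`, the `Role` enum and both methods are hypothetical names, not part of the MMPI library. It models only which role a node takes for each array index, not the Bluetooth connections or the synchronization through the root node.

```java
/** Models which kind of connection a node holds at each index of its stream arrays. */
final class InterconnectSketch {
    enum Role { SELF, SERVER, CLIENT }

    /**
     * Role taken by the node with the given rank for the link stored at the given index.
     * Indices above the rank are Server connections; indices below it are Client
     * connections (index 0 being the link to the root found during device discovery).
     */
    static Role roleFor(int rank, int index) {
        if (index == rank) {
            return Role.SELF; // a node holds no connection to itself
        }
        return index > rank ? Role.SERVER : Role.CLIENT;
    }

    /** Counts Server connections over all nodes, which equals the number of links. */
    static int totalLinks(int size) {
        int servers = 0;
        for (int rank = 0; rank < size; rank++) {
            for (int index = 0; index < size; index++) {
                if (roleFor(rank, index) == Role.SERVER) {
                    servers++;
                }
            }
        }
        return servers;
    }

    public static void main(String[] args) {
        int size = 5; // five nodes, as in Figure 3
        for (int rank = 0; rank < size; rank++) {
            StringBuilder row = new StringBuilder("rank " + rank + ":");
            for (int index = 0; index < size; index++) {
                row.append(' ').append(roleFor(rank, index));
            }
            System.out.println(row);
        }
        System.out.println("links: " + totalLinks(size));
    }
}
```

Each link is counted once from its Server side, so a system of five nodes has 5 × 4 / 2 = 10 links, and a full piconet of eight nodes would have 28.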

Several methods of the MMPI class are exposed to the programmer to allow for communication between nodes (Listing 1). Two of the most frequently used non-communication methods are getRank() and getSize(), which return the id of the current node and the size of the parallel world respectively. The remaining methods of the class allow for various forms of communication. The simplest are the send(...) and recv(...) methods. For these methods to function correctly they must be paired. For example, if the root node sends a message to node root+1 then that node should have a corresponding receive.

    public MMPI(int nodeType)
    public int getRank()
    public int getSize()
    public void send(Object buf, int offset, int count,
        int dataType, int dest)
    public void recv(Object buf, int offset, int count,
        int dataType, int source)
    public void scatter(Object sendBuf, int sendCount,
        int sendDataType, Object recvBuf, int recvCount,
        int recvDataType, int root)
    public void gather(Object sendBuf, int sendCount,
        int sendDataType, Object recvBuf, int recvCount,
        int recvDataType, int root)
    public void reduce(Object sendBuf, Object recvBuff,
        int count, int dataType, int op, int root)
    public void bcast(Object buf, int offset, int count,
        int dataType, int root)
    public void finalize()

Listing 1. MMPI General Structure

The methods that are of greater use to the programmer are the global communication methods. These methods allow for the sending and receiving of data between all nodes in the system.
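To make the idea of global communication concrete, the sketch below simulates scatter and gather on ordinary arrays, with no networking. It follows the conventional MPI meaning of these operations (the root's buffer is cut into equal chunks, and rank i receives chunk i); that MMPI behaves the same way is an assumption based on the method names and parameter lists in Listing 1. The class `ScatterGatherSketch` and its methods are hypothetical and are not MMPI calls.

```java
/** In-memory illustration of MPI-style scatter and gather semantics. */
final class ScatterGatherSketch {

    /** Returns the chunk of sendBuf that the given rank would receive from a scatter. */
    static int[] scatterChunk(int[] sendBuf, int sendCount, int rank) {
        int[] chunk = new int[sendCount];
        System.arraycopy(sendBuf, rank * sendCount, chunk, 0, sendCount);
        return chunk;
    }

    /** Reassembles per-rank chunks in rank order, as a gather at the root would. */
    static int[] gather(int[][] chunks, int count) {
        int[] recvBuf = new int[chunks.length * count];
        for (int rank = 0; rank < chunks.length; rank++) {
            System.arraycopy(chunks[rank], 0, recvBuf, rank * count, count);
        }
        return recvBuf;
    }

    public static void main(String[] args) {
        int size = 4;                      // number of simulated nodes
        int[] data = {0, 1, 2, 3, 4, 5, 6, 7};
        int count = data.length / size;    // elements per node

        int[][] local = new int[size][];
        for (int rank = 0; rank < size; rank++) {
            local[rank] = scatterChunk(data, count, rank);
            for (int i = 0; i < count; i++) {
                local[rank][i] *= 2;       // each node works on its own chunk
            }
        }
        System.out.println(java.util.Arrays.toString(gather(local, count)));
    }
}
```

In a real parallel program each node would hold only its own chunk, and the two helper methods would be replaced by collective calls made by every node.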

3. Matrix Multiplication
Matrix multiplication is typically regarded as an $O(n^3)$ operation. However, more efficient algorithms do exist. For an $n \times n$ matrix the time complexity cannot be less than $O(n^2)$, as all $n^2$ cells must be visited. Strassen's algorithm gives a time complexity of $O(n^{2.807})$ [20]. This recursive algorithm is, however, difficult to implement. The asymptotically fastest algorithm known at the time of writing is the Coppersmith-Winograd algorithm, with complexity $O(n^{2.376})$ [9]. The product $C$ of two matrices $A \in M_{n,m}(\mathbb{R})$ and $B \in M_{m,p}(\mathbb{R})$ is defined as

$$(A \times B)_{ij} = \sum_{k=0}^{m-1} a_{ik} b_{kj}, \quad i = 0, \ldots, n-1, \quad j = 0, \ldots, p-1.$$

Matrix Size 100 × 100 200 × 200 300 × 300 400 × 400 500 × 500 600 × 600 700 × 700

Matrix Size 100 × 100 200 × 200 300 × 300 400 × 400
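The definition above maps directly onto three nested loops. The following is a minimal sequential sketch in plain Java SE; the class `MatrixProductSketch` is a hypothetical name and the method signature is chosen for illustration, so it should not be read as the paper's own listing. It assumes non-empty rectangular integer matrices.

```java
/** Straightforward O(n * m * p) matrix product following the summation definition. */
final class MatrixProductSketch {

    /** Returns C = A x B, where A is n x m and B is m x p. */
    static int[][] product(int[][] a, int[][] b) {
        int n = a.length;
        int m = b.length;
        int p = b[0].length;
        if (a[0].length != m) {
            throw new IllegalArgumentException("inner dimensions do not match");
        }
        int[][] c = new int[n][p];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                int sum = 0;
                for (int k = 0; k < m; k++) {
                    sum += a[i][k] * b[k][j]; // a_ik * b_kj
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        int[][] c = product(a, b);
        System.out.println(java.util.Arrays.deepToString(c)); // [[19, 22], [43, 50]]
    }
}
```

Each row of the result depends only on the matching row of A and the whole of B, so blocks of rows can be computed independently on different nodes.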

private void product(int n, int m, int p, int[][] a, int[][] b, int[][] c){ int i,j,k; for(i=0;i