Towards Efficient Parallel Image Processing on Cluster Grids using GIMP

Paweł Czarnul, Andrzej Ciereszko and Marcin Fraczak
Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland
[email protected], {cierech, marcin.f}@wp.pl
http://fox.eti.pg.gda.pl/~pczarnul

Work partially sponsored by the Polish National Grant KBN No. 4 T11C 005 25.

Abstract. Since it is not realistic to expect that all users, especially specialists in the graphics business, will use complex low-level parallel programs to speed up image processing, we have developed a plugin for the highly acclaimed GIMP which makes it possible to invoke a series of filter operations in a pipeline, in parallel, on a set of images loaded by the plugin. We present the software developments, test scenarios and experimental results on cluster grid systems possibly featuring single-processor and SMP nodes and being used by other users at the same time. Behind the GUI, the plugin invokes a smart DAMPVM cluster grid shell which spawns processes on the best nodes in the cluster, taking into account their loads, including other user processes. This enables the selection of the fastest nodes for the stages in the pipeline. We show by experiment that the approach prevents scenarios in which other user processes, or even slightly more loaded processors, become the bottleneck of the whole pipeline. The parallel mapping is completely transparent to the end user, who interacts only with the GUI. We present the results achieved with the GIMP plugin using both the smart cluster grid shell and simple round-robin scheduling, and show the former solution to be superior.

1 Introduction

While the cluster and grid architectures ([1], [2]), real grid systems and software like EU-DataGrid ([3]) based on Globus ([4]) and GridLab ([5]), as well as image processing tools become mature and available for a wider range of systems and network topologies, it is still a difficult task to merge the two worlds in open NOWs and make the first work for the other. We investigate available solutions and make an attempt to process both loosely coupled and pipelined images in parallel on NOW systems. While there is no substitute for low-latency supercomputers like the IBM SP3 Bluehorizon (http://www.npaci.edu/BlueHorizon) or dedicated clusters such as Gdansk's holk (http://www.task.gda.pl/kdm/holk/), there is a huge and growing market of PCs for both businesses and homes which can be interconnected and exploited easily by inexperienced users. Thus, in the context of parallel image processing, there is a need for an easy-to-use graphical user interface of a familiar application like Adobe Photoshop or the GIMP, combined with an efficient but not overly complex tool for selection of the best resources. Ideally, as we managed to achieve, the second stage is completely hidden from the user, provided the parallel environment has been configured beforehand. Following the Sun Grid Engine terminology ([6]), we define a cluster grid as a set of distributed resources providing a single point of entry (running GIMP in our approach) to users in an institution.

2 Related Work

Solutions for parallel and distributed computing fall into a few categories with respect to the target network topologies and customer requirements and profiles. There are cluster management systems like LoadLeveler, PBS, LSF ([7]) and others, meant mainly for environments dedicated to high performance computing (HPC). There are also software tools better suited for multi-user, multi-tasking environments where other processes compete for computing nodes. They range from the system-level Mosix ([8]), which features process migration for load balancing on Linux boxes, through Condor ([9]) for exploiting idle cycles in shared networks, up to Sun Grid Engine ([6]), which enables both queueing of submitted HPC tasks and launching interactive processes or consoles on the least loaded nodes, and can be used by small groups working together. [10] addresses building applications using skeletons and resource assignment using performance prediction in grid environments. Finally, libraries and environments like MPI and PVM ([11], [12]) are available for programmers. None of these are by design coupled to specific applications, including graphic design.

However, in regard to support for multithreaded image processing, it is possible to use applications like Adobe Photoshop on SMP machines, where some filters like Gaussian Blur, Radial Blur, Image Rotate and Unsharp Mask can potentially benefit from many processors ([13]). However, some filters which complete quickly can even run slower on SMP boxes than on a single processor, because the synchronization overhead is large compared to the processing time. It has also been shown that a multithreaded approach can give a speed-up on the latest Intel HyperThreading processors ([14]) when running filters in Photoshop 7.0. [15] presents threaded GIMP plugins implementing Gaussian Blur.

In [16] a new functional methodology for the development of parallel image processing software is presented. Instead of augmenting serial C++ code with parallel libraries interacting with a parallel harness controlling the parallel environment, it is suggested that an image processing application be expressed in the functional language ML and translated to CML, a parallel-enabled ML version based on the CSP model. The CML version is then abstracted into a parallel CML harness and a serial ML code. This is what needs to be done by the programmer; the program can then be translated to a parallel C++ version. [17] assumes, similarly to our approach, that it is unrealistic to expect knowledge of parallelization issues from graphics specialists, and provides the programmer with an architecture in which data parallel image processing applications can be coded sequentially, automatically parallelized and run on a homogeneous cluster of machines. Results for template matching, multi-baseline stereo vision and line detection are provided.

[18] presents a skeleton-based approach for data parallel image processing in which only algorithmic skeletons coded in C/MPI need to be chosen for particular low-level image operators to produce a parallel version of the code. Finally, the Parallel Image Processing Toolkit (PIPT, [19]) provides an extensible framework assisting image processing, offering a low-level API which can be incorporated into parallel MPI programs. This enables operations on chunks of images in parallel. Similarly, [20] describes a library of functions, SPMDlib, meant to help the development of image processing algorithms, and a set of directives, SPMDdir, which are mapped by a parser onto the library, providing an easy-to-use and high-level API for portable image processing on clusters using MPI.

3 Our Approach for Cluster Parallel Image Processing

In order to show the benefits of cluster computing in an open multi-user NOW, we have implemented an extension to the DAMPVM environment ([21], [22], [23]) which provides the end user with a smart cluster shell that starts a shell command or a set of commands on the best nodes in the network. Compared to Sun Grid Engine, it supports any platform PVM can run on, not only the Solaris Operating Environment (on SPARC processors) and Linux x86 supported by Sun Grid Engine 5.3. Moreover, it is possible to modify the DAMPVM sources to suit the current and future needs of graphic software using a cluster efficiently. In view of this, the presented solution can easily use the monitoring/filtering infrastructure ([24]) and the divide-and-conquer features ([23]) of DAMPVM. Thus it has been naturally extended to enable a GIMP plugin to launch processes on the least loaded processors in a parallel environment, incorporating run-time changes into decision making. The plugin can make use of the GIMP support for reading and writing various graphic formats like TIFF, PNG, JPEG etc. and leave the computing part to efficient C++ code using PVM or other means of parallel communication like MPI. To prove the usefulness of this approach we have tested the following scenarios:

1. Starting command-line multi-process conversion (using ImageMagick's convert utility) of large images in a network with varying loads. The cluster shell proves to select the least loaded nodes on which convert is to be run and thus optimizes the wall time of the simulation (see the example invocation after this list).
2. A GIMP plugin, implemented by us within this work, which invokes parallel pipelined processing using PVM, applying a sequence of filters to images read by GIMP. Under corresponding external conditions, we show that the allocation of pipeline processes to processors using the DAMPVM cluster shell can lead to a noticeable reduction of the execution time compared to round-robin/random allocation. This is visible even in lightly loaded networks, where the DAMPVM shell chooses the least loaded nodes. In a pipeline, even small, hard-to-notice processor loads turn out to be a bottleneck for the whole pipeline, which justifies our approach.

The software can be downloaded from http://fox.eti.pg.gda.pl/~pczarnul.
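As a concrete illustration of scenario 1, the launcher invocation described in Section 3.1 might look as follows. This is a hedged sketch: the flag syntax comes from Section 3.1, while the file names and the instance count are hypothetical:

  dampvmtasklauncher -r "convert image.tiff image.ps" -i 16

This would queue 16 instances of the conversion command and let the DAMPVM runtime place each instance on the currently least loaded node, holding back the remainder until processors become idle.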

3.1 System-level Parallel Image Processing using the DAMPVM Remote Shell

Speed Measurement. Choosing the least loaded node in the network presents some challenges. Firstly, in a heterogeneous network, different but comparable processors like AMD Athlon XPs or MPs and Intel Pentium 4s with or without HyperThreading can process different codes at different speeds. Secondly, the load imposed by other users must be measured and filtered in order to hide high-frequency peaks corresponding to short-lived but CPU-bound actions like starting Mozilla etc. The latter was implemented in the DAMPVM runtime before ([24]). Within the scope of this work it was extended with precise speed measurement for specific code operations, to reflect the real CPU-bound long-running application which follows the measurement. For instance, in parallel image processing, this can be the same processing command used on smaller images. Running the remote shell corresponds to launching a remote console in Sun Grid Engine, but with wider architecture support in the case of DAMPVM. To run command command in instancecount instances on the least loaded processors in the network, the user invokes: dampvmtasklauncher -r "command" -i instancecount.

Task Queueing. The literature describes systems which allow users both to launch interactive jobs and to submit batch jobs, like Sun Grid Engine or LSF. While it may prove more efficient to map e.g. two processes per node to hide communication latency ([25]), it is usually not preferable to map more processes, due to the considerable process/thread switching overhead. This is especially true for image processing operations, which make frequent use of the disk cache to reload tiles of the large images they work on (e.g. this is the way Adobe Photoshop works). Thus we applied the algorithm shown in Figure 1 to the DAMPVM cluster shell.

PC_Instructions(0);  // this task only spawns tasks on other machines
for (int o = 0; o < iterations; o++) {
  if (nHowManyProcessorsAvailable == 0)  // a trick: if some processors were
                                         // available from the previous check then
    while ((nHowManyProcessorsAvailable = HowManyProcessorsAvailable()) == 0)
      sleep(2);  // do not invoke HowManyProcessorsAvailable() again
  // now there is an idle processor out there so let's start the application;
  // find new processors if all idle ones have previously been allocated tasks
  if ((nHowManyProcessorsAvailable == 0) || (firsttime /* for the first time */))
    MYIdleProcessors = GetIdleProcessors(MYhostnames, MYavailableprocessorspeeds);
  // now mark processors as idle;
  // find the best node among the previously selected processors
  // ON WHICH NO TASKS HAVE BEEN SPAWNED;
  // spawn the task on the best node via a remote launcher which sets
  // the work requirements for this task on that node
  PC_Spawn("dampvmtasklauncher", NULL, MYhostnames, 1, &pctid, NULL);
  // mark the processor as busy;
  // now send the system command to the spawned launcher
  nHowManyProcessorsAvailable--;
  if (nHowManyProcessorsAvailable == 0)  // after all the tasks have been spawned
    sleep(10);  // wait until the system becomes aware of the increased load
}
PC_Recv();  // wait for confirmation from the tasks

Fig. 1: Task Queueing in DAMPVM Remote Shell
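The decisions in Figure 1 depend on the per-node load data gathered by the DAMPVM schedulers and consulted by GetIdleProcessors(); the monitored parameters are enumerated at the end of this subsection. As an illustration only, a record holding this information for a single node might look like the following C sketch (the struct and field names are ours, not DAMPVM's actual interface):

typedef struct {
    char   hostname[256];     /* node name as known to PVM                  */
    double machine_speed;     /* measured speed for the reference operation */
    int    num_cpus;          /* number of CPUs on this (possibly SMP) node */
    double idle_percentage;   /* filtered idle percentage of the processors */
    double other_user_load;   /* CPU load by other users and the system     */
    int    num_processes;     /* number of processes on the node            */
    long   free_storage;      /* file storage available on the node         */
    double link_startup;      /* link start-up time                         */
    double link_bandwidth;    /* link bandwidth                             */
} NodeLoadInfo;               /* hypothetical name, for illustration only   */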



When there are idle processors in the mixed single-processor and SMP system (as indicated by the DAMPVM schedulers), the relevant information is stored in an array and the idle processors are assigned tasks successively, with no delays. After this procedure has been completed, a 10-second delay is introduced to allow the increased load to be detected by the DAMPVM runtime. When all processors are still busy, a load check is performed in every 2-second time slot. This scheme makes it possible to use idle processors at once and to queue pending, supposedly processor- and disk-bound, image processing tasks. The load is monitored on every host by the DAMPVM schedulers and then collected asynchronously by a cluster manager using ring-fashion communication. It includes the following parameters: machine speed, the number of CPUs per node, idle percentages of the processors, CPU load by other users and the system, the number of processes, load requirements of processes if applicable, file storage available on the node, link start-up times and bandwidth.

3.2 A GIMP Plugin for Pipelined Operations on Clusters

Architecture. The main idea of the plug-in is to apply a set of up to ten filters to a large number of images. The images should be placed on one machine with the GIMP, connected via a network to a cluster of computers, a cluster grid. It is also required that PVM (Parallel Virtual Machine) is running and properly configured. The architecture of the plug-in is shown in Figure 2. It consists of three layers:

– a Script-Fu wrapper,
– a supervisory GIMP-deployed plug-in,
– slave node modules in the proposed parallel pipelined architecture.

Fig. 2: GIMP Plugin’s Architecture

The GIMP provides developers with an easy-to-use scripting language called Script-Fu, which is based on the Scheme programming language. From the Script-Fu level one can run any of the GIMP library functions, including GIMP plug-ins properly registered in the PDB (GIMP's Procedural DataBase). Script-Fu also aids developers in creating plug-in interfaces. The wrapper's function is to gather data from the user and relay it to the supervisory GIMP-deployed plug-in. Both the supervisory module and the slave module are programs written in C, using PVM for communication. The supervisory module calls the GIMP's library functions to open each image for processing, acquires raw picture data from the GIMP's structures, and passes it to the first of the pipeline nodes. This makes it possible to combine the easy-to-use interface with the underlying parallel architecture. The images are then processed in the created slave node pipeline. The pipeline can be created in two ways: either by using PVM (the Pipeline plug-in) in accordance with a static list of hosts created by the user, or dynamically, with the assistance of the PVM-based DAMPVM remote shell (the Pipeline dampvmlauncher plug-in), thus taking advantage of the speed and load information of individual nodes when creating the slave node pipeline. The slave node module implements a series of image filters (3x3 and 5x5 matrix area filters and simple non-context filters). Each slave applies a filter to the image and passes it on through the pipeline.
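To make the pipeline mechanics concrete, below is a minimal sketch of what one slave stage's main loop might look like. It is an illustration only: the message tags, the configuration handshake and the box-blur filter are our assumptions, not the plugin's actual protocol; only the PVM calls (pvm_recv, pvm_upkint, pvm_pkbyte, etc.) are the real library API.

#include <pvm3.h>
#include <stdlib.h>

#define TAG_CONFIG 1  /* assumed: successor tid + filter id */
#define TAG_IMAGE  2  /* assumed: width, height, RGB pixels */
#define TAG_END    3  /* assumed: end-of-stream marker      */

/* illustrative stand-in for one 3x3 area filter: a box blur on 8-bit RGB */
static void apply_filter(unsigned char *dst, const unsigned char *src,
                         int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            for (int c = 0; c < 3; c++) {
                int sum = 0, n = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        int xx = x + dx, yy = y + dy;
                        if (xx >= 0 && xx < w && yy >= 0 && yy < h) {
                            sum += src[(yy * w + xx) * 3 + c];
                            n++;
                        }
                    }
                dst[(y * w + x) * 3 + c] = (unsigned char)(sum / n);
            }
}

int main(void)
{
    int next_tid, filter_id;

    /* the supervisory module is assumed to announce our successor first */
    pvm_recv(-1, TAG_CONFIG);
    pvm_upkint(&next_tid, 1, 1);
    pvm_upkint(&filter_id, 1, 1);
    (void)filter_id;  /* would select among the implemented filters */

    for (;;) {
        int bufid = pvm_recv(-1, -1);   /* next image or end-of-stream */
        int bytes, tag, src_tid, w, h;
        unsigned char *in, *out;

        pvm_bufinfo(bufid, &bytes, &tag, &src_tid);
        if (tag == TAG_END) {           /* propagate termination, then quit */
            pvm_initsend(PvmDataDefault);
            pvm_send(next_tid, TAG_END);
            break;
        }
        pvm_upkint(&w, 1, 1);
        pvm_upkint(&h, 1, 1);
        in  = malloc((size_t)w * h * 3);
        out = malloc((size_t)w * h * 3);
        pvm_upkbyte((char *)in, w * h * 3, 1);

        apply_filter(out, in, w, h);    /* this stage's operator */

        pvm_initsend(PvmDataDefault);   /* forward to the next stage */
        pvm_pkint(&w, 1, 1);
        pvm_pkint(&h, 1, 1);
        pvm_pkbyte((char *)out, w * h * 3, 1);
        pvm_send(next_tid, TAG_IMAGE);
        free(in);
        free(out);
    }
    pvm_exit();
    return 0;
}

In such a scheme each stage overlaps its filtering with its neighbours' communication; the supervisory module plays the role of the first sender and the final receiver.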

Fig. 3: GIMP’s Pipelined Plugin Interface

Plug-in Usage. The interfaces of both plug-ins are identical; the difference lies in the code implemented in the supervisory module. The user invokes the plugin from the GIMP's context menu and specifies a path name for the files intended for filtering and the number of filters that will be applied to all the images. Figure 3 shows the plugin window superimposed on the context menu invoking it. Slide bars let the user specify the type of filter at each stage of the pipeline. It is suggested that the user opens the GIMP's error console in order to receive detailed information on the work progress. Filtering results are saved in the directory with the original images, with the Filtered- prefix added to the file names.

4 Experimental Results

We have performed lab experiments to prove that image processing on cluster grids can be successfully assisted by the load-aware remote DAMPVM shell. We used a cluster of 16 Intel Celeron 2GHz, 512MB machines interconnected with 100Mbps Ethernet.

4.1 Parallel Image Conversion with a Cluster Shell

 "! #%$&('#%#%$")+*,'.-0/21$"$43 563 798,!;:=#4?6! 7"%$"#%$ In this case, we tested the ability of  @ ?A7" #"B B79?AC the DAMPVM runtime and the remote  shell to detect least loaded nodes, spawn  tasks remotely, queue pending tasks as  D ,! #%$ ('#%#%$%)E*,' described above and submit them when  processors become idle. These features F $"#>,!  were tested in the scenario with one node  loaded with other CPU-bound processes          for which the results are shown in FigG *"HJI#4?"7>K @ ? 7" #"B B79?6B L(B#%$ ure 4. We ran ImageMagick’s convert utility to convert a MDNONONQPRMSNSNON 48MB TIFF image to Postscript. On 1 proces- Fig. 4: Scaled Speed-up for Remote Shell sor, we ran the command without the re- convert Runs mote shell. On TVUXW processors in Figure 4 there were TZY[W processors available one of which was overloaded. The remote shell omitted the overloaded node. The times are larger than the single processor run without the shell due to additional load monitoring, spawn and queueing procedure. In any case, it shows small overhead compared to the ideal run. 4.2 Pipelined Image Filtering as a GIMP plugin using a Cluster In this case we compared launching pipelined computations using a static allocation of pipeline stages to processors and the dynamic allocation using the remote shell. In the case that work is distributed in an unloaded and homogeneous network the efficiency of both plugins should be similar. However, in a normal network environment such conditions are hard to achieve and thus it is expected that total work times will be considerably shorter for the DAMPVM module. It is not only in academic examples in which some node(s) is overloaded but also in a seemingly idle network. In the latter case, some system activities or a user browsing the Internet contribute to the processor usage and such a node becomes effectively a pipeline bottleneck if selected as a stage of the pipeline. The remote shell enabled plugin should successfully avoid placing pipeline nodes on machines that are already loaded with work (e.g. the computer running the GIMP). The variables in the pipelined simulations are as follows:

– the number of stages/processors in the pipeline (P),
– the number of images to process (N),
– the size of the images: these may be of similar or different sizes,
– the type of filters at the stages: these may be uniform or may differ in processing time.

Figure 5 presents results for 3x3 matrix filters (taking the same time to complete) on P = 10 processors for N = 25 images of 600x800 and 1800x1200 bitmaps, while Figure 6 shows results for 5x5 matrix filters (taking the same time to complete) on P = 10 processors for N = 25 and N = 50 600x800 bitmaps. In all cases there were 14 idle processors available. On one of them the GIMP was running, acting as the master host; on another a user performed simple editing in Emacs. In the "Static not loaded" case, the allocation was done statically by listing the available hosts in a file, from which 10 successive processors were chosen. The master host at the end of the 14-processor list was thus omitted. In the second test ("Static random") the master host acted as the first stage in the pipeline, both reading images through GIMP, processing them and passing them to the second stage. Especially for large images this obviously becomes a bottleneck, since the master host also saves the results to disk. Finally, in the "Dynamic with shell" example, the remote shell was launched to start 10 slave node processes on the least loaded nodes. It automatically omitted both the master host and the processor busy with text editing, although the latter seemed idle. The results show that even the small additional load in the latter case slows down the pipeline, justifying the use of the remote shell.

}m~>%€>‚„ƒ …m†ˆ‡‰ƒ Š‹&ŒŽ,&Dƒ ‚ŽŠ‹‘4’%“>”4•–˜—‰™6…"€v,“ “ …m™6“ ƒ †&—ƒ ’>4š ƒ †>"› w"t>t œ ‚A‘>‚„ƒ €.†>…%‚ | t>t š …"‘",% { t>t œ ‚A‘>‚„ƒ € z>t>t ™6‘4†>,…9Š y%t>t žŸ,†>‘9ŠJƒ €   ƒ ‚Ž¡.“>¡>,š š xt>t t s"t>tvuw%t%t xw%t>tu4x;y%t>t Dƒ ‚+Š‹‘4’ œ ƒ ¢ £Œ ¤¥~§¦›D¨ ’"ƒ ~>4š “;©

ª,« ³ µ9¶·%¸>¹vº„» ¼m½¿¾‰» À‹·&Á0ÂÃÅĉÆA¼"¸v·"Ç Ç ¼9Æ6Ç » ½JÄ» È·4É » ½·%Ê Ë ª>³>³ ºA̺„» ¸‹½>¼>º ´v« ³ ËÉ ¼"Ì%Í,·%Í ºA̺„» ¸ ´³>³ ÆAÌ4Ï ½Í"¼mÀ S Î «³ Ð ½Ì4ÀJ» ¸ » ºŽÑ.ÇÑ>·,É É ³ ª,«‹¬m­ ®„¯§°%±%² « ³&¬m­ ®„¯§°%±>² Ë Ò ¹,ÀJÓ>·4Æ,¼>ÔmÕD» ºŽÀ Ì9È>Ç ¹4Ó"À.» º ºA·%Í

Fig. 5: Execution Time for Various Pipeline Al- Fig. 6: Execution Time for Various Pipeline Allocation Methods by Image Size location Methods by Number of Images

The best theoretical speed-up, assuming no overhead for communication (overhead which is more apparent for larger bitmaps), can be estimated as NP/(N + P - 1), since a P-stage pipeline completes N images in N + P - 1 stage-times instead of the NP stage-times needed serially; for N = 25 and P = 10 this approximates to 7.35. The time obtained on 1 processor for 25 600x800 bitmaps with the 10-stage pipeline was 439s, giving a speed-up of 5.49. The difference is due to costly communication and synchronization. It must also be noted that on 1 processor all the processes run concurrently, which means that there are costly context switches and slowdowns due to the GIMP's disk operations.

5 Summary and Future Work

We have presented software developments combining advanced pipelined filter processing of images selected in GIMP with parallel clusters, showing an improvement in the execution time when used with a load-aware remote shell rather than static process assignment. It is easy to select from the proposed filters, as well as to add new ones, to customize the graphic flow, for example to perform the common sequence of operations on images transformed to thumbnails for WWW use, usually starting from TIFF: adjust levels (this can be pipelined itself), convert to the 16-bit format, adjust contrast, brightness and possibly saturation, scale the image, apply the unsharp mask, convert to JPEG. As this is a more practical pipeline flow, we are planning to implement such a pipeline and execute it on a cluster of 128 2-processor machines at the TASK center in Gdansk, Poland as well. Note that the proposed approach can be used widely for pipelined image conversion for WWW gallery creation, or can assist the pipelined sequences of advanced graphic designers working with GIMP. The implementation could also be extended to other popular applications like Adobe Photoshop.

References

1. Foster, I., Kesselman, C., eds.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann (1998) ISBN 1558604758.
2. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications 15 (2001) 200–222. http://www.globus.org/research/papers/anatomy.pdf
3. EU-DataGrid (EDG): The DataGrid Project (2003) http://eu-datagrid.web.cern.ch/eu-datagrid
4. Globus: Fundamental Technologies Needed to Build Computational Grids (2003) http://www.globus.org
5. GridLab: A Grid Application Toolkit and Testbed (2003) http://www.gridlab.org
6. Sun Microsystems Inc.: Sun Grid Engine 5.3. Administration and User's Guide (2002) http://wwws.sun.com/software/gridware/faq.html
7. Platform Computing Inc.: PLATFORM LSF, Intelligent, Policy-Driven Batch Application Workload Processing (2003) http://www.platform.com/products/LSF/
8. Barak, A., La'adan, O.: The MOSIX Multicomputer Operating System for High Performance Cluster Computing. Journal of Future Generation Computer Systems 13 (1998) 361–372
9. Bricker, A., Litzkow, M., Livny, M.: Condor Technical Summary. Technical report, Computer Sciences Department, University of Wisconsin-Madison (10/9/91)
10. Alt, M., Bischof, H., Gorlatch, S.: Program Development for Computational Grids Using Skeletons and Performance Prediction. In: Proceedings of the 3rd International Workshop on Constructive Methods for Parallel Programming (CMPP 2002), Dagstuhl, Germany (2002) citeseer.nj.nec.com/586305.html
11. Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R., Sunderam, V.: PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Networked Parallel Computing. MIT Press, Cambridge (1994) http://www.epm.ornl.gov/pvm/
12. Wilkinson, B., Allen, M.: Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Prentice Hall (1999)
13. Pawliger, M.: Multithreading Photoshop (1997) http://www.reed.edu/~cosmo/pt/tips/Multi.html
14. Mainelli, T.: Two CPUs in One? The Latest Pentium 4 Chip Reaches 3 GHz and Promises You a Virtual Second Processor via Intel's HyperThreading Technology. PC World Magazine (2003)
15. Briggs, E.: Threaded Gimp Plugins (2003) http://nemo.physics.ncsu.edu/~briggs/gimp/
16. Johnston, D., Fleury, M., Downton, A.: A Functional Methodology for Parallel Image Processing Development. In: Proceedings of the IEE Visual Information Engineering Conference, VIE, University of Surrey, Guildford, UK (2003) 266–269. http://www.essex.ac.uk/ese/research/mma_lab/rapid/vie.pdf
17. Seinstra, F., Koelma, D., Geusebroek, J., Verster, F., Smeulders, A.: Efficient Applications in User Transparent Parallel Image Processing. In: Proceedings of the International Parallel and Distributed Processing Symposium: IPDPS 2002 Workshop on Parallel and Distributed Computing in Image Processing, Video Processing, and Multimedia (PDIVM 2002), Fort Lauderdale, Florida, USA (2002) citeseer.nj.nec.com/552453.html
18. Nicolescu, C., Jonker, P.: EASY-PIPE – An "EASY to Use" Parallel Image Processing Environment Based on Algorithmic Skeletons. In: Proceedings of the 15th International Parallel and Distributed Processing Symposium (IPDPS'01), Workshop on Parallel and Distributed Image Processing, Video Processing, and Multimedia (PDIVM 2001), San Francisco, California, USA (2001) http://csdl.computer.org/comp/proceedings/ipdps/2001/0990/03/099030114aabs.htm
19. Squyres, J.M., Lumsdaine, A., Stevenson, R.L.: A Toolkit for Parallel Image Processing. In: Proceedings of SPIE Annual Meeting Vol. 3452, Parallel and Distributed Methods for Image Processing II, San Diego (1998)
20. Oliveira, P., du Buf, H.: SPMD Image Processing on Beowulf Clusters: Directives and Libraries. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS'03), Workshop on Parallel and Distributed Image Processing, Video Processing, and Multimedia (PDIVM 2003), Nice, France (2003) http://csdl.computer.org/comp/proceedings/ipdps/2003/1926/00/19260230aabs.htm
21. Czarnul, P.: Programming, Tuning and Automatic Parallelization of Irregular Divide-and-Conquer Applications in DAMPVM/DAC. International Journal of High Performance Computing Applications 17 (2003) 77–93
22. Czarnul, P., Tomko, K., Krawczyk, H.: Dynamic Partitioning of the Divide-and-Conquer Scheme with Migration in PVM Environment. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface. Number 2131 in Lecture Notes in Computer Science, Springer-Verlag (2001) 174–182. 8th European PVM/MPI Users' Group Meeting, Santorini/Thera, Greece, September 23–26, 2001, Proceedings.
23. Czarnul, P.: Development and Tuning of Irregular Divide-and-Conquer Applications in DAMPVM/DAC. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface. Number 2474 in Lecture Notes in Computer Science, Springer-Verlag (2002) 208–216. 9th European PVM/MPI Users' Group Meeting, Linz, Austria, September/October 2002, Proceedings.
24. Czarnul, P., Krawczyk, H.: Parallel Program Execution with Process Migration. In: International Conference on Parallel Computing in Electrical Engineering (PARELEC 2000), Proceedings, Quebec, Canada (2000)
25. Czarnul, P.: Dynamic Process Partitioning and Migration for Irregular Applications. In: International Conference on Parallel Computing in Electrical Engineering (PARELEC 2002), Proceedings, Warsaw, Poland (2002) http://www.parelec.org