Executing and Visualizing High Energy Physics Simulations with Grid Technologies

Marko Niinimäki, John White, Juha Herrala
Helsinki Institute of Physics at CERN, CH-1211 Geneva, Switzerland

Abstract. The emergence of the Grid computing model is important for disciplines that are computing- and data-storage-intensive. A generic web-based interface to a computing Grid is presented. The package allows the user to submit a high energy physics simulation to Grid resources, track the progress of the job, and, upon job completion, display graphical results.

1 Introduction

Foster and Kesselman describe the Grid as follows: “The Grid is a software infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources.” [12]

Grid computing will be especially important in High Energy Physics. As an example, consider the Large Hadron Collider (LHC) particle accelerator being built at CERN: “Computational requirements of the experiments that will use the LHC are enormous: 5-8 Petabytes of data will be generated each year, the analysis of which will require some 10 Petabytes of disk storage and the equivalent of 200,000 of today’s fastest PC processors.” [23]

As Foster and Kesselman’s definition suggests, the Grid can be seen as a layered approach in which applications access resources, and users and user groups have access rights to applications. In this paper, we present a design where the application is a simulation of particle interactions in a high energy physics detector (CMSIM) [3]. The user interacts with the application through a web browser.

The motivation for the work presented in this paper is to simplify the process of running CMSIM jobs and to make the process more efficient by using shared Grid resources. Users can transfer and execute their simulations on a computer much more powerful than their workstation, then recover the simulation results and display histograms in their web browser. The user interface is simple, and there is no need to install external programs other than some standard Grid packages. Moreover, this paper discusses several underlying technologies of Grid computing, introduces a generic Grid platform called GridBlocks, and presents a working prototype.

2 Background and Terminology

To facilitate the Grid, so-called Grid middleware packages have been created; among them are Globus [11], GridEngine [6], Legion [14], and NorduGrid [21]. All of them provide services such as resource management, authentication and authorization of users, secure (encrypted) file transfer, and remote program execution. In the case of Globus, resource management and remote program execution are handled by the Globus Resource Allocation Manager (GRAM) and the Global Access to Secondary Storage (GASS); information services are provided by the Metacomputing Directory Service (MDS) [11]; user authentication and authorization are based on X.509 certificates [32]; and for file transfer Globus provides GridFTP [25].

An essential part of the Globus system is the Grid Security Infrastructure (GSI) [31], which allows secure connections to all resources (any GSI-enabled computer or storage facility) on the Grid. User authentication is based on X.509 certificates and public key cryptography. The certificate subject acts as a generally applicable Grid identity, so separate user IDs and passwords for different machines on the Grid are not needed. The GSI methods have also been integrated into two important programs, FTP and SSH. The FTP client and server programs have been extended and GSI-enabled to produce GridFTP [30]. The GridFTP protocol uses certificates for authorization and supports multi-threaded data transfer with a variable TCP window size. The addition of GSI to SSH has resulted in GSI-SSH [10], which also uses certificates for authorization and, importantly, allows any process in possession of a user’s credential to act as that user.

An important feature of the Grid X.509 authentication scheme is the ability to generate a proxy credential or certificate. A proxy certificate is typically a file generated by the user by means of a proxy generator program (in Globus, “grid-proxy-init”). The inputs to the proxy generator program are the user’s certificate, private key, and the passphrase protecting the private key. The generated output, derived from the original user credentials, is a new time-limited proxy certificate that holds the user’s Grid identity. The proxy certificate file can then be securely passed to various Grid services, such as the MyProxy [17] online credential repository, as the user’s temporarily valid credential.

The Grid services (job submission, resource monitoring, etc.) are accessed through the Globus toolkit, generally through the Unix command-line interface. There are efforts to interface Globus to existing programming languages, such as Java, Perl, and Python, through Commodity Grid (CoG) Kits. These CoG toolkits provide the interface needed to create more user-friendly programs such as web-based Grid portals. One Grid portal project, the open-source Grid Portal Development Toolkit (GPDK) [22], “seeks to provide generic user and application portal capabilities” by using Java Server Pages (JSP), Java, and the Java CoG toolkit [26].
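As a concrete illustration of the proxy mechanism and the command-line interface mentioned above, the following minimal Java sketch generates a time-limited proxy by invoking the grid-proxy-init tool. It assumes the Globus command-line tools are installed on the host; the output path matches the example in Sect. 4.3, while the 12-hour lifetime is an illustrative choice, not a value prescribed by the toolkit.

import java.io.IOException;

/* Minimal sketch: generate a time-limited proxy credential by invoking
 * the Globus grid-proxy-init tool. Assumes the Globus command-line tools
 * are on the PATH; the lifetime below is illustrative. */
public class ProxyInit {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "grid-proxy-init",
                "-out", "/tmp/PROXY",  // where the time-limited proxy is written
                "-hours", "12");       // lifetime of the proxy credential
        pb.inheritIO();                // let the tool prompt for the passphrase
        int rc = pb.start().waitFor();
        if (rc != 0)
            throw new IOException("grid-proxy-init exited with code " + rc);
        // /tmp/PROXY now holds the temporary credential that can be uploaded
        // to a portal or stored in a MyProxy repository.
    }
}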

3 The GridBlocks Portal

In this section we introduce the GridBlocks portal, a novel open-source Grid interface that simplifies the execution of Grid applications. GridBlocks integrates some aspects of GPDK with an XML presentation layer, an abstraction of Grid resources, and a persistent data store.

There are other Grid portal projects aimed at particle physics job submission, namely GRAPPA [8] and Genius [9]. GRAPPA is a portal aimed at one particular future particle physics experiment, ATLAS [2], and Genius is a European DataGrid-specific portal.

These portals function under different philosophies: the GRAPPA portal runs as a personal access point to the Grid on the user’s workstation, whereas the Genius portal runs as a highly centralised service that is not envisioned as a personal program. The GRAPPA portal uses Jython and a CoG kit to run job scripts, and Genius uses the proprietary software EnginFrame [7] to connect to Grid services. The GridBlocks portal can be run in a similar fashion to these other Grid user interfaces: it functions as a centralised, multi-user service in the same manner as Genius, but it can also be run as a personal portal, accessible from any browser like the GRAPPA interface. Like GRAPPA, GridBlocks uses an open-source CoG kit to submit jobs, and it has been designed as a general-purpose job submission platform that works with many Grid middleware flavours.

3.1 GridBlocks Background

The Helsinki Institute of Physics GridBlocks (http://gridblocks.sourceforge.net) is an open-source Grid portal that extends the implementation of GPDK. Written in Java, GridBlocks can utilize the Globus Java CoG toolkit, but common Grid services, such as file transfer and job execution, are defined as abstract methods and can be implemented in different ways (see the sketch below). In addition to the CoG implementation, GridBlocks currently supports executing NorduGrid client programs and Globus command-line utilities for job execution and file transfer.

In most cases it is unreasonable to expect the user to wait for the execution of their job, given that the execution can last many hours or days. Thus, GridBlocks supports storing and retrieving job metadata in a database. In our implementation this means that the GridBlocks server monitors job execution and, when it finds that the information (such as job output) has changed, stores the new information in the database. Thus, after having submitted a job, the user can log out, log in later, and retrieve the results.
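The sketch below illustrates this abstraction: common Grid services declared as abstract methods, behind which a Globus CoG back-end, a NorduGrid client back-end, or a command-line back-end can be plugged in. The names are illustrative and do not reproduce GridBlocks’ actual API.

/* Hypothetical sketch of the service abstraction described in Sect. 3.1.
 * Each method hides a middleware-specific implementation. */
public interface GridService {

    /* Transfer a file to the given resource, e.g. via GridFTP. */
    void transferFile(String localPath, String resource, String remotePath)
        throws Exception;

    /* Submit a job description and return an identifier that can be used
     * for later status probes. */
    String submitJob(String resource, String jobDescription) throws Exception;

    /* Query the status of a previously submitted job. */
    String jobStatus(String jobId) throws Exception;
}

Under this scheme, one implementation could wrap the Globus Java CoG toolkit while another invokes the NorduGrid client programs, matching the two back-ends the text describes.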

3.2 GridBlocks Architecture

The GridBlocks platform is a Grid portal with the additional features of XML presentation, Grid abstraction, and persistent data storage (a relational database). Technically, GridBlocks is a Java servlet that is executed under the control of a servlet container, in our case the Apache Tomcat [24] web server. The basic architecture is illustrated in Figure 1.

GridBlocks promotes the separation of the user interface from the actual data. This approach has several benefits; for example, it simplifies the design of user interfaces suited to the requirements of different users (NorduGrid users, Globus users) or different client programs and devices (web browsers, WAP phones). For XML presentation and for converting Java objects into XML, GridBlocks utilises the Apache Cocoon [19] and Joda [18] technologies.

The user interface communicates with physical Grid resources through various abstraction layers. A “Grid resource” is a collection of descriptions of the methods and services that are supported by a physical resource. A “Grid domain” is an abstraction for a group of Grid resources that share common functionality. Thus the same job submission methods and file transfer techniques can be used when communicating with different resources of a certain domain, such as the NorduGrid domain. The “Grid resource manager” holds information about the user’s privileges on different domains and delegates remote jobs. A “Grid job” is an interface that hides the implementation of low-level portal operations such as remote job submission, status requests, and the retrieval of job results. These operations usually vary from one Grid platform to another.

Embedded in GridBlocks, we introduce a generic notion of a document, which enables us to map portal objects, such as “Grid resources” and “Grid jobs”, to XML documents. A document has an identifier and some (possibly structured) fields. A document can be stored in a database; our default implementation uses the small relational database HSQLDB (http://hsqldb.sourceforge.net). Consequently, a document can be retrieved from the database and transformed back into a portal object. Finally, the generic notion of a document has enabled us to create storable implementations of different types of Grid tasks (file transfer, remote job execution) and their results. For the time being, the GridBlocks portal can create, execute, store and display Globus and NorduGrid tasks. A sketch of this document storage scheme is given below.
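The following minimal sketch illustrates the document idea: a portal object is flattened into an identifier plus field/value pairs and stored in HSQLDB, from which it can later be read back and turned into a portal object again. The table layout, column names, and driver class are illustrative assumptions, not GridBlocks’ actual schema.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

/* Sketch: store a "Grid job" document as id/field/value rows in HSQLDB
 * and read it back. Schema is hypothetical. */
public class DocumentStore {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");  // classic HSQLDB driver class
        try (Connection c = DriverManager.getConnection(
                "jdbc:hsqldb:file:portaldb", "sa", "")) {
            Statement s = c.createStatement();
            s.execute("CREATE TABLE document ("
                    + "id VARCHAR(64), fieldname VARCHAR(64), fieldvalue VARCHAR(1024))");
            PreparedStatement ins =
                c.prepareStatement("INSERT INTO document VALUES (?, ?, ?)");
            for (String[] f : new String[][] {
                    {"job-42", "type", "cmsim"}, {"job-42", "status", "FINISHED"}}) {
                ins.setString(1, f[0]); ins.setString(2, f[1]); ins.setString(3, f[2]);
                ins.executeUpdate();
            }
            // Retrieve the fields and rebuild the portal object from them.
            ResultSet r = s.executeQuery(
                "SELECT fieldname, fieldvalue FROM document WHERE id = 'job-42'");
            while (r.next())
                System.out.println(r.getString(1) + " = " + r.getString(2));
        }
    }
}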

4 High Energy Physics Simulation with CMSIM and GridBlocks

One of the use cases studied in the portal project is running a simulation of a future high energy physics detector, the Compact Muon Solenoid (CMS) [27] at CERN’s Large Hadron Collider [15]. In this section we first present the CMS simulation package, CMSIM [4]. We then describe the workflow and discuss the software components employed in running and visualizing CMSIM simulation jobs using GridBlocks. Finally, we introduce a concrete example of the authentication, job submission, monitoring and visualization process.

4.1 CMSIM Terminology

CMSIM is a simulation package for computational studies of the CMS detector under different configurations. This Monte Carlo program stochastically simulates the decay chains of short-lived high energy particles and then tracks the interaction of these particles with the CMS detector. CMSIM is written in Fortran and uses various well-established CERN software packages to simulate the particle decay chains and the CMS detector geometry. The input and output of CMSIM are flat files; the output file is written in the ZEBRA [13] format, and each simulation run produces one output file based on a set of runtime parameters. As CMSIM is Fortran-based, it is very portable, can be run on almost any platform, and is thus ideal for testing in a heterogeneous Grid environment.

The visual output from a CMSIM job is written in the ZEBRA format, which can be read by HBOOK [1], a histogramming, fitting and data presentation software package developed at CERN. The front-end for HBOOK is the Physics Analysis Workstation (PAW) [5], and recently the PAW functions have been implemented in a Java analysis package (JAS) [29]. Thanks to the open-source license of JAS, we have been able to utilize its visualization code in GridBlocks.

4.2 CMSIM Workflow

The workflow of a CMSIM simulation typically contains many steps, including the preparation of the job, file transfers, job execution, and the retrieval of results. The job preparation phase follows the steps that a physicist would manually perform when submitting a computing job to a batch processing centre. In general, a physicist must send the correct version of the executable program and the input file(s) from the host machine (generally a desktop workstation) to the target machine (batch job processor):

– A GSI-SSH job checks whether the job executable exists on the target machine.
– If it does, two GSI-SSH jobs check the md5 sums of the executables on the host machine and the target machine. If the md5 sums differ, the executable must be transferred to the target machine; otherwise, the executable on the target machine is assumed to be up to date.
– The executable and input file(s) are packaged on the host machine, and the package is transferred to the target machine using the GridFTP utility.
– The package is extracted on the target machine via GSI-SSH, and the job files are placed in the correct directory.
– The batch job is started by a GRAM job request that communicates with the Globus gatekeeper on the target machine.

Traditionally, physics experts have carried out all of these steps manually; a sketch of how they can be automated appears at the end of this subsection. In the Grid environment, there exist basic resource specification languages (RSLs) for expressing resource requirements and workflow, including Globus RSL and NorduGrid xRSL [20]. In a heterogeneous Grid environment, however, one cannot assume that an RSL execution environment is available on a given node. We are therefore designing an RSL for job control that will be entirely controlled by the GridBlocks platform, so that no job control language capabilities are expected from the underlying environments.
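The sketch below shows how the five preparation steps listed above might be automated by driving the standard Globus command-line tools (gsissh, globus-url-copy, globus-job-submit) from Java. The host name, directory layout and file names are illustrative assumptions, and error handling is minimal; it is not the portal’s actual implementation.

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/* Hedged sketch of the job preparation workflow of Sect. 4.2, driving
 * external Globus tools. Requires Java 9+ for readAllBytes. */
public class CmsimSubmit {

    /* Run a local command and return its trimmed output. */
    static String run(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        String out = new String(p.getInputStream().readAllBytes()).trim();
        if (p.waitFor() != 0)
            throw new IOException(String.join(" ", cmd) + " failed: " + out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String target = "target.example.org";   // hypothetical batch processor
        String exe = "cmsim", pkg = "cmsim-job.tar.gz";

        run("gsissh", target, "mkdir -p ~/cmsim");

        // Steps 1-2: compare md5 sums; if the remote copy is missing or
        // stale, the executable is included in the package below.
        String localSum  = run("md5sum", exe).split("\\s+")[0];
        String remoteSum = run("gsissh", target,
                "md5sum ~/cmsim/" + exe + " 2>/dev/null || true").split("\\s+")[0];
        boolean stale = !localSum.equals(remoteSum);

        // Step 3: package the input file(s), plus the executable if needed,
        // and transfer the package with GridFTP.
        List<String> tar = new ArrayList<>(List.of("tar", "czf", pkg, "input.dat"));
        if (stale) tar.add(exe);
        run(tar.toArray(new String[0]));
        run("globus-url-copy",
            "file://" + Paths.get(pkg).toAbsolutePath(),
            "gsiftp://" + target + "/~/cmsim/" + pkg);

        // Step 4: extract the package on the target machine via GSI-SSH.
        run("gsissh", target, "cd ~/cmsim && tar xzf " + pkg);

        // Step 5: start the batch job through the Globus gatekeeper; the
        // returned contact string is the handle for later status probes.
        System.out.println(run("globus-job-submit", target, "~/cmsim/run_cmsim"));
    }
}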

4.3 An Example of Using GridBlocks for CMSIM Simulation

In this section, we describe the process of preparing, executing and visualizing a CMSIM job from the user’s point of view. As a prerequisite, we assume that the user has their personal certificates and can initiate a proxy file with grid-proxy-init -out /tmp/PROXY. User authentication and authorization are carried out by user certificates, both when logging into the portal and in file transfer and job execution. The user can either upload their proxy certificate file to the portal or let the portal fetch the proxy certificate from a MyProxy server.

Figure 2 illustrates the user logging into the system. The portal authenticates the user based on their Grid identity, the subject of their proxy certificate. If the proxy certificate is valid (not expired, signed by a trusted certificate authority, etc.), the user’s login to the portal succeeds. The portal then contacts the MDS server and inspects which resources are available and whether the user has the rights to access them. Inspecting the user’s rights is based on fetching the grid-mapfile from each of the accessible computers and checking whether the user’s Grid identity is listed in it. A grid-mapfile typically maps Grid identities to local user IDs.

Once logged in, the user selects CMSIM, fills in the form, and submits the job as in Figure 3. In our example, the user knows which of the computers is suitable for executing the job. This, of course, cannot be assumed in general; a more intelligent job dispatcher that can select the best host computer for the user’s job is being developed. Workflow support is one of the main focus areas in making the application flexible for end users.

For the time being, a CMSIM job is packaged in an archive file that contains a “run_cmsim” executable, the actual CMSIM program, and a configuration file. The run_cmsim executable contains the instructions for compiling and executing the job; after the job has been executed, it prints the name of the directory that contains the results. The archive file is transferred to the computer indicated by the user, unarchived there, and run_cmsim is executed. The portal probes the execution at frequent intervals and retrieves the output. While this method is simple and well understood by the users, alternative methods are being studied and implemented. It would be beneficial if the workflow could be coded with a simple syntax and the workflow control file executed directly under the control of the portal. For these purposes, the Jakarta Ant package [16] provides many tools.

File transfer relies on certificate-based, GSI-enabled file transfer tools such as GridFTP. The user’s proxy certificate is used to authenticate them to remote systems. Controlled remote execution can employ several methods, since in GridBlocks a “job” and “job submission” can have many implementations. For jobs that run for a relatively short time, we use GSI-SSH, a remote execution facility that is part of Globus. For jobs that run for hours or days, it is better to use the globus-job-submit utilities, which exit after submitting the job and return an identifier by which the job status can be probed. The portal then probes the status of the job (for example, once per minute, as sketched below) and informs the user about status changes. After having submitted their job, the user can log out and log in again later. They will find their finished job in the “Saved data” section and can inspect it, as in Figure 4. They can retrieve the output files by pressing the Transfer link, or visualize the results immediately by pressing the Visualize link.

Visualizing the results. After the job has been run, the resulting HBOOK file can be transferred to the portal computer and visualized by using a histogram servlet (included in the portal). A visualization menu, as in Figure 5, is presented, and the user selects the histogram they want to inspect. The results are shown on the right-hand side of the figure.
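The following sketch illustrates the portal-side status probing described above: given the contact string returned by globus-job-submit, it polls globus-job-status once per minute and reports changes. The class and variable names are illustrative, and persisting the status to the portal database is indicated only as a comment.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/* Sketch: poll a GRAM job's status once per minute until it finishes.
 * Requires Java 9+ for readAllBytes. */
public class JobMonitor {
    public static void main(String[] args) {
        String contact = args[0];  // job contact from globus-job-submit
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                Process p = new ProcessBuilder("globus-job-status", contact)
                        .redirectErrorStream(true).start();
                String status = new String(p.getInputStream().readAllBytes()).trim();
                p.waitFor();
                System.out.println("Job status: " + status);
                // A real portal would store status changes in the database
                // (Sect. 3.1) so the user sees them on their next login.
                if (status.equals("DONE") || status.equals("FAILED"))
                    timer.shutdown();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}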

5 Summary and Discussion

In this paper we have presented a user-friendly method for executing CMSIM jobs and visualizing their results. The design is based on Grid technologies, including certificate-based authorization and authentication and secure file transfer. Moreover, Grid technologies enable users to execute their jobs on a computer that is much faster than their workstation. Since our software is a Java servlet that runs under a servlet-capable web server, the user interface (in both the job submission and visualization stages) is familiar to users who have experience with World Wide Web applications.

The portal as such can be easily extended and modified. In addition to CMS simulations, it has been used by the NorduGrid community for remotely executing NorduGrid jobs. Instead of Apache Tomcat, the portal can also be run under other servlet containers. We have tested the portal with JBoss [28], a clustering servlet container, and it appears to provide the portal with good scalability.

Executing and visualizing CMSIM jobs has been developed as part of the GridBlocks platform. GridBlocks is free software, available at http://gridblocks.sourceforge.net. A demonstration of the software is available at http://opengrid.cern.ch.

References

1. Computing Application Software Group and Networks Division. HBOOK – Statistical Analysis and Histogramming Reference Manual. CERN, version 4.22 edition, 1994. Available at http://wwwinfo.cern.ch/asdoc/hbook_html3/hboomain.html.
2. ATLAS. ATLAS – A Toroidal LHC ApparatuS. Available at http://atlas.web.cern.ch, 2003.
3. CMS Collaboration. CMS simulation package CMSIM. Available at http://cmsdoc.cern.ch/cmsim/manual/cms126/manual.html, 2002.
4. CMS detector simulation software group. CMS Simulation Package CMSIM, User’s Guide and Reference Manual.
5. Information Technology Division. Physics Analysis Workstation. CERN, 1999. CERN Program Library Long Writeup Q121.
6. Grid Engine. Welcome to Grid Engine. Available at http://gridengine.sunsource.net/, 2002.
7. EnginFrame. EnginFrame. Available at http://www.enginframe.com, 2003.
8. D. Engh et al. GRAPPA: Grid access portal for physics applications. Available at http://arxiv.org/abs/cs.DC/0306133, 2003.
9. R. Barbera et al. The Genius Grid portal. Available at http://www-conf.slac.stanford.edu/chep03/register/administrator/papers/papers/TUCT001.PDF, 2003.
10. National Center for Supercomputing Applications. GSI-enabled OpenSSH. Available at http://www.ncsa.uiuc.edu/Divisions/ACES/GSI/openssh/, 2003.
11. I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11(2), 1997.
12. I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the Grid: Enabling scalable virtual organizations. International Journal of Supercomputer Applications, 15(3), 2001.
13. M. Goossens. The ZEBRA System. CERN, 1995.
14. A. Grimshaw and Wm. Wulf. The Legion vision of a worldwide virtual computer. Communications of the ACM, 1997.
15. CERN LHC Working Group. The Large Hadron Collider (LHC) in the LEP tunnel. Particle Accelerator, 26(141-150), 1990.
16. J. E. Tilly and Eric M. Burke. Ant: The Definitive Guide. O’Reilly, 2002.
17. J. Novotny, S. Tuecke, and V. Welch. An online credential repository for the Grid. In Proceedings of the Tenth International Symposium on High Performance Distributed Computing (HPDC-10). IEEE Press, 2001.
18. Joda.org. Joda – the Java properties framework. Available at http://joda.sourceforge.net, 2002.
19. S. Mazzocchi. Introducing Cocoon 2.0. Available at http://www.xml.com/pub/a/2002/02/13/cocoon2.html, February 2002.
20. Nordic Testbed for Wide Area Computing and Data Handling. Extended resource specification language. Available at http://www.nordugrid.org/documents/xrsl.pdf, 2002.
21. NorduGrid. NorduGrid, from World Wide Web to World Wide Grid – creating a Nordic testbed for wide area computing and data handling. Available at http://www.nordugrid.org/documents/brochure.pdf, 2002.
22. J. Novotny. The Grid Portal Development Kit. Concurrency: Practice and Experience, 2000.
23. LCG Project. The LHC Computing Grid project – LCG. Available at http://lcg.web.cern.ch/LCG/, 2003.
24. The Apache Project. Apache Tomcat. Available at http://jakarta.apache.org/tomcat, 2002.
25. The Globus Project. GridFTP: Universal data transfer for the Grid. Available at http://www-fp.globus.org/datagrid/deliverables/C2WPdraft3.pdf, 2000.
26. The Globus Project. Commodity Grid Kits. Available at http://www-unix.globus.org/cog/, 2002.

27. C. J. Seez. CMS, a general purpose detector for the LHC. Nuclear Instruments and Methods, A344(1-10), 1994.
28. S. Stark, M. Fleury, and The JBoss Group. JBoss Administration and Development. Sams Publishing, 2002.
29. Java Analysis Studio. JAS – a general purpose framework for data analysis. Available at http://jas.freehep.org, 2002.
30. The Globus Project. GSIFTP tools for the data Grid. Available at http://www.globus.org/datagrid/deliverables/gsiftp-tools.html, 2002.
31. The Globus Project. Overview of the Grid Security Infrastructure. Available at http://www.globus.org/security/overview.html, 2002.
32. ITU X.509. Information technology – Open Systems Interconnection – The Directory: Public-key and attribute certificate frameworks. Technical report, ITU, 2000.

[Figure 1 components: a servlet platform hosting the Cocoon XML presentation framework and the GridBlocks XML layer; GridBlocks data and Grid management; and, beneath them, the relational database, the Globus (CoG) implementation, and the NorduGrid implementation.]
Fig. 1. GridBlocks architecture. The Grid and database access methods are hidden behind the XML presentation layer.

Fig. 2. User login. The proxy file on the local machine is selected in the window and submitted via the “Login” button.

Fig. 3. At the top of the picture the general functions can be seen. At the bottom, the specifics of submitting a CMSIM job are illustrated.

Fig. 4. The history and details of a job are shown. The actions attached to a job (transfer and visualize) are available from this screen.

Fig. 5. On the left the histogram menu is shown. The histogram(s) are selected and the resulting display is seen on the right.