Data Centric Highly Parallel Debugging

§ David Abramson, Minh Ngoc Dinh, Donny Kurniawan
† Bob Moench, Luiz DeRose

§ Faculty of Information Technology, Monash University, Clayton, 3800, Victoria, Australia
† Cray Inc, Cray Plaza, 380 Jackson St, Suite 210, St. Paul, MN 55101 USA

Abstract

Debugging parallel programs is an order of magnitude more complex than debugging sequential ones, and yet most parallel debuggers provide little more functionality than their sequential counterparts. This problem becomes more serious as computational codes become more complex, involving larger data structures, and as the machines become larger. Peta-scale machines consisting of millions of cores pose a significant challenge for existing techniques. We argue that debugging must become more data-centric, and believe that “assertions” provide a useful model. Assertions allow a user to declare their expectations about the program state as a whole rather than focusing on the state of a single process. Previously, we implemented a special type of assertion that supports debugging applications as they evolve or are ported to different platforms. These assertions allow a user to compare the state of one program against that of a reference version. These ‘relative debugging’ assertions, whilst powerful, pose significant implementation challenges for large peta-scale machines. In this paper we discuss a hashing technique that provides a scalable solution for very large problems on very large machines. We illustrate the scheme on 65k cores of Kraken, a Cray XT5 at the University of Tennessee.

Categories and Subject Descriptors
D.1.3 Concurrent Programming; D.2.5 Testing and Debugging

Keywords
Parallel Computing