
ISSN 1061-8309, Russian Journal of Nondestructive Testing, 2009, Vol. 45, No. 8, pp. 582–591. © Pleiades Publishing, Ltd., 2009. Original Russian Text © S.A. Zolotarev, V.L. Vengrinovich, 2009, published in Defektoskopiya, 2009, Vol. 45, No. 8, pp. 82–94.

X-RAY METHODS

3D Iterative Tomographic Reconstruction Using Graphics Processors S. A. Zolotarev and V. L. Vengrinovich Institute of Applied Physics, National Academy of Sciences of Belarus, Akademicheskaya ul. 16, Minsk, 270072 Belarus E-mail: [email protected] [email protected] Received March 5, 2009

Abstract—An original parallel computational algorithm is developed for 3D tomographic reconstruction in a conic beam based on graphics processors. DOI: 10.1134/S1061830909080117

INTRODUCTION

Computational diagnostics, including computer tomography, increasingly faces the need for multiple acceleration of computational procedures. An important direction is the application of general-purpose graphics processors, which make it possible to accelerate various algorithms of computer tomography. As a separate branch, they belong to the class of multicore processors, whose appearance was a qualitative leap in the development of efficient supercomputers with a higher performance/cost ratio than existing supercomputer- and cluster-based systems. Multicore processors provide flexible possibilities for changing the configuration and scaling the power of computational systems, from personal computers and workstations to servers and cluster systems. Theoretically, owing to their exceptional computational capabilities, multicore processors are the most promising hardware for computation-intensive information technologies. Among multicore processors with parallel architecture, graphics processing units (GPUs) and central processing units (CPUs) of the IBM CELL and Intel Core type are currently the best known. In the last decade, GPUs have developed the most dynamically, driven by the demand of computer graphics for the computational power needed to create high-quality real-time images. Thus, it is the GPU that we consider below as a device capable of greatly accelerating 3D tomographic reconstruction.

In 1994, B. Cabral, N. Cam, and J. Foran [1] described how computer tomography could be implemented (based on the method of filtered back projections) with hardware support of texture mapping using a GPU. Later, K. Mueller and R. Yagel [2] used the same hardware for iterative reconstruction (by the SART method), which used texture mapping of a volume for direct projection and back texture mapping of the volume for back projection. While both operations can be performed with the standard instructions of the OpenGL graphics library, which uses projective textures [3] for back projection at arbitrary angles, the main limitation was a lack of accuracy in floating-point operations. This required accumulating the results of back projection on a conventional serial processor, which significantly decelerated computation owing to the slow bulk transfer of data from the computer memory to the videocard texture memory. Modern graphics processors do not have these limitations: in most cases they are programmable, completely support single-precision floating-point arithmetic and, using the PCI-Express protocol, transmit data from the computer memory to the videocard texture memory quite rapidly.

MODERN GRAPHICS PROCESSORS WITH PARALLEL ARCHITECTURE

Modern GPUs include completely programmable parallel geometry and pixel processors equipped with a set of commands for performing arithmetic and logic operations, with support of a 32-bit format for vector and scalar floating-point operations. Such GPUs became attractive for implementing nongraphical computations and are considered general-purpose graphics processors (GPGPUs). This trend is stimulated by


two main factors: the performance/cost criterion and the performance growth rate of GPUs, which has doubled every six months. At the end of 2006, NVIDIA produced the GeForce 8800 (G80), a new-generation graphics processor: a high-performance multicore and multithreaded microprocessor. Videocards based on the G80 support the SLI technology, which provides the parallel operation of several GPUs, and achieve a performance of 520 GFLOPS.

HARDWARE ACCELERATION OF ITERATIVE TOMOGRAPHIC RECONSTRUCTION IN A CONIC BEAM

Algebraic reconstruction techniques (ARTs) and the related simultaneous algebraic reconstruction technique (SART) are used for reconstructing 2D or 3D objects from X-ray projections. In certain cases, algebraic techniques have significant advantages over the more popular technique of filtered back projections. However, the very low convergence rate of these iterative techniques has kept them out of many important practical applications. This limitation can be overcome with 2D texture images, which are easily implemented on conventional graphics videocards, and this approach can be used to accelerate 3D algebraic reconstruction. Although the most widespread practical techniques of tomographic conic-beam reconstruction are based on filtered back projections (FBP) (see [4]), subsequent studies [5] have shown that both the ART (with some modifications) and SART methods can reconstruct conventional cone data with high accuracy for cone angles as large as 60°. However, iterative computations are slow [6], and this limitation in computation speed does not allow ART to be used in various practically important industrial and medical tasks. We would like to overcome this limitation because there are many problems for which ART has significant advantages over the more widely used FBP technique.
For example, the advantages of applying ART are obvious when there is a limited number of projections and the projections are uniformly distributed within the set angle range, when the projections are rarefied, or when some directions of X-raying are unavailable at all [7, 8]. However, note that there are other quite effective algorithms of 3D tomographic cone-beam reconstruction devoted to solving this problem [9]. Tomographic reconstruction with algebraic techniques is reduced to solving the following system of algebraic equations:

$p_i = \sum_{j=1}^{N} w_{ij} v_j$.  (1)

We must recover the values $v_j$ in the $N = n^3$ voxels belonging to the reconstruction field, where the index $j$ changes from 1 to $N$, using the values $p_i$ in the pixels with current index $i$ belonging to the projection images $P_\varphi$. The angle $\varphi$ characterizes the geometrical orientation of the source–detector pair for which X-raying is performed; in this context, a circular path of motion of the source around the object is assumed (without loss of generality for the subsequent consideration). In Eq. (1), $w_{ij}$ is the weighting coefficient with which the voxel with index $j$ contributes its value to the pixel $i$. SART solves this system of equations iteratively, simultaneously correcting the values $v_j$ for all voxels belonging to the reconstruction field in the current iteration with number $k$, as indicated in Eq. (2):

$v_j^{(k)} = v_j^{(k-1)} + \lambda \dfrac{\sum_{p_i \in P_\varphi} w_{ij} \left( p_i - \sum_{l=1}^{N} w_{il} v_l^{(k-1)} \right) \Big/ \sum_{l=1}^{N} w_{il}}{\sum_{p_i \in P_\varphi} w_{ij}}$.  (2)

In Eq. (2), λ is the relaxation parameter, which usually is much smaller than unity (λ ≪ 1.0). The procedure of correcting voxels by the SART algorithm can be divided into several steps. Line integrals can be calculated by presenting a discrete 3D image as a field of overlapping 3D interpolation kernels, so that each kernel is located at the center of a voxel and approximates the attenuation of that voxel $v_j$. The weighting coefficient $w_{ij}$ with which the $j$th voxel interacts with the ray $r_i$ is then a line integral over the segment of intersection of the ray with the voxel kernel function [10]. The direct projection of the

reconstructed volume by the SART technique using texture mappings, which have hardware implementation on graphics processors, is described in detail below. The reconstructed 3D image is divided into n layers, and each layer is processed individually. When projecting a reconstructed object (Fig. 1), each layer is associated with a quadratic polygon on which a 2D texture is superimposed; the texture represents the values of the reconstructed parameter in the corresponding layer of the object. Turning the polygon with the superimposed texture by the angle ϕ of the geometrical location of the radiation source and performing a perspective projection with the cone angle γ, we image the appropriately weighted voxels of the considered layer of the reconstructed volume onto the pixels of the projection image.

Fig. 1. Direct projection. Projection geometry.

Figure 1 illustrates the projection algorithm in which the image of the reconstructed layer is projected onto the screen (frame buffer) by the graphics videocard. However, the projections of layers are accumulated programmatically in the computer RAM, owing to the limited number of bits used for representing data in the frame buffer. After all n texture polygons are accumulated in the program buffer, it contains the projection of the reconstructed volume for the projection angle ϕ. Further, we describe the calculation of the correcting image in detail. After the projection image is generated using hardware, the correcting image is calculated programmatically in the computer's central processor. At this stage, hardware acceleration is impossible because the graphics processor cannot process negative-valued data. First, the calculated projection image is subtracted from the projection image obtained experimentally. After this, the resulting image is divided by a weighting image, also calculated for this orientation.
The weighting image represents, for each individual pixel, the sum of weights appearing in the denominator of the inner fraction in Eq. (2). Ideally, this sum of weights is equivalent to the length of intersection of the ray with the solid sphere, and this length can be calculated theoretically. In fact, however, it differs because of the approximate integration along the ray and the interpolation rounding errors when projecting layers with hardware support. For this reason, the weighting images are calculated by the same technique as projections, i.e., by projecting a volume in which all voxels inside the spherical reconstruction field are 1.0 and all other voxels are zero. Such a weighting image can be reused for all reconstructions that have the same angular step between projections and the same cone angle of X-raying. Finally, since a texture image may only hold values in the range [0.0, 1.0], whereas the correcting image can have values in the range [–1.0, 1.0], we must scale and convert the values of the correcting image to the range [0.0, 1.0]. Note that the values in the layers of the reconstructed object always lie within the interval [0.0, 1.0]. In performing a back projection, it is necessary to superimpose the correcting image on the layers of the reconstructed volume. This is achieved by associating each layer of the object, one by one, with the screen onto which the correcting image is projected. This correcting image is preliminarily converted into a texture and superimposed on the polygon. In principle, this situation is the reverse of that in Fig. 1, with the screen representing the layer of the reconstructed object and the polygon with the superimposed texture representing the correcting image.
It should be noted that, in contrast to the previous case, the main viewing axis is no longer perpendicular to the screen and does not always intersect the center of the screen (i.e., the center of the object layer). Although the OpenGL graphics library does not allow the viewing axis to be inclined with respect to the screen, this is not a barrier to correct projection in this case either. Note, however, that a simple inversion of the direct projection for performing a back projection is impossible in the OpenGL graphics interface. There is an approach that allows us not to treat the layers of the reconstructed volume as the screen and thereby to avoid the problem of the mismatch between the intersection point of the viewing axis and the screen center. This approach uses the projective textures described by M. Segal [3], which operate as projected slides. Consider Fig. 2, which illustrates this technique. As opposed to the direct approach, the correcting image is now perspectively projected onto a polygon located in the position of the current layer of the object, which must be appropriately corrected. The correcting image projected onto this polygon is then reprojected into the frame buffer by an orthographic projection. Thus, the correcting image serves as a slide (projective texture), which is projected by a conic light source onto the slide screen (a polygon geometrically aligned with the layer of the reconstructed object), and the projected slide is then reprojected by a beam of parallel rays into the frame buffer.


Fig. 2. Back projection. Projection geometry.

NEW TECHNIQUE FOR INCREASING THE SPEED OF TOMOGRAPHIC RECONSTRUCTION

Let us consider in detail the implementation of tomographic reconstruction with hardware acceleration based on the SART algorithm.

Algorithm of reconstruction by the SART method with hardware acceleration
1: Initialize the 3D digital image with the initial value
2: while the iterative process converges, i.e., the discrepancy for the current vector of the reconstructed image is less than that in the previous iteration
3: do
4: Go to the next projection Pϕ
5: Read the current layer of the reconstructed object from the computer memory
6: Load the layer data into the texture memory of the videocard
7: Directly project the object layer onto the screen (frame buffer) using hardware acceleration (graphics processor)
8: Read the frame buffer data from the videocard's texture memory into the computer's processor memory
9: Add the projected data of the current layer to the projection data of the previous layers
10: if the projected layer is not the last then
11: Go to 5:
12: else
13: Calculate the correcting image
14: Load the calculated correcting image into the texture memory
15: end if
16: Display the correcting image using hardware acceleration (graphics processor) on the current layer of the reconstructed object (frame buffer)
17: Read the frame buffer data from the videocard's texture memory into the computer's processor memory
18: Calculate the corrections for the object layer
19: Store the corrected layer in the computer memory
20: if the corrected layer is not the last then
21: Go to 16:
22: else
23: Go to 4:
24: end if
25: end while


As follows from the consideration of the detailed procedure of implementing the SART iterative algorithm with hardware support, only two stages of this procedure are hardware-accelerated via their implementation on the graphics processor of the videocard, namely, steps 7: and 16:. Note, first, that step 8: is performed very slowly, because data are transmitted from the texture memory to the computer's processor memory much more slowly than data are transmitted inside the texture memory. Second, the same refers to step 14:, at which the correcting image is loaded from the computer's RAM into the videocard's texture memory; finally, at step 17:, there is a deceleration, as at step 8:. It is clear that, in order to obtain the maximal speedup attributed to the use of graphics accelerators, it is necessary to perform all operations inside the texture memory of the videocard, addressing the computer's processor memory only when it is necessary to transmit the reconstructed 3D digital image. We developed and programmatically implemented a new algorithm that performs about 95% of all the steps of 3D tomographic reconstruction using only the videocard's texture memory. We describe the basic principles of this algorithm below. Because most workstations and personal computers are equipped with videocards whose frame buffer has an accuracy limited to 12 or sometimes 8 bits, one has to use the program buffer for tomographic reconstruction, as described earlier. The accumulation of the projected layers in the program buffer results in a catastrophic deceleration of the algorithm. For software developed under the OpenGL application graphics interface, a simpler and more efficient technique capable of avoiding these difficulties is the GL_EXT_framebuffer_object extension, which provides the basis for direct output into a texture.

The main advantage of the framebuffer object is that it supports floating-point data; i.e., it ensures an accuracy of 32 bits, which is quite sufficient for the algorithms of tomographic reconstruction. An additional convenience is that it allows one to obtain the correcting image directly in the frame buffer when projecting the layers of a reconstructed object and, from there, transfer it into the projective texture, which will be displayed at the location of the corrected layers of the recovered object during back projection. Below, we cite a code fragment, commented in detail, in the C++ programming language using the OpenGL graphics library that describes the developed direct projection process:

//Create frame buffer
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, g_frameBuffer);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_RECTANGLE_ARB, g_dynamicTextureID, 0);
//Set frame buffer as the input buffer and output buffer
glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
//Clear the frame buffer
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
//Enable 3D textures
glEnable(GL_TEXTURE_3D);
glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
glMatrixMode(GL_PROJECTION);
//Bind to the texture containing the reconstruction object
glBindTexture(GL_TEXTURE_3D, texName2[0]);
//External loop over the number of iterations and internal loop over the number of projections
for(kk=0;kk
The main advantage of the object of the frame buffer is that it supports floating-point data; i.e., it ensures an accuracy of 32 bits, which is quite sufficient for the algorithms of tomographic reconstruction. An additional convenience is that it allows one to obtain a correcting image just in the frame buffer when projecting the layers of a reconstructed object and, just from there, throw it into the projective structure, which will be displayed at the location of the corrected layers of the recovered object during back projection. Below, we cite a code fragment commented in detail in the C++ programming language using the OpenGL graphics library that describes the developed direct projection process in detail. //Create frame buffer glBindFramebufferEXT(GL_FRAMEBUFFER_EXT,g_frameBuffer); glFramebufferTexture2(GL_FRAMEBUFFER_EXT,GL_COLOR_ATTACH MENT0_EXT,GL_TEXTURE_RECTANGLE_ARB,g_dynamicTextureI,D0); //Set frame buffer as the input buffer and output buffer glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT); glReadBuffer(GL_COLOR_ATTACHMENT0_EXT); //Clear the frame buffer gLClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT); //Enable 3D textures glEnable(GL_TEXTURE_3D); gLTexEnvf(GL_TEXTURE_ENV,GL_TEXTURE_ENV_MODE,GL_RELACE); glMatrixMode(GL_PROJECTION); //Bind to the texture containing the reconstruction object glBindTexture(GL_TEXTURE_3D, texName2[0]); // External cycle with respect to the number of iterations and the internal cycle with respect to the number of projections for(kk=0;kk