Fast GPU-based ray tracing in radial GRIN lenses Shuma Horiuchi,* Shuhei Yoshida, and Manabu Yamamoto Department of Applied Electronics, Tokyo University of Science, 6-3-1 Niijuku, Katsushika-ku, Tokyo 125-8585, Japan *Corresponding author: [email protected] Received 9 April 2014; revised 28 May 2014; accepted 30 May 2014; posted 2 June 2014 (Doc. ID 209870); published 30 June 2014

This paper describes that in ray tracing of a radial gradient-index (GRIN) lens, analysis with a graphics processing unit (GPU) can complete faster than analysis with a central processing unit (CPU). The refractive index of the radial GRIN lens varies in the direction perpendicular to the optical axis. We prepare two types of the refractive index distribution for the radial GRIN lens. One type of distribution is represented by the power series expansion equation and the other is an arbitrary distribution type represented by a cubic spline interpolation curve. Although the performance of ray tracing with these distribution representations varies between representations, in both representations, the analysis with a GPU is about 19 times (on average) faster than that with a CPU. The average GPU effective performance is 90% and 40% when the refractive index distribution is given by the power series expansion equation and spline interpolation curve, respectively. These increased performances indicate that ray tracing with these distributions is very effective for the GRIN lens illumination and rendering analyses, which have to trace a significant number of rays. © 2014 Optical Society of America OCIS codes: (080.1753) Computation methods; (080.2740) Geometric optical design; (110.2760) Gradient-index lenses. http://dx.doi.org/10.1364/AO.53.004343

1. Introduction

The radial gradient-index (GRIN) lens has a refractive index distribution that varies in a perpendicular direction to the optical axis and has been used as an imaging lens. In addition, it can be applied in areas that include line sensors and optical fibers. There have been many reports [1–7] about the principle and methodology for the ray tracing performance analysis of optical systems in the GRIN media and GRIN lenses. There have also been many reports concerning ray tracing with a graphics processing unit (GPU), most of which have focused on the field of computer graphics [8,9]. In this paper, we will show that the radial GRIN lens can be represented in a form suitable for ray tracing with a GPU, and that with adoption of this form, ray tracing with a GPU can be faster than that performed with a central processing unit (CPU). It is known that a GPU can fully exert its performance in an algorithm in which the thread synchronization 1559-128X/14/194343-06$15.00/0 © 2014 Optical Society of America

and memory access are less frequent, and ray tracing in the GRIN lens does meet these conditions. Since rays propagating in the GRIN lens are mutually independent, they are not required to be synchronous. In ray tracing of the GRIN lens, the equations are sequentially solved in the GRIN media and the memory is not accessed in this sequential computation. Since the optical ray vector does not change along the optical axis of the radial GRIN lens, all of the sequential computation parts of the ray tracing equations can be represented with a form that can be solved with the product-sum operations. That is, since division and special function operations are not required, the equations can be solved at a high speed with the GPU. In the aforementioned discussions, it is assumed that the refractive index distribution is represented by power series expansion equations in which the higher-order aberration is included. However, in this paper, in addition to this power series expansion representation of the distribution, a distribution type represented with cubic spline interpolation is proposed. When the refractive index distribution is represented with cubic spline interpolation, the division operation, a special 1 July 2014 / Vol. 53, No. 19 / APPLIED OPTICS

4343

function, and access to memory are required to solve the equations. However, despite these additional requirements, the GPU can solve the equations faster than a CPU. These representations of the refractive index distribution that allow the equations to be solved at a high speed have been deemed effective and efficient in the analysis of the illumination optics and rendering [10] that include many rays of light.

(1)

where r is the variable-position vector on the ray, n is the refractive index, and ds is the geometric increment along the ray path. This equation establishes itself in the Cartesian coordinate system, and its z axis is on the optical axis. The optical ray vector is expressed by a p; q; l and the next relation holds true: (2)

Since l is constant in the radial GRIN lens, l in the above equation is denoted as l0 (l l0 ). Under these conditions, the first-order differential equation respectively for x, y, p, and q is transformed in the following equation [11]:

(3)

Rn1 Rn K1 2K2 2K3 K4 ∕6;

4344

APPLIED OPTICS / Vol. 53, No. 19 / 1 July 2014

K2 hDzn h∕2; Rn K1 ∕2;

B. Refractive Index Distribution in Radial GRIN Lenses

The refractive index distribution in the radial GRIN lens is given by the next power series expansion equation [12]: n2 r n20 sech2 gr 2 17 n20 1 − gr2 gr4 − gr6 ; 3 45

(5)

where n0 is the refractive index on the optical axis, and g is the gradient coefficient (1/length). In this paper, we assumed the lens to have a positive gradient (g > 0). To take the higher-order aberration into account, Eq. (5) is transformed and the next equation is derived: n2 r n20 1 − gr2 h4 gr4 h6 gr6 ;

(6)

where h4 and h6 are the higher-order aberration constants. The differentiation of Eq. (6) with respect to r results in the following equation: (7)

dp xn20 −g2 2h4 g4 r2 3h6 g6 r4 ; l0 dz

where r is the radial distance from the optical axis. These equations have to be solved in ray tracing of the radial GRIN lens. In this study, a set of Eq. (3) are solved with the fourth-order Runge–Kutta method, which gives the following equations:

where

K1 hDzn ; Rn ;

Thus, the third and fourth equation in a set of Eq. (3) can be transformed into the following equations:

dp x ∂n2 ; dz 2l0 r ∂r

zn1 zn h;

B C B dy∕dz C C Dz; R B B dp∕dz C; @ A dq∕dz

∂n2 r 2rn20 −g2 2h4 g4 r2 3h6 g6 r4 : ∂r

dx p ; dz l0 dy q ; dz l0

dq y ∂n2 ; dz 2l0 r ∂r

1

h is the minimum calculation step along the z axis.

The ray equations used in ray tracing are given in the following:

p2 q2 l2 n2 :

dx∕dz

K4 hDzn h; Rn K3 ;

Ray Tracing in Radial GRIN Lenses

d dr n ∇n; ds ds

0

K3 hDzn h∕2; Rn K2 ∕2;

2. Numerical Analysis of Radial GRIN Lenses A.

0 1 x B C ByC C RB B p C; @ A q

(4)

dq yn20 −g2 2h4 g4 r2 3h6 g6 r4 : l0 dz

(8)

Here, we extend our discussions to a GRIN lens whose the refractive index distribution is of a peculiar shape. In other words, the discussion is extended to a GRIN lens in which it is difficult to represent the refractive index distribution through Eq. (6). A GRIN lens that has been designed and manufactured to have a peculiar distribution (or defects) and a prototype GRIN lens are typical samples of a GRIN lens with a peculiar distribution (e.g., see [13]). In this paper, we prepare equations for a GRIN lens whose the refractive index distribution is represented with cubic spline interpolation in order to analyze a GRIN

lens that has a particular refractive index distribution, such as those above. With the introduction of cubic spline interpolation, the refractive index distribution can be represented by the following equation: ni r C0i r − ri 3 C1i r − ri 2 C2i r − ri C3i ; (9) where i is an index (i 0; 1; …; N D − 1), N D is the total number of data points, C0 , C1 , C2 , and C3 are the interpolation constants. The differentiation of Eq. (9) with respect to r results in the following equation: ∂ni r 3C0i r − ri 2 2C1i r − ri C2i : ∂r

Create threads (N threads) Set ray (xi, yi, zi, pi, qi, li) of thread i

Fourth-order Runge-Kutta zi + h To the exit pupil

(10) Transfer results from GPU memory

dp xni ∂ni x C r − ri 3 C1i r − ri 2 l0 r ∂r dz l0 r 0i C2i r − ri C3i 3C0i r − ri 2 2C1i r − ri C2i ; dq dp y : dz dz

Run Kernel

End of Kernel (Synchronization)

Thus, the third and fourth equation in a set of Eq. (3) can be transformed into the following equations:

x

Transfer to GPU memory the coordinates and the directional vector of N rays on the entrance pupil

(11)

Note that this paper describes the ray tracing of a radial GRIN lens in which Eq. (11) becomes 0 when r is 0 because ∂ni ∕∂x and ∂ni ∕∂y is 0 at the optical axis of the radial GRIN lens. In addition, note that the refractive index distribution must be continuous. 3. Ray Tracing in Radial GRIN Lenses with GPU

The ray tracing equations for the radial GRIN lens have a form suitable for analyzing with the GPU. First, because each ray propagating in the GRIN lens can be analyzed independently, synchronization among the threads is not required within the GRIN lens. Next, the radial GRIN lens in this study has the following features: the optical ray vector along the optical axis does not vary. That is, since the value l0 is constant (i.e., does not change) the division operation of l0 in Eq. (8) can be represented with the form of the product. When we try to solve the ray tracing equations with the GPU, we find a significant decrease in the computation speed if the division operation and special function are required to solve the equations. However, when the refractive index distribution is represented with the power series expansion equation, a set of Eqs. (3), (4), and (8) can be described with the product-sum operations. To solve Eq. (6) while taking as high as the 10th order aberration into account, 120 times the product-sum operation is required to receive a solution at one calculation step, and this operation is repeated until the end of the calculation steps at the exit pupil of the GRIN lens. When this loop process is performed to arrive at a solution 500 times, it is necessary to

Fig. 1. Flowchart of ray tracing with GPU.

run the product-sum operations 60 thousand times to arrive at solutions for the entire length of the GRIN lens, even though this number is a function of the length of the GRIN lens and the minimum calculation step h. Since any thread synchronization operation and memory access are not required during the many repetitions, the performance of the GPU is fully exploited. A flowchart of these operations is shown in Fig. 1. When the refractive index distribution is represented with an interpolation curve, the process to solve the ray tracing equations differs from that used to solve equations whose distribution is represented with power series expansion equations. In solving the differential equation, the division operation with r is required in Eq. (11) and a square root operation is required to obtain the radial distance from the optical axis r, which lowers the computational speed. In addition to the above operations, the accessing of memory is required to get the interpolation coefficients, C0 through C3 in Eq. (9), and this access of the memory constitutes a bottleneck in the computation. A technique for improving the computation speed is described in the rest of this section. The first technique is to store the interpolation coefficients in the level 1 cache of the GPU shared memory. The capacity of the shared memory in the GPU of the present architecture is as small as several tens of KB but it can save about 1000 points of double precision refractive index distribution data because the data is one-dimensional data. This is a sufficient number of data points to configure a refractive index distribution. In analyses such as illumination analysis and rendering, rays are created with the Monte Carlo method, and the shared memory is randomly accessed; however, even in this analysis case it is expected that the computation speed will be faster than the normal technique. Furthermore, when the refractive index distribution data points are located at the radial distance and at the same interval between the points along the radial direction, an index search for the interpolation provides a unique search result. In 1 July 2014 / Vol. 53, No. 19 / APPLIED OPTICS

4345

other words, when the data point interval is Δr, an index i at this point is uniquely given by the following equations, and does not require search algorithms such as the binary search tree and interpolation search [14]:

Table 2.

Computation Speed Comparison between Ray Tracing with CPU and GPU

Computation Time [s] Equation

Precision

CPU

GPU

(CPU/GPU)

Float Double Float Double

188.4 370.6 790.0 1665.8

10.8 30.7 23.6a 128.1

17.4 12.1 33.5 13.0

Power series

(12)

With the application of these techniques in solving the equations, high-speed computation becomes feasible even when the refractive index distribution is represented with the interpolation curve. 4. Comparison Results of Computation Speed Measurements

This section indicates the measurements of computation speed when the equations are solved with the GPU. An NVIDIA Geforce GTX Titan was used in solving the equations. The theoretical performance of this GPU is 4.5 TFLOPS for single precision data processing and 1.31 TFLOPS for double precision data processing. A CUDA 5.5 compiler was used. The ray tracing computation time is measured at 655,360,000 rays, where this number of rays is obtained under the following assumptions: GRIN lens is divided into 500 (in other words, the Runge–Kutta computation is repeated 500 times) and the ray is computed in an area of 256 × 256 pixels at 10,000 samples per pixel (spp). Since the measurement in this study is to analyze the effective performance of a GPU, rays that are vignetted are not taken into account. In these measurements, the refractive index distribution was represented with the power series expansion equation in which an aberration as high as the 10th order was considered and the number of data points for an interpolation curve was 200. The computation speeds measured under the above conditions are listed in Table 1. When the refractive index distribution is represented with the power series expansion equation, since all the ray tracing equations can be solved by the product-sum operations and do not require the synchronization operations between the threads, the performance of the GPU is fully exploited. The computation speed analyzed with the CPU is compared with the speed analyzed with the GPU and this comparison of the speed is listed in Table 2. The CPU used in the Table 1.

Equation Power series Spline

Time Required Completing Ray Tracing with GPU

Precision Float Double Float Double

Computation FLOPS Effective Time [s] [G] Performance [%] 10.8 30.7 23.6a 128.1

3671.7 1291.7 2180.0 401.6

81.6 98.6 48.4 30.7

a When shared memory is not used, the computation time is 46.1 and 168.0 s, respectively, for single precision and double precision computation. These computation times are 2 and 1.3 times longer than the times required when shared memory is used.

4346

APPLIED OPTICS / Vol. 53, No. 19 / 1 July 2014

Spline

a When the application fast math [15] (which can approximate the special function and division operation with a low degree of accuracy) is not used with the GPU, the calculation speed is 37.9 s, which is 20.8 times faster than the speed when the CPU is used.

measurements was an Intel Core i7-4770 and the single instruction multiple data (SIMD) operations were repeated with the Intel AVX2 instruction set. The compiler used was an Intel C++ Compiler XE 14.0. In the SIMD operations, fused multiply add (FMA) was used. Additionally the OpenMP command was used to perform the parallel computing between CPU cores. In this comparison the speed when the GPU is used is 19 times (i.e., an average of ratios in Table 2) faster than the speed when the CPU is used, although different speeds will be obtained depending on the performance of the CPU and GPU. This comparison reveals that the technique (algorithm) described in this paper can provide a computation speed that is much faster than the speed obtained by the other algorithm [16] in which the GPU was used. In the rest of this section, the relationship between the number of rays used in the computation and the GPU performance is discussed. The computation speed is measured for various numbers of rays to obtain the GPU effective performance, which is shown in Fig. 2. In this computation, it is assumed 100

Effective performance [%]

i floorr∕Δr:

Ratio

80

60

40

20

0

1

10

100

1000

10000

Samples per pixel Fig. 2. Relationship between the number of rays and the GPU’s effective performance: the plot connecting points with circles is the data whose refractive index distribution is represented by the power series expansion equation, and that connecting points with triangles is the data whose refractive index distribution is represented by the spline interpolation curve. The solid plot line is computed with a single precision (floating type) computation, and the dashed plot line is computed with a double precision (double type) computation. The total number of rays is 256 × 256 spp.

1.556

Refractive index n

1.554 1.552 1.550 1.548 1.546 1.544

0

0.02

0.04

0.06

0.08

0.1

0.12

Radial distance r [mm] Fig. 4. Refractive index distribution of the GRIN lens: this is an example of the distribution of the prototype GRIN lens. The date number, N D , is 45. 40

Ratio (CPU/GPU)

that the GRIN lens is divided by 500 and the number of pixels is 256 × 256. According to Fig. 2, under the aforementioned conditions, the GPU performance begins to degrade when the number of rays reaches approximately six million. When the number reaches 65,536, the GPU performance degrades by a maximum of approximately 20%. This relationship between GPU performance and the number of rays will change with the GRIN lens division number (length). Even when the number of rays is small, the effective performance of the GPU increases as the lens division number increases. Due to this, as shown in Fig. 1, the number of loops to repeat the Runge– Kutta operation will reach the division number and will cause the operation load in one thread to increase. In addition, the relationship between the computation speed and the vignetting of the rays within the GRIN lens has been investigated. The shape of the refractive index distribution is related to the vignetting ratio of the ray within the GRIN lens. Thus far the discussions, as well as the resulting conclusions, have been obtained under the assumption that there is no vignetting of the ray; however, when there is vignetting in the lens the GPU and CPU computation times will be different than when it has been assumed that there is no vignetting. Next, the GPU and CPU computation times were measured at various vignetting rates. This measurement was made with the optical system shown in Fig. 3, and the GRIN lens which has the refractive index distribution is shown in Fig. 4. As can be seen in Fig. 4, the GRIN lens used in the measurement has poor imaging performance, which causes a lot of vignetting. This GRIN lens is laid out in the system as shown in Fig. 3. The vignetting ratio of the light was varied by changing the diameter of the circular light source from 0.5 to 8 times that of the diameter of the entrance pupil. The total number of rays used in the computation is 655,360,000. The CPU computation speed and GPU computation speed were compared under the aforementioned conditions at various vignetting ratios and these results are shown in Fig. 5. The ratio of CPU/GPU computation speed decreases as the vignetting ratio increases because the amount of

30

20

10

0

0

20

40

60

80

100

Vignetting ratio [%] Fig. 5. Ratio of CPU and GPU computation speeds (CPU/GPU) versus the ray vignetting ratio. The solid plot line is computed with a single precision (floating type) computation and the dashed plot line is computed with a double precision (double type) computation.

data that can be processed with SIMD operations is different from GPU to CPU. In the SIMD operations in the CPU used in this study, 256-bit data can be processed, and eight sets of the float type data and four sets of the double type data can be processed respectively in one cycle. Whereas in the GPU, the SIMD operations are carried out in the execution operator group (called warp) that is composed of 32 threads. The difference in the amount of data processed in these SIMD operations influences the vignetting ratio and computation speed. 5. Conclusion

Rays

h = 4.7 mm / 500 0.22 mm

Light source GRIN rod lens

z

1.3 mm 4.7 mm 1.3 mm

Imaging plane

Fig. 3. GRIN lens layout: the working distance is 1.3 mm and the lens length is 4.7 mm. Rays from the light source are circularly concentrated on the lens.

In this paper, it was shown that in ray tracing of the radial GRIN lens, analysis performed with a GPU could be completed 19 times (on average) faster than analysis performed with a CPU. As for representation of the refractive index distribution, we have prepared cubic spline interpolation in addition to the conventional power series expansion equation and shown the technique to enable faster computation of equations with a GPU. References 1. E. W. Marchand, “Ray tracing in gradient-index media,” J. Opt. Soc. Am. 60, 1–2 (1970). 1 July 2014 / Vol. 53, No. 19 / APPLIED OPTICS

4347

2. A. Sharma, D. V. Kumar, and A. K. Ghatak, “Tracing rays through graded-index media: a new method,” Appl. Opt. 21, 984–987 (1982). 3. A. Sharma, “Computing optical path length in gradient-index media: a fast and accurate method,” Appl. Opt. 24, 4367–4370 (1985). 4. A. Sharma and A. K. Ghatak, “Ray tracing in gradient-index lenses: computation of ray-surface intersection,” Appl. Opt. 25, 3409–3412 (1986). 5. T. Sakamoto, “Ray trace algorithms for GRIN media,” Appl. Opt. 26, 2943–2946 (1987). 6. D. T. Moore, “Ray tracing in gradient-index media,” J. Opt. Soc. Am. 65, 451–455 (1975). 7. W. H. Southwell, “Ray tracing in gradient-index media,” J. Opt. Soc. Am. 72, 908–911 (1982). 8. J. Krüger and R. Westermann, “Acceleration techniques for GPU-based volume rendering,” in Proceedings IEEE Visualization 2003 (IEEE, 2003), pp. 287–292. 9. D. Weiskopf, T. Schafhitzel, and T. Ertl, “GPU-based nonlinear ray tracing,” Comput. Graph. Forum 23, 625–633 (2004).

4348

APPLIED OPTICS / Vol. 53, No. 19 / 1 July 2014

10. S. Horiuchi, S. Yoshida, and M. Yamamoto, “Simulation of modulation transfer function using a rendering method,” Opt. Express 21, 7373–7383 (2013). 11. J. Puchalski, “Numerical determination of ray tracing: a new method,” Appl. Opt. 31, 6789–6799 (1992). 12. S. Kawakami and J. Nishizawa, “An optical waveguide with the optimum distribution of the refractive index with reference to waveform distortion,” IEEE. Trans. Microwave Theory Tech. 16, 814–818 (1968). 13. S. Horiuchi, S. Yoshida, Z. Ushiyama, and M. Yamamoto, “Performance evaluation of GRIN lenses by ray tracing method,” Opt. Quantum Electron. 42, 81–88 (2010). 14. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms (MIT, 2001). 15. “NVIDIA CUDA Math API,” http://docs.nvidia.com/cuda/ cuda‑math‑api/. 16. V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey, “Debunking the 100× GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU,” in ACM SIGARCH Computer Architecture News (ACM, 2010), Vol. 38, pp. 451–460.

This paper describes that in ray tracing of a radial gradient-index (GRIN) lens, analysis with a graphics processing unit (GPU) can complete faster than analysis with a central processing unit (CPU). The refractive index of the radial GRIN lens varies in the direction perpendicular to the optical axis. We prepare two types of the refractive index distribution for the radial GRIN lens. One type of distribution is represented by the power series expansion equation and the other is an arbitrary distribution type represented by a cubic spline interpolation curve. Although the performance of ray tracing with these distribution representations varies between representations, in both representations, the analysis with a GPU is about 19 times (on average) faster than that with a CPU. The average GPU effective performance is 90% and 40% when the refractive index distribution is given by the power series expansion equation and spline interpolation curve, respectively. These increased performances indicate that ray tracing with these distributions is very effective for the GRIN lens illumination and rendering analyses, which have to trace a significant number of rays. © 2014 Optical Society of America OCIS codes: (080.1753) Computation methods; (080.2740) Geometric optical design; (110.2760) Gradient-index lenses. http://dx.doi.org/10.1364/AO.53.004343

1. Introduction

The radial gradient-index (GRIN) lens has a refractive index distribution that varies in a perpendicular direction to the optical axis and has been used as an imaging lens. In addition, it can be applied in areas that include line sensors and optical fibers. There have been many reports [1–7] about the principle and methodology for the ray tracing performance analysis of optical systems in the GRIN media and GRIN lenses. There have also been many reports concerning ray tracing with a graphics processing unit (GPU), most of which have focused on the field of computer graphics [8,9]. In this paper, we will show that the radial GRIN lens can be represented in a form suitable for ray tracing with a GPU, and that with adoption of this form, ray tracing with a GPU can be faster than that performed with a central processing unit (CPU). It is known that a GPU can fully exert its performance in an algorithm in which the thread synchronization 1559-128X/14/194343-06$15.00/0 © 2014 Optical Society of America

and memory access are less frequent, and ray tracing in the GRIN lens does meet these conditions. Since rays propagating in the GRIN lens are mutually independent, they are not required to be synchronous. In ray tracing of the GRIN lens, the equations are sequentially solved in the GRIN media and the memory is not accessed in this sequential computation. Since the optical ray vector does not change along the optical axis of the radial GRIN lens, all of the sequential computation parts of the ray tracing equations can be represented with a form that can be solved with the product-sum operations. That is, since division and special function operations are not required, the equations can be solved at a high speed with the GPU. In the aforementioned discussions, it is assumed that the refractive index distribution is represented by power series expansion equations in which the higher-order aberration is included. However, in this paper, in addition to this power series expansion representation of the distribution, a distribution type represented with cubic spline interpolation is proposed. When the refractive index distribution is represented with cubic spline interpolation, the division operation, a special 1 July 2014 / Vol. 53, No. 19 / APPLIED OPTICS

4343

function, and access to memory are required to solve the equations. However, despite these additional requirements, the GPU can solve the equations faster than a CPU. These representations of the refractive index distribution that allow the equations to be solved at a high speed have been deemed effective and efficient in the analysis of the illumination optics and rendering [10] that include many rays of light.

(1)

where r is the variable-position vector on the ray, n is the refractive index, and ds is the geometric increment along the ray path. This equation establishes itself in the Cartesian coordinate system, and its z axis is on the optical axis. The optical ray vector is expressed by a p; q; l and the next relation holds true: (2)

Since l is constant in the radial GRIN lens, l in the above equation is denoted as l0 (l l0 ). Under these conditions, the first-order differential equation respectively for x, y, p, and q is transformed in the following equation [11]:

(3)

Rn1 Rn K1 2K2 2K3 K4 ∕6;

4344

APPLIED OPTICS / Vol. 53, No. 19 / 1 July 2014

K2 hDzn h∕2; Rn K1 ∕2;

B. Refractive Index Distribution in Radial GRIN Lenses

The refractive index distribution in the radial GRIN lens is given by the next power series expansion equation [12]: n2 r n20 sech2 gr 2 17 n20 1 − gr2 gr4 − gr6 ; 3 45

(5)

where n0 is the refractive index on the optical axis, and g is the gradient coefficient (1/length). In this paper, we assumed the lens to have a positive gradient (g > 0). To take the higher-order aberration into account, Eq. (5) is transformed and the next equation is derived: n2 r n20 1 − gr2 h4 gr4 h6 gr6 ;

(6)

where h4 and h6 are the higher-order aberration constants. The differentiation of Eq. (6) with respect to r results in the following equation: (7)

dp xn20 −g2 2h4 g4 r2 3h6 g6 r4 ; l0 dz

where r is the radial distance from the optical axis. These equations have to be solved in ray tracing of the radial GRIN lens. In this study, a set of Eq. (3) are solved with the fourth-order Runge–Kutta method, which gives the following equations:

where

K1 hDzn ; Rn ;

Thus, the third and fourth equation in a set of Eq. (3) can be transformed into the following equations:

dp x ∂n2 ; dz 2l0 r ∂r

zn1 zn h;

B C B dy∕dz C C Dz; R B B dp∕dz C; @ A dq∕dz

∂n2 r 2rn20 −g2 2h4 g4 r2 3h6 g6 r4 : ∂r

dx p ; dz l0 dy q ; dz l0

dq y ∂n2 ; dz 2l0 r ∂r

1

h is the minimum calculation step along the z axis.

The ray equations used in ray tracing are given in the following:

p2 q2 l2 n2 :

dx∕dz

K4 hDzn h; Rn K3 ;

Ray Tracing in Radial GRIN Lenses

d dr n ∇n; ds ds

0

K3 hDzn h∕2; Rn K2 ∕2;

2. Numerical Analysis of Radial GRIN Lenses A.

0 1 x B C ByC C RB B p C; @ A q

(4)

dq yn20 −g2 2h4 g4 r2 3h6 g6 r4 : l0 dz

(8)

Here, we extend our discussions to a GRIN lens whose the refractive index distribution is of a peculiar shape. In other words, the discussion is extended to a GRIN lens in which it is difficult to represent the refractive index distribution through Eq. (6). A GRIN lens that has been designed and manufactured to have a peculiar distribution (or defects) and a prototype GRIN lens are typical samples of a GRIN lens with a peculiar distribution (e.g., see [13]). In this paper, we prepare equations for a GRIN lens whose the refractive index distribution is represented with cubic spline interpolation in order to analyze a GRIN

lens that has a particular refractive index distribution, such as those above. With the introduction of cubic spline interpolation, the refractive index distribution can be represented by the following equation: ni r C0i r − ri 3 C1i r − ri 2 C2i r − ri C3i ; (9) where i is an index (i 0; 1; …; N D − 1), N D is the total number of data points, C0 , C1 , C2 , and C3 are the interpolation constants. The differentiation of Eq. (9) with respect to r results in the following equation: ∂ni r 3C0i r − ri 2 2C1i r − ri C2i : ∂r

Create threads (N threads) Set ray (xi, yi, zi, pi, qi, li) of thread i

Fourth-order Runge-Kutta zi + h To the exit pupil

(10) Transfer results from GPU memory

dp xni ∂ni x C r − ri 3 C1i r − ri 2 l0 r ∂r dz l0 r 0i C2i r − ri C3i 3C0i r − ri 2 2C1i r − ri C2i ; dq dp y : dz dz

Run Kernel

End of Kernel (Synchronization)

Thus, the third and fourth equation in a set of Eq. (3) can be transformed into the following equations:

x

Transfer to GPU memory the coordinates and the directional vector of N rays on the entrance pupil

(11)

Note that this paper describes the ray tracing of a radial GRIN lens in which Eq. (11) becomes 0 when r is 0 because ∂ni ∕∂x and ∂ni ∕∂y is 0 at the optical axis of the radial GRIN lens. In addition, note that the refractive index distribution must be continuous. 3. Ray Tracing in Radial GRIN Lenses with GPU

The ray tracing equations for the radial GRIN lens have a form suitable for analyzing with the GPU. First, because each ray propagating in the GRIN lens can be analyzed independently, synchronization among the threads is not required within the GRIN lens. Next, the radial GRIN lens in this study has the following features: the optical ray vector along the optical axis does not vary. That is, since the value l0 is constant (i.e., does not change) the division operation of l0 in Eq. (8) can be represented with the form of the product. When we try to solve the ray tracing equations with the GPU, we find a significant decrease in the computation speed if the division operation and special function are required to solve the equations. However, when the refractive index distribution is represented with the power series expansion equation, a set of Eqs. (3), (4), and (8) can be described with the product-sum operations. To solve Eq. (6) while taking as high as the 10th order aberration into account, 120 times the product-sum operation is required to receive a solution at one calculation step, and this operation is repeated until the end of the calculation steps at the exit pupil of the GRIN lens. When this loop process is performed to arrive at a solution 500 times, it is necessary to

Fig. 1. Flowchart of ray tracing with GPU.

run the product-sum operations 60 thousand times to arrive at solutions for the entire length of the GRIN lens, even though this number is a function of the length of the GRIN lens and the minimum calculation step h. Since any thread synchronization operation and memory access are not required during the many repetitions, the performance of the GPU is fully exploited. A flowchart of these operations is shown in Fig. 1. When the refractive index distribution is represented with an interpolation curve, the process to solve the ray tracing equations differs from that used to solve equations whose distribution is represented with power series expansion equations. In solving the differential equation, the division operation with r is required in Eq. (11) and a square root operation is required to obtain the radial distance from the optical axis r, which lowers the computational speed. In addition to the above operations, the accessing of memory is required to get the interpolation coefficients, C0 through C3 in Eq. (9), and this access of the memory constitutes a bottleneck in the computation. A technique for improving the computation speed is described in the rest of this section. The first technique is to store the interpolation coefficients in the level 1 cache of the GPU shared memory. The capacity of the shared memory in the GPU of the present architecture is as small as several tens of KB but it can save about 1000 points of double precision refractive index distribution data because the data is one-dimensional data. This is a sufficient number of data points to configure a refractive index distribution. In analyses such as illumination analysis and rendering, rays are created with the Monte Carlo method, and the shared memory is randomly accessed; however, even in this analysis case it is expected that the computation speed will be faster than the normal technique. Furthermore, when the refractive index distribution data points are located at the radial distance and at the same interval between the points along the radial direction, an index search for the interpolation provides a unique search result. In 1 July 2014 / Vol. 53, No. 19 / APPLIED OPTICS

4345

other words, when the data point interval is Δr, an index i at this point is uniquely given by the following equations, and does not require search algorithms such as the binary search tree and interpolation search [14]:

Table 2.

Computation Speed Comparison between Ray Tracing with CPU and GPU

Computation Time [s] Equation

Precision

CPU

GPU

(CPU/GPU)

Float Double Float Double

188.4 370.6 790.0 1665.8

10.8 30.7 23.6a 128.1

17.4 12.1 33.5 13.0

Power series

(12)

With the application of these techniques in solving the equations, high-speed computation becomes feasible even when the refractive index distribution is represented with the interpolation curve. 4. Comparison Results of Computation Speed Measurements

This section indicates the measurements of computation speed when the equations are solved with the GPU. An NVIDIA Geforce GTX Titan was used in solving the equations. The theoretical performance of this GPU is 4.5 TFLOPS for single precision data processing and 1.31 TFLOPS for double precision data processing. A CUDA 5.5 compiler was used. The ray tracing computation time is measured at 655,360,000 rays, where this number of rays is obtained under the following assumptions: GRIN lens is divided into 500 (in other words, the Runge–Kutta computation is repeated 500 times) and the ray is computed in an area of 256 × 256 pixels at 10,000 samples per pixel (spp). Since the measurement in this study is to analyze the effective performance of a GPU, rays that are vignetted are not taken into account. In these measurements, the refractive index distribution was represented with the power series expansion equation in which an aberration as high as the 10th order was considered and the number of data points for an interpolation curve was 200. The computation speeds measured under the above conditions are listed in Table 1. When the refractive index distribution is represented with the power series expansion equation, since all the ray tracing equations can be solved by the product-sum operations and do not require the synchronization operations between the threads, the performance of the GPU is fully exploited. The computation speed analyzed with the CPU is compared with the speed analyzed with the GPU and this comparison of the speed is listed in Table 2. The CPU used in the Table 1.

Equation Power series Spline

Time Required Completing Ray Tracing with GPU

Precision Float Double Float Double

Computation FLOPS Effective Time [s] [G] Performance [%] 10.8 30.7 23.6a 128.1

3671.7 1291.7 2180.0 401.6

81.6 98.6 48.4 30.7

a When shared memory is not used, the computation time is 46.1 and 168.0 s, respectively, for single precision and double precision computation. These computation times are 2 and 1.3 times longer than the times required when shared memory is used.

4346

APPLIED OPTICS / Vol. 53, No. 19 / 1 July 2014

Spline

a When the application fast math [15] (which can approximate the special function and division operation with a low degree of accuracy) is not used with the GPU, the calculation speed is 37.9 s, which is 20.8 times faster than the speed when the CPU is used.

measurements was an Intel Core i7-4770 and the single instruction multiple data (SIMD) operations were repeated with the Intel AVX2 instruction set. The compiler used was an Intel C++ Compiler XE 14.0. In the SIMD operations, fused multiply add (FMA) was used. Additionally the OpenMP command was used to perform the parallel computing between CPU cores. In this comparison the speed when the GPU is used is 19 times (i.e., an average of ratios in Table 2) faster than the speed when the CPU is used, although different speeds will be obtained depending on the performance of the CPU and GPU. This comparison reveals that the technique (algorithm) described in this paper can provide a computation speed that is much faster than the speed obtained by the other algorithm [16] in which the GPU was used. In the rest of this section, the relationship between the number of rays used in the computation and the GPU performance is discussed. The computation speed is measured for various numbers of rays to obtain the GPU effective performance, which is shown in Fig. 2. In this computation, it is assumed 100

Effective performance [%]

i floorr∕Δr:

Ratio

80

60

40

20

0

1

10

100

1000

10000

Samples per pixel Fig. 2. Relationship between the number of rays and the GPU’s effective performance: the plot connecting points with circles is the data whose refractive index distribution is represented by the power series expansion equation, and that connecting points with triangles is the data whose refractive index distribution is represented by the spline interpolation curve. The solid plot line is computed with a single precision (floating type) computation, and the dashed plot line is computed with a double precision (double type) computation. The total number of rays is 256 × 256 spp.

1.556

Refractive index n

1.554 1.552 1.550 1.548 1.546 1.544

0

0.02

0.04

0.06

0.08

0.1

0.12

Radial distance r [mm] Fig. 4. Refractive index distribution of the GRIN lens: this is an example of the distribution of the prototype GRIN lens. The date number, N D , is 45. 40

Ratio (CPU/GPU)

that the GRIN lens is divided by 500 and the number of pixels is 256 × 256. According to Fig. 2, under the aforementioned conditions, the GPU performance begins to degrade when the number of rays reaches approximately six million. When the number reaches 65,536, the GPU performance degrades by a maximum of approximately 20%. This relationship between GPU performance and the number of rays will change with the GRIN lens division number (length). Even when the number of rays is small, the effective performance of the GPU increases as the lens division number increases. Due to this, as shown in Fig. 1, the number of loops to repeat the Runge– Kutta operation will reach the division number and will cause the operation load in one thread to increase. In addition, the relationship between the computation speed and the vignetting of the rays within the GRIN lens has been investigated. The shape of the refractive index distribution is related to the vignetting ratio of the ray within the GRIN lens. Thus far the discussions, as well as the resulting conclusions, have been obtained under the assumption that there is no vignetting of the ray; however, when there is vignetting in the lens the GPU and CPU computation times will be different than when it has been assumed that there is no vignetting. Next, the GPU and CPU computation times were measured at various vignetting rates. This measurement was made with the optical system shown in Fig. 3, and the GRIN lens which has the refractive index distribution is shown in Fig. 4. As can be seen in Fig. 4, the GRIN lens used in the measurement has poor imaging performance, which causes a lot of vignetting. This GRIN lens is laid out in the system as shown in Fig. 3. The vignetting ratio of the light was varied by changing the diameter of the circular light source from 0.5 to 8 times that of the diameter of the entrance pupil. The total number of rays used in the computation is 655,360,000. The CPU computation speed and GPU computation speed were compared under the aforementioned conditions at various vignetting ratios and these results are shown in Fig. 5. The ratio of CPU/GPU computation speed decreases as the vignetting ratio increases because the amount of

30

20

10

0

0

20

40

60

80

100

Vignetting ratio [%] Fig. 5. Ratio of CPU and GPU computation speeds (CPU/GPU) versus the ray vignetting ratio. The solid plot line is computed with a single precision (floating type) computation and the dashed plot line is computed with a double precision (double type) computation.

data that can be processed with SIMD operations is different from GPU to CPU. In the SIMD operations in the CPU used in this study, 256-bit data can be processed, and eight sets of the float type data and four sets of the double type data can be processed respectively in one cycle. Whereas in the GPU, the SIMD operations are carried out in the execution operator group (called warp) that is composed of 32 threads. The difference in the amount of data processed in these SIMD operations influences the vignetting ratio and computation speed. 5. Conclusion

Rays

h = 4.7 mm / 500 0.22 mm

Light source GRIN rod lens

z

1.3 mm 4.7 mm 1.3 mm

Imaging plane

Fig. 3. GRIN lens layout: the working distance is 1.3 mm and the lens length is 4.7 mm. Rays from the light source are circularly concentrated on the lens.

In this paper, it was shown that in ray tracing of the radial GRIN lens, analysis performed with a GPU could be completed 19 times (on average) faster than analysis performed with a CPU. As for representation of the refractive index distribution, we have prepared cubic spline interpolation in addition to the conventional power series expansion equation and shown the technique to enable faster computation of equations with a GPU. References 1. E. W. Marchand, “Ray tracing in gradient-index media,” J. Opt. Soc. Am. 60, 1–2 (1970). 1 July 2014 / Vol. 53, No. 19 / APPLIED OPTICS

4347

2. A. Sharma, D. V. Kumar, and A. K. Ghatak, “Tracing rays through graded-index media: a new method,” Appl. Opt. 21, 984–987 (1982). 3. A. Sharma, “Computing optical path length in gradient-index media: a fast and accurate method,” Appl. Opt. 24, 4367–4370 (1985). 4. A. Sharma and A. K. Ghatak, “Ray tracing in gradient-index lenses: computation of ray-surface intersection,” Appl. Opt. 25, 3409–3412 (1986). 5. T. Sakamoto, “Ray trace algorithms for GRIN media,” Appl. Opt. 26, 2943–2946 (1987). 6. D. T. Moore, “Ray tracing in gradient-index media,” J. Opt. Soc. Am. 65, 451–455 (1975). 7. W. H. Southwell, “Ray tracing in gradient-index media,” J. Opt. Soc. Am. 72, 908–911 (1982). 8. J. Krüger and R. Westermann, “Acceleration techniques for GPU-based volume rendering,” in Proceedings IEEE Visualization 2003 (IEEE, 2003), pp. 287–292. 9. D. Weiskopf, T. Schafhitzel, and T. Ertl, “GPU-based nonlinear ray tracing,” Comput. Graph. Forum 23, 625–633 (2004).

4348

APPLIED OPTICS / Vol. 53, No. 19 / 1 July 2014

10. S. Horiuchi, S. Yoshida, and M. Yamamoto, “Simulation of modulation transfer function using a rendering method,” Opt. Express 21, 7373–7383 (2013). 11. J. Puchalski, “Numerical determination of ray tracing: a new method,” Appl. Opt. 31, 6789–6799 (1992). 12. S. Kawakami and J. Nishizawa, “An optical waveguide with the optimum distribution of the refractive index with reference to waveform distortion,” IEEE. Trans. Microwave Theory Tech. 16, 814–818 (1968). 13. S. Horiuchi, S. Yoshida, Z. Ushiyama, and M. Yamamoto, “Performance evaluation of GRIN lenses by ray tracing method,” Opt. Quantum Electron. 42, 81–88 (2010). 14. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms (MIT, 2001). 15. “NVIDIA CUDA Math API,” http://docs.nvidia.com/cuda/ cuda‑math‑api/. 16. V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey, “Debunking the 100× GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU,” in ACM SIGARCH Computer Architecture News (ACM, 2010), Vol. 38, pp. 451–460.