Fast and Automatic Heart Isolation in 3D CT ... - Semantic Scholar

7 downloads 84 Views 1MB Size Report
lation is also necessary in radiotherapy planning to mask out the heart for the treatment of ... calcified plaque and green for a soft plaque). needs to be masked ...
Fast and Automatic Heart Isolation in 3D CT Volumes: Optimal Shape Initialization Yefeng Zheng1 , Fernando Vega-Higuera2 , S. Kevin Zhou1 , and Dorin Comaniciu1 1

2

Siemens Corporate Research, Princeton, USA, Computed Tomography, Siemens Healthcare, Forchheim, Germany [email protected]

Abstract. Heart isolation (separating the heart from the proximity tissues, e.g., lung, liver, and rib cage) is a prerequisite to clearly visualize the coronary arteries in 3D. Such a 3D visualization provides an intuitive view to physicians to diagnose suspicious coronary segments. Heart isolation is also necessary in radiotherapy planning to mask out the heart for the treatment of lung or liver tumors. In this paper, we propose an efficient and robust method for heart isolation in computed tomography (CT) volumes. Marginal space learning (MSL) is used to efficiently estimate the position, orientation, and scale of the heart. An optimal mean shape (which optimally represents the whole shape population) is then aligned with detected pose, followed by boundary refinement using a learning-based boundary detector. Post-processing is further exploited to exclude the rib cage from the heart mask. A large-scale experiment on 589 volumes (including both contrasted and non-contrasted scans) from 288 patients demonstrates the robustness of the approach. It achieves a mean point-to-mesh error of 1.91 mm. Running at a speed of 1.5 s/volume, it is at least 10 times faster than the previous methods.

1

Introduction

While most previous work on heart segmentation focuses on segmenting heart chambers [1], segmenting the heart as a whole (or heart isolation) has a real interest in several applications. For example, after separating the heart from the proximity tissues (e.g., lung, liver, and rib cage), we can clearly visualize the coronary arteries in 3D, as shown in Fig. 1. Such 3D visualization provides an intuitive view to physicians to diagnose suspicious coronary segments (as indicated by the black and green arrows in Fig. 1). For this application, generally, the patient is scanned with contrast agent applied for better visualization of the coronary arteries (see the top row of Fig. 2). The segmented heart mask should not cut the coronary vessels, which are located on the surface of the heart chambers. This presents a big challenge to the segmentation algorithms. The second application of heart isolation is radiotherapy planning. Usually, radiotherapists need to delineate, either manually or automatically, the boundary of the sensitive organs that must not be affected by radiation. The heart often

Fig. 1. Heart isolation for 3D visualization of the coronary arteries. Left: Before heart isolation. Right: After heart isolation. Arrows indicate suspicious regions (black for a calcified plaque and green for a soft plaque).

needs to be masked out for the treatment of lung or liver tumors. Normally, a non-contrasted volume, as shown in the bottom row of Fig. 2, is often used for radiotherapy planning. Heart isolation is a hard problem due to the following challenges. 1) The boundary between the heart and some of the neighboring tissues (e.g., liver and diaphragm) is quite weak in a CT volume. 2) The heart is connected to other organs by several major vessel trunks (e.g. aorta, vena cava, pulmonary veins, and pulmonary arteries). We must cut those trunks somewhere (normally at the position where the vessels connect to the heart), though there is no image boundary. 3) The deformation of the whole heart in a cardiac cycle is more complicated than each individual chamber. This brings a large variation in the heart shape. Furthermore, there are quite a few scans with a part of the heart missing in the captured volume, especially at the top or bottom of the heart, which introduces extra shape variation. 4) We are targeting both contrasted and non-contrasted data, instead of just one homogeneous set (e.g., [2] for contrasted data and [3] for non-contrasted data). This presents an additional challenge. There are only a limited number of papers on heart isolation. The atlas based methods are often used to segment the heart. For example, Rikvoort et al. [4] presented an adaptive local multi-atlas based approach. It took about 30 minutes to segment a scan. Lelieveldt et al. [5] proposed another atlas based approach, segmenting several organs (e.g., lung, heart, and liver) in a thoracic scan using a hierarchical organ model. Their approach only provided a rough segmentation and an error as large as 10 mm was regarded as a correct segmentation. It took 5 to 20 minutes to process one volume. Gregson et al. [6] proposed to segment the lungs first and the heart was approximated as a sphere between the left and right lungs. Moreno et al. [3] presented a more thorough model for the geometric relationship between lungs and the heart. Funka-Lea et al. [2] proposed an automatic approach based on graph cut. They used the volumetric barycenter weighted by intensity as an initial estimate of the heart center. A small ellipsoid was put at the estimated heart center and progressively grown until it touched the transition between heart and lung (which was easy to detect in a CT volume).

Fig. 2. Heart isolation for a contrasted scan (top row) and a non-contrasted scan (bottom row). The first three columns show orthogonal cuts of the volume with green contours showing the automatically segmented heart surface mesh. The last column is 3D visualization of the segmented heart.

Graph cut was then applied to achieve the final detailed boundary delineation. It took about 20 seconds to process one volume, which was still slow for a clinical application. In this paper, we present an efficient and fully automatic approach for heart isolation. First, marginal space learning (MSL) [1] is exploited to efficiently estimate the position, orientation, and scale of the heart. A mean shape (which is trained on a set of given example shapes) is then aligned with the estimated pose as an initial estimate of the shape. We then use learning based boundary detectors to guide the boundary evolution under the active shape model (ASM) framework [7]. Due to the large deformation of the heart and complication of the surrounding tissues, the initialization using the aligned mean shape needs to be accurate, otherwise, the final boundary may get stuck in a wrong position. In this paper, we propose a method to search for the best mean shape, such that it can optimally represent the whole population of the heart shapes. For the application to coronary artery visualization, we have to completely remove the bright tissues (mostly the rib cage) surrounding the heart surface. Otherwise, they may block the coronary arteries in a 3D visualization. A post-processing step is further applied to exclude the rib cage. Our approach achieves more robust results than the previous methods and works for both contrasted and non-contrasted scans. It runs at about 1.5 s/volume, which is at least 10 times faster than the previous methods [2, 4, 5].

2 2.1

Machine Learning Based Approach Marginal Space Learning for 3D Pose Estimation

Recently, marginal space learning (MSL) [1] was proposed as an efficient and robust method for 3D anatomical structure detection in medical images. In MSL,

object detection or localization is formulated as a binary classification problem: whether an image block contains the target object or not. During detection, the object can be found by testing exhaustively all possible combinations of locations, orientations, and scale using the trained classifier. However, exhaustive searching is very time consuming. The idea of MSL is not to learn a monolithic classifier, but split the estimation into three steps: position estimation, positionorientation estimation, and position-orientation-scale estimation. Each step can significantly prune the searching space, therefore resulting in an efficient object detection algorithm. Please refer to [1] for more details of MSL. 2.2

Optimal Mean Shape for Accurate Shape Initialization

After MSL based object pose estimation, we align the mean shape (which is trained on a set of example shapes) with the estimated translation, rotation, and scale as an initial shape. This initialization needs to be accurate. Otherwise, the final boundary evolution may get stuck in a wrong position due to the complication of the surrounding tissues (e.g., liver and rib cage). The mean shape is generally calculated as the average of the normalized shapes in an objectcentered coordinate system. Therefore, the mean shape depends on the definition of the object-centered coordinate system, which is often set heuristically. In [1], the orientation of a heart chamber is defined by its long axis; the position and scale are determined by the bounding box of the chamber surface mesh. Although working well in applications with relatively small shape variations, the mean shape derived using the previous methods is not optimal at all. In this paper, we present an approach to searching for an optimal mean shape m ¯ that can represent the whole population well. A group of training shapes, M1 , M2 , . . . , MN are supposed to be given and each shape is represented by J points Mij , j = 1, . . . , J. The optimal mean shape m ¯ should minimizes the residual errors after alignment, m ¯ = arg min

N X

2

kTi (m) − Mi k .

(1)

m i=1

Here, Ti is the corresponding transformation from the mean shape m ¯ to each individual shape Mi . This procedure is called generalized Procrustes analysis [8] in literature. An iterative approach can be used to search for the optimal solution. We first randomly pick an example shape as a mean shape. We then align each shape to the current mean shape. The average of the aligned shapes (the simple average of the corresponding points) is calculated as a new mean shape. The iterative procedure converges to an optimal solution after a few iterations. Previously, the similarity transformation (with isotropic scaling) is often used as the transformation T . MSL can estimate anisotropic scales of an object efficiently. By removing more deformations, the shape space after alignment is more compact and the mean shape can represent the whole population more accurately. Therefore, we use an anisotropic similarity transformation to represent the transformation between two shapes,

2   J   X Sx 0 0

ˆ S ˆ = arg min

R 0 Sy 0 M1j + T − M2j . Tˆ , R, (2)

T ,R,S 0 0 Sz j=1

To the best of our knowledge, there are no closed-form solutions for estimating the anisotropic similarity transformation. In this paper, we propose a two-step iterative approach to searching for the optimal transformation. Suppose there is a common scale s = (Sx + Sy + Sz )/3, let Sx0 = Sx /s, Sy0 = Sy /s, and Sz0 = Sz /s. Equation (2) can be re-written as  S0 J  X

Rs 0x

T ,R,S 0

ˆ S ˆ = arg min Tˆ , R,

j=1

0 0 Sy0 0 0 Sz0



 j M1

+T



j M2

2

.

(3)

In the first step, suppose the anisotropic scales Sx0 , Sy0 , and Sz0 are known. (At beginning, we can assume the scaling is isotropic, Sx0 = 1, Sy0 = 1, and Sz0 = 1.) We can calculate the isotropic similarity transformation using a closed-form solution [8]. In the second step, assuming that the isotropic similarity transformation (T, R, s) is given, we estimate the optimal anisotropic scales Sx0 , Sy0 , and Sz0 . Simple mathematic derivation gives us the following closed-form solution, PJ PJ PJ j j j j j j ˆ0 = S x

j=1 P J

M1 (x)P2 (x)

2 M1j (x) j=1

ˆ0 = S y

where j

P2 =

j=1 P J

M1 (y)P2 (y)

2 M1j (y) j=1

ˆ0 = S z

j=1 P J

M1 (z)P2 (z)

j=1

M1j (z)

2

1 −1 j R (M2 − T ). s

,

(4)

(5)

The above two steps iterate a few times until they converge. With a module solving the anisotropic similarity transformation between two shapes, we can plug it into the generalized Procrustes analysis method to search for the optimal mean shape m. ¯ Besides the optimal mean shape, the optimal alignment Ti from the mean shape to each example shape is also obtained as a by-product. The transformation parameters of the optimal alignment provide the pose ground truth that MSL can learn to estimate. 2.3

Heart Surface Boundary Delineation

After the MSL based heart pose estimation, we align the optimal mean shape with the estimated transformation to get a rough estimate of the heart surface boundary. We then deform the shape for more accurate boundary delineation. In this step, we follow the method described in [1], leveraging a learning based boundary detector. 2.4

Post-Processing to Exclude Rib Cage from Heart Mask

For most cases, good segmentation results can be achieved after 3D heart pose detection and boundary delineation. However, for a few cases, a part of the rib cage (sternum and ribs) may be included in the heart mask (left column of Fig. 3) since the heart boundary is quite weak around that region. A post-processing step is further applied to explicitly segment the sternum and ribs based on adaptive thresholding and connected component analysis. We first detect three landmarks, namely the sternum (red dot), the left (yellow dot) and right (cyan dot)

Fig. 3. Post-processing to exclude the rib cage from the heart mask. Left: Cross-section and 3D visualization of the result before post-processing. Right: After post-processing.

lung tips on each slice, as shown in the left column of Fig. 3. These landmarks determine a region of interest (ROI) (indicated by a blue polygon in Fig. 3). A machine learning based technique is used to detect the landmarks on each slice. To be specific, 2D Haar wavelet features and the probabilistic boosting tree (PBT) [9] are used to train a detector for each landmark. After landmark detection, we extract the ROI on each slice. Stacking the ROIs on all slices, we get a volume of interest (VOI). Normally, bones are brighter than the soft tissues in a CT volume, therefore, we can use intensity thresholding to extract the rib cage. However, due to the variations in the scanners, patients, and scanning protocols, a predefined threshold does not work for all cases. An adaptive optimal threshold is automatically determined by analyzing the intensity histogram of the VOI. For some cases, a part of a chamber may be included in the VOI, though rare. Three dimensional connected component analysis of the bright voxels is performed and only the large components are preserved as the rib cage. We then adjust the heart mesh to make sure the rib cage is completely excluded from the mask (see the right column of Fig. 3).

3

Experiments

The method has been tested on 589 volumes from 288 patients. The scanning protocols are heterogeneous with different capture ranges and resolutions. Each volume contains 80 to 350 slices and the slice size is 512 × 512 pixels. The resolution inside a slice is isotropic and varies from 0.28 mm to 0.74 mm, while slice thickness is generally larger than the in-slice resolution and varies from 0.4 mm to 2.0 mm. For training and evaluation purposes, the out-most surface of the heart is annotated, using a semi-automatic tool, with a triangulated mesh of 514 points and 1024 triangles. The cross-volume point correspondence is established using the rotation-axis based resampling method [1]. The point-to-mesh error, Ep2m , is used to evaluate the segmentation accuracy. For each point in a mesh, we search for the closest point in the other mesh to calculate the minimum distance. We calculate the point-to-mesh distance from the detected mesh to the ground-

truth mesh and vice versa to make the measurement symmetric. A four-fold cross-validation is used to evaluate the performance of the algorithm. First, we evaluate the shape initialization error of the optimal mean shape and the heuristic bounding-box based mean shape [1]. After MSL based heart pose estimation, we align the mean shape with the estimated position, orientation, and anisotropic scales. We then calculate Ep2m of the align mean shape w.r.t. the ground truth mesh. As shown in Table 1, the optimal mean shape is more accurate than the heuristic bounding-box based mean shape. It reduces the mean initialization error from 4.35 mm to 3.60 mm (about 17% reduction). After shape initialization, we deform the mesh under the guidance of a learning based boundary detector, which further improves the boundary delineation accuracy. As shown in Table 1, the mean error is 2.12 mm if we start from the boundingbox based mean shape. Using the proposed optimal shape initialization, we can reduce the final mean error about 10% to 1.91 mm. Our method works well on both contrasted and non-contrasted scans. The mean and median errors on the contrasted data are 1.85 mm and 1.71 mm, respectively. The corresponding errors increase moderately on the non-contrasted data to 2.22 mm and 2.11 mm, respectively. We also compared our approach with the graph cut based approach proposed by Funka-Lea et el. [2]. A binary program was provided by authors of [2], therefore avoiding the performance difference introduced in re-implementation of the algorithm. This method was proposed to generate a 3D visualization view of the heart. Tissues darker than the myocardium (e.g., lung) included in the heart mask does not effect the visualization since the intensity window can be tuned to hide these extra tissues. Therefore, it is controversial to measure its segmentation error against our annotation and compare with our approach. Keeping this in mind, the mean and median errors achieved by the graph-cut based method are 4.60 mm and 4.00 mm, respectively. Its performance degrades significantly on non-contrasted data (3.94 mm vs. 7.65 mm in mean errors). Visual inspection unveils that cutting the coronary arteries is rare for both methods, which is good for intuitive 3D coronary artery visualization. However, the graph cut based approach tends to include more tissues into the heart mask, e.g., a significant part of the liver (with an intensity comparable with or brighter than the myocardium) is included in 10-20% of cases. These extra tissues may block the coronary arteries in 3D visualization. Our approach is computationally efficient. On average, it takes 1.5 s to process a volume on a computer with duo-core 3.2 GHz CPUs and 3 GB memory. For comparison, the graph cut based approach takes about 20 s to process a volume on the same computer, which is more than 10 times slower than the proposed method.

4

Conclusion

In this paper, we proposed an efficient method for automatic heart isolation in CT volumes. A large-scale experiment on 589 volumes (including both contrasted and non-contrasted) from 288 patients demonstrates the robustness of

Table 1. Comparison of the proposed optimal mean shape and the heuristic bounding-box based mean shape [1] on shape initialization and final heart isolation errors. The point-to-mesh error (in millimeters) is used to measure the accuracy in boundary delineation. Shape Initialization Final Segmentation Bounding-Box Bounding-Box Optimal Mean Shape Optimal Mean Shape Mean Shape Mean Shape Mean Error 4.35 3.60 2.12 1.91 Std Deviation 1.43 1.05 0.89 0.71 Median Error 4.11 3.52 1.89 1.77

our approach, compared to the state-of-the-art graph cut based method [2]. It runs more than 10 times faster than the latter too.

References 1. Zheng, Y., Barbu, A., Georgescu, B., Scheuering, M., Comaniciu, D.: Four-chamber heart modeling and automatic segmentation for 3D cardiac CT volumes using marginal space learning and steerable features. IEEE Trans. Medical Imaging 27(11) (2008) 1668–1681 2. Funka-Lea, G., Boykov, Y., Florin, C., Jolly, M.P., Moreau-Gobard, R., Ramaraj, R., Rinck, D.: Automatic heart isolation for CT coronary visualization using graphcuts. In: Proc. IEEE Int’l Sym. Biomedical Imaging. (2006) 614–617 3. Moreno, A., Takemura, C.M., Colliot, O., Camara, O., Bloch, I.: Using anatomical knowledge expressed as fuzzy constraints to segment the heart in CT images. Pattern Recognition 41(8) (2008) 2525–2540 4. van Rikxoort, E.M., Isgum, I., Staring, M., Klein, S., van Ginneken, B.: Adaptive local multi-atlas segmentation: Application to heart segmentation in chest CT scans. In: Proc. of SPIE Medical Imaging. (2008) 5. Lelieveldt, B.P.F., van der Geest, R.J., Rezaee, M.R., Bosch, J.G., Reiber, J.H.C.: Anatomical model matching with fuzzy implicit surfaces for segmentation of thoracic volume scans. IEEE Trans. Medical Imaging 18(3) (1999) 218–230 6. Gregson, P.H.: Automatic segmentation of the heart in 3D MR images. In: Canadian Conf. Eletrical and Computer Engineering. (1994) 584–587 7. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models—their training and application. Computer Vision and Image Understanding 61(1) (1995) 38–59 8. Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis. John Wiley, Chichester (1998) 9. Tu, Z.: Probabilistic boosting-tree: Learning discriminative methods for classification, recognition, and clustering. In: Proc. Int’l Conf. Computer Vision. (2005) 1589–1596