European Journal of Orthodontics 24 (2002) 655–665

© 2002 European Orthodontic Society

The reproducibility of cephalometric measurements: a comparison of analogue and digital methods

E. M. Ongkosuwito*, C. Katsaros*, M. A. van ’t Hof**, J. C. Bodegom* and A. M. Kuijpers-Jagtman*

Departments of *Orthodontics and Oral Biology and **Preventive Dentistry and Cariology, University Medical Centre St Radboud Nijmegen, The Netherlands

SUMMARY

The aim of this study was to compare the reproducibility of longitudinal cephalometric measurements between analogue and digital methods using two different resolutions. Cephalometric radiographs of 20 patients were selected at the start (T1) and end (T2) of treatment: 24 cephalometric variables were calculated at T1 and T2, and their increments (T2 – T1) were also evaluated. All measurements were performed twice by two observers. Quality of methods [analogue, digital: 300 and 600 dots per inch (DPI) resolution] was evaluated by comparing the reliability coefficients and the total error between the digital and analogue methods. The inter-observer agreement was good. The 300 DPI was comparable to the analogue method. The reproducibility for the variables was comparable, but mandibular incisor increments tended to show better results with the 300 DPI method, whereas skeletal jaw relationship increments were not reliable with either method. Maxillary incisor increments, however, were reliable with both methods. The 300 and 600 DPI resolutions were found to be comparable. Scanning of cephalometric radiographs at a resolution of 300 DPI is sufficient for clinical purposes and comparable to analogue cephalometrics. However, all methods were found to be poor in assessing skeletal jaw relationships longitudinally.

Introduction

Cephalometric errors can be divided into acquisition, identification, and technical measurement errors. Acquisition errors are made when cephalometric radiographs are acquired either during exposure or by computer processing [i.e. when analogue cephalometric radiographs are transformed into digital pictures of poor quality (e.g. resolution too low or format wrong)]. Identification errors are associated with landmark recognition. Research carried out on conventional cephalometrics proved landmark identification to be the main source of error (Baumrind and Frantz, 1971a,b; Houston, 1983; Houston et al., 1986). Among the factors contributing to the identification error are observer experience, landmark definition, and the density and sharpness of the image (Björk and Solow, 1961). Technical measurement errors are those induced by the measuring device or technique.

Various investigators have evaluated the use of computerized cephalometrics and the digitizing process of cephalometric radiographs (Houston, 1979; Richardson, 1981; Stirrups, 1989; Oliver, 1990; Davis and Mackay, 1991; Macrì and Wenzel, 1993; Nimkarn and Miles, 1995; Lim and Foong, 1997; Geelen et al., 1998; Rudolph et al., 1998; Chen et al., 2000; Liu et al., 2000). In the digitizing process, several steps can be distinguished. The X-ray exposure is the first step, with the X-ray beam exposing a photopaper/cephalometric radiograph, a phosphor plate, or a sensor. The next step is conversion of the image into a digital picture. The cephalometric radiograph is scanned with a flatbed scanner, a video, or other device; the phosphor plate is scanned with a special scanner, while a digital cephalometric radiograph is directly produced by the sensor. The third step is to save the digital picture to the computer. At the same time, different parameters have to be specified such as the dots per inch (DPI, resolution), the bits (grey shades), and format (this is the technique used for saving). Within the phosphor plate system, the resolution has to be determined twice, before scanning and before saving. Specific software can be used for enhancement of the digital picture. After this last step it is possible to perform cephalometrics directly on the digital picture on screen, or to convert the digital picture back to an analogue cephalometric radiograph by printing.

In reproducibility research on digital pictures, investigators have compared digitizing methods such as phosphor plates, video imaging, and flatbed scanners with analogue methods (Oliver, 1990; Macrì and Wenzel, 1993; Nimkarn and Miles, 1995; Geelen et al., 1998; Chen et al., 2000). However, the results concerning the preferred method are contradictory. It remains unclear whether quality parameters such as resolution or the amount of grey shades influence the reproducibility. Furthermore, a search in the literature revealed only studies on digital pictures that concern the reproducibility of landmarks. The accuracy and precision of a combination of landmarks (variables such as distances and angles) may well deviate from those of a single landmark.

Orthodontic research generally tends to put great effort into longitudinal studies in which several stages in time are compared. Differences in identification of landmarks on cephalometric radiographs of the same individual at various time points can influence the reproducibility. However, no published data on reproducibility of increments as used in longitudinal studies could be found. The aim of the present study was therefore to investigate the reproducibility of longitudinal cephalometric measurements and to compare analogue and digital methods using two different picture resolutions.


Subjects and methods

Patients, measurements, and observers

Cephalometric radiographs of 20 patients (11 males and nine females) were randomly selected using the analysis toolpack in Microsoft® Excel 97 (Microsoft Corporation, Redmond, USA) from the files of an orthodontic practice. Cephalometric radiographs of good quality were used, which provided good scans and on which good analogue measurements could be undertaken. From each patient two cephalometric radiographs were used: at the start (T1) and at the end of active treatment (T2). On each cephalometric radiograph 17 landmarks were defined (Table 1). With these landmarks 24 variables were calculated (Table 2). Two observers (EO and JB) performed all measurements according to the methods described below. The films were assessed in two ways: analogue (method 1) and digital (methods 2 and 3). All cephalometric radiographs were isolated for tracing and for scanning. The time interval between the measurements according to the analogue and the digital method was five weeks, and between the two digital methods three weeks.

Method 1: analogue measurement

The cephalometric radiographs were traced and measured manually by the two observers, both of whom worked in a dark room while making the tracings. They used trace foil (3M Unitek® Corporation, Monrovia, USA), a 4H pencil and a cephalometric protractor (3M Unitek®) for measuring variables.

Method 2: digital measurement with 300 DPI

The cephalometric radiographs used for the analogue measurement were converted to digital pictures with a Linotype Hell Opal scanner (optical resolution: max 800 × 1600 DPI; interpolated: 9600 × 9600 DPI, maximum grey scale output 8 bits/pixel), plug-in Linotype CPS ColorFactory Pro v2.0 software (both by Linomed Medical Imaging GmbH, Windeck-Wiedenhof, Germany) and Photoshop 5.01® (Adobe Systems Incorporated, San Jose, USA). The procedure was carried out without filtering on 600 DPI with 8-bit colour depth, and the images were saved as JPEG files on compression rate 8 (1 is full compression and 10 no compression). The computer used for the cephalometric analysis was an Intel Pentium II processor®, with 64 megabyte internal memory and a conventional 17 inch screen set to an average resolution of 1024 × 768 (this corresponds to 75 DPI). Within Photoshop the digital pictures were resaved at 300 DPI. This resolution was chosen as many clinicians use a resolution equal to or lower than 300 DPI in routine orthodontic practice and because 300 DPI is the output of most commercially available scanners and X-ray devices used for (in)direct acquisition.

Table 1

Landmark definitions.

| Landmark | Definition |
|---|---|
| A | Point A: the most posterior point on the labial surface of the maxilla between the anterior nasal spine and the alveolar process. |
| ANS | The anterior tip of the nasal spine at the lower margin of the anterior nasal opening. |
| B | Point B: the most posterior point in the curvature from a line drawn from infradentale (highest and most anterior point on the mandible at its labial contact with the mandibular central incisors) to pogonion. |
| Co | Condylion: the most posterior-superior point on the curvature of the condylar head. |
| Gn | Gnathion: the middle between pogonion and menton on the contour of the bony symphysis. |
| Go | Gonion: the constructed point where the ramus plane and the mandibular plane intersect. |
| LIA | Lower incisor apex: the apex of the most anterior mandibular central incisor. |
| LIE | Lower incisor incisal edge: the incisal tip of the mandibular central incisor. |
| Me | Menton: the most inferior point on the symphysal outline of the mandible. |
| N | Nasion: the most anterior point of the nasofrontal suture. |
| Or | Orbitale: the lowest point on the external border of the orbital cavity. |
| Pg | Pogonion: the most anterior point on the symphysis. |
| PNS | Posterior nasal spine: the most posterior point at the sagittal plane on the bony hard palate. |
| Po | Porion: the most superior point of the external auditory meatus. |
| S | Sella Turcica: the point representing the centre of the pituitary fossa. |
| UIA | Upper incisor apex: the apex of the maxillary central incisor. |
| UIE | Upper incisor, incisal edge: the incisal tip of the maxillary central incisor. |

Table 2

Variable definitions.

| Variable | Unit | Definition |
|---|---|---|
| Facial angle | degree | Angle between Po–Or and N–Pg |
| Convexity angle | degree | Angle between N–A and A–Pg |
| S–N–A | degree | Angle between S–N and N–A |
| S–N–B | degree | Angle between S–N and N–B |
| Maxillary length | mm | Distance between Co and A |
| Mandibular length | mm | Distance between Co and Gn |
| Wits appraisal | mm | Distance between A and B, both projected perpendicular on the FOP |
| SN to palatal plane | degree | Angle between S–N and ANS–PNS |
| SN to mandibular plane | degree | Angle between S–N and Go–Gn |
| LAFH | mm | Distance between ANS and Me (lower anterior facial height) |
| Total face height | mm | Distance between N and Me |
| U1 to SN | degree | Angle between S–N and UIA–UIE |
| U1 to palatal plane | degree | Angle between ANS–PNS and UIA–UIE |
| U1 to N–A angle | degree | Angle between N–A and UIA–UIE |
| U1 to N–A | mm | Distance from N–A to UIE |
| U1 to A–Pg | mm | Distance from A–Pg to UIE |
| Inter-incisal angle | degree | Angle between UIA–UIE and LIA–LIE |
| L1 to mandibular plane | degree | Angle between Go–Gn and LIA–LIE |
| L1 to N–B angle | degree | Angle between N–B and LIA–LIE |
| L1 to N–B | mm | Distance from N–A to LIE |
| L1 to A–Pg | mm | Distance from A–Pg to LIE |
| SN to occlusal plane | degree | Angle between S–N and the FOP |
| Pg to N–B | mm | Distance from N–B to Pg |
| FOP (reference plane) | | Functional occlusal plane: the line that bisects the posterior occlusion of molars and premolars (or deciduous molars) |
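Table 2 defines every variable as an angle between two lines or as a distance involving landmarks, so each variable can in principle be derived from digitized landmark coordinates. The following is a minimal sketch of that geometry, not the routine used by any particular cephalometric program. The function names and the landmark coordinates are hypothetical and serve only as an illustration.

```python
import math

Point = tuple[float, float]  # (x, y) position of a landmark in mm


def distance(p: Point, q: Point) -> float:
    """Euclidean distance between two landmarks (e.g. Co and A)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def angle_at(vertex: Point, p: Point, q: Point) -> float:
    """Angle in degrees (0 to 180) at `vertex` between the rays to p and q."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    deg = abs(math.degrees(a1 - a2))
    return 360.0 - deg if deg > 180.0 else deg


def point_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / distance(a, b)


# Hypothetical coordinates, invented for illustration only.
landmarks = {
    "S": (0.0, 0.0),
    "N": (70.0, 0.0),
    "A": (68.0, -55.0),
    "UIE": (72.0, -80.0),
}

# S-N-A: angle between S-N and N-A, measured at N.
sna = angle_at(landmarks["N"], landmarks["S"], landmarks["A"])
# U1 to N-A (mm): distance from the line N-A to UIE.
u1_to_na = point_to_line(landmarks["UIE"], landmarks["N"], landmarks["A"])
print(f"S-N-A = {sna:.1f} degrees, U1 to N-A = {u1_to_na:.1f} mm")
```

Sign conventions (for example whether a point lies in front of or behind a reference line) differ between analyses and are deliberately left out of this sketch.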

Method 3: digital measurement with 600 DPI

The average screen resolution was increased to 1600 × 1200 (117 DPI) and an enlargement factor between 2 and 5 was used on screen. For this method the digital pictures previously saved at 600 DPI were used.

For both digital methods the cephalometric analysis was undertaken with AOCeph™ prerelease (American Orthodontics, Sheboygan, USA), a cephalometric analysis program with image processing. The observers were allowed to use various enhancing functions such as changing magnification, brightness, and contrast. The landmark was identified and then set by a crosspointer. After completing the digitizing of a set of landmarks, the program automatically generated the variables.
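The screen resolutions quoted for the two digital methods (75 and 117 DPI) can be reproduced with simple arithmetic. The sketch below assumes that the nominal 17 inch diagonal is fully used by the picture area, which is an assumption and not stated in the text. The helper names are illustrative, and the "enlargement limit" is only one possible reading of when a scanned pixel starts to span more than one screen pixel.

```python
import math


def screen_dpi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch along the diagonal of a display."""
    return math.hypot(width_px, height_px) / diagonal_in


def pixel_size_mm(dpi: float) -> float:
    """Edge length of one pixel in mm at a given resolution (1 inch = 25.4 mm)."""
    return 25.4 / dpi


def enlargement_limit(image_dpi: float, display_dpi: float) -> float:
    """Enlargement at which one image pixel maps onto one screen pixel."""
    return image_dpi / display_dpi


# Assumed: the whole 17 inch diagonal is viewable.
dpi_method_2 = screen_dpi(1024, 768, 17.0)   # about 75 DPI
dpi_method_3 = screen_dpi(1600, 1200, 17.0)  # about 117 DPI

print(f"1024 x 768 on 17 inch:  {dpi_method_2:.1f} DPI")
print(f"1600 x 1200 on 17 inch: {dpi_method_3:.1f} DPI")
print(f"Pixel size at 300 DPI: {pixel_size_mm(300):.3f} mm")
print(f"Pixel size at 600 DPI: {pixel_size_mm(600):.3f} mm")
print(f"Enlargement limit, 600 DPI picture: {enlargement_limit(600, dpi_method_3):.1f}")
```

Under these assumptions a 600 DPI picture reaches the one-to-one pixel mapping at an enlargement of about 5, which is in line with the enlargement factor of 2 to 5 reported for method 3.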


Statistics

The statistical analysis followed the strategy described below and was applied to evaluate variables at a single point of time (T1 and T2, n = 40) as well as differences between T1 and T2 (increments, n = 20). This statistical approach took into account the following four quality criteria:

1. Systematic differences between the two observers or the two measurement methods were evaluated using the paired t-test.
2. Random errors were estimated from the standard deviations of the differences between paired observations. Since these differences included the measurement error twice, the random error was calculated as: random error = $\mathrm{SD}(\text{differences})/\sqrt{2}$.
3. Total error was calculated using Dahlberg’s formula (total error = $\sqrt{\sum \mathrm{dif}^{2} / 2n}$). The total error may be interpreted as the combination of systematic difference and random error.
4. Reliability coefficient, representing the quality of the measurements (variables) in its context, i.e. its use in clinical practice. The measurement error ($e$) was weighed against the population standard deviation ($\sigma$) by the reliability coefficient = $\sigma^{2}/(\sigma^{2} + e^{2})$. A reliability coefficient = 1 implies a perfect measurement, while a reliability coefficient = 0 implies a non-discriminating useless measurement. A measurement with a reliability coefficient > 0.70 is generally regarded as an acceptable measurement. Reliability coefficients are estimated by the correlation coefficients between two situations (observer or method).

These four quality criteria were used to evaluate the digital assessments and analogue assessment in relation to each other as follows:

1. Systematic differences were seen as relevant or interesting only when they were considerably larger than the random errors. Notice was paid to such situations.
2. The total error was used as an indication of the measurement error.
3. The reliability coefficient was used to discriminate between useful and useless measurements.
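The four quality criteria can be sketched in a few lines of code. This is an illustration under stated assumptions, not the authors' analysis: the readings are invented, the population standard deviation is passed in as a known value, and the total error is used as the measurement error e in the reliability formula (the paper itself estimates reliability from correlation coefficients). Only the t statistic is computed; a P value would additionally need the t distribution.

```python
import math
import statistics


def paired_error_summary(first, second, population_sd):
    """Summarise paired readings (two observers or two methods)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_dif = statistics.fmean(diffs)          # systematic difference (DIF)
    sd_dif = statistics.stdev(diffs)            # SD of the differences
    se_dif = sd_dif / math.sqrt(n)              # standard error of DIF
    t_stat = mean_dif / se_dif if se_dif > 0 else math.inf  # paired t statistic
    random_error = sd_dif / math.sqrt(2)        # criterion 2
    total_error = math.sqrt(sum(d * d for d in diffs) / (2 * n))  # Dahlberg
    # Assumption: e in sigma^2 / (sigma^2 + e^2) is taken to be the total error.
    reliability = population_sd**2 / (population_sd**2 + total_error**2)
    return {
        "mean_dif": mean_dif,
        "se_dif": se_dif,
        "t_stat": t_stat,
        "random_error": random_error,
        "total_error": total_error,
        "reliability": reliability,
    }


# Hypothetical readings of one angle (degrees) by two observers.
obs1 = [82.0, 79.5, 85.0, 80.5, 83.0, 78.0]
obs2 = [81.0, 80.0, 84.0, 79.0, 83.5, 77.0]
summary = paired_error_summary(obs1, obs2, population_sd=4.5)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

With these definitions the squared total error equals half the squared systematic difference plus (n – 1)/n times the squared random error, which is one way to read the statement that the total error combines systematic difference and random error.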

The quality of the three different measurement methods was evaluated according to the following questions:

1. How does the 300 DPI method compare with the analogue method in terms of total error and reliability?
2. Can the quality of the measurements and increments be improved by increasing the resolution of digital pictures?

Both questions were answered by comparing the reliability coefficients associated with the respective methods as obtained from the inter-observer comparison. The defined quality parameters of the different methods are presented in tables, so that the calculated results of the study are open to further conclusions and interpretations. In order to give a guideline in interpreting the results, the following rules were applied:

1. Two methods were seen as equivalent (=) when the reliability coefficients differed by no more than 0.35; otherwise superiority was indicated (e.g. A for Analogue and D for Digital).
2. Two equivalent methods (=) might be equivalent on an acceptable level (+) or on an unacceptable level (–) if the average reliability coefficient was larger or smaller than 0.70, respectively.
3. A method was considered to be superior to another method if the reliability coefficient of the better method was > 0.70.
4. The ratio in total error may serve as an explanation of an observed difference in reliability of the considered methods. In comparing the analogue method with the 300 DPI method, the total error ratio was: total error analogue method/total error 300 DPI. The total error ratio in testing whether an increased resolution can improve the quality of measurements was: total error 300 DPI/total error 600 DPI.

Table 3

Analogue method: systematic differences and random errors (inter-observer).

| Measurement | Variables (n = 40): Mean | SD | DIF | SE DIF | Random error | Total error | Increments (n = 20): Mean | SD | DIF | SE DIF | Random error | Total error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Facial angle | 82.6 | 4.5 | 0.10 | 0.47 | 2.12 | 2.09 | 2.8 | 2.0 | 0.00 | 0.88 | 2.77 | 2.70 |
| Convexity angle | 5.2 | 4.6 | 0.33 | 0.46 | 2.07 | 2.05 | 5.4 | 5.3 | 0.55 | 1.09 | 3.44 | 3.37 |
| S–N–A | 79.5 | 3.2 | 1.58* | 0.39 | 1.75 | 2.06 | 2.1 | 1.8 | –1.25 | 0.87 | 2.75 | 2.82 |
| S–N–B | 76.8 | 3.4 | 1.08* | 0.27 | 1.19 | 1.40 | 1.7 | 1.6 | –0.45 | 0.63 | 2.01 | 1.98 |
| A–N–B | 2.3 | 2.6 | 0.50* | 0.22 | 0.96 | 1.01 | 2.0 | 2.1 | 0.80 | 0.46 | 1.44 | 1.52 |
| Maxillary length | 97.6 | 6.0 | 3.33* | 0.61 | 2.71 | 3.56+ | 3.9 | 3.8 | –1.55 | 1.11 | 3.51 | 3.60 |
| Mandibular length | 119.2 | 7.7 | 1.13 | 0.68 | 3.05 | 3.12 | 7.8 | 4.4 | 0.95 | 0.85 | 2.69 | 2.71 |
| Wits appraisal | 1.5 | 4.0 | 0.43 | 0.49 | 2.20 | 2.19 | 3.6 | 2.0 | –1.45 | 0.82 | 2.60 | 2.73 |
| SN to palatal plane | 7.1 | 3.0 | –0.78 | 0.40 | 1.79 | 1.85 | 1.6 | 2.0 | 0.15 | 0.83 | 2.64 | 2.57 |
| SN to mandibular plane | 31.9 | 5.2 | –0.65 | 0.38 | 1.71 | 1.75 | 2.7 | 3.2 | –0.20 | 0.83 | 2.63 | 2.57 |
| LAFH | 70.4 | 5.8 | 0.68* | 0.19 | 0.87 | 0.98 | 5.5 | 2.6 | –0.05 | 0.40 | 1.27 | 1.24 |
| Total face height | 122.5 | 6.5 | –0.10 | 0.32 | 1.41 | 1.40 | 8.0 | 3.5 | 1.00 | 0.75 | 2.36 | 2.41 |
| U1 to SN | 106.8 | 7.9 | 1.05 | 0.56 | 2.50 | 2.58 | 6.7 | 5.3 | 0.10 | 1.03 | 3.27 | 3.19 |
| U1 to palatal plane | 113.4 | 8.0 | 0.55 | 0.66 | 2.94 | 2.93 | 7.2 | 5.4 | 1.50 | 1.28 | 4.04 | 4.08 |
| U1 to N–A angle | 26.6 | 8.3 | 0.73 | 0.75 | 3.37 | 3.36 | 7.7 | 6.3 | 2.05 | 1.69 | 5.35 | 5.41 |
| U1 to N–A | 5.8 | 3.1 | –0.53 | 0.28 | 1.26 | 1.30 | 2.6 | 1.9 | 1.15 | 0.59 | 1.87 | 1.99 |
| U1 to A–Pg | 7.1 | 3.8 | –0.40 | 0.36 | 1.61 | 1.61 | 3.4 | 2.5 | 1.20* | 0.50 | 1.57 | 1.75 |
| Inter-incisal angle | 124.9 | 10.9 | –1.13 | 0.63 | 2.82 | 2.89 | 8.5 | 6.9 | –1.35 | 0.97 | 3.05 | 3.13 |
| L1 to mandibular plane | 96.5 | 7.8 | 0.15 | 0.49 | 2.18 | 2.16 | 6.0 | 4.5 | –0.10 | 0.86 | 2.72 | 2.66 |
| L1 to N–B angle | 25.7 | 7.1 | 1.10 | 0.63 | 2.81 | 2.88 | 5.6 | 4.1 | 0.70 | 1.16 | 3.67 | 3.61 |
| L1 to N–B | 5.0 | 2.7 | –0.20 | 0.18 | 0.79 | 0.79 | 1.5 | 1.1 | 0.30 | 0.34 | 1.08 | 1.07 |
| L1 to A–Pg | 2.7 | 3.3 | –0.55 | 0.36 | 1.61 | 1.64 | 2.3 | 1.5 | 1.50* | 0.62 | 1.97 | 2.19 |
| SN to occlusal plane | 14.5 | 4.0 | –1.70* | 0.69 | 3.08 | 3.27 | 3.4 | 2.8 | 2.70* | 1.13 | 3.58 | 3.98 |
| Pg to N–B | 2.7 | 2.0 | –0.03 | 0.31 | 1.37 | 1.36 | 1.3 | 1.8 | 1.15 | 0.55 | 1.74 | 1.88 |

*P < 0.05 (systematic difference). + Systematic difference > random error. DIF, difference between the two observers; SE DIF, standard error of the DIF.

Results

Systematic difference and random error: inter-observer agreement

Tables 3, 4, and 5 present the systematic and random errors obtained for the three methods. In 38 out of 138 (28 per cent, marked by an asterisk) variables and their increments, the paired t-test showed a significant difference between the two observers. A systematic difference may be neglected if it is smaller than the random error. In only 10 variables (see Tables 3–5 marked by +) was the systematic difference larger than the random error. This was true of one variable in the analogue method, of four variables in the 300 DPI method, and of five variables in the 600 DPI method. None of the 10 measurements involved were increments, indicating that systematic differences did not play an important role in this study and that the total error was thus a good measure for the measurement error.

Table 4

300 DPI method: systematic differences and random errors (inter-observer).

| Measurement | Variables (n = 40): Mean | SD | DIF | SE DIF | Random error | Total error | Increments (n = 20): Mean | SD | DIF | SE DIF | Random error | Total error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Facial angle | 81.6 | 4.2 | –0.58 | 0.33 | 1.46 | 1.50 | 3.4 | 2.3 | 1.72* | 0.61 | 1.92 | 2.23 |
| Convexity angle | 5.8 | 4.1 | 0.84 | 0.64 | 2.87 | 2.89 | 4.1 | 3.6 | –0.47 | 1.21 | 3.81 | 3.73 |
| S–N–A | 78.8 | 3.9 | 0.88* | 0.41 | 1.82 | 1.90 | 2.3 | 1.6 | 1.45* | 0.60 | 1.88 | 2.10 |
| S–N–B | 76.9 | 3.6 | –0.21 | 0.23 | 1.05 | 1.04 | 1.5 | 1.4 | 0.60 | 0.45 | 1.43 | 1.45 |
| A–N–B | 2.3 | 2.9 | 1.08* | 0.32 | 1.42 | 1.59 | 1.4 | 2.6 | 0.86 | 0.52 | 1.63 | 1.70 |
| Maxillary length | 93.5 | 6.6 | 0.95 | 0.81 | 3.63 | 3.64 | 5.0 | 4.5 | –0.14 | 1.64 | 5.19 | 5.06 |
| Mandibular length | 117.6 | 8.1 | –1.41* | 0.50 | 2.24 | 2.43 | 7.7 | 4.3 | –1.79 | 0.95 | 3.01 | 3.20 |
| Wits appraisal | –0.2 | 4.3 | 2.78* | 0.78 | 3.51 | 3.98 | 4.6 | 3.2 | 3.54* | 1.59 | 5.03 | 5.51 |
| SN to palatal plane | 6.7 | 2.7 | 0.47 | 0.30 | 1.33 | 1.36 | 1.3 | 1.0 | 0.13 | 0.55 | 1.73 | 1.69 |
| SN to mandibular plane | 32.9 | 5.1 | –0.48 | 0.36 | 1.61 | 1.62 | 2.5 | 2.5 | –0.66 | 0.68 | 2.14 | 2.14 |
| LAFH | 70.7 | 5.7 | 1.01* | 0.25 | 1.10 | 1.30 | 5.3 | 2.5 | –0.52 | 0.51 | 1.61 | 1.61 |
| Total face height | 122.2 | 6.4 | 0.55* | 0.24 | 1.06 | 1.12 | 8.0 | 3.3 | 0.00 | 0.53 | 1.67 | 1.63 |
| U1 to SN | 106.0 | 8.2 | –0.03 | 0.53 | 2.36 | 2.33 | 7.0 | 5.7 | 0.01 | 1.00 | 3.15 | 3.07 |
| U1 to palatal plane | 112.5 | 7.7 | 0.74 | 0.55 | 2.47 | 2.49 | 7.1 | 5.5 | –0.26 | 1.05 | 3.31 | 3.23 |
| U1 to N–A angle | 27.2 | 8.2 | –0.95 | 0.68 | 3.05 | 3.09 | 7.6 | 5.2 | –1.58 | 1.11 | 3.50 | 3.59 |
| U1 to N–A | 6.7 | 3.4 | –1.57* | 0.43 | 1.92 | 2.20 | 2.7 | 2.2 | –1.00 | 0.70 | 2.21 | 2.26 |
| U1 to A–Pg | 7.1 | 3.5 | –0.87* | 0.18 | 0.82 | 1.02+ | 2.8 | 2.1 | –0.30 | 0.32 | 1.02 | 1.01 |
| Inter-incisal angle | 125.5 | 10.8 | –3.23* | 0.63 | 2.83 | 3.61+ | 8.6 | 7.6 | 0.99 | 1.01 | 3.20 | 3.20 |
| L1 to mandibular plane | 95.7 | 8.0 | 3.74* | 0.51 | 2.29 | 3.48+ | 5.7 | 4.2 | –0.35 | 0.90 | 2.83 | 2.77 |
| L1 to N–B angle | 25.5 | 7.6 | 3.13* | 0.43 | 1.93 | 2.92+ | 6.1 | 4.6 | –0.37 | 0.74 | 2.33 | 2.29 |
| L1 to N–B | 5.0 | 2.6 | 0.18 | 0.09 | 0.41 | 0.42 | 1.6 | 1.1 | –0.05 | 0.14 | 0.45 | 0.44 |
| L1 to A–Pg | 3.2 | 2.2 | –0.68* | 0.17 | 0.78 | 0.91 | 2.0 | 1.4 | –0.24 | 0.27 | 0.84 | 0.83 |
| SN to occlusal plane | 15.7 | 5.5 | –2.21* | 0.97 | 4.32 | 4.54 | 6.2 | 4.8 | –3.81 | 1.95 | 6.17 | 6.59 |
| Pg to N–B | 2.6 | 1.8 | 0.03 | 0.09 | 0.42 | 0.41 | 0.9 | 0.7 | –0.05 | 0.16 | 0.52 | 0.50 |

*P < 0.05 (systematic difference). + Systematic difference > random error. DIF, difference between the two observers; SE DIF, standard error of the DIF.

Table 5

600 DPI method: systematic differences and random errors (inter-observer).

| Measurement | Variables (n = 40): Mean | SD | DIF | SE DIF | Random error | Total error | Increments (n = 20): Mean | SD | DIF | SE DIF | Random error | Total error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Facial angle | 81.4 | 4.2 | –0.29 | 0.37 | 1.64 | 1.61 | 3.0 | 2.0 | 0.71 | 0.76 | 2.41 | 2.40 |
| Convexity angle | 5.8 | 4.0 | –0.09 | 0.53 | 2.35 | 2.32 | 4.7 | 3.5 | –2.17 | 1.10 | 3.48 | 3.72 |
| S–N–A | 78.4 | 3.8 | 0.78 | 0.50 | 2.22 | 2.24 | 2.9 | 2.2 | 0.95 | 1.05 | 3.32 | 3.31 |
| S–N–B | 76.9 | 3.7 | –0.33 | 0.33 | 1.47 | 1.45 | 1.6 | 1.5 | 0.45 | 0.51 | 1.61 | 1.61 |
| A–N–B | 1.7 | 3.0 | 1.10* | 0.29 | 1.28 | 1.48 | 1.9 | 2.7 | 0.50 | 0.63 | 1.98 | 1.96 |
| Maxillary length | 93.0 | 5.8 | 0.92 | 0.67 | 2.98 | 3.00 | 4.2 | 3.7 | –1.46 | 1.37 | 4.34 | 4.36 |
| Mandibular length | 118.2 | 8.2 | –2.14* | 0.71 | 3.18 | 3.48 | 8.5 | 6.0 | –1.35 | 1.54 | 4.86 | 4.83 |
| Wits appraisal | –0.1 | 4.4 | 2.06* | 0.69 | 3.10 | 3.38 | 4.3 | 2.8 | –0.05 | 1.37 | 4.33 | 4.22 |
| SN to palatal plane | 7.1 | 2.5 | 1.64* | 0.36 | 1.59 | 1.95+ | 1.3 | 1.1 | –0.69 | 0.60 | 1.90 | 1.92 |
| SN to mandibular plane | 32.7 | 5.1 | 0.69 | 0.34 | 1.52 | 1.58 | 2.0 | 2.2 | –0.14 | 0.54 | 1.71 | 1.67 |
| LAFH | 70.3 | 5.6 | 0.88* | 0.26 | 1.17 | 1.30 | 5.3 | 2.2 | 0.46 | 0.68 | 2.15 | 2.12 |
| Total face height | 122.4 | 6.6 | 0.85 | 0.50 | 2.25 | 2.28 | 8.0 | 3.4 | 0.20 | 1.09 | 3.45 | 3.37 |
| U1 to SN | 105.5 | 8.2 | –0.36 | 0.57 | 2.56 | 2.53 | 7.0 | 5.3 | 0.34 | 1.44 | 4.56 | 4.45 |
| U1 to palatal plane | 112.3 | 8.0 | 1.61* | 0.50 | 2.21 | 2.45 | 7.1 | 5.6 | 0.13 | 1.16 | 3.67 | 3.58 |
| U1 to N–A angle | 27.1 | 7.9 | –1.14* | 0.52 | 2.33 | 2.43 | 7.2 | 5.1 | –0.56 | 1.19 | 3.77 | 3.70 |
| U1 to N–A | 7.0 | 3.5 | –1.89* | 0.34 | 1.52 | 2.00+ | 3.2 | 2.3 | –0.39 | 0.76 | 2.40 | 2.36 |
| U1 to A–Pg | 7.3 | 3.4 | –0.92* | 0.13 | 0.56 | 0.84+ | 2.6 | 2.2 | –0.06 | 0.26 | 0.84 | 0.81 |
| Inter-incisal angle | 125.8 | 11.1 | –2.47* | 0.60 | 2.68 | 3.16 | 8.7 | 7.6 | –1.18 | 1.38 | 4.38 | 4.35 |
| L1 to mandibular plane | 96.0 | 7.7 | 2.16* | 0.38 | 1.70 | 2.26+ | 5.4 | 3.8 | 0.99 | 0.79 | 2.49 | 2.52 |
| L1 to N–B angle | 25.6 | 7.5 | 2.47* | 0.35 | 1.56 | 2.32+ | 5.8 | 4.0 | 1.31 | 0.78 | 2.46 | 2.57 |
| L1 to N–B | 5.1 | 2.6 | –0.02 | 0.09 | 0.41 | 0.32 | 1.6 | 1.1 | 0.37 | 0.20 | 0.62 | 0.66 |
| L1 to A–Pg | 3.5 | 2.4 | –0.59* | 0.20 | 0.87 | 0.95 | 2.3 | 1.5 | 0.23 | 0.42 | 1.32 | 1.30 |
| SN to occlusal plane | 15.3 | 5.6 | –0.78 | 1.00 | 4.46 | 4.43 | 5.4 | 4.6 | 0.17 | 1.79 | 5.66 | 5.52 |
| Pg to N–B | 2.5 | 1.5 | 0.03 | 0.10 | 0.44 | 0.32 | 0.8 | 0.7 | –0.06 | 0.18 | 0.56 | 0.55 |

*P < 0.05 (systematic difference). + Systematic difference > random error. DIF, difference between the two observers; SE DIF, standard error of the DIF.

Comparison of the methods: analogue versus 300 DPI (Table 6)

Most of the measurements proved to be equivalent for these two methods. For the variables, only the Wits appraisal was found to be better with the analogue method whereas only Pg to NB was found to be better with the digital method. The reliability of most variables proved to be acceptable. Exceptions were ‘Convexity angle’ and ‘SN to occl. Pl.’. For the increments too, the reliability of both methods seemed to be equivalent. Only four increments were measured more reliably by the digital (300 DPI) than the analogue method (facial angle, L1–NB, L1–A–Pg, and Pg–NB). In three of these increments (L1–NB, L1–A–Pg, and Pg–NB) this was due to the large total error ratio, which ranged from 2.5 to 3.8. The ‘facial angle’ increment, however, had only a small difference in total error, so that the better ‘result’ for the digital method might be mere coincidence. For approximately half the increments, the reliability of the measurements was not satisfactory.

Improvement of the 300 DPI method by using 600 DPI (Table 7)

According to the criteria, all measurements (variables and increments) have to be seen as equivalent. This equivalence, however, was not always at an acceptable level of reliability, as shown by the minus (–) sign in Table 7. This was the case for most skeletal relationship increments. Equivalence and an acceptable level of reliability were found for most other measurements (mostly incisor measurements), as shown by the addition (+) signs in Table 7.

Discussion

Many different study designs have been used to test the reproducibility of measurements (Houston, 1979; Richardson, 1981; Oliver, 1990; Davis and Mackay, 1991; Macrì and Wenzel, 1993; Nimkarn and Miles, 1995; Geelen et al., 1998; Rudolph et al., 1998; Chen et al., 2000; Liu et al., 2000). Most of these investigations tested the reproducibility of single landmarks, because the identification of landmarks is the main source of error. It was not possible to identify studies in the literature that tested reproducibility of increments as used in longitudinal cephalometric investigations. This is surprising because many longitudinal studies are carried out in the field of orthodontics. Furthermore, the influence of resolution and other digital picture quality parameters has not previously been evaluated.

The present study showed that scanning of cephalometric radiographs with a higher resolution did not influence the measurement error. Comparison of the 300 DPI and the analogue method revealed that the lower incisor increments tended to be less reproducible with the analogue method. It is, however, difficult to explain why the digital method should be more reproducible than the analogue method, especially for the lower incisor increments. Upper incisor increments were reproducible with both methods so that either method can be used. The reproducibility of the skeletal jaw relationship increments, however, was not satisfactory either for the 300 DPI or for the analogue method. This must be taken into consideration when results of longitudinal studies are interpreted. It is important to find more efficiently reproducible measurements that describe skeletal jaw relationships.

Table 6

Analogue versus 300 DPI method: total error ratios (analogue/300) and inter-observer correlation coefficients.

| Measurement | Variables (n = 40): Reliability analogue | Reliability 300 | Error ratio | Evaluation | Increments (n = 20): Reliability analogue | Reliability 300 | Error ratio | Evaluation |
|---|---|---|---|---|---|---|---|---|
| Facial angle | 0.79 | 0.88 | 1.4 | = + | 0.35 | 0.78 | 1.2 | D + |
| Convexity angle | 0.80 | 0.51 | 0.7 | = – | 0.80 | 0.50 | 0.9 | = – |
| S–N–A | 0.68 | 0.79 | 1.1 | = + | –0.07 | 0.41 | 1.3 | = – |
| S–N–B | 0.88 | 0.93 | 1.3 | = + | 0.18 | 0.48 | 1.4 | = – |
| A–N–B | 0.88 | 0.74 | 0.6 | = + | 0.68 | 0.45 | 0.9 | = – |
| Maxillary length | 0.79 | 0.72 | 1.0 | = + | 0.39 | 0.30 | 0.7 | = – |
| Mandibular length | 0.84 | 0.92 | 1.3 | = + | 0.74 | 0.62 | 0.9 | = – |
| Wits appraisal | 0.74 | 0.28 | 0.6 | A + | 0.57 | 0.16 | 0.5 | = – |
| SN to palatal plane | 0.64 | 0.78 | 1.4 | = + | –0.01 | –0.12 | 1.5 | = – |
| SN to mandibular plane | 0.89 | 0.90 | 1.1 | = + | 0.68 | 0.65 | 1.2 | = – |
| LAFH | 0.98 | 0.96 | 0.8 | = + | 0.88 | 0.61 | 0.8 | = + |
| Total face height | 0.95 | 0.97 | 1.3 | = + | 0.77 | 0.74 | 1.5 | = + |
| U1 to SN | 0.91 | 0.92 | 1.1 | = + | 0.86 | 0.88 | 1.0 | = + |
| U1 to palatal plane | 0.87 | 0.90 | 1.2 | = + | 0.81 | 0.87 | 1.3 | = + |
| U1 to N–A angle | 0.84 | 0.86 | 1.1 | = + | 0.72 | 0.86 | 1.5 | = + |
| U1 to N–A | 0.84 | 0.67 | 0.6 | = + | 0.64 | 0.64 | 0.9 | = – |
| U1 to A–Pg | 0.83 | 0.94 | 1.6 | = + | 0.83 | 0.89 | 1.7 | = + |
| Inter-incisal angle | 0.93 | 0.93 | 0.8 | = + | 0.92 | 0.92 | 1.0 | = + |
| L1 to mandibular plane | 0.93 | 0.92 | 0.6 | = + | 0.76 | 0.75 | 1.0 | = + |
| L1 to N–B angle | 0.85 | 0.93 | 1.0 | = + | 0.50 | 0.83 | 1.6 | = + |
| L1 to N–B | 0.92 | 0.98 | 1.9 | = + | 0.44 | 0.93 | 2.5 | D + |
| L1 to A–Pg | 0.77 | 0.89 | 1.8 | = + | 0.34 | 0.82 | 2.6 | D + |
| SN to occlusal plane | 0.41 | 0.42 | 0.7 | = – | 0.31 | 0.18 | 0.6 | = – |
| Pg to N–B | 0.59 | 0.94 | 3.3 | D + | 0.28 | 0.78 | 3.8 | D + |

=, methods are seen as equivalent; A, analogue method seen as superior to digital method; D, digital method seen as superior to analogue method; +, reliability at an acceptable level; –, reliability not acceptable.
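The error ratios in Table 6 are defined as the total error of the analogue method divided by the total error of the 300 DPI method. As a quick consistency check, the sketch below recomputes the ratio for a handful of rows, using total errors copied from Tables 3 and 4 and comparing them with the published ratio. The function name and the data layout are illustrative choices.

```python
def total_error_ratio(total_error_first: float, total_error_second: float) -> float:
    """Ratio of two total errors, e.g. analogue / 300 DPI."""
    return total_error_first / total_error_second


# (analogue total error, 300 DPI total error, ratio published in Table 6)
variables = {
    "Facial angle": (2.09, 1.50, 1.4),
    "Wits appraisal": (2.19, 3.98, 0.6),
    "Pg to N-B": (1.36, 0.41, 3.3),
}
increments = {
    "Facial angle": (2.70, 2.23, 1.2),
    "L1 to A-Pg": (2.19, 0.83, 2.6),
    "Pg to N-B": (1.88, 0.50, 3.8),
}

for label, rows in (("Variables", variables), ("Increments", increments)):
    for name, (analogue, dpi300, published) in rows.items():
        ratio = total_error_ratio(analogue, dpi300)
        print(f"{label:10s} {name:15s} computed {ratio:.2f}  published {published}")
```

For the rows chosen here the recomputed ratios agree with the published ones after rounding to one decimal. Other rows can differ in the last digit because the tabulated total errors are themselves rounded.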

Table 7

300 DPI method versus 600 DPI method: total error ratios (300/600) and inter-observer correlation coefficients.

| Measurement | Variables (n = 40): Reliability 300 | Reliability 600 | Error ratio | Evaluation | Increments (n = 20): Reliability 300 | Reliability 600 | Error ratio | Evaluation |
|---|---|---|---|---|---|---|---|---|
| Facial angle | 0.88 | 0.86 | 0.93 | = + | 0.78 | 0.57 | 0.9 | = – |
| Convexity angle | 0.51 | 0.67 | 1.25 | = – | 0.50 | 0.64 | 1.0 | = – |
| S–N–A | 0.79 | 0.65 | 0.85 | = + | 0.41 | –0.02 | 0.6 | = – |
| S–N–B | 0.93 | 0.85 | 0.72 | = + | 0.48 | 0.45 | 0.9 | = – |
| A–N–B | 0.74 | 0.82 | 1.07 | = + | 0.45 | 0.46 | 0.9 | = – |
| Maxillary length | 0.72 | 0.74 | 1.21 | = + | 0.30 | 0.36 | 1.2 | = – |
| Mandibular length | 0.92 | 0.86 | 0.70 | = + | 0.62 | 0.59 | 0.7 | = – |
| Wits appraisal | 0.28 | 0.50 | 1.18 | = – | 0.16 | 0.33 | 1.3 | = – |
| SN to palatal plane | 0.78 | 0.58 | 0.70 | = – | –0.12 | –0.28 | 0.9 | = – |
| SN to mandibular plane | 0.90 | 0.92 | 1.03 | = + | 0.65 | 0.68 | 1.3 | = – |
| LAFH | 0.96 | 0.96 | 1.00 | = + | 0.61 | 0.49 | 0.8 | = – |
| Total face height | 0.97 | 0.89 | 0.49 | = + | 0.74 | 0.41 | 0.5 | = – |
| U1 to SN | 0.92 | 0.90 | 0.92 | = + | 0.88 | 0.74 | 0.7 | = + |
| U1 to palatal plane | 0.90 | 0.92 | 1.02 | = + | 0.87 | 0.84 | 0.9 | = + |
| U1 to N–A angle | 0.86 | 0.91 | 1.27 | = + | 0.86 | 0.83 | 1.0 | = + |
| U1 to N–A | 0.67 | 0.80 | 1.10 | = + | 0.64 | 0.69 | 1.0 | = – |
| U1 to A–Pg | 0.94 | 0.97 | 1.22 | = + | 0.89 | 0.94 | 1.2 | = + |
| Inter-incisal angle | 0.93 | 0.94 | 1.14 | = + | 0.92 | 0.86 | 0.7 | = + |
| L1 to mandibular plane | 0.92 | 0.95 | 1.54 | = + | 0.75 | 0.81 | 1.1 | = + |
| L1 to N–B angle | 0.93 | 0.96 | 1.26 | = + | 0.83 | 0.82 | 0.9 | = + |
| L1 to N–B | 0.98 | 0.98 | 1.34 | = + | 0.93 | 0.87 | 0.7 | = + |
| L1 to A–Pg | 0.89 | 0.87 | 0.95 | = + | 0.82 | 0.67 | 0.6 | = + |
| SN to occlusal plane | 0.42 | 0.37 | 1.03 | = – | 0.18 | 0.38 | 1.2 | = – |
| Pg to N–B | 0.94 | 0.93 | 1.30 | = + | 0.78 | 0.71 | 0.9 | = + |

=, methods are seen as equivalent; +, reliability at an acceptable level; –, reliability not acceptable.
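The evaluation columns of Tables 6 and 7 follow the guideline rules given under Statistics: a reliability gap of more than 0.35 indicates superiority only when the better coefficient exceeds 0.70, and otherwise the methods count as equivalent at an acceptable (+) or unacceptable (–) level depending on the average coefficient. The sketch below is one possible reading of those rules, not the authors' procedure. The function name, labels, and the ASCII "-" used for the minus sign are illustrative choices.

```python
def evaluate(rel_first: float, rel_second: float,
             first_label: str = "A", second_label: str = "D",
             max_gap: float = 0.35, acceptable: float = 0.70):
    """Return (verdict, level) for two reliability coefficients.

    verdict: "=" for equivalent, otherwise the label of the superior method.
    level:   "+" for acceptable reliability, "-" for not acceptable.
    """
    gap = abs(rel_first - rel_second)
    better = max(rel_first, rel_second)
    # Rules 1 and 3: superiority needs a large gap and a reliable better method.
    if gap > max_gap and better > acceptable:
        verdict = first_label if rel_first > rel_second else second_label
        return verdict, "+"
    # Rule 2: equivalent methods are graded by their average reliability.
    level = "+" if (rel_first + rel_second) / 2 > acceptable else "-"
    return "=", level


# Rows taken from Table 6 (analogue vs 300 DPI).
print(evaluate(0.74, 0.28))    # Wits appraisal, variables
print(evaluate(0.80, 0.51))    # Convexity angle, variables
print(evaluate(0.35, 0.78))    # Facial angle, increments
print(evaluate(-0.07, 0.41))   # S-N-A, increments
# Row taken from Table 7 (300 vs 600 DPI).
print(evaluate(0.74, 0.41, first_label="300", second_label="600"))
```

Because the published coefficients are rounded to two decimals, borderline rows (a gap of exactly 0.35, or an average close to 0.70) may not reproduce the printed symbol with this sketch.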

As stated above, this study did not test whether the precision and accuracy were better in the X- or Y-direction for a single landmark or a combination of landmarks. Most studies, however, have reported a difference in error in the horizontal and vertical directions for single landmarks (Chen et al., 2000; Liu et al., 2000). The findings in the present investigation indicate that digital cephalometrics may be a better method for some measurements. The digital technique also has the following advantages: no need for a dark room for tracing, chemicals, or physical space for storage. It should be borne in mind, however, that digital pictures that originate from poor-quality analogue cephalometric radiographs often give an even poorer

image. This is important because poor-quality (digital) cephalometric radiographs influence the identification of landmarks. There are also differences between the various digital techniques. The advantages of a sensor technique are that the digital cephalometric radiographs are produced directly by the sensor, which is more efficient, while storage is easier and the pictures can be enhanced afterwards. The main advantages of the phosphor plate technique are that no new X-ray machine is needed and the quality of digital pictures is better than with the sensor technique (Geelen et al., 1998).

The reliability of landmark identification using different digitizing techniques has been investigated in various studies. The phosphor plate technique was found to be slightly better for cephalometrics performed on printed digital pictures compared with digital pictures assessed on screen with 170 DPI, 8 bits, and TIF format (Geelen et al., 1998). Chen et al. (2000), who used a flatbed scanner for digitizing, also agreed that digital cephalometrics could produce better results using digital pictures of 150 DPI, 8 bits. On the other hand, all authors using a video camera to digitize the cephalometric radiographs (Oliver, 1990; Macrì and Wenzel, 1993; Nimkarn and Miles, 1995) found poorer results for their digital technique compared with conventional cephalometric radiographs, using digital pictures with an unknown format and lower quality parameters, 65 DPI, 8 bits and ‘average’ original quality (Oliver, 1990), 51 DPI, unknown grey shades (Macrì and Wenzel, 1993), or unknown parameters (Nimkarn and Miles, 1995).

In the present study, pictures in standard resolution (300 DPI) and high resolution (600 DPI) with an 8-bit grey scale were used. This was necessary because magnification should still be possible without pixelizing when using an average screen resolution of 115 DPI. Grey scale is also important, since identification of landmarks is most often based on evaluation of grey shades. The use of at least a 7-bit grey scale is mandatory because fewer grey shades may lead to unreliable decisions on reproducibility of measurements (Thijssen, 1993).

The compression technique must also be taken into consideration since it could affect the grey scale or number of pixels. In the present study a ‘lossy’ compression technique (JPEG) was used. The JPEG format has been shown to have no effect on diagnostic accuracy in the field of thoracic imaging (MacMahon et al., 1991; Goldberg et al., 1994).

Conclusions

Scanning of cephalometric radiographs at a resolution of 300 DPI is sufficient for clinical purposes and comparable to analogue cephalometrics. However, all methods are poor in measuring skeletal jaw relationships longitudinally.
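
As an illustrative aside, the resolution and grey-scale figures discussed above (300 and 600 DPI scans, a 115 DPI screen, 7- and 8-bit grey scales) can be checked with simple arithmetic. The sketch below is not part of the study's method. It assumes that pixelation becomes visible once a single scanned pixel has to cover more than one screen pixel, and the function names are hypothetical.

```python
# Illustrative arithmetic only; function names are hypothetical and the
# "one scanned pixel per screen pixel" criterion is an assumption.

def max_magnification(scan_dpi: float, screen_dpi: float = 115.0) -> float:
    """Largest zoom factor at which one scanned pixel still maps to at
    most one screen pixel, under the assumption stated above."""
    return scan_dpi / screen_dpi


def grey_levels(bits: int) -> int:
    """Number of distinct grey shades available at a given bit depth."""
    return 2 ** bits


if __name__ == "__main__":
    for dpi in (300, 600):
        zoom = max_magnification(dpi)
        print(f"{dpi} DPI scan: about {zoom:.1f}x on a 115 DPI screen")
    for bits in (7, 8):
        print(f"{bits}-bit grey scale: {grey_levels(bits)} shades")
```

Under this assumption a 300 DPI scan leaves room for roughly 2.6 times magnification on a 115 DPI screen and a 600 DPI scan for roughly 5.2 times, while the 8-bit grey scale used here offers 256 shades against the 128 of the 7-bit minimum.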

Address for correspondence

Professor A. M. Kuijpers-Jagtman
Department of Orthodontics and Oral Biology
P.O. Box 9101
117 Tandheelkunde
6500 HB Nijmegen
The Netherlands

References

Baumrind S, Frantz R C 1971a The reliability of cephalometric radiograph measurements. 1. Landmark identification. American Journal of Orthodontics 60: 111–127
Baumrind S, Frantz R C 1971b The reliability of head film measurements. 2. Conventional angular and linear measures. American Journal of Orthodontics 60: 505–517
Björk A, Solow B 1961 Measurements on radiographs. Journal of Dental Research 41: 672–683
Chen Y J, Chen S K, Chang H F, Chen K C 2000 Comparison of landmark identification in traditional versus computer-aided digital cephalometry. Angle Orthodontist 70: 387–392
Davis D N, Mackay F 1991 Reliability of cephalometric analysis using manual and interactive computer methods. British Journal of Orthodontics 18: 105–109
Geelen W, Wenzel A, Gotfredsen E, Kruger M, Hansson L G 1998 Reproducibility of cephalometric landmarks on conventional film, hardcopy, and monitor-displayed images obtained by the storage phosphor technique. European Journal of Orthodontics 20: 331–340
Goldberg M A, Pivovarov M, Mayo-Smith W W 1994 Application of wavelet compression to digital radiographs. American Journal of Roentgenology 163: 463–468
Houston W J 1979 The application of computer aided digital analysis to orthodontic records. European Journal of Orthodontics 1: 71–79
Houston W J B 1983 The analysis of errors in orthodontic measurements. American Journal of Orthodontics 83: 382–390
Houston W J B, Maher R E, McElroy D, Sherriff M 1986 Sources of error in measurements from cephalometric radiographs. European Journal of Orthodontics 8: 149–151
Lim K F, Foong K W 1997 Phosphor-stimulated computed cephalometry: reliability of landmark identification. British Journal of Orthodontics 24: 301–308
Liu J-K, Chen Y C, Cheng K S 2000 Accuracy of computerized automatic identification of cephalometric landmarks. American Journal of Orthodontics and Dentofacial Orthopedics 118: 535–540
MacMahon H, Doi K, Sanada S 1991 Data compression: effect on diagnostic accuracy in digital chest radiography. Radiology 178: 175–179

Macrì V, Wenzel A 1993 Reliability of landmark recording on film and digital lateral cephalograms. European Journal of Orthodontics 15: 137–148
Nimkarn Y, Miles P G 1995 Reliability of computer-generated cephalometrics. International Journal of Adult Orthodontics and Orthognathic Surgery 10: 43–52
Oliver R G 1990 Cephalometric analysis comparing five different methods. British Journal of Orthodontics 18: 277–283
Richardson A 1981 A comparison of traditional and computerised methods of cephalometric analysis. European Journal of Orthodontics 3: 15–20

Rudolph D J, Sinclair P M, Coggins J M 1998 Automatic computerised radiographic identification of cephalometric landmarks. American Journal of Orthodontics and Dentofacial Orthopedics 113: 173–179
Stirrups D R 1989 A comparison of the accuracy of cephalometric landmark location between two screen/film combinations. Angle Orthodontist 59: 211–215
Thijssen M 1993 [Definition and control of image quality in radiographic diagnostics] Bepaling en bewaking van de beeldkwaliteit in de radiodiagnostiek. PhD Thesis. Universiteitsdrukkerij KUN (Netherlands), pp. 1–164