International Journal of Epidemiology © International Epidemiological Association 1997

Vol. 26, No. 1(Suppl. 1) Printed in Great Britain

Pilot Phase Studies on the Accuracy of Dietary Intake Measurements in the EPIC Project: Overall Evaluation of Results

RUDOLF KAAKS, NADIA SLIMANI AND ELIO RIBOLI

Kaaks R (International Agency for Research on Cancer, 150 cours Albert Thomas, 69372 Lyon Cedex 08, France), Slimani N and Riboli E. Pilot phase studies on the accuracy of dietary intake measurements in the EPIC project: overall evaluation of results. International Journal of Epidemiology 1997; 26 (Suppl. 1): S26–S36.

Background. As part of the European Prospective Investigation into Cancer and Nutrition (EPIC), preliminary studies were conducted to evaluate the accuracy of individuals’ dietary intake measurements from newly developed questionnaires.

Methods. In six countries that adhered from the very beginning to the multicentre, co-ordinated EPIC project, the validity studies were based on two repeat questionnaire measurements at the start and at the end of a 1-year period, in groups of about 100 volunteers of both sexes. In addition, during this year, up to 12 24-hour recalls per person were taken monthly, and up to four blood and urine specimens were collected for measurement of biochemical markers. In three countries that joined EPIC later, the designs of the validity studies and type of ‘reference’ measurement chosen were somewhat different. The results presented in this overview paper are taken partly from more detailed, country-specific publications, and partly from a central (re-)analysis of the original data, to ensure a uniform approach to the statistical analyses and presentation.

Results. Averaged over subgroups by country and gender, Spearman coefficients of correlation between questionnaire measurements and the individuals’ average 24-hour recalls ranged from 0.37 for fish to 0.68 for dairy products and 0.79 for alcoholic beverages.
For energy-adjusted nutrient intakes (or nutrient densities, in the UK), mean Pearson correlation coefficients, corrected for residual attenuation due to day-to-day variations in the 24-hour recalls in all but two countries, ranged from 0.37 for retinol and 0.48 for vitamin E to 0.60 for carbohydrates and 0.82 for total alcohol intake. Correlations between energy-adjusted nutrient intakes and biochemical markers on average were low, but varied considerably between study centres.

Conclusions. On average, most estimated correlation coefficients were of similar magnitude to those observed by independent research groups. The role of the preliminary validity studies, and various benefits drawn from these studies for further planning of the EPIC project are discussed.

Keywords: dietary questionnaires, 24-hour diet recall, validation, prospective studies, calibration, EPIC

The conduct of a large prospective cohort study on diet and indicators of nutritional status in relation to chronic disease endpoints including cancer, diabetes and cardiovascular disease is obviously an exercise that requires careful planning, both from a scientific point of view (i.e. determining the main hypotheses to be evaluated and hence what type of information should be collected) as well as from a financial and logistic viewpoint (i.e. evaluating the question of how to maximize the amount and quality of the information collected, given a total investment of time and research funds). In the case of the EPIC project, which involves several hundred thousand participants in initially seven, now nine, Western European countries,1 a 2-year pilot phase was planned to develop a detailed and well-standardized protocol for the mechanisms of recruitment of study participants, collection of questionnaire data on dietary and non-dietary risk factors, collection of anthropometric measurements and blood samples, and follow-up for cancer and other forms of chronic disease, in each of about 20 participating study centres. A particularly important component of the pilot phase was the development of questionnaires for accurate assessment of individuals’ habitual dietary intake patterns, and the preliminary evaluation of whether these questionnaires would yield measurements with a sufficient level of accuracy: clearly, for a study the size of the EPIC project focusing on diet as a major determinant of health and disease, it was important to invest in such preliminary studies in order to evaluate whether dietary questionnaire measurements reached minimum standards of accuracy, and to spot potential weaknesses in dietary questionnaires before their application to several tens of thousands of study participants per country.

In this paper we briefly summarize the principal criteria required for the dietary questionnaires to be used in the various EPIC centres, as well as a few methodological innovations that were tested. We then present the main results of the preliminary validity studies, and discuss the role of these studies in the development of dietary assessment methods, as well as in the broader context of the future data analyses to investigate relations between dietary intake patterns and occurrence of disease.

In this supplement, individual papers give reports in full detail of the development and validation of the dietary questionnaires in France,2 Germany,3,4 Greece,5 Italy,6 the Netherlands,7,8 Spain,9–11 Sweden,12 and the UK.13 For Denmark, fully detailed reports have appeared previously, in another issue of this journal.14,15

STRUCTURE OF THE DIETARY QUESTIONNAIRES

The structure and content of the dietary questionnaires were to some degree dictated by two types of consideration. Firstly, as the questionnaires were to be used for very large numbers of subjects, and as in most countries it was financially impossible to have the questionnaires filled out during a personal interview with a dietician, the questionnaires had to be self-administered and thus simple enough to be filled out by the study subjects independently. It was therefore decided to develop precoded, self-administered food frequency questionnaires (FFQ) for use in France, Germany, Greece, Italy, and the Netherlands. In Spain and in Ragusa (Sicily), however, where previous experience had indicated that the rates and quality of response to a self-administered questionnaire might be unacceptably low, it was decided to develop questionnaires to be used for direct interviews by a dietician. Besides being easy to fill out, most of the questionnaires were designed to be read by optical scanning, or otherwise had to be designed for quick manual data entry, which implied the extensive use of pre-coded options for answers. In the UK, where the validity study was planned prior to and undertaken initially independently of those in the other EPIC centres, it was decided to combine a short self-administered FFQ with a 7-day food consumption diary. In Denmark, where the validity study had also started independently of EPIC, an FFQ similar to those of the other EPIC countries was developed.14 Finally, in Sweden, where the cohort study started 3 years before EPIC, a food diary was used for the assessment of cooked dishes consumed mainly during dinner, plus a simplified FFQ for simple foods consumed predominantly during breakfast and lunch.

A second consideration was that the questionnaires should provide measurements of the individuals’ habitual consumption of all main food items during the previous year. The estimation of specific types of food consumed, as well as of their average frequency of consumption and usual portion sizes, should be sufficiently accurate and detailed to allow the calculation of total energy intake and of intake levels of a minimum (core) list of nutrients and major food groups. It was recognized that, with the large number of potential disease outcomes (different types of cancer, but also other forms of chronic disease), many different aspects of diet had to be considered as potential risk factors; new aetiological hypotheses and more detailed food composition tables might be developed in the course of the cohort studies. Thus it was felt, for instance, that attention should be given to accurate assessment of intakes of different types and sources of fat, given the hypotheses relating high fat intakes to, for example, cancers of the colon or female breast. It was also found important to measure accurately the consumption of different types of fruits, vegetables and cereals (given the growing interest in the possible preventive activity of various plant food constituents16), and to measure accurately the consumption of meat, fish, and foods rich in carbohydrates with low or high glycaemic indices.17 It was recognized that inferences would focus on the relation between disease risk and dietary composition, after adjustment for total energy intake.18 The latter argument implied that estimates of between-subject differences in total energy intake would be needed.
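The adjustment for total energy intake mentioned above can be illustrated with a minimal sketch. It assumes the regression-residual form of adjustment (Table 1 of this paper refers to 'energy-adjusted nutrient residuals'); the function name and the four example intakes are hypothetical and are not taken from the EPIC data.

```python
def energy_adjusted_residuals(nutrient, energy):
    """Residuals of a simple least-squares regression of nutrient on energy.

    Assumption: one residual per subject, computed on whatever scale the
    inputs are given in (the paper log-transforms its variables first).
    """
    n = len(energy)
    if n != len(nutrient) or n < 3:
        raise ValueError("need paired observations for at least three subjects")
    mean_e = sum(energy) / n
    mean_n = sum(nutrient) / n
    sxx = sum((e - mean_e) ** 2 for e in energy)
    if sxx == 0:
        raise ValueError("energy intake does not vary between subjects")
    sxy = sum((e - mean_e) * (v - mean_n) for e, v in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_n - slope * mean_e
    # What is left of the nutrient intake once energy intake is accounted for.
    return [v - (intercept + slope * e) for e, v in zip(energy, nutrient)]


# Hypothetical intakes for four subjects (kcal/day and g fat/day).
energy_kcal = [1800.0, 2200.0, 2500.0, 3000.0]
fat_g = [60.0, 85.0, 90.0, 120.0]
residuals = energy_adjusted_residuals(fat_g, energy_kcal)
```

By construction the residuals sum to zero and are uncorrelated with energy intake, which is why between-subject estimates of total energy intake are needed before dietary composition can be analysed in this way.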
As there was (and still is) growing interest in the potentially beneficial effects of some specific ‘nonnutritive’ substances that may be found in a few specific foods or families of foods (e.g. cruciferous vegetables), it was also felt that the individuals’ food intakes should be measured with sufficient specificity and completeness to allow the estimation of intake levels of such ‘anti-carcinogens’ as soon as more detailed food composition tables become available. A recent example of a study on a specific potential anticarcinogen was the study by Hertog et al.19 on cancer risk in relation to the intake of quercetin (a bioflavonoid) using data from the Seven Countries Study started some 30 years ago. Intake levels of quercetin, estimated by means of a specially developed food

composition table, appeared to be determined primarily by only a few foods (tea, red wine, onions and apples) that are generally classified in different food groups. The study by Hertog et al. illustrates how estimated intakes of relatively specific food items may later be combined in a meaningful manner that was not initially foreseeable and thus how important it is to obtain detailed, specific estimations of consumption levels of individual food items. Besides these basic considerations, there were several attempts to make innovative adaptations to the questionnaires in order to improve the accuracy of the dietary intake assessments: 1. It was decided to use ‘semi-quantitative’ FFQ, with questions not only about the frequency of consumption of different food items but, for a number of foods, also about habitual portion sizes. To increase the accuracy of portion size estimation, photographs of food portions of various sizes were included. Traditional FFQ ask information only about the frequency of consumption of foods and dishes. Early work by Abramson et al.,20 confirmed later by other research groups,21 had shown that the frequency of food consumption is indeed the main determinant of between-person variations in measured dietary intake levels. Several more recent studies had suggested, however, that the accuracy of dietary questionnaire measurements could be slightly improved by asking questions also about habitual portion sizes, especially for foods or dishes where portion sizes may vary substantially between individuals.22 Evaluations of whether the use of more detailed information about habitual food portion sizes improved the accuracy of measurements were also made in the pilot studies conducted in Denmark,15,23,24 and in the Netherlands,7,8 by comparison with the estimated accuracy when fixed portion size values were applied to all individuals. 2. 
In a number of countries (France, Spain and Italy), where it is usual to have a warm dish or salads twice a day, at lunch and for dinner, questions about the frequency of consumption and habitual portions of the same type of food were asked separately for both meals. The reasoning was that the frequency of consumption, or the average portion size of some foods or dishes (e.g. a mixed salad) may differ systematically between lunch (frequently consumed outside the home) and dinner (usually prepared at home). Likewise, in Italy it was decided to ask separate questions for foods that can be consumed either as a starter (‘primo piatto’), or as main course (‘secondo piatto’). By asking questions separately by type of meal or by the first and second course of lunch or dinner, it was hoped to improve the accuracy of the answers about frequency and usual portion size.

3. Previous experience had shown that the longer the list of detailed food items belonging to a given main food group (e.g. many different types of vegetables), the higher the total level of consumption reported. In an attempt to overcome this problem questions were asked first about the frequency of consumption of main categories of food (e.g. meat, cheese, etc.) to estimate the overall amounts of these foods consumed. Additional questions focused on qualitative differences within each of the main categories such as, for example, the relative frequencies of consumption of specific types of meat (fat or lean pork, veal, beef, mutton). A similar structure was used for complex dishes (e.g. the Italian vegetable soup ‘minestrone’) where recipes may vary substantially and systematically between individuals, asking first about habitual frequency and portion size of the dish, and then about frequencies with which certain amounts of specific ingredients would usually be included.

GLOBAL REVIEW OF RESULTS FROM THE VALIDITY STUDIES

The principal requirement for the dietary questionnaires to be developed was that they should provide the best possible ranking of individuals by their habitual intake levels of foods and nutrients. This ranking capacity was evaluated by estimating the correlation coefficient ρQT between questionnaire measurements and true habitual intake levels of different foods and nutrients. This correlation coefficient can also be referred to appropriately as the ‘validity coefficient’,25 as its square indicates what proportion of the observed variation in questionnaire measurements reflects between-individual variation in true habitual intake levels, and which part is uncorrelated with true intake levels and should thus be considered due to ‘random’ measurement errors.

Figure 1 shows the basic design of the validity studies for France, Germany, Greece, Italy, the Netherlands and Spain. Questionnaire measurements were obtained at the beginning and end of a 1-year period. During the interval between the two questionnaire measurements, monthly 24-hour recalls were obtained, as far as possible covering all the days of the week equally. In addition, it was planned to collect three blood samples and three 24-hour urines from every participant at intervals of several months; in most countries, at least two blood samples and urine collections were actually obtained. In the UK, Denmark, and Sweden, which joined the EPIC project after the start of their preliminary validity studies, the designs of the validity studies were different (see detailed descriptions elsewhere in this journal).12,13,15

[Figure 1: timeline over months 1 to 12 showing the questionnaires (2 planned), the monthly 24-hour diet recalls (12 planned), the 24-hour urine collections and the blood samples, with the total planned number of measurements or samples for each.]

FIGURE 1 Design of preliminary studies of validity of dietary questionnaire measurements

As discussed further in another paper in this supplement,26 the estimation of the correlation coefficient ρQT relies on forms of statistical modelling in which the individuals’ true intake levels (T) must be considered as values of a ‘latent variable’, because error-free measurements of the true intake level are not available. A crucial assumption in this type of modelling is that, for a minimum number of measurements taken on the same individual, random measurement errors must be uncorrelated; that is, conditional on true intake level, the measurements must be statistically independent. Briefly, this assumption means that correlations between the measurements must be due solely to the fact that the measurements are associated with the same latent variable, defined here by the individuals’ habitual true intake levels of nutrients and foods.

Given the basic design of the EPIC validity studies, the simplest approach to estimate the validity coefficient ρQT was to take the square root of the observed sample correlation coefficient ρQ1,Q2 between the first and the second questionnaire measurements.25,27 However, this estimate can be interpreted as unbiased only if the following two assumptions hold: (a) on both occasions, the questionnaire measurements have identical correlations with true intake levels (i.e. the measurements can be called ‘parallel’);25 and (b) all variations in the measurements that are uncorrelated with true intake level (i.e. ‘random’ measurement errors) are statistically independent (uncorrelated) between the first and second questionnaire measurements.
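The first analytical approach can be sketched in a few lines. The simulation below is an illustration only, not the authors' analysis: it assumes two parallel questionnaire measurements with independent, normally distributed errors (assumptions (a) and (b) above), and all function names, the sample size and the error standard deviation are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validity_upper_limit(q1, q2):
    """Square root of the reproducibility correlation between Q1 and Q2."""
    r = pearson(q1, q2)
    if r < 0:
        raise ValueError("negative reproducibility correlation has no real square root")
    return math.sqrt(r)


def simulate_parallel_questionnaires(n, error_sd, seed):
    """Simulated true intakes T and two measurements Q = T + independent error."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
    q1 = [t + rng.gauss(0.0, error_sd) for t in truth]
    q2 = [t + rng.gauss(0.0, error_sd) for t in truth]
    return truth, q1, q2


truth, q1, q2 = simulate_parallel_questionnaires(20000, 1.0, seed=1)
upper = validity_upper_limit(q1, q2)   # estimated from observable data only
actual = pearson(q1, truth)            # known here only because T is simulated
```

With equal variances for true intake and error, both quantities should be close to the square root of 0.5, which is the sense in which the estimate is unbiased when the two assumptions hold.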

In practice, these two assumptions may not hold true. In particular, the second assumption—independence of random errors—may be violated: in reality, it is not unlikely that individuals whose first questionnaire measurement (Q1) was an over- or underestimate of their true intake level will tend to make an error of similar sign and magnitude when asked to fill in the same questionnaire one year later. Thus, a high correlation ρQ1,Q2 between the questionnaire measurements Q1, Q2 may only partially reflect the fact that each measurement is associated with true intake level, and partially may also be due to the fact that individuals tend to repeat their specific errors. The square root of the correlation ρQ1,Q2 may thus generally result in an overestimate of the correlation ρQT, and therefore should rather be interpreted as an estimated upper limit of this validity coefficient. Estimates of these upper limits from the EPIC pilot-phase studies (i.e. the square roots of the reproducibility coefficients ρQ1,Q2) are shown in Table 1, for total energy intake and for energy-adjusted nutrient intakes. A second approach to estimate the validity coefficients for different foods and nutrients was to (i) calculate the correlation between questionnaire measurements (e.g. for the first measurement, Q1) with the individuals’ averages of their 12 replicate 24-hour diet recalls; and (ii) correct these ‘crude’ correlation coefficients for attenuation effects due to residual random error in the individuals’ average 12 24-hour diet recalls, using a univariate analysis of variance to estimate within- and between-subject components of variance in the average 24-hour diet recalls.28 Following this second approach, it was assumed that sources of error in the questionnaire measurements and 24-hour diet recalls—two methods of a different nature—would be diverse enough for the errors to be statistically independent. As in the first analytical approach (i.e. 
based on the reproducibility of questionnaire measurements), a violation of this assumption in the form of a positive covariance between the errors of both types of measurement would tend to result in an overestimation of the validity coefficient. Such a violation seems less likely than in the first analytical approach, however, given the more probable independence of the sources of error in the two types of measurement. An additional assumption was that variations in the 24-hour diet recalls uncorrelated with the individuals’ habitual, long-term intake level (i.e. random errors) were due mainly to variations in true intake levels, and that these were uncorrelated at 1-month intervals. A violation of this second assumption in the form of a positive covariance between the random errors of replicate 24-hour recalls would lead to an underestimation of the attenuation effect and therefore to an underestimation of the validity coefficient ρQT. Estimates of the validity coefficients obtained by the second analytical approach are also shown in Table 1.

TABLE 1 Reproducibility and estimated validity coefficients of dietary questionnaire measurements of habitual nutrient composition of diet, in the nine countries participating in the EPIC project

A) Countries where the EPIC pilot phase studies have been planned jointly according to a standard protocol

                          France  Germany Germany Greece  Greece  Italy   Italy   Neth.   Neth.   Spain   Spain
                          F       F       M       F       M       F       M       F       M       F       M
No. of subjects √ρQ1,Q2a  105     49      55      38      42      143     46      58      63      35      31
                Valb      115     49      55      38      42      150     47      58      63      33      31
Energy          √ρQ1,Q2   0.82    0.75    0.79    0.78    0.79    0.78    0.77    0.87*   0.89*   0.81    0.88
                Val       0.46    0.29    0.26    0.50*   0.66*   0.35*   0.39*   0.62*   0.77*   0.79    0.82
Alcohol         √ρQ1,Q2   0.83    0.97    0.91    0.82    0.87    0.89    0.89    0.97*   0.94*   0.86    0.89
                Val       0.71*   0.91    0.89    0.58*   0.90*   0.80*   0.76*   0.87*   0.85*   0.77    0.86
Protein         √ρQ1,Q2   0.79    0.81    0.73    0.75    0.85    0.81    0.46    0.84*   0.85*   0.67    0.53
                Val       0.59    0.54    0.47    0.44*   0.56*   0.48*   0.35*   0.67*   0.71*   0.51    0.58
Fat             √ρQ1,Q2   0.75    0.62    0.66    0.75    0.62    0.70    0.64    0.89*   0.80*   0.63    0.71
                Val       0.52    0.35    0.39    0.26    0.09    0.41*   0.31*   0.63*   0.61*   0.87    0.45
Carbohydrates   √ρQ1,Q2   0.81    0.71    0.75    0.62    0.65    0.76    0.89    0.94*   0.85*   0.70    0.87
                Val       0.77    0.52    0.70    0.20    0.36    0.54*   0.52*   0.76*   0.74*   0.85    0.76
Dietary fibre   √ρQ1,Q2   0.84    0.79    0.81    0.85    0.81    0.79    0.82    0.87*   0.85*   0.81    0.89
                Val       0.69    0.49    0.73    0.19*   0.41*   0.51*   0.51*   0.74*   0.61*   0.68    0.83
Retinol         √ρQ1,Q2   0.79    0.81    0.62    0.55    0.82    0.75    0.73    0.77*   0.82*   0.54    0.75
                Val       0.29*   0.26    0.45    0.17*   0.24*   0.46*   0.37*   0.62*   0.29*   0.40    0.34
β-carotene      √ρQ1,Q2   0.81    0.76    0.66    0.48    0.86    0.80    0.82    0.79*   0.87*   0.76    0.70
                Val       0.81    0.59    0.94    0.16*   0.31*   0.42*   0.47*   0.31*   0.32*   0.73    0.38
Vitamin C       √ρQ1,Q2   0.87    0.70    0.71    0.82    0.66    0.83    0.81    0.84*   0.87*   0.89    0.75
                Val       0.69*   0.66    0.48    0.33*   0.34*   0.49*   0.44*   0.71*   0.43*   0.91    0.72
Vitamin E       √ρQ1,Q2   0.64    0.72    0.79    –       –       0.79    0.77    0.79*   0.80*   0.59    0.73
                Val       0.42*   0.43    0.29    –       –       0.38*   0.48*   0.41*   0.58*   0.55    0.57

B) Countries where the pilot phase studies had been initiated before joining the EPIC project, and were based on different designs

                          UKe     Denmark Denmark Swedenf Swedenf
                          F       F       M       F       M
No. of subjects √ρQ1,Q2   –       –       –       60      60
                Val       127*    58*     59*     55      44
Energy          √ρQ1,Q2   –       –       –       0.85    0.82
                Val       0.52*+  0.23*+  0.40*   0.57    0.62
Alcohol         √ρQ1,Q2   –       –       –       0.92    0.87
                Val       0.90*   –       –       0.87    0.80
Protein         √ρQ1,Q2   –       –       –       0.71    0.73
                Val       0.66*   0.26*   0.52*   0.43    0.61
Fat             √ρQ1,Q2   –       –       –       0.79    0.69
                Val       0.63*   0.48*   0.67*   0.64    0.54
Carbohydrates   √ρQ1,Q2   –       –       –       0.78    0.75
                Val       0.68*   0.47*   0.40*   0.73    0.65
Dietary fibre   √ρQ1,Q2   –       –       –       0.84    0.81
                Val       0.72*   0.53*   0.39*   0.63    0.76
Retinol         √ρQ1,Q2   –       –       –       0.84    0.79
                Val       0.56*   0.45*   0.27*   0.41    0.36
β-carotene      √ρQ1,Q2   –       –       –       0.75    0.75
                Val       0.55*   –       –       0.54    0.36
Vitamin C       √ρQ1,Q2   –       –       –       0.85    0.83
                Val       0.67*   0.51*   0.64*   0.70    0.65
Vitamin E       √ρQ1,Q2   –       –       –       0.85    0.88
                Val       –       0.39*   0.45*   0.81    0.48

Values for all nutrients other than energy are for energy-adjusted nutrient residuals. F = females; M = males; Neth. = Netherlands. All variables were log-transformed before analysis.
a Square root of the Pearson coefficient of correlation between two repeat questionnaire measurements, taken 1 year apart; this square root gives an estimated upper limit of the ‘validity’ coefficient.
b ‘Validity’ coefficients. Pearson coefficient of correlation between questionnaire measurements and reference measurements (24-hour recalls or food consumption records), corrected for attenuation due to random error in the reference measurements.
e Results presented are for the ‘Oxford’ version of the questionnaire. (See 13)
f Results presented are for the B version of the questionnaire. (See 12)
* Values found in more detailed publications (in this supplement, or elsewhere); all values without asterisk were (re)computed at the IARC, from the original data, for uniformity of statistical analysis and presentation. Description of development and structure of the questionnaires, and of the design of the preliminary validity studies have also been given in these country-specific publications.
+ Spearman correlation.

For most nutrients and in most study centres the estimated validity coefficients were indeed lower than their estimated upper limits. Nevertheless, in a number of cases the estimated validity coefficient exceeded its estimated upper limit. This was most marked for β-carotene in German men (0.94 versus 0.66), where the high validity coefficient also contrasts sharply with the lower values found in other centres, and for carbohydrates (0.85 versus 0.70) and fats (0.87 versus 0.63) in Spanish women. One explanation for these somewhat paradoxical observations may be simply a lack of precision of the estimated validity coefficient (as estimated by the second analytical approach), and of its estimated upper limit (as estimated by the first approach). In theory, this explanation is more likely to hold in situations where the errors of repeat questionnaire measurements (Q1, Q2) are either truly independent or the variance of these errors is relatively small, or both; in these situations, the estimated validity coefficient and its estimated upper limit are expected to be about the same, and may therefore be so close as to be inverted easily as a result of random fluctuations in the estimates. The measurements of alcohol intake, with apparently rather small random errors given the high estimated correlations ρQ1,Q2 between the first and second questionnaire measurements, and given the high estimated validity coefficients ρQT from the (corrected) correlations with average 24-hour diet recalls (Table 1), seem to illustrate this case. A second possible explanation for the inversion between the estimated validity coefficient and its estimated upper limit is that one or more model assumptions were violated.
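The second analytical approach (a crude correlation with the mean of replicate 24-hour recalls, corrected for attenuation using within- and between-subject variance components) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a balanced design with the same number of recalls per subject, uses the common correction factor sqrt(1 + (within/between)/k), and runs on simulated data with hypothetical names and parameter values.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_components(recalls):
    """One-way analysis of variance: (within, between) subject variance."""
    n, k = len(recalls), len(recalls[0])
    means = [sum(row) / k for row in recalls]
    grand = sum(means) / n
    within = sum((v - m) ** 2 for row, m in zip(recalls, means) for v in row) / (n * (k - 1))
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    return within, (ms_between - within) / k


def deattenuated_correlation(questionnaire, recalls):
    """Return (crude, corrected) correlation with the mean of k recalls."""
    k = len(recalls[0])
    means = [sum(row) / k for row in recalls]
    crude = pearson(questionnaire, means)
    within, between = variance_components(recalls)
    if between <= 0:
        raise ValueError("no detectable between-subject variance in the recalls")
    return crude, crude * math.sqrt(1.0 + within / (k * between))


# Simulated example: 5000 subjects, 12 recalls each, large day-to-day variation.
rng = random.Random(2)
truth = [rng.gauss(0.0, 1.0) for _ in range(5000)]
questionnaire = [t + rng.gauss(0.0, 1.0) for t in truth]
recalls = [[t + rng.gauss(0.0, 2.0) for _ in range(12)] for t in truth]
crude, corrected = deattenuated_correlation(questionnaire, recalls)
```

In this simulation the errors of the questionnaire and of the recalls are independent by construction, so the corrected coefficient should recover the true validity coefficient (about 0.71) from a visibly attenuated crude value; with correlated errors, as discussed in the text, it would not.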
For instance, the validity coefficients may have been overestimated due to a positive correlation between the random errors of the questionnaire measurements and 24-hour diet recalls. The fact that four out of seven situations where the estimated validity coefficient exceeded its estimated upper limit were observed in the Spanish validity study (for protein in men, and for carbohydrates, fat, and vitamin C in women) suggests that some violations of model assumptions (e.g. some correlation of errors between 24-hour diet recalls and the diet history interviews) may have occurred, perhaps more systematically in Spain than in the other participating centres. A third approach to evaluate the validity of dietary questionnaire measurements was to compare these measurements with biological markers of diet. Three biomarkers measured in blood serum were vitamin C,

vitamin E and β-carotene. Although these markers cannot be taken simply as a substitute measurement of the daily intake level of these several compounds,29 they do provide a correlate of the intake level.29,30 Another marker was the 24-hour urinary nitrogen excretion which, rather exceptionally in the field of biomarkers of diet, can be translated into a measurement of absolute daily intake levels of protein.31,32 Table 2 shows the partial correlations between the various markers and measurements of nutrient intakes obtained at start, adjusted for the questionnaire’s measurement of total energy intake. To account for the fact that plasma levels of vitamin E and β-carotene tend to be higher when plasma levels of lipids are high,32 the correlations between these markers and intake levels were also adjusted for concentrations of total cholesterol, as an indicator of the plasma lipid levels. Overall, the correlations between questionnaire measurements and biomarkers were rather weak, with very few values greater than 0.50. This relative weakness of correlations between questionnaire measurements and biomarkers has also been observed by other researchers (reviews by Willett21 and van ‘t Veer et al.33). It may be explained partially by factors related to the absorption, postabsorptive metabolism, or even physiological regulation of nutrient levels, which can be important determinants of variations in the marker independently of the true nutrient intake level; this may be the case especially for markers that were based on the plasma concentration of a specific compound, i.e. vitamin C, vitamin E, and β-carotene. If the effects of such intervening factors vary systematically between individuals, this will affect the validity of the marker as a measurement (or correlate) of dietary intake level, and correlations with questionnaire measurements may remain weak even if the average of several replicate biomarkers measurements is taken for each individual. 
Therefore, even when adjustments are made for the attenuation due to random (day-to-day) variations in the marker, the correlation between questionnaire measurements and biomarkers often can be interpreted only as an estimated lower limit for the validity coefficient ρQT.

For food groups, only unadjusted Spearman coefficients of correlation between questionnaire measurements and the individuals’ average 24-hour recalls were computed (Table 3). For all countries and both sexes combined, mean correlation coefficients ranged from 0.37 for fish to 0.79 for alcoholic beverages. The main reasons why adjustment was not made for total energy intake and/or attenuation effects, as was done in the analyses for nutrients, were of a technical nature: population distributions of the consumption levels of individual food items, but also of larger categories of foods, cannot be transformed easily to normality because they can be considered as a mixture of two different types of distribution: (i) a binomial distribution, indicating the probability that a subject does or does not consume a given type of food, and (ii) a right-skewed (e.g. approximately log normal) distribution of non-zero values (i.e. for consumers only). The statistical approaches generally used to estimate de-attenuated correlation or regression coefficients in dietary validity studies may provide biased results when applied to measured intake values with non-normal population distributions.

TABLE 2 Pearson’s partial correlation coefficients between biomarkers and measured nutrient intakes from questionnaire 1, adjusted for total energy intake, or for total energy intake plus blood level of cholesterol

                    Dietary proteina and      Dietary vitamin Ca and    β-caroteneb in diet       Dietary vitamin Eb and
                    urinary nitrogen          serum ascorbic acid       and serum                 serum α-tocopherol
                    ρ(Q1,M)c     ρ(RM,M)d     ρ(Q1,M)      ρ(RM,M)      ρ(Q1,M)      ρ(RM,M)      ρ(Q1,M)      ρ(RM,M)
France       Women  0.37 (78)e   0.58 (82)    0.45 (92)    0.49 (94)    0.27 (91)    0.36 (93)    0.28 (72)    0.14 (73)
Germany      Women  0.36 (55)    0.47 (55)    0.15 (55)    0.34 (55)    0.27 (55)    0.27 (55)    0.04 (55)    –0.01 (55)
             Men    0.24 (48)    0.32 (48)    0.32 (49)    0.38 (49)    0.48 (49)    0.33 (49)    0.19 (49)    0.36 (49)
Greece       Women  0.30 (38)    0.53 (38)    0.04 (38)    0.50 (38)    –0.25 (38)   –0.01 (38)
             Men    0.41 (42)    0.57 (42)    0.50 (41)    0.43 (41)    0.37 (41)    0.39 (41)
Italy        Women  0.31 (155)   0.32 (150)   0.29 (93)    0.39 (93)    0.27 (154)   0.25 (148)   0.18 (154)   0.23 (148)
             Men    0.19 (56)    0.60 (48)    –            –            0.34 (49)    0.25 (47)    0.13 (49)    –0.08 (47)
Netherlands  Women  0.54 (61)    0.51 (61)                              0.18 (61)    0.22 (61)    0.11 (61)    0.31 (61)
             Men    0.40 (68)    0.47 (68)                              –0.13 (68)   0.20 (68)    0.29 (68)    0.47 (68)
Spain        Women  0.45 (29)    0.21 (38)    0.65 (29)    0.62 (39)    0.38 (29)    0.48 (40)    0.44 (29)    0.36 (40)
             Men    0.12 (24)    0.55 (35)    –0.03 (19)   0.44 (32)    0.04 (18)    0.42 (31)    0.31 (18)    0.34 (31)
Sweden       Women  0.07 (12)    0.53 (12)    0.55 (53)    0.55 (53)    0.53* (52)   0.46* (52)   0.12* (53)   0.02* (53)
             Men    0.27 (13)    0.78 (13)    0.47 (44)    0.51 (44)    0.25* (42)   0.15* (42)   0.11* (44)   0.05* (44)

All variables were log-transformed before analysis. All values were computed at the IARC, from original data.
* Partial correlation with energy only (no adjustment for blood cholesterol).
a Partial correlation adjusted for energy.
b Partial correlation adjusted for energy and cholesterol.
c Correlation between questionnaire and biomarker.
d Correlation between Reference Method and biomarker.
e Figures in brackets show number of individuals on which the analysis was based.
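The partial correlations reported in Table 2 adjust the questionnaire-biomarker correlation for total energy intake and, for the fat-soluble markers, also for blood cholesterol. A minimal sketch of such a partial correlation is given below; it uses the standard recursive formula for removing one covariate at a time, runs on simulated values that are assumed to be already log-transformed, and all names and effect sizes are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, covariates):
    """Correlation of x and y after removing the listed covariates in turn."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[-1], covariates[:-1]
    r_xy = partial_correlation(x, y, rest)
    r_xz = partial_correlation(x, z, rest)
    r_yz = partial_correlation(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Simulated data: intake and marker share an energy component, and the marker
# also depends on cholesterol; 'specific' is the nutrient-specific signal.
rng = random.Random(3)
n = 5000
energy = [rng.gauss(0.0, 1.0) for _ in range(n)]
cholesterol = [rng.gauss(0.0, 1.0) for _ in range(n)]
specific = [rng.gauss(0.0, 1.0) for _ in range(n)]
intake = [e + s for e, s in zip(energy, specific)]
marker = [e + c + 0.5 * s + rng.gauss(0.0, 1.0)
          for e, c, s in zip(energy, cholesterol, specific)]

crude = pearson(intake, marker)
adjusted = partial_correlation(intake, marker, [energy, cholesterol])
```

Here the crude correlation is inflated by the shared energy component, and the adjusted value isolates the association carried by the nutrient-specific signal (about 0.45 under these simulated parameters).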

TABLE 3 Spearman correlation coefficients for estimated food intakes from questionnaire measurements at start, and from the average of 12 24-hour diet recalls

Food groups            France  Germany         Italy           Netherlands     Spain           Sweden
                       Women   Men     Women   Men     Women   Men     Women   Men     Women   Men     Women
Vegetables             0.54    0.39    0.53    0.30    0.49    0.38*   0.31*   0.60    0.55    0.42    0.49
Potatoes               0.52*   0.50    0.40    0.40*   0.24*   0.58*   0.70*   0.38    0.37    0.64    0.79
Fruits                 0.44*   0.33    0.45    0.56*   0.39*   0.68*   0.56*   0.79    0.67    0.72    0.62
Dairy products         0.67*   0.54    0.46    0.78    0.66    0.73    0.77    0.77    0.74    0.70    0.67
Cereals                0.56*   0.39    0.19    0.44    0.51    0.75    0.68    0.72    0.69    0.63    0.53
Meat                   0.43*   0.63    0.41    0.39*   0.38*   0.47*   0.70*   0.44    0.59    0.57    0.42
Fish                   0.39*   0.41    0.22    0.42*   0.47*   0.32*   0.37*   0.42    0.55    0.23    0.24
Eggs                   0.40*   0.40    0.38    0.50*   0.23*   0.41*   0.43*   0.36    0.49    0.50    0.64
Fats                   0.58*   0.14    0.38    0.34*   0.30*   0.65*   0.67*   0.75    0.73    0.38    0.44
Sugar and sweets       0.62*   0.55    0.52    0.51*   0.26*   0.78*   0.69*   0.59    0.63    0.66    0.52
Cakes                  0.43*   0.48    0.54    0.57*   0.34*   0.56*   0.45*   0.12    0.47    0.63    0.58
Alcoholic beverages    0.71*   0.63    0.82    0.79*   0.83*   0.74*   0.87*   0.90    0.82    0.82    0.80

* Value found in EPIC publications elsewhere in this Supplement.

DISCUSSION AND CONCLUSIONS

This paper summarizes the main results from the dietary validity studies conducted during the EPIC pilot phase. The principal aim of these preliminary validity studies was to develop optimal instruments (in most countries a self-administered or interview-administered questionnaire) for assessing individuals’ habitual intakes of foods and nutrients. The main objective was to develop an instrument that classifies individuals as accurately as possible by high or low intake levels; that is, dietary intake measurements obtained by these
instruments should have the highest possible correlation with true habitual intake levels (highest possible validity coefficients). There is, however, no single, clearly defined cutoff point beyond which the correlation between questionnaire measurements and true intake levels of a given dietary component can be considered either as acceptable or as insufficient. As discussed in our accompanying paper on methodological aspects of validation and calibration,26 the main effect of a decreasing validity coefficient for a given type of dietary intake factor (e.g. saturated fat) is that the between-subject variation in true dietary intake levels predicted by the questionnaire measurements shrinks. Consequently, the statistical power of the study to demonstrate the presence of a specific type of diet-disease association is reduced. Thus, a deterioration of the validity coefficient(s) can be seen primarily as a problem of study efficiency. An additional aspect is that the simultaneous reduction of the validity coefficients for different types of food or nutrients will drastically reduce the accuracy with which an excess disease risk can be attributed to specific dietary intake factors: the greater the variances of random measurement errors, the greater may be the covariances between the random errors of different food items or nutrients. Thus, with large random error variances, there is a higher probability that the measured intake level of a given food item or nutrient does not represent the true intake level of that item with high specificity. The question of whether a given type of measurement reflects specifically what was intended can be formulated as a problem of multivariate validation, but this has so far received relatively little attention in the literature.

Overall, the estimated correlations from the EPIC pilot phase studies were of a similar magnitude to those in previous validity studies conducted by independent research groups and, from this perspective, the dietary questionnaires developed for the EPIC project can be considered acceptable. Clearly, the intake measured with the highest accuracy was that of alcohol, both according to the reproducibility of the questionnaire measurements and to the de-attenuated coefficient of correlation between questionnaire measurements and the average of 24-hour diet recalls. For total energy intake and macronutrients (energy-adjusted), the estimated validity coefficients tended to be somewhat weaker in Germany, Greece and Italy than in the Netherlands, Spain and Sweden. As in most previous studies, estimated correlations between measured and true dietary intake levels (the estimated validity coefficients, ρQT) generally were no higher than about 0.5–0.7, which implies that one-half to three-quarters of the total variation in questionnaire measurements is not actually related to true intake level, and must be the result of random measurement errors.

Weaknesses in questionnaires were spotted by simple descriptive analyses, indicating that questions had been misread or worded ambiguously, that questions about some frequently consumed food items had been omitted from the questionnaire, that the range of pre-coded values for frequency of consumption could be simplified or should be extended, or that pictures of food portion sizes should be added to or withdrawn from the questionnaire. Comparisons between mean intake
measurements from the dietary questionnaire and from 24-hour recalls (given in more detail in the country-specific papers) also provided useful indications, for example, to see whether certain frequently consumed foods were missing from the questionnaire, or whether intake levels were over- or underestimated to very different degrees for different main categories of foods. In a number of cases, specific weaknesses were found by estimation of the correlation ρQT, through comparison with the 24-hour recalls and/or biochemical markers. For instance, an unacceptably low estimated validity coefficient indicated that, among Italian men, the questionnaire had been unsatisfactory for measuring total fat intake. In all instances where specific weaknesses were identified, modifications were made to the questionnaires. It was not, however, considered worthwhile to undertake new validity studies to evaluate whether modifications made to the dietary questionnaires had in fact improved the accuracy of their measurements; a new validity study would have required considerable additional investments of time and money, which were not expected to be outweighed by important further improvements of the questionnaires. The work to improve dietary assessment methods which has been undertaken over the past 30 years seems to indicate that there is a natural upper limit to the degree of accuracy with which individuals’ habitual dietary intake levels can be measured, because of such factors as the subjects’ capacity to recall (using FFQ) or the degree of co-operation that can be obtained without disturbing their normal daily habits (using food diaries).
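The arithmetic behind the interpretation of validity coefficients of about 0.5–0.7 can be sketched as follows. The squared coefficient gives the share of variation in questionnaire measurements that is related to true intake, so the remainder is attributed to error. The second function applies a widely used textbook correction for within-person variation in the reference method; it is shown only as an illustration and is not necessarily the exact procedure used in the analyses reported here. All function and parameter names are hypothetical.

```python
import math


def unexplained_fraction(rho_qt):
    """Share of questionnaire variation not related to true intake: 1 - rho^2."""
    return 1.0 - rho_qt ** 2


def deattenuated_correlation(r_observed, var_within, var_between, n_repeats):
    """Correct an observed questionnaire-vs-reference correlation for random
    within-person variation in the reference method (assumed textbook formula):
    r_corrected = r_observed * sqrt(1 + (var_within / var_between) / n_repeats).
    """
    ratio = var_within / var_between
    return r_observed * math.sqrt(1.0 + ratio / n_repeats)


for rho in (0.5, 0.7):
    # rho = 0.5 leaves three-quarters unexplained, rho = 0.7 about one-half
    print(rho, round(unexplained_fraction(rho), 2))

# Hypothetical example: 12 recalls per person, within/between variance ratio of 3.
print(round(deattenuated_correlation(0.5, 3.0, 1.0, 12), 3))
```

With 12 recalls per person the correction factor stays close to 1 unless the within-person variance is many times larger than the between-person variance, which is the practical argument for collecting many repeat recalls.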
The existence of such an upper limit implies that ‘calibration’ and ‘validation’ substudies will generally be needed to evaluate the magnitude of the variation in true intake levels of foods and nutrients that is predicted by the questionnaire measurements, and the proportion of the total between-subject variation in true intake levels that this predicted variation represents.26 Following a calibration approach, adjustments can be made to relative risk estimates for quantitative differences expressed on a continuous scale. Using a validation approach (i.e. any approach that allows separate estimation of the validity coefficient ρQT), corrections can also be made for attenuation bias in relative risk estimates for different quantiles of the intake distribution of a given food or nutrient, or for estimates of population attributable risk.25,26 In multicentre investigations, the data from calibration substudies can also be used to translate questionnaire measurements into absolute, predicted intake levels on a more uniform scale with better standardization between centres (i.e. the questionnaire measurements can be adjusted for between-centre differences in systematic over- or underestimation at group level).34,35 The validity studies of the EPIC pilot phase, however, can serve these purposes to only a limited extent, for the following reasons.

Firstly, the preliminary validity studies of the pilot phase were not conducted among random (representative) subsamples of each cohort. These studies were undertaken before recruitment of the main cohorts had started, and even before the source populations for the main study had been defined precisely. In fact, for practical reasons, in some countries the preliminary validity studies had to be conducted in relatively select groups of subjects. For instance, participants in the French validity study were all women nurses in a hospital in Villejuif (Paris), whereas the main cohort study includes women teachers living all over the country. Given this potential lack of representativeness, the quantitative relations between questionnaire measurements and true intake level may not be the same in the validity studies as in the main study cohort, either because the average magnitude of errors is different, or because the between-subject variation in true, habitual dietary intake levels is not identical in the validity study and in the main study cohort.

Secondly, the use of the validity studies to improve between-centre comparability of dietary intake measurements by calibration requires very careful standardization of reference measurements between study centres; that is, the reference method should preferably provide fully unbiased measurements at a group level, or at least any biases that occur should be expected to be small and of very similar magnitude in all centres. Standardization of reference measurements had not yet been achieved when the preliminary validity studies were started. In fact, one of the reasons for choosing the 24-hour recall as a reference method in the validity studies was that its use was already planned for subsequent calibration studies on random subsamples of cohort members, and that several of the participating study centres needed to gain experience with this method.
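As a rough illustration of the calibration approach mentioned above, the sketch below fits a straight line of reference measurements (e.g. mean 24-hour recall values) on questionnaire measurements and uses the slope to adjust a relative risk expressed per unit of questionnaire-measured intake. This is a generic linear regression calibration under simplifying assumptions (a linear relation and errors in the reference method that are independent of questionnaire errors); it is not presented as the procedure used in EPIC, and all names and numbers are hypothetical.

```python
import math


def fit_calibration_line(questionnaire, reference):
    """Least-squares line reference = intercept + slope * questionnaire."""
    n = len(questionnaire)
    mean_q = sum(questionnaire) / n
    mean_r = sum(reference) / n
    s_qr = sum((q - mean_q) * (r - mean_r) for q, r in zip(questionnaire, reference))
    s_qq = sum((q - mean_q) ** 2 for q in questionnaire)
    slope = s_qr / s_qq
    intercept = mean_r - slope * mean_q
    return intercept, slope


def predicted_intake(q_value, intercept, slope):
    """Translate a questionnaire value to the scale of the reference method."""
    return intercept + slope * q_value


def corrected_relative_risk(rr_observed, slope):
    """Adjust a per-unit relative risk: the log relative risk is divided by the slope."""
    return math.exp(math.log(rr_observed) / slope)


# Hypothetical calibration substudy data (arbitrary units).
q = [1.0, 2.0, 3.0, 4.0, 5.0]
r = [1.5, 2.0, 2.5, 3.0, 3.5]
intercept, slope = fit_calibration_line(q, r)
print(intercept, slope)
print(corrected_relative_risk(1.2, slope))
```

A slope below 1 means that differences in questionnaire values overstate the differences in predicted intake, so the observed relative risk per unit is attenuated and the adjusted estimate moves away from 1.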
The 24-hour recalls obtained during the pilot phase provided a wealth of information on how much detail the interviewed subjects can provide to describe foods, on the coding of foods and complex dishes, and on the use of food pictures and other visual aids in the quantification of portion sizes. Small, parallel substudies were conducted to evaluate the accuracy of food portion size estimation using photographs36,37 for the 24-hour recalls. The various types of information obtained from these pilot studies were used to construct the internal databases of a special computer programme (EPIC-SOFT), which was developed to help standardize the structure of 24-hour diet recall interviews and the desirable level of detail of answers given by the subject (e.g. by automatic probing questions, by providing a quick reference during the interview to detailed databases with recipes of complex dishes, by fixing the type of visual aid used for estimating portion sizes of specific categories of food or beverages).

This standardized 24-hour diet recall method was developed for use in subsequent calibration studies on more representative, random subsamples of subjects actually recruited to the main study. The data from these calibration studies (for which data collection is currently ongoing) will be used to evaluate a posteriori what magnitude of differences in intake level of different foods or food constituents has actually been distinguished (predicted) within individual study cohorts, and to adjust relative risk estimates for biases due to errors in dietary intake measurements obtained by the main study method (questionnaires). Furthermore, the 24-hour diet recalls will be used to improve the comparability of the dietary questionnaire measurements between study cohorts. The between-cohort standardization may help reduce between-country heterogeneity in relative risk estimates due to bias resulting from dietary assessment errors.34,35

In addition, between-cohort standardization may also allow associations between diet and disease risk to be estimated with greater precision through a between-cohort (‘ecological’) comparison of mean dietary intake levels and mean disease incidence rates.34,35 This ‘ecological’ comparison will capture the increase in variation in both dietary exposure levels and disease risk obtained by combining studies in geographically and culturally different areas. It is hoped that this overall increase in variation in dietary exposure levels, when measured accurately, will lead to a substantial increase in the statistical power of the EPIC project, as compared to individual cohort studies that have, so far, generally been conducted in single, homogeneous populations.
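The gain in exposure variation from combining cohorts can be made explicit with a simple variance decomposition: the variance of intake in the pooled population equals the average within-cohort variance plus the variance of the cohort means. The sketch below uses invented intake values for two hypothetical cohorts and is intended only to illustrate this identity, not to represent EPIC data.

```python
from statistics import fmean, pvariance


def variance_components(cohorts):
    """Split pooled (population) variance into within- and between-cohort parts."""
    total_n = sum(len(c) for c in cohorts)
    grand_mean = sum(sum(c) for c in cohorts) / total_n
    # size-weighted average of the variances inside each cohort
    within = sum(len(c) * pvariance(c) for c in cohorts) / total_n
    # size-weighted variance of the cohort means around the grand mean
    between = sum(len(c) * (fmean(c) - grand_mean) ** 2 for c in cohorts) / total_n
    return within, between


# Two hypothetical cohorts with similar spread but different mean intake.
cohort_a = [1.0, 2.0, 3.0]
cohort_b = [7.0, 8.0, 9.0]
within, between = variance_components([cohort_a, cohort_b])
pooled = pvariance(cohort_a + cohort_b)
print(within, between, pooled)
```

In this invented example almost all of the pooled variation comes from the difference between cohort means, which is the situation in which a multicentre design adds most to the range of exposure.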

ACKNOWLEDGEMENTS

The authors wish to thank Michel Miginiac, Catherine Gros and Bertrand Hémon for their assistance in data handling and statistical analysis. The help of several EPIC collaborators to fill in some gaps in the tables presented is also gratefully acknowledged. The EPIC project is supported by the Europe Against Cancer Programme of the European Union, as well as by national research funds.

REFERENCES

1 Riboli E, Kaaks R. The EPIC project: Rationale and study design. Int J Epidemiol 1997; 26 (Suppl. 1): S6–S14.
2 van Liere M J, Lucas F, Clavel F, Slimani N, Villeminot S. Relative validity and reproducibility of a French dietary history questionnaire. Int J Epidemiol 1997; 26 (Suppl. 1): S128–S136.
3 Bohlscheid-Thomas S, Hoting I, Boeing H, Wahrendorf J. Reproducibility and relative validity of food group intake in a food frequency questionnaire developed for the German part of the EPIC project. Int J Epidemiol 1997; 26 (Suppl. 1): S59–S70.
4 Bohlscheid-Thomas S, Hoting I, Boeing H, Wahrendorf J. Reproducibility and relative validity of energy and macronutrient intake of a food frequency questionnaire developed for the German part of the EPIC project. Int J Epidemiol 1997; 26 (Suppl. 1): S71–S81.
5 Katsouyanni K, Rimm E B, Gnardellis C, Trichopoulos D, Polychronopoulos E, Trichopoulou A. Reproducibility and relative validity of an extensive semi-quantitative food frequency questionnaire using dietary records and biochemical markers among Greek school teachers. Int J Epidemiol 1997; 26 (Suppl. 1): S118–S127.
6 Pisani P, Faggiano F, Krogh V, Palli D, Vineis P, Berrino F. Relative validity and reproducibility of a food frequency dietary questionnaire for use in the Italian EPIC centres. Int J Epidemiol 1997; 26 (Suppl. 1): S152–S160.
7 Ocké M C, Bueno-de-Mesquita H B, Goddijn H E et al. The Dutch EPIC food frequency questionnaire. I. Description of the questionnaire and relative validity and reproducibility for food groups. Int J Epidemiol 1997; 26 (Suppl. 1): S37–S48.
8 Ocké M C, Bueno-de-Mesquita H B, Pols M A et al. The Dutch EPIC food frequency questionnaire. II. Relative validity and reproducibility for nutrients. Int J Epidemiol 1997; 26 (Suppl. 1): S49–S58.
9 The EPIC Group of Spain. Relative validity and reproducibility of a diet history questionnaire in Spain. I. Foods. Int J Epidemiol 1997; 26 (Suppl. 1): S91–S99.
10 The EPIC Group of Spain. Relative validity and reproducibility of a diet history questionnaire in Spain. II. Nutrients. Int J Epidemiol 1997; 26 (Suppl. 1): S100–S109.
11 The EPIC Group of Spain. Relative validity and reproducibility of a diet history questionnaire in Spain. III. Biochemical markers. Int J Epidemiol 1997; 26 (Suppl. 1): S110–S117.
12 Riboli E, Elmståhl S, Saracci R, Gullberg B, Lindgärde F. The Malmö food study: Validity of two dietary assessment methods for measuring nutrient intake. Int J Epidemiol 1997; 26 (Suppl. 1): S161–S173.
13 Bingham S A, Gill C, Welch A et al. Validation of dietary assessment methods in the UK arm of EPIC using weighed records and 24-h urinary nitrogen and potassium and serum carotenoids as biomarkers. Int J Epidemiol 1997; 26 (Suppl. 1): S137–S151.
14 Overvad K, Tjønneland A, Haraldsdóttir J, Ewertz M, Møller-Jensen O. Development of a semi-quantitative food frequency questionnaire to assess food, energy and nutrient intake in Denmark. Int J Epidemiol 1991; 20: 900–05.
15 Tjønneland A, Overvad K, Haraldsdóttir J, Bang S, Ewertz M, Møller-Jensen O. Validation of a semi-quantitative food frequency questionnaire developed in Denmark. Int J Epidemiol 1991; 20: 906–12.
16 Steinmetz K A, Potter J D. Vegetables, fruit, and cancer. II. Mechanisms. Cancer Causes Control 1991; 2: 427–42.
17 Jenkins D A, Wolever T M S, Taylor R H et al. Glycemic index of foods: a physiological basis for carbohydrate exchange. Am J Clin Nutr 1981; 34: 362–66.
18 Willett W C, Stampfer M J. Total energy intake: implications for epidemiologic analyses. Am J Epidemiol 1986; 124: 17–27.
19 Hertog M G, Feskens E J, Hollman P C, Katan M B, Kromhout D. Dietary antioxidant flavonoids and risk of coronary heart disease: the Zutphen Elderly Study. Lancet 1993; 342: 1007–11.
20 Abramson J H, Slome C, Kosovsky C. Food frequency interview as an epidemiological tool. Am J Public Health 1963; 53: 1093–101.
21 Willett W. Food frequency methods. In: Willett W. Nutritional Epidemiology. New York: Oxford University Press, 1990, pp. 69–91.
22 Hunter D J, Sampson L, Stampfer M J, Colditz G A, Rosner B, Willett W C. Variability in portion sizes of commonly consumed foods among a population of women in the United States. Am J Epidemiol 1988; 127: 1240–49.
23 Haraldsdóttir J, Tjønneland A, Overvad K. Validity of individual portion size estimates in a food frequency questionnaire. Int J Epidemiol 1994; 23: 787–96.
24 Tjønneland A, Haraldsdóttir J, Overvad K, Stripp C, Ewertz M, Jensen O M. Influence of individually estimated portion size data on the validity of a semiquantitative food frequency questionnaire. Int J Epidemiol 1992; 21: 770–77.
25 Armstrong B, White E, Saracci R. Principles of Exposure Measurement in Epidemiology. Chapter 4: Validity and reliability studies. Oxford: Oxford Medical Publications, 1992, pp. 78–114.
26 Kaaks R, Riboli E. Validation and calibration of dietary intake measurements in the EPIC project: Methodological considerations. Int J Epidemiol 1997; 26 (Suppl. 1): S15–S25.
27 Marshall J R. The reliability and validity of dietary data as used in epidemiology. Cancer Surv 1987; 6: 673–83.
28 Rosner B, Willett W C. Interval estimates for correlation coefficients corrected for within-person variation: implications for study design and hypothesis testing. Am J Epidemiol 1988; 127: 377–88.
29 Kaaks R. Biochemical markers as an additional measurement in studies on the accuracy of dietary questionnaire measurements. Am J Clin Nutr 1996 (In Press).
30 Willett W. Reproducibility and validity of food frequency questionnaires. In: Willett W. Nutritional Epidemiology. New York: Oxford University Press, 1990, pp. 92–126.
31 Riboli E, Rönnholm H, Saracci R. Biological markers of diet. Cancer Surv 1987; 6: 686–718.
32 Hunter D. Biochemical indicators of dietary intake. In: Willett W. Nutritional Epidemiology. New York: Oxford University Press, 1990, pp. 143–216.
33 van ’t Veer P, Kardinaal A F M, Bausch-Goldbohm R A, Kok F J. Biomarkers for validation. Eur J Clin Nutr 1993; 47: S58–63.
34 Kaaks R, Plummer M, Riboli E, Estève J, van Staveren W A. Adjustment for bias due to errors in exposure assessments in multi-center cohort studies on diet and cancer: a calibration approach. Am J Clin Nutr 1994; 49: 254S–50S.
35 Plummer M, Clayton D, Kaaks R. Calibration in multicentre cohort studies. Int J Epidemiol 1994; 23: 419–26.
36 Faggiano F, Vineis P, Cravanzola D et al. Validation of a method for the estimation of food portion size. Epidemiology 1992; 3: 379–82.
37 Lucas F, Niravong M, Villeminot S, Kaaks R, Clavel-Chapelon F. Estimation of food portion size using photographs: validity, strengths, weaknesses and recommendations. J Hum Nutr Dietetics 1995; 8: 65–74.