Journal of Food Composition and Analysis xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Journal of Food Composition and Analysis journal homepage: www.elsevier.com/locate/jfca

Original research article

A systematic method to evaluate the dietary intake data coding process used in the research setting☆

Vivienne Guan a,⁎, Yasmine Probst a,b, Elizabeth Neale a,b, Allison Martin a,b, Linda Tapsell a,b

a School of Medicine, Faculty of Science, Medicine and Health, University of Wollongong, Australia
b Illawarra Health and Medical Research Institute, University of Wollongong, Australia

ARTICLE INFO

Keywords: Food analysis; Food composition; Dietary data quality; Dietary data coding; Dietary data discrepancy; Source data verification; Diet history; Food-based clinical trial

ABSTRACT

Accurate dietary intake data are the basis for investigating diet-disease relationships. Data coding is a critical step in generating dietary intake data for analyses in nutrition research. However, there is currently no systematic method for assessing the dietary intake data coding process. The aim of this study was to explore discrepancies in the dietary intake data coding process through source data verification. A 1% random sample of paper-based diet history records (source data) from participants (n = 377) in a registered clinical trial was extracted as a pilot audit to explore potential discrepancy types. Another 10% random sample (n = 38) of baseline dietary source data from the same trial was extracted to develop the method. All items listed in the source data underwent a 100% manual verification check against food output data from FoodWorks software, applying the piloted discrepancy types. The identified discrepancies were categorized into food groups based on modified major groups of AUSNUT 2011–13. Free vegetables, meat, savory sauces and condiments, as well as cereals were found to be more prone to coding discrepancies than other food groups. A more detailed dietary intake data coding protocol is required prior to dietary data collection and coding to ensure data coding quality.

1. Introduction

Dietary intake, which describes food intakes at an individual level and reflects an individual’s eating habits and behaviors (Thompson and Subar, 2013), is an important behavioral risk factor that can be targeted to improve health (Rayner and Scarborough, 2005; Reedy et al., 2014). Indirect and direct evidence for the effects of dietary strategies on the management of diseases has been provided by a large number of epidemiological studies and clinical trials (Jacobs, 2011; Satija et al., 2015). Assessing dietary intake is not only the first step in investigating the relationship between dietary intake and disease, but it also assists in the development of dietary policies and guidelines to reduce the disease burden (Ezzati and Riboli, 2013). Although the existing body of evidence appears to be focused on investigating the associations between dietary intake and health outcomes, high quality dietary intake data are fundamental for elucidating the relationship between diets and diseases.

Dietary intake data should be collected using validated dietary assessment methods, which include 24-h recalls, food records and diet history interviews. The most appropriate method used to assess and estimate intake varies according to study design, resource availability, the level of dietary intake detail required, study sample size, and the burden to the study participants (Kirkpatrick et al., 2014). The reported dietary intake data are closely related to the selected dietary assessment method (Faber et al., 2013). For example, a single 24-h recall only assesses dietary intakes from the previous day; therefore, the generated data may provide only a limited range of food items but detailed information about the amount of food consumed (Kirkpatrick et al., 2014). Alternatively, food records are the most commonly used dietary assessment method to monitor actual food intakes in food-based randomized controlled trials (Probst and Zammit, 2016). A single dietary recall or dietary record is unable to reflect usual intake, so multiple dietary recalls or multiple days of food records are required to explore usual dietary intake. On the other hand, the diet history method employs an open-ended interviewer-administered interview to collect data about an individual’s usual dietary intake over a defined time period (Martin et al., 2003). Consequently, the interviewee is likely to report a wide range of foods reflecting intake variations (Tapsell et al., 2000; Probst and Tapsell, 2007).

In the case that dietary intake data are recorded in a paper-based form during data collection, they will often be manually coded into a database through nutrition analysis software supported by food

☆ This paper was originally submitted as a poster presentation at the 39th National Nutrient Databank Conference (NNDC) held May 16–18, 2016 in Alexandria, Virginia, USA.
⁎ Corresponding author. E-mail address: [email protected] (V. Guan).

http://dx.doi.org/10.1016/j.jfca.2017.07.010
Received 1 August 2016; Received in revised form 1 March 2017; Accepted 6 July 2017
0889-1575/© 2017 Elsevier Inc. All rights reserved.

Please cite this article as: Guan, V., Journal of Food Composition and Analysis (2017), http://dx.doi.org/10.1016/j.jfca.2017.07.010


composition tables. Manual data coding is a source of discrepancy (Håkansson et al., 2001), where a discrepancy is defined as any difference between the source data and the coded data. Discrepancies are common in clinical research databases despite rigorous quality assurance protocols (Arts et al., 2002; Shelby-James et al., 2007). Importantly, coding dietary intake data into the database is not a simple process. In practice, it involves coding the food item, the quantity of intake (portion size) and the frequency of intake in the available nutrient analysis software to reflect the reported dietary intake recorded in the source documents. For example, if a reported food item or portion size cannot be found in the software, commercial and cultural food knowledge, as well as professional judgment, are required to find an appropriate match (Probst and Tapsell, 2007). Additionally, although great efforts have been made to expand and update food composition databases, the need to code recipes into individual component foods or to find alternative foods to those reported is still common in dietary intake data coding (Stadlmayr et al., 2012). This indicates that dietary intake data coding is further complicated by the nature of the dietary data, particularly data derived from an open-ended method such as the diet history interview. It may also imply that dietary intake data are prone to more coding discrepancies during the data coding process than other types of clinical trial data such as age, gender and weight.

Exploration of dietary intake data coding discrepancies involves assessing discrepancies related to coding and quantifying dietary intake data from source documents to the database to assist in the translation from intake to nutrients for analysis. The International Organization for Standardization (ISO) 9000 defined quality as “the degree to which a set of inherent characteristics fulfills requirements” (ISO9000, 2015). The ISO 9000 series documents offer guidance and tools to improve customers’ satisfaction; however, specific quality requirements and implementation activities for quality assessment are not provided. These need to be developed based on a needs assessment and the characteristics of the specific target, which in this case is dietary intake data. On the other hand, data quality requirements have been proposed in the computer sciences literature (Batini et al., 2009). Generally speaking, data quality is determined using multiple dimensions, which are evaluated by multiple metrics (Batini et al., 2009). Data quality requirements are a trade-off between dimensions to meet the requirements of the study design, the types of data available and the data user’s practice (Kahn et al., 2012). Batini et al. suggest that accuracy, completeness, consistency and timeliness are the fundamental dimensions for assessing data quality (Batini et al., 2009). Wang and Strong suggest that, in addition to accuracy, completeness and timeliness, the reliability and relevance of the data also need to be assessed (Wang and Strong, 1996). Therefore, accuracy, completeness, consistency, timeliness, reliability and relevance are assessed to evaluate dietary intake data quality.

Source data verification (SDV) is widely used to examine data quality in clinical trials (Andersen et al., 2015; Olsen et al., 2016). SDV verifies the accuracy of original source data information transcribed to the database (Schuyl and Engel, 1999). Although performing SDV is time consuming, laborious and costly (Tantsyura et al., 2010), it may offer detailed outcome information about dietary coding discrepancies, such as the types, trends and data points related to the coding process in a given dataset. The process of conducting SDV and its outcomes can be used to determine the data quality requirements related to a targeted dataset.

The aim of this study was to explore the coding discrepancies in dietary intake data on the basis of food groups through SDV. This exploration may provide a better understanding of existing dietary intake data coding discrepancies, and insights into the requirements for assessing dietary intake data quality.

2. Materials and methods

2.1. Dietary intake data collection and coding

The basis of this work was diet history interview data collected during baseline assessments of a registered food-based clinical trial. The study design, baseline sample characteristics and dietary intake data collection and coding have been described in detail elsewhere (Tapsell et al., 2015). In brief, dietary intake data were collected by Accredited Practicing Dietitians (APD) using a validated diet history interview reflecting usual weekly food consumption (Martin et al., 2003). The source data included details on the food items consumed, their quantities, frequencies and a forgotten foods checklist, which were recorded on paper-based diet history case report forms (CRFs). Data from CRFs were coded by an APD into FoodWorks Professional nutrient analysis software (Xyris, Springhill QLD, Australia, Version 7, 2007) supported by the Australian Food, Supplement and Nutrient Database for Estimation of Population Nutrient Intakes (AUSNUT) 2007 food composition database (Food Standards Australia New Zealand, 2008). Foods and portion sizes were selected from the drop-down list generated from the software based on food composition and nutrition survey data. To accurately reflect participant-reported intake, where appropriate, participant-reported recipes of dishes and foods without an exact match in the software were created and added to the software. Once the food item was coded into FoodWorks, missed quantities and frequencies of coded food items were flagged. Missing data were then entered. A process of double data checking by a second APD was performed to correct outstanding errors.

2.2. Food-based classification for foods

Using the FoodWorks output, all coding discrepancies related to the intake of food items, quantities and frequencies were categorized according to food groups.
The categorization system was based on the modified AUSNUT 2011–13 food classification system at the major food group level (Food Standards Australia New Zealand, 2014) (Table 1). To distinguish specific food items from mixed dishes, the mixed dishes were separated into complete (meat, free vegetable and cereals/starchy vegetables, e.g. spaghetti bolognaise), incomplete (meat and free vegetable/cereals/starchy vegetables, e.g. chicken stir fry) and vegetarian (free vegetable and cereals/starchy vegetables, e.g. chick pea stew) categories.

2.3. Phase I: development of a dietary intake data coding discrepancy classification system

A 1% random sample (n = 4) of paper-based diet history CRFs (source data) from participants (n = 377) in the clinical trial was extracted as a pilot audit to explore dietary intake data coding discrepancy incidences. To ensure consistency of the SDV process, the verification was undertaken by an APD independent of data collection and coding. The data points in both the CRFs and the FoodWorks software food output were summarized based on single food items determined by the food groups and values for the quantity and frequency. All items listed on the source data underwent a 100% manual verification check against the food output data from FoodWorks software. Identified dietary intake data coding discrepancies were recorded and categorized, with categories of discrepancy types adapted from the discrepancy definition of the European Organization for the Research and Treatment of Cancer (EORTC) (Vantongelen et al., 1989). These discrepancy types include:

• Code 2 Derivation (minor dietary intake data coding discrepancy which does not impact on the estimation of the food and nutrient intakes [for example, an average of 2–3 cups of tea per day reported in the CRF, which was coded as 2.5 cups per day])
• Code 3 Incorrect (dietary intake data coding discrepancy of crucial information which impacts on the estimation of the food and nutrient intakes [for example, skim milk reported on the CRF, but the food item was coded as full-cream milk])
• Code 4 Missing (dietary data from the source documents left uncoded in the data output [for example, salt was recorded in the CRF, but not coded in the data output])
• Code 5 Sourceless (coded dietary data in the data output without source documentation [for example, cheese was not recorded in the CRF, but it was coded in the data output])

The dietary intake data coding discrepancy classification was further developed based on the observed discrepancy incidences related to the reported food items, their quantities and associated frequencies and the EORTC standards (Vantongelen et al., 1989).

Table 1
Examples of foods in each food group.^a

| Food group code and name | Examples of food items |
|---|---|
| 1 Non-alcoholic beverages | Tea, coffee, fruit and vegetable juice, soft drinks |
| 2 Alcoholic beverages | All beverages containing alcohol |
| 3 Cereals, cereal product and cereal dishes | All types of breads, pasta, breakfast cereal, biscuits, cakes, pastries, batter-based products (e.g. pancakes) |
| 4 Fruits | Fresh, canned, dried and frozen pome, berry, citrus, stone, tropical, subtropical and other fruit |
| 5 Free vegetables | Brassica, carrot and similar root, leafy and stalk vegetables. Peas, beans, tomato, mushroom, zucchini |
| 6 Starchy vegetables | Potato, sweet potato, pumpkin and corn |
| 7 Legumes and pulses | Chick peas, kidney beans, butter beans, split peas and all other mature legumes and pulses |
| 8 Meat | Processed and unprocessed beef, veal, sheep, pork, poultry, game. Fresh, canned and smoked fish and seafood. Eggs |
| 9 Seeds and nuts | Tree nuts and peanuts, coconuts and products, seeds and mixed seeds |
| 10 Milk and milk products | Dairy milk, cheese, yoghurt, cream, ice cream and custard |
| 11 Savory sauces and condiments | Savory sauces, pickles, chutneys, relishes, salad dressing, dips |
| 12 Snack foods | Potato snacks, corn snacks and extruded snacks |
| 13 Sugar products | Sugar, honey, topping, jam and sugar-based spreads |
| 14 Confectionery and cereal/nut/fruit/seed bars | Chocolate, lollies, fruit, nut and seed bars, and muesli or cereal style bars |
| 15 Fats and oils | Butter, margarine, plant oils and other fats |
| 16 Dietary supplements | Protein powder |
| 17 Soup | Homemade, dry mix, canned soup |
| 18 Complete dish | Dish contains food category 8 + 5 + 3/6 (Meat + Free vegetable + Cereals/Starchy vegetables), e.g. pizza and spaghetti bolognaise |
| 19 Incomplete dish | Dish contains food category 8 + 5/3/6 (Meat + Free vegetable/Cereals/Starchy vegetables), e.g. chicken stir fried, bolognese sauces |
| 20 Vegetarian dish | Dish contains food category 5 + 3/6 (Free vegetable + Cereals/Starchy vegetables), e.g. tofu stir fried, chick pea stew |

^a The food group codes and names of the major food groups in the Australian Food, Supplement and Nutrient Database for Estimation of Population Nutrient Intakes 2011–13 food classification system were adapted and modified (Food Standards Australia New Zealand, 2014).

2.4. Phase II: analysis of dietary intake data coding discrepancies

A further 10% random sample (n = 38) of baseline dietary intake source data was extracted by an independent researcher, excluding the records included in Phase I. The sample selection method was based on the method applied in a large scale clinical randomized controlled trial to assess data quality by SDV (Mealer et al., 2013; Andersen et al., 2015). All items listed on the source data underwent a 100% manual verification check against the food output data from the FoodWorks software. The coding discrepancy classification system was applied to identify dietary intake data coding discrepancies. The verification check was completed by the same researcher who undertook Phase I. Where newly observed data discrepancy instances had not been identified in Phase I, they were recorded and discussed amongst the study team until consensus was reached. Data coding discrepancies relating to intakes of food items, their quantities, or associated frequencies were assessed and reported based on the modified AUSNUT 2011–13 food categories (Table 1).

Source data for identified data coding discrepancies were re-coded into the FoodWorks software and compared with the original FoodWorks entry. Discrepancies which were unable to be re-coded were kept in the software in their original form. These included invalid or valid sourceless discrepancies and those where the intake, quantity, or frequency of the specific food items was not recorded on the CRFs. For example, if baked beans were recorded on the CRF without the details of quantity or frequency, re-coding the item could not be performed. The discrepancy rate was computed on the basis of the total number of source data points.

Statistical analyses were conducted using the SPSS software package (SPSS Inc. version 21, 2012, Chicago, IL). The normality of all data was checked using the Shapiro–Wilk test. A paired samples t-test was used for parametric data, and the Wilcoxon signed rank test was used for non-parametric data. Statistical significance was determined at p < 0.05.

3. Results

3.1. Phase I: development of a dietary intake data coding discrepancy classification system

There were 17 discrepancy instances observed for the dietary intake data in the pilot sample (n = 4), which included intakes of food items (n = 13), quantity (n = 3) and frequency (n = 1). The sorted data coding discrepancies and examples using the EORTC discrepancy codes are shown in Table 2. The definitions and examples of the data coding discrepancy classification system were described elsewhere (Guan et al., 2016). There were four types of discrepancies related to food items (incorrect, missed/missing, valid sourceless and questionable). A total of three data discrepancy types related to the quantity of food were developed (incorrect, valid sourceless and invalid sourceless). “Incorrect” was the only data discrepancy type found for food frequency data.

3.2. Phase II: analysis of dietary intake data coding discrepancies

The results for the relevant data discrepancy types, the number of discrepancies, the discrepancy rate and meal-based discrepancy analyses have been reported elsewhere (Guan et al., 2016). In brief, there was no significant difference between the number of data points in the CRFs and in the food output of the FoodWorks software (8940 source data points vs. 8774 food output data points, p = 0.463). Among these data points, 436 data discrepancies were identified in 38 CRFs. A total of 223 data discrepancies from 26 CRFs were re-coded into the FoodWorks software.
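As an illustration of the arithmetic behind these counts, the following minimal Python sketch recomputes an overall discrepancy rate from the figures quoted above. It assumes the rate is simply the number of identified discrepancies divided by the number of source data points, as stated in the Methods; the published rates are those reported in Guan et al. (2016), and the helper name `percent` and the variable names are hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts quoted in the Phase II results.
SOURCE_DATA_POINTS = 8940      # data points on the 38 audited CRFs
DISCREPANCIES = 436            # identified coding discrepancies
RECODED_DISCREPANCIES = 223    # discrepancies that could be re-coded
AUDITED_CRFS = 38
RECODED_CRFS = 26

# Assumed definition: discrepancies per 100 source data points.
discrepancy_rate = percent(DISCREPANCIES, SOURCE_DATA_POINTS)
# Share of discrepancies, and of CRFs, that were re-coded.
recoded_share = percent(RECODED_DISCREPANCIES, DISCREPANCIES)
recoded_crf_share = percent(RECODED_CRFS, AUDITED_CRFS)

print(f"Discrepancy rate: {discrepancy_rate:.1f}% of source data points")
print(f"Re-coded discrepancies: {recoded_share:.1f}%")
print(f"CRFs with re-coded discrepancies: {recoded_crf_share:.1f}%")
```

Under this assumed definition, the overall rate is roughly 4.9% of source data points, and about half of the identified discrepancies could be re-coded.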


Table 2
Number and examples of dietary intake data coding discrepancies in the pilot sample (n = 4).

| Code^a | Number of instances | Examples |
|---|---|---|
| Code 2 Derivation | 5 | Averaged 2–3 cups salad vegetable to 2.5 cups |
| Code 3 Incorrect | 7 | 375 ml beer entered as 285 ml; one apple entered as 1 cup apple |
| Code 4 Missing | 3 | Garlic spread 2 tablespoons on the CRF^b missed in FoodWorks software |
| Code 5 Sourceless | 2 | Meat-containing dishes from nursing home recorded with specific meat items only on the CRF, detailed dishes entered into FoodWorks software such as spaghetti bolognaise |

^a Codes were adapted from the discrepancy definition of the European Organization for the Research and Treatment of Cancer (Vantongelen et al., 1989).
^b Abbreviation: CRF – case report form.
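For readers who want to tally audit findings programmatically, the discrepancy codes in Table 2 can be represented as a small enumeration. This is only a sketch: the class name `DiscrepancyCode` and the variable names are hypothetical, the counts are copied from Table 2, and the short comments paraphrase the code definitions given in the Methods.

```python
from collections import Counter
from enum import Enum


class DiscrepancyCode(Enum):
    """Discrepancy codes listed in Table 2 (adapted from EORTC)."""
    DERIVATION = 2   # minor discrepancy, no impact on intake estimates
    INCORRECT = 3    # crucial information coded wrongly
    MISSING = 4      # recorded in the source data but not coded
    SOURCELESS = 5   # coded without source documentation


# Pilot-sample counts copied from Table 2 (n = 4 CRFs).
pilot_counts = Counter({
    DiscrepancyCode.DERIVATION: 5,
    DiscrepancyCode.INCORRECT: 7,
    DiscrepancyCode.MISSING: 3,
    DiscrepancyCode.SOURCELESS: 2,
})

total_instances = sum(pilot_counts.values())

for code, count in pilot_counts.items():
    print(f"Code {code.value} {code.name.title()}: {count}")
print("Total instances:", total_instances)
```

The total agrees with the 17 discrepancy instances reported for the pilot sample.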

The percentages of the identified coding discrepancies in each food group are shown in Fig. 1. The food groups that each accounted for more than 10% of the 436 identified discrepancies were free vegetables (19%, 83/436), followed by meats (17%, 72/436), savory sauces and condiments (12%, 54/436) and cereals, cereal product and cereal dishes (11%, 47/436). To give an overview of the data coding discrepancies related to the food groups for each relevant discrepancy type, while minimizing the complexity of the data presented, the food groups containing five or more data coding discrepancies for each discrepancy type are shown in Table 3.

Another issue found during the SDV process was related to free vegetables, where the quantities of free vegetable items were entered by averaging the reported quantity across the free vegetables food group, rather than entering the actual quantity of each free vegetable food item. For example, a participant reported having two cups of salad for lunch. The free vegetables in the salad included lettuce, tomato, cucumber and onion. The actual quantity of each free vegetable was not recorded in the CRF. The quantities of the free vegetables were coded as half a cup each in the FoodWorks software, resulting in a total of two cups.

After re-coding the discrepancies of 26 CRFs, the absolute difference in daily energy intake output between the previously coded data and the re-coded data was greater than 1000 kJ for three CRFs (12%, 3/26). Exploration of the reasons for these differences indicated that they were due to inaccurate quantities (for example, four slices of cheese were recorded in the CRF [approximately 85 g], but this was coded as four cups [approximately 280 g]) and overestimated frequencies (for example, non-alcoholic beverages were reported as once per day in the CRF but coded as seven times per day in the software, and beef was reported as once per week in the CRF but coded as twice a week in the software).

Fig. 1. The percent of dietary intake data coding discrepancies found for each food group.

Table 3
The dietary intake data coding discrepancy frequencies and percentages by food group and discrepancy type.

| Discrepancy type^a | Food group code and name^b | Frequency^c | Percent^d |
|---|---|---|---|
| Food item: Incorrect | 3 Cereals, cereal product and cereal dishes | 5 | 27.8 |
| | 11 Savory sauces and condiments | 5 | 27.8 |
| | Total discrepancies | 18 | |
| Food item: Missing/missed | 1 Non-alcoholic beverages | 5 | 5.7 |
| | 3 Cereals, cereal product and cereal dishes | 9 | 10.2 |
| | 5 Free vegetables | 14 | 15.9 |
| | 8 Meat | 14 | 15.9 |
| | 11 Savory sauces and condiments | 13 | 14.8 |
| | 18 Complete dish | 7 | 8 |
| | 19 Incomplete dish | 8 | 9.1 |
| | Total discrepancies | 88 | |
| Food item: Valid sourceless | 5 Free vegetables | 15 | 39.5 |
| | 11 Savory sauces and condiments | 11 | 28.9 |
| | Total discrepancies | 38 | |
| Food item: Questionable | 5 Free vegetables | 6 | 19.4 |
| | 18 Complete dish | 5 | 16.1 |
| | 19 Incomplete dish | 7 | 22.6 |
| | Total discrepancies | 31 | |
| Quantity: Incorrect of quantity | 3 Cereals, cereal product and cereal dishes | 8 | 12.9 |
| | 5 Free vegetables | 9 | 14.5 |
| | 8 Meat | 5 | 8.1 |
| | 11 Savory sauces and condiments | 5 | 8.1 |
| | 17 Soup | 6 | 9.7 |
| | 19 Incomplete dish | 8 | 12.9 |
| | Total discrepancies | 62 | |
| Quantity: Valid sourceless | 3 Cereals, cereal product and cereal dishes | 8 | 8 |
| | 5 Free vegetables | 28 | 28 |
| | 8 Meat | 12 | 12 |
| | 10 Milk and milk products | 11 | 11 |
| | 11 Savory sauces and condiments | 17 | 17 |
| | 15 Fats and oils | 6 | 6 |
| | Total discrepancies | 100 | |
| Frequency: Incorrect of frequency | 1 Non-alcoholic beverages | 6 | 6.1 |
| | 3 Cereals, cereal product and cereal dishes | 13 | 13.1 |
| | 5 Free vegetables | 11 | 11.1 |
| | 8 Meat | 32 | 32.3 |
| | 10 Milk and milk products | 7 | 7.1 |
| | 18 Complete dish | 8 | 8.1 |
| | 19 Incomplete dish | 6 | 6.1 |
| | Total discrepancies | 99 | |

^a Discrepancy types were adapted and modified from the discrepancy definition of the European Organization for the Research and Treatment of Cancer (Vantongelen et al., 1989).
^b The food group codes and names of the major food groups in the Australian Food, Supplement and Nutrient Database for Estimation of Population Nutrient Intakes 2011–13 food classification system were adapted and modified (Food Standards Australia New Zealand, 2014).
^c To minimize the complexity of data presentation, the table shows only the food groups containing five or more coding discrepancies.
^d Percent was calculated as the discrepancy frequency divided by the total discrepancy frequency in each discrepancy type.

4. Discussion

Dietary intake data coding discrepancies appear to be a factor which could impact on overall dietary intake data quality, but they are often overlooked or not investigated in the literature. This study outlines a systematic method to evaluate the dietary intake data coding process used in the research setting, although users should carefully consider the dietary assessment methodology from which the data came when exploring data quality considerations.

The results of this study indicate that dietary intake data coding discrepancies may differ between food groups. Free vegetables, meats, savory sauces and condiments, and cereals, cereal products and cereal dishes may be more prone to coding discrepancies than other food groups in the analyzed dataset. The current findings suggest that specific free vegetables may be unable to be analyzed alone, as specific vegetable items and their quantities were assumed during the data coding process in the analyzed dataset. This issue with coding vegetable data from records to a database may be a result of collecting incomplete information on vegetable consumption at the time of data collection. Accurately collecting vegetable intake is not a straightforward task. Apart from the likelihood of social desirability impacting on vegetable reporting (Hébert, 2016), day-to-day variation in vegetable consumption for a given participant has previously been found (Roark and Niederhauser, 2013). Moreover, seasonal vegetable variations further contribute to the complexity of assessing vegetable intakes. Consumption of vegetables has been found to increase from spring to summer, potentially due to the increase in product availability (Stelmach-Mardas et al., 2016). This may imply that, due to variations in consumption, a participant is unable to recall detailed information about vegetable consumption during data collection. Consequently, a more detailed protocol for data collection and data entry for free vegetable items and their quantities may be required to ensure data quality. Moreover, detailed rules for handling incomplete data may also be required for data consistency.

Further, detailed records of intakes of the meats, savory sauces and condiments, as well as cereals, cereal product and cereal dishes food groups on the CRF were required for accurate dietary data coding. This may be due to the increased complexity of dietary intake data for meat-containing mixed dishes, particularly those also containing cereal foods, such as spaghetti bolognese and risotto. Prynne et al. (2009) demonstrated that meat intake data might be overestimated as a result of improper handling of meat-containing mixed dishes during data coding. Furthermore, Fitt et al. (2009) suggested that meat-containing mixed dishes might require data to be coded and presented as separate categories, such as rice dishes, pasta dishes and soups, to fully reflect the nature and amount of foods involved. For example, there is a relatively similar proportion of meat and pasta in lasagna; however, grouping this dish into either meat or pasta might overestimate the quantity of meat or pasta (Fitt et al., 2009). In addition, food items from savory sauces and condiments as well as cereal products and cereal dishes are also commonly consumed with meat or other dishes. This may challenge the categorization of the cereals, cereal product and cereal dishes and the savory sauces and condiments food groups during data coding. Accurately transcribing portion sizes related to meats, savory sauces and condiments, as well as cereals, cereal product and cereal dishes is also challenging. This may be due to the variety of the cuts of meat and poor portion estimation related to savory sauces and condiments. Therefore, resource development or training is required to inform the data collection and coding personnel on how to collect and code information for meats, savory sauces and condiments, as well as cereals, cereal product and cereal dishes intake. Additionally, a more detailed protocol for collecting and entering food items and their quantities may also be required to ensure consistency, particularly for the cuts of meat and mixed dishes that are not currently available in food composition databases.

Although accuracy, completeness, consistency, timeliness, reliability and relevance are the requirements to evaluate data quality (Batini et al., 2009; Wang and Strong, 1996), the process of evaluating dietary intake data quality is complex. Apart from dietary intake data coding discrepancies, dietary intake data commonly contain inherent measurement errors (Mendez, 2015; Subar et al., 2015), which may in many instances be attributed to the self-reported nature of the data. Self-reporting of intakes may be influenced by an interviewee’s perceptions of the foods consumed in terms of beliefs, health risks, expectations, socio-cultural effects, lifestyles and values, the flavor of foods and sensory effects generated by foods (Font-i-Furnols and Guerrero, 2014). Interviewees might consider these factors differently, which may bias their intake reporting. Thus, food intake might be intentionally or unintentionally under- or over-reported by individuals. Although there are inherent limitations in each dietary assessment method (Kirkpatrick et al., 2014), dietary assessment methods can be validated or calibrated against a reference method before dietary intake data collection. The reference method may include a more detailed dietary assessment method or laboratory measurements such as doubly labeled water to assess energy expenditure or 24-h urine collection to assess protein intake (Martin et al., 2003; Trabulsi and Schoeller, 2001;


Bingham et al., 1995). The aim of validation or calibration is to provide information on how well the new method collects dietary data in the population (Thompson and Subar, 2013). Therefore, the overall concepts of data quality related to timeliness, reliability and relevance of dietary data may be largely assured by the selected dietary assessment method. However, the accuracy, completeness and consistency of dietary intake data are largely influenced by both measurement errors and coding discrepancies, and thus caution may be required to clearly differentiate between these sources of error.

5. Conclusions and recommendations

In conclusion, during the process of coding dietary intake data from source documents to the database or software, accuracy, completeness and consistency should be assessed to determine the data quality. The coding discrepancy system and method used here offer a systematic approach to evaluating the dietary intake data coding process and provide data quality control in the research setting. Future users should carefully consider the dietary assessment methodology to which the data quality method is being applied to ensure it meets their needs. The food groups of free vegetables, meat, savory sauces and condiments, as well as cereals, cereal products and cereal dishes, may be more prone to dietary intake data coding discrepancies than other food groups in research studies where intake is derived by the diet history interview method. It is highly advisable that a detailed data collection and data entry protocol (for example, one organized by food group) is implemented prior to the dietary data collection and coding process to ensure high quality data, particularly for discrepancy-prone food groups. Detailed rules for handling incomplete data may also be required to improve dietary data quality.

Conflicts of interest

There are no conflicts of interest to declare.

Ethical standards disclosure

Ethical approval was not required.

Acknowledgements

The study was supported by the Illawarra Health and Medical Research Institute and the California Walnut Commission. The funding bodies were not involved in the analysis and interpretation of data, or writing of the report. Illawarra Health and Medical Research Institute facilities were used for conducting the study, including the collection of data.

References

Andersen, J.R., Byrjalsen, I., Bihlet, A., Kalakou, F., Hoeck, H.C., Hansen, G., Hansen, H.B., Karsdal, M.A., Riis, B.J., 2015. Impact of source data verification on data quality in clinical trials: an empirical post hoc analysis of three phase 3 randomized clinical trials. Br. J. Clin. Pharmacol. 79, 660–668.
Arts, D.G., De Keizer, N.F., Scheffer, G.-J., 2002. Defining and improving data quality in medical registries: a literature review, case study, and generic framework. J. Am. Med. Inform. Assoc. 9, 600–611.
Batini, C., Cappiello, C., Francalanci, C., Maurino, A., 2009. Methodologies for data quality assessment and improvement. ACM Comput. Surv. 41.
Bingham, S., Cassidy, A., Cole, T., Welch, A., Runswick, S., Black, A., Thurnham, D., Bates, C., Khaw, K., Key, T., 1995. Validation of weighed records and other methods of dietary assessment using the 24 h urine nitrogen technique and other biological markers. Br. J. Nutr. 73, 531–550.
Ezzati, M.P., Riboli, E.M.D., 2013. Global health: behavioral and dietary risk factors for noncommunicable diseases. N. Engl. J. Med. 369, 954–964.
Faber, M., Wenhold, F.A.M., MacIntyre, U.E., Wentzel-Viljoen, E., Steyn, N.P., Oldewage-Theron, W.H., 2013. Presentation and interpretation of food intake data: factors affecting comparability across studies. Nutrition 29, 1286–1292.
Fitt, E., Prynne, C.J., Teucher, B., Swan, G., Stephen, A.M., 2009. National Diet and Nutrition Survey: assigning mixed dishes to food groups in the nutrient databank. J. Food Compos. Anal. 22 (Suppl), S52–S56.
Font-i-Furnols, M., Guerrero, L., 2014. Consumer preference, behavior and perception about meat and meat products: an overview. Meat Sci. 98, 361–371.
Food Standards Australia New Zealand, 2008. AUSNUT 2007 – Australian Food, Supplement and Nutrient Database for Estimation of Population Nutrient Intakes. Food Standards Australia New Zealand [Online], Canberra. Available: www.foodstandards.gov.au.
Food Standards Australia New Zealand, 2014. AUSNUT 2011–13 – Australian Food Composition Database. Food Standards Australia New Zealand [Online], Canberra. Available: www.foodstandards.gov.au.
Guan, V., Probst, Y., Neale, E., Martin, A., Tapsell, L., 2016. Development of an at-risk assessment approach to dietary data quality in a food-based clinical trial. Stud. Health Technol. Inform. 227, 34–40.
Håkansson, I., Lundström, M., Stenevi, U., Ehinger, B., 2001. Data reliability and structure in the Swedish National Cataract Register. Acta Ophthalmol. Scand. 79, 518–523.
Hébert, J.R., 2016. Social desirability trait: biaser or driver of self-reported dietary intake? J. Acad. Nutr. Diet. 116, 1895–1898.
ISO9000, 2015. Quality Management. International Organisation for Standardisation, Geneva.
Jacobs Jr, D.R., 2011. Food synergy: the key to balancing the nutrition research effort. Public Health Rev. 33, 1.
Kahn, M.G., Raebel, M.A., Glanz, J.M., Riedlinger, K., Steiner, J.F., 2012. A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research. Med. Care 50.
Kirkpatrick, S.I., Reedy, J., Butler, E.N., Dodd, K.W., Subar, A.F., Thompson, F.E., McKinnon, R.A., 2014. Dietary assessment in food environment research: a systematic review. Am. J. Prev. Med. 46, 94–102.
Martin, G.S., Tapsell, L.C., Denmeade, S., Batterham, M.J., 2003. Relative validity of a diet history interview in an intervention trial manipulating dietary fat in the management of Type II diabetes mellitus. Prev. Med. 36, 420–428.
Mealer, M., Kittelson, J., Thompson, B.T., Wheeler, A.P., Magee, J.C., Sokol, R.J., Moss, M., Kahn, M.G., 2013. Remote source document verification in two national clinical trials networks: a pilot study. PLoS One 8 (12), e81890. http://dx.doi.org/10.1371/journal.pone.0081890.
Mendez, M.A., 2015. Invited commentary: dietary misreporting as a potential source of bias in diet-disease associations: future directions in nutritional epidemiology research. Am. J. Epidemiol. 181, 234–236.
Olsen, R., Bihlet, A.R., Kalakou, F., Andersen, J.R., 2016. The impact of clinical trial monitoring approaches on data integrity and cost – a review of current literature. Eur. J. Clin. Pharmacol. 1–14.
Probst, Y., Tapsell, L., 2007. What to ask in a self-administered dietary assessment website: the role of professional judgement. J. Food Compos. Anal. 20, 696–703.
Probst, Y., Zammit, G., 2016. Predictors for reporting of dietary assessment methods in food-based randomized controlled trials over a ten-year period. Crit. Rev. Food Sci. Nutr. 56, 2069–2090.
Prynne, C.J., Wagemakers, J.J.M.F., Stephen, A.M., Wadsworth, M.E.J., 1999. Meat consumption after disaggregation of meat dishes in a cohort of British adults in 1989 and 1999 in relation to diet quality. Eur. J. Clin. Nutr. 63, 660–666.
Rayner, M., Scarborough, P., 2005. The burden of food related ill health in the UK. J. Epidemiol. Community Health 59, 1054–1057.
Reedy, J., Krebs-Smith, S.M., Miller, P.E., Liese, A.D., Kahle, L.L., Park, Y., Subar, A.F., 2014. Higher diet quality is associated with decreased risk of all-cause, cardiovascular disease, and cancer mortality among older adults. J. Nutr. 144, 881–889.
Roark, R.A., Niederhauser, V.P., 2013. Fruit and vegetable intake: issues with definition and measurement. Public Health Nutr. 16, 2–7.
Satija, A., Yu, E., Willett, W.C., Hu, F.B., 2015. Understanding nutritional epidemiology and its role in policy. Adv. Nutr.: Int. Rev. J. 6, 5–18.
Schuyl, M.L., Engel, T., 1999. A review of the source document verification process in clinical trials. Drug Inf. J. 33, 789–797.
Shelby-James, T.M., Abernethy, A.P., McAlindon, A., Currow, D.C., 2007. Handheld computers for data entry: high tech has its problems too. Trials 8, 2.
Stadlmayr, B., Wijesinha-Bettoni, R., Haytowitz, D., Rittenschober, D., Cunningham, J., Sobolewski, R., Eisenwagen, S., Baines, J.Y.P., Fitt, E., Charrondiere, U.R., 2012. INFOODS Guidelines for Food Matching Ver 1.2. FAO/INFOODS, Rome.
Stelmach-Mardas, M., Kleiser, C., Uzhova, I., Penalvo, J.L., La Torre, G., Palys, W., Lojko, D., Nimptsch, K., Suwalska, A., Linseisen, J., Saulle, R., Colamesta, V., Boeing, H., 2016. Seasonality of food groups and total energy intake: a systematic review and meta-analysis. Eur. J. Clin. Nutr. 70, 700–708.
Subar, A.F., Freedman, L.S., Tooze, J.A., Kirkpatrick, S.I., Boushey, C., Neuhouser, M.L., Thompson, F.E., Potischman, N., Guenther, P.M., Tarasuk, V., 2015. Addressing current criticism regarding the value of self-report dietary data. J. Nutr. 145, 2639–2645.
Tantsyura, V., Grimes, I., Mitchel, J., Fendt, K., Sirichenko, S., Waters, J., Crowe, J., Tardiff, B., 2010. Risk-based source data verification approaches: pros and cons. Drug Inf. J. 44, 745–756.
Tapsell, L.C., Brenninger, V., Barnard, J., 2000. Applying conversation analysis to foster accurate reporting in the diet history interview. J. Am. Diet. Assoc. 100, 818–824.
Tapsell, L.C., Lonergan, M., Martin, A., Batterham, M.J., Neale, E.P., 2015. Interdisciplinary lifestyle intervention for weight management in a community population (HealthTrack study): study design and baseline sample characteristics. Contemp. Clin. Trials 45 (Part B), 394–403.
Thompson, F.E., Subar, A., 2013. Dietary assessment methodology. In: Coulston, A.M., Boushey, C.J., Feruzzi, M.G. (Eds.), Nutrition in the Prevention and Treatment of Disease. Elsevier, Oxford, England.
Trabulsi, J., Schoeller, D.A., 2001. Evaluation of dietary assessment instruments against doubly labeled water, a biomarker of habitual energy intake. Am. J. Physiol. Endocrinol. Metab. 281, E891–E899.
Vantongelen, K., Rotmensz, N., Van Der Schueren, E., 1989. Quality control of validity of data collected in clinical trials. Eur. J. Cancer Clin. Oncol. 25, 1241–1247.
Wang, R.W., Strong, D.M., 1996. Beyond accuracy: what data quality means to data consumers. J. Manag. Inf. Syst. 12, 5–33.
