Zoo Research Guidelines - Wingham Wildlife Park

Zoo Research Guidelines Statistics for typical zoo datasets

© British and Irish Association of Zoos and Aquariums 2006

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording or any information storage and retrieval system, without permission in writing from the publisher.

Plowman, A.B. (ed) (2006) Zoo Research Guidelines: Statistics for typical zoo datasets. BIAZA, London.

First published 2006

Published and printed by: BIAZA Zoological Gardens, Regent’s Park, London NW1 4RY, United Kingdom

ISSN 1479-5647

Zoo Research Guidelines: Statistics for typical zoo datasets Edited by Dr Amy Plowman Paignton Zoo Environmental Park, Totnes Road, Paignton, Devon TQ4 7EU, U.K.

Contributing authors: Prof Graeme Ruxton Institute of Biomedical and Life Sciences, Graham Kerr Building, University of Glasgow, Glasgow G12 8QQ

Dr Nick Colegrave Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, King's Buildings, West Mains Road, Edinburgh EH9 3JT

Dr Juergen Engel Zoolution, Olchinger Str. 60, 82178 Puchheim, Germany.

Dr Nicola Marples Department of Zoology, Trinity College, Dublin 2, Ireland.

Dr Vicky Melfi Paignton Zoo Environmental Park, Totnes Road, Paignton, Devon TQ4 7EU, U.K.

Dr Stephanie Wehnelt Zoo Schmiding, Schmidingerstr. 5, A-4631 Krenglbach, Austria.

Dr Sue Dow Bristol Zoo Gardens, Clifton, Bristol BS8 3HA, U.K.

Dr Christine Caldwell Department of Psychology, University of Stirling, Stirling FK9 4LA, Scotland

Dr Sheila Pankhurst Department of Life Sciences, Anglia Ruskin University, Cambridge CB1 1PT, U.K.

Dr Hannah Buchanan-Smith Department of Psychology, University of Stirling, Stirling FK9 4LA, Scotland.

Heidi Mitchell Marwell Zoological Park, Colden Common, Winchester, Hampshire SO21 1JH, U.K.

Acknowledgements

These guidelines are the result of a workshop organized by the BIAZA Research Group and hosted at Edinburgh Zoo in July 2004. All the authors were participants at the workshop. Additional participants were Rob Thomas, Charlie Nevison and Colleen Schaffner, and we acknowledge their valuable contributions to these guidelines. Particular thanks also go to Rob Thomas for organizing the workshop logistics and to Minitab for sponsorship of the day.

Contents

1. Introduction (A.B. Plowman)
   1.1 What are these guidelines for?
   1.2 Why are these guidelines needed?
   1.3 How to use these guidelines and flowchart guide to sections
2. Randomisation tests (N. Colegrave, J. Engel and A.B. Plowman)
   2.1 The problem
   2.2 The solution
   2.3 Use of randomisation tests for single case and small sample sizes in a zoo setting
   2.4 Limitations of randomisation tests
   2.5 Presentation of results
   2.6 Software for randomisation tests
3. Multivariate tests (V.A. Melfi, N. Marples and G.D. Ruxton)
   3.1 The problem
   3.2 Common mistakes
   3.3 Solutions
   3.4 How is it done?
   3.5 How to interpret and present results
4. Analysing activity budgets using G-tests (N. Marples, G.D. Ruxton and N. Colegrave)
   4.1 The problem
   4.2 Common mistakes
   4.3 Solutions
   4.4 Limitations
5. General issues
   5.1 Autocorrelation, temporal independence and sampling regime (S. Dow, J. Engel and H. Mitchell)
   5.2 Social independence (S. Wehnelt, H. Buchanan-Smith, G.D. Ruxton and N. Colegrave)
   5.3 Multiple test corrections (C.A. Caldwell, G.D. Ruxton and N. Colegrave)
   5.4 Parametric versus non-parametric tests (C.A. Caldwell)
6. References (S. Pankhurst)

1. Introduction A.B. Plowman

1.1 What are these guidelines for?

This volume aims to give zoo researchers, particularly students, clear guidelines to enable them to choose the most appropriate statistical tests for the types of datasets typically collected in zoo settings. If the guidelines in this volume are followed then researchers should be confident that they have chosen correct, valid and robust statistical analyses. The guidelines highlight typical challenges in zoo research, offer solutions and give advice on how to present the results of the tests and how to interpret these results in terms of what conclusions may or may not be drawn. With these guidelines we hope to increase not only the quality of zoo research but also the acceptance rate of zoo-based research papers in peer reviewed scientific journals.

1.2 Why are these guidelines needed?

Despite a long history of fascinating, innovative and robust research carried out in zoos around the world (e.g. de Waal and van Roosemalen, 1979) many researchers in other fields do not consider zoo research a scientifically worthwhile activity. The most common reasons given for this are that animals in zoo environments are not ‘natural’ and that robust statistical analyses are not possible. The first of these objections is something all researchers should be aware of. However, with recent developments in husbandry methods and naturalistic housing and social groupings, most modern zoos now provide an extremely useful research setting, bridging the gap between highly controlled, but often extremely unnatural, laboratory conditions and the totally natural, but very difficult, working conditions of the field. The second objection will hopefully be dispelled by these guidelines, since they demonstrate that valid and robust statistical tests are possible for typical zoo datasets, even studies on a single animal.

Even robust statistics, however, cannot make up for the low biological validity of a study on a small number of individuals (see section 2.4). This problem is shared with many field studies (see Bart et al., 1998), and these guidelines also provide ways to deal with it.

In the past the zoo research community has done little to dispel its image of poor statistical procedures and low validity. The typical statistical difficulties encountered (e.g. small samples, lack of independence of data points, non-normal distributions) have been dealt with in many different, more or less appropriate, ways by different researchers. In the published literature featuring zoo research one can find almost as many different statistical procedures applied to very similar datasets as there are papers. Thus, it is not surprising that many researchers find it hard to know which, if any, are correct. In addition to demonstrating which analyses are most appropriate we hope these guidelines will promote greater consistency in the way typical zoo research datasets are analysed and presented. Consensus on, and standardisation of, the methods we use can only be of benefit to all zoo researchers, increasing our own confidence and competence, improving the quality of our research and enhancing the value of our subject among the wider scientific community.

1.3 How to use these guidelines

It is vital that the relevant sections of these guidelines are read BEFORE starting research, as the tests to be used will strongly influence the way data are collected. Sections 2, 3, and 4 of these guidelines provide information on the types of tests recommended in various situations that commonly occur during zoo research. The flowchart below provides a simple way for researchers to find the appropriate section for their experimental situation.

Section 5 will be useful for all researchers as it provides general guidance on sampling procedures and how to avoid common statistical pitfalls, which are relevant irrespective of the tests being performed.

START

1. Do you have a large sample size (>15) and normally distributed data (see section 5.4)?
   Yes: Standard parametric tests (see readily available text books).
   No: go to 2.

2. Are you comparing the same animal(s) in two or more conditions e.g. evaluating enrichment, before and after an enclosure move or modification, with high and low visitor numbers? Or are you comparing two or more animals (or groups) in one condition e.g. males vs females, adults vs juveniles?
   Yes: go to 3.
   No: go to 5.

3. Are you focusing on one dependent variable e.g. cortisol level, time spent pacing?
   Yes: Randomisation tests (Section 2).
   No: go to 4.

4. Are you investigating changes in several related dependent variables e.g. the whole activity budget?
   Yes: G-tests and derivatives (Section 4).

5. Are you investigating the relationships between many dependent and independent variables e.g. multi-zoo studies using existing differences in husbandry to evaluate their effects on animals, MBA studies?
   Yes: Multivariate tests (Section 3).

2. Randomisation Tests N. Colegrave, J. Engel and A.B. Plowman

2.1 The problem

A frequent problem of studies carried out in a zoo setting is that, due to practical or ethical limitations, they are often based around a limited number of replicates. For example, zoos may be limited in the number of animals that are available to test a hypothesis, or the number of independent enclosures in which animals can be kept while being studied. In multi-zoo studies, individual zoos will often be used as the independent data points, creating obvious difficulties in generating large data sets. Small sample size studies present three specific problems:

• First, with few data points it is difficult to decide with any confidence whether the data meet the assumptions required for a particular test. For example, most parametric statistical tests assume that the data are drawn from populations with an underlying normal distribution. Determining whether this is the case in a study with only eight replicates is not realistic.
• Second, small studies will generally have extremely low statistical power, and since the power of parametric tests will decline rapidly as assumptions are violated, they may be extremely inefficient tools for extracting the maximum information from our data.
• Third, despite best intentions, it will often be difficult or impossible to design zoo studies with the idealised sampling regimes envisaged in statistical text books. Instead data will often be collected opportunistically, leading to obvious problems.

The most frequently proposed solution to these problems is to use non-parametric tests. However, despite popular belief, such tests are not assumption free. They also frequently have low statistical power as well as other limitations (e.g. more complex designs that include multiple factors or covariates may not be possible, or may at least be very difficult to analyse). Thus, alternative tests that are robust in the face of these problems are needed.
2.2 The solution

Randomisation tests provide a powerful alternative to standard statistical procedures that we believe will prove useful in dealing with the problems of zoo studies described above (Edgington, 1995; Mundry, 1999; Todman and Dugard, 2001). Hypothesis testing generally relies on the production of a P value: the probability, assuming that the null hypothesis is true, of obtaining a result equal to or more extreme than the one actually observed in the study. Whilst most well known parametric and non-parametric tests determine the P value using an assumed theoretical probability distribution for the test statistic (like the standard normal distribution or the χ² distribution), randomisation tests generate this sampling distribution directly. Practically, this is achieved by resampling or reshuffling the data obtained in the study to determine directly the probability of an experimental outcome as large as or more extreme than that observed. In very small studies (such as the example of the lady drinking tea, originally used by Fisher in 1935 to outline this procedure), it is possible to compute all possible permutations of the data, and provide what is known as an exact P value. For larger data sets, this is not feasible, but the recent increase in computing power has allowed these tests to be practically extended by generating a large number of random permutations of the observed data, allowing an estimated P value to be generated. Since these tests do not rely on an underlying distribution specified in advance, they are much more robust than conventional parametric and non-parametric tests when the underlying distribution is unknown.

In theory, a randomisation test equivalent to any standard statistical test can be designed. Although more complex designs may need bespoke programming, appropriate randomisation tests for most types of design commonly used in zoo research are available in a number of software packages (see section 2.6). Furthermore, a randomisation procedure can be designed to specifically examine the study as it was actually carried out (e.g. including any peculiarities in sampling) rather than assuming a perfect sampling scheme. Thus, the difficulties of analysing data from non-standard designs will be reduced significantly. Other advantages of randomisation tests, particularly over other non-parametric tests, are that they use the original data values rather than ranks, so they are more powerful, and that they have no difficulty handling tied data.

2.2.1 Basic principles of randomisation tests

The basic principle common to all randomisation and exact tests will be illustrated here with a simple example of testing for a significant difference between two means. This is the equivalent of a Student’s t-test or a Mann-Whitney U-test for two independent samples A and B. Thus, imagine that we wish to test the hypothesis that a novel nutritional regime increases the growth rate of penguin chicks in our penguin enclosure compared to the standard diet. We feed eight randomly chosen chicks the new diet, and another eight the control diet, and we determine their change in weight over the study period. The first step in our analysis is to determine the average weight change for the eight experimental penguins, and the average weight change for the controls. To determine the observed experimental difference we then subtract the average control change from the average experimental change. We then begin our randomisation procedure. Taking our 16 data points, we randomly assign eight to the experimental group and eight to our control group, giving us a random data set (but based on the data actually obtained in the experiment). We then calculate the difference in weight gain in the same way as for the actual experimental data, and write it down.
We then repeat this procedure to generate a second random data set, and a second difference. We continue this procedure (preferably using a computer) a large number (1000s) of times (in all, there are 12870 different possible rearrangements of this data set). We can then estimate the probability of obtaining a difference between groups at least as great as the one observed in the experiment (the one-tailed P value) as the proportion of all the random data sets in which this was the case. Example 2.1 presents a detailed example with an even smaller sample size.

Example 2.1

Data sampled (observations):
A: 3, 4, 5
B: 10, 12
Difference between the means: 4 - 11 = -7

All possible permutations:

Permutation   Sample A     Sample B   Mean A   Mean B   Mean A - Mean B
1             3, 4, 5      10, 12     4        11       -7
2             3, 4, 10     5, 12      5.67     8.5      -2.83
3             3, 4, 12     5, 10      6.33     7.5      -1.17
4             3, 5, 10     4, 12      6        8        -2
5             3, 5, 12     4, 10      6.67     7        -0.33
6             4, 5, 10     3, 12      6.33     7.5      -1.17
7             4, 5, 12     3, 10      7        6.5      0.5
8             5, 10, 12    3, 4       9        3.5      5.5
9             4, 10, 12    3, 5       8.67     4        4.67
10            3, 10, 12    4, 5       8.33     4.5      3.83

The difference between the means of the two samples is equal to or more extreme (in this case smaller) than the one calculated from the observations (i.e. -7) in only one of the ten permutations. The one-tailed P value is therefore 1/10 = 0.1.
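The enumeration in Example 2.1 can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended package: the function names exact_one_tailed_p and estimated_one_tailed_p are hypothetical, and "more extreme" is taken to mean a smaller value of mean A minus mean B, as in Example 2.1 (for the penguin example the comparison would be reversed). The second function shows the random-reshuffling version that is used when there are too many rearrangements to list.

```python
from itertools import combinations
from math import comb
import random


def mean(values):
    return sum(values) / len(values)


def exact_one_tailed_p(a, b):
    """Exact one-tailed P value from every possible regrouping of the data.

    Counts regroupings whose mean(A) - mean(B) is as small as, or smaller
    than, the observed difference (the direction used in Example 2.1).
    """
    pooled = list(a) + list(b)
    observed = mean(a) - mean(b)
    n_extreme = 0
    n_total = 0
    # Each combination of positions is one way of filling group A.
    for a_idx in combinations(range(len(pooled)), len(a)):
        new_a = [pooled[i] for i in a_idx]
        new_b = [pooled[i] for i in range(len(pooled)) if i not in a_idx]
        n_total += 1
        # Small tolerance guards against floating-point rounding.
        if mean(new_a) - mean(new_b) <= observed + 1e-12:
            n_extreme += 1
    return n_extreme / n_total, n_total


def estimated_one_tailed_p(a, b, n_shuffles=10000, seed=1):
    """Estimated P value from random reshuffles, for larger data sets."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    pooled = list(a) + list(b)
    observed = mean(a) - mean(b)
    n_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff <= observed + 1e-12:
            n_extreme += 1
    return n_extreme / n_shuffles


p_exact, n_perms = exact_one_tailed_p([3, 4, 5], [10, 12])
print(p_exact, n_perms)  # 0.1 10
print(comb(16, 8))       # 12870 rearrangements for two groups of eight
print(estimated_one_tailed_p([3, 4, 5], [10, 12]))
```

The count of rearrangements is simply the number of ways of choosing which data points form sample A: 5 choose 3 = 10 for Example 2.1 and 16 choose 8 = 12870 for the penguin example.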

The same general procedure can be used to address more complex designs, although careful thought must always be given to the way in which the randomisation is done, and the interpretation that can be drawn from the P value. In the section that follows we will outline some of the most common designs that occur in zoo research, with guidelines on how to carry out an appropriate test. However, for more detailed discussion we would urge the reader to consult a specialised text such as the ones listed in section 6 (e.g. Todman and Dugard, 2001).

2.3 Use of randomisation tests for single case and small sample sizes in a zoo setting

2.3.1 AB designs for single cases

This section covers studies with a two-phase design (baseline and treatment) in which one treatment, which cannot be easily repeated or withdrawn, is applied once to one individual (or group, if group is the sample unit). A realistic zoo example might be investigating the effects of moving to a new enclosure on the behaviour of an animal. In studies of this type the date on which the treatment is applied (the intervention date) should be determined randomly. Ideally this should be done by a truly random method of selecting the day from the range of those available e.g. by selecting one of the possible dates by drawing from a hat. More likely the zoo will set the date according to its practical agenda. However, if the decision is based on practical zoo issues (e.g. when all the appropriate staff members are available) rather than animal issues (e.g. a particular point in an oestrous cycle/breeding season) then it is effectively random with respect to the animal, so it does not invalidate the results. Once the intervention date is decided the appropriate data are collected over a number of days before and after the intervention. The difference between the daily mean value before the intervention and after the intervention of any variable measured can then be tested as in the example above by randomising the data and calculating the difference between the means. Based on prior knowledge of the animal, and the day to day variation in the variable being measured, it is usually desirable to determine a minimum number of days of data collection before and after the intervention.
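A minimal sketch of such a test is given here, under several assumptions that are not taken from these guidelines: the daily values stay in date order and only the intervention day is re-randomised; the intervention day itself belongs to neither phase (with a seven-day minimum in a 60-day study this gives allowed days 8 to 53); and the one-tailed hypothesis is an increase after the intervention. The function names and the 12-day data set are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def allowed_intervention_days(n_days, min_days):
    """Days (numbered from 1) on which the intervention could have fallen,
    leaving at least min_days of data both before and after it."""
    return range(min_days + 1, n_days - min_days + 1)


def ab_randomisation_test(daily_values, intervention_day, min_days):
    """One-tailed P value for an AB single-case design.

    daily_values: one value per study day, in date order.
    intervention_day: the day (numbered from 1) of the intervention.
    """
    allowed = allowed_intervention_days(len(daily_values), min_days)
    if intervention_day not in allowed:
        raise ValueError("intervention day outside the allowed range")

    def after_minus_before(day):
        before = daily_values[:day - 1]  # days 1 to day - 1
        after = daily_values[day:]       # days day + 1 to the last day
        return mean(after) - mean(before)

    observed = after_minus_before(intervention_day)
    # Recompute the statistic for every intervention day the design allowed.
    differences = [after_minus_before(day) for day in allowed]
    n_extreme = sum(1 for d in differences if d >= observed - 1e-12)
    return n_extreme / len(differences), len(differences)


# Hypothetical 12-day study, intervention on day 7, at least 3 days per phase.
values = [2, 3, 2, 3, 2, 3, 5, 8, 9, 8, 9, 8]
p_value, n_allowed = ab_randomisation_test(values, intervention_day=7, min_days=3)
print(n_allowed)  # 6
print(p_value)
```

With only six allowed intervention days the smallest P value this sketch can return is 1/6, which is why the number of possible intervention dates matters when planning the study.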
For example, if there is a total of 60 days available for the study it may be determined that at least seven days of data should be collected before and after the intervention. The intervention date would therefore be randomly allocated to any day between day 8 and day 53 inclusive. The re-randomisation procedure should follow the experimental procedure. Therefore, the re-randomised permutations should only include those in which the intervention date fell between days 8 and 53; data collected on days 1-7 and 54-60 would be kept in the same position. Example 2.2 demonstrates how this is done with a very short study. As can be seen in Examples 2.1 and 2.2, in such short studies it is not possible to obtain a P value