
To Márta with love!

STATISTICAL ECOLOGY Quantitative exploration of nature to reveal the unexpected

László Orlóci Emeritus Professor of Statistical Ecology, Western University, London, Canada

SCADA Publishing – London, Canada

Refer to this book as: Orlóci, L. 2012. Statistical Ecology. Quantitative exploration of nature to reveal the unexpected. SCADA Publishing, Canada. Online Edition: https://createspace.com/3476529

Look for:
Orlóci, L. 2013. Quantum Ecology. The energy structure and its analysis. SCADA Publishing, Canada. Online Edition: https://createspace.com/4406077
Orlóci, L. 2013. Quantum analysis of primary succession. The energy structure of a vegetation chronosere in Hawai'i Volcanoes National Park. SCADA Publishing, Canada. Online Edition: https://createspace.com/4452597
Orlóci, L. 2013. On the Energy Structure of Natural Vegetation. SCADA Publishing, Canada. Online Edition: https://www.createspace.com/4153484
Orlóci, L. 2012. Self-organisation and Mediated Transience in Plant Communities. SCADA Publishing, Canada. Online Edition: https://createspace.com/3585127
Orlóci, L. 2011. Problem-flexible Computing in Statistical Ecology. SCADA Publishing, Canada. Online Edition: https://www.createspace.com/3574792
Orlóci, L. 2012. Statistical Multiscaling in Dynamic Ecology. Probing the Long-term Vegetation Process for Patterns of Parameter Oscillations. SCADA Publishing, Canada. Online Edition: https://createspace.com/3830594

ISBN-13: 978-1453760529 ISBN-10: 1453760520 Electronic book first published 2010, revised and expanded 2012 Reprinted in revised format 2014

All rights reserved 2014 © László & Márta Orlóci [email protected]

Printed in the United States


PREFACE

I find it appropriate to begin the preface with recollections regarding the general state of the Earth process. Thomas Berry (1990) used this term and gave it a well-definable meaning. Our text has in its focus manifestations of this process when presenting the concepts and methods of statistical data analysis to those in ecology and related fields. The critical state of the Earth process and the dangerous course it is running is no longer a theory whose validity is disputed. Few disagree that the high volume of greenhouse gases in the atmosphere, mainly from the burning of fossil fuels, is causing global warming; that the dumping of limitless quantities of chemical pollutants is poisoning the water supply; and that vegetation destruction by everyday land use is triggering erosion and desertification over vast tracts of land. The biota responds with extinctions, in the extreme, and not in trivial numbers. In fact, conservative estimates put the extinction rate at a staggering 10,000 species annually with all forms of life counted. These suggest that one half of all the species now existing will be eradicated before the closing of the 21st century (Willson 1992, 2001). If one were to rank the deleterious effects by potential, most would probably consider global warming society's public enemy number one. The Manabe (2xCO2) – Mason scenario (Manabe 1990, Mason 1990) is a benchmark prediction reinforced by almost two decades of research advances (IPCC 2001, 2007, Gore 2006, Orlóci 2008). According to this scenario, the Earth's climate will have undergone surface warming by about 2.5 °C on average in about seven decades (about 3.6 °C in the century), counting from 1990. Mason's expectation that the oceans' thermal inertia would be overcome and that atmospheric warming would manifest itself measurably has become reality. The Tundra permafrost is melting and the polar ice is doing the same. More recent estimates (IPCC 2001, 2007, Gore 2006, Orlóci 2008) of the atmospheric warming rate are much worse. But climate warming at even the early predictions is quite sufficient to force dramatic changes in the World's biota. How dramatic? Consider a potential case from a typical site in the Boreal region near Timmins, Ontario (Orlóci 1994):

Annual mean precipitation (mm): 711
Annual mean temperature (°C): 1.3
Thermal flux rate (°C per °C of global rise): 3.7
Temperature increase by 2060 (°C): 9.2
Expected temperature by 2060 (°C): 10.5

Note: thermal flux is defined as the rise in local temperature per one degree rise in the global average temperature.
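To read the table (an illustrative check added here, using the roughly 2.5 °C global-mean rise cited above): the local increase is the thermal flux rate times the global rise, 3.7 × 2.5 ≈ 9.2 °C, so the expected local mean temperature is the current mean plus this increase, 1.3 + 9.2 ≈ 10.5 °C.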

The necessary outcome of the process, if left unchecked, is a global disaster of unseen proportions. Recognition of this has led to the enactment of progressive environmental laws by many nations which mandate a coupling of technical planning and environmental protection. Clearly the statistical approach in sampling and data analysis must come up to par with the new sweeping standards of the mandated, large scale environmental studies. Assessment and prediction are the main tasks. These are concerned with the present state of the environment, its evolutionary past, and anticipated future. The complexities of implementation place a premium on choices that emphasise empiricism, power, and, very much, a clear local relevance. These points were uppermost in our mind when we selected the topics for presentation. We had to go beyond the deceptively simple Fisherian sampling environment (Orlóci 1993, 2001b) into the real world whose complexities we know from Poore (1962), Mandelbrot (1967, 1977) and Lorenz (1963). In the world they describe, the process is at the centre in its full natural colours: complex (non-linear), fractal (irregular, fragmented), and chaotic (disorderly, confused). We begin these notes with the sampling environment, recognising the sharp dichotomy in conceptualisations, with Fisherian statistics (FS) in one direction and Poorean successive approximation (PSA) in the other. The defining differences are substantive and should be easily grasped by anyone with only a minimal exposure to ecological ideas: 1) FS assumes an ideal sampling environment of global regularity, as if it were ruled by strict experimental controls. PSA does not idealise the sampling environment, but takes it as it comes. Consequently, the constraints on sampling and inference in FS are different from those in PSA. 2) PSA allows the statistical conclusion to grow and to come closer and closer in approximation to the truth through recursive sampling and analysis. FS lacks inherent facilities to allow such an evolution of the conclusion, by virtue of its idealisation of the sampling environment.

3) FS focuses on the "average". PSA defines a role for the "type" and for the "typical" event as well. "Average" and "typical" need not be the same. In the organization of the book's contents problem-oriented lines are followed, giving due weight to the concepts and modus operandi of both FS and PSA. The text begins with definitions of terms and a discussion of general ideas. Independent chapters treat data management, biological variables and their measurement, population description, sampling, estimation, and sampling distributions. Subsequent chapters cover the methods of comparison (variables, individuals, groups), character weighting (ranking), trend seeking (regression, ordination), and classification (cluster analysis, identification). Reference list, problems, glossary and subject index conclude the book. Numerous step-by-step examples are included. Three external appendices are closely integrated with the main text and serve as a basis of hands-on practice sessions: the APICE exercise book, sample data set, and application programs. Márta Mihály, Forest Engineer, gave invaluable technical and editorial assistance throughout the preparation of the book. For these and for her patience I express my sincerest thanks. László Orlóci, Winter 2010, Kailua, Hawaii.1

Author's Note to the 2012 edition: The basis of the 2012 edition is the 2010 edition. I applied substantial revisions to the text and added new chapters on multiscale trajectory analysis, the Markov chain, diversity partitions, and zonal vegetation displacement issues in the climate warming cycle, and a section on Pillar's method for finding assembly rules. The book's contents now span an even wider range of ecological concepts and methods of analysis. László Orlóci, Summer 2012, London, Canada

1 The book was reformatted in 2012 and corrections applied in 2013.



Contents

PREFACE

CHAPTER 1  TECHNICAL TERMS AND CONCEPTS
1.1 UNIT, TYPE, POPULATION, COMMUNITY
1.2 ATTRIBUTES
1.3 DATA TYPES
1.4 DOMAIN
1.5 RANDOMNESS
1.6 CONTEXT
1.7 TRUE OR FALSE
1.8 PLANNING

CHAPTER 2  HISTORIC PERSPECTIVE
2.1 PROBLEM LINES
2.2 THE STUDY SCENARIO
2.3 DATA STORAGE

CHAPTER 3  MEASUREMENT THEORY
3.1 DESCRIPTION OF OBJECTS
3.2 MEASURING SCALES
3.3 ERRORS IN MEASUREMENTS
3.4 MEASUREMENT ERRORS IN FUNCTIONS
3.5 SIGNIFICANT DIGITS

CHAPTER 4  POPULATION DESCRIPTION
4.1 POPULATION SCALARS
4.1.1 Moments
4.1.2 Product moments
4.1.3 Entropy and information
4.1.3.1 Rényi's functions
4.1.3.2 Limiting distributions about P
4.1.3.3 More on Rényi's functions
4.2 THEORETICAL DISTRIBUTIONS
4.2.1 Poisson distribution
4.2.2 Bernoulli distribution
4.2.3 The Normal distribution
4.3 DESCRIPTORS OF DISTRIBUTION SHAPE
4.4 COMMON TRANSFORMATIONS

CHAPTER 5  SAMPLING
5.1 GENERAL
5.2 SAMPLING FRAME
5.3 SIMPLE RANDOM SAMPLING
5.4 STRATIFIED RANDOM SAMPLING
5.4.1 Allocation by stratum size
5.4.2 Allocation by stratum variance
5.5 RANDOMLY SITED SYSTEMATIC SAMPLING
5.6 MULTISTAGE SAMPLING
5.7 PREFERENTIAL SAMPLING
5.8 SAMPLE OPTIMALITY
5.8.1 Quadrat-based estimation
5.8.2 Quadrat-based structure detection

CHAPTER 6  SAMPLE TO POPULATION
6.1 ESTIMATION
6.1.1 Consistent estimator
6.1.2 Unbiased estimator
6.1.3 Minimum sampling variance
6.2 ESTIMATION OF ENTROPY
6.3 ESTIMATION OF INFORMATION
6.3.1 Estimation of mutual information
6.3.2 Estimation of interaction information
6.4 MOMENTS AND MOMENT BASED QUANTITIES
6.5 PRODUCT MOMENTS AND RELATED QUANTITIES
6.6 ESTIMATION IN STRATIFIED RANDOM SAMPLING
6.7 ESTIMATION IN SYSTEMATIC SAMPLING

CHAPTER 7  COMMONNESS AND PROBABILITY
7.1 WHICH KIND OF DISTRIBUTION?
7.2 NORMAL PROBABILITIES
7.3 SAMPLING DISTRIBUTIONS
7.3.1 Distribution of the mean
7.3.2 Chi-squared distribution
7.3.3 t distribution
7.3.4 F distribution
7.4 EMPIRICAL SAMPLING DISTRIBUTIONS
7.5 SETTING CONFIDENCE LIMITS
7.5.1 Point vs. interval estimation
7.5.1.1 1−α confidence interval for the mean
7.5.1.2 1−α confidence interval for the variance

CHAPTER 8  MEASURING RESEMBLANCE
8.1 COMPARISON SPACE
8.2 MINKOWSKI METRICS
8.3 PRODUCT MOMENT
8.4 MEAN SQUARE CONTINGENCY
8.5 INDICES OF SIMILARITY AND DISSIMILARITY
8.6 GOODALL'S PROBABILITY INDEX
8.7 CALHOUN'S DISTANCE
8.8 PLEXUS DIAGRAMS
8.9 INVARIANCE

CHAPTER 9  STATING AND TESTING HYPOTHESES
9.1 BASIC TYPES
9.2 GENERAL
9.3 PROCEDURE
9.4 SIMPLE, COMPOSITE AND MIXED HYPOTHESES
9.5 ONE AND TWO-SIDED ALTERNATIVES
9.6 PARAMETRISED AND NON-PARAMETRISED HO
9.7 ERRORS IN PROBABILISTIC DECISIONS
9.8 BARTLETT'S PARADOX

CHAPTER 10  PROBABILISTIC COMPARISONS I
10.1 SAMPLE MEAN AND A STANDARD
10.2 SAMPLE VARIANCE AND A STANDARD
10.3 SAMPLE DISTRIBUTION AND A STANDARD
10.3.1 Chi-squared divergence
10.3.2 I-divergence information
10.3.3 Kolmogorov-Smirnov divergence
10.4 SAMPLE MEAN VECTOR AND A STANDARD
10.5 SAMPLE COVARIANCE MATRIX AND STANDARD

CHAPTER 11  PROBABILISTIC COMPARISONS II
11.1 SAMPLE VARIANCES
11.2 SAMPLE MEANS
11.3 SEVERAL VARIANCES AND MEANS
11.3.1 Complete randomized design
11.3.1.1 Comparing k sample variances
11.3.1.2 Comparing k sample means
11.3.2 Randomized block design
11.3.3 Latin square design
11.3.4 Two-factor design
11.4 TESTING HOMOGENEITY IN DISCRETE DATA
11.4.1 Null hypotheses for homogeneity
11.4.2 Test criteria
11.4.3 Homogeneity of replicates
11.4.4 Homogeneity of treatment means
11.5 COMPARISON OF K COVARIANCE MATRICES
11.6 COMPARISON OF K GROUP MEAN VECTORS

CHAPTER 12  PROBABILISTIC COMPARISONS III
12.1 TWO CONTINUOUS VARIABLES
12.2 TWO BINARY VARIABLES
12.3 TWO MULTISTATE DISCRETE VARIABLES
12.4 SETS OF VARIABLES
12.5 NESTED CHARACTER HIERARCHIES
12.5.1 Basic concepts
12.5.2 Additive partitions
12.5.3 Comparison of entire communities
12.5.4 Testing the significance of r
12.5.5 Interpretation of correlation profiles
12.6 UNBALANCED NESTED HIERARCHY
12.6.1 Overview
12.6.2 The isolation problem in general
12.6.3 The hierarchical relevé
12.6.4 Technical details
12.6.5 Decomposition of sum of squares
12.6.6 Decomposition of product moment
12.6.7 Further on the partial variance
12.6.8 Results and interpretation
12.6.9 Remarks
12.6.10 Conclusions

CHAPTER 13  TREND SEEKING: UNIVARIATE RESPONSE
13.1 RESPONSE TYPE A STRAIGHT LINE
13.2 RESPONSE TYPE A PLANE
13.3 RESPONSE TYPE A POLYNOMIAL
13.4 RESPONSE TYPE A PRODUCT OR EXPONENTIAL
13.5 WORKING WITH RESIDUALS

CHAPTER 14  CHARACTER ANALYSIS: IMPORTANCE A POSTERIORI
14.1 MULTIPLE CORRELATION
14.2 SPECIFIC VARIANCE
14.3 SUM OF SQUARES
14.4 INFORMATION
14.5 WEIGHTING VARIABLES: A DISCUSSION

CHAPTER 15  EXPLORATION OF COMPLEX DATA
15.1 MULTIDIMENSIONAL OR MULTIVARIATE
15.2 VIEWS OF THE MEDIUM
15.3 BROAD OBJECTIVES

CHAPTER 16  CONTINUOUS COMPLEXITY
16.1 TWO TRANSFORMATIONS
16.2 COMPONENT ANALYSIS
16.2.1 First transformation
16.2.2 Second transformation
16.2.3 Algorithm
16.2.4 Dimensions of the significant trended variation
16.2.5 A complete example
16.2.6 Presentation of the PCA results
16.3 MDSCAL: A FLEXIBLE METHOD

CHAPTER 17  EXPLORING GROUP STRUCTURE
17.1 SINGLE LINK CLUSTERING
17.2 CENTROID CLUSTERING
17.3 SUM OF SQUARES CLUSTERING
17.4 ASSOCIATION ANALYSIS
17.5 ANALYSIS OF STRUCTURED TABLES
17.5.1 The data table
17.5.2 Compositional sharpness of blocks
17.5.3 Compositional gradients
17.5.4 Dimensionality
17.5.5 Identification of underlying factors
17.5.6 Partitioning the deviations

CHAPTER 18  EXPLORATION OF AFFINITIES: IDENTIFICATION
18.1 APPROACHES
18.2 GENERALIZED DISTANCE
18.3 DISCRIMINANT FUNCTION
18.4 INFORMATION DIVERGENCE
18.5 A CASE OF PROBABILISTIC CLASSIFICATION
18.6 GROUP IDENTIFICATION

CHAPTER 19  MULTISCALE TRAJECTORY ANALYSIS
19.1 HISTORIC SETTING
19.1.1 The Kernerian line
19.1.2 Surrogate mathematical models
19.1.3 Ecological models
19.1.4 Focus on process governance
19.2 UNITS OF ORGANISATION
19.3 COMPONENTS OF CHANGE, MULTISCALING
19.4 INDICATORS OF CHANGE
19.5 PHASE SPACE: THE REFERENCE SYSTEM
19.6 THE MODEL AND THE DATA
19.7 FIRST-ORDER OBJECTIVES
19.8 COMPOSITIONAL TRANSITION SCALARS
19.8.1 Euclidean distance
19.8.2 Acute angle
19.8.3 Compositional transition velocity
19.8.4 Acceleration, deceleration
19.8.5 Angular velocity
19.9 SYNCHRONICITY SCALARS
19.9.1 Product moment correlation
19.9.2 The topological similarity coefficient
19.10 THE HAUSDORFF (FRACTAL) DIMENSION
19.11 DIVERGENCE SCALARS
19.11.1 Rényi's entropy of order α
19.11.2 Rényi's information of order α
19.11.3 Pooled squared deviations
19.12 COMPLEX TRAJECTORY PROPERTIES
19.12.1 Phase structure
19.12.2 Determinism
19.12.3 Periodicity

CHAPTER 20  THE MARKOV CHAIN
20.1 GENERAL INTRODUCTION
20.2 A SIMPLE EXAMPLE
20.3 TRANSITION PROBABILITIES
20.4 WHAT DEFINES A MARKOV CHAIN?
20.5 POPULATIONS AND THE TRANSITION MATRIX
20.6 THE CALCULUS OF TRANSITION PROBABILITIES
20.7 FITTING THE MODEL
20.8 TESTS ON THE MARKOVITY OF AN OBSERVED SERIES
20.8.1 Ho: the series is zero order Markov
20.8.2 Ho: the series is mth order Markov
20.9 COMPARISON OF TEST OPTIONS

CHAPTER 21  DIVERSITY PARTITIONS
21.1 GENERAL REMARKS AND DATA CODING
21.2 ENTROPY AND INFORMATION PARTITIONS
21.3 EXAMPLE
21.4 A DISCUSSION
21.5 APPENDIX

CHAPTER 22  VEGETATION ISSUES IN CLIMATE WARMING CYCLE
22.1 INTRODUCTION
22.2 THE REALITY OF TROPOSPHERE WARMING
22.3 VOSTOK TEMPERATURE CONVERSIONS
22.4 UNFOLDING THE SERIES OF THERMAL EVENTS
22.5 LOCAL THERMAL FLUX RATES
22.6 PAST AND ANTICIPATED CLIMATE WARMING EFFECTS
22.6.1 Formation expansion rate
22.6.2 Taxon traits and assembly metrics
22.7 PILLAR'S METHOD TO REVEAL ASSEMBLY RULES
22.8 SYNOPSIS: A FUTURE FOR NORTHERN FOREST

BIBLIOGRAPHY
GLOSSARY
INDEX
BIBLIOGRAPHY SUPPLEMENTS
READERS NOTES


Chapter 1
TECHNICAL TERMS AND CONCEPTS

This chapter covers topics selected to teach a working vocabulary without which the gap separating the esoteric from the practical could not be bridged.

1.1 Unit, type, population, community

It is not just convenient but also fundamentally useful to categorize objects, animate or inanimate, according to their "type". In other words we have to classify before we investigate. The type may be a species of plant or animal, a kind of rock, a form of food, or a product, composition, etc. The "type" possesses characteristics that single objects share within their type. The parsimony in this for science is the facilitation of overview, generalisation, and communication. The "type" may be defined at vastly different levels, with different internal heterogeneity of the units that it incorporates. The units may be individual organisms, states of a variable that characterise an organism, or events of other kinds (numbers showing upon throws of dice, proportions of quartz in granite, etc.). Peculiarly enough, the "type" may or may not materialise as a collection of concrete objects; it may exist only in the abstract, such as all possible outcomes from throws with a perfect die. In the sense of being a collection, the "type" is equivalent to a "population". Just as populations do, communities are composed of units. The units of a community are interacting organisms, often of many different species. The interactions represent high-level functionality with its own rules of governance.

Example 1.1.1 The member objects of a population are called units. A unit may be a concrete object, such as an organism, a rock, a solum (a three dimensional strip of soil), or a spatial aggregate of concrete objects, such as a piece of the landscape, a stand of vegetation, a bird colony, an insect swarm. A unit may be a physical state such as length, area, volume, temperature, etc., by which we describe concrete objects, or an event emitted by a generating process, such as the face of a die in games of chance, a genotype in evolution, eye colour as a manifestation of the genotype, and so forth.

Example 1.1.2 Description of a community type. The site: Devastation Trail in Volcanoes National Park, Hawaii. Averages, range values, occupancy counts and rates are given which were synthesised from 23 individual relevés:

Populations                Mean % cover      Range     Occupancy    Occupancy
                           in 23 quadrats              count        rate
Andropogon virginicus           5.0          0 - 20       15          0.65
Anemone hupehensis              0.7          0 -  5        6          0.26
Cibotium glaucum                0.3          0 -  5        3          0.13
Coprosma ernodeoides            1.7          0 -  5        8          0.34
Dicranopteris linearis          0.1          0 -  1        2          0.08
Dodonea viscosa                 1.4          0 - 10        7          0.30
Dubautia ciliolata              3.4          1 - 10       23          1.00
Metrosideros polymorpha        31.9          0 - 80       18          0.78
Myrica faya                    13.6          0 - 40       13          0.56
Neprolepsis exaltata            4.5          0 - 15       20          0.86
Psilotum nudum                  0.9          0 -  5       10          0.43
Sadleria cyantheoides           1.8          0 - 10       13          0.56
Styphelia douglasii             1.1          0 - 10       15          0.65
Vaccinium peleanum              0.2          0 -  5        1          0.04
Vaccinium reticulatum           5.2          1 - 15       23          1.00

1.2 Attributes

Anything that characterises an object is an attribute: the materials it is made of, the circumstances of its coming into existence, its type by form or construct, the functions it performs, or the objectives it serves. Clearly, the number of attributes is limitless. Yet, the ones actually chosen for measurement are always limited. What sets the limit? Two things: relevance and parsimony. An attribute is relevant if it bears directly on the problem to be solved. It is parsimonious if it has high information content per unit effort of its description. A small number of attributes should sufficiently specify the problem. Ockham's razor applies.2

Example 1.2.1 Attributes are classified by kind in Figure 1.2.1. An attribute that varies among the population units is called a variable. A major dichotomy isolates two kinds, factors (the causes) and responses (the effects).

Example 1.2.2 The factor vs. response distinction of a variable is quite evident in a reptile's ambient and body temperature. The environment, having more mass, will determine the reptile's body temperature.

Variables are either qualitative or quantitative. Intermediates are obtained by dichotomization of a quantitative variable. For example, the diameter of Thuja plicata may be scored as zero below a limit and 1 above it. The basic variable types are distinct in some fundamental ways (Table 1.2.1). Qualitative variables are either binary or multistate. The states lack a natural order. Quantitative variables have natural order. It follows that the defining criterion for a quantitative variable is the ordered nature of its states.

Figure 1.2.1

Table 1.2.1
Example                     States                 Type
Leaf shape                  A, B, C, ...           Qualitative multistate
Presence of resin ducts     yes, no                Binary
Order of hatching           1st, 2nd, 3rd, ...     Quantitative ordered
Clutch size                 0, 1, 2, 3, ...        Quantitative discrete
Bill length                 >0                     Quantitative continuous

When a precise magnitude is defined, such as "10% moisture saturation in soil", this will mean exactly twice the moisture saturation of 5%. This property sets the ratio variables apart from the ordinal variables. The ratio variables are either discrete (presentable as positive integer numbers or zero) or continuous (presentable as real numbers or fractions).

2 William of Ockham (c. 1288–c. 1348), English scholastic philosopher who enunciated the principle known as the "law of parsimony".


1.3 Data types

Data describe objects and events in qualitative or quantitative terms. In the case of qualitative data the elements are merely labels by which we identify the object's state or presence. In quantitative data, the elements express multitude, linear size, area, or bulk. Data types may be distinguished in other ways as well. If more than one variable is recorded simultaneously on each unit in a sample, the resulting data set is called multidimensional. This case is called multivariate if only response variables are present, or multivariable if a mixture of factor and response variables is involved. The case of a single variable is called univariate.

Example 1.3.1 Association behaviour of fish is recorded during 10 time periods as shown in Table 1.3.1.1. This is a typical example of multivariate qualitative data.

Table 1.3.1.1
                            Point in time
Species    1   2   3   4   5   6   7   8   9   10
A          2   2   3   2   1   3   2   2   2   2
B          1   1   1   1   1   1   1   1   1   3
C          1   3   3   3   3   3   3   2   3   3
Legend to scores: 1 - solitary, 2 - schooling with own species, 3 - schooling with other species.
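As an illustration of how such multivariate qualitative data can be handled numerically (a minimal sketch added here, not from the original text; the scores are those of Table 1.3.1.1), the records can be held as a species-by-time matrix and summarised as frequencies of each behaviour state:

    # Python sketch: frequency of each behaviour state per species (Table 1.3.1.1)
    scores = {
        "A": [2, 2, 3, 2, 1, 3, 2, 2, 2, 2],
        "B": [1, 1, 1, 1, 1, 1, 1, 1, 1, 3],
        "C": [1, 3, 3, 3, 3, 3, 3, 2, 3, 3],
    }
    states = {1: "solitary", 2: "own species", 3: "other species"}
    for sp, row in scores.items():
        freq = {label: row.count(code) for code, label in states.items()}
        print(sp, freq)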

Example 1.3.2 Ruminants are scored by species according to order of arrival at a watering hole. The scores have quantitative significance in that they register order.

Example 1.3.3 A typical discrete ratio variable is "stem count" within sample plots. Ecologists use this variable when they describe biomass-free density. Stem count is not suitable for comparing stands of different life-forms or different ages of the same life form.

Example 1.3.4 Table 1.3.4.1 represents a univariate data set. The data elements describe the states of a ratio variable that is continuous.

Table 1.3.4.1
Osmotic level mO(kg)    5    95    279    543    804    1234
Proline mM              2     3     10     61    130     278

Example 1.3.5 Plants are counted according to species by plot. In addition, a composite soil sample is taken in each plot for determination of soil pH and electric conductivity. The data set is typically multivariable.

1.4 Domain

We recognize a high-level dichotomy in statistical work. Surveys that supply a broad view of Nature are on one side and experiments that delve into the details on the other. At the mega-scale, a survey may take the bioenvironmental system and examine its past behaviour. This requires time series data with insight into the effect of environmental forcing, say climate warming, on regional vegetation shifts. At the micro scale the problem addressed is specific to some local event. The vegetation's response to changing thermal influx is an example. The scales should merge in the final synthesis. This would happen in the sense of the micro scale giving substance to the pixels in the mega scale portrait of the system, and the mega scale portrait allowing generalization of the micro scale results. The imposition of a priori controls on variables is inherent in experiments. In surveys, the controls, if any are imposed, are limited and structural (by stratification) rather than substantive (by actually controlling the forcing factors). The following examples illustrate cases of increased specificity of objectives and increased experimental controls:

Example 1.4.1 Environmental protection is increasingly a concern to the public, so much so that in many countries laws mandate justification in terms of environmental impact studies before any land use project is allowed. Impact studies are always exploratory in the initial phase, but then the focus shifts to specific problem areas, which may require clarification in controlled experiments. A study in a Boreal site took steps like these: (1) Vegetation types were established based on examination of survey data. It was found that type and permafrost depth in August were closely related. (2) Dynamic stress experiments revealed that the vegetation type is indicative of the kinds of mechanical treatment required to provide the required roadbed stability under anticipated maximal loads. This allowed the forecast of appropriate treatment based on the vegetation type without repetitive dynamic testing.

Example 1.4.2 Table 1.4.2.1 contains clutch size data from a survey of 62 Turdus migratorius nests. Clutch size has 6 states in the data: 0, 1, 2, 3, 4, 5. The last state means at least 5 eggs in a nest. In the interest of obtaining a more general description of the survey results, a simple numerical experiment is performed: (1) Theoretical distribution functions are fitted to the data. (2) The distribution that fits the data sufficiently well is quite probably the Poisson, yet to be discussed.

Table 1.4.2.1
Clutch size      Number of       Relative           Poisson
X                nests f(X)      frequency p(X)     probabilities po(X)
0                     6             0.097               0.127
1                    15             0.242               0.262
2                    19             0.306               0.270
3                    14             0.226               0.186
4                     7             0.113               0.096
5 or greater          1             0.016               0.059
Total                62             1.000               1.000
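A minimal sketch (added here for illustration, not part of the original text) of the numerical experiment described in Example 1.4.2: the Poisson probabilities in Table 1.4.2.1 follow from the sample mean clutch size used as the Poisson parameter, with the open class treated as "5 or greater".

    # Python sketch: fit a Poisson distribution to the clutch-size counts
    import math

    counts = {0: 6, 1: 15, 2: 19, 3: 14, 4: 7, 5: 1}   # 5 stands for "5 or greater"
    n = sum(counts.values())                            # 62 nests
    lam = sum(x * f for x, f in counts.items()) / n     # sample mean, about 2.06

    def poisson(x, lam):
        return math.exp(-lam) * lam ** x / math.factorial(x)

    for x in range(5):
        print(x, round(counts[x] / n, 3), round(poisson(x, lam), 3))
    # lump the upper tail into the open class "5 or greater"
    tail = 1 - sum(poisson(x, lam) for x in range(5))
    print("5+", round(counts[5] / n, 3), round(tail, 3))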

Example 1.4.3 To the experimentalist the term "experiment" usually conjures up a study that involves strict controls of the factor variables whose effect is studied. The effect of light intensity on photosynthetic rates is an example. In the study, the controls allow light intensity to change in pre-determined increments across groups of plant specimens, but all the other growth factors that could influence photosynthetic rate, most prominently CO2 concentration in the air and ambient temperature, are forced to vary in a random manner inasmuch as they cannot be held perfectly constant. But the experimental controls take another form in most field studies. It is very much so when a forester examines the influence of soil moisture on the productivity of a conifer. He or she will in all likelihood choose to control moisture levels and variation via stratification of the sample by environmental type (e.g. xeric, mesic, hygric). In this way the productivity records will be specific to conditions associated with moisture levels in the environmental types, and the relationship of the two can be examined.

1.5 Randomness

In the real world of ecology there is always a substantial random component to response. This is in the sense that some factor influences keep generating responses with a definite non-random trend while other mechanisms keep generating responses that are random. If we take tree height as the response variable, inherited plasticity and chance fluctuations in the environmental effects may be generating random responses while soil moisture regime, jointly with other environmental factors, may be forcing non-random responses. What should be a test for randomness? In general, we would look for independence of the response from measured environmental forcing. We use the term "stochastics" to describe the superimposition of random variation on a basically directed trend. A state in the reverse, namely the rise of order in complete chaos, has come to be called the case of the "fractal".3 The convolution of random and non-random variation in the total response is a fact of Nature. Interestingly, their isolation is a purely analytical problem. But it requires an operational definition. To this end, we may think of randomness as that portion of response which is independent of one or more of the relevant external ordering criteria, including time, space, level of impact, etc. The cases described in the following examples illustrate this point.

Example 1.5.1 When chance reigns through some probability law, and the states of a response variable show dispersion patterns unrelated to the defining variables of the space into which the response is projected, then we can say that "the response is undergoing random variation". That is, if we order the states of a response variable, say tree height, according to soil moisture, tree height at one point on the moisture scale cannot be predicted from tree heights taken at other points on the same scale, or equivalently, response in tree height is not completely predictable from soil moisture.

3 See concepts and references in Orlóci (2001a).

Example 1.5.2 When randomness reigns, the states of a response variable are independent of a more general ordering criterion, encountered in time and/or space. Here too, the same as under Example 1.5.1, the responses are arranged in a unique sequence, other than their actual magnitude, and randomness is examined on the basis of that arrangement.

The practical consequences are clear:
a. Randomness can have different operational definitions, and testing for it may lead to different conclusions. This is owing to the fact that the criterion by which randomness is tested is independence from an ordering criterion, and the ordering may obey different criteria.
b. Even if independence is the conclusion, it holds in the context of the ordering variable(s), and may have nothing to do with the workings of some latent chance mechanism of response. How should one probe for the existence of a chance mechanism? Experimental evidence is needed. For tree height the experiment would aim at showing inherent variation in the species under conditions that neither enhance nor suppress the appearance of randomness.
Considering the above, it is not unreasonable to expect that users will continue identifying a response as random if they find a random arrangement within the analytical space which they defined, irrespective of the mechanisms that actually generated the response.

Example 1.5.3 When the growth rate of a plant species is expressed as a function of available soil nutrients, a strong trend appears. A perfect relationship is unlikely, however, and as a result there will be a virtually random component of variation around the main trend line. Does this indicate true chance variation in growth? Not necessarily, since what appears random when only soil nutrients are considered may in fact appear as a trended response vis-à-vis other factors that vary independently of soil nutrients.

Example 1.5.4 The outcome of throws with a perfect die exemplifies a variable undergoing pure chance variation. This variable has six states, and because the die is perfect, chance reigns through a simple 1/6 probability law. Unpredictability is complete in the sense that no outcome (state) will attract a specific outcome with higher average probability than it does any of the other outcomes. Things change if the die is "loaded". This will force a side to show more often than expected based on the simple 1/6 probability law.
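To make the operational notion in point (a) above concrete, here is a minimal sketch (added for illustration, not the book's own procedure; the soil moisture and tree height values are hypothetical). Randomness of a response relative to an ordering criterion can be probed by comparing an observed trend statistic with its randomisation distribution obtained by shuffling the response values:

    # Python sketch: randomisation test of independence from an ordering criterion
    import random

    soil_moisture = [10, 14, 18, 22, 26, 30, 34, 38]          # ordering criterion (hypothetical)
    tree_height   = [4.1, 4.8, 5.0, 6.2, 6.0, 7.1, 7.5, 8.0]  # response (hypothetical)

    def corr(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / (sxx * syy) ** 0.5

    observed = corr(soil_moisture, tree_height)
    random.seed(1)
    perms = 999
    exceed = sum(
        abs(corr(soil_moisture, random.sample(tree_height, len(tree_height)))) >= abs(observed)
        for _ in range(perms)
    )
    p_value = (exceed + 1) / (perms + 1)   # a small p suggests a trend, not randomness, on this ordering
    print(round(observed, 3), p_value)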

1.6 Context

We define the context of a problem by answering the question: Does the statement of the solution involve the notion of probability? If it does, then the context is probabilistic. If it does not, the context is deterministic.

Example 1.6.1 An interesting mixture exists in some problems. For example, one of the Markov chain models of community development assumes that a set of probabilities exists a priori which specify the rates of replacement of a given species by other species in the sites it already occupies. Clearly, the problem in this has a probabilistic (also called stochastic) side. But it also has a deterministic side, because the transition probabilities, set by the conditions at origination, if not allowed to change, will predetermine the Markov composition of the community in any future state.

1.7 True or false

A survey or experiment rarely describes true patterns or structures in nature. More likely, it provides estimates or approximations of the "true" thing. Two typical cases stand out in this respect, each serving the same objective, but differing in one important respect: the population may be sampled or completely enumerated. If we discount observer mistakes, the outcome of complete enumeration is a true picture of the population. But populations are usually very large and normally cannot be completely enumerated. This leaves us with the only alternative, sampling. When we sample we take a limited subset of the population under observation. When doing that, it must be clear that the sample that we take is one of many possible samples. The criterion of choice is chance based in statistics. But it is more often preferential (which we discuss in the sequel). To what extent will a "sample description" of the population or the community deviate from the "true description"? This cannot be measured exactly in most cases, but as we shall see, limits can be set within which one expects to find the true description with given probability.

Example 1.7.1 If we consider a population of size 30 and decide to take from this 10 units, the sample that we obtain will be one of 30!/(10! 20!) = 30,045,015 possible samples, all distinct in at least one unit, all of size 10, and all of equal probability of being chosen in random sampling. Now if we start taking samples of 10 units, the 1st, the 2nd, and so on, and obtain a description of each, the descriptions will differ, that is, the physical property "sample description" becomes a variable. From this, it is easy to see that the conclusions based on sampling will have to be stated in conditional terms. In general, categorical statements are allowed only under complete enumeration with a claim on the complete truth – barring human mistakes in the course of data collection and later in data analysis.
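A quick check of the count in Example 1.7.1 (a sketch added here for illustration):

    # Python sketch: number of distinct samples of 10 units from a population of 30
    import math
    print(math.comb(30, 10))                                            # 30045015
    # equivalently, 30! / (10! * 20!)
    print(math.factorial(30) // (math.factorial(10) * math.factorial(20)))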

1.8 Planning

In any study of nature the first step is a clear statement of the problem, and the last is the presentation of results and conclusions. In the interim, specific decisions are made concerning population limits, population units, attributes, measuring scales, sampling techniques, data analytical algorithms, methods of inference, etc., which have to be planned. Every decision at each step has cumulative consequences for the outcome. The following are some planning points:

Statement of problem. The investigation must have clearly stated objectives in connection with a clearly stated problem.
Delineation of population. The objects may not always be natural entities. Nonetheless the conventions that define the units must be clear. The population boundaries must be unambiguous.
Sampling design. Biological populations are, with some exceptions, uncountable because of their large size or because of circumstance. This precludes complete enumeration. The alternative, sampling, requires design according to precisely stated conventions. Random designs are often not implementable.
Choice of approach. Often, survey and experiment are used in sequence, the survey generating the broad picture, and experiments testing specific hypotheses about the details.
Selection of variables. The number of potentially useful variables may be extremely large. The selection of a manageable subset must follow defined conventions.
Designing data analysis. All decisions affect the scope and type of the analyses to which the data set is subjected. It is therefore essential that data analysis be planned prior to implementation of the survey or experiment.
Preparing for the eventuality of repetition of the study in the spirit of "successive approximation".

THIS CHAPTER presented the study plan in which surveys and experiments are the instruments of gathering evidence for Nature's patterns of structural and functional variation. In the course of this we introduced many technical terms that we use in the subsequent sections of this book. We distinguished two components of variation, one trended and the other random, and we came to realise that the isolation of these components is a central objective of statistical analysis. We emphasised context, and that "true" or "false" is a matter of context-dependent probabilistic decisions. We attached special importance to a thorough study plan drawn up with the clear understanding that the steps taken in a study are interconnected and the decisions at each step have a cumulative effect on the final outcome.



Chapter 2
HISTORIC PERSPECTIVE

In this chapter we look back on the evolution of the field along problem lines, study scenarios, data handling, and data communications.

2.1 Problem lines

Biological data analysis has a long history4 along problem lines highlighted by surveys and experiments. The analysis has focussed on questions of three kinds: How common? How similar? What kind?

HOW COMMON? When the commonness of an event is in question, the probability of the event comes into focus. An event is considered common if the probability of its occurrence by chance is high. When a probabilistic way of reasoning is involved, we require access to a probability distribution. There are two ways of devising a distribution:

1. Derivations from basic principles had their great stride early in the 20th century, most notably in the works of E. S. Pearson, R. A. Fisher, J. Neyman and R. von Mises. We put their results under the collective title theoretical distributions to emphasise that they are not rooted in simulation experiments or actual observation. The derivation of theoretical distributions assumes that a random sample can be taken, and further that specific regularity conditions exist globally, irrespective of the place, time and problem type. Owing to this, theoretical distributions simplify the mechanics of finding probabilities, but if the conditions under which they are derived do not exist in the portion of nature put under scrutiny, the probabilities found will be rendered meaningless.5

2. In bioenvironmental studies the contemporary tendency is not to use theoretical distributions. The reason for this is the recognition that global uniformity does not exist independently of space, time and problem. The preference is increasingly for sample-specific empirical distributions. We can generate empirical probability distributions in simulation experiments based on the use of randomisation techniques6 under possibly any set of conditions, even under the regularity conditions assumed by the early workers.

HOW SIMILAR? This is an all-pervasive question to be answered by comparative science, in which conclusions are drawn about the nature of one thing from its spatial, morphological or functional similarity to others. "How similar" is the central question in biosystematics and community typology, and also in ordinations, which arrange objects into groups or order them on a line according to their similarities. Jaccard's turn-of-the-20th-century work on regional floras epitomizes the beginnings of the problem line. But explosive developments had to wait until much later, in the 1950s and the following decade. By that time the digital computer had been perfected and had become broadly available. Particularly significant in the evolution of bioenvironmental studies are the contributions of D.W. Goodall, P. Greig-Smith, E.C. Pielou, P.H.A. Sneath, R.R. Sokal, and W.T. Williams. From these emerged the new fields of Quantitative Ecology and Numerical Taxonomy.

WHAT KIND OF PROCESS? We are limiting the discussion under this heading to a general case of community dynamics with examples from vegetation ecology. Emphasis will be put on the characteristics of the process trajectory.7 The characteristics include phase structure, determinism and attractor migration, periodicity, complexity, and parallelism.8 These are among the intangibles of the vegetation community, and as such, they cannot be measured directly. They have to be defined analytically. Trajectory characteristics of the above kind are called emergent. The name makes sense since these characteristics do not exist without the community. An important point to be made is that the trajectory characteristics are indicators of factor influences, and also, they are symptomatic of the governing principles of the process.

4 The beginnings of biological data analysis coincided with the formulation of questions about the properties of multitudes, such as A. Quetelet's (1796-1874) "averages", F. Galton's (1822-1911) "correlations", K. Pearson's (1857-1936) "descriptive" and "correlation" measures, and C. Raunkiaer's (1860-1937) index of comparison.
5 See related concepts and references in Orlóci (2001b).
6 See related concepts and references in Edgington (1987), Pillar and Orlóci (1996).

2.2 The study scenario

Studies of complex phenomena historically invoked a three-step study scenario. At first in an ecological study, knowledge about the phenomena in good part involves intuitions. As the study progresses, knowledge expands and understanding deepens. This can lead to refinements, even involving the study scenario itself. Such a dynamically executed study scenario is "Poorean successive approximation".10 The Poorean scenario is also hierarchical.9 This implies that the study tasks move down through scales from the macro-characteristics to the understanding of the details. Moving down, and not up, should guarantee that the natural unity of the problem is kept structurally and conceptually intact. It is an interesting fact that Science has always found utility in macro-manifestations in the search for causes and governing principles. To see this to be true, it is sufficient to think of the works of the Great Masters. Newton's basic laws of motion came about this way; so did the Darwin-Wallace theory of species evolution by natural selection, the Mendelian 3:1 statistical law of particle-based inheritance, and Kerner's doctrine11 of plant community development by facilitation.

7 This follows Orlóci et al. (2002).
8 Phase structure implies the segmentation of a trajectory into significantly distinct intervals. Determinism, or directedness, is the manner in which the process could be running its course if the attractor, the set of conditions that define process direction, has not been overwhelmed by random effects. Periodicity is the tendency for certain structural variables, such as diversity, to revisit past states with regular or irregular frequency. The notion of dimensionality is understood in the sense of complexity, measured as a fractal, related to shape. Parallelism implies the tendency of the process to run its course in a manner of coordination with itself during different periods of time, or with trajectories from other sites.
9 See Wiegert (1988), Anand (1994, 1997).
10 Poore (1962).

2.3 Data storage

The presentation of techniques and reasoning about choices makes up much of the book's contents. But before we turn to those, we present a brief review regarding data storage, communication, and processing. The techniques of data storage and sorting evolved over time from manual to mechanical and electronic. The idea of mechanization came early, but it was not until the late 19th century, following Charles Babbage's work and the availability of the electric motor, that commercially viable equipment began to appear. Among the many early designs Herman Hollerith's electric machine was the most successful. It used punch cards for data storage and was outfitted with mechanical sensors and counters for tabulation. Hollerith's machine was used to process the data in the 1890 U.S. census. The manufacturer became known as IBM. Electronic storage on magnetic and optical devices and processing in the computer's memory have long replaced the punch card and electro-mechanical sorting.

The landmark invention which revolutionised the development of communications was movable type in the 15th century by Johannes Gutenberg, who opened the way to broad dissemination of information. Centuries later, the invention of the telegraph by Samuel Morse (1837) and wireless telegraphy by Guglielmo Marconi (1901) made instantaneous passage of information possible over large distances. The invention of the digital computer brought automation to communications and permitted transmission of large volumes at high speed by serving as a control unit and buffer. Communication satellites further enhanced communications on a global scale and led to the development of a world wide web, the Internet.

Calculating devices have existed since ancient times.12 The abacus, for example, was in use in Babylon and China millennia before any mechanical calculators appeared. Among the early mechanical machines, Wilhelm Schickard's calculator (17th century) was probably the most versatile. The forerunners of the early 20th century electric machines began to appear in the mid and late 1800's. These machines were designed for specialized tasks, such as the Pehr Scheutz difference engine (1853), which was used to construct life expectancy tables. The advent of the electromechanical calculator represented the next step forward, but its practical life ended abruptly in the 1970s with the advent of the electronic calculator. The first of these was the MARK I, built by H.H. Aiken and IBM engineers at Harvard in 1944. By today's standards, the MARK I had an enormous size and dismally slow speed. The ultimate tool for data manipulations is the modern computer. Its prototype, built in 1952 by the IAS at Princeton according to John von Neumann's logic, was the first machine that used an internally held program. Since its introduction, computers have developed in leaps and bounds, memory capacity and speed have increased, and physical size has been greatly reduced, thanks to the silicon chip.

THE PROBLEM LINES are related to three simple yet very potent questions: How common? How similar? What kind? The approaches that develop these questions and attempt to find answers differ. We made clear that our preference is for a very flexible approach which we called Poorean successive approximation. In this, answers, in the manner of inferences, are the outcome of a process of recursive sampling and analysis. Therefore, the answers are not absolute, but the best that can be achieved at a reasonable investment of time and resources.

11 Explained in his landmark book of 1863 "Das Pflanzenleben der Donauländer", published by Wagner in Innsbruck.
12 Goldstine (1972) narrates the history of computer evolution from Pascal to von Neumann.



Chapter 3
MEASUREMENT THEORY

The description of objects is the chapter's main topic. When the question posed is "how large", the answer has measurement theoretical implications concerning measuring scale, measurement error, and significant digits.

3.1 Description of objects

An object's description involves its quality, linear size, area, volume, or bulk. An example is the survey of commercial quality timber in a forest. The surveyor's choice of attributes – such as tree quality, height of the trunk and its diameter – are obvious choices for measurement that bear directly on the amount of commercial quality lumber to be harvested. The linear measurements can be the basis of derived variables, such as total volume and mass. The set of attributes of possible interest to an ecologist doing bioenvironmental assessment is very large. Because of the necessity to remain within manageable limits, rules are required to control the selection. As a general rule, Ockham's razor applies: the inflation of the problem by including attributes not directly relevant to the problem should be avoided. The question is how we know how many attributes are necessary. This may call for a pilot study. Familiarity with the attributes themselves is required:

Accessibility, clarity. A variable will only be useful if its states are unambiguous and accessible for measurement without undue effort.

Example 3.1.1 A vegetation survey is planned which calls for counting plant individuals within plots. This is practical when the individuals are recognizable on the basis of their aerial parts and not overly numerous. Many plants, regularly grass and sedge species, have their aerial parts (culms) interconnected by roots or rhizomes. Counting the aerial parts is therefore counting the "organs", not the organisms.

Variation and co-variation. The carriers of information about a population and about cause and effect relationships are the attributes that manifest variation and co-variation. Example 3.1.2 One of the usual study objectives in ecology is to examine the relationship between sets of factor and response variables. The response variable that varies most sensitively conveys most information about the factor effects.

Commonness. Populations usually have unusual mutant traits represented by a few individuals. The characteristics of these traits add negligibly to the total variation in the population. In contrast, characteristics not limited to unusual traits are common and have the potential of undergoing much variation. Example 3.1.3 Abphyl corn plants are recognised by the pattern of leaf initiation and the shape of the shoot meristem. They are numerically rare in any population compared to plants with normal leaf arrangement. While the rare character "Abphyl" is useful to establish a qualitative dichotomy in corn, characters of more common type – such as height, leaf area, number of ears per plant – will help to gauge better the broad state of the corn population.

Logical dependence. If the states of a variable are logically connected to the states of other variables, they are said to be logically dependent. This condition may have a structural or functional basis. Example 3.1.4 An investigator interested in measuring leaf surface area may determine this by a planimeter. Alternatively, the investigator may decide to use leaf length or width (much simpler to measure) in lieu of leaf area, using as justification the reasoning that area is logically correlated with length and width. A manifestation of this is the more or less constant ratio of length to width during leaf maturation (Figure 3.1.4.1).

Figure 3.1.4.1

Example 3.1.5 It is normal for natural variables to be correlated if they have functionally co-ordinated responses to influences of their environment. Light interception by the forest canopy and its effect on the composition of the sinusiae (community strata) in the understory is an example.

Sensitivity and sharpness. Sensitivity is an expression of plasticity, i.e., the potential for variation. Sharpness is in contrast with being fuzzy.

Example 3.1.6 It is interesting to observe that plant reproductive organs show little variation in a population, unlike plant size, which may vary greatly. In these terms, the reproductive characteristics are ecologically "insensitive", but not fuzzy.

3.2 Measuring scales

We categorize attributes by their type, which determines the properties of their measuring scale. Four scales are discussed:

Nominal scale. The units on this scale have arbitrary order (Figure 3.2.1). They are merely labels (1, 2, 3, ...) that identify the "measured" object's class affiliation. Only the logical operations of identity (=) and non-identity (≠) are defined.

Example 3.2.1 Consider the variable "leaf type" with 5 states in a sample such as shown in Figure 3.2.1. For two plant species i and k, whose leaf shapes are Xi = 4 and Xk = 5, the operations Xi = Xk and Xi ≠ Xk are defined, but the operations Xi < Xk and Xi > Xk are meaningless. Now consider how "crisp" or "fuzzy" a comparison of two plants is on the basis of the leaf shapes that they possess, with attributes as given in Figure 3.2.1 and Table 3.2.1.1. Suppose that plant specimen Si has leaf type 1 and plant Sj has leaf type 2. How crisply can the two plants be distinguished based on their leaf types?
1. Proportion of the 3 attributes' states shared by S1 and S2 is 2/3.
2. All proportions: 2/3, 0/3, 2/3, 1/3, 2/3, 0/3, 1/3, 0/3.
3. Degree to which a proportion of 2/3 blends into the sample (fuzziness) is 3/8.
4. Degree to which it does not blend in (crispness) is 5/8.

Figure 3.2.1

What is the conclusion? Leaf type is a rather crisp, but far from perfect, discriminator of Si and Sj within the sample of leaf types as given.

Table 3.2.1.1
                       Leaf type
Attribute     1        2         3       4       5
Top           round    round     acute   round   acute
Bottom        round    round     flat    acute   acute
Venation      acute    u-shape   none    acute   acute
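A minimal numerical sketch of the comparison above (added here for illustration; it reproduces the matching proportions of Example 3.2.1, and it reads "fuzziness" as the relative frequency of the observed proportion among the listed proportions, which is one way to obtain the quoted 3/8 and 5/8):

    # Python sketch: nominal-scale comparison of leaf types by shared attribute states
    leaf = {   # attribute states per leaf type, from Table 3.2.1.1 (Top, Bottom, Venation)
        1: ("round", "round", "acute"),
        2: ("round", "round", "u-shape"),
        3: ("acute", "flat",  "none"),
        4: ("round", "acute", "acute"),
        5: ("acute", "acute", "acute"),
    }

    def shared(a, b):
        # proportion of the 3 attribute states shared by two leaf types
        return sum(x == y for x, y in zip(leaf[a], leaf[b])) / 3

    observed = shared(1, 2)                               # 2/3 for S1 (type 1) vs S2 (type 2)
    pool = [2/3, 0/3, 2/3, 1/3, 2/3, 0/3, 1/3, 0/3]       # the 8 proportions listed in the text
    fuzziness = pool.count(observed) / len(pool)          # 3/8
    crispness = 1 - fuzziness                             # 5/8
    print(round(observed, 3), fuzziness, crispness)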

Ordinal scale. Similar to the nominal scale in being discrete, but the states have a natural order. Two algebraic operations are meaningful: identity or non-identity (=, ≠) and order (>, <).

Example 3.2.2 Skiers are scored by the order in which they complete a course. For two skiers t and k, Xt > Xk is a meaningful operation. Importantly, there is no information conveyed about actual time differences. In other words, the difference Xk - Xt is not meaningful as an expression of how much extra time skier k needed to complete the course. The ratio Xk / Xt is also meaningless.

Interval scale. This scale has all the properties of the previous scales, plus the extra property of making Xk - Xj a meaningful expression. Since this scale has no natural zero point, the ratio operation Xk/Xj is not meaningful.

Example 3.2.3 An investigator, interested in the insulating effect of snow in the Arctic Tundra, takes simultaneous measurements and finds X1 = -35 °C in the snow and X2 = -10 °C in the air. The difference X2 – X1 = 25 °C is meaningful, but the ratio X1/X2 is not; that is, -35 °C is not 3.5 times as cold as -10 °C. This is because the zero point of the scale is arbitrary.

Ratio (meristic) scale. This is an interval scale with a natural zero point. All basic algebraic operations are defined.

Example 3.2.4 When temperature is expressed in Kelvin degrees the temperature scale becomes meristic. Given Celsius degrees X1 = -35° in the snow and X2 = -10°, a ratio is calculated after translation:

(X2 + 273.18) / (X1 + 273.18) = 263.18 / 238.18 = 1.105

This is meaningful; that is, -35 °C is 1.105 times as cold as -10 °C.

Measuring scales may undergo change in surveys or experiments as a consequence of explicit and implicit data transformations.

Example 3.2.5 A set of 50 seeds is placed on moistened filter paper in a Petri dish. Some days later the seeds are inspected for germination. At the level of the individual seed, the variable has one of two possible outcomes: germinating (score one) or not germinating (score zero). If viewed at the level of the experiment, the binary variable will give rise to a discrete ratio variable. This variable can take on any integer value from 0 to 50.
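A brief sketch of the scale change described in Example 3.2.5 (added for illustration; the germination probability of 0.6 is an arbitrary assumption):

    # Python sketch: 50 binary seed outcomes aggregate into one discrete ratio variable
    import random

    random.seed(2)
    p_germinate = 0.6                      # hypothetical germination probability
    seeds = [1 if random.random() < p_germinate else 0 for _ in range(50)]  # binary scale
    germinated = sum(seeds)                # discrete ratio scale, an integer from 0 to 50
    print(germinated)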

3.3 Errors in measurements Measurement errors are inherent in the imprecision of the measuring instruments. Barefaced mistakes committed by the person doing the measurement are not that type and are excluded. True error. Consider the case portrayed in Figure 3.3.1. Leaf width is measured using a calibrated ruler. The basic unit is 1 mm. We assume that measurement is by the convention known as "rounding off". This convention requires reading the scale at the mark that appears nearest to the true dimension of the object, in this case the edge of the leaf. The measured leaf width, X = 72 mm, is an approximation to the exact width X*, which 33

László Orlóci happens to be in this case more than 72 mm. The true error in X,  X = X – X*, cannot be known exactly because X* is unknown. The main task is to *

derive an approximation to  X . *

Figure 3.3.1

Implied limits, implied interval, limiting error. When the value of X is observed in a data set, we cannot recapitulate whether the true value X* was less than, equal to, or actually greater than X. All that we can say is that X* must lie within the implied limits X 

X 2

 X*  X 

X 2

. In this  X is the

implied interval, the smallest division on the measuring scale, and

X 2

is

the limiting absolute error, symbolised by  X . After applying these to the *

case in Figure 3.3.2, we have X=7 1 mm,  X =1 mm,  X =0.5 mm. Since  X is measured in the same units as X, it is not comparable between scales. A relative quantity can be derived in the manner of  X 

X X

known as the

limiting relative error, which can be compared between scales.

Example 3.3.1 The approximate value of  is given in a handbook as 3.14159267. A mainframe computer uses only the first six digits of this number. The error 5.0 x 10 -9 for 3.14159267 and 5.0 x 10-6 for 3.14159 reveal a three order of magnitude increase.

3.4 Measurement errors in functions When large numbers of measurements are used in calculations, it is quite realistic to expect the measurement errors to cancel out. However, it cannot be determined whether (and if so, to what extent) their cancelling out will actually occur. Therefore, it is prudent, as engineers tend to think, to plan on error accumulation. Errors in functions of a single variable. If variable Y is a function of X, that is Y  f ( X   X ) , the error in Y will be a function of the error in X. The Y* function in series form is *

*

34

Statistical Ecology Y *  f ( X   X* )  f  X   f '( X ) X* 

f ''( X ) X*2 2!

 ...

It is customary to write for the true but unknown error of the function Y*  f '(X   X* ) and for the limiting absolute error Y  f '(X   X ) . The limiting relative error is Y 

Y Y

.

Example 3.4.1. The following are the limiting errors in commonly used functions:

X

f '(X ) 

1 X

Y 

b. Y = ln X

f '( X ) 

1 X

Y 

c. Y  X k

a. Y =

d. Y

 eX

X 2 X

Y 

Y X



X 2

x X

Y 

Y ln X

f '(X)  kX k 1

Y  kX k 1 X

Y 

Y k X   k X Xk X

f '(X )  e X

Y  e X  X

Y

  Y  Y Y

Errors in functions of several variables. Consider p variables X1 , X2 ,..., Xp and a function of these Y  f (X1 X2 ...Xp ) . The limiting absolute error comprises the sum of partial derivatives, Y 

Y X1

X  1

Y X 2

 X  ...  2

Y

X

X p

p

The limiting relative error is Y 

Y Y

.

Example 3.4.2 Addition, multiplication, and division are the operations performed on the following measurements and error terms: X1= 1.28, X2= 1.011, X3 = 0.19,  X1 = 0.005,  X2 = 0.0005,  X3 = 0.005,  X1 = 0.0039,  X2 = 0.00049,  X 3 = 0.0263 a. Addition. The limiting absolute error of the sum Y =X1+X2+…+Xp is the sum of the limiting  absolute errors: Y   X   X   X . The limiting relative error is  Y  Y . Numerical valY ues: Y =X1+X2+X3 = 2.481;  Y = 0.0105 ;  Y = 0.0042 1

2

3

b. Multiplication. The limiting absolute error of the product X1 X2 X3 is the product times the sum of the limiting relative errors of the individual terms: Y  Y ( X   X   X . The 1

2

3

 limiting relative error is  Y  Y . The numerical values: Y  X1 X2 X3  0.2459; Y = 0.0307; Y  Y = 0.0075 35

László Orlóci c. Division. The limiting absolute error of the quotient Y 

X1 is the quotient times the sum X2

of the limiting relative errors of the individual terms: Y  Y ( X1   X2 ) . The limiting relative error is  Y 

X Y . Numerical values: Y  1 = 1.2661, Y =0.0044,  Y = 0.0056 X2 Y

3.5 Significant digits All digits from a measurement are by definition significant. In chain calculations accuracy may decrease, and significant digits lost. The exact number of significant digits in a value Y is determined as follows. First determine the limiting absolute error in Y. After this one should declare as significant any digit in which the unit exceeds the limiting absolute error. Considering the examples given below, it is seen that the exact determination of the number of significant digits can be very tedious, particularly if the exercise involves chain calculations. An approximation is described in Simpson, Roe and Lewontin (1960, p.8). Example 3.5.1 Consider the results given in Example 3.4.2. The exact method by which significant digits are determined is shown in Table 3.5.1.1. Table 3.5.1.1

Y

Y 0.481 0.2459 1.2661

Significant

Rounded

digits

number

0.01 0.008 0.006

2 2 3

2.5 0.25 1.27

Example 3.5.2 The absolute dissimilarity of two objects A and B (Table 3.5.2) can be measured as a Euclidean distance: d(A,B)  ((X2 A  X2B )2  (X1 A  X1B )2 )1/2 . Based Table 3.5.2.1, 2 1/2 d(A,B)  ((15.28  21.01)2  (2.295  1.130) )  5.8472 and  d = 0.0003(5.8472) = 0.002

. Table 3.5.2.1 Variable

Cone A

Cone B

Length (cm) X1 Width (cm) X2

2.71 2.02

2.81 1.21

The value for the limiting error is derived in the following table: Quantity Limiting relative error 15.28 - 21.01 = -5.73

0.005  0.005  0.0003 36.29

36

Statistical Ecology 2.295 - 1.130 = 1.165 2

(-5.73) = 32.8329

0.0005  0.0005  0.0003 3.425

2(0.0003) = 0.0006

2

1.165 = 1.3572

2(0.0003) = 0.0006 32.8329(0.9996)  1.3572(0.0006) 32.8329+1.3572=34.1901 =0.0006 34.1901 34.1901

0.0006 = 0.0003 2

= 5.8472

 d = 0.002 Rounded distance: 5.85

The act of discarding non-significant digits follows the conventions of rounding off. The last digit retained in a number is a: — a remains a if it is followed by a digit less than 5. — a increases by one if it is followed by a digit greater than 5. —If the digit following a is 5, then — if all the digits following 5 are zero, a increases by one if it is odd and remains unchanged if it even; — otherwise a increases by one. Regard zero as an even number. Always retain an ample number of digits in the intermediate results of chain calculations and discard non-significant digits at the end. Example 3.5.3 The reading of significant digits and rounding off are illustrated by the examples in Table 3.5.3.1. Table 3.5.3.1 . Number 6.0318 6.0318 89.59 105.0365 0.0063725

Significant digits 2 1 2 5 4

5.0372 18572 18.572 10.32 10.32

3 2 3 2 1

Rounded number 6.0 6 90 105.04 0.006372* 5.04 19000 18.6 10 10

*Leading zeros not counted in significant digits.

DESCRIPTION for us is synonymous with measurement which, beyond the technical aspects, is the concern for measurement theory. “Measurement” is not the same as “statistical estimation” by the means of which the description of population properties is accomplished. The criteria of choice of 37

László Orlóci properties to measure or estimate emphasize relevance and recognisability without ambiguity. There are different types of measuring scales (nominal, ordinal, interval, ratio), but only the ratio scale has a natural zero point. Because of this property, we can perform on the measurements on the ratio scale all basic arithmetic operations: +, -, x, /. Furthermore, only the ratio scale is subject to measurement errors. We discussed in this connection the notion of implied limits which we use in the definition of the limiting absolute error and in the definition of the limiting relative error. We traced the flow and accumulation of measurement errors in chain calculations and determined significant digits on this basis.

38

Statistical Ecology

Chapter 4 POPULATION DESCRIPTION This is a statistical problem in most populations owing to sheer size. In all cases the problem of choice between descriptor functions has to be handled with prudence. Three families are discussed. One is the system of moments and product moments, and another A. Rényi's generalised entropy and information. The third is B.B. Mandelbrot’s fractal dimension.

4.1 Population scalars We have seen in the preceding chapter that the description of a population unit is a measurement problem, and that the precision of the description depends on the resolution of the measuring scale. Description of entire populations is a different task involving the measured attributes' magnitudes and frequency distributions.

4.1.1 Moments Figure 4.1.1.1 portrays a discrete system. Lever L, fulcrum  and the force f(X) at regularly spaced points X = 0, 1, 2, ..., s are the system's elements. The system is in equilibrium about . The quantity f(X)(X - ) is called the first central moment at point X. It is an expression of the tendency of force f(X) to produce motion in the system about . A continuous case is por-

39

László Orlóci trayed in Figure 4.1.1.2. In this, the simple lever (L) is replaced by a continuous rod of radius 1 (cross section unity) the density of which changes  according to function ƒ(X). The quantity f(X)(X -)dX is the central moment in the vicinity of point X.

Figure 4.1.1.1

Figure 4.1.1.2

As far as a discrete variable X is concerned, which has s states X = 0, 1, 2, ..., s and corresponding f. – totalled frequency distribution F = [f(0) f(1) f(2) … f(s)], the variables probability function is given by k

P(j ≤ X ≤ k =  p(X ) where p(X)=f(X)/f. Xj

The probability function P(j ≤ X ≤ k) expresses the proportion of all values of the discrete variable X that fall within the specified limits. When X is a continuous variable, the states are non-repeating. Because of this only frequency density ƒ(X) is defined for any state. The probability function is an X area function, A=P(X1 ≤ X ≤ X2) = 2 ƒ(X )dX . This has graphical representaX  X1

tion in Figure 4.1.1.3. The probability function defines a volume or hyper volume when there are two or more variables. The form of the functions for moments depends on the data used: a. The arithmetic mean  Frequencies are given:  

s 1 s  f (X )X   p( X )X f X 0 X 0 . N

N values are given, of which the jth is Xj:  

X j 1

j

N





The density function of X is given:  

F (X )XdX

X 

It is implied that

s

s

X 0

X 0

 p(X)  1 and  f (X )dX  f.

40

Statistical Ecology ab. The mth moment about 

m 

1 S  f (X )(X  )m f. X  0

m 

1 N (X j  )m N j 1

m 





f (X )(X  )m dX

X 

 m can be negative when m is odd. c. Variance

V

N 12 ( 12 22 )1/2

Some texts refer to  2 as the "population variance" and to ( ) as the "population standard deviation". To avoid confusion in the sequel, it is good practice to identify  2 as the second moment, V as the variance, and V2 as the standard deviation. 2 1/2

4.1.2 Product moments Figure 4.1.2.1 portrays the joint distribution of two discrete variables. Symbol f(X1,X2) designates the joint frequency of a state X1 in the first variable and a state X2 in the second. When the two variables are continuous, a density surface will describe their joint distribution. The relationship of two variables exhibiting a linear joint scatter (first and last cases in Figure 4.1.2.2) can be described by s a product moment.

Figure 4.1.2.1

Figure 4.1.1.3

a. Product moment  12 and covariance V12

 12 

1 S1 S2   f (X1 , X2 )(X1  1 )(X2  2 )  f. X1  0 X2  0

 12 

1 N (X1 j  1 )(X2 j  2 ) N j 1 41

Lászlo Orlóci 



 12 

1  X   f (X1 , X2 )(X1  1 )(X2  2 )dX1dX2 f. X1  2

V12 

N 12 N 1

 12 and V12 can be positive or negative. b. Product moment correlation 

12 

 12

( 12 22 )1/2



V12 (V12V22 )1/2

The values of 12 range from -1 to +1. Figure 4.1.2.2 illustrates different cases of joint scatter and corresponding values of 12 . It is important to note that 12 is a linear measures, and if the two variables X1,X2 had a non-linear joint scatter, such as the 3 rd case in Figure 4.1.2.2, 12 would be misused as a correlation.

Figure 4.1.2.2 Parasite/Vole

Table 4.1.2.1.1 ABCDE FGHIJ KLMNO

PQRST

Species X1

30000

33120

03323

03200

Species X2

24342

20013

24002

40044

Table 4.1.2.1.2 State X2

0

1

2

3

4

f(X1)

0

0

0

2

2

5

9

States

1

1

0

0

0

0

1

of X1

2

2

1

0

0

0

3

3

3

0

3

0

1

7

f(X2)

6

1

5

2

6

f. = 20

Example 4.1.2.1 Inspection of the stomach contents of 20 voles revealed the presence of two parasite species in numbers as shown in Table 4.1.2.1.1. The frequencies are summarized in Table 4.1.2.1.2. X has four states (0,1,2,3) and Y has five (0,1,2,3,4). F1 = (9 1 3 7) and F2 = (6 1 5 2 6) are observations. Nine joint states materialised of the possible 40. The following are the so called 'machine formulae' to compute moments, product moments, and the correlation coefficients: 42

Statistical Ecology 2   S   f ( X ) X      1 s 1 S      f (X )X and  2    f (X )X 2   X 0  f X 0 f.  X 0 f. .    

The numerical values are

1 

 12 

1(1)  3(2)  7(3)  1.4 20

1(1)  3(2)  7(9)  20

1(1)  5(2)  2(3)  6(4)  2.05 20

2 

282 20  1.84

 12 

412 20  2.55 20

135 

Given A and B such as

A

S1

S2

  f (X , X )X X 1

X1  0 X2  0

2

1

B

2

S1

S2

 f (X )X  1

X1  0

1

X2  0

f (X2 )X2

then the product moments for Table 4.1.2.1.2 are

A

 12 

f.

B f.

28(41) 20.  1.27 20.

32  

12 

 12 1.27   0.59 2 1/2 (  2 ) (1.84(2.5475))1/2 2 1

Example 4.1.2.2 A survey of tidal pools produced the starfish counts in Table 4.1.2.2.1. Table 4.1.2.2.1 Species Heliaster kubiniji X1 Pisaster ochraceus X2 Pisaster brevispinus X3

Sample plot 26 29 29 28 31 31 18 14 13

29 33 13

30 27 19

35 38 15

39 36 15

The moments, product moments and correlation values are given in matrix form: a. Arithmetic means  1   31  μ   2    32       3  15.286 

b. Second moments and product moments 16.857 11.857 1.286 Σ  11.857 13.714 4.429     1.286 4.429 4.776 

The values in the principal diagonal cells are the second moments and in the off-diagonal cells are the product moments. Symmetry exists so that  hi   ih . c. Product moment correlation coefficients The definition of the hi element accords  with 12  2 122 1/2 . The correlation of a ( 1  2 )

0.7798 0.7214   1  ρ  0.7798 1 0.5472   0.7214 0.5472 1 

variable with itself  ii is always +1.

Example 4.1.2.3 Assume that ground cover is estimated for two species (X1, X2) by the point quadrat method. The scores for 300 trials per quadrat are given for 20 quadrats in Table 43

László Orlóci 4.1.2.3.1. Note, the quadrats' order in the table represents a steeply increasing soil moisture gradient from xeric to hygric. It can be seen that cover is unimodal for both species over the moisture regime range (Figure 4.1.2.3.1A). Only the optima are displaced. It is a direct consequence of such a type of response that the joint scatter of the two species has a horseshoe shape (Figure 4.1.2.3.1B). The sequence of points in the horseshoe corresponds to quadrat positions on the moisture gradient. Notwithstanding the practically perfect (non-linear) functional relationship of X1 and X2, the product moment does not pick it up as such because the relationship is non-linear. This is a serious weakness when a linear measure is applied to a non-linear data structure.

Table 4.1.2.3.1 Plot Species

1

Species

2

1

2

3

4

5

6

7

8

9

1 0

1 1

1 2

1 3

1 4

1 5

1 6

1 7

1 8

1 9

2 0

3 0 1 0

3 5 2 0

4 5 1 5

6 0 2 0

5 5 3 0

6 5 3 5

6 5 4 0

7 0 4 5

6 5 5 5

6 5 6 0

5 5 6 0

4 5 7 0

4 0 6 5

3 5 5 5

2 0 4 0

2 5 4 0

2 0 3 5

1 5 3 0

1 0 2 0

5 1 5

Figure 4.1.2.3.1 A,B

4.1.3 Entropy and information The variables to be considered are qualitative and as such their states cannot be ordered. The mode is defined (the state with highest frequency), but "smaller" and "greater" relative to the modal state are not operational. These being so, products and product moments are not defined. The characteristic functions take an information theoretical context, such as Rényi's (1961) generalised entropy and generalised information. Example 4.1.3.1 Sixty plant species were listed from a site. They represent three life forms that occur with different frequencies: Life form j Frequency f j Relative frequency pj =fj /N

Annual 40

Biennial 15

Perennial 5

N 60

0.67

0.25

0.08

1.00

Variable "life form" is qualitative and the frequency distribution F is the basis of its statistical analysis. 44

Statistical Ecology

4.1.3.1 Rényi's functions Generalizations of entropy H and information I are of special interest to ecologists. The reason is the scale factor  that allows cases to be defined that correspond to different indices of diversity used by ecologists. The basic expressions can be given in the form of H 

1 1 

S

ln  pi and I  i 1

1

 1

S

ln  i 1

pi qi 1

Symbols pj and qj are elements in two s-valued probability distributions, P and Q. The distributions are identically ordered and have identical totals: S

S

i 1

i 1

 pi   qi . Unit sums are assumed, p1+ p2+ ...+ ps = 1 and q1+ q2+ ...+ qs = 1, but incomplete distributions are also permitted in special cases in which p1 + p2 + ... + ps = q1 + q2 + ... + qs < 1.The individual terms in P are defined s fi and T   fi . Symbol f i is a frequency (Example T i 1 4.1.3.1). Q may be considered a "standard" matrix, but not necessarily, to which P is compared. Values of  are freely chosen in the range from zero and up, but  cannot be exactly 1. When  comes close to 1, say .99999… or 1.00001…, the expression simplifies to Shannon's entropy function S S p H   pi ln pi or to Kullback's I-divergence information I   pi ln i . q i 1 i 1 i

according to pi 

What is then the significance of  in the expressions? As a scale variable, it defines an infinite number of possible point measures of entropy and information. Three of these have special significance for ecologists: — Entropy of order zero (=1) is maximum entropy. This is attained when F is an equidistribution (all elements are equal). — Entropy of order one ( approaching 1) is the Shannon entropy, and information of order one is one half of Kullback's (1959) I-divergence information. — Entropy of order two (=2) is the log Simpson index (described later). The order variable is useful in still other respects, namely to generate a curve on which one may determine the  point at which the value of

H or I becomes "stable". Select-

ing point diversity at that point is an advantage when comparisons involve different localities or different times. To avoid possible confusions in terminology, Brillouin's information should be mentioned. This is a limiting case for the N-multiple of entropy of order one, given by I  log2

N! f1 f2 ... fs

bits. Clearly, this is not a divergence measure. The use of symbol I is arbitrary. Note: log22 45

László Orlóci is one bit and the maximum is log2 s bits. Brillouin shows that when the fi are large, say 100 each or greater then

I/N will come close in value to Shannon's entropy function. In this, log2 e

e is the natural base (2.718281828).

4.1.3.2 Limiting distributions about P These are two limiting distributions that go with P, symbolically P(m) and P(l). They include the same number of elements and they have the same total as P. P(m) is a most dispersed distribution called an equidistribution, meaning that all s elements are equal: p1 = p2 = .. .= ps =1/s. P(l) is the least dispersed (most contagious) distribution that can be associated with P. The elements of P(l) are identically equal to 1/s, except one that is equal to 1-(s1). When the elements are viewed as possible future events, those of P(m) are the least predictable. This means that the mechanism that generates the elements is ruled by chance, disorder is maximal and the entropy is also maximal. P(l) represents maximal order and minimal entropy in Shannon's sense. Example 4.1.3.2.1 The Hgraphs of P(m), P(l) and P (Table 4.1.3.2.1.1) are presented in Figure 4.1.3.2.1.1. It is seen that in each case entropy has identical maximum value at ln s irrespective of or N. The minimum value depends on s, N, and  The ratio

H ln s

entropy in relative terms. E. C. Pielou called H  the measure of "evenness". ln s Table 4.1.3.2.1.1 Distribution

Annual

Biennial

Perennial

F

40

15

5

Total 60

P

0.667

0.25

0.083

1.00

F(m) P(m)

20

20

20

60

0.333

0.333

0.333

F(l) P(l)

1

1

58

0.0167

0.0167

0.9667

60

Table 4.1.3..2. 2 .1



H

Type of measure

0

H0= ln s

Log state richness

 lim 1*

s

H   pi ln pi

Shannon-Wiener

i 1

2

Log Simpson index

S

H  ln  pi2 i 1

*Read “  lim 1" as "  approaching one".

46

expresses

Statistical Ecology

Figure 4.1.3.2.1.1 Example 4.1.3.2.2 Rényi's generalised entropy of order  allows generalizations of the diversity concept and the definition of several well-known diversity functions as special cases of the same generic type. Three of the cases are identified explicitly in Table 4.1.3.2.2.1. They represent point diversities at the dots on the observed graph in Figure 4.1.3.2.1.1. Example 4.1.3.2.3 The frequency distribution F in Example 4.1.3.1.1 has limiting distributions displayed in Table 4.1.3.2.1.1. The corresponding entropy quantities are as follows: HO = ln 3 = 1.10, H1 = - (0.667 ln 0.667 + 0.25 ln 0.25 + 0.083 ln 0.083) = 0.823 , H2 = - ln 0.823 (0.6672 + 0.252 + 0.0832) = 0.665 , max H = ln 3 = 1.10, H1R =  0.748 (evenness) and 1.10 0.665 H2R = = 0.909 (evenness). That HR is an evenness measure is obvious from the prop1.10 erty that it comes closer and closer in value to one as P comes closer and closer to an equidistribution.

4.1.3.3 More on Rényi's functions When two or more variables are involved, each will have an associated distribution F1, F2, … . When the variables are linked by a common object on which they are observed such as the life form of competing species and the state of soil moisture regime within given area units, life form and moisture regime will have a joint distribution of their states. The bivariate (two-variable) case, F1+2, has an example in Table 4.1.3.3.1. The fij are joint frequencies and the fi and f.j are marginal frequencies. Just as before, the frequencies may be expressed as proportions pi , pi., p.i . JJust as before, an entropy quantity can be computed for each type of proportions. H ;12 

1 1 

S1

S2

ln  phi is a new quantity, the joint entropy of order alpha. h 1 i 1

Table 4.1.3.3.1

1 1

f11

States of variable Y 2 ... s f12

... 47

f

y

FX

1s y

f1.

László Orlóci

2

f21

f22

...



.

.

sx

fSX 1

fSX 2

… ...

FY

f.1

f.2

...

f

2s y

.

f2. .

fSX SY

fSX .

f.SX

f..

The term "information" in its present context implies mutuality. When P is the joint distribution of X and Y and when Q is the distribution expected, i.e. qhj 

fh. f. j f..

, I ;X ,Y will measure

the strength of association between X and Y. When we take approaching 1, then the relative strength of association is defined as H  HX  HY  HX Y  I / f.. in entropy terms. Abramson's (1963) term for

H XY is mutual information (two-dimensional case), but it is

better to call it mutual entropy to avoid a conflict of terms. In the multivariate case (three or more variables present), HXY incorporates all interaction, while mutual information is that small amount of information that all variables share and which is expected to rapidly decline in relative terms when the number of variables increases dramatically. Kullback's term for the interaction information is a one-way information divergence. HX and HY are main effects, and HX+Y is the joint effect. To further complicate matters for the student, an additional relative information quantity has to be recognised, the entropy difference HX|Y = HX – HXY, which is specific for variable X, given variable Y. Abramson's term for H X|Y is equivocation information. A further point to be made is concerned with the case of incomplete distributions s

s

p   q  1 . When this happens, provision has to be made for incompleteness by way  1 j

j

j

j 1

of using the sum of the observed relative frequencies as weights in the expression qaj

s  I  

1

 1

ln

j 1

paj 1

s  qj

. For purposes of comparison, HXY may be expressed in the manner of

j 1

  1d XY

2 XY

known as the coherence coefficient. In this, d12 =

HX Y  HXY

H

is known as Raj-

X Y

ki's metric.

The relationship of the different H quantities is displayed in the Venn diagram of Figure 4.1.3.3.1. This Figure is not drawn to scale for any of the examples already discussed. It is clear from that HXY cannot be greater than the smaller of HX and HY, and further that

HX Y  HX|Y  HY|X  HXY . It is also clear that the maximum value of H XY is equal to the lesser of ln s1 and ln s2. Venn diagrams are defined also for the information divergence I

48

Statistical Ecology

 . This has minimum value at zero when P and Q have element-by-element identity, and maximum value when both P and Q are most conta-

gious with the

T  s 1 T

quantity placed in offset positions. So far only cases

of involving one or two variables were discussed. The 3-variable case is considered later in Chapter 6. Example 4.1.3.3.1. Thirty plant species are classified according to life-form (variable X1) and stratum of occurrence (variable X2). The raw scores are displayed in Table 4.1.3.3.1.1 and the frequencies are given in Table 4.1.3.3.1.2 . The marginal F1, F2 and joint distribution F have structural significance.

Figure 4.1.3.3.1 Table 4.1.3.3.1.1 Life-form X1 Stratum X2

111111111112222 222222233333444 111112223331222 222222333333333 Table 4.1.3.3.1.2

Life-form/ Stratum Phanerophyte Chamaephyte Geophyte Hemicryptophyte

Code 1 2 3 4 F2

1 5 1 0 0 6

2 3 9 0 0 12

3 3 1 5 3 12

F1

P1

11 11 5 3 30

0.367 0.367 0.166 0.100 --

P2

0.2

0.4

0.4

---

1.000

The following are entropy quantities and values derived from these defined for the joint

frequencies: H1 =1.2646, H2 =1.0549 H12 =

5 / 30 3 / 30 5 3 ln + ... + ln = 0.4436 11 6 3 12 30 30 30 30 30 30

H1+2 = 1.2646 + 1.0549 - 0.4436 = 1.8759

d12 =

12 = (1 - 0.76352)0.5 = 0.6458 49

1.8759  0.4436 = 0.7635 1.8759

László Orlóci The possible maximum of H12 is 1.0549, the value of the lesser of the main effects. The absolute maximum is ln 3. It can be numerically shown to be this way based on the rearranged frequencies in Table 4.1.3.3.1.3 For the rearranged table we have: H1 = 1.2069 H1+2 = 1.2069

H2= ln 3 =1.0986 H12 = 1.0986

d12 = 0.0897

12 = 0.9959

Table 4.1.3.3.1.3 10 0 0 0 F2 10

0 10 0 0 10

0 0 9 1 10

F1 10 10 9 1 30

So far the examples considered the measurement of entropy in F. When Q is given, I is also defined. It should be emphasized that entropy and information of order one can be partitioned into perfectly additive components. We will show how the user can take advantage of this property to find diversity partitions specific to environmental effects. The models involved are very much the same in logic as the fragmentation of the Venn diagram into area segments. Just as the diagram itself becomes unwieldy with increasing numbers of variables, the partition functions also become complicated. This takes the discourse to the presentation of thoughts on diversity and complexity later on in the text.

4.2 Theoretical distributions Theoretical distributions are used to formalise the description of variables under specific regularity conditions. These conditions are assumed to exist. To this extent, theoretical distributions are external to the problem and their importation does not generate new data nor do they increase the information content of the observed data set. The Poisson, Bernoulli , and the Normal are typical cases.

4.2.1 Poisson distribution  X The function that generates this takes the form po(X) = e λ . It is such 

X!



that

 p (X ) = 1. o

The Poisson variable X has states 0,1,2, ... and mean 

X 0

50

Statistical Ecology

A revealing property of po(X) is the equality of mean and second moment. The value of the po(X), ranges from 0 to 1. It expresses in relative terms the number of cases expected to contain exactly X counted items. This is a reasonable expectation for a biological variable such as the number of plants counted within area units provided that a plant's occurrence in the area sampled is a completely random event. This expectation of random arrangement on the ground no doubt will strike the practitioners of ecology as a highly arbitrary condition for two reasons. First, plants tend to occur in a patchy pattern. The intensity of perceived patchiness or randomness will depend on the sampling unit size and shape. Example 4.2.1.1 Consider fitting the Poison distribution to observed frequencies f(X) as given in Table 4.2.1.1.1, taken from Table 1.4.2.1. The sample mean of X is 2.0645. Poisson frequencies are calculated next by substitution of the known value of  into the probability function po(X)= e

2.0645

2.0645X X!

and by letting X take the values 0, 1, 2, 3, 4. Note that the relative frequency of count "5 and over" is equal to the difference 1 - (po(0)+ po(1)+ po(2)+ po(3)+ po(4)). Note further that fo(X) = 62 po(X) for any X. Table 4.2.1.1.1 X f(X) po(X)

0 6 0.127

1 15 0.262

2 19 0.270

3 14 .186

4 7 0.092

>4 1 0.059

fo(X)

7.9

16.2

16.7

11.5

6.0

3.7

4.2. 2 Bernoulli distribution Consider a variable that has two states, such as a seed’s viability: viable (success) and not viable (failure). If success is a completely random event, the count of successes X in n trials (n seeds place on moistened filter paper in a Petri dish and the number of seed germinating after a set time counted) will be a Bernoulli variable with probability function po(X) 

n! X !(n X)!

p X qn - X

X has n+1 states 0, 1, ..., n, and po(X) gives the probability of exactly X successes in n completely random trials, given constant p (the probability of succeeding once in one random trial) and q = 1- p. The distribution mean and second moment are functions of n, p and q in the manner of = np,  = npq for X and = p. 51

László Orlóci Example 4.2.2.1 Four seeds of a species were placed on moistened filter paper in each of 100 Petri dishes. Symbolically: n= 4 and f. = 100. After one week, the germinated seeds were counted. Table 4.2.2.1.1 contains the data. The total number of seeds examined is nf. = 400, out of which 246 germinated. We find the germination rate p = 246/400 = 0.615 and q = 1 - 0.615 = 0.385. The fitted relative frequencies in Table 4.2.2.1.1 accord with po(X) =

4! 0.615X 0.3854- X X!(4 X)!

The fitted frequencies are fo(X) = N po(X): Table 4.2.2.1.1 X f(X) po(X)

0 3 0.022

1 10 0.140

2 36 0.336

3 40 0.358

4 11 0.143

f. 100 1

fo(X)

2.2

14.0

33.6

35.8

14.3

100

4.2.3 The Normal distribution A distribution is said to be Normal if its density function is of the form

F(X )  Be

 x   /2 2 2

In this e is the natural base (2.7182818) and B is a scale factor that determines the height of the normal curve at X =  (see Figure 4.2.3.1.) When B

1 2

2

then the total area under the normal graph is equal to 1. This 

is because



 x    /2 2 2

Be

dX  (2 2 )1/2

X 

It is seen that the Normal distribution is defined by two non-trivial constants,  (the mean) and  (the square root of the second moment). In the case of p normal variables, the joint distribution has density function

f (X1 , X ,..., Xp )  Be0.5(Xμ)'Σ 2

Figure 4.2.3.1

52

1

(Xμ)

.

Statistical Ecology

In the exponent of e, X is a vector of p observations, μ a vector of p means, and Σ is a p x p matrix of product moments:    1   μ 2  .      

 X1    X X   2  .   X  p 

When B-1 is equal to Σ

1/ 2

11   Σ   21  .   31

12 ... 1 p    22 ...  2 p 



.

... .   32 ...  3 p  

 2  the volume under the normal surface 0.5 p

is unity. Table 4.2.3.1.1 Class i 1 2 3 4 5 6

Class limits m 45.5 - 50.5 50.5 - 55.5 55.5 - 60.5 60.5 - 65.5 65.5 - 70.5 70.5 - 75.5

Midpoint Xi 48 53 58 63 68 73

Frequency f(X) 5 11 23 31 10 1 f. = 81 Example 4.2.3.1 This example describes the construction of a frequency table such as Table 4.2.3.1.1. The raw data is a set of 81 tree height measurements. The steps: A. Find the implied range (IR) in the data. The range R is the difference between the smallest and largest value (29 m in the example). Because each height was measured to the nearest meter, the implied interval  X is 1m, the limiting absolute error



is 0.5m, and the implied

X

range: IR = R +  X = 30m. B. Determine the number (s) of frequency classes. The number should not be too small nor to numerous. Making the classes too broad or too narrow can blur the distribution structure. In the example s = 6. C. Determine the class interval CI= IR/s. This is 5 m in the example. D. Construct class limits. The limits of the first class are X - X =45.5m and X - X + CI = 50.5m. In these, X is the lowest value in the data (46). Other class limits are found by incrementing the upper limit by the constant class interval. E. Tally frequencies. Tallying assigns each observation to a frequency class. The number assigned within a class is the class frequency f(X). Example 4.2.3.2 Having constructed the frequency table, and considered that X is continuous, the question of how well the Normal distribution may fit the data is legitimately put. The fitting normal densities to class frequencies: a. The normal density function with sample values substituted for

f o(X)  Be

(X 60.03734)2 /(2(30.10975))

53

.

 and 

2

is

László Orlóci An ample numbers of decimals are included to facilitate long-hand reproduction of the computer results. b. Fitting ƒ(X). The method presented finds the optimal B value by iteration. The optimal B s

minimizes the sum of squared deviations, Q   ( f (i ) f o (i))2 . Follow these steps: i 1

o 1. Begin with a low initial value of B, say B=1. Compute an ƒ (X) value for each class midpoint X, and compute a value of Q. Plot this value of Q as the 1st point in a the Q on B graph (Figure 4.2.3.2.1).

Figure 4.2.3.2.1 o 2. Increment B in small steps, say 1, and compute a new ƒ (X) and new Q values at each step. Plot the Q values in the Q on B graph. Stop when Q starts to rise. 3. Reduce B by the last step size and start incrementing it in smaller steps, say 0.1. Stop the iterations when the desired accuracy is reached. A small final Q indicates a close fit. The o last value of B is 29.4 in the example at step size 0.1. The last set of ƒ (X) values are the fitted normal densities (Table 4.2.3.2.1). These are plotted in Fig. 4.2.3.2.2 as a continuous curve. The observed frequencies are shown by the vertical lines in the graph. Note the total area under the normal graph at the given B is 29.4(2 30.10975)1/ 2 square units. Table 4.2.3.2.1 Class midpoint X

48

53

58

63

68

73

f(X)

5

11

23

31

10

1

ƒO(X) at B=29.4

2.65

12.92

27.44

25.41

10.26

1.80

A further point is concerned with the multivariate normal distribution. We observe that the density function incorporates the product moment matrix Σ in (X - μ)'  1 (X - μ) (the equation of an hyperellipsoid in p-dimensions). Observing this fact, it is clear that multivariate normality implies an ellipsoidal joint dispersion of the variables. Two facts point against this as a natural condition: a. The elements of Σ are linear measures of relationship (correlation, covariance) (Figure 4.1.3.2.2). These cannot detect non-linear relationships. When the relationships are substantially non-linear (Figure 4.1.2.3.1), Σ will have been misused, and by extension, the normal density function will have been misapplied.

54

Statistical Ecology

Figure 4.2.3.2.2 b. Multivariate normality implies an ellipse or ellipsoidal shape of the joint scatter. This is so with variables that respond linearly to their environment. But such a response is highly unrealistic in a full-dimensional niche in Nature. Much more realistic is to expect non-linear responses that are non-monotonic. If the optimal performances are shifted along the niche axes, the variables will have a curved joint scatter, horseshoe-shaped (Figure 4.1.2.3.1) or more complex in which case the global normal assumption has no justification.

4.3 Descriptors of distribution shape Shape is usually described in comparative terms. Based on Fig. 4.2.4.2 we can compare the observed graphs to the normal. It is seen that the observed distribution is somewhat higher than normal and asymmetrical with mode (point of maximum observed frequency) on the right side of the mean (distribution skewed left). This case is described by the term "leptokurtic". Two shape functions are commonly used to identify the type of skewness and kurtosis:  1 

3

 

2 3/2

4

and  2 

 

2 2

In some distributions, but not

always, a zero  1 indicates perfect symmetry about the mean, a zero  2 normal height,  1 less than zero skewness to the left,  1 larger than zero skewness to the right,  2 less than zero flatter than normal (platykurtic curve), and  2 larger than zero higher than normal (leptokurtic curve). Example 4.3.1 The following are based on data in Example 4.2.3:  = 30.1 m2; =-58.2 m3; =2508.3 m4;

1

= -0.35;

normal are indicated. Regarding

 2 = -0.23 . m,

Skewness to the left and a height lower than

consistency with the definitions in Section 4.1.1 is es-

4 2 2 sential, noting that a   ( ) and  3   2 .

To make shape measurements more general, consider a forest edge line. It will not be a simple smooth curve, but a complex ragged one, and as such, its complexity may have importance to the ecologists when trying to 55

László Orlóci

interpret the biological edge effects. Analogue problems to measuring a graphs shape complexity are discussed in the literature under Mandelbrot's (1967) fractal dimension (also called Hausdorff dimension). Example 4.3.2 describes the technique for graphs.

Figure 4.3.2.1 Randomly generated series of 1000 numbers at 0.1 steps. To calculate a D, do the following: Example 4.3.2 What is a fractal dimension? It is an index that can take on values from 1 to 2 depending on the graphs complexity. The lower value corresponds to a completely orderly, smooth graph. The upper value indicates a completely jugged, disorderly shape such as the Brownian trajectory of a molecule in which the turns and twists are generated by a perfectly random rule. The fractal of a graph (D) is related to the power law, linking graph length L(r) with scale unit r, in the manner of L(r) ~ r1-D. The exponent 1-D is negative if L(r) increases as r decreases, which is the case if the shape is not smooth. By contrast, for smooth shape, D tends to 1 as r is tending to zero. Taking logarithms, log L(r)=(1-D) log r, and performing linear regression analysis, D is approximated by the one complement of the regression coefficient b, in the manner of D=1-b. The student is referred to the works of Mandelbrot (1967,1977) and Schroeder (1991) for details on theory, and to Palmer (1998), Kenkel and Walker (1993), Scheuring (1991), and Orlóci et al. (2002) for details about ecological applications. The following example illustrates the arithmetic performed on the a random trajectory line in Figure 4.3.2.2. 1) Step through the density graph at different calliper settings and record the r and L(r) values. Calliper setting r and length L(r) are in graph units. The total length of the graph is 32439 in graph units. In the example the initial calliper width is 1 incremented in steps of 1 upto 100. For this a computer program was used. 2) Perform regression analysis of "log L(r)" "log r" to obtain an equation such as in Figure 4.3.2.2.

Figure 4.3.2.2 Regression analysis of the line length measurements.

56

Statistical Ecology 3) Calculate the fractal dimension in the manner of D=1-b ≈2.0. This value is concordant with the very complex shape of the graph.

4.4 Common transformations Variables often do not have the properties that may facilitate better their statistical analysis. On such property is symmetric distribution assumed. Others include the functional independence of  and  , linearity of response, and so forth. Appropriate data transformations can force the data set into closer compliance with the assumptions. But this may be at a price: factor to response or response to response relationships are recast, the one-to-one correspondence of original and transformed variables may be lost, or biologically meaningful units of measure may be replaced by meaningless quantities. Simple cases are illustrated by example: 2

Example 4.4.1 A plant taxonomist wishing to compare leaf shapes among species, measures 'leaf length' and 'leaf width'. But these as such are measurements of 'leaf size' and not 'shape'. When she replaces the linear measurements with their ratio

leaf length

a new

leaf width

variable is created more likely to be indicative of shape than are length and width individually. Example 4.4.2 Given n measurements X1, X2, ..., Xn in the same unit, standardization Y = X/S will make X unit free if S were defined as the total of X, highest of the n value of X, the range, or the standard deviation. Example 4.4.3 New variables are sometimes the outcome of logarithmic, square root, angular (usually inverse sine), probit, or other kinds of transformation: a. Logarithmic transformation. This is usually to base 2.71828. It compresses the values of X unevenly, which is an advantage when extremely large values occur among small values in the data set. Additivity improves provided the responses are multiplicative. Exponential X (such as popuation growth under certain conditions) is linearize. b. Inverse sine transformation. This is used for the trigonometric rescaling of proportions in the 0 to  / 2 range. c. Probit transformations. Proportions are translated into probability points according to some chosen probability law, like the Normal. d. Square root transformation. This applied in conjunction with arcsine transformation can make the mean and variance of a variable independent. In each case natural structures and relationships are likely to be changed in the data and new structures implanted as artefacts. Example 4.4.4 Data coding is a transformation which makes a hard-to-manage variable more manageable. Given N measurements X1, X2, ..., XN with mean  and second moment 2

 , the following forms of simple coding are common: 57

László Orlóci a. Adding a constant k to each data element. The coded data X1 + k, X2 + k,..., XN + k will 2

have mean  + k and second moment  . b. Multiplying each data element by a factor L. The coded data LX1, LX2, ..., LXN will have 2

mean L  and the second moment L2  . c. Adding a constant k to each observation and multiplying by a common factor L. The coded 2

data L(X1+k), L(X2+k), ..., L(XN+k) will have mean L(  +k) and second moment L2  . d. Decoding proceeds in the reverse order of coding so that the operation done last in coding is done first in decoding For example:

Mean: if  * = L(  + k) then  

*

L

k

Second moment: if  *2 = L2  then  2  2

 *2  k L2

* * Product moment: if  12 = L1L2  12 then  12  12 LL

1 2

Example 4.4.5 Consider X = (X1 X2 Xp) a set of correlated normal variables and Y=(Y1 Y2 ... Yt) which are the principle axes of the ellipsoid whose equation is (X - μ)' Σ-1 (X - μ) . Clearly the data structure of X is retained by Y. But t will likely be much less than p.

WHEN WE FULLY DESCRIBE populations we in fact are describing structures. But the structures we will have described depend on the descriptor function. This is so vividly true when we juxtapose moments and entropy or product moment and information. So any population has potentially as many distinct structures to be described as the number of descriptor functions we care to apply. This requires us to exercise reasoned choices. A salient point in this is the fact that while products and product moments assume order in the measurements, entropy and information does not. This is why the arithmetic mean, variance, covariance, and correlation coefficient are only valid descriptors with ordered variables. Entropy, information, the coherence coefficient, and Rajski's metric are higher level descriptors applied to frequency. Theoretical distributions represent special cases. When legitimately assumed, a theoretical distribution can be most useful to associate observations with probabilities. Different variables have different probability distributions. The Poisson distribution may be legitimate in the case of count data when the presence of the counted objects within the sampling units, such as airborne pollen grains on the exposed object plate, is a completely random event. Counts of success (or failure) in fixed numbers of trials, such as the number of seeds found germinating out of a fixed number placed in a Petri dish, will have the Bernoulli 58

Statistical Ecology

distribution, provided that success and failure are random events. The distribution of continuous variables is often assumed to be Normal. Not unexpectedly, biological variables will often not fit the theoretical mould and require handling as unique cases. To determine how closely a theoretical distribution fits the observed data, theoretical probabilities or probability densities are compared with the observed proportions. Techniques were described for this. The case of the normal distribution was detailed and departures from it were discussed. Data transformations were considered which bring a variable’s probability distribution closer to what is assumed by the intended analysis.

59

László Orlóci

Chapter 5 SAMPLING Taking units (a sample) from the population means “sampling”. Taking a sample is in lieu of complete enumeration prevented by large population size. The sample being the basis of estimation and inference of the population descriptors, the sample must obey strict rules of selection. This section presents both the rules and the specifics of implementation in typical cases.

5.1 General When the tenets of statistical sampling are followed, units will be selected at random. This allows every unit of the population to be chosen with an equal probability. Random sampling is the impersonal, mechanistic way of going about the task of sampling, allowing blind fate to hand to the ecologist the experimental materials. But random selection requires a complete sampling frame, such as the telephone directory in public opinion surveys or numbered identification labels attached to pots in a greenhouse experiment with plants, which cannot be constructed if the accessibility of the population units is hindered. This leads to making unit selection a matter of personal preference rather than chance. The taxonomist's choice of "type specimens" or the ecologist's "homogeneous" vegetation units are typical examples. Irrespective of the technique of selection, the sample will be the sole source for information that a statistical analysis can reveal about the population. 60

Statistical Ecology

A sample of n units chosen from the population of N units will be one of many possible samples each of which could have been taken should the sampling rules dictated it that way. In random sampling the sample chosen will be one of

N! n!(N n)!

different but equally probable samples. There is in

this sense an emergent property V, the sampling variance, associated with the sample actually taken. The magnitude of V determines a sample's accuracy. V can be estimated. The statistician's way of estimation is by falling back upon some umbrella theory of mathematical statistics concerning the sampling behaviour of averages. Most conveniently to the theoretician this behaviour is assumed to be in the manner of the Normal distribution. A more pragmatic approach foregoes the assumption of what may or may not be the case, and derives an estimate for V in repeated sampling. The sampling variance may be defined for any sample property simple or complex. The sample mean is as simple as a single number (scalar). But entire structures could be involved as complex as the description of the trophic linkage of plant and animal species within a community. Be it simple or complex, the property that varies the least is the most accurate, but not necessarily the most desired sample descriptor. The sample size at which the sample structure reaches stability is the optimal sample size.

5.2 Sampling frame The implementation of statistical sampling requires a sampling frame. The sampling frame identifies the units in sufficient detail that they can be located. Example 5.2.1 One hundred plants are grown in a greenhouse. An investigator is interested to studying biomass, but he does not want to sacrifice more than 10 plants. He thus faces the problem of finding a representative sample of 10 plants. For this he tags each plant with a number and then selects 10 plants according to random numbers. The sampling frame is the set of tag numbers. Example 5.2.2 The plant taxonomist interested in the study of variation in fruit colour of hawthorn creates the sampling frame by ground mapping of the hawthorn specimens. Example 5.2.3 In a biological investigations of cold resistance in Metrosideros polymorpha, random sampling were rejected in favour of preferential sampling. The specimens selected were the once accessible and considered suitable for plant physiological study. Example 5.2.4 A study of the behaviour of a predator at a watering hole is planned. Not all animals could possibly be observed, so the problem of selection arises. The method involved selecting of one lion at random from among the first r arrivals, and then the selection of every rth thereafter. The sampling is randomly laid systematic. Example 5.2.5 A structural study of plant communities is intended on the basis of species interactions and changes in the interaction pattern relating to environmental stress. After 61

László Orlóci the site boundaries were mapped, a rectangular grid is laid over the site. This grid guides random sampling. The arbitrariness of the definition of a population unit as a grid unit is obvious.

5.3 Simple random sampling To implement simple random sampling, a complete sampling frame is required. All C possible samples of n units are given an equal chance to be chosen. Example 5.3.1 Random sampling requires access to all units in the population. The units are marked and selection is by random numbers. The following algorithm generates quasi-random numbers U1 . How does it work? The user specifies the range, for example a = 1 to b = 12, and an arbitrary seed value fo between 0 and 1, for example 0.5284163. The following calculations are undertaken in sequence: 997fo = 526.831052 , f1 = 0.831052 (the fractional part of 997fo), U1 = INT[f1(b-a+1)] = 10, 997f1 = 828.558844,, f2 = 0.558844 (the fractional part of 997f1), U2 = INT[f2(b-a+1)] = 7, 997f2 = 554.216744, f2= …

This is carried until the desired number of random numbers were generated. Note that INT is a function that finds the integer portion of a number by truncation. The period length is 500000 before the sequence starts repeating itself. Example 5.3.2 Should an investigator wished to take a random sample of 4 units from a population of 12 units (sampling intensity f= 4/12). The sampling frame is a simple list as shown in Table 5.3.2.1. Random numbers are obtained from a table or by the algorithm described. Based on the latter, the units chosen are 10, 7, 6, 11, corresponding to seed 0.5284163. Table 5.3.2.1 Unit label

1

2

3

4

5

6

7

8

9

10

11

12

Unit

A

B

C

D

E

F

G

H

I

J

K

L

5.4 Stratified random sampling Stratified random sampling is a natural extension of simple random sampling when natural strata exist in the population. The basic idea is quite simple: – identify the strata; – define total sample size, apportion sample between strata; – implement simple random sampling within each stratum.

62

Statistical Ecology

Assume that the hth stratum contains Nh units from which nh are to be selected. Sample size allocation is in proportion of either the stratum size or stratum variance:

5.4.1 Allocation by stratum size Given k strata and stratum size Nh for the hth stratum, the proportion of sample size and variance stratum size is kept constant such as n n n where n = n + ... + n . Accordingly, the sample size is  ...   1 k 1

k

N

N

1

nh 

k

nNh k

 Nh

N

. For this to be working, the stratum sizes have to be known.

h 1

5.4.2 Allocation by stratum variance The proportion of sample size nh and stratum variance Vh weighted by stratum size Nh is kept constant such as n1  ...  nk . Accordingly, the samN1V 1

ple size is nh 

nNhVh k

N V h 1

N kV k

. For this to work, the stratum sizes and variances

h h

must be given. Example 5.4.2.1 Consider 24 units A,B,C, ..., X arranged in two strata traversing three other strata. The sampling frame is a stratified list as given in Table 5.4.2.1.1. Table 5.4.2.1.1 A D

B

C

G

H

E

F

L

M

N1=6

I

J

K

Q

R

N

O

P

U

V

N2=10

S

T

W

X

N3=8

With sampling intensity set to 0.5, the sample size n is 12. If equal variability is assumed within each stratum, sampling by proportional allocation is justified. The sample sizes are:

n1

12

6 24

3, n1

12

10 24

5

and

n1

12

8 24

4 . The mechanics of sampling

within the strata is the same as in simple random

5.5 Randomly sited systematic sampling This technique requires a pivotal unit chosen at random. Starting with the pivot, other units are selected at regular intervals. The pivot is selected by one of two methods: 63

László Orlóci

Method I. Select pivot from the entire population at random; this requires a complete sampling frame. Method II. Select pivot from among the first r population units at random; the sampling frame is partial. In either of these the sampling interval r depends on the sampling intensity f. Further, r=1/f and sample size n=N/r. Take the nearest integer for n if r is a fraction. Example 5.5.1 Consider 10 units arranged in one line, for which the sampling frame is a simple map: A B C D E F G H I J. The sampling intensity is 0.4. This corresponds to sampling interval (r) of 3 and sample size (n) 10/3. So take 3 or 4 units. In Method I the pivot would be selected by drawing a unit at random from the set of 10. There are 10 possibilities for a pivot but only four distinct samples of sizes, depending on the pivot (Table 5.5.1.1). Only when N is an exact multiple of n will the sample sizes be the same. In Method II the pivot will be one of the first set of four individuals (A B C D). The number of possible samples is 4. Methods I and II give us in identical samples, but these samples have unequal probabilities of being drawn (Table 5.5.1.2 and Table 5.5.1.3). Note: where N/n is an integer, Methods I and II are equivalent. Table 5.5.1.1 Pivot

Units in sample

A B C D E F G H I J

A,E,I B,F,J C.G D,H A,E,I B,F,J C,G D,H A,E,I B,F,J

Table 5.5.1.3 Sample A,E,I B,F,J C,G D,H

Frequency

Frequency

(Method I)

(Method II)

3 3 2 2

1 1 1 1

5.6 Multistage sampling Sampling is often done in two or more stages. This is typical when the units are aggregates (supra-individuals) of small units. Insect nests are an example. Assuming that there are N large units with M small units within each large unit, two-stage sampling begins by selecting n of the large units, and m of the small units within each of the n large units. The total sample size is m x n. 64

Statistical Ecology Example 5.6.1 Population productivity is studied in the understory of a forest. The investigator randomly locates n large plots and measures the biomass of the understory species within m small plots of each larger plot.

5.7 Preferential sampling Much of what is known about biological processes, organisms, populations, and communities is the result of work which have used preferentially selected units. The term “preferential” implies that the choice is not left to chance, but to judgement of the units being (i) typical or (b) the only ones accessible. From a statistical point of view, preferential sampling is subjective and it may not supply reliable information about the range and average variability of the population. Example 5.7.1. The classification of the plant and animal kingdoms are based chiefly on the morphological and genetic study of type specimens. These are collected without the aid of a formal statistical sampling method.

5.8 Sample optimality The previous sections were mainly concerned with unit selection. Suppose that one wished to recognize plant groups of uniform inheritance. How many type specimens should be collected? This question has to be asked whenever the objective is estimation, targeting population values, intrinsic structures, or structural connections. In either of these the sampling goes on, taking new units one after the other, but when to stop? A simple rule applies: the sample must be judged optimal. The decision is based on the level of the sample’s success in returning a stable estimate for the target population value or population structure.

5.8.1 Quadrat-based estimation It is interesting to observe the assumption of an idealized medium implicit in conventional texts on statistics when they discuss sampling theory. These texts always postulate population units that are unambiguous for identification and measurable with no exception. Complete accessibility is taken for granted, and the sampling environment is narrowed to a state in which random sampling is possible. With these assumed, sample size becomes the sole determinant of sampling accuracy, and sample size determination reduces to weighing a variance related requirement such as SV 

VX n



VX N

against the cost of sampling. In this equation SV is the sam-

pling variance, and VX the variance of X, N is population size and n sample 65

László Orlóci

size. The SV equation is a statement of the obvious: as n increases, SV decreases. The equation of SV as given is of course not much use for the practitioner when N and VX are not given. Furthermore, when area units are used, SV is sensitive to quadrat shape and size. Sensitivity to quadrat shape and size creates an anomalous situation to which the solution is process sampling. In this, sample size, quadrat shape and quadrat size are regarded as variables. It is likely that long, rectangular quadrats reduce VX significantly, but this will not be an advantage if it is done at the cost of blurring structures in the sample.

5.8.2 Quadrat-based structure detection Two characteristics that set structure detection apart from parameter estimation is the potential complexity of structural descriptions in a fuzzy sampling environment. This is typical when the medium is an aggregate, such as the natural vegetation that exists spontaneously in a continuum and has no well defined natural units. What is an optimal sample then? A practical definition of sample optimality is based on the stability of the sample estimates. This is measurable in process sampling. The proposition in process sampling is reminiscent of Poore's (1962) successive approximation approach (PSA). It has the hallmark of a flexible approach, which allows the sampling decisions to be influenced by the results of concurrent data analysis. Indeed, in PSA the sampling becomes a process in which the step-by-step expansion of sample size is intricately tied to the evolution of the sample description. At each step the stability of the description is judged. When stability is detected, that is when the sample description ceases to change significantly with additional sampling effort, the sampling is stopped. Example 5.8.2.1 The stability of a sample’s descriptions is equivalent to the stability of the sample structure being described. The data set underlying Figure 5.8.2.1.1 contains cover/abundance estimates for 54 plant species within randomly selected quadrats of 5 m side length each in a transect from sub-boreal vegetation. Process sampling is carried up to 42 square quadrats in steps of three quadrats between analyses. The 5 environmental variables recorded on each quadrat include elevation, exposure, slope, soil depth, and soil tex2 1/2 ture. Figure 5.8.2.1.1 portrays changes in a stress function  VE  (1   (D;Δ)) . This is written for the similarity expressed by  (D; Δ) comparing vegetation structure D to the en-

vironmental structure with given sample size n. D and  represent n x n distance matrices with a characteristic element s

e

h1

h1

d2jk   (Y hj Y hk )2 and  2jk   (X hj  X hk )2 66

Statistical Ecology Symbols s and e indicate the number of species and the number of environmental variables in the sample; h is subscript for species or environmental variable, and j,k is a label for a quadrat pair. The values of

 VE

and  (D; Δ) range from 0 to 1. Although random sampling

continued until 42 quadrats, the sample at 18 quadrats would be sufficient to recover more or less the same structural information. It is important to know: structural stability (the levelling off of the

 VE

graph) is not the same as structural sharpness  (D; Δ) .

Fig. 5.8.2.1.1

THE METHOD OF SAMPLING has consequences in the results of data analysis. The different sampling designs serve different purposes. In simple random sampling, the random selection of units gives every member of the population an equal chance to be selected. But random sampling cannot be implemented without a complete sampling frame. This means an unhindered access to every unit in the population. Stratified random sampling involves pre-sampling stratification. After that problems of random sampling still exits, because the strata have to be sampled at random. Randomly sited systematic sampling is less problematic. Different criteria exist regarding the determination of the optimal sample size and the level of stability of the sample’s properties. These can be variance based or concerned with the sample structure. We paid special attention to preferential sampling which targets the “typical units” in the population. Preferential sampling happens to be the method of the Great Masters instrumental in the discovery of the most fundamental truths about governance in the physical environment as well as in the biota.

67

László Orlóci

Chapter 6 SAMPLE TO POPULATION We cannot know the exact value of the population parameters. We can only estimate what they may be based on the sample. Statistics has different mathematical functions (estimators) to do the estimation. These range in complexity from the simple, such as the arithmetic mean, to the complex, such as the information based structural functions. Whatever is the estimator, it must satisfy specific criteria, or be biased.

6.1 Estimation Estimation, a specific term for inference, requires a suitable estimator (see Chapter 4). The actual numerical value obtained is an estimate. We use symbol  for both estimator and estimate, and  for the population value when there is no need to be specific about exact functional form. Ideally, the estimator should be consistent and unbiased, and it should have minimum sampling variance.

6.1.1 Consistent estimator  is said to be a consistent estimator if on average its value comes closer and closer to  as the sample size n comes closer and closer to the population size N. Figure 6.1.1 illustrates the general principle in a fictitious



case. The value of  2    



2

is traced corresponding to increasing sam-

ple size (Section 5.9). 68

Statistical Ecology

6.1.2 Unbiased estimator  is said to be unbiased, if its expectation is  symbolically: C



i

= C Read this: "the mean of means of all possible samples is equal to the population value  ". C is the number of possible samples of size n that can be taken from a population of size N (see Section 5.1). E( ) =

i 1

Figure 6.1.2.1

6.1.3 Minimum sampling variance The estimator  whose sampling variance C

V 

(   )

2

i 1

i

C is minimal is said to be an efficient estimator.

6.2 Estimation of entropy E. C. Pielou recommends the Brillouin function (Section 4.1.4) as smallsample estimator and not Shannon's. Pielou’s reasoning is based on the property that Shannon's entropy is a mathematical approximation to Brillouin's. This distinction is unnecessary for the purposes in hands. In the following Shannon’s entropy will be taken as a special case of Rényi’s generalised entropy. Symbol H will be used for Shannon’s entropy of the sample and symbol for Shannon entropy of the population. Accordingly, we put the expectation E(H) =  with no change in the general form of the function for H or  . We base the estimation upon counts of objects of there are s different kinds, such as individual plants of s species within k area units or pollen 69

László Orlóci

grains of s different taxa on k object plates. The data are arranged in s rows and k columns : Species\Units

1

2



k

Total

1

f11

f12

...

f1k

f1.

2

f21

f22

...

f2k

f2.

.

.

.

...

.

.

s

fs1

fs2

...

fsk

fs.

Total

f.1

f.2

...

f.k

f..

Given such a data set, one may be tempted to use the mean entropy over the k repetitions as an estimate of : H

F.1H1  F.2H2  ...  F.k Hk f..

But this is a biased estimator since it misses a term for mutuality that links the k samples into a single super sample. The following does include a term for mutuality: H

f ij 1 k s f ij f .. 1 k s   f ij ln    f ij ln f .. j1i1 f . j f .. j1i1 f i. f . j s

which is the same as H    f i.ln i 1

f i. . E. C. Pielou suggests to use partial f ..

sums for the estimation: * *  Qu   1 u s f ln f ui. H  ui. f u f .. f .. j 1 i1 .u

such that u=1, 2, …, k. The subscript u indicates that the sums fui and f.u are defined over the first u samples in the order as they are taken. At each * graph value of u a new entropy value is computed. The shape of the H u is the basis of an important step in E. C. Pielou’s method. The weighted * average of the H u values in the flat portion of the curve is used to estimate  . Table 5.5.1.2 Sample

Unit

A

A,E,I

B

B,F,J

C

C,G

D

D,H

70

Statistical Ecology Example 6.2.1 Plants were counted of 3 species within 6 quadrats:

Species /Quadrat:

1

2

3

4

5

6

1

100

93

43

87

97

97

2

1

26

42

65

100

86

27

50

11

17

21

19

128

169

96

169

218

202

3 Totals

Entropy is estimated by E. C. Pielou’s method: Step 1. Determine entropy in the 1st column of the data table, *   1 (100ln 100  ln 1  27ln 27 )  0.5321 H 1 128 128 128 128

Step 2. Combine the columns 1 and 2 to obtain the cumulative counts, f11 + f12 = 193, f21 + f22 = 27, f31 + f32 = 77, 128 + 169 = 297 Compute *   1 (193ln 193  27ln 27  27ln 77 )  0.8297 H 2 297 297 297 297

Step 3. Keep combining columns in like steps and compute additional entropy quantities, * * * * H3  0.9317 , H 4  0.9623 , H5  0.9804 , H6  0.9786 *

*

2

Step 4. Plot the  2u  (Hu 1  Hu ) values (Figure 6.2.1.1).

Figure 6.2.1.1 * *  Step 5. Calculate Hu  T u1Hu1 T uHu for u = IP, ..., k-1. IP is the deemed inflection T u1  T u

point. The T quantities are totals: Tu = f.1 + ...+ f.u and Tu+1= f.1 + ... + f.u+1. IP set to 3, the entropi values are H4 = 1.0335, H5 = 1.0273, H6 =0.9713 Step 7. Calculate the average

71

László Orlóci H

1 k 1 3.0321  1.0107  H  k  IP u IP u 3

The theoretical maximum is ln s = 1.10 . An estimate of the sampling variance is the mean square S2H 

k 1 2  (Hu  H)  0.0040 (k  IP)(k  IP  1) uIP

higher order entropy follows similar steps. The next set of results contains estimates up to order 3: .Estimates of

a. Alpha = 0: Inflection point selected: 1; Mean H = 1.0986123; Variance of estimate H = 0; Sampling variance =0; Maximum H: 1.0986123; Evenness: 1 b. Alpha = 1: Inflection point selected: 3; Mean H = 1.0123603; Variance of estimate H = 1.2098077e-3; Sampling variance = 4.0326925e-4; Maximum H: 1.0986123 Evenness: .92149004 c. Alpha = 2: Inflection point selected: 3; Mean H = .96016685; Variance of estimate H = 2.0436837e-3; Sampling variance = 6.8122792e-4; Maximum H: 1.0986123; Evenness: .87398154 d. Alpha = 3: Inflection point selected: 3; Mean H = .9237635; Variance of estimate H = 2.817831e-3; Sampling variance = 9.3927701e-4; Maximum H: 1.0986123; Evenness: .84084577 .

6.3 Estimation of information The case to be discussed involves k successive samples. The hth of the k samples has fh.. units. Each unit (could be a plant specimen) is classified according to criteria A and B. The outcome is a kxqxt frequency tables. The hth qxt lattice looks like this: Sample h

Criteria

B1

B2

...

Bt

A1

fh11

fh12

...

fh1t

fh1.

A2

fh21

fh22

...

fh2t

fh2.

.

.

...

.

fhq1

fhq2

...

fhqt

fhq.

fh.1

fh.2

...

fh.t

fh..

. Aq Total

Total

.

Criterion A has q states and criterion B has t states. An element fhij is the frequency of observation h being in category i of classification A and also being a member in category j, classification B. The table total is fh.. . How strong is the association between the two classifications? In the manner of 72

Statistical Ecology

the method in Example 6.2.1, an average value of the interaction infor of Section 4.1.3.1 can be written for the k samples in the manmation IhAB ner of I' 

f hij f h.. f 1..I1  f 2..I2  ...  f k..Ik 1 k q t     f ln f... h1 i 1 j 1 hij f hi. f h. j f ...

But this is biased since it does not incorporate a term for the k classification tables being linked structurally by the one-to-one correspondence of their elements. The definition of Rényi's information function will depend on the type of mutuality (dependence, association, interaction) perceived.

6.3.1 Estimation of mutual information Considering information of order one, the quantity on which to base the estimation is k q t k q t f f I  I ' 1    f hij ln .ij ...  1    f hij ln f ... h1 i 1 j 1 f .i. f .. j f ... h1 i 1 j 1 f hi. f h..

f hij f ... f .ij f h. j f .i. f .. j

In other terms: f hij k r t q f f f and phij  hi. .ij h. j I  1 ln    hij1 where qhij  f h.. f .i. f .. j  1... h1 i 1 j 1 phij f ...

.

6.3.2 Estimation of interaction information *The information quantity in this case is f .i. f ... 1 I  I '    f hij ln f ... h1 i 1 j 1 f hi. f h.. k

q

t

f .. j f f2 f ... 1 k q t  f hij ln hij ...    f h. j f ... h1 i 1 j 1 f h.. f .i. f .. j f h..

In other terms: I 

f hij f f f k r t q 1 and phij  h.. .i. .. j . ln    hij1 where qhij  f ... f ... f ... f ...  1 h1 i 1 j 1 phij

We use a similar method for the estimation of as E. C. Pielou’s for entropy (Example 6.2.1). Example 6.3.2.1 Three tree species were examined for infestation by four species of yeast in six samples. Each sample consisted of 32 randomly chosen trees. The following is the first sample: 73

László Orlóci Tree species Yeast species

1112221123322331 1122233311221121 3234443341134213 3343411233433342

The numerals are identification labels for species. The following is the first frequency table: Sample 1 1 2 3

Trees Total

Yeats species 2 2 0 2 4

1 0 0 5 5

3 11 3 0 14

4 0 9 0 9

Total 13 12 7 32

The proportions are

The hypothesis that the yeast species are tree specific can be tested. If they were not tree specific we would expect a random assortment of the

f.. observations among the cells in

the table, in which case the different combinations (h,j) of tree and yeast would have exf f o pected frequencies according to: f hi  h . .i . The matrix of expected frequencies is f..

and the proportions,

The two sets of species have shared information 0.0625

0.3438

0.0625

Itrees;yeast = 0.0625 ln 0.0508 +0.3438 ln 0.1777 +...+0.0625 ln 0.0273 = 0.752 Since q< t, the maximum possible value is ln 3. The ratio R = 0.752/1.099 = 0.684 indi cates a relatively high shared information inferred on the basis of a single sample. Whether this should be interpreted as an indication of high tree specificity of the yeast species will depend on the probability of obtaining a value of R at least as large as 0.684 under random sampling, when the value in the population is zero. We return to the questions in Example 7.4.1. A more accurate estimate of I is computed on the basis of cumulated frequencies in the manner of the previous section. The last 5 samples are given by the following tables: Sample 2 1

1 1

2 3 74

3 8

4 1

Total 13

Statistical Ecology Tree sp.

2 3

1 6 8

1 3 7

1 1 10

6 0 7

9 10 32

1 2 3

1 0 1 4 5

2 3 0 0 3

3 10 2 1 13

4 1 10 0 11

Total 14 13 5 32

1 2 3

1 2 1 6 9

2 1 1 2 4

3 12 1 1 14

4 0 5 0 5

Total 15 8 9 32

1 2 3

1 0 0 6 6

2 1 0 0 1

3 12 1 2 15

4 2 7 1 10

Total 15 8 9 32

1 2 3

1 0 2 8 10

2 0 1 4 5

3 9 1 0 10

4 1 6 0 7

Total 10 10 12 32

Total Sample 3 Tree sp. Total Sample 4 Tree sp. Total Sample 5 Tree sp. Total Sample 6 Tree sp. Total Sample estimates for I:

a. Estimates for mutual information, corresponding to alpha values from 1 to 9: 0.0959 0.1665 0.2638 0.3983 0.5675 0.7474 0.9103 1.0452 1.1533

Possible maximum value: 1.098612 Relative values: 0.0873 0.1516 0.2402 0.3625 0.5166 0.6803 0.8286 0.9514 1.0498 Variances: 0.0005276 0.0010691 0.0030536 0.0089206 0.0220458 0.0417763 0.0622543

0.0789618 0.0910622

Sampling variances: 0.0001319 0.0002673 0.0007634 0.0022302 0.0055114 0.0104441 0.0155636 0.0197405 0.0227655

b. Estimates for interaction information, corresponding to alpha values from 1 to 9:

0.6195

0.8027 0.9062 0.9777 1.0339 1.0802 1.1191 1.1519 1.1799

Possible maximum value:

1.386294

Relative values: 0.4468 0.5790 0.6537 0.7053 0.7458 0.7792 0.8072 0.8309 0.8511 Variance: 0.0020583 0.0009393 0.0005595 0.0006980 0.0011882 0.0020020 0.0031438 Sampling variance:

0.0045937 0.0062987

0.0005146 0.0002348 0.0001399 0.0001745 0.0002970 0.0005005 0.0007860 0.0011484

0.0015747

75

László Orlóci

6.4 Moments and moment based quantities Commonly used moment estimators include: Mean: N 1 n 1 C E(X )   X j   Xk   X  Xj n j 1 C k 1 j 1 Variance:

S2 

n

1 n 1

(X j  X )2

E (S 2 ) 

j 1

N

1 N 1

(Xi  )2  j 1

1 C

N 2

C

 Sk2  k 1

N 1

V

Sampling variance:

SX2  In each case C 

S2

n (1  ) n N N! n! N !

E (SX2 ) 

1

C

S C k 1

2 Xk

V n  (1  )  VX n N

. We make a distinction between the sample variance 2

2

2

S and the sampling variance S X . S is an estimate of the variance of varia2

ble X while S X is an estimate of the variance of C sample means of variable X at the given sample size n. Example 6.4.1 A simple method can determine the bias in the estimator. It involves examination of averages of sample estimates obtained from all distinct samples drawn from an artificial population. Recalling that C =

N!

and that the condition for an unbiased esti-

n! N !

mator is that the average of the C sample estimates is equal to the population value. The method is now applied to a population of 4 units: X1=2, X2=5, X3=3, X4=4. The population constants are:



14 = 4 = 3.5,

V=

(-1.5)2 + 1.52 + (-0.5)2 + 0.52 = 5/ 3 = 1.67, 3

In addition, for a sample of size 2, VX = the number of distinct samples is C =

6!

2

 =

1.67

2 (1  ) = 0.417. With sample size set to 2, 2 4

 6 . The samples and sample estimates are:

2!4!

Sample k

Xk

3 1.67 = 1.25.. 4

Sk2

76

SX2k

n 1 2 Sk n

Statistical Ecology X1, X2 X1, X3 X1, X4 X2, X3 X2, X4 X3, X4 Total

3.5 2.5 3.0 4.0 4.5 3.5 21.0

Mean

3.5

4.5 0.5 2.0 2.0 0.5 0.5 10.0 1.67

1.125 0.125 0.500 0.500 0.125 0.125 2.500

2.25 0.25 1.00 1.00 0.25 0.25

0.417

0.83

5.00

2 Xk

2 k

These clearly establish the fact that X k , S and S are indeed unbiased estimators of the population mean, variance and the sampling variance in simple random sampling. However, the sample second moment is a biased estimator of the population second moment (  2 ). This holds true regardless of population size or sample size.

6.5 Product moments and related quantities The following are the commonly used sample estimators and population values: Covariance: N

n

S12 

(X1 j  X1 )(X2 j  X2 ) j 1

n 1

E (S12 ) 

(X

1j

 1 )( X2 j  2 )

j 1

N 1

=

N 12 N 1

 V12

Product moment: n

(n  1)S12 n

 ( X1 j  X1 )( X2 j  X2 )



j 1

n N

 (n  1)S12   E  n  

 ( X1 j  1 )( X2 j  2 )

j 1

 σ 12

N

Correlation:

r12 

S12 S11 S22

E (r12 ) 

 12  11 22

Example 6.5.1 Consider the bivariate population X1=( 2 4 1 3); X2=( 1 5 2 4). The population values are 1  2.5 , 1  1.25 ,  2  3.0 ,  2  2.5 , 12  1.5 , V12 =

2.0, 12 = 0.849. Using samples of 3 units, the number of distinct samples is C  The following are the samples and sample estimates: 77

4! 3!2!

4

László Orlóci

Bivariate samples

S12k

(n  1)S12k

1 2

X1 = (2 4 1), X2 = (1 5 2) X1 = (2 4 3), X2 = (1 5 4)

2.667 2.000

1.778 1.333

0.839 0.961

3 4

X1 = (2 1 3), X2 = (1 2 4) X1 = (4 1 3), X2 = (5 2 4) Total Average

1.000 2.333 8.000 2.000

0.667 1.556 5.334 1.333

0.655 1.000 3.455 0.864

k

r12k

n

It is concluded that the sample covariance S12 provides an unbiased estimate of the population covariance V12 in simple random sampling. The sample second moment and the correlation coefficient are both biased. It has been suggested that the quantity 1/2

 1  122  12  1   should be used as an expectation under random sampling in bivariate 2n   normal populations, i.e., when X1 and X2 are normal individually and also jointly (Chapter 7). In the present example this has value 0.809, which compared to E(r12) = 0.864, make things even worse. It is true, and can be shown by numerical example, that the bias is reduced as r12 approaches the extremes -1, +1, or 0.

6.6 Estimation in stratified random sampling What has been said about estimators in the preceding section will also apply to estimation within the strata in stratified random sampling. The main difference is the use of pooled strata estimates. To obtain the pooled estimates, weighted averaging is applied. The weights are the sizes of the strata or the degrees of freedom specific to the strata. The weighted averages estimate pooled population values, and not the global ones used in simple random sampling. When sampling units are assigned to strata by proportional allocation, the weighted estimates are: Mean:

X

Variance: S 2 

1

1 n

k

 nj X j

E(X ) 

j 1

k

n S n j

j 1

2 j

E (S 2 ) 

1

1 N

k

N  j

j

j 1

k

N V N

j j

j 1

Symbol n is the pooled sample size (n = n1, ..., nk), N the pooled population size (N = N1 + ... + Nk), and k the number of strata (compartments) among which the population is divided. The sampling variance, covariance, and 78

Statistical Ecology

correlation are defined similarly by weighted averaging of the strata values.

6.7 Estimation in systematic sampling Only the sample mean X is unbiased, and only in Method I (Chapter 5). Example 6.3.1 The following population is given: X1 = 2, X2 = 1, X3 = 8, X4 = 6, X5 = 7, X6 = 4, X7 = 3, X8 = 5, X9 = 0, X10 = 9. For this

 = 4.5, V = 9.2.

At a sampling interval of 4,

the total number of distinct samples is 4. Note that the size of the samples depends on the method and the pivot. The following are the samples, the number of ways (frequencies) in which the samples can come about, and sample estimates: k

Sample k

1 2

S k2

X1, X5, X9

Method I fk 3

Method II fk 1

3.0

13.0

X2, X6, X10

3

1

4.7

16.3

3

X3, X7

2

1

5.5

12.5

4

X4, X8

2

1

5.5

0.5

Total

10

4

Xk

Method I -3x3  3x4.7  4x5.5 E(X)   4.5   10 3x13  3x16.3  2x12.5  2x0.5 E(X)   11.4  V 10 Method II -E(X)  4.7  

E(X)  10.6  V

THE ESTIMATOR AND THE METHOD OF ESTIMATION must be in tandem with the sampling technique to meet the requirements of consistency, unbiasedness, and efficiency. We have seen that certain estimators by their very nature falter on one or more of these criteria and will invite the penalties of false estimation and misinterpretation without corrections. May we tolerate estimators which fail on being unbised? Yes we may, if we can determine how far they are from meeting the criteria. Interestingly we can determine bias by performing a Monte Carlo experiment on data sets obtained by sampling and re-sampling the original sample itself. The sampling design actually used to obtain the original sample has to be retained.

79

László Orlóci

Chapter 7 COMMONNESS AND PROBABILITY This chapter is about the central idea in Statistics expressed in the question “how common or rare is an event”. If we find the event not to be common under the circumstances that we assumed for the sampling environment we will have a good reason to suspect that we deal with something irregular that may need to be investigated further to reveal its true nature. Where exactly do specific states of a given variable rank individually or jointly on the commonness scale in the population? To find this out we need to have access to the variable’s probability distribution. This can be derived. Two methods are available: axiomatic derivation from first principles, and empirical derivation in Monte Carlo experiments.

7.1 Which kind of distribution? The axiomatic method seeks to derive distributions which have maximum generality. This is achieved by assumption of a limiting case under specified regularity conditions. The more specific the assumption, the lower is the local relevance of the derived distribution. A most often-used assumption stipulates linear response to factor influences and linear interactions. The p-dimensional case of this is captured in the density (co-ordinate) function 13 of the multivariate normal distribution :

13 Mathematical statisticians point to the central limit theorem to spruce up their use of

the Normal distribution so liberally in assessments of complex biological problems. The theorem simply stated asserts that averaging will improve normality when X is not exactly normal. However, it is difficult to see how averaging could improve Normality in the presence 80

Statistical Ecology

f ( X1 ,..., X p )  Be

( X-μ )' 

1

( X-μ )

B is the inverse of the total volume under the p-dimensional hyper ellipsoid, X = (X1 ... Xp) a p-valued observational vector,  = (1 ... p) the population mean vector, and 

1

the inverse of the covariance matrix.

7.2 Normal probabilities It was mentioned that the definition of a scale factor in the manner of 1 will render the total area under the univariate normal graph B  2 V  equal to unity. Any fraction of the total area will be a direct expression of probability. For example, the expression 1/2

  P( X  X 

  f ( X )d ( X ) X  X

defines the probability of a random state of the normal variable X being at least as extreme as X (Figure 7.2.1). 'Extreme' meant that (X -  )  ( X -  ). The point X is the



probability point of the distribution. In the case of

unknown population values  , V and  , the same are replaced by their sample estimates. Note that when the variance replaces the second moment in the normal density function, neither the shape nor the area of the normal curve will change and X remains unaffected. 2

Example 7.2.1 The normal density function is fitted to data taken from Example 4.2.3.1. When the objective is to find probabilities, the normal probability integral has to be solved. To determine probability points the solutions of the inverse normal probability integral is required. Standard algorithms generate tables for the standard normal variate

Z

X V

1 /2

or Z 

X

VX 1/2

.

of the inherently complex non-linearity of responses, the antithesis of Normality, that we see everywhere in the living world (Orlóci 1993). The most obvious sign of the Normal assumption not being on all fours with non-linear biological responses is the incorporation of the  matrix in the Normal density function. This means a very inefficient description of the biological covariation starting right off the bat. 81

László Orlóci Written for Z, the normal density function takes the form of f (Z ) 

1

 2 

1/2

e



Z2 2

.

(Figure 7.2.1.1). When the probability sought is

  P(ZRND  Z ) , the area under the unit normal curve with limits Z  ZRND   is to be found:

 = ƒ(Z)(b1t + b2t2 + b3t3 + b4t4 + b5t5). This can be used for Z>0. The expected computational error in constants are: t=1/(1+rZ)

r=0.2316419

b1=0.31938153

b3=1.781477937

b4=-1.821255978

b5=1.330174429

Figure 7.2.1



is less than 7.5x10-8. The

b2=-0.356563782

Figure 7.2.1.1

The probability for X = 6.3576,  = 4.28, and V = 1.1236 is found under the right tail  . 6.3576  4.28 Calculate Z   1.96 first. Corresponding to this value, the polynomial ap1/2  1.1236 proximation based on the constants is  = 0.025. This is the probability of a random value being at least as large as X = 6.3576. When is given, the probability point can be calculated based on Z = t - (C0+ C1t + C2t2) / (d0 + d1t + d2t2 + d3t3). This is a valid approximation for the probability point Z at any  greater than zero but



less than 0.5. The expected maximum error in Z is less than 4.5x10-8. The constants have



the following values:

 

t  ln d0=1.00

2 1/2

C0=2.515517

C1=0.802853

C2=0.010328

d1=1.432788

d2=0.189269

d3=0.189269

For a given = 0.025 the corresponding left and right probability points are: Z = -1.96 and

 Z =+1.96 . Note, for a negative deviation from zero the area is marked off under the left  tail identical to the area marked of by the positive deviation under the right tail. Standard statistical tables can be used to find probabilities or probability points, but they may require interpolation. The tables show either one-sided or two-sided probabilities. In a one-sided 82

Statistical Ecology case, Z is given for . In two-sided tables Z is tabulated for  / 2 . For example:   Table type Probability point entered Probability shown in table

One-sided 1.96 0.025

Two- sided 1.96 0.05

Example 7.2.2 An identification problem is considered next. There are two host populations HA and HB. They are distinguished by specific morphological differences, but are similar in other respects. The identification tag of a sample specimen X was lost and has to be reassigned. Which of HA and HB is the most likely parent for X? The population values are given below: Petiole length Mean



Variance V

HA

HB

5.20 cm

3.85 cm

1.37 cm2

0.80 cm2

The petiole length of X is 4.5 cm. This corresponds to ZA = -0.60 and ZB = 0.73. Observing that |ZA| < |ZB|, assignment of specimen X to population HA is indicated. This decision may best be regarded as being provisional, until the relative commonness of X in the two populations is considered (Figure 7.2.2.1):

 A* 

pA A pA A  pBB

and B* 

pBB pA A  pBB

NA , pB  NB , N=NA+NB. The decision? - when N N X is assigned to population HA,  A* is an expression of the “weakness” of the assignment or of the level of “misclassification”.

This is such that  A*   B*

 1 . In this,

pA 

Figure 7.2.2.1 The available data do not show values for  A* and  B* . This being the case we fall back on  A =0.27 and  B =0.23, the solutions of the normal probability (area) integral based on the Z values already found. Taking NA=50 and NB=20 or the proportions pA=0.71 and pB=

0.29, the probabilities sought are  A*  0.74 and  B*  0.26 . An assignment of X to population HA is indicated. This leaves a probability of a mistaken assignment of 26%.

7.3 Sampling distributions A normally distributed variable X and another variable Y = f(X) are involved. If the population of X were sampled and re-sampled, and if a value Y is 83

László Orlóci

determined for each sample, there will be

C

N! sample values Y1, Y2, n!N!

...,

Yc for Y at the end of the experiment. The Y values will have their own sampling distribution depending on the functional form of f(X). The following functions are considered: Mean: X  X1  ...  Xn Sum of squares:  2  Z12  ...  Zn2 n

Student's t: t 

X  2

SX

Variance ratio: F 

S12 S22

7.3.1 Distribution of the mean When X is normal, the distribution of the mean X will also be normal un

( X   )2 2VX

der random sampling with density function f ( X )  Be . The scale fac1 tor is B  . Probabilities and probability points can be obtained 1/2  2 VX  as explained in Example 7.2.2.

7.3.2 Chi-squared distribution If X is normal,  2 will have a Chi squared distribution under random sampling. This distribution has a single constant  which is the degrees of freedom. It follows that a different Chi squared distribution is defined for each sample size n. The mean of the Chi squared distribution is  and its variance is 2 . Representative graphs at different degrees of freedom are given in Figure 7.3.2.1. The density function that generates the graphs is symbolically f(  2 | ). The distribution is asymmetrical, but symmetry improves as n increases. Statistical tables give one-sided probabilities,

Figure 7.3.2.1

Figure 7.3.2.2 84

Statistical Ecology

7.3.3 t distribution If the distribution of X is normal and sampling is random, t will have the Student t sampling distribution with constant . It follows that a different t-distribution is defined for each sample size. The distribution is symmetrical around zero for any value of  and its variance is  / (  2) . Two representative curves are shown in Figure 7.3.3.1. Note that the height of the curves (the value of the density function f(t| ) ) increases as the degrees of freedom increase, until a normal height and width are attained. Statistical tables give one- or two sided probabilities. Example 7.3.3.1 Solution of the t probability integral is based on a series approximation given in Abramowitz and Stegum (1970). For example, given t = 2.069 and = 23, the probability of a more extreme t is 0.025 (Figure 7.3.3.2). Given = 23 and  = 0.025, the probability point is t0.025;23= 2.069 .



Figure 7.3.3.1



Figure 7.3.3.2

7.3.4 F distribution 2

2

If X is normal, the ratio of the variances, F  S1 / S2 or the ratio of Chi 1 /  1 2

squared F 

2 /  2 2

from two independent random samples of sizes n1 and

n2 will have the F distribution. The graph of this is asymmetrical with two

constants  1 = n1-1 and  2 = n2 - 1. Representative curves are shown in Figure 7.3.4.1.

85

László Orlóci Figure 7.3.4.1

The tabulated probabilities are one-sided. When S1 > S2 then  is a right2

sided probability point F ;

1

, 2

2

(Figure 7.3.4.2a). The probability point shifts

to the left tail otherwise it is a left side probability point F ; 2

1

, 2

(Figure

2

7.3.4.2b), when S2 > S1 . Since tables give right-sided probability points, it is important to remember the subscripts conversion: F1- ; , = 1 2

1 F ; , 2 1

Figure 7.3.4.2 a,b

Example 7.3.4.1 The solution of the F probability integral is described in Abramowitz and Stegum (1970). The inverse F probability integral is solved by iteration. Given  1 = 12,  2 = 8, and F=3.28, the probability found is 0.05 (Figure 7.3.4.1.1).

Figure 7.3.4.1.1

7.4 Empirical sampling distributions We can think of the normal distribution as one of possibly many distributions each with utility in biological work. One may even suggest that there is a good chance to encounter sampling distributions in actual work, other than and not related to the normal. The distributions generated empirically 86

Statistical Ecology

by random simulation from local data should interest the practitioner for their potentially high local relevance. The method for this is called the “Monte Carlo experiment”. In this artificial populations are constructed which conform to specified local constraints, and then sampled-and-resampled many times over. The result is an empirical distribution for the sample descriptors of interest. Example 7.4.1 The objective is to determine the bias of the estimator Htree;yeast used in Example 6.3.2.1. In symbolic terms the bias is tree;yeast , the mean in a surrogate popu-

lations in which the frequency of any joint state A,B has an equal chance to materialise as an observation. The steps: Step 1. Select 32 pairs of random numbers with the first member coming from the integers 1,2,3 and second member from 1,2,3,4. The first value in a pair identifies one of the tree species and the second identifies one of the yeast species. The first set of 32 pairs, Tree species

1213122312113222 2311311132231131

Yeast species 1332413421133441 2221344123142141 The first contingency table, Yeast species Total Tree sp. Total

1 2 3

1 6 4 0 10

2 3 1 3 7

3 2 3 2 7

4 1 2 3 8

Total 14 10 8 32

The value of mutual entropy for this table is Htrees;yeast = 0.135 . Step 2. Repeat step 1 many times, using a new set of random numbers each time. Calculate a new Htrees;yeast for each sample. Use the average of all generated Htrees;yeast values as the population mean. Finally, construct a reference distribution for H by ordering the Htrees;yeast values and determining probability points. Step 3. Compare the observed H=0.752 to the probability points. The lower is the associated probability, the more significant the association of trees and yeasts. Use R=H/maxH as a measure of the association’s strength. H has to be deemed significant before R can be interpreted.

 tree;yeast, the mean of the simulated Htrees;yeast values. An actual experiment using 0.5284163 as the random seed, and the Step 4. The bias in Htree;yeast =0.752 is

option to repeat step 1 exactly 100 times, gave  tree;yeast =0.117 ( the mean of the 100 simulated Htrees;yeast values). The corrected estimate is therefore Htrees;yeast =0.752 0.117= 0.635. The experiment yielded ordered probability points for the sampling distribution of Htrees;yeast by counting down from the largest to the smallest:  H at 



0.036 0.950

0.107 0.500

0.216 0.050

0.423 0.010

The empirical distribution suggests that Htrees;yeast = 0.752 is extremely high, so high in fact that the actual sample originating from a population with zero mutual entropy must be rejected at better than 0.01 probability. The yeast species are in fact highly tree-specific. 87

László Orlóci

7.5 Setting confidence limits The sampling distributions allow us to develop methods to determine confidence limits for sample estimates. These limits help us to assess the success of a sample estimate in coming close to the population value in probability term.

7.5.1 Point vs. interval estimation Estimation as discussed in Chapter 6 is known as point estimation. In this, the population value , a point, is estimated by the sample value  , also a point (Figure 7.5.1.1). However, point estimation supplies no information about the probabilistic range of  . Such a range is supplied in interval estimation. In this,  is a point, but limits are set within which  is expected to lie with 1 -  probability (Figure 7.5.1.2). The limits can be determined experimentally or derived from basic principles.

Figure 7.5.1.1

Figure 7.5.1.2

7.5.1.1 1-  confidence interval for  If an underlying normal distribution is assumed, and if sampling is random, the determining relation for the confidence limits is t

X  2

.

SX

The 1 -  probability interval about the point t 

X  2

SX

has limits t /2;

and  t /2; (Figure 7.5.1.1.1). The interval for the unknown  has upper and lower limits X  t /2; S X and X  t 2

 /2 ;

SX2 . When the population value

VX is given, Z may be used instead of t to define the limits. 88

Statistical Ecology

Figure 7.5.1.1.1 Example 7.5.1.1.1 Length of 24 male bowfin whales of uniform age were measured. The mean and variance of these measures were found to be X = 31.1 m and S2 = 4.48 m2. The upper and lower limits of the 0.95 (or 95%) confidence interval for the unknown population mean are given by LL = 31.1-2.069(0.43) = 30.2 m and UL = 31.1+2.069(0.43) = 32.0 cor1/2

responding to

 S2  SX    n

= 0.43 m

and t0.025;23 = 2.069. The true value of the mean

(the average length for male bowfins of this age in the sampled population) has a 95% chance of being captured by these limits. Random sampling and a normal distribution of X are assumed.

7.5.1.2 1-  confidence interval for V The determining relation is 2

 

S

2

V

Random sampling and normal distribution of X is assumed. The limits sought are

S

2

S

and

2

 /2;

2

2

1  /2;

.

Example 7.51.2.1 The 95% confidence limits for the bowfin population V (Example 7.5.1.1.1) are: LL =

23(4.48) 38.076

= 2.71 m2 and UL =

23(4.48) 11.689

= 8.82 m2

corresponding to

12 / 2; = 11.689 and 2 / 2; = 38.076. The quantity 2





1/2 2

 /2;  0.5 t /2;  (2  1) is a close approximation to Chi squared when

 is large, say 100 or more.

89

László Orlóci

WE DISCUSSED the idea of sampling distributions and showed them in the measurements of the commonness of events. We called an observation “common” when we found it representing a highly probable event in the sampling environment that we assumed. But what moves science forward, in the direction of generalizations, are the questions regarding asked about unusually uncommon evens. They force us to investigate in what respects the sampling environment of the unusual events differ from the one that we assumed in the first place when we generated the status quo sampling distribution. Regarding their derivation, we distinguished between axiomatic (hypothetical) and empirical distributions. We described several of the former (Normal, Student's t, Chi squared, variance ratio), and considered the use of the Monte Carlo method to derive the latter. The reader’s impression that we consider empirically derived distributions more reliable in the hands of the practitioner in ecology than any theoretically derived probability distribution in the same hand, is right on target. The improvement is in the expected higher local relevance.

90

Statistical Ecology

Chapter 8 MEASURING RESEMBLANCE The results of the previous chapter are the foundation for implementation of probabilistic comparisons. If we were to define statistics in the simplest of terms we could say it is a science which derives its conclusions from comparisons. But before we turn to the statistical aspects, we examine the idea simply as a mathematical exercise. For this, we need to define a comparison space into which our objects are mapped as points. We also need a pair function on the comparison space which will tell us how similar or how dissimilar are the compared objects. There are many types of comparison spaces and many types of pair-functions to choose . We discuss the more commonly used types.

8.1 Comparison space Objects, comparison criteria, and objectives are involved. The objects are entities as we defined them in Chapter 3. They have (column or row) vector representation such as in

 X1   X11 X  X 2 21 X=      ...   . X   X  p   p1

X1n 

X12

...

X22

...

X 2n 

.

...

. 

X p2

...

X pn 





Figure 8.1.1 presents a universal geometric model. In this the number of points is five, but the dimensionality of the point configuration is only two. 91

László Orlóci

In general, X has nominal dimensions p and actual dimensions (rank) not exceeding the lesser of n and p. Associated with X is an algebraic construct, termed comparison space (CS). This space is defined by the n points and the function f, the comparison criterion.

Figure 8.1.1

The exact definition of X and ƒ varies with the objectives. In Figure 8.1.1, f is a distance measure



d( A, B)  ( X A1  XB1 )2  ( X A2  XB 2 )2



1/2

.

Rather than measuring spatial placement, as done by d, interest may centre on the use of other functions that define the shape of the point configuration (whether linear, curved, manifold, etc.). Function ƒ need not be a distance. It can assume other forms (product moment, information, probability fractal dimension, etc.) for which the spatial analogy of Figure 8.1.1 does not apply (see following chapters for examples). Example 8.1.1 Four variables were measured on three wild blackcurrant plants: Plant

Fruit weigh gr

A

2.1

B

2.6

C

2.4

Fruit mm

length

Leaf width cm

Leaf length cm

7.4

4.3

6.2

6.9

4.4

6.4

7.8

3.8

6.9

The comparison space has four axes (variables) within which the three individuals are placed as points. The spatial placement of the points is completely defined by their pairwise distances:



2

2

2

d ( A, B)  (2.1  2.6)  (7.4  6.9)  (4.3  4.4)  (6.2  6.4)

2



1/2

= 0.742

Similarly d(A,C) = 0.995; d(B,C) = 1.208. These are conveniently presented in the form of the a half matrix,

0.000 0.742 0.995 D= 0.000 1.208    0.000   92

Statistical Ecology The full matrix is symmetrical (d(A,B) = d(B,A) for any pair A,B). A 3 x 3 centred product moment matrix Q computed according to 3

q( A, B)   ( X Ai  X i )( X Bi  X i ) i 1

defined an equivalent point configuration as D. Example 8.1.2 Suppose that ƒ takes an unusual form

d ( A, B)  cos

1



X A1 XB1  X A2 XB2 2 ( X A1

2

2

2

 X A2 )( XB1  XB2 )



This function is known as the geodesic metric and it defines a spherical comparison space. The distance measured is the length of the shorter arc on which A and B are points.

8.2 Minkowski metrics The basic form of this is a sum 1/ k

 p  i 1

 . 

d ( A, B)    | XiA  XiB |  (k )

k

(see Fihgure 8.2.1).

Figure 8.2.1

The absolute value function is defined when k=1, and the Euclidean distance is defined when k=2. The scale variable k is discrete or continuous. The lower limit of d(k ) ( A, B) is zero. To set an upper limit, the metric is normalized such as in the chord distance,

 p X X (k ) dr ( A, B)    iA  iB ( k )  i 1 d A dB(k ) 

k

1/ k

   

 p  i 1

1/ k

 

k and d (k ) ( A)    XiA 

(Orlóci 1967b). When k=2 then dr ( A, B)   2(1  cos  AB ) 

1/2

(k )

.

(k ) 1/ k When A and B have nothing in common we have dr ( A,B)  2 . The Minkowski metric has specific properties: 93

László Orlóci

(a) Indistinguishability of identical units - - d ( A, B) = 0 when A = B (b) Distinguishability of non-identical units - - d ( A, B) > 0 when A ≠ B (c) Symmetry - - d ( A, B) = d (B , A) (d) Triangle inequality - - d ( A, B) ≤ d ( A, C )  d (B , C ) Euclidean metrics have the cosine property (Figure 8.2.1: 2

2

2

d ( A, B)  d ( A)  d (B) - 2d( A)d(B)cos  AB Example 8.2.1 The absolute value function (k=1), Euclidean distance (k=2), and their normalized forms are now computed for tidal pools (Example 4.1.2.2.1). Detailed calculations are shown for pools 1 and 2: (a) Absolute value function d (1, 2) = |26-29| + |28-31| + |18-14| = 10. The entire matrix: 0

10 0

11 1 0

13 3 2 0

6 10 11 13 0

22 14 15 13 20 0

24 16 17 15 22 6 0

2

(b) Euclidean distance d (1,2) = [(26-29)2 + (28-31)2 + (18-14)2]1/2 = 5.831 . The entire matrix: 0

5.831 0

6.557 1.000 0

7.681 2.236 2.000 0

4.243 6.481 7.280 8.544 0

13.784 9.274 9.434 8.062 12.728 0

15.556 11.225 11.358 10.630 13.342 4.472 0

(c) Relative absolute value function dr (1,2) = |26/72 - 29/74| + |28/72 - 31/74| + |18/72 - 14/74| = 0.122 . The entire matrix: 0

0.122 0

0.144 0.022 0

0.153 0.042 0.031 0

0.067 0.127 0.144 0.169 0

0.159 0.037 0.015 0.022 0.159 0

(2)

0.167 0.083 0.072 0.093 0.167 0.071 0 (2)

d) Chord distance - - dr (1) = [262 + 282 + 182]1/2 = 42.24 and dr (2) = [292 + 312 + (2)

142]1/2 = 44.70 and dr (1,2) = [(26/42.24-29/44.70)2 + (28/42.24-31/44.70)2 + (18/42.2414/44.70)2]1/2 = 0.122 .

94

Statistical Ecology The entire matrix: 0

0.122 0

0.143 0.021 0

0.154 0.043 0.031 0

0.081 0.145 0.163 0.187 0

0.158 0.037 0.016 0.023 0.180 0

0.180 0.082 0.074 0.101 0.165 0.078 0

8.3 Product moment This is the inner product of two vectors n

Shi   Ahk Aik k 1

for rows h,i and p

q jk   Aij Aik i 1

for columns j,k. The elements in A are deviations Aij  ( Xij  X i ) for the sum of products; Aij 

( X ij  X ) i

n  1

1 /2

for the covariance; Aij 

X ij  X i 1/2

 n 2   ( X ik  X i )   k 1 

for the correlation. S is a matrix of p x p row products and Q for a matrix of n x n column products. Example 8.3.1 Consider the data given in Example 4.1.2.2.1. Starfish counts are the variables (p=3) and the tidal pools are the units (n=7). The covariance matrix is calculated as follows:

 31    15.3 

(a) Calculate species means -- X =  32

(b) Centre the data within rows and adjust according to the covariance option –

95

László Orlóci

 -2.04 A =  -1.63   1.11

3.27 

-0.82

-0.82

0.82

-0.41

1.63

-0.41

-0.41

-0.41

-2.04

2.45

1.63 

-0.52

-0.93

-0.93

1.52

-0.12

-0.12 



(c) Calculate the matrix product --

 19.67 S  AA '   13.83   1.50

1.50 

13.83 16.00

5.17 

5.17

5.57

 

The dual matrix14 Q is computed as the matrix product:

 8.06  1.75 Q = A'A    .  9.46

1.75

1.30

...

1.11

1.32

...

.

.

...

3.27

3.22

...

 3.27   .   13.35  9.46

8.4 Mean square contingency In a field survey a square grid of n units is laid out on the ground and inspected for occupancy by two species A and B. Only presence and absence are scored. Species A and B are found jointly in a units. Species A occurs without species B in b units, and species B occurs without A in c units. Inspection turns up d units without A nor B. The scores are conveniently summarized in a 2 x 2 table: 14

Q is the dual of S, in the sense that they define identical point configurations in CS. The duality is apparent from their characteristic equations (Gower 1966; Orlóci 1966, 1967a). Recalling the appropriate theorems of linear algebra, the characteristic equation of S is SB = Λ B. After observing that S = AA' and that Q = A'A, we pre-multiply by A' to get Q(A'B) = Λ Q. Duality is proven since the Eigenvalues Λ of S and Q are identical and their eigenvectors B are related by simple linear transformation which amounts to a rigid rotation of the coordinate axes (A'B). Considering the Minkowski k=2 metric D, duality with Q can be shown if the elements of A accord with Aij 

X ij  X i 1/2

 p 2   ( Xej  Xe )   e 1 

. In this case D is a

matrix of centred chord distances (with upper limit 2) and the elements in the principal diagonal of Q are unities. To prove this duality it is sufficient to consider the characteristic equation of -0.5D2 which is the same as -(1 - Q)V = EV and as QV = EV because 1V = 0 (owing to the fact that the columns of V have zero sums). Since QV = EV is the characteristic equation of Q, the identities V = A'B and E = Λ proof the duality. 1 is an n x n matrix of unit 96

Statistical Ecology Species B Species A Total

+ -

Total

+

-

a c a+c

b d b+d

a+b c+d n

A commonly used function for this table is the mean square contingency coefficient, 2

r 

c

2

n

2

(ad - bc)



(a  b)(c  d )(a  c)(b  d )

.

This is the square of the product moment correlation coefficient (Chapter 4) for 0/1 data. It measures the strength of association between species A and B. The limits are 0 and 1. Example 8.4.1 The root nodules of fourteen white clover plants were inspected for the presence of two species of nitrogen fixing bacteria (A,B). The bacteria scored as follows: Clover (A)

+-+++-++++-+++

Bacterium (B) - + + - + - + + + + - - + + The marginal frequencies and jointly frequencies are given in the table,

Species A Total

+ -

Species B + 8 1 9

Total 3 2 5

11 3 14

For this table, 2

r 

c

2

n



(16  3)

2

= 0.1138 .

11(3)(9)(5)

The theoretical maximum is 1. To achieve this each marginal frequency must be 7.

8.5 Indices of similarity and dissimilarity The 2x2 table of the previous section serves as a basis for construction of an entire family of similarity and dissimilarity coefficients. The following is a limited selection: Name

Form

Type*

Inner product

a

Similarity

Euclidean

b  c

Ochiai

1/2

Limits

0-n

Dissimilarity 0 –

a ( a  b)( a  c )

Similarity

97

n1/ 2

0-1

László Orlóci

  a  Dissimilarity 0- n1/ 2 2 1    ( a  b)( a  c ) 1/2   

Chord

2a 2a  b  c a abc

Sørensen Jaccard

bc abc

Marczewski-Steinhaus

a abcd ad Simple matching abcd Sokal-Michener

Similarity

0-1

Similarity

0-1

Dissimilarity 0 - 1 Similarity

0-1

Similarity 0 - 1

*Similarity coefficients attain a zero value when two entities have nothing in common. Dissimilarity coefficients have a zero value when two entities are identical. The sampling distribution of some of these measures is well known (Orlóci 1978, p. 97 et seq.). Example 8.5.1 The different distance and similarity measures of the preceding selection are applied to the data in Example 8.4.1: Non-centered inner product = 8 Jaccard coefficient =

8 8  3 1

= 0.6667

Euclidean distance =  3  1

1/ 2

Marczewski-Steinhaus index = Ochiai coefficient =

8

11(9) 1/ 2

Sokal-Michener coefficient =

=2

4 8  3 1

= 0.3333

= 0.804

8 8  3 1 2

= 0.5714

1/ 2

Chord distance =

8     2 1  1/ 2    11(9)  

Simple matching =

82 8  3 1 2

= 0.6261

= 0.7143

98

Statistical Ecology

Sørensen index =

16 16  3  1

= 0.8

Example 8.5.2 Six plots were located in a Sphagnum bog and the presence of four species was recorded: Species

Plot 1 2 3 4 5 6

Oxycoccus microcarpon

+ + - + - +

Sphagnum magellanicum

- + + - + -

Sphagnum fuscum

+ + - + - -

Smilacina trifoliata

- + + - + +

Jaccard's coefficient was computed for species pairs 1.00 0.17 1.00 0.75 0.20 1.00 0.33 0.75 0.17 1.00 and pairs of plots 1.00 0.50 1.00 0.00 0.50 1.00 1.00 0.50 0.00 1.00 0.00 0.50 1.00 0.00 1.00 0.33 0.50 0.33 0.33 0.33 1.00

Matrices of coefficient serve as input in cluster analysis, ordination, and in other analyses to be discussed in the sequel.

8.6 Goodall's probability index Similarity (or dissimilarity) measures usually treat the pairs of objects as isolated cases: perception of the similarity of a given pair is not influenced by the perceived similarity of any other pair within the same sample. Goodall's (1966) probability index is different. It takes count of all pair-wise similarities in the sample when determining the similarity of a given pair. The following is a simplified description of the algorithm: (a) Calculate the dissimilarity of a pair Xj and Xk with respect to variable h as the absolute difference

 jk |h = |Xhj - Xhk|. Do this for all pairs to obtain n(n-1)/2 dissimilarity values.

99

László Orlóci (b) Calculate relative dissimilarities according to tion of the

 j = P(  k |h



 jk |h ). This is the propor-

 values that are less than or equal to  jk |h . The limits 0 o when Ho:E( X )≤ o .

(c) Select a rejection probability   This is the significance level of the test. The  is the probability of rejecting a true Ho by chance. Commonly selected  values include 0.05, 0.01, and 0.001. (d) Obtain reliable data. A representative and sufficiently large data set is required. (e) Select the test criterion. Z, t,  2 and F (  in general) are commonly selected criteria. Their sampling distribution is assumed unique to the test, but this cannot always be verified. (f) Accept or reject Ho. Let t be the test criterion with known sampling distribution, and let the probability of a more extreme t be denoted by P = P[tRND ≥ t|Ho]. The |Ho in the expression reads “under Ho”, i.e. Ho is true (Figure 9.3.1). RND reads “random”. The decision rules for accepting Ho depends upon how H1 is stated. If H1: E(t)>0, Ho will be rejected at  probability of an erroneous rejection if the condition t ≥ t is true, or equivalently if P[tRND ≥t|Ho]≤  is true. The t is the critical probability point of the test (Chapter

7). Rejection of Ho always requires the acceptance of H1.

9.4 Simple, composite and mixed hypotheses An hypothesis which specifies a point 1 or 2 for the constants is said to

be a simple hypothesis. For example, Ho: E(  )= 1 and H1:E(  )= 2 are simple hypotheses. Composite hypotheses specify entire regions for the constants (Figure 9.4.1), e.g., Ho: E(  )≥ o and H1: E(  )< o .

Figure 9.3.1

Figure 9.4.1

106

Statistical Ecology

Simple and composite hypotheses can be mixed within the same test, such as Ho: E  ) = o and H1:E(  )< o . In this case the testing amounts to the same problem as testing Ho: E(  )≥ o . This is because to accept Ho:E(  )≥

o it is sufficient to show that Ho: E(  )= o is in fact true.

Figure 9.5.1

Figure 9.5.2

9.5 One and two-sided alternatives Whether a composite H1 is one-sided or two-sided will influence the test.

If the alternative is given as a one-sided hypothesis H1: E(  )< o . Ho is rejected if P[  RND ≤  |Ho]≤  is true, or equivalently, if   1 is true. This condition is pictured in Figure 9.5.1. If the alternative is a two-sided hypothesis H1: E(  )≠ o . Ho is rejected if  falls outside the 1-  limits as shown in Figure 9.5.2. This rule is equivalent to basing the rejection of Ho on the condition  ≤ 1-a / 2 or  ≥  /2 . In two-sided tests, the rejection probability is apportioned symmetrically or asymmetrically between the two tails of the distribution.

9.6 Parametrised and non-parametrised H o Hypotheses that specify numerical constants are parametrised. For example, Ho:E(  )= o is a parametrised hypothesis if o is a numerical constant. Non-parametrised hypotheses contain no numerical constants.

9.7 Errors in probabilistic decisions The distribution of  has one density function f(  |Ho) when Ho is true, and another ƒ(  |H1) when H1 is true (Figure 9.7.1). o is a probability point corresponding to a significance level  . o is such that if Ho is true, the probability of a randomly chosen value of  exceeding o is exactly 107

László Orlóci

.

The probability that  will be more extreme than o , if H1 is true, is  . A one sided H1 is assumed.

Figure 9.7.1

When a decision is made, it may or may not be correct. A Type I error is committed if a true Ho is rejected; the probability of this error is  . A Type II error is committed if a false Ho is accepted; the probability of this error is  . To be able to perform the test, ƒ(  |Ho) must be known. Consequently, the probability of a Type I error will be defined. In contrast, ƒ(  |H1) is rarely known and without it  cannot be determined. We note that the two error types are related, but are not complementary  +  ≠ 1, un





 1 . Another point is that if we   reduce the probability of the Type I error by specifying a smaller  , the probability  of a Type II error will increase. Making  smaller will undermine the power of the test, since it makes the acceptance of a false null hypothesis more likely. This is potentially more detrimental than the rejection of a true null hypothesis. To see this, it is sufficient to consider the null hypothesis of 'no side effects' in the administration of a drug to patients with specific remedial objectives.

less they were standardised

9.8 Bartlett’s paradox We have to assume in all cases that the null hypothesis is true before we could proceed with the tests. We are required to do this by the necessity of justifying the use of reference probability distributions. But making such an assumption prior to the test is paradoxical: To justify the test on Ho we have to assume that it is true. But if it is true, why to perform the test? 15

15

M. S. Bartlett mentioned this to the ecologist/statistician audience during G. P. Patil’s Statistical Ecology Conference of 1969 at Yale University. 108

Statistical Ecology

Clearly, when we reject a null hypothesis, the conditions on which the test is predicated will have been deemed non-existent. Was the test misused? To reject the conditions of the test as an outcome of the test itself, is a logical thing to do. Simply a flaw in thinking is corrected. There is no solution to this, other than to base the test on another kind of logic. An alternative test would make users part with the paradoxical procedure. They would rather follow the age-old practice that uses the consistency of results from a large number of samples from the population, formalised in terms of some probabilistic measure of consistency, for discovering true patterns in Nature. WE PRESENTED THE BASICS of hypothesis testing and in the course of this we discovered that it is the null hypothesis which we try to accept or reject, but the alternative hypothesis will decide the actual form of the test. Another thing to be noted for it has revealed a paradox in the testing: in order to be able to test a null hypothesis, we set up the defining conditions as if the hypothesis were true. Will this mean then that in the case of rejection, the test was misused and the test was invalid?

109

Chapter 10 PROBABILISTIC COMPARISONS I We return to probabilistic comparison, but at this time in a very applied way. We wish to find out in specific cases if a given sample could be regarded as a member of the population whose parameters are given in numerical terms. The cases covered include means, variances, mean vectors, covariance matrices, and frequency distributions.

10.1 Sample mean and a standard Take a sample of n observations, which has mean X and standard error S X . Symbol o stands for the expectation of X . The Ho: E( X )= o can be tested based on the criterion t =( X - o )/ SX This t has the Student's tdistribution with  = n - 1 degrees of freedom under the assumption that the sample was randomly chosen from a normally distributed population whose mean is o . When the alternative hypothesis is one-sided, H1: E( X )> o , Ho will be rejected with probability  of the Type I error if the observed t value falls outside the specified acceptance region (Figure 10.1.1). If the alternative hypothesis is two-sided, H1: E( X )≠ o , the rejection probability will be partitioned between the two tails of the distribution (Figure 10.1.2).

110

Statistical Ecology Example 10.1.1 Analysis of run-off water from the base of maple trees gave the following pH values: Tre

1

2

3

4

pH

6.6

7.4

7.7

7.0

The following are the sample values: X =7.18, S

2

=0.23, 0.058, S X2 = 0.058. To test the null

hypothesis Ho: E( X )= o = 7.00 in the presence of the two-sided alternative, H1: = E( X ) ≠

o = 7.00 use t 

X  o 7.18 - 7.00 = 0.75= 0.24 = 0.75. The size of the t value alone is not SX

sufficient to decide whether to accept or reject Ho. To make this decision we must select a rejection probability  , say 0.05, find the probability point t0.025;3=3.18, and then refer to Figure 10.1.2. It is noted that H1 is two-sided, therefore

,

is apportioned between the

tails. Since tVo, the test criterion is  2 = and the rejection region is as shown in Vo Figure 10.2.1. The criterion  2 has the Chi-squared sampling distribution with  = n-1 degrees of freedom, provided that the sampling is random in a normal population. Should the alternative hypothesis be H1: E(S2) < Vo, the rejection region would shift to the left tail of the distribution. Example 10.2.1 Considering the data of Example 10.1.1, the hypothesis Ho: E( X ) = o = 7.00 is already accepted. The hypothesis that the sample is from a population with coeffi-

111

László Orlóci

 

100 V o cient of variation CV =  o

 28(7)    100 

ulation variance Vo = 

1/2

2

 1.96

= 20% is tested next. This value of CV indicates a pop3(0.23) . The associated Chi squared is  2 = 1.96 = 0.352 .

The null hypothesis Ho: E(S2) = Vo = 1.96 should be accepted, since, corresponding to the two-sided alternative hypothesis H1: E(S2) ≠ Vo = 1.96, the interval with end points = 0.216 and

2 0.975;3

2 = 9.348 includes 0.352.  0.025;3

Figure 10.2.1

10.3 Sample distribution and a standard Two distributions are given, an observed F = [f1 ... fs] and a standard Fo = o

o

[ fi ... f s ]

. The elements have one-to-one correspondence and the distribu-

tion totals are equal. Since each fi has an associated be compared based on divergence measures.

fi

o

value, F and Fo can

10.3.1 Chi-squared divergence This is defined by

( fi  fi o )2 . fi o i 1 s

2  

The null hypothesis to be tested is Ho: E(F) - Fo =0. Read E(F) - Fo =0 as "... the one-way divergence of the expectation of F from Fo is a null vector" . The alternative hypothesis H1: E(F) - Fo  0 specifies a greater than zero divergence. If we can regard F as a random sample from a population completely described by Fo,  2 will have the Chi squared distribution with  s

degrees of freedom, provided that the sample total n   fi is large. The i 1

112

Statistical Ecology

degrees of freedom depend on how Fo was derived. The decision rule follows Figure 10.2.1. Example 10.3.1.1 Table 1.7.1 contains observed frequencies F and Poisson frequencies Fo (relative frequency times 62). The total divergence of the two frequency distributions is 2  =

(1  2.6)

2

+ ... +

(34  24.3)

2.6

The degrees of freedom is

2

= 7.03.

24.3

2

 = s - 2 = 3. If we take  0.05;3 0.05;3 = 7.815 as the

critical prob-

ability point, the acceptance of Ho is indicated. Example 10.3.1.2 Gregor Mendel (1822-1884) experimented with cross fertilized pea plants which differed in pairs of contrasting characters. All plants emerging (F1 generation) from seeds produced by tall parents after cross fertilization with pollen from dwarf parents were tall. Plants in the second hybrid generation (F2) segregated as 787 tall to 277 dwarf plants. In relative terms, these are 2.8412 tall to 1 dwarf. Observing these results, it makes sense to hypothesize that the observed frequencies F = (787 277) represent a random sample from a biological population with a simple dominant-recessive gene Lotus. This leads to a test of the 3:1 ratio for which Fo = (798 266) and

2=

(787  798) 798

2

+

(277  266)

2

266

= 0.6065,

which has a single degree of freedom. The critical probability point  o . o 5;1 for  = 0.05 is 2

3.84. By observing

2  <  0.05;1 , Ho is accepted (Figure 10.2.1). The gene locus appears to be 2

of the simple dominant-recessive type.

10.3.2 I-divergence information This has functional form as in Section 4.1.3.1, s f 2I = 2  fi ln io . fi i 1

This is an approximation to  2 . Note that since the theoretical Chi squared distribution is an approximation of the actual sampling distribution of  2 , and 2I is an approximation to  2 (Kullback 1959), when we refer 2I to the theoretical Chi squared distribution with  degrees of freedom we will commit a potentially weak double approximation. This is particularly so s

when  fi is small. The user should apply 2I to the data in Examples i=1

113

László Orlóci

10.3.1.1 and 10.3.1.2 and interpret the results.

10.3.3 Kolmogorov-Smirnov divergence The basic criterion on which the test is constructed is s

D

SUP | S( X r )  F o ( X r ) | r 1

n

.

This is the nth fraction of the greatest difference. Consider a variable whose s distinct states have n > s occurrences. The states are arranged in ascending order: X1 < X2 < ... < Xs and they have frequencies f1, f2, ..., fs such that s

n   fi . A cumulative frequency S(Xr) is derived for each state r such that i 1

r

S ( X r )   fi . The divergence of S = [S(Xr), r = 1, ..., s] and So = [Fo(Xr), r = 1, i 1

..., s] may now be defined the greatest difference D.

Figure 10.3.3.1

To test the hypothesis that an observed cumulative distribution S represents a random sample from a population with cumulative distribution So, that is, Ho: E(D) = 0, the large sample distribution of D will be needed under Ho. The probability points D ;n are available from standard tables under specific regularity conditions (Massey 1951). The decision rule follows from Figure 10.3.3.1. Example 10.3.3.1 Heights of 60 white pine trees were measured on a woodlot. The normal distribution was fitted to the data. The ordered tree heights, frequencies, cumulative frequencies, and deviations are given: X

f

fo

22 m 23 24 25

1 2 4 5

0.8 1.7 3.2 5.1

S(X) 1 3 7 12

114

So(X) 0.8 2.5 5.7 10.8

|S(X)- So(X)| 0.2 0.5 1.3 1.2

Statistical Ecology 26 27 28 29 30 31 32 33

9 4 9 8 6 6 4 2

7.1 8.6 9.1 8.4 6.7 4.7 2.9 1.5

21 25 34 42 48 54 58 60

17.9 26.5 35.6 44.0 50.7 55.4 58.3 59.8

3.1 1.5 1.6 2.0 2.7 1.4 0.3 0.2

The sample constants of are n=60, X = 27.917 and S2 = 7.366 , D = 3.1/60 = 0.052. At D0.05;60 = 0.172. Ho: E(F) = FO is accepted.

10.4 Sample mean vector and a standard The data set X contains measurements for p variables of n sampling units. Symbolically X , the sample mean vector X , and covariance matrix S are written as

 X11 X 21 X=   . X  p1

X12 X22 . Xp2

...

X1n 

... X2n   ... .   ... Xpn 

 X1    X2 X   .     X p 

 S11 S 21 S  . S  p1

S1p 

S12

...

S22

...

S2p 

.

...

.

Sp2

   ... Spp 

The elements in the principal diagonal of S are variances. In the off-diagonal cells they are covariance values such that Shi  Sih . The population mean vector is

 1o   o  μo   2  .  .   o   p  The elements of μ o have one-to-one correspondence with the elements and μo is apparent from Figure 10.4.1. The tip of X marks the centroid (centre of gravity) for the sample represented as a cluster of points in space. The tip of μo marks the cenof

X.

The geometric significance of

X

troid of the population. Ho: E( X ) - μo = 0 is the hypothesis tested. The alternative hypothesis H1 negates Ho. For testing purposes the separation of 115

László Orlóci

the centroids is measured in terms of the generalized distance of Mahalanobis (1936): d(X,μ o )  (X  μ o )' S1 (X  μ o  This distance is measured 1/2

in standard units. The matrix S-1 is the inverse of the covariance matrix. Under the assumption that the p variables are normally distributed, and that X is a random sample from a population whose mean vector is μo , the test criterion

F

n(n - p) p(n - 1)

d(X ,μ o )

have the variance ratio (F) distribution with numerator degrees of freedom  1 = p and denominator degrees of freedom  2 = n - p. The decision rule uses the rejection region shown in Figure 10.4.2.

Figure 10.4.1

Figure 10.4.2

Example 10.4.1 Chemical analysis of soil samples yielded the following data (in kg/ha units): Sampling unit

1

2

3

4

5

6

7

8

9

10

Total

Nitrate N

7.5

10.9

10.8

Ammoniacal N

20.1

20.4

17.9

12.2

5.1

6.2

3.0

6.6

12.9

4.0

79.2

18.0

15.4

12.1

21.0

16.1

13.8

18.0

172.8

Elementary K

42.0

43.2

40.4

84.3

69.5

99.8

110.4

101.4

87.7

115.2

793.9

The sample values: p = 3, n = 10,

X=

 12.5218  7.92  17.28  , S   1.07844     52.3609  79.39

and 116

1.07844

 52.3609

8.53511

 26.6524

26.6524

  840.4343 

Statistical Ecology

S

1

0.0429954 0.00888425   0.120714   0.0429954 0.145355 0.00728830  .    0.00888425 0.00728830 0.00197450 

 8.3  Given the standard μ o   25.5  , the generalized distance is d(X, μ o ) = 2.98560 . The test    67.9  criterion is

F with  1 = 3,

10(10  3) 3(10  1)

2

(2.9856)  23.11

 2 = 7. Given F0.05;3,7 = 4.35, rejection of Ho is indicated: X should not be

considered as a sample from the population whose mean vector is the given μ o .

Following the rejection of Ho, we may wish to determine which of the p variables contributed significantly to the rejection. For this purpose, we perform comparisons between the means, using the contrasts  Xi  μ oi  , i = 1, ..., p, and corresponding 1-  simultaneous confidence limits, 1/2

1 p  2  ai  Xi  μ oi     ai Sii T ;p,n p  i 1  n i 1  p

In this

T2;p,n p 

p(n  1)F ;p ,n p n p

and the a’ = (a1 ... ap ) specify the variables to be included. For example, for p=3, vector a' = (1 0 0) selects  X1  μ o1  and excludes both  X2  μ o2 

and  X3  μ o3  . The vector a' = (1 1 0) selects  X1  μ o1  and  X2  μ o2  , ignoring  X3  μ o3  . If the limits include zero, the total contrast is declared non-significant, and we conclude that the variables selected do not contribute significantly to the rejection of Ho. Example 10.4.2 Continuing with Example 10.4.1, the coefficients in a' = (1 0 0) specify a contrast

 X1  μ o1  = (7.92-8.3) = -0.38. This is tested based on 117

László Orlóci 2

F0.05;3,7 = 4.35; T.05;3,7 =

1

3(9)(4.35) = 16.78; 7

n

p

= 0.1;

 ai Sii = 12.5;

i 1

1/2

1 p  2   ai Sii T0.05;3,7   n i 1 

= 4.58.

Considering that the limits LL = -0.38-4.58 = -4.96 and UL = -0.38 + 4.58 = 4.20

  rejection of Ho. To assess the effect of  X2  μ o2 

do not enclose zero, we conclude that X1  μ o1 does not contribute significantly to the

and  X3  μ o3  on the decision, the following are relevant results: a' = (0 1 0), LL = -12.00 and UL = -4.44 and a' = (0 0





1), LL = -26.06 and UL = 49.04 . The conclusion is that only X2  μ o2 contributes significantly to the rejection of Ho.

10.5 Sample covariance matrix and standard The null hypothesis is Ho: E(S)-Vo = 0, that is to say: the pattern of covariation in the sample is not statistically distinguishable from the given population covariance matrix. H1 negates Ho and suggests a significant divergence. Bartlett's method of testing relies on the determinants |S| and |Vo|:

 | Vo |  trSV 1  p  o  | S | 

  (n  1) ln 2

1

where S and Vo are p x p covariance matrices, and trSVo is the trace (the 1

sum of the diagonal elements) of matrix SVo .  2 will have the theoretical Chi squared distribution with p(p+1)/2 degrees of freedom, provided that the sample is randomly selected from a multivariate normal population whose covariance matrix is Vo. The test is one-sided. If sample size is small, the test criterion is L 2 with 2p  1  L 1

2 p 1

6(n  1)

.

Example 10.5.1 Consider the 3 x 3 sample covariance matrix S of Example 10.4.1 AND A

118

Statistical Ecology

 10.3014 Vo   1.36403   60.3333

1.36403 9.17830 22.4160

60.3333 

22.4160 

 830.445 

.

. We wish to test the null hypothesis that S has expectation Vo. The determinants of S and

Vo and the inverse of Vo are required: |S| = 53538.6 , |Vo| = 34697.2

 0.205192 -1 Vo   0.0716249   0.0168409

0.0716249 0.141644 0.00902703

 0.00902703  .  0.00267136  0.0168409

The matrix product

1.6104

0.2714

  0.0613

0.8911

1 SVo   0.2714

0.0124

 0.0124   1.1226  0.0613

has trace 1

tr SVo = 1.6104 + 0.8911 + 1.1226 = 3.6241. The test criterion is 2  = 9ln 53538.4 + 3.6241 - 3 = 1.713.

34697.2

Since the sample size is small (n=10), we use 61 L 1

2 4 = 0.88 and L  2 = 1.507. Given p(p+1)/2= 6

6(9)

degrees of freedom, rejection probability  = 0.05, critical probability point

2

0.05;6 =

12.592, Ho is accepted.

HYPOTHESIS TESTING took centre stage in this chapter too, but the comparisons involved sample estimates and population standards (numerically given parameters). The “standard” was suggested by theory or practical experience.. In each of the tests the null hypothesis stated that the sample descriptors compared are estimating the same population descriptor. We assumed random sampling and sampling distributions of specified types.

119

László Orlóci

Chapter 11 PROBABILISTIC COMPARISONS II We continue with hypothesis testing, but at this time we do not specify any population parameters. The ideas introduced in Chapters 9 and 10 are extended to sample-to-sample comparisons. All entities in comparison were assumed to have common expectation.

11.1 Sample variances We wish to test the null hypothesis

HO : E (S12 )  E (S22 ) based on the ratio of two sample variances

F

S12 S22

.

F has the variance ratio distribution with  1  n1  1 and  2  n2  1 degrees of freedom under specific regularity conditions. These conditions include random sampling, the assumption that Ho is true, and the assumption that the underlying distribution of X is normal. The decision whether to accept or reject Ho depends on whether the specified Ho is one-sided or two-sided. If S1  S2 , the relevant probability point F ;n ,n is under the right 2

2

1

2

tail of the distribution. To perform the test when S  S2 , the probability point is under the left tail. I this case, we use 2 1

120

2

Statistical Ecology

F1 ;n1 ,n2 

1 F ;n2 ,n1

for the probability point. Example 11.1.1 The percent of incoming light reflected by 10 leaves of Acer saccharum and 8 leaves of Acer pseudoplatanus was recorded: A. accharum

61 49 52 58 40 42 68 40 51 54

A. platanus

81 74 68 83 80 78 70 69

The sample variances are S  85.833 and S 22  34.839 , giving 2 1

F

S12 S22

 2.46 .

Should we be interested to know if the variability of light reflection is the same in both 2 2 species populations, the null hypothesis then becomes HO : E (S1 )  E (S2 ) , and the alterna-

tive hypothesis

F0.975;9,7 

1 4.20

H1 : E (S12 )  E (S22 )

and probability points

F0.025;9,7  4.82

and

 0.24 (corresponding to = 0.05 at  1  9 and  2  7 ), we observe that

F = 2.46 lies within the 0.95 probability region of acceptance.

11.2 Sample means The null hypothesis is Ho : E ( X1 )  E ( X2 ) . We use t 

X1  X 2 Sd

to test Ho. In

this Sd  S 2 (1 / n1  1 / n2 ) , the standard error of the difference. S 2 is a weighted average S 2 

(n1  1)S1  (n2  1)S2 n1  n2  2

with n1 and n2 representing

sample sizes. We note that pooling of the variances is justified if and only if the null hypothesis Ho : E (S1 )  E (S2 ) has already been tested and accepted 2

2

(Section 11.1). IF X1 and X 2 are the means of random samples from normal populations, and if H o is true, t will have the Student's t distribution with

  n1  n2  2 degrees of freedom. The decision rule depends on whether the alternative hypothesis is one or two-sided. Example 11.2.1 Considering the data of Example 11.1.1, Ho : E (S1 )  E (S2 ) has already 2

2

been accepted. The pooling of variances is justified, and the test on Ho : E(X1 )  E(X2 ) can proceed:

121

László Orlóci X1 = 515/10 = 51.5

X 2 = 603/8= 75.375

S2 =

9(85.833) + 7(34.839) = 63.5231 10+8-2

Sd  (63.5231(1 / 10  1 / 8))0.5  3.78057

t

51.5  75.375 3.78057

 6.32

For the two-sided alternative H1 : E ( X 1 )  E ( X 2 ) with   10  8  2  16 = and = 0.05, the critical probability points are t0.975;16  2.12 and t0.0025;16  2.12 . Since -6.32 is not enclosed by these limits, H o is rejected. We conclude that the sample means estimate significantly different light reflection values in the two species of Acer.

11.3 Several variances and means Practical problems often involve the probabilistic comparison of several sample estimates, such as 1 ,  2 ,...,  k . Faced with this task, the biologists may be tempted to formulate k(k-1)/2 separate null hypotheses and subject each to a separate test:

Ho1 : E(1 )  E(2 ) Ho2 : E(1 )  E(3 )

. . . Ho(k (k 1)/2) : E ( k 1 )  E ( k ) Comparisons on this basis are suboptimal for the verifiable reason that in the course of pair wise tests the probability of committing a Type I error accumulates (Pearson, 1942). For example, if pair wise tests were performed on 10 random samples from a single homogeneous population based on t criterion with nominal rejection probability  = 0.01, the probability of committing a Type I error, i.e., showing a significant difference where none existed, would increase to over 20 folds This cannot happen in a simultaneous test performed on all k sample estimates under the null hypothesis,

Ho : E(1 )  E(2 )  ...  E(k ) In the simultaneous test, the Type I error will remain at the nominal value. A principal method of simultaneous comparisons is the analysis of variance (ANOVA). This method, invented R. A. Fisher (Snedecor, 1934), has been applied by research workers more often than any other statistical technique, with the possible exception of regression analysis. Its frequent uses notwithstanding, users are not always aware of the fact that ANOVA re122

Statistical Ecology

quires specific restrictive assumptions not easily justified in all types of biological data. We will point out these assumptions as we describe the different model.

11.3.1 Complete randomized design A reasoned application of this model will see to it that the assignment of sampling units to treatments is completely random; hence is the term complete randomized design. Assume a single response variable (X) and a single factor variable (to be named), the following is the symbolic data set: Treatment Replicate

1 X11 X21 . Xn11

2 X12 X22 . Xn 2 2

... ... ... ... ...

k Total X1k X2k . Xn k

Total

G1

G2

...

Gk

X1

X2

...

Xk

Variance

2 1

S

S

2 2

DF





Mean

k

G 2 k

...

S



k

X

S2 

The degrees of freedom (DF) are: j = nj - 1 and  = n - k = 1 + ... + k.

11.3.1.1 Comparing k sample variances The null hypothesis to be tested specifies equal expectations:

Ho : E (S12 )  E (S22 )  ...  E (Sk2 )  V The alternative hypothesis negates this in the sense that at least two of the sample variances have different expectations. Bartlett's (1937) criterion  2  BC 1 provides a suitable test. In this, k

nj S 2j

j 1

N

S2  

k

k

j 1

j 1

, N   nj , B  ln S 2  j (1  ln S 2j ) ,



C 1

1

k

1

 3(k  1) n j 1

j



1 N

Sand j = nj – 1.In random sampling, BC-1 has the Chi squared distribution with  = k - 1 degrees of freedom under the assumption that X is normal 123

László Orlóci

and the null hypothesis is true. The test involves the one-sided probability P(  RND   ) . 2

2

 Example 11.3.1.1.1 Twelve experimental plots of a crop were assigned at random to three fertilizer treatments (5, 3, and 4 respectively). At harvest time, dry weight was measured: Fertilizer Yield kg

1 34 37 40 36

2 40 39 37 50

3 44 49 43 37

The sample values are S12 = 4.70, S 22 = 2.33, S32 = 12.33, 1 = 4, 2 = 2, 3 = 3.

The test of equal expectations for variances requires the pooled variance:

S2 

4(4.70)  2(2.33)  3(12.33) 423

 6.717

B= 9 ln 6.717 - (4 ln 4.70 + 2 ln 2.33 + 3 ln 12.33) = 1.724 1 1 C  1   1.0833    1.162 . 6

Based on these, 2= BC-1 = probability point is

9

1.724 = 1.482. Since at = 0.05 and  = k - 1 = 2, the Chi squared 1.162

 0.05;2  5.991 , the three sample variances are declared having equal 2

expectations. Therefore Ho should be accepted. Example 11.3.1.1.2 A study of the differential growth of northern pike in three lakes of a park supplied the following records:

2

Lake

1

2

3

Length of fish (cm)

49 43 50 46

49 51 46 43 42 49

35 34 40 32

2

2

The sample values are S1 = 10.00, S 2 = 13.07, S 3 = 11.58, 1 = 3, 2 = 5, 3 = 3. Based on 2 these values, 2 = BC-1 = 0.05871. Given 0.05;2  5.991 , Ho has to be accepted. It is then

concluded that the three sample variances estimate a common population variance.

11.3.1.2 Comparing k sample means Continuing with the description of the analysis of variance of a complete randomized design, we turn to the null hypothesis 124

Statistical Ecology

Ho : E( X1 )  E( X2 )  ...  E( Xk ) = . The alternative hypothesis H1 states that at least one of the k sample means has a different expectation. The test requires

Figure 11.3.1.2.1 The model of the analysis variance is: Xij =  + j + ij = j + ij (see Figure 11.3.1.2.1). What it states is that the responses are linear and additive, and encapsulate an amount of treatment effect added onto random variation. The following definitions apply: Xij - measured response of replicate i, treatment j

 - average response j - component response specific to treatment j ij - random variation, i.e., the experimental error o - state attained by the response variable in the untreated experimental material (control). The sample estimate of  is the grand mean of all replicates nj

k

 X X=

ij

j=1 i=1 k

n

=

G n

j

j=1

The sample estimate of j is the mean of the jth treatment nj

Xj 

X

ij

i 1

nj 125



Gj nj

László Orlóci

The mean of the control group is an estimate of  . An estimate of j is

Tj  X j  X . The error is assumed to be normally distributed about j with zero mean and variance V. Two basic ANOVA models are distinguished: Model I. The number of treatments is finite, selected with purpose, not based on random choice. The only source for experimental error is the uncontrolled variation in , independent of treatment. Because the number and type of treatments are fixed, this model is called the fixed effect ANOVA. The case described in Example 11.3.1.1.1 is this type. Model II. This model is the random effect ANOVA model. The k ‘treatments’ are randomly selected from a population of many more 'treatments'. The consequence of this is a second source for uncontrolled variation  added to . This case is described in Example 11.3.1.1.2. Conclusions drawn from the Model I ANOVA only apply to the treatments actually administered in the experiment. The conclusions of a Model II ANOVA have relevance for the entire treatment population. The two ANOVA models differ in the actual analysis performed on the experimental data. The following computational steps, except where indicated, are common to both models: (1) Partition the sums of squares. The initial computational objective in an ANOVA is the partition of the total sum of squares Q 

k

nj

(X j 1 i 1

ij

 X )2 into

nj

two components Q = QT + QE where k

QT   nj ( X j  X )2 and j 1

nj

QE   ( X ij  X j )2 . In these, QT is the ‘between treatments’ sum of j 1 i 1

squares and QE the ‘within treatments’ or error sum of squares. Division of the sums of squares by the degrees of freedom gives the ‘mean squares’

S2 

Q Q Q , ST2  T , SE2  E . n 1 k 1 nk

(2) Determine expectations for the variances. If H1 is true, the following are the expectations: Model

EE2

ET2 126

Statistical Ecology

VT  V 

I

1

k

 n ( k 1 j

j

 )2

V

j 1

VT  V  noV

II In these, no 

n k 1

k

n2j

j 1

n



V

. If n1 = n2 = ... = nk = r then no = r.

(3) Restate Ho. If Ho for the treatment means is true then Ho: 1

... = k = is

= 2 =

also true, and E (ST )  E (SE )  VT  V . For this reason 2

2

Ho : E( X1 )  E( X2 )  ...  E( Xk ) =  with alternative negating it, and

E (ST2 )  E (SE2 )  VT  V with alternative H1 : E (ST2 )  E (SE2 )  VT  V which specify equivalent tests. (4) Select the test criterion. H o is rejected only if the variation between the treatments is significantly larger than the variation within the treatments. The test relies on F 

ST2 SE2

which has numerator degrees of freedom

T = k - 1 and denominator degrees of freedom E = n - k. F has the variance ratio distribution if the sample is randomly chosen, if Ho is true, and if the underlying distribution of responses is normal (Section 7.3.4). Rejection of Ho is indicated if F  F ;1 , 2 . (5) Present the results. Construct an ANOVA table for this purpose: Variation measured

Degrees

Sum of

Mean

of freedom

squares

square QT

ST2

k 1

SE2

Between k treatments

k-1

QT

S 

Within k treatments

n-k

QE

S E2 

Total

n-1

Q

2 T

F

QE nk Q 2 S  n 1

Example 11.3.1.2.1 The data in Example 11.3.1.1.1 is selected for a Model I ANOVA. The treatment means to be compared include

X1  36.8 ,

X 2  38.7 , 127

X 3  46.5

László Orlóci The relevant scalars include:

Q = 283.00

QE = 60.47 QT = Q - QE = 222.53

X3 40.51 (grand mean)

From the sums of squares we obtain the variances (mean squares)

ST2 

QT 222.53   111.27 k 1 2

SE2 

QE 60.47   6.72 nk 9

and the variance ratio F = 16.56. The ANOVA table is constructed last: Source of

Degrees of

Sum of

Mean

F

variation

freedom

square

square

Treatment

2

222.53

111.27

Error

9

60.47

6.72

Total

11

283.00

16.56

The critical probability point of the F-distribution at  = 0.05, 1 = 2, and 2 = 9 is F0.05;2,9 = 4.26. Since F > F0.05;2,9, Ho is rejected. We conclude: not all three sample means estimate the same population value, i.e., there is differential response to the treatments. Example 11.3.1.2.2 The experimental setup of Example 11.3.1.1.2 represents a Model II ANOVA. The 3 lakes are the 'treatments' selected at random from all the lakes (population of treatments) in the park. To test for differences of average fish length between the lakes at large, we perform computations identically as in the preceding example: Variation type

Degrees of

Sum of

freedom

Mean

squares

square

2

381.42

190.71

Error

11

130.08

11.83

Total

13

511.50

Treatment

F 16.141

At = 0.05, 1 = 2, and 2 = 11, the critical probability point is F0.05;2,11 = 3.98. Since F > F0.05;2,11, we reject Ho and conclude that there are significant differences in the average length of northern pike among the lakes in the park.

(6) Steps after ANOVA. When the Ho of equal expectations for the treatment means is accepted, we have to conclude that the treatments are ineffective and the analysis is terminated. When Ho is rejected (as in the foregoing examples), the analysis continues. The subsequent steps depend on the ANOVA model. In Model I rejection of Ho indicates that at least one of the treatment means has a different expectation from the others. To reveal which treatments do and which do not, the treatment means are examined further. Different methods are available: The Newman-Keuls test (Newman 1939, Keuls 1952). This test includes paired comparisons of which there are k(k-1)/2 between the treatments, but the probability of committing a Type I error remains constant. The null 128

Statistical Ecology

hypothesis tested is Ho  E ( X j  X m )  0 . The alternative hypothesis negates Ho. The test criterion

q

(X j  X m )

  1  2 1  0.5SE      nj nm   

0.5

uses the common error mean square SE, This q has known sampling distribution under random sampling when X is normal and Ho is true. This distribution’s constants are n - k and p, the number of treatment means found within the interval with end-points at X j and X m . For determination of p the treatment means are ordered. Ho is rejected when the condition q ≥ q;n-k,p is satisfied. When k = 2, the value of

q

can be referred to the theoretical t-distribution with n - 2 degrees

2

of freedom. Probability tables of q are found in standard statistical texts (e.g. Zar 1974. pp. 451 et seq.). Example 11.3.1.2.3 The analysis of the data in Example 11.3.1.1 is continued, using the Newman - Keuls method. After the means are ordered

[ X 1 = 46.5] > [ X 2 = 38.7] > [ X 3 = 36.8] we take the largest difference,

q31 

46.5  36.8

 0.5(6.72)(1 / 4  1 / 5)0.5

=7.88 .

Given  = 0.05, n - k = 12 - 3 = 9 and p = 3, the critical point of the test is q0.05;9,3 = 3.199 which indicates rejection of the Ho: E ( X 3  X 1 )  0 . Additional tests are based on q32 = 5.571 (reject Ho) and finally q21 = 1.419 (accept Ho). Refer to Zar (1974) for appropriate q probability points.

Scheffé's method (Scheffé 1953). This method offers greater versatility than the Newman-Keuls method. It permits not only pair wise testing, but also tests of multiple contrasts, with the different means taken with same or different weights (the C coefficients below). Testing proceeds as follows: (1) Specify for each treatment a contrast coefficient C1, ..., Ck such that k

C j  0 . Contrast coefficients indicate which of the k treatment means j 1

are compared, and what weights they are given in the test. For example, 129

László Orlóci

given X1 , X 2 and X 3 , the coefficients C1 = 1, C2 = -1, C3 = 0 specify the contrast X1 - X 2 with equal weight given to the two means. The coefficients C1 = 1, C2 = -0.5, C3 = -0.5 specify a contrast X1 - 0.5( X 2 + X 3 ). In the latter case, the first mean is given twice the weight of the second or third. The actual magnitude of the contrast coefficients has no effect on the decision, whether accepting or rejecting the Ho. For example, C1 = 1, C2 = -1, C3 = 0 has the same effect as C1 = 0.5, C2 = -0.5, C3 = 0. k

(2) Compute D  C j X j and determine the error term, j 1

SD  S

2 E

k

n j 1

(3) Calculate Q 

|D | SD

C 2j

.

j

and the test criterion F 

Q2 k 1

. F has degrees of

freedom 1 = k-1 and 2= n-k, and the F-distribution under random sampling, provided that Ho: E(D) = 0 is true and the population distribution of X is normal. The alternative hypothesis is H1: E(D)>0. Ho is accepted if the condition Q  (k  1)F ;n1 ,n2

is satisfied, or the limits

D  SD (k  1)F ;n1 ,n2 enclose zero. Example 11.3.1.2.4 The analysis of the data set in Example 11.3.1.1.1 continues. We have

X2 and X3 to be compared. For given contrast coefficients C1 = 0, C2 = -1, C3 = 1, we have

D = 7.8, SD =

6.72(1/3+1/4) = 1.98, and Q =

3.94. For given  = 0.05,

(k  1)F0.05;2,9  2.92 . Since Q > 2.92, the null hypothesis Ho: E(D) = 0 is rejected. We declare the means of the second and third treatments significantly different. For the com-

and X3 we have Q = 0.986 (accept Ho) and for the comparison X1 and X3 we have Q = 5.579 (reject Ho). We can compare X1 and the average of X2 and X3 using C1 = 1, C2 = -0.5, C3 = -0.5. For these D = 36.8 - 0.5(38.7 + 46.5) = -5.80, parison X2

130

Statistical Ecology

| 5.80| 1 1 1   3.82 . SD = 6.72  + +  = 1.52 and Q  1.52  5 12 16  After observing that Q > 2.92, Ho is rejected.

In the Model II ANOVA, rejection of the null hypothesis is equivalent to accepting that in the population of all treatments, from which the k treatments were drawn at random, some or all of the mean values are different. Because the treatment selection is random, pair wise or multiple comparisons of means in subsets would make no sense. Instead, as the final step, ST  SE 2

we determine the treatments' share of the variance

2

nO

where no is

defined in the same way as earlier in the expression of the expected variance. Example 11.3.1.2.5 Analysis of the data from Example 11.3.1.2 has already indicated that the null hypothesis Ho : E(X1 )  E(X2 )  ...  E(Xk ) =  should be rejected (see Example 11.3.1.4). Having arrived at this conclusion, the only step of further interest is to calculate an estimate for the treatments' share of the total variance

ST2  SE2 190.71  11.83   39.13 68 nO 14  14 2

11.3.2 Randomized block design We randomize the replicates between treatments with the intention of isolating the treatment effects from the influences of any uncontrolled variation owing to position. We applied complete randomization in the ANOVA of the preceding section. But in special cases we may find it necessary to stratify the uncontrolled influence by blocking the treatments. For example, in the fertilizer experiment of Example 11.3.1.1.1, soil type is the blocking criterion of the experimental plots within the treatments. Owing to the two-way arrangement (treatments by blocks), the design is sometimes referred to as a two-way ANOVA. This is not to be confused with the two-factor designs in which the two-way arrangement is based on two types of treatments, e.g., type of fertilizer and technique of cultivation. The following is a commonly used scheme: Treatments 1 2 ... k Mean 1 X11 X12 ... X1k X 1. Blocks

2

X21

X22 131

...

X2k

X 2.

László Orlóci

. n Mean

. Xn1

X .1

. Xn2

... ...

X .2

...

. Xnk

X .k

.

X n.

X ..

Example 11.3.2.1 Consider an experiment to examine the effect of spring burning on the productivity of a large tract of grazed pastureland. The treatments consist of a control (no burning) and three burning intensities. Five blocks of land are available for study, each of which is relatively uniform with respect to nutrient and moisture regimes, aspect, species composition, and so forth. There may, of course, be environmental differences between these blocks. Therefore, in setting up the experiment, each block is divided into four quadrants, and the treatments are randomly allocated to these quadrants. This ensures that each treatment will occur in every block. Thus differences between treatments will not be obscured by environmentally induced variation among the blocks.

The analysis of a randomized block design partitions the ijth response according to Xij =  + j + i + ij. This is basically the same as in the model of the complete randomized design, except for the additional component i, which corresponds to blocking. Regarding the case, we have one replicate of each of k treatments assigned to each of n blocks. So the number of replicates within each block is k, the total number of replicates is k x n, and the number of replicates per treatment is n. This component is a manifestation of uncontrolled environmental variation among the blocks, independent of treatment, and is akin to ij in its role in the analysis. Consequently, we continue focusing on j, the true treatment effect. The elements of the model suggest a three-fold partition of the total sum of squares Q = QT + QB + QE. The individual terms: k

Q  n( X. j  X )2 (among k treatments) T

j 1

n

Q  k  ( X i .  X )2 (among n blocks) B

k

i 1

n

QE  ( Xij  Xi .  X j  X )2

(among k x n replicates).

j 1 i 1

The symbols are defined as follows: X . j - mean of jth treatment, X i . - mean k

n

of ith block , X - grand mean. Note: QB  QE  ( Xij  X. j )2 is the error j 1 i 1

sum of squares. The corresponding degrees of freedom is k(n- 1). The analysis proceeds from this point as in the complete randomized ANOVA: Variation Degrees of Sum of Mean F freedom squares square 132

Statistical Ecology

Treatments

k-1

QT

ST2

Blocks

n-1

QB

SB2

Error

(n-1)(k-1)

QE

SE2

Total

nk-1

Q

S

ST2 SE2

It is tempting to test Ho: 1 = 2 = ... =k =  based on the variance ratio

ST2

SB2 . However, this would not be valid, since the blocks represent non-random subsets of replicates. Ostle (1963, pp. 367) makes this point clear. Example 11.3.2.2 The effect of three protein diets on weight gain in rats was examined. Four litters of rats were used as a block to minimize the random effects attributable to inherent differences in the genetic makeup of the animals. Each litter contained three animals (treatments): Animal Group

1 2 3 4

1

2

3

14 17 12 15

16 19 16 16

18 22 17 18

Values are weight gains in grams. The following results were obtained: Variation

Sum of

Degrees of

Mean

squares

freedom

square

Treatment

36.167

2

18.083

Block

31.333

3

10.444

Error

3.167

6

0.528

Total

70.667

11

F 34.26

Since F > [F0.05;2,6 = 5.14] we conclude that there are significant differences between rats on the different protein diets. Multiple contrasts may be computed as the next step to further examine the data. We note that if we analysed the data according to the model of a complete randomized design, we would have SE  2

31.333  3.167 63

 3.833 . From this we conclude that blocking

substantially reduced the experimental error.

11.3.3 Latin square design This extension of the complete randomized block design involves two-way blocking. The assignment of treatments to experimental units is in accordance with a square grid design in such a way that any given treatment occurs only once in any given row or column (Example 11.3.3.1). 133

László Orlóci Example 11.3.3.1 We consider again the pasture experiment of Example 11.3.2.1. Suppose the investigator is restricted to using a single large plot. Such a plot is likely to be heterogeneous environmentally. To minimize the influence of this heterogeneity, a Latin square setup may be applied: 1

2

3

4

1 B,1 C,1 D,1 A,1 2 A,2 D,2 B,2 C,2 3 C,3 B,3 A,3 D,3 4 D,4 A,4 C,4 B,4 This scheme subdivides the field into 16 equal sized units. The four treatments (letters A to D) are assigned to these in such a way that any given treatment occurs only once in any of the rows or columns. The design assumes a complete lack of interaction among the rows, columns, and treatments. If there is any reason to doubt the independence assumption, a factorial ANOVA should be used (Kendall and Stuart, 1976, pp.138, et seq.), which we describe in the next section.

The model of partitioning response follows directly Xijk = + k + i + j +

ijk. This model emphasizes the treatment effect  which it isolate from

uncontrolled environmental variation in the direction across rows (to which the response is i) and across columns (to which the response is j). A four-fold partition of the total sum of squares is specified Q = QT + QR + QC + QE. Given the n treatments, each replicated n times and arranged in n rows and n columns, we have: n

n

n

Q  ( Xijk  X )2 (total) i 1 j 1 k 1 n

QT  n ( X..k  X )2

(between treatments)

k 1 n

QR  n ( X i ..  X )2

(among rows)

i 1

n

Q  n( X. j .  X )2 C

(among columns)

j 1

n

n

n

QE  QT   QR  QC   (Xijk  Xi ..  X..  X..k  X )2 (experimental eri 1 j 1 k 1

ror) n

n

n

QE  QR  QC  ( X ijk  X ..k )2 (error in complete randomized i 1 j 1 k 1

model) 134

Statistical Ecology

The last quantity has n(n-1) degrees of freedom. To interpret the subscripts, we conceive of a three-dimensional solid divided into k lattices of n x n cells in each. Each lattice is specific to a treatment. Considering Example 11.3.3.1, there would be 4 lattices each containing 4 non-zero values. Each colour represents a different lattice:

X11B X21A X31C X41D

X12C X22D X32B X42A

X13D X23B X33A X43C

X14A X24C X34D X44B

Note that the number of rows and columns must be the same as, or be a multiple of the number of treatments. Example 11.3.3.2 The experiment in question examined the effects of four fertilizer levels on total crop biomass. The design used a 4x4 Latin square. The following biomass measurements were reported: 1 2 3 4

1 36 67 40 43

2 40 36 47 65

3 32 30 61 23

4 43 22 32 26

(All of the above in kg/ha units.) The row assignments of treatments (A,B,C,D) accord with the map in Example 11.3.3.1. The ANOVA table is: Variation

Sum of

Degrees of

Mean

measured

of squares

of freedom

square

Treatment

1934.188

3

Rows

128.188

3

42.729

Columns

755.688

3

251.896

Error

112.375

6

18.729

Total

2930.438

15

644.729

F 34.424

Since the treatment F > [F0.05;3,6 = 4.76] we conclude that the fertilizer treatments had significantly different effect.

11.3.4 TWO-FACTOR DESIGN Factorial ANOVA is applicable when a single variable responds to the simultaneous effect of a number of factors. The analysis isolates the main effects of factors and the effect of the interaction between them. We restrict the discussion to a complete randomized two-factor design. Example 11.3.4.1 Consider a study in which the effect of nitrogen and lime treatments on a crop is examined. The lime treatment (factor A) has two levels, not limed and limed, while 135

László Orlóci nitrogen (factor B) is used at three levels. The total number of treatments is 6. The symbolic data set of replicates and corresponding means are: Treatment combination

1

2

3

A1B1

A1B2

A1B3

X111

X121

X131

X112

X122

X132

.. .

...

X11n

X12n

X 11

X14

...

4 A2B1

5

6

A2B2

A2B3

X211

X221

X231

X212

X222

X232

....

...

X13n

X21n

X22n

X 13

X21

Grand mean

X23n

X22

X23

X

The typical case, AiBj, refers to the group whose plots were given level i of the lime treatment and level j of the nitrogen treatment. Xij is the mean over the n replicate plots. Xijk is a replicate. Ranges for the subscripts are the following: i = 1, ...,a; j = 1, ..., b; and k = 1, ..., n where a is the number of levels of factor A, b is the number of levels of factor B, and n is the number of replicates per treatment.

The linear additive model for the two-factor case is Xijk =  + i + j + ()ij + ijk . The model partitions response into independent components:  for the base level; i for factor A, level i; j for factor B, level j; and ()ij is the effect produced by the interaction of factor A, level i, and factor B, level j. The total sum of squares is partitioned into the same number of components Q = QA + QB + QAB + QE. The total and error sums of squares are defined exactly as in the single-factor models. The main effect and interaction terms are defined by: a

QA  bn ( X i .  X )2 (main effect A) i 1

b

QB  bn( X. j  X )2 (main effect B) j 1

a

b

QAB  n( Xij  Xi .  X. j  X )2 (interaction AB) i 1 j 1

The ANOVA table is: Source of

Sum of

Degrees

Mean

variation

squares

of freedom

squares

II 136

F -ratio ModelI

Model

Statistical Ecology

QA

a-1

SA2

Main effect B

QB

b-1

SB2

Interaction AB

QAB

(a-1)(b-1)

Error

QE

ab(n-1)

SE2

Total

Q

abn-1

S2

Main effect A

S A2

S A2

SE2

2 S AB

SB2

SB2

SE2

2 S AB

2 S AB

2 SAB

SE2

SE2

2 SAB

The definition of the F-ratio depends on the model of the ANOVA (Ostle 1963, pp. 321-327). Inferences regarding the influence of the individual factors may be difficult. This is because higher order (non-linear) interactions may be present. However, the basic two-factor model is readily extended to more complex cases (Ostle 1963, Morrison 1983). Example 11.3.4.2 The field experiment is performed to determine the effects of lime and nitrogen on a crop yielded the following data for the six treatment combinations: A1B1 40 36 38 38 34

Replicates

Mean 38

A1B2 32 34 35 35 33

A2B1 35 33 32 36 40

A1B3

32 31 33 36 34

A2B2 38 40 42 40 43

A2B3

44 42 44 42

The grand mean X is 37 and the number of replicates n = 4. Results: Means, Factor A

X 1. =

38+34+33 = 35 3

Means, Factor B

X 1. =

38+34 = 36 2

X 2. =

X 2. =

34+40+43 = 39 3

34+40 = 37 2

X.3 =

33+43 = 38 2

Sums of squares - Q = (40-37)2 + (36-37)2 + ... + (42-37)2 = 370; QA = 12[(35-37)2 + (39-37)2] = 96 QB = 8[(36-37)2 + (37-37)2 + (38-37)2] = 16 QAB = 4[(38-35-36+37)2 +...+(43-39-38+37)2] = 208; QE = Q - QA - QB - QAB = 50 ANOVA table (Model I) - Variation A B AB E

Sum of squares 96 16 208 50

Degrees of freedom 1 2 2 18 137

Mean square 96 8 104 2.78

F 34.56 2.88 37.44

László Orlóci Total

370

23

Reference to the F-distribution shows that main effect A and the interaction AB are significant, but main effect B is not. To examine the interaction further, we inspect the plot of the 6 treatment means (Figure 11.3.4.1). Interaction is indicated by the fact that the two lines (representing the lime treatments) are not parallel. Increasing nitrogen level increases the crop yields when lime is present, but it has little effect in the absence of lime. This is not readily apparent from a quick perusal of the ANOVA table.

11.4 Testing homogeneity in discrete data The experimental design examined in this section uses a single discrete variable whose responses are observed under different treatments. The data set consists of counts, e.g., number of viable seeds. The counts are presented in a one-way table in which the columns represent treatments (Example 11.4.1).

Figure 11.3.4.1 Example 11.4.1 Population studies frequently involve experiments with seed viability. The outcomes are counts of seeds which germinated. In one case 100 seeds were sown at different depths in each of 15 pots under identical conditions, and the following numbers were found germinating after 10 days: Replicate (pots) 1 2 3 4 5 Total Mean

0 34 40 42 36 8 190 38

Depth cm 1 48 52 44 51 50 245 49

Total 2 12 16 22 24 21 95 19

530 35.33

The analysis of such an experiment is addressed in the following sections.

11.4.1 Null hypotheses for homogeneity The homogeneity test is performed in two steps: a. homogeneity of replicates within the treatments, and b. homogeneity of replicate means among the treatments. 138

Statistical Ecology

These are logical steps considering that the comparison of the treatment means would not make sense unless the treatments are homogeneous within. The number of replicates in treatment j is nj and the number of treatments is k. Two null hypotheses are tested: Ho1: [E(Xij - X j ) = 0; i = 1, ..., nj, j = 1, ..., k] Ho2:[E( X j - X ) = 0; j = 1, ..., k] . In these, Xij is replicate i in treatment j and X j is the mean of replicates in treatment j. X is the grand mean of all replicates. The first null hypothesis Ho1 states that the replicates have a common expectation within the treatments. This hypothesis has to be accepted before the treatment means are compared based on the second null hypothesis Ho2. According to the latter the treatment means have a common expectation. These two aspects of homogeneity are joined in the general hypothesis Ho: [E(Xij - X ) = 0; i = 1, ..., nj, j = 1, ..., k]. This reads: all replicates, irrespective of treatment, have common expectation X .

11.4.2 Test criteria The test criteria are formulated in terms of an information divergence (Rényi 1961): nj

k

I   X ij ln i 1 j 1

nj

X ij

k

for Ho and I(1)   X ij ln

X

i 1 j 1

k

I(2)   X. j ln j 1

Xj X

X ij Xj

for Ho1

for Ho2.

X.j is the total of treatment j. Divergences are additive, I = I1 + I2.

11.4.3 Homogeneity of replicates We have two sets of counts for each treatment: Xj = (X1j ... Xnj ) the observed counts, and 139

László Orlóci

 j  ( X j ...X j ) the expected counts with identical elements nj

Xj 

X

ij

i 1

nj

We note that 2I as in k

nj

2I(1)   X ij ln j 1 i 1

X ij Xj

is the overall test criterion for homogeneity within the treatments. This has an asymptotically Chi squared sampling distribution with = n1 + n2 + ... + nk - k degrees of freedom, under the usual regularity conditions (see Sections 10.3 and 12.2). Ho is rejected at probability of committing a type I error if 2I(1)  2 ; . If 2I(1)  2 ; within treatment homogeneity is declared. Example 11.4.3.1 Before examining the treatment means in Example 11.4.1, the homogeneity of replicates is tested within the treatments. For soil depth 0 cm, we have the ob190 served counts X1 = (34 40 42 36 38) have total X.1 = 190 and mean X 1  = 38. X has 5 expectations 1 = (38 38 38 38 38). The information divergence is

 

2I(11)  2  34 ln

34 38

 ...  38ln

38 

  1.054 . The second and third divergence terms are simi-

38 

larly calculated to give us the sum of 2I1 = 1.054 + 0.830 + 5.340 = 7.224.The total degrees k

of freedom is    (nj  1)  12. We choose the  = 0.05 rejection probability, for which j 1

2 the probability point is 0.05;12  21.026. The null hypothesis that the replicates are homo-

geneous within treatments is accepted.

11.4.4 Homogeneity of treatment means In the analysis of variance, the error variance, a term measuring replicate heterogeneity, is incorporated in the test of the treatment means through the numerator of F. Tests based on the Chi squared or information involve two steps. First the homogeneity of the replicates is tested (Ho1) and then the homogeneity of the treatment means (Ho2). The test criterion is 2I(2) which we refer to the theoretical Chi squared distribution with  = k - 1. The rejection/acceptance rule for Ho2 is the same as for Ho1. 140

Statistical Ecology Example 11.4.4.1 The homogeneity test which began with the replicates within the treatments in Example 11.4.3.1 is continued with the treatment means. The set of observed treatment means ( X1 X2 X3 ) = (38 49 19) is compared to the reference set  = [35.333 35.333 35.333] based on the grand mean

530

= 35.333. We use the treatment totals 15 ( X.1 X.2 X.3 )  (190 245 95) as weights and 2I2 = 70.012 as test criterion. At  = 0.05 the critical Chi squared probability point 5.991 is exceeded. We reject Ho2 and conclude that sowing depth has a significant influence on germination success. The test on 2I = 7.224 + 70.012 = 77.236 with degrees of freedom 14 leads to rejection of Ho.

11.5 Comparison of k covariance matrices Considering that covariance matrices define patterns of covariation among the response variables, we are interested in the following question: Are the k patterns of co-variation similar? The answer requires testing the null hypothesis Ho:E(S1) = E(S2) =... = E(Sk). The constants in the test criterion 2 = BC-1 incorporate the determinant |S| of the pooled covariance matrix and the determinants |Sj|, j=1, ..., k, of the k covariance matrices: k

(n  1)S j

k |S| B  (nj  1)ln , | Sj | j 1

S

j

j 1 k

(n  1) j

j 1

k

1

 (n  1)  j 1

j

1 k

(n  1) j

1

C  1  (2p  3p  1)

j 1

2

(p  1)(6(k  1)

S is the pooled covariance matrix and |S| is its determinant. The reference distribution of 2 is the Chi squared distribution with degrees of freedom

=

(k  1)p(p  1)

. 2 For these, we assume an underlying multivariate normal distribution and random sampling. Morrison (1976, pp. 250-251) describes further details and alternative tests. Example 11.5.1 Within each of three peat bogs (treatments A,B,C), 15 small, randomly sited plots (replicates) were located and within each plot the cover values of the species Sarracenia purpurea (X1) and Drosera rotundifolia (X2) were measured: 141

László Orlóci A

B

Species

C

Species

Plot

X1

X2

X1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

16 15 18 12 18 9 15 9 16 18 17 13 16 15 14

11 12 9 17 15 9 17 16 18 9 16 12 9 17 10

18 19 23 19 25 24 22 27 25 19 18 26 25 26 26

Species X2 27 34 32 36 31 29 35 30 34 32 32 35 31 34 34

X1

X2

43 38 37 39 44 40 42 40 36 39 39 45 40 40 38

14 17 18 12 14 13 11 13 14 12 17 16 14 11 12

The scattergram of Figure 11.5.1.1 displays the three bog types as distinct clusters of points. The test of Ho:E(S1) = E(S2)=... = E(Sk) proceeds through the following intermediate steps:

Figure 11.5.1.1 Covariance matrices -SA =  

8.49524 -0.961905  -0.961905 12.40952 

SB =   SC =  

11.0286 0.657143  0.657143 6.25714 

6.42857 -0.642857  -0.642857 4.98095 

Pooled covariance matrix—

S

S A  S B  SC  8.65079 -0.315873  =  -0.315873 7.88254  3

Determinants - |SA| = 104.497,

|SB| = 68.5755,

The test criteria are 142

|SC| = 31.6071,

|S|= 68.0904

Statistical Ecology 68.0904 68.0904 68.0904 B = 14ln 104.497 + ln 68.5755 + ln 31.6071 = 4.64853





(8 + 6 - 1)0.2143 - 42 1

C-1 = 1 -

(3)6(2)

= 0.931217

2 = BC-1 = 4.329 Given = 0.05 and = 0.5(2)2(3) = 6, the critical probability point is observe that 2
0.05; 2,-0.5,19.5 = 0.185 (0.05 probability point from Heck's charts) indicates rejection of Ho. We conclude that at least some of the mean vectors differ significantly. Examination of the limits set for selected multiple contrast is the next step. Selecting bog types A and B and variable X1, the design vector is a = (1 0) and the contrast is specified by C = (1 -1 0). Given  = 0.05, the 1 -  simultaneous confidence limits are UL = - 4.7506 (upper limit) and LL = -11.3828 (lower limit) based on

 0.185   0.227 , aE-1a' = 363.333, 1   1  0.185 p

,

k

g 1

2

Cg

N

= 0.133333

g

k

a C X h

g

gh

= -8.067 (contrast).

h 1 g 1

The UL and LL limits exclude zero. Therefore, the null hypothesis of a zero population contrast has to be rejected; bog types A and B are declared significantly different based on the mean cover of species X1. Similar calculations produce the following results: 145

László Orlóci C = [1 0 -1]

a = [1 0]

LL = -28.5828

UL = -21.9506 (reject Ho)

C = [0 1 -1]

a = [1 0]

LL = -20.5161

UL = 13.8839 (reject Ho)

C = [1 -1 0]

a = [0 1]

LL = -22.4321

UL = -16.1012 (reject Ho)

C = [1 0 -1]

a = [0 1]

LL = -3.8988

UL = 2.4321 (accept Ho)

C = [0 1 -1]

a = [0 1]

LL = 15.3679

UL = 21.6988 (reject Ho)

IT HAPPENS MORE OFTEN than not that we have no knowledge of the population parameters. Even than we can do paired comparisons using the sample estimate’s probability distribution derived on the basis of the assumption that the compared samples come from the same parental population. In other respects the modus operandi is similar with all of its limitations to what we already discussed in the preceding chapters.

146

Statistical Ecology

Chapter 12 PROBABILISTIC COMPARISONS III Having completed the sample-to-sample comparisons, we now turn to cases which involve the direct comparison of variables. We want to answer specific questions about the variables interrelationships, and the significance, sense and strength of these relationships in probabilistic terms. The comparison functions include the product moment, when the variables are continuous, and divergence measures, such as Chi squared and information, when the variables are discrete.

12.1 Two continuous variables Consider n paired observations on p variables X1, X2 ... Xp,

 X1   X11 X   X 2 21 X=   .  . X   X  p   p1

X1 n 

X12

...

X22

...

X2n 

.

...

. 

Xp2

...

X pn 





The numerical relationship of any pair of the p variables can be defined in many different ways. If we assume that they are manifesting a continuous linear response to common underlying factors, the product moment correlation coefficient is appropriate. For variables Xh and Xi, this is a fraction of the covariance Shi and the standard deviations:

147

László Orlóci

rhi 

Shi Shh Sii

.

Interpretation of a correlation value relies on its significance, sense, and strength. To declare r12 significant, Ho: E(rhi) = hi = 0 has to be tested and rejected in favour of the alternative hypothesis (H1) which negates Ho. The test can be based on

t

rhi 1  rhr2 n2

which has the Student t distribution with = n-2 degrees of freedom, provided that the sample is randomly taken from a bivariate normal population with zero correlation. The sense of the correlation depends upon the sign of r (Figure 4.1.2.2). However, a significant correlation, whether positive or negative, does not necessarily indicate a numerically strong relationship. To express strength, we use the square of the correlation coeffi2

cient rhi ; this is proportional to the variance shared by the two variables. Xh shares

Sh2rhi2 

Shi2 Si2

of its variance with Xi which in turn shares

Si2rhi2 

Shi2 Sh2

of its variance with Xh. Consequently, Sh (1  rhi ) is the variance specific to 2

2

2 2 Xh and Si (1  rhi ) is the variance specific to Xi, not considering possible cor-

relations with other variables. Example 12.1.1 Leaf length and leaf width were measured on six specimens of Ulmus americana: Leaf length X1 Leaf width X2

18.2 16.4 18.0 17.1 10.3 8.0 9.7 8.3

14.0 16.3 cm 7.2 7.4

The following question is posed: "Are leaf length and width significantly correlated?" The answer comes from the test of Ho: E(r12) = 0 with the alternative hypothesis given as H1: E(r12) ≠ 0. The sample values are 148

Statistical Ecology

S12

1.6553 0.865 = 0.865, t = = 3.446 2.327(1.574) 0.252 4

2

= 2.327, S2 = 1.574, S12 = 1.6553, r12 =

The probability points are t0.975;4 = -2.776 and t0.025;4 = 2.776. These indicate rejection of Ho. We conclude that leaf length and leaf width are significantly correlated. The high value of

r12

indicates a numerically strong correlation.

12.2 Two binary variables We now consider measuring 'association' of two binary variables A,B. These are also called classificatory criteria by which observations are sorted. The term 'association' replaces 'correlation' to indicate that the test is now based on a different method. Coincidences are scored and presented in a 2 x 2 table (Section 8.4). The measuring function is the Chi squared based mean square contingency



2

n

(Section 8.4). In large sam-

ples,  will have a Chi squared distribution with 1 degree of freedom if 2

the sampling is random, the sampled population is Poisson, and Ho: E(2) = 0 is true. If  is not smaller than 2 ;1 , the association is said to be sig2

nificant at rejection probability, positive if a >

(a+b)(a+c) and negative if n

(a+b)(a+c) 2 . The greater  is in the sample, the less likely that the n observed level of association could have arisen by chance. Therefore, the a
0 in dendrograms z,y. The r values range from 1 to -1. Zero indicates no relationship. Positive values indicate colinearity, a form of convergence or parallelism. Negative values indicate proportionately intense divergence or lack of colinearity. The values of the correlation (Equation 7) can 174

Statistical Ecology

be summarised in a symmetric square matrix and further analysed. The many possibilities arising from such an analysis are left for the more ambitious reader to pursue.

12.6.7 Further on the partial variance Simultaneous analyses are performed on q hierarchical relevés. One of the emergent properties is called the partial variance slope symbolically α . This is the slope of the straight line fitted in regression analysis to the [Viu], u=1,2,...,q graph. The dependent or response variable is the partial variance. The independent or factor variable is the gradient of elevation of the terrain height, ammoniac nitrogen, or nitrate nitrogen in the current example (Table 12.6.3.1). Should α be tested and found to have expectation not significantly different from zero then it is interpreted as an indication that no significant effect issues from the gradient that trigger linear variation in the partial variances in the sample. The maximum possible partial variance slope max  can be found. This puts a handle on the expression of the actual variance slope in relative terms. The difference max is an expression of the unrealized gradient effect. q

Of very much interest for the analysis is the relationship of WT 1  Wj 1 and j 1

2 d= ST20  WT 1 in which ST 0 

q

S j 1

2 j0

. WT1 is thus the sum of q partial variance.

The individual terms are defined for the q hierarchical relevés (equation (2)) specific to the “pure” gradient signal, or equivalently to the baseline X in the multi hierarchical relevé sample. Quantity d is specific to UPS or the environmentally unbound phylogenetic signal, a part of the total generated by the branching in the dendrogram. The proposed relative signal amplitude scalars are thus WT1/ ST20 for the environmental signal and d/ ST20 for the phylogenetic signal. To handle this over the gradient (several ordered relevés) the Essay’s solution opts for the slope (tan ) of the d values. There are complications in the way of actually arriving at the conclusion such as the above. These are in part less model related and more linked to the data type if it has many zero entries and to the species traits whose plasticity is too narrow. As a preventive measure species should be chosen whose distribution spans the entire gradient. The species presented in Table 12.6.3.1 were selected this way.

Partial variance slopes are defined across the dendrogram levels as well. These are symptomatic of the branching effects. 

12.6.8 Results and interpretation 175

László Orlóci

The partial variances and other related results from the analysis of the Coquihalla floodplain data (Table 12.6.3.1) are presented in Table 12.6.8.1. Consider the three neighbor values 6.7401, 7.1398, and 2.4376 of row #1. These are partial variances calculated according to equation (5). Note that partial variances are not additive; only the Wi quantities of equation (1) are additive. The first number in column #5 is the regression coefficient (tangent α). The standard error of the regression coefficient is in the next cell (column#6) and a probability in the following cell. The standard error and the probability are means to evaluate the reliability and steepness of the population tangent α. The probability tells how often a sample estimate as great as or greater than the observed tangent α should come up by chance alone under random sampling and re-sampling of a Normal universe of partial variances with zero slope. The mechanistic basis of probability determination in the current case is a t-distribution at 2 degrees of freedom. Since the standard error is small and the probability is small also the expectation of the slope 35.71O is declared different from zero. In other words the sample estimate 35.71O is “significant”. This makes logical to proceed with the interpretations of the variance slope as an effect of the terrain’s elevation gradient on the floodplain and fossil terrace. The numbers in columns #9 and #10 are interpreted as follows: 59.25 O (sign ignored) is the possible maximum value for the variance slope under the circumstances as given in Table 12.6.3.1; the difference of the observed partial variance slope 35.71O and the possible maximum 59.25 O is 23.54 O or 39.73% which is the unfulfilled potential. Depending on the level in the dendrogram, relationships change and the measured values can be very different. A particularly important vale is 77.22 in cell #10 of row #6. This indicates that the unfulfilled effect outweighs the gradient effect 0.7722 to 0.2278 in proportional partial variance slope terms. This suggests the possibility of a very strong UPS (see commentary regarding Table 12.6.8.2 below). Table 12.6.8.1. Partial variances Viu generated by branching at the dendrogram nodes. The dendrogram data are presented in Table 1. Separate dendrograms apply to the floodplain benches (low “a”, medium “b”, high “c”) . The results in the second part of the table come from regression analysis. The terminology follows appropriate sections of the present book. Tan  measures the slope of the straight line fitted to the partial variances. Partial variance is the response (dependent) variable and bench height is the factor (independent) variable. Note: tan 45O = 1. The probability given in column 7 is valid under zero expectation for  in random sampling of a Normal universe. Partial variance slops (column 8) in degrees are obtained by arc tangent transformation. Note: arc tan 1=45O. The entries in column 10 are calculated according to the following scheme: take the absolute difference of the values in rows #8 and #9 and dived the difference by the value in #9, take the one complement of 176

Statistical Ecology the fraction and multiply by 100. Basic results not shown in the table are included in Appendix 1. Abbreviations: #1. to 5. – table rows; i – dendrogram level transitions; a,b,c – floodplain bench (FBP) u; WDLOBH – within dendrogram level over bench height; SETA – standard error of tangent a per dendrogram level; PMETA – probability of a more extreme tangent PVS – partial variance slope; PMEPVS – possible most extreme partial variance slope; PMTV – probability of a more extreme tangent  value. Part 1 – columns 1 to 4 u FPB height m Column

“a”

“b”

“c”

4.2

5.4

10.8

1

2

3

4

df

Via

Vib

Vic

#

i

1. Genus to Species

1

5

6.740

7.140

2.438

2. Family to Genus

2

11

7.349

1.587

4.075

3. Order to Family

3

7

3.304

6.150

7.232

4. Class to Order

4

10

5.416

3.755

5.876

5. Class variance

5

6

2.390

5.890

4.980

--

--

--

39

5.286

4.336

5.033

8. Tangent  #1. to 5,

-1.063

-0.033

0.688

9. Standard error #1 10 5

0.486

0.813

0.529

10. Probability (PMTV)

0.117

0.970

0.284

11. Partial variance slope (pvs)

-46.76

1.9

34.53

12. Possible most extreme pvs

-78.78

78.47

78.47

13. Difference |#11- 12|

40.64

97.59

56

6. Sum of absolute values #1 to 5 7. Baseline variance

Part 2 – columns 5 to 10 tan  WDLOBH 5

SETA

PMETA

PVS

PMEPVS

|#9-#8|

6

7

8

9

10

1.

-0.719

0.182

0.059

-35.713

-59.250

39.730

2.

-0.203

0.796

0.822

-1.160

-53.810

97.840

3.

0.474

0.329

0.286

25.380

60.290

57.900

4.

0.169

0.269

0.595

9.560

57.910

83.490

5.

0.221

4670

0.683

12.460

54.360

77.080

84.270

285.620

77.22*

0.730

57.020

98.720

Column #

6. 7.

0.013

0.139

0.935

177

László Orlóci *The associated gradient effect based on the linear assumption for response is 100-77.22 = 22.78%. Assumption of non-liner response is possible. The functional form of this would have to be a priori chose and applied consistently. The Whittaker-Groenewoud response curve (see Orlóci 1978 p.104 et seq.) with mean and standard deviation as parameters would be the logical initial choice (Orlóci 2010), but the response variable in this case is the partial variance.

Partial variance slopes and statistics found in rows #8 to #13 pertain to linear regression equations of the partial variance within the floodplain benches across the dendrogram levels. None of the tangent α in this group is sufficiently unique to warrant a conclusion other than the directionless assortment of partial variances among the levels. Table 12.6.8.2 gives the estimated partial variance based signal amplitudes (col2

umn headed by “Sum”). Ratio WT1/ ST 0 =14.2752 is the estimated relative ampli2

tude of the environmental (gradient) signal and d/ ST 0 = 85.7248 of the phylogenetic signal. The justification for so using these ratios is already explained. It is emphasised that WT1 , a total of partial variances (81.5878), is in fact the residual that remains of the grand total variance (of the multi relevé sample)

ST20

=571.5360 after the partial variance (489.9482) specific to the hierarchic arrangement of the species in the dendrogram is accounted for. In the above 3

WT 1  Wj 1  81.5878 j 1

and 3

ST20   S 2j 0  571.5360 j 1

It is seen that in terms as defined, the UPS outweighs the environmental signal roughly 6 to 1.

Table 12.6.8.2. Signal amplitudes in partial variance terms. See the main text for definition of WT 1 ,

STo2

2

2

and d. The fractions 100WT1/ STo and 100d/ STo express the amplitude of

the environmental signal and phylogenetic signal as per cents.

Part 1 – first 3 columns Source of variation

“a”

“b”

206.16

169.09

196.28

2. Elevation gradient WT1

33.70

35.70

12.19

3. Elevation gradient WT1 %

16.35

21.11

6.21

2

1.Total STo

178

“c”

Statistical Ecology 4. Hierarchy d

172.46

133.39

184.09

83.65

78.89

93.79

5. Hierarchy d % Part 2 – last 6 columns # Sum 1.

571.54

2.

81.59

3.

14.28

4.

489.95

5.

85.72

std

arc tan

slope o

P>|t|

0.50

5.44

0.46

26.60

0.94

-3.59

0.91

-1.30

-74.45

0.06

4.09

6.35

1.33

76.28

0.59

Figure 12.6.8.1 Graphical representation of the level-by-level partial correlation of dendrograms based on different types of baseline data. Symbols: PT (ground cover), FT (functional type cover), NN (nitrate nitrogen) and AN (ammonic nitrogen) mass. Equation (7) defines the correlation coefficient.

Partial correlation analysis on the dendrograms of the same 40 species involves four types of baseline data including species cover (PT), functional type cover (FT), nitrate nitrogen mass (NN), and ammoniac nitrogen mass (AN). We took the data as given in Table 12.6.3.1. The rizy values are summarised in the graph of Figure 12.6.8.1. The partial correlation is highest for the baseline data. The negative correlation for FT x AN is a consequence of the negative correlation of NN with AN.

12.6.9 Remarks The utility of the analysis in new cases will depend on the validity of two assumptions. The first requires environmental homogeneity within the sampling units. Ecologists use preferential location of sample plots (areal units on the ground) to obtain environmental homogeneity (Orlóci 2010). The second emphasizes the importance of species identification and the requirement that the species selected represent functional traits. The latter can be enhanced by selection of species with broad tolerance ranges. 179

László Orlóci What has been done and what are the novel findings in the Essay? Partial variances and partial variance slopes were the basis of signal isolations and amplitude measurements. The expectation of the partial variance slope differed significantly from zero only in the case of the baseline level (row #1 in Table 12.6.8.1). This implies that baseline is the only level where the environmental effect on the partial variance is statistically significant. Significance is missed at the other dendrogram levels by a wide margin. On the baseline level the proportion of the environmental effect to the maximum possible effect is 60% to 40%. Interestingly the overall estimates show a 6 to 1 margin of importance in favour of the phylogenetic signal on the partial variances over the terrain gradient effect. This is surprising, but it has to be borne in mind that species of broad tolerance limits were selected for the analysis. And of course, the conclusion is context dependent in that it is reached using the terrain’s elevation as environmental indicator. But then considering the terrain height dependence of overflow, the indicator value of terrain height for environmental quality is close to being absolute. A striking regularity is revealed in Figure 12.6.8.1. This is the high positive correlation of PT with FT and the positive correlation of FT with the NN but negative with AN. The switch of sense is the consequence of the negative correlation of NN and AN which are both terrain height related in an opposite sense. NN is high on low elevation terrain (bench “a”) where much herbaceous nitrogen-rich organic matter is produced. Ammoniac nitrogen is rich in raw-humus which accumulates on the high elevation terrain (bench “c”). These indicate sensitivity of the partial product Wizy to environmental forcing. This is most striking at the Genus to Species branching.

12.6.10 Conclusions Considering what has already been said, we cannot miss observing a definite sorting effect of bench height on species performance. But this does not exhaust the total effect, of which 85% cannot be attributed to the environmental effect. On this basis we identify the presence of a strong UPS. Furthermore, the highly significant PT x FT correlation (Genus to Species branching) and the significant FT x NN and FT x AN correlations are in close proximity with the Pillar-Duarte model of functional type selection in community assembly. In dynamic environments, such as the Coquihalla floodplain, the height of the terrain above the base water mark in the river has been used to derive a proxy time scale. Kerner von Marilaun (1863) and many others after him applied such a time scale in the description of vegetation succession. Clearly if we replace bench height in the discourse by “successional state” in the conventional sense of the term as ecologist use it, the principle of environmental selection of species by functional type in the course of “vegetation succession” should not have to be changed. But the connection of time to terrain height assumes perfect regularity 180

Statistical Ecology

in progressive aggradations without erosion. Likewise the use of benches of progressively increased elevation to recognise progressive states in succession may not be correct considering the well-known fact that plants, particularly trees, observed on a high bench may be survivals of many years of aggradations.

WE BEGAN THIS CHAPTER with the comparison of two variables. We had to assume that the variables responded linearly to the influential factors of their environment, and the underlying distributions were bivariate normal. We did this to justify the use the product moment correlation coefficient r for measuring the variables’ correlation. Next we considered an application of the mean square contingency value, and separately the exact probabilities of 2x2 tables, in tests of hypotheses regarding the relationship of binary variables. The case of multistate variables was taken up next and association measures (Chi squared, I-divergence information) were applied. Presentation of canonical correlation analysis followed in which two groups of variables were compared. We than extended the application of the notion of correlation to a hierarchical character sets. In this the comparison of entire hierarchical structures came into focus. In the first case the hierarchy was ‘balanced’ serving the purpose of a rigid scheme, a stencil as it were, that facilitates the comparison. In that way the results are unique to the ‘stencil’. In the second case the hierarchy is unbalanced and perfectly natural. There are no empty ‘runs’ or ‘imaginary taxa’. The results are unique to the natural case, stand alone or a part of complex comparisons. In the entire chapter the probabilities were derived from theoretical distributions in some cases or determined by the Monte Carlo method. Appendix 12.6.10.1 The following table contains the basic computed data from which results shown in the main text can be derived. S-matrix

DF

"a"

"b"

"c"

Elevation

m

4.2

5.4

10.8

Baseline X

39

206.1612

169.0929

196.2819

Genus

34

172.4606

133.3938

184.0938

Family

23

91.6267

115.9407

139.2641

Order

16

68.4958

72.8898

88.6421

Class

6

14.3372

35.3405

29.8826

Floodplain bench

“a”

“b”

“c”

Sum

tan 

std

arc tan

slope o

P>|t|

Total W

206.16

169.09

196.28

571.54

0.50

5.44

0.46

26.60

0.94

3.59

0.91

-1.30

-74.45

0.06

4.09

6.35

1.33

76.28

0.59

E W

gradient

E% Hierarchy W

33.70

35.70

12.19

81.59

16.35

21.11

6.21

14.28

172.46

133.39

184.09

489.95

181

László Orlóci Hierarchy %

83.65

78.89

93.79

W-matrix

“a”

“b”

“c”

tan 

std

arc tan

Slope o

P>|t|

Baseline X

206.161

206.161

206.161

206.161

206.161

206.161

206.161

206.161

Genus-toSpecies

33.701

33.701

33.701

33.701

33.701

33.701

33.701

33.701

Family-to-Genus

80.834

80.834

80.834

80.834

80.834

80.834

80.834

80.834

Order-toFamily

23.131

23.131

23.131

23.131

23.131

23.131

23.131

23.131

Class-to-Order

54.159

54.159

54.159

54.159

54.159

54.159

54.159

54.159

Class

14.337

14.337

14.337

14.337

14.337

14.337

14.337

14.337

V-matrix

DF

85.72

“a”

“b”

“c”

Std

arc tan

slope o

P>|t|

Baseline X

206.16

206.16

206.16

206.16

206.16

206.16

206.16

206.16

206.16

Genusto-Species

33.70

33.70

33.70

33.70

33.70

33.70

33.70

33.70

33.70

Familyto-Genus

80.83

80.83

80.83

80.83

80.83

80.83

80.83

80.83

80.83

Orderto-Family

23.13

23.13

23.13

23.13

23.13

23.13

23.13

23.13

23.13

Classto-Order

5.42

3.75

5.88

0.17

0.27

0.17

9.56

0.59

5.42

Class

2.39

5.89

4.98

0.22

0.47

0.22

12.46

0.68

2.39

182

Statistical Ecology

Chapter 13 TREND SEEKING: UNIVARIATE RESPONSE Trend seeking as understood in this chapter is the description of the functional form of continuous univariate responses as seen unfolded within a well-defined factor space. The case where the factor space is not well defined in the sense of not being directly measurable will be addressed in the section on ordinations. The aim of trend seeking is reached when a function is found that fits the observed responses with accuracy to our satisfaction, if no hypothesis suggested the response functions exact form. In either case, the choice of response function to be tried should not be mechanistic. The criteria of choice, the functions, and fitting techniques are the Chapter’s main topics.

13.1 Response type a straight line Consider two continuous variables Y and X. The underlying conditions are such that the trended portion of variation in variable Y is the response variable linearly linked to factor variable X. In a well designed experiment variation in X is controlled. Y is allowed to vary freely. In experiments conducted under natural conditions, variation cannot be controlled and for that reason even if variation in Y is closely correlated with X, it is not certain that Y is actually responding to X. All that may be certain is the existence of a statistical relationship. It is implicit in the application of the statistical techniques to be discussed that variable X is measured accurately (zero measurement errors), the 183

László Orlóci

sampling is random, and that variable Y has a normally probability distribution about the mean at each state (measured value) of X (Figure 13.1.1). Under these conditions, Y can be represented as a simple sum Y = o + 1X + . This is the equation of a straight line with an error term  included. o is the intercept on the Y axis and  1 the regression coefficient. The  term represents the sampling error (in random sampling) and it is assumed to be normally distributed with zero mean and variance V. The latter remains unknown if not a priori given.

Figure 13.1.1 In this, Y is response and X is factor. Other symbols: y – a regression estimate, bo – intercept on Y, b1 – regression coefficient,  – sampling error, f( mean zero and variance V.

) – normal density function with

When a sample is taken, o and 1 are replaced by their sample estimates bo and b1 and the regression line is defined by y= bo + b1X. The estimated error is incorporated when setting probabilistic limits on the regression estimates. The constants are b1  tg  

SXY SX2

and bo  Y  b1 X .

Constant b1 is the slope of the regression line and bo is the value of the intercept on the Y axis. S X2 is the variance of X, Y and X sample means of Y and X, and SXY the covariance. The statistical significance of regression model Y = o + 1X + can be tested. The relevant null hypothesis and alternative hypothesis are written for the slope of the regression line Ho: 1= 0 and H1: 1 ≠ 0. The test uses the regression and the error components of the total sample sums of squares (see Figure 13.1.1):

Q  QR  QE . 184

Statistical Ecology

The equations for calculation: n

n

Q   (Yj  Y )2 ,

QR   (y j  Y )2 , j 1

j 1

n

QE   (Yj  y j )2 j 1

The analysis of variance table is constructed next: Variation Sum of Degrees Mean examined squares freedom square Model

QR

1

SR2  QR

Error

QE

(n-2)

SE2 

Total

Q

(n-1)

SY2 

F

SR2 SE2

QE n2 Q

n 1

Note that since

Q  QR  QE , 2 the proportion r 

QR QE

defines the portion of the total sum of squares ac-

counted for by regression. The squared term r 2 is called coefficient of determination. The error mean square QR is an unbiased estimate of the variance V. Therefore, an equivalent of Ho:  1= 0 is Ho : E(SR2 )  V . The logical alternative hypothesis is one-sided H1 : E(SR2 )  V implying that the variation added onto the random variation (which the sampling variance represents) owing to response to the factor effect is not zero. Clearly, the model should not be declared significant if the added effect had zero expectation. The variance ration

F

SR2 SE2

is the test criterion. This has the theoretical variance ratio distribution with 1=1 (number of factor variables) and 2= n-2 degrees of freedom under Ho, if Y is normal and the sampling is random. In this particular model of 2 regression, t /2;  F ;1, so that Ho: 1= 0 testable using the t criterion, 2

2

185

László Orlóci

t

b1 Sb1

in which Sb  1

SE2 (n  1)SX2

.

This is the sample standard error of b1. The quantity t has the theoretical Student t-distribution with  = n - 2 degrees of freedom. To complete the analysis, 1 -  confidence limits are set for the regression coefficient in the manner of

b1  t /2; Sb

1

and for the regression line in the manner of

y j  t /2; Sy j . The error term in the latter is

Sy j  SE For a single y estimate

1 n



( X j  X )2 (n  1)S X2

.

n 1 1 is replaced by under the square root. The n n

limits become progressively broader the further X is from X . We drew the confidence belts for o,  and y Figure 13.1.2. Confidence limits can be obtained for o by setting Xj to 0 in the equation.

Figure 13.1.2 Confidence limits (belts) for the intercept, the regression coefficient and the regression estimate in the intercept, the regression coefficient and the regression estimate in Y units at any value variable X. Example 13.1.1 An experiment with the effect of temperature (X) on the daily growth rate (Y) of filamentous alga yielded the following data: X oC 0 5 10 15 20 25

Y mm/day 0.0 4.5 6.5 7.0 10.0 17.0 186

Statistical Ecology Is there sufficient evidence to suggest that Ho: 1= 0 is not true? The test is performed against a two-sided alternative H1: 1 ≠ 0:

X

75 6

 12.5 , SX2 

815.7 

SXY 

1375 

752 6  87.5 , Y 

5

45 6

 7.5 , SY2 

500.5  5

452 6  32.6

75(45 6

5

 51 , b1 

SXY 2



SX

51 87.5

 0.583 , bo  Y  b1 X = 7.5-0.582857(12.5) = 0.214

Based on these, we have the equation of the fitted regression line y = 0.214288 + 0.582857X. Y values for selected X values are given in Table 13.1.1.1. Ho: 1= 0 is tested in Table 13.1.1.2. j 1 2 3 4 5 6

Table 13.1.1.1 Xj yj 0 0.2 5 3.1 10 6.0 15 9.0 20 11.9 25` 14.8

Type of variation

Table 13.1.1.2 Sum of Degrees of squares freedom

Mean square

F

Model Error Total

148.629 14.371 163.000

148.629 3.59275 5

41.369

1 4

Selecting =0.05 as the rejection probability, F > [F0.05;1,4 = 7.71], Ho has to be rejected. We conclude that the slope of the fitted regression line differs significantly from zero. Given Sb 

14.371

0.582857  0.090620 , the alternative test criterion is t = 0.090620 = 6.432 . Observ-

4(5)87.5 ing t > [t.025; 4 = 2.776], on this basis too, the decision is rejection of H o. Alternatively, we

may base the test on an inspection of the 95% confidence limits. The solutions of b1 ± t/2;4 Sb = 0.582857 ± 2.776(0.090620) gives us the upper limit 0.331296 and the lower limit 0.834418. The limits do not include zero. Consistent with this, Ho have to be rejected. The 95% confidence limits about the regression line are listed at the observed points below: j 1 2 3 4

Xj

yj

Syj

Lower limit

Upper limit

oC

mm 0.2 3.1 6. 9.0

mm 1.4 1.0 0.8 0.8

mm -3.6 0.3 3.8 6.7

mm 4.0 6.0 8.3 11.2

0 5 10 15

187

László Orlóci 5 6

20 25

11.9 14.8

1.0 1.4

9.0 11.0

14.7 18.6

The limits at a typical point j=2: the lower limit 3.12857 - 2.776(1.02991) = 0.26954 or 0.3 mm and the upper limit 3.12857 + 2.776(1.02991) = 5.98760 or 6.0 mm. These correspond to

 1 (5  12.5)2  Sy2  3.59275     1.02991 mm  6 5(87.50) 

13.2 Response type a plane Consider the case of p independent (factor) variables X1, ..., Xp. The assumptions are the same as before, except that the magnitude of response Y is now conceived as the sum of p + 2 independent terms Yj = o + 1X1 + 2X2 +... + pXp + j. The equation to be fitted to the data is y= bo + b1X1 + b2X2 + ... + bpXp. The b coefficients are estimators of the unknown coefficients. The latter are called the partial regression coefficients. A given bh indicates the rate at which changes in X h indicate variation in variable Y. We begin the analysis by calculating the deviations:

 X11  X1 X12  X1  X X X X x   21 2 22 2  . .   X p1  X p X p 2  X p

y  Y1  Y

Y2  Y

... X1 n  X1 



... X 2n  X 2 

  ... X pn  X p 

...

.

... Yn  Y 

We now define the regression coefficients and the intercept. We use matrix algebraic expressions for this. We need S = XX' which is a cross products matrix with a characteristic element n

Shi   xhj xij j 1

in which xhj  X hj  X h and xij  X ij  X i . The inverse of S symbolically S1 is also needed. The values of b1, b2, ..., bp are the first, second, ..., pth elements in 188

Statistical Ecology

 b1  b  2 b     S 1 xy ' . . b   p The intercept is then p

bo  Y   bh X h . h1

An estimated response corresponding to the p- valued observational vector Xj is

y j = Y + b'x j . The significance of the partial regression coefficients is tested simultaneously by criterion

F

SR2 SE2

.

The terms,

FR2 

QR p

and SE2 

QE n  p 1

in which n

QR  (b ' x)y ' and QE  (Yj  Y )2  QR . j 1

The squared multiple correlation coefficient of Y and the set of X variables is R  2

QR Q

.

It is a measure of the proportion of variation in Y accounted for by regression. The 1 -  limits on the hth regression coefficient are defined by * bh  t /2;np1 SE2 Shh* in which Shh represents the hh diagonal element in the 1

inverse matrix element S . The limits for o are given by bo  t /2;n p 1 SE2

 1  X ' S 1 X   n 

The confidence limits on regression y at a given Xj are defined by 189

László Orlóci

y j  t /2;n p 1 SE2

If

1

is replaced by

 1  x , S 1 x  j .  n j 

n1

, the limits will apply to individual predictions y. n n It will be difficult to interpret the regression coefficients, unless they are given in standard units: bh

Shh SY2

. This is the number of standard unit

changes in Xh corresponding to one standard unit change in Y, when all other X variables are held constant. It is noted further that the rejection of Ho in the global test (based on F) indicates the need for further testing to identify the regression coefficients which contributed significantly to rejecb tion. For this test, we use the criterion t  . Shh1 SE2 h

Example 13.2.1 Photosynthetic rate was measured on a clump of the moss species (Dicranum scoparium) under the forest canopy. Leaf photosynthetic rate is measured by enclosing a leaf in a closed, transparent chamber and measuring the decrease in carbon dioxide concentration. It was hypothesized that photosynthetic rate (variable Y) is a linear function of temperature (X1) and light level (X2). The following contains the observational data: Temperature X1 oC

5 7 8 10 10 8

Light level X2 nE cm-2sec-1 5 30 10 10 30 5

Photosynthetic rate Y mg CO2 cm-2 hr-1 1 6 2 3 10 2

The deviations from the mean of Y are y =[-3 2 -2 -1 6 -2]. The inverse of the S matrix

S-1 =  -0.003077 0.0015824  0.0615385 -0.003077

. The regression coefficients and intercept come out to: p 0.47692  b and bo  Y   bh X h  3.42198 .  0.24044  h 1

The fitted equation is y= -3.42198 + 0.47692X1 + 0.24044X2 which describes a plane in threedimensional space. The fitted points and sum of squares are y=[0.17 7.13 2.80 3.75 8.56 1.60], QR = 52.589, Finally, the ANOVA table is: 190

Q = 58.000,

QE = Q - QR = 5.411 .

Statistical Ecology Variation explained Model

Sum of squares 52.589

Degrees of freedom 2

Mean square 26.295 1.804

Error

5.411

3

Total

58.000

5

F 14.576

Considering that F > [F0.05; 2,3 = 9.55], Ho is rejected. We conclude that the population value of the slope of the plane is not zero. That the plain fits closely the data is indicated by R2 =

52.589 58.000

 0.907.

To test the contribution of the individual variables to the rejection of Ho, we rely on t=

0.47692 0.06154(1.804)

= 1.43152 for b1 and t = 4.5006 for b2.

Since t.025;3 = 3.182, we conclude that only b2 contributes significantly to the overall regression. However, to conclude that temperature does not influence photosynthetic rate would go against basic understanding of biology. The conclusion should rather be that this specific data set is not sufficiently sensitive. The 95% confidence limits for b1 are 0.47692 ± 3.182 0.061538(1.804) or equivalently -0.5832 and 1.537. Similar calculations give us the limits for b2: 0.07044 and 0.41044. Since the limits for b1 enclose zero, we conclude that b1 is not significant. Conversely, the limits for b2 exclude zero, indicating significant b2. The 95% confidence limits for the intercept: 1 bo -3.422 ± 3.182 1.804   3.556  5 

or equivalently -11.704 and 4.86. The limits for the yj are listed in the table below. These are points on the 95% confidence surfaces on two sides of the fitted regression plane. j 1 2 3 4 5 6

X1 5 7 8 10 10 8

X2 5 30 10 10 30 5

yj 0.16 7.13 2.80 3.75 8.56 1.60

Lower -3.40 3.61 0.86 0.69 5.29 -0.84

Upper 3.72 10.64 4.74 6.81 11.83 4.03

13.3 Response type a polynomial The simple response models described earlier assumed that Y is responding linearly. But as it often happens, the response is not linear. The nonlinear relationship may take the form of a quadratic or higher order polynomial, a logarithmic function, and even an exponential curve. Indeed, any 191

László Orlóci

number of nonlinear relationships may be envisaged. Fitting polynomials of degree p is considered in this section. In the polynomial model of degree p, the response is a sum of p + 2 terms, Yj = o + 1X + 2 X 2 + ... + p X p + j. The fitted equation is yj = bo + b1X + b2 X 2 + ... + bp X p . This has the same form for computational purposes as the fitted equation described in Section 13.2. The data set differs in that it contains the original observations and the powers up to p,

 X1 X2 ...  2 2 X1 X2 ... X  . . ...  p p  X1 X2 ...

Xn 



X n2 

and Y = [Y1 Y2 ...Yn ] .

. 



X n2 

The deviation matrix x is obtained by subtraction of the means of the different powers of X from the elements of X. The deviation matrix y is obtained by subtraction of the mean of Y from the elements of Y. Example 13.3.1. The growth rate (gain in dry weight after eight weeks of treatment) of the common bog composite Aster nemoralis was measured at different pH levels. The data set below was obtained: Soil pH

Dry weight gr

Xj

Yj

3.0

37

3.5

42

4.0

45

4.5

48

5.0

45

5.5

41

A hypothesis suggests quadratic relationship of X andY: y = bo + b1X + b2X2. The deviations matrices are

[ -1.25 -0.75 -0.25 0.25 0.75 1.25 ] and y = [-6.0 -1.0 2.0 5.0 2.0 -2.0 ]

x = -9.79 -6.54 -2.79 1.46 6.21 1.46

Working with the polynomial model, we need

S =  37.1875 318.427  , 4.375 37.1875

S-1 =  -3.643 0.4286  , 31.193 3.643

192

 -50.200 

b =  43.721 

 -4.929 

Statistical Ecology The regression equation is y= -50.2 + 43.721X - 4.929X2. The sums of squares are: Q = 74.00, QR = 71.307 and QE = 2.693. Observing that F 

SR2

= 35.654/0.898 = 39.72 and that the

SE2

critical probability point F.05; 2,3 = 9.55, Ho should be rejected. We regard the quadratic model significant. The coefficient of determination R2 =

71.307 74.00

= 0.964 indicates the close

fit of the model. Considering the relative importance of b1 and b2, t1 = 8.26 and t2 = -7.95, both are significant. Limits Value

Lower

Upper

bo= b1 =

-50.20

-84.97

-15.425

43.72

26.88

60.56

b2 =

4.93

-6.90

-2.95

The 95% confidence limits on the b coefficients are listed in the first table below. The fitted values y and their 95% confidence limits are in the second table. pH

Dry weight

Dry weight

j

Xj

Yj gr

yj gr

Lower limit gr

Upper limit gr

1 2 3

3.0 3.5 4.0

37 42

36.61 42.45 45.83

33.87 40.78 43.99

39.34 44.12 47.67

4

4.5

46.74

44.91

48.58

5

5.0

45.19

43.52

46.86

6

5.5

41.18

38.45

43.91

45 48 45 41

13.4 Response type a product or exponential The general response model is

Y  oe X . 1

We linearize this in the manner of

Y *  o*  1 X   * . The following substitutions were applied: Y *  lnY , o*  ln o , and

 *  ln . From this point on the standard methods of linear regression are followed as in Section 13.2. Example 13.4.1 Phytoplankton cells were counted (Y) within unit volume in an experiment over 5 consecutive day (X). The proposition to be tested is that population growth tracks the exponential model y *  bo*  b1 X . The data set: Day X 1 2

Cell count Y 10 14 193

Y* = ln Y 2.3026 2.6391

László Orlóci 3 4 5

25 40 72

3.2189 3.6889 4.2767

The analysis of variance is outlined in Table 13.4.1.1. The regression is significant, since F> [F.05;1,3 = 10.1]. The observed Y values, regression estimates y, residuals, and confidence limits are listed in Table 13.4.1.2. Regression coefficient and intercept statistics are listed in Table 13.4.1.3. The coefficient of determination is R2 = 0.99. Computer printout is copied directly without rounding. Table 13.4.1.1 Source of

Sum of

Degrees

Mean

variation

squares

of freedom

square

Model

2.49800

1

2.49800

Residual

0.01741

3

0.00580440

Total

2.51541

4

F

430.363

Table 13.4.1.2 X

Y

y

Residual Residual %

95% confidence limits

95% confidence limits

on regression y 0.07696 3.3423087

on individual predictions y*

1

2.3026

2.22564

2.0378315

2.4134485

1.91895

2.53233

2

2.6391

2.72544 -0.08634

-3.27157

2.5926393

2.8582407

2.4489934

3.0018866

3

3.2189

3.22524 -0.00634

-0.196962

3.1168087

3.3336713

2.9596387

3.4908413

4

3.6889

3.72504 -0.03614

-0.979696

3.5922393

3.8578407

3.4485934

4.0014866

5

4.2767

4.22484

1.2126172

4.0370315

4.4126485

3.91815

4.53153

0.05186

Table 13.4.1.3 Value

SE

t

95% confidence limits

bo*

1.72584

0.079905175

21.59860097

1.47154607

b1

0.4998

0.024092317

20.7452028

0.423127495 0.576472505

1.98013393

P(t RND;3  t ) 0.00022 0.00024

13.5 Working with residuals When we fit a response function Y = ƒ(X) to the data set, we assume that the model has been correctly chosen. In many biological applications, however, the correct model is not known. The incorrect choice is likely to result in a larger error mean square. One method for minimizing this involves fitting different regression functions to the data, and choosing the function for which the error sum of squares is minimal. But this may not work. Some high order polynomial will reduce the error mean square, yet the information conveyed about the relationship of Y to X may have little biological credibility. Alternatively, we may examine the residuals, i.e., the deviations of the observed values from the fitted regression Y=Y-y. The shape of the joint 194

Statistical Ecology

scatter of X and Y may shed light on the nature of what the regression equation missed. As a rule, a correctly chosen model is indicated by a random scatter of residuals, while an incorrectly chosen one will exhibit trends. Example 13.5.1 Consider the data set which has already been analyzed in Example 13.3.1. The best fitting linear equation for this data set is y = 35.23 + 1.829X. The original values, estimates, and residuals are in the table below: # 1 2 3 4 5 6

Y 37.0 42.0 45.0 48.0 45.0 41.0

y 40.71 41.63 42.54 43.46 44.37 45.29

Y 3.71 0.37 2.46 4.54 0.63 4.29

X 3.0 3.5 4.0 4.5 5.0 5.5

Observing that [F = 0.986] < [F.05;1,5 = 6.61], we conclude that the linear regression is not significant. Dispersion of residuals Y around the fitted regression line is shown as a function of X in Fig. 13.5.1.1. The shape of the Y scatter suggests non-linear response components, verifying that the non-linear regression model used in Example 13.5.1 is appropriate.

Figure 13.5.1.1

THE QUESTION of how to choose an appropriate response function and how to fit it to the data is a main topic of applied regression analysis. Another is how to determine significance and what to do with the residuals. The point was made that the choice of the best fitting model may be a matter of trial and error, but the “goodness of fit” is not a reliable criterion. The reason is that there is at least one Chebyshev polynomial of some high order that will give a 100 % accurate fit.

195

László Orlóci

Chapter 14 CHARACTER ANALYSIS: IMPORTANCE A POSTERIORI We now turn to the question of importance related to the potential influence of variables in a given set on the outcome of an analysis. This is to say we want to measure importance a posteriori in a pilot data set before we finalise the contents of the definitive character set. We do this by measuring how much the individual variables contribute to the complexity of structures that the entire set of variables defined. The methods isolate the specific and common components of the total sum of squares, cross products, entropy, information, and related physical quantities in the pilot sample.

14.1 Multiple correlation We begin with a p x p covariance matrix S, a typical element of which is such as Shi 

1 n 1

n

(Xhj  Xh )(Xij  Xi ) . j 1

The partition of the variance is sought such that Shh = Shhs + Shhc , of which Shhs is specific to variable h and Shhc is shared in common by variable h with the other p - 1 members of the variable set. As seen from

196

Statistical Ecology

1

Shhs 

* Shh

,

the specific component is the one fraction of the hh diagonal element of the inverse of matrix S. The shared component is easiest to obtain by subtraction: Shhc = Shh – Shhs. The ratio Rh2 

Shhc Shh

is the squared correlation coefficient of variable h and the other p-1 variables of the set. We can order the variables by the Rh2 values as weights. However, for best results the ordering should be based on residuals. The following is the algorithm : 1

(a) Compute S and its inverse S . (b) Find 2

R12 , R32 , ..., Rp (c) Find the highest value

W1(m)  SUP(R12 , R22 ,..., Rp2 ) . This is the weight measuring the posterior importance of variable m. Function SUP selects the highest value. (d) Compute the first residual of S according to (1) Shh  Shi 

Shm Sim

.

Smm

(1) (e) Given the specific variance Shh  Shhs (constant in the analysis), compute the quantity the correlation

Rh  2(1)

(1) Shh  Shhs (1) Shh

.

(f) Find the highest value

W2(u )  SUP( R12(1) , R22(1) ,..., R 2(1) p ) and declare it as the weight of a posteriori importance of variable u. Repeat the steps by finding the next residual of S, the next set of residual 197

László Orlóci

correlations, and the next highest value until all p variables are given their weight. The final order is

W1( m ) > W2(u ) > ...> W p( z ) . Note that some of the variables may end up having a zero weight. This is normal when the rank of S is less than its order. Should one wish to reduce the number of variables from p to k to create a parsimonious set, the p-k variables with the lowest a posterior weights are discarded. The information loss is proportional to the fraction of two sums, p

Wi

(*)

Wi

(*)

i k p



Sum of the k smallest weights . Sum of all weights

i 1

The asterisk replaces a variable label. Considering the dependence of Rh2 on Shhc , the squared correlation Rh2 will indicate the degree to which variable h is redundant with respect to the other variables in the set. It should be noted that Rh is conceptually related to the simple product moment correlation coefficient (Section 4.1.2). Both types measure linear dependence. Example 14.1.1 In this exercise we compute weights for the starfish variables of Example 4.1.2.2.1. The partial multiple correlation coefficient is the weight function. We need the covariance matrix and its inverse,

19.6667 13.8333 -1.5 -5.16667 S = 13.8333 16 -1.5 -5.16667 5.57143

  

0.183448 -0.203639 -0.139454 S-1 = -0.203639 0.315267 0.237538 -0.139454 0.237538 0.362222

  

The specific variances are found as the reciprocals of the values in the principle diagonal: S11s = 5.45114, S22s = 3.17191, S33s = 2.76073 . These values do not change in the analysis. We have the first set of intermediate results: Variable 1 2 3

Shh 19.6667 16.0000 5.57143

Shhs

Shhc

Rh2

5.45114 3.17191 2.76073

14.2155 12.8281 2.81070

0.723 0.802 0.504

Variable 2 (Pisaster ochraceus counts) has the highest weight (0.802). This defines the shape of the first residual covariance matrix: 198

Statistical Ecology

7.70659 0 2.96701 0 0 S(1) = 0 2.96701 0 3.90303

  

.

A typical element of this is S13  S13  (1)

S12 S32 S22

 -1.5 -

13.8333(-5.16667) = 2.96701 16

A second set of intermediate results are obtained from S(1), Variable

(1)

1

Shh 7.70660

3

3.90303

5.45114

Shhc 2.25546

Rh2(1) 0.293

2.76073

1.14230

0.293

Shhs

(1)

The weights are the same R12(1)  R32(1) .

14.2 Specific variance The previous section already used the specific variances Shhs 

1 *

Shh

* to find the common variance components. Shh is the hh element of the in-

cers matrix S-1. Shhs as a weight emphasizes the information carried individually by the variable h independently, not a portion of the variables’ covariance structure. When used as the weight criterion, the information carried by the covariance structure is lost. Example 14.2.1 The specific variances for the starfish data are given in Example 14.1.1. On this basis highest weight is carried by variable 1, and lowest by variable 3.

14.3 Sum of squares The computations proceed as in Section 14.1, but in this case S is defined as the sum of squares and products matrix. The weight criterion is the highp

Shi2

i h

Shh

est sum of squares Qh   ( m)

m if W1

and the maximum weight occurs on variable

 SUP(Q1 , Q2 ,..., Qp ) . In order to obtain subsequent weights W2(u)

(z)

to Wp we need the subsequent residuals of the S matrix, at most p, S(1) , S(2) , ... , S( p1) 199

László Orlóci Residuals are obtained by decomposition of sums of squares and products (elements in S) in a strategic way to obtain coordinates for objects. Suppose that we take character m as our first axis and we wish to find coordinates (scores) on this axis for character h and i. In terms of the elements of S the length of the position vector of character h is same for characters i and m are

Smm . Knowing that

Sii and

cos  hm 

Shh and the

Shm

Sim

and cos  im 

Shh Smm

Sii Smm

, scores for character h and i on the position vector of m are Yhm 

Shm Smm

Sim

and Yim 

Smm

The first residual of Shi is Shi(1)  Shi  YhmYim . Repetition of the steps yields the first residual matrix S(1) . The second residual matrix S(2) is derived from the first, the third S(3) from the second, and so on until all covariation were exhausted. The number of residual matrices is usually less than the number of variables (p) when p is large relative to the sample size n.

The result is an ordered set W1(m) >

(z) W2(u) > ...> Wp of weights indicative of the relative success of the individ-

ual variables to account for covariation in the set. Example 14.3.1 The starfish data of Example 4.2.2.1 is reanalyzed based on the sum of squares and cross-products matrix

118.0 83.0 -9.0 (n-1)S= 83.0 96.0 -31.0  -9.0 -31.0 33.4

  

.

The results are Variable

%

Rank

1

Weight 53.09

21.5

2

2

177.77

71.8

1

3

16.56

6.7

3

Total

247.42

100.0

Variable 2 (counts of Pisaster ochraceus) has ranks #1. It accounts for close to three-fourths of the total sum of squares. The sum of the weights (247.42) is the same as the sum of the sums of squares (diagonal elements in S).

14.4 Information The joint frequency of the states of two variables are recorded in n sampling units. The records are organised such as bellow in the contingency table: 200

Statistical Ecology

Variable 2 2 ...

1

Variable 1

s2

F2

1

f11

f12

...

f1s2

f1.

2

f 21

f 22

...

f 2s2

f 2.

. s1

. f s11

. f s1 2

... …

F1

f.1

f.2

...

.

. f s1s2

f s1.

f.s2

f..

A typical element fij represents symbolically the joint frequency of state i of Variable 1 and state j in Variable 2. The marginal distributions F1 and F2 have s1 and s2 elements respectively. Typical marginal totals are represented by fi . and f. j . The examples illustrate partitions into additive components of information divergences (Section 4.1.3): Source Heterogeneity (F1)

Information measured s1

I(F1 ;F 1 )   fi . ln

fi . s1 f..

I(F2 ;F 2 )   f j . ln

f j . s2

i 1 s2

Heterogeneity (F2)

f..

j 1

Mutual (F12)

s1

s2

I(F;Fo )   fij ln i 1 j 1

Joint (F1+2)

s1

s2

I(F;F)   fij ln i 1 j 1

fij f.. fi . f. j

fij s1 s2 f..

Heterogeneity is measured as a one-way information divergence (I-divergence, Kullback 1959) of the marginal distribution Fi from the most dispersed distribution Fi of the same number of elements, representing the distribution mean . (Section 4.1.2). Mutual information measures the divergence of F (joint frequencies) from Fo. The elements in the latter represent random expectations under fixed marginal totals. The joint information is a one way divergence but the standard is the grand mean of the table. The difference I(F ; F )  I(F ; F ) is the equivocation information. The terms are additive: o

201

László Orlóci

I(F ;F)  I(F1 ;F1 )  I(F2 ;F2)  I(F ;F o ) . The definitions are readily extended to cases in which the number of variables is greater than two. Using I(Fi ; Fi ) as a variables weight, it fixes how much of the total inforo

mation is attributable to variable i. When we use I(F ; F ) of a pxp contingency table, variable i is given specific weight equal to the residual o o ∆i = I(F ; F )| p - I(F ; F )| p  i

Symbol



indicates exclusion of variable i. Given

(1) m  SUP(1 , 2 ,...,  p ) , the largest among the p trial weights, ∆m, is the weight to be assigned to variable m. After removing variable m from the set, a new set of p-1 trial weights is computed according to ∆i = {I(F ; F )| p}  {I(F ; F )| p  i , m} o

o

to obtain u  SUP(1 , 2 ,...,  p1 ) the weight (second residual) of (2)

(1)

(1)

(1)

variable u. The analysis continues in this manner until one of two things happens: no more variables remains, the mutual information is totally exhausted. Example 14.4.1 The sequence in which the same five species of ruminants arrived at a watering hole on 80 different occasions is: Species A B C D E

Arrival sequences 2442132214

5222221334

2322412211

3212522224

3255342231

3452124415

2442225221

2241434544

1533451125

3333554253

3214141142

232111333

5322251352

5534435522

3313313112

3414551215

5321325553

2151113522

1533533335

5553231152

2544533125

2323313354

1251152335

1353325453

3214213332

1515332411

5455225523

4135354511

1133425513

1215252233

5125531553

5522212322

4155544441

4444445145

4141354454

1444445445

4411114444

4141541141

4534444444

4135143131

Note that the number of states for each species (variable) is 5, except for species E for which it is 4 (state 2 do not occur). There are 80 column and 5 rows of the scores table. A typical column, say 2 1 5 3 4, indicates that species B was the first to arrive, species A the second, 202

Statistical Ecology and so forth. Considering arrivals, the following is a relevant question: "How reliable is the arrival of one species to predict the arrival of the others?" To answer this question we need o

to partition the total mutual information I(F ; F ) into components specific to the species. But the partitioning should be additive based on residuals. Toward this objective we need two types of quantities to facilitate the computations, namely, the principal marginal frequencies and the joint frequencies. We have the marginal frequencies: Species Arrival sequence A B C D E

1

2

3

4

5

12 18 14 17 19

33 14 14 19 -

10 23 25 17 5

17 9 4 6 44

8 16 23 21 12

The joint frequencies are Column vector Observed Column vector Observed of joint states frequency of joint states frequency 1 21534 4 10 23514 4 2 45321 8 11 25134 2 3 43215 4 12 32541 4 4 23145 2 13 21354 8 5 14325 6 14 24351 3 6 35214 6 15 41523 5 7 12534 6 16 51234 1 8 53214 3 17 52431 4 9 23154 10 We note, the joint distribution is nominally 5 dimensional with 4 x 54 = 2500 cells involved, only 5! =120 of these could possibly materialize as an observational vector of 5 joint states. Of the possible 120, only 17 cases materialize. Simultaneous arrivals were not recorded. The heterogeneity information is based on the marginal frequencies. A characteristic element is 12(5)

I(FA ; X A ) = 12 ln 80

33(5) 10(5) 17(5) 8(5) + 33 ln 80 + 10 ln 80 + 17 ln 80 + 8 ln 80 = 11.2225.

Should the weights be based on the heterogeneity information, species E will be given the highest weight 20.6562. The complete set is tabulated: Species i

Heterogeneity

I(Fi ;Fi )

A B C D E Total (TH)

11.2225 3.4192 10.2199 5.1520 20.6562 50.6698 203

László Orlóci Clearly the order established by the I(Fi ;Fi ) from highest to lowest is:

I(FE ; XE )  I(FA ; X A )  I(FC ; XC )  I(FD ; XD )  I(FB ; XB ) . We now show how to compute weights based on the species’ contribution to the total muo tual information I(F ; F )| p in the set of p variables. In the process we also identify weights

by additive partition of the joint information I(F ; F )| p . To begin with we face a decision to resolve the dilemma of what to do with 2500-120 cells o for which the score has to be a 0 a priori. In other words, should we compute I(F ; F )| p

and its components, and I(F ; F )| p and its components based on the 2500-cell setup or limit the problem to the 120 cells. The question phrased specifically in the context of the joint information I(F ; F )| p , should the user go for the grand mean F = 80/2500 or F =80/120? If the data structure embedded into the 2500-cell 5 dimensional table is important for the user to be retained within the 5 dimensional analytical space, he or she will decide to opt for F = 80/2500 and define the joint information of the p=5 species such based on the sum

I(F ; F )| p = 4ln

4 80 / 2500

 8ln

8 80 / 2500

 ...  4ln

4 80 / 2500



409.047

o Based on this, the mutual information of the 5 species is I(F ; F )| p = 409.047 - 50.6698 = 358.3772. In the next step, we compute the first residual of the joint information and also the residual of the total heterogeneity TH | p  i conditional on the changing species identity i:

Species omitted i A B C D E

Residual joint information

Residual heterogeneity

I(F ; F )| p  i

TH | p  i = TH | p - I(Fi ; Fi )

280.292* 280.292 280.292 280.292 298.144

39.4472 47.2505 40.4498 45.5177 30.0136

Each entry in the second column is based on p-1 active variables with the ith omitted from the set. The third column entries are found by subtraction. For example in the case of B, the entry 47.2506 is the difference 50.6698 - 3.4192. o Regarding I(F ; F )| p  B , we have 17 vectors of 4 joint states with identical frequencies as the original 5 joint states. For example, when species B is removed, we have:

Joint states

Frequency

2534 4321 . 5431

4 8 . 4 204

Statistical Ecology Because of these, the joint information will depend on the product of the states of the variables involved. This is 500 or 625 whether E is included or excluded in the 4 variable sets. Thus we have I(F ; F )| p  B  4ln

4 80 / 500

 8ln

8 80 / 500

 ...  4ln

4 80 / 500

 280.292

for the set without B and I(F ; F )| p  E  298.144 for the set without E. o o Now turning to the I(F ; F )| p  i components of I(F ; F )| p = 358.3772, the trial weights accord with

∆i= I(F ; F o )| p

- I(F ; F o )| p  i :

Species omitted i

I(F ; F o )| p  i

∆i

A

240.845

117.532

B

233.041

125.335

C

239.842

118.535

D

234.774

123.603

E

268.131

90.247

For example,

I(F ; F o )| p  B = 280.292 - 47.2506 = 233.0414 and

∆B = I(F ;F o )|p - I(F ; F o )| p  B =358.377 - 233.0414 = 125.3346 . Since the largest ∆ among the 5 trial weights is ∆B, species B is given 125.3 as its weight. In the next step, species B is removed from the set and the analysis is repeated on the 4 remaining species (A,C,D,E). For this reduced set I(F ; F o )| p  B, D = 122.640

and

∆B = I ( F ; F o ) | p  B - I(F ; F o )| p  B, D = 233.0414 - 122.640 = 110.4014 . This happens to be the largest ∆ among the trial weights for the reduced set of species, so species D is given the weight 110.4. Removal of species D from the data leaves three species in the analysis for which the calculations proceed in a similar way. Eventually, we have the final weights in mutual information terms: Species i

∆i

%

B D A C and E

125.3 110.4 94.8 27.8

35 31 26 8

205

László Orlóci I(F ; F o )| p

358.3

Note the sum of the 4 terms in the table (358.3) is the mutual information in the 5-species set. The ∆i are weights indicative of the relative importance of the individual species in predicting the arrival sequence. The results suggest: (1) When all species are considered, species B is the best predictor of the arrival sequence. It is most often the third to arrive, often after species A and most frequently before species E. (2) When species B is removed from consideration, species D comes to some prominence as a predictor of the arrival sequence. (3) Species C and E are the least reliable predictors. (4) Pair-wise comparisons may refine the predictive power of the individual species for the other species taken individually.

14.5 Weighting variables: a discussion Weights applied in the manner of WX are likely to modify the influence of variable X in the outcome. When the analysis is concerned with the properties of biological populations, without a priori specification of the character set, the problem of which to choose from a potentially large number of characters will have to be faced. Character weighting can supply much helpful information: Weights suggest potential. Example 14.4.1 is a case in point. The weights indicate how well the states of one character can be relied on in predicting the states of the others. Weights suggest order. The order established is indicative of the characters' relative importance in the context of the weighing function. Weights suggest distorting power. Characters analyzed simultaneously may influence the results unequally. We can say that weights serve as indicators of the degree of distortion that would result if characters were selectively discarded. WE DETERMINED weights by which we could establish order of importance in the character set. We reasoned that any weighting function has potential if it facilitates partitioning of the total criterion into additive components. We gave examples for the main types and refer to additional ones in works by Williams, Dale and Macnaughton-Smith (1964), Orlóci (1973, 1978 p. 25 et seq., 1976, 2008), Rohlf (1977), Jancey (1979,1986), Dale (1986), and Pillar (2009).

206

Statistical Ecology

Chapter 15 EXPLORATION OF COMPLEX DATA The preceding chapters emphasized descriptions and comparisons based on methods which often required assumptions regarding the underlying distributions within a well-defined factor space of measurable components. The present chapter offers thoughts on data analysis when the sampling environment is as given by nature with no controls over the type or the intensity of the factor effects. The objective is definitely exploration. What to explore in the medium exactly will depend on our views of the medium. How to do the exploration? The common approaches rely on ordination, cluster recognition, and identification techniques.

15.1 Multidimensional or multivariate X is the symbolic representation of a data set we wish to explore:

 X11 X 21 X  . X  p1

X1 n 

X12

...

X22

...

X 2n 

.

...

. 

X p2

...

X pn 

207





László Orlóci

Response variables are assumed, p in total. The record for these appears as the rows of the data. The n columns (relevés) give the descriptions of the sampling units (individual organisms, sample plots, quadrats) on which the variables were measured. Notwithstanding the two-dimensional presentation of the records, the mathematical relationships among variables or relevés in terms of some distance or similarity measure may have intrinsic dimensions up to the lesser of p or n-1. Example 15.1.1 An investigator running an elevation transect up on a mountainside, locates sample plots at 200 meter intervals. He records the number of trees per plot (Pl 1 to 20) for the four leading dominant species A, B, C. D: Low

ELEVATION

High

Pl

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

A

40

45

50

47

41

35

25

20

15

13

10

7

5

2

2

2

B

10

16

22

37

40

45

42

40

36

30

25

20

16

13

7

3

0

0

0

0

2

1

0

C

0

2

5

11

13

17

25

31

41

51

58

60

51

40

35

0

28

25

20

14

D

0

0 1 11

1

1

2

5

8

12

15

19

26

35

52

60

10

66

60

49

36

30

This is a typical multivariate data set. There are 4 response variables. In the specific case of the example, the investigator is interested in the quantification of the coenocline (compositional gradient) as a first step of a search for the factors which force the elevation related pattern onto the vegetation. If such a coenocline is found existing, identification of the causes is a logical next step, which would lead to a branching out of the research in different directions or even to new data collection. Strong hypotheses would no doubt be generated about the functioning of species and the plant community, and the research could enter a new phase highlighted by controlled experiments.

15.2 Views of the medium The users' objectives and viewpoints certainly influence method selection, and possibly, restrict the scope of the analysis. In some cases, the scope is confined to include only statistical methods of estimation and hypothesis testing. Such a restriction of scope is, however, very much out of step with many applications of data analysis which find utility in the use of a broad range of non-statistical exploratory methods. These may be novel in having no precedent in statistical applications or simply apply the mathematical barebones of the conventional statistical methods. The problem of narrowing method selection is real and the choice is highly personal. 208

Statistical Ecology

One’s methodological choices are very much limited by one's intuitions about the true nature of the medium, such as for example the vegetation. Many conceive this medium as one with an inherent group structure. For them, multivariate analysis is analogous to automated group recognition with the purpose to erect a syntaxonomy, to establish order out of chaos, to enhance manageability by type casting, to promote communications, and so forth. Example 15.2.1 Those holding the view that the vegetation is a structured (discrete) medium will consider classification, and possibly subsequent ground mapping, as a main objectives of vegetation studies. The most ambitious may even attempt to construct hierarchical systems in form not unlike the existing hierarchies of the plant and animal Kingdom. In a concrete case, described by Frydman and Whittaker (1968), the syntaxonomy contains four hierarchical levels (association, alliance, order, class): Class Oxycocco-Sphagnetea (Raised Bogs)

Class Querceto-Fagetea (Oak-Beech Forests)

Order Ericeto-Ledetalia Order

Order Fagetalia silvatica

Alliance Oxycocco-Ericion

Alliance Alno-Padion

1. Association Sphagnetum medii

6. Association Circaeo-Alnetum

Class Vaccinio-Piceetea (Treed Bogs)

Alliance Carpinion

Order Vaccinio- Piceetalia Alliance Vaccinio-Piceion

7. Association Querceto-Carpinetum Medioeuropaeum

2. Association Pineto-Vaccinietum uliginosi

Alliance Fagion

3. Association Pineto-Vaccinietum myrtilli

8. Association Fagetum carpaticum

4. Association Abietum polonicum

Order Quercetalia pubescentis

Class Alnetea glutinosae (Wet Alder Woods)

Alliance Quercion pubescentis sessiliflorae

Order Alnetalia glutinosae

9. Association Querceto Potentilletum albae

-

Alliance Alnion glutinosae

10. Association Coryleto Peucedanetum cervariae.

5. Association Cariceto elongatae-Alnetum

Undoubted, such a classification has practical utility. Beyond this, the fact of a successful classification will not be sufficient as the proof that a natural group structure or hierarchy actually exists Those objecting to the distinct group structure hypothesis may opt to analyze the medium as if it were truly continuous. Data analysis then becomes identical in purpose with the discovery of continuous trends and the identification of the influential factors to which the trends manifest a response. Figure 15.2.1 gives an example. In this the 10 associations of Example 15.2.1 are displayed. Directions are suggested in which soil fertility and wetness increase in the environment. The distances between the points reflect compositional differences. Clearly, this type of graphical representation tends to emphasize continuity, without proving that continuity actually exists. 209

László Orlóci Example 15.2.2 Consistent with the vegetation continuum hypothesis, the idealised Whittaker-Groenewoud type performance Y of species populations is represented in Figure 15.2.2 by continuous curves on the X axis. The interpretation of such graphs is ambiguous, since the shape of demonstrated response may reflect influences from sources other than the factor variable X.

Figure 15.2.1

Figure 15.2.2

15.3 Broad objectives We can identify at least three main objectives for exploratory data analysis: Ordination. Pathways are sought along straight lines or curves in the space of the joint scatter of response variables or their transformations. Clustering. Groups of objects are sought to be used as reference taxa or types. Identification. The class affiliation of unidentified units is sought based on affinities to potential parental classes. Example 15.3.1 A case involving two variables X1,X2 is presented in Figure 15.2.1. The points represent vectors of multivariate responses. The shape of the point cluster is characteristic of the variables' joint response to underlying influences. Axis Y identifies the most parsimonious pathway through the point cloud. Example 15.3.2 Consider the scattergram in Figure 15.3.1. The joint dispersion of two response variables X1 and X2 is displayed. The points represent the units on which responses were measured. The distance of any two points in full dimensions is the basis for cluster recognition. Given the initial point configuration (A), (B), (C), (D), (E), (F), (G), (H) the clustering process begins by pooling the two nearest neigbour points (D,E). The fusions then continue in passes, involving the next nearest pair, and the next, until all points are made members of the same cluster. The passes: Pass 1 (A), (B), (C), (D,E), (F), (G), (H) Pass 2 (A), (B,C), (D,E), (F), (G), ( H) Pass 3 (A,B,C), (D,E), (F) , (G), (H) Pass 4 (A,B,C), (D,E), (F,G), (H) 210

Statistical Ecology Pass 5 (A,B,C,D,E), (F,G), (H) Pass 6 (A,B,C,D,E), (F,G,H) Pass 7 (A,B,C,D,E,F,G,H)

The fusions are mapped in a classification tree called dendrogram (Figure 15.3.2). The fusion levels signify observed dissimilarities. The horizontal spacing of A to H is arbitrary.

Figure 15.3.1

Figure 15.3.2

Example 15.3.3 In this Figure 15.3.3.1 presents the graphical solution of an identification problem. X1 and X2 are the classificatory variables, and clusters A and B denote potential parental populations. The unidentified unit I is assigned to the population to which it shows highest affinity. An assignment need not be made if the affinities are low.

Figure 15.3.3.1

CONJECTURES concerning the nature of the sampled medium were discussed. These lead us to the recognition of three analytical approaches: ordination, cluster analysis, and identification. These are presented in detail in the following chapters.

211

László Orlóci

Chapter 16 CONTINUOUS COMPLEXITY The family of techniques to be discussed is called “ordination”. The objective with these is the detection of trends in response variation and covariation, and their parsimonious description. The trends of interest are those that have causes other than pure random variation.

16.1 Two transformations The term ordination was originally coined by Goodall (1954) to describe methods which arrange objects, such as vegetation relevés in his case, on axes as points based on the objects resemblances. Ordinations involve two basic transformations applied in sequence: Raw data --- 1 → Resemblance values --- 2 → Ordination co-ordinates (p x n)

(n x n or p x p)

The raw data set has p variables and n sampling units:  2.7577 0.3935 0.5489   0.7788 0.6653 0.2445     0.7104 0.9713 0.5058  Y'   0.1696 1.2943 0.0137   1.9040 1.7275 0.1523     2.8344 0.1898 0.7811   3.4861 0.9996 0.4411    212

(n x t or t x n)

Statistical Ecology

The transformation 1 crates a resemblance matrix of p x p or n x n elements. Resemblance expresses closeness or separation. We described suitable resemblance functions in Chapter 8. The transformation 2 involves operations on the resemblance matrix. The outcome is a new description of the sample by t x n co-ordinates. Ordination methods of particular utility are those in which 2 yields parsimonious descriptions (t  n or p whichever is smallest). By definition, correlated variables are not parsimonious. Figure 16.1.1 illustrates a case in which the shape and the orientation of the point cloud reveal that X1 and X2 are linearly correlated. By the same criteria, axes Y1 and Y2 are uncorrelated. This means that their covariance value is zero. Axis Y1 is oriented in the direction of maximum variation and Y2 in the direction of maximum residual variation (variation unaccounted for by Y1.

Figure 16.1.1

16.2 Component analysis One of the most frequently used (and misused) ordination methods is H. Hotelling's Principal Components Analysis (Hotelling 1933). PCA incorporates linear constraints, including linear correlation of the variables, zero covariance between ordination axes, and linear transformations to obtain the ordination co-ordinates.

16.2.1 First transformation 1 This is applied directly to the p x n raw data matrix X (Section 15.1) to obtain a p x p covariance matrix S. It is important that component analysis extracts information directly from matrix S totally linked with the linear components of total response. Non-linear variation is discarded. How come only the linear components of variation matter? Consider the mean response graphs of variables X1,X2 in Figure 16.2.1.1. 213

László Orlóci

The expected joint scatter is a straight cluster such as in (Figure 16.1.1). This is the type of point cluster which S can effectively describe. We emphasize, to obtain this type of cluster the responses of the variables must be linear (Figure 16.2.1.1). Should the responses be non-linear such as in Figure 16.2.1.2, the cluster would take on a more complex shape. The horseshoe-shaped graph of mean responses in Figure 16.2.1.3 indicates this. When the point cluster is not straight, S is suboptimal, and component analysis is ineffective.

Figure 16.2.1.1

Figure 16.2.1.2

Figure 16.2.1.3

16.2.2 Second transformation 2 This transformation is specified by the equation p

Yij   bhi Ahj . h1

This rotates the axes but leaves the relative placement of the points unaffected. The symbols are defined as follows:

Yij -- component score (co-ordinate) of unit j on PC axis i. Ahj 

Xhj  Xh n 1

-- a transformation of the co-ordinates to the reference sys-

tem’s centroid (centre of gravity). The division by n  1 has the effect of

214

Statistical Ecology

passing from the sums of squares and products to the variance covariance in the transformation .

X h -- mean of variable h. bhi -- the ith component coefficient (direction cosine) of variable h. The p

component coefficients are such that  bhi  1 for any component i. The 2

h1

n

component scores (co-ordinates) are such that

YhjYij  0

for any pair of

j 1

component h and i. Important to note that the above condition does not exclude possible higher order correlation of the components.

16.2.3 Algorithm It can be seen that the immediate computational problem is to obtain numerical values for the component coefficients B. This is a task for the matrix algebraic method of Eigenanalysis: (1) Obtain matrix S and find its Eigenvalues t > 0. Note that Eigenvalues are solutions of the determinantal equation |S - = 0. In this equation I is the identity matrix and  is a row vector of Eigenvalues. The Eigenvalues are generalized variances such that t

n

p

   S

i  Yij2 and

i

i 1

j 1

hh

.

h 1

The Y are component scores yet to be defined. The number of Eigenvalues is t ≤ INF(p,n-1). This number depends on the intrinsic (latent) dimensions of variation in S. The greater the difference p-t, the more successful is the analysis in supplying a parsimonious description of covariance structure in X.

(2) Extract the Eigenvectors of S. For each i there is an associated Eigenvector of p elements. The Eigenvectors are the column vectors in

 b11 b 21 B  . b  p1

b12 b22 . bp 2

...

b1t 

...

. 

... b2t 





... bpt 

The defining equation is the characteristic equation of S given by SB=B 215

László Orlóci

For our purposes the Eigenvectors have to be adjusted so that t

 bhi2  1 . h1

For each of the t components, there is a set of n component scores These are given as the row vectors of the Y matrix:

Y11 Y12 Y Y 21 22 Y  B'A   .  . Y Y  t1 t 2

... Y1n 

... Y2n 



. 

...



... Ytn 

Two main objectives are served by the component scores: variable reduction (p to t) and factor (E) identification. Whereas data reduction helps render complex sets of biological data more manageable, factor identification aids in the interpretation of biological variation. c. Component analysis performs either of these functions well only when the responses are linear. Any non-linear variation will be thrown into the grab bag labelled “random variation”.

16.2.4 Dimensions of the significant trended variation Of the t sets of component scores, only the first few may be associated with significant trended variation. To ascertain that the last k sets capture no distinct trends, we test the null hypothesis of equal expectation for the last k Eigenvalues: Ho E(t-k+1)=...=E(t). If found true, the point cluster in the subspace of the last k PCA axes is deemed hyperspherical. This indicates that in this subspace, the PCA axes are unlikely to to have captured other than chance variation. The test criterion is t

  (n  1)k ln 2



i

i t k 1

 (n  1)

k

t



ln i

i t k 1

(see Morrison 1976, pp.296). In this, n is sample size and 2 is has the theoretical Chi squared distribution with



k (k  1)  2 2 216

Statistical Ecology

degrees of freedom. The test is reliable provided that n is large, Ho is in fact true, the population distribution is multivariate normal, and sampling is truly random. If the last k sets component scores are discarded, the amount of variance thrown away will be proportional to L% = 100 - E%. The total variance retained is t k

E% 

100  i i 1

.

t

 i i 1

th

The efficiency of the i component in retaining variance is proportional to E% 

100i t

.

 i i 1

16.2.5 A complete example We use the starfish data of Example 4.1.2.2.1 to trace the steps numerically which we outlined in symbolic terms. Example 16.2.5.1 The starfish data (Example 4.1.2.2.1) has nominal dimensionality equal to three. The question is this: "How many components are needed to account for the total variation in the data?" These cannot be more than three and are likely to be less than three. The following are steps and results: (a) We start with the covariance matrix of species,

19.6667 -1.5

13.8333

-1.5 -5.16667 -5.16667 5.57143

S = 13.8333 16.0

  

(b) The Eigenvalues of S are obtained by solving |S - I| = 0.The solutions: 1= 32.5574,

2= 7.23559, 3= 1.44505 .

(c) The Eigenvectors of S are found by solving SBi = iBi for each i. There are three equations for each i. For i=1: -12.8907b11 + 13.8333b21 - 1.50000b31 = 0 13.8333b11 - 16.5774b21 - 5.16667b31 = 0 -1.5000b11 - 5.16667b21 - 26.9860b31 = 0 Solving for B'1 and normalizing so that B1' B1  1 , the component coefficients are 217

László Orlóci B1 = (b11 b21 b31) = [0.730256 0.662347 -0.167402]. Similar computations lead to numerical values for two more sets of component scores. The results are summarized in the table: Component i

Eigenvectors

Eigenvalues

i

%

-0.16740

32.6

79

-0.395516

0.749692

7.2

18

0.636889

0.640265

1.4

3

' 1

Β

0.730256

0.662347

Β'2

0.530593

Β'3

-0.430346

(d) The component scores (co-ordinates on the PCA axes) are computed by

Y =B' A . In this, A has elements

Ahj 

X hj  X h n 1

so that, 3.26600   2.04124 0.816497 0.816497 0.816497 0.408249 1.63299  A  1.63299 0.408248 0.408249 0.408249 2.04124 2.44949 1.63299     1.10810 0.524891 0.933139 0.933139 1.51635 0.116642 0.116642 

 2.7577 0.3935 0.5489   0.7788 0.6653 0.2445     0.7104 0.9713 0.5058  Y '   0.1696 1.2943 0.0137   1.9040 1.7275 0.1523     2.8344 0.1898 0.7811   3.4861 0.9996 0.4411    The first column of the Y ' matrix contains the component scores (co-ordinates) of the seven tidal pools on the first component axis. The second and third columns contain the component scores on the second and third component axes. Notice, there is in the summary table single dominant component . This component accounts for about 79% of the total variance (this total is S11 + S22 + S33 = 1 + 2 + 3 = 41.2381). Given that 1 is dominant, should this make it reasonable to assume that the second and third components display only random variation? To answer this question, we test the null hypothesis Ho: E(2) = E(3). We fall back on Anderson's criterion

218

Statistical Ecology

2 = -6[ ln 7.23559 + ln 1.44503] + 6(2) ln which has

8.68064 = 3.53 2

2(2+1) - 1 = 2 degrees of freedom. Observing that 2 2  2   0.05;2  5.99 ,

Ho is accepted. We conclude that only the first component depicts trended variation. The component scores on the first component establish the following ordering of tidal pools: 1,5,2,3,4,6,7. The species most highly correlated with this order has rh1  bh1

1 Shh

.

We have: h 1 2 3

bh1 0.730256 0.662347 -0.167402

1 32.5574

Shh

rh1

19.6667 16.0000 5.5714

0.94 0.94 -0.40

These results identify Heliaster kubiniji and Pisaster ochraceus as the species having high positive correlation with the dominant component. Pisaster brevispinus has a rather low negative correlation. Further interpretations should involve environmental variables.

16.2.6 Presentation of the PCA results The component scores are plotted on perpendicular axes to obtain scattergrams. When three sets of coordinates are available, stereograms can be constructed. The technique is described by Fraser and Kováts (1966). Example 16.2.6.1 Figure 16.2.6.1 presents a two-dimensional scattergram of the tidal pools. Tidal pools 1 and 7 occur farthest apart on axis 1 and tidal pools 4 and 5 are located farthest from one another on the axis 2. Extreme positions may indicate extreme ecological separation as well.

Figure 16.2.6.1.1 A potentially more informative representation of the results of component analysis can be done by drawing stereograms. The scatter of points in the present case (Figure 16.2.5) is rather elongated, flat, and narrow. The construction algorithm is as follows: Three sets of n coordinates (Y1, Y2,Y3) are required. These are adjusted so that the ith set will have values in 219

László Orlóci the 0 to ai range. It is practical to select ai = 3.3 Ri/k where Ri is the observed range of Yi and k = SUP(R1, R2, R3). For convenient viewing, the viewing points are selected with coordinates: Left

Right

Y1L = 1.287 cm

Y1R = 2.112 cm

Y2L = 1.287 cm

Y2R = 1.287 cm

Y3L = 9.9 cm

Y3R = 9.9 cm

The first two axes have zero origin at the bottom left corner of the respective stereograms (maximum width 3.3 cm). The third is the vertical dimension. Point j has stereo co-ordinates

Yijk 

Y3kYij  YikY3 j Y3k  Y3 j

, i= 1,2 and k = L(eft), R(ight).

To clarify: Yijk -- co-ordinate of point (sampling unit) j on axis i of stereogram k; Y3k -- third axis co-ordinate of viewing point k; Yij -- co-ordinate of point j on PCA axis i; Yik -- viewing point co-ordinate on stereo axis i, stereogram k; Y3 j -- co-ordinate of point j on PCA axis 3. The points should be precision plotted to avoid blurring of the stereo image. The stereogram should not be arbitrarily enlarged or reduced, for the image becomes blurred. If blurred, try to readjust centre-to-centre distance to 5.5 cm by enlargement.

Figure 16.2.6.1.2

16.3 MDSCAL: a flexible method Our version of program MDSCAL incorporates the Brambilla-Fewster-Kenkel species of the Kruskal algorithm (Kruskal, 1964). Principal Components Analysis extracts new coordinate axes by rigid rotation of the original reference axes. But coordinate sets can be extracted based on a completely 220

Statistical Ecology

different logic. For example, in MDSCAL the algorithm is trial-and-error based, implemented in many iterations. The final objective is to scrutinize the response structure Y with the purpose to reveal the identity of possible factor effects X. The analysis begins with computation of an n x n matrix of distances based on the observed response data Y,

11 12   22 21 δ .  .   n1  n 2

... 1n 

...  2 n 



. 

...



...  nn 

The elements represent the chord distance  jk 

2(1  q jk ) which we

discussed earlier (Section 8.2). The matrix  is the 'observed' distance configuration. Yet another n x n distance matrix is defined

 d11 d 21 D  . d  n1

d12 d22 . dn2

... d1 n  ... d2n 



. 

...



... dnn 

We define the elements of D such as d jk 

t

( X

ij

 Xik )2 . The basic prob-

i 1

lem of the ordination amounts to finding t sets of n coordinates (the X matrix) that makes D a best fitting distance configuration for δ . Initially in our version f MDSCAL the X matrix contains random numbers. The use of coordinates from another ordination as initial input is optional, but not advised (Fewster and Orlóci, 1983). The value of t is the dimensionality of the final ordination configuration. In the course of the computations X is modified through cycles of iterations which make the elements less and less random in such a way that the order of the elements in D comes closer and closer to the order of the corresponding elements in δ . The degree to which this objective is achieved is measured in terms of a stress coefficient (D;) as used by Kruskal (1964). If the stress value is too large, the iterations continue with changing the elements of X. A new D is computed and checked for stress. At one point, 221

László Orlóci

when stress becomes acceptably small or when no further improvement can be achieved, X is considered final. The values in this final X hold the ordination coordinates of the n sampling units on t ordination axes. This program performs the calculations automatically based on input of chord distances from a disk file created by program METRICS. The X scores may be subjected to a correlation analysis to identify their environmental significance. Example 16.3.1 The response data of Example 15.1.1 is reanalysed. It is postulated that the four species respond in a non-linear manner to influences along a dominant environmental gradient X. The following results were obtained by program MDSCAL with t set equal to 3 (the dimensionality of the gradient hypothesized). Starting with a random seed of

0.58376, after 30 iterations the results are: AXIS 1 -0.1554 -0.1415 -0.0884 -0.1335 -0.1247 -0.070 -0.0835 -0.0690 -0.0050 -0.0260 -0.0479 -0.0238 0.0493 0.0827 0.1051 0.1532 0.1253 0.1439 0.1656 0.1436 AXIS 2 -0.3153 -0.3014 -0.2788 -0.1945 -0.1898 -0.1717 -0.0925 -0.0720 -0.0387 0.0268 0.0672 0.0821 0.0991 0.1268 0.1708 0.1820 0.2245 0.2214 0.2195 0.2345 AXIS 3 -0.1171 -0.0635 -0.0786 -0.0858 -0.0296 -0.0528 -0.0534 -0.0014 -0.0033 -0.0375 -0.0051 0.0417 0.0047 0.0721 0.0624 0.0796 0.0620 0.0548 0.0608 0.0994

The stress value of this final configuration is (D;) = 0.01433858 or 1.43%. The drop from the 29 stress value is is 0.011%. These values indicates a high degree of monotonicity between the two distance matrices D and . The joint scatter of the 3 ordination axes is on display in Figure 16.3.4. The elevation gradient is reproduced in the three-dimensional space. For comparison, the data set is re-graphed on PCA axes in Figure 16.3.5.

Figure 16.3.4

Figure 16.3.5

Since the responses are quasi Gaussian (in the sense as R. Whittaker used “Gaussian”, no Normality in a statistical sense involved) as in the original data, the ordination configuration based on the components has a strong 'horseshoe' shape. As expected under nonlinear responses, the closeness or separation of the plots on the components is not in accordance with their positions on the gradient. For example, plots 1 and 20, which represent the gradient extremes, are positioned close together on the PCA axes. Furthermore, neither of the PCA axes orders the plots in accordance with their positions on the elevation gradient. MDSCAL does all these quite well – at least in the illustrated case. 222

Statistical Ecology

WE DEFINED THE TASK of ordination as something more than just the ordering of sampling units as points on axes. The axes are in fact strategically positioned according to well defined criteria. The two methods considered, principal component analysis and MDSCAL, have different optimality criteria. PCA creates a linear reference space in which non-linear structures appear as nonlinear configuration. MDSCAL successfully unfolded the nonlinearity.

223

László Orlóci

Chapter 17 EXPLORING GROUP STRUCTURE The previous chapter presented a view of the medium as one of continuity or at least as one in which pathways of variation exist that span the entire sample space. Now we conceive a highly structured medium and elect cluster analysis to probe the data set for the isolated groups. The approach is either agglomerative or subdivisive. After the groups are defined, followup analyses are performed to establish connections to external structures and functionalities. If found, they enable the predictive use of the groups.

17.1 Single link clustering This method has been introduced independently by Florek, Lukaszewicz, Perkal, Steinhaus and Zubrzycki (1951) and Sneath (1957). The clustering is agglomerative, it builds larger groups by fusing individual units and smaller groups. The fusion criterion is nearest neighbour similarity or distance. In the example of Figure 17.1.1, clusters j and k have nearest neighbour units h and i. The decision to fuse the two groups will depend on the distance dhi. Example 17.1.1 Single link clustering is applied to the tidal pool data in Example 4.1.2.2.1. The decision criterion is the tidal pools nearest neighbour Euclidean distance. The following is the upper half of matrix D. The rows begins with self comparisons: 0.000

5.831

6.557

7.681

4.243

13.784

15.556

0.000

1.000

2.236

6.481

9.274

11.225

0.000

2.000

7.280

9.434

11.358

224

Statistical Ecology 0.000

8.544

8.062

10.630

0.000

12.728

13.342

0.000

4.472 0.000

The distance of tidal pools 2 and 3 is the smallest; they are the first to fused into a group (2,3). The fusion level is d23 = 1.00. The next fusion is between group (2,3) and tidal pool 4 at level 2.00. The next smallest distance 2.236 is between group 4 and 2, but these have already been fused. Pools 5 and 1 are fused next (5,1) at level 4.243. The fusions continue in a like manner until all pools unite in a single group. The fusion pattern is mapped in a dendrogram (Figure 17.1.1.1).

Figure 17.1.1

Figure 17.1.1.1

17.2 Centroid clustering This method was first proposed by Sokal and Michener (1958). The clustering criterion is the centroid distance of the groups. We write this as

dhi 

p

( X

mh

 Xmi )2

m1

Symbol X mh represents the mean of variable m in group h. Xmi has similar definition in group i. Symbol p stands for the number of variables. To clarify some aspects of the technique, we consider the set up in Figure 17.2.1. In this, a sample of n units is described by two variables X1 and X2. The analysis has reached a decision point. Should cluster h be fused with cluster i or with cluster j. Cluster size, shape and orientation are characteristic for the responses, but ignored since they have no influence on the groups' centroid distance.

225

László Orlóci

Figure 17.2.1

Three groups are recognized j,h,i of nj, nh, ni units and mean vectors X h ,

Xi , X j . The groups are separated by centroid distances djh, dji, dhi. The centroid distance between group j and the fusion group k = (h+i) is obtained directly from the squares of previously measured Euclidean distances, based on the Lance and Williams (1966) equation:

d 2jk  hd 2jh  i d 2ji   dhi2 in which the coefficients are

h 

nh nh  ni

i 

,

ni nh  ni



,

nhni (nh  ni )2

.

2

Fusion is indicated when d jk is lowest compared to all relevant values. When similarity values are given in the 0 to 1 range, rather than distances directly, Gower's (1967) equation is applicable:

S jk  h S jh  i S ji   (1  Shi ) . This defines the similarity of groups j and k. Fusion is indicated when Sjk is highest compared to all relevant values.

After each fusion, new centroid distances (or similarities) are computed. To further illustrate the mechanics of the computations, suppose that a fusion, involving groups g and u = (j+k) is contemplated. In this case, the decision is based upon the squared centroid distance 2 2 dgu   j dgj2  k dgk   d 2jk

with coefficients

h 

nj n j  nh  ni

,

i 

nh  ni n j  nh  ni

226

,



n j (nh  ni ) (n j  nh  ni )2

.

Statistical Ecology Example 17.2.1 The documentation of an environmental impact study included comparative sensitivity ratings for six ecogroups in a large number of sites. The following is the distribution of the sensitivity scores among the ecogroups: Sensitivity

Percentage of sites in ecogroup

class

1

2

3

4

5

6

Low

0

50

82

59

13

0

Medium

100

29

13

16

22

0

High

0

21

5

25

65

100

The relative placement of the ecogroups is on display in the stereograms (Figure 17.2.1.1).

Figure 17.2.1.1 The following algorithm performs centroid clustering on the ecogroups: (a) Compute cosines of subtending angles (cosAB in Section 8.2) for pairs of column vectors in the data to obtain the upper half of similarity matrix S. We have for this, 1 1.000

2

3

4

5

6

0.472

0.156

0.242

0.315

0.000

1

1.000

0.896

0.970

0.618

0.342

2

1.000

0.941

0.289

0.060

3

1.000

0.595

0.379

4

1.000

0.931

5

1.000

6

The unities in the diagonal cells correspond to self-comparisons. The maximum off-diagonal value is S24 = 0.970; fusion between ecogroups 2 and 4 is indicated. (b) Compute new similarities S(1) based on the Gower formula, 1 1.000

3

5

6

0.3645

(2+4)

0.1563

0.3150

0.0000

1

1.000

0.9261

0.6138

0.3676

(2+4)

1.000

0.2887

0.0601

3

1.000

0.9307

5

1.000

6

This matrix differs from S only in the columns (rows) 2 and 4 which are replaced by a single second column (row) 2+4. A typical similarity values in the new second column (row) is 227

László Orlóci (1) S1(2  4) 

S12 2



S14 2



1  S24 4



0.4716 0.2423 0.0302    0.365 . 2 2 2

The highest similarity is 0.9307. This indicates the fusion 5+6. (c) Compute new similarity matrices S(2), S(3), S(4) and implement new fusions to complete the analysis. The last similarity matrix S(4) has elements 1

(2,3,4,5,6)

1.0000

0.3963

1

1.0000

(2,3,4,5,6)

The final fusion unites ecogroup 1 with the group containing the other ecogroups at a centroid similarity 0.3963. The complete fusion directory is: Step

Sjk

d 2jk =1 - Sjk

1

0.9698

0.0302

2,4

2

0.9307

0.0693

5,6

3

0.9261

0.0739

2,4,3

4

0.4190

0.5810

2,4,3,5,6

5

0.3963

0.6037

1,2,4,3,5,6

Ecogroups in fusion

The dendrogram is displayed in Figure 17.2.1.3.

Figure 17.2.1.2 We note that the structure defined by S differs from the structure in the raw data. This is because of the similarity function used which incorporates normalization of the column vectors. Based on S, the ecogroups whose vectors lie near one another in angular terms (e.g., ecogroups 2,4,3) are regarded as highly similar, notwithstanding how far apart the points representing them may be located in sample space (Figure 17.2.1.2).

17.3 Sum of squares clustering The clustering criterion is Qjk 

n j nk n j nk

d 2jk . The d 2jk is the squared centroid

distance of groups j and k. Symbols nj and nk represent group sizes. The algorithm is either subdivisive (Edwards and Cavalli-Sforza 1965) or ag-

228

Statistical Ecology

glomerative (Ward 1963, Orlóci 1967). The subdivisive algorithm is extremely tedious. To find the best subdivision in a group of size n, 2n-1 - 1 complex operations are required. The agglomerative algorithm is more practical. When the method is agglomerative, fusions are selected to minimize Qjk = Q(j+k) - Qj – Qk, the 'between groups' component of the 'total' sum of squares. When the method is subdivisive, Qjk is maximized. Each term of the sum of squares can be derived from pair-wise centroid distances weighted by group size: Q( j k ) 

1 n j nk

nj nk 1 n j nk

  h1

dhi2 |( j k ) ; Qj 

i h1

1 nj

nj 1

nj



dhi2 |j ; Qk 

h1 i h1

1 nk

nk 1 nk

  dhi2|k h1 i h1

The symbol hi|(j+k) signifies the membership of units h,i point in the fusion group of j and k. Similarly, symbols hi|j and hi|k indicate membership of units hi in group j or k. Since a total sum of squares is partitioned, the analogy to the analysis of variance (Section 11.3) comes naturally (Edwards and Cavalli-Sforza, 1965), but not quite appropriately (see Example 17.3.1). Example 17.3.1 We reanalyse the starfish data of Example 4.1.2.2.1. The upper half of matrix D of Example 8.2.1 is used: 1

2

6

7

0

5.831

6.557

3

7.681

4

4.243

5

13.784

15.556

1

0

1.000

2.236

6.481

9.274

11.225

2

0

2.000

7.280

9.434

11.358

3

0

8.544

8.062

10.630

4

12.728

13.342

5

4.472

6

0

7

0

0

The smallest off-diagonal value is 1.000 in cell 2,3. Fusion of tidal pools 2 and 3 is indicated at level Q(23) 

1.000 2

 0.5 . This is the same value as Q23, since two elementary groups are

compared. The reduced matrix Q(1) is computed next: 1

(2+3)

0

26.00 29.50 0

4

5

6

9.00

95.00

121.00

3.33

32.00

58.67

85.33

(2+3)

0

36.50

32.50

56.50

4

81.00

89.00

5

10.00

6

0

0

7

0

The calculation of a typical element is 229

1

7

László Orlóci Q4(23) 

2 2 2 d23  d24  d34

3

1  2.2356  2 2



2

 3.333 .

3

This is the total sum of squares within group (4+2+3). From the within sums of squares a new quantity the between sum of squares is derived, Q4(2+3) = Q4+(2+3) - Q4 - Q2+3 = 3.333 0 - 0.5 = 2.83, which is the actual increase in the total sum of squares when the fusion of unit 4 and group (2+3) is implemented. Q4 (2+3) can also be computed based on the LanceWilliams formula: 2

2 d4(2 3) 

2.236 2

2

2



2



1 4

 4.2501 and Q4(23) 

2(4.2501) 3

 2.83 .

This being the smallest of all the new Qjk values, the next fusion is 4 + (2+3) at fusion level Q4+(2+3) = 3.33. After implementing this fusion, the computations continue with new Qs and new fusions. The following is the complete fusion directory: Step

Qj+k

Qjk

Tidal pools in group

1

0.5

0.5

2,3

2

3.3

2.8

2,3,4

3

9.0

9.0

1,5

4

10.0

10.0

6,7

5

66.4

54.1

1,5,2,3,4

6

247.4

171.0

1,5,2,3,4,6,7

The dendrogram image of the fusions and fusion levels (Qjk) is contained in Figure 17.3.1.1.

Figure 17.3.1.1 Assuming that three groups are sufficient for the purposes in hands, the groups will have contents A = (7,6), B = (4,3,2) and C = (5,1). Partitioning of the sum of squares permits measurement of the sharpness of the groups: Source

Sum of squares

Q

%

Between A,B,C

QA+B+C - QA- QB - QC

225.1

91

Within A,B,C

QA+ Q B + Q C

22.3

9

247.4

100

Total

The relatively high 'between' component indicates the groups’ sharp. A significance test in the manner of an ANOVA may be tempting. However, it would be quite inappropriate, since 230

Statistical Ecology the groups were defined to be maximally distinct with regard to the sums of squares. The test would rely on maxima, rather than averages, but for maxima the sampling distribution of F is not defined.

17.4 Association analysis In association analysis, introduced by Williams and Lance (1958; Williams and Lambert 1959), groups are the results of dichotomous subdivisions. The allocation of an individual unit to one or the other of the groups at a given dichotomy depends on the individuals possession or lack of a given character state. This property enables the method to create identification keys. Association analysis uses binary data. Consider the case of a survey in which n quadrats are located on the ground and inspected for the presence (1) and absence (0) of p plant species. The quadrat set is divided into two groups a and  with and without species A. The following table describes the joint distribution of p species according to a and : Species

Frequency

i≠A 1 2 . p Total

fia f1a f2a . fpa Fa

Total

fi f1 f2 . fp  F

Fi F1 F2 . Fp F..

The following definitions apply: fia - number of quadrats jointly possessing species i and A; fi - number of quadrats with species i present and species A absent; Fi - number of quadrats with species i present; Fa - number of quadrats with species A occurring with any of the other species; F - number of quadrats occupied by species other than A; F - total number of quadrats inspected. Pielou (1969) suggested the Chi squared criterion p  ( f  f o )2 ( f  f o )2   A2    ia o ia  i o i  to measure the success of the a, divifia fi iA  

sion. The expectations:

fiao 

fa fi f

and fio 

231

f fi f

.

László Orlóci

Association analysis begins with the computation of a  i2 value for each species. The division of the sample into groups a and on species A is accepted if  A2 is the largest Chi square value. Subdivisions continue in the same vein until the value of Chi squared drops below a specified level, or the group size becomes too small for meaningful manipulations. The end product is a dichotomous key (Figure 17.4.1).

Figure 17.4.1 Example 17.4.1 An investigator recorded the presence of 5 species of nitrogen-fixing bacteria in 30 species of the Fabaceae family: 1 ++-++-+++- ++++- - -++- -+-+-+ ++++ 2 -+-++-- - -- ++++++- -+- -+-+-+ - - - + 3 +++-- ++- -- + -- +- - +-++ +-+-+- ++- + 4 - ++-- +- +-+ +++- +- ++-+ +-+-++- - ++ 5 - - -++- + --+ +++- - +- +-+ -++- -- +++The legumes are to be grouped according to Pielou’s Chi squared criteria. The first contingency table is for bacterium species 1: Bacterium 1 Other bacteria

2 3 4 5

Total

+

-

12 9 9 11 41

2 7 9 4 22

Total 14 16 18 15 63

The expectations are

 9.1111 10.4127 Fo   11.7143   9.7619

4.8889 

5.5873 

.

6.2857 



5.2381 

These are based on marginal totals, such as for instance, f11  o

41(14) 63

criterion is

12 

(12  9.1111)

2

9.1111

(4  5.2381)

2

 ... 

232

5.2381

.

= 9.1111. Pielou's

Statistical Ecology Using the same type of computations, the following Chi squared values are obtained for the five species: Symbiont bacterium A

1

2

3

4

5

 A2

5.4227

3.3296

1.4496

1.1306

1.1909

Two groups are formed based on the presence (+) or absence (-) of bacterium species 1: Legumes belonging to group + 1, 2, 4, 5, 7, 8, 9, 11, 12, 13, 14, 18, 19, 22, 24, 26, 27, 28, 29, 30 Bacterium 1 - 3, 6, 10, 15, 16, 17, 20, 21, 23, 25

These groups are subdivided further in the next steps. Finally, four groups are found: Legume group

Distinguishing bacterium species

I

Presence of bacteria 1 and 4

II

Presence of bacterium 1, absence of 4

III

Absence of bacterium 1, presence of 3

IV

Absence of bacteria 1 and 3

Further subdivision of the legume groups would not be justified, since the Chi squared value is already too small and group sizes are also small. The classification tree is given in Figures 17.4.1.1. The numbers at the nodes identify the bacterium species on which the group is difided, + on left and – on right.

Figure 17.4.1.1

17.5 Analysis of structured tables A structured table has rows and columns arranged into groups (Figure 17.5.1). The analysis of such a table is conveniently based on canonical contingency analysis following the theory developed by Lancaster (1949). Examples are given by Williams (1952), Hill (1974), Feoli and Orlóci (1979, 1985), and Orlóci (1981).

233

László Orlóci

Figure 17.5.1

17.5.1 The data table Assume that separate cluster analyses (formal or informal) are performed on the rows and columns of a p x n data table. The analyses produce t groups for the n column units and q groups for the p row units. The groups do not overlap. The result is a data table structured by q x t blocks (Figure 17.5.1). We assume that the original p x n elements in the table register occupancy 0,1 or represents density counts of individuals. We condense the data by summation into a q x t table of block totals: Column

1

2

...

t

Total

f11 f 21 .

f12 f 22 .

...

f1. f 2. .

f q1

fq2

... ...

f1t f 2t . f qt

f q.

f.1

f.2

...

f.t

f ..

groups 1 Row

2

groups

. q

Total

...

The symbols have the following definitions: fij - sum of all values in the ijth block of the original table; fi. - sum of all values in the ith row group; f.j - sum of values in the jth column group; f.. - grand sum for the table. Since in the q x t table the entries concentrate the original values, the method of analysis is called concentration analysis. The blocks of the structured table are likely to be different in size. To adjust for the differences, the entries are transformed according to 234

Statistical Ecology

f.. fij nij

fij : q t fkm   k 1 m1 nkm

The operation is an assignment. The “:=” symbol is the assignment operator. It reads “the new value fij becomes …”. So from this point on fij identifies a new value. Block size nij is equal to the number of cells (row units ni. x column units n.j) in the ij block. This adjustment leaves the grand total unchanged. Subsequent computations are performed on the adjusted values F. The following are tested:

17.5.2 Compositional sharpness of blocks A general null hypothesis is stated as Ho: [E(Fij) =

f.. qt

for all ij. This reads “the observed concentrations do not deviate significantly from grand mean.” The general Ho has three sub-hypotheses: Ho1: E(F1.) = E(F2.)=...= E(Fq.)=

fi .

Ho2: E(F.1) = E(F.2)=...= E(F.t)=

q

Ho3: [E(Fij) =

f. j t

fi . f. j f..

for all ij]. Rejection of Ho implies significant, trended variation among the q row groups (Ho1), the t column groups (Ho2), or the q x t blocks (Ho3). The test criteria are Rényi type (order one) information divergences: Hypothesis Degrees of freedom Test criterion Description Ho1 Main effect (rows)

q-1

Ho2 Main effect (columns)

t-1

q

2I1  2 fi . ln i 1

f..

t

f. j t

j 1

f..

2I2  2 f. j ln

235

fi .q

László Orlóci Ho3 Interaction

(q-1)(t-1)

Ho Joint effect

qt-1

q

t

I12  2 fij ln i 1 j 1 q

fij f.. fi . f. j

t

2I12  2 fij ln i 1 j 1

fij qt f..

The 2I values have asymptotically a Chi squared probability distribution at the given degrees of freedom, given that the null hypothesis is true, sam2 pling is random, and F.. is large. Ho is rejected if 2I   ; .

Example 17.5.2.1 Consider species groups 2, 3, 4, and quadrat groups a, b, c as given in Table 1 of Feoli and Orlóci (1979). The data in this table were recorded in the course of a vegetation survey on the lower fossil terraces of a river. In the following example the original quadrat-group labels are retained. The species-group labels are changed to A, B, and C. The individual records are estimates of species cover/abundance. Only presence/absence will be used in the following calculations. The row and column groups are produced by sum of squares clustering. Block sizes are given by the elements in

 

N=

10x14 10x20 10x11  140 200 110 9x14 9x20 9x11 =  126 180 99 7x14 7x20 7x11   98 140 77

and the unadjusted occupancy counts in Quadrat group

a

b

c

Total

Species

A

2

15

71

group

B

74

21

2

97

C

16

112

35

163

92

148

108

348

Total

88

Adjustments are applied (see the preceding text) to obtain the basic data F. A typical ele2 140 ment in matrix F is F11 = 2.8767 348.0 = 1.728 . Similar computations yield the entire F matrix:

 1.728

F = 71.046

9.073

78.081 

2.444  .   19.750 96.777 54.987  14.113

The row totals are

 88.882   87.604    171.514  and the column totals [92.525 119.963 135.512]. The grand total is 348.000 . The homogeneity of the species groups is tested next. Given 236

Statistical Ecology =  2  2 88.882ln



88.882 348.000

 67.604ln

67.604 348.000

 171.514ln

171.514 

348.000 

 37.7 .

Considering that   [ 0.005;2  10.6] we reject Ho1 and declare a highly significant heter2

2

ogeneity among the species groups. The analysis continues with testing the heterogeneity of the quadrat groups. In this case, we have 2 2  2 =8.3642 and   [ 0.005;2  10.6] .

This leads to the rejection of Ho2. The independence of the row and column classifications is tested next. In this case, we have the test criterion 2 = 261.2 with 4 degrees of freedom. Then, [2 = 261.2] > [  0.005;4 =] 2

indicates rejection of Ho3. These results establishes the existence of trended compositional variation among the blocks is established. The analysis is continued to reveal the actual dimensionality and possible ecological significance of the dimensions.

17.5.3 Compositional gradients The method assumes that compositional variation among the groups is a composite of component responses to m ≤ INF(q-1, t- 1) underlying factors. The analysis assigns to each component a portion of the interaction Chi squared q

t

 2  2I12  2 fij ln i 1 j 1

fij f.. fi . f. j

in a simple additive way

 2 = 12  22  ...  m2 = f..1  f..2  ...  f..m . We identify the s as Eigenvalues and

i as the ith canonical correlation

symbolically Ri2 , relating Xi to Yi . We give the definition in the sequel. The term 'canonical' is rightly used when characterizing the analysis, since each  is associated with a pair of variables X,Y. One member of the ith pair is Xi = [X i 1 X i 2 ... Xiq] specific to the row groups, and the other member is 237

László Orlóci

Yi = [Yi1 Yi2 ... Yit], specific to the column groups. The elements are called canonical scores. As co-ordinates X and Y re-describe the groups within the new reference system. There are m X,Y pairs. Transfer of values between X and Y is accomplished based on t

fhjYij

Xih  

i

j 1 fh.

or in the reverse, q

Yij  

fhj Xih

h1 f. j

.

i

The scores Xih and Yij can be used as ordination coordinates to construct scatter diagrams. But how do we derive these scores? The computational problem is no more than an Eigenanalysis of matrix S= UU' t

(for species) with typical element Srz   UrjUzj being the centred cross j 1

product of column r and z consistent with

Urj 

frj fr . f. j



fr . f. j f..

and Uzj 

fzj fz . f. j



fz . f. j f..

.

Matrix S has m non-zero Eigenvalues 1, 2 , ..., m . The adjusted Eigenvectors hold the canonical coefficients . The X scores are obtained by Xih  ih

f.. fh.

.

The matrix must satisfy the double constraint:

238

Statistical Ecology q



fh. ih  1 and 2

q

 fh.ih  0 . h1

h1

Having determined matrix X, we find matrix Y as already shown. Should t < q, it will save computational time if we transpose matrix F before computing the matrix S. But then in the interpretation of results the designation of the row and column vectors is interchanged.

17.5.4 Dimensionality We assume that the canonical variables represent independent responses to some forcing factors. The intensity of the response by the ith pair of canonical variables is measurable by 2

Li 

f..Ri



2

i

2





2

as a proportion or by Li% = 100Li as a percent. Considering that the nominal number of canonical pairs is m, the descending values L1 > L2 > ... > Lm invite the same question as the descending Eigenvalues did in PCA (Section 16.2): how many of the m dimensions should be retained for further consideration? To find an answer, we require a probability distribution for the residual Chi squared that pooles the k smallest terms in f..1  f..2  ...  fmk 1  ...  f..m .

Since no reliable method appears available for this (Kendall and Stuart 1968, pp. 595; also Gittins 1979), a Monte Carlo simulation experiment remains the viable alternative to derive a probability distribution.

17.5.5 Identification of underlying factors

The canonical contingency table analysis finds two matrices of scores on the m canonical axes, X and Y, one for each set of groups:

$$X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1q}\\ X_{21} & X_{22} & \cdots & X_{2q}\\ \vdots & \vdots & & \vdots\\ X_{m1} & X_{m2} & \cdots & X_{mq} \end{bmatrix} \qquad Y = \begin{bmatrix} Y_{11} & Y_{12} & \cdots & Y_{1t}\\ Y_{21} & Y_{22} & \cdots & Y_{2t}\\ \vdots & \vdots & & \vdots\\ Y_{m1} & Y_{m2} & \cdots & Y_{mt} \end{bmatrix}$$

In both, the rows are linearly uncorrelated. Higher-order correlations may of course remain that the covariance cannot detect. The following example discusses a real case of how to identify underlying factors of variation.

Example 17.5.5.1 The tests in Example 17.5.2.1 indicated significant variation among the 9 table blocks. Further analysis to determine the dimensionality of this variation is therefore justified. The product moment matrix needed is S = UU'. The upper half of this symmetric matrix is

 0.258850   -0.210473   -0.035920
              0.390458   -0.127538
                           0.117006

A typical element in this is

$$U_{12} = \frac{f_{12} - \dfrac{f_{1.}f_{.2}}{f_{..}}}{\sqrt{f_{1.}f_{.2}}} = \frac{9.073 - \dfrac{87.604(119.963)}{348.0}}{\sqrt{87.604(119.963)}} = -0.2105 .$$

S has two canonical variables:

Variable i    Ri        χi² = f..Ri²    Li%    DF*
1             0.7490    195.2            73      3
2             0.4531     71.5            27      1
Total                    χ² = 266.7     100      4

*DF = (q-1) + (t-1) - (2i-1)
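The last three columns follow directly from f.. = 348.0 and the Ri values; a quick check in Python (rounding explains the small discrepancies):

```python
f_total = 348.0
R = [0.7490, 0.4531]                          # canonical correlations
chi2 = [f_total * r**2 for r in R]            # approx. [195.2, 71.4]
L_pct = [100 * c / sum(chi2) for c in chi2]   # approx. [73.2, 26.8]
```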

The size of the total Chi squared depends on the strength of trended compositional variation among the blocks. This Chi squared is partitioned between the two canonical variables. The first accounts for roughly 3/4 of the total; the second is rather small. The following are the canonical scores Y specific to the three species groups:

Canonical      Species group
variable        A        B        C
Y1             -1.08     1.63    -0.27
Y2             -1.32    -0.57     0.98

and the X scores specific to the three quadrat groups:

Canonical      Quadrat group
variable        a        b        c
X1              1.56    -0.15    -0.94
X2             -0.56     1.37    -0.83

These scores were examined for correlations with ecological variables:

Environmental        Quadrat group
variable             a             b                    c
Terrace              Low           Medium               High
Moisture regime      Wet           Moist                Dry
Disturbance          No logging    Selective logging    No logging

The following conclusions are drawn: (a) The dominant component of the compositional variation is probably a manifestation of response to an elevation/soil moisture gradient from the low/wet terrace to the high/dry terrace. The lesser component appears to be related to disturbance. (b) The dominant pair of canonical variables (X1, Y1) has a high correlation (0.749), suggesting that the species groups are highly diagnostic of the conditions in the quadrat groups. (c) The joint scatters X1, X2 and Y1, Y2 are shown in Figure 17.5.5.1. The following species/quadrat groups have the greatest affinity: A to c, B to a, C to b.

Figure 17.5.5.1

17.5.6 Partitioning the deviations

The deviation table to be partitioned is $\Delta = F - F^o$, with characteristic element $\delta_{hj} = f_{hj} - f_{hj}^o$. The expectation $f_{hj}^o = \dfrac{f_{h.}f_{.j}}{f_{..}}$ is specific to Ho3. There are q x t such deviations. As concentration analysis progresses, the total Chi squared as well as the deviations are partitioned into components:

$$\delta_{hj} = \delta_{1hj} + \delta_{2hj} + \cdots + \delta_{mhj} .$$

The elements on the right-hand side are computed by

$$\delta_{ihj} = \frac{Y_{ih}\,X_{ij}\,R_i\,f_{h.}\,f_{.j}}{f_{..}}$$

for any i. There is a set of q x t values $\delta_{ihj}$ for any i, which we term the ith lattice of deviations. The lattices are independent contingency tables $\Delta_1, \Delta_2, \ldots, \Delta_m$, each having q rows and t columns. The following properties are noted: (1) The row and column entities in $\Delta_i$ have the same identity as the row and column groups in the original table F. (2) The interaction Chi squared for $\Delta_i$ is $\chi_i^2 = f_{..}R_i^2$. The ratio $\chi_i^2 / \chi^2$ is a relative measure of the importance of the ith lattice in accounting for linear variation in the original table F. (3) Should $X_i$ impose an order on the t column entities that is meaningful in terms of time T or some defined environmental variable E, the graph $[\delta_{ih1}\; \delta_{ih2}\; \ldots\; \delta_{iht}]$ can be characterised in terms of its correlation with T or E.

A significant positive $\delta_{ihj}$ indicates performance of the hth row entity on the ith canonical axis in the jth column entity in excess of random expectation (the zero line of the graph). A significant negative $\delta_{ihj}$ indicates under-performance. Since the lattices are ordered by their share of the total Chi squared, moving from low to high Chi squared is associated with decreasing random variation and an increasingly accentuated trend.

Example 17.5.6.1 We continue the analysis started in Example 17.5.2.1 by extracting independent lattices of deviations. Having two non-zero canonical correlations, there will be two lattices. The first is based on X1, Y1:

 29.883 3.606 26.277  1 =  44.352 5.352 39.000    14.469 1.746 12.723  The corresponding interaction Chi squared (2I) component is 195.221 or about 73% of the total Chi squared. The second lattice is based on X2,Y2:

 7.980 25.173 17.193  2 =  3.402 10.734 7.331     11.382 35.906 24.524 

242

Statistical Ecology The interaction Chi squared components is 71.456 or about 27% of the total. The sum of the two lattices is:

 21.904

 = 1 +2 =  47.755

  25.851

21.567

43.470 

16.086

31.669 

37.652

11.801 



Each row of each of the $\Delta$ matrices defines a deviations profile. The three graphs in the first column of Figure 17.5.6.1 contain the deviation graphs constructed from the rows of $\Delta$. The zero line in each graph indicates random expectation; deviations are measured from this line upward (positive) or downward (negative). The graphs in the second column correspond to the rows of the $\Delta_1$ matrix, and those in the third column to the rows of the $\Delta_2$ matrix. The graphs are additive in both directions: the sum of the second and third graphs in any row of Figure 17.5.6.1 is equal to the first graph in the same row.

Figure 17.5.6.1

Clearly the interpretation of compositional variation relies on the magnitude of the deviations and on the shape of the graphs. The deviation profiles in the second column (lattice $\Delta_1$) indicate increasing occupancy for species groups A and C, and decreasing occupancy for species group B, with increasing height of the fossil terrace. The deviation profiles in the third column (lattice $\Delta_2$) are similar for species groups A and B, reaching a low point on level b. These mirror the profile of species group C, a trend suggesting to the ecologist the effect of significant competition.
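The lattice computation and the additivity of the lattices can be verified mechanically. A minimal sketch, assuming score matrices with one row per canonical axis, the row-group scores given first and the column-group scores second, as in the formula of Section 17.5.6:

```python
import numpy as np

def deviation_lattices(F, R, row_scores, col_scores):
    """Partition Delta = F - F0 into canonical lattices (Section 17.5.6):
        delta_ihj = (row score)_ih * (col score)_ij * R_i * f_h. * f_.j / f..
    """
    F = np.asarray(F, dtype=float)
    f_r, f_c, f = F.sum(axis=1), F.sum(axis=0), F.sum()
    expected = np.outer(f_r, f_c) / f                     # F0 under Ho3
    lattices = [R[i] * np.outer(row_scores[i], col_scores[i]) * expected
                for i in range(len(R))]
    # With consistently scaled scores the m lattices add back to Delta
    check = np.allclose(sum(lattices), F - expected, atol=1e-3)
    return lattices, check
```

With the scores of Example 17.5.5.1 this reproduces $\Delta_1$, $\Delta_2$ and their sum up to rounding.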

METHODS WERE PRESENTED by which a set of objects can be subdivided into groups. The clustering criteria included single linkage distance, centroid distance, sum of squares, and Chi squared. These were applied either in an agglomerative or in a subdivisive mode. Concentration analysis drew on cluster analysis to create a structured table, which we then scrutinized by Lancaster's canonical contingency table method. Other works to be consulted include Cormack (1971), Anderberg (1973), Sneath and Sokal (1973), Pielou (1977, 1984), Orlóci (1978, 1981), Legendre and Legendre (1983), Gittins (1985), and their references.


Chapter 18
EXPLORATION OF AFFINITIES: IDENTIFICATION

Having considered the recognition of groups, we now turn to the methods of identification. The identification problem amounts to the numerical assignment of the individuals to be identified to the parental group of best fit. Approaches and specific methods are presented.

18.1 Approaches

The conditions under which identification is contemplated determine the approach: (1) The set of variables, affinity measures, and decision rules are the same as those used in the delineation of the parental classes. (2) Parental classes are recognized on the basis of one set of variables, and later re-described on the basis of a second set of variables. Identification is based on the second set of variables and most likely on new decision criteria. The approach under option (1) is restricted by the choices made in the course of cluster analysis. The approach under option (2) offers considerably more flexibility.

Example 18.1.1 Consider the starfish data which we subjected to cluster analysis in Example 17.3.1. We recognized three groups of tidal pools, {2,3,4}, {1,5}, {6,7}, in Figure 17.3.1. Suppose that another tidal pool is examined and the following species counts are obtained: $X_8 = [27\; 30\; 20]$. The question is which of the three established classes would serve best as a parental class for tidal pool $X_8$. The cluster analysis which produced the three reference classes used a Euclidean distance. If we consider option (1) above, the same distance measure should be used to determine the nearest neighbour class of $X_8$. The distances to pools 1 through 7 are (3.0, 6.40, 7.35, 7.87, 4.36, 12.37, 14.32). The smallest, d1 = 3.0, identifies Pool 1 as $X_8$'s nearest neighbour and Class 2 as the most affine parental group in nearest neighbour terms.

Example 18.1.2 Suppose the tidal pools (Examples 4.1.2.2.1, 18.1.1) were subsequently described based on water pH and salinity:

Tidal pool        1     5     2     3     4     6     7     8
pH                8.0   8.0   7.8   7.8   7.8   8.2   8.2   8.0
Salinity (g/kg)   31    30    30    29    32    35    36    32

Approach (2) is applicable, and one of the methods outlined below can be applied to complete the identification.
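Under approach (1), identification is a nearest-neighbour search with the distance used in the original clustering. A minimal sketch with hypothetical argument names; the reference counts themselves are not repeated here:

```python
import numpy as np

def identify_nearest_neighbour(x_new, reference_units, class_labels):
    """Assign x_new to the class of its nearest reference unit.

    reference_units : (n, p) array of the units used in the clustering
    class_labels    : class membership of each reference unit
    The Euclidean distance is used, as in Example 18.1.1.
    """
    D = np.asarray(reference_units, dtype=float) - np.asarray(x_new, dtype=float)
    d = np.sqrt((D**2).sum(axis=1))
    nearest = int(np.argmin(d))
    return class_labels[nearest], d
```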

18.2 Generalized distance

There are k reference (parental) classes to be considered. These are described by their mean vectors $\bar X_1, \bar X_2, \ldots, \bar X_k$ and sample covariance matrices $S_1, S_2, \ldots, S_k$. The external unit X is examined for possible assignment to one of the k classes. One method of assignment uses the Mahalanobis (1936; Rao 1952) generalized distance

$$d(X; \bar X_m) = \left[(X - \bar X_m)'\,S_m^{-1}\,(X - \bar X_m)\right]^{1/2} .$$

Deterministic or probabilistic assignment rules may be applied: (1) In a deterministic assignment, X is assigned to the class m for which its generalized distance is minimal, irrespective of its probability of being a member of that class. (2) A probabilistic assignment requires testing and accepting the null hypothesis Ho: $E(\bar X_m) = X$ against H1: $E(\bar X_m) \ne X$. The identification problem is thus reduced to the problem of comparing the class mean vector $\bar X_m$ to a second vector X (as in Section 10.4). The assignment is accepted if the condition $F_m < F_{\alpha;\,p,\,n_m - p}$ is satisfied. Criterion F is the variance ratio

$$F_m = \frac{(n_m - p)\,n_m}{p\,(n_m - 1)}\,d^2(X; \bar X_m) .$$

This can be referred to the probability points of the F-distribution under the usual regularity conditions (Section 10.4) with p and $n_m - p$ degrees of freedom. If the k class sizes are proportional to $\pi_1, \pi_2, \ldots, \pi_k$, and if the proportion of units in the jth class at least as extreme as X by their centroid distances is $\varepsilon_{X|j}$, then after the assignment of X to class m the probability of having misclassified X is

$$1 - \frac{\pi_m\,\varepsilon_{X|m}}{\displaystyle\sum_{e=1}^{k} \pi_e\,\varepsilon_{X|e}} .$$

Example 18.2.1 Two vegetation types are described on the basis of the density of tree species found within 12 plots:

                          Type 1 (n1 = 6)               Type 2 (n2 = 6)
Species                    1   2   3   4   5   6         7   8   9  10  11  12
Pseudotsuga menziesii     10  12  16  18  20  21        10  12  10  11   6   8
Acer macrophyllum         21  30  17  18  21  34        30  28  41  43  55  20
Thuja plicata              0   0   3   6   7   2        10  11   4   8   9  12
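The summary statistics quoted next can be reproduced directly from the table; a short check in Python (variable names are illustrative):

```python
import numpy as np

# Plots 1-6 (Type 1) and 7-12 (Type 2); columns: Pseudotsuga, Acer, Thuja
type1 = np.array([[10, 21, 0], [12, 30, 0], [16, 17, 3],
                  [18, 18, 6], [20, 21, 7], [21, 34, 2]], dtype=float)
type2 = np.array([[10, 30, 10], [12, 28, 11], [10, 41, 4],
                  [11, 43, 8], [6, 55, 9], [8, 20, 12]], dtype=float)

xbar1, xbar2 = type1.mean(axis=0), type2.mean(axis=0)   # mean vectors
S1 = np.cov(type1, rowvar=False)   # e.g. S1[0, 0] is about 19.367
S2 = np.cov(type2, rowvar=False)   # e.g. S2[1, 1] is about 158.167
```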

The mean vectors and covariance matrices are

$$\bar X_1 = \begin{bmatrix} 16.167\\ 23.500\\ 3.000 \end{bmatrix} \qquad \bar X_2 = \begin{bmatrix} 9.500\\ 36.167\\ 9.000 \end{bmatrix}$$

$$S_1 = \begin{bmatrix} 19.367 & 4.100 & 9.400\\ 4.100 & 47.500 & -9.800\\ 9.400 & -9.800 & 8.800 \end{bmatrix} \qquad S_2 = \begin{bmatrix} 4.700 & -10.500 & -0.600\\ -10.500 & 158.167 & -20.400\\ -0.600 & -20.400 & 8.000 \end{bmatrix}$$

Assignment of a new plot $X = [10\; 30\; 10]$ to one of the types is contemplated. The generalized distance values are

$$d(X; \bar X_1) = 10.8270, \qquad d(X; \bar X_2) = 0.5065 .$$

The F-values are $F_1 = \dfrac{(6-3)\,6}{3(5)}\,10.8270^2 = 140.67$ and $F_2 = 0.308$. Since $d(X; \bar X_2)$