TEXTURE IN HIGH RESOLUTION DIGITAL IMAGES OF THE EARTH

© 1999 Philippe Maillard

[Cover figure: curves labelled “E (trees)” and “E (waves)” plotted against lag h]

March 2001


University of Queensland

Department of Geographical Sciences and Planning

This thesis entitled:

TEXTURE IN HIGH RESOLUTION DIGITAL IMAGES OF THE EARTH

was first submitted by:

Philippe Maillard, M.Sc. Geography

on 1 November 1999

for the degree of Doctor of Philosophy (PhD) in Geography.


STATEMENT OF SOURCES

The work presented in this thesis is, to the best of my knowledge, entirely the result of my own research except where acknowledged in the text, in which case complete references are quoted and listed in the Reference section at the end of this thesis. I also wish to state that none of the results reported here have been altered in any manner to suit the objectives of the research, and that the research has been conducted entirely with ethics and honesty. Furthermore, the material presented here has not been submitted, in whole or in part, for a degree at this or any other university.

_____________________________ March 11 2001

Philippe Maillard, M.Sc. Geography


To my daughter Melina

“Quando eu me encontrava preso
Na cela de uma cadeia
Foi que eu vi pela primeira vez
As tais fotografias
Em que apareces inteira
Porém lá não estavas nua
E sim coberta de nuvens
Terra, Terra
Por mais distante
O errante navegante
Quem jamais te esqueceria…”

Caetano Veloso

(It was when I was a prisoner
In the cell of a jail
That I saw for the first time
These pictures
Where you appear wholly
Though not naked
But covered with clouds
Earth, Earth
However distant
The navigating wanderer
Who would never forget you)


SUMMARY

Visual texture analysis and understanding remains one of the areas of Digital Image Processing that still evades a recognised solution. In the more restricted domain of Remote Sensing and computer interpretation of high-resolution aerial photography, the problem is even more complicated because of the ill-defined textural characteristics found in these types of images. Beyond the challenge of computer-based image segmentation and classification lies the simple yet most important acquisition of knowledge about how the task of visual interpretation of texture can be better understood. The absence of a formal definition of visual texture shows the general lack of a structured understanding in this area. This research was entirely dedicated to a better understanding of what visual texture is for the areas of Remote Sensing and Aerial Photography, and of how this knowledge can be put to use for the segmentation and classification of digital images of the Earth.

First, an extensive review of the subject of visual texture is presented from the points of view of both Experimental Psychology and Computer Science. Having shed more light on the subject, a more analytical eye is used to decipher the various aspects of visual texture and the specific characteristics of textures associated with conceptual objects found in aerial photography. In particular, an attempt was made to analyse these conceptual objects in terms of the geographical meaning they carry. Using a well-defined approach (i.e. “feature-based” or “channel-based”) to the problem and having precise objectives, the development of a series of computer programs is fully described and put to the test in a preliminary analysis. Three approaches to the problem of texture analysis are presented: the first is based on a well-known approach, the Grey Level Dependency Matrix method, as described by its authors (Haralick et al., 1973); the second uses a Geostatistics tool, the semi-variogram; and the third is based on the Fourier frequency domain transform. Each of these methods is first tested with scrutiny to assess its potential. Then, the operational routines are described, along with a series of variations and options that are left to be thoroughly tested in the following chapters. Of these three methods, the last two are original in their implementations. All details of these developments, along with the programming code, are fully described.

The three methods are then fully put to the test and compared in a rigorous way through a methodology that was developed specifically for the purpose of assessing the potential of texture analysis methods. Not only are the methods compared among themselves, but the evaluation is always related to a particular classification context. A series of these specific contexts is presented and explained. This approach to assessing the potential of the three methods has led to the drawing of a profile of their behaviour with regard to various aspects of texture characteristics and classification contexts.

Not only did all three methods prove to have excellent potential for the classification of the complex textures found in aerial photographs, but each also proved to have particular strengths and weaknesses in specific situations. Such comprehensive knowledge about the relation between texture characteristics, classification context and the type of method being used is one of the most valuable outcomes of this thesis. This knowledge is shown throughout the analysis of the results and is further synthesised and put into perspective in the last chapter. An attempt was made to generalise the knowledge acquired to any texture analysis technique based on the type of mathematical principle used. Finally, a number of recommendations are made for the development of any texture analysis method that could embody desirable characteristics according to a generic texture analysis context.


PREFACE

When I undertook the study of texture analysis as the main topic of my doctoral research, I saw an interesting challenge in combining my knowledge of Geography and Remote Sensing with the fields of Experimental Psychology and Computer Science, from which I had much to learn. I believe that through this quest I have broadened my horizons as a lecturer and perhaps as a scientist. Through the work, through the difficulties, through the surprises it brought me, this project has been a most rewarding process in which my personal and professional life have undergone profound changes that will forever affect my perception of the world.

Whether a computer can be used to interpret an aerial photograph in a way similar to us humans, or even whether this is a desirable possibility, are very questionable propositions. Personally, I think that if a new tool, be it computerised or not, can help us monitor, catalogue, manage and protect our ever-changing Earth, it should be given appropriate attention and be put to work for us.

New knowledge, on the other hand, is always desirable, and if the approach taken in this thesis made ample use of computers and programming, it is really the acquisition of better knowledge about the relation between some mathematical tools and our faculty for interpreting and identifying visual texture that motivated my research. I can, however, only hope that my work and, most importantly, my conclusions will be used in the most beneficial manner to bring any improvement, no matter how small, to the way we treat the Nature that surrounds us.


TABLE OF CONTENTS

Statement of Sources
Summary
Preface
List of Illustrations
List of Tables

Chapter 1 INTRODUCTION
1.1 The importance of texture in images of the Earth
1.2 Current state of research
1.3 Theoretical background
1.4 The benefits of the textural approach
1.5 The objectives
1.5.1 Goal n°1: new tools
1.5.2 Goal n°2: a generic texture descriptor
1.5.3 Goal n°3: a systematic method for test and comparison
1.5.4 Goal n°4: a generic approach to classifying textured objects
1.6 Organisation of the thesis

Chapter 2 TEXTURE AND THE THEORIES OF PERCEPTION
2.1 The theories of perception
2.2 Psychophysics
2.3 The Gestalt theory
2.4 The neuro-physiological approach
2.5 The computational approach to visual perception
2.6 Some thoughts on the psychophysical approach to texture analysis

Chapter 3 TEXTURE AND IMAGE PROCESSING
3.1 The image processing approach to texture
3.1.1 The segmentation approach
3.1.2 The feature classification approach
3.2 First-order histogram methods
3.3 Fourier spectra methods
3.4 Edge detection methods
3.5 Autocorrelation and decorrelation methods
3.6 Dependency matrix methods
3.7 Microstructure method
3.8 Singular value decomposition method
3.9 Fractal method
3.10 The variogram method
3.11 Final remarks on the image processing approach to texture analysis

Chapter 4 THE PROBLEM OF NATURAL TEXTURES
4.1 The attributes of natural texture
4.1.1 Defining natural texture
4.1.2 Micro- and macro-texture
4.1.3 The effect of scale on texture
4.1.4 The frequency and the orientation
4.2 Features for texture classification
4.2.1 How many features?
4.2.2 The invariant feature dilemma
4.2.2.1 The scale
4.2.2.2 The orientation
4.2.2.3 The illumination
4.2.2.4 The noise
4.2.3 The human-like features dilemma
4.2.4 The use of colour and multi-channel imagery
4.3 A sample set of textures
4.4 Choosing methods for describing and classifying texture
4.4.1 The Grey-Tone Spatial-Dependence Matrices method
4.4.2 A second-order statistic method
4.4.3 A frequency domain approach
4.5 Evaluating the different methods

Chapter 5 IMPLEMENTING METHODS OF TEXTURE ANALYSIS
5.1 The general setup
5.2 Common considerations and programming codes
5.2.1 Some considerations regarding the use of focal operators
5.2.2 The texture “edge” problem
5.2.3 The general structure of the programs
5.3 The Grey Tone Dependency Matrix approach
5.3.1 Computing and displaying the co-occurrence matrix for an image
5.3.2 Selecting features in the set defined by Haralick
5.3.3 Implementing the GDM
5.4 The variogram approach
5.4.1 Computing and displaying the semi-variogram of an image
5.4.2 Describing the semi-variogram function curve
5.4.3 The regular “lag banding” approach
5.4.4 The logarithmic “lag banding” approach
5.4.5 The “most significant dividing points” approach
5.5 The frequency domain approach
5.5.1 Computing and displaying the power spectrum
5.5.2 Applying frequency-banding filters
5.5.3 The Laplacian option
5.5.4 A note on the inverse Fourier transform
5.6 Summary of findings from the three approaches to texture analysis
5.6.1 Implementing the GDM approach
5.6.2 Implementing the VGM approach
5.6.3 Implementing the FFT approach

Chapter 6 TESTING TEXTURE ANALYSIS METHODS IN A HIGHLY CONTROLLED ENVIRONMENT
6.1 Introducing the highly controlled environment
6.2 Setting up the testing environment
6.2.1 A classification algorithm as a testing tool
6.2.2 Some assumptions regarding the use of a classification algorithm
6.2.2.1 The criterion of classification: Bayes’ rule
6.2.2.2 The normal form of Bayes’ rule and the assumption of normality
6.2.2.3 TEC as an error measure
6.2.2.4 The unbiased ideal classification result and the edge effect
6.2.2.5 Homogeneity of classes
6.2.3 Classify to associate, generalise, merge and separate
6.2.4 Choosing experiments
6.3 Pre-processing the data
6.3.1 Data types and data depth
6.3.2 Raw data vs normalised data
6.3.3 Rotation-invariant texture features
6.3.4 The notion of relevant features in feature selection
6.3.5 The optimal window size

Chapter 7 ASSESSING THE CLASSIFICATION POTENTIAL OF THREE TEXTURE ANALYSIS METHODS
7.1 Presenting the results and their significance
7.2 Experiment series #1: separating a set of very different textures
7.2.1 Experimental results
7.2.1.1 The GDM approach
7.2.1.2 The VGM approach
7.2.1.3 The FFT approach
7.2.2 General discussion of the results for experiment series #1
7.3 Experiment series #2: separating six sets of very similar textures
7.3.1 Experimental results
7.3.1.1 The Forest texture set
7.3.1.2 The Residential texture set
7.3.1.3 The Desert texture set
7.3.1.4 The Crops texture set
7.3.1.5 The Scrub texture set
7.3.1.6 The Waves texture set
7.3.2 Results of the discriminant analysis for the six texture sets
7.3.3 General discussion of the results for experiment series #2
7.4 Experiment series #3: separating a mixture of both different and similar textures and testing the capacity of good association
7.4.1 Experimental results
7.4.1.1 The GDM method
7.4.1.2 The VGM method
7.4.1.3 The FFT method
7.4.2 General discussion of the results for experiment series #3
7.4.2.1 The general classification of the T36 texture set
7.4.2.2 The results of the feature selection tests
7.4.2.3 The reclassification according to the generic classes
7.4.2.4 The classification results using partial training sets
7.5 Knowledge acquired in the three series of experiments
7.5.1 On the experiments
7.5.2 On generic texture descriptors

Chapter 8 TESTING THE INVARIANT QUALITIES OF TEXTURE ANALYSIS AND CLASSIFICATION METHODS
8.1 The problem of texture under varying environmental conditions
8.2 The experimental design for testing the invariant qualities of texture features
8.3 Brightness invariance
8.3.1 Varying the brightness
8.3.2 Classification results for brightness variations
8.4 Contrast invariance
8.4.1 Varying the contrast
8.4.2 Classification results for contrast variations
8.5 Noise invariance
8.5.1 Varying the noise
8.5.2 Classification results for noise variations
8.6 Rotation invariance
8.6.1 Varying the orientation
8.6.2 Classification results for orientation variations
8.7 Scale invariance
8.7.1 Varying the scale
8.7.2 Classification results for scale variations
8.8 Synthesising the results of the invariant qualities of the three methods

Chapter 9 SYNTHESIS AND CONCLUSION
9.1 Understanding and interpreting the results
9.1.1 The type of approach
9.1.2 The scope of the research
9.2 Outlining the new developments
9.2.1 The GDM method: a new look at an old tool
9.2.2 The VGM method: a new tool using a recent concept
9.2.3 The FFT method: a new tool using an old concept
9.3 Generic texture descriptors
9.3.1 High- vs low-contrast textures
9.3.2 Repetitiveness vs semi-repetitiveness
9.3.3 Singular vs multiple pattern
9.3.4 Natural vs artificial texture
9.3.5 Isotropic vs anisotropic textures
9.4 A test-and-compare method for texture analysis
9.4.1 Future improvements to the test-and-compare method
9.5 Towards a generic approach to classifying texture in high resolution images of the Earth
9.5.1 Number of features
9.5.2 Type of features
9.5.3 Window size
9.5.4 Directional information
9.5.5 An all-purpose texture classification approach?
9.6 Conclusion: final remarks

BIBLIOGRAPHY
Appendix A Software resources and programming environment
Appendix B The “erm.h” header file listing
Appendix C The “my_structure.h” header file listing
Appendix D The Grey Tone Dependency Matrix “c” program listing
Appendix E The Variogram method (VGM) “c” program listing
Appendix F The Fast Fourier Transform method (FFT) “c” program listing
Appendix G Complementary results
Appendix H Discriminant analysis results

LIST OF ILLUSTRATIONS

Figure 1.1 A simple illustration of how different material textures can be “experienced” by simply looking without touching.
Figure 2.1 The effect of brightness on texture.
Figure 2.2 The figure and ground principle.
Figure 2.3 Examples of receptive fields (RF) of the retinal ganglion cells.
Figure 2.4 Examples of Julesz’s texture pairs that were used in his neurophysiological experiments.
Figure 2.5 The textons. Two pairs of iso-second-order texture pairs are represented.
Figure 2.6 The multi-channel model of visual detection and perception.
Figure 2.7 The four-channel models. Three different versions of the four-channel model.
Figure 2.8 Graphical representation of the zero-crossing.
Figure 2.9 The two-dimensional zero-crossing.
Figure 2.10 Example of a land cover map produced from a SPOT satellite image.
Figure 2.11 The zero-crossing edge extraction performed on an aerial photograph and on a common close-range photograph.
Figure 3.1 The interpretation cues pyramid model.
Figure 3.2 Illustration of the mathematical gradient methods for edge extraction of a textured SPOT sub-image.
Figure 3.3 Sub-images of a forest and of ocean waves with their respective Fourier power spectra.
Figure 3.4 A texture field image containing 10 different textures sampled from real scanned digital air photos.
Figure 3.5 Graphic representation of the feature space of the Stromberg and Farr texture segmentation method.
Figure 3.6 The band/orientation-pass tessellation of the frequency domain proposed by Wilson and Spann.
Figure 3.7 Examples of natural texture images with perspective views of their respective autocorrelation functions.
Figure 3.8 Examples of natural texture images after filtering, with perspective views of their respective autocorrelation functions.
Figure 3.9 Examples of co-occurrence matrices for the four images shown in figure 3.6.
Figure 3.10 The Laws operators computed over four textures.
Figure 3.11 The result from a texture classification trial on the image in figure 3.10.
Figure 3.12 Fractal mountains.
Figure 3.13 Log/log plot of the fractal Brownian motion calculated over the two texture fields shown in figure 3.3.
Figure 3.14 The typical shape of the semi-variogram.
Figure 3.15 Example of a forest texture image and its computed semi-variogram and covariogram.
Figure 4.1 A texture of rolling hills covered with scrub.
Figure 4.2 The Fourier power spectrum and interpretation of the rolling hills scene of figure 4.1.
Figure 4.3 The initial set of textures (T6).
Figure 4.4 The T36 texture set.
Figure 4.5 Plot of the local variance against the resolution for the 36 texture samples shown in figure 4.4.
Figure 4.6 Plot of the variance against the resolution for the 36 texture samples shown in figure 4.4.
Figure 5.1 Methods for dealing with the border problem when using focal operators.
Figure 5.2 Illustration of the biased and unbiased texture maps that will be used to compare the results of classification.
Figure 5.3 The co-occurrence matrix for texture sample 1: Forest.
Figure 5.4 The co-occurrence matrix for texture sample 2: Residential.
Figure 5.5 The co-occurrence matrix for texture sample 3: Desert.
Figure 5.6 The co-occurrence matrix for texture sample 4: Agriculture.
Figure 5.7 The co-occurrence matrix for texture sample 5: Scrub.
Figure 5.8 The co-occurrence matrix for texture sample 6: Waves.
Figure 5.9 The original Forest image and its two-dimensional semi-variogram; four directional semi-variograms for the same texture sample.
Figure 5.10 The original Residential image and its two-dimensional semi-variogram; four directional semi-variograms for the same texture sample.
Figure 5.11 The original Desert image and its two-dimensional semi-variogram; four directional semi-variograms for the same texture sample.
Figure 5.12 The original Agriculture image and its two-dimensional semi-variogram; four directional semi-variograms for the same texture sample.
Figure 5.13 The original Scrub image and its two-dimensional semi-variogram; four directional semi-variograms for the same texture sample.
Figure 5.14 The original Waves image and its two-dimensional semi-variogram; four directional semi-variograms for the same texture sample.
Figure 5.15 The effect of averaging the semi-variance function.
Figure 5.16 The logarithmic scale used for dividing the semi-variogram.
Figure 5.17 The result of applying the logarithmic banding approach to the semi-variogram of figure 5.14.
Figure 5.18 The illustration of the most significant dividing points algorithm.
Figure 5.19 Illustration of the successive one-dimensional Fourier transforms on a sample image.
Figure 5.20 The 0 and 90° power spectra of the six texture samples.
Figure 5.21 A grey-tone transect of the residential texture sample.
Figure 5.22 The scrub texture sample on which a gross interpretation of the lineaments has been superimposed.
Figure 5.23 Graphical representation of the filter bank.
Figure 5.24 The 0 and 90° power spectra of the six texture samples to which a Laplacian filter has been applied.
Figure 6.1 Representation of the process of “blind” systematic selection of the training areas.
Figure 6.2 An image representation of the “ideal” classification result for the initial texture set.
Figure 6.3 The TSC classification score vs the visual classification score.
Figure 6.4 Illustration of the processes of biased and unbiased statistical comparison of the classification results.
Figure 7.1 Classification results from T6gdm32_ms.
Figure 7.2 T6_vgm_srpd32: classification results using the logarithmic-like lag banding approach and the SRPD instead of the semi-variance.
Figure 7.3 T6_vgm_srpd_dt32: classification results using the logarithmic-like lag banding, the SRPD instead of the semi-variance and a distance-weighted table.
Figure 7.4 A graphical comparison of the best rotation-invariant classification results from all three methods.
Figure 7.5 The six texture sets used to test the “separating” potential of the texture analysis methods.
Figure 7.6 An image representation of the “ideal” classification result for the texture set.
Figure 7.7 Graphical results of the classification of T36 using the GDM method with two lags.
Figure 7.8 Representation of the organisation of the texture patches according to the generic class to which they belong.
Figure 7.9 Graphic results of the classification of T36 using the VGM method.
Figure 7.10 Graphic results of the classification of T36 using the FFT method.
Figure 7.11 Best rotation-invariant classification results from all three methods.
Figure 7.12 The effect of contrast on the Forest texture patch.
Figure 7.13 Grey tone profiles over texture patches showing the “peculiar” shape of spatial functions in artificial textures.
Figure 8.1 Representation of the brightness shift applied to the histogram of an image.
Figure 8.2 Representation of the effect of brightness shift on a texture set.
Figure 8.3 Classification results for a brightness shift of -25.
Figure 8.4 Classification results for a brightness shift of +25.
Figure 8.5 Representation of the effect of contrast change on a texture set.
Figure 8.6 Classification results for a contrast decrease of 10%.
Figure 8.7 Classification results for a contrast increase of 10%.
Figure 8.8 Representation of the effect of additive noise on a texture set.
Figure 8.9 Classification results for adding Gaussian noise of σ = 15.
Figure 8.10 Classification results for adding Gaussian noise of σ = 30.
Figure 8.11 Classification results for adding uniform noise of σ = 25.
Figure 8.12 Classification results for adding uniform noise of σ = 50.
Figure 8.13 Classification results for rotating the T6 texture set by 90°.
Figure 8.14 Classification results for rotating the T6 texture set by 45°.
Figure 8.15 Classification results for rotating the T6 texture set by 22.5°.
Figure 8.16 Representation of the effect of regularisation on the T6 texture set.
Figure 8.17 Classification results for direct scaling of 125%.
Figure 8.18 Classification results for direct scaling of 75%.
Figure 8.19 Classification results for a regularisation of 66%.
Figure 8.20 Classification results for a regularisation of 50%.

LIST OF TABLES

Table 4.1 The number of texture features in some of the major texture classification methods.
Table 5.1 The general structure of the texture analysis programs.
Table 5.2 List of the GTDM features used in some texture analysis studies.
Table 5.3 Average and standard deviation of the six features measured from the lag-three co-occurrence matrices of the original set of six textures.
Table 5.4 Rank analysis of the average values of table 5.3.
Table 5.5 A comparison of the FT and FFT algorithms in terms of the number of complex multiplications and additions for 1024 sampling points.
Table 5.6 Definition of the preset filters’ names, band centres and Gaussian widths.
Table 6.1 Example of a confusion matrix derived from the comparison of a classification result with the “ideal” classification.
Table 7.1 Confusion matrix from the classification of T6gdm32_ms.
Table 7.2 The result of applying the “knock-out” algorithm at the lag level.
Table 7.3 The result of applying the “knock-out” algorithm at the measurement type level.
Table 7.4 Classification results from the three VGM algorithms.
Table 7.5 Confusion matrix for the T6vgm_srpd texture feature file.
Table 7.6 Test of normality for the semi-variance, SRPD and normalised SRPD texture features.
Table 7.7 Classification results from the normalised SRPD dataset.
Table 7.8 Classification results from the various rotation-invariant algorithms applied to the SRPD dataset.
Table 7.9 The result of applying the “knock-out” algorithm at the feature type level on the combination of rotation-invariant SRPD features.
Table 7.10 The result of applying the “knock-out” algorithm at the lag level on the combination of rotation-invariant SRPD features.
Table 7.11 Preliminary classification results from the use of different numbers of neighbouring lines for the creation of the FFT texture feature set.
Table 7.12 Classification results from the three FFT algorithms.
Table 7.13 Confusion matrix for the T6fft08 texture feature file.
Table 7.14 Test of normality for some of the features of the T6fft08 file.
Table 7.15 Classification results from the various rotation-invariant algorithms applied to the T6fft_08 dataset.
Table 7.16 The result of applying the “knock-out” algorithm at the feature type level on the combination of rotation-invariant FFT features.
Table 7.17 The result of applying the “knock-out” algorithm at the lag level on the combination of rotation-invariant FFT features.
Table 7.18 Classification scores (TSC only) from the classification of five texture sets.
Table 7.19 Results of classification tests on the Forest texture set with one, two and three different lags respectively.
Table 7.20 Overall classification results from the VGM approach.
Table 7.21 Overall classification results from the FFT approach.
Table 7.22 Best rotation-invariant classification results obtained from all three methods of texture feature creation with the Forest texture set.
Table 7.23 Results of classification tests on the Residential texture set with one, two and three different lags respectively.
Table 7.24 Overall classification results from the VGM approach.
Table 7.25 Overall classification results from the FFT approach for the Residential texture.
Table 7.26 Best rotation-invariant classification results obtained from all three methods of texture feature creation with the Residential texture set.
Table 7.27 Results of classification tests on the Desert texture set with one, two and three different lags respectively.
Table 7.28 Overall classification results from the VGM approach for the Desert texture set.
Table 7.29 Overall classification results from the FFT approach for the Desert texture.
Table 7.30 Best rotation-invariant classification results obtained from all three methods of texture feature creation with the Desert texture set.
Table 7.31 Results of classification tests on the Crops texture set with one, two and three different lags respectively.
Table 7.32 Overall classification results from the VGM approach for the Crops texture set.
Table 7.33 Overall classification results from the FFT approach for the Crops texture.
Table 7.34 Best rotation-invariant classification results obtained from all three methods of texture feature creation with the Crops texture set.
Table 7.35 Results of classification tests on the Scrub texture set with one, two and three different lags respectively.
Table 7.36 Overall classification results from the VGM approach for the Scrub texture set.
Table 7.37 Overall classification results from the FFT approach for the Scrub texture.
Table 7.38 Best rotation-invariant classification results obtained from all three methods of texture feature creation with the Scrub texture set.
Table 7.39 Results of classification tests on the Waves texture set with one, two and three different lags respectively.
Table 7.40 Overall classification results from the VGM approach for the Waves texture set.
Table 7.41 Overall classification results from the FFT approach for the Waves texture.
Table 7.42 Best rotation-invariant classification results obtained from all three methods of texture feature creation with the Waves texture set.
Table 7.43 Compilation of the most important texture features for the first two discriminant functions according to the three methods and the six texture sets of experiment series #2.
Table 7.44 Compilation of the GDM texture feature occurrences in the first two discriminant functions in the canonical discriminant analysis.
Table 7.45 Compilation of the VGM texture feature occurrences in the first two discriminant functions in the canonical discriminant analysis.
Table 7.46 Compilation of the FFT texture feature occurrences in the first two discriminant functions in the canonical discriminant analysis.
Table 7.47 Rank analysis of the most correctly classified texture patches for each texture set and for each method.
Table 7.48 Results of classification tests on the T36 texture set with one, two and three different lags respectively.
Table 7.49 The result of applying the “knock-out” algorithm on the GDM features at the measurement type level.
Table 7.50 Good association analysis through reclassification of the T36 texture set using the GDM method with two lags.
Table 7.51 Compiled indiscriminate classification results of the generic classes using twelve different subsets of training samples.
Table 7.52 Overall classification results from the VGM approach for the T36 texture set.
Table 7.53 The result of applying the “knock-out” algorithm at the lag level on the orientation-specific SRPD feature set of 24.
Table 7.54 The result of applying the “knock-out” algorithm at the lag level on the rotation-invariant SRPD feature set of 18.
Table 7.55 Good association analysis through reclassification of the T36 texture set using the VGM method with five lags.
Table 7.56 Good association analysis through reclassification of the T36 texture set using the VGM method with six lags.
Table 7.57 Compiled indiscriminate classification results of the generic classes using the VGM method with twelve different subsets of training samples.
Table 7.58 Overall classification results from the FFT approach for the T36 texture set.
Table 7.59 Result of applying the “knock-out” algorithm at the lag level on the orientation-specific FFT feature set of 24.
Table 7.60 Result of applying the “knock-out” algorithm at the lag level on the rotation-invariant FFT feature set.
Table 7.61 Good association analysis through reclassification of the T36 texture set using the FFT method with the five highest frequencies.
Table 7.62 Good association analysis through reclassification of the T36 texture set using the FFT method with six frequencies.
Table 7.63 Compiled indiscriminate classification results of the generic classes using the FFT method with twelve different subsets of training samples.
Table 8.1 Classification results after “knocking” one measurement type at a time from the GDM feature set.
Table 8.2 Classification results after “knocking” one measurement type at a time from the GDM set for brightness shifts.
Table 8.3 Best classification scores after performing feature selection on the texture feature sets.
Table 8.4 Cross-tabulation synthesis of the invariant qualities of all three methods.

Chapter 1

INTRODUCTION

1.1 THE IMPORTANCE OF TEXTURE IN IMAGES OF THE EARTH

Photo-interpretation is made possible by the simultaneous presence of various visual cues associated with the numerous objects and regions in a scene: shape, size, colour and tone, texture, arrangement, shadows and stereoscopic effect (Olson, 1960; Gagnon, 1974). As scientists and researchers around the world began to use satellite imagery of ever-finer resolution, airborne spectral imaging system data and scanned aerial photography, it soon became clear that spectral-based methods of computer classification and segmentation were doomed to yield unsatisfactory results. While most natural objects found in a scene can be described as having an average spectral response, the spectral measurements often vary from pixel to pixel. These variations are often stationary, but they frequently give rise to an apparent regular spatial pattern referred to as texture (Kittler, 1983). Spectral information alone did not permit a satisfactory differentiation of the ever-increasing number of geographical objects that could easily be identified by expert eyes on these digital products. One of the key elements that interpreters use to identify, analyse and report was clearly the spatial arrangement of colour and tone that forms natural visual entities: visual texture (Pickett, 1970; Haralick et al., 1973; Pratt et al., 1978; Tamura et al., 1978). As a consequence, the problem of "texture-based computer classification" attracted the attention of a large community of researchers and end-users, myself included.

Texture, with shape and size, is one of the second-order elements in the interpretation pyramid model, preceded only by tone and colour (Estes et al., 1983). The task of image interpretation by computer first began by exploiting the tonal and colour information contained in an image using pixel-by-pixel segmentation and classification of the grey level (most particularly in a multispectral dimension), but this was a very simplistic and dull vision of the natural objects contained in a scene. As Pentland (1984) puts it, this was a little like Plato's notion of real forms, e.g. spheres, cylinders and cubes, to describe three-dimensional shapes. It was then only natural that researchers began to turn towards higher-order interpretation elements to get better results in their computer-assisted interpretation process. Texture, on the second level of complexity, came as a natural choice. There are still two higher levels of interpretation, which serves to show that there is still a long way to go before reaching a "human-like" interpretation machine. Texture is considered a low-level scene analysis task, which means that higher levels of image analysis cannot be successfully undertaken until a good comprehension of texture and other low-level image analysis tasks is achieved (Briot and St-Vincent, 1985; Arbib and Hanson, 1987).

Before going any further, it is appropriate to give an early definition of what exactly is intended by the word "texture". Texture was originally thought of as a tactile sensation associated with cloth materials, hence the association with textile (Pickett, 1970; Pratt, 1978). It is, however, the concept of "visual texture" that will be used here, although the "visual" will generally be omitted. According to Pratt (1978), the visual texture of natural objects can be thought of as a semi-ordered arrangement of grey tones or colours (pixels) that gives an "impression" of relief such as the one perceived by our fingertips running over the surface of a textile. Hawkins (1970) described texture as depending upon 1) some local "order" of 2) elementary parts having 3) approximately the same size. In this thesis, however, a more flexible definition of texture will be used, one that more or less matches the definition given by Pratt (1978). Figure 1.1 illustrates this simple concept through a visual demonstration. The problem of texture definition will be thoroughly explored in Chapter 4 (and to some extent in Chapters 2 and 3). This simple definition should suffice for this introductory chapter.


Figure 1.1. A simple illustration of how different material textures can be “experienced” by simply looking without touching.

It is also important to point out that any approach to texture analysis requires much more complex algorithms since it deals with three-dimensional space (rather than the individual pixel or pixel vector) and, as a result, uses a significantly larger amount of computation power. This implies that tests that can now be undertaken using a 750 MHz CPU were much more difficult to thoroughly investigate just twelve years ago with a 25 MHz CPU. This does not mean that the solutions to computer-assisted interpretation rely solely on computer hardware but only that techniques that had already been developed some years ago can now be implemented and tested much more easily.

One of the objectives of this project will be to apply some of the major techniques of texture analysis to high resolution digital images. Aerial photography was chosen mainly for its availability and its high level of detail, which translates into a very broad diversity of textures. Although high resolution satellite imagery is now being acquired by the new generation of Earth-observation satellites (e.g. Ikonos, launched in 1999, with ground resolutions of one metre in panchromatic mode and four metres in multispectral mode), such images have a higher cost, and since a battery of images would have been necessary to cover a broad range of textures, the implied cost was considered prohibitive. Although true digital imagery would have been preferable, the difference has generally been considered negligible by numerous authors (Haralick, 1973; Tamura et al., 1978; Pentland, 1984; Lark, 1995). It is also expected that, in the process, a better understanding of the behaviour of these methods when used with such high resolution images (and eventually all remote sensing products) will be acquired, and that this new knowledge will translate into an easier and more efficient use of the computer for texture analysis and classification.

1.2 CURRENT STATE OF RESEARCH

Describing the current state of research in the study of visual texture as a whole is not an easy task. With the tremendous growth in computer automation, many different fields of science and technology have shown a growing interest in visual texture for its potential in computer vision; these include industrial automation, medical imaging, astrophysics, planetary exploration, biological engineering, remote sensing, and many others. Many of these fields of study have carried out their own research and experiments on visual texture. The result is many different approaches being applied to just as many different applications.

This multitude of approaches has yielded many different methods, which in turn have prompted several classifications of these methods. Various authors have proposed classification schemes for texture analysis methods (e.g. Lipkin and Rosenfeld 1970, Haralick 1979, Rosenfeld and Kak 1982, Pratt 1991). More recently, Reed and du Buf (1993) made an extensive review of recent (since 1980) texture segmentation and feature extraction techniques. They note that most of the development has concentrated on feature extraction methods, which seek to extract relevant textural information and map it onto special dedicated channels called features. The authors classified the various feature extraction methods as belonging to one of three possible classes: feature-based, model-based or structural.

In feature-based methods, some characteristics of texture (such as orientation, spatial frequency or contrast) are used to classify homogeneous regions in an image. Model-based methods rely on the hypothesis that an underlying process (such as a Markov chain or a fractal) governs the arrangement of pixels, and they try to extract the parameters of that process. Model-based methods can be regarded as a sub-class of the feature-based approach, since the model parameters are usually mapped onto features. Structural methods assume that a texture can be expressed by the arrangement of some primitive element according to a placement rule. Feature-based, model-based and hybrid methods have overwhelmingly dominated the scene in the last twenty years or so. One of Reed and du Buf's findings was that although so many different methods have been developed, no rigorous quantitative comparison of their results had ever been made. One possible reason for this lies in the simple fact that no strict methodology for performing such comparisons exists. Both these gaps will be addressed in the course of this thesis.
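To make the feature-based idea concrete, the short sketch below maps a single texture characteristic, local contrast, onto a feature channel by computing the grey-level variance inside a square window centred on each pixel. It is only an illustrative sketch in C (modern C99 conventions rather than the Borland environment of Appendix A); the toy image, the window radius and the border handling are assumptions, and the actual feature extraction routines are developed in Chapter 5:

#include <stdio.h>

/* Illustrative "feature-based" sketch: map local contrast onto a
 * feature channel by computing the grey-level variance inside a
 * (2r+1) x (2r+1) window centred on each pixel.  The window is
 * clamped at the image borders, one of the focal-operator border
 * strategies discussed in Chapter 5. */
static void contrast_feature(const unsigned char *img, float *feat,
                             int w, int h, int r)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int x0 = x - r < 0 ? 0 : x - r, x1 = x + r >= w ? w - 1 : x + r;
            int y0 = y - r < 0 ? 0 : y - r, y1 = y + r >= h ? h - 1 : y + r;
            double s = 0.0, s2 = 0.0;
            int n = (x1 - x0 + 1) * (y1 - y0 + 1);
            for (int j = y0; j <= y1; j++)
                for (int i = x0; i <= x1; i++) {
                    double g = img[j * w + i];
                    s  += g;
                    s2 += g * g;
                }
            double m = s / n;
            feat[y * w + x] = (float)(s2 / n - m * m); /* variance = contrast */
        }
}

int main(void)
{
    /* A 4 x 4 toy image: smooth left half, "textured" right half. */
    unsigned char img[16] = { 100, 100,  40, 220,
                              100, 100, 220,  40,
                              100, 100,  40, 220,
                              100, 100, 220,  40 };
    float feat[16];
    contrast_feature(img, feat, 4, 4, 1);
    for (int k = 0; k < 16; k++)
        printf("%8.1f%s", feat[k], (k % 4 == 3) ? "\n" : " ");
    return 0;
}

In a complete feature-based system, this contrast channel would be joined by others (orientation, spatial frequency, and so on) to form the feature space on which segmentation or classification operates.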

An extensive review of the work that has been carried out on visual texture will be presented in Chapters 2 and 3. In Chapter 2, the review will assess the work on texture as a psycho-physiological phenomenon that plays an important role in our visual perception. Chapter 3 will focus on the computational aspects of texture analysis and on the various approaches that have been implemented, from pioneering work to the most recent.

1.3 THEORETICAL BACKGROUND

Any study involving digital images has first to describe the type of image being used. Strahler et al. (1986) give good insights into the parameters that control the characteristics of images in remote sensing. They state that any remote sensing model is composed of a sensor model, a scene model and an atmospheric model. In this study, only sensors working in the visible spectrum will be considered; however, they can be either digital or analog (the latter implying a further scanning process). The atmospheric model will not be assessed in this project. The image type or scene model will be discrete in nature, which implies, depending on the type of sensor used, the possibility of abrupt grey level changes between objects (which would not be the case with magnetic measurements, for example). Only H-resolution scenes will be used, to ensure the investigation of well-formed visual textures. An H-resolution scene model is described as having a resolution cell size smaller than the size of the object being observed, whereas an L-resolution model represents the opposite situation (Strahler et al., 1986). L-resolution models will only be considered a degradation of the former type. This scene model aspect is not, however, absolute, in the sense that a particular resolution cell might be characterised as L-resolution for some objects and H-resolution for others, even in the same image. Some importance will be given to this aspect, which appears to be a controlling factor of the texture of objects.

The approach here is empirical, and no attempt will be made to control the parameters that produce the texture for a particular object or collection of objects. In short, the images will be approached in a similar manner as the photo-interpreter would: with some experience of the visual aspects of the objects but without any control over the formation processes. To illustrate this, one may take the example of a forest: since it is known that an image of the forest canopy is formed by an alternation of light and dark patches produced by the combination of the angle of incidence of the sun and the tree crown shape, one could easily be led to use a model of tree shape and tree arrangement (which correspond roughly to a primitive element and a placement rule respectively) and an illumination model to identify and classify the different textures in an image comprised mainly of forest. This kind of approach would require that such a model exist for every type of conceptual object found in a scene. Another approach, which will be favoured here, treats any texture simply as a combination of certain characteristics, such as spatial frequency or contrast, and uses these for the classification of the textures that are considered meaningful for a specific application. The next two chapters will try to give the theoretical background needed to pursue the study of texture, first as a visual perception phenomenon and then as a computational problem.

1.4 THE BENEFITS OF THE TEXTURAL APPROACH

It is important to approach the study of texture in aerial photographs not only from a problematic point of view (for instance, the classification problem pointed out in the introduction) but also in terms of the benefits it will bring. The problem-solving approach is straightforward and seeks the simplest solution to a given problem without much concern for the means. Benefits have a broader scope and are usually evaluated from the new knowledge they bring and from a time-span point of view: short-, medium- or long-term benefits. Benefits can be foreseen, expected, hoped for or even unknown; a solution can only be effective or not. Here, two types of benefits will be distinguished: direct and indirect.

The simple fact of acquiring a better understanding of texture in airborne or space-borne images is in itself a direct benefit; however, such a broad benefit is too generic and does not say much about the specific scope of the thesis. The most straightforward direct benefit of using a textural approach in image analysis is the expected improvement in image segmentation and classification gained by adding textural information to the existing spectral-based methods. "Improvement" is a somewhat subjective concept, so it is important to describe the kind of improvement that is expected.

• Total Errors of Classification (TEC). Minimising TEC is the general principle behind the Bayesian classification rule and is generally regarded as the optimal classification criterion (James, 1985). In most studies of classification methods, TEC is used to compare the quality of results. Its relative decrease is therefore regarded as a direct improvement. In this thesis, TEC's complement, the Total Success of Classification (TSC), will generally be preferred (a minimal worked sketch follows this list).

• More classes. Incorporating "texture space" into classical classification methods should reduce confusion between classes and increase separability. This should in turn allow more classes to be efficiently identified. It would also make it possible to include "new" classes by separating spectral classes into spectral-textural classes: for instance, distinguishing different types of plantations by their structure, or different rock formations having the same lithology but different formation processes.

• Better generalisation. Most image processing applications require some level of generalisation. Amongst the various types of generalisation, spatial generalisation is perhaps the most important in image processing. As texture can only be studied using regions (as opposed to single pixels), it can be expected that texture analysis will carry an embedded generalisation that will reduce undesirable variability. It is also probable that this generalisation inherent to texture analysis could be useful before, during or after the segmentation / classification process, depending on the technique used.

• A method for comparison. It will also be a direct benefit of this thesis to propose a methodology for evaluating and, more importantly, comparing the results obtained through different approaches. The many aspects of method-testing will range from data selection to behaviour assessment under varying conditions (rotation, scale, contrast, etc.). The research in this thesis deals extensively with the problem of comparison, and a Comparison and Testing Methodology will be proposed.

• Fewer spectral bands. If the textural information is found to significantly increase the confidence of classification results and the number of classes, then fewer spectral bands could be needed for a particular desired level of classification, which would result in simpler spectral models and more economical image acquisition.

etc.). The research involved in this thesis deals greatly with the problem of comparison and a Comparison and Testing Methodology will be proposed. • Less spectral bands. If the textural information is found to increase significantly the level of confidence of classification results and the number of classes, then less spectral bands could be needed for a particular desired level of classification which would result in simpler spectral models and a more economical image acquisition.

Two major indirect benefits are foreseen as a consequence of using a textural approach in image analysis and understanding.

• Higher reliability of digital methods of classification. Although digital classification methods have been used in Remote Sensing for over twenty years, many end-users still do not "trust" these methods to provide reliable maps. Even when sophisticated image processing software is available, the accuracy of results from "classical" spectral classification methods is often below acceptable error tolerance figures, and the maps produced in this manner require too much reworking to produce a visually acceptable map product. It is hoped that a textural approach will help solve both these problems, at least partially. Even a ten percent increase in accuracy would probably make many users shift to digital techniques of segmentation and classification. The fact that all texture applications are necessarily region-based should also significantly improve the visual character of classification results. Both these factors should in turn increase the overall reliability and acceptance of digital methods of image understanding.

• Towards a comprehensive texture model. As will be shown in Chapter 2, a great deal is already known about the processing of visual information by the visual cortex, but how exactly all that information is utilised to form a conceptual texture is still largely unknown. It will therefore be difficult to develop a comprehensive texture model as long as its mechanics are not fully understood. The work in this thesis concerns itself with natural visual textures occurring in airborne images of the Earth, which can be considered "difficult cases" of texture since they do not usually show the regularity observed in artificial or "man-made" textures. The use and comparison of different approaches and techniques to identify and classify a broad range of natural textures occurring in these images is in itself a valuable contribution. The comparison of the results obtained from both existing and new methods and from human interpretation should provide good insight into how texture is perceived by these different "eyes". Hopefully this work will also shed some light on how these textures are interpreted by the eye of the expert: what essential ingredients need to be used in either separating or associating the wide variety of textures encountered in these images. Furthermore, a good understanding of texture will enable researchers to pursue higher levels of computer interpretation, using cues of a higher conceptual complexity such as shape, size, shadows or even association.

1.5 THE OBJECTIVES

Although many of the concepts referred to in this section have not yet been introduced, it seems appropriate to lay out clearly, at an early stage, the intentions of the author of this thesis. The reader is referred to Chapters 2 and 3 for an explanation of the concepts. The principal objective of the research project can be defined as follows:

To investigate and compare the major mathematical tools available for texture analysis and to propose a generalised computerised method for texture classification in digital images of the Earth.

It could be argued that other authors have already attained such an objective, or, again, that it has an absolute connotation that is not appropriate for a research project. For these reasons, this basic objective has been broken down into a series of goals that will be easier to grasp and refer to. Each of these goals is believed to bring some new element to the general knowledge about visual texture. Although they are presented here as separate entities, these goals are really intertwined and will not necessarily be assessed separately. Four goals have been identified in the scope of this project:


Goal n°1: To implement new tools, test and compare them against a more traditional approach. These tools should be computationally efficient and have some relation with psychophysical evidence about human visual texture perception.

Goal n°2: To use these tools to understand the mechanisms of texture recognition and to build a generic texture descriptor made of essential texture information occurring in digital airborne images. Such a descriptor should outline the generic characteristics of texture and not be tied to a particular method.

Goal n°3: To outline and implement a systematic method for the test and comparison of results obtained from different texture analysis methods. The method should attempt to assess all the important issues of texture segmentation and classification.

Goal n°4: To outline a generic approach for classifying textured objects encountered in images of the Earth using texture primitives. This approach is expected to take advantage of all the new knowledge acquired, together with the new findings from each method tested, and to combine it with the knowledge of texture recognition as a human perception phenomenon.

Each of these goals will be explained in the following paragraphs.

1.5.1 Goal n°1: new tools.

It is part of the operational hypothesis of this thesis that new tools can be developed that might improve the results of texture analysis in images of the Earth, or that existing methods can be improved to better "fit" the specific characteristics of these images. These improvements concern not only the direct results but also the general understanding and perception of texture, along with some considerations about computational efficiency and optimisation. It is not my intent to put forward an entirely new method using a new mathematical concept, but rather to investigate some of the most commonly used mathematical concepts for texture analysis and to suggest new tools based on a different approach to the problem. As for the use of existing methods, it appears to me that better results and a better understanding can be obtained through a fresh approach to the problem (one that concentrates exclusively on textures in aerial photographs) and, most importantly, through thorough testing.

Another important aspect of this first goal is that the methods chosen will have some relation to what is currently known about the human visual system in terms of texture analysis and recognition. It is important to specify here that this is not a study of Psychophysics, nor does it make any pretence at solving the mysteries of human visual perception; rather, a serious attempt will be made to exploit the neurological evidence that supports what is now known about human vision. This is important in that the objective of this study is not only to propose some method to carry out texture segmentation but also to provide a better understanding of the texture present in aerial photographs and of how the interpreter processes such textural information. A number of considerations will be involved in trying to use this knowledge:

• A multi-channel¹ system will be "simulated" through the use of various features, each representing a different visual aspect (Sekuler, 1974; Harvey and Gervais, 1978; Caelli, 1982).

• The frequency and orientation aspects will be processed independently, as appears to be done in the human brain (Meese and Freeman, 1995).

• The texture features will also (but not only) be chosen for their ability to correspond to important aspects of our perception of texture, like contrast, orientation and spatial frequency.

Unfortunately, as very little is known on how the information from the various channels is actually processed and classified by the brain, the selected features will be treated as a 1

The multi-channel term refers to a model of human vision originally proposed by

Campbell and Robson (1968) which states that different frequencies and orientation in visual texture are processed separately by our visual cortex. For further details, see section 2.4 and figure 2.6. 11

multi-dimensional space and a traditional statistical segmentation method will be used to classify the textures analysed.
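As a concrete illustration of this last point, the sketch below (in Python, used throughout this document purely for illustration; the feature values and class means are hypothetical) treats each pixel as a vector in a multi-dimensional feature space and assigns it to the nearest class mean, one of the simplest traditional statistical segmentation schemes:

```python
import numpy as np

def minimum_distance_classify(features, class_means):
    """Assign each pixel's texture feature vector to the nearest class mean.

    features    : (n_pixels, n_features) array, one feature vector per pixel
                  (e.g. a contrast, an orientation and a frequency measure).
    class_means : (n_classes, n_features) array estimated from training areas.
    """
    # Euclidean distance from every feature vector to every class mean
    dists = np.linalg.norm(features[:, None, :] - class_means[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Hypothetical example: three texture "channels", two classes
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))
class_means = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
labels = minimum_distance_classify(features, class_means)
```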

1.5.2 Goal n°2: a generic texture descriptor.

The aim of the project as a whole is not simply to propose some algorithm that performs texture analysis and segmentation but also to increase knowledge about texture, most particularly the texture encountered in vertical high resolution images of the Earth. It becomes important, then, to be able to classify those textures according to specific characteristics in such a way that they can be either associated with related textures or not, depending on the particular situation. For example, if some constructed area has not been accounted for in a classification training session, it is hoped that the segmentation process will associate it with another constructed area type and not with a totally different conceptual object, like an orchard. Building an index of key features and rating them might help control such segmentation behaviour.

Building that key will also help in assessing the potential as well as the limitations of the methods. As another example, consider the simple attribute of some texture measure being scale-invariant. Aerial photographs come in all scales, but a forest texture can be recognised equally well over a broad range of scales; the texture description of that same object at a different image scale might be very different, yet it should nonetheless describe the same conceptual object: a forest. The invariant feature problem will also play an important part in Goal n° 3.

The ordering of texture samples according to various criteria is also the only way to compare the performance of the algorithms with that of human interpreters. It will also help to delineate the realm of textures encountered in aerial scenes.

1.5.3 Goal n° 3: a systematic method for test and comparison.

As pointed out earlier, there is a lack of comprehensive studies comparing results obtained from the various approaches to texture recognition; one probable reason lies in the fact that no systematic method has ever been suggested. It would represent a Herculean task to compare all the methods that have been developed so far, but some significant advances could be achieved by selecting some of the most popular and different approaches. In selecting different mathematical tools to perform texture segmentation, the main concern was to select tools different enough from each other to enable a significant comparison. More knowledge can be gained from comparing dissimilar methods than similar ones, at least as a first effort. The aim of this comparison is not so much to find the "best" but, most importantly, to find in which situations these tools perform better and to suggest why, based on scientific observation. This kind of approach will also ensure that the comparison scheme is not specific to one kind of approach but could easily be applied to any other method.

Although there are numerous ways by which these various tools can be compared and evaluated, it is the "cartographic potential" that will prevail in this thesis, which can be described as their ability to ignore insignificant details (in terms of extent) and perform cartographic generalisation. The efforts will be directed at evaluating the performance of the different approaches in the most varied yet realistic situations (in terms of the cartographic potential of images of the Earth). In the process, not only will the behaviour and potential of these methods be assessed, but the comparison method itself will also be evaluated and a testing and comparison methodology will be presented. Some of the considerations will be:

• The capacity to distinguish similar and dissimilar textures.

• The capacity to associate similar textures.

• The invariant potential of features and methods under changing conditions.

• The correct segmentation of texture edges.

1.5.4 Goal n° 4: a generic approach to classifying textured objects.

Goal n° 4 is much less specific than the other three. What is intended through this goal is to outline what kind of tools and knowledge are necessary for an end-user to perform some sort of segmentation / classification of textures present in digital aerial images. Just as it is necessary to understand the physics of light reflection and absorption in the different parts of the electromagnetic spectrum to do a spectral classification of an image, it is essential to understand how texture is perceived by the human brain and how this essential information about texture can be mapped into what is loosely called texture features. It is also important to understand the mechanics of the mathematical tools being used, their advantages, their weaknesses and how they relate to the neurophysiological and psychophysical evidence gathered about our sensory perception of visual texture.

It is also part of the hypothesis of this thesis that the results of any texture segmentation can be greatly improved not only by having the best tool but also by making proper use of that tool. It is my belief that a pre-analysis of the textures contained in an image or a set of images can provide all the necessary information on how a particular texture segmentation should be performed. Furthermore, tools can be developed that will carry out such a pre-analysis and further ease the texture segmentation task. An insight into the possible development of such a tool is also part of the scope of this thesis.

1.6 ORGANISATION OF THE THESIS

First, an overview of visual perception as a psycho-physiological phenomenon will be given to provide this research with a structured framework and a "human vision" related approach. Although this approach might not be optimal, it can stand as a basis for understanding visual texture. Then the literature on texture analysis theory will be reviewed to highlight the current state of research and give the reader the theoretical background that was used within the scope of this project. Next, in Chapter 4, the particular problem of texture analysis of natural textures occurring in aerial photographs will be assessed from a variety of aspects in an attempt to restrict the number of approaches and techniques that are available. In Chapter 5 the methods used and the programs and routines that were developed will be thoroughly described and explained, with some preliminary testing on texture samples. In Chapter 6, the general testing methodology, along with the tools used to carry it out and the various assumptions, will be presented and thoroughly explained. Chapters 7 and 8 will be dedicated to testing and evaluating the results of the various approaches through a series of experiments. In the first of these two chapters, the testing will assess general as well as very specific aspects of texture classification, such as its capacity to separate or associate different textures. Chapter 8 will be entirely dedicated to the behaviour of the methods under changing environmental conditions and their capacity to remain invariant under the most important of these types of changes.

These tests will be carried out in a highly controlled image environment which will be selected to maximise understanding and minimise the production of context-dependent results.

Chapter 9 will be dedicated to further assessing the results obtained in Chapters 7 and 8 and, in a more specific way, to Goal n° 4: a generic approach to texture segmentation. In this chapter, all the experience gained will be used to suggest a generic methodology and a set of tools to assist the end-user in gaining enough information on the potential and limitations of performing a texture segmentation in any situation. The chapter will conclude by assessing further work to be done and will give an insight into what has been accomplished in view of the objectives that were set out.


Chapter 2

TEXTURE AND THEORIES OF PERCEPTION

2.1 THEORIES OF PERCEPTION

Theories of perception are embedded in the science of Experimental Psychology, which is the "application of laboratory techniques to the investigation of mind and behaviour, including such subjects as perception, memory, thinking, learning and problem solving" (Beer, 1994, no page number). It is fascinating to observe how physiological and psychological explanations of visual perception have contributed to the problems of image processing, as it is easy to see that most current image processing techniques make use of some psychophysical theory. No less interesting is the observation that Digital Image Processing, Computer Vision and Computer Graphics, though different fields of study, appear to be converging towards an attempt to build a comprehensive model of human vision. The work of David Marr (1980, 1982) is probably one of the main contributions that brought these areas closer (Gordon, 1989).

The research in psychophysics will be described briefly here, but exclusively in its visual perception aspect. More specifically, the theories will be related to texture.

2.2 PSYCHOPHYSICS

The first experiment of interest for visual perception was conducted by E.H. Weber in 1846 (Gordon, 1989). Weber showed that the Just Noticeable Difference (JND, the quantity of stimulus sufficient to be perceived; a stimulus being an external agent capable of provoking a reaction of the central nervous system) was not an absolute value for a particular individual but rather a ratio between the difference in stimulus and the original stimulus. For example, on a tone scale of 1 to 100, if the JND perceived for a tone patch of value 10% is, say, 2%, then the JND for a patch of 80% is equal to the brightness level (80) multiplied by a fixed ratio (in this case 2/10 or 0.2):

K=

∆S S

(2.2-1)

where K is the constant, S is the stimulus strength and ∆S is the JND. So, the JND for a tone of 80 would be of:

JND80 = 80 ×

(2.2-2)

2 10

= 16 %

But it was G.T. Fechner (1801-1887) who really launched psychophysics by formulating a law from Weber's principle (in Gordon, 1989):

Sensation = C log(Stimulus intensity)          (2.2-3)

where Stimulus is the external agent, Sensation is an alteration of state felt by the subject and C is a constant. Although these step probability models (in which the stimulus response probability rises from zero to one over a single threshold value) are now known to be wrong (later models made use of Gaussian probability curves and neural quanta), they have given rise to innumerable experiments and other models.
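To make equations 2.2-1 to 2.2-3 concrete, here is a minimal numeric sketch (the Weber fraction is taken from the worked example above; the constant C is arbitrary):

```python
import math

K = 2 / 10  # Weber fraction from the example: a JND of 2% at a tone of 10%

def jnd(stimulus, k=K):
    """Just Noticeable Difference under Weber's law (eq. 2.2-1)."""
    return k * stimulus

def sensation(stimulus, c=1.0):
    """Fechner's law (eq. 2.2-3): sensation grows as the log of the stimulus."""
    return c * math.log(stimulus)

print(jnd(10), jnd(80))  # 2.0 and 16.0, reproducing equation 2.2-2
```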

Perhaps the main finding of these theories is that the relation between stimulus and sensation is most probably a logarithmic one. Its importance for visual texture analysis can easily be shown with a simple illustration. In figure 2.1, a texture image extracted from a Landsat TM scene is shown at different tonal intensities. In these four versions of the texture, a constant value has been added or subtracted, but the overall variance of each image is exactly the same. However, the darker image seems to have a coarser texture than the lighter one. This simple demonstration raises two questions:

1) How do photo interpreters react to differences in brightness when using texture as a key element for their interpretation?

2) Should a computer algorithm do the same, or should more independent criteria be defined?

Figure 2.1. The effect of brightness on texture (panels: -50%, -25%, original, +25%): our ability to sense darker shades is weaker than for lighter ones; therefore the brighter image appears more "textured" than the darker one.
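The claim that the four panels of figure 2.1 share exactly the same variance is easy to verify numerically. The following sketch uses a synthetic stand-in texture and assumes that no grey levels saturate at the ends of the scale (saturation would alter the variance):

```python
import numpy as np

rng = np.random.default_rng(0)
texture = rng.normal(128.0, 20.0, size=(64, 64))  # synthetic stand-in image

# An additive brightness offset shifts the mean but leaves the variance
# unchanged, as long as no values are clipped at the ends of the grey scale.
for offset in (-64, -32, 0, 32):
    print(offset, round(float(np.var(texture + offset)), 6))  # all identical
```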

2.3 THE GESTALT THEORY

The Gestalt theory had an enormous impact both on psychology and on other social sciences. Even in Cartography one can see, in the process of generalisation, the importance of the laws of grouping (according to the Gestalt theory these are: Proximity, Similarity, Symmetry, Good Continuation, Closure and Common Fate), to quote but one Gestaltist principle. One of the reasons why Gestalt laws are so convincing is that they make no use of complicated descriptions of experimental results but rather show their principles with simple, elegant demonstrations (Gordon, 1989). These laws are of great interest for texture analysis as they trigger our "intuitive" notion of what texture is. The following summarises the most important principles of Gestalt thinking and their possible application to texture analysis (after Gordon, 1989; Bruce and Green, 1985).


• Figure and ground refers to our natural tendency to distinguish objects by contrasting them against a background. As an example, in figure 2.2, people generally observe a set of poorly organised lines over a white background rather than a confusing set of black and white patches. This description of a scene matches the simplest scene model: a mosaic of discrete objects distributed on a continuous background (Woodcock and Strahler, 1987). It could also be stated that, for a given distribution of related objects (e.g. trees or houses), a background change will also produce a texture change.

Figure 2.2. The figure and ground principle: the figure above is generally perceived as black lines over a white background and not as a mosaic of black and white patches.

• Goodness (Prägnanz) and the Laws of Grouping were first demonstrated by Wertheimer in 1912 (Gordon, 1989) and were used by the Gestaltists to show that we have a natural (perhaps innate) tendency to group objects or sub-regions of figures by using a set of laws. These are:
− Proximity: similar objects tend to be grouped if they are closer to each other.
− Similarity: different objects will be grouped by their relative similarity.
− Symmetry: even imperfect symmetry in the organisation of objects will give a grouping impression (e.g. left and right or up and down).
− Good continuity: objects will tend to be grouped in the smoothest manner.
− Closure: perceptual organisation producing a closed figure will be preferred to an open figure.
− Common fate: objects that appear to move together are grouped together.


The Gestaltists concluded from these laws that our perception is always moving towards simplicity, symmetry and wholeness.

• The wholeness principle states that the whole is greater than the sum of its parts. This can be explained by the relationship between elements, which is naturally perceived by an observer. This principle can easily be illustrated with texture. For example, two images of grass will naturally be associated even if they came from different lawns. This can be explained by the fact that the relationship between dark and bright patches will generally be maintained even if the patches are actually "different". The observer will tend to see the resemblances rather than the differences.

• Constancy is the principle by which objects tend to maintain certain properties constant in our perception: as they recede, objects do not appear to "shrink", nor do they acquire different colours under different illumination. In an aerial photograph, interpreters are not disturbed by the difference in illumination on opposed slopes of a hill unless that difference is so dramatic that it may be taken as shadow.

The Gestalt theory is very useful when trying to describe the processes that underlie visual perception; these principles are of great help for teaching photo-interpretation. One of the main problems associated with the Gestalt theory is that description is somehow confused with explanation and that an elegant demonstration can yield different conclusions. The "Brain model" put forward by the Gestaltists (especially Kohler) has proved to be incorrect. In a "top-down" or "bottom-up" classification of the perception models, the Gestalt theory can be thought of as a "top-only" model. As this project is concerned with texture analysis and not visual perception per se, the Gestalt theory offers limited possibilities. It is still a very valuable way of describing the interpreter's natural tendencies.

2.4 THE NEURO-PHYSIOLOGICAL APPROACH


The neurophysiological approach is basically a reductionist or "bottom-up" model, which is to say that it studies the lowest level of visual perception (cell-based) to provide an explanation for the high-level meaning of what we see. Because the neurophysiological theory tries to provide a model that can be physically explained (electrical discharges) and located (the cells), it can be easily understood why it has been the most popular psychological / physiological approach to be used by digital image processing. Neurones are specialised cells that have a known behaviour (null, excitatory or inhibitory) and that interact together in a manner analogous to logical gates. It may, therefore, be deduced that a computing device can do something akin to neurones.

Perhaps one of the first successes of the neurophysiological approach was to provide a definite explanation of colour vision by proving that both Young's Trichromatic model (1802) and Hering's Opponent Process Theory (1890) were correct. Another area where the neurophysiological approach seems to have mediated the debate between different theories is the Template Matching versus the Feature Detection models.

Template matching claims that the visual cortex might try to recognise shapes by noting their similarities to canonical forms (templates). This would require, however, that the process be invariant to properties such as size, skewness (when viewed at an angle) and rotation.

The Feature detection model analyses shapes by their component parts. Lines and edges in various directions are such components. This model is compatible with much of the more recent findings including Marr's theory (Marr, 1982) which will be described later.

There is now neurological evidence that there are retinal ganglion cells that respond to a field (or area) of the retina: a receptive field. Those receptive fields (which are composed of a number of retinal cells) can combine excitatory and inhibitory response to light. They can be thought of as a kind of structuring element such as the ones commonly used in mathematical morphology for the erosion and dilatation of regions in a digital scene (Serra, 1982; Cocquerez and Philipp, 1995). Some examples are shown in figure 2.3. Following these discoveries, the feature detection model appears much closer to the way human vision works than template matching.

Figure 2.3 Examples of receptive fields (RF) of the ganglion retinal cells. The signs show the effect of light on the response (+: excited, -: inhibited). The upper-left RF will detect bright spots whereas the upper-right RF will detect dark ones. The lower RFs will give a strong response to a strong contrast produced by an edge, for example. After Gordon, 1989. The circles illustrate an area of the visual field and the pluses and minuses represent individual cells that respond respectively to bright and dark spots of light. The RF gives a unified response corresponding to the resultant of all the individual cell responses combined.

The results from these studies can be said to have influenced a series of approaches and even algorithms based on the detection of edges in digital images. Rosenfeld and Troy (1970) have suggested a texture measure based on the number of edges in the neighbourhood of a pixel. Haralick (Haralick et al., 1973 and Haralick, 1979) has proposed a texture measure based on a co-occurrence matrix of geometrically related pixels. These examples are clearly inspired by the existence of visual receptive fields. It is still not known, however, how many of these types of field exist.
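As an illustration of the co-occurrence idea, the following is a minimal sketch of the counting step behind Haralick's measure, together with one common feature (contrast) derived from it; the image and the displacement are purely illustrative and only non-negative displacements are handled:

```python
import numpy as np

def cooccurrence(img, dx, dy, levels):
    """Grey level co-occurrence matrix for one displacement (dx, dy >= 0):
    counts how often grey level a at (x, y) co-occurs with grey level b at
    (x + dx, y + dy), then normalises the counts to joint probabilities."""
    glcm = np.zeros((levels, levels))
    rows, cols = img.shape
    for y in range(rows - dy):
        for x in range(cols - dx):
            glcm[img[y, x], img[y + dy, x + dx]] += 1
    return glcm / glcm.sum()

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
P = cooccurrence(img, dx=1, dy=0, levels=4)  # d = 1 pixel, theta = 0 degrees
contrast = sum(P[a, b] * (a - b) ** 2 for a in range(4) for b in range(4))
```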

Unfortunately, neurophysiological experiments have not always led to neurological evidence, and much of the neurophysiological research work with models is based on indirect experiments. One of the most important and quoted of these more recent experiments was the one conducted by Bela Julesz (1962). In his experiments, Julesz created a set of approximately 100 × 100 pixel texture pairs like the ones shown in figure 2.4.


Figure 2.4. Examples of Julesz's texture pairs that were used in his neurophysiological experiments. From Julesz, 1962, p. 44.

In these texture pairs, Julesz was able to control the first-, second- and third-order statistics² of both parts of the texture images. Julesz showed that observers were able to distinguish effortlessly (without concentration) the textural halves when they had significantly different first- or second-order statistics (these will be explained in the next section) but were unable to do so when the textures had different third-order statistics but identical first- and second-order statistics. He also showed that if the texture contained deterministic patterns such as lines or checkering, the textures could sometimes be distinguished. These experiments have been commented on, quoted and discussed by so many authors that it seems pointless to try to make a list. One important aspect is the interest Julesz's research has raised in the image processing community (Pratt et al., 1978). In their study, Pratt, Faugeras and Gagalowicz (1978) (who are associated with the image processing community) carried out experiments meant to be a continuation of Julesz's conjecture. They have verified some of Julesz's results, but have also shown that some of his conclusions were not correct. They concluded that the texture field description is dependent on the first- and second-order probability densities but that first- and second-order moments (mean, variance and correlation function) are not sufficient to fully describe our ability to distinguish texture fields. These findings would later be questioned, in particular by Serra: "These conclusions leave us rather sceptical. Second-order probabilities are important but not exhaustive..." (1982, p. 295). This is what Serra called a blindness issue which was, in his case, a limitation of the covariance as a tool for distinguishing spatial structures.

² First-order statistics are based on the simple probability distribution of some function such as energy, variance, skewness and kurtosis and are expressed by P(a); second-order statistics are measurements based on the conditional joint probability of having two values separated by some distance and angle (in the case of images) and can be expressed by P(a,b | x,y; d,θ): the probability of having grey level a at x,y and grey level b at distance and angle d,θ from x,y; third- and higher-order statistics are measurements based on the conditional joint probability of having three or more values separated by some specified distances and angles.
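By contrast with these second-order measures, the first-order statistics of the footnote depend only on the grey-level distribution of a window, not on the spatial arrangement of the pixels. A minimal sketch (assuming scipy is available):

```python
import numpy as np
from scipy import stats

def first_order_features(window):
    """First-order statistics of a texture window: derived from the grey
    level distribution alone, ignoring how the pixels are arranged."""
    w = window.ravel().astype(float)
    return {"mean": w.mean(),
            "variance": w.var(),
            "skewness": float(stats.skew(w)),
            "kurtosis": float(stats.kurtosis(w))}
```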

Later, Julesz (1981) continued his experiments and came to interesting new conclusions about the capacity to distinguish texture fields. He was able to show that, on the one hand, some texture pairs that had significant differences in their second-order statistics could not be separated and, on the other, that some texture pairs that had identical first- and second-order statistics could be distinguished when they had different third-order statistics. He then argued that our ability to effortlessly discriminate textures depends on the first-order statistics of the image and on the presence of what he named textons. Textons are local features in textures (especially artificial ones) that display a peculiar structure easily recognised by observers. He describes the three types of textons discovered in the context of his studies:

1) Quasi-collinear structures are an arrangement of elements that appear to describe a line (but not necessarily a straight one).

2) Elongated blobs simply refer to the elongated structure of a single object or a group of objects.

3) The number of terminators refers to the number of non-connected segments found in a feature element.

Textons would also be invariant under positional and scaling transformations. Figure 2.5 gives graphical examples of two of these textons.

The textons are easily associated with the receptive fields first discovered by Adrian in 1928, but they do not yet have physiological evidence to support them.

Figure 2.5. The textons. Two pairs of iso-second-order texture pairs are represented. a) and b) show the effect of the quasi-collinearity texton. In a) the texture pair cannot be distinguished, whereas the textures in b) can be easily distinguished because the central texture shows a quasi-collinear structure. c) and d) show the effect of the number-of-terminators texton. In c), the two textures have the same number of terminators (two) and are difficult to distinguish, whereas in d) the two patterns have a different number of terminators (two and five in this case) and are easily distinguishable. From Julesz, 1981, p. 93-94.

There is another major area in which receptive fields have been supportive of a theory: the visual perception of spatial frequencies. Any image can be expressed in terms of two-dimensional complex frequencies using, for example, the Fourier transform (Pratt, 1978). But even more interestingly, visual texture can generally be expressed in terms of a few simple frequencies and orientations. The first important breakthrough in the perception of frequencies is probably the paper written by Campbell and Robson (1968, in Gordon, 1989 and Harvey and Gervais, 1978). They used artificially created gratings (a grating can be expressed by its contrast between light and dark patches, its spatial frequency, its orientation and its phase) to measure the thresholds associated with our perception of frequencies and established the relation between a frequency and its contrast, embodied in the Contrast Sensitivity Function curve. This function predicts the fineness of detail (high frequencies) that can be perceived at a particular contrast level. Furthermore, Blakemore and Campbell (1969, in Gordon, 1989) showed that staring at a particular grating provokes an eyestrain that is specific to that particular frequency and orientation. This discovery has led to an impressive body of work by various authors (Sekuler, 1974) which brought extensive experimental evidence to the fact that our visual system carries frequency information in tuned channels that are frequency- and orientation-specific. Spatial frequencies are easily computed for a digital image (although it requires a fair amount of "number crunching"), which makes these discoveries all the more interesting for the image processing community.

A model of visual detection based on the tuned channels theory is shown in figure 2.6. In this model, the light stimulus arrives at the retina through a Modulation Transfer Function that acts as a first filter. The light falling on the retina provokes neural activity that in turn serves as input to separate channels, each with its own frequency-band spectrum. The output of these channels goes on to independent detector mechanisms (which are not described) that make the decision whether or not to "respond" to the input. Richards and Polit (1974) have proposed a four-channel one-dimensional frequency model: visibility, low-, medium- and high-frequency channels. This model, along with two other models, is shown in figure 2.7. One clear fact is the overlap between the channels. In another study, Harvey and Gervais (1978) have shown that the Richards and Polit model seems to work best compared to the other two (based on how different texture gratings were grouped using a similarity factor by a number of subjects). They have also shown that in almost 90% of the cases, only two of the four channels were sufficient to perform the discrimination. However, they conclude (like Richards and Polit and others) that a model containing at least four channels accounts for most of the perception of visual textures.


Figure 2.6 The multi-channel model of visual detection and perception. From Harvey and Gervais, 1978, p. 535.

The evident flaw in these studies is that the images that were used only showed one-dimensional frequencies. As Caelli (1982) argued, these models are inadequate when two-dimensional textures are analysed. He therefore made a three-dimensional version of the channel filters: the clam-shell filter. He does not, however, speculate on the number of clam-shell filters required. In his work, Caelli also suggests that there is a relation between orientation and frequency: as frequency increases, so does the orientation sensitivity (or separability). This also means that it is more difficult to recognise changes in orientation at low frequencies.

From reading these authors and others, it seems quite clear that computer technology will continue to play an increasingly important role in visual perception research. Although some do not believe that computers will eventually be able to perform close to real human perception (for example, Dreyfus, 1972), they will continue to be used both as experimentation and modelling tools. The next section will assess more deeply the use of computers in visual perception.


Figure 2.7 The Four-Channel models. Three different versions of the four-channel model: a) visibility, low-, medium- and high-frequency channels of Richards and Polit (1974), b) exponential filters and c) the Butterworth filter channels. From Harvey and Gervais, 1978, p. 537.
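As a rough sketch of the kind of filter bank shown in panel c) of figure 2.7, the following builds four overlapping band-pass Butterworth channels; the cut-off frequencies and the filter order are illustrative guesses, not the values fitted by Harvey and Gervais:

```python
import numpy as np

def butterworth_bandpass(f, f_low, f_high, order=2):
    """Band-pass response built as a high-pass times a low-pass
    Butterworth term."""
    highpass = 1.0 / (1.0 + (f_low / np.maximum(f, 1e-9)) ** (2 * order))
    lowpass = 1.0 / (1.0 + (f / f_high) ** (2 * order))
    return highpass * lowpass

f = np.linspace(0.1, 32.0, 500)  # spatial frequency axis (cycles/degree)
bands = [(0.1, 1.0), (0.5, 4.0), (2.0, 8.0), (6.0, 24.0)]  # illustrative
responses = [butterworth_bandpass(f, lo, hi) for lo, hi in bands]
```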

2.5 THE COMPUTATIONAL APPROACH TO VISUAL PERCEPTION

It was the work of David Marr (1982) that really threw some light on a formal computational theory of visual perception. Although his approach had been taken previously, especially by computer vision specialists (for example, Rosenfeld and Kak, 1982; Pratt, 1978; Gonzalez and Woods, 1992, to quote but a few authors of books on the subject), he was the first to give such a clear formalisation to the approach. In Marr's theory, there are three levels of explanation for all the processes that take place in what we call vision: 1) computational theory, 2) algorithm and 3) hardware implementation. Computational theory expresses the goal of computation, its appropriateness and strategy. Algorithm represents the procedure for problem solving: a representation of the inputs, outputs and internal computation. Finally, hardware implementation represents the actual physical device that will carry out the task, the nature of its component parts and how they operate. This framework is very valuable to any research in perception, especially from a computational point of view, as it gives a common base to any theory aimed at explaining the concepts behind visual perception.

Marr also proposes a model for visual perception with a modular design, the four stages of visual perception:

1) The image is the simple spatial representation of the light intensities on the retina as controlled by an instantaneous field of view (IFOV) and a modulation transfer function (MTF), as in figure 2.6. The resolution, however, has to be expressed in degrees of arc.

2) The primal sketch resides at the lowest processing level, where raw intensity values are processed to build explicit information. One of its primary functions is to inform on the geometric distribution and organisation of intensity changes (edges). In this representation, edges and the texture of objects are made explicit.

3) At the two-D sketch, a "picture" of the world is emerging as the orientation and rough depths of objects are described.

4) The three-D model is a sort of object-oriented model of the visual world where the shape, orientation and depth of objects are explicit. In this representation, the description has become independent of the original position and orientation of the object on the retina.

It appears clear that visual texture is primarily concerned with the first two stages of vision: the image and the primal sketch. An image model has already been adopted as being a discrete square tessellation of light intensities, so that visual texture recognition resides almost solely (texture cues could also be induced from higher levels in a hypothesis-testing scheme) in the primal sketch stage. The next paragraphs will briefly describe the major primitives of the primal sketch.


Zero-crossing is the basic primitive from which blobs, edge segments and bars can be extracted (Marr and Hildreth, 1980). Zero-crossings are obtained by computing the second derivative of an image and then extracting all the positions where the function crosses zero (figures 2.8 and 2.9). However, in Marr's algorithm, this is done independently over various spatial frequencies, which are in turn computed by applying Gaussian-blurring filters (filters that have a limited effect on edges). The zero-crossings are directly related to intensity change (gradient) in the image, and the method has the advantage of relying on an absolute rule rather than depending on a preset threshold. The various zero-crossings (or frequency channels) are then combined to extract the blobs, edge segments (straight lines), terminations and bars which together make up the primal sketch of the image (see figure 2.9).

Figure 2.8. Graphical representation of zero-crossing. The intensity change of the original function (in this case a one-dimensional intensity) is differentiated to produce the first derivative, which is in turn differentiated to produce the second derivative. The location where the second-derivative function crosses the x-axis is the zero-crossing.
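A minimal sketch of one frequency channel of this scheme, assuming scipy is available (the Gaussian-then-Laplacian pair approximates the operator described above; repeating it over several sigmas gives the frequency channels of Marr's algorithm):

```python
import numpy as np
from scipy import ndimage

def zero_crossings(image, sigma):
    """One channel of a Marr-Hildreth style edge map: Gaussian blur,
    Laplacian (second derivative), then mark the sign changes."""
    log = ndimage.laplace(ndimage.gaussian_filter(image.astype(float), sigma))
    sign = log > 0
    edges = np.zeros_like(sign)
    # a pixel is a zero-crossing if the Laplacian changes sign against a
    # horizontal or vertical neighbour
    edges[:-1, :] |= sign[:-1, :] != sign[1:, :]
    edges[:, :-1] |= sign[:, :-1] != sign[:, 1:]
    return edges
```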

One can easily see the analogy of this approach with the multi-channel model of Campbell and Robson (1968, in Harvey and Gervais, 1978) on the one hand and with Julesz's textons on the other. If the algorithm has been, to a certain point, successfully implemented in a computer, what will its hardware implementation be?


Figure 2.9. Two-dimensional zero-crossing. In a two-dimensional image, the original image is first blurred using a Gaussian filter (a) and the second-derivative is computed using a Laplacian operator (b). Then the zero-crossings are extracted. Note that in Marr and Hildreth's algorithm, this operation is done at various frequency bands.

Marr and Hildreth made a very plausible neuro-physiological model of the algorithm using a network of receptive fields (see Marr and Hildreth for a full explanation). However, their explanation as to how the different elements of the primal sketch (or textons or channels) are combined to produce the explicit sketch is much more confusing and remains somewhat of a mystery. As Gordon (1989) puts it, the top-down explanation in Marr's theory is not as convincing as the bottom-up part. This is the main flaw in most of these channel-oriented models. In one of their papers, Uttal et al. (1995 B), while carrying out a general critique of a Harmon and Julesz model (1973, in Uttal et al., 1995 B), argue the following:

"This alternative point of view also suggests that there are many different ways in which the information of the configuration may be communicated to the observer. While features such as edges, corners, "geons", "textons", or spatial frequency components may play a useful role in describing stimuli, the complex and difficult-to-quantify factor to which we refer as arrangement may actually be the most meaningful perceptual parameter. If so, we must once again acknowledge the enormous insight of the classic Gestalt students of perception." Uttal, Baruch, and Allen, 1995 B, p. 690-691


Furthermore, while the important role which luminance edges play in low-level visual perception seems widely accepted (Marr and Hildreth, 1980; Haralick, 1984; Pratt, 1991), it is not yet proven that the zero-crossing circular filter model is the actual algorithm (the term being employed as in Marr's theory) that makes those edges explicit. Meese and Freeman (1995) argue that circular filters cannot account for the orientation anisotropy experienced by subjects while observing gratings in various orientations. Without describing their experiments, here are some of their findings:

• Orientation of some special gratings (in this case two superimposed sinusoidal gratings of equal contrast and frequency but with a different orientation) can affect the perceptual anisotropy.

• Orientation-specific filters or close-to-specific filters (they propose close-to-vertical and close-to-horizontal filters) could explain the anisotropy.

• The interpretation of the special gratings could be the result of the combined response of the orientation-specific filters.

• The environment might have had an effect on the orientation preference experienced by the observers: horizontal and vertical lines are predominant in our visual world.

The tendency in psychophysics clearly seems to converge towards the multi-channel model of visual perception. The following quotes some of the most recent findings in this field of study which are of particular interest for texture analysis.

• Atkinson and Braddick (1989) showed that, in a detection task, subjects were able to determine first the coarse location of a stimulus, then its nature (orientation in this case) and finally its precise location, in this order. This experiment suggests that we process visual stimuli at more than one level of precision and that localisation and orientation recognition are most probably two parallel processes. Scialfa and Joffe (1995) have obtained similar results by contrasting orientation identification of targets and surroundings.


• A somewhat similar study was performed by Vincent and Regan (1995), but with regard to the orientation, frequency and contrast of gratings. They conclude that these three types of information are encoded independently with negligible cross-talk (in a pre-attentive experiment).

• Uttal and his colleagues (Uttal et al., 1995 A and B) have recently studied the effect of three different types of image degradation, namely visual interference (noise), low-pass filtering and local area averaging (or blocking), on the performance of both discrimination and recognition tasks. In their experiments they found that both tasks were enhanced when both filtering and blocking were applied rather than applying just one of the degradation techniques. They also show that the order in which the degradations were applied, although producing significantly different images, did not have a significant effect on the performance. Finally, they showed that the addition of noise had a devastating effect on both tasks when used in conjunction with another degradation technique.

• Another interesting study, by Jan Theeuwes (1995), showed that in a pre-attentive experiment abrupt colour changes of the same luminance were not easily detected and did not "pop out", whereas luminance (or brightness) changes did, independently of being of the same colour. This means that our ability to identify texture patterns does not improve with the addition of colour, so that when a coloured texture is seen, it is really the tonal variation that is perceived. This is important in the sense that it simplifies the problem of colour in texture analysis as not being as relevant as the tone changes.

A multitude of other results could be quoted, but it would become difficult to avoid a certain redundancy. These, I believe, capture the essence of the interest of Experimental Psychology and Psychophysics for the study of visual texture.

2.6 SOME THOUGHTS ON THE PSYCHOPHYSICAL APPROACH TO TEXTURE ANALYSIS


The studies in the areas of perception in psychology are without doubt most valuable to the computer recognition of visual texture. It appears quite clear that the most prolific approach for the computer recognition of texture is the neuro-physiological approach and the studies of psychophysics. This is mainly because a computer is a reductionist device by nature and needs a complete description of all the tasks it will carry out. While the Gestaltists gave a very intuitive understanding of the task of vision, they say very little on how exactly this is done. From this point of view, it is probably the work of David Marr and his colleagues that brought the most insight into how a computer can be used to simulate human vision. However, there is a limit to the extent to which texture recognition should be guided by these studies. The following points try to summarise these limits in the scope of the present work.

• Computer vs Human Vision. Since no complete model of human visual perception exists, it is normal that certain differences can be experienced between what is perceived by an observer and what the algorithm will produce. But should these differences necessarily be corrected? For example, it is known that human observers are affected by the amount of light even if the contrast is the same (a less illuminated texture will appear coarser). Another example can be found in the fact that there are many exceptions to the first- and second-order statistics rule-of-thumb of Julesz and his colleagues for humans to be able to distinguish differences between textures. Should the texture algorithm outperform the human in these cases?

In this project the answer to this question will be "no", in the sense that the computer should first be able to perform "as well as" a human interpreter; then, "improvements" could be assessed in future efforts. In Remote Sensing, there is always a certain amount of field work associated with the interpretation process. If a particular algorithm should "see" objects that the interpreter cannot, this would result in an increase in ground control, which goes against the purpose of this study. However, if the algorithm can be made to be unaffected by differential illumination, this would be considered an advantage and should be retained.

• On the type of image. Images can represent artificial or natural textures; objects forming these textures can be depicted in an L- or H-resolution image. Surfaces of homogeneous texture can have simple or very complex shapes, like concave polygons with holes in them. The objects can be totally enclosed in the image or their limits can lie outside it. In almost all of the studies reported in this section, the textured images were produced using a totally controlled environment. Most of the time, when two different textures were displayed, simple shapes like rectangles or circles described them. This is far from the reality of low resolution images of the Earth, where natural textures may have a varying contrast, show very subtle changes and have very complex or even meaningless shapes. Figure 2.10 shows an example of a land cover map produced by the interpretation of a satellite image. In this example, the shape and size of the different regions give very little information on the class they represent, and it appears impossible to understand the map without a legend. But in this kind of L-resolution image, the texture created by the mosaic of structuring elements has been much altered by the regularising effect of the reduced ground resolution (Woodcock and Strahler, 1987; Atkinson et al., 1995A).

Figure 2.10. Example of a land cover map produced from a Spot satellite image.

To illustrate the complexity of the problem, consider figure 2.11, which shows the result of extracting edges using the zero-crossing method on a close-range common photograph and on a section of a scanned air photo with a resolution of approximately five metres. Although it is quite simple to identify the different homogeneous regions on both images, the edge image of the aerial photograph does not reflect our intuitive interpretation of the different regions (for example, look at the neighbourhood enclosed by the forested area in the middle right of the image). Figure 2.11b can be considered an H-resolution image, but the shape and size of the objects being enhanced by zero-crossing edges (figure 2.11d) do not give much "help" in the interpretation of these objects. In the present project, this is the kind of problem being assessed, and only H-resolution images will be considered as valid test data.

Figure 2.11 The result of applying the zero-crossing edge extraction method on an aerial photograph: a) original and c) after processing. The same for a common close-range photograph: b) original and d) after processing.

• On the Approach. Whereas the studies reported in this section used textures that reflected one particular aspect of texture, like contrast or orientation, the Remote Sensing community of users is confronted with a given set of naturally occurring textures. On the one hand, these textures can be expected to be much more complex but, on the other, they represent a finite set of objects that can be found in such an environment.

Another important difference lies in the motive of the two kinds of approaches. In visual perception studies, textures are mostly artificially produced to be barely distinguishable, in order to assess the human capability to recognise (often in a pre-attentive manner) the difference between two or more of these texture fields. This is hardly the case in remotely sensed images, where perceptually similar textures should often be given the same class. Furthermore, it is not a prime objective in Remote Sensing to simulate textures artificially, although it may be desirable in order to "close the loop" in the understanding of texture.

• On the Models. One of the most interesting aspects of the neurophysiological and the computational approaches is that they provide a model to describe how our visual system works. In the Remote Sensing and Image Processing community, such models are not only valuable, they are necessary to guide the development of new technologies for machine interpretation as well as new sensors. Strahler, Woodcock and Smith (1986) have proposed a taxonomy of models that can be used to identify similarities between the visual models used by the psychophysical, computational and remote sensing sciences. For instance, the scenes used in most psychophysical or computational visual experiments can almost always be described by a discrete scene model composed of objects and background. In Remote Sensing, however, continuous scene models are also used for describing sets of measurements such as gravity, magnetism or even surface temperature. The present work will not, however, concern itself with these types of images.

The discrete scene model has a direct connection with both the Gestaltist approach and David Marr's primal sketch. Furthermore, the model can be said to be not only discrete but also nested, since many objects in a scene are composed of a mosaic of smaller objects. This is the case of a forest, a residential area or a crop field, which are all conceptual objects made of an arrangement of smaller repeated objects. Again, there is a similarity to David Marr's primal sketch, where objects and details of varying size are extracted successively by applying different levels of "blurring". This is also consistent with the neurophysiological approach claiming that the recognition task is carried out by an array of "tuned channels". The work presented in this thesis has favoured the model taxonomy proposed by Strahler et al. (1986), mainly because it is compatible with both the neurophysiological and computational approaches and because it was built for research purposes in Remote Sensing, which remains at the heart of the objectives claimed here.




Chapter 3

TEXTURE AND IMAGE PROCESSING

3.1 THE IMAGE PROCESSING APPROACH TO TEXTURE

Atkinson (1995) has defined quantitatively the information in images as spatial variation in a single variable. This is a formalisation of what is intuitively perceived by looking at any monochromatic image data: the value of a pixel only carries a meaning in relation to its neighbours. Texture could be considered the first level of spatial arrangement features. Looking at the pyramid model of interpretation cues (figure 3.1), it can be argued that size, shape, pattern and most of the even higher order features are all recursively defined by tone, colour and texture. The size and shape of an object can only be defined by the edges of its inner texture, except when it has a completely smooth appearance.

Figure 3.1. The Interpretation cues pyramid model (From Estes et al., 1983, p. 994).


It would seem that a formal and universal definition of texture has not, to this day, been given. Hawkins (1970), one of the most widely quoted authors, describes texture as depending upon three ingredients: 1) a local "order" repeated over a much larger region, which consists in 2) a non-random arrangement of elementary parts, being 3) roughly uniform entities of approximately the same size everywhere within the textured region. The terms "approximately" and "roughly" alone indicate the confusing nature of what texture is. Rosenfeld (1982) uses the term "busyness" to refer to objects that show a coarser textural pattern. Pratt (1978) distinguishes between artificial and natural texture patterns, the former being defined as arrangements of symbols against a neutral background whereas natural textures are natural scenes containing semi-repetitive arrangements of pixels. It appears that in all cases of natural textures, the repetitive texture pattern is never exactly identical but rather statistically similar. Estes and his collaborators (1983), in the Manual of Remote Sensing, describe texture as a tonal repetition in groups of objects that are often too small to be discerned as individual objects. In that definition, the authors appear to restrict texture to a near L-resolution characteristic, whereas a texture defined by discernible objects would be a pattern. For the needs of this study, texture will be defined as either L-resolution (the generic objects are not discernible because their size is too close to the pixel size) or H-resolution (objects enclose a sufficient number of pixels to be identified separately).

At this stage, it is important to look at the type of images that are being analysed. Whereas in a close-range photograph the objects that are depicted usually have a well-defined shape, remotely sensed images of the Earth do not, in most cases, exhibit such clear-cut objects. On the contrary, much of the information extracted from remotely sensed images involves very peculiar and ill-defined shapes describing objects that are often only conceptual (e.g. a forest or a dune field). It is generally assumed that there is a "high degree of correlation between the surface attributes to be studied, the optical properties of the ground cover, and the attributes of the acquired image" (Duggin and Robinove, 1990: p. 1675). This observation recursively implies that a number of controlling factors like atmospheric attenuation, sensor calibration, averaging effects (Instantaneous Field of View or IFOV), etc. have, if not negligible, at least controllable effects. Here, most of these issues are avoided by assuming that visually recognisable features should provide sufficient differences for a computer-based technique to perform a task akin to visual interpretation.

Furthermore, in these images, there is not a single representation of the objects that satisfies all applications; categories of objects can belong to a variety of representations: physical, biological, cultural, etc. (Bonn and Rochon, 1992). Some of these representations can be organised hierarchically: a tree can be said to belong to a stand, which in turn belongs to a forest that is part of a valley, and so on. Other object categories overlap in nature: a natural disaster will affect different types of forest; a single land use can spread over a variety of land covers.

So the task of interpreting a satellite image or an air photograph is a process that is usually guided by a goal. If it can be said that there is a finite set of possible objects in an image (Wilson and Spann, 1988), it appears that, in the case of remotely sensed images of the Earth, this number is not small. One of the most difficult tasks in Digital Image Processing is to transform the original array of grey level values (the pixels) into a comprehensive non-pictorial description of the image; a process called Image Understanding (Pratt, 1978) or Image Analysis (Rosenfeld and Kak, 1982; Gonzalez and Woods, 1992; Moik, 1980). That description or representation typically results in a qualitative description of objects or classes of objects having a given set of properties like grey level, texture or geometric properties based on connectedness, size and shape (Rosenfeld and Kak, 1982). In Remote Sensing, that task can be said to translate a pictorial description into a geographical representation. Some important definitions of image processing processes are given below.

• The process by which an original image is transformed into a series of primitives or features which will serve as a description of the objects is called Pattern Recognition (Pratt, 1978; Bonn and Rochon, 1992; Rosenfeld, 1982).

• The task by which those patterns are used to build a spatial definition of the objects or regions is called Image Segmentation (Moik, 1980).

• When the segmentation is based on recognising patterns in multiple representations of an image, the process is given the name of Classification (Moik, 1980; Bonn and Rochon, 1992).

3.1.1 The segmentation approach

In most image analysis applications texture is not a goal in itself but is rather regarded as an attribute of the objects or regions being sought or investigated. Many image segmentation techniques adopt such a point of view and see texture as an undesirable property increasing the variance in tonality of the regions they wish to segment. In an image, the pixels are an arbitrary division into spatial elements that do not correspond to the objects present in the scene. The goal of image segmentation is to create a new division of the scene based on the different regions, to which a label can further be associated; as such, image segmentation is an intermediate step in image analysis (Jain, 1989). In his book, Jain (1989) recognises the following techniques of image segmentation:
1. Component labelling
2. Template matching
3. Amplitude thresholding
4. Boundary-based approaches
5. Region-based approaches and clustering
6. Texture segmentation

Component Labelling was primarily developed for simple binary images and is based on direct connectivity between pixels belonging to the same object. Template Matching is well adapted for images containing a limited and known set of objects (as in industrial vision applications) or symbols (character recognition); it is also more effective on binary images but can be adapted to grey tone images. Amplitude Thresholding is based on slicing the histogram of an image by using its peaks and valleys (if any) to meet a certain criterion, such as a maximum number of sub-regions. These first three methods, although well adapted to the situations they were created for, do not perform very well for complex textures. Boundary-based methods seek to extract the objects or regions through their contours; thus, these methods involve contour extraction through focal operators, contour thinning, following and linking. Region-based methods usually approach the problem through a region-growing or split-and-merge technique; some seed is fed to the algorithm, which then tries to make the region grow based on a set of criteria. Texture Segmentation consists (according to Jain) in Clustering and Region-based methods adapted to textured images.
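Of the techniques just listed, Amplitude Thresholding is the simplest to sketch. The following hypothetical implementation slices the image at the deepest valley between the two largest histogram peaks, assuming a roughly bimodal histogram (the smoothing width and peak-separation distance are arbitrary choices):

```python
import numpy as np

def valley_threshold(image, bins=256):
    """Slice the grey-level histogram at the deepest valley between its
    two largest peaks (assumes a roughly bimodal histogram)."""
    hist, edges = np.histogram(image, bins=bins)
    hist = np.convolve(hist, np.ones(5) / 5, mode="same")  # light smoothing
    peak1 = int(np.argmax(hist))
    masked = hist.copy()
    masked[max(0, peak1 - 10):min(bins, peak1 + 10)] = 0  # suppress peak 1
    peak2 = int(np.argmax(masked))
    a, b = sorted((peak1, peak2))
    valley = a + int(np.argmin(hist[a:b + 1]))
    return edges[valley]

# segmentation = image > valley_threshold(image)
```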

More recent papers on texture segmentation reveal three definite tendencies: Boundary-based methods, Region-based methods and Hybrid methods (Pavlidis and Liow, 1990).

Boundary-based methods for texture segmentation were first thought to be ineffective because region edges were poorly defined and polluted with noise (Jain, 1989). More recent publications show that a variety of Boundary-based methods work well even with textured images:

• Focal operators such as the Marr-Hildreth, Sobel, Prewitt, Shen or directional gradient operators, which are usually based on some derivative, may work reasonably well on lightly textured images as long as good contour thinning and linking algorithms are used for connecting the extracted edges (Cocquerez and Philipp, 1995). These methods remain ineffective with highly textured images (Pavlidis and Liow, 1990; Jones, 1994; Cocquerez and Philipp, 1995).



• Mathematical morphology involves the form of objects and can be used for the extraction of textured region boundaries in grey tone images through the morphological gradient (Gonzalez and Woods, 1992; Serra, 1982). The morphological gradient is based on the logical comparison (in graph theory, the comparison through union, intersection, inclusion, complement and difference) of scene objects with a structuring element, which is usually a small and simple object (square, circle, cross, ...) represented through a two-dimensional array. Such a concept can easily be associated with the idea of receptive fields in neurophysiology. Two basic operations in mathematical morphology are defined by the Minkowski subtraction and addition, respectively known as erosion and dilatation (Cocquerez and Philipp, 1995). The difference between the eroded and the dilated scene is called the morphological gradient. Edges can thereafter be extracted with a simple threshold followed by conventional techniques of edge thinning and linking. This method, given the proper structuring element, can yield good results even for textured scenes. Figure 3.2 illustrates these two methods with a small sub-image of the HRV sensor of the SPOT satellite. Another mathematical morphology method for extracting boundaries, called the watershed method, consists in applying erosion and dilatation (or the inverse) iteratively until stabilisation is obtained. The watershed contours can then usually be obtained through thresholding (Gonzalez and Woods, 1992; Cocquerez and Philipp, 1995).


Figure 3.2. Illustration of the morphological gradient method for edge extraction of a textured SPOT sub-image: a) original image, b) eroded image, c) dilated image, d) difference image between (a) and (b), e) contours extracted from (d), f) contours after thinning.




• Markov Random Fields (MRF)³ were first proposed for boundary detection and image restoration by Geman and Geman (1984). Their basic idea is that a textured image can be modelled by an MRF and restored texture-free. In particular, they proposed a line process capable of detecting the explicit presence or absence of discontinuities (boundaries) in the degraded image and of restoring the image by taking these discontinuities into account. Although good results are claimed, the textured aspect is seen as a "degradation" of the original image and, as such, is considered to be noise. In a more recent paper (Geman et al., 1990), the authors proposed further improvements to their technique and incorporated a class of features that are multivariate in nature. Textures segmented with the improved technique were much more complex, but the computational cost was relatively high due to its iterative nature.

³ The Markov Random Field (MRF) is a stochastic process by which an image can be described as a set $S$ of $N \times N$ locations $s$ ($S = \{s = (i, j) \mid 1 \le i, j \le N\}$) to which a random variable $A_s$ is associated. All possible realisations $a_s$ of $A_s$ are contained in the set $\Omega$ (for example $\Omega = \{0, 1, \ldots, 255\}$). For the image to be considered an MRF, any particular realisation of $A_s$ has to be entirely a function of its neighbourhood and its probability function must be described by a Gibbs distribution.

Texture is a regional attribute, so Region-based methods are generally better suited for texture segmentation (Gonzalez and Woods, 1992; Cocquerez and Philipp, 1995). This is especially true if texture is an issue in itself and not only a "degradation" inflicted on the image. A number of Region-based methods have been proposed; some of the better known are briefly described below:

• Region Growing methods are bottom-up strategies that, starting from initial seeds (which can be regions or individual pixels), seek to make these seeds grow until all the regions satisfy all the predicates and no more pixels can be incorporated without breaking the predicates. The initial seeds can be defined by thresholding the original image or some derivative of it based on some homogeneity criterion. The region growing is then controlled through one or more predicates based on the geometry of the region (e.g. shape, convexity, size, ...) or its radiometry (e.g. average grey tone, tonal variance, ...) or both. The regions are created one after the other through two phases: initialisation and iteration (Chassery and Garbay, 1984). Eventually, the predicates can be relaxed to enable the regions to grow further until all the pixels have been assigned to one of the regions (a minimal sketch of this strategy is given after this list).

• Split-and-Merge methods usually combine top-down (split) and bottom-up (merge) strategies to perform an optimised segmentation of the image. To be efficient, the algorithm has to use some structured representation of the image (Cocquerez and Philipp, 1995) such as a region adjacency graph, a tessellation, a Voronoi diagram (Ahuja et al., 1985; Tuceryan and Jain, 1991), a quadtree (Ohlander et al., 1978) or another pyramidal representation. Such a representation can be internal (constructed internally by the algorithm) or explicitly described through the image file structure. In the region splitting phase, the process can start with the whole image or some predetermined region selected arbitrarily or according to some homogeneity criterion (Ohlander et al., 1978). The process involves analysing the regions according to one or more criteria (predicates) and, if a region does not satisfy them all, it is split into two or more sub-regions. The splitting phase stops when all the regions satisfy all the predicates or the lowest level (individual pixels) is attained. The bottom-up process involves merging all the geographically connected regions that can be merged without breaking the predicates, starting from the lowest level until the highest level (the whole image) has been attained. The results can generally be improved by using a sequence of hierarchical criteria of the form S = {(P1, F1), ..., (Pn, Fn)} where Pi is a predicate and Fi is some transformed version of the original image, or a feature such as the local variance or the local amplitude (Monga, 1987).



• Local Histogram methods are based on the assumption that similar textures (texture is here considered the main attribute of the regions) will display similar histograms (Lowitz, 1983). It is basically a region growing method where the growing predicate is based on the correlation between Gaussian-smoothed local histograms. Seeds are formed by points where the histogram norm is at its maximum. The regions are grown from two principles: relaxation, where all the neighbouring pixels having highly correlated local histograms are incorporated into the region, and propagation, where the boundaries are tested for their pertinence. This method has the advantage of being able to predict the behaviour of the local histogram for pixels situated between regions.
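As announced above, the region-growing principle can be made concrete with a minimal sketch in Python. It assumes a single radiometric predicate (a candidate pixel joins the region while it stays within a fixed tolerance of the running region mean); the function name and the max_dev tolerance are illustrative choices, not taken from any of the cited implementations or from the programs developed for this research.

import numpy as np
from collections import deque

def region_grow(image, seed, max_dev=10.0):
    # Minimal region growing: a 4-connected neighbour is accepted while it
    # stays within max_dev grey levels of the running region mean (the
    # homogeneity predicate); max_dev is an illustrative parameter.
    rows, cols = image.shape
    mask = np.zeros(image.shape, dtype=bool)
    mask[seed] = True
    total, count = float(image[seed]), 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr, nc]:
                if abs(float(image[nr, nc]) - total / count) <= max_dev:
                    mask[nr, nc] = True
                    total += float(image[nr, nc])
                    count += 1
                    queue.append((nr, nc))
    return mask  # boolean image of the grown region

In a full implementation, the scalar tolerance would be replaced by the geometric and radiometric predicates discussed above, and the process repeated from new seeds until every pixel is assigned.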

Hybrid methods have also been proposed, in which region-based and boundary-based methods are combined in some manner to produce an optimised segmentation (Reed and du Buf, 1993). Computationally more costly, they have the benefit of combining two entirely different approaches and making them converge towards a unique solution. Nazif and Levine (1984) developed an expert system that uses both boundary information and a split-and-merge region-based method to segment the scene. Wilson and Spann (1988) proposed a method based on a quadtree to determine the regions and then refine the boundaries. Finally, Pavlidis and Liow (1990) used a split-and-merge algorithm to produce an over-segmentation of the scene and then make two types of hypotheses about weak boundaries: 1) the boundary is not an edge; 2) the boundary corresponds to an edge but does not coincide with it. In the first case the boundary is eliminated and the connecting regions are merged; in the second, the boundary is modified to coincide with the edge of the region.

Of course many other methods have been developed over the years, but most of them are based on one or more of these basic ideas. It is not a simple task to compare all these algorithms and propose perhaps "the best one" for texture segmentation in natural scenes of the Earth. What can be said is that most of these techniques were developed with one goal: to divide the image into "homogeneous" regions corresponding to "objects". The fact that these objects and regions are textured is usually considered as an additional problem and rarely as part of the solution. Cocquerez and Philipp (1995) carried out an extensive review of these techniques and compared the results they yield on various types of images (artificial textures, medical images and natural scenes); some of their main findings are summarised below.

− Boundary-based methods work well for well defined edges but tend to produce an under-segmentation when transitions between regions are gradual.

− Region-based methods tend to produce artefact regions when gradual tonal transitions are observed within regions.

− Images with high textural content can be segmented with both approaches if the average tones of the regions (especially connected ones) show significant differences.

In the case of aerial or space-borne images, texture is often one of the main characteristics (or cues) of conceptual geographic objects such as forest, pasture, plantation or even residential area. In pure image processing terms: "When a small-area patch (region) of an image has wide variation of tonal primitive, the dominant property of that area is texture." (Haralick, 1979, p. 304).

Furthermore, the segmentation techniques described above have the disadvantage of being able to use only a few texture features, and indeed few features altogether, in order to remain time-effective. Texture being the main object of the present research, traditional segmentation methods do not offer a comprehensive way of understanding its true nature.

3.1.2 The feature classification approach

The classification principle cannot fail to recall the famous Gestaltist principle: the whole is greater than the sum of its parts (Max Wertheimer). In remote sensing, the classification process has typically been done using only the spectral space, as is reflected by the different commercial image processing packages available. However, texture space has also been used for quite some time, although techniques such as the simple identification of zero-crossing edges, as reported by Hildreth and Marr (1980), are usually insufficient to produce a satisfactory segmentation of a remote sensing image. The fact that texture can only be defined over a region of more than one pixel makes it all the more complicated to extract an exact location for the texture edge. Or, as Pratt puts it: "Because texture is a spatial property, measurements should be restricted to regions of relative uniformity. Hence it is necessary to establish the boundary of a uniform textural region by some form of image segmentation before attempting texture measurements." (Pratt, 1991, p. 567)

However, it appears that some texture measurement (here taken as a feature) will be recursively needed for one textural region to be distinguished from another. This fact leads one to think that texture recognition might only be performed by some iterative process.

The literature on the textural aspect of images is quite extensive and varied. However, a close look at the books and papers that assess the subject reveals that there are in reality only a few different approaches, with many variations.

In 1970, Lipkin and Rosenfeld recognised two main approaches to texture: the statistical approach and the structural approach. In the former, texture is regarded as a complex image pattern defined by a set of statistical measurements, whereas in the latter, texture is defined by a sub-pattern which repeats itself over a certain region. The spectral or frequency analysis approach to texture was added by numerous authors (e.g. Pratt, 1978; Moik, 1980; Gonzalez and Woods, 1992). As methods became more complicated over the years, it became difficult to classify all the approaches in only three categories. In the second edition of his Digital Image Processing book, Pratt identifies seven different methods (as opposed to approaches) for the extraction of texture features:

1. Fourier spectra methods
2. Edge detection methods
3. Autocorrelation methods
4. Decorrelation methods
5. Dependency matrix methods
6. Microstructure methods
7. Singular value methods

However, the first attempts at considering texture as a feature for image segmentation simply used statistics from the first-order histogram; the next section will deal with these. Apart from the approaches above, a few others have been added, mainly resulting from more recent work:

8. Fractal methods
9. Variogram methods

49

The following sections will review each of these methods and their ability to produce texture features.

3.2 FIRST-ORDER HISTOGRAM METHODS

Although texture analysis of electronic images started as far back as 1955 (Haralick et al., 1973), it was only in the seventies, with widespread access to digital satellite images, that texture features caught the attention of the remote sensing community. However, given the resolution of these images (e.g. 80 m for the LANDSAT Multi Spectral Scanner), numerous objects were enclosed in any one pixel and texture appeared as an undesired variance in the spectral signature of land cover classes (forest, crops, urban, water, ...). In other words, texture was simply considered as noise. Since image noise, being spatially uncorrelated, has a high spatial frequency, a simple low-pass filter applied by convolution would significantly reduce the noise and unexplained variance.

The general form of such a filter is:

$Q(m_1, m_2) = \sum_{n_1} \sum_{n_2} F(n_1, n_2)\, H(m_1 - n_1 + 1,\ m_2 - n_2 + 1)$    (3.2-1)

where Q is the output image of size $m_1 \times m_2$, F is the $n_1 \times n_2$ input image and H is the impulse response of the filter, which could typically take one of the forms:

$H = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \qquad H = \frac{1}{6}\begin{bmatrix} 0 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 0 \end{bmatrix} \qquad H = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$

These filters were effective in reducing classification errors but also had an unwanted smoothing effect on the edges which also have high frequencies.
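As a minimal illustration, equation 3.2-1 and the three impulse responses above translate directly into the following Python sketch (only the fully overlapped output region is kept; the function and constant names are illustrative):

import numpy as np

# The three impulse responses shown above.
H_BOX      = np.ones((3, 3)) / 9.0
H_CROSS    = np.array([[0, 1, 0], [1, 2, 1], [0, 1, 0]]) / 6.0
H_BINOMIAL = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0

def low_pass(image, h):
    # Direct form of equation 3.2-1, valid output region only; h is
    # flipped as convolution requires, which changes nothing for these
    # symmetric masks.
    m, n = h.shape
    rows, cols = image.shape
    out = np.zeros((rows - m + 1, cols - n + 1))
    hf = h[::-1, ::-1]
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * hf)
    return out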

The “real” texture features that can be extracted from the first-order histogram are based on the shape description of the histogram using second-order statistics like the variance:

$\sigma_b^2 = \sum_{b=0}^{L-1} (b - \bar{b})^2 P(b)$    (3.2-2)

and the energy:

$b_N = \sum_{b=0}^{L-1} \left[P(b)\right]^2$    (3.2-3)

where P(b) is the probability distribution of the image's L quantised amplitude levels and $\bar{b}$ is the mean. When used as texture features, these statistics are taken over a window believed to contain at least one of the semi-repetitive patterns (Anys and He, 1995).

Other statistics that have also been considered include:

Skewness:

$b_S = \frac{1}{\sigma_b^3} \sum_{b=0}^{L-1} (b - \bar{b})^3 P(b)$    (3.2-4)

and Kurtosis:

$b_K = \frac{1}{\sigma_b^4} \sum_{b=0}^{L-1} (b - \bar{b})^4 P(b) - 3$    (3.2-5)

where P(b) is the probability distribution, $\bar{b}$ is the mean, L is the number of quantisation levels and $\sigma_b$ is the standard deviation.
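A compact sketch of these four first-order features, computed over one window, follows; it assumes the window holds non-negative integers below `levels` (e.g. 8-bit data) and is not perfectly uniform (so that $\sigma_b > 0$):

import numpy as np

def first_order_features(window, levels=256):
    # Histogram statistics of equations 3.2-2 to 3.2-5 over one window.
    hist = np.bincount(window.ravel(), minlength=levels)
    p = hist / hist.sum()                                # P(b)
    b = np.arange(levels)
    mean = np.sum(b * p)
    var = np.sum((b - mean) ** 2 * p)                    # variance (3.2-2)
    energy = np.sum(p ** 2)                              # energy (3.2-3)
    sigma = np.sqrt(var)
    skew = np.sum((b - mean) ** 3 * p) / sigma ** 3      # skewness (3.2-4)
    kurt = np.sum((b - mean) ** 4 * p) / sigma ** 4 - 3  # kurtosis (3.2-5)
    return var, energy, skew, kurt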

Anys and He (1995) have obtained good results using statistics of the first-order histogram for separating different crop types in airborne radar images by measuring the Bhattacharyya distance (B-distance) between sets of statistics. However, the parameter that gave the greatest B-distance was the mean, which cannot be considered a real texture feature. In addition, their study only considered crop fields, hence a limited set of textures, all with a narrow and mostly very high frequency range. As Faugeras and Pratt (1980) pointed out, the first-order histogram is dependent upon the luminance point response and, further, very different texture fields can yield an identical histogram. They argue that first-order histogram statistics should only be used over previously decorrelated (or whitened) texture fields (see section 3.5).

In conclusion, statistics of the first-order histogram can give an estimate of the contrast or "busyness" (Rosenfeld, 1982) of a texture but are incapable of reflecting its internal structure.

3.3 FOURIER SPECTRA METHODS

Fourier spectra methods are based upon the two-dimensional Fourier transform of the image array, which is defined in its discrete form as:

$F(u, v) = \frac{1}{N} \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} F(j, k) \exp\left[\frac{-2\pi i}{N}(uj + vk)\right]$    (3.3-1)

and its inverse transform:

$F(j, k) = \frac{1}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} F(u, v) \exp\left[\frac{2\pi i}{N}(uj + vk)\right]$    (3.3-2)

where $i = \sqrt{-1}$, j and k are the coordinates in the spatial domain, u and v are the coordinates in the frequency domain and N is the width (in pixels) of the square image.

The Fourier transform enables a representation of the image as a set of complex sinusoidal wave functions defined by their amplitude (the modulus of the complex number) and their phase (its argument). Every point of the transform represents the whole spatial domain at a particular frequency (for the one-dimensional transform) and orientation (for the two-dimensional version).

Although the Fourier transform is but another representation of the image data, it is not in itself a feature space. Many initial efforts to use the Fourier transform were discouraged by the fact that the feature extraction problem was simply shifted from one domain to another with much additional computation. One of the most successful attempts to use the Fourier transform was the use of its power spectrum (Rosenfeld and Kak, 1982; Gonzalez and Woods, 1992), which consists in representing the sum of the squares of the real and imaginary parts of the complex numbers (Re² + Im²). However, it has been reported that there was considerable spectral overlap between typical classes of land cover in digital images of the Earth (Pratt, 1991; Hawkins, 1970).

One technique to use the power spectrum represents it in polar coordinates S(r, θ), where r represents the range and θ the directional component, and then sums (integrates, for continuous functions) the ranges over all directions and the orientations over all ranges:

$S(r) = \sum_{\theta=0}^{\pi} S_\theta(r)$    (3.3-3)

$S(\theta) = \sum_{r=1}^{R} S_r(\theta)$    (3.3-4)

where R represents the set of all radii of circles centred at the origin. The respective histograms of these components were taken as texture features. Figure 3.3 shows examples of the power spectra (S(r) and S(θ)) for different textures. On these examples, it can be observed that the rate at which S(r) falls towards low values is related to the coarseness of the texture, and that the peaks in S(θ) represent the main directional trends in the texture pattern. These spectra can further be treated as first-order histograms from which statistical descriptor features can be extracted, as in the first-order histogram method. In general terms, the principal drawbacks of this technique are the amount of computation involved and the fact that, to obtain significant results, N has to be quite large in the image sub-samples.


Figure 3.3. Sub-images of a forest (a) and of ocean waves (e) with their respective Fourier power spectra (b) and (f), their plots of S(r) (c) and (g) and their plots of S(θ) (d) and (h).
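The S(r) and S(θ) summaries of equations 3.3-3 and 3.3-4 can be estimated from the discrete power spectrum by binning it into annuli and angular sectors, as in the following sketch (the bin counts are arbitrary choices):

import numpy as np

def polar_spectra(image, n_r=50, n_theta=36):
    # Power spectrum summed over annuli (S(r), eq. 3.3-3) and over
    # angular sectors (S(theta), eq. 3.3-4).
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    rows, cols = power.shape
    y, x = np.indices(power.shape)
    y, x = y - rows // 2, x - cols // 2
    r = np.hypot(x, y)
    theta = np.degrees(np.arctan2(y, x)) % 180.0  # the spectrum is symmetric
    r_edges = np.linspace(0, r.max() + 1e-9, n_r + 1)
    t_edges = np.linspace(0, 180.0, n_theta + 1)
    S_r = np.array([power[(r >= r_edges[i]) & (r < r_edges[i + 1])].sum()
                    for i in range(n_r)])
    S_t = np.array([power[(theta >= t_edges[i]) & (theta < t_edges[i + 1])].sum()
                    for i in range(n_theta)])
    return S_r, S_t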

Stromberg and Farr (1986) have developed a method in which texture features are produced by applying a number of frequency-domain ring filters to the Fourier transform of an image. The technique has five distinctive steps:

1. Compute the Fourier transform over the whole image to be segmented through texture features.
2. Apply a number of ring-shaped filters to the frequency domain. This implies that only isotropic textures are considered, since all orientations are treated together.
3. Compute the inverse transform of the band-pass filters.
4. Isolate the modulation function as a texture feature by taking the absolute value of the derivative of the sinusoids and then low-pass filtering the result (see Stromberg and Farr, 1986 or Habibi, 1981 for a detailed description).
5. Classify the image by means of some multi-feature classifier.

As an illustration of the technique, a textured image made of 10 different textures extracted from scanned air photos or satellite images has been produced and its Fourier transform computed (figure 3.4). Steps two to four of the method are shown in figure 3.5 where, in addition to the band-pass filters, a low-pass and a high-pass filter have also been applied. Although experiments in classifying the texture features have not been very successful, it can be seen that some of the textures in the images are well separated. Part of the problem lies in finding a good "recipe" for selecting the band-pass filters. The other problem in the case shown here is that the textures cannot be considered isotropic, and orientation definitely plays an important role in the visual discrimination process.


Figure 3.4. (a) A texture field image containing 10 different textures sampled from real scanned digital air photos, a panchromatic SPOT image (top left circle) and a LANDSAT Thematic Mapper image (background). (b) The Fourier power spectrum of (a).

Figure 3.5. Graphic representation of the feature space of the Stromberg and Farr texture segmentation method using the image depicted in the preceding figure: each row shows, from left to right, the filtered spectrum, the power of the inverse FFT and the filtered absolute inverse FFT for a) a low-pass filter (0-156), b) to f) five band-pass filters (156-176, 176-196, 196-216, 216-236, 236-246) and g) a high-pass filter (246-256).

Wilson and Spann (1988) have proposed another related method in which the orientation of the spatial frequencies is also considered. In their method, the frequency-domain spectrum is divided according to two types of tessellation operators: centre-surround and quadrant. Both types are combined to produce the band/orientation-pass tessellation shown in figure 3.6. Before computing the inverse transform of every sub-region of the frequency domain, a spheroidal filter (Finite Prolate Spheroidal Sequence, or f.p.s.s.) is first applied to the frequency sub-region. Each sub-region is then transformed back into the spatial domain to produce a separate texture feature. Since the frequency domain is symmetrical, only half of these sub-regions are actually used (with the exception of the central low-pass filter).

Figure 3.6. The band/orientation-pass tessellation of the frequency domain proposed by Wilson and Spann (1988, p. 115).

In both the Stromberg-Farr and Wilson-Spann methods, good results are claimed, but the texture images used appear to show little overlap in terms of spatial frequencies and/or orientations.

More recently, the Fourier transform has enjoyed renewed popularity among texture analysis researchers through the use of Gabor wavelet functions:

“A two-D Gabor function is a harmonic oscillator, which is a sinusoidal plane wave of some frequency and orientation within a Gaussian envelope.” (Fogel and Sagi, 1989, p. 104)

The impulse response of an even-symmetric Gabor filter is given by:

$h(x, y) = \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right] \cos(2\pi u_0 x)$    (3.3-5)

where $u_0$ is the frequency and $\sigma_x^2$ and $\sigma_y^2$ are the x and y sizes of the Gaussian envelope. The methods developed using this approach usually work in the same manner as the Wilson-Spann algorithm, except that a series of Gabor filters of a certain envelope, centred on some frequency and orientation, is applied instead of the f.p.s.s. tessellation proposed by Wilson and Spann. The filter parameters are usually selected so that their regions of influence meet at the half-peak magnitude in the frequency domain. Typically, the filter set will include four to seven frequency bands and four orientations (0, 45, 90 and 135°) for a total of 16 to 28 Gabor wavelet features, which are created through the inverse transform of each Gabor-filtered spectrum. Gabor filters have been found to perform well in pre-attentive texture discrimination tasks, in a way similar to humans (Fogel and Sagi, 1989), and have since been applied with a good success rate to the segmentation of both artificial and natural textures (see for example: Reed and Wechsler, 1990; Jain and Farrokhnia, 1991; Dunn et al., 1994; Manjunath and Ma, 1996).
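A sketch of an even-symmetric Gabor kernel built from equation 3.3-5 is given below, with the coordinate system rotated so that a bank of filters covers several orientations; the particular frequencies, envelope sizes and kernel size are illustrative defaults, not values taken from the cited papers:

import numpy as np

def gabor_kernel(u0, angle_deg, sigma_x, sigma_y, size=31):
    # Even-symmetric Gabor filter of equation 3.3-5, with the coordinate
    # system rotated so that the sinusoid runs along angle_deg.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    t = np.deg2rad(angle_deg)
    xr = x * np.cos(t) + y * np.sin(t)
    yr = -x * np.sin(t) + y * np.cos(t)
    envelope = np.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
    return envelope * np.cos(2.0 * np.pi * u0 * xr)

# A typical bank: four centre frequencies (cycles/pixel) by four
# orientations; these particular values are illustrative.
bank = [gabor_kernel(u0, a, 4.0, 4.0)
        for u0 in (0.05, 0.1, 0.2, 0.4)
        for a in (0, 45, 90, 135)]

Each kernel would then be convolved with the image (or, equivalently, applied in the frequency domain) to produce one texture feature per frequency/orientation pair.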

Although the Fourier-based methods have the advantage of being able to separate the frequencies and orientations of the texture fields, they have the disadvantage of requiring a great deal of complex computation (one inverse Fourier transform for each texture feature, plus additional filtering of the feature).

3.4 EDGE DETECTION METHODS

This method was first introduced by Rosenfeld and Troy (1970). It consists in first representing the image in an edge feature space, using some edge detector like the Laplacian, Sobel or edge gradient operators, and then counting the number of edges detected above a certain threshold in a window centred on the pixel being analysed. The resulting sum is taken as the texture feature:

$T(j, k) = \frac{1}{(2W+1)^2} \sum_{m=-W}^{W} \sum_{n=-W}^{W} E(j+m,\, k+n)$    (3.4-1)

where 2W+1 is the size of the window being analysed, E is the edge value, j, k are the pixel coordinates and m, n are counters within the window. Later, Rosenfeld and Kak (1982) proposed building the grey level absolute difference histogram instead of working with highly sensitive thresholds. Unlike thresholds, the histogram has the advantage of not being affected by shifts of the grey scale. Statistical histogram features can then be extracted as measures of texture (e.g. the mean, the second moment, the entropy, etc.).
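A minimal sketch of the edge-density feature of equation 3.4-1 follows; a Sobel gradient magnitude and a fixed threshold stand in for the edge detector (both are arbitrary choices here, as the method accepts any of the detectors named above):

import numpy as np

def edge_density(image, threshold, W=7):
    # Equation 3.4-1: the proportion of edge pixels in a
    # (2W+1) x (2W+1) window centred on each pixel.
    f = image.astype(float)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[1:-1, 1:-1] = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]
                      - f[:-2, :-2] - 2 * f[1:-1, :-2] - f[2:, :-2])
    gy[1:-1, 1:-1] = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]
                      - f[:-2, :-2] - 2 * f[:-2, 1:-1] - f[:-2, 2:])
    edges = (np.hypot(gx, gy) > threshold).astype(float)  # E(j, k)
    area = (2 * W + 1) ** 2
    T = np.zeros_like(f)
    for j in range(W, f.shape[0] - W):
        for k in range(W, f.shape[1] - W):
            T[j, k] = edges[j - W:j + W + 1, k - W:k + W + 1].sum() / area
    return T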


As was the case for the first-order histogram method, the edge detection method does not reflect any of the texture's inner structure but has the advantage of being simple and inexpensive in terms of calculations. However, good results cannot be expected for complex textures like those encountered in high resolution air photographs or even in SPOT's HRV sensor (10 and 20 metres). This method finds its natural expansion in the decorrelation method reviewed in the next section.

3.5 AUTOCORRELATION AND DECORRELATION METHODS

The autocorrelation function was proposed as a candidate for texture measurement by H. Kaizer in 1955 (in Haralick et al., 1973) and later by Rosenfeld and Troy (1970). The basis for its use lies in the fact that texture can be modelled as a correlated random field, or an array of independent random variables to which a filter has been applied. This suggests that texture can be modelled using appropriate descriptions of both components: the probability density of the original variables and the autocorrelation function (Rosenfeld and Kak, 1982). Although Pratt et al. (1978) demonstrated that visually distinct texture fields can have the same autocorrelation function, these were exceptions mainly used to show that our visual perception of texture is not entirely based on such functions.

For a discrete image array of values F(j, k), the ensemble autocorrelation function $A_F$ can be estimated by the spatial autocorrelation function:

$A_F(m, n) = \sum_{u=j-W}^{j+W} \sum_{v=k-W}^{k+W} F(u, v)\, F(u-m,\, v-n)$    (3.5-1)

for a (2W+1) × (2W+1) window. Note that equation 3.5-1 assumes a mean of zero and a variance of one; otherwise the $A_F$ function can be defined as (Box et al., 1978, p. 63):

$A_F(m, n) = \frac{\sum_u \sum_v (F_{u,v} - \bar{F})(F_{u-m,\,v-n} - \bar{F})}{\sum_u \sum_v (F_{u,v} - \bar{F})^2}$    (3.5-2)

where $-W \le u, v \le W$.

A set of measurements can be computed over the first-order histogram of $A_F$, such as its mean, variance, energy, kurtosis and skewness, as defined in equations 3.2-2 to 3.2-5. Because the $A_F$ function measures the relationship for all m, n image values, the second-order histogram can also be computed. Faugeras and Pratt (1980) have also proposed a set of function-spread measures over the second-order histogram, whose general form is defined by:

$S(u, v) = \sum_{m=0}^{T} \sum_{n=-T}^{T} (m - \eta_m)^u (n - \eta_n)^v A_F(m, n)$    (3.5-3)

where:

$\eta_m = \sum_{m=0}^{T} \sum_{n=-T}^{T} m\, A_F(m, n)$    (3.5-4)

and

$\eta_n = \sum_{m=0}^{T} \sum_{n=-T}^{T} n\, A_F(m, n)$    (3.5-5)

The function-spread measure is only taken over one half of the histogram (hence m = 0...T) because of its symmetry. The authors claim that the most interesting features are S(2,0) and S(0,2), which define the profile spreads, the cross-relation S(1,1) and the second-degree spread S(2,2). Figure 3.7 shows examples of the second-order histogram autocorrelation function computed for four different natural textures.
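The normalised autocorrelation of equation 3.5-2 and the spread measures of equations 3.5-3 to 3.5-5 can be sketched as follows; circular shifts are used as a boundary shortcut, so this is only an approximation of the windowed sums, and T is an illustrative lag limit:

import numpy as np

def autocorrelation(window, T=8):
    # Normalised autocorrelation A_F(m, n) (equation 3.5-2) for lags
    # -T..T, using circular shifts as a boundary shortcut.
    f = window.astype(float)
    f = f - f.mean()
    denom = np.sum(f * f)
    A = np.zeros((2 * T + 1, 2 * T + 1))
    for m in range(-T, T + 1):
        for n in range(-T, T + 1):
            g = np.roll(np.roll(f, m, axis=0), n, axis=1)
            A[m + T, n + T] = np.sum(f * g) / denom
    return A

def spread(A, u, v):
    # Function-spread measure S(u, v) of equation 3.5-3 over the
    # half plane m = 0..T (the other half is symmetric).
    T = A.shape[0] // 2
    m = np.arange(0, T + 1)[:, None]
    n = np.arange(-T, T + 1)[None, :]
    Ah = A[T:, :]                  # rows m = 0..T
    eta_m = np.sum(m * Ah)         # equation 3.5-4
    eta_n = np.sum(n * Ah)         # equation 3.5-5
    return np.sum((m - eta_m) ** u * (n - eta_n) ** v * Ah)

With this sketch, spread(A, 2, 0) and spread(A, 0, 2) give the profile spreads, and spread(A, 1, 1) the cross-relation, discussed above.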

However, if a texture field can be modelled by the convolution of a probability density of a random variable and some spatial operator, then visually different texture fields can yield identical autocorrelation functions (Pratt et al., 1978; Faugeras and Pratt, 1980). To illustrate this, imagine the joint histogram of a highly correlated texture region; its values will tend to concentrate toward the diagonal whereas the joint histogram of an uncorrelated texture will tend toward uniformity. The process of whitening is a decorrelation technique based on the autocorrelation function to ensure the texture features are measured on an uncorrelated identically distributed field.

Faugeras and Pratt (1980) have suggested that the image should first be whitened before computing the autocorrelation, to produce an uncorrelated identically distributed field.

$\hat{W}(j, k) = F(j, k) \otimes H_W(j, k)$    (3.5-6)

where $\hat{W}$ is the whitened field, which serves as an estimate of the independent identically distributed generating process W, F is the stochastic texture array and $H_W(j, k)$ is the whitening operator.

The computation of $\hat{W}(j, k)$ requires that the power spectrum be computed using the Fourier transform, which in turn requires a fair amount of processing. To simplify the estimation of $\hat{W}(j, k)$, Pratt et al. suggest the use of a gradient operator in the form of a Laplacian convolution operator, which can take various forms (Pratt, 1978), like the following masks:

$M_1 = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix} \qquad M_2 = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix} \qquad M_3 = \begin{bmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{bmatrix}$    (3.5-7)

The authors have attempted to discriminate a set of both artificial and natural texture fields by extracting a set of features (mean, standard deviation, kurtosis, skewness, S(1,1), S(2,2), S(2,0) and S(0,2)) from both original and whitened texture fields. The S(2,0) and S(0,2) features are important here as indicators that can account for anisotropy (directional bias) in the images, since they express respectively the vertical and horizontal profile spreads. Although it was not reported in their paper, it seems likely that the ratio between the profile spreads may also be a good candidate for a texture indicator.

The computation of the autocorrelation function over previously whitened texture fields has proven to increase the Bhattacharyya distances between almost all of the extracted features as compared with the original texture fields. In other words, the ability to separate different texture fields seems to improve by first decorrelating or whitening the textured images. In another experiment, the same authors modified the original texture fields so that they all had identical means and variances; interestingly, the B-distances between the previously whitened texture features were not significantly affected by the histogram modification, whereas the separability between the non-whitened images was significantly impaired.


Figure 3.7. Examples of natural texture images of 64 x 64 pixels (left) with perspective views of their respective autocorrelation functions: a) a pattern of streets and houses, b) scrub stand, c) tree stand and d) sea waves.

The experiment conducted by Faugeras and Pratt was done over a very limited set of texture fields whose principal characteristic appears to be concentrated in high frequencies with little or no apparent anisotropy. In order to do a simple test, a pseudo-whitening operator (Laplacian mask $M_1$ from 3.5-7) has been applied to two of the texture fields of figure 3.7 and their autocorrelation computed; the result is shown in figure 3.8. The first image represents a forest stand having little anisotropy (mostly created by the low sun angle) and principally high frequency variations, which is comparable to the test images used by Faugeras and Pratt. However, the second image (sea waves) shows evidence of high anisotropy and medium range frequencies (9-11 pixels as expressed in wavelength). By observing the plot of the autocorrelation of the latter image, the Laplacian operator appears to have had the effect of increasing the high frequencies with respect to the apparent medium frequencies. The anisotropy also seems much less apparent. It is also a known fact that such an operator tends to reduce the signal/noise ratio (Pratt, 1978). The use of such an operator also assumes a stationary process, since the whitening is done over a very small window (3 × 3). It appears that the solution would lie in using a better approximation of the whitening process, but at the cost of a large increase in computing.


Figure 3.8. Examples of natural texture images after filtering (left) with perspective views of their respective autocorrelation functions: a) tree stand and b) sea waves.


3.6 DEPENDENCY MATRIX METHODS

The Grey Tone Dependency Matrix method, also called the co-occurrence matrix method, was mainly developed by Haralick and his collaborators (Haralick et al., 1973; Haralick, 1979; Pratt, 1991; Rosenfeld and Kak, 1982). The co-occurrence matrix is a second-order histogram in which each entry reports the joint probability of finding a pair of grey levels at a certain distance and direction from each other over some pre-defined window. Suppose an image F is composed of $G = \{1, 2, \ldots, N_G\}$ grey levels, and consider the pixel pair F(j, k) and F(m, n) separated by a distance d in a direction θ; a matrix can then be constructed to represent all instances of the co-occurrence of the grey levels a (a ∈ G) and b (b ∈ G) separated by d, θ. The joint probability matrix for d, θ is thus defined by:

$P(a, b \mid d, \theta) = \Pr\left[F(j, k) = a,\ F(m, n) = b\right]$    (3.6-1)

which reads as the probability of having grey level a at (j, k) and grey level b at a displacement d, θ from (j, k) in F. If an image is composed of 256 grey levels, then each matrix defined for d, θ has 256 × 256 elements. Of course, there are as many matrices as there are possible d, θ (for a W × W window, there are W−1 matrices for θ = 0° alone). However, it is often not desirable to analyse all possible matrices, nor to build these matrices with the combination of all the grey levels. One good compromise appears to consist in reducing the grey levels to 16 levels of quantisation over a 30- to 50-pixel square window (Pratt, 1991). A further reduction of the amount of data can be made by building the co-occurrence matrices for ranges of direction angles. Figure 3.9 shows a set of four co-occurrence matrices of 32 levels of quantisation for the four images shown in figure 3.7. The lag distances are defined in row/column coordinates, so that the 0,4 lag distance means that the matrix shows all co-occurrences of grey levels separated by zero rows and four columns.

A quick visual analysis of figure 3.9 reveals a number of facts:


• The Houses matrices all show a strong dependency within the low (dark) values and even where the low values meet higher (light) values (the upper and right edges). This probably indicates the strong relationship that exists between all the generic objects (light or dark) and their shadows. It can be seen that small peaks occupy the whole area of the matrix, which gives evidence of a strong contrast. The low spread of the matrix indicates a coarse texture, which is the case for this image. The choice of these lags seems to suit this image well.

• The Scrub dependency matrices are very different from the former example. On the one hand, only about half the area of the matrices is "occupied" by values other than zero, which indicates that the image contrast is rather low. On the other hand, the shape of the histogram appears almost uni-modal and Gauss-like, which indicates little tone dependency for the lags shown.

• The Tree image matrices look like the Houses image matrices: a strong shadow effect. However, the contrast does not appear to be as strong. The -4,4 matrix shows almost no dependency between the middle and higher tonal values, perhaps due to the angle of incidence of the sun.

• As for the Waves image, the co-occurrence matrices look very similar to those of the Scrub image, which indicates the same features described for Scrub. It also shows that this matrix would not be well suited for distinguishing the wave texture from the scrub texture. Finally, these lags do not give any indication of the wave frequency, which has a repeat distance of 10-12 pixels.

Even with all these reductions in the number and size of the matrices, the fact remains that for every pixel being analysed a set of matrices exists which cannot be considered as texture features without some preliminary processing to extract meaningful values. In their paper, Haralick and his collaborators (1973) proposed a set of 14 measures related to the textural content of the image. Of these, the author conducted experiments with the Angular Second Moment, which is similar to the energy of the first-order histogram, the contrast and the correlation, as defined below:


Contrast:

$F_c = \sum_{n=0}^{N_g - 1} n^2 \left[\, \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i, j) \right]_{|i-j| = n}$    (3.6-2)

Correlation:

$F_r = \frac{\sum_i \sum_j (i \cdot j)\, p(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y}$    (3.6-3)

where $N_g$ is the number of quantisation levels, p is the joint probability matrix, and $\mu_x$, $\mu_y$, $\sigma_x$ and $\sigma_y$ are the means and standard deviations of $P_x$ and $P_y$, the marginal probability distributions obtained by summing the rows (for x) or columns (for y) of p(i, j).

The angular second moment is a measure of homogeneity; the contrast, a measure of the amount of local variation (within the window); and the correlation gives an estimate of the grey tone linear dependency. There are probably other measures that can be of use, but these three seem to be the most useful (Haralick et al., 1973; Pratt, 1991; Davis et al., 1979). It also appears unreasonable to extract a large number of texture features, especially if a large number of co-occurrence matrices is needed to convey the textural information content of an image.


Figure 3.9. Examples of co-occurrence matrices (shown in perspective) for the four images shown in figure 3.7. The lag distances are shown below each 32 × 32 matrix.
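The construction of one co-occurrence matrix and the extraction of the three features retained above can be sketched as follows, assuming an 8-bit input requantised to 16 levels (the lag is given in row/column form, as in figure 3.9):

import numpy as np

def cooccurrence(window, dr, dc, levels=16):
    # Joint probability matrix p(i, j) of equation 3.6-1 for one
    # row/column lag (dr, dc), after requantising an 8-bit window.
    q = (window.astype(int) * levels // 256).clip(0, levels - 1)
    rows, cols = q.shape
    r0, r1 = max(0, -dr), rows - max(0, dr)
    c0, c1 = max(0, -dc), cols - max(0, dc)
    a = q[r0:r1, c0:c1]
    b = q[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1)
    return P / P.sum()

def haralick_features(P):
    # Angular second moment, contrast (3.6-2) and correlation (3.6-3).
    i, j = np.indices(P.shape)
    asm = np.sum(P ** 2)
    contrast = np.sum((i - j) ** 2 * P)
    g = np.arange(P.shape[0])
    px, py = P.sum(axis=1), P.sum(axis=0)  # marginal distributions
    mx, my = np.sum(g * px), np.sum(g * py)
    sx = np.sqrt(np.sum((g - mx) ** 2 * px))
    sy = np.sqrt(np.sum((g - my) ** 2 * py))
    corr = (np.sum(i * j * P) - mx * my) / (sx * sy)
    return asm, contrast, corr

Note that the contrast of equation 3.6-2 is computed here in its equivalent pixel-pair form, summing (i − j)² p(i, j) over the whole matrix.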

In their work, Davis and collaborators (1979) presented a generalised version of the co-occurrence matrix method. The process is initiated by first detecting the edge pixels and encoding them based on their direction (horizontal, vertical, right-diagonal or left-diagonal). The generalised co-occurrence matrices are then built using a set of combination rules instead of the fixed row/column lags. The authors suggest three rules:

Rule one: P1(a, b | k) is true if “b” is within a distance “k” from “a”.

Rule two: P2(a, b) is true if "b" is the nearest neighbour of "a".

Rule three: P3(a, b | θ, ∆) is true if "b" lies within a directional ray defined by the direction θ and the angular range ∆ of "a".

Their classification of texture is based on features such as the angular second moment and the generalised contrast, as in Haralick et al. (1973), and on some new features which they propose:

• ratios between features for different values of the same rule (e.g. different values of k for rule one) or between the values obtained for different rules;

• peaks or clusters in the co-occurrence matrices;

• comparing values obtained for the same rule but for different parts of the window considered.

The last of these features is interesting as it is the first attempt to assess the possibility of having more than one texture pattern in the same window. This problem is of particular interest if the texture extraction is to be performed automatically on a real scene, and it will be further assessed in a later section.

Davis et al. performed a classification test on a variety of macro-textures using a leave-one-out classifier (i.e. each texture sample is compared against all the others and is assigned the class of the texture to which it is closest). They obtained an overall accuracy of a little over 80% for the 30 texture samples classified. A comparison was made with the original co-occurrence matrix method, for which the accuracy was in the range 50-57% (depending, naturally, on the choice of lags for the rows and columns). Of all the features used, contrast was the most useful.

With this method (and the original method), the main problem lies in the great range of possible values for the rules or for the lags. In the generalised co-occurrence matrix method, an additional difficulty lies in choosing the edge extraction operators. These difficulties make the approach highly sensitive to the decision process, leading to a recipe-like method.

3.7 MICROSTRUCTURE METHOD

The microstructure method is especially useful for texture patterns dominated by high frequencies. The method was developed by K.I. Laws in his PhD thesis and has been reported by Pratt (1991). The Laplacian, Sobel and other gradient operators are known to increase the high frequencies (and the noise with them) of texture fields. Laws proposed a set of nine gradient operators he calls "microstructure impulse response arrays", which are masks of the type shown in 3.5-7. These operators are shown below:

$M_1 = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \quad M_4 = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \quad M_7 = \begin{bmatrix} -1 & -2 & -1 \\ 2 & 4 & 2 \\ -1 & -2 & -1 \end{bmatrix}$

$M_2 = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \quad M_5 = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix} \quad M_8 = \begin{bmatrix} -1 & 0 & 1 \\ 2 & 0 & -2 \\ -1 & 0 & 1 \end{bmatrix}$

$M_3 = \begin{bmatrix} -1 & 2 & -1 \\ -2 & 4 & -2 \\ -1 & 2 & -1 \end{bmatrix} \quad M_6 = \begin{bmatrix} -1 & 2 & -1 \\ 0 & 0 & 0 \\ 1 & -2 & 1 \end{bmatrix} \quad M_9 = \begin{bmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{bmatrix}$    (3.7-1)

After the masks have been applied to the texture field, the standard deviation is calculated over a moving window of a size believed to contain a "few" cycles of the texture microstructure (although not stated, a minimum of at least two cycles is probably sufficient). The standard deviations computed over the images convolved with the nine operators serve as texture features.
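Up to sign, the nine masks of equation 3.7-1 are the outer products of three 1-D kernels (level [1, 2, 1], edge [−1, 0, 1] and spot [−1, 2, −1]), so they can be generated rather than typed in; the sign differences do not affect the standard-deviation features. A minimal sketch follows (W = 4 gives the 9 × 9 window used in the test described below):

import numpy as np

L3 = np.array([1, 2, 1])    # level
E3 = np.array([-1, 0, 1])   # edge
S3 = np.array([-1, 2, -1])  # spot
MASKS = [np.outer(a, b) for a in (L3, E3, S3) for b in (L3, E3, S3)]

def laws_features(image, W=4):
    # Convolve with each mask, then take the standard deviation over a
    # (2W+1) x (2W+1) moving window: nine texture features per pixel.
    f = image.astype(float)
    features = []
    for mask in MASKS:
        resp = np.zeros_like(f)
        for i in range(1, f.shape[0] - 1):
            for j in range(1, f.shape[1] - 1):
                resp[i, j] = np.sum(f[i - 1:i + 2, j - 1:j + 2] * mask)
        std = np.zeros_like(f)
        for i in range(W, f.shape[0] - W):
            for j in range(W, f.shape[1] - W):
                std[i, j] = resp[i - W:i + W + 1, j - W:j + W + 1].std()
        features.append(std)
    return features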

A test has been done using the four texture fields of figure 3.7, but to ensure that the apparent difference is not the result of different first-order statistics, the texture fields have first been brought to identical means and variances (figure 3.10a). In these examples, the standard deviation has been computed over a 9 × 9 pixel window. Clearly the objective of the technique (like many others) is to create features that allow segmentation by using first-order statistics over the different features taken together, in a multi-channel fashion. In figure 3.10, it can be seen that the eighth and ninth (i and j) operators are probably the best suited for such a segmentation. However, results could be improved by first applying some filtering, such as the median function.

Figure 3.10. The Laws operators computed over four texture fields (from left to right: houses, scrub, trees and waves). a) the textures after normalisation, b) to j) the nine Laws features in order.

Applying such a filtering method clearly has good potential, as shown by figure 3.11, which depicts the classification of the image in figure 3.10 (a) using the texture features in 3.10 (b-j). Such a method is limited to high spatial frequencies and to textures that are dissimilar enough for these microstructures to map their differences. It is also assumed that the conceptual objects have an average size close to the actual resolution of the image. This kind of method would probably solve many cases of texture segmentation in medium resolution satellite images (like the old problem of classifying constructed areas). Note that no attempt has been made in this classification trial to solve the border problem (window operators try to consider pixels that are outside the limits of the image) and that the edge has been mistakenly classified as Scrub. Furthermore, like many other techniques, the method is solely based on the segmentation/classification of the texture fields and not on an attempt to fully describe them. To improve these results, it now appears quite clear (based on the psychophysics studies) that some frequency feature would have to be integrated.

Figure 3.11. The result from a texture classification trial on the image in figure 3.10 (a) using the nine texture features derived from the “Laws” operators (see text above).

3.8 SINGULAR VALUE DECOMPOSITION METHOD

A method called Singular Value Decomposition (SVD) was proposed by Ashjari (1982, in Pratt, 1991). The method consists of computing the singular values of the window treated as a matrix and then ordering them. The singular values are defined by:

$M = \sum_{j=1}^{R} \lambda^{1/2}(j)\, u_j v_j^T$    (3.8-1)

where M is the matrix represented by the window being analysed, $u_j$ and $v_j$ are the eigenvectors of $MM^T$ and $M^TM$ respectively, R is the rank of the matrix M and $\lambda(j)$ are the singular values.


If the pixel values in the window are spatially uncorrelated, the singular values tend to be uniformly distributed, but if the elements of the window show a correlation, then the singular value curve tends to be skewed to the left (Pratt, 1991). The distribution of the ordered set of singular values can then be compared to a function curve whose shape can be assessed by various measurements in order to be used as texture features.
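A sketch of this extraction follows: the singular values of the window are computed (NumPy returns them already ordered, largest first) and their normalised distribution summarised by a few shape measures; the particular measures below are illustrative choices, not a reproduction of Ashjari's feature set:

import numpy as np

def svd_features(window):
    # Singular values of the window treated as a matrix (equation 3.8-1).
    s = np.linalg.svd(window.astype(float), compute_uv=False)
    p = s / s.sum()                   # normalised singular-value curve
    idx = np.arange(1, len(p) + 1)
    mean_idx = np.sum(idx * p)        # where the 'mass' of the curve sits
    spread = np.sqrt(np.sum((idx - mean_idx) ** 2 * p))
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return mean_idx, spread, entropy

A strongly correlated window concentrates the normalised curve on the first few singular values (low mean index and entropy), while an uncorrelated one spreads it more uniformly.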

Although this method is said to provide good classification results, the SVD method remains difficult to interpret directly and does not assess any frequency component in the texture; it assesses only the overall correlation or "busyness" of the texture sample.

3.9 FRACTAL METHOD

Fractal geometry was developed by Benoit Mandelbrot in the early seventies. In his book The Fractal Geometry of Nature (1982), Mandelbrot defined fractals as: a set for which the Hausdorff-Besicovitch dimension strictly exceeds the topological dimension, which is to say that, for a line having fractal properties, the fractal dimension lies between the topological dimension of a line (1) and that of a plane (2). This is the most basic principle of fractals: objects can have a fractional (hence the word fractal) dimension. The other principal characteristic of fractals is the self-similarity principle, by which the fractal dimension is constant over scale (or over a wide range of scales for fractal objects in nature; see figure 3.12 for a photographic example). Perhaps the best way to understand fractal geometry is to compare it to Euclidean geometry (Voss, 1988):

• Whereas Euclidean objects are scale-specific, fractal geometry is scale-independent.

• Euclidean geometry describes man-made objects well; fractal geometry describes natural shapes and forms.

• Euclidean objects are described by a formula; fractal objects can only be described by a recursive algorithm.

There are many mathematical and philosophical implications to fractals, but only the aspects related to texture analysis will be considered here.

Fractal objects can be separated into two basic types: 1) strictly self-similar and 2) statistically self-similar. In the case of natural scenes or images, only statistical self-similarity is considered, which means that objects can have apparently random shapes but maintain the same fractal dimension over a wide range of scales. One of the most useful models for the construction of synthetic random fractals related to objects in nature has been the Fractional Brownian Motion (fBm) developed by Mandelbrot and Van Ness (1968, in Voss, 1988).

Consider a random, normally distributed process over time X(t); the increment

$X(t_2) - X(t_1)$    (3.9-1)

($t_1$ and $t_2$ being two different moments) has a Gaussian distribution, and the mean square increments have a variance proportional to the time difference:

$E\left[\,|X(t_2) - X(t_1)|^2\,\right] \propto |t_2 - t_1|$    (3.9-2)

where E represents the statistical mean (or expectation) over a large sample. Brownian motion is said to be statistically self-similar in the sense that

$X(t_0 + t) - X(t_0)$  and  $\frac{1}{\sqrt{r}}\left[X(t_0 + rt) - X(t_0)\right]$    (3.9-3)

have the same joint distribution for any $t_0$ and for r > 0, where r represents the scale factor.

The fractional Brownian motion is the simple generalisation of equation 3.9-3:

$E\left[\,|X(t_2) - X(t_1)|^2\,\right] \propto |t_2 - t_1|^{2H}$    (3.9-4)

for which the special case H = 1/2 gives the common Brownian motion. The scaling behaviour of the fractal function is characterised by H in the range 0 < H < 1, H being a sort of roughness factor: when close to zero, the function has a very "rough" appearance, whereas values close to one give a smoother trace. The fractal dimension D is related to the H factor in the following sense:

$D = (T_D + 1) - H$    (3.9-5)

where $T_D$ is the topological dimension.

Figure 3.12. Fractal mountains: fractal geometry is the easiest way to describe natural objects such as the relief features of our planet. The self-similarity in the left image appears more limited than in the right image.

Fractal shapes are easily generated from the integration of white noise but, in the case of texture analysis, the fractal dimension of real object representations has to be measured rather than imposed. In the image processing community, it is mainly the paper by Pentland (1984) that gave the basis for using fractal geometry for natural scene description. For a spatial process such as that depicted by digital images, equation 3.9-4 may be rewritten using the following exponential relationship (Roach and Fung, 1994):

$E\left[\,|f(x + \Delta x) - f(x)|\,\right] \cdot |\Delta x|^{-H} = C$    (3.9-6)

where E is the expectation, f(x) is the image pixel at location x, $\Delta x$ is a lag distance and C is a constant. Fractal surfaces should obey this relationship for a "wide range of scales", as natural surfaces cannot be expected to maintain the same fractal dimension over all scales (Pentland, 1984, p. 663).


However, if the Earth's relief can be said to have fractal properties, there are a number of problems resulting from the use of an illuminated image of that relief instead of its actual topographic description. Pentland presents evidence that images of the Earth can be considered as spatially isotropic fractals through four propositions accompanied by the appropriate proofs.

Proposition one: Given a Lambertian surface reflectance function and constant illumination and albedo, the fractal dimension of the reflectance image is identical to the fractal dimension of the surface topography.

Proposition two: A linear transformation of a fractal Brownian function yields a fractal Brownian function with the same fractal dimension.

Proposition three: The fractal dimension of a fractal Brownian function is invariant over scale (i.e. the pixel size does not affect the fractal dimension).

Proposition four: If an image intensity surface is a two-D fractal surface, then the topography of the surface (3-D) must be a spatially-isotropic fractal Brownian function, given Lambertian characteristics and constant illumination and albedo.

There still remain problems for the use of fractal dimension as a texture measure:

Problem one: Not all surfaces are Lambertian reflectors nor do all objects and backgrounds have the same reflectance.

Problem two: Not all surfaces result from a natural fractal process; urban areas, crop fields and tree plantations are such texture fields (this was also pointed out by Pentland).

Problem three: The scale range for which a surface has the same fractal dimension does not necessarily correspond to the scale range used by remote sensing equipment (or pixel size when talking about digital images).


In his paper, Pentland (1984) reports successful experiments in image segmentation using oblique air photos and Brodatz texture fields (Brodatz, 1966). However, he also uses the fractal dimension to separate water from an urban area, which is definitely not a fractal surface.

Xia and Clarke (1997) enumerate seven methods for estimating the fractal dimension. Of these, only two can be used to measure the Brownian fractal dimension from digital images (Pentland, 1984; Roach and Fung, 1994): 1) the fractal Brownian motion method and 2) the Fourier power spectrum method.

1- The fractal Brownian motion method uses the relation in equation 3.9-6 for all possible x and ∆x. If the surface has fractal characteristics, the parameter H is estimated by the slope of the log/log plot of 3.9-6; the fractal dimension is then calculated using D = 3 − H.

2- In the Fourier power spectrum method, the fractal dimension D is estimated through the log/log slope (β) of the 1/f power spectrum using:

$D = (7 + \beta) / 2$    (3.9-7)

One interesting feature of the power spectrum method is that surfaces that do not have fractal characteristics tend to produce a fractal dimension that is below the topological dimension (i.e. < 2). This situation also happens when more than one homogeneous region is present in the image. Pentland uses that peculiarity to detect texture edges in an image, which is an interesting feature since this is one of the only texture analysis methods that proposes a solution to that problem. The Fourier power spectrum method has, however, the disadvantage of requiring much more computation time.

As reported by Roach and Fung (1994), the power spectrum method also has the disadvantage of being less accurate than the fractal Brownian motion (fBm) method. In the latter method, texture edges can be detected through the presence of a break or gap in the slope. However, non-fractal surfaces are more difficult to detect and tend to yield a very high fractal dimension (close to three).
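A minimal sketch of the fBm method follows: the mean absolute grey level increment is computed for a range of lags, H is taken as the slope of the log/log regression (equation 3.9-6) and D = 3 − H. Only horizontal and vertical increments are averaged here, and the maximum lag is an illustrative choice that should stay well within the scale range of interest:

import numpy as np

def fbm_dimension(image, max_lag=32):
    # Estimate H from the log/log regression of E[|f(x+dx) - f(x)|]
    # against the lag, then return D = 3 - H.
    f = image.astype(float)
    lags = np.arange(1, max_lag + 1)
    diffs = []
    for d in lags:
        dh = np.abs(f[:, d:] - f[:, :-d]).mean()  # horizontal increments
        dv = np.abs(f[d:, :] - f[:-d, :]).mean()  # vertical increments
        diffs.append(0.5 * (dh + dv))
    H, _ = np.polyfit(np.log(lags), np.log(diffs), 1)
    return 3.0 - H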

As an example, figure 3.13 shows the log/log plot of the fBm calculated for the two texture fields shown in figure 3.3. As can be seen, the slope of the fBm leads to a fractal dimension of D ≈ 2.8 for the trees and D ≈ 3.008 for the waves texture field! The wave image is clearly not a fractal surface, but the tree patch is more difficult to label (a fractal dimension of 2.8 seems a little high).

Figure 3.13. Log/log plot of the fractal Brownian motion calculated over the two texture fields shown in figure 3.3 (the higher plot stands for the tree texture, with slope M = 0.189 and D = 2.811; the lower for the wave pattern, with slope M = −0.00804 and D = 3.008).

The fractal method shows some interest despite the problems associated with non-fractal surfaces. However, it should be used as one texture feature within a larger set, as it is unlikely that any single feature (as a mappable texture characteristic) can account for all the aspects of texture fields.

3.10 VARIOGRAM METHOD

As will be shown, the variogram method is closely related to the correlation method, but since the variogram has its own theory (the Theory of Regionalized Variables) and has given rise to an impressive amount of research, it deserves its own section. The Theory of Regionalized Variables was first developed by Georges Matheron in the early 1960s (Huijbregts, 1975; Clark, 1979), and the application of this theory to mining and geological research has led to the well-known science of Geostatistics. One of the main applications of Geostatistics (for which the variogram is but an important tool) is the estimation of ore reserves, mainly between or beyond drill-holes. The relationship between concentrations of ore in different borehole logs is a very complex one, and traditional statistical tools could not account for the behaviour of such a spatial variable (Huijbregts, 1975).

According to the theory of regionalized variables (a regionalized variable being another name for a spatial phenomenon), a regionalized variable is a particular realisation of a random function. The theory of random functions makes it possible to account simultaneously for the structured and random character of spatial phenomena (Huijbregts, 1975). Considering that γ(x) is a particular realisation of the variable Υ at location x, then the set of all γ(x) taken at all possible locations x can be interpreted as a particular realisation of an infinite set of random variables Υ(x), called a random function.

The utility of this interpretation is that the relationship between two random variables, γ(x) and γ(x + h), has a precise significance (h being a spatial distance constant, or "lag" as it is often referred to). In order for this to be useful, a number of assumptions have to be made. The first one is that the expectation (which is the estimate of the trend) is a constant and does not depend on the position:

E[γ(x)] = m    (3.10-1)

where m is a constant for a given set of values and E is the expectation. This is called the assumption of weak stationarity. Another assumption is that the weak stationarity also holds for the increment of γ(x), defined by [γ(x) − γ(x + h)]. The second moment of this increment:

2γ*(h) = E[{γ(x) − γ(x + h)}²]    (3.10-2)

is called the variogram (Huijbregts, 1975), where E is the expectation and h is the distance lag. The * stands as an estimation sign because this is in fact the experimental variogram. It is mostly used in the following form:

γ*(h) = ½ E[{γ(x) − γ(x + h)}²]    (3.10-3)

which is the experimental semi-variogram. γ * (h ) is best represented in its graphical form as in figure 3.14.

Figure 3.14. The typical shape of the semi-variogram (using the spherical model). The h axis starts at zero and represents the distance between the samples. The γ axis also starts at zero and stands for the mean variance at each h.

In figure 3.14, a and c are the parameters of the semi-variogram and represent the most interesting feature in terms of data analysis, as the models used to approximate the experimental semi-variogram all use these parameters. To understand the shape of the semi-variogram, it is best to look at the behaviour of γ as h increases. At h = 0, the distance between the samples is nil, so they are actually compared with themselves, hence the curve starting at (0, 0) (which is not necessarily the case for a variogram produced by a model). At small values of h, γ remains relatively small as samples taken close to one another show a certain correlation. As h increases towards the middle of the graph, the samples become independent from each other and the curve levels off. The height at which the curve levels off is called the sill and the value of h at which it does so, the range. The semi-variogram is usually modelled using these two parameters; the two most commonly used models are the spherical model:

γ(h) = C0 + (C − C0)[(3h / 2a) − (h³ / 2a³)]    for 0 ≤ h ≤ a
γ(h) = C    for h > a    (3.10-4)

and the exponential model:

γ(h) = C0 + (C − C0)[1 − exp(−h / a)]    for 0 ≤ h ≤ a
γ(h) = C    for h > a    (3.10-5)

where γ (h ) is the semi-variogram, h is the lag distance, a is the range, C is the sill and C0 represents the nugget effect, i.e. the curve does not start at γ (h ) = 0 . The nugget effect can sometimes be interpreted as uncorrelated noise (Jupp et al., 1989).

In the case of the exponential model, the curve never reaches the sill (Clark, 1979) so that the meaning of the sill and especially the range is not clear. In this case and in the general situation where the curve does not reach the sill at a specific point, some other parameter is needed to estimate the range (Jupp et al., 1989). For that reason, the spherical model is usually preferred (Woodcock et al., 1988A; St-Onge and Cavayas, 1995; Clark, 1979).
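As an illustration of this fitting step, the spherical model of equation 3.10-4 can be adjusted to an experimental semi-variogram with a generic least-squares routine. The sketch below assumes the experimental lags and γ values are already available as arrays and that SciPy's curve_fit is an acceptable stand-in; it is not the software developed for this thesis.

    import numpy as np
    from scipy.optimize import curve_fit

    def spherical(h, c0, c, a):
        # Equation 3.10-4: rises from the nugget c0 to the sill c at range a.
        h = np.asarray(h, dtype=float)
        rising = c0 + (c - c0) * (1.5 * h / a - 0.5 * (h / a) ** 3)
        return np.where(h <= a, rising, c)

    # 'lags' and 'gamma' would come from an experimental semi-variogram;
    # synthetic values stand in for measurements here.
    lags = np.arange(1.0, 30.0)
    gamma = spherical(lags, 5.0, 80.0, 12.0)
    (c0, c, a), _ = curve_fit(spherical, lags, gamma,
                              p0=[0.0, gamma.max(), lags.max() / 2])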

One important aspect of the experimental semi-variogram is that it can be computed for isotropic data (having similar spatial behaviour in all directions) or for anisotropic data (showing different spatial behaviour according to direction), in which case it is computed for one or more specific directions so that each set of samples can yield a series of semi-variograms (a minimal computational sketch is given after the list below). The difference between the directional semi-variograms lies mostly in the range, as the sill usually remains stable (and can in fact be estimated using the variance of the sample set). In the case of geological applications this is quite logical, since most geological processes respond to specific trends. In remotely sensed images of the earth, the use of directional variograms can be justified by considering a few possibilities:

1. The relative position of the sun influences shadow casting, thereby giving some preferred orientation to the texture.


2. Ploughing in agricultural fields is usually done in one specific direction.
3. Urban areas usually have a limited number of preferred directions (two to four generally).
4. Wind direction influences the wave pattern produced on the ocean or on lakes.
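The sketch referred to above computes the directional experimental semi-variogram of equation 3.10-3; the image is assumed to be a two-dimensional NumPy array and the direction is given as a non-negative pixel offset (dx, dy), so other directions would need their own offsets.

    import numpy as np

    def directional_semivariogram(img, dx, dy, max_lag=30):
        # gamma(h) = 0.5 * E[{z(x) - z(x + h*d)}^2] for lags h along the
        # direction d = (dx, dy); max_lag is assumed small relative to
        # the image dimensions.
        img = img.astype(float)
        gamma = []
        for h in range(1, max_lag + 1):
            ox, oy = h * dx, h * dy
            head = img[:img.shape[0] - oy, :img.shape[1] - ox]
            tail = img[oy:, ox:]
            gamma.append(0.5 * np.mean((head - tail) ** 2))
        return np.array(gamma)

    # e.g. directional_semivariogram(image, 1, 0) for the horizontal direction
    #      directional_semivariogram(image, 0, 1) for the vertical direction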

Even so, many authors did not consider anisotropy to be an important factor (e.g. Ramstein and Raffy, 1989; Atkinson, 1995B) or, if they did, they did not consider using directional variograms but rather a two-dimensional variogram (Woodcock et al., 1988). Some aspects of the use of the variogram have to be considered before showing its application to texture analysis of digital images; they are summarised below.

• The effect of regularisation. In geological and geophysical applications, the samples are punctual in nature, which is not the case for remotely sensed data that represent an integration over some area: the Instantaneous Field of View (IFOV). This implies that changes in the spatial resolution of an image will greatly affect the sill, range and shape of the variogram. In particular, increasing the regularisation decreases the sill and tends to flatten the curve (Woodcock et al., 1988A; Atkinson, 1995). Jupp et al. (1989) showed that the variogram is in fact very much affected by the regularisation effect and that a preliminary analysis of the effect of regularisation should precede the choice of resolution. One way of doing so is to plot the local variance against the block size (the regularisation effect being represented by successively averaging pixels to simulate a coarser ground resolution). Since the variogram will be thoroughly investigated in this research, more specific attention to scale and regularisation will be given in the following chapter (section 4.1).

• Autocorrelation, covariogram and correlogram. The autocorrelation function is related to the variogram (Woodcock et al., 1988A) and, by treating the autocorrelation function in the same manner as the variogram (i.e. as a graph), one obtains the correlogram which, in turn, can be transformed into a covariogram when the function is divided by the variance. However, the variogram is often preferred for two reasons: it is simpler to compute and, most importantly, the autocorrelation function (and thus the correlogram and the covariogram) requires a more restrictive assumption by which the mean is considered constant for the whole image, whereas the variogram only requires the mean to be locally stationary (Woodcock et al., 1988A). Also, the correlogram has to have finite variance whereas the variogram does not. Figure 3.15 illustrates the relationship between the semi-variogram and the covariogram.

[Figure: "Variogram / Covariogram of a forest image": variance plotted against distance (lag) for the semi-variogram, the covariogram and the variance level.]

Figure 3.15. Example of a forest texture image and its computed semi-variogram and covariogram. Note that one is almost the mirror image of the other. The variance has been plotted as an estimate of the sill.

• The variogram and the fractal dimension. Equation 3.9-6 bears a great resemblance to the experimental semi-variogram (equation 3.10-3) except that, in the former, it is the absolute instead of the squared difference that is considered. Ramstein and Raffy (1989) have demonstrated that a straightforward relationship exists between the fractal dimension and the semi-variogram and that the fractal dimension can be directly calculated for a number of models (but not the spherical model). They also showed that when the variogram is bounded by a clear sill, more specifically if that sill is reached near the origin⁴, the surface cannot be considered fractal, as the calculated fractal dimension leads to a value which is less than the topological dimension. The latter seems to be the case for a number of textures.

⁴ Although a true fractal has infinite variance, the fact that any image has a predefined resolution and is bounded by its extents "creates" a sill for all images. Still, the behaviour of the variogram near the origin can be used to compute the "local fractal dimension".

Serra (1982) produced some of the pioneering work on the variogram (some of it in collaboration with Matheron). He uses covariance (and the variogram) as one of the fundamental tools of mathematical morphology. He enumerates various applications where covariance can be an effective tool of analysis:

1. analysis of clusters of points
2. description of structure
3. superposition of scales
4. periodicity
5. noise
6. multiphased structures
7. grey tone functions
8. anisotropy

Most of these can be related in one way or another to texture analysis and directly apply to the study of spatial structures in images. He suggested that because the variogram is less restrictive, it is more suitable for applications where the spatial correlations of the specimen under study reach beyond the zone defined by the scan field (the image sample in this case). The author also proposes a number of algorithms for calculating the slope of the tangent at the origin (which bears special significance in terms of structuring objects and can be an estimator of the fractal dimension) and the slope of the asymptote, and for estimating the noise (nugget effect) and the range. He also presents two simple models to illustrate that the covariance (and consequently the variogram) can be blind to some spatial structures, in the sense that it does not distinguish them even though the human eye can clearly see the difference. These are cases where the specimens studied have similar second-order statistics but differ in their higher-order statistics, therefore questioning the Julesz (1962; 1965) findings that human observers cannot distinguish texture pairs that are similar in their first- and second-order statistics (a theory that Julesz would correct later: see Julesz, 1981). Serra's work on the variogram has been used by most researchers studying the covariance or the variogram and their findings are complementary to his extensive study.

Woodcock et al. (1988A and B) have conducted an extensive two-part study on the use of the variogram for the analysis of remotely-sensed images. Their study focuses primarily on the analysis of the behaviour of the semi-variogram with respect to different scene parameters. In the first part of their work, the experimental semi-variogram is computed for a series of simulated images grossly representing a forested environment. They used a model developed by Li and Strahler (1985) in which discs (representing conifers as seen from above) are placed against a uniform background according to several parameters such as crown density, crown diameter and crown size. However, they modified the model in order to reduce the simulated images to two quantisation levels: black discs on a white background. They also produced simulated images in which the disc conifers cast a shadow over the white background, giving them the appearance of cones on their sides.

The authors fitted an exponential model to the semi-variograms they computed, and two companion papers by Jupp et al. (1988 and 1989) stress the effects of regularisation. Their combined findings are most valuable in terms of understanding how some image parameters affect the shape of the semi-variogram. Their most important conclusions are summarised below.

• The sill is directly related to the density of the objects or the percentage of the area covered by these objects.

• The range is closely related to the size of the discs, but the shape of the object does not appear to have a significant effect on the shape of the variogram or on its range.

• The shape of the semi-variogram is related to the variance of the size of the objects. As the object size variance increases, the variogram becomes more rounded and does not reach its sill as clearly as with fixed-size objects.

• An increase in pixel size (decreasing the resolution) affects the variogram in three different ways: 1) it lowers the sill, 2) the range increases to D1 + D2, where D1 is the diameter of the basic disc and D2 is the diameter of the regularising disc (Jupp et al., 1989), and 3) the height of the variogram at a distance of one pixel increases relative to the sill (which results in an increased nugget effect).

In the second part of their study (Woodcock et al., 1988B), real digital images were used in order to compare their theoretical findings with real situations. Images ranging in resolution from 0.15 m to 30 m and from different environments (urban, forest and agricultural) were used to produce a series of variograms which were in turn compared with the variograms extracted from the disc-model images. Comparing with the findings of their first article, the authors reached the following conclusions:

• The height of the sill can, although not as clearly, be related to the density of the objects in the scene. This holds particularly for scenes that have one object type over a regular background.

• The shape of the variogram (as controlled by the range) could repeatedly be associated with the size of the objects. Although most real-image variograms did not quite reach their estimated sill (the variance), breaks in their slope could be related to actual object size.

• In all cases, the real-image variograms had a more rounded shape than the disc-model variograms due to the larger object size variance in real images.

• The relationship between the average object size in the scene and the size of the resolution cells appears to have the most profound effect on the shape of the variogram.

The study of Woodcock et al. has been pursued by others; in particular, the combination of the variogram with forest canopy models has given rise to prolific work. In a recent article, St-Onge and Cavayas (1995) used the variogram in an attempt to extract forest parameters such as tree height, density and crown type and size, based on parameters computed from multi-directional variograms. Their methodology consists of computing the semi-variogram for eight different directions and then, after fitting a spherical or exponential model to each curve, extracting both the range and sill parameters for each directional variogram. A partial list of these parameters follows:

a_per    The range of the directional semi-variogram perpendicular to the direction of the incoming sunlight
c_per    The sill of the directional semi-variogram perpendicular to the direction of the incoming sunlight
a_par    The range of the directional semi-variogram parallel to the direction of the incoming sunlight
c_par    The sill of the directional semi-variogram parallel to the direction of the incoming sunlight
a        The mean of the ranges in all eight directions
c        The mean of the sills in all eight directions
a_par / a_per    The ratio between the parallel and perpendicular ranges

These parameters (and others) were calculated for a wide range of situations involving one of the forest parameters at a time (e.g. tree density, crown size, etc.). The results were used in a multiple regression model to see which variogram parameters best enabled the prediction of the independent forest variable. These models were then used to predict canopy parameters in a series of other simulated images not used in the computation of the multiple regression models. The success rate was very high (R² ≈ 0.9) for predicting tree height, tree density and crown size. The authors then tested the models for these three canopy parameters on real images acquired by the MEIS II airborne sensor. Although the correlation factor was not nearly as high (it is not specified in the paper), the general behaviour was correctly predicted.

The methodology developed by St-Onge and Cavayas is not directly usable as a texture feature extraction method or as a texture classification scheme; it involves too much computation (eight directional variograms, mean-square fitting using two different models, variogram parameter extraction and multiple regression) and has to be based on a rather large window to reach a significant level of accuracy. However, the method clearly shows the reliability of the relation between the variogram parameters and the density and size of objects in an image (one may assume that the relationship is maintained for objects other than trees).

Ramstein and Raffy (1989) have proposed a texture classification scheme based on a single texture feature derived from the semi-variogram. They suggested a fast algorithm for extracting the range parameter from the variogram, using the variance as an estimate of the sill and the slope between the origin and the value obtained for h = 1 (γ(1)).

Using this method they successfully separated four classes of texture in a LANDSAT TM image (forest, fields, dense and less densely populated urban areas). However, one should note that the differences between these texture classes are not subtle ones and that other methods would have achieved the same result. Additionally, the approximation of the range based on a fixed model is likely to bring problems when comparing more subtle textures. Finally, they did not consider anisotropy in their computation of the semi-variogram, which has revealed itself to be an important feature in other studies like the one by St-Onge and Cavayas (1995).
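A sketch of this fast estimate follows. The reading of the algorithm (sill taken from the sample variance, range approximated by C / γ(1) through the small-lag slope of an exponential model with no nugget) is an interpretation of the description above, not necessarily the authors' exact formulation.

    import numpy as np

    def fast_range(img):
        # Sill estimated by the sample variance; gamma(1) from horizontal
        # neighbours. With gamma(h) ~ C * h / a near the origin (exponential
        # model, no nugget, a >> 1), the range is approximated by C / gamma(1).
        img = img.astype(float)
        c = img.var()
        g1 = 0.5 * np.mean((img[:, 1:] - img[:, :-1]) ** 2)
        return c / g1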

In the same paper, Ramstein and Raffy also consider the variogram as a way of resampling an image to a finer resolution (see also Atkinson, 1995B) while maintaining the same textural characteristics as the original image. When an image resampled using the variogram is compared with one resampled using a spline method, the former shows a much more realistic visual aspect. The method can be compared to the fractal approach without the disadvantage of being restricted to fractal surfaces. This aspect is particularly interesting for image simulation purposes.


3.11 FINAL REMARKS ON THE IMAGE PROCESSING APPROACH TO TEXTURE ANALYSIS

The enumeration and brief description of texture analysis techniques made in this chapter is far from an exhaustive list. Many other methods have been developed in the Remote Sensing community alone, and many more if all related fields are considered. The important point is that most methods are related in one way or another to one of the fundamental mathematical principles that have been described here. It should also be mentioned that not all methods use the texture feature extraction approach: other segmentation methods like texture boundary detection (e.g. Jones, 1991), region growing (e.g. Krumm and Shafer, 1995), recursive region splitting (e.g. Ohlander et al., 1978) or even hybrid methods using both region growing and edge detection (Pavlidis and Liow, 1990; Farrokhnia and Jain, 1991) can yield good results, but they will not be investigated here because they are basically unsupervised non-parametric methods that are difficult to monitor. In this study, the interest is not only in the end result but mostly in the understanding of the features that permit the analysis and classification of textures in images of the earth.

Like the ones presented in this chapter, the methods investigated in this thesis will all use some well-recognised mathematical principles. But even though these principles are well known and understood by the pattern recognition community, countless approaches and avenues have still been overlooked or simply not thoroughly investigated. Chapter 4 will take a deeper look at the type of texture under study and its particularities, which will help in choosing the approach or set of approaches most appropriate for the objective being pursued.


Chapter 4

THE PROBLEM OF NATURAL TEXTURES

4.1 THE ATTRIBUTES OF NATURAL TEXTURE

4.1.1 Defining natural texture

Since there is no universally accepted definition of visual texture, one has to choose a definition that best reflects the objective or the results being sought. Within the scope of this thesis, a rather broad definition of texture is needed: one that will embrace texture as a photo-interpretation cue, regardless of the application. One such definition was given by Pratt (1978, p. 505): natural scenes containing semi-repetitive arrangements of pixels. No distinction will be made between texture and pattern and, contrary to the definition given by Estes et al. (1983), even natural scenes with discernible objects (where the resolution cell is much smaller than the objects) will be considered legitimate textures (see figure 4.1a).

Another problem that is often encountered is the definition of what a semi-repetitive arrangement of pixels is. For example, in figure 4.1a, one could interpret the sample scene as containing two separate textures: the dark-area texture, which appears to have a greater density of trees, and the lighter areas with more sparsely distributed trees. Another interpretation would describe the scene as rolling hills with an open cover of scrub. The first interpretation might be that of a forest engineer, whereas the second is that of a geomorphologist who sees the scene as a single topographic unit where the orientation of the slopes governs their water content, which in turn controls the density of the trees. If the scene is interpreted as containing two different textures, then it is unlikely that the "rolling hills with scrub" interpretation could be deduced further along the process. So the level at which texture is interpreted determines the final result of the interpretation process altogether. This is not, however, a desirable situation because it makes the process unstable and dependent on threshold values (such as the one that would split a scene into two textures, for example).

Figure 4.1. A texture of rolling hills covered with scrub. The original scene (a) was filtered to leave only the rolling hills component of the texture (b) which, when subtracted from the original scene, leaves only the scrub component of the texture (c). Note that the slopes opposite the sun appear to have a greater concentration of smaller trees.

4.1.2 Micro- and macro-texture

Another possible solution is to describe the natural scene of figure 4.1(a) as two overlapping textures: a micro-texture (high spatial frequencies) and a macro-texture (low spatial frequencies). These two textures have been effectively separated into 4.1(b) and 4.1(c). This approach has the advantage of pushing the interpretation of the texture field forward and therefore leaving the decision on the type of application (and its interpretation technique) at a higher conceptual level. This would also be consistent with how the brain processes textural information through band-pass filters (Harvey and Gervais, 1978; Richards and Polit, 1974) and has already been implemented in some texture classification methods (e.g. Goresnic and Rotman, 1992; Cohen and You, 1992). There are several ways by which a scene can be analysed at different spatial frequencies, but one possible solution would be to build a multi-scale model of the scene prior to the analysis, as was suggested by Cohen and You (1992). This leads to the concept of multiscale and multiresolution analysis, which is receiving increased attention from the scientific community for biophysical and landscape processes (Meentemeyer and Box, 1987). In fact, understanding any spatial process involves the extraction of critical information from a range of scales (De Cola, 1997). Multiresolution variance and covariance are tools well suited to providing this type of information. In section 4.3, these tools will be used to justify the choice of resolution for the test images.
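The kind of separation shown in figure 4.1 can be sketched with a simple Gaussian low-pass filter in the Fourier domain (the filter choice and cutoff are assumptions, not what was used to produce the figure): the low-pass component approximates the macro-texture and the residual the micro-texture.

    import numpy as np

    def split_scales(img, cutoff=0.05):
        # Gaussian low-pass in the Fourier domain; cutoff is a fraction of
        # the sampling frequency (an assumed value, to be tuned per scene).
        img = img.astype(float)
        fy = np.fft.fftfreq(img.shape[0])[:, None]
        fx = np.fft.fftfreq(img.shape[1])[None, :]
        lowpass = np.exp(-(fx ** 2 + fy ** 2) / (2 * cutoff ** 2))
        macro = np.real(np.fft.ifft2(np.fft.fft2(img) * lowpass))
        micro = img - macro  # high-frequency residual
        return macro, micro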

For example, the scene in figure 4.1b has been heavily filtered and therefore does not need to be analysed at full resolution: its resolution cell size could be increased by at least four times, which would reduce the analysis time. Such a procedure would have the disadvantage of complicating the process of texture classification and is somewhat dependent on the application. In this thesis, direct classification of the texture as one entity will be attempted through the choice of a judicious analysis window size.

The separation between micro- and macro-texture was easy in the case of figure 4.1 since the two predominant spatial frequencies are well separated in the frequency domain. In practice, the spatial frequencies analysed in both the full-scale and the reduced-scale scenes will have to overlap to ensure that no textural information is lost when important frequencies fall exactly at the lower limit of the reduced-resolution scene.

4.1.3 The effect of scale on texture

The effect of increased spatial resolution on texture was first perceived by the user community as an undesirable increase of variance in the class statistics of the classification training process, making classes more likely to overlap each other in the spectral space. This arose because the Remote Sensing user community had hoped that the well-established techniques of digital mapping (per-pixel classification) would perform just as well with high-resolution data as they had with low spatial resolution imagery such as Landsat MSS (Dikshit, 1996). When choosing a scale for observations (the imagery), one has primarily to consider the type of environment, the kind of information desired and the technique used to extract that information (Woodcock and Strahler, 1987). The type of environment dictates the nature and size of the objects found in the scene and therefore has a direct influence on the type of interpretation sought. The kind of information refers to the level of abstraction at which one interprets the scene. Considering that the scene model used in most Remote Sensing applications is discrete in nature (as opposed to continuous, where local fluctuations are essentially smooth), the objects extracted from the image are in reality abstractions of real objects on the ground (Woodcock and Strahler, 1987). For example, a forest is really a mosaic of tree stands which are composed of individual trees which can, in turn, be decomposed into trunk, branches and leaves depending on the level of abstraction. This leads to the concept of the operational scale, defined as the scale at which certain processes operate in the environment (Lam and Quattrochi, 1992; Cao and Lam, 1997), or the "scale of action" as it was called by Woodcock and Strahler (1987). Such a scale should be pinpointed before starting the investigation: "To select an image with appropriate spatial resolution for a study, one must examine the characteristics of scene content, especially the changing pattern of the scene as a function of changes in scale and resolution." (Cao and Lam, 1997; p. 62).

Another distinction can be made within the discrete scene model: as pointed out by Strahler, Woodcock and Smith (1986), the discrete scene can either be described as a mosaic of objects that covers the whole image or as individual objects distributed on a continuous background. Typically, a residential scene can be seen as a mosaic of houses, lawns, streets, etc., whereas a forest is characterised as a homogeneous distribution of trees on a background of soil or vegetative under-story. However, in terms of image analysis, a forest can also be seen as a mosaic of patches of tree and under-story spectral responses. This latter case is really what is achieved by most traditional image classification schemes based on per-pixel recognition. To work properly, this technique implies that a spectral signature would have to be computed for every possible object on the ground and that a second interpretation stage would be necessary to transform the mosaic of spectral objects (tree, soil, under-story) into conceptual objects (a forest). The whole idea of texture analysis is to bypass such a reductionist process and to go directly from the input image to the conceptual object through the analysis of the image structure.

Another important fact is that increased scene resolution has often been observed to decrease classification accuracy. This situation has been attributed to two counteracting processes: the increase in image variance associated with more heterogeneous scenes, and the reduced number of boundary pixels at finer resolution, which should theoretically increase classification accuracy (Townsend, 1981; Dikshit, 1996). More specifically, the increase in image variance has had the damaging effect of enlarging the spectral space of most classes and, hence, increasing class overlap; the outcome is that these per-pixel techniques are quite inefficient when applied to scanned aerial photographs, where the texture primitives are much larger and more complex.

The scale affects both the type of information being extracted and the precision of that information (Atkinson, 1995). At the gross resolution of satellite imagery, individual trees cannot be identified, so by using fine resolution imagery (scanned air photographs for example) one can either change the cartographic level of generalisation by mapping individual trees and houses or simply map at a higher precision level. The important point here is that no matter what the application is, the scale or resolution of the imagery will always have to be considered. Within the scope of this thesis, the goal will be to map at a higher precision and to extract more specific information about the conceptual objects. For instance, considering the object "forest", information can take the form of forest structure such as crown size, regularity or spacing. Mapping precision will always be limited by the resolution cell size, but mapping accuracy can still be greatly improved by reducing the amount of classification errors and by extracting additional information about these conceptual objects. In that sense, the effect of resolution can be considered from two different points of view:

• the effect of resolution on remotely sensed data
• the effect of resolution on cartographic information

The effect of resolution on remotely sensed imagery has generated substantial interest in the remote sensing research community (for a review of some of these studies, see Jupp et al., 1988 and 1989; Woodcock and Strahler, 1987; Atkinson, 1995 and Doyle, 1984). In one of these studies, Woodcock and Strahler (1987) identified two interesting facts about the behaviour of the local variance in the neighbourhood of pixels upon changing the size of the resolution cell:

1- When plotted against the pixel size, the local variance around a pixel describes a curve that starts at low values, increases rapidly to reach a peak and then slowly decreases with larger pixel sizes.

2- The peak variance value occurs between ½ and ¾ of the average object size.

These conclusions were reached for scanned air photographs of three environments: forest, residential areas and agricultural fields. The pixel size ranged from one to 16 metres for the forest image, from two to 30 m for the residential scene and from 30 to 1000 m for the agricultural scene. This means that a texture feature based on some measure of spatial variation (correlation, variance, covariance) would not be directly comparable at different resolutions. Texture is obviously not scale-free and texture analysis will certainly be intimately tied to the ground resolution cell size (Duggin and Robinove, 1990; Goodchild and Quattrochi, 1997); this is especially true for heterogeneous scenes (as is the case for textured images), which suffer rapid information loss as the resolution cell size increases (Meentemeyer and Box, 1987; Cao and Lam, 1997). Therefore, the change in spatial variation with scale is informative and should perhaps be considered a useful source of information in any texture analysis methodology. Some scale-invariance is nevertheless desirable because we often need to assign one of a limited number of attributes to what is observed in a scene, and patterns observed in natural textures are rarely homogeneous (section 4.2.2 will further assess this question).
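The block-averaging procedure behind such a local variance graph can be sketched as follows (a hypothetical illustration, not the thesis software; the 3 × 3 neighbourhood and the list of simulated pixel sizes are assumptions, and a recent NumPy is required for sliding_window_view):

    import numpy as np

    def local_variance_curve(img, sizes=(1, 2, 4, 8, 16)):
        img = img.astype(float)
        curve = []
        for s in sizes:
            h, w = (img.shape[0] // s) * s, (img.shape[1] // s) * s
            # Block-average s x s pixels to simulate a coarser resolution cell.
            coarse = img[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
            # Mean variance over all 3 x 3 neighbourhoods of the coarse image.
            win = np.lib.stride_tricks.sliding_window_view(coarse, (3, 3))
            curve.append(win.var(axis=(2, 3)).mean())
        return curve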

4.1.4 Frequency and Orientation

Spatial frequency was among the first concepts to be exploited by the image processing community in an attempt to analyse and describe visual texture (Hawkins, 1970; Haralick et al., 1973; Stromberg and Farr, 1986). More specifically, the Fourier spectrum has been used since as early as the mid-1960s. To understand why the concept of spatial frequency received so much attention, one needs only to look at the very definition of texture: the semi-repetitive arrangement of pixels suggests the existence of spatial frequencies.

If such an analysis of a visual texture field does not appear to correspond to our intuitive notion of texture (which has been described by contrast, roughness, coarseness, line-likeness, directionality and regularity by authors like Tamura et al., 1978), it does appear to find its counterpart in the manner in which our brain processes visual information (Richards and Polit, 1974; Pratt, Faugeras and Gagalowicz, 1978; Harvey and Gervais, 1978; Caelli, 1982; Goresnic and Rotman, 1992; Vincent and Regan, 1995). In the last of the quoted papers, the authors went further and suggested that "orientation, spatial frequency and contrast of a test grating (replacing the image term in many visual perception studies) are encoded in parallel and with negligible cross talk" (Vincent and Regan, 1995, p. 496). These three types of information find a direct correspondence in the Fourier transform: the contrast (for each specific frequency) is mapped by the relative height of the peaks in the transform; the frequency is implicit as this is the domain of the transform; and the orientation is expressed in the two-dimensional transform just as it is in the image (except that it appears rotated by 90 degrees).

Figure 4.2 shows the power spectrum of the rolling hills with scrub image (figure 4.1) together with a very gross interpretation. It can be observed that the two main spatial frequencies do appear in the enhanced spectrum. However, these features are far from having a clear-cut visual impact and are rather faded. In the figure, the furthest peaks (arranged horizontally) probably depict some systematic noise due to the scanning process. The central ring grossly represents the frequency of the individual trees which, being illuminated from the top, show a characteristic alternation of bright and dark tones. The third region of interest appears in a ±110° arrangement near the centre of the image and represents the lower frequency of the rolling hills.


Figure 4.2. The Fourier power spectrum (right) and interpretation (far right) of the rolling hills scene of figure 4.1.

If the Fourier transform seems so appropriate for the analysis of texture, then why have so many other methods been developed to do the same work? The answer can be found in two different aspects of the transform. The first is computation time, which has been known to be long (even in its fast version) and requires pairs of floating point numbers (real and imaginary). The second drawback comes from the fact that the transform has to be interpreted, thereby bringing back the whole problem of analysis. These two factors were major drawbacks in the 70's and 80's when computers were much slower than they are today, but recent years have brought easy access to computers with processing power as high as 200 MHz. In particular, Jernigan and D'Astous (1984) have had some success using a set of features derived from the characteristics and the shape of the one-dimensional power spectrum. Wilson and Spann (1988) have also shown that by using orientation- and frequency-specific filters, very complex texture fields can be adequately classified using the Fourier transform (it should, however, be noted that the texture fields they used have distinct and well-defined frequencies and cannot be considered natural). The Fourier method is also limited by the sampling theorem, which states that a signal has to be sampled at a rate at least twice the highest frequency one wants to observe (the Nyquist frequency) (Glassner, 1995). These high frequencies (corresponding to short spatial wavelengths) are usually regarded as noise within the Fourier analysis and are often said to carry little useful information. Since the Fourier analysis treats frequencies in cycles per signal extent (the image in our case), the short wavelengths tend to be spread throughout a wide range of frequencies. For example, wavelengths between two and five pixels will be displayed on the transform as frequencies ranging from 200 to 500 cycles for an image of 1000 pixels. These high frequencies (approaching the Nyquist frequency) are important for texture analysis and their content should not be attributed solely to signal noise.

Still, the Fourier transform is a very powerful tool for analysing and, perhaps, classifying visual textures. An attempt will be made within the scope of this thesis to develop and implement an original approach using the Fourier transform.

As was made clear in Chapter 3, the Fourier transform is not the only tool for the analysis of the frequency, orientation and contrast of visual textures. Among the various techniques that have been described in Chapter 3, one of the most promising was the variogram approach which will also be the object of further investigation.

It appears quite clear that any method used to analyse and classify textures will have to consider frequencies and orientation in some way. Only thorough experimentation with natural texture (encountered in air photographs or satellite imagery) will permit us to establish whether these two components are sufficient or not.

4.2 FEATURES FOR TEXTURE CLASSIFICATION

4.2.1 How many features?

Some of the early experiments (and even some later ones) with texture aimed at finding a unique feature that would account for the texture in an image. The general idea was to include a textural feature in the set of spectral features (the individual bands) and to perform some per-pixel classification (e.g. maximum likelihood, parallelepiped or minimum Euclidean distance). Usually, the texture feature was some first-order measure of variance or some edge-counting technique. This approach gave reasonable results in improving classification accuracy with early satellite imagery like Landsat-MSS, where the pixel size was far larger than most of the objects that usually cause texture such as, for instance, trees and houses (Marceau et al., 1990; Anys and He, 1995). This situation can be attributed to the principle of the operational scale, which is the scale at which a phenomenon operates (Lam and Quattrochi, 1992). The increase in resolution (smaller pixels) brings in phenomena (through their visual pattern) from smaller operational scales (Bian, 1997). For instance, if a residential area had a characteristic spectral signature at an 80 metre resolution, it loses this singular characteristic at a 10 metre resolution to the benefit of the individual rooftops, lawns and street bitumen.

With the increasing resolution of the latest generation of Earth-observation satellite data (e.g. Spot-HRV with 10 m and Ikonos with one and two metre image data), it appears that these simple first-order "texture" features do not account for the complexity of the structure of image textures at such smaller operational scales.

It seems reasonable to say that the fewer features are needed, the better. On the other hand, the features have to account for every kind of texture, even if not present in all cases. In the literature, texture classification techniques have been notorious for using a large number of features, each representing some aspect of the texture's structure. Table 4.1 gives a small sample of these techniques with the number of features that were involved in the classification process. In that table, one can see that from 1973 through 1980, the trend was to use a rather large number of features (i.e. between nine and 28). Then, between 1984 and 1989, two new methods aimed at solving texture with a single feature: namely the fractal approach and the variogram approach. In the 90's, the trend seems to involve more numerous features with considerations regarding multiple scales and directions.


Table 4.1. The number of texture features in some of the major texture classification methods

Authors | Method's name | Number of features
Haralick, Shanmugam and Dinstein (1973) | Co-occurrence matrix | Mean and range of 14 measures over four matrices, for a total of 28 features.
Tamura, Mori and Yamawaki (1978) | Textural features corresponding to human perception | Six features (coarseness, contrast, directionality, line-likeness, regularity and roughness).
Conners and Harlow (1980) | Spatial grey level dependence method | Twelve features from a second-order dependency matrix.
Laws (1980) | Microstructure method | Nine features derived from convolution operators.
Pentland (1984) | Fractal-based description | One unique feature: the fractal dimension.
Stromberg and Farr (1986) | Fourier-based textural features | A minimum of five non-oriented frequency band-pass features.
Ramstein and Raffy (1989) | Variogram-based feature | One unique feature.
Liu and Jernigan (1990) | Fourier-based spatial frequency domain | Eight to 28 features from the spatial frequency domain.
Cohen and You (1992) | Multi-resolution fractal feature and nine tuned masks | Three texture features from the energy measure at three different scales.
Sali and Wolfson (1992) | Hybrid method | Two features from first-order statistics, five from second-order and one multi-resolution (fractal-based) feature, for a total of eight.
Wu and Chen (1992) | Statistical Feature Matrix | Three to five features derived from a second-order matrix.
Goresnic and Rotman (1992) | Cortex transform | Eight spatial frequency domain band-pass features out of a set of 18.

Numerous other studies could be quoted, but this subset gives a good overview of the situation. By looking more closely at the methods that use very few features (up to three), one can easily conclude that these methods have a good success rate when used to distinguish between very different textures such as forest, urban areas and water (as in Pentland, 1984).

Many methods also use a preliminary feature selection technique by which a subset of features is selected based on their ability to classify some set of textures. While some have used a trial-and-error method, others have used a "knock-out" algorithm to progressively eliminate the least useful features (finding the best subset of features is a combinatorial problem that can become computationally unfeasible when the number of features is large). If, for practical reasons, eliminating features from a classification process is very desirable, the subset selected should be the same for the vast majority of textures; otherwise the portability of the method is compromised and the classification process will always have to rely on a training phase assuming good ground control data.

One last observation about the number of features is that, contrary to what one might think, the performance of most classifiers does not necessarily increase with the number of features. In fact, some have pointed out that, beyond a certain point, the inclusion of new features might worsen the performance (Goresnic and Rotman, 1992).

It is still too early to know exactly how many features will be used in the experiments involved in this research, but it is likely to be a large number. Since this work is all about experimentation and investigation, and since all the programs are being written by the author (although some may be borrowed), it is economical to "try" a great number of potential features as long as they can be understood and justified. These features will be the object of statistical testing to determine their usefulness in discriminating between different textures. At the same time as texture features should maximise the discrimination between textures, they should also minimise the variation within groups of similar textures. Both types of tests will be applied in the evaluation of the features created.


4.2.2 The invariant feature dilemma

One of the main problems of image segmentation / classification is that a particular method must be able to "recognise" an object under varying environmental conditions. The concept of invariant features or attributes was particularly important to the pattern recognition techniques dealing with shapes: the shape of, say, an aircraft has to be identified in all possible positions (rotation) and over a broad range of scales and illumination conditions. In the context of texture classification, the concept of invariant features has to be redefined. The following items are considered in the next few paragraphs in terms of their potential "invariance":

1. scale
2. orientation
3. illumination
4. noise

4.2.2.1 The scale

As was previously seen (section 4.1.3), the scale factor is most determinant for texture: what shows a very coarse texture at one scale (or resolution) can show a very smooth and uniform texture at a smaller scale (in particular when the surface in question does not have fractal characteristics). Therefore, any texture feature can only be scale-invariant over a certain range of scales in which its behaviour can be considered roughly the same. As previously shown by many authors, the local variance around a particular pixel is directly related to the relation between the object size and the resolution cell size (Woodcock and Strahler, 1987, give a good review of this relation). Starting at a high object size / resolution ratio (H-resolution), the measure of local variance is generally rather low; as the ratio decreases, the local variance increases to reach a peak that generally corresponds to the average object size (a ratio of 1.0) and then starts decreasing when the pixel size becomes larger than the object (L-resolution). Evidence of this statement will take the form of a local variance graph for the textured image set chosen in section 4.3.


It would therefore be impossible to compute truly scale-invariant texture features, mainly because of the regularisation effect. One possible way to partially overcome the scale effect is to always treat the spatial variation with respect to the real distance in a certain unit of measurement such as the metre. Although quite simplistic, this approach will ensure that small variations in scale (such as between one and two metres) yield similar results.

4.2.2.2 The orientation

The preferential orientation of certain textures (or anisotropy) represents another source of complication for the design of texture features. A number of factors can cause anisotropy in the spatial dependence of grey levels in textures originating from both natural and man-made processes. Apart from the well-recognised effect of preferential illumination on relief, many geological processes have an inherent preferential orientation that has a direct or indirect effect on the surface cover through topography (for example, see the light and shadow pattern in figure 4.1), soils or vegetation. Even the dominant winds can yield visible effects on land covers (e.g. sand dunes, ocean waves).

Although the first multi-channel models of texture perception did not consider orientation separately (Richards and Polit, 1974; Harvey and Gervais, 1978), there is now sufficient psychophysical evidence to support that orientation is processed independently from frequency (e.g. Caelli, 1982; Vincent and Regan, 1995). In the same manner, even though many Remote Sensing and Image Processing authors have not considered anisotropy to be relevant in texture analysis (e.g. Rosenfeld and Kak, 1982; Pentland, 1984; Stromberg and Farr, 1986; Woodcock et al., 1988A and 1988B; Ramstein and Raffy, 1989; Miranda et al., 1992), it is increasingly being computed as an essential component of texture (e.g. Tamura et al., 1978; Jain and Farrokhnia, 1991; Sali and Wolfson, 1992; Goresnic and Rotman, 1992; Wang and He, 1990; Lark, 1996).

However, it appears important to specify that while anisotropy is an important component of texture, the particular direction does not, in most cases, have any relevance to the classification of texture. Even if the brain processes orientations in texture independently, for most applications the particular orientation is irrelevant to the process of texture classification. For that reason, texture features that are affected by anisotropy should compute the directionality (in relative form) rather than the raw direction. If directionality is to be considered as a texture feature (in the texture space), it should preferably be rotation-invariant.

Various methods have been developed to extract rotation-invariant features from textured images. The first was probably the simple conversion of the Fourier power spectrum from Cartesian coordinates (u, v) to polar coordinates (r, θ), which transforms a rotation in (u, v) into a shift in θ (see section 3.3). Pattern matching can then be performed much more easily by shifting the pattern template over an image's power spectrum in polar coordinates (Rosenfeld and Kak, 1982; Ritter and Wilson, 1996). Hu (1962, in Ritter and Wilson, 1996) proposed a set of seven moment-invariants that are independent of rotation and scale. However, these moment-invariants are mostly useful for detecting exact patterns and not statistically-similar textures. Faugeras and Pratt (1980) have also used a related approach (spread functions) on the autocorrelation function of sampled images, in particular the profiles (S(2,0), S(0,2)), which are more likely to detect anisotropy (but only between the rows and columns of the autocorrelation function).
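A sketch of the first of these approaches: the power spectrum is resampled onto an (r, θ) grid with nearest-neighbour lookup, so that a rotation of the image becomes a circular shift along the θ axis. The grid densities are arbitrary assumptions.

    import numpy as np

    def polar_power_spectrum(img, n_r=64, n_theta=90):
        p = np.abs(np.fft.fftshift(np.fft.fft2(img.astype(float)))) ** 2
        cy, cx = p.shape[0] / 2, p.shape[1] / 2
        r = np.linspace(1, min(cy, cx) - 1, n_r)[:, None]
        t = np.linspace(0, np.pi, n_theta, endpoint=False)[None, :]
        # Nearest-neighbour lookup of the spectrum on the polar grid.
        y = np.round(cy + r * np.sin(t)).astype(int)
        x = np.round(cx + r * np.cos(t)).astype(int)
        return p[y, x]  # image rotation -> circular shift along the theta axis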

A simple yet interesting approach was proposed by Tamura et al. (1978) for computing the directionality by building the histogram of local edge probabilities against their directional angle. The histogram was obtained by counting the number of edges over a certain threshold for all values of θ, which was calculated by:

θ = tan⁻¹(∆V / ∆H) + π/2    (4.2-1)

where ∆H and ∆V are the horizontal and vertical gradients respectively, measured by convolution operators. The following step involves finding the peaks in the histogram and calculating their respective sharpness using the second moment calculated around each peak. Although the method appears well suited, the steps needed to isolate the peaks are computationally costly.
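A sketch of this edge-direction histogram follows, assuming simple one-pixel difference operators for ∆H and ∆V, an arbitrary magnitude threshold and an angle convention that folds directions into [0, π); Tamura et al. used convolution masks, so this is only an approximation of their procedure.

    import numpy as np

    def direction_histogram(img, threshold=10.0, n_bins=16):
        img = img.astype(float)
        dh = img[:-1, 1:] - img[:-1, :-1]   # horizontal gradient
        dv = img[1:, :-1] - img[:-1, :-1]   # vertical gradient
        # Edge direction folded into [0, pi); equation 4.2-1 up to the
        # pi/2 offset convention.
        theta = np.arctan2(dv, dh) % np.pi
        mag = np.hypot(dh, dv)
        # Histogram of directions, counting only sufficiently strong edges.
        hist, _ = np.histogram(theta[mag > threshold],
                               bins=n_bins, range=(0, np.pi))
        return hist / max(hist.sum(), 1)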


Another method that is increasingly popular is the use of directional semi-variograms (St-Onge and Cavayas, 1995; Lark, 1996), as demonstrated in section 3.10. One can also note that a similar approach can be used with one-dimensional frequency transforms in a preferential direction. This concept will receive full attention in the next chapter.

4.2.2.3 The illumination

In a Remote Sensing application, and for natural textures in general, the illumination conditions can be highly variable as opposed to controlled-environment applications like robot vision for manufacturing. Not only can the illumination source have a different incidence angle according to the time of day, season or latitudinal position, but the wide variety of sensors can have very different spectral response functions and banding in the electromagnetic spectrum (Strahler et al., 1986), and many will adjust their gain and shift according to the type of environment being scanned or photographed. In addition, the images or photographs will often be pre-processed to optimise their contrast, either analogically by a chemical process or electronically by a digital process. This situation makes it all the more difficult to obtain directly comparable grey level responses between different scenes, or even within the same scene at different times of acquisition.

These natural and artificial sources of illumination differences have a direct effect on the contrast, which is one of the components of texture. It is also expected that the solution to this problem should not rely on complex (and often incomplete) sensor and atmospheric models. The most dramatic implication of such conditions is that almost all texture features are not illumination-invariant. This problem is even worse for features based on some squared measure of variation (variance, covariance, semi-variogram, ...) because of the squaring effect.

It is therefore unlikely that contrast can be directly used to compare textures between scenes or acquisition dates. However, there are some ways to get around the problem by using ratios between, for instance, the variance for a particular lag value and a local undifferentiated measure of variance (as in the process involved in the semi-variogram approach). Even for the Fourier power spectrum, a "normalised contrast" could be measured between individual frequency peaks and the local maximum at the spectrum origin. The computation of textural features should offer some sort of normalisation option to overcome the effects of variable illumination.
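One such normalisation can be sketched as follows: dividing the semi-variogram by the sample variance makes the feature invariant to a global linear stretch of the grey levels, since the gain affects numerator and denominator equally. This is an illustration of the idea, not a prescribed procedure.

    import numpy as np

    def normalised_semivariogram(img, max_lag=20):
        img = img.astype(float)
        gamma = np.array([0.5 * np.mean((img[:, h:] - img[:, :-h]) ** 2)
                          for h in range(1, max_lag + 1)])
        # Dividing by the variance removes the effect of a linear contrast
        # stretch: both numerator and denominator scale with the gain squared,
        # and a constant offset cancels in the differences.
        return gamma / img.var()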

4.2.2.4 The noise

Texture can be thought of as the structured variance within an image: a variation that can be predicted statistically. In that sense, noise does not contribute to the texture as it is essentially unstructured and stationary. Noise is said to be unstructured in the sense that no texture primitive (semi-repetitive arrangement of pixels) can be recognised in it. Stationary noise means that its expectation is constant throughout the various textures in an image. In that sense, noise should neither contribute to nor interfere with the classification of texture patterns (as long as the signal/noise ratio is maintained well above 1.0, or else the signal part would be almost lost). In terms of statistical design, an image region ƒ(x, y), where x and y are two spatial coordinates, can be represented as:

f(x, y) = g(x, y) + η(x, y)    (4.2-2)

where g(x, y) is the spatial variation accounted for by texture and η(x, y) is a spatial variation resulting from additive noise (Ganesan and Bhattacharyya, 1995). The same authors have proposed a new model for texture using a polynomial formulation where the spatial variation in an image is divided into two distinct components. In one component, the spatial coordinates vary together to produce texture and in the other, the spatial coordinates remain independent, producing noise. Of course, the model is only appropriate for additive noise which is the case for most visible and infrared images (radar images suffer from multiplicative noise but are not being considered within the scope of this research).

One important aspect that the model does not account for is that, when dealing with natural textures, the grey level spatial patterns are only semi-repetitive and can therefore be difficult to separate from noise, so that when the noise accounts for an important part of the grey level variations it can significantly "damage" the texture and make it difficult to characterise. The less predictable a texture pattern is (or the less regular, or the more its primitive patterns are ruled by complex placement rules), the more it will be affected by a high proportion of noise, or a low signal/noise ratio. In statistical design, the signal/noise ratio is simply the ratio between total variance and unexplained variance and can therefore be estimated by a number of methods. One of the simplest ways to estimate the signal/noise ratio is to compute the semi-variogram and estimate where the function crosses the variance ordinate (see section 3.10), which is often referred to as the nugget effect (Huijbregts, 1975; Clark, 1979).
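A sketch of this estimate: the nugget is approximated by extrapolating the first few lags of the experimental semi-variogram back to h = 0 with a straight line, and the signal/noise ratio follows as (total variance − nugget) / nugget. The number of lags used for the fit and the horizontal lag direction are assumptions.

    import numpy as np

    def signal_to_noise(img, fit_lags=5):
        img = img.astype(float)
        h = np.arange(1, fit_lags + 1)
        gamma = np.array([0.5 * np.mean((img[:, k:] - img[:, :-k]) ** 2)
                          for k in h])
        # Straight-line extrapolation of the first lags back to h = 0;
        # the intercept approximates the nugget effect.
        slope, intercept = np.polyfit(h, gamma, 1)
        nugget = max(intercept, 1e-9)
        return (img.var() - nugget) / nugget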

Woodcock et al. (1988B) observed that the nugget effect was relatively high, considering that regularisation should have lowered it. Furthermore, its proportion with respect to the total variance seems to grow as pixel size increases (the study used both scanned air photographs and satellite imagery). The authors attribute it to measurement errors of the semi-variogram and to atmospheric path radiance. Another possible explanation could take the environment (mainly forest and urban areas) into account: as the pixel size approaches the size of the objects present in the scene (tree crowns and houses), the pixels have a lower probability of being correlated with their immediate neighbours, considering the discrete nature of the images. The possibility of easily estimating the signal/noise ratio is another advantage that the semi-variogram method offers.

Although noise is not part of texture, its estimation could benefit texture analysis in classification error assessment. In particular, it could help predict difficulties in classifying textures that have similar characteristics, such as different classes of forest.

4.2.3 The human-like features dilemma

Various authors have used definitions of features related to the psychological concept of visual perception (e.g. Tamura et al., 1978; Davis et al., 1979; Wu and Chen, 1992). Tamura et al. (1978) reported the use of six such human-like features they call visual perception features: 1) coarseness, 2) contrast, 3) directionality, 4) line-likeness, 5) regularity and 6) roughness. Such a definition of visual perception features is somewhat vague and can bring confusion, as these attributes of texture are really only a verbal translation of our way of describing visual texture in an artistic or common-language manner; they are not necessarily a reflection of what we now know about what is really happening in the visual cortex. Although these terms are very useful for describing texture as an interpretation cue, it is scientifically misleading to call them visual perception features. One indication of their partial inappropriateness was given by the same authors, who reported high rank correlations between different features, implying a certain amount of redundancy: rank correlations of 0.650, 0.756 and 0.782 were found between coarseness and roughness, contrast and roughness, and directionality and line-likeness respectively. The relationship between these language-related features and the actual brain process of visual perception is not a direct one and can suffer from differences of interpretation or even cultural and language context.

The visual interpretation of colour is a good example of such a discrepancy. The trichromatic model of colour vision was postulated in 1802 by Thomas Young and was strengthened by physiological evidence in the sixties (Pratt, 1978). The interesting fact is that there is no apparent relation between the way colour information is acquired by our visual system and the way we describe colour in our everyday language. There is no doubt that the trichromatic model is more easily translated into computer code than our qualitative descriptions. The trichromatic model has also enabled the development of the Hue-Saturation-Intensity (HSI) model, which lies somewhere between the functioning of our visual sensors (the cones and rods of our eyes) and our verbal description of colour. A similar process might occur for the interpretation of texture.

Since there is no complete model of human perception of visual texture (refer to Chapter 2) and since the concept of feature is related to machine vision rather than human vision, it would be erroneous to qualify attributes of texture as human-like features. However, strong physiological evidence suggests that the human visual apparatus makes some use of concepts such as frequency, orientation and contrast, so it seems appropriate that machine analysis of texture should use features related to these concepts. Within the scope of this thesis, the human-likeness of the features will refer to their ability to be translated into “channels” like the ones that have been described by neuro-physiological or psychophysical evidence, and not to the mere fact that they correspond to our intuitive description of the attributes of visual texture.

4.2.4 The use of colour and multi-channel imagery

Digital images of the Earth come basically in three forms: black and white, colour or multi-channel. How texture analysis should be performed on colour or multi-channel images remains an open question. Most research on texture is done on black and white (one channel) images in an effort to isolate the question of texture from the spectral aspects of these images. The present research is no exception to this general approach, and only black and white images will be used to investigate the various methods of texture analysis, regardless of their original form.

The question nonetheless remains open as to whether or not texture analysis should be applied to every channel of a colour or multi-channel image. It appears that texture analysis, in most cases, needs only to be performed on a single channel, which should either be chosen as the one containing the most textural information or be derived from the multi-channel image (perhaps in the same manner as intensity is created in a Hue-Saturation-Intensity transform). Two facts tend to support this approach:

1. There is usually a great amount of redundancy between the various channels of a multi-channel image (correlations of over 70% are not uncommon), and an intensity channel could probably be created and used as the input channel for the texture analysis (a minimal sketch of this idea follows the list).

2. In a study of pre-attentive recognition, Jan Theeuwes (1995) has shown that abrupt changes in colour at constant luminance were not easily detected by the human eye, whereas brightness changes, whether of the same colour or not, did trigger pre-attentive identification.
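A minimal sketch of the intensity channel mentioned in point 1, assuming three co-registered 8-bit channels stored as flat arrays; the simple average used here is only one possible stand-in for the intensity of an HSI transform:

/* Derive a single intensity channel from three co-registered 8-bit
   channels by simple averaging; the result can then be fed to the
   single-channel texture analysis methods. */
void intensity_channel(const unsigned char *r, const unsigned char *g,
                       const unsigned char *b, unsigned char *intensity,
                       long n_pixels)
{
    long i;
    for (i = 0; i < n_pixels; i++)
        intensity[i] = (unsigned char)(((int)r[i] + g[i] + b[i]) / 3);
}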

These observations serve as a justification for not considering colour or multi-channel images within the scope of this study, but they do not aim at solving the question of colour and texture. In fact, much research should be dedicated to the integration of colour and textural information, both at the “visual brain” level and as a computer vision problem.

4.3 A SAMPLE SET OF TEXTURES

Since the main objective of this work is to develop and compare different methods of texture analysis, with the final goal of having the computer distinguish between textures digitally, the choice of the texture set plays a very important role. This problem has probably not always received the attention it deserves. Many studies claim success rates for the segmentation or classification of textures that are really dependent on the texture samples chosen for the experiments, a situation that makes comparison between the results of these studies difficult. To avoid the trap of producing results that are dependent upon the texture samples chosen and that do not reflect “real life” situations (as in a real Remote Sensing study), a series of considerations have been taken into account.

The first of these considerations is to accept only texture patches that come from real aerial photographs or satellite imagery and that have a good chance of being found in a natural environment. However, the experiments presented in this thesis will use exclusively textures from scanned aerial photographs, to minimise the regularisation effect that tends to be important on satellite imagery. The air photographs chosen for this first task are from two regions of Queensland (Australia): the Brisbane region for textures of forest, agriculture, urban areas and ocean waves; and the region west of Emerald for textures of scrub and barren land. All photographs were black and white at a 1:25,000 scale. The sampled textures from these photographs are shown in figures 4.3 and 4.4.

To choose the resolution, a preliminary scanning was performed at 600 dpi (dots per inch, roughly equal to 18 pixels per millimetre), yielding a pixel size of approximately one metre. The sampled images were then successively degraded to coarser resolutions by simple averaging of the pixels. This approach assumes 1) an idealised square-wave response from the sensor and 2) a perfect square shape of the ground resolution cell. Although these biases are likely to affect the absolute values, the basic relationship among the various resolutions can reasonably be trusted (Woodcock and Strahler, 1987). In that way, images with pixels of roughly two to 16 metres were generated and the local variance calculated for each image and resolution. The local variance is defined as the “average variance within a three by three moving window through the entire image” (Cao and Lam, 1997; p. 65), as first suggested by Woodcock and Strahler (1987). Figure 4.5 is a plot of the average local variance for all the texture samples shown in figure 4.4. Additionally, the variance was calculated for each sample and is plotted in figure 4.6. On both graphs, the first break in the curve occurs at two (2) metres, whereas the maximum variability can be observed at three (3) metres on the local variance plot. The resolution for this kind of study should be finer than the optimum resolution to ensure that crucial information is not lost in the regularisation process. Here, a resolution of two (2) metres should be sufficient, so scanning the photographs at 450 dpi (yielding a ground resolution of about 1.4 metres) ensured that most spatial variability would be preserved without generating unnecessarily large image sets.
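The following sketch computes the local variance exactly as defined above, assuming an 8-bit image stored row by row; skipping the outermost ring of pixels is a simplification with respect to the border strategies discussed in chapter 5:

/* Average variance of a 3 x 3 moving window over the whole image. */
double local_variance(const unsigned char *img, int n_lines, int n_pixels)
{
    double sum_var = 0.0;
    long count = 0;
    int i, j, di, dj;

    for (i = 1; i < n_lines - 1; i++)
        for (j = 1; j < n_pixels - 1; j++) {
            double mean = 0.0, var = 0.0;

            for (di = -1; di <= 1; di++)        /* mean of the 3 x 3 window */
                for (dj = -1; dj <= 1; dj++)
                    mean += img[(i + di) * n_pixels + (j + dj)];
            mean /= 9.0;

            for (di = -1; di <= 1; di++)        /* variance of the 3 x 3 window */
                for (dj = -1; dj <= 1; dj++) {
                    double d = img[(i + di) * n_pixels + (j + dj)] - mean;
                    var += d * d;
                }
            sum_var += var / 9.0;
            count++;
        }

    return (count > 0) ? sum_var / count : 0.0;
}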

Experiments on texture segmentation will be conducted in a highly controlled environment where the interpretation is as unambiguous as possible. In other words, an artificial patchwork of various textures will be used in the experiments, but the patches will all be extracted from actual, off-the-shelf scanned air photographs.

[Figure 4.3 shows six 128 x 128 texture patches, labelled: Forest, Residential, Desert, Agriculture, Scrub, Waves]

Figure 4.3 The initial set of textures (T6). The textures have been chosen for their widespread occurrence in aerial photographs and for their visual characteristics. Each texture patch is a square of 128 lines by 128 pixels, with each pixel covering approximately two square metres on the ground (1.4 m x 1.4 m).


Figure 4.4 The T36 texture set. The texture matrix is organised as six lines of six different textures; each line contains one instance of each of the six generic textures of the initial texture set (figure 4.3).

[Figure 4.5: data variability (local variance in a 3 x 3 box, 0 to 1) plotted against pixel size (0 to 16 m)]

Figure 4.5 Plot of the local variance against the resolution (in pixel size) for the 36 texture samples shown in figure 4.4.

[Figure 4.6: multiresolution variance (S²) plotted against pixel size (0 to 16 m)]

Figure 4.6 Plot of the variance against the resolution (in pixel size) for the 36 texture samples shown in figure 4.4.

As will be shown, the first series of experiments will be conducted on a very general set of textures having great visual and geographical differences (figure 4.3 shows the initial six texture patterns chosen, with a short description of each). Then, the initial sample set will be put against a series of other sets having a similar generic set of textures, but with variations due to different environmental conditions or to an altogether different formation process (see figure 4.4 for the T36 texture set, made of six such sets). This configuration has various advantages:

• The first is that the initial set (figure 4.3), named T6, being very small, can easily and quickly be processed using the different methods and its results analysed for a simple yet meaningful comparison.

• Second, the T36 texture set (figure 4.4) can easily be used in a variety of manners to infer the performance of the various techniques under different experimental situations. For instance, the T36 set can be classified using the initial set as training input, to see how well the different techniques enable textures to be associated with other related textures. The same set can be used to judge the amount of confusion at various levels, both within groups (e.g. the group formed by the forest textures) and between groups (each of the row subsets).

• The T36 texture set has been arranged so that contiguous textures have few features in common. Texture edges are therefore sharply defined and contiguous textures will not “help” each other: when the window operator processes regions near the edges of a patch, it overlaps a completely different texture, so a correct class attribution cannot be credited to the neighbouring patch. This was done intentionally, to make the classification more difficult and to “challenge” the methods in texture edge situations.

By using a combination of similar and very different textures, a very broad range of situations is likely to be covered and there should be no need to use a more exhaustive set of textures covering more environments (like snow and ice, for instance). However, it will be necessary to conduct further analyses with different resolutions, illumination conditions, orientations and signal-to-noise ratios to determine the effect of these controlling factors on classification results.

All tests will be performed using these two texture sets to determine the settings of the different techniques that give the best results and to assess which subset of features minimises classification errors.

4.4 CHOOSING METHODS FOR DESCRIBING AND CLASSIFYING TEXTURE

As stated in Chapter 1, one of the main goals of this thesis is to propose and implement a texture analysis method that performs well with the natural textures found in digital images of the Earth. This suggests that more than one method would have to be tested in order to make some kind of comparison possible. It was also suggested that at least one original method would be implemented, one that would be efficient and that would broadly agree with the neuro-physiological and psychophysical evidence acquired so far on visual texture perception, as reported in Chapter 2. Upon analysis of the mathematical tools that appear to offer good potential for the segmentation and classification of texture, it was decided that two completely different approaches would be used. Each of these could yield several texture analysis programs through slightly different implementations.

In all cases, the “feature” approach was favoured for all methods implemented, i.e. the various textural aspects of the sample texture fields are translated into an “image variable” that the Pattern Recognition community commonly calls a “feature”. This approach, although somewhat costly in terms of computer disc space, has the advantage of being easier to evaluate using existing image segmentation and classification algorithms. For example, it is easier to have an algorithm produce a large number of different features and then have some evaluation method decide which features play a role in distinguishing between the texture groups. The methods chosen, along with their implementation, are described in detail in Chapter 5.

4.4.1 The Grey-Tone Spatial-Dependence Matrices Method

In this research, the Grey-Tone Spatial-Dependence Matrices will be used as a benchmark method. The reason for this choice lies in its wide acceptance by the scientific community: the method is well established and unambiguously described in the literature. It was proposed and named by Haralick, Shanmugam and Dinstein in 1973. Not only do almost all authors in visual texture analysis cite the GTDM (also GLDM, where L stands for level; it will be called GDM from here on), but many have already used it as a comparison technique. Among them, Davis et al. (1979), Conners and Harlow (1980), Pratt (1991), Bonn and Rochon (1992), Wu and Chen (1992), Kundu and Chen (1992), Reed and du Buf (1993) and Dikshit (1996) have either used the GDM method as a comparison or described it in their reviews. It should also be mentioned that at least two well-known commercial image processing packages use the GDM as their only texture analysis module. Even though the technique is almost twenty-five years old, it is still the reference in image processing for texture analysis. The GDM as originally proposed by Haralick et al. offers an enormous number of possibilities by combining a total of 14 measures that can be applied to any lag setting⁵. This is in fact one weakness of the GDM method: it offers little guidance as to which lags and measurements should be used. Here, the lag settings will be chosen based on the average spatial frequencies observed in the texture set, while the subset of measurements (it is unlikely that all 14 will be used) will depend on a combination of experience (mostly that demonstrated by the various authors who have used the method) and theoretical considerations (using the mathematical description of each feature).

⁵ Since the GTDM computes a second-order histogram, the lag setting defines the distance in lines and pixels that separates the two events.

No special status will be given to the GDM method, which is here regarded as having as much potential for texture analysis as the other methods. One difference is that the GDM approach uses a diversified set of measurements, whereas the other two methods are much more specialised in the type of measurement they make (as described in the following sections and in Chapter 5). The only other difference is that the GDM method will not undergo any new development (apart from some special considerations in choosing the lag distances and the set of measurements) and should be regarded as a benchmark against which the results of the newly developed methods can be put into perspective. The comparison will be made on the basis of the extent to which the various methods correctly enable the classification of a composite image of texture samples into the known number of texture classes.
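To give a concrete idea of the kind of measurement involved, the sketch below computes one of the fourteen measures, the contrast, from a co-occurrence matrix of raw counts; the function name and the normalisation by the total number of pairs are illustrative choices rather than the exact code of Chapter 5:

/* Contrast (CON): the (i - j)^2 moment of the normalised co-occurrence
   matrix. com is the n x n matrix of pair counts. */
double con_feature(int **com, int n)
{
    long total = 0;
    double con = 0.0;
    int i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            total += com[i][j];

    if (total == 0)
        return 0.0;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            con += (double)(i - j) * (i - j) * com[i][j] / total;

    return con;
}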

4.4.2 A Second-Order Statistical Method

Since Bela Julesz showed evidence that human perception of texture could be modelled using second-order statistics (although he would later abandon this theory for the “texton” approach, motivated by the fact that some visually distinct textures had similar second-order statistics), many researchers have explored second-order statistics as possible features for texture analysis. The Grey-Tone Spatial-Dependence Matrices method is in fact based on a second-order histogram, which is not, in itself, a statistical parameter but one from which second-order statistics can be measured. Among the most common second-order statistics that have been used are the spatial autocorrelation, the covariogram and the semi-variogram. In particular, the semi-variogram has received much attention in the related field of Geostatistics (for a review, see Matheron 1971, Huijbregts 1975 or Clark 1979) and has even been applied directly to texture analysis (Woodcock et al. 1988A and B; Ramstein and Raffy, 1989; Miranda et al., 1992; Atkinson, 1995B; Lark, 1995; St-Onge and Cavayas, 1995).

The second-order statistics approach has been chosen as a good candidate for texture analysis for several reasons. The first is that it is a statistical approach originally developed for natural geological phenomena, and it is therefore appropriate for representing the semi-repetitiveness of the patterns found in natural texture fields. Secondly, these methods, being very simple, are computationally very manageable and modest in their demands on both virtual and physical memory. Finally, in the case of the semi-variogram, its behaviour has been well documented and even modelled, so a strong theoretical background can be expected to support the findings. Although the semi-variogram approach is favoured a priori, only thorough testing will make it possible to confirm its applicability or else give preference to another related method.

Second-order statistics can usually be represented as graphs in either one dimension (an orientation-specific function curve) or two dimensions, in which case all orientations are represented on the function surface. A two-dimensional graph (really a three-dimensional graph represented in two dimensions using iso-lines) is very complicated to interpret directly and usually has to be divided into regions or directions for its significance to be used efficiently. For that reason, the approach taken here will use a set of one-dimensional, orientation-specific functions that make it possible to account for a wide range of frequencies and orientations.
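A minimal sketch of such an orientation-specific function, assuming the texture window is available as a two-dimensional array of 8-bit values; the direction is given as a unit pixel offset (dx, dy), so the true distance of lag h along a diagonal is h multiplied by √2:

/* One-dimensional, orientation-specific semi-variogram:
   gamma(h) = (1 / 2N(h)) * sum of [z(x) - z(x+h)]^2 over the N(h)
   pixel pairs separated by h steps in the direction (dx, dy). */
void directional_variogram(unsigned char **w, int n_lines, int n_pixels,
                           int dx, int dy, int max_lag, double *gamma)
{
    int h, i, j;

    for (h = 1; h <= max_lag; h++) {
        double sum = 0.0;
        long n_pairs = 0;

        for (i = 0; i < n_lines; i++)
            for (j = 0; j < n_pixels; j++) {
                int i2 = i + h * dy, j2 = j + h * dx;

                if (i2 >= 0 && i2 < n_lines && j2 >= 0 && j2 < n_pixels) {
                    double d = (double)w[i][j] - (double)w[i2][j2];
                    sum += d * d;
                    n_pairs++;
                }
            }
        gamma[h] = (n_pairs > 0) ? sum / (2.0 * n_pairs) : 0.0;
    }
}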

4.4.3 A Frequency Domain Approach

The frequency domain approach, also referred to as the Fourier spectrum approach or spatial-frequency method (see section 3.3), has long been a favourite for texture analysis. From the early attempts at using it as a texture analysis tool by Rosenfeld (1962) to the recent use of Gabor functions in combination with it to create frequency- and orientation-specific texture features (e.g. Fogel and Sagi, 1989; Jain and Farrokhnia, 1991; Goresnic and Rotman, 1992; Manjunath and Ma, 1996), the Fourier transform offers countless possibilities, not only for texture analysis but for any application requiring the analysis of spatial frequencies and their orientation. Apart from being a popular analysis approach, there is some evidence that the Fourier transform is related to the way texture is processed by the visual cortex (Harvey and Gervais, 1978); for a more detailed analysis of this relationship, the reader is referred to section 2.4. Most frequency domain approaches exploit the inverse Fourier transform (or other related transforms) to create features that usually reflect a certain range of frequencies, orientations or both. The input image is first transformed, a series of filters is applied in the frequency domain, and each resulting modified spectrum is transformed back to the spatial domain, where some measurement (average, energy, etc.) is taken to create the texture feature. This approach, although very widely used, suffers from a very high computational cost: for N features, a total of N+1 transforms must be performed, each requiring the extensive use of complex and real values. Furthermore, the most recent methods usually perform this process for a small region or even for each single pixel at a time. Even with very efficient Fast Fourier Transform (FFT) routines, this approach is usually considered very slow.

One approach that has received relatively little attention was proposed by Jernigan and D’Astous (1984), in which the texture features were measured directly from the power spectrum (see section 4.1.4), therefore avoiding the use of the inverse transform for each texture feature. This approach has been favoured in this research, i.e. the texture features will be computed directly from the one-dimensional power spectrum; the means used will be, however, completely different from those of Jernigan and D’Astous.
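The sketch below illustrates this direct-measurement idea on a one-dimensional signal (e.g. one image line): the power spectrum is computed once and a feature is read straight from it, with no inverse transform. A naive DFT is used for clarity where a real implementation would use an FFT, and the band limits are arbitrary assumptions:

#include <math.h>

/* Fraction of the total power (DC term excluded) that falls within the
   frequency band [f_low, f_high] of an n-sample signal. */
double band_energy_feature(const double *signal, int n, int f_low, int f_high)
{
    double total = 0.0, band = 0.0;
    int f, t;

    for (f = 1; f <= n / 2; f++) {       /* skip the DC term at f = 0 */
        double re = 0.0, im = 0.0, p;

        for (t = 0; t < n; t++) {
            double ang = -2.0 * M_PI * f * t / n;
            re += signal[t] * cos(ang);
            im += signal[t] * sin(ang);
        }
        p = re * re + im * im;           /* power at frequency f */
        total += p;
        if (f >= f_low && f <= f_high)
            band += p;
    }
    return (total > 0.0) ? band / total : 0.0;
}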


4.5 EVALUATING THE DIFFERENT METHODS

There are many ways by which a texture analysis method could be evaluated. With an “image feature” approach, one could, for instance, evaluate each feature in terms of the “response” it triggers in a group of observers; this is often the route chosen by psychophysical studies, where the aim is really to understand the human visual system. Since the goals of this research are more practical and immediate, one of the most efficient ways is to evaluate a set of texture features through its performance in a classification scheme. This makes sense since the ultimate goal is really to conceive a method by which different textures can be distinguished. There are, however, a number of drawbacks to this evaluation approach that should receive special attention:

• Being an indirect approach, it does not involve a full understanding of the role of each feature in the classification process.

• Results can change drastically between texture groups; even simply adding a new texture to a texture set can significantly alter the results.

• The evaluation involves a comparison between obtained and reference classes, which is in itself a qualitative process that does not give the real “distance” (in feature space) between the obtained and the expected class.

• The classification rule is in itself somewhat subjective (the traditional minimising of the Total Errors of Classification, TEC, is but one of the possible classification rules) and an approximate solution (most classification rules make assumptions about the distribution and mean of the populations and their samples).

To remedy these potential problems, particular attention has been given to the experimental design used to test and evaluate the texture features created by the different methods. First, the features will be analysed individually on each texture of the initial texture set (figure 4.3) to evaluate their significance against a theoretical background, which should already eliminate some options in the settings of the different approaches, or even some algorithms altogether. Secondly, the texture sets were arranged in a manner that permits multiple uses. For example, in the T36 texture set (figure 4.4), each row can serve as an alternative training set for classification, with the remaining rows as test samples, to test whether related textures are also “close” in feature space. The error assessment can also be done per texture group and not only for each sample individually, giving more strength to the results. Thirdly, the experiments will involve hundreds of classifications that will themselves form a “population” carrying much more significant information content, which should make it possible to get a picture of behaviour patterns. Such patterns will indicate not only which method is better for a particular situation, but will also yield a greater understanding of the textures in images of the Earth, this being the most important goal of this thesis as a new contribution.


Chapter 5

IMPLEMENTING METHODS OF TEXTURE ANALYSIS

5.1 THE GENERAL SETUP

Implementing any image processing technique requires a certain amount of computer resources, the choice of which can have drastic effects on performance and portability. Apart from the obvious hardware and compiler resources, a display and image processing environment had to be selected, one that would offer sufficient versatility and power to avoid having to implement unnecessary code. In short, the following resources were acquired:

• Hardware: computer, display screen, pointing device

• A Compiler

• An Image Processing Package

• A Desktop Graphics software

The specific resources selected for the implementation of texture analysis methods are enumerated and briefly described in appendix A.

Describing code and programs can be very repetitive and tedious, even for readers with a programming background who are not specifically interested in the implementation, so the full code description has been left for the appendices. The next section is dedicated to a rather succinct description of the common considerations and programming code used for all three methods implemented. The remainder of this chapter then describes all three approaches (namely, the grey tone dependency matrix, the variogram and the Fourier approaches), for which preliminary tests will be presented to assess their individual potential, along with the special implementation considerations that had to be taken.

5.2 COMMON CONSIDERATIONS AND PROGRAMMING CODES

Before describing the different programs, each related to a particular approach, it is appropriate to introduce the programming that is not directly related to any texture analysis procedure in particular but is common to all three approaches. This includes considerations regarding window operators, texture edge problems and the general structure of the programs.

5.2.1 Some considerations regarding the use of focal operators

The method by which a neighbourhood around each pixel of a raster dataset is analysed is often referred to as a window or focal operator. It involves scanning the image (usually from top to bottom and left to right) pixel by pixel while also analysing the surrounding pixels, based on the fact that some relation exists among neighbouring pixels (Atkinson, 1995). Apart from the size of the window (which should depend on the application, image resolution, etc.), two principal factors have to be considered: 1) the image being finite, how the image border should be dealt with, and 2) the shape of the window.

Being a finite set of data, the image necessarily has a border, which poses a particular problem for focal operators. For example, the first pixel at the top left corner has no left or upper neighbour, which biases whatever texture feature would normally consider this region. If the region is simply treated as “missing”, the bias differs for each method: a much reduced sample size for both the GTDM and the semi-variogram, missing lag values for the semi-variogram, and a reduced frequency band for the Fourier approach. Among the various solutions that have been proposed for this problem, three have been retained and tested: 1) invalidating incomplete windows, 2) mirroring border data and 3) repeating border data. These three possibilities are illustrated in figure 5.1.


Figure 5.1 Methods for dealing with the border problem when using a focal operator: (a) the original image, (b) the effect of invalidating incomplete windows, (c) the border mirroring method and (d) the border repeating method.

Although invalidating incomplete windows is the only unbiased solution, the reasonably large window operators used here (typically 32x32 to 128x128) make this option too costly in terms of lost image data. The second solution is certainly the most appealing visually, since mirroring gives a very smooth impression, as if the data were the true continuation of the image; but, as will be shown in the next chapter, it creates artefacts in the orientation of the texture that would greatly affect the texture features. Finally, even if the border repeating method gives a visual sensation of a break in the image continuity, it has proven to give the smoothest transition in the texture features derived from the bordering regions.
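In practice, the border repeating rule amounts to clamping any out-of-image index to the nearest valid line or pixel, as in the following sketch (the helper name is illustrative):

/* Border repeating: indices falling outside the image are clamped to the
   first or last valid index, so the border row or column is repeated as
   far as the window operator reaches outside the image. */
int repeat_index(int i, int size)
{
    if (i < 0)
        return 0;
    if (i >= size)
        return size - 1;
    return i;
}

/* usage: value = image[repeat_index(line, n_lines)][repeat_index(pixel, n_pixels)]; */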

The other factor likely to have a significant effect on the texture features calculated from a focal operator is the shape of the window being used. There are in reality only two valid possibilities for the window shape: square or round. Because the data is being read anyway, or because it makes little difference in small windows (3x3 or 5x5), most applications opt for the square window. However, in the case of texture analysis, windows are usually rather large and a round window introduces less bias. Perhaps the easiest way to achieve this is to consider the true distance value when calculating lags: for example, when considering neighbouring pixels on the diagonal, the distances should be multiplied by √2. Since this issue is quite important, it will be examined with each method individually.
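Both points can be expressed in a few lines, assuming offsets (di, dj) measured from the central pixel of the window; the helper names are illustrative:

#include <math.h>

/* Round window: keep only the offsets whose Euclidean distance from the
   centre does not exceed the window radius. */
int in_round_window(int di, int dj, int radius)
{
    return di * di + dj * dj <= radius * radius;
}

/* True lag distance of an offset: a one-pixel diagonal step counts as
   sqrt(2) rather than 1. */
double true_lag(int di, int dj)
{
    return sqrt((double)(di * di + dj * dj));
}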


5.2.2 The texture “edge” problem

Obviously, dealing with image borders through the border repeating technique is a less than ideal solution and is likely to bias the results; the larger the window operator, the stronger the bias. Ideally, larger sample texture images would have been chosen and the borders simply discarded. The main reason this was not possible is that such large images with uniform textures and a near constant mean (the assumption of stationarity) are very difficult to find, considering that a total of 36 such samples were needed for the present research. Two considerations made this less-than-ideal situation bearable in terms of scientific approach:

1- The bias is the same for all three methods tested, and since this is mainly a comparison study, it can be treated as a “difficulty” that necessarily has to be coped with to some extent.

2- The bias will be accounted for in the results by discarding all biased portions of the texture samples when comparing the classification results with the “ideal” texture map. In fact, many of the results will be presented both with and without these biased regions. The unbiased texture map will also discard the texture region used to train the classification (this receives further attention in section 6.2).

Apart from the image border, the fact that texture samples will often be juxtaposed creates an unrealistic situation with very abrupt edges between texture samples. This, however, is not all that unrealistic if we think of places where residential areas, parks, lakes and agriculture lie side by side without much transition between them. Even so, these edges will be excluded from the unbiased texture map to assess the true classification potential of each method. Figure 5.2 illustrates how the two types of texture maps are constructed.



Figure 5.2. Illustration of the biased and unbiased texture maps that will be used to compare the results of classification: a) an original texture image composed of two texture samples, b) the biased texture map, c) the unbiased texture map excluding biased and conflict regions, and d) explanations for discarding each region.

5.2.3 The general structure of the programs

The C programming language was chosen for the implementation of the various algorithms. C is generally regarded as highly modular in the sense that extensive use of function calls can be made, which is usually desirable since it makes programs much more readable and facilitates debugging. However, there is a point beyond which being too modular has the opposite effect, making the program hard to read and follow; it can also become slower because of all the argument passing involved in highly modular designs. The same goes for the use of header files⁶: their exaggerated use makes a program confusing and difficult to read. In this spirit, an attempt was made to find a reasonable compromise between effectiveness, simplicity and modularity. In the structure adopted, each program has its own user interface, variable initialisation, memory management, and image reading and writing components. Although many of these components (“component” is used here for an identifiable but not separable part of a program) are identical across the programs, subtle differences (especially in memory management and variable declaration) were more easily managed by repeating these components than by encapsulating them into functions or header files. Table 5.1 gives an overview of the program structure (the complete program code can be found in appendices D to F).

⁶ Header files, also called include files (characterised by the “.h” extension), provide function prototype declarations for library functions (Borland, 1995).


Table 5.1 The general structure of the texture analysis programs.

Introducing the program: A simple comment about the program's utility, its basic functioning, its principle (Fourier transform, GTDM, variogram) and its output.

General declarations: In this block, all the calls to the included files are made (both from C's own library and home-made) along with the declaration of global variables: image header structures, files, functions and other variables that can be used without having to be passed.

Declaration of variables: In C and C++, all variables need to be declared. Here, all the singular variables (having one value at a time) and the fixed-length arrays (declared with enclosed brackets: array[255]) are declared.

Allocation of memory: Here, all the open-length arrays are declared using one of C's library functions for memory allocation (mostly calloc or farcalloc for large blocks), with the exception of arrays whose size depends on a program option, like the window buffer.

File management (user interface): In this component, all the files, read or written, are initialised through the user interface. The user is first asked for the directory from which he wishes to work and then for the names of the input and output files he wishes to use. The program proceeds to open the input file along with its header file (having a “.ers” extension), which is read. Then the output image and image header files are created and the latter is filled with the image data details (e.g. size, data type, etc.). Both header files are then closed.

Program options (user interface): This component is still part of the user interface and sets the value of the variables that are specific to each application and each session, such as window size or lag distances. The program timer is also zeroed here.

Allocating memory to the window buffer: The window buffer size depends on the user's options. This array is defined as “**matrix”, which means it is an array of arrays, or a two-dimensional array.

Loading the image in the window buffer and calling the texture analysis function: This component is where the image data is read and feeds the window buffer, much in the manner that usual focal operators work for convolution filtering. The texture analysis function is then called for each pixel of every line.

Writing the results and ending the program: The texture analysis function is usually a void type function (it does not return a value) that simply fills an array that is then written to the output file.

Declaration of the program's specific functions: These functions are usually all related to the particular approach being used. All the other functions that are used by all the programs are declared in a header file.


5.3 THE GREY TONE DEPENDENCY MATRIX APPROACH

A complete explanation of the Grey Tone Dependency Matrix (GTDM) method has already been given in section 3.6. In this section, its implementation is discussed, along with the choices that had to be made to keep the number of texture features created reasonable. In the first sub-section, the GTDM is implemented to give a single result matrix for each sample texture of the initial set of six (figure 4.3). This first program makes it possible to compute a visual representation of the co-occurrence matrix (in 3-D) and to have an overview of some of the most significant features defined by Haralick, Shanmugam and Dinstein (1973).

5.3.1 Computing and displaying the co-occurrence matrix for an image

In this first implementation, the entire image is passed as a matrix to the co-occurrence function, which computes the matrix, passes it to each feature function sequentially and fills a vector containing the value of each feature. The program writes both the co-occurrence matrix and the feature set to an ASCII file. Listing 5.1 shows the C code used for the co-occurrence function.

Listing 5.1. The co-occurrence matrix calculation function.

void compute_gldm(unsigned char **matrix, int **com, int n, int *vector,
                  int lag_x, int lag_y)
{
    int i, j, i_ini, i_end, j_ini, j_end;

    /* reset the n x n co-occurrence matrix */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            com[i][j] = 0;

    /* restrict the scan so that the lagged pixel (i + lag_y, j + lag_x)
       stays inside the image; the 128 x 128 size of the sample patches
       (figure 4.3) is assumed here */
    i_ini = (lag_y < 0) ? -lag_y : 0;
    i_end = (lag_y > 0) ? 128 - lag_y : 128;
    j_ini = (lag_x < 0) ? -lag_x : 0;
    j_end = (lag_x > 0) ? 128 - lag_x : 128;

    /* accumulate the symmetric counts of grey tone pairs
       (Haralick et al., 1973) */
    for (i = i_ini; i < i_end; i++)
        for (j = j_ini; j < j_end; j++) {
            com[matrix[i][j]][matrix[i + lag_y][j + lag_x]]++;
            com[matrix[i + lag_y][j + lag_x]][matrix[i][j]]++;
        }

    /* measure each feature from the matrix and fill the feature vector */
    vector[0] = gldm_con(com, n);
    vector[1] = gldm_asm(com, n);
    vector[2] = gldm_ent(com, n);
    vector[3] = gldm_idm(com, n);
}

[...]

APPENDIX B

The ERS image header reading and writing functions listing

[...]
if(strncmp(buf,"Version",7)== 0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%s",(iners->version)); }
if(strncmp(buf,"LastUpdated",11)== 0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fgets((iners->lastupdated),29,ers_input); }
if(strncmp(buf,"SenseDate",9)== 0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fgets((iners->sensedate),29,ers_input); }
if(strncmp(buf,"DataSetType",11)== 0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%s",(iners->datasettype)); }
if(strncmp(buf,"DataType",8)== 0) { while(c != '=') c = fgetc(ers_input);


c = fgetc(ers_input); fscanf(ers_input,"%s",(iners->datatype)); } if(strncmp(buf,"ByteOrder",9)== 0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fgets((iners->byteorder),9,ers_input); } if(strncmp(buf,"CoordinateSpace",15)== 0) { while(strncmp(buf,"Begin",5) != 0) fscanf(ers_input,"%s", buf); while(strncmp(buf,"CoordinateSpace",15)!= 0) { fscanf(ers_input,"%s",buf); if(strncmp(buf,"Datum",5)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%s",(incoor->datum)); } if(strncmp(buf,"Projection",10)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%s",(incoor->projection)); } if(strncmp(buf,"CoordinateType",14)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%s",(incoor->coordinatetype)); } if(strncmp(buf,"Rotation",8)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%s",(incoor->rotation)); } } } if(strncmp(buf,"RasterInfo",10)== 0) { while(strncmp(buf,"Begin",5) != 0) fscanf(ers_input,"%s", buf); while(strncmp(buf,"RasterInfo",10)!= 0) { fscanf(ers_input,"%s",buf); if(strncmp(buf,"CellType",8)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%s",(inrast->celltype)); } if(strncmp(buf,"NullCellValue",13)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%d",&i); inrast->nullcellvalue = i; } if(strncmp(buf,"Xdimension",10)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%f",&f); inrast->xdimension = f; } if(strncmp(buf,"Ydimension",10)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%f",&f); inrast->ydimension = f; } if(strncmp(buf,"NrOfLines",9)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%d",&i); inrast->nroflines = i; } if(strncmp(buf,"NrOfCellsPerLine",16)==0) { while(c != '=') c = fgetc(ers_input); c = fgetc(ers_input); fscanf(ers_input,"%d",&i); inrast->nrofpixels = i; } if(strncmp(buf,"NrOfBands",9)==0) { while(c != '=') c = fgetc(ers_input);


c = fgetc(ers_input); fscanf(ers_input,"%d",&i); inrast->nrofbands = i;
for(i=0; i<(inrast->nrofbands); i++) { c = 'x'; while(c != '"') c = fgetc(ers_input); j = 0; c = 'x'; buf[0] = 0; while(c != '"') { c = fgetc(ers_input); buf[j] = c; j++;} buf[j-1] = (char) 0; strncpy(((*inband[i]).value),buf,j); } } } } }
return(0); }

int write_ers(FILE *ers_output,ERMGENERAL *outers,COORDINATESPACE *outcoor,RASTERINFO *outrast,BANDID *outband[20]) { int i;
fprintf(ers_output, "DatasetHeader Begin\n\tVersion\t\t= %s\n\tLastUpdated\t= %s\n\tSenseDate\t= %s\n\tDataSetType\t= %s\n\tDataType\t= %s\n\tByteOrder\t= LSBFirst\n", outers->version, outers->lastupdated, outers->sensedate, outers->datasettype, outers->datatype);
fprintf(ers_output, "\tCoordinateSpace Begin\n\t\tDatum\t\t= %s\n\t\tProjection\t= %s\n\t\tCoordinateType\t= %s\n\t\tRotation\t= %s\n\tCoordinateSpace End\n", outcoor->datum, outcoor->projection, outcoor->coordinatetype, outcoor->rotation);
fprintf(ers_output, "\tRasterInfo Begin\n\t\tCellType\t= %s\n", outrast->celltype);
if (outrast->nullcellvalue != -999) fprintf(ers_output, "\t\tNullCellValue\t= %d\n", outrast->nullcellvalue);
if((outrast->xdimension != 1) && (outrast->ydimension != 1)) fprintf(ers_output, "\t\tCellInfo Begin\n\t\t\tXdimension\t= %f\n\t\t\tYdimension\t= %f\n\t\tCellInfo End\n", outrast->xdimension, outrast->ydimension);
fprintf(ers_output, "\t\tNrOfLines\t= %d\n\t\tNrOfCellsPerLine\t= %d\n\t\tNrOfBands\t= %d\n", outrast->nroflines, outrast->nrofpixels, outrast->nrofbands);
for (i=0; i<outrast->nrofbands; i++) { fprintf(ers_output, "\t\tBandId Begin\n\t\t\tValue\t\t= \"%s\"\n\t\tBandId End\n", outband[i]->value); }
fprintf(ers_output, "\tRasterInfo End\n");
fprintf(ers_output, "DataSetHeader End\n");
return(0);

}


APPENDIX C

The “my_structure.h” header file listing

Due to space problems, the text in the listing was formatted using courier 8 font. Readers interested in having the digital listing should contact the author at “[email protected]”.

/* the targets of the include directives were lost in the conversion of
   the document; the Borland C headers assumed here are illustrative */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <alloc.h>

#define MY_BUFFER_SIZE 262144

typedef struct ermgeneral {
    char version[6];
    char lastupdated[30];
    char sensedate[30];
    char datasettype[20];
    char datatype[20];
    char byteorder[8];
} ERMGENERAL;

typedef struct coordinatespace {
    char datum[8];
    char projection[20];
    char coordinatetype[5];
    char rotation[15];
} COORDINATESPACE;

typedef struct rasterinfo {
    char celltype[28];
    int nullcellvalue;
    float xdimension;
    float ydimension;
    unsigned int nroflines;
    unsigned int nrofpixels;
    unsigned int nrofbands;
} RASTERINFO;

typedef struct bandid {
    char value[35];
} BANDID;

typedef struct fxy {
    float x;
    float y;
} FXY;

typedef char file_name[255];


APPENDIX D

The Grey Tone Dependency Matrix “c” program listing

Due to space problems, the text in the listing was formatted using courier 8 font. Readers interested in having the digital listing should contact the author at “[email protected]”.

// ********************** Program that computes the grey level dependency ****
// ********************** matrix for one set of lags (X and Y) and then com- ***
// ********************** putes some of the most common 2nd order statistics ***
// ********************** that were defined by Haralick in his original paper **
// ********************** on GLDM (also called co-occurrence matrix) in 1973: ***
// ********************** Contrast (CON), Angular Second Moment (ASM), Entropy *
// ********************** (ENT) and Inverse Difference Moment (IDM). ***
// ********************** This program produces output images **************

/* the targets of the include directives were lost in the conversion of
   the document; the six headers assumed here match the functions and
   types used below */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <alloc.h>
#include <time.h>
#include "my_structure.h"



ERMGENERAL      gen_erm,  *pgen_erm  = &gen_erm;
COORDINATESPACE coor_erm, *pcoor_erm = &coor_erm;
RASTERINFO      rast_erm, *prast_erm = &rast_erm;
BANDID          band_erm[24], *pband_erm[24];

void compute_gldm (unsigned char **matrix, int **com, int n, int *vector, int lag_x, int lag_y);
int gldm_con(int **com, int n), gldm_asm(int **com, int n), gldm_ent(int **com, int n);
int gldm_idm(int **com, int n), gldm_imc(int **com, int n), gldm_cor(int **com, int n);

FILE *input_ers, *input_raster, *output_data, *output_ers;

main(void)
{
// ******************** INITIALISATION OF VARIABLES ********************
// ******************** ___________________________ ********************
file_name o_file_name, i_file_name, name;
char dir_name[255], f_path[255]="", *def_dir, ext1[35], ext2[35], ext3[35], bs[35];
unsigned char *image_buffer, **matrix;
int i, j, k, l, w_size, n_windows_x, n_windows_y, mid, j_index, n, **com, lag_x, lag_y, lag;
int *vector, n_pixels, n_lines;
short int *output_line;
clock_t start;

******************** ALLOCATION OF MEMORY *****************************


//

******************** ____________________ ***************************** def_dir = (char *) farcalloc(256, 1); if (!(def_dir)) {printf("Not enough memory\n"); exit(1);} image_buffer = (unsigned char *) farcalloc(65536, 1); if (!(image_buffer)) {printf("Not enough memory to allocate buffer\n"); exit(1);} output_line = (short int *) farcalloc(24576, sizeof(short int)); if (!(output_line)){printf("Not enough memory to allocate buffer\n"); exit(1);} vector = (int *) farcalloc(16, sizeof(int)); if (!(vector)){printf("Not enough memory to allocate buffer\n"); exit(1);} for(i=0;i