
Image Equilibrium: A Global Image Property for Human-Centered Image Analysis

O. Sánchez¹ and M. Rincón²

¹ Gestión de Infraestructuras de Andalucía, S.A., Regional Ministry of Public Works and Transport, Junta de Andalucía, Charles Darwin s/n, Isla de La Cartuja, 41092, Seville, Spain, [email protected]
² Dpto. de Inteligencia Artificial, ETSI Informática, UNED, Juan del Rosal 16, 28040 Madrid, Spain, [email protected]

Abstract. Photographs and pictures created by humans present schemes and structures in their composition which can be analysed on semantic levels, irrespective of subject or content. The search for equilibrium in composition is a constant which enables us to establish a kind of image syntax, creating a visual alphabet from basic elements such as point, line, contour, texture, etc. This paper describes an operator which quantifies image equilibrium, providing a picture characterisation that is very close to the pixel matrix yet carries considerable semantic content.

Index Terms: human-centered image analysis, image syntax, visual alphabet and semantic gap.

1 Introduction

When “analysing” pictures created by humans, the initial problem, irrespective of the method used, is known as the semantic gap [8]. An image contains colours, lines, figures, objects and elements which humans are capable of understanding through visual perception. In order to analyse an image in depth, a series of processes is required to obtain more abstract representations, ranging from pixel-level operations that associate the image with models, such as those found in [6], [7], to the location of contours, edges, objects, etc., as in [5], [2]. An abstract representation of the image is obtained in both cases.

When someone takes a photograph or draws something, he or she uses visual composition schemes, much as a speaker uses words, phrases and paragraphs. These composition schemes are related to visual perception and based on some fundamental principles which can be summarised as the search for equilibrium in each element. When we pick up a camera to take a picture of a landscape, we configure a vertical and horizontal axis where the position of each element (amount of sky, horizon, focal point, etc.) is intuitively balanced. Image syntax establishes these composition principles and uses a visual alphabet to develop a semantic configuration from basic elements such as points, lines, contours, colour, texture, scale, etc. In the art field, it has been used for semantic analysis, as described by D. Dondis [3]. The semantic gap problem is reduced in this system, as the step from the pixel matrix to the elements of the visual alphabet is smaller and semantic relations can be established in compositions derived from image syntax.

Fig. 1. Photograph taken by someone with composition criteria. The equilibrium axes are on the right-hand side of the road. Area 1 is compensated by area 2, creating a diagonal axis following the road.

This paper presents a computable definition of the “image equilibrium” concept which can be subsequently used in a visual structured semantic image analysis. The paper is organised as follows. Section 2 contains an introduction to image syntax and the visual alphabet and how they are based on the principle of equilibrium. In section 3, we establish a computable definition of the equilibrium concept. Finally, section 4 provides an example of the use of the equilibrium operator on a road works monitoring photograph.

J. Mira et al. (Eds.): IWINAC 2009, Part II, LNCS 5602, pp. 216–224, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Image Syntax and the Visual Alphabet

When considering images created by humans (photographs, drawings, paintings, graphs, etc.), irrespective of their purpose (artistic, representative, gestural, etc.), we have to consider that, in visual perception [1], the form of configuring, articulating and creating the image is based on a series of principles and laws. In the fields of visual communication, graphic design and art, image syntax is generally used for such an analysis. In “A Primer of Visual Literacy” [3], D. Dondis contemplates the creation of a visual alphabet with which to develop an image syntax system enabling the creation of compositions from primary elements (points, lines, colour, texture, etc.) and principles and laws of composition with semantic value (equilibrium, preference for the lower left part of the image in the western world, etc.). In other words, in order to create an image of a new sports car, for instance, we would start with a specific type of composition in which the elements have properties such as diagonal lines, colours which are highly suffused around the edges in order to attract attention to the corners, lack of circles or enclosed contours, etc. Its structure and composition, irrespective of the location of the car in the picture, could be configured based on this syntax on a basic semantic plane without reference to recognisable objects or elements. This system enables us to work with all types of image without the need for specific expertise.

Dondis establishes a series of principles which develop in perception and guide how the composition of an image is perceived, including the following:

– Equilibrium. This is of a psychological nature and we tend to look for it unconsciously among the elements found in the image. This equilibrium is established from a vertical and a horizontal axis derived from how the surrounding environment is visually configured, governed by principles such as the law of gravity. These two axes form what are known as “equilibrium axes”.
– Stress. Some elements appear to be unstable, giving a sensation of motion. This principle is the opposite of the previous point and, when it appears, produces a constant need to establish equilibrium.
– Preference for the lower-left part of the image. This is only applicable to the western world, and is not found in either eastern or Arabic culture. It is therefore a cultural, rather than psychological, feature, which is applicable in our case because the method is established in a western setting.

According to this idea, the initial analysis is based on the equilibrium axes, and the second focuses on the lower-left part of the image.

3 Image Equilibrium

The principle of equilibrium is the basis of image syntax, so the analysis starts by identifying which parts of the image have the most stress, how some parts balance with others, etc. In other words, different levelling or balancing operations are performed based on the equilibrium axes, much like matching the weight on one side of a set of scales to the weight on the other. Fig. 2 shows an example of balancing scales, which is very similar to what is done with the image. The goal is to obtain a representation of the image where we can see which parts have the most stress and which are balanced, according to the visual alphabet element being analysed. This process is described below. We first look for the equilibrium axes and divide the picture into four quadrants. Each quadrant is in turn divided into 9 homogeneous blocks. For a given element of the visual alphabet, we then seek to balance each block with symmetrical regions relative to the equilibrium axes. The result is a vector with 36 elements, one for each block, determining the equilibrium level for the visual alphabet element in question.

Fig. 2. Balancing system. To maintain equilibrium on a set of scales, we either move objects around or add others. When an object is larger on one side than on the other, we move it towards the centre, and vice versa.

3.1 Equilibrium Axes and Dividing the Image into Blocks

To configure the axes, we analyse the tone of the image, which image syntax defines as its primary feature. The image is binarized by thresholding with a threshold value of $H = \max(I)/2$. We establish axes from the geometric centre of the image, thus dividing it into four equal regions denoted $E_n$, with $n = 1, \ldots, 4$. We apply the following equation in order to establish the position of the axes' central point, $EA$:

$$EA = \left( \frac{1}{4} \sum_{n} X_x^{E_n},\; \frac{1}{4} \sum_{n} Y_y^{E_n} \right) \qquad (1)$$

$X_x^{E_n}$ provides the position of the vertical axis, and $Y_y^{E_n}$ that of the horizontal axis, in each $E_n$. They are equivalent to the positions of $x$ and $y$ in each quadrant with the largest number of pixels with value 0 in the rows, for $x$, and in the columns, for $y$. The mean position is taken if there are several rows or columns with the same value.

Having obtained point $EA$, the image is divided into four “quadrants” $q_n$, where $n = 1, \ldots, 4$. To simplify the operations between quadrants, they are normalised by horizontal and vertical reflections, so the origin is always point $EA$. The transformation of each quadrant is as follows:

$$\begin{cases} Q_1 = R_H(R_V(q_1)) \\ Q_2 = R_H(q_2) \\ Q_3 = R_V(q_3) \\ Q_4 = q_4 \end{cases} \qquad (2)$$

where $R_H$ is horizontal reflection and $R_V$ is vertical reflection. Each quadrant $Q_n$ is divided into 9 equal regions called “blocks” $C_{i,j}^{Q_n}$, where $i = 1, \ldots, 3$ and $j = 1, \ldots, 3$. Figure 3 shows an example of this transformation.

Fig. 3. Vertical and horizontal axes and division into quadrants and blocks. The figure shows how the position of the blocks in each quadrant is reconfigured following transformation rules.
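The quadrant normalisation of Eq. (2) and the division into blocks can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the point EA is taken as already known and given as (row, column), q1 to q4 are assumed to be the upper-left, upper-right, lower-left and lower-right quadrants, $R_H$ is read as a reflection about a horizontal axis and $R_V$ as a reflection about a vertical axis, and the quadrant sides are assumed to be divisible by 3.

```python
def flip_lr(q):
    """Reflect about a vertical axis (assumed meaning of R_V)."""
    return [row[::-1] for row in q]


def flip_ud(q):
    """Reflect about a horizontal axis (assumed meaning of R_H)."""
    return q[::-1]


def normalised_quadrants(image, ea):
    """Split a list-of-rows image at EA = (row, col) and reflect each
    quadrant so that the pixel touching EA becomes element [0][0] (Eq. 2)."""
    r, c = ea
    q1 = [row[:c] for row in image[:r]]   # assumed upper-left
    q2 = [row[c:] for row in image[:r]]   # assumed upper-right
    q3 = [row[:c] for row in image[r:]]   # assumed lower-left
    q4 = [row[c:] for row in image[r:]]   # assumed lower-right
    return {
        1: flip_ud(flip_lr(q1)),  # Q1 = R_H(R_V(q1))
        2: flip_ud(q2),           # Q2 = R_H(q2)
        3: flip_lr(q3),           # Q3 = R_V(q3)
        4: q4,                    # Q4 = q4
    }


def split_blocks(quadrant, n=3):
    """Divide a quadrant into an n x n grid of equally sized blocks."""
    h = len(quadrant) // n
    w = len(quadrant[0]) // n
    return [[[row[j * w:(j + 1) * w] for row in quadrant[i * h:(i + 1) * h]]
             for j in range(n)]
            for i in range(n)]


# Toy 6 x 6 "image" whose pixel values encode their own position.
image = [[6 * y + x for x in range(6)] for y in range(6)]
quads = normalised_quadrants(image, ea=(3, 3))
blocks = split_blocks(quads[4])
```

With these assumptions, the first element of every normalised quadrant is one of the four pixels surrounding EA, which is the property the reflections are meant to guarantee.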


3.2 Block Stress

Given an element of the visual alphabet, which is analysed relative to a property $p$, the stress of a block $C_{i,j}^{Q_n}$, denoted $T_{i,j,p}^{Q_n}$, measures the degree to which the element is highlighted. With regard to colour, for instance, we want to know the number of pixels that are most saturated, lightest and lowest in hue, considering that colour is divided into three sub-properties: hue $H$, luminance $L$ and saturation $S$. The definition of $T_{i,j,p}^{Q_n}$ only takes intense values of property $p$ into account, for which a domain-dependent threshold $\mu$ is defined:

$$T_{i,j,p}^{Q_n} = \begin{cases} p(C_{i,j}^{Q_n}) & \text{if } p(C_{i,j}^{Q_n}) > \mu \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

For simplicity's sake, and as the analysis is performed for a single property, we will eliminate sub-index $p$ in the rest of the paper.

3.3 Block Equilibrium

Our goal is to seek the equilibrium of each block $C_{i,j}^{Q_n}$ with the blocks from the other quadrants $Q_m$, $m \neq n$. We use a weighting mask $M$, the centre of which is positioned at the centre of block $C_{i,j}^{Q_m}$. We thus determine a neighbourhood around the symmetric block $C_{i,j}^{Q_m}$. Considering the idea of balancing a set of scales, the closer the neighbourhood is to the centre of the axes, the greater is the property of the element of the visual alphabet which is being analysed. As compensation differs according to the relative position of the elements being compared, we establish the following weighting matrices according to the relationship between the quadrants (horizontal $H$, vertical $V$ and diagonal $D$):

$$H = \begin{pmatrix} \tfrac{1}{2} & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{pmatrix} \qquad V = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 2 \\ \tfrac{1}{2} & 2 & 2 \\ 2 & 2 & 2 \end{pmatrix} \qquad D = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} & 2 \\ \tfrac{1}{2} & 2 & 2 \end{pmatrix}$$

where the concept of greater/smaller has been simplified to double/half. The diagonal ratio needs greater values in blocks closer to the centre of the equilibrium axis (double), whereas those more distant are normalised as half. The lowest values required for compensation are on the horizontal axis.

To calculate whether there is equilibrium between $C_{i,j}^{Q_n}$ and a block $C_{p,q}^{Q_m}$ from the other quadrants, we first weight the value of the stress in the destination quadrant according to the following equation:

$$R_{i,j,Q_n}^{p,q,Q_m} = M_{2+p-i,\,2+q-j} \cdot T_{p,q}^{Q_m} \qquad (4)$$

where $M$ is the weighting mask ($H$, $V$ or $D$, according to the relationship between quadrants $Q_n$ and $Q_m$).


$CE_{i,j,Q_n}^{p,q,Q_m}$ shows whether there is equilibrium with the destination block $C_{p,q}^{Q_m}$:

$$CE_{i,j,Q_n}^{p,q,Q_m} = \begin{cases} 1 & \text{if } \left( R_{i,j,Q_n}^{p,q,Q_m} - T_{i,j}^{Q_n} \right) \in \left[ -t_{i,j,Q_n}^{p,q,Q_m},\; +t_{i,j,Q_n}^{p,q,Q_m} \right] \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

The threshold $t_{i,j,Q_n}^{p,q,Q_m}$ which defines the equilibrium range is obtained from the mean value of those being compared by a proportionality constant $d \in [0, 1]$ which depends on the element of the visual alphabet being analysed.

$$t_{i,j,Q_n}^{p,q,Q_m} = \frac{T_{i,j}^{Q_n} + T_{p,q}^{Q_m}}{2} \cdot d \qquad (6)$$
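Equations (4) to (6) can be checked numerically with a short sketch. It is not the authors' implementation: the function name `balance` is hypothetical, the diagonal mask is written out as an assumption, and the first mask index is assumed to follow $p$ (the mask used here is symmetric, so this choice does not affect the result). The data are the $Q_2$ stress values of Table 1 and the stress 0.000131 of block $C_{1,1}^{Q_3}$ from the practical case in Section 4, with $p$ assumed to index columns and $q$ rows.

```python
HALF, DOUBLE = 0.5, 2.0

# Diagonal weighting mask D (assumed layout, symmetric).
D = [[HALF, HALF, HALF],
     [HALF, HALF, DOUBLE],
     [HALF, DOUBLE, DOUBLE]]

# Stress values of quadrant Q2 (Table 1); T_Q2[q - 1][p - 1] holds T_{p,q}.
T_Q2 = [[0.000484, 0.000242, 0.000045],
        [0.000389, 0.000229, 0.000032],
        [0.000134, 0.000064, 0.000032]]


def balance(i, j, t_src, t_dst, mask, d):
    """Compare source block (i, j), whose stress is t_src, with every
    destination block (p, q) covered by the 3 x 3 mask centred on (i, j).
    Returns {(p, q): (R, t, CE)} following Eqs. (4), (6) and (5)."""
    out = {}
    for p in range(max(1, i - 1), min(3, i + 1) + 1):
        for q in range(max(1, j - 1), min(3, j + 1) + 1):
            t_pq = t_dst[q - 1][p - 1]
            weight = mask[(2 + p - i) - 1][(2 + q - j) - 1]  # M_{2+p-i, 2+q-j}
            r = weight * t_pq                                # Eq. (4)
            t = (t_src + t_pq) / 2 * d                       # Eq. (6)
            ce = 1 if -t <= r - t_src <= t else 0            # Eq. (5)
            out[(p, q)] = (r, t, ce)
    return out


# Block C_{1,1} of Q3 (stress 0.000131) balanced against Q2 with d = 0.5.
result = balance(1, 1, t_src=0.000131, t_dst=T_Q2, mask=D, d=0.5)
balanced_blocks = sum(ce for _, _, ce in result.values())
```

Under these assumptions only the destination block (1, 1) falls inside the equilibrium range, so `balanced_blocks` is 1, in agreement with the value $C_{Q_2} = 1$ reported for this block in Section 4.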

As we are attempting to balance each block with the weighted blocks of the other three quadrants and their neighbourhoods, we define the final equilibrium value of the block, $E_{i,j,Q_n}$, from the quantity of equilibria obtained:

$$E_{i,j,Q_n} = 1 - \frac{1}{4} \cdot Q - \frac{1}{4} \cdot \frac{C}{3^{Q}} \qquad (7)$$

where $Q$ is the number of balanced quadrants and $C$ is the number of balanced blocks:

$$Q = \sum_{m} \left( E_{i,j,Q_n}^{Q_m} > 0 \right) \qquad (8)$$

$$C = \sum_{m} E_{i,j,Q_n}^{Q_m} \qquad (9)$$

where $E_{i,j,Q_n}^{Q_m}$ is the sum of all the values of $CE_{i,j,Q_n}^{p,q,Q_m}$ obtained in each quadrant:

$$E_{i,j,Q_n}^{Q_m} = \sum_{p,q} CE_{i,j,Q_n}^{p,q,Q_m} \qquad (10)$$

The more quadrants and blocks are balanced by a given block $C_{i,j}^{Q_n}$, the closer expression (7) is to 0, or equilibrium. Likewise, the fewer the balanced quadrants and blocks, the closer it is to 1, or absence of equilibrium.
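The aggregation in Eqs. (7) to (10) can be illustrated with a few lines of code. This is a sketch only: the function name is hypothetical, and the block term of Eq. (7) is assumed to be normalised by $3^{Q}$. The example input encodes the case described in Section 4 for block $C_{1,1}^{Q_3}$, where a single block is balanced and only in $Q_2$; the zero flags for the other two quadrants are placeholders consistent with that description.

```python
def final_equilibrium(ce_by_quadrant):
    """ce_by_quadrant maps each of the three other quadrants to its list
    of CE flags (Eq. 5). Returns E: 0 is equilibrium, 1 is its absence."""
    per_quadrant = [sum(flags) for flags in ce_by_quadrant.values()]  # Eq. (10)
    q = sum(1 for e in per_quadrant if e > 0)                         # Eq. (8)
    c = sum(per_quadrant)                                             # Eq. (9)
    # Eq. (7), ASSUMING the block count is normalised by 3 ** Q.
    return 1 - q / 4 - (c / 3 ** q) / 4


# One balanced block, found only in Q2 (Q = 1, C = 1).
e = final_equilibrium({
    "Q1": [0, 0, 0, 0],
    "Q2": [1, 0, 0, 0],
    "Q4": [0, 0, 0, 0],
})
```

For this input the function returns about 0.67, which rounds to the value of 0.7 quoted for $E_{1,1,Q_3}$ in the practical case; with no balanced blocks at all it returns 1.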

4 Practical Case: Equilibrium Analysis in the Colour of Road Works Monitoring Photographs

We will now analyse an image with regard to one of the elements of the visual alphabet, in this case colour. In the visual alphabet, colour has three properties defined as hue, luminance and saturation; hence, $C = (H, L, S)$, each on a scale of [0, 255]. In our case, we establish a threshold for each of them so that, for a given pixel to be considered as producing stress, it must satisfy the following rules in each sub-property: $\mathrm{Stress} = (-H > -20) \wedge (L > 225) \wedge (S > 225)$, that is, hue below 20 and luminance and saturation above 225. The specified thresholds can be adjusted according to the application domain. Applying this criterion to figure 4, we first separate the hue, luminance and saturation properties (top right) and then apply the rules to obtain the pixels producing stress (bottom right).
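The stress rule for colour can be sketched as a per-pixel test. The sketch assumes that the hue, luminance and saturation channels are already available as equally sized lists of rows on the [0, 255] scale; the function names are hypothetical, and dividing the count by the number of pixels in the region is only one possible way of normalising to [0, 1], since the normalisation is not specified in detail.

```python
def stress_mask(h, l, s, h_max=20, l_min=225, s_min=225):
    """Return 1 for pixels with H < h_max, L > l_min and S > s_min
    (the rule (-H > -20)(L > 225)(S > 225)), and 0 elsewhere."""
    return [[1 if (hv < h_max and lv > l_min and sv > s_min) else 0
             for hv, lv, sv in zip(h_row, l_row, s_row)]
            for h_row, l_row, s_row in zip(h, l, s)]


def stressed_fraction(mask):
    """Fraction of stressed pixels in a region (ASSUMED normalisation)."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


# Toy 2 x 2 channels: only the first and last pixels satisfy all three rules.
h = [[10, 30], [5, 0]]
l = [[230, 240], [200, 255]]
s = [[250, 250], [250, 226]]
mask = stress_mask(h, l, s)
fraction = stressed_fraction(mask)
```

The thresholds are exposed as parameters because they are meant to be adjusted to the application domain.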



Fig. 4. Photograph used for balancing. On the left we can see the original and how it is divided into quadrants and blocks. On the right, we see the representation of hue, luminance and saturation (top) and the pixels which produce stress (bottom).

Table 1. Stress values of $C_{i,j}^{Q_3}$ and $C_{i,j}^{Q_2}$

$$T_{i,j}^{Q_3} = \begin{pmatrix} 0.000131 & 0.000090 & 0.000018 \\ 0.000000 & 0.000044 & 0.000000 \\ 0.000000 & 0.000000 & 0.000000 \end{pmatrix}$$

$$T_{i,j}^{Q_2} = \begin{pmatrix} 0.000484 & 0.000242 & 0.000045 \\ 0.000389 & 0.000229 & 0.000032 \\ 0.000134 & 0.000064 & 0.000032 \end{pmatrix}$$

Once this criterion has been applied, we calculate the number of pixels for each block and normalise them between 0 and 1. This normalisation enables us to work with similar values on different elements of the visual alphabet. As an example, we describe the balancing operations of two blocks, $C_{1,1}^{Q_3}$ and $C_{2,1}^{Q_3}$, from quadrant $Q_3$ (lower left) with quadrant $Q_2$ (upper right). We use mask $D$ when comparing diagonally positioned quadrants. Table 1 shows the stress values for colour in both quadrants. Tables 2 and 3 show the values obtained when balancing $C_{1,1}^{Q_3}$ and $C_{2,1}^{Q_3}$ with quadrant $Q_2$. They are balanced using $d = 0.5$, for a tighter adjustment.

For $C_{1,1}^{Q_3}$, the number of balanced blocks in $Q_2$ is $C_{Q_2} = 1$. Using the same procedure for the other quadrants, with $Q = 1$ (only in $Q_2$), the final equilibrium value would be $E_{1,1,Q_3} = 0.7$. In this case, there would be hardly any equilibrium. For block $C_{2,1}^{Q_3}$, $C_{Q_2} = 4$, $Q = 3$ (in all quadrants) and $E_{2,1,Q_3} = 0.1$. This would, then, be close to equilibrium. In the image in figure 4, in the first case there is no significant balancing in block $C_{1,1}^{Q_3}$. In the second case, in $C_{2,1}^{Q_3}$, there are some pixels on the road surface which produce some stress, less than in the case of the arrow, but which can be balanced either due to excess (the arrow, the road markings) or to the absence of stress.

Table 2. Balancing $C_{1,1}^{Q_3}$ in $Q_2$

$$\begin{array}{cc|ccccc} p & q & T_{p,q}^{Q_2} & D & R_{1,1,Q_3}^{p,q,Q_2}\ (4) & t_{1,1,Q_3}^{p,q,Q_2}\ (6) & CE_{1,1,Q_3}^{p,q,Q_2}\ (5) \\ \hline 1 & 1 & 0.000484 & \tfrac{1}{2} & 0.000242 & 0.00015375 & 1 \\ 2 & 1 & 0.000242 & 2 & 0.000484 & 0.00009325 & 0 \\ 2 & 2 & 0.000229 & 2 & 0.000458 & 0.0009 & 0 \\ 1 & 2 & 0.000389 & 2 & 0.000778 & 0.00013 & 0 \end{array}$$

Table 3. Balancing $C_{2,1}^{Q_3}$ in $Q_2$

$$\begin{array}{cc|ccccc} p & q & T_{p,q}^{Q_2} & D & R_{2,1,Q_3}^{p,q,Q_2}\ (4) & t_{2,1,Q_3}^{p,q,Q_2}\ (6) & CE_{2,1,Q_3}^{p,q,Q_2}\ (5) \\ \hline 2 & 1 & 0.000242 & \tfrac{1}{2} & 0.000121 & 0.00007575 & 1 \\ 3 & 1 & 0.000045 & 2 & 0.00009 & 0.0000265 & 1 \\ 3 & 2 & 0.000032 & 2 & 0.000064 & 0.00002325 & 1 \\ 2 & 2 & 0.000229 & 2 & 0.000458 & 0.00007225 & 0 \\ 1 & 2 & 0.000389 & \tfrac{1}{2} & 0.0001945 & 0.000389 & 1 \\ 1 & 1 & 0.000484 & \tfrac{1}{2} & 0.000242 & 0.00013625 & 0 \end{array}$$

5 Conclusions

This paper presents an operator for equilibrium analysis of an image based on image syntax. The process involves dividing the image into 4 parts according to the equilibrium axes, and then dividing each of these into 9 equal blocks to establish the equilibrium of each block with the rest. The process is applied to each element of the visual alphabet (points, lines, contours, colour, texture, etc.), finally obtaining a vector of image characteristics comprising a vector of 36 values for each element. Analysing the equilibrium of basic aspects of the image, such as points, lines, colour or texture, helps to bridge the semantic gap, as this characteristics vector is very close to the pixel level but represents a more abstract aspect of the semantic plane, being related to the visual alphabet and image syntax. The application of this equilibrium analysis operator enables the creation of models for the analysis of man-made photographs or images, irrespective of the scope of application.

Acknowledgements

The authors would like to thank the CICYT for financial support via project TIN-2007-67586-C02-01. We also thank all GIASA staff for their dedication and professionalism.



References

[1] Arnheim, R.: Visual Thinking. University of California Press, Berkeley (1969)
[2] Ballard, D.H.: Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition 13(2), 111–122 (1981)
[3] Dondis, D.A.: A Primer of Visual Literacy. The Massachusetts Institute of Technology (1973)
[4] Eakins, J., Graham, M.: Content-based image retrieval. Tech. Rep. JTAP-039, JISC (2000)
[5] Govindaraju, V.: Locating human faces in photographs. International Journal of Computer Vision 19, 129–146 (1996)
[6] Rowley, H.A., Baluja, S., Kanade, T.: Neural network-based face detection. IEEE Trans. Pattern Analysis and Machine Intelligence 20, 23–38 (1998)
[7] Schneiderman, H., Kanade, T.: A statistical method for 3D object detection applied to faces and cars. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000), Hilton Head Island, SC. IEEE, Los Alamitos (2000)
[8] Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 22(12), 1349–1380 (2000)