A Color Interface for Audio Clustering Visualization

Silvia Zuffi, Isabella Gagliardi
ITC, Consiglio Nazionale delle Ricerche, Milano, Italy

Abstract
The availability of large audio collections calls for ways to efficiently access and explore them by providing an effective overview of their contents at the interface level. In this paper we present an innovative strategy exploiting color to visualize the content of a database of audio records, part of a website dedicated to ethnographic information in a region of Italy.
Recently, the problem of visualizing audio contents has drawn much attention due to the increasing volume of music collections available on the Internet. The simplest way to display these contents is to present textual lists in which each item shows the attributes of the corresponding piece. However, textual lists are not enough to help users explore music libraries efficiently. Alternative visualization schemes have been proposed, based on semantic attributes of the pieces or on audio information. An effective strategy for the design of interfaces accessing database contents exploits color attributes: browsing is easier for the user if interface colors represent some property of the audio contents.

We present here an innovative strategy for the visualization of the contents of a database of audio records. The solution we propose is based on a mapping between the acoustic features of audio clusters and colors. The database used in our implementation is that of the Archive of Ethnography and Social History of the Lombardy Region (AESS). The AESS was founded to preserve, study, and enhance the value of documents and images of the life, social transformations, literature, oral history, material culture, and anthropic landscapes of the Lombard territory. Its website, designed by ITC, an institute of the CNR of Milan, stores information concerning the oral history of the region, and is composed mainly of popular songs and other audio records describing the popular traditions (such as fairs and customs) handed down from generation to generation. The AESS website implements, besides the standard functions of catalogue and search, various modalities of navigation and employ [1]. These include, among others, the location and clustering of similar audios, that is, the organization of the audio files stored in the database into groups of files acoustically similar to each other.
2. AUDIO CLUSTERS DEFINITION
In the AESS system, the acoustic similarity among the oral documents has been computed with the TreeQ system, implemented by Jonathan T. Foote of the Institute of Systems Science at the National University of Singapore [2]. This method represents each audio file as a histogram encoding some fundamental physical features of the file. These histograms can be considered vectors; therefore, the acoustic similarity index between two files is estimated by computing the cosine of the angle between the related vectors: the closer the index is to 1, the more similar the two files are in their acoustic features.

The AESS website [1] implements audio clustering, that is, it organizes the audio files stored in the database into groups of files acoustically similar to each other. To identify similar audio files, we used the vectors associated with the histograms created from quantization trees. As in traditional clustering processes, the division of the vectors is reiterated until certain conditions are satisfied; in our case the process stops when, inside each cluster, the similarity index of every possible pair of vectors is greater than a threshold set empirically at 0.8. According to this criterion, our algorithm first computes a clustering which assigns all the vectors to n different groups; it then checks whether each of the clusters obtained is acceptable; if it is not, the cluster is divided into two sub-clusters, and a new clustering is computed on the vectors that had been assigned to the divided cluster, using as new centroids the two vectors with the lowest index of similarity. When all the clusters satisfy the evaluation criterion, the algorithm estimates whether two or more clusters can be grouped together to produce a new cluster that still satisfies the evaluation criterion.

In particular, each clustering is computed by assigning each file to the cluster identified by the centroid to which it is most similar. The barycenter of each cluster is then calculated, together with the similarity index between the barycenter and the centroid: if this index is equal to 1, the process ends; otherwise the barycenter is set as the new centroid, and the process is reiterated from the beginning. In this manner, the number of clusters varies during the process, adjusting to the nature of the data. The implemented algorithm has produced satisfactory results, displaying a strong potential for discriminating different types of audios, while each cluster contains only similar objects.
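The similarity index and the acceptance test described above can be illustrated with a minimal Python sketch. It is not the AESS or TreeQ implementation: the function names are hypothetical, histograms are assumed to be plain lists of non-negative numbers, and treating an all-zero histogram as dissimilar to everything is our own assumption.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two histogram vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an empty histogram is similar to nothing
    return dot / (norm_u * norm_v)

def cluster_is_acceptable(vectors, threshold=0.8):
    """True when every pair of vectors in the cluster exceeds the threshold."""
    return all(cosine_similarity(u, v) > threshold
               for u, v in combinations(vectors, 2))

def least_similar_pair(vectors):
    """Indices of the two least similar vectors, usable as centroids for a split."""
    return min(combinations(range(len(vectors)), 2),
               key=lambda p: cosine_similarity(vectors[p[0]], vectors[p[1]]))
```

A cluster that fails `cluster_is_acceptable` would be split using the pair returned by `least_similar_pair` as the two new centroids; the reassignment and merging steps are omitted here.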
3. AUDIO CLUSTERS VISUALIZATION
On the AESS website, audio clusters were at first represented as buttons of the same size but of different colors, to give the user an idea of the number of audio files in the different clusters (Figure 1). However, since the user might want to browse the database on the basis of cluster contents rather than size, a different color-coding strategy was implemented. The new strategy could be based on information either about the contents of the audio clusters or about the relationship between clusters. In coding the contents, we could reasonably expect to obtain an implicit coding of cluster similarity. We have exploited both sources here.
Figure 1. AESS website page of audio file clustering.

The first step was to characterize each cluster by defining an average histogram, which was computed as follows:

$$\hat{h}_{c,k} = \frac{1}{N_c} \sum_{i=1}^{N_c} h^{c}_{i,k} \qquad (1)$$

where $N_c$ is the number of audios in cluster $c$, and $k$ is the index of the bin, ranging from 1 to $K$. Given all the clusters' average histograms, the distance between each of them was computed as follows:

$$d(\hat{h}_i, \hat{h}_j) = \sqrt{\frac{\sum_{k=1}^{K} \left(\hat{h}_{i,k} - \hat{h}_{j,k}\right)^2}{K}} \qquad (2)$$

where $i$ and $j$ range from 1 to $N_c$, and $K$ is the number of bins. The computation of the distances between the average histograms produced a symmetric distance matrix, which we indicated with $D$:

$$D_{l,m} = d(\hat{h}_l, \hat{h}_m) \qquad (3)$$
where $l$ and $m$ range from 1 to $N_c$. We wanted to assign a color to each audio cluster so that, if the distance matrix between cluster colors were evaluated in a suitable color space, this matrix would be equivalent to the distance matrix $D$ defined above. This task can be seen as a problem of Multidimensional Data Scaling (MDS). MDS refers to a family of models where the structure of a set of data is represented graphically by the relationships between a set of points in space. It is commonly formulated as the problem of finding a spatial arrangement of entities on a plane on the basis of a distance matrix. In our case, the goal was not to search for a spatial arrangement of the audio clusters in order to place the buttons representing the clusters on the Web page accordingly. On the contrary, the buttons would be placed in a uniform arrangement; their relative distance would be represented by their color.

We assume that each average histogram characterizes the corresponding cluster. The mapping between colors and histograms must be such that
- Similar histograms map to similar colors
- Different histograms map to different colors

In order to satisfy both requirements, we must define the colors in a way that ensures that their perceived color distance corresponds to the distance in the histogram space. If we define a matrix $DE$ of color distance as follows:

$$DE_{l,m} = \Delta E(Lab_l, Lab_m) \qquad (4)$$

where $l$ and $m$ range from 1 to $N_c$, the $Lab$ are the CIELAB [3] coordinates of the color assigned to the corresponding cluster, and $\Delta E$ is the Euclidean distance in the CIELAB space. The matrix $DE$ must respect the following conditions, which are also satisfied by matrix $D$ due to the definition of histogram distance:
1. $DE_{l,l} = DE_{m,m}$ (self-similarity)
2. $DE_{l,m} \geq DE_{l,l}$
3. $DE_{l,m} = DE_{m,l}$
4. $DE_{l,k} + DE_{k,m} \geq DE_{l,m}$

Moreover, matrix $DE$ should be linearly related to $D$, that is

$$DE_{l,m} = \alpha D_{l,m} \qquad (5)$$

where $\alpha$ is a constant.

The dataset considered was composed of 43 audio clusters. Figure 2 is a diagram of the pairs of audio clusters closest to each other in the dataset (i.e., the cluster closest to n.1 is n.2; the closest to n.7 is n.20, and so on). The assignment of colors to clusters must maintain the relationships expressed by the diagram.
Figure 2. The diagram represents the pairs of clusters closest to each other. On the x axis, the cluster index, and on the corresponding grid of the axis, the index of each closest cluster according to the distance computed in Equation 2.
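The computation of the average histograms, of the distance matrix D, and of the closest-cluster diagram can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the original code: the function names are hypothetical, and the histogram distance is assumed to be a root-mean-square difference over the K bins.

```python
import numpy as np

def average_histogram(histograms):
    """Mean over the histograms of one cluster; input shape (N_c, K)."""
    return np.asarray(histograms, dtype=float).mean(axis=0)

def distance_matrix(avg_histograms):
    """Pairwise distances between cluster average histograms.

    Assumption: root-mean-square difference over the K bins.
    """
    h = np.asarray(avg_histograms, dtype=float)
    diff = h[:, None, :] - h[None, :, :]
    return np.sqrt((diff ** 2).mean(axis=2))

def nearest_clusters(dist):
    """For each cluster, the index of its closest other cluster."""
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)  # a cluster is not its own neighbour
    return d.argmin(axis=1)
```

Plotting `nearest_clusters(D)` against the cluster index gives a diagram of the same kind as Figure 2 (with 0-based indices).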
3.1. Reflectance Spectra Coding
A first solution for mapping histograms into colors was to treat the histograms as reflectance functions of colored surfaces. In this way, we could code the audio content, and expect to obtain an implicit coding of audio similarity. The reflectance spectrum is a function defined on the domain of visible wavelengths that represents the percentage of incident light the surface reflects at each wavelength. The product of the surface reflectance and the spectral power distribution of the illuminant defines a color signal, which, entering the eye, is filtered by the photoreceptors to determine the perceived color. A basic model of color perception is therefore based on the filtering of the color signal by sensitivity functions characteristic of human vision. Similarly, a simple model of color generation in imaging devices is based on the filtering of the color signal by the transmittance functions of the camera filters. We employed this simple model to convert reflectance spectra into colors.

We treated the average histogram of a given cluster as a surface reflectance spectrum. Each histogram had 60 bins, which we made correspond to a spectrum in the range of 400 to 695 nm, with a sampling step of 5 nm. In this procedure, which we called reflectance spectra coding, a direct mapping could be performed, thus satisfying the first rule (similar histograms map to similar colors). The reflectance spectra were then converted into colors assuming the standard CIE D65 illuminant. The colors obtained are those shown in Figure 3.
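As a rough sketch of the conversion just described, the following Python code integrates a 60-sample reflectance spectrum against an illuminant and three sensitivity functions, then converts the tristimulus values to CIELAB with the standard formulas. The function names are hypothetical; the illuminant and the sensitivity functions are inputs that the caller must supply (for example tabulated D65 and CIE color matching functions resampled at 5 nm), and the histogram is assumed to be already scaled to the range [0, 1].

```python
import numpy as np

# 400, 405, ..., 695 nm: one wavelength per histogram bin (60 samples).
WAVELENGTHS = np.arange(400, 700, 5)

def spectrum_to_xyz(reflectance, illuminant, cmf):
    """Tristimulus values of a reflectance spectrum.

    reflectance, illuminant: shape (60,); cmf: shape (60, 3).
    Normalised so that a perfect reflector has Y = 100.
    """
    r = np.asarray(reflectance, dtype=float)
    s = np.asarray(illuminant, dtype=float)
    cmf = np.asarray(cmf, dtype=float)
    k = 100.0 / (s @ cmf[:, 1])
    return k * ((r * s) @ cmf)

def xyz_to_lab(xyz, white):
    """Standard CIELAB transform relative to the given white point."""
    delta = 6.0 / 29.0
    t = np.asarray(xyz, dtype=float) / np.asarray(white, dtype=float)
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)
    fx, fy, fz = f
    return np.array([116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)])
```

The white point passed to `xyz_to_lab` would be the result of `spectrum_to_xyz` for a reflectance of 1 at every wavelength under the same illuminant.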
Figure 3. The colors assigned by the reflectance spectrum representation to the set of 43 audio clusters.

This kind of representation has, however, two drawbacks. First, it cannot ensure that different histograms will be mapped to different colors; second, the perceived color difference may not be proportional to the cluster distance. In addition, we had to keep in mind that in information visualization it is the hue of the colors representing the data that conveys the information of "belonging to the same class", or of proximity. Consequently, the representation we chose had to map similar clusters to colors with the same hue.

To evaluate whether similar clusters were colored with similar colors, we computed the color coordinates in a perceptually uniform color space, namely CIELAB. We expected the clusters that Figure 2 indicates as close to be very close in the (a,b) plane. Given the Lab coordinates, we computed the matrix DE as in Equation 4. Then we looked for the closest clusters. The results are plotted in the diagram of Figure 4, where it is evident that this approach is not feasible: we rarely find pairs of closest clusters that are the same as those in Figure 2.

The reason why the rules of Equation 5 cannot be satisfied with this approach is that, if the clusters' average histograms are considered reflectance spectra of colored surfaces, the matrix of histogram distances D represents the degree of spectral match between the corresponding colors. The matrix DE, on the contrary, should represent distances in a suitable color space, such as CIELAB. To respect the rule of linearity between D and DE, we would have to be able to define a linear transform between the spectral error and the color error. Unfortunately, the definition of a linear relationship between a colorimetric distance such as the Euclidean distance in CIELAB and a spectral match metric is still an open question in color science.
Moreover, even if we were able to replicate the matrix D of cluster distances in a perceptually uniform color space, the solution would still not be an optimal one from the point of view of information visualization. In fact, the visual grouping of clusters based on their color attribute relies more on hue than on lightness or saturation.
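The comparison between D and DE discussed in this section can be sketched as follows. The code is only illustrative: the function names are hypothetical, the agreement score (fraction of clusters with the same closest cluster in both matrices) is our own summary of the visual comparison between the diagrams, and the least-squares estimate of the constant of Equation 5 is an assumption about how linearity could be checked.

```python
import numpy as np

def delta_e_matrix(lab):
    """Euclidean distances in CIELAB between all colours; lab has shape (n, 3)."""
    lab = np.asarray(lab, dtype=float)
    diff = lab[:, None, :] - lab[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def nearest(dist):
    """Index of the closest other item for each row of a distance matrix."""
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)
    return d.argmin(axis=1)

def nearest_neighbour_agreement(D, DE):
    """Fraction of clusters whose closest cluster is the same in D and DE."""
    return float(np.mean(nearest(D) == nearest(DE)))

def fit_alpha(D, DE):
    """Least-squares estimate of alpha in DE = alpha * D."""
    D = np.asarray(D, dtype=float)
    DE = np.asarray(DE, dtype=float)
    return float((D * DE).sum() / (D * D).sum())
```

An agreement of 1.0 means the diagram obtained from the colors reproduces the one obtained from the histograms exactly.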
Figure 4. The diagram represents the pairs of closest clusters according to the color distance of Equation 4 computed for the colors of Figure 3. On the x axis, the cluster index, and on the corresponding grid of the axis, the index of each closest cluster.

3.2. Hue based Perceptual Coding
In order to operate in a perceptual color space, and appropriately exploit the attribute of hue for color coding, we conducted our selection of colors in the CIELAB color space, limiting our search to those in the (a,b) plane. This amounts to searching for a spatial arrangement of colors within a perceptually uniform chromatic space, and is an MDS problem. In MDS, if the input data are Euclidean distances, the solution of the problem is equivalent to calculating the Principal Component Analysis (PCA) of the input dataset [3]. We performed PCA on the clusters' average histograms, and considered the coefficients of the first two principal components as coordinates in the CIELAB (a,b) plane.

To assign RGB colors to Lab data, we mapped the (a,b) coordinates into a plane of a color atlas at fixed lightness, using the Munsell Atlas of colors, a dataset of chips organized so that the perceived distance between adjacent colors is constant within the whole atlas. We scaled the (a,b) values in a way that prevented out-of-gamut colors, and mapped each color to the nearest one in the atlas according to the Euclidean distance. The result of this selection is displayed in Figure 5, while Figure 6 is the diagram of the pairs of nearest clusters according to distance DE (see Equation 4) of the selected colors. It is clear that this approach as well is hardly acceptable, as Figure 6 is very different from Figure 2.
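The PCA step can be sketched in Python with NumPy as below. This is a minimal illustration, not the original procedure: the function name and the `max_chroma` parameter are hypothetical, the uniform scaling to a chroma limit stands in for the gamut-preserving scaling mentioned above, and the mapping to Munsell atlas chips is omitted.

```python
import numpy as np

def pca_to_ab(avg_histograms, max_chroma=40.0):
    """Project average histograms on their first two principal components
    and scale the scores uniformly so they fit inside a chroma limit.

    avg_histograms: shape (n_clusters, K). Returns (a, b) pairs, shape (n, 2).
    """
    h = np.asarray(avg_histograms, dtype=float)
    centred = h - h.mean(axis=0)
    # Rows of vt are the principal axes of the centred data.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:2].T
    radius = np.sqrt((scores ** 2).sum(axis=1)).max()
    scale = max_chroma / radius if radius > 0 else 1.0
    return scores * scale
```

Because the scaling is uniform, ratios between distances in the (a,b) plane are those of the two-component projection; information carried by the remaining components is lost, which is one possible source of mismatch with the original distance matrix.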
Figure 5. The colors assigned by performing PCA on the clusters' average histograms.
Figure 6. The diagram represents the pairs of closest clusters according to the color distance of Equation 4 computed for the colors of Figure 5. On the x axis, the cluster index, and on the corresponding grid of the axis, the index of each closest cluster.

In order to find a better solution, to directly address the issue of the conversion between Lab and RGB coordinates, and to exploit the whole color gamut of the display device, we formulated the problem as an optimization problem in which the function to optimize was the difference between matrix D, given as input, and matrix DE of the color distances in CIELAB, considering only colors inside the display gamut. The variables were the coordinates (a,b) of the colors associated with the audio clusters; the display was characterized by its ICC profile. To solve the optimization problem we employed a genetic algorithm. In Figure 7 we report the colors selected following this approach, while Figure 8 plots the nearest clusters. This last approach produces a greater similarity with Figure 2.
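The source does not give the details of the genetic algorithm, so the following Python sketch should be read as one possible formulation rather than the one actually used. All names and parameters are hypothetical. Two simplifications are assumed: the constant of Equation 5 is fixed so that the largest histogram distance spans the diameter of the allowed region, and the display gamut test based on an ICC profile is replaced by a simple chroma limit in the (a,b) plane.

```python
import numpy as np

def pairwise(points):
    """Euclidean distance matrix between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def clip_to_gamut(ab, max_chroma):
    """Pull colours back inside a chroma disc (stand-in for a real gamut test)."""
    chroma = np.sqrt((ab ** 2).sum(axis=1, keepdims=True))
    return ab * np.minimum(1.0, max_chroma / np.maximum(chroma, 1e-12))

def evolve_colours(D, max_chroma=50.0, pop_size=40, generations=200,
                   sigma=3.0, seed=0):
    """Search (a, b) coordinates whose distances approximate alpha * D."""
    rng = np.random.default_rng(seed)
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    target = (2.0 * max_chroma / D.max()) * D  # fixed alpha (assumption)

    def cost(ab):
        return float(((target - pairwise(ab)) ** 2).sum())

    pop = [clip_to_gamut(rng.uniform(-max_chroma, max_chroma, (n, 2)), max_chroma)
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        parents = pop[:pop_size // 2]  # truncation selection; the best survive
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            mask = rng.random((n, 1)) < 0.5  # uniform crossover, per cluster
            child = np.where(mask, parents[i], parents[j])
            child = child + rng.normal(0.0, sigma, child.shape)  # mutation
            children.append(clip_to_gamut(child, max_chroma))
        pop = parents + children
    pop.sort(key=cost)
    history.append(cost(pop[0]))
    return pop[0], history
```

Because the best individuals are carried over unchanged, the best cost recorded in `history` never increases from one generation to the next. A real implementation would test each candidate color against the display gamut at the chosen lightness instead of clipping to a disc.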
Figure 7. The colors assigned by the genetic algorithm performing the selection on the (a,b) plane.
Figure 8. The diagram represents the pairs of closest clusters according to the color distance of Equation 4 computed for the colors of Figure 7. On the x axis, the cluster index, and on the corresponding grid of the axis, the index of each closest cluster.
4. CONCLUSIONS
The use of color in database interfaces can be of considerable help to the user in data accessing and browsing. We have focused here on the use of color to help access a database of audio files. The audio data were organized in clusters, and an innovative strategy was defined to implement a color coding of audio clusters. We exploited the visual attributes of colors at fixed levels of lightness to map the cluster distances in the histogram space into color distances in a perceptually uniform color space. This kind of representation can support the user in the browsing of audio clusters, allowing a rapid visual evaluation of the similarity of cluster contents without the need to listen to the audios.
5. REFERENCES
1. I. Gagliardi, P. Pagliarulo, Audio information retrieval in HyperMedia environment, Proc. sixteenth ACM conference on Hypertext and hypermedia, pp. 248-250, ACM Press, 2005.
2. J. T. Foote, Content-Based Retrieval of Music and Audio. In C.-C. J. Kuo et al., editor, Multimedia Storage and Archiving Systems II, Proc. of SPIE, Vol. 3229, pp. 138-147, 1997.
3. M. D. Fairchild, Color Appearance Models, 2nd edition, Wiley, 2005.
4. K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, Academic Press, 1979.