Int. J. Arts and Technology, Vol. 9, No. 1, 2016

An evolutionary algorithm to create artificial soundscapes of birdsongs

José Fornari
Interdisciplinary Nucleus for Sound Communication (NICS), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
Email: [email protected]

Abstract: Birdsongs are an integral part of many landscapes, in urban and countryside areas. Together they constitute an ecological network of interacting sonic agents, self-organised into an open complex system with similar cognitive aspects yet original acoustic content. This work presents a preliminary study and development of an evolutionary algorithm (EA) used here for the generation of virtual birdsongs that create an artificial sonic landscape; a soundscape of birdsongs. The birdsongs are reproduced by genetic operators that build sequences of parameters to control instantiations of a computer model that emulates a bird syrinx. Such models are capable of synthesising a wide range of realistic birdsongs that together compose a dynamic network of artificial bird calls. The system can also be interactive, as external input data can be received in real time through instant text messages from the micro-blog Twitter. These messages are mapped as new individuals living in the EA system population set. As further described, by means of an aimless evolutionary process, the EA system presented here is capable of creating realistic artificial soundscapes of birdsongs.

Keywords: evolutionary algorithm; soundscape; computer model; birdsongs.

Reference to this paper should be made as follows: Fornari, J. (2016) 'An evolutionary algorithm to create artificial soundscapes of birdsongs', Int. J. Arts and Technology, Vol. 9, No. 1, pp.39–58.

Biographical notes: José Fornari has been, since 2008, a full-time Researcher at the Interdisciplinary Nucleus for Sound Communication (NICS) at the University of Campinas (UNICAMP). He completed a PostDoc (2008) in Music Cognition with the Music Cognition Group at the University of Jyväskylä, Finland, and another PostDoc (2007) in Evolutionary Sound Synthesis at NICS/UNICAMP. He finished his PhD in Electrical Engineering (EE) in 2003 at the Faculty of Electrical Engineering and Computation (FEEC), UNICAMP. In 1996, he was a Visiting Scholar Researcher at the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University. In 1994, he finished his Master in EE at FEEC/UNICAMP. He holds two Bachelor degrees: the first (1990) in Electrical Engineering from FEEC/UNICAMP, and the second (1994) in Popular Music, Modality: Piano, from the Institute of Arts (IA)/UNICAMP.

This paper is a revised and expanded version of a paper entitled 'A computational environment for the evolutionary sound synthesis of birdsongs' presented at the 1st International Conference and 10th European Event on Evolutionary and Biologically Inspired Music, Sound, Art and Design, Malaga, Spain, 11–13 April 2012.

Copyright © 2016 Inderscience Enterprises Ltd.

1 Introduction

The amount and variety of places where birdsongs can be found and heard is remarkable. These chunks of acoustical information are exchanged between birds, whose identity can even be analysed by specific sonic aspects of their birdcalls. Together they create a sonic network that forms a natural landscape of sounds, known as a soundscape. This term was coined by Murray Schafer and refers to an immersive sonic environment. Soundscapes are immediately perceived by listeners, who require no training or expertise to recognise them. Most of the time listeners are also immersed in the soundscape and consequently become agents that take part in its creation (Schafer, 1977). Such organic-like sonic textures are effortlessly recognisable by means of the automatic sound cognition processes of our mind. Yet, in terms of their acoustical aspects, soundscapes are constantly changing and virtually never repeated. Schafer mostly worked with natural soundscapes, such as the ones found in forests, near waterfalls, or by seashores. However, these natural soundscapes are nowadays merged with man-made soundscapes, such as the ones generated by operating machines, traffic jams and crowds. Instead of being mutually exclusive, they are blended together, creating new types of soundscapes, which can be seen as an immersive cybernetic sonic environment, ubiquitously found in most areas that humans inhabit (Wiener, 1968). This intertwinement of organisms and mechanisms – both acting as the agents that constitute an open complex system – creates a sonic environment with emergent regularities that are acoustically new and cognitively similar. This work presents an evolutionary system that aims to emulate part of the natural emergent sonic capacity of soundscapes. For that, an evolutionary computer model was used here to generate some of the natural characteristics of a true soundscape.
This system is able to create a stream of sounds that is, at the same time, similar and novel. This work presents the development of this system; an introductory evolutionary algorithm (EA) designed to create artificial soundscapes of birdsongs.

1.1 Inspired by nature

The physiological apparatus that allows birds to generate sounds of such perceptual diversity and acoustic richness is highly sophisticated. Its core is a tiny organ known as the syrinx, which is roughly the equivalent of the human larynx. Several researchers have developed and presented computer models emulating some of the sonic behaviour of a syrinx, in an attempt to understand and emulate its sonic properties. Examples of such works are found in Mikelson (2000) and Farnell (2010). However, a syrinx computer model has a large number of independent control parameters that need to be properly set in order to generate a specific bird-like sound. This can make the exploration of new artificial birdcalls by manual tweaking very difficult, counter-intuitive and cumbersome. Indeed, the simultaneous control of a large number of parameters is a hard task for formal (deterministic) mathematical models (such as a system of linear equations) controlled by typical gestural interfaces (such as the computer mouse and keyboard).


On the other hand, natural evolutionary systems, such as the human motor cortex, are capable of easily performing similar tasks, such as simultaneously controlling the large number of parameters involved in each human gesture. The control of all body parts, joint rotations, limb displacements, and so forth, is a task that is extremely hard for a deterministic computer model to perform. However, an evolutionary approach can successfully handle complex problems of this category. The control of the multiple body parts that compose a movement and the control of the multiple syrinx models that compose a soundscape of birdsongs are both complex tasks involving a very large number of parameters to be controlled simultaneously. The system presented here intends to reach this goal by simultaneously controlling several syrinx models, while also handling the dynamic processes of reproduction and selection of individuals, which altogether generate the artificial soundscape of birdsongs. The control of a large number of independent parameters in the pursuit of solving a complex unbounded problem has been approached by the use of adaptive computer models. A significant part of the research in this area came from the computing field known as artificial intelligence (AI). Like many other fields of human knowledge, AI was also inspired by the direct observation of natural strategies of problem solving. Although there are many others, Eiben and Smith (2007) mention two fundamental sources of inspiration:

1 the human brain
2 natural evolution.

From the study of the human brain came the development of neural networks and non-supervised methodologies to deal with complex systems, such as the research in the field known as artificial neural networks (ANN). From the observation of natural evolution – as described by the Darwinian theory of the natural evolution of biological species – new computing methodologies were developed, such as the EA approach used in this work. Here, however, the EA approach is used in a novel manner. Instead of guiding the evolutionary process towards generating and finding the best possible solution, here the evolutionary process is aimless. There is no final goal to be reached or specific problem to be solved, other than maintaining the continuous evolutionary process of soundscape generation. All solutions generated (the birdsongs) are simultaneously part of the artificial soundscape. This EA system is said to be aimless because it is not trying to find unique solutions, as there is no specific solution that is the best one. What is important here is not reaching a final goal but keeping an evolutionary process running. In other words, the result of our EA system is the evolutionary process in itself. The current implementation of our EA system controls up to 20 individuals in a varying-size population set. Each individual is an instantiation of a syrinx model (as described in Section 2), controlled by a sequence of 16 parameters; the genotype of each individual. Thus, the artificial soundscape of birdsongs is created by all individuals in the population set.
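The aimless evolutionary loop described above can be sketched in a few lines of code. This is a minimal illustration, not the authors' Pd implementation: the function names, lifespan values and proliferation probability below are hypothetical, and pair selection is random here rather than proximity-based.

```python
import random

GENOTYPE_SIZE = 16   # parameters per individual, as stated in the paper
MAX_POPULATION = 20  # population cap of the current implementation

def random_genotype():
    """A genotype is a sequence of 16 normalised synthesis parameters."""
    return [random.random() for _ in range(GENOTYPE_SIZE)]

def reproduce(a, b, crossover=0.5, mutation=0.05):
    """Weighted average of parents' genes, plus bounded multiplicative mutation."""
    child = [crossover * ga + (1 - crossover) * gb for ga, gb in zip(a, b)]
    return [min(1.0, max(0.0, g * (1 + random.uniform(-mutation, mutation))))
            for g in child]

def step(population, lifespans, proliferation=0.3):
    """One aimless evolutionary step: age, die, maybe reproduce. No goal."""
    survivors = [(g, t - 1) for g, t in zip(population, lifespans) if t > 1]
    population = [g for g, _ in survivors]
    lifespans = [t for _, t in survivors]
    if 2 <= len(population) < MAX_POPULATION and random.random() < proliferation:
        a, b = random.sample(population, 2)  # proximity-based in the paper
        population.append(reproduce(a, b))
        lifespans.append(random.randint(5, 15))  # assumed lifespan range
    return population, lifespans

# Seed with two genotypes, as the paper requires, then run indefinitely.
pop, life = [random_genotype(), random_genotype()], [10, 10]
for _ in range(100):
    pop, life = step(pop, life)
```

Note that the loop has no termination criterion tied to fitness: the running process itself is the output, matching the paper's notion of an aimless EA.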


1.2 The computing environment

Our EA system was implemented in Pd (http://www.puredata.info); a free, open-source, multi-platform software environment designed for the programming of real-time data processing. We used an enhanced (also free) version of Pd, named 'Pd-extended', which can handle several types of data, such as control, audio, image and video, to create computer models for media analysis, transformation and synthesis. Individuals were programmed as a separate model. Each individual is an instantiation of this model: the artificial syrinx model; a procedural physical modelling sound synthesis, controlled by the sequence of 16 parameters (i.e., the genotype). Each instantiation controlled by one genotype generates the sonic behaviour of one perceptually unique birdsong, which becomes that individual's phenotype. Instead of using audio samples recorded from real birds singing – which would in fact make it impossible to create a true artificial soundscape of birdsongs – this work uses an artificial syrinx model, which allows full control of its sonic features and consequently the creation of a virtually infinite amount of realistic and distinct birdsongs. Therefore, in the work presented here, there are no audio recordings of actual birdsongs, or of any other sort, nor any type of permanent data being stored. Nevertheless, most of the time, the artificial soundscape generated by our EA system brings about sonorities that many listeners consider very realistic, sometimes even leading them to mistakenly believe that our implementation embeds audio data from real birdsongs.
This EA system has total control over the creation and selection of new individuals. It can also generate birdsongs that are quite distinct, yet still keep an inheritable similarity with their predecessors, which binds them together as belonging to the same population set; the balance between novelty and similarity is thus always maintained. For that reason, each individual has its own genotype, represented by a text file containing a sequence of parameters to control one instantiation of the physical modelling sound synthesis. A slight change of values in this genotype corresponds to a clear perceptual modification in the generated birdsong: the sound of a birdsong generated by this computer model, as perceived by the human auditory system, varies significantly when its genotype is changed. Inside the population set, individuals are born, reproduce in pairs and – after completing their lifespan – die. At each instant, the sound generated by all 'alive' individuals creates the unique soundscape. To start the system, at least two individuals' genotypes are required. They can be randomly generated or provided by the user, as specific birdsong sonorities chosen to start the evolutionary process. A single pair of individuals, by means of the reproduction process, is enough to generate a steady yet variable number of individuals in the population set. The current implementation of our EA system has four global control parameters:

1 recombination rate (or crossover rate)
2 mutation rate
3 lifespan rate (how long, on average, each individual will remain alive)
4 proliferation rate (how fast each pair of individuals will reproduce).


By default, these parameters are set for a steady generation of overlapping birdsongs that hold enough of their sonic identity while remaining novel – although, as further explained, this system will virtually never create clones. These parameters can be changed while the system is running, in order to let the user explore new and unusual sonorities emerging from different parametrisations of the artificial soundscape being generated.

1.3 Tweeting genotypes

A fundamental condition for true emergence of self-organisation in a complex system is to allow internal and external agents to act on it, thus turning it into an open system (Holland, 2006). In order to turn the population set of our EA system into an open system, it has to be able to receive external data. The chosen way to receive external data, also making this an interactive EA system, was through data input from Twitter; the famous internet micro-blog social network service (http://www.twitter.com). By sending messages to a specific Twitter account linked to our EA system, it is possible to insert into it new genotypes, mapped from the incoming Twitter text messages, which thus become new virtual birdsongs, as further explained. Interestingly enough, birdsongs also inspired the creation of Twitter. This micro-blog had its name chosen after a metaphor comparing bird tweets with the small text messages exchanged among users of this social network. Together they create a single contextual meaning for groups of small text messages. In the interview referenced in Dorsey (2009), the creator of Twitter compares this micro-blog with a soundscape of birdsongs. He says that, "in nature, chirps of birds may initially sound like something seemingly devoid of meaning, order or intention; however, the context is inferred by the cooperation between these birds, as individuals that each one can transmit (by singing) and receive (by listening to the songs) data (birdsongs) with each other. The same applies to Twitter, where many messages, when taken out of context, seem as being completely random or meaningless, but in a thread of correlated messages, they gain significance that unifies them into a single context."

The work presented here followed a similar path during its development. When receiving Twitter messages, our EA system maps their text characters into a new individual's genotype. The entire EA system was implemented as a Pd patch; a modular, reusable unit of code written in Pd, forming a standalone program. The individuals were implemented as a separate Pd patch that acts as a sub-patch for the main EA patch. This is given by a Pd encapsulation mechanism known as 'abstraction'. Each individual within the population set is an instantiation of this abstraction. Therefore, each Twitter message received by our EA system makes it instantiate a new individual in the population set, controlled by the genotype mapped from that Twitter message. By inserting a new individual into the population set, the user creates not only a new birdsong but also influences the evolutionary process of the entire EA system, since this individual will eventually participate in the reproduction process.


Reproduction occurs in pairs of individuals, chosen by proximity and proliferation rate. At each time interval set by the proliferation rate, the closest pair of individuals is chosen to participate in the reproduction process. The selection process is in charge of eliminating individuals whose genotype is too different from the average genotype of the population set. This helps to keep the number of individuals within the population set under 20 (as said before, the maximum number of individuals allowed in the current implementation). It also contributes to maintaining the entire population set within an approximate cognitive similarity (i.e., the individuals' phenotypes will be alike). It is important to notice that this EA system does not necessarily require external input data to create an artificial soundscape of birdsongs. Through the action of the mutation operator, an artificial soundscape can be successfully generated by running our EA system even without interactivity (i.e., without receiving external data from Twitter messages). The contribution of interactivity in this current implementation is still secondary. We plan to further explore the sonic contributions of interactivity in future and more complex implementations. Also as an enhancement, a simple visual feedback for the population set was built. In this graphical interface, individuals are represented by numbers (from 1 to 20), depicted as random-walk icons inside a square plane; a window on the computer screen. Through this graphical representation, we can see interesting moments of the evolutionary process, such as individuals getting closer (thus prone to reproduce) or disappearing (when they die). This currently intends to offer complementary information through the real-time visualisation of the soundscape behaviour while it is being generated (and heard).
This may enhance the possibility of an immersive experience for listeners, considering that in a natural soundscape sonic information is most of the time accompanied by its visual reference. This graphical implementation was also programmed in Pd-extended, using the GEM library, embedded in the main EA system patch. As further described, the implementation presented here is a simple yet unusual aimless EA system. Instead of trying to find a final solution for a complex problem, our system constantly generates a variety of original yet similar solutions with the same aesthetic goal: the creation of a soundscape of birdsongs. As often observed in nature and the arts, in this EA system there are no problems to be solved but solutions to be created.

2 Emulating birdsongs

Songbirds belong to the biological order known as Passeriformes. This group is very large and diverse, formed by approximately 5,400 species, representing more than half of all known birds. They are divided into two sub-groups:

1 Tyranni (Suboscines, also known as 'shouter birds')
2 Passeri (Oscines, also known as 'singing birds').

Both have a syrinx as the main organ responsible for the creation of their birdcalls (Clarke, 2004).


Unlike humans, birds can independently control their lungs, which allows them to inhale with one lung while exhaling with the other. This allows them to simultaneously sing and breathe, so they can generate very long melodies; far beyond the volumetric capacity of their tiny lungs. In the anatomy of birds, the syrinx corresponds to the larynx in mammals. The syrinx has three groups of muscles that can be independently controlled; one for the trachea and two others for the bronchi. By constricting and expanding these muscles, birds can modify the anatomical aspects of the syrinx, thus modifying the sound generated by it, in a broad range of perceptual possibilities. Inside the syrinx there is a membrane suspended in a cartilaginous cavity; the tympanic membrane. This is placed on top of an inflated air sac; the clavicular sac, which lets the membrane move freely sideways. This is the main oscillator of the syrinx and can be compared with the reed of a woodwind musical instrument, such as an oboe. Birds can also control the flux of air flowing in the trachea, which passes through the clavicular sac and each bronchus. They can also control the sturdiness of the tympanic membrane, by the action of minute lateral and medial muscles located in it, similar to the ones found in human lips (Doupe, 1999). Figure 1 shows the major parts of a syrinx, depicting the three groups of muscles and the tympanic membrane, where the sound of a birdsong is initially generated.

Figure 1  Basic diagram of a syrinx

There are several computer models developed to emulate syrinx behaviour (Larsen and Goller, 1999). Our work uses the one created by Hans Mikelson (2000), originally developed in the Csound programming language. This algorithm was later improved and implemented as a Pd patch by Farnell (2010), who created an algorithm that emulates the entire birdsong (timbre generation and melodic phrase). Figure 2 shows a simplified version of the algorithmic structure of the Pd patch used in the syrinx emulation. This is a basic version of this procedural physical modelling sound synthesis code. Physical modelling is a sound synthesis technique that emulates, through dynamic equations, the physical properties and behaviour of a sound source (Smith, 2006). Figure 2 also shows the dynamic equation of this physical modelling sound synthesis of the syrinx. It is used as part of the Pd abstraction sub-patch of the individual, whose instantiations create all individuals in the population set of our EA system.


Figure 2  Pd patch and corresponding equation of a simplified physical modelling version of the syrinx, where three sine-wave oscillators (objects osc~) are controlled by five parameters (A1, F1, A2, F2 and Fi)

As seen in Figure 2, the core of the syrinx model requires only five parameters to create the timbre of a birdsong. The other 11 parameters – of the total of 16 elements of the genotype – are used to control the creation of the melodic phrase of a birdsong, as further explained.
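To illustrate how such a five-parameter core can be driven from code, the sketch below combines two amplitude-weighted sine oscillators (A1/F1 and A2/F2) whose frequencies are swept by a third, slower oscillator at rate Fi. This is a loose approximation for illustration only, not a reproduction of the Mikelson/Farnell patch; the modulation depth, sample rate and function names are assumptions.

```python
import math

SR = 44100  # assumed sample rate

def syrinx_core(a1, f1, a2, f2, fi, dur=0.5, mod_depth=0.3):
    """Toy two-oscillator pair, frequency-modulated at rate fi.

    a1, f1 / a2, f2: amplitude and frequency of the two oscillators;
    fi: frequency of the modulating oscillator. All five values would
    come from the genotype in the paper's system.
    """
    samples = []
    for n in range(int(dur * SR)):
        t = n / SR
        mod = math.sin(2 * math.pi * fi * t)  # slow modulator at rate fi
        s = (a1 * math.sin(2 * math.pi * f1 * (1 + mod_depth * mod) * t)
             + a2 * math.sin(2 * math.pi * f2 * (1 + mod_depth * mod) * t))
        samples.append(s / (abs(a1) + abs(a2) or 1))  # rough normalisation
    return samples

# Hypothetical parameter values for one chirp-like timbre.
chirp = syrinx_core(a1=0.8, f1=3200.0, a2=0.4, f2=4100.0, fi=18.0)
```

Varying the five values over time, as the genotype-driven melodic-phrase parameters do in the full model, is what turns this static timbre into a complete birdsong.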

3 The EA

EAs have been used as a non-supervised approach to problem solving. EA is a subset of evolutionary computation (EC); an adaptive computing methodology inspired by the biological strategy of automatically searching for the best possible solution to a generic and often complex problem (Eiben and Smith, 2007). Such methods are commonly used in the attempt to find the best solution for an unbounded problem, especially when there is insufficient information to model it using formal (deterministic) computational methods. Different from the typical EA usage, the generation of an artificial soundscape is not an optimisation problem. There is no evolutionary search towards a single best solution, since there is no actual problem to be solved at the end of the evolutionary path.


Instead, the system is designed to maintain a steady process of creating similar and variant solutions. Thus, this EA system does not deal with the reduction of convergence time (Asoh and Muhlenbein, 1994). As it is, the convergence time of our EA system can be seen as limitless. In typical EA applications, convergence time is an obstacle that can eventually be minimised but never eliminated, as a computer model will always require a time duration (often above the designer's expectations) to evolve possible solutions and find the best one. Thus, typical EA systems frequently have problems operating in real time. In this work, however, our EA system keeps up a steady generation of solutions (birdsongs), and all of them are used as part of the soundscape, since it is formed by the sonic merging of all birdsongs. Thus, our EA system has no trouble operating in real time because its convergence time, instead of being very small, is infinite. Our EA system carries on the evolutionary process indefinitely and takes advantage of one interesting evolutionary byproduct: on the evolutionary path created by the action of the reproduction and selection processes, new solutions are created but usually not repeated (no clones), which is particularly interesting in terms of generating true soundscapes, where sounds are also usually not repeated. The concept of using an EA system to create a soundscape belongs to a thread of previous works. The most influential ones are:

1 Vox Populi; a system able to generate complex musical phrases and harmony by using genetic operators (Moroni et al., 2000)

2 Roboser; a system created in collaboration with the SPECS UPF group, in Barcelona, that uses adaptive control distribution to develop a correlation between adaptive behaviour and robotic algorithmic compositions (Manzolli and Verschure, 2005), and most importantly

3 ESSynth, the evolutionary synthesis of sound segments (waveforms); an EA method that uses waveforms as individuals within a population set manipulated by reproduction and selection processes, with a fitness function given by a distance measurement over the perception of acoustic aspects, known as psychoacoustic features (Fornari et al., 2008).

ESSynth was used in several artwork installations. For instance, it was used to create RePartitura; a multimodal evolutionary artwork installation based on a synaesthetic computational system that mapped graphic objects from a series of conceptual drawings into sound objects that became dynamically evolving individuals in the population set of an EA system (Manzolli et al., 2010). The first version of ESSynth already showed the potential of generating sound segments that are perceptually similar but never identical, which is, as said before, one of the fundamental features of a natural soundscape. This system was later expanded to include parameters of spatial sound location for each individual, thus allowing the creation of a more realistic soundscape, and also the implementation of a sexual (in pairs) reproduction process, now done through pairs of genderless individuals instead of in an asexual manner, such as mitotic reproduction (Fornari et al., 2009). Both features (spatial sound location and sexual reproduction) are also implemented in the current version of our EA system. The implementation of our EA system was developed as a Pd patch named 'evopio.pd'. As said, individuals are instances of a Pd abstraction named 'ind.pd'. Each instance of ind.pd generates an individual, which corresponds to a birdsong belonging to the population set inside evopio.pd. Each instantiation is an independent physical modelling synthesiser of a syrinx. Each genotype is stored as a text file within a folder accessed by evopio.pd, each file corresponding to a single instantiation of ind.pd, manipulated by evopio.pd. Details of the genotype implementation are described in the next section.
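The genotype-file scheme can be sketched as follows; file names, number formatting and folder layout here are hypothetical, as the paper does not specify how evopio.pd formats these text files.

```python
import os
import tempfile

GENOTYPE_SIZE = 16

def save_genotype(folder, name, genes):
    """Store one individual's genotype as a whitespace-separated text file."""
    path = os.path.join(folder, f"{name}.txt")
    with open(path, "w") as f:
        f.write(" ".join(f"{g:.6f}" for g in genes))
    return path

def load_genotypes(folder):
    """Read back every genotype file; each one drives one syrinx instance."""
    population = {}
    for entry in sorted(os.listdir(folder)):
        if entry.endswith(".txt"):
            with open(os.path.join(folder, entry)) as f:
                population[entry[:-4]] = [float(x) for x in f.read().split()]
    return population

# Two seed individuals, as required to start the evolutionary process.
folder = tempfile.mkdtemp()
save_genotype(folder, "ind-01", [0.5] * GENOTYPE_SIZE)
save_genotype(folder, "ind-02", [0.25] * GENOTYPE_SIZE)
pop = load_genotypes(folder)
```

Because each file corresponds to one living individual, deleting a file models death and writing a new one models birth, which mirrors the transient, non-permanent storage described above.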

3.1 Genes, chromosomes and genotypes

The 16-element sequences that control each instantiation of ind.pd each represent a single and unique genotype. However, in the current implementation, the genotype of our EA system is compounded of one single chromosome; therefore, it is also seen here as a chromosome. The system temporarily stores these sequences as text files, in a folder containing all genotypes of the individuals currently alive in the population set. Each element of the sequence is seen here as a gene, which corresponds to one single parameter of the physical modelling synthesis (syrinx model) responsible for the birdsong generation. In the current implementation, there is still no gender assigned to individuals, nor any dominance-recessiveness chromosomic hierarchy. Therefore, in our EA system, the 16-element chromosome controls the entire birdsong along its lifespan. When using external data from Twitter messages to inject new genotypes into the population set, each message is mapped into a new genotype. Since each Twitter message can have up to 140 ASCII characters, these are currently mapped into a single 16-element genotype. The ASCII characters of a Twitter message can easily be mapped to integers between 0 and 127, each number corresponding to a specific ASCII character. For instance, the message 'Hello World' corresponds to the numeric sequence '72 101 108 108 111 87 111 114 108 100'. Then, each number of the numeric sequence can be normalised from 0 to 1 and subdivided into sequences of 16 elements, each one corresponding to a chromosome. As a Twitter message can have up to 140 characters, each message can be mapped into up to eight chromosomes. For simplicity, this current implementation uses only the first chromosome of each message. In future implementations, the other chromosomes will be used, especially to handle multi-gender and polyploid genotypes.
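The mapping just described can be sketched as follows. The function name is hypothetical, and, following the paper's 'Hello World' example (which omits the space character), non-alphanumeric characters are assumed to be dropped; how the actual implementation handles messages shorter than 16 usable characters is not specified, so a trailing incomplete group is simply discarded here.

```python
GENOTYPE_SIZE = 16

def message_to_chromosomes(message):
    """Map a tweet (up to 140 ASCII chars) to 16-gene chromosomes in [0, 1].

    Characters are converted to their ASCII codes (0-127), normalised,
    and split into consecutive 16-element chromosomes; any trailing
    incomplete group is discarded (an assumption of this sketch).
    """
    codes = [ord(c) for c in message if c.isalnum() and ord(c) < 128]
    genes = [c / 127 for c in codes]
    return [genes[i:i + GENOTYPE_SIZE]
            for i in range(0, len(genes) - GENOTYPE_SIZE + 1, GENOTYPE_SIZE)]

# Reproducing the paper's example: 'Hello World' with the space dropped.
codes = [ord(c) for c in "Hello World" if c.isalnum()]
# codes == [72, 101, 108, 108, 111, 87, 111, 114, 108, 100]
```

A 140-character message yields at most eight full 16-element chromosomes (floor(140/16) = 8), of which the current implementation keeps only the first.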
In this work, although reproducing in pairs, all individuals are genderless and haploid. They control all parameters of the procedural synthesis of a birdsong, as described by the Pd model in Farnell (2010). This is an extension of the syrinx model that also handles the articulation of the throat (tracheal cavity) muscles and beak; thus, not only the characteristic timbre of each birdsong is parameterised by each chromosome, but also the entire melodic phrase that corresponds to the birdsong. The 16 genes that compound the single chromosome of the individual's genotype are:

1 Ba: beak articulation (controls the beak openness rate)
2 Rt: random 'tweet-ness' (controls the rate of the tweet random generator)
3 Ff: frequency of the first formant (for the first bronchus in the syrinx)
4 Af: amplitude of the first formant (for the first bronchus in the syrinx)
5 Fs: frequency of the second formant (for the second bronchus in the syrinx)
6 As: amplitude of the second formant (for the second bronchus in the syrinx)
7 F0: fundamental frequency (for the entire birdsong)
8 Fe: fundamental extent (fundamental sweep extent, for the entire birdsong)
9 Fm: fundamental frequency modulation amount
10 Fb: fundamental frequency modulation base
11 Ft: frequency of the first tracheal formant
12 At: amplitude of the first tracheal formant
13 Fj: frequency of the second tracheal formant
14 Aj: amplitude of the second tracheal formant
15 Tr: trachea resonance rate (trachea filter resonance)

Figure 3 depicts the organisational sequence of the 16 genes in the single chromosome sequence that constitutes the genotype.

Figure 3  The single chromosome sequence of one artificial birdsong genotype

3.2 Fitness function

Since our EA system conducts an aimless evolutionary process to generate an artificial soundscape of birdsongs, in theory this system would not require a fitness function. However, in order to help the evolutionary process maintain a closer sonic similarity while avoiding the occurrence of super-population, a fitness function was also used here. It calculates a psychoacoustic distance (D), as explained in Fornari et al. (2008). D is given by the Euclidean distance between the values of three psychoacoustic descriptors:

1 loudness (L), the perception of sound intensity
2 pitch (P), the perception or clarity of a fundamental frequency
3 spectral centroid (S), the median of the frequency distribution in the sound spectrum.

D is given by the following equation:

D = sqrt((L1 − L2)^2 + (P1 − P2)^2 + (S1 − S2)^2)  (1)

The psychoacoustic parameters L, P and S can be easily calculated from lower-level acoustic descriptors commonly found in music information retrieval (MIR), such as the ones described in Fornari and Eerola (2009). The selection process calculates Di, the psychoacoustic distance of each newborn individual created in the population set, and also Dp, the average D of all individuals in the population set. An individual whose |Di − Dp| is larger than a threshold T is marked not to participate in the reproduction process, which means that this individual will live its entire lifespan in the population set but will not pass its genetic traits on to further generations. In the current implementation T is hardcoded as T = Dp, which means that a newborn individual with Di > 2·Dp will not participate in the reproduction process.
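Equation (1) and the selection rule above can be sketched as follows in Python. This assumes the three descriptor values have already been extracted for each individual; the function names are hypothetical, introduced only for this sketch.

```python
import math

def psychoacoustic_distance(a, b):
    """Equation (1): Euclidean distance between two
    (loudness, pitch, spectral centroid) triples."""
    L1, P1, S1 = a
    L2, P2, S2 = b
    return math.sqrt((L1 - L2) ** 2 + (P1 - P2) ** 2 + (S1 - S2) ** 2)

def may_reproduce(Di, Dp, T=None):
    """Selection rule: an individual whose |Di - Dp| exceeds the
    threshold T is barred from reproduction. With the hardcoded
    default T = Dp, the bar falls on individuals with Di > 2 * Dp."""
    if T is None:
        T = Dp
    return abs(Di - Dp) <= T
```

For example, an individual at distance Di = 2.5 from the origin when the population average is Dp = 1.0 would be excluded, while one at Di = 1.5 would still reproduce.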

3.3 Genetic operators

The reproduction process in this EA system uses the two classic genetic operators:

1   recombination (or crossover)
2   mutation.

Acting together, they generate a new individual genotype out of the genotypical information of a pair of individuals in the population set. As said, all individuals in the current implementation are genderless and their genotypes are made of one single haploid chromosome.

Recombination creates a new chromosome by calculating the weighted average of the respective genes in each chromosome of the pair of individuals, according to the recombination rate. Pairs are chosen by the EA system to reproduce according to their mutual proximity in the population set, which is calculated from their virtual sound locations. In order to have a more realistic soundscape, we emulated each individual's location in a virtual space by using two simple strategies: inter-aural time difference (ITD) and inter-aural level difference (ILD) (Fornari et al., 2009). By varying these location parameters – which is done automatically by the system in this current version – the birdsongs are actually heard as if their sounds were moving around a horizontal plane. To hear this effect, the computer running this system needs to be connected to a stereophonic (two-channel) sound system; the effect is even more realistic through headphones.

The mutation operator inserts weight variability into the new chromosome by multiplying each gene value of the new chromosome by random variables bound to a mutation rate.

Let us suppose that there is a pair of individuals whose chromosomes, A and B, are at a certain moment the closest ones in the entire population set (in terms of the spatial sound location parameters of their respective individuals). If the proliferation rate is such that it requires the system to perform a reproduction, then A and B are chosen to create a new individual chromosome, C. This new chromosome is calculated gene by gene over the 16-element sequences of A and B, with a weight determined by the recombination rate, tr. This is a scalar real value between –1 and 1 that determines how A and B will be mixed into C. Disregarding the action of the mutation operator: if tr = –1, chromosome C is identical to A; if tr = 1, C is identical to B; and if tr = 0, each gene of C is the arithmetic average of the corresponding genes of A and B. If recombination were the only operator, the birdsongs would at some point tend to repeat themselves, as there would be no variability inserted into the population set (also considering that the system did not receive any external genotype input data). By default, tr = 0, which delivers a uniform mixing of chromosome pairs. Equation (2) shows the calculation of the recombination operator for the ith gene:

       ⎧ ((1 − tr)·Ai + (1 + tr)·Bi) / 2,   tr < 0
  Ci = ⎨ (Ai + Bi) / 2,                     tr = 0
       ⎩ ((1 + tr)·Bi + (1 − tr)·Ai) / 2,   tr > 0

(2)
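The recombination step can be sketched in a few lines of Python. The linear-blend form below is written to match the boundary behaviour stated above (tr = −1 reproduces A, tr = +1 reproduces B, tr = 0 averages them); it is a sketch of equation (2), not the Pd implementation itself.

```python
def recombine(A, B, tr=0.0):
    """Gene-wise weighted blend of two equal-length chromosomes.
    tr in [-1, 1]: -1 -> copy of A, 0 -> arithmetic average, +1 -> copy of B."""
    assert len(A) == len(B)
    return [((1 - tr) * a + (1 + tr) * b) / 2 for a, b in zip(A, B)]
```

Note that a single linear formula covers all three branches of equation (2), which is why the default tr = 0 simply yields the gene-wise average of the pair.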


What guarantees that there will be no repetition of birdsongs (or, at least, that it will be extremely rare) is the action of the mutation operator. As already suggested, the mutation operator is responsible for inserting novelty into the new genotypes, thus helping to avoid the occurrence of clones in the population set. Its action is regulated by the mutation rate, tm, which varies between 0 and 1 and determines the amount of variability that will be inserted into the genotype of a new individual. This variation is given by the product of each gene in the 16-element genotype sequence with the corresponding element of another 16-element sequence of random real values (known as the novelty sequence) ranging between [(1 − tm), 1]. If tm = 0, the novelty sequence is equivalent to a sequence of ones, so no variability is inserted into the new chromosome, since the products of the gene values by ones are equal to the original values. If tm = 1, the sequence of C will be multiplied by a novelty sequence of random values ranging from 0 to 1; the resulting chromosome will thus also be a random sequence, which means that all genotypical information of the original chromosome is lost, as there will be no traces of the chromosome previously calculated by the recombination operator. For that reason, it is advisable to keep the mutation rate small. By default, our system has tm = 0.1. This way only 10% of novelty is inserted into the new genotypes, while most of the information related to the sonic aspects of the parents is retained. Equation (3) shows the calculation of the mutation operator, where rand is a random variable ranging from 0 to 1, and i is the ith gene of the 16-element chromosome sequence:

C′i = (1 − tm·rand) · Ci

(3)
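A corresponding Python sketch of equation (3) follows; the `rng` parameter is introduced here only to make the sketch testable with a fixed draw, and is not part of the original system.

```python
import random

def mutate(C, tm=0.1, rng=random.random):
    """Equation (3): multiply each gene by a factor drawn from [1 - tm, 1].
    tm = 0 leaves the chromosome untouched; tm = 1 fully randomises it."""
    return [(1 - tm * rng()) * c for c in C]
```

With the default tm = 0.1, every gene of the recombined chromosome is attenuated by at most 10%, preserving most of the parents' sonic character.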

Both rates (recombination and mutation) are global controls of our EA system. They are continuous variables that can be dynamically modified by the user while the evolutionary process is running. This allows the user to explore new evolutionary paths (and consequent sonorities) of the artificial soundscape being generated.

Other important global controls are the lifespan and proliferation rates. The lifespan rate controls the average lifespan of each individual in the population set. For each individual, the system by default includes a random variation of about 10% of the global lifespan rate. This guarantees that, although lifespan is globally controlled, each individual will have a slightly different lifespan. In future implementations the lifespan may become influenced by a new gene inserted into the individual's genotype. In the current version, usual values for the birdsong lifespan range from 1 to 60 seconds. The proliferation rate controls the rate of reproductions in the population set. This is done by inserting a time delay in the calculation of the genetic operators (recombination and mutation). In the current version, usual values of the proliferation rate range from 0.5 to 3 seconds.

Together, the lifespan and proliferation rates can guide the variable-size population set to opposite extremes. If the proliferation rate is kept always smaller than the lifespan rate, individuals will die faster than they reproduce and their number in the population set will decrease until extinction. On the other hand, if the proliferation rate is kept bigger than the lifespan rate, individuals will reproduce faster than others die, so the number of individuals in the population set will increase until it becomes overpopulated. If the superior limit of 20 individuals in the population set were not hardcoded, in the occurrence of overpopulation the system would eventually consume all processing and memory resources of the machine running it and the EA system would be halted by overflow.
In the current implementation, however, this will not happen. The system will keep running at the top capacity of 20 individuals in the population set until the user changes the lifespan and proliferation rates, or manually stops the evolutionary process.
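The joint effect of the lifespan and proliferation rates on the population size can be illustrated with a toy discrete-time simulation. This is a sketch only: the step-based time units, the initial population of five, and the one-birth-per-interval rule are assumptions made for illustration; the cap of 20 individuals and the ±10% lifespan jitter are the ones described above.

```python
import random

MAX_POPULATION = 20  # hardcoded cap, as in the paper

def simulate(lifespan, proliferation, steps=200):
    """Toy birth/death loop. `lifespan` is the mean life in steps (each
    individual gets roughly +/-10% jitter, as the paper describes);
    `proliferation` is the delay in steps between reproductions.
    Returns the population size at each step."""
    population = [lifespan * random.uniform(0.9, 1.1) for _ in range(5)]
    history = []
    for t in range(steps):
        # Deaths: age everyone by one step, remove the expired.
        population = [age - 1 for age in population if age > 1]
        # One birth per proliferation interval, if parents exist and
        # the population is below the hardcoded cap.
        if population and t % max(1, proliferation) == 0 \
                and len(population) < MAX_POPULATION:
            population.append(lifespan * random.uniform(0.9, 1.1))
        history.append(len(population))
    return history
```

Running it with a proliferation delay much longer than the lifespan drives the population to extinction, while a short delay and a long lifespan drive it up to the cap of 20, mirroring the two extremes described above.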

4   Artificial soundscapes

This section explains the perceptual sonic results of the current version of this EA system. As briefly described in the introduction, soundscapes are immersive landscapes of sound, mostly found in nature – such as the sonic environments created by waterfalls, storms, birdsongs, and so forth – but also found in urban areas – such as traffic jams, building constructions and crowds. Any listener can immediately recognise a soundscape that he/she has previously experienced. Often, listeners are also agents of its composition (e.g., in a traffic jam, each driver is listening to and also creating its typical soundscape). Therefore, soundscapes are immersive environments also because their listeners are frequently their formant agents (Schafer, 1977). Soundscapes are the result of three processes:

1   sensation
2   perception
3   interaction.

According to Schafer, these processes can be classified by the following cognitive aspects:

1   close-up
2   background
3   contour
4   pace
5   space
6   density
7   volume
8   silence.

Soundscapes can be formed by five categories of sonic analytical concepts. They are:

1   tonic
2   signs
3   sound marks
4   sound objects
5   sound symbols.


Tonics are the active and omnipresent sounds, usually in the background of the listener's perception. Signs are sounds in the foreground that quickly draw the listener's conscious attention, since they may contain important information (e.g., a lion roaring, squealing tires, a thunderclap). Sound marks are sounds that are unique to a specific soundscape and cannot be found elsewhere. Sound objects, as defined by Pierre Schaeffer (who coined the term), are acoustic events that are perceived by the listener as a single and unique piece of sonic information. For that reason, sound objects represent the systemic agents that compose a soundscape. Symbols are sounds that evoke cognitive (memory) and affective (emotional) responses, according to the listener's ontogenic and phylogenic background.

These cognitive aspects are emergent features that imbue contextual meaning into the self-organising process of the complex open systems that create soundscapes. As such, these units can be retrieved and analysed in order to classify soundscape features. However, they are not sufficient to define a process of artificial soundscape generation. In order to do so, it is necessary to have a process that generates symbols with inherent characteristics of similarity and variability. In this work, this was achieved by the usage of an aimless EA system. Such an adaptive computer model proved to be capable of generating an effective artificial soundscape of birdsongs. Through the interaction of individuals (sound objects) within the evolutionary population set (soundscape), our system spontaneously presents tonics, signs and sound marks, as defined by Schafer. From a systemic viewpoint, a soundscape can be seen as a self-organised complex open system formed by sound objects acting as dynamic agents. Together they orchestrate a sonic environment rich in interacting sound objects that are always acoustically unique yet perceptually similar, which allows their immediate identification and discrimination by any listener who has already heard a birdsong.

5   Experimental results

The experimental results described here show that this EA system was able to generate artificial soundscapes of birdsongs even without receiving messages from Twitter. This external input is an enhancement of the current system that turns it interactive. Through the action of the recombination and mutation operators, this system could create realistic soundscapes of birdsongs, similar to the ones found in nature, without using any recorded audio data from real birdsongs. The insertion of external data through Twitter messages is, in this current implementation, an enhancement that turns the population set into an actual open system. However, this is not required to create a convincing soundscape, as the variability is provided by the action of the mutation operator. The following link presents an audio recording of our EA system running without receiving external data, for about 3 minutes: http://soundcloud.com/tutifornari/evopio. This other link presents a video of this EA system creating a true soundscape of birdsongs, also without receiving any external messages: http://youtu.be/o8LtGbRa-FI. This video shows a three-minute talk at TEDxSummit 2012, in Doha, where the author presented this EA system. It can be found on YouTube under the title 'Jose Fornari: An algorithm to mimic nature's birdsongs'.


Finally, the following link shows a video of the computer screen during a typical run of our EA system. It is available on YouTube under the title 'EVOPIO' and its direct link is: http://youtu.be/q544QrL4-Nw. In this demonstration, the system starts with a 50% crossover rate and a 30% mutation rate. The first birdsong is heard at instant 0m03s. At 0m37s the mutation rate is lowered to zero; in consequence, the soundscape of birdsongs becomes less variant. At 1m10s, the mutation rate is raised to 50%, which allows the slow emergence of distinct and unusual birdsongs. At 1m38s the lifespan rate is lowered, which slowly shortens the birdsongs' duration. At 2m04s the proliferation rate is raised; then, at 2m17s, it is lowered to its minimum, which raises the number of short birdsongs in the population set. At 2m40s the lifespan is raised again. At 3m07s the proliferation rate is raised and the lifespan rate is lowered to its minimum. At 3m27s the proliferation rate is raised to almost its maximum, which makes it impossible for the EA system to create new individuals faster than other individuals are dying (a result of the small lifespan rate). The result is that the entire population finally becomes extinct. All these modifications were made to demonstrate the sonic perceptual changes in the soundscape generated in real time by the manipulation of global parameters in the current implementation of our EA system.

6   Discussion and conclusions

This paper presented a preliminary study on the creation of a computer model that generates artificial soundscapes of birdsongs by means of a novel EA system that carries out an aimless evolutionary process. This proved to be effective in the creation of artificial soundscapes, a task that seems impossible to reach by means of deterministic methods. The major difference between our EA system and a typical one is that it does not seek the best possible solution but focuses on the process itself, as a steady generation of similar and variant solutions that together compose the soundscape. This EA system has an infinite convergence time in which similar yet variant birdsongs are born, reproduce and die. For that reason our EA system can easily operate in real time, performing its major task of keeping up a process of generating and controlling an artificial soundscape.

This system was enhanced by incorporating a real-time visual representation of the soundscape, which can be watched in the videos previously mentioned. This simple graphical representation of the individuals moving inside the population set shows their basic behaviours. Each individual is represented by a number. The variation of the position of each individual is represented by the corresponding variation of the spatial sound location parameters that control the ITD and ILD algorithms, as described in Fornari et al. (2009). Reproduction is triggered by the proximity between pairs of individuals, through the calculation of the values of these sound location parameters. Although represented on a plane (i.e., a window on a computer screen), individuals actually move in the three dimensions of space. The size of the number in the window (representing the individual) corresponds to the depth of this individual's location (i.e., the bigger the number, the nearer the birdsong). When an individual dies, its number disappears for a while and reappears when it is reallocated by the system as a new birdsong. Therefore, each number works as a slot for a birdsong that is currently active (alive).

The maximum number of 20 simultaneous individuals in the population set was chosen not because of computational but because of cognitive capacity. When experimenting with
larger population sets for this current EA system, we realised that more than 20 individuals would not make a perceptual difference in the sonification process. Any typical computer nowadays can easily run this computer model without major processing or memory restrictions. The full version of this Pd implementation can be downloaded from the following link: http://sites.google.com/site/tutifornari/academico/pdpatches.

Each birdsong is a sound object of the artificial soundscape. Sound objects are generated by the instantiation of a physical modelling synthesis algorithm of a generic syrinx computer model, controlled by a 16-element sequence of parameters (genes). In the current implementation, this sequence represents both the chromosome and the genotype (as the genotype here is composed of one single chromosome). The initial genotypes of the population set are randomly created or inserted by the user. This can be done manually or through a Twitter text message, while the system is running. As said, this external input is an enhancement that our EA system does not really depend upon to create a realistic soundscape of birdsongs, but it turns the population set into an open system, which is one of the fundamental conditions for the emergence of self-organisation. We aim to further explore this interesting feature in future and more complex implementations of this EA system.

As said, this current implementation still lacks individuals' gender, although individuals already reproduce in pairs. Future implementations may explore the design of multiple genders and experiment with the distinctions in the sonic aspects of soundscapes generated by n-gender individuals. Currently, pairs of individuals are selected by spatial sound location proximity. Each individual moves aimlessly inside a sonic field defined by its location parameters.
In future implementations this continuous aimless movement could be replaced by goal-oriented movement, such as individuals foraging for energy intake and preservation (i.e., food, shelter), whose performance may also influence the individual's lifespan.

From time to time, the selection process seeks and eliminates individuals whose genotype is too distant from the population average. That helps the entire population set maintain a certain phenotypical similarity among individuals, especially after long periods of running. However, that does not avoid the opposite problem: the occurrence of clones. Mutation is the most important process that avoids the creation of clones. By the action of this operator, the chances of having a clone in the population set are virtually null. Considering that each gene has a numeric scale of only one decimal place (e.g., 0.5), the probability of having a clone (i.e., the same exact 16-element genotype sequence) is (10^−1)^16 = 10^−16, which implies one single clone in about 10^16 reproductions.

The syrinx model was developed as a physical modelling sound synthesiser. As said, this is an adaptation of the algorithm originally introduced by Hans Mikelson and extended by Andy Farnell. The latter also incorporated 11 extra parameters for the emulation of an entire bird melodic phrase, which is (with minor adjustments and adaptations) the computer model used in this work to generate these birdsongs. This syrinx model is very sensitive to parametric changes, which means that the birdsong generated by the syrinx model noticeably changes with any small change of its control parameters. This control is given by the 16-element sequence that is the single-chromosome genotype of the EA system. This sequence is inserted into the population set
by the reproduction process or, less frequently, by external input data from Twitter messages.

In this work, the Twitter interface was implemented using the JavaScript Object Notation (JSON) library (http://www.json.org), a lightweight data-interchange format that handles the communication between Twitter and Pd. It uses a JSON-based library built for Processing (http://www.processing.org), which is another computer environment for real-time data processing, based on Java text programming (instead of visual programming, as in Pd). This implementation is called TwitterStream; it was able to receive timeline data from a Twitter account specifically created for this project (named @evopio) and send the retrieved data from Processing via Open Sound Control (OSC) to Pd, where the EA system was built. Despite the seeming computational awkwardness of this implementation, the overall system worked well and was able to retrieve messages from the Twitter account and map them into birdsongs.

As said, with external input data the population set behaves as a complex adaptive system (CAS) with emergent properties, one that is self-organised and thus presents occasional and unexpected sonic changes, created by sound objects acting as interacting agents immersed in an artificial evolutionary process. This complex open system, with its self-similar features, presents a flow of information built by independent and interacting agents: the birdsongs. This CAS presents emergent properties similar to the ones found in natural systems created by means of natural evolution (Holland, 2006). Future implementations of this EA system may explore the possibility of self-organising soundscapes through the insertion of data of other types, such as from computer vision (e.g., images retrieved from people walking inside an art installation running this EA system), motion detection, light sensors, temperature variations, and so forth.
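The mapping from an external text message to a genotype can be sketched as follows, in Python rather than Processing. The hashing scheme is purely an assumption of this sketch made for illustration; the original TwitterStream implementation maps timeline data in its own way and relays the result to Pd via OSC.

```python
import hashlib

def message_to_genotype(text, n_genes=16):
    """Map an arbitrary text message (e.g., a tweet) deterministically
    to a 16-element gene sequence of values in [0, 1], quantised to one
    decimal place as the paper's genes are. The SHA-256-based scheme is
    a hypothetical stand-in for the original Processing mapping."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Each byte modulo 11 yields one of the 11 values 0.0, 0.1, ..., 1.0.
    return [round((digest[i] % 11) / 10, 1) for i in range(n_genes)]

genotype = message_to_genotype("hello @evopio")
# The resulting list would then be sent to the Pd patch as a single OSC
# message (e.g., over UDP), becoming a new individual in the population.
```

Because the mapping is deterministic, the same tweet always yields the same newborn genotype, while the mutation operator still guarantees that its descendants diverge.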
This may allow the interaction of multiple users with a single EA system. Although this multiple-user interaction has not been tested yet, it seems feasible to suppose that it may create feedback between users and the EA system similar to that observed in cybernetic sonic environments created by the interaction of birds and machines, mostly found in urban areas. This can also be enhanced by the usage of a yet-to-be-implemented computer graphics model that generates visual objects corresponding to the sound objects created by each external input datum, thus informing each human agent (i.e., user) which birdsong in the population set is his/hers. In the current development of this work, the graphical objects were built by a Pd-extended sub-patch developed using objects from the GEM library. The current version of the visual feedback of our EA system aims to help users grasp some of the swarming behaviour of the individuals participating in the evolutionary process. Future implementations may explore the development and implementation of herd and band movement behaviours, as defined by Holland (2006). With that, a future version of this EA system could emulate flocks of individuals moving within a larger and more complex population set. In the current implementation, it may be difficult for a user to observe the birdsong corresponding to his/her Twitter message (if any), as the individuals are represented by numbers. In future implementations individuals could be given a more elaborate graphical representation, thus contributing to a visual metaphor of the sound objects: an animation more likely to identify and resemble the development of birdsongs. With this, the EA system will have two layers of systemic interactivity:

1   internal
2   external.

The internal layer is given by the interaction of individuals throughout the processes of selection and reproduction, composing the soundscape created by a mesh of simultaneous sound-synthesis processes corresponding to the various sorts of similar yet variant birdsongs flourishing from the aimless evolutionary process. The external layer is given by the insertion of external data (Twitter messages, sensors, etc.) from multiple users, which will influence the overall genetic pool of the population set. Users will be able to visualise the behaviour of the genotypes they inserted in the population set through a further and more realistic graphical representation of these individuals, phenotypically expressed as virtual birdsongs. In future works, these two degrees of interactivity are expected to corroborate the initial premise of this work, which is the creation of an evolutionary computer model able to successfully emulate the emergent properties of a complex open system composed of internal and external agents that altogether self-organise the population set into a recognisable and meaningful sonic context: a true artificial soundscape of birdsongs.

References

Asoh, H. and Mühlenbein, H. (1994) 'On the mean convergence time of evolutionary algorithms without selection and mutation', Parallel Problem Solving from Nature III, Proc. Int. Conf. Evol. Comput. (Lecture Notes in Computer Science), Vol. 866, pp.88–97.
Clarke, J.A. (2004) 'Morphology, phylogenetic taxonomy, and systematics of Ichthyornis and Apatornis (Avialae: Ornithurae)', Bulletin of the American Museum of Natural History, Vol. 286, pp.1–179.
Dorsey, J. (2009) 'Twitter creator Jack Dorsey illuminates the site's founding document', LA Times, David Sarno, 18 February [online] http://latimesblogs.latimes.com/technology/2009/02/twitter-creator.html (accessed February 2014).
Doupe, A.J. (1999) 'Birdsong and human speech: common themes and mechanisms', Annual Review of Neuroscience, Vol. 22, pp.567–631.
Eiben, A.E. and Smith, J.E. (2007) Introduction to Evolutionary Computing, 2nd ed., Springer Natural Computing Series, Springer-Verlag, Berlin, Heidelberg, New York.
Farnell, A. (2010) Designing Sound, MIT Press, Cambridge, Massachusetts; London, England.
Fornari, J. and Eerola, T. (2009) 'The pursuit of happiness in music: retrieving valence with contextual music descriptors', Lecture Notes in Computer Science, Computer Music Modeling and Retrieval: Genesis of Meaning in Sound and Music, Vol. 5493, pp.119–133.
Fornari, J., Maia, A. and Manzolli, J. (2008) 'Soundscape design through evolutionary engines', Journal of the Brazilian Computer Society, Vol. 14, No. 3, pp.51–64.
Fornari, J., Shellard, M. and Manzolli, J. (2009) 'Creating soundscapes with gestural evolutionary time', article and presentation, SBCM – Brazilian Symposium on Computer Music.
Holland, J. (2006) 'Studying complex adaptive systems', Journal of Systems Science and Complexity, Vol. 19, No. 1, pp.1–8.
Larsen, O.N. and Goller, F. (1999) 'Role of syringeal vibrations in bird vocalizations', Proceedings of the Royal Society B, 22 August, Vol. 266, No. 1429, pp.1609–1615.
Manzolli, J. and Verschure, P. (2005) 'Roboser: a real-world composition system', Computer Music Journal, Vol. 29, No. 3, pp.55–74.
Manzolli, J., Shellard, M.C., Oliveira, L.F. and Fornari, J. (2010) 'Abduction and meaning in evolutionary soundscapes', in Model-based Reasoning in Science and Technology – Abduction, Logic, and Computational Discovery (MBR Brazil), Vol. 1, pp.407–428, Campinas, SP, Brazil.
Mikelson, H. (2000) 'Bird calls', Csound Magazine, Summer.
Moroni, A., Manzolli, J., Von Zuben, F. and Gudwin, R. (2000) 'Vox populi: an interactive evolutionary system for algorithmic music composition', Leonardo Music Journal, Vol. 10, pp.49–54.
Reynolds, C.W. (1987) 'Flocks, herds, and schools: a distributed behavioral model', Computer Graphics, Vol. 21, No. 4 (SIGGRAPH '87 Conference Proceedings), pp.25–34.
Schafer, R.M. (1977) The Soundscape: Our Sonic Environment and the Tuning of the World, Destiny Books, Rochester, VT, ISBN 0-89281-455-1.
Smith, J.O. (2006) 'A basic introduction to digital waveguide synthesis, for the technically inclined' [online] http://ccrma.stanford.edu/~jos/swgt/swgt.pdf (accessed February 2014).
Wiener, N. (1968) Cybernetics and Society: The Human Use of Human Beings, Cultrix, New York.