2014 IEEE 8th Sensor Array and Multichannel Signal Processing Workshop (SAM)

Direction of Arrival Estimation of Harmonic Signal Using Single Moving Sensor

Yusuke Hioka and Masako Kishida
Department of Electrical and Computer Engineering, University of Canterbury
Private Bag 4800, Christchurch 8140, New Zealand

Abstract—A signal processing algorithm for estimating the direction of arrival of a harmonic signal using only a single sensor is proposed. The method is based on the recently invented architecture called CAROUSEL, which makes use of the Doppler effect caused by observing signals with a sensor in circular motion. The proposed method estimates the direction of arrival from the amount of phase shift of a harmonic component caused by the Doppler effect. Simulations using various harmonic signals, including synthesised speech phonemes, show that the proposed algorithm is able to accurately estimate the direction of arrival of the harmonic signals.

Index Terms—DOA estimation, sensor array, harmonics, speech, CAROUSEL

I. INTRODUCTION

Direction of arrival (DOA) estimation is one of the key research subjects in sensor array signal processing, and the estimated angles of the signal sources are employed for a variety of purposes [1]. For example, in audio signal processing, the DOA information is often used as a priori knowledge in applications such as speech enhancement, noise reduction, and acoustic echo cancelling [2].

Signal processing algorithms for DOA estimation using sensor arrays have been extensively studied over the last few decades. Various algorithms have been invented, including the following well-known methods: beamforming [1], generalised cross-correlation [3], and subspace methods such as MUSIC [4] and ESPRIT [5]. Although each of these algorithms has different advantages and disadvantages, the need for multiple sensors is common to all of these conventional methods. In principle, an array comprising more than one sensor located at different positions is required to extract spatial information about the sources from the incoming signals. Generally speaking, increasing the number of sensors in an array improves the accuracy of the DOA estimation, but it naturally also increases the cost of implementation. Although the performance of processors has significantly improved while their cost has declined, the use of only a single sensor is still more cost effective, because a multichannel system requires not only multiple sensors but also the same number of peripheral interface devices, e.g. AD/DA converters and signal amplifiers, which increases the overall cost.

A novel sensing architecture named CAROUSEL (CirculARly mOving sensor for USE of moduLation effect) was recently proposed by the authors [6]. CAROUSEL requires only a single sensor to estimate the DOA by exploiting the signal modulation, or the Doppler effect, in the observation caused by the movement of the sensor. The idea of moving a sensor array has been proposed in various fields such as audio [7], [8] and sonar [9], [10]; however, none of these studies used a single sensor. Given the novelty of the architecture, only the simplest case of a pure tone (i.e. a sinusoidal signal) was studied in the earlier attempt [6], and its application to other signals has yet to be investigated.

In many of the above-mentioned applications that employ DOA estimation, the observed signal often has a harmonic structure in its spectrum. Vowels in speech are a well-known example; each vowel comprises pitch and harmonic components, and the distribution of the spectral strength of each harmonic component varies depending on the phoneme [11]. In this paper, DOA estimation based on CAROUSEL is applied with a focus on signals with harmonics.

The rest of this paper is organised as follows. In Sec. II the modelling of signals observed by CAROUSEL is briefly reviewed, then a procedure to estimate the DOA of a harmonic signal is introduced in Sec. III. The performance of the method is evaluated by simulation using various synthesised voiced phonemes in Sec. IV. Finally, the paper is concluded with some remarks in Sec. V.

978-1-4799-1481-4/14/$31.00 ©2014 IEEE

II. SIGNAL OBSERVATION BY CAROUSEL

With the support of the CAROUSEL architecture, a single sensor moves along the circumference of a circle of radius $r$ and observes a signal arriving from a source, as illustrated in Fig. 1. Letting the azimuth angle of the sensor at time instant $t = 0$ be $\theta(0) = 0$, the position of the sensor at time instant $t \geq 0$ is

\[ p(t) = [r\cos\theta(t),\; r\sin\theta(t),\; 0]^T, \tag{1} \]

where $^T$ denotes the transpose. Assuming the distance from the coordinate origin to the signal source is sufficiently large compared to the radius of the circular trajectory, the plane wave model can be adopted for the signal observation [1]. The look direction vector towards the signal source located in the direction of azimuth angle $\theta_s$ and polar angle $\phi_s$ is defined by

\[ u = [\cos\theta_s \sin\phi_s,\; \sin\theta_s \sin\phi_s,\; \cos\phi_s]^T. \tag{2} \]

Fig. 1. DOA estimation problem using CAROUSEL. [Figure: the source lies in direction $u$ (azimuth angle $\theta_s$, polar angle $\phi_s$); the sensor at $p(t)$ moves along a circular trajectory of radius $r$ in the x-y plane at angle $\theta(t)$.]

Assuming the source emits a stationary sinusoid with its harmonic components, the signal observed at the origin can be modelled by the superposition of the pitch and its $L - 1$ harmonic components as

\[ s_{\mathrm{origin}}(t) = \sum_{l=1}^{L} A_{s_l} \cos\left(l\omega_0 t + \psi_{o_l}\right), \tag{3} \]

where $A_{s_l}$ and $\psi_{o_l}$ are the amplitude and the phase of the $l$-th order harmonic component, respectively, and $\omega_0$ is the pitch frequency of the signal. Due to the stationarity of the signal, both $A_{s_l}$ and $\omega_0$ are assumed to be time-invariant. Considering the time difference of arrival of the signal between the origin and the position of the sensor, the signal $s(t)$ observed by the moving sensor at time $t$, expressed relative to the signal at the origin, is

\[ s(t) = \sum_{l=1}^{L} A_{s_l} \cos\left[ l\omega_0 \left( t + \frac{p^T(t)\,u}{c} \right) + \psi_{o_l} \right] \tag{4} \]

\[ \phantom{s(t)} = \sum_{l=1}^{L} A_{s_l} \cos\left[ l\omega_0 \left( t + \frac{r}{c} \sin(\phi_s) \cos(\theta(t) - \theta_s) \right) + \psi_{o_l} \right], \tag{5} \]

where $c$ is the propagation speed of the signal. The observed signal in (5) can be seen as a phase-modulated version of $s_{\mathrm{origin}}(t)$ due to the Doppler effect caused by the circular motion of the sensor. With the CAROUSEL architecture, one can estimate the DOA of the signal by detecting the amount of phase shift in the observation.

III. DOA ESTIMATION OF HARMONIC SIGNAL

A. Problem Description

The problem considered in this study is estimating both the azimuth and polar angles of the signal source, i.e. $\theta_s$ and $\phi_s$ defined in Fig. 1, from the observed signal $s(t)$ acquired by the sensor based on the CAROUSEL architecture when the following assumptions hold:

• the radius $r$ [m] of the circular motion of the sensor, and the angular location $\theta(t)$ [rad] and angular velocity $\dot\theta(t)$ [rad/s] of the sensor, are known a priori,
• the source is located sufficiently far from the sensor (i.e. the plane wave model can be used) and is fixed in time,
• the source signal is a sum of sinusoids with constant pitch frequency $f_0 = \omega_0/2\pi$ [Hz] and different constant amplitudes $A_{s_l}$, where only $f_0$ is known a priori,
• the angular speed of the sensor $|\dot\theta(t)|$ is much lower than the signal frequency $\omega_0$.

B. Estimation Procedure

Under the assumption that $|\dot\theta(t)| \ll \omega_0$, for each $l$,

\[ m_l(t) := l\omega_0 \frac{r}{c} \sin(\phi_s) \cos(\theta(t) - \theta_s) + \psi_{o_l} \tag{6} \]

is approximately constant during one fundamental cycle of the source signal. With this approximation, along with the orthogonality of trigonometric functions,

\[ \int_{t-2\pi/\omega_0}^{t} s(\tau) \sin(l\omega_0\tau)\,d\tau \approx -\frac{\pi A_{s_l}}{\omega_0} \sin\left(m_l(t)\right), \]

\[ \int_{t-2\pi/\omega_0}^{t} s(\tau) \cos(l\omega_0\tau)\,d\tau \approx \frac{\pi A_{s_l}}{\omega_0} \cos\left(m_l(t)\right). \]

Hence,

\[ m_l(t) \approx \tan^{-1}\left( \frac{-\int_{t-2\pi/\omega_0}^{t} s(\tau) \sin(l\omega_0\tau)\,d\tau}{\int_{t-2\pi/\omega_0}^{t} s(\tau) \cos(l\omega_0\tau)\,d\tau} \right) \tag{7} \]

and its time derivative $\dot m_l(t)$ can be continuously computed. Furthermore, taking the normalised integral of (7) over one cycle of the sensor motion produces an estimate of the phase of the $l$-th order harmonic component,

\[ \psi_{o_l} \approx \frac{\int_{t}^{t+R} m_l(\tau)\,\dot\theta(\tau)\,d\tau}{\int_{t}^{t+R} \dot\theta(\tau)\,d\tau} = \frac{\int_{\theta_0}^{\theta_0+2\pi} m_l\,d\theta}{\int_{\theta_0}^{\theta_0+2\pi} d\theta}, \tag{8} \]

where $R$ is the time for the sensor to complete one cycle of its circular motion, and $\theta_0$ can be any angle of the sensor location, $\theta_0 \in [0, 2\pi)$. From (7) and (8),

\[ l\omega_0 \frac{r}{c} \sin(\phi_s) \cos(\theta(t) - \theta_s) \approx m_l(t) - \psi_{o_l}. \tag{9} \]

Noting that, from the time derivative of (6),

\[ l\omega_0 \frac{r}{c} \sin(\phi_s) \sin(\theta(t) - \theta_s) \approx -\frac{\dot m_l(t)}{\dot\theta(t)}, \tag{10} \]

it can be found that

\[ \phi_s = \sin^{-1}\left( \frac{c}{l\,r\,\omega_0} \sqrt{ \left(m_l(t) - \psi_{o_l}\right)^2 + \left(\dot m_l(t)/\dot\theta(t)\right)^2 } \right), \tag{11} \]

\[ \theta_s = \mathrm{mod}\left(\theta(t) - \hat\theta(t),\; 2\pi\right), \tag{12} \]

where

\[ \hat\theta(t) = \begin{cases} \cos^{-1}\left( \dfrac{m_l(t) - \psi_{o_l}}{\sqrt{\left(m_l(t) - \psi_{o_l}\right)^2 + \left(\dot m_l(t)/\dot\theta(t)\right)^2}} \right), & \dot m_l(t)/\dot\theta(t) \leq 0, \\[2ex] 2\pi - \cos^{-1}\left( \dfrac{m_l(t) - \psi_{o_l}}{\sqrt{\left(m_l(t) - \psi_{o_l}\right)^2 + \left(\dot m_l(t)/\dot\theta(t)\right)^2}} \right), & \dot m_l(t)/\dot\theta(t) > 0, \end{cases} \tag{13} \]

and $\mathrm{mod}(a, b) := a - b\lfloor a/b \rfloor$, with $\lfloor x \rfloor$ denoting the largest integer not greater than $x$. Note that $\dot m_l(t) = 0$ when $\dot\theta(t) = 0$, so that $\dot m_l(t)/\dot\theta(t)$ is well-defined.

C. DOA Estimation by Sinusoidal Curve Fitting

An ideal continuous-time estimation procedure was discussed in the previous subsection; in practice, however, the algorithm is implemented in discrete time. One of the issues in a discrete-time implementation is the error that arises when computing $\dot m_l(t)$, which needs to be approximated by a backward difference. To mitigate this discretisation error, sinusoidal curve fitting can be applied to the sampled discrete-time signal $m[n]$ to obtain the refined signal $\bar m[n]$. Note that a signal with square brackets denotes a discrete-time signal, $n$ is the sample index, and $T$ is the sampling period. Assuming the sensor speed is constant, i.e. $\dot\theta(t) = \dot\theta$, $\bar m[n]$ can be chosen as

\[ \bar m[n] = \bar\lambda \cos(\dot\theta n T - \bar\sigma), \]

where

\[ \{\bar\lambda, \bar\sigma\} = \arg\min_{\lambda,\sigma} \sum_{k} \left( \lambda \cos(\dot\theta k T - \sigma) - m[k] \right)^2 \tag{14} \]

are functions of the DOA. The range of $k$ in the summation should be chosen long enough. The fitting function, $\lambda \cos(\dot\theta k T - \sigma)$, was chosen by observing that (6) implies the frequency of the signal $m[n]$ is equal to the sensor speed (which is assumed to be constant). Hence, under ideal sampling, $m[n]$ is of the form $\lambda \cos(\dot\theta n T - \sigma)$. Once the parameters $\bar\lambda$ and $\bar\sigma$ are obtained, the continuous-time message can be constructed as

\[ \bar m(t) = \bar\lambda \cos(\dot\theta t - \bar\sigma), \tag{15} \]

and can be used to find the DOA as in (11) and (12).

IV. SIMULATION RESULTS

The performance of the proposed method was evaluated by simulation. The method was applied to the DOA estimation of audio signals using an omni-directional microphone as the sensor. The parameters specified in Table I were used in the simulation unless otherwise specified. Additive white Gaussian noise (AWGN) was added to the observed signal to simulate the thermal noise of the microphone.

TABLE I
PARAMETER VALUES USED IN THE SIMULATION.

Parameter                            Value    Unit
Sampling frequency: F                32000    [Hz]
Radius of sensor motion: r           0.05     [m]
Wave speed: c                        340      [m/s]
Source amplitude: A_sl               2000
Duration of observation              3        [sec]
Azimuth angle: θs                    π/6      [rad]
Polar angle: φs                      π/3      [rad]
Angular speed of sensor: θ̇           2π       [rad/s]
Order of extracted harmonic: l       1

Firstly, the accuracy of the estimated DOA for harmonic signals with different pitch (fundamental) frequencies was investigated. Fig. 2 summarises the absolute error of the estimated angles. The maximum order of the harmonics was set to 5 in order to avoid spectral aliasing at the highest pitch frequency, i.e. 3.2 kHz. To verify the effectiveness of the proposed algorithm, no AWGN was added to the observation in this test. The estimation error decreased for both the azimuth and polar angles as the pitch frequency increased. This trend can be explained by looking at the amount of phase shift caused by the Doppler effect. Comparing (3) with (4) shows that the amount of phase shift is proportional to the pitch frequency; thus the errors become proportionally smaller relative to the phase shift when the pitch frequency is higher. More accurate DOA estimation can therefore be achieved by focusing on the higher frequency components in the signal.

Fig. 2. Average error of estimated angles for different pitch frequencies. [Panels: polar angle and azimuth angle; absolute error [rad] versus pitch frequency [Hz].]

To verify the capability of the proposed method for practical harmonic signals, the DOA of five different voiced phonemes in speech signals, i.e. /a/, /e/, /i/, /o/, and /u/, with up to the 15th harmonic generated by the formant synthesiser [11], was estimated by the proposed method. Fifty trials were conducted with randomly generated AWGN added to the signal at different input signal-to-noise ratios (SNRs) for each case. The formant frequencies and bandwidths used for synthesising the phonemes are listed in Table II, where up to the third formant was considered for generating each phoneme.

TABLE II
PARAMETERS USED FOR SYNTHESISING VOCAL PHONEMES.

Phoneme    Formant frequencies [Hz]    Formant bandwidth [Hz]
/a/        700, 1220, 2600             50, 60, 100
/e/        450, 1900, 2400             50, 70, 90
/i/        250, 2100, 3100             40, 50, 170
/o/        450, 900, 2600              40, 50, 60
/u/        250, 1400, 2200             50, 50, 50

Fig. 3 and Fig. 4 show the absolute error of the estimated DOA for each phoneme, where two different pitch frequencies, 150 Hz and 250 Hz, were chosen to reflect the pitch difference between male and female speech. Slightly more accurate estimation was achieved when the pitch frequency was higher, which agrees with the finding in Fig. 2. The estimation accuracy also varied depending on the phoneme, particularly when the input SNR was low; e.g. the estimated DOA of /a/ was more affected by the noise than that of /u/. Because the DOA is estimated by extracting $m_l(t)$ using (7), its accuracy is degraded if the $l$-th order harmonic in the observed signal is contaminated by noise. Since the first formant frequency of /a/ is much higher than its pitch frequency, the SNR at the pitch component was relatively low, which likely caused the poor DOA estimation accuracy.

Fig. 3. Average error of estimated DOAs for different phonemes of pitch frequency 150 Hz. The DOA was extracted from the pitch component (l = 1). [Panels: polar angle and azimuth angle; absolute error [rad] for each phoneme at input SNRs of 0, 10, 20, 30 and 40 dB.]

Fig. 4. Average error of estimated DOAs for different phonemes of pitch frequency 250 Hz. The DOA was extracted from the pitch component (l = 1). [Panels: polar angle and azimuth angle; absolute error [rad] for each phoneme at input SNRs of 0, 10, 20, 30 and 40 dB.]

Fig. 5 summarises the estimation error for the same signals as in Fig. 4, i.e. phonemes with 250 Hz pitch frequency, but with the DOA information extracted from the third harmonic component of the observed signal (l = 3). The result shows that the phonemes whose formant frequencies were closer to the frequency of the extracted harmonic, i.e. 750 Hz, provided more accurate estimation even when the input SNR was low. This suggests that the harmonic order l used for extracting $m_l(t)$ should be chosen so that a harmonic with high SNR is utilised for the DOA estimation.
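The observation model (5) and the estimation procedure of Sec. III-B can be tried out numerically. The following is a minimal sketch, not the authors' implementation: the function names `simulate_observation` and `estimate_doa` are hypothetical, the sensor speed is assumed constant with θ(0) = 0, the integrals in (7) are replaced by sums over non-overlapping blocks of one fundamental period, a four-quadrant arctangent and a central difference are used, and the per-block estimates from one sensor rotation are simply averaged. The scale factor in (11) includes the harmonic order l, as (9) and (10) imply.

```python
import numpy as np


def simulate_observation(t, f0, amps, phases, r, c, theta_dot, theta_s, phi_s):
    """Noise-free sensor signal following the phase-modulated model (5)."""
    w0 = 2.0 * np.pi * f0
    theta = theta_dot * t  # constant sensor speed with theta(0) = 0 (assumption)
    delay = (r / c) * np.sin(phi_s) * np.cos(theta - theta_s)
    s = np.zeros_like(t)
    for l, (a, psi) in enumerate(zip(amps, phases), start=1):
        s += a * np.cos(l * w0 * (t + delay) + psi)
    return s


def estimate_doa(s, fs, f0, l, r, c, theta_dot):
    """Estimate (theta_s, phi_s) from harmonic l using (7), (8) and (11)-(13)."""
    w0 = 2.0 * np.pi * f0
    n_per = int(round(fs / f0))  # samples per fundamental period
    n_blk = s.size // n_per
    t = np.arange(n_blk * n_per) / fs
    x = s[: t.size]

    def per_block(v):
        return v.reshape(n_blk, n_per)

    # One-period sums stand in for the integrals in (7); their common scale cancels.
    i_sin = per_block(x * np.sin(l * w0 * t)).sum(axis=1)
    i_cos = per_block(x * np.cos(l * w0 * t)).sum(axis=1)
    m = np.unwrap(np.arctan2(-i_sin, i_cos))  # four-quadrant version of (7)
    t_m = per_block(t).mean(axis=1)  # centre time of each block
    m_dot = np.gradient(m, n_per / fs)  # central difference (assumption)

    # Keep one full sensor rotation for the averages.
    n_rev = int(round(2.0 * np.pi / abs(theta_dot) * fs / n_per))
    m, m_dot, t_m = m[:n_rev], m_dot[:n_rev], t_m[:n_rev]

    psi = m.mean()  # (8) for a constant sensor speed
    u = m - psi
    v = m_dot / theta_dot
    rho = np.hypot(u, v)
    phi = np.arcsin(np.clip(c * rho / (l * r * w0), -1.0, 1.0))  # (11)
    ang = np.arccos(np.clip(u / rho, -1.0, 1.0))
    theta_hat = np.where(v <= 0.0, ang, 2.0 * np.pi - ang)  # (13)
    theta_s = np.mod(theta_dot * t_m - theta_hat, 2.0 * np.pi)  # (12)

    # Average the per-block estimates (circular mean for the azimuth).
    theta_s_avg = np.mod(np.angle(np.mean(np.exp(1j * theta_s))), 2.0 * np.pi)
    return float(theta_s_avg), float(np.mean(phi))
```

As a usage example, a three-second signal with f0 = 1 kHz sampled at 32 kHz, generated with the geometry of Table I, can be passed to `estimate_doa` with l = 1. Block-wise processing is chosen here because it requires the sampling frequency to be an integer multiple of f0 but avoids differentiating the ripple at twice the harmonic frequency that a sample-by-sample sliding window would leave in the extracted phase.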
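The curve fitting of Sec. III-C can be sketched as a linear least-squares problem, since λ cos(x − σ) = a cos(x) + b sin(x) with a = λ cos σ and b = λ sin σ. The sketch below is an illustration under stated assumptions rather than the authors' implementation: `fit_sinusoid` and `doa_from_fit` are hypothetical names, the input `m` is assumed to already have the harmonic phase estimate of (8) subtracted, and the sensor is assumed to move at constant speed with θ(0) = 0. Under these assumptions, inserting the fitted message (15) into (11) to (13) gives a square-root term equal to the fitted amplitude and an azimuth equal to the fitted phase offset modulo 2π.

```python
import numpy as np


def fit_sinusoid(m, theta_dot, T):
    """Least-squares fit of lam * cos(theta_dot * k * T - sigma) to m[k], cf. (14)."""
    k = np.arange(m.size)
    # lam*cos(x - sigma) = a*cos(x) + b*sin(x), a = lam*cos(sigma), b = lam*sin(sigma),
    # so the minimisation is linear in (a, b).
    basis = np.column_stack((np.cos(theta_dot * k * T), np.sin(theta_dot * k * T)))
    (a, b), *_ = np.linalg.lstsq(basis, m, rcond=None)
    return float(np.hypot(a, b)), float(np.arctan2(b, a))


def doa_from_fit(lam, sigma, l, w0, r, c):
    """DOA implied by inserting the fitted message (15) into (11)-(13)."""
    phi_s = float(np.arcsin(np.clip(c * lam / (l * r * w0), -1.0, 1.0)))
    theta_s = float(np.mod(sigma, 2.0 * np.pi))
    return theta_s, phi_s
```

Solving for (a, b) instead of (λ, σ) avoids an iterative non-linear search; whether the original work used this reparameterisation is not stated in the paper.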
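The dependence of the Doppler phase shift on pitch frequency, used above to explain the trend in Fig. 2, follows directly from (6): the peak phase deviation of the l-th harmonic is lω0(r/c)sin(φs). The short sketch below evaluates this expression for the two pitch frequencies used with the phonemes, taking r, c and φs from Table I; the function name `peak_phase_shift` is a hypothetical helper introduced only for illustration.

```python
import math


def peak_phase_shift(f0, l, r, c, phi_s):
    """Peak phase deviation l * w0 * (r / c) * sin(phi_s) of harmonic l in rad, cf. (6)."""
    return l * 2.0 * math.pi * f0 * (r / c) * math.sin(phi_s)


if __name__ == "__main__":
    for f0 in (150.0, 250.0):
        dev = peak_phase_shift(f0, l=1, r=0.05, c=340.0, phi_s=math.pi / 3)
        print(f"f0 = {f0:.0f} Hz: peak phase deviation = {dev:.4f} rad")
```

The deviation grows linearly with both f0 and l, so a fixed absolute error in the extracted phase corresponds to a smaller relative error at a higher pitch frequency or harmonic order.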

Fig. 5. Average error of estimated DOAs for different phonemes of pitch frequency 250 Hz. The DOA was extracted from the third harmonic (l = 3). [Panels: polar angle and azimuth angle; absolute error [rad] for each phoneme at input SNRs of 0, 10, 20, 30 and 40 dB.]
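The recommendation to extract the DOA from a harmonic with high SNR raises the question of how such a harmonic might be picked in practice, which the paper does not specify. One possible heuristic, sketched below purely as an assumption, is to estimate the amplitude of each harmonic from the same one-period sine and cosine projections used in (7) and to select the strongest one; under white noise the noise power is equal across harmonics, so the largest amplitude corresponds to the highest SNR. The names `harmonic_amplitudes` and `pick_harmonic` are hypothetical, and the sampling frequency is assumed to be an integer multiple of the pitch frequency.

```python
import numpy as np


def harmonic_amplitudes(s, fs, f0, max_order):
    """Average per-period amplitude estimate of harmonics 1..max_order."""
    n_per = int(round(fs / f0))  # samples per fundamental period
    n_blk = s.size // n_per
    t = np.arange(n_blk * n_per) / fs
    x = s[: t.size]
    amps = np.empty(max_order)
    for l in range(1, max_order + 1):
        w = 2.0 * np.pi * l * f0
        # Project each one-period block onto cos and sin of harmonic l.
        i_cos = (x * np.cos(w * t)).reshape(n_blk, n_per).sum(axis=1)
        i_sin = (x * np.sin(w * t)).reshape(n_blk, n_per).sum(axis=1)
        # The magnitude is insensitive to the slowly varying Doppler phase.
        amps[l - 1] = np.mean(2.0 * np.hypot(i_cos, i_sin) / n_per)
    return amps


def pick_harmonic(s, fs, f0, max_order):
    """Order of the strongest harmonic (white-noise assumption)."""
    return int(np.argmax(harmonic_amplitudes(s, fs, f0, max_order))) + 1
```

Amplitudes are averaged block by block rather than over the whole record because the phase modulation in (5) would otherwise partially cancel a long coherent projection, and would do so more strongly for higher harmonic orders.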

V. CONCLUSION

DOA estimation of a harmonic signal using a single sensor based on the CAROUSEL architecture was proposed. The method extracts the DOA information included in the observed signal by utilising the orthogonality of the harmonic components. Simulation experiments revealed that the method is able to accurately estimate the DOA of different harmonic signals such as vocal phonemes. Application of the proposed method to practical problems is a future research subject. Further study is needed to investigate the detrimental effects of non-stationary signals such as speech, where the pitch frequency varies continuously and the signal's energy fluctuates considerably.

REFERENCES

[1] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Prentice Hall, Englewood Cliffs, NJ, 1993.
[2] M. Brandstein and D. Ward, Microphone Arrays, Springer, 2001.
[3] C. Knapp and G. Clifford Carter, “The generalized correlation method for estimation of time delay,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24, no. 4, pp. 320–327, 1976.
[4] R. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276–280, Mar. 1986.
[5] R. Roy and T. Kailath, “ESPRIT-estimation of signal parameters via rotational invariance techniques,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 7, pp. 984–995, 1989.
[6] M. Kishida and Y. Hioka, “Circularly moving sensor for use of modulation effect,” in Proceedings of 7th International Conference on Sensing Technology (ICST2013), Dec. 2013.
[7] A. Cigada, M. Lurati, F. Ripamonti, and M. Vanali, “Moving microphone arrays to reduce spatial aliasing in the beamforming technique: theoretical background and numerical investigation,” Journal of the Acoustical Society of America, vol. 124, no. 6, pp. 3648–3658, 2008.
[8] A. Schasse, C. Tendyck, and R. Martin, “Source localization based on the Doppler effect,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC2012), 2012, pp. 1–4.
[9] M. P. Hayes and P. T. Gough, “Synthetic aperture sonar: A review of current status,” IEEE Journal of Oceanic Engineering, vol. 34, no. 3, pp. 207–224, 2009.
[10] R. E. Hansen, Introduction to Synthetic Aperture Sonar, chapter 1, InTech, 2011.
[11] S. Furui, Digital Speech Processing: Synthesis, and Recognition, Marcel Dekker, 2nd edition, 2000.