
A Historical Perspective of Spectrum Estimation

ENDERS A. ROBINSON

Invited Paper

Abstract-The prehistory of spectral estimation has its roots in ancient times with the development of the calendar and the clock. The work of Pythagoras in 600 B.C. on the laws of musical harmony found mathematical expression in the eighteenth century in terms of the wave equation. The struggle to understand the solution of the wave equation was finally resolved by Jean Baptiste Joseph de Fourier in 1807 with his introduction of the Fourier series. The Fourier theory was extended to the case of arbitrary orthogonal functions by Sturm and Liouville in 1836. The Sturm-Liouville theory led to the greatest empirical success of spectral analysis yet obtained, namely the formulation of quantum mechanics as given by Heisenberg and Schrödinger in 1925 and 1926. In 1929 John von Neumann put the spectral theory of the atom on a firm mathematical foundation in his spectral representation theorem in Hilbert space. Meanwhile, Wiener developed the mathematical theory of Brownian movement in 1923, and in 1930 he introduced generalized harmonic analysis, that is, the spectral representation of a stationary random process. The common ground of the spectral representations of von Neumann and Wiener is the Hilbert space; the von Neumann result is for a Hermitian operator, whereas the Wiener result is for a unitary operator. Thus these two spectral representations are related by the Cayley-Möbius transformation. In 1942 Wiener applied his methods to problems of prediction and filtering, and his work was interpreted and extended by Norman Levinson. Wiener in his empirical work put more emphasis on the autocorrelation function than on the power spectrum. The modern history of spectral estimation begins with the breakthrough of J. W. Tukey in 1949, which is the statistical counterpart of the breakthrough of Fourier 142 years earlier. This result made possible an active development of empirical spectral analysis by research workers in all scientific disciplines. However, spectral analysis was computationally expensive. A major computational breakthrough occurred with the publication in 1965 of the fast Fourier transform algorithm by J. W. Cooley and J. W. Tukey. The Cooley-Tukey method made it practical to do signal processing on waveforms in either the time or the frequency domain, something never practical with continuous systems. The Fourier transform became not just a theoretical description, but a tool. With the development of the fast Fourier transform the field of empirical spectral analysis grew from obscurity to importance, and is now a major discipline. Further important contributions were the introduction of maximum entropy spectral analysis by John Burg in 1967, the development of spectral windows by Emmanuel Parzen and others starting in the 1950's, the statistical work of Maurice Priestley and his school, hypothesis testing in time series analysis by Peter Whittle starting in 1951, the Box-Jenkins approach by George Box and G. M. Jenkins in 1970, and autoregressive spectral estimation and order-determining criteria by E. Parzen and H. Akaike starting in the 1960's. To these statistical contributions must be added the equally important engineering contributions to empirical spectrum analysis, which are not treated at all in this paper, but form the subject matter of the other papers in this special issue.


I. INTRODUCTION

SPECTRAL estimation has its roots in ancient times, with the determination of the length of the day, the phases of the moon, and the length of the year. The calendar and the clock resulted from empirical spectral analysis.

Manuscript received January 22, 1982; revised April 28, 1982. This paper is dedicated to Norman Levinson, August 11, 1912 - October 10, 1975, my teacher in differential equations.

The author is with the Department of Theoretical and Applied Mechanics and the Department of Geological Sciences, Cornell University, Ithaca, NY 14853.

In modern times, credit for the empirical discovery of spectra goes to the diversified genius of Sir Isaac Newton [1]. But the great interest in spectral analysis made its appearance only a little more than a century ago. The prominent German chemist Robert Wilhelm Bunsen (1811-1899) repeated Newton's experiment of the glass prism. Only Bunsen did not use the sun's rays as Newton did. Newton had found that a ray of sunlight is expanded into a band of many colors, the spectrum of the rainbow. In Bunsen's experiment, the role of pure sunlight was replaced by the burning of an old rag that had been soaked in a salt solution (sodium chloride). The beautiful rainbow of Newton did not appear. The spectrum, which Bunsen saw, only exhibited a few narrow lines, nothing more. One of the lines was a bright yellow. Bunsen conveyed this result to Gustav Robert Kirchhoff (1824-1887), another well-known German scientist. They knew that the role of the glass prism consisted only in sorting the incident rays of light into their respective wavelengths (the process known as dispersion). The Newton rainbow was the extended continuous band of the solar spectrum; it indicates that all wavelengths of visible light are present in pure sunlight. The yellow line, which appeared when the light source was a burning rag, indicated that the spectrum of table salt contained a single specific wavelength. Further experiments showed that this yellow line belonged to the element sodium. No matter what the substance in which sodium appeared, that element made its whereabouts known by its bright yellow spectral line. As time went on, it was found that every chemical element has its own characteristic spectrum, and that the spectrum of a given element is always the same, no matter in what compound or substance the element is found. Thus the spectrum identifies the element, and in this way we can tell what elements are in substances from the distant stars to microscopic objects. The successes of spectral analysis were colossal. However, the spectral theory of the elements could not be explained by classical physics. As we know, quantum physics was born and spectral theory was explained in 1925 and 1926 by the work of Werner Heisenberg (1901-1976) and Erwin Schrödinger (1887-1961). In this paper, we will show how spectral theory developed in the path to this great achievement. Although most of the glamour of spectral theory has been associated with quantum physics, we will not neglect the parallel path taken in classical physics. Although the two paths began diverging with the work of Charles Sturm (1803-1855) and Joseph Liouville (1809-1882) on the spectral theory of differential equations, we will see that the final results, namely, the spectral representation of John von Neumann (1903-1957) for quantum physics, and that of Norbert Wiener (1894-1964) for classical physics, are intimately related. Because light has high frequencies, our instruments cannot respond fast enough to directly measure the waveforms. Instead, the instruments measure the amount of energy in the frequency bands of interest.


The measurement and analysis of the spectra of other types of signals, however, take different forms. With lower frequency signals, such as mechanical vibrations, speech, sonar signals, seismic traces, cardiograms, stock market data, and so on, we can measure the signals as functions of time (that is, as time series) and then find the spectra by computation. With the advent of the digital computer, numerical spectrum estimation has become an important field of study.

Let us say a few words about the terms "spectrum" and "spectral." Sir Isaac Newton introduced the scientific term "spectrum" using the Latin word for an image. Today, in English, we have the word specter meaning ghost or apparition, and the corresponding adjective spectral. We also have the scientific word spectrum and the dictionary lists the word spectral as the corresponding adjective. Thus "spectral" has two meanings. Many feel that we should be careful to use "spectrum" in place of "spectral" a) whenever the reference is to data or physical phenomena, and b) whenever the word modifies "estimation." They feel that the word "spectral," with its unnecessary ghostly interpretations, should be confined to those usages in a mathematical discipline where the term is deeply embedded. The material in this paper through Section XII surveying the period from antiquity through Levinson and Wiener can be described as "The Prehistory of Spectrum Estimation" to emphasize that spectrum estimation is interpreted as estimation from data. The remaining sections may be described as "Some Pioneering Contributions to the Development of Methods of Spectrum Estimation."

Modern spectrum estimation began with the breakthrough for the analysis of short time series made by J. W. Tukey in 1949. This work led to a great blossoming forth of spectrum analysis techniques. Despite the advances in digital computing machinery, such computations were still expensive. The next great breakthrough occurred with the discovery of the fast Fourier transform in 1965 independently by J. W. Cooley and J. W. Tukey and by Gordon Sande. This development, in conjunction with silicon chip technology, has brought spectrum analysis to bear on a wide range of problems. Another breakthrough occurred with the introduction of maximum entropy methods into spectrum analysis by John Burg in 1967.

II. TAYLOR SERIES

At the time when calculus was introduced in the seventeenth century by Newton and Leibniz, the concept of a mathematical "function" entailed restricted properties, which in the course of time were gradually made less severe. In those days, the observations of natural events seemed to indicate that continuous relations always existed between physical variables. This view was reinforced by the formulation of the laws of nature on the basis of differential equations, as exemplified by Newton's laws. Thus it became commonplace to assume that any function describing physical phenomena would be differentiable. The idea of a function that changes in some capricious or random way, and thus does not allow any analytic formula for its representation, did not enter into the thinking of the mathematicians of that time. It was thus very natural for Brook Taylor (1685-1731) [2], a contemporary of Newton, to introduce the concept of "analytic function." The Taylor series expands an analytic function as an infinite summation of component functions. More precisely, the Taylor series expands a function f(x), which is analytic in the neighborhood of a certain point x = a, into an infinite series whose coefficients are the successive derivatives of the function at the given point:

f(a + h) = f(a) + f'(a) h + f''(a) h^2/2! + f'''(a) h^3/3! + ...

Thus analytic functions are functions which can be differentiated to any degree. We know that the definition of the derivative of any order at the point x = a does not require more than knowledge of the function in an arbitrarily small neighborhood of the point x = a. The astonishing property of the Taylor series is that the shape of the function at a finite distance h from the point x = a is uniquely determined by the behavior of the function in the infinitesimal vicinity of the point x = a. Thus the Taylor series implies that an analytic function has a very strong interconnected structure; by studying the function in a small vicinity of the point x = a, we can precisely predict what happens at the point x = a + h, which is at a finite distance from the point of study. This property, however, is restricted to the class of analytic functions. The best known analytic functions are, of course, the sine and cosine functions, the polynomials, and the rational functions (away from their poles).
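As a small illustration of this prediction property, consider the following sketch (our own example in Python; it is not part of the original paper, and the choice of sin x, a = 0.3, and h = 1.0 is arbitrary). The derivatives of sin x at the single point a suffice to predict its value a finite distance h away.

```python
import math

def taylor_predict(derivs_at_a, h):
    # Truncated Taylor series: sum over n of f^(n)(a) * h^n / n!
    return sum(d * h**n / math.factorial(n) for n, d in enumerate(derivs_at_a))

# Derivatives of sin at a single point a cycle through sin, cos, -sin, -cos, ...
a, h = 0.3, 1.0
cycle = [math.sin(a), math.cos(a), -math.sin(a), -math.cos(a)]
derivs = [cycle[n % 4] for n in range(20)]

print(taylor_predict(derivs, h))   # value predicted at a + h from behavior at a alone
print(math.sin(a + h))             # the true value; the two agree to many decimal places
```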

III. THE DANIEL BERNOULLI SOLUTION OF THE WAVE EQUATION

The great Greek mathematician Pythagoras (ca 600 B.C.) was the first to consider a purely physical problem in which spectrum analysis made its appearance. Pythagoras studied the laws of musical harmony by generating pure sine vibrations on a vibrating string, fixed at its two endpoints. This problem excited scientists since ancient days, but the mathematical turning point came in the eighteenth century when it was recognized that the vertical displacement u(x, t) of the vibrating string satisfies the wave equation

∂^2u/∂x^2 - (1/c^2) ∂^2u/∂t^2 = 0.

Here x is the horizontal coordinate and t is the time. The constant c is a physical quantity characteristic of the material of the string, and represents the velocity of the traveling waves on the string. Because the endpoints x = 0 and x = π are fixed, we have the boundary conditions

u(0, t) = u(π, t) = 0.

(Note that for simplicity, we have taken the string to be of length π.) The problem of constructing the solution of the wave equation was then attacked by some of the greatest mathematicians of all time, and in so doing, they paved the way for the theory of spectrum analysis. One of the finest results was that of Daniel Bernoulli (1700-1782) [3] in 1738. He introduced the method of separation of variables in which a trial solution is constructed as the product of a function of x alone, and a function of t alone. Thus, he wrote

u(x, t) = X(x) T(t).

Putting this trial solution in the differential equation and solving, he found the solutions

cos kx cos kct,
cos kx sin kct,
sin kx cos kct,
sin kx sin kct.


However, the boundary condition at x = 0 excludes the solutions involving cos kx, and so the possible solutions are reduced to the two choices

sin kx cos kct,
sin kx sin kct.

The boundary condition at x = π requires that the value of k be an integer. In view of the linearity of the wave equation, any superposition of solutions gives a solution. Bernoulli thus gave the following solution:

u(x, t) = Σ_{k=1}^∞ sin kx (A_k cos kct + B_k sin kct)

where the A_k and B_k are arbitrary constants. Bernoulli made the claim that this infinite sum is the general solution of the equation for the vibrating string. The implications of Bernoulli's claim were startling. From the principles of mechanics it was known that the initial displacement and initial velocity of the string could be prescribed in an arbitrary way. That is, it was known that at the initial time t = 0, both u(x, 0) and u̇(x, 0) could have any functional form. (Note that the dot over a function indicates differentiation with respect to time, so u̇ represents the velocity of the string in the vertical direction.) However, Bernoulli's solution gives explicit expressions for initial displacement and initial velocity, namely,

u(x, 0) = Σ_{k=1}^∞ A_k sin kx

u̇(x, 0) = Σ_{k=1}^∞ kc B_k sin kx.

Thus Bernoulli's solution implied that each of two arbitrary functions u(x, 0) and u̇(x, 0) could be expanded in the interval 0 ≤ x ≤ π in the form of an infinite series of sine functions. However, this result could not be explicitly demonstrated in Bernoulli's time. Bernoulli's result can be expressed in the following way. Let the initial displacement u(x, 0) be an arbitrary nonanalytic function f(x). Then we have the expansion

f(x) = Σ_{k=1}^∞ A_k sin kx

which says that a nonanalytic function f(x) can be expressed as an infinite summation of analytic functions sin kx with weighting coefficients A_k. This result was a paradox at the time, and it led to a historical controversy of whether the function f(x) could be freely chosen or must be restricted to the class of analytic functions. From the physical point of view, f(x), which is the initial displacement of the string, could be freely chosen. From the then contemporary mathematical point of view, f(x), which is an infinite summation of analytic functions, must be analytic. This view was believed by all the eminent mathematicians of the day. Two of the greatest mathematicians who ever lived then set out to find the coefficients A_k of this expansion. Multiply each side by sin nx and integrate between 0 and π. Because

∫_0^π sin kx sin nx dx = π/2 when k = n, and 0 when k ≠ n,

the result found by L. Euler (1707-1783) [4] and J. L. Lagrange (1736-1813) [5] is

A_n = (2/π) ∫_0^π f(x) sin nx dx.

This is the point at which the question stood at the start of the nineteenth century.
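To make the Euler-Lagrange recipe concrete, here is a short numerical sketch (ours, in Python, not from the paper; the plucked-string triangle shape is an assumed test function). It computes the coefficients A_n = (2/π) ∫_0^π f(x) sin nx dx by numerical integration and rebuilds f from a finite number of sine terms.

```python
import numpy as np

# A plucked-string shape on [0, pi]: a triangle, which is not an analytic function.
x = np.linspace(0.0, np.pi, 2001)
dx = x[1] - x[0]
f = np.minimum(x, np.pi - x)

def sine_coefficient(n):
    # Euler-Lagrange formula A_n = (2/pi) * integral of f(x) sin(nx), trapezoidal rule
    g = f * np.sin(n * x)
    return (2.0 / np.pi) * np.sum(0.5 * (g[:-1] + g[1:])) * dx

# Rebuild f from the first 25 terms of its sine series.
reconstruction = sum(sine_coefficient(n) * np.sin(n * x) for n in range(1, 26))
print(float(np.max(np.abs(f - reconstruction))))   # already close to f, despite the corner
```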

IV. JEAN BAPTISTE JOSEPH DE FOURIER AND THE SINUSOIDAL SPECTRAL THEORY

On December 21, 1807 the engineer Jean Baptiste Joseph de Fourier (1768-1830) [6] addressed the French Academy and made a claim that appeared incredible to the eminent mathematicians who were members of the Academy. As it turned out, one of the greatest advances in the history of mathematics, an innovation which was to occupy much of the attention of the mathematical community for over a century, was made by an engineer. Fourier said at that historic meeting that an arbitrary function, defined over a finite interval by any rough and even discontinuous graph, could be represented as an infinite summation of cosine and sine functions. The distinguished and brilliant academicians questioned the validity of Fourier's theorem, for they believed that any superposition of cosine and sine functions could only give an analytic function, that is, an infinitely differentiable function. An analytic function, of course, could never be discontinuous, and thus was very far removed from some arbitrarily drawn graph. In fact, Taylor's theorem stated that an analytic function had the property that, given its shape in an infinitesimal interval, the continuation of its course to the right and left by finite amounts was uniquely determined (the so-called process of analytic continuation). The academicians and the other great mathematicians of the time could not reconcile the property of analytic continuation with Fourier's theorem. How could the physical reasoning of an engineer stand up against the weight of the analytic reasoning of some of the most eminent mathematicians of all time? These were the days when many great men were at the peak of their powers. Yet Fourier stood alone in defending his theorem. As we have seen, the concept of analytic function requires a strong interconnection of the values of a function, where knowledge at one point allows us to predict the value at a point at a finite distance h. This prediction mechanism is embodied in the Taylor series expansion. However, a nonanalytic function, such as a rough and discontinuous function, does not demand any such prediction mechanism between the immediate vicinity of a point and its wider surroundings. The Fourier series expansion is stated in terms of this wider concept of function. The coefficients of a Fourier series, as shown by the Euler-Lagrange result, are obtained by integration and not by differentiation as in the case of the Taylor series. Each Fourier coefficient A_n is obtained by integrating f(x) sin nx over the entire range. Thus any modification of f(x) in a limited portion of the range changes all of the Fourier coefficients. It follows that the interconnections operate in the Fourier series in a global sense and not in a local sense as in the case of the Taylor series. It is the behavior of f(x) in the large that matters in the case of the Fourier series, and not so much the behavior in the vicinity of a point. How can we resolve the differences between these two types of expansions: the Taylor series, which is the expansion about a point which gives strict predictions a finite distance from the point, and the Fourier series, which is an expansion in the large and which gives knowledge of the function in the entire range?


The Taylor series requires unlimited differentiability at a point, whereas the Fourier series does not demand any differentiability properties whatever. Surprisingly enough, the chasm between the Taylor series and the Fourier series is bridged by means of the z-transform, which is the fundamental transform used in the theory of digital signal processing. Let us consider an analytic function f(z) of the complex variable z.

We now expand the function in a Taylor series (in the variable z^(-1)) about the point z^(-1) = 0, to obtain the z-transform

f(z) = Σ_{n=0}^∞ a_n z^(-n).

The radius of convergence of this series extends from z^(-1) = 0 to the first singular point, say, z_0^(-1). A singular point is a point where the function ceases to be analytic. The region of convergence of the Taylor series expansion of f(z) is the region in the z-plane outside the circle of radius |z_0|; that is, the region of convergence is for all points z such that |z^(-1)| < |z_0^(-1)| or equivalently |z| > |z_0|. Let us now write the Taylor series expansion for points on the unit circle z = cos θ - j sin θ. We have

f(cos θ - j sin θ) = Σ_{n=0}^∞ a_n e^{jnθ}

which is in the form of a complex Fourier series in the angle θ. Three cases can occur. In the first case, the singular point z_0 is inside the unit circle in the z-plane. In this case, the function is analytic on the unit circle and the Fourier series thus is an analytic representation of this analytic function. The French Academy believed this was the only case. In the second case, the singular point z_0 is outside the unit circle. In this case, the Taylor series does not represent the function, and so we will not consider the case further. The third case is the interesting one, and is the case which resolves the mathematical controversy which led up to Fourier's discovery in 1807. When the singular point z_0 lies on the unit circle, the Taylor series will not converge at some or all of the points on the unit circle. Thus the Taylor series defines an analytic function, which is differentiable to any order outside the unit circle, but the function becomes nonanalytic at some or all of the points on the unit circle. The Fourier series in θ is the Taylor series for z on the unit circle, and thus the Fourier series represents a function in the variable θ, which is nonanalytic at some or all of the points in its range -π ≤ θ < π. A small modification of the Fourier coefficients that would move the singular point z_0 from on the unit circle to just inside the unit circle would change a nonanalytic Fourier representation to an analytic one. The amazing thing is that it is enough to move the singularity from the periphery of the unit circle to the inside by an arbitrarily small amount, in order to change the given nondifferentiable function in θ to one which can be differentiated any number of times. Thus the mistake of the great French mathematicians of the prestigious French Academy who wanted to restrict the validity of Fourier series to analytic functions depended entirely on that extremely small but finite distance from a point on the periphery to a point just inside the unit circle. A function can be extremely smooth right up to the unit circle, and then disintegrate into a rough and distorted


image of its former self once it is on the unit circle. The Taylor series breaks down on the unit circle, but its counterpart, the Fourier series in θ, is still valid. The theorem of Fourier is true; science could blossom.
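The role played by the location of the singular point can be seen in a toy computation (our own, in Python; the geometric coefficients a_n = ρ^n are an assumed example, not from the paper). With the singular point z_0 = 1 on the unit circle the partial sums grow without bound at θ = 0, while moving the singularity an arbitrarily small distance inside the circle makes them settle at the finite value 1/(1 - ρ).

```python
import numpy as np

def partial_sum(rho, theta, N):
    # Partial sum of f(z) = sum over n of rho^n z^(-n) = 1/(1 - rho/z),
    # evaluated on the unit circle z = cos(theta) - j sin(theta), i.e. z^(-1) = exp(j*theta).
    n = np.arange(N)
    return np.sum(rho**n * np.exp(1j * n * theta))

for N in (10, 100, 1000, 10000):
    on_circle   = partial_sum(1.0,   0.0, N)   # singular point z0 = 1 lies on the unit circle
    just_inside = partial_sum(0.999, 0.0, N)   # singularity moved slightly inside the circle
    print(N, abs(on_circle), abs(just_inside))
```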

V. THE STURM-LIOUVILLE SPECTRAL THEORY OF DIFFERENTIAL EQUATIONS

Following the great innovation of Fourier in 1807, the remarkable properties of Fourier series were gradually developed throughout the nineteenth century and into the twentieth century. The Fourier series as introduced by Fourier is an expansion in terms of cosines and sines, which represent an orthogonal set of functions. However, there are many other sets of orthogonal functions, and so today any such expansion in terms of orthogonal functions is called a Fourier series. As we will see, some sets of orthogonal functions can be stochastic, and it turns out that the corresponding Fourier series play an important role in statistical spectral analysis. First, however, let us look at the important generalizations made by the French mathematicians Charles Sturm (1803-1855) [7] and Joseph Liouville (1809-1882) [8] in the decade of the 1830's. Let us now briefly look at the Sturm-Liouville theory of differential equations. The vibration of any infinitely long right circular cylinder of radius one can be described by a second-order differential equation. Let us consider a simple case, namely, the differential equation (the one-dimensional Helmholtz equation)

u''(x) + k^2 u(x) = 0.

The Helmholtz equation can be obtained by taking the temporal Fourier transform of the wave equation, which set off the search for the theory of Fourier. Here k is the wavenumber, which is equal to ω/c where ω is the temporal frequency. In the Helmholtz equation, k^2 is some undetermined parameter. The variable x is the central angle of the cylinder, and so x lies in the range -π to π. Because the points x = -π and x = π represent the same point on the cylinder, we must have the two boundary conditions

u(-π) = u(π)
u'(-π) = u'(π).

The general solution of the differential equation is

u(x) = A cos kx + B sin kx.

The two boundary conditions restrict the choice of the parameter k^2 to the discrete set of values

k^2 = 0, 1^2, 2^2, 3^2, ...

which are called the eigenvalues of the Helmholtz equation. The corresponding solutions of the equation, namely, the functions

u_k(x) = A cos kx + B sin kx

are called the eigenfunctions. These eigenfunctions are the cosine and sine functions which Fourier had used to construct his Fourier series. These functions represent the characteristic vibrational modes of the cylinder, which can only vibrate in this discrete set of wavenumbers k = 0, 1, 2, .... Thus the Sturm-Liouville theory has given the answer to why the discrete set of cosine and sine functions were the correct ones for Fourier to use in a problem which stemmed from the wave equation.
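A quick numerical check of this statement (our own sketch in Python, using an assumed finite-difference discretization that is not in the paper): the eigenvalues of -d^2/dx^2 on the circle, with the periodic boundary conditions above, cluster at k^2 = 0, 1, 1, 4, 4, 9, 9, ..., each nonzero value appearing twice because both cos kx and sin kx are eigenfunctions.

```python
import numpy as np

N = 400                          # grid points covering the central angle, period 2*pi
dx = 2.0 * np.pi / N

# Periodic second-difference approximation of -d^2/dx^2.
A = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / dx**2
A[0, -1] = A[-1, 0] = -1.0 / dx**2    # wrap-around terms enforce u(-pi) = u(pi), u'(-pi) = u'(pi)

eigenvalues = np.sort(np.linalg.eigvalsh(A))
print(np.round(eigenvalues[:9], 3))   # approximately 0, 1, 1, 4, 4, 9, 9, 16, 16
```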

Furthermore, the Sturm-Liouville theory gives us added insight to spectral analysis and, in fact, is the foundation of the spectral theory of differential equations. Most of the eigenvalue problems of mathematical physics are characterized by differential operators H of the form

Hu = (d/dx)[A(x) du/dx] + B(x) u.

The physical problems we consider require that the function A(x) be positive within the given interval. Let us now form the operation vHu - uHv, which is

vHu - uHv = (d/dx)[A(x)(vu' - uv')].

We notice that the right-hand side is a total derivative, and so we have

∫_a^b (vHu - uHv) dx = [A(x)(vu' - uv')]_a^b.

Any differential operator H which allows the transformation of such an integral (as on the left) into a pure boundary term (as on the right) is called self-adjoint. Thus the Sturm-Liouville operator H is self-adjoint. Often we may prescribe boundary conditions so that the right-hand side vanishes; such boundary conditions are called self-adjoint. We then have a self-adjoint problem, namely, a problem characterized by a self-adjoint operator H and self-adjoint boundary conditions. We then have the identity in the functions u(x) and v(x) given by

∫_a^b (vHu - uHv) dx = 0

which is called Green's identity. The eigenvalue problem associated with the self-adjoint operator H starts with the differential equation

Hφ = λφ.

A solution satisfying the boundary conditions does not exist for all values of λ, but only for a certain selected set λ_i called the eigenvalues. This set consists of an infinite number of eigenvalues λ_i which are all real and which tend to infinity with i. We generally arrange these eigenvalues in increasing order to obtain the infinite sequence (called the spectrum)

λ_1, λ_2, λ_3, ...

together with the corresponding eigenfunctions φ_1, φ_2, φ_3, .... We now consider two different eigenvalues λ_j, λ_k and their corresponding eigenfunctions φ_j, φ_k. If we substitute u = φ_j and v = φ_k into Green's identity, we obtain

∫_a^b (λ_j φ_j φ_k - λ_k φ_k φ_j) dx = 0

which gives the orthogonality condition

∫_a^b φ_j(x) φ_k(x) dx = 0,  for j ≠ k.

By normalization, we can require that

∫_a^b φ_k^2(x) dx = 1

so that the eigenfunctions form an orthonormal set. The orthonormal property can be written more concisely as

∫_a^b φ_j(x) φ_k(x) dx = δ_jk

where δ_jk is the Kronecker delta function.

Let us now represent an arbitrary function f(x) in the form of the infinite expansion

f(x) = Σ_{k=1}^∞ c_k φ_k(x).

As we have previously mentioned, such an expansion is called a Fourier series in honor of the pioneering work of Fourier. The Fourier coefficients c_k are obtained by multiplying both sides by φ_j(x) and integrating. The result is

c_j = ∫_a^b f(x) φ_j(x) dx.

Under certain general conditions, it can be shown that the orthonormal set is complete, so that the above Fourier expansion actually converges to the function f(x). Suppose now that f(x) is the solution to the inhomogeneous differential equation

Hf(x) = p(x).

In terms of linear system theory, p(x) is the input and f(x) is the output. Now substitute u = f and v = φ_k into Green's identity. We obtain

∫_a^b (φ_k Hf - f Hφ_k) dx = 0

which is

∫_a^b (φ_k p - λ_k f φ_k) dx = 0.

The above equation can be written as

∫_a^b f φ_k dx = (1/λ_k) ∫_a^b φ_k p dx.

We recognize the left-hand side as the expression for the Fourier coefficient c_k. Thus

c_k = (1/λ_k) ∫_a^b φ_k(ξ) p(ξ) dξ.

We now substitute this expression for c_k into the Fourier series to obtain

f(x) = Σ_{k=1}^∞ φ_k(x) (1/λ_k) ∫_a^b φ_k(ξ) p(ξ) dξ = ∫_a^b p(ξ) [Σ_{k=1}^∞ φ_k(x) φ_k(ξ)/λ_k] dξ.

If we denote the expression in brackets by G(x, ξ), then this equation is

f(x) = ∫_a^b p(ξ) G(x, ξ) dξ.

This is the integral form of the input-output relationship, and we recognize

G(x, ξ) = Σ_{k=1}^∞ φ_k(x) φ_k(ξ)/λ_k

as the impulse response function or Green's function (under the given boundary conditions), a concept originated by George Green (1793-1841) [9]. This equation exhibits the impulse response function of a linear system in terms of its spectrum λ_1, λ_2, λ_3, .... We can confirm that the Green's function is indeed the impulse response by setting the input p(x) equal to the impulse δ(x - x_0). Then the output is

∫_a^b δ(ξ - x_0) G(x, ξ) dξ = G(x, x_0)

and so G(x, x_0) represents the output at x due to an impulse at x_0. Since the differential equation represents an input-output system, we see that the Green's function satisfies

H G(x, x_0) = δ(x - x_0).

This equation shows that the Green's function G(x, x_0) is the inverse of the differential operator H. We have thus reviewed the spectral theory of differential operators, and now we can look at the most spectacular application of spectral estimation: quantum physics.
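As a numerical illustration of this modal formula for the Green's function (our own sketch in Python; the operator -d^2/dx^2 with fixed ends and its finite-difference discretization are assumptions made for the example), we can assemble G from the eigenvectors of a discretized self-adjoint operator and check that applying H to each column of G returns a discrete impulse.

```python
import numpy as np

# Discretize H = -d^2/dx^2 on (0, pi) with u(0) = u(pi) = 0, a self-adjoint problem.
N = 200
x = np.linspace(0.0, np.pi, N + 2)[1:-1]          # interior grid points
dx = x[1] - x[0]
H = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / dx**2

lam, phi = np.linalg.eigh(H)                      # eigenvalues and orthonormal eigenvectors
phi = phi / np.sqrt(dx)                           # normalize so that sum of phi_j*phi_k*dx = delta_jk

# Modal formula: G(x, xi) = sum over k of phi_k(x) phi_k(xi) / lambda_k
G = (phi / lam) @ phi.T

# Applying H to each column of G gives a discrete impulse (height 1/dx at x0, zero elsewhere).
print(np.allclose(H @ G, np.eye(N) / dx))         # True
```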

VI. SCHRÖDINGER SPECTRAL THEORY OF THE ATOM

The Sturm-Liouville theory of the expansion of functions in terms of orthogonal functions found numerous physical applications in the work of Lord Rayleigh (1842-1919). Such expansions occur throughout the study of the elastic vibrations of solids and in the theory of sound. In the history of physics, a decisive breakthrough occurred when Erwin Schrödinger (1887-1961) [10] showed in 1926 that the vibrations occurring within the atom can be understood by means of the Sturm-Liouville theory. Let us now explain how the wave mechanics of Schrödinger describes the spectral lines of the atom. An equivalent matrix mechanics was formulated a year before Schrödinger by Werner Heisenberg (1901-1976) [11].

Before quantum theory, classical physics was at an impasse. It could not explain the existence of atomic spectra. For example, the bright yellow spectral line of sodium discovered by Bunsen means that the radiation of its atoms produces a discrete frequency ω_0. If we assume that this line is emitted by an electron, then the laws of classical physics state that such an electron should emit not a discrete line at ω_0, but a whole spectrum of lines at all frequencies ω, and with no discontinuities in the spectrum. That is, classical physics predicts that the spectrum of an electron should be continuous as is the spectrum of the sun. Yet Bunsen observed the discrete spectrum of sodium as evidenced by the bright yellow line. (As we will soon see, this line observed by Bunsen is actually a doublet, which Bunsen was unable to resolve with the means available to him.)

Quantum mechanics allows us to see the atom from a new point of view. Quantum mechanics says that atomic electrons jump from one energy state to another, and that the difference of these energies is embodied as a quantum of electromagnetic energy, the photon. If the energy diminishes, a photon is born. If the energy increases, a photon or a quantum of energy from some other field has been absorbed just before the jump. In quantum mechanics, an electron is represented by a probability density function. (The probability density function is found as the squared magnitude |ψ|^2 of a probability wave function ψ.) An electron jump has a probability that depends upon the shapes of the probability density functions that correspond to the electron prior to and after the jump. The probability of a jump is, generally speaking, greater for the stronger overlapping or deeper interpenetration of these probability density functions. The laws that divide electron transitions in atoms into more probable and less probable ones are called selection rules. It is in this jumping of electrons that photons are born. These photons enter a spectroscope, get sorted into types, and produce the spectral lines. The more photons that an atom emits in a second, the brighter the spectral lines. If the number of atoms remains constant, then the brightness of the spectral lines depends upon the statistical frequency of electron jumps in the atoms. And this statistical frequency is determined by the probability distribution of jumps. It is in this way that an atomic spectrum consisting of a number of lines of different brightnesses is generated.

One can make the observation that the spectrum estimation problem (the subject matter of this special issue of Proceedings of the IEEE) is not central to the spectral representation in quantum mechanics. This situation was brought forcibly to the writer's attention several years ago at the U.S. Air Force Geophysics Library at Hanscom Field, MA, which is one of the best scientific libraries in the world. The many shelves devoted to "spectra" consisted of a mixture of both kinds of books, but no book devoted to a discussion of the relationship between the two areas of spectral theory.

Spectrum estimation in quantum mechanics is based on the edifice of spectroscopy, which is an instrumentational science. In 1891, the physicist A. A. Michelson developed an interferometer, a device producing the superposition of a light signal on top of itself with a prescribed delay. In one series of experiments, Michelson first bandpass filtered a light signal by passing it through a prism. He then used the interferometer to measure the visibility of the superimposed signal as a function of delay. The resulting curve was the autocovariance function of the original signal. Michelson then used a mechanical harmonic analyzer to compute the Fourier transform of the visibility curve; that is, he estimated the power spectrum of the signal. Michelson's experiments were done to examine the fine structure of spectral lines of light. Thus in those early days, the present day dichotomy of spectrum estimation had not yet materialized.
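Michelson's two-step procedure, estimate the autocovariance first and then Fourier transform it, is easy to mimic on a computer. The sketch below (our illustration in Python; the two-tone "doublet" test signal and every parameter value are assumptions for the example, not data from the paper) recovers two closely spaced line frequencies from the autocovariance of a noisy record.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(20000)
# A synthetic "doublet": two closely spaced line frequencies buried in noise.
x = np.cos(0.30 * n) + np.cos(0.32 * n) + rng.standard_normal(n.size)
x = x - x.mean()

# Step 1 (the interferometer's role): estimate the autocovariance at lags 0..maxlag.
maxlag = 2000
acov = np.array([np.dot(x[: x.size - k], x[k:]) / x.size for k in range(maxlag + 1)])

# Step 2 (the harmonic analyzer's role): Fourier transform the autocovariance.
omega = np.linspace(0.0, np.pi, 4000)
spectrum = acov[0] + 2.0 * np.cos(np.outer(omega, np.arange(1, maxlag + 1))) @ acov[1:]

# The two lines of the doublet stand out as the dominant spectral peaks.
peaks = [omega[i] for i in range(1, omega.size - 1)
         if spectrum[i - 1] < spectrum[i] > spectrum[i + 1]
         and spectrum[i] > 0.5 * spectrum.max()]
print(np.round(peaks, 3))   # approximately 0.300 and 0.320 radians per sample
```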

The technique of spectral analysis in physics developed rapidly in the twentieth century, and the instruments became more powerful and sensitive. The spectroscopists came up with the following question for theoreticians, namely, the question of why spectral lines are somewhat fat, not infinitesimally thin. It was recognized that a photon corresponds to a line at one frequency ω. The question was why the lines on a photographic plate of a spectroscope come out somewhat broadened, not slender. The answer was found in the wave property of the electron and the Heisenberg uncertainty principle. The initial energy of an electron in an atom refers to a stationary state, and so does the final energy. However, an electron jump is in violation of some steady state. As soon as this occurs, the Heisenberg principle takes over. If we let Δt designate the lifetime of an electron between jumps, then the uncertainty of photon energy is ΔE ≈ h/Δt, where h is Planck's constant. Using Planck's formula for energy quanta, the uncertainty ΔE of the energy is proportional to the uncertainty Δω of the frequency of the photon

ΔE = (h/2π) Δω.

Thus the spectral lines have a width Δω which is inversely proportional to the time of the "settled life" of the electron in the atom

Δω ≈ 2π/Δt.



In other words, the more "settled" or quiescent the life of the electron in the atom, the narrower the spectral lines. That is why at high temperatures and pressures, when many of the atomic electrons are unsettled, the spectral lines broaden out and become smeared. Thus an individual spectral line has a finite width associated with thermal motion and collision broadening. This is not only important in physics, but it relates very importantly to the topic of spectrum estimation in this special issue. Real "lines" have finite width. This means that real lines behave like narrow-band noises and not like either single frequencies or a constant-amplitude lightly frequency-modulated signal.

Let us now return to the discussion of the yellow sodium line which Bunsen observed. The sodium D line is a doublet. Moreover the sodium spectrum contains four lines in the visible range, and two more in the near ultraviolet, strong enough to be useful for analytic chemistry. The sodium spectrum contains 29 lines of astrophysical interest between the D lines and 4390 Å (still in the visible). We might say that Bunsen over a century ago was performing spectrum estimation. He was unable to resolve the two frequencies present in the doublet, even as today a person doing spectrum estimation might have the same problem in some other situation. Also Bunsen missed the many other lines in the sodium atom, even as today a person doing spectrum estimation might not find some features without the use of modern techniques. As spectroscopic instruments became better, these lines were discovered. Now another question, however, has come up. Many spectral lines, which, it would seem, should correspond to a single frequency, actually turned out to be the states of a number of very close-lying lines. The fact that the sodium D line is a doublet is a case in point. The fine structures of spectral lines (doublets, etc.) were revealed only because of the great advances in spectral techniques. In turn, electron spin was discovered in order to explain these "fine qualities" in spectra. Let us briefly give the reason. When spectra are generated, the states of two electrons with opposite spins can have slightly different energies. As a result, the spectral line is doubled; in place of one line we have twin lines with identical brightnesses. Such twins are usually born only when the outer electron shell has one electron. If the number of electrons in this shell increases, we can have triplets and even larger families of the former spectral line.

Let us now consider the quantum mechanical formulation of the harmonic oscillator problem. In terms of the nondimensional displacement x, the time-independent Schrödinger equation is

Hψ = λψ

where H is defined as the differential operator

H = -d^2/dx^2 + x^2

and λ is defined as

λ = 2E/(ħω_0).

Here ψ is the probability wave function, the constant E is the energy, h = 2πħ is Planck's constant, and the constant ω_0 is the natural frequency. The problem of finding the probability wave function ψ is a Sturm-Liouville problem. The solution gives the eigenvalues as 1, 3, 5, 7, ..., and so we write

λ_k = 2k + 1,  for k = 0, 1, 2, ....

Thus the eigenenergies are

E_k = (1/2) ħω_0 λ_k = ħω_0 (k + 1/2),  for k = 0, 1, 2, ....

The corresponding eigenfunctions are

ψ_k = C_k h_k(x) e^{-x^2/2},  for k = 0, 1, 2, ...

where C_k is a normalization constant, and h_k(x) is the Hermite polynomial of order k. The discrete set of eigenenergies E_0, E_1, E_2, ... represent the discrete lines observed in the spectrum. Thus quantum mechanics, through the use of Sturm-Liouville theory, is able to explain the existence of atomic spectra. However, certain mathematical difficulties remained; the history of their resolution is given in the next section.
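To see these eigenvalues emerge numerically, here is a short sketch (ours, in Python, using an assumed finite-difference discretization on a truncated interval; it is meant only to display the Sturm-Liouville character of the problem). The lowest eigenvalues of -d^2/dx^2 + x^2 come out close to 1, 3, 5, 7, ....

```python
import numpy as np

# Discretize H = -d^2/dx^2 + x^2 on a large but finite interval.
N = 1000
x = np.linspace(-10.0, 10.0, N)
dx = x[1] - x[0]

second_derivative = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
                     + np.diag(np.ones(N - 1), -1)) / dx**2
H = -second_derivative + np.diag(x**2)

eigenvalues = np.linalg.eigvalsh(H)
print(np.round(eigenvalues[:5], 4))   # approximately 1, 3, 5, 7, 9
```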

VII. THE VON NEUMANN SPECTRAL REPRESENTATION THEOREM

In finite-dimensional space, the following eigenvalue problem is posed. Given a Hermitian matrix H, find all column-vector solutions ψ of the characteristic equation

Hψ = λψ

where λ is a constant also to be determined. That is, given H, find ψ and λ. The solutions φ_1, ..., φ_n are called the eigensolutions (assumed to be normalized), and the corresponding real numbers λ_1, ..., λ_n are called the eigenvalues of the matrix H. The totality of the eigenvalues λ_1, λ_2, ..., λ_n, in order of increasing magnitude, is called the spectrum. Now write the eigenequations

Hφ_k = λ_k φ_k  (for k = 1, ..., n)

in the form of the matrix equation

HU = UΛ.

Because the eigensolutions are orthonormal, the matrix U (which has the eigensolutions as its columns) is unitary, i.e.,

U U^T = I

where I is the identity matrix. (The superscript T indicates complex conjugate transpose.) The matrix Λ is a diagonal matrix, with the spectrum along its diagonal. Thus this eigenvalue problem can be described as the problem of finding a unitary matrix U that reduces H to a real diagonal matrix,


i.e.,

U^{-1} H U = Λ.

(Note: In case H is real, then H is a symmetric matrix and U is an orthogonal matrix.) Although the unitary matrix U, whose columns are the eigensolutions φ_i, is not uniquely determined by H, John von Neumann [12] in 1929 exploited the unitary nature of U to reformulate the eigenvalue problem. The von Neumann reformulation, which is called the spectral representation problem, yields the same results as the eigenvalue problem in finite-dimensional space, but has the advantage that it can be extended to Hilbert space. We recall that the diagonal matrix Λ is defined to be the matrix with the eigenvalues, ordered by increasing magnitude, along its main diagonal and zeros off the diagonal. Because of this ordering, the matrix Λ is uniquely determined for any given Hermitian matrix H. Because some eigenvalues may be repeated, let us relabel them as λ_1, λ_2, ..., λ_m (with m ≤ n), where each λ_i is now distinct. Consequently for a given H, we have the unique decomposition

Λ = λ_1 Q_1 + λ_2 Q_2 + ... + λ_m Q_m

where Q_i is a diagonal matrix with 1's in those places on its main diagonal in which λ_i occurs in Λ and 0's elsewhere. The sum of the Q_i gives the identity matrix

I = Q_1 + Q_2 + ... + Q_m.

We now define the matrix P_i as

P_i = U Q_i U^{-1}  (for i = 1, 2, ..., m).

A projection matrix is defined as a Hermitian and idempotent matrix. Because Q_i is Hermitian (Q_i = Q_i^T) and idempotent (Q_i Q_i = Q_i), it follows that Q_i is a projection matrix. Because P_i is Hermitian (P_i = P_i^T) and idempotent

P_i P_i = U Q_i U^{-1} U Q_i U^{-1} = U Q_i Q_i U^{-1} = P_i

it follows that P_i is a projection matrix. Since for i ≠ j

P_i P_j = U Q_i U^{-1} U Q_j U^{-1} = 0

it follows that P_i + P_j is a projection matrix and the space spanned by P_i is orthogonal to the space spanned by P_j. Let us now define the function X(λ) of the continuous variable λ as

X(λ) = P_1 δ(λ - λ_1) + P_2 δ(λ - λ_2) + ... + P_m δ(λ - λ_m).

This function is the continuous representation of the suite of projection matrices P_1, P_2, ..., P_m. We now consider the quadratic form uHv where u is a row vector and v is a column vector. We have

uHv = uUΛU^{-1}v = uU(λ_1 Q_1 + λ_2 Q_2 + ... + λ_m Q_m)U^{-1}v
    = u(λ_1 P_1 + λ_2 P_2 + ... + λ_m P_m)v
    = λ_1 uP_1v + λ_2 uP_2v + ... + λ_m uP_mv.

The essence of the von Neumann spectral representation lies in the fact that the components uP_iv are numerically invariant for given u, H, and v. In this way, the nonuniqueness of the unitary matrix U appearing in the eigenvalue decomposition is bypassed. We see that we can write the quadratic form as the integral

uHv = ∫_{-∞}^{∞} λ uX(λ)v dλ.

This equation represents the von Neumann spectral representation of the Hermitian matrix H. Let us now analyze this equation. If we strip the u and v from this equation, we are left with

H = ∫_{-∞}^{∞} λ X(λ) dλ

which, in matrix notation, is

H = λ_1 P_1 + λ_2 P_2 + ... + λ_m P_m.

We can write the row vector u as

u = uI = u(P_1 + P_2 + ... + P_m)

which is

u = uP_1 + uP_2 + ... + uP_m.

Finally, we can write

Hv = ∫_{-∞}^{∞} λ X(λ)v dλ

which is

Hv = λ_1 P_1 v + λ_2 P_2 v + ... + λ_m P_m v.

Let us now consider functions of the matrix H. First, we consider the square of H. We have

H^2 = (λ_1 P_1 + ... + λ_m P_m)^2 = λ_1^2 P_1 + ... + λ_m^2 P_m = ∫_{-∞}^{∞} λ^2 X(λ) dλ.

We see that squaring H results in squaring the λ inside the integral. In general, if we form a function of H, then the result is that the same function of λ is taken within the integral sign; that is

f(H) = ∫_{-∞}^{∞} f(λ) X(λ) dλ.

The above spectral representation was derived for finite-dimensional space, that is, a space in which the elements u and v are vectors and the Hermitian operator H is a matrix. One of the major achievements of von Neumann was the development of the concept of the infinitely-dimensional space, which he called Hilbert space in honor of the great mathematician David Hilbert (1862-1943). We now let u and v represent elements in Hilbert space, and let H represent a Hermitian operator. A Hilbert space is characterized by an inner product (or dot product). The inner product of the elements u and v is denoted by (u, v). If we let H operate on the element v, we obtain a new element Hv. The inner product of the elements u and Hv is denoted by (u, Hv). This inner product is the counterpart of the quadratic form uHv in finite-dimensional space. Once we establish this connection, it turns out that the von Neumann spectral representation has exactly the same form in Hilbert space as it does in finite-dimensional space. Thus in Hilbert space, we also have an operator X(λ), which is the continuous representation of the suite of projection operators associated with the Hermitian operator H. Whereas in finite-dimensional space, we made use of the quadratic form uX(λ)v, we now make use of its counterpart (u, X(λ)v) in Hilbert space. Thus the von Neumann spectral representation in Hilbert space is

(u, Hv) = ∫_{-∞}^{∞} λ (u, X(λ)v) dλ.
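A compact numerical check of the finite-dimensional representation (our own sketch in Python; the particular 3-by-3 matrix is a made-up example): build one projection matrix per distinct eigenvalue of a Hermitian matrix and verify that I = ΣP_i, H = Σλ_i P_i, and f(H) = Σf(λ_i) P_i for f(λ) = λ^2.

```python
import numpy as np

# A small Hermitian matrix with a repeated eigenvalue.
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

eigvals, U = np.linalg.eigh(H)                  # columns of U are orthonormal eigensolutions
distinct = np.unique(np.round(eigvals, 10))     # the distinct eigenvalues lambda_1, ..., lambda_m

# One projection matrix P_i per distinct eigenvalue.
projections = [U[:, np.isclose(eigvals, lam)] @ U[:, np.isclose(eigvals, lam)].T.conj()
               for lam in distinct]

print(np.allclose(sum(projections), np.eye(3)))                                   # I = P_1 + ... + P_m
print(np.allclose(sum(l * P for l, P in zip(distinct, projections)), H))          # H = sum lambda_i P_i
print(np.allclose(sum(l**2 * P for l, P in zip(distinct, projections)), H @ H))   # f(H) = sum f(lambda_i) P_i
```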

Let us now look at some history. In general, there is no quadratically integrable solution to the eigenvalue problem in Hilbert space. This circumstance, however, bothered no one working in physics. Wavelet solutions (i.e., quadratically integrable superpositions of eigenfunctions with eigenvalues in a small neighborhood) were used from the start, appearing in the works of de Broglie and Schrödinger from 1924. One of the authors cited in the Reference Section knew von Neumann personally, studied his work assiduously, and certainly regards him as one of the truly great founders of quantum theory. However, there was never a "crisis in physics" that was resolved by the von Neumann spectral representation theorem. Most people doing the practical calculations to be compared with experiment had never heard of the theorem, which was for them at such a high level of abstraction that it had no bearing on what they were doing. Throughout this essay we have traced the development of spectral theory, from the analytic functions of Brook Taylor, to the nondifferentiable functions of Jean Baptiste Joseph de Fourier, and now to the more general operators of Hilbert space. At each stage, these developments were mathematical in nature, but they laid the foundations for subsequent advances in physics. Reasoning in mathematics and reasoning in physics often appear quite different. When a major physical breakthrough occurs, such as in quantum mechanics in the 1920's, and a flood of exciting new physical results come out, certainly the work of mathematicians in establishing existence and uniqueness theorems might seem somewhat irrelevant. For a moment let us go back to Sir Isaac Newton. It is often said that the unique greatness of Newton's mind and work consists in the combination of a supreme experimental with a supreme mathematical genius. It is also often said that the distinctive feature of Newtonian science consists precisely in the linking together of mathematics and experiment, that is, in the mathematical treatment of experimental or (as in astronomy, geophysics, or wherever experiments cannot be performed) observational data. Yet, although correct, this description does not seem to be quite complete; there is more in the work of Newton than mathematics and experiment. There is also a deep intuition and insight in his interpretation of nature. In today's science, specialization has gone far. Physicists use mathematics; they formulate problems, devise methods of solution, and perform long derivations and calculations, but generally they are not interested in creating new mathematics. The discovery and purification of abstract concepts and principles is particularly in the realm of mathematics. John von Neumann (1903-1957) is a prime example of a mathematician doing physics. When he did physics, he thought and calculated like a physicist, only faster. He understood all branches of physics, as well as chemistry and astronomy, but mainly he had a talent for introducing only those mathematical ideas that were relevant to the physics at hand. The introduction of abstract Hilbert space theory in quantum mechanics, chiefly by von Neumann, made possible the construction of a solid theory on the basis of the powerful intuitive ideas of Dirac and other physicists. The physics of quantum theory cannot be mathematically formulated in finite-dimensional space but requires Hilbert space.


After the work of Heisenberg and Schrödinger in 1925 and 1926, there was a crisis in abstract mathematics because the physics of quantum mechanics could not be adequately formulated in terms of the existing mathematical framework. This situation was rectified in 1929 by von Neumann [12] who laid the mathematical foundations of quantum mechanics in terms of Hilbert space. There is an apocryphal story that the young John von Neumann, who was barely past being a teenager, and had not yet earned his doctorate, was lecturing in Göttingen. Of course, most of the famous physicists present regarded his work as too abstract, but the great mathematician Hilbert was in the audience. As the story goes, the elderly Hilbert leaned over and whispered into Professor Courant's ear: "What is this Hilbert space?" Another even more apocryphal story goes as follows. A group of physicists came to von Neumann and described a problem in physics which they could not solve. After thinking for a while, von Neumann in his head came up with the numerical answer which agreed with the experimental result, which the physicists knew but had not told him. They were very impressed and they blurted out "Dr. von Neumann, the general solution involves solving an infinite set of nonlinear partial differential equations. Certainly you have found some mathematical shortcut!" von Neumann answered "No, I solved the infinite set."

von Neumann [13] showed that from a mathematical point of view, it is the spectral representation that is required in quantum mechanics rather than the solution of the eigenvalue problem as such. In this sense, spectral theory represents the key to the understanding of the atom. In fact, von Neumann [13] has shown that the spectral representation enters so essentially into all quantum mechanical concepts that its existence cannot be dispensed with. His establishment of the spectral representation of the Hermitian operator H is one of the great achievements in mathematics, and a milestone in the history of spectral theory.

VIII. EINSTEIN-WIENER THEORY OF BROWNIAN MOTION

A highly interesting kinetic phenomenon known as Brownian movement was first reported in 1827 by the distinguished botanist, Robert Brown, who found that "extremely minute particles of solid matter when suspended in pure water exhibit motions for which I am unable to account and which, from their irregularity and seeming independence, resemble in a remarkable degree, the less rapid motions of some of the simplest animalcules of infusions." This type of irregular zigzag movement is typified by the dancing of dust particles in a beam of light. The cause of Brownian movement was long in doubt, but with the development of the kinetic theory of matter came the realization that the particles move because they are bombarded unequally on different sides by the rapidly moving molecules of the fluid in which they are suspended. The Brownian movement never ceases. The detailed physical theory of Brownian movement was worked out in 1904 by M. von Smoluchowski [14], and in a more final form in 1905 by Albert Einstein [15]. In 1923, Norbert Wiener [16] developed the mathematical theory of Brownian movement, which today is the basis of the mathematical model of white noise in continuous time. White noise is defined as a stationary random process which has a constant spectral power density. The concept of the white noise process, as given by the Einstein-Wiener theory of Brownian motion, is important in all theoretical studies of spectrum analysis.
In practice, a signal is of finite duration, and usually can be digitized on a grid fine enough for interpolation to be adequate. In this sense, the set of data representing a signal is really finite.


Accordingly, we do not have to go to continuous time or to infinite time unless 1) we so wish or 2) we gain from it. In other words, as long as we stay finite, we do not need the Einstein-Wiener theory. With this caveat emptor, let us now discuss this theory.

A white noise process in continuous time cannot be represented by the ordinary types of mathematical functions which one meets in calculus. Instead, white noise can only be represented by what mathematicians call a generalized function. The most familiar example of a generalized function is the Dirac delta function, which is often defined as

δ(t - t_0) = 0 for t ≠ t_0, and δ(t - t_0) = ∞ for t = t_0, with ∫_{-∞}^{∞} δ(t - t_0) dt = 1.

The most important property of the delta function is its sifting property, that is, its ability to isolate or reproduce a particular value of an ordinary function f(t) according to the convolution formula

∫_{-∞}^{∞} f(t_0 - t) δ(t) dt = f(t_0).

If one feels uncomfortable with generalized functions, then one can often avoid them by using Lebesgue-Stieltjes integrals. For example, the Heaviside step function H(t) is an ordinary function equal to zero for t < 0 and to one for t ≥ 0. Since

dH(t) = δ(t) dt

the above convolution formula becomes the Lebesgue-Stieltjes integral

∫_{-∞}^{∞} f(t_0 - t) dH(t) = f(t_0).

This Lebesgue-Stieltjes integral involves only ordinary functions. Let us now look at a white noise process which we denote by ε(t). It is a generalized random function. Again let f(t) be an ordinary function, and consider the convolution integral

∫_{-∞}^{∞} f(t - t_0) ε(t) dt.

Let β(t) be the integrated white noise process, so that we may write

dβ(t) = ε(t) dt.

The integrated white noise process β(t) is an ordinary random function, and the above convolution becomes the Lebesgue-Stieltjes integral

∫_{-∞}^{∞} f(t - t_0) dβ(t).

Wiener formulated everything in terms of Lebesgue-Stieltjes integrals with ordinary functions. However, we are going to take a strictly engineering approach and formulate things in terms of ordinary integrals, but with generalized functions. Without loss of generality in the discussion which follows, we can for convenience let t_0 = 0, so that the integral in question becomes

∫_{-∞}^{∞} f(t) ε(t) dt.

As is usual statistical practice, let E denote the mathematical expectation operator. Since this operator is linear, it may be interchanged with integral signs (provided certain regularity conditions hold). The expectation of the above integral is

E ∫_{-∞}^{∞} f(t) ε(t) dt = ∫_{-∞}^{∞} f(t) E ε(t) dt.

Because we want white noise to have zero mean, we let E ε(t) = 0, and so the above integral is zero. Let us next consider the variance given by

E [∫_{-∞}^{∞} f(t) ε(t) dt]^2 = E [∫_{-∞}^{∞} f(t) ε(t) dt ∫_{-∞}^{∞} f(τ) ε(τ) dτ]
= ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(t) f(τ) E[ε(t) ε(τ)] dt dτ.

Now we come to the key point. We want white noise to be uncorrelated at two different time points, but at the same time we want the variance of white noise to produce an impulse so as to make the above integral have a nonzero value. Thus the key element is to define the covariance E[ε(t) ε(τ)] as being equal to δ(t - τ). Then the above integral becomes

∫_{-∞}^{∞} ∫_{-∞}^{∞} f(t) f(τ) δ(t - τ) dt dτ = ∫_{-∞}^{∞} f^2(t) dt.

We can therefore make the following definition. A generalized random function ε(t) is white noise provided that E ε(t) = 0 and E ε(t) ε(τ) = δ(t - τ). For a long time such a random process was regarded as improper. As we know, the delta function can be approximated arbitrarily close by ordinary functions. Likewise, the white noise process ε(t) can be approximated arbitrarily close by ordinary random processes. Because one never uses the white noise process in isolation but only in integrals, the white noise process can be avoided by the use of the Lebesgue-Stieltjes integral, just as the Dirac delta function can be so avoided. However, as we have said, we will not follow the Lebesgue-Stieltjes approach here.

Let us now consider white noise ε(n) for discrete (integer) time n. White noise in discrete time is not a generalized random process, for ε(n) is merely a sequence of zero-mean, constant-variance, uncorrelated random variables. However, the Fourier transform of discrete white noise is a generalized random process, which we denote by E(ω). We have

E(ω) = Σ_{n=-∞}^{∞} ε(n) e^{-jωn}   (for -π ≤ ω ≤ π).

We can easily verify that E(ω) has zero mean. The covariance of E(ω) is

E[E*(ν) E(μ)] = Σ_n Σ_k E[ε(n) ε(k)] e^{jνn - jμk}.

Because E ε(n) ε(k) = δ_nk (i.e., the Kronecker delta function, which is one when n = k and zero otherwise), we have

E[E*(ν) E(μ)] = Σ_n e^{-jn(μ-ν)} = 2πδ(μ - ν).

That is, the covariance is a Dirac deltafunction.Thus we come to an important result: The Fourier transform of a white noise process in (infinitely extended) discrete time n is a white noise process in the continuous variable o.(It is easy to show that the correspondingresultholds for the caseof awhite noise process in continuous time.) In other words, the Fourier transform of a very rough (white) process in time is a very rough(white) process infrequency.TheFouriertransform preserves(saves) information and does not smooth (destroy) information. Today this result is second-nature to an engineer, but when Wiener obtained this result in 1923 it was startling. Wiener unlocked the spectral theory of the most random of processes (white noise), and now thestage was set for applying this result to the more smooth processes which are generated bymanyphysicalphenomena. Wiener madethisapplication in 1930 under the name of generalized harmonic analysis, but before wegive its history we will break our train of thought and look at the innovative work of Yule in 1927. Yule’s work at the time seemedmodest. While most mathematicians and physicists weredeveloping generalmethods to dealwith the infinite and the infinitesimal in spectrum analysis, Yulewas developing a simple model with a finite number of parameters to handle spectrum (i.e., a finite parameter model) in order analysis in those cases where this model was appropriate. This model of Yule is known as the autoregressive (AR) process.

IX. YULE AUTOREGRESSIVE SPECTRUM ESTIMATION METHOD At the turn of the twentieth century,

Sir Arthur Schuster

[ 171 introduced a numerical method of spectrum analysis for

empirical time series. Let x ( n ) represent the valueof a time series at discrete (integer) time n. Given N observations of the time series from n = 1 to n = N , then Schuster’s method consisted of computing the periodogrum P(o) defined as 1 ~(o)= Ix(l)e-iw +x(2)e-iw2 N

+

*

*

a

+x(N)e-juN12.

For example, suppose that the time series consists of a sinusoid of frequency oo with superposed errors; then, the periodogram would show a peak at o = oo.Thus by computing the periodogram, the peaks wouldshow the location of the frequencies of the underlyingsinusoidalmotion.Until the work ofYule (1871-1951)in1927 [18], the Schuster periodogram approach was the only numerical method of empirical spectrum analysis. However, manyempirical time series observed in nature yielded a periodogram that was very erratic and did notexhibitanydominant peaks. This led Yule to devise his autoregressive method of spectrum analysis. In those days, empirical spectrum analysis was called the investigation of periodicities in disturbed series. His main application was thedetermination of thespectrum of Wolfer’s sunspot time series. G. Udny Yule in1927introducedtheconcept of afinite parameter model for a stationary random process in his fundamental paper on the investigation of the periodicities in time series with special reference to Wolfer’s sunspotnumbers. If we consider a curve representing a sinusoidal function of time and superpose on the ordinate small random errors, then the only effect is to make the graph somewhat irregular, leaving

895

the suggestion of periodicity s t i l l quite clear to the eye. If the errors are increased in magnitude, the graph becomes more irregular, the suggestionof periodicitymoreobscure,and we have onlysufficiently to increase theerrors to mask completely any appearance of periodicity. But, however large the errors, Schuster’s periodogram analysis is applicable to such a time series, and given asufficient number of observations should yield a close approximation to the period and amplitude of the underlying sinusoidal wave. Yule reasoned inthe following way. Considera case in whichperiodogram analysis is applied to atime series generated by some physical phenomenon in the expectation of elicitingoneormore true periodicities.Then it seemed to Yule that in such a case there would be a tendency to start withthe initial hypothesisthatthetrue periodicitiesare maskedsolelyby additiverandom noise. As we well know, additive random noise does not in any way disturb the steady course of the underlying sinusoidal function or functions. It is true that the periodogram itself will indicate the truth or otherwise of thehypothesis made, but Yulesaw no reason for assuming it to be the hypothesis most likelyu priori. At thispoint, Yule introducedtheconcept of anihputoutput feed-back model, The amplitude of a simple harmonic pendulumwithdamping(indiscreteapproximation) can be represented by the homogeneous difference equation b(n)+u1b(n- l)+uzb(n- 2)=0.

Here b ( n ) is the amplitude at discrete (integer) time n. Errors of observation wouldcause superposed fluctuations on b ( n ) , but Yuleobserved that, byimprovement of apparatusand automatic methods of recording, errors of observation can be practically eliminated: An initial impulse or disturbance would set the pendulum in motion, and the solution of the difference equation wouldgive the impulse response. The initial conditions are b (n) = 0 for n < 0 and b (0) = 1. The characteristic equation of the difference equation is E’ + u l E + u 2 = 0. From physical considerations, we knowthatthe impulse response is a damped oscillation, so the roots El and E2 of the characteristic equation must be complex with magnitude less than one. This condition is equivalent to the condition that uz < 1 and 4u2 - a: > 0. The solution of the difference equation thus comes outto be b (n) = eAn

sin(n+ l ) o o sin a0

where

A = 0.5 In ~2 wo = tan-’ [-a;’

d W .

The damped oscillation b (n) is the impulse response function. Thefrequency 00 is thefundamentalfrequency of the impulse response function. Aswe mentioned above, Yule ruled out superposed errors. Now, however, he allows a driving function (or input)of white noise, which he describes in the following way. The apparatus is left to itself, and unfortunately boys get into the room and start pelting the pendulum with peas, sometimes from one side and sometimes from the other. He states that the motion is now affected, not by superposed fluctuations, but by driving disturbances. As aresult, the graph willbe of anentirely different kind than a graph in the case of a sinusoid withsuper-

896

PROCEEDINGS IEEE,OF THE

posed errors, The pendulumand pea graph will remain surprisingly smooth,but amplitudeand phase will vary continuously, as governed by the inhomogeneous difference equation

was motivated by the work of researchers in optics, especially that of Rayleigh and Schuster. However, Wiener demonstrated that the domain of generalized harmonic analysis was much broader than optics. Among Wiener’s results were the writing down of the precise definitions of and the relationship between the autocovariance function and the power spectrum. Thetheoremthat these two functions make up a Fourier transform pair is today known as the Wiener-Khintchine theorem [221. Mention should be made of the basic fact that the existence of the spectrum follows from the properties of positive definite functions. Bochner’s theorem on the spectral representation of positive definite functions provides adirectmathematical unification of spectral theories in Hilbert space and in stationary time series. The writer several times in the 1950’s discussed with Pre fessor Wiener why his 1930 paper was not more accepted and used by the mathematical profession at the time. As with all things, Wiener looked at history quite objectively and with his characteristic concern for and love of people. In retrospect, it seems it was not until the publication of Wiener’s book Cybernetics (231 in 1948 and also the nonclassified publication of his book Time Series [24] in 1949 that the general scientific community was able t o grasp the overall plan and implications of Wiener’s contributions. The following passage from Wiener’s 1933 book The Fourier Integral [251 indicates the philosophy ofWiener’s thinking and his great personal appeal:

x(n)talx(n- 1)+(12x(n- 2)=€(n) where ~ ( nis) the white noise input. The solution of this difference equation is 00

b ( k ) ~ ( -nk)

x(n) = k=0

where b (k)is the impulse response function given above. Yule thus created a model with a finite number of parameters, namely, the coefficients a l and a2 of the difference equation. Given an empirical time series x(n),he uses the method of regression analysis to fiid these two coefficients. Because he regresses x(n) on its own past instead of on other variables, it is a self-regression or autoregression. The least squares normal equations involve the empirical autocorrelation coefficients of the time series, and today these equations are called the Yule-Walker equations. Yule carried out his autoregressive analysis on Wolfer’s sunspot numbers, which are a sequence of yearly observations of sunspot observations. He used the numbers over the period 1749-1 924 and obtained the autoregressive equation (with the mean value removed)

x ( n ) - 1.34254x(n - 1 ) + 0 . 6 5 5 0 4 x ( n - 2 ) = ~ ( n ) and thus

X = 0.5 In 0.65504 = -0.21 154 wo = 33.963’ per year. Hence, the dominant period is 36O0/wO= 10.60 years. Yule states that his autoregressive method represents an alternative method of estimating the spectrum, as opposed t o the Schuster periodogram. In fact, his autoregressive model gives him an estimate not only of the power spectrum but of an amplitudeand-phase spectrum 1

00

which, for the sunspotnumbers, is

B ( w )=

1 1 - 1,34254 e-iw

+ 0.65504 e-2iw

The magnitude IB(o)l and the phase d ( o ) are given by the equation

The power spectrum is the square of the magnitude spectrum, that is, IB(o)12. The peak is close tothefundamental frequency wo = 33.963’ per year. Exceptinexploration g e e physics [ 19 I , [201, where Yule’s amplitude-and-phase spee trum B(o) is physically the spectrum of the minimum-delay seismic wavelet, Yule’s spectralestimation method received scant attention until the1960’s.

X. WIENER’S GENERALIZEDHARMONIC

ANALYSIS

Norbert Wiener [ 2 11 published in 1930 his classic paper, “Generalized Harmonic Analysis,” whichhe personally considered his finest work. In his introduction, he states that he

VOL. 70, NO. 9 , SEPTEMBER 1982

“Physically speaking, this is the total energy of that portion of the oscillation lyingwithin the interval in question. As this determines the energy-distribution of the spectrum, we may briefly call it the “spectrum.” The author sees no compelling reason to avoid a physical terminology in pure mathematics when a mathematical concept corresponds closely t o a concept already familiar in physics. When a new term is t o be invented to describe anidea new to the pure mathematician, it is by all means better to avoid needless duplication, and t o choose the designation already current,The “spectrum” of this book merely amounts to rendering precise the notion familiar to the physicist, and may as well be known by the same name.” Let us now define a stationary random process. We could either use discrete or continuous time, but for convenience let us use discrete (integer) time n. Let the process be denoted by x(n), which, we will assume, haszero mean. The process is called (second-order) stationary, provided thatitsautoce variance function

@(k) =lTx*(n)x(n

+ k)

depends only upon the time-shift k. Here, as always, the superscript asterisk indicates complex-conjugate. The normalized autocovariance function is called the autocorrelation function. However, Wiener generally used the term autocorrelation for @(k), whether it was normalized ornot, Nevertheless, it is confusing t o keep using the term “autocorrelation” with two different meanings. It is better t o use the term “autocovariance” wherever it is appropriate. A white noise process is stationary. In the case of continuous time, its autocorrelation is 6 ( t ) (the Dirac delta), whereas in the case of discrete time, its autocorrelation is 6 k (the Kronecker delta). As we have seen, the Fourier transform E ( w ) of white noise in time is white in frequency; thatis, the autoconrelation in frequency is the Dirac delta function

OF SPECTRUM ESTIMATION

PECTIVE HISTORICAL ROBINSON:

Theproblemsconfrontingempiricalworkers inspectral analysis in the fmt part of the twentieth century werecentered around the Schuster periodogram. Schuster introduced this concept at the turn of the century, and until Yule’s work in 1927, it was the only method available to carry out empirical spectral analysis. Supposethat weobserve astationary random process for a very long time, so that we obtain a time series x ( n ) for n = 1, 2, * * ,N , where N i s verylarge. Schuster then computed theperiodogram

891

reservations of the mathematicians. It is true that the physicists spoke of 6 ( t ) as an ordinary function, which it cannot be in any precisesense, and that they treated it as an ordinary function by integrating it and even differentiating it. But the physicists were justified because they only used 6 ( t ) inside integralswithsufficiently-differentiablefunctions f ( t ) . For example, the derivative of 6 ( t ) always appeared inside an integral, and the integral was integrated by parts as follows:

I

00

Jm -00

6 ’ ( t )f ( t )d t = -

6 ( t )f ’ ( t )d t = -f’(O).

-00

The physicistsneverused the delta function except to map functions to real numbers. In this sense, they employed the where X ( w )is the discrete Fourier transform machinery but not the words of distributiontheory, which was devised expressly in order t o give delta functions a sound basis. It was the French mathematician L. Schwartz who after n=l World War I1 created a systematic theory of generalized functions and explained it in his well-known monograph Thiorie (Today we can compute X(w) very rapidly by means of the des Distn’butions in 1950 and 195 1. From then on the theory Cooley-Tukey fast Fourier transform, but then it was a forofgeneralized functions wasdeveloped intensivelybymany midable task.) In the case when the stationary process is made mathematicians. This precipitate development of distribution up of sinusoidal waves withsuperimposedwhite noise, the theory received its main stimulusfrom therequirements of periodogram is effective in picking out the discrete frequencies mathematics and theoretical physics, in particular the theory of the sinusoids. Butapurelynondeterministicstationary of differentialequationsand quantum physics,Generalized process is generated by the convolution formula (input-output functions possess a number of remarkable properties that exrelation) tendthecapabilities of classical mathematical analysis, For example,any generalized function turns out to be infiitely differentiable(in the generalized meaning),convergent series k=O of generalized functionsmaybedifferentiatedtermwisean infinite number of times, the Fourier transform of a general(Here we interpret b ( k ) as the impulse response function of a filter, the white noise process E(n) as the input to the filter, izedfunction always exists,and so on. For thisreason, the usesofgeneralized functiontechniquessubstantiallyexpand andthestationary process x ( n ) as the output). For such a the range of problems that can betackledandleads to a p process, the Schuster periodogram P(o) is extremely rough, preciable simplifications that make otherwise difficult operaand oftencannotreadily be interpreted.Empiricalspectral tions automatic, analysiswas at an impasse, Most of the time series observed As scienceadvances, its theoretical statements seem to rein nature could not be analyzed by the methods available in quire anever higher level of mathematics. When he gave his 1930. theoretical prediction of the existence of antiparticles in 1931 Now comes Wiener in1930with generalized harmonic (hoc. Roy. SOC. London, Ser. A, vol. 133, pp.60-72)Dirac analysis, In brief, Wiener in1930knewhow to take the wrote, “It seems likely that this process of increasing abstracFourier transform of a stationary random process, a milestone tion will continue in the future and that advance in physics is in the use of Fourier methods. Wiener’s generalized harmonic to be associated with a continual modification and generalizaanalysis makes use of a generalized random function, namely, tion of the axioms at the base of mathematics rather than with the Einstein-Wiener (whitenoise) process. Inorder toput a logical development of any one mathematical scheme on a Wiener’s work into context, we will now give a small digression fixed foundation.” Subsequent developmentsintheoretical onthe most widely known generalized function:the Dirac physicshave corroboratedthis view. Inthis essay, we have delta function. 
seen that since the time of Newton, the search for and the The impuhe(Diracdelta)function hadbeen knownfor manyyearsprior to its use by Dirac [26]in1928.It was study of mathematicalmodels of physical phenomena have known by Heaviside [27]. However, it took the stature of a made it necessary to resort to a wide range of mathematical development ofvarious great physicist, Paul Dirac, to decree in 1928 the useof the toolsand have thusstimulatedthe areas of mathematics. Now let us return to Norbert Wiener in impulse function in physics. In those early days, people used 1930. to talkabout 6 ( t ) as a function of t in theordinary sense Physicists areconcernedwithunlocking the mysteries of whose integral with f ( t )produces f(0);that is, nature, and the impulse (Dirac delta) function eases their task. The impulse function is the simplest of the generalized functions. Onecanimagine the plight of Wiener in the matha maticalcommunity when he introduced generalized random functionsintothemathematicalliterature as early as 1923, This idea used to cause great distress to mathematicians, some and especially in his 1930 paper. Let us now give the gist ofWiener’s generalizedharmonic of whom even declared that Dirac was wrong despite the fact that he kept getting consistent and useful results, The physi- analysis. As we know,convolution in thetimedomain corcists rejected these extreme criticisms and followed their inttti- responds to multiplication in the frequency domain. Thus in tion. We can now see why the physicists succeeded despite the terms of Fourier transforms, the above input-output convolu00

J-00

PROCEEDINGS IEEE,OF THE

898

VOL. IO, NO. 9 , SEPTEMBER 1982

a calendar, to the work of the great mathematicians who formulated the wave equation in the eighteenth century, it took X(0)= B ( 0 ) E(o). thousands of years.Then the work of Bernoulli,Euler, and Fourier came, and the result was a spectral theory in terms In this equation, E(w) is the Fouriertransform of white of sinusoidal functions, in place at the beginning of the ninenoise, so that E(o) is ageneralizedfunction that is white teenth century. The theory was extended to the case of arbi(i.e., very rough) in frequency.Thefilter’stransferfunctraryorthogonalfunctionsbySturmand Liouville, andthis tion E(o)is a smooth well-behaved (ordinary) function. The led to the greatestempiricalsuccess of spectral analysis yet product X ( w ) is also very rough. inverse Fouriertransformof X(&)). obtained: the physical results of spectral estimation that unLet us nowtakethe locked the secret of the atom. Credit for this result belongs It is to HeisenbergandSchriidingerin1925and1926.Thenin 1929, the work of von Neumann put the spectral theory of 1 = x ( n )= e j w n X ( w ) do the atom on a f i m mathematicalfoundationin his spectral representation theorem. The spectral work of von Neumann represents the cumulation of this line of research in quantum which is physics. Meanwhile, Rayleigh andSchusterat the beginning of the twentieth century were applying the original sinusoidal 1 ” x ( n )= 2.rr e j W ” B ( w )E(w)dw. methods of Fourier to the analysis of datain the realm of classical physics. However, the periodogram approach of This formula represents Wiener’s generalized harmonic analysis Schuster did not work well for purely nondeterministicstationof x ( n ) ; that is, it is the spectral representation of the station- ary random processes, and this led Yule in 1927 to develop a a subclass known as autoregressive proary random process x ( n ) . It involves the smooth filter trans- spectraltheoryfor cesses. Meanwhile, Wiener haddevelopedthemathematical fer function B(w) and the very rough(whiteinfrequency) theory of Brownian movementin1923,andin1930heinprocess E(w). We thus see thatthe spectralrepresentation that is, the spectral requires Wiener’s generalized randomfunction E(o),which troducedgeneralizedharmonicanalysis, representation of a stationary random process. Thus in 1930, came out of his studies of Brownian movement. we have two spectral theories, one represented by the spectral Wiener’s generalized harmonic analysis (i.e., spectral reprerepresentation theorem of von Neumann and the other by the sentation) explains why the periodogram of Schuster did not workforconvolutional processes. Because the periodogram spectralrepresentation theorem of Wiener, It is the purpose of thissection,withthebenefit of hindsight, of course, to (asthe number of observations becomes large)is indicate the relationshipbetween von Neumannand Wiener 1 spectral theories. P(w)= - IX(0)l2 The common ground is the Hilbert space. As we have seen, N the von Neumannresult is thespectralrepresentation of a it follows that the periodogram has the intrinsic roughness of Hermitian operator H in Hilbert space. The Schrodinger equathe X ( w ) process. (It was not until the work of J. Tukey [34] tion is written in terms of a Hermitian operator, and this equain 1949 that a means was found to overcomethisproblem; tion governs the spectrum of atoms and molecules. Now let Tukey’s breakthrough was of epoch proportions.) us, however, leave this Hilbert space and look at another one. 
Wiener in his 1930 paper gave the following method, which The other Hilbert space is one defined by the probability meawas standarduntilthework of Tukeyin1949. Wiener’s sure that governs the stationary random process in question. method was intended for very longtime series. Itconsisted As we know, a Hilbert space is specified by an inner (or dot) of computing the autocovariance function as the time average product. The elements of the Hilbert space are random variables, and the inner product is defied as the expected value given by tion integral becomes

~1,

for - p Q k G p , where p is less than the data length N , and then computing the powerspectrum @(a) as the Fourier transform

k=-p

This Fouriertransformrelationshipbetweenautocovariance and power spectrum, aswe have observed, is now called the Wiener-Khinchin theorem.

Whereas von Neumann’s work in quantum physics in 1929 received instant acclaim and well-deserved recognitionby physicistsandmathematicians, Wiener’s workin 1930 lay dormant. However, now withthebenefit of hindsight, it is worthwhile for us to reconcile these two approaches to spectral estimation. This we will do in the nextsection.

XI.

RECONCILIATION OF THE

Two SPECTRALTHEORIES

We have come a long way in the history of spectral estimation to this point. From the work of the ancients in deriving

(x,y)=Ex*y.

(The superscript asterisk indicates the complex conjugate.) In this Hilbert space, a stationary process is defined as follows. We use discrete (integer) time n , although a similar develop ment can be made in the case of continuous time. A sequence of random variables x ( n ) in Hilbert space is called a stationary random process if its autocorrelation

@(k) = ( x ( n ) , x ( n+ k)) depends only upon the time-shift k and not on absolute time n. This definition implies that the elements x ( n ) of the process are generated recursivelyby a unitary operator; thatis, U x ( n ) = x ( n + 1) so that

x ( n + k) = U k x ( n ) .

Because a unitary operator represents a rotation, we see that a stationary random process traces out a spiral in Hilbert space, the so-called Wiener spiral. We now come to the connection

OF SPECTRUM ESTIMATION

ECTIVE HISTORICAL ROBINSON:

that we are seeking, namely, the fact that the Cayley-Mobius transformation[28] of aHermitian operator is a unitary operator. Thus there is a one-bone correspondence between Hermitian operators and unitaryoperatorsin Hilbert space. The von Neumann spectral representation is for a Hermitian operator. Ifwe takeits Cayley-Mobius transformation, we obtain the corresponding spectral representation for the unitary operator U. This spectral representation has the form

u=-2n1

1, =

e j W Q ( u )d o

where Q(o) represents a family of projection operators as a function of circular frequency o. Thus the process has the representation

1.

1 = x ( n )= U”x(0)= 2n eiw”Q(u)

x(0)

do.

We now make the identification

U u )x (0)= X ( u ) and we obtain x(n)=

,I, 1

=

e’W”X(o) do.

This equation is Wiener’s generalized harmonic analysis of the process. Thus wehave the connection we sought;thetwo spectralrepresentations are related by the Cayley-MBbius transformation.

XII. WIENER-LEVINSONPREDICTIONTHEORY Early in 1940, Wiener became involved in defense work at MIT and, in particular, he became interested in the design of fire-control apparatus for anti-aircraft guns. The problem was to build into the control system of the gun some mechanical device t o aim the gun automatically. The problem, in effect, was made up of two parts: a mathematical part, which consisted of predicting the future position of an airplane from its observed past positions, and an engineering part, which consisted of realizing the mathematical solution in the form of an actual physical device. Wiener recognized that it was not possible t o develop a perfect universal predictor, and so he formulatedthe mathematicalproblemonastatistical basis.He defined the optimum predictor as the one that minimizes the mean-square predictionerror.Theminimization led tothe Wiener-Hopf integral equation, which represented the completion of the mathematical part of the problem. As to the engineering part, Wiener immediately recognized thatit was possible to devise ahardware apparatusthat represents the solution to the Wiener-Hopf equation. As Wiener [29] states in his autobiography (p. 245): “It was not hard to devise a p paratus to realize in the metalwhat we had figured out on paper. All that we had to do was make a quite simple assembly of electric inductances, voltage resistances, and capacitors, acting on a small electric motor of the sort which you can buy from any instrument company.” Wiener’s mathematical results [ 241 were published in 1942 as a classified report to Section D2of the National Defense Research Committee. This report is Wiener’s famous TimeSeries book, which we mentioned previously. Its full title is Extrapolation,Interpolation,andSmoothing of StationaryTimeSerieswith EngineeringApplications, and it was republished as an unrestricted document in 1949 by MIT Press, Cambridge.

899

Although Wiener’s “GeneralHarmonic Analysis” did not have immediateinfluence, his TimeSeries book, which was written in a more understandable style, did among those who had access to the book in 1942 and the general public in 1949. As we will now see, a great deal of credit for the dissemination ofWiener’s ideas belongs t o his former student and his colleague, Professor N. Levinson. Levinson’s initial contact with Wiener was in Wiener’s course in 1933-34 on Fourier Series and Integrals, which is described in Levinson’s own words as follows: “I became acquainted with Wiener in September 1933 while an undergraduate student of electrical engineering, when I enrolled in his graduate course. It was at that time really a seminar course. At that level he was amoststimulating teacher. He would actually carry out his research atthe blackboard. As soon as I displayed a slight comprehension of what he was doing, hehanded me the manuscript of Paley-Wiener for revision. I found a gap in a proof and proved a lemma t o set it right. Wiener thereupon set down at his typewriter, typed mylemma, affixed mynameand sentit off to ajournal.A prominent professor does not often act as a secretary for a young student.”

still

N. Levinson, adynamicand brilliant mathematicianand a warm and kind person, made important and permanent contributions t o engineering and applied science. The Levinson theorem in quantum mechanics illustrates his ability t o grasp the relationship between physical concepts and mathematical structure, Few have thisinsight,andnowhere is itbetter demonstrated than in the two expository papers writtenin 1942 by Levinson soon afterthe restrictedpublication of Wiener’s Time Series book. These two papers were published in 1947 in the Journal of Mathematical Physics, and thus they represented the first public disclosure of Wiener’s time series results. Later these two papers also appeared as Appendices C ofWiener’s book in and B inthe unrestrictedpublication 1949 by MIT Press 1241. An appreciation of Levinson’s contribution can be gained in historical perspective. The 1942 edition of Wiener’s book was bound in a yellow paper cover, and because of its difficult mathematics, it came to be known among engineers as the “yellow peril” (a term familiar to mathematicians as applying to afamous series of advanced texts). However clear in a conceptual way the building of an actual device was t o Wiener, there were few engineers at that time who were able to grasp Wiener‘s mathematical solution, much less to realize it in the form of a physical device. At this point, Levinson stepped in and wrote “A heuristic exposition of Wiener’s mathematical theory of predictionand filtering,” [301, one of his two classic applied papers on explaining Wiener’s work. Levinson describes his paper as an expositoryaccount ofWiener’s theory. Levinson’s earlier training was in electrical engineering, so he understood hardware design methods. In the paper, Levinson shows in an elementary way why the Wiener-Hopf equationcannot be solved by use of theFourier transform theorem,Then, in a natural way, he introduces the spectral factorization and obtains the explicit solution for the prediction operator and, more generally, for the filter operator. This masterpiece of expositionopened up these methods to the engineering profession. Levinson’s other classic applied paper is entitled, “The Wiener RMS (root mean square)errorcriterion in filter design and prediction.” 131I As before, let us try to put thispaper in historical perspective. In 1942, the Army Air Force Weather Division negotiated a contract with MIT t o perform statistical

900

PROCEEDINGS IEEE,OF THE

VOL. 70, NO. 9 , SEPTEMBER 1982

analyses of meteorological and climatological data, particularly siumonAutocorrelation Analysis Applied to Physical P r o b in relationship to weather forecasting, and to conduct research lems” held at Woods Hole, MA in June 1949, sponsored by the into the applicationof statistical techniques to long-range fore- Office of Naval Research, The high point of this meeting was casting [ 321. Professor G.P. Wadsworth of the MIT Mathe- thepaper by Tukey [34], Before Tukey’s work,the power spectracomputedfromempiricalautocorrelationfunctions maticsDepartment was in charge of thismeteorological project. The basic idea was to collect and sort large amounts were too erratic. to be of any use in formulating physical hyof numerical meteorological data and to forecast by analogy, potheses. Not only did Tukey show correctly how to compute much like the forecasts made on television today, in which the power spectra from empirical data, buthe also laid the statistiweatherman looks at the data appearing on the satellite picture cal framework for the analysis of short-time series, as opposed to the very long ones envisaged by Wiener and Levinson. of the earth. Wadsworth’s methodhadmerit,butthedata Wadsworth was also director of the MIT section of the U.S. required was just not available in the 1940’s.Wiener’s Time Series book was completedataboutthesametime as this Naval Operations Evaluation Group, a project started in World MIT Meteorological Project was starting up. Since the weather WarI1 whichinitiated the useof operations research in the UnitedStates. By 1950, Wadsworth was applying operations data available occurredatdiscreteintervals of time,the research methods t o industry and had established himself as continuous-time methods of Wiener were not directly applicone of the highest paid consultantsintheUnitedStates. able. As aresult,Wadsworth asked Levinson to writeupa There were so many industrial people waiting to see him in discreteform of Wiener’s theory, Theresult wasLevinson’s “Wiener RMS” paper with the Levinson recursion. However, his outer office at MIT that one had to make an appointment with his secretary many weeks in advance to see him in his use was never made of Wiener-Levinson prediction theory by inner office for just 5 or 10 minutes at the most. The writer the MIT Meteorological Project,and Levinson’s papersat began as one ofWadsworth‘s researchassistantsin the MIT dormant. was asInorder to understandwhy Levinson’s methods were not Mathematics Department in September 1950, and he signed to work in seismology by Professor Wadsworth. Mobil used in the 1940’s, one must look at the computing facilities Oil made available eight seismic records, and the writer imavailable at the time. The actual realization of these methods would have to be carried out by people using hand calculators. mediately got a very lonely feeling, especially at MIT at night A hand calculator could add, subtract, multiply, and divide, digitizing the Mobil seismic records with a ruler and pencil, Except for Wadsworth, in 1950 nobody at MIT or in the oil but had no memory except an accumulator. Thus the result of each separate calculation had to be transferred to paper by industry thought that theanalysis of digital seismic data would hand, a drawn-out, time-consuming process. In contrast to the ever be feasible. Fortunately Tukey took an interest in the seismic project hardware devices working in real time, as envisaged by Wiener, As Wiener [24, andconveyed his research ideas by mail. 
Thefirstempirical thehandcalculator was apoorsubstitute. p. 1021 states: “Much less important, though of real interest, results were the computation of the Tukey spectra forvarious of 195 1. From is the problem of the numerical filter for statistical work, as sections of the Mobil recordsinthespring thesespectralresults,a seismic analysis based on prediction contrasted with the filter as a physically active piece of engierror was formulated in the summer of 195 1, This analysis neering apparatus.” prediction theory in digital form. Prior After Levinson wrote his two expository papers, which were made useofWiener realized in completed in 1942, neither he nor Wiener took up researchin to this work, Wiener’s procedures had only been analogform.Inhandplottingthe f i t numericalresults of the computational (software) aspect of Wiener’s theory. Wienerwas moreinterestedinitsrealization by machines what today is called linear predictive coding (LPC), the writer (hardware), and his researchinterest was already shifting to was so amazed that he could not believe his eyes, and he was sure that he would never see such good results again. But the biological and medical problems. In fact, it was theunion of these two research interests that led to his discovery and second trace, and the third trace, and so on, were computed saw. Thedigital processing method formulation of the science of cybernetics, which he describes andconfumedwhathe as the problem of control and communication in machines and called deconvolutionworked! As soon as possible, hemade animals. Meanwhile Levinson had decided as early as 1940 to an appointmentto see Wadsworth,which thesecretary set shift his field from the Fourier methods of Wiener to the field three weeks from then, in September 195 1. Theresult was that the digitally-processed seismic traces [351 were sent out of nonlineardifferentialequations. He talkedaboutthis decision with his friends in 1940. Levinson worked hard over to the oil industry, and the oil companies gave money to sup a period of two or three years (which included the period dur- port a project. The MIT Geophysical Analysis Group was thus born,andthe MIT Whirlwind digitalcomputer wasused to ing which he wrote the two expository papers) before he felt that he had enough mastery in his new field. Such mastery he analyze seismic records throughout the lifetime of the GAG gave his redid achieve, and his outstanding contributions to differential (1952-1957). During thisperiod,Tukeyfreely equations were recognized by his receiving the prestigious search advice [ 361, [ 371. For example, Tukey’s methods for estimating coherency (today called by various names, such as Bocher Prize in Mathematics in 1954. Despite their other research interests, both Wiener and “semblance” by the oil industry) are vital in the estimation of Levinson were always ready to give their support and time to seismic velocity as well as in othermultichannelmethods. of afastFouriertransform was always inthe MIT Meteorological Project directed by G . P. Wadsworth. Tukey’svision s. M. Simpson [38], wholaterdirected Wiener was especially interested in seeing physical examples of fluential.Infact, autocorrelationfunctions. This interestled to the computa- the Geophysical Analysis Group,eventually devisedaneftion ofseveral autocorrelation functions of ocean wave data ficient 24-point Fourier transform, which was a precursor to FFT by Wadsworth and by his friend and associate H. R. Seiwell the Cooley-Tukey fast Fourier transform in 1965. 
The made all of Simpson’s efficient autocorrelation and spectrum [33], who was with the Woods Hole Oceanographic Institution. The interest in these computations led to the ‘‘SympG programs instantlyobsolete,onwhichhehadworkedthe

ROBINSON: PERSPECTIVE HISTORICAL

ESTIMATION OF SPECTRUM

equivalent of half a lifetime. Wiener was very generous of his time [39]. Wiener’s work [40], [41] on multichannelmethods was helpful laterinextendingthe Levinson recursion, which Levinson had devised for single channel time series, to the multichannel case [421. Theexcellent seismic dataand corresponding well logs supplied by the oil industry to the Geophysical Analysis Group made possible the development of the statistical minimum-delay model of the earth’sstratigraphic layers, together with the theoretical justification of seismic deconvolution [ 191, [43]. In the late 1950’s a digital revolution occurred because of the introduction of transistors in the building of digital computers, which made possible reliable computers at amuch lower cost than previously. As aresult, the seismic industry completely converted t o digital technology in the early 1960’s, a long ten years after the first digital results were obtained. Since then, nearly every seismic record taken in the exploration of oil and natural gas has been digitally deconvolved and otherwise digitally processed by thesemethods.The final result of the digital processing of seismic data was the discovery of great oil fields which could not be found by analog methods. These oil fields includemost of the offshore discoveries, as in the North Sea, the Gulf of Mexico, the Persian Gulf, as well as great onshore discoveries in Alaska, Asia, Africa, Latin America, and the Middle East, made in the last twenty years, Today an oil company will deconvolve and process as many as one million seismic traces per day; it took a whole summer in 195 1 to do 32traces. Whereas the digital revolution came first to the geophysical industry largely because of the tremendous accuracy and flexibility afforded by large digital computers, today we are in the midst of a universal digital revolution of epic proportions. One now realizes that the work of Wiener and Levinson is being appreciated and used by an ever-increasing number of people. Digital signal processing is a growing and dynamic field which involves the exploration of new technology and the application of the techniques to new fields. Thetechnology has advanced from discrete semiconductorcomponents t o very largescaleintegration (VLSI)with densities above 100 000 components per silicon chip. The availability of fast, low-cost microprocessors and custom high-density integratedcircuits means that increasingly difficultandcomplexmathematical methods can be reduced to hardware devices as originally envisaged by Wiener, except the devices are digital instead of analog. For example, custom a VLSI implementation of linear predictive coding is now possible, requiringa small number of custom chips, Whereas originally digital methods were used at great expenseonly because the application demanded high flexibilityandaccuracy, we have now reached thepointthatanticipated long-term cost advantages have become a significant factor for the use of digital rather than analog methods.

XIII. TUKEYEMPIRICALSPECTRALANALYSIS Aswe have mentioned,a turningpointinthe empirical analysis of time series data began in 1949 at the Woods Hole Symposium on Applications of Autocorrelation Analysis, ThereTukeypresented the f i t of three papers [34], [36], [37], which he had written in the early years on spectrum analysis. These papers introduced the classic Tukeymethod of numerical spectral estimation, a method that has been used by most workers since that time. In addition, Tukey described an approximate distribution forthe estimate. This distribu

901

tion was required for the proper design of experiments for the collection of time series data. In a very interesting paper [44], Tukey describes the situation which led to his spectral work, includinga discussion of Hamming’s suggestion aboutthe smoothing of the discrete Fourier transform of an empirical autocorrelation, which led t o the joint work of Hamming and Tukey. During the last four decades, Tukey [45]-[ 501 introduced a multitude of terms and techniques that are standard to the practice of thedata analysis of time series. Suchcommonplace terms and concepts as “prewhitening,” “aliasing,” “smoothing and decimation,” “tapering,” “bispectrum,” “complex demodulation,” and “cepstrum” are due to Tukey. Very few papers in the literatureof applied time series analysis do not give some acknowledgment of Tukey’s ideas and methods, and most papers credit his ideas in some vital way. Moreover, Tukey [51]-[551 has made substantial contributions in the placing of the data analysis of time series into perspective with current research in the physical sciences, in statistics, and in computing and numerical analysis. We have already mentioned the key influence Tukey had on the MIT group, which included Wadsworth, Simpson, the writer, and others. W. J. Pierson and L. J. Tick [56] at New York University used Tukey’s methodsinthe analysis of oceanographic time series records. The outstanding thesis of Goodman [ 571, which extended the results of Tukey to the bivariate case, was written under Tukey’s supervision. The groupat La Jolla, CA, which included Munk, Rudnick,and Snodgrass, applied Tukey’s spectral methods to estimate wave motion due t o storms manythousands of miles away; a testimony t o the power of his methods. Munk and McDonald wrotearemarkablebook, The Roration of theEarth [58], which used these spectral methods in several novel ways. In this period, packages of computer programs fortime series analysis were appearing. The collection of programs by Healy [59] of Bell Laboratories were circulated from1960 on. The BOMM collection of programs [ 601 was developed at La Jolla. Some of the programs used by Parzen were included in his book [61]. The programs written at MIT are described in [381 and 1621. In econometrics, Granger’s book [ 631 in 1964 described many of the techniques suggested by Tukey for the analysis of univariate and bivariate time series. Themost successful application of spectrum techniques t o economic series is its use forthe description of themultitude of procedures of seasonal adjustment. In astronomy, Neyman and Scott [64] in 1958 carried outthe analysis of two-dimensional data consisting of the positions of the images of galaxies on phot@ graphic plates. Norbert Wiener remained active until his death in 1964. His later work included both empirical results, such as modeling and analyzing brain waves [661, [671, and theoretical results, such as his work with Masani on multivariate prediction theory [40], [41]. Wiener’s death marks the end of an era in time series analysis and spectral theory.

W . THE COOLEY-TUKEYFAST FOURIERTRANSFORM The present epoch of time series analysis began in 1965 with the publication of the fast Fourier transform by Cooley and Tukey [ 681. The effect that this paper has had on scientific and engineering practice cannot be overstated, The paper described an algorithm for the discrete Fourier transform of T = TI T p values by means of T ( T 1 + * + T p ) multi-

- -

-

PROCEEDINGS OF THE IEEE, VOL. 70, NO. 9 , SEPTEMBER 1982

902

plicationsinstead of the naive number T2. Althoughsuch algorithmsexisted previously [69],they seem notto have been put to much use. Sande developed a distinct, symmetrically related algorithm simultaneously and independently. The existence of such an algorithm meant, for example, that the following things could be computed an order of magnitude morerapidly:spectrumestimates,correlograms,filtered versions of series, complex demodulates, and Laplace transforms (see, for example, [70]). General discussions of the uses and importance of fast Fourier transform algorithms maybe found in [7 1]and [ 721.TheFouriertransform of an observed stretch of series can now be taken as a basic statistic and classicalstatistical analyses-such as multiple regression, analysis of variance, principal components, canonical analysis, errorsin variables, and discrimination-can be meaningfully applied to its values, [73] andreferencescitedtherein.Higher-order spectra may be computed practically [74]. inexpensiveportablecomputersforcarrying out spectralanalysis have a p peared onthe marketand may be foundinmany small laboratories. The years since 1965 have been characterized by the knowledge thatthere arefastFouriertransformalgorithms.They have also been characterized by the rapid spread of type of data analyzed. Previously, the data analyzed consisted almost totally of discrete or continuous real-valued time series. Now the joint analysis of many series, such as the 625 recorded by the Large Aperture Seismic Array in Montana [ 75 1, has become common. Spatial series are analyzed [75]. The statistical analysis of point processes hasgrown into an entirely separate field [761.The SASE iV computer program developed by Peter Lewis [771 has furthered such analysis considerably. We notethat transforms otherthantheFourier are finding interestas well [ 781, [ 791.

regularity of a noise-like process does not lie in the shapes of its individual realizations, but in itsunderlying statistical structure. As a result, different realizations usually do not appear to resemble each other. A segment of one realization generally will neither look the same nor have the same empirical aut@ covariances as the corresponding segment of another realization.Anyone using the autocovariance from a given realization as a“known” autocovariance,and thenfitting this “known” autocovariance exactly, is likely to be in worse error than if hedoesnot. As Brillinger andTukeypoint out, we often have only one distinct realization, and we need to make as good inferences as we can about the underlying population. We have to think statistically, and treat our numerical results with due consideration as to the tentative nature of the structural and stochastic assumptions built into the model. It is for data near the noiseprocess end of the continuum thatmany of the Fourier methods of spectrum estimation are intended and are most effective. At the other extreme are the signal-like processes. Some of the best examples of such processes are found in exploration seismology. A concentrated energy source produces a seismic record. Because the random background noise is usually weak, the record will appearessentially the same as anotherone taken at the same place at a different time. As Brillinger and Tukey point out, the study of signal-like processes may well be done by quite different methods. Also there is anotheraspect of spectrumanalysis,one to which the limitations of the data have forced many applied research workers. Here the models are not narrowly restricted by reliable subject-matter knowledge. The time-series records are not long, and the appearance of the data is not distinctive. Corresponding records do not look alike. With almost nothing to work with, the research workers can do nothing much more than fitting a few constants. Thus they fit low-order AR, MA, XV. BURG MAXIMUM ENTROPYSPECTRALANALYSIS and ARMA models usings the Box-Jenkins [ 931 methodology. ThediscreteFouriertransform of the autocovariance is As B W g e r and Tukey observe, this approach in a large numcalled the powerspectrum.Thus the powerspectrum of a ber of applications seems to work much better than might be anticipated. stationary time series with autocovariance @ ( n )is The basic issue in spectrum estimation is the proper choice of model and then the resulting choice of the method of spec@(a)= $(n)e-’W” (for - n < a Q n ) . (1) trum estimation [ 1021.Application of aparticularspectral n =-00 estimator to aninappropriatemodel can result in serious In this discussion, we do not require the autocovariance to be specification errors. normalized, so @(O) does not have to be one. If we know the namely,how Let us now return to the question in point, entire autocovariance function, that is, if we know @ ( n )for all we should estimate the power spectrum from a limited section values of n, then, of course, we can obtain the power spectrum of the autocovariance. Most researchworkers who calculate and use only a limited number of autocovariances do not asby means of (1). However, in many applications, we know or sume that all the later ones vanish. They use the earlier values can reliably measure the autocovarianceonlyforacertain finite number of values of n , say for n = 0, 1, 2, * ,p . 
B e to calculate a quadraticfunction of the data whose average cause the autocovariance is symmetrical @(n) = @(-n), we thus value (across the ensemble), like all other quadratic functions know the values of $(n) for n = 0, +l, +2, * ,+ p , and we do of the data, is the integral of a knowable kernel with the specnot know @ ( n )for In I > p . The question is how should we esti- trum. They then interpret their result accordingly. One must mate the power spectrum (1) from only this partial knowledge.not charge these workers with making any assumption about In order to answer this question, we should first consider the the uncalculated autocovariances. What they usually do is to recognize thatthe empiricalautocovariances thattheydo phenomenon under study. As there are many different types calculate will vary from realization to realization, and so they of phenomena, there is a corresponding diversity in spectrum analysis. Brillinger and Tukey state that no distinction is more do not and should not take the empirical values as absolute vital thanthat among a) noise-like processes, b) signal-like truth. One of the purposes of this historical essay is to try to give processes, and c) signal-plus-noise processes. While b) is ordinarily unrealistic, there are enough cases of c) with only a the flavor of important developments in spectrum estimation. slight amount of noise so that b) is a helpful idealization. Unfortunately, the writer did not attend the 37th Meeting of A noiselike process produces time series quite different in the Society of Exploration Geophysicists in 1967, although he characterthanthoseproducedbya signal-like process. The attended the meetings in the years both before and after that

2

-

OF SPECTRUM ESTIMATION

STORICAL TIVE ROBINSON:

one. It was at that meeting in Oklahoma City that John Burg presented a paper that was to shake the foundations of spec[80] is entitled trum estimation. This fundamentalwork Maximum Entropy Spectral Analysisand its abstract reads: "The usual digital method of obtaining a power spectrum estimate from a measured autocovariance function makes the assumption that the correlation function is zero at alllags for which no estimate is available and uses some treatment of the estimated lags to reduce the effect of truncation of the autocovariancefunction.The method discussed in this paperinsteadretains allof theestimated lags without modificationand uses a nonzero estimate for the lags not directly estimated. The particular estimation principleused is that the spectral estimate must be the most random or have the maximum entropy of any power spectrum which is consistent with the measured data. This new analysis technique gives a much higher resolution spectral estimate than is obtained by conventionaltechniqueswitha very little increase in computing time. Comparisons will illustrate the relative importance." Theestimation of the powerspectrum of stationarytime series frompartialknowledge of its autocovariance function is a classical problem to which much attention has been given over the years. Almost all of this work is based on the use of windowfunctions, whose properties can beanalyzedby Fouriermethods. Burg [go], [811, [ 821 in his pioneering work introduced a new philosophy in spectral analysis based on general variational principles, and, in particular, the maximum entropy method(MEM) which we will now discuss. Theconventionalapproach to estimating the power spectrum from @(n), In1 Q p , is to assume that @(n) = 0 for n > p and to taketheFouriertransform of w ( n ) @(n), In1 Q p , where w ( n ) is a weighting function. We now want to describe the maximum entropy method of Burg. Given a limited set of autocovariance coefficients together with the fact that a power spectrum @(a) must be non-negative, we know that there are generally an infinite number of power spectra in agreement with this information. Thus additional information is required, and a reasonable goal is to finda single function @(a),which is representative of the class of all possible spectra. In order to resolve this problem, some choice has to be made, and Burg made use of the concept of entropy. Maximum entropy spectral analysis is based onchoosing thatspectrum whichcorresponds tothe most random or the mostunpredictable time-serieswhose aut@ covariance coincideswith the given set of values. This concept of maximum entropy is the same as that used in both statistical mechanics and information theory, and as we will see represents the most noncommittal assumptionpossible with regard to the unknownvalues of the autocovariance function. Equation (1) gives the power spectrum @(a)as the discrete Fourier transform of the autocovariance function @(n). From Fourier theory, we know that the autocovariance function can beobtained as the inverse Fouriertransform of the power spectrum; thatis

@(n) =

1

"

@(a)eiwn dw

(for all integers n). (2)

T h e fundamental assumption involved in maximum entropy spectral analysis is that the stationaryprocess under consideration is the most random or the leastpredictabletime series

903

that is consistentwith the given measurements, Specifically, the given measurementsare the known autocovariance coefficients, namely

In terms of information theory, we require that the entropy persample of time series is amaximum.From the workof Shannon in 1948, it follows that the entropy is proportional to the integral of the logarithm of the power spectrum, thatis, the entropy is

1:

log @(a) do.

(4)

Therefore the required maximum entropy power spectrum is that function $\Phi(\omega)$ which maximizes (4) under the constraint equations (3). One way to solve the problem of finding the maximum entropy power spectrum subject to fixed values of $\phi(n)$ for $|n| \le p$ is by use of Lagrange multipliers. However, we may instead use the following approach. From (1) we see that the partial derivative of $\Phi(\omega)$ with respect to $\phi(n)$ is

\frac{\partial \Phi(\omega)}{\partial \phi(n)} = e^{-i\omega n}. \qquad (5)

It follows that

\frac{\partial}{\partial \phi(n)} \int_{-\pi}^{\pi} \log \Phi(\omega)\, d\omega = \int_{-\pi}^{\pi} [\Phi(\omega)]^{-1} e^{-i\omega n}\, d\omega.

Now let us maximize (4) with respect to the unknown values $\phi(n)$ where $|n| > p$. Thus we set the partial derivatives of (4) with respect to $\phi(n)$ for $|n| > p$ equal to zero; that is

\frac{\partial}{\partial \phi(n)} \int_{-\pi}^{\pi} \log \Phi(\omega)\, d\omega = 0, \qquad \text{for } |n| > p. \qquad (6)

Making use of (5), we see that (6) reduces to

\int_{-\pi}^{\pi} [\Phi(\omega)]^{-1} e^{-i\omega n}\, d\omega = 0, \qquad \text{for } |n| > p. \qquad (7)

This equation specifies the form of the inverse power spectrum $[\Phi(\omega)]^{-1}$ of a maximum entropy process. Let us explain. Given any stationary process with positive power spectrum $\Phi(\omega)$, its inverse power spectrum

[\Phi(\omega)]^{-1} \qquad (8)

is also positive. We assume that the inverse power spectrum is integrable and bounded, so that it is a well-behaved power spectrum in its own right. Thus the inverse power spectrum (8) can be associated with an autocovariance function, which we designate by $\psi(n)$, such that the counterparts of (1) and (2) hold, namely

[\Phi(\omega)]^{-1} = \sum_{n=-\infty}^{\infty} \psi(n)\, e^{-i\omega n} \qquad (9)

and

\psi(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} [\Phi(\omega)]^{-1} e^{i\omega n}\, d\omega. \qquad (10)

Because we do not normalize, the zero-lag value $\psi(0)$ does not have to be one. Let us now return to the maximum entropy process. As we have seen, its inverse power spectrum satisfies (7) for $|n| > p$. In (7) replace $n$ by $-n$, and also multiply each side of the equation by $1/2\pi$. Thus we obtain

\frac{1}{2\pi} \int_{-\pi}^{\pi} [\Phi(\omega)]^{-1} e^{i\omega n}\, d\omega = 0, \qquad \text{for } |n| > p. \qquad (11)

Comparing (11) with (10), we see that for a maximum entropy process we have

\psi(n) = 0, \qquad \text{for } |n| > p. \qquad (12)
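The conclusion in (12) can be checked numerically: for an AR spectrum of order $p$, the Fourier coefficients of its inverse vanish beyond lag $p$. The short Python fragment below does this for an assumed AR(2) example; the coefficients are illustrative, not taken from the paper.

import numpy as np

# Assumed AR(2) example: A(z) = 1 - 0.9 z^{-1} + 0.2 z^{-2} (zeros at 0.5 and 0.4), sigma^2 = 1.
a = np.array([1.0, -0.9, 0.2])
omegas = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
A = sum(a[k] * np.exp(-1j * omegas * k) for k in range(len(a)))
Phi = 1.0 / np.abs(A) ** 2            # power spectrum of the AR(2) process
inv_Phi = 1.0 / Phi                   # inverse power spectrum, |A(e^{i w})|^2

dw = omegas[1] - omegas[0]
for n in range(6):
    # psi(n) = (1/2pi) * integral of inv_Phi * e^{i w n} dw (rectangle rule on a periodic grid)
    psi_n = np.sum(inv_Phi * np.exp(1j * omegas * n)).real * dw / (2.0 * np.pi)
    print(n, round(psi_n, 6))         # psi(3), psi(4), psi(5) come out as zero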

Hence, using (12) in (9), we see that the inverse power spectrum of a maximum entropy process is

[\Phi(\omega)]^{-1} = \sum_{n=-p}^{p} \psi(n)\, e^{-i\omega n}. \qquad (13)

The right-hand side of (13) is a finite trigonometric series. Thus we have shown that the maximum entropy process is one whose inverse power spectrum is a finite trigonometric series, or equivalently one whose power spectrum is the reciprocal of a finite trigonometric series, that is

\Phi(\omega) = \frac{1}{\displaystyle\sum_{n=-p}^{p} \psi(n)\, e^{-i\omega n}}. \qquad (14)

If we let $z = e^{i\omega}$, then the finite trigonometric series (13) becomes $\sum_{n=-p}^{p} \psi(n)\, z^{-n}$. This can be factored by the Fejér method (Robinson [43, p. 194]) as

\sum_{n=-p}^{p} \psi(n)\, z^{-n} = \frac{1}{\sigma^{2}}\,[1 + a_{1} z^{-1} + \cdots + a_{p} z^{-p}]\,[1 + a_{1} z + \cdots + a_{p} z^{p}] \qquad (15)

where $\sigma^{2}$ is a positive constant and where

A(z) = 1 + a_{1} z^{-1} + \cdots + a_{p} z^{-p} \qquad (16)

is minimum delay (i.e., $A(z)$ has no zeros on or outside the unit circle). Thus the maximum entropy process is

specified by

\Phi(z) = \frac{\sigma^{2}}{A(z)\, A(z^{-1})}.

This result shows that the maximum entropy process is an AR process of order $p$. Silvia and Robinson [83], through the use of lattice methods, have related the concept of maximum entropy to the geophysical inverse problem. Itakura and Saito [84] were responsible for introducing two important ideas into spectrum estimation that are now gaining wide acceptance in the engineering world. The first idea is that of using maximum likelihood in spectrum estimation. Although the idea itself was not new, their introduction of a particular spectrum distance measure is becoming more and more important for different applications, such as speech. Parzen [85] gives the name information divergence to this measure, which is the same as the Kullback-Leibler information number. He also shows its relation to the notion of cross-entropy. The second idea is that of using the lattice as a filter structure for the purpose of analysis (as an all-zero filter) and synthesis (as an all-pole filter). The idea of an adaptive lattice was first proposed by Itakura and Saito as a way of estimating the partial correlation (PARCOR) coefficients (a term they coined) adaptively. Makhoul [86] has shown how Burg's technique is really a special case of lattice analysis. Also, the lattice has become important because of its fast convergence and its relative insensitivity to roundoff errors.
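The chain from (13) to the factored form above, together with the lattice viewpoint just described, is what Burg's recursion computes in practice: it estimates the reflection (PARCOR) coefficients directly from the data and converts them into the AR coefficients of $A(z)$ and the innovation variance $\sigma^{2}$, from which the MEM spectrum $\sigma^{2}/|A(e^{i\omega})|^{2}$ follows. The Python sketch below follows the standard textbook form of Burg's lattice recursion; the variable names and the synthetic test signal are illustrative assumptions, not taken from the paper.

import numpy as np

def burg_ar(x, p):
    """Burg lattice recursion (standard textbook form): returns the coefficients of
    A(z) = 1 + a1 z^-1 + ... + ap z^-p, the innovation variance sigma^2,
    and the reflection (PARCOR) coefficients."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    f = x.copy()                      # forward prediction errors
    b = x.copy()                      # backward prediction errors
    a = np.array([1.0])
    sigma2 = np.dot(x, x) / N         # zeroth-order prediction error power
    refl = []
    for m in range(p):
        # Burg reflection coefficient from the current forward/backward errors
        num = 0.0
        den = 0.0
        for n in range(m + 1, N):
            num += f[n] * b[n - 1]
            den += f[n] ** 2 + b[n - 1] ** 2
        k = -2.0 * num / den
        refl.append(k)
        # Levinson-type update of the AR polynomial and of the error power
        a = np.concatenate((a, [0.0]))
        a = a + k * a[::-1]
        sigma2 *= (1.0 - k * k)
        # update the lattice errors in place (descending n keeps the old b[n-1] values)
        for n in range(N - 1, m, -1):
            fn = f[n]
            f[n] = fn + k * b[n - 1]
            b[n] = b[n - 1] + k * fn
    return a, sigma2, np.array(refl)

def mem_spectrum(a, sigma2, omegas):
    """MEM spectrum sigma^2 / |A(e^{i omega})|^2, cf. the factored form above."""
    ks = np.arange(len(a))
    return np.array([sigma2 / np.abs(np.sum(a * np.exp(-1j * w * ks))) ** 2
                     for w in omegas])

# Illustrative use on an assumed synthetic record (two close sinusoids in noise).
rng = np.random.default_rng(0)
t = np.arange(256)
x = np.sin(0.4 * t) + np.sin(0.43 * t) + 0.2 * rng.standard_normal(256)
a, sigma2, refl = burg_ar(x, p=12)
omegas = np.linspace(0, np.pi, 512)
spectrum = mem_spectrum(a, sigma2, omegas)   # exhibits sharp peaks near the sinusoid frequencies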

XVI. STATISTICAL THEORY OF SPECTRUM ESTIMATION

Since the pioneering work of Tukey [34] in 1949, many important contributions have been made to the statistical theory of spectrum estimation. An adequate treatment would require a long paper in itself, and so all we can hope to do here is to raise the reader's consciousness concerning the statistical theory required to understand and implement spectrum estimation. The writer has great admiration for the work of Parzen, who from the 1950's to the present time has consistently made bedrock contributions both in theory and applications [61], [85], [87], [88]. His long series of papers on time series analysis includes the famous Parzen window for spectrum analysis. Another one of Parzen's important contributions is his formulation of the time series analysis problem in terms of reproducing kernel Hilbert spaces. A remarkable number of Ph.D. theses on time series analysis have been written under the direction of Parzen, more than any other person. The writer has had the good fortune to discuss geophysical time-series problems with Professor Parzen over the years, and in every case Parzen has been able to provide important physical insight in the application of the statistical methods. The Harvard lectures by Professor Parzen in 1976 represent one of the high points in time series analysis and spectrum estimation ever to be heard in those venerable halls.

The book by Grenander and Rosenblatt [65] in 1957 formalized many of the data analysis procedures and approximations that have come into use. They have an extensive treatment of the problem of choice of window and bandwidth. The further contributions to this problem by Parzen and by Jenkins are discussed in the 1961 paper of Tukey [51]. An accurate and informative account of the developments of spectrum estimation in the 1950's is given by Tukey [44].

Another important statistical development that deserves mention is the alignment issue in the estimation of coherence, worked on in the 1960's by Akaike and Yamanouchi, by Priestley, and by Parzen. Discussions of this work and the references can be found in the book by Priestley [89]. This excellent book, which appeared in 1981, has already set a new standard. It can be recommended as an authoritative account of the statistical theory of spectrum estimation, which we only touch upon in this section.


H. Wold coined the names "moving average process" and "autoregressive process" in his 1938 thesis [90] under Professor Harald Cramér at Stockholm University. In his thesis, Wold computed a model of the yearly level of Lake Vaner in Sweden as a moving average of the current rainfall and the previous year's rainfall. He also computed an autoregressive model of the business cycle in Sweden for the years 1843-1913. In turn, Whittle wrote his 1951 thesis [91] under Professor Wold at Uppsala University. Whittle opened up and made important contributions to the field of hypothesis testing in time-series analysis. Whittle's careful work is exemplified by his autoregressive analysis of a seiche record [92], in which he fits a low-order autoregressive model to the data and gives statistical tests to determine the appropriateness of the model. Professor Whittle used to return to Sweden for visits, and the writer remembers taking long walks with him through the Uppsala countryside exploring for old runestones and ancient Viking mounds. Although the writer had left the University of Wisconsin to work with Professor Wold in Sweden, it turned out that Wisconsin under the leadership of Professor G. Box became the real center of time series analysis. It was the joint work of Box with Professor G. M. Jenkins [93] that actually brought the autoregressive (AR) process and the moving average (MA) process to the attention of the general scientific community. The brilliance of this work has made the name Box-Jenkins synonymous with time-series analysis. No achievement is better deserved. No person understands data better than Box in the application of statistical methods to obtain meaningful results.

In the 1960's Parzen [87] and Akaike [94] discussed autoregressive spectrum estimation, and this work led to their crucial work on autoregressive order-determining criteria [88] and [95]. Such criteria have made possible the widespread application of autoregressive spectrum estimation by researchers in diverse scientific fields. Akaike has provided a link between statistics and control theory with deep and significant results, and his work is of the highest tradition that science can provide. Young research workers can learn much by studying his writings well. We wish we had more space and knowledge to expand upon this section, and those many statisticians whom we have not mentioned should remember that this history is by no means the final word. Someday we hope to write more fully on this subject, and we welcome all comments and suggestions.

XVII. ENGINEERING USE OF SPECTRAL ESTIMATION

The purpose of this section is only to refer to the rest of the papers in this special issue of the Proceedings of the IEEE. These other papers cover the engineering use of spectral estimation much better than we could do here. These papers represent a living history of the present status of spectral estimation, and, in them and in the references which they give, the reader can find the works of the people who have made spectral analysis and estimation a vital scientific discipline today. As general references, we would especially like to mention the 1978 IEEE book edited by Childers [96], Haykin [97], the RADC Spectrum Estimation Workshop [98], the First IEEE ASSP Workshop on Spectral Estimation [99], and Ulrych and Bishop [100]. Although much progress has been made, much work yet remains to be done, and there is adventure for a research worker who sets his course in this rewarding and exciting field.


ACKNOWLEDGMENT

I want to express my sincere appreciation to Prof. D. R. Brillinger, who let me freely use his paper, "Some history of data analysis of time series in the United States," in History of Statistics in the United States, edited by D. B. Owen and published by Marcel Dekker in 1976. I want to thank Dr. J. Makhoul for sending me notes on maximum likelihood and lattice networks. I want to especially thank the authors cited in the Reference Section whose constructive comments materially improved this paper. In writing an historical paper, we should include a thousand references instead of one hundred, so important statistical contributions have unfortunately been left out. Of course, we have purposely not included engineering contributions (as they are covered in the rest of this special issue), but there is never a clear-cut line, and so in this sense other important work has also been left out. However, all such omissions are not intentional, and we will gladly try to rectify any situation in some appropriate future publication. Finally, most of all, I want to thank Prof. J. Tukey for the support and help he gave me on spectrum estimation thirty years ago at MIT, for which I am forever grateful.

REFERENCES
[1] I. Newton, Optics. London, England, 1704.
[2] B. Taylor, Methodus Incrementorum Directa et Inversa. London, England, 1715.
[3] D. Bernoulli, Hydrodynamics. Basel, Switzerland, 1738.
[4] L. Euler, Institutiones Calculi Differentialis. St. Petersburg, Russia, 1755.
[5] J. L. Lagrange, Théorie des Fonctions Analytiques. Paris, France, 1759.
[6] J. Fourier, Théorie Analytique de la Chaleur. Paris, France: Didot, 1822.
[7] C. Sturm, "Mémoire sur les équations différentielles linéaires du second ordre," Journal de Mathématiques Pures et Appliquées, Paris, France, Series 1, vol. 1, pp. 106-186, 1836.
[8] J. Liouville, "Premier mémoire sur la théorie des équations différentielles linéaires et sur le développement des fonctions en séries," Journal de Mathématiques Pures et Appliquées, Paris, France, Series 1, vol. 3, pp. 561-614, 1838.
[9] G. Green, Essay on the Application of Mathematical Analysis to the Theories of Electricity and Magnetism. Nottingham, England, 1828.
[10] E. Schrödinger, Collected Papers on Wave Mechanics. London, England: Blackie, 1928.
[11] W. Heisenberg, The Physical Principles of Quantum Theory. Chicago, IL: University of Chicago Press, 1930.
[12] J. von Neumann, "Eigenwerttheorie Hermitescher Funktionaloperatoren," Math. Ann., vol. 102, p. 49, 1929.
[13] —, Mathematische Grundlagen der Quantenmechanik. Berlin, Germany: Springer, 1932.
[14] M. von Smoluchowski, The Kinetic Theory of Matter and Electricity. Leipzig and Berlin, Germany, 1914.
[15] A. Einstein, "On the theory of the Brownian movement," Annalen der Physik, vol. 19, pp. 371-381, 1906.
[16] N. Wiener, "Differential space," J. Math. Phys., vol. 2, p. 131, 1923.
[17] A. Schuster, "On the investigation of hidden periodicities with application to a supposed 26-day period of meteorological phenomena," Terr. Magnet., vol. 3, pp. 13-41, 1898.
[18] G. U. Yule, "On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers," Phil. Trans. Roy. Soc. London, A, vol. 226, pp. 267-298, 1927.
[19] E. A. Robinson, Predictive Decomposition of Time Series with Applications to Seismic Exploration, MIT Geophysical Analysis Group, 1954; reprinted in Geophysics, vol. 32, pp. 418-484, 1967.
[20] —, An Introduction to Infinitely Many Variates. London, England: Griffin, 1959, p. 109.
[21] N. Wiener, "Generalized harmonic analysis," Acta Math., vol. 55, pp. 117-258, 1930.
[22] A. Y. Khintchine, "Korrelationstheorie der stationären stochastischen Prozesse," Math. Ann., vol. 109, p. 604, 1934.
[23] N. Wiener, Cybernetics. Cambridge, MA: MIT Press, 1948.
[24] —, Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications, MIT NDRC Report, 1942; reprinted, Cambridge, MA: MIT Press, 1949.


[25] —, The Fourier Integral. London, England: Cambridge, 1933.
[26] P. Dirac, Principles of Quantum Mechanics. New York: Oxford University Press, 1930.
[27] O. Heaviside, Electrical Papers, vols. I and II. New York: Macmillan, 1892.
[28] J. von Neumann, "Über Funktionen von Funktionaloperatoren," Ann. Math., vol. 32, p. 191.
[29] N. Wiener, I Am a Mathematician. Cambridge, MA: MIT Press, 1956.
[30] N. Levinson, "A heuristic exposition of Wiener's mathematical theory of prediction and filtering," J. Math. Phys., vol. 26, pp. 110-119, 1947.
[31] N. Levinson, "The Wiener RMS (root mean square) error criterion in filter design and prediction," J. Math. Phys., vol. 25, pp. 261-278, 1947.
[32] G. P. Wadsworth, Short-Range and Extended Forecasting by Statistical Methods, Air Weather Service, Washington, DC, 1948.
[33] H. R. Seiwell, "The principles of time series analyses applied to ocean wave data," in Proc. Nat. Acad. Sci. U.S., vol. 35, pp. 518-528, 1949.
[34] J. W. Tukey, "The sampling theory of power spectrum estimates," in Proc. Symp. Appl. Autocorr. Anal. Phys. Prob., U.S. Off. Naval Res. (NAVEXOS-P-725), 1949; reprinted in J. Cycle Res., vol. 6, pp. 31-52, 1957.
[35] G. P. Wadsworth, E. A. Robinson, J. G. Bryan, and P. M. Hurley, "Detection of reflections on seismic records by linear operators," Geophysics, vol. 18, pp. 539-586, 1953.
[36] J. W. Tukey and R. W. Hamming, "Measuring noise color 1," Bell Lab. Memo, 1949.
[37] J. W. Tukey, Measuring Noise Color, unpublished manuscript prepared for distribution at the Institute of Radio Engineers Meeting, Nov. 1951.
[38] S. M. Simpson, Time Series Computations in FORTRAN and FAP. Reading, MA: Addison-Wesley, 1966.
[39] N. Wiener, Bull. Amer. Math. Soc., vol. 72, pp. 1-145.
[40] N. Wiener and P. Masani, "The prediction theory of multivariate stochastic processes," Acta Math., vol. 98, pp. 111-150, 1957.
[41] —, "On bivariate stationary processes," Theory Prob. Appl., vol. 4, pp. 300-308.
[42] R. A. Wiggins and E. A. Robinson, "Recursive solution to the multichannel filtering problem," J. Geophys. Res., vol. 70, pp. 1885-1891, 1965.
[43] E. A. Robinson, Statistical Communication and Detection with Special Reference to Digital Data Processing of Radar and Seismic Signals. London, England: Griffin, 1967; reprinted with new title, Physical Applications of Stationary Time Series. New York: Macmillan, 1980.
[44] J. W. Tukey, "An introduction to the calculations of numerical spectrum analysis," in Spectral Analysis of Time Series, B. Harris, Ed. New York: Wiley, 1967, pp. 25-46.
[45] H. Press and J. W. Tukey, "Power spectral methods of analysis and their application to problems in airplane dynamics," Bell Syst. Monogr., no. 2606, 1956.
[46] R. B. Blackman and J. W. Tukey, "The measurement of power spectra from the point of view of communications engineering," Bell Syst. Tech. J., vol. 33, pp. 185-282, 485-569, 1958; also New York: Dover, 1959.
[47] J. W. Tukey, "The estimation of power spectra and related quantities," in On Numerical Approximation, R. E. Langer, Ed. Madison, WI: University of Wisconsin Press, 1959, pp. 389-411.
[48] —, "An introduction to the measurement of spectra," in Probability and Statistics, U. Grenander, Ed. New York: Wiley, 1959, pp. 300-330.
[49] —, "Equalization and pulse shaping techniques applied to the determination of the initial sense of Rayleigh waves," in The Need of Fundamental Research in Seismology, Appendix 9, Department of State, Washington, DC, 1959, pp. 60-129.
[50] B. P. Bogert, M. J. Healy, and J. W. Tukey, "The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking," in Time Series Analysis, M. Rosenblatt, Ed. New York: Wiley, 1963, pp. 201-243.
[51] J. W. Tukey, "Discussion emphasizing the connection between analysis of variance and spectrum analysis," Technometrics, vol. 3, pp. 1-29, 1961.
[52] —, "The future of data analysis," Ann. Math. Statist., vol. 33, pp. 1-67, 1963.
[53] —, "What can data analysis and statistics offer today?" in Ocean Wave Spectra, Nat. Acad. Sci., Washington, DC, and Prentice-Hall, Englewood Cliffs, NJ, 1963.
[54] —, "Uses of numerical spectrum analysis in geophysics," Bull. Int. Statist. Inst., vol. 39, pp. 267-307, 1965.
[55] —, "Data analysis and the frontiers of geophysics," Science, vol. 148, pp. 1283-1289, 1965.
[56] W. J. Pierson and L. J. Tick, "Stationary random processes in meteorology and oceanography," Bull. Int. Statist. Inst., vol. 35, pp. 271-281, 1957.

[57] N. R. Goodman, "On the joint estimation of the spectra, cospectrum and quadrature spectrum of a two-dimensional stationary Gaussian process," Science Paper no. 10, Engineering Statistics Laboratory, New York University, New York, 1957.
[58] W. H. Munk and G. J. F. MacDonald, The Rotation of the Earth. New York: Cambridge University Press, 1960.
[59] M. J. Healy and B. P. Bogert, "FORTRAN subroutines for time series analysis," Commun. Assoc. Computing Machinery, vol. 6, pp. 32-34, 1963.
[60] E. C. Bullard, F. E. Ogelbay, W. H. Munk, and G. R. Miller, A User's Guide to BOMM, Institute of Geophysics and Planetary Physics, University of California Press, San Diego, 1966.
[61] E. Parzen, Time Series Analysis Papers. San Francisco, CA: Holden-Day, 1967.
[62] E. A. Robinson, Multichannel Time Series Analysis with Digital Computer Programs. San Francisco, CA: Holden-Day, 1967.
[63] C. W. J. Granger, Spectral Analysis of Economic Time Series. Princeton, NJ: Princeton University Press, 1964.
[64] J. Neyman and E. L. Scott, "Statistical approach to problems of cosmology," J. Roy. Statist. Soc., Series B, vol. 20, pp. 1-43, 1958.
[65] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series. New York: Wiley, 1957.
[66] N. Wiener, Nonlinear Problems in Random Theory. Cambridge, MA: MIT Press, 1958.
[67] —, "Rhythm in physiology with particular reference to encephalography," Proc. Roy. Virchow Med. Soc., NY, vol. 16, pp. 109-124, 1957.
[68] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, 1965.
[69] J. W. Cooley, P. A. W. Lewis, and P. D. Welch, "Historical notes on the fast Fourier transform," IEEE Trans. Audio Electroacoust., vol. AU-15, pp. 76-79, 1967.
[70] C. Bingham, M. D. Godfrey, and J. W. Tukey, "Modern techniques of power spectrum estimation," IEEE Trans. Audio Electroacoust., vol. AU-15, pp. 56-66, 1967.
[71] E. O. Brigham and R. E. Morrow, "The fast Fourier transform," IEEE Spectrum, pp. 63-70, 1967.
[72] IEEE Trans. Audio Electroacoust. (Special issue on the fast Fourier transform), B. P. Bogert and F. Van Veen, Eds., vol. AU-15, June 1967, and vol. AU-17, June 1969.
[73] D. R. Brillinger, Time Series: Data Analysis and Theory. New York: Holt, Rinehart, and Winston, 1974; revised edition, San Francisco, CA: Holden-Day, 1980.
[74] D. R. Brillinger and M. Rosenblatt, "Computation and interpretation of k-th order spectra," in Spectral Analysis of Time Series, B. Harris, Ed. New York: Wiley, 1967, pp. 189-232.
[75] J. Capon, "Applications of detection and estimation theory to large array seismology," Proc. IEEE, vol. 58, pp. 760-770, 1970.
[76] Stochastic Point Processes, P. A. W. Lewis, Ed. New York: Wiley, 1972.
[77] P. A. W. Lewis, A. M. Katcher, and A. H. Weiss, "SASE IV—An improved program for the statistical analysis of series of events," IBM Res. Rep. RC2365, 1969.
[78] Proc. Symp. Walsh Functions, 1970-1973.
[79] A. Cohen and R. H. Jones, "Regression on a random field," J. Amer. Statist. Ass., vol. 64, pp. 1172-1182, 1969.
[80] J. P. Burg, Maximum Entropy Spectral Analysis. Oklahoma City, OK, 1967.
[81] —, Maximum Entropy Spectral Analysis. Stanford, CA: Stanford University, 1975.
[82] R. T. Lacoss, "Data adaptive spectral analysis methods," Geophysics, vol. 36, pp. 661-675, 1971.
[83] M. T. Silvia and E. A. Robinson, Deconvolution of Geophysical Time Series in the Exploration of Oil and Natural Gas. Amsterdam, The Netherlands: Elsevier, 1979.
[84] F. Itakura and S. Saito, "Analysis synthesis telephony based on the maximum likelihood method," in Proc. Sixth Int. Congr. on Acoustics (Tokyo, Japan), 1968.
[85] E. Parzen, "Modern empirical spectral analysis," Tech. Report N-12, Texas A&M Research Foundation, College Station, TX, 1980.
[86] J. Makhoul, "Stable and efficient lattice methods for linear prediction," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 423-428, 1977.
[87] E. Parzen, "An approach to empirical time series," J. Res. Nat. Bur. Stand., vol. 68B, pp. 937-951, 1964.
[88] —, "Some recent advances in time series modeling," IEEE Trans. Automat. Contr., vol. AC-19, pp. 723-730, 1974.
[89] M. Priestley, Spectral Analysis and Time Series, 2 volumes. London, England: Academic Press, 1981.
[90] H. Wold, A Study in the Analysis of Stationary Time Series. Stockholm, Sweden: Stockholm University, 1938.
[91] P. Whittle, Hypothesis Testing in Time-Series Analysis. Uppsala, Sweden: Almqvist and Wiksells, 1951.

[92] —, "The statistical analysis of a seiche record," J. Marine Res., vol. 13, pp. 76-100, 1954.
[93] G. Box and G. M. Jenkins, Time Series Analysis, Forecasting, and Control. Oakland, CA: Holden-Day, 1970.
[94] H. Akaike, "Power spectrum estimation through autoregressive model fitting," Ann. Inst. Statist. Math., vol. 21, pp. 407-419, 1969.
[95] —, "A new look at statistical model identification," IEEE Trans. Automat. Contr., vol. AC-19, pp. 716-723, 1974.
[96] D. G. Childers, Modern Spectrum Analysis. New York: IEEE Press, 1978.
[97] S. Haykin, Nonlinear Methods of Spectral Analysis. Berlin, Germany: Springer, 1979.
[98] Proc. of the RADC Spectrum Estimation Workshop, Rome Air Development Center, Griffiss Air Force Base, NY, 1979.
[99] Proc. of the First ASSP Workshop on Spectral Estimation, McMaster University, Hamilton, Ontario, Canada, 1981.
[100] T. J. Ulrych and T. N. Bishop, "Maximum entropy spectral analysis and autoregressive decomposition," Rev. Geophys., vol. 13, pp. 183-200, 1975.
[101] P. R. Gutowski, E. A. Robinson, and S. Treitel, "Spectral estimation: Fact or fiction," IEEE Trans. Geosci. Electronics, vol. GE-16, pp. 80-84, 1978; reprinted in D. G. Childers, Modern Spectrum Analysis. New York: IEEE Press, 1978.

Spectral Estimation: An Overdetermined Rational Model Equation Approach

JAMES A. CADZOW, SENIOR MEMBER, IEEE

Manuscript received May 24, 1982. This work was supported in part by the Signal Processing Section of RADC under Grant AFOSR-81-0250 and by the Statistics and Probability Program of the Office of Naval Research under Contract N00014-82-K-0257. The author is with the College of Engineering and Applied Sciences, Department of Electrical and Computer Engineering, Arizona State University, Tempe, AZ 85287.

Abstract—In seeking rational models of time series, the concept of approximating second-order statistical relationships (i.e., the Yule-Walker equations) is often explicitly or implicitly invoked. The parameters of the hypothesized rational model are typically selected so that these relationships "best represent" a set of autocorrelation lag estimates computed from time series observations. One of the objectives of this paper will be that of establishing this fundamental approach to the generation of rational models. An examination of many popular contemporary spectral estimation methods reveals that the parameters of a hypothesized rational model are estimated upon using a "minimal" set of Yule-Walker equation evaluations. This results in an undesired parameter hypersensitivity and a subsequent decrease in estimation performance. To counteract this parameter hypersensitivity, the concept of using more than the minimal number of Yule-Walker equation evaluations is herein advocated. It is shown that by taking this overdetermined parametric evaluation approach, a reduction in data-induced model parameter hypersensitivity is obtained, and a corresponding improvement in modeling performance results. Moreover, upon adapting a singular value decomposition representation of an extended-order autocorrelation matrix estimate to this procedure, a desired model order determination method is obtained and a further significant improvement in modeling performance is achieved. This approach makes possible the generation of low-order high-quality rational spectral estimates from short data lengths.
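To make the abstract's central idea concrete, the following Python sketch contrasts a minimal Yule-Walker AR fit (exactly p equations) with an overdetermined fit that uses extra lag equations solved in the least-squares sense. It is only an illustrative sketch of the overdetermined idea under assumed lag estimates; it is not the SVD-based, order-determining procedure developed later in the paper.

import numpy as np

def ar_from_yule_walker(r, p, num_eqs=None):
    """Fit AR(p) coefficients from autocorrelation lags r[0..L].
    With num_eqs == p this is the minimal Yule-Walker solution;
    with num_eqs > p the extra equations are solved in the least-squares sense."""
    if num_eqs is None:
        num_eqs = p
    # Equations: r[k] + sum_{i=1..p} a[i] * r[k - i] = 0, for k = 1..num_eqs
    A = np.array([[r[abs(k - i)] for i in range(1, p + 1)]
                  for k in range(1, num_eqs + 1)])
    b = -np.array([r[k] for k in range(1, num_eqs + 1)])
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.concatenate(([1.0], a))        # coefficients of A(z)

# Illustrative (assumed) lag estimates from a short noisy record.
rng = np.random.default_rng(1)
t = np.arange(128)
x = np.cos(0.5 * t) + rng.standard_normal(128)
L = 12
r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(L + 1)])
a_minimal = ar_from_yule_walker(r, p=4)             # uses 4 Yule-Walker equations
a_overdet = ar_from_yule_walker(r, p=4, num_eqs=L)  # uses 12 equations, least squares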

I. INTRODUCTION

IN A VARIETY of applications such as found in radar Doppler processing, adaptive filtering, speech processing, underwater acoustics, seismology, econometrics, spectral estimation, and array processing, it is desired to estimate the statistical characteristics of a wide-sense stationary time series. More often than not, this required characterization is embodied in the time series' autocorrelation lag sequence as specified by

r_x(n) = E\{x(n + m)\,\bar{x}(m)\} \qquad (1.1)

in which $E$ and the overbar denote the operations of expectation and complex conjugation, respectively. From this definition, the well-known property that the autocorrelation lags are complex conjugate symmetric (i.e., $r_x(-n) = \bar{r}_x(n)$) is readily established. We will automatically assume this property whenever negative lag autocorrelation elements (or their estimates) are required. The second-order statistical characterization as represented by the autocorrelation sequence may be given an "equivalent" frequency-domain interpretation. Namely, upon taking the Fourier transform of the autocorrelation sequence, that is,

S_x(e^{j\omega}) = \sum_{n=-\infty}^{\infty} r_x(n)\, e^{-j\omega n} \qquad (1.2)

we obtain the associated power spectral density function $S_x(e^{j\omega})$, in which the normalized frequency variable $\omega$ takes on values in $[-\pi, \pi]$. This function possesses a number of salient properties, among which are that it is a positive semidefinite, symmetric (if the time series is real valued), and periodic function of $\omega$. This function is seen to have a Fourier series interpretation in which the autocorrelation lags play the role of the Fourier coefficients. It, therefore, follows that these coefficients may be determined from the power spectral density function through the Fourier series coefficient integral expression

r_x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_x(e^{j\omega})\, e^{j\omega n}\, d\omega.
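A short numeric check of this Fourier-pair relationship, under assumed example lags (not data from the paper), can be written as follows: the lags are transformed to a sampled power spectral density via (1.2) and then recovered by numerically approximating the integral above.

import numpy as np

# Assumed example autocorrelation lags (not data from the paper), real-valued case.
r = {0: 1.0, 1: 0.5, 2: 0.25}

# Sample S_x(e^{jw}) on a uniform frequency grid via (1.2); for a real series the
# conjugate-symmetric lags combine into cosine terms.
omegas = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
S = np.full_like(omegas, r[0])
for k in (1, 2):
    S += 2.0 * r[k] * np.cos(k * omegas)

# Recover each lag from the integral expression above (rectangle rule on a periodic
# grid, which is essentially exact for this trigonometric polynomial).
dw = omegas[1] - omegas[0]
for n in (0, 1, 2, 3):
    rec = np.sum(S * np.cos(n * omegas)) * dw / (2.0 * np.pi)
    print(n, round(rec, 6))     # prints 1.0, 0.5, 0.25, and 0.0 for the unused lag n = 3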