Pattern Classification Using Neural Networks
Richard P. Lippmann

This work was sponsored by the Department of the Air Force and the Air Force Office of Scientific Research. The views expressed are those of the author and do not reflect the official policy or position of the U.S. Government.


THE RECENT RESURGENCE OF INTEREST IN neural networks, machine learning, and parallel computation has led to renewed research in the area of statistical pattern classification. Early pattern classification research performed in the '60s and '70s focused on asymptotic (infinite training data) properties of classifiers, on demonstrating convergence of density estimators, and on providing bounds for error rates. Many researchers studied parametric Bayesian classifiers where the form of input distributions is assumed to be known and parameters of distributions are estimated using techniques that require simultaneous access to all training data. These classifiers, especially those that assume Gaussian distributions, are still the most widely used because they are simple and are clearly described in a number of textbooks [1] [2]. These books and early research, however, tended to ignore important practical issues and provided few guidelines for selecting or implementing classifiers for real-world problems. What guidelines were presented were appropriate only for early computer technology, which had severely limited memory and computation power.

The thrust of recent research has changed. More attention is being paid to practical issues as pattern classification techniques are applied to speech, vision, robotics, and artificial intelligence applications where real-time response with complex real-world data is a necessity. Much of this research is motivated by the desire to understand and build parallel neural net classifiers inspired by biological neural networks, by the need to add learning to artificial intelligence applications, by new Very Large-Scale Integration (VLSI) design techniques, and by the availability of high-speed parallel and serial computers with large amounts of processing power and memory. This has led to an emphasis on robust, adaptive, nonparametric classifiers that can be implemented on parallel hardware. Adaptive non-parametric neural-net classifiers work well for many real-world problems. These classifiers frequently provide reduced error rates when compared to more conventional Bayesian approaches [3-8], and they also offer a choice among differing practical characteristics. Classifiers provide tradeoffs in memory, computation, training time, and adaptation requirements. They also differ in ease of real-time implementation using custom VLSI circuitry, in the ease with which they can be programmed efficiently on specific parallel or serial computers, and in computational complexity. Generalization capabilities for specific applications and the ease with which the complexity of a classifier can be matched to the amount of training data also differ. Finally, classifiers differ in their ability to use unsupervised training data and in the ease with which internal operations can be understood and interpreted to determine what input features contribute to classification performance. These issues, more than error rate, tend to drive the selection of a classifier for a particular application.

Unfortunately, the recent literature on pattern classification is scattered among many journals and conference proceedings, and it is difficult to obtain an overview of the many different adaptive classifiers that have been developed. This article is meant to be the beginning of such an overview. It provides a taxonomy of classifiers, including a discussion of the importance of matching classifier complexity to the amount of available training data. Emphasis is placed on describing the many different pattern classifiers that have been developed and on discussing the large practical differences between classifiers. This article extends a previous review [9] and focuses on feed-forward neural-net classifiers for static patterns with continuous-valued inputs. Good overviews of recent work in the more conventional field of statistical pattern classification are available in [10-12]. Other recent discussions of adaptive neural net classifiers are available in [6] [13-15]. Work describing analyses of neural net models using machine learning and complexity theory is available in [16-18].

Training and Testing



The goal of pattern classification is to assign input patterns to one of a finite number, M, of classes. In the following, it will be assumed that input patterns consist of static input vectors x containing N elements or continuous-valued real numbers denoted x1, x2, ..., xN. Elements represent measurements of features selected to be useful for distinguishing between classes. Input patterns can be viewed as points in the multidimensional space defined by the input feature measurements. The purpose of a pattern classifier is to partition this multidimensional space into decision regions that indicate to which class any input belongs. Conventional Bayesian classifiers characterize classes by their probability density functions on the input features and use Bayes' decision theory to form decision regions from these densities [1] [2]. Adaptive non-parametric classifiers do not estimate probability density functions directly but use discriminant functions to form decision regions.

Application of a pattern classifier first requires selection of features that must be tailored separately for each problem domain. Features should contain information required to distinguish between classes, be insensitive to irrelevant variability in the input, and also be limited in number to permit efficient computation of discriminant functions and to limit the amount of training data required. Good classification performance requires selection of effective features and also selection of a classifier that can make good use of those features with limited training data, memory, and computing power.

Following feature selection, classifier development requires collection of training and test data, and separate training and test or use phases, as shown in Figure 1. During the training phase, a limited amount of training data and a priori knowledge concerning the problem domain is used to adjust parameters and/or learn the structure of the classifier. During the test phase, the classifier designed from the training phase is evaluated on new test data by providing a classification decision for each input pattern. Classifier parameters and/or structure may then be adapted to take advantage of new training data or to compensate for nonstationary inputs, variation in internal components, or internal faults. Further evaluations require new test data.

Fig. 1. The two major phases of pattern classifier development: (a) a training phase in which classifier parameters and/or structure are adjusted, and (b) a test phase in which the classifier from the training phase makes decisions and may then adapt to new data.

It is important to note that test data should never be used to estimate classifier parameters or to determine classifier structure. This will produce an overly optimistic estimate of the real error rate. Test data must be independent data that is only used to assess the generalization of a classifier, defined as the error rate on never-before-seen input patterns. One or more uses of test data, to select the best performing classifier or the appropriate structure of one type of classifier, invalidate the use of that data to measure generalization. In addition, input features must be extracted automatically without hand alignment, segmentation, or registration. Errors caused by these processes must be allowed to affect input parameters as they would in practical applications where extensive hand-tuning is normally impossible. Unfortunately, these simple guidelines, restricting use of test data and limiting hand-tuning, and also other important common-sense guidelines discussed in [19], are frequently broken by pattern recognition researchers.

Supervised training, unsupervised training, or combined unsupervised/supervised training can be used to train neural net classification and clustering algorithms, as shown in Figure 2. Classifiers trained with supervision require data with side information or labels that specify the correct class during training. Clustering or vector quantization algorithms use unsupervised training and group unlabeled training data into internal clusters. Classifiers that use combined unsupervised/supervised training typically first use unsupervised training with unlabeled data to form internal clusters. Labels are then assigned to clusters, and cluster centroid locations and sizes are often altered using a small amount of supervised training data. Although combined unsupervised/supervised training mimics some aspects of biological learning, it is of interest primarily because it can reduce the amount of labeled training data required. Much of the expense and effort required to develop classifiers results from the necessity of collecting and hand-labeling large amounts of training data. Combined unsupervised/supervised training can simplify data collection and reduce expensive hand-labeling.

Fig. 2. Three training techniques used with neural net classification and clustering algorithms: supervised, unsupervised, and combined unsupervised/supervised training.
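To make the combined unsupervised/supervised procedure concrete, the following minimal sketch first clusters unlabeled data with k-means and then labels each cluster centroid from a handful of labeled examples. The two-class Gaussian toy data, the choice of k-means, and all function names are illustrative assumptions rather than prescribed details.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Unsupervised step: group unlabeled vectors into k clusters."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Assign each pattern to its nearest centroid (Euclidean distance).
        assign = np.argmin(((x[:, None, :] - centroids) ** 2).sum(-1), axis=1)
        # Move each centroid to the mean of its assigned patterns.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean(axis=0)
    return centroids

def label_centroids(centroids, x_labeled, y_labeled):
    """Supervised step: give each cluster the majority label of the
    labeled examples that fall closest to it."""
    assign = np.argmin(((x_labeled[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    labels = np.zeros(len(centroids), dtype=int)
    for j in range(len(centroids)):
        members = y_labeled[assign == j]
        labels[j] = np.bincount(members).argmax() if len(members) else -1
    return labels

def classify(x, centroids, centroid_labels):
    """Assign each input the label of its nearest cluster centroid."""
    assign = np.argmin(((x[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    return centroid_labels[assign]

# Toy two-class problem: many unlabeled patterns, only two labeled ones.
rng = np.random.default_rng(1)
x_unlabeled = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
x_labeled = np.array([[0.0, 0.0], [4.0, 4.0]])
y_labeled = np.array([0, 1])

centroids = kmeans(x_unlabeled, k=2)
centroid_labels = label_centroids(centroids, x_labeled, y_labeled)
print(classify(np.array([[0.5, -0.2], [3.8, 4.1]]), centroids, centroid_labels))
```

Most of the structure here is learned from unlabeled data; only two labeled examples are needed to name the clusters, which is the labeling economy the combined approach is intended to provide.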


A Taxonomy of Neural Net Classifiers

Practical differences between classifiers and internal differences in how classifiers form decision regions lead to a taxonomy containing four broad groups, shown in Figure 3. The leftmost column in this figure names the classifier group, the second column illustrates how decision regions are formed, the third column presents the lowest-level computation performed by nodes or computing elements in the adaptive networks, and the rightmost column names some representative classifiers within each group. The uppermost group in Figure 3 contains conventional probabilistic or Bayesian classifiers, while the lower three groups contain adaptive classifiers. These adaptive classifiers can all be implemented using fine-grain parallelism. Most also require simple local computations for incremental adaptation and can form arbitrary decision regions.

Probabilistic Classifiers

Probabilistic classifiers from Figure 3 assume a priori probability distributions, such as Gaussian or Gaussian mixture distributions, for input features [1] [2]. Parameters of distributions are typically estimated using supervised training where all training data is assumed to be available simultaneously. These classifiers provide optimal performance when underlying distributions are accurate models of the test data and sufficient training data is available to estimate distribution parameters accurately. These two conditions are not often satisfied in nonstationary environments and with real-world data.
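As a concrete example of this kind of classifier, the sketch below fits one Gaussian per class by estimating a mean and covariance from all of that class's training data at once and classifies new inputs by the largest log-likelihood plus log-prior. The toy data, class names, and regularization constant are assumed for illustration only.

```python
import numpy as np

class GaussianClassifier:
    """Parametric Bayesian classifier assuming one Gaussian per class."""
    def fit(self, x, y):
        self.classes = np.unique(y)
        self.means, self.covs, self.priors = [], [], []
        for c in self.classes:
            xc = x[y == c]
            self.means.append(xc.mean(axis=0))
            # Small diagonal term keeps the covariance invertible.
            self.covs.append(np.cov(xc, rowvar=False) + 1e-6 * np.eye(x.shape[1]))
            self.priors.append(len(xc) / len(x))
        return self

    def predict(self, x):
        scores = []
        for m, cov, p in zip(self.means, self.covs, self.priors):
            diff = x - m
            inv = np.linalg.inv(cov)
            # Log of class prior times Gaussian density (constants dropped).
            log_lik = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
            log_lik -= 0.5 * np.log(np.linalg.det(cov))
            scores.append(log_lik + np.log(p))
        return self.classes[np.argmax(np.stack(scores, axis=1), axis=1)]

rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
clf = GaussianClassifier().fit(x, y)
print(clf.predict(np.array([[0.2, -0.1], [2.9, 3.2]])))
```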






Hyperplane Classifiers

Hyperplane classifiers form complex decision regions using nodes that form hyperplane decision boundaries in the space spanned by the inputs. Nodes typically form a weighted sum of the inputs and pass this sum through a sigmoid nonlinearity, as shown in Figure 3. Other nonlinearities, including high-order polynomials of the inputs, are also used. These classifiers have low memory and computation requirements during classification but may require long training times and/or complex training algorithms. They include multi-layer perceptrons trained with back-propagation (back-propagation classifiers) [9] [20], Boltzmann machines [21], binary tree classifiers [22] [23], high-order nets that form high-order polynomials of inputs [1] [24] [25], and high-order nets resulting from the use of Group Method of Data Handling (GMDH) algorithms [26].
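A single hyperplane node can be sketched as follows: a weighted sum of the inputs passed through a sigmoid, so that the 0.5 output level traces the hyperplane decision boundary. The particular weights and test points below are arbitrary illustrations, not values from any specific classifier.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hyperplane_node(x, w, b):
    """One node: weighted sum of inputs followed by a sigmoid nonlinearity.
    Outputs above 0.5 lie on one side of the hyperplane w.x + b = 0."""
    return sigmoid(x @ w + b)

# Example hyperplane x1 + x2 - 1 = 0 separating two regions of the plane.
w, b = np.array([1.0, 1.0]), -1.0
for point in [np.array([0.2, 0.3]), np.array([1.5, 0.8])]:
    out = hyperplane_node(point, w, b)
    print(point, out, "class A" if out > 0.5 else "class B")
```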

Kernel Classifiers

Kernel or receptive-field classifiers create complex decision regions from kernel-function nodes that form overlapping receptive fields. Kernel-function nodes use a kernel function, as shown in Figure 3, which provides the strongest output when the input is near a node's centroid. The output of kernel-function node j, denoted y_j, is given by y_j = f_k(||x - m_j|| / h), where f_k is a kernel function, a Euclidean norm is assumed, m_j is a vector representing the centroid of node j, and h is a free parameter that determines the width of the kernel function. This equation indicates that the node output peaks when the input is near the centroid of the node and then falls off monotonically as the Euclidean distance between the input and the centroid increases. The width h determines the region of influence of each node and the amount of smoothing. Classification decisions are made by high-level nodes that form functions from weighted sums of outputs of kernel-function nodes.

Kernel classifiers train relatively rapidly, can use combined unsupervised/supervised training, and have intermediate memory and computation requirements. Centroids of kernel-function nodes can be placed on some or all labeled training examples, or they can be determined by clustering or randomly selecting unlabeled training examples. Kernel classifiers include conventional classifiers that estimate probability density functions using the Parzen window approach or mixture distributions [1] [2], and classifiers that form discriminant functions using kernel functions [27]. Neural net kernel classifiers include map-based approaches that use arrays of nodes which compute kernel functions [28] [29], classifiers based on the Cerebellar Model Articulation Controller (CMAC) [30] [31], and classifiers that use the method of potential functions [1], often called radial basis function classifiers [32-34].
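The kernel-node equation can be illustrated with a small radial-basis-function-style sketch in which the kernel f_k is assumed Gaussian, centroids are placed on a few labeled examples, and the high-level weighted sums are fit by least squares. These choices, like the toy data, are illustrative assumptions rather than prescribed details.

```python
import numpy as np

def kernel_outputs(x, centroids, h):
    """y_j = f_k(||x - m_j|| / h) with a Gaussian kernel f_k(r) = exp(-r^2)."""
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
    return np.exp(-(d / h) ** 2)

# Centroids placed on labeled training examples (two classes in the plane).
centroids = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
targets = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # one-of-M coding

h = 1.0
phi = kernel_outputs(centroids, centroids, h)
# High-level nodes: weighted sums of kernel outputs, weights fit by least squares.
weights, *_ = np.linalg.lstsq(phi, targets, rcond=None)

x_test = np.array([[0.2, 0.4], [2.8, 3.5]])
scores = kernel_outputs(x_test, centroids, h) @ weights
print(scores.argmax(axis=1))  # predicted class index for each test input
```

Placing centroids on labeled examples is only one of the options mentioned above; clustering unlabeled data would replace the hand-picked centroid list.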

Fig. 3. Four basic classifier groups. For each group the figure shows how decision regions are formed, the computation performed by individual nodes, and representative classifiers: Probabilistic (Gaussian mixture), Hyperplane (multi-layer perceptron, Boltzmann machine), Receptive Fields/Kernel (method of potential functions), and Exemplar (k-nearest neighbor, LVQ).

Exemplar Classifiers

Exemplar classifiers perform classification based on the identity of the training examples, or exemplars, that are nearest to the input. Nearest neighbors can be determined using exemplar nodes that are similar to the kernel-function nodes. Exemplar nodes compute the weighted Euclidean distance between inputs and node centroids. Centroids correspond to previously presented labeled training examples or to cluster centers formed during combined unsupervised/supervised training. The classification decision in k-nearest neighbor classifiers is the class associated with the majority of the k exemplar nodes with the k smallest distances to the input (ties are broken randomly). Exemplar classifiers train rapidly but may require large amounts of memory and computation time for classification. Advanced data structures such as K-D trees [35] can be used to reduce computation requirements on serial computers at the expense of complicating training and adaptation. Exemplar classifiers include k-nearest neighbor classifiers [1], the feature-map classifier [36], the Learning Vector Quantizer (LVQ) [37], Restricted Coulomb Energy (RCE) classifiers [38] [39], Adaptive Resonance Theory (ART) classifiers [40], classifiers that use "memory-based reasoning" [41] [42], and classifiers that use local linear interpolation [43].

Classifiers from the above four groups often provide similar low error rates but differ dramatically in practical characteristics. Figure 4 suggests how many of the classifiers listed above can trade off memory requirements and training time when implemented on a serial computer. Only relative qualitative differences are plotted. Changes in ordering should be expected for specific problems and for variations in algorithms and implementations. Hyperplane classifiers, such as back-propagation classifiers (at the upper left of the figure), require lengthy training times but also have low memory requirements and make rapid classification decisions. Nearest neighbor techniques (at the bottom right) train extremely rapidly but require extensive memory and computation resources. Kernel and many exemplar classifiers are intermediate. They are the most flexible and can trade off training complexity, memory, and classification computation requirements.

Fig. 4. Relative differences between classifier training time and memory requirements. Classifiers plotted include the Boltzmann machine, multi-layer perceptron (back-propagation), LVQ, the feature-map classifier, high-order nets (GMDH), RCE, the method of potential functions (radial basis functions), decision trees, the K-D tree, and k-nearest neighbor (KNN).

A shortcoming of much recent neural net research is the overemphasis on a few popular techniques such as back-propagation classification. It is clear from Figure 4 that this is only one of many neural net classifiers. Other classifiers can provide more rapid training, take advantage of unsupervised training data, and be implemented using fine-grain parallelism with simple adaptation rules. The following sections describe recent work on back-propagation and other important classifiers from Figure 4.
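The k-nearest neighbor rule itself is short enough to sketch directly. In the illustration below the exemplars are simply stored training vectors, the distances are unweighted Euclidean distances, and ties in the vote are broken randomly; the toy data is assumed for demonstration.

```python
import numpy as np

def knn_classify(x, exemplars, labels, k=3, rng=None):
    """Assign x the majority label of its k nearest exemplars (ties broken randomly)."""
    if rng is None:
        rng = np.random.default_rng()
    dists = np.linalg.norm(exemplars - x, axis=1)   # one distance per exemplar node
    nearest = labels[np.argsort(dists)[:k]]
    counts = np.bincount(nearest)
    winners = np.flatnonzero(counts == counts.max())
    return rng.choice(winners)

rng = np.random.default_rng(0)
exemplars = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
print(knn_classify(np.array([2.8, 3.1]), exemplars, labels, k=5, rng=rng))
```

Note that training amounts to storing the exemplars, while every classification requires a distance computation per stored example, which is the memory/computation tradeoff discussed above.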


Back-Propagation Classifiers

Back-propagation classifiers form nonlinear discriminant functions using single- or multi-layer perceptrons with sigmoidal nonlinearities. They are trained with supervision, using gradient-descent training techniques called back-propagation, which minimize the squared error between the actual outputs of the network and the desired outputs. Patterns are applied to input nodes that have linear transfer functions. Other nodes typically have sigmoid nonlinearities. The desired output from output nodes is "low" (0 or < 0.1) unless that node corresponds to the current input class, in which case it is "high" (1.0 or > 0.9). Each output node computes a nonlinear discriminant function that distinguishes between one class and all other classes. The squared-error cost function is minimized when these discriminant functions are the Bayes a posteriori probability functions that would be used in a minimum-error Bayes classifier [10]. Back-propagation training provides estimates of these Bayes probability functions. These will be "good" estimates only when the network has enough flexibility to closely approximate the Bayes functions and there is sufficient training data. Good introductions to back-propagation classifiers are available in many papers, including [9] and [20].

Early interest in back-propagation training was caused by the presupposition that it might be used in biological neural nets. Although this now seems unlikely [44], back-propagation classifiers have been successfully applied in many areas. Multi-layer perceptrons trained with back-propagation have been successfully used to:

- Classify speech sounds [6]
- Form text-to-phoneme rules [45]
- Deduce the secondary structure of a protein from its amino-acid sequence [46]
- Discriminate between underwater sonar returns [3]
- Learn good moves for backgammon [47]
- Model the function of posterior parietal neurons in the macaque monkey [48]
- Perform nonlinear signal processing [49] [50]

Figure 5 illustrates how the multi-layer perceptron (on the left of the figure) can form three nonlinear input/output functions using back-propagation training.


The multi-layer perceptron shown has one linear input node, twenty nodes with sigmoidal nonlinearities in the first hidden layer, five nodes with sigmoidal nonlinearities in the second hidden layer, and one linear output node. Input values were selected randomly on each training trial using a uniform distribution over the plotted input region, and weights were adapted after every training example, as described in [49]. The "Step" and the smooth "Gaussian Pulse" nonlinearities are beginning to be closely approximated after 2,000 training trials. These results illustrate how back-propagation networks can form Bayes probability functions useful for pattern classification and also how they perform a type of curve-fitting. The solid lines in Figure 5 represent desired functions, while functions formed using back-propagation training with different numbers of training trials are plotted using dots and dashes.

Fig. 5. Two deterministic nonlinearities formed using a multi-layer perceptron and back-propagation training.

A number of theoretical analyses have been performed to determine the capabilities of classifiers formed from multi-layer perceptrons. Similar constructive proofs, developed independently [9] [51] [52], demonstrated that two hidden layers are sufficient to form arbitrary decision regions using multi-layer perceptrons with step-function hard-limiting nonlinearities (node outputs of 0 or 1). This constructive proof was extended to suggest how multi-layer perceptrons with two hidden layers, linear output nodes, and sigmoidal nonlinearities approximate complex nonlinear functions [53]. More recent work demonstrated that multi-layer perceptrons with only one hidden layer could form complex disjoint and convex decision regions [36] [54]. This work was followed by a careful mathematical proof [55], which demonstrated that continuous nonlinear mappings can be closely approximated using sigmoidal nonlinearities and multi-layer perceptrons with only one hidden layer. This proof implies that arbitrary decision regions can also be approximated by multi-layer perceptrons with only one hidden layer. The proof, however, is not constructive and does not indicate how many nodes are required in the hidden layer. Other recent theoretical work has demonstrated the advantages of sigmoidal nonlinearities over linear nodes for single-layer perceptrons trained with back-propagation [56-58].

One major characteristic of back-propagation classifiers is long training times. Training times are typically longer when complex decision regions are required and when networks have more hidden layers. As with other classifiers, training time is reduced and performance improved if the size of the network is tailored to be large enough to solve a problem but not so large that too many parameters must be estimated with limited training data. Other techniques that have been effective in reducing training time with speech data are to update weights after presenting each training example instead of after cycling through all examples, to randomize the presentation order of training examples, and to normalize components of input training vectors to have mean values of zero [36] [59] [60].
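The Figure 5 experiment can be approximated by a rough sketch along the following lines: a 1-20-5-1 network with sigmoidal hidden nodes and a linear output, trained by per-example gradient descent on the squared error with inputs drawn uniformly over the plotted region. The learning rate, the number of trials, and the particular Gaussian pulse target are assumptions chosen for illustration, not values from the original experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

# 1-20-5-1 network: linear input, two sigmoidal hidden layers, linear output.
W1, b1 = rng.normal(0, 0.5, (20, 1)), np.zeros(20)
W2, b2 = rng.normal(0, 0.5, (5, 20)), np.zeros(5)
W3, b3 = rng.normal(0, 0.5, (1, 5)), np.zeros(1)

target = lambda x: np.exp(-2.0 * x ** 2)   # smooth "Gaussian pulse" to be learned
lr = 0.05

for trial in range(20000):
    # One randomly selected training example per weight update.
    x = rng.uniform(-2.0, 2.0)
    h1 = sig(W1[:, 0] * x + b1)            # first hidden layer (20 nodes)
    h2 = sig(W2 @ h1 + b2)                 # second hidden layer (5 nodes)
    y = W3 @ h2 + b3                       # linear output node
    err = y - target(x)                    # gradient of the squared error at the output

    # Back-propagate the error and apply the gradient-descent update.
    d3 = err
    d2 = (W3.T @ d3) * h2 * (1 - h2)
    d1 = (W2.T @ d2) * h1 * (1 - h1)
    W3 -= lr * np.outer(d3, h2); b3 -= lr * d3
    W2 -= lr * np.outer(d2, h1); b2 -= lr * d2
    W1 -= lr * np.outer(d1, [x]); b1 -= lr * d1

for x in (-1.0, 0.0, 1.0):
    h1 = sig(W1[:, 0] * x + b1); h2 = sig(W2 @ h1 + b2)
    print(x, (W3 @ h2 + b3)[0], target(x))
```

Updating after every example and keeping the inputs zero-mean over the training region mirror the training-time heuristics mentioned above.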

Many other techniques are being explored [53] [61-67]. These techniques need to be further tested on difficult real-world problems that require complex decision regions. Other characteristics of back-propagation classifiers that may be difficult to alter include difficulty in interpreting and understanding network solutions, and the frequent necessity of many nodes and connection weights.


New & Bestselling Telecommunications Books From wiley Examine FREE for 15 days

........ ..........

........... ............ ............ .............. ......... .........

.......... ............ ............ ............. ......... ))?.

Authorized licensed use limited to: Escuela Superior de Ingeneria Mecanica. Downloaded on October 30, 2009 at 20:14 from IEEE Xplore. Restrictions apply.

THREE MORE BESTSELLERS FROM ROGER FREEMAN . . . . . . . . . .

Leam the bows & whys atf planning, des~gnngand installng telecommunication systems from a leading indusby expert...

.............

RADIO SYSTEM DESIGN FOR TELECOMMUNICATIONS (1-100 GHz) ”...avaluable working tool for those involved with designing radio systems at every level.”

TELECOMMUNICATIONS SYSTEM ENGINEERING Second Edition

- Telecom Report

Catch up on the latest techniques and applications that will increase design efficiency and effectiveness. This book provides indepth coverage of general radio wave propagation in the frequency range 1-100GHz, plus a step-by-step tutorial on the design of line-of-sight microwavehillimeter radio links, troposcatter/diffraction, and analog and digital satellite systems. Comprehensive and current, the text provides answers to all your questions and includes tables, figures, equations, and sample problems that make even the most complex material easily accessible to both practicing engineers and students. 560 pp.

(0 471 -81236-6) 1987

$49.95

REFERENCE MANUAL FOR TELECOMMUNICATIONS ENGINEERING “...indispensable to anyone with a serious interest in telecommunications system - Communications Management design.” Here is a comprehensive reference for those who design, build, purchase, use, or maintain telecommunicationssystems, offering the only system design database devoted exclusively to the field. Pulls together a vast amount of information from such diverse sources as CCITTKCIR, EIA, U S .Military Standards and Handbooks, NBS, BTUATT, REA, and periodicals and monographs published by over twenty principal manufacturers. 1,504 (0 471-86753-5) 1985

$100.00

TELECOMMUNICATION TRANSMISSION HANDBOOK Second Edition .....a well-crafted book that merits serious study.”

Roger L. Freeman, Raytheon Company “ . . . i n d i s p e n s a b l eto anyone with a serious interest In telecommunications. ”

- Communications Management on the first edition

In this fully revised and expanded edition of Telecommunications System Engineering,

noted telecommunications expert and author Roger Freeman describes how a telecommunications system works and shows you how to design one anywhere in the world, whether common carrier or specialized common carrier, private or industrial networks. Stressing the impact of one discipline on another, the work covers all the concepts required to understand the design of a practical telecommunication network - whether it is to carry voice, data, facsimile, telemetry, video, or a composite of each, and whether it is analog or digital. * Provides a top to bottom treatment of transmission and switching -from transmission impairments and their mitigation to network design and data protocols Devotes two chapters to signaling -an important topic often overlooked by other books Details the transmission and switching of information using methods other than speech telephony such as data and facsimile *Thoroughly discusses OSI, LANs, and ISDN and their implications Presents the new CClT network routing strategies CONTENTS: Some Basics in ConventionalTelephony. Local Networks. Conventional Switching Techniques in Telephony. Signaling for Analog Telephone Networks. Introduction to Transmission for Telephony. Long-DistanceNetworks. The Design of Long-DistanceLinks. The Transmission of Other Information Over the Telephone Network. Digital TransmissionSystems. Digital Switching and Networks. Data Networks and Their Operation. Local Area Networks. Integrated Services Digital Network. CClTT Signaling System No. 7. TelecommunicationPlanning. Index.

A new voiume in the Wiley Series in Teiecommunications (0 471 -63423-9) 1989 $64.95

736 pp.

-Journal of Systems Management

This fully revised and enlarged reference brings together 14 basic disciplines of telecommunicationstransmission, focusing on speech telephony, datdtelegraph, facsimile and video. The text analyzes concepts and techniques used in point-to-point signal transmissions, including such disciplines as FDM, PCM, channel coding, and more. 706 pp.

(0 471 -08029-2) 1981 $69.95

..\

.... .. ..... ...... ...... ......... .......... ..........

............ .. ...........

&i;iii;;i;:, ,......... ............ .

Authorized licensed use limited to: Escuela Superior de Ingeneria Mecanica. Downloaded on October 30, 2009 at 20:14 from IEEE Xplore. Restrictions apply.

............. ........... .......... .......... ......... ........ .....

RADIO COMMUNICATIONS Analog

.I

..... .... ...

INTRODUCTORY DIGITAL SIGNAL PROCESSING WITH COMPUTER APPLICATIONS

Ralph S Carson, University of Missouri Rolla Most texts on radio communications overemphasize communications theory at the expense of applications and design considerations Radio Communications Concepts Analog examines practical topics that are indispensible to the design and operation of related circuits and systems Throughout, the text provides exhaustive coverage of the analysis, design, and operation of electronic circuits 1-

~

1'1

-

1)

,,, 1 / 1-i

111

Ill

~I

$0

3 III,,

-

Processing

Paul A. Lynn and Wolfgang Fuerst, lnformation

Management Services Digital signal processing (DSP) is becoming an increasingly important area of study because of its practical applications to a variety of disciplines. This comprehensive introduction to DSP uses computer programs - listed in both BASIC and Pascal -to illustrate the principles of DSP and to facilitate the design of digital filters and processors.

Applications oriented to help familiarize advanced undergraduates, new designers, and managers with the operation of related circuits and systems Uses worked examples to illustrate the principles presented Employs trigonometry and the trigonometric Fourier series in its mathematical derivations

CONTENTS: Linear Mixing. Nonlinear Mixing. Noise I. Special Functions and Filters. Amplitude-Modulation Processes. Angle Modulation. Noise 11. The Superheterodyne Radio Receiver. Spurious Responses. Intercept Points. Appendices. Bibliography. Index. 640 pp. (0 471-62169-2) 1989 $59.95

Numerous worked examples and problems help reinforce learning ' Emphasizes practical applications CONTENTS:

Time-DomainAnalysis. Frequency-DomainAnalysis' Digital Fourier Techniques. Frequency-DomainAnalysis: The z-Transform.Design of NonrecursiveDigital Filters.FFT Processing.Appendices. Answers to Selected Problems. Bibliography.Index.

COMMUNICATIONS SATELLITE HANDBOOK Walter L. Morgan, Communications Center of Clarksburg, and Gary D. Gordon, Aerospace Consultant

371 pp, (0 471 -91564-5) 1989 $34.95 PAPER

Offering a general introduction to communications satellite technologies, this work describes the capabilities and limitations of current satellites, as well as design requirements and applications of space and earth station technology. It addresses all important aspects of the subject, including multiple-access techniques, link budgets, the spacecraft bus, and the geostationary orbit. In addition, the Communications Satellite Handbook provides:

DIGITAL PROCESSING OF SIGNALS Theory and Practice Second Edition Maurice Bellanger, T R T Le P essis-ffoc n j c f

Self-contained chapters for easy reference Extensive use of figures and tables that pack a maximum amount of information -Graphs that are accurate enough for direct application Equations that can be run on an engineering pocket calculator

*

*

Fraoce

This fast-access handbook should prove a handy reference for the student, engineer, or manager involved with communications satellites.

Thorougnly revise0 and updatea, this book presents the most useful techniques for the digital processing of signals Emphasizing engineering aspects, it guides the reader from theory to design and implementation The selection of topics - including new developments in discrete fourier transform, finite and infinite impulse response filters, circuits and factors of complexity, and applications in telecommunications reflects the needs of industry in the field

CONTENTS: Introduction. Teletraffic. Communications Satellite Systems. Multiple-Access Techniques. Spacecraft Technology. Satellite Orbits, Index. 900 pp.

388 pp (0 471-92101-7) 1989 $44 95 PAPER

(0 471-31603-2) 1988 $69.95

-

--

- -

-

- --

DIGITAL TELEPHONY John Bellamy Southern Methodist University

MOBILE COMMUNICATION SYSTEMS J. D. Parsons, University of Liverpool, U K , and J. G. Gardiner, University of Bradford, U K

t

I L

Mobile communication svstems have seen areat develoDment and expansion in the 1980s. This increasingly important technology is covered in detail in this new, fully illustrated work The authors first present the fundamental aspects of mobile communications systems, and then delve into more specific topics, including

-

Interference created by terrain *Digital techniques associated with two-way, speech-based communication systems *The TACS cellular system * Digital techniques for high capacity cellular systems -And many more!

CONTENTS: Introduction to Mobile Communications. Multipath Characteristics in Urban Areas. Propagation and Signal Strength Prediction. Modulation Techniques. Man-Made Noise. Diversity Reception. Using the Radio Channel in Cellular Radio Networks. Analogue Cellular Radio Systems. Digital Cellular Radio Systems. Index. 292 pp.

As a departure from conventional treatment of communication theory the book stresses how systems operate and the rationaie behind their design rather than presenting rigorous anaiyfi cai formuiations

- TelecommunicationsJournal Digital Telephony brings together material previously available only in technical papers, conference proceedings, and books covering highly specialized topics Presents principles of telephone system design and digital telephone networks as well as graphs of performance criteria, tables for traffic analysis, and a glossary of terms CONTENTS. Background and Terminology Why Digital?Voice Digi-

talization Digital Transmission and Multiplexing Digital Switching Digital Radio Network Synchronization Control and Management Digital Networks Traffic Analysis Appendices Index 526 pp

(0 471 -08089-6) 1982

$62 95

(0 470-21213-6) 1989 $44.95

...

Authorized licensed use limited to: Escuela Superior de Ingeneria Mecanica. Downloaded on October 30, 2009 at 20:14 from IEEE Xplore. Restrictions apply.

SATELLITE CO MMUNCAT10NS The First Quarter Century of Service David W. E. flees, lntelsat Focusing on the range of applications available for business, this comprehensive overview of the industry traces the history of satellite communicationstechnology, how the need for it arose, and how it developed into a global enterprise. Describes satellite services available worldwide and the international, national, and regional organizations that provide them, both public and private Uses clear and nontechnical language that provides a general, but essential representation of how satellite communications have been, and are being applied to everyday life, and how this technology is likely to develop in the future Appendices supply synopses of various countries' memberships in international organizations, satellite series characteristics, satellite launch histories, worldwide TV channels, and the state of geosynchronousorbit

CONTENTS: Satellite Communications- The Need. International Satellite Organizations. International Satellite Systems Competitors. International Services. Business Applications. BroadcastingApplications. Mobile Applications. National Telecommunicationsvia Satellite. National Satellite Networks. A View to the Future. Appendices. Glossary. Index. 515 pp. (0 471 -62243-5)

... ...... ...... .... .............. .

...

Ismj ';

sxn

1989 $44.95

SYNCHR0NRATION IN DIGITAL C0MMUNICAT10NS Volume 1: Phase-,Frequency-Locked Loops and Amplitude Control Heinrich Meyr, University of Technology,FRG, and Gerd Ascheid, dADlS GmbH, FRG

Learn the fundamentals of sychronization in digital communications. Sychronization in Digital Communications, Volume I and its companion volume due in 1990 are the only up-to-date treatments of synchronization -an important 4L facet of any digital communications system. -~ 2n Synchronization in Digital Communications develops a theoretical framework of synchronization which can be applied to solve practical problems in a variety of areas of digital communications.The theory presented in Volume 1 provides telecommunicationsengineers with the tools they need to analyze and systematically derive sychronizer structures.

- BIF

BfF

CONTENTS: Phase-Locked Loop Fundamentals. Phase-Locked Loop Tracking Performance in the Presence of Noise. Unaided Acquisition. Aided Acquisition. Loop Threshold. Amplitude Control. Automatic Frequency Control. Nonlinear Theory of Synchronization Systems (in Preparation). 480 pp.

(0 471-50193-X)

1989 $54.95

EXPERT SYSTEM APPLICATIONS TO TELECOMMUNICATIONS c

Edited by Jay Liebowitz, George Washington University This book presents a representative sample of expert systems currently being developed for telecommunications. Leading authorities in the field provide sample case studies and practical guidelines that illustrate the use of expert systems for a variety of telecommunicationsfunctions and explore potential uses in future telecommunicationsareas.

CONTENTS: CASE STUDIES. Expert System Applications to Network Management.A Case Study in Expert Systems Development: The ACE System. The FIS Electronics Troubleshooting Project. Expert System Fault Isolation in a Satellite Communications Network. On-Line Expertise for Telecommunications.XTEL: An Expert System for Designing Theater-Wide TelecommunicationsArchitectures. Expert Systems for Network Management and Control in Telecommunications at Bellcore. METHODOLOGIES. Strategic Assessment of Expert System Technology: Guidelines and Methodology. Knowledge Acquisition for Knowledge Based Systems. Distributed Expert Systems: Facility Advisor. FUTURE APPLICATIONS. Expert Systems for Network Control Center Applications. Expert Systems in Radio Spectrum Management. 371 pp. (0 471 -62459-4)

1988 $41.95

Authorized licensed use limited to: Escuela Superior de Ingeneria Mecanica. Downloaded on October 30, 2009 at 20:14 from IEEE Xplore. Restrictions apply.

''::::z;-y ........ . ........ .. .. ..... .

. . ..

INTRODUCTION TO COMMUNICATIONS ENGINEERING Second Edition

......

.. ...... .... ... ...

FIBER-OPTIC SYSTEMS Network Applications Terry Edwards, Gosling Associates,

Robert M. Gagliardi, University of Southern Caiifornia

Ltd., U.K.

Thoroughly updated and expanded to reflect the many changes the telecommunications field has undergone since publication of the first edition, the Second Edition covers basic aspects of analysis and design of today's analog and digital operating communications systems. Important new topics include sateilite communications. fiber o p t m i /inks. and recent advances in high speed digital communications The Second Edition uses a unified approach that integrates the study of analog and digital systems, an approach that's more compatible with today's overall design philosophies.

CONTENTS: Communication System Models. Carrier Modulation. Carrier Transmission. Carrier Reception. Carrier Demodulation. Baseband Waveforms, Subcarriers, and Multiplexing. Binary Digital Systems. Noncoherent and Bandlimited Binary Systems. Block Waveform Digital Systems. Frequency Acquisition and Synchronization. Satellite Communications. Fiber Optic Communications. Appendices. Index. (0 471 -85644-4) 1988

540 pp.

$52.95

BUSINESS EARTH STATIONS FOR TELECOMMUNICATIONS

If you're a telecommunications engineer or manager who must select a fiber-optic system, you need the broad overview of such systems presented in this new book. fiber-optic Systems offers you a review of both the existing market and future trends and then shows you how to choose and implement the system that best fits your needs. Describes components of fiber optic systems * Considers a wide range of applications, particularly in data networks *

CONTENTS:

Fibers and Fiber Cables. Connectors, Couplers, Splicing,and "WDM." Electro-opticModules. Cable TV, FinancialServices, and Related Systems. Local, Metropolitan, and Wide-Area Networks (LANs, MANS,8 WANs). Long-DistanceTrunk Communications, Instrumentation. Current Trends and Forthcoming Developments. Glossary of Terms and Abbreviations. 141 pp. (0 471-91567-X)

Walter L. Morgan, Communications Center, Maryiand, and Denis Rouffet, Centre Nationai &Etudes de Spatiaies, Paris

a must for anyone who deais with microterminals or who is considering investing in this type of teiecommuniation system

1989 $34.95

"

TELECOMMUNICATIONS TECHNOLOGY

-Siemens Review Designed for telecommunications managers, manufacturers, sellers, and installers of microterminals (or VSATs), this guide provides the basic information and procedures needed to configure an earth station network to a specific organization's communication needs.

R. L. Brewster, University of Aston, U . K ". .clearly written and very well laid out.

234 pp.

(0 471-63556-1)

1988

"...thetype of book that will be bought rather than borrowed."

- C. J. Hughes in I€€€ Proceedings CONTENTS:

Introduction.The Telephone Network. Traffic Theory. Transmission of Telephone Signals. Information Theory and Coding. Data Transmission. Pulse Code Modulation. Data Networks. Optical Fibre Transmission. InternationalCommunication. The Cellular Mobile Radio Telephone.

$38.95

170 pp. (0 470-21454-6) 1986 %24.95 PAPER

WORLDWIDE TELECOMMUNICATIONS GUIDE FOR THE BUSINESS MANAGER

FIBRE-OPTIC SYSTEMS

Walter L. Vignault, IBM Intended for business professionals, managers, and executives in international marketing and telecommunications, this one-of-a-kindguide discusses the many methods of data and telecommunications covering both domestic and international markets. Quick access to information, examples for calculating telephone company charges, alternatives for voice and data communications, message texts, electronic mail, Teletex, and Videotex are provided.

CONTENTS: Introduction. Information Center Environment. Worldwide Environment. International Information Flow. Network Attachment Products. Office Systems. Digital Voice and Data Networks. U S . Network Services. U.S. Network Alternatives. International Traffic. Value-Added Network Services. Telematics. Satellite Communications. Trends Through the Year 2000. Directory of US. Common Carriers. Directory of International PTTs. Bibliography. Glossary of Terms. Index. 417 pp.

(0 471-85828-5)

1987

$54.95

"

- Communications Management

CONTENTS: General Description. What Is a Microterminal? Microterminal Applications. Why Use Microterminals? Who Are the Users? How Is the Service Provided? Overview of the U S . Market. Network Operators. The Economics of Microterminal versus Terrestrial Services. Microterminal Network Operations. Standards. Regulations. Technical Considerations. Space Segment Requirements. Microterminal Insurance Aspects. Glossary. Index.

Pierre Halley, CNET,€SE, France; Translated by J. C. C. Nelson, University

of Leeds, U.K. This introduction to important aspects of fiber-optic systems offers communications engineers:

-

A review of fiber-optic system fundamentals * An emphasis on fiber-optic cables in communications systems *

Performance specifications of available devices and systems

*

A multitude of practical examples with answers

189 pp. (0 471-91410-X) ... . . ..... ...... .... ... 1987 $33.95 PAPER .. . .......... .. . . ..... ._........

.... iiiii$d ..." ...:*1*

Authorized licensed use limited to: Escuela Superior de Ingeneria Mecanica. Downloaded on October 30, 2009 at 20:14 from IEEE Xplore. Restrictions apply.

PACKET SWITCHED NETWORKS Theory and Practice

LAN: LOCAL AREA p m A N ~ s NETWORKS ~6~ yoursystem Developing Your System

Richard Barnett and Sally MaynardSmith, both of Netcomm, Ltd., U.K.

; wy:smess

for Business

Telecommunicationsprofessionals can apply the concepts presented in Packet Switched Networks to learn about their own networks’ operation, limitations, and potential for expansion. This thorough grounding in the theory and practice of packet switched networks:

iI

Donne Florence and CAL Industries

i I

! *

e*

1

@

-What LAN media are all about and how to determine which medium best meets your needs * How to choose between baseband and broadband systems -The four critical variables that define your LAN * How to build bridges from one network to another as your needs expand

CONTENTS: NETWORK BASICS. A Beginner’s Guide to Networking. Packet Switched Network Basics. Packet Switched Network Components. PACKET SWITCHED NETWORK PROTOCOLS. International Standards. Standards Used in the United Kingdom. A Packet Switched Network Protocol. Higher Level Protocols. Security in Packet Switched Networks. PRIVATE PACKET SWITCHED NETWORKS. When to Use Packet Switching. Packet Switched Network Physical Interfaces. Equipmentfor Packet Switched Networks. FUTURE DEVELOPMENTS. Protocol Developments.Future Network Equipment. Value Added Network Services. Appendices. Glossary of Terms. Index.

CONTENTS: WHO NEEDS A LAN? What a Local Area Network Is And Isn’t Analyzing Your Company’s Operations Organizing a User Committee for LAN Selection WHICH KIND OF LAN Media Topology Access Methods Modulation Methods WHOSE LAN IS IT, ANYWAY? Choosing a LAN Implementinga LAN AFTER THE LAN, WHAT? Linking LANs Universal Connectivity Appendix I Local Area Network Manufacturers Appendix II Third-party Vendors of Local Area Networks Sources for Additional Information

(0 470-21392-2) 1989 $39.95

192 PP

PRACTICAL LANS ANALYZED

Today local area networks (LANs) are becoming more and more of a reality in both medium- and large-sized companies I A N shows you, in a strarghtforward manner, what LANs are, what they do, how to analyze short- and long-term needs for a local area network, and how to implement one It explains

. *

1

Introduces networkingconcepts and packet switching fundamentals ’examines modern networking and its protocols

274 pp.

inP

(0 471 -62466-7) 1989 $24 95

PAPER

‘ THEORY OF NETS

&Flows i n Networks 9

Franz-JoachimKauffels,

Bonn University,FRG

6

A

Communicationsengineers who are familiar with LAN theory and fundamentals need equivalent information on practical applications. Telecommunications professionals often need to know the immediate benefits of LANs without trudging through complicated mathematical theory. Practical LANS Analyzed meets the need for such a text. This comprehensivesurvey of LANs presents a practical, easily-accessible approach to the subject. *Discusses LAN products, protocols, and applications .Details LAN properties and functions, and LAN designs such as “ring” and “bus” .Discusses emerging technologies such as DEC and IBM local networks, SNA and DNA network architectures, and manufacturing automation protocol (MAP)

CONTENTS: Data Transfer in Local Networks. TransmissionTechniques and Network Topologies. Systems for Local Networks. Standards for Local Networks. IBM and Digital Equipment LAN Structures. High-speed Local Networks. Glossary. Bibliography. . .. ...... ..... ...... ........ 334 pp. (0 470-21229-2) 1989 $44.95 ........ .........

......... .......... ...........

Wai-Kai Chen, University of Illinois

at Chicago

A \

Electrical, communications,transoortation, and computer and neural neiworks - each type of net has its own set A of design requirements. Desianina and analyzing these networks de&an& sophisticated mathematicalmodels. Theory of Nets Presents a unified. comprehensive, and up-to-date treatment of net theory that bridges the gap between abstract graph theory and the application of network analysis to these network problems. Theory of Nets shows telecommunicationsengineers how to develop sophisticated mathematical models for analyzing a wide variety of problems in communications.



~~

\-

/ /

*Stresses basic concepts, algorithms, and implementation procedures .Theory is supplemented by numerous practical examples

CONTENTS: Graphs and Networks. The Shortest Directed-Path Problem. Maximum Flows in Networks. Minimum Trees and Communication Nets. Feasibility Theorems and Their Applications. Applications of Flow Theorems to Subgraph Problems. Summary and Suggested Readings. References. Index. 429 pp.

(0 471 -85148-5) 1989 $59.95

~~~

Authorized licensed use limited to: Escuela Superior de Ingeneria Mecanica. Downloaded on October 30, 2009 at 20:14 from IEEE Xplore. Restrictions apply.

... .... ...... ...... ....... .......

......... .......... .......... ........... ............ ........... ............. ............. ............ ............ ............. ............

A

.... ...

Bestselling Networking Books THE INTEGRATED SERVICES DIGITAL NETWORK (ISDN)

COMMUNICATION SYSTEMS AND COMPUTER NETWORKS

From Concept

R L Brewster, Aston University U K

to Application

The last two decades have seen the evolution of telecommunications networks from nearly exclusive telephony use to a fully integrated services digital network. Communication Systems and Computer Networks serves the growing number of professionals who use networks for digital communicationspurposes. Brewster provides a sound review of techniques past and present, as well as those proposed for the future of data communications and computer networking.

John Ronayne Increasing penetration of digital technology into telecommunicationsnetworks makes knowledge of ISDNs a must for telecommunicationsengineers, managers, and students. This book introduces the developing technology and emerging dilemmas of ISDNs. It also provides background for appreciating the opportunities and problems of ISDN:

Examines problems of data transmission over existing and proposed future networks, and considers the emergence of optical fibers as a transmission medium Details the computer-oriented aspects of interface with the telecommunications infrastructure * Describes network strategies for local and wide area operation Introduces ISDNs, OSI, and other new products

231 pp. (0 470-21025-7) 1988 $36.95 PAPER

-

SYSTEMS NEMlORK ARCHITECTURE

CONTENTS: Background and Basics. Modems and the Date1 Services. Baseband Modems and Line Codes. Error-Detectingand CorrectingCodes. Data Networks. Packet Switching and X25. LANs - Rings. LANs - Busses. Kilostreamand Megastream. The Future- ISDN. Index.

A Tutorial

Anton Meijer A valuable aid to understanding the structure and inner workings of Systems Network Architecture (SNA) at a conceptual level, this book offers a history and up-to-date description of IBM's SNA and related architectures.

144 pp. (0 470-21489-9) 1989 $39.95

DATA COMMUNICAT10NS NETWORKING DEVICES Second Edition

Introduces the layered structure of SNA, treating each layer in-depth Discusses the future direction of SNA

Gilbert Held, 4-Degree Consulting

223 pp. (0 470-21015-X) 1988 $36.95

Data communications networking devices are the building blocks upon which networks are constructed. In this new edition, bestselling author Gilbert Held presents up-to-date descriptions of the characteristics, operation, and application of over 25 distinct data communications products offering comprehensivecoverage of the numerous devices employed in the design, modification, or optimization of data communications networks.

OS1 EXPLAINED End-to-End Computer Communication Standards John Henshall and Sandy Shaw, both of

Universityof Edinburgh, U.K.

Uses over 300 illustrations and schematic diagrams to explain the operation and application of networking devices Includes a comprehensive 100 page chapter on fundamental concepts, plus expanded coverage of power measurements,jacking arrangements, TLI? and more *Appendices cover the mathematics and computer programs required for sizing network operations

.by far the most detailed book on the subject...

CONTENTS: Fundamental Concepts. Data Transmission Equipment. Data Concentration Equipment. Redundancy and Reliability Aids. Automatic Assistance Devices Specialized Devices. IntegratingComponents.Appendices. Index.

"...aninvaluable reference source...recommended reading.

Here is a thorough introduction to the architecture of ISO/OSI and a detailed explanation of the design and functionality of the upper four layers of the model. OS/ Explained presents methods by which different types of computer systems perform high-level application functions. It also offers worked examples, supporting tables of data, and definitions of key terms to enhance learning.

*

-

494 pp.

(0 471-91869-5) 1989 $49.95

PAPER

From reviews of the first edition: 'I..

"

- Computer Communications

217 pp.

"

- lndustrial Management &

(0-21100-8)

1988

$36.95

Data Systems

NOWlN PAPERBACK!

LANS EXPLAINED

A Guide to Local Area Networks

DATA AND COMPUTER COMMUNICATIONS Terms , Def initions , a nd Abbreviations

William Scott Currie, Edinburgh

University, U.K.

Introduces concepts behind major local area networks, bridging the gap between technically oriented texts and the laymen's guides supplied by equipment and installation manufacturers. Covers terminology and the present state of LANs. Details recent development of LANs, new products, media, specific types of LANs, software, and more.

Gilbert Held, 4-Degree Consulting The rapid evolution and expansion of the telecommunicationsindustry has introduced a wealth of terms, definitions, and abbreviations to the field. Now, bestselling author Gilbert Held has compiled a dictionary of this cutting edge vocabulary. Developedwith the help of nearly 20 top telecommunicationsfirms - like AT&T and IBM - Data and Computer Communications is a comprehensive reference that reflects the convergence of the computer and telecommunications industries, covering terms pertinent to both systems and the telecommunicationsindustry in general.

...

209 pp. (0-21427-9) 1988 $34.95 PAPER

..... .....

...... ....... ........

Contains over 7,000 entries Includes more than 125 illustrations 288 pp. (0 471-92066-5) 1989 $39.95

........ ......... .......... ........... ...........

*

............ ........... i j i;;iii$g * .......

,

jj

...........D.

Authorized licensed use limited to: Escuela Superior de Ingeneria Mecanica. Downloaded on October 30, 2009 at 20:14 from IEEE Xplore. Restrictions apply.

..... .. .....

...... ........ ......... ........ .......

JOHN WILEY & SONS, Inc.

II

Dept. 0-1506 PO.Box 6792 Somerset, N.J. 08875-9976

SECURITY FOR COMPUTER NETWORKS

D. W. Davies and W. L. Price This practical guide shows programmers, software developers, and engineers how to use cryptography to protect data in teleprocessing systems - not only keeping data secret, but also authenticating it, preventing alteration, and proving its origin. The authors’ approach is practical -principles are illustrated with examples. They describe ciphers, the Data Encryption Standard, ways to use the ciphers, cipher key management schemes, public key ciphers, and how to apply data security measures to electronic funds transfer and teleprocessing.

D. Roy Choudhary, Delhi College of

Engineering, India This comprehensive introduction to networks and systems provides a balanced presentation of theory and applications. The work begins by covering the basics of the subject, and then moves on to more advanced topics, including state variable techniques, applications of network topology, closed look feed-back systems, frequency response plots, and computer application to problem solving of circuits and system. 924 pp.

(0 470-20867-8) 1988

$43.95

1

1

MATHEMATICS FOR COMMUNICATIONS ENGINEERING

Edited by Sead Muftic, University

H.B.Wood

This comprehensive introduction to the structure of security mechanisms is based on the updated first section of a CEC COST11 Ter project. The book describes protection of entities, secure communications, database protection, process control, and cryptography in formal sequences of welldefined steps, rendering the mechanisms suitable for formal analysis and for practical implementation. It also lists security protocols, provides a survey of services and secure network applications, and discusses implementation, usage and other practical activities, and integration into large, secure and reliable network architectures. Also describes current research into new mechanisms. 195 pp.

Reviewing the development of mathematical techniques used by communicationsengineers, this work shows how to apply these techniques to the solution of engineering problems. Wood describes vector analysis methods, with specific reference to Maxwell’s equations and the transmission of TEM waves in dielectric and lossy media, and discusses development of parameters of commonly-used probability distributions and their applications to electronics. Solutions to problems using the ‘7 notation,” the Laplace transform, and matrix methods are illustrated, and BASIC computer programs and numerous exercises with worked solutions are included. 437 pp.

(0-21245-4) 1988 $64.95

(0 470-21387-6) 1989 $47.95

X.25 EXPLAINED

Protocols for Packet Switching Networks Second Edition

(0 471-92137-8) 1989 $49.95

NETWORKS AND SYSTEMS

~

SECURITY MECHANISMS FOR COMPUTER NETWORKS of Sarajevo

An Introduction to Data Security in Teleprocessing and Electronic Funds Transfer Second Edition

450 pp.

1

PAID JOHNWILEY 1 &SONS, INC.

COMPUTER COMMUNICATIONS SYSTEMS

R. J. Deasington, l6M Laboratories Ltd., U.K. “...shillfully presents much ofthe detailed structure. l would wholeheartedly recommend this text.” - Computing Reviews

Henri Nussbaumer

Emphasizingthe International Standards Organization’s (ISO) seven-layer model for Open System Interconnection (OSIO), the second edition of X.25 Explained reflects the May 1986 OS1standards. The Second Edition offers new and expanded information including a thoroughly revised chapter on network layers that now includes extensive information on CClTT X.25. It presents fourth-level protocols previously defined to ISO’s main work (used for connecting terminals to host computer systems).

Volume 1: Data Circuits, Error Detection, Data Links

Designed for those who have a basic background in data processing and wish to increase their depth of knowledge in the area of teleprocessing, Computer Communications Systems, Volume 7 presents the principles of teleprocessing, the techniques for designing and modelling networks, and communication protocols. It provides a solid review of telecommunicationsprinciples and equipment from the user’s point of view. 360 pp.

(0 471 -92379-6) 1989 $54.95

131 pp.

(0 470-20731-0) 1986

Authorized licensed use limited to: Escuela Superior de Ingeneria Mecanica. Downloaded on October 30, 2009 at 20:14 from IEEE Xplore. Restrictions apply.

$31.95

i

I

techniques to design minimal-size back-propagation classifiers [68] [69] and to develop analysis techniques to interpret the solutions found by back-propagation classifiers [3] [5] [45] suggest approaches to these issues. Shorter training times and these other characteristics can, however, be obtained using other classifiers that can also be implemented using fine-grain parallelism.

Fig. 5. Two deterministic nonlinearities formed using a multi-layer perceptron and back-propagation training. (Panels plot network input/output functions versus the input x over the range -2 to 2; panel titles include "Gaussian Pulse," and curves are shown for N = 500 and N = 2,000.)

Decision Tree Classifiers

Decision tree classifiers are hyperplane classifiers that have been developed extensively over the past 10 years [22] [23] [70]. They require little computation for classification and little memory. They can be implemented using fine-grain parallelism; form decision regions by performing simple, easily understood operations on input features; can use continuous-valued inputs or discrete symbolic inputs; and their size can be easily adjusted to match their complexity to the amount of training data provided. However, they require complex but efficient training procedures that are not biologically motivated and that require simultaneous access to all training examples. These classifiers have been successful in many pattern classification and artificial intelligence applications [12] [22] [23] [70].

In the simplest decision tree classifier, each node computes one inequality on only one input feature. Decision regions are thus formed from lines that are parallel to the input feature dimensions. Figure 6 demonstrates how such a binary tree classifier can form the shaded square decision region (shown on the right of the figure). This problem has two inputs labeled x1 and x2 and two classes labeled A and B.

Fig. 6. A binary tree classifier. (Left: a binary tree whose non-terminal nodes test inequalities on x1 and x2; right: the resulting shaded square decision region in the (x1, x2) plane.)

Each non-terminal node in the tree (open circles containing the node number) computes the inequality shown at the right of the node. These four nodes divide the input space into half-plane regions whose boundaries are drawn using dashed lines labeled with node numbers shown on the right. Arrows on the lines indicate the half-plane corresponding to a "yes" response at the node. The decision process begins at the bottom (node 1) and proceeds up the tree until a terminal node is reached (a filled circle). The output class is the label above the terminal node. This process is equivalent to the computation performed by a sparse neural net of the same size built from hard-limiting nodes with binary outputs. As can be seen, the desired decision region is formed from intersections of half-planes. More complex decision regions, with boundaries that are not parallel to the input feature dimensions, can be approximated by binary trees with more nodes or can be formed using more general trees that allow linear inequalities in two or more input variables at each node [22] [29].

Training procedures to build binary trees do not minimize a global cost function directly but gradually build a tree, minimizing a local cost function at each stage of training. This local cost function typically reflects the success of the split of the training data into classes by a node. Binary trees are designed to train rapidly. Training data is sorted or ordered separately along each input dimension and a cost function is computed for all possible splits of the training data. This proceeds rapidly when efficient sorting algorithms are used because there are only as many splits in one input dimension as there are data items that reach a node and because local cost functions are simple to calculate. Trees based on thousands of training examples can typically be trained in minutes on modern computer workstations. A recent comparison between a binary tree and a back-propagation classifier found small differences in error rates but greatly reduced training times with the binary tree classifier [71].
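As an illustration of this decision process, the tree of Figure 6 can be written directly as nested threshold tests. This is a minimal sketch, not taken from the article; the thresholds of 1 and 2 and the class labels are assumptions read off the shaded square in the figure.

# Sketch of the binary tree of Fig. 6: four nodes, each testing one
# inequality on one input feature (thresholds and labels are assumed).
def binary_tree_classify(x1, x2):
    if x1 <= 1.0:                          # node 1: is x1 > 1?
        return "B"
    if x2 <= 1.0:                          # node 2: is x2 > 1?
        return "B"
    if x1 >= 2.0:                          # node 3: is x1 < 2?
        return "B"
    return "A" if x2 < 2.0 else "B"        # node 4: is x2 < 2?

print(binary_tree_classify(1.5, 1.5))      # inside the assumed square -> "A"
print(binary_tree_classify(0.5, 1.5))      # outside the square        -> "B"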

Matching Classifier Complexity to Training Data

One focus of research on binary decision trees has been to use modern techniques that match classifier complexity (e.g., the number of connection weights, nodes, and input features) to the amount of training data available. This is a key issue in applying any classifier to real-world problems. It is necessary to provide good generalization and prevent over-fitting of the training data, and also to limit memory and computation requirements. This classifier design problem is illustrated in Figure 7.


Figure 7 plots the error rates of simple, intermediate, and complex classifiers as a function of the amount of training data. The simple classifier performs best with a small amount of training data, which is sufficient to estimate its few parameters. Performance for the simple classifier is poor when a large amount of training data is available and the few parameters in this classifier cannot model the complexity of the training data. The complex classifier performs best with large amounts of training data, when its many parameters can be estimated accurately. Performance is poor for the complex classifier when only a small amount of training data is available and parameters cannot be estimated accurately. Performance of the intermediate classifier is between that of the simple and complex classifiers. Best performance is obtained only when the complexity of the classifier is matched to the amount of training data. This implies that the size of a classifier or internal smoothing parameters should vary dynamically as more and more training data is provided. An introduction to these dimensionality issues is provided in [1] [22] [72] [73].

A technique called cross-validation training is frequently used to determine the appropriate complexity of binary tree and other classifiers [22] [74]. Cross-validation is a leave-out scheme, similar in some ways to jackknife procedures [75]. It is used to obtain estimates of generalization error rates from limited training data. Training data is split into multiple subsets of equal size (often 10); each subset is left out one at a time; a classifier is trained using the remainder; and this classifier is then tested on the left-out subset. The cross-validation error rate is the average error rate estimated from all training data splits. Cross-validation error rates obtained with different-size classifiers or with different smoothing parameters can be used to determine the minimal size of a network or internal smoothing parameters. The appropriate-size network is the smallest network whose cross-validation error rate is not statistically different from that of the best-performing larger classifier. The smoothing parameter selected is that which provides the lowest cross-validation error rate. Cross-validation is computation-intensive because so many classifiers must be designed, but it is reasonable for binary trees and many other classifiers.

A second approach to matching the complexity of a model to training data is to use a global cost function to trim back large, previously trained nets or to terminate the growth of a net. Alternative cost functions, including one derived from Rissanen's minimum descriptive length principle, and example applications of this approach are described in [10] [22] [23]
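The following sketch shows how such a cross-validation procedure might be organized. It is my own illustration, not a procedure from the article: train_fn and error_fn stand in for whatever classifier-specific training and evaluation routines are being studied, and the fixed tolerance replaces a proper statistical test.

import random

def cross_validation_error(data, train_fn, error_fn, n_folds=10):
    # data: list of labeled examples; train_fn(train_set) returns a classifier;
    # error_fn(classifier, test_set) returns its error rate on test_set.
    data = list(data)
    random.shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = []
    for i in range(n_folds):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(error_fn(train_fn(train), held_out))
    return sum(errors) / n_folds

def select_complexity(data, sizes, make_train_fn, error_fn, tolerance=0.01):
    # Pick the smallest classifier size whose cross-validation error is
    # within a small tolerance of the best error observed.
    scores = {s: cross_validation_error(data, make_train_fn(s), error_fn)
              for s in sizes}
    best = min(scores.values())
    return min(s for s in sizes if scores[s] <= best + tolerance)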


[76] [77]. Global cost functions contain two terms. The first term represents the goodness-of-fit of the classifier to the training data and the second term represents the "size" of the classifier. The first term decreases as the classifier fits the training data better and the second term increases as the model increases in size. The total cost function thus has a minimum where the penalty incurred by enlarging the classifier can no longer be justified by the improved fit to the training data that this provides. Although there is a long history of this approach in the field of neural-net pattern classification [10] [78], it has only recently begun to be applied to back-propagation classifiers [68] [69].
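A schematic form of such a two-term cost is sketched below. This is my own illustration; the penalty weight lam is an assumed free parameter, not a value from the cost functions cited above.

def total_cost(training_error, n_parameters, lam=0.01):
    # Goodness-of-fit term plus a size penalty; the sum has a minimum where
    # further growth no longer pays for itself in improved fit.
    return training_error + lam * n_parameters

def worth_growing(err_small, size_small, err_large, size_large, lam=0.01):
    # Grow (or keep) the larger classifier only if its total cost is lower.
    return total_cost(err_large, size_large, lam) < total_cost(err_small, size_small, lam)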

GMDH Networks and High-Order Nets

Back-propagation and decision-tree classifiers include processing elements that form weighted linear sums of inputs and then pass these sums through a nonlinearity. High-order networks and GMDH networks operate on high-order products and powers of input variables as well as on linear terms. These classifiers have a long history [1] [24-26] [79] [80]. Figure 8 contains a small high-order net with two inputs (x1 and x2), which computes an output (y) that contains a nonlinear cross-product term of these inputs (w3 x1 x2). Single-layer high-order perceptrons can be trained to form polynomial discriminant functions using the LMS algorithm [1] [9] when the added high-order terms are treated as additional inputs. This can eliminate the need for multiple layers and provide rapid training when the correct polynomial form of the discriminant function is known a priori. It can also, however, lead to an excessive number of parameters and poor generalization if too many high-order terms are provided and an excessive number of parameters must be estimated.

GMDH networks, which are also called adaptive learning networks [10] [80] [81], represent a well-developed solution to the problem of matching the complexity of a multi-layer high-order net to the amount of training data provided. These networks have been successfully applied to many classification and modeling problems over the past 25 years [10] [82]. Their development was motivated by the work of Ivakhnenko on GMDH algorithms [26] and by early interest in neural network models. Simple adaptive learning networks can be built from polynomial subnets of the form shown in Figure 8.
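As a sketch of the single-layer case (my own example, not from the article; the +1/-1 target coding, learning rate, and epoch count are assumptions), the cross-product term is simply appended to the input vector before applying the LMS update:

def expand(x1, x2):
    # Treat the high-order term x1*x2 as an additional input, so the net
    # computes y = w0 + w1*x1 + w2*x2 + w3*x1*x2, as in Figure 8.
    return [1.0, x1, x2, x1 * x2]

def lms_train(samples, epochs=50, eta=0.05):
    # samples: list of ((x1, x2), target) pairs with targets in {+1, -1}.
    w = [0.0] * 4
    for _ in range(epochs):
        for (x1, x2), target in samples:
            z = expand(x1, x2)
            y = sum(wi * zi for wi, zi in zip(w, z))
            w = [wi + eta * (target - y) * zi for wi, zi in zip(w, z)]
    return w

def classify(w, x1, x2):
    y = sum(wi * zi for wi, zi in zip(w, expand(x1, x2)))
    return "A" if y >= 0.0 else "B"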

Fig. 7. Expected error rate of simple, intermediate, and complex classifiers, plotted against the amount of training data.


Fig. 8. A high-order subnet with two inputs, x1 and x2, computing y = w0 + w1 x1 + w2 x2 + w3 x1 x2.


Training involves building a hierarchical net one layer at a time using approaches that are conceptually similar to those used with binary trees. Weights in a first layer are trained to minimize the squared error of first-layer outputs, and outputs that don't contribute substantially are pruned off. Additional layers are added until the extra complexity is no longer justified by the reduction in error rate provided on training data or on held-out test data. These networks provide many of the capabilities of back-propagation classifiers but, like decision trees, use complex but efficient training procedures that require simultaneous access to all training examples. A recent comparison was made between a modified GMDH network that uses simulated annealing to add nodes and a network trained using back-propagation techniques. The GMDH network provided lower mean square error and required fewer internal connections than the back-propagation network for the chaotic time-series nonlinear mapping problem studied [92].
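A rough sketch of this layer-by-layer construction follows. It is my own illustration under simplifying assumptions: each subnet is the quadratic cross-product form of Figure 8 fit by least squares, only the best few subnet outputs are kept per layer, and the stopping test on held-out data is omitted.

import numpy as np
from itertools import combinations

def fit_subnet(a, b, t):
    # Fit y = w0 + w1*a + w2*b + w3*a*b to targets t by least squares;
    # return the training mean square error and the subnet outputs.
    Z = np.column_stack([np.ones_like(a), a, b, a * b])
    w, *_ = np.linalg.lstsq(Z, t, rcond=None)
    return np.mean((t - Z @ w) ** 2), Z @ w

def grow_layer(X, t, keep=4):
    # X holds the outputs of the previous layer (one column per node).
    # Train one subnet per pair of columns and prune all but the best "keep".
    candidates = sorted(
        (fit_subnet(X[:, i], X[:, j], t)
         for i, j in combinations(range(X.shape[1]), 2)),
        key=lambda c: c[0])
    return np.column_stack([out for _, out in candidates[:keep]])

# Layers would be added by repeated calls to grow_layer until the error on
# held-out data stops improving.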

K-Nearest Neighbor Classifiers

K-nearest neighbor exemplar classifiers were impractical in the past because memory and computation requirements made them prohibitively expensive. The recent availability of large amounts of inexpensive memory, commercial parallel computers, and fast nearest neighbor search algorithms for serial computers [35] has led to renewed interest in these classifiers. Recent work has focused on modified versions of the original k-nearest neighbor algorithm. These approaches interpolate between outputs of nearest neighbors stored during training to form complex nonlinear mapping functions [43] [82] and use specialized distance metrics for classification instead of the more common Euclidean distance [41] [43]. Distance metrics must compute the distance between stored exemplar and input patterns in a manner that provides good generalization. Much of the work required to develop a modified k-nearest neighbor classifier focuses on designing effective distance metrics. Heuristic techniques and cross-validation approaches are often used for this purpose.

Two modified k-nearest neighbor classifiers, developed independently [41] [43], were recently compared to back-propagation classifiers using the NETtalk phonemic transcription task [45]. In this task, the input to a classifier is a seven-letter string formed by passing a seven-letter window over words in a dictionary. The output of a classifier specifies the phoneme corresponding to the letter at the center of the string. One modified nearest neighbor classifier ran on a parallel computer, called the Connection Machine, and used a carefully designed distance metric [41], while the other ran on a sequential computer and used a simpler distance metric that was improved using cross-validation techniques [43]. Both decided on an output phoneme using the identity of the nearest stored training examples. Both also provided slightly better generalization (pronunciation of words not in the training set) than that reported in the past for a back-propagation classifier [45]. These results illustrate the trade-offs in memory and computation requirements that can be provided by nearest neighbor approaches. These classifiers are practical when large amounts of memory and sufficient computation power are available and rapid single-trial learning is required.
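In its basic form (a minimal sketch; the Euclidean metric below merely stands in for the specialized metrics discussed above), training is simply the storage of labeled exemplars and classification is a vote among the k closest of them:

import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(x, exemplars, k=3, distance=euclidean):
    # exemplars: list of (feature_vector, class_label) pairs stored during
    # "training" -- rapid, single-trial learning by memorization.
    nearest = sorted(exemplars, key=lambda e: distance(x, e[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]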

Feature-Map Classifier

The feature-map classifier [36] is an exemplar classifier that uses combined unsupervised/supervised training and requires less memory than a k-nearest neighbor classifier. A block diagram of this classifier is shown in Figure 9. Intermediate exemplar nodes in this net provide outputs equal to the Euclidean distance between the input and node centroids, which are specified by weights on connections to exemplar nodes. Outputs of exemplar nodes are compared to select the node whose centroid is closest to the input in Euclidean distance. Connections to output nodes associate exemplar nodes with class labels. The classification decision corresponds to the label of the exemplar node that is closest to the input. This network can perform the computation required by a k-nearest neighbor classifier if the k exemplar nodes that are nearest the input are selected and if one exemplar node is provided for each training example. This, however, frequently requires excessive memory. Memory requirements can be reduced by first training weights in the lower subnet without supervision to form a vector quantizer using the k-means algorithm [1] or Kohonen's feature-map algorithm [83]. Weights to the output nodes can then be trained with supervision using a modified version of the LMS algorithm, as described in [36].

Fig. 9. A feature-map classifier. (Exemplar nodes compute the Euclidean norm between the input and stored centroids; output nodes attach class labels.)

A feature-map classifier trained using this procedure was compared to a back-propagation classifier on vowel classification and artificial problems [36]. It provided similar error rates. However, it reduced the number of supervised training trials required from roughly 50,000 with back-propagation to 50, at the expense of roughly twice as many connection weights. These results illustrate the ability of unsupervised training to reduce the supervised training time and memory requirements of exemplar-based classifiers.
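A compact sketch of the two training stages follows. This is my own illustration: the majority-vote labeling of exemplar nodes is a simplification standing in for the modified LMS output training of [36], and numpy is used for brevity.

import numpy as np

def kmeans(X, n_nodes, iters=20, seed=0):
    # Unsupervised stage: place exemplar-node centroids with k-means.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_nodes, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(n_nodes):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign

def label_nodes(assign, labels, n_nodes):
    # Supervised stage (simplified): attach to each exemplar node the
    # majority class among the training vectors it quantizes.
    # labels: array of non-negative integer class indices.
    return np.array([np.bincount(labels[assign == j]).argmax()
                     if np.any(assign == j) else 0
                     for j in range(n_nodes)])

def classify(x, centroids, node_labels):
    # Pick the node whose centroid is nearest the input (Euclidean norm).
    return node_labels[np.argmin(((centroids - x) ** 2).sum(-1))]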

Learning Vector Quantizer

The LVQ [14] is similar in structure to the feature-map classifier shown in Figure 9. This classifier requires a final stage of supervised training that comes after the training used with the feature-map classifier. Final training adjusts weights to exemplar nodes to shift node centroids slightly, when an error occurs, in a direction that attempts to improve performance. This alters decision region boundaries slightly but maintains the same number of exemplar nodes. LVQ classifiers have been compared to back-propagation classifiers using both artificial problems and speech problems [14] [84]. These classifiers typically have error rates that are similar to those of back-propagation classifiers but often train faster, while requiring more memory and computation time during classification. They also typically provide reduced error rates compared to feature-map classifiers, especially when the number of exemplar nodes is small.
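The centroid adjustment can be sketched as follows. This is my own illustration of a common LVQ-style update rule, not the exact procedure of [14]; the learning rate alpha is an assumed constant and would normally decay over training.

import numpy as np

def lvq_step(x, label, centroids, node_labels, alpha=0.05):
    # Find the winning exemplar node and nudge its centroid toward the
    # input if its class label is correct, away from it otherwise.
    # x: array-like input vector; centroids: float array of node centroids.
    winner = np.argmin(((centroids - x) ** 2).sum(-1))
    sign = 1.0 if node_labels[winner] == label else -1.0
    centroids[winner] += sign * alpha * (x - centroids[winner])
    return winner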

Hypersphere Classifiers

One type of exemplar classifier creates decision regions from nodes that form variable-size hyperspheres in the input space [85]. These nodes have "high" outputs only if the input is within a given radius of the node's centroid. Otherwise, node outputs are "low."


The classification decision is the label attached to the majority of nodes with "high" outputs. A "no decision" response occurs if no nodes have "high" outputs. Recent work has focused on a version of this classifier called a Restricted Coulomb Energy, or RCE, classifier [38] [39] [86] [87]. This classifier is similar to a k-nearest neighbor classifier in that it adapts rapidly over time, but it typically requires many fewer exemplar nodes than a nearest neighbor classifier. During adaptation, more nodes are recruited to generate more complex decision regions, and the size of hyperspheres formed by existing nodes is modified. Theoretical analyses and experiments with RCE classifiers demonstrate that they can form complex decision regions rapidly [38] [39] [86] [87]. Experiments also demonstrated that they can be trained to solve Boolean mapping problems such as the "symmetry" and "multiplexer" problems more than an order of magnitude faster than back-propagation classifiers [86]. RCE networks are currently being applied to risk analysis to determine the acceptability of mortgage loan applications and also to many other real-world problems [88].
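One adaptation step of such a classifier might look as follows. This is a schematic sketch under assumed rules, not the exact RCE procedure of [38] [39]; the initial and minimum radii are free parameters introduced for illustration.

import math

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def rce_adapt(x, label, nodes, r_init=1.0, r_min=0.01):
    # nodes: list of dicts {"c": centroid, "r": radius, "label": class}.
    active = [n for n in nodes if dist(x, n["c"]) < n["r"]]
    for n in active:                       # shrink wrong-class hyperspheres
        if n["label"] != label:
            n["r"] = max(r_min, dist(x, n["c"]))
    if not any(n["label"] == label for n in active):
        nodes.append({"c": list(x), "r": r_init, "label": label})  # recruit a node

def rce_classify(x, nodes):
    votes = [n["label"] for n in nodes if dist(x, n["c"]) < n["r"]]
    return max(set(votes), key=votes.count) if votes else None     # None = "no decision"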

Radial Basis Function Classifiers

Researchers have recently begun to re-examine kernel classifiers that use a technique called the method of potential functions [1]. These classifiers are now called radial basis function classifiers. A block diagram of a radial basis function classifier is shown in Figure 10. The structure and operation of this classifier is similar to that of the feature-map exemplar classifier, except that no maximum-picking operation is performed. Instead, all nodes with strong outputs that are near the current input contribute to the classification decision. Kernel nodes compute radially symmetric functions that are maximum when the input is near the centroid of a node. Weights from kernel nodes to output nodes are determined using the LMS algorithm or matrix-based approaches that require simultaneous access to all training examples [34]. Kernel functions typically have Gaussian shapes, and the smoothing parameter or width of the kernel may be fixed or may vary across nodes. Smoothing parameters and the number of nodes are typically adjusted empirically to provide good performance on test data. Centroids are typically determined by randomly selecting training examples from a large set of training data.

Radial basis function classifiers have been compared to back-propagation classifiers on vowel and digit speech classification tasks [32-34]. Their error rates are typically similar to those provided by back-propagation classifiers. However, they greatly reduce training times at the expense of requiring a few times as many connection weights. For example, one study [34] reported a training time of four minutes for a radial basis function classifier versus three hours for a back-propagation classifier.

Fig. 10. A radial basis function classifier.

Similar error rates were obtained when the number of weights in the radial basis function classifier was roughly five times as large as the number of weights in the back-propagation classifier.
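A compact sketch of this scheme follows. It is my own illustration: the Gaussian width sigma, the number of centers, the random centroid selection, and the one-step least-squares output fit are assumptions consistent with, but not identical to, the procedures described in [32-34].

import numpy as np

def rbf_outputs(X, centers, sigma):
    # Gaussian kernel node outputs for every input row of X.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rbf_train(X, labels, n_classes, n_centers=20, sigma=1.0, seed=0):
    # Centroids are training examples picked at random; output weights are
    # fit in one step by least squares (a matrix-based alternative to LMS).
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_centers, replace=False)].astype(float)
    Phi = rbf_outputs(X, centers, sigma)
    targets = np.eye(n_classes)[labels]          # one-of-n target coding
    W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return centers, W

def rbf_classify(x, centers, W, sigma=1.0):
    return int(np.argmax(rbf_outputs(np.atleast_2d(x), centers, sigma) @ W))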

Summary

New non-parametric adaptive pattern classifiers inspired by biological neural networks are being developed to provide high performance and real-time response with real-world data. Although much recent emphasis has been placed on back-propagation classifiers, many other classifiers have been developed. These include decision tree classifiers, Boltzmann machines, RCE classifiers, feature-map classifiers, LVQ classifiers, high-order networks, radial basis function classifiers, and modified nearest neighbor approaches. In addition to providing reduced error rates over older approaches, these classifiers provide trade-offs in memory and computation requirements, training complexity, and ease of implementation and adaptation. For example, studies have demonstrated that modified k-nearest neighbor classifiers, which train rapidly but require large amounts of memory and computation, sometimes perform as well as back-propagation classifiers, which are more complex to train but require less memory [41] [43]. Binary decision tree classifiers, which have small memory and computation requirements, often perform as well as more complex back-propagation classifiers but are more complex to adapt [71]. Feature-map and LVQ classifiers can make effective use of unsupervised training data and reduce the amount of supervised training data required [14] [36]. These two classifiers and radial basis function classifiers require intermediate amounts of memory and training time [14] [34] [36]. Finally, classifiers such as the RCE classifier require less memory than k-nearest neighbor classifiers but adapt classifier structure over time using simple adaptation rules that recruit new nodes to match the complexity of the classifier to that of the training data [38] [39] [87].

Further research similar to the recent work described in [89-92] is required to understand the trade-offs in practical characteristics across the many different classifiers. This will help develop guidelines for selecting one classifier from among the many available. Further work is also required to build VLSI hardware for implementing classifiers and to develop efficient techniques to match the complexity or size of a classifier to the amount of training data available. More work is also needed to develop classifiers that can make more effective use of unlabeled training data and to develop efficient techniques to discover effective input features.

Acknowledgments

I would like to thank members of the Royal Signals and Radar Establishment, including John Bridle and Roger Moore, for discussions regarding the material in this paper. I would also like to thank Bill Huang, Yuchun Lee, and Kenny Ng for interesting discussions, and Carolyn for her encouragement and patience.

References

[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, NY: John Wiley and Sons, 1973.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, NY: Academic Press, 1972.
[3] R. P. Gorman and T. J. Sejnowski, "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets," Neural Networks, vol. 1, pp. 75-89, 1988.
[4] B. Kammerer and W. Kupper, "Experiments for Isolated-Word Recognition with Single and Multi-Layer Perceptrons," Neural Networks, vol. 1, supp. 1, p. 302, Abstracts of 1st Annual INNS Meeting, Boston, 1988.


K. J. Lang and G. E. Hinton, "The Developmentof the Time-Delay Neural Network Architecture for Speech Recognition," Tech. Rep. CMUCS-88- 152, Carnegie-Mellon University, 1988. R. P. Lippmann, "Review of Neural Networks for Speech Recognition," Neural Comp., vol. 1 (1). pp. 1-38, 1989. S. M. Peelingand R. K. Moore, "Experiments in Isolated Digit Recognition Using the Multi-Layer Perceptron," Tech. Rep. 4073, Royal Speech and Radar Estab., Malvern, Worcester, Great Britain, Dec. 1987. A. Waibel. H. Sawai, and K. Shikano, "Modularity and Scaling in Large Phonemic Neural Nets," Tech. Rep. TR-1-0034, ATR Interpreting Telephony Research Laboratories, Japan, Aug. 1988. R. P. Lippmann, "An Introduction t o Computing with Neural Nets," I€€€ ASSP Mag., vol. 4 (2). pp. 4-22, Apr. 1987. A. R. 8arron and R. L. Barron, "Statistical Learning Networks: A Unifying View." 1 9 8 8 Symp. on the Interface: Statistics and Comp. Sci.. Reston, VA, Apr. 21-23, 1988. R. Gnanadesikanand J. R. Kettenring, "Discriminant Analysis and Clustering," Stat. S o . , vol. 4 (1). pp. 34-69, 1989. A. K. Jain. "Advances in Statistical Pattern Recognition," Panern Recognition TheoryandApplicarions, P.A. Devijver and J. Kittler. eds., pp. 1-19, Series F: Computer and Systems Sciences, vol. 30, NY: Springer Verlag, 1989. G. E. Hinton, "Connectionist Learning Procedures," Tech. Rep. CMUCS-87- 115, Carnegie Mellon University, Computer Science Department, June 1987. T. Kohonen, G. Barna, and R. Chrisley, "Statistical Pattern Recognition with Neural Networks. Bench Marking Studies," IEEE Annual lnt"1. Conf. on Neural Networks, San Diego. July 1988. J. Pearson and R. Lippmann, ed.. "Adaptive Knowledge Processing." DARPA NeuralNetworkStudy, Ch. 11, pp. 55-1 78, AFCEA International Press, Fairfax, VA, 1988. E . B. 8aum and D. Haussler. 'What Size Net Gives Valid Generalization?" Advances in Neurallnfo. Processing Syst. 1 , D.S. Touretzky, ed., San Mateo, CA: Morgan Kauffman, 1989. S. Judd, "Learning in Neural Networks," COLT '88Proc. ofthe 1 9 8 8 Workshop on Computational Learning Theory. D. Haussler and L Pitt, eds.. pp. 2-8, San Mateo, CA: Morgan Kaufmann, 1988. L. G Valiant, "Functionality in Neural Nets," COLT'88Proc. ofthe 1988 Workshop on Computational Learning Theory, D. Haussler and L. Pitt, eds.. pp. 28-39, San Mateo, CA. Morgan Kaufmann, 1988. G. Nagy, "Candide's Practical Principles of Experimental Pattern Recognition," IEEE Trans. on Panern Anal. and Machine Intel., vol. PAMI5(2), pp. 199-200, 1983 D E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning Internal Representations by Error Propagation," Parallel Distributed Processing, D.E. Rumelhart and J.L. McClelland, eds., Ch. 8, Cambridge, MA: MIT Press, 1986. D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A Learning Algorithm for Boltzmann Machines," Cognitive Science, vol. 9, pp. 147-160, 1985. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, "Classification and Regression Trees," Wadsworth International Group, Belmont, CA, 1984. J R Quinlan, "Simplifying DecisionTrees," lnt'l. J. Man-MachineStudies. vol. 27. pp. 221-234, 1987. C. L. Giles and T. Maxwell, "Learning, Invariance, and Generalization in High-Order Networks," Applied Optics, vol. 26, pp. 4,972-4,978, Dec. 1987 N. J Nilsson, "Learning Machines." McGraw Hill, N.Y., 1965. S. Farlow. "Self-Organizing Methods in Modeling," Marcel Dekker, 1984. D. J Hand, ed., "Kernel Discriminant Analysis," John Wiley and Sons Ltd., New York, NY, 1982. J. Moody and C. 
Darken, "Fast Learning in Networks of Locally-Tuned Processing Units," Tech. Rep. YALEU/DCS/RR-654, Yale Computer Science Department, New Haven, CT. Oct. 1988. A. Rojer and E. Schwartz. "A Multiple-Map Model for Pattern Classification." Neural Computation, vol. 1(1), pp. 104-1 15, 1989. F. H. Glanz and W . T. Miller. "Shape Recognition using a CMAC-Based Learning System," SPIE Proc. Intelligent Robots and Computer Vision, Cambridge, MA, 1987 J. Moody, "Fast Learning in Multi-Resolution Hierarchies," Tech. Rep. YALEU/DCS/RR-681, Yale Computer Science Department, New Haven, CT, Feb. 1989. D. S. Broomheadand D. Lowe, "Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks," Tech. Rep. RSRE, Memo. no. 4,148, Royal Speech and Radar Establishment, Malvern, Worcester, Great Britain, Mar. 1988. M. Niranjan and F. Fallside, "Neural Networks and Radial Basis Functions in Classifying Static Speech Patterns," Tech. Rep. CUED/FINFENG/TR 22, Cambridge University EngineeringDepartment, 1988. S. Renals and R. Rohwer, "Phoneme Classification Experiments using Radial Basis Functions," Proc. lnt'l. Joint Conf. on Neural Networks, pp. 1.461-1.467, IEEE. Washington, DC, June 1989. S . M. Omohundro, "Efficient Algorithms with Neural Network Behavior," Complex Systems, vol. 1, pp. 273-347. 1987.

W . M. Huang and R. P. Lippmann, "Neural Net and Traditional Classifiers," Neural Info. Processing Syst.. D. Anderson, ed., pp. 387-396, NY: American Institute of Physics, 1988. T. Kohonen, "An Introduction t o Neural Computing," NeuralNetworks, VOI. 1. pp. 3-16, 1988. D L. Reilly, L. N. Cooper, and C. Elbaum, "A Neural Model for Category Learning," Bio. Cybernetics, vol. 45, pp. 35-41, 1982. D. L. Reilly, C. Scofield, C. Elbaum, and L. N. Cooper, "Learning System Architectures Composed of Multiple Learning Modules," IEEE lstlnt'l. Conf on Neural Networks, June 1987. G. A. Carpenter and S. Grossberg, "ART 2: Self-Organization of Stable Category Recognition Codes for Analog Input Patterns," Applied Optics, vol. 26, pp. 4,919-4,930, 1987. C Stanfill and D. Waltz, "Toward Memory-Based Reasoning," Commun. oftheACM, vol. 29(12), pp. 1,213-1,228, Dec. 1986. D. Waltz, C. Stanfill, S. Smith, and R. Thau, "Very Large Database Applications of the Connection Machine System," Roc. Nat'l. Computer Conf., pp. 159-165, 1987. D. Wolpert, "Alternative Generalizers t o Neural Nets," Neural Networks, vol. 1, supp. 1, p. 474, Abstracts of 1st Annual INNS Meeting, Boston, 1988. R. Crick, "The Recent Excitement about Neural Nets," Nature, vol. 337, pp. 129-132, 1989. T. J Sejnowski and C. M. Rosenberg. "Parallel Networks that Learn to Pronounce English Text," Complex Systems, vol. 1, pp. 145-168, 1987. N. Qian and T. J. Sejnowski. "Predicting the Secondary Structure of Globular Proteins using Neural Network Models," J. MolecularBiology, vol. 202, pp. 865-884, 1988. G. Tesauro and T. J. Sejnowski, "A Neural Network that Learns to Play Backgammon," Neural Information Processing Systems, D. Anderson, ed.. pp. 794-803, American Institute of Physics, New York, 1988. D. Zipser and R. A. Andersen, " A Back Propagation ProgrammedNetwork that Simulates Response Properties of a Subset of Posterior Parietal Neurons," Nature, vol. 331, pp. 679-684, 1988. R. P. Lippmann and P. E. Beckman, "Adaptive NeuraI.Net Processing for Signal Detection in Non-Gaussian Noise," Advances in Neurallnfo. Processing Syst. 1 . D.S. Touretzky, ed., San Mateo, CA: Morgan Kauffman, 1989. S. Tamura and A. Waibel, "Noise Reduction using Connectionist Models," Proc. IEEE lnt% Conf Acoustics. Speech and Signal Processing,pp. 553-556, Apr. 1988. S. J Hanson and D. J. Burr, "Knowledge Representation in Connectionist Networks," Tech. Rep., Bell Communications Research, Morristown, New Jersey, Feb. 1987. I. D. Longstaff and J. F. Cross, "A Pattern Recognition Approach to Understanding the Multi-Layer Perceptron," Memo. 3,936, Royal Signals and Radar Establishment, July 1986. A Lapedes and R. Farber, "How Neural Nets Work," Neurallnformation Processing Systems, D. Anderson, ed., pp. 442-456, American Institute of Physics, New York, 1988. A. Wieland and R. Leighton, "Geometric Analysis of Neural Network Capabilities," IEEE lstlnt'l. Conf on NeuralNetworks. pp. 111-385. June 1987. G. Cybenko, "Approximation by Superpositions of a Sigmoidal Function." Mathematics of Control, Signals. and Systems, 2(4), 1989, "To Appear. M. R. Brady, R. Raghavan, and J. Slawny, "Gradient Descent Fails to Separate," In IEEE Annual International Conference on Neural Networks, pages1649-1656, IEEE, San Diego, July 1988. E. D Sontag and H. J Sussmann, "Back-Propagation Separates when Perceptrons Do," Tech. Rep. SYCON-88-12, Rutgers Center for Systems and Control, December 1988. B. S. Wittner and J. S. Denker, "Strategies for Teaching Layered Networks Classification Tasks," In D. 
Anderson, editor, Neural Information Processing Systems, pp, 850-859, American Institute of Physics. 1988. D. J. Burr, "Experiments on Neural Net Recognition of Spoken and Written Text," IEEE Transactions on Acoustics, Speech and Signal Processing, 36:pp. 1162-1 168, 1988. H. Sawai et al., "Parallelism, Hierarchy, Scaling in Time-Delay Neural Networks for Spotting Japanese Phonemes/CV-Syllables," In Proceedings International Joint Conference on Neural Networks, pp. 11.81-11.88, IEEE, Washington DC, June 1989. S. 8ecker and Y. Le Cun, "Improving the Convergence of BackPropagation Learning with Second Order Methods," Proc. ofthe 1 9 8 8 ConnectionistModels Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, eds.. pp. 29-37, San Mateo, CA: Morgan Kauffman, 1989. L. W . Chan and F. Fallside, " A nAdaptive Training Algorithm for BackPropagation Networks," Comp. SpeechandLanguage, vol. 2, pp. 205218, 1987. S. E. Fahlman. 'Faster-Learning Variations on Back-Propagation: An Empirical Study," Proc of the 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, eds., pp. 38-51, San Mateo, CA: Morgan Kauffman, 1989. S. J. Hanson and D. J. Burr, "Minkowski-r Back-Propagation: Learning


in Connectionist Models with Non-Euclidean Error Signals," Neural Information Processing Systems, D. Anderson, ed., pp. 348-357, NY: American Institute of Physics, 1988. R. A. Jacobs, 'Increased Ratesof Convergence Through Learning Rate Adaptation,' Neural Networks, vol. 1, pp. 295-307. 1988. H. C. Leung and V. W. Zue, "Some Phonetic RecognitionExperiments using Artificial NeuralNets," Proc. l€€€lnt'l. Conf on Acoustics, Speech and Signal Processing, Volume 1: Speech Processing, Apr. 1988. R. L. Watrous, "Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization," Tech. Rep. MSCIS-87-51, Linc Lab 72, University of Pennsylvania, June 1986. S. J. Hanson and L. Y. Pratt, "Some Comparisons of Constraints for Minimal Network Construction with Back-Propagation," Advances in Neural Information Processing Systems 1 , D. S . Touretzky, ed., San Mateo, CA: Morgan Kauffman, 1989. M. C. Mozer and P. Smolensky, 'Skeletonization: a Technique for Trimming the Fat from a Network Via Relevance Assessment,' Advances in Neural Information Processing Systems 1 , D. S . Touretzky, ed., Morgan Kauffman, San Mateo, CA, 1989. J. R. Quinlan, 'Induction of Decision Trees," Machine Learning, vol. 1, pp. 81-106, 1986. D. H. Fisher and K. E. McKusick, "An Empirical Comparisonof ID3 and Back-Propagation," Tech. Rep. TR CS-88- 14, Departmentof Computer Science, Vanderbilt University, Nashville, TN, 1988. A. K. Jain and E. Chandrasekaran, 'Dimensionality and Sample Size Considerationin Pattern RecognitionPractice," Handbook ofStatistics: Vol. 2, P. R. Krishnaiahand L. N. Kanal, eds., pp. 835-855, North Holland, 1982. G. V. Trunk, "A Problem of Dimensionality: a Simple Example," I€€€ Trans. on Pattern Analysis and Machine Intel., vol. PAMI-1, no. 3, pp. 306-307, 1979. M. Stone, "Cross-Validation Choice and Assessment of Statistical Predictions,- J. of the Royal Statistical Soc., vol. 6-36, pp. 1 1 1-147, 1974. B. Efron, ed., The Jackknife, the Bootstrap and Other Resampling Plans, Soc. for Industrial and Applied Mathematics, Philadelphia, PA, 1982. J. R. Quinlan and R. Rivest, 'Inferring Decision Trees using the Minimum Descriptive Length Principle," Info. and Computation, to appear, 1989. J. Rissanen, "A Universal Prior for Integers and Estimation by Minimum Descriptive Length,' The Annals of Statistics, vol. 1 1, pp. 416431, 1983. A. R. Barron, "Predicted Squared Error: a Criterion for Automatic Model Selection,' Self-organizing Methods in Modeling, S . J. Farlow, ed.. pp. 87-103, NY: Marcel Dekker, 1984. M. A . Minsky and S. A. Papen, Perceptrons-Expanded Edition. Cambridge, MA: MIT Press, 1988. T. Poggio, "On Optimal Nonlinear Associative Recall,- Bio. Cybernetics, vol. 19, p. 201, 1975. A. R. Earron, "Adaptive Learning Networks: Development and Application in the United States of Algorithms Related to GMDH," SelfOrganizing Methods in Modeling, S . J. Farlow, ed., pp. 25-65, NY: Marcel Dekker. 1984. J. D. Farmer and J. J. Sidorowich, "Exploiting Chaos to Predict the Future and Reduce Noise," Tech. Rep. LA-UR-88-901, Los Alamos National Laboratory, Los Alamos, New Mexico, Mar. 1988.


T. Kohonen. Self-Organization and Associative Memory, Berlin: Springer-Verlag, 1984. E. McDermott and S . Katagiri, "Phoneme Recognitionusing Kohonen's Learning Vector Quantization," ATR Workshop on NeuralNetworksand Parallel Distributed Processing, Osaka, Japan, July 1988. B. G. Batchelor, 'Classification and Data Analysis in Vector Space," Pattern Recognition,B. G. Batchelor, ed., ch. 4, pp. 67-1 16, London: Plenum Press, 1978. S.E. Hampson and D. J. Volper, "Disjunctive Models of Boolean Category Learning,' Bo. Cybernetics, vol. 56, pp. 121-137, 1987. C. L. Scofield, D. L. Reilly, C. Elbaum, and L. N. Cooper, "Pattern Class Degeneracy in an Unrestricted Storage Density Memory,' Neural Info. Processingsyst.,pp. 674-682, American Institute of Physics, 1988. E. Collins, S . Ghosh, and C. Scofield, "Appendix G: Risk Analysis," DARPA Neural Network Study, pp. 429-443, Fairfax, VA: AFCEA International Press, 1988. I. Guyon, I. Poujand, L. Personnaz, and G. Dreyfus, "Comparing Different Neural Network Architectures for Classifying Handwritten Digits," Proc.. lnt'l. Joint Conf on Neural Networks, pp. 11.127-11.132, Washington DC: IEEE, June 1989. Y. Lee, "Classifiers: Adaptive Modules in Pattern Recognition Systems," Master's thesis, Massachusetts Institute of Technology, Department of ElectricalEngineering and Computer Science, Cambridge, MA, May 1989. Y. Lee and R. P. Lippmann, "Practical Characteristics of Neural Network and Conventional Pattern Classifiers on Artificial and Speech Problems," Proc.. Neural Info. Processing Syst.-Natural and Synthetic Conf.. Denver, CO, Nov. 1989. M. F. Tenorio and W. T. Lee, "Self Organizing Neural Networks for the Identification Problem," Advances in Neurallnformation Processing Systems I , D. S.Touretzky, ed., pp. 57-64, San Mateo, CA: Morgan Kauffman, 1989.

Biography

Richard P. Lippmann (M '85) was born in Mineola, NY in 1948. He received the B.S. degree in electrical engineering from the Polytechnic Institute of Brooklyn in 1970, and the S.M. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology in 1973 and 1978, respectively. His S.M. thesis dealt with the psychoacoustics of intensity perception and his Ph.D. thesis with signal processing for the hearing impaired. From 1978 to 1981, he was the Director of the Communication Engineering Laboratory at the Boys Town Institute for Communication Disorders in Children in Omaha, NE. He worked on speech recognition, speech training aids for deaf children, sound alerting aids for the deaf, and signal processing for hearing aids. In 1981, he joined the MIT Lincoln Laboratory in Lexington, MA. He has worked on speech recognition, speech I/O systems, and routing and system control of circuit-switched networks. His current interests include speech recognition, neural net algorithms, statistics, and human physiology, memory, and learning. Dr. Lippmann is Program Chairman of the IEEE Neural Information Processing Systems Conference in Denver, CO, November 1989.
