Probabilistic Neural Networks

Krishna Ganesula
CPSC 636: Neural Networks, Spring 2010
Instructor: Dr. Ricardo Gutierrez-Osuna

Outline
- Context and Problem
- Bayesian Strategy
- Probabilistic Neural Network
- Comparison and Analysis
- Recent Work and Conclusions

Context
- Author: Dr. Donald F. Specht, Lockheed Missiles & Space Company, Inc.
- Published in 1989-90, Neural Networks, Volume 3.
- Radial Basis Function (RBF) networks and Bidirectional Associative Memory (BAM) networks were proposed around the same time.

Example – Cancer Diagnosis
Prior info: pulse score a, blood pressure score b, N samples, 1 ≤ i ≤ N.
- Past scores and true diagnoses: Xi = (xi(a), xi(b)) with diagnosis di.
- Task: for a new score X = (x(a), x(b)), predict the diagnosis d.
[Figure: scatter plot of diagnoses over pulse scores (a) vs. blood pressure scores (b)]
k-NN on this example: 1-NN classifies X as [–], while 9-NN classifies it as [+] (a toy illustration follows).
Probability Density Function (PDF): f({a, b}), e.g. the PDF estimated from the +ve samples.
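A toy illustration of the k-NN point above, with contrived data that is not from the slides: the single nearest sample to X is a [–] point, but 8 of the 9 nearest are [+], so 1-NN and 9-NN disagree, just as on the scatter plot.

```python
# Contrived toy data: one [-] sample sits closest to X, the rest of
# the nearby samples are [+], so 1-NN and 9-NN give opposite answers.
import numpy as np

pos = np.array([[74.0, 126.0], [76.0, 130.0], [73.0, 131.0], [77.0, 127.0],
                [74.0, 131.0], [76.0, 125.0], [78.0, 129.0], [73.0, 127.0]])
neg = np.vstack([[75.0, 128.5], np.tile([90.0, 150.0], (8, 1))])
X = np.array([75.0, 128.0])

pts = np.vstack([pos, neg])
lbl = np.array(["+"] * len(pos) + ["-"] * len(neg))
order = np.argsort(np.sum((pts - X) ** 2, axis=1))  # nearest first
for k in (1, 9):
    votes = lbl[order[:k]]
    majority = "+" if (votes == "+").sum() > (votes == "-").sum() else "-"
    print(f"{k}-NN -> [{majority}]")
```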

Parametric PDF
- Assume the PDF is Gaussian, as above.
- Find f(X) for each class; if f[+](X) > f[–](X), then X ∈ [+] (a minimal sketch follows).
- But the underlying distribution isn't always normal.
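A minimal sketch of this parametric approach, fitting one Gaussian per class and comparing densities. The helper gaussian_pdf and all toy scores are illustrative assumptions, not from the slides.

```python
# Minimal sketch: parametric (Gaussian) density classification.
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at X."""
    d = len(mean)
    diff = X - mean
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

# Toy (pulse, blood pressure) scores for each class
pos = np.array([[70.0, 120.0], [72.0, 125.0], [68.0, 118.0]])
neg = np.array([[85.0, 140.0], [88.0, 145.0], [90.0, 150.0]])

X = np.array([75.0, 128.0])  # new patient
f_pos = gaussian_pdf(X, pos.mean(axis=0), np.cov(pos.T))
f_neg = gaussian_pdf(X, neg.mean(axis=0), np.cov(neg.T))
print("[+]" if f_pos > f_neg else "[-]")
```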

Kernel Density Estimation (Parzen Window)

f(X) = (1 / (m h)) * sum_i K((X - Xi) / h)

where K is the kernel (weight) function and h is the smoothing parameter. (The probability distribution of X is assumed continuous.)

Choosing a Gaussian kernel (weight) gives

f[+](X) = 1 / ((2π)^(p/2) σ^p m) * sum_i exp( -(X - Xi)ᵀ(X - Xi) / (2σ²) )

where Xi is the i-th training sample from category [+], σ is the smoothing parameter, m is the number of training samples, and p is the input dimension. A short sketch follows.
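A short sketch of the Gaussian Parzen estimate above; the toy data and the choice sigma = 5.0 are illustrative assumptions.

```python
# Minimal sketch of the Gaussian Parzen-window estimate above.
import numpy as np

def parzen_gaussian(X, samples, sigma):
    """Estimate f(X) from training samples using a Gaussian kernel."""
    m, p = samples.shape
    sq_dists = np.sum((samples - X) ** 2, axis=1)   # (X - Xi)'(X - Xi)
    norm = (2 * np.pi) ** (p / 2) * sigma ** p * m  # normalizing constant
    return np.sum(np.exp(-sq_dists / (2 * sigma ** 2))) / norm

pos = np.array([[70.0, 120.0], [72.0, 125.0], [68.0, 118.0]])
print(parzen_gaussian(np.array([75.0, 128.0]), pos, sigma=5.0))
```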

Is f(X) enough?
- Prior probabilities (P[+], P[–]) account for sample inconsistency: the class proportions in the training set need not match the true proportions.
- Loss figures (L[+], L[–]) account for mistakes of different severity.

Decision rule: X ∈ [+] ⇔ P[+] L[+] f[+](X) > P[–] L[–] f[–](X)

where f[+](X) is the PDF value estimated from the [+] samples, L[+] is the loss for deciding X ∈ [–] when in fact X ∈ [+], and P[+] is the prior probability of a positive diagnosis. (A sketch follows.)
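A hedged continuation of the Parzen sketch above (it reuses parzen_gaussian and pos); the priors and losses below are made-up illustrative numbers.

```python
# Weighted Bayes decision rule with the Parzen estimates from above.
P_pos, P_neg = 0.1, 0.9     # assumed prior probabilities of [+] and [-]
L_pos, L_neg = 10.0, 1.0    # assumed losses: missing a [+] costs more

neg = np.array([[85.0, 140.0], [88.0, 145.0], [90.0, 150.0]])
X = np.array([75.0, 128.0])
f_pos = parzen_gaussian(X, pos, sigma=5.0)
f_neg = parzen_gaussian(X, neg, sigma=5.0)
print("[+]" if P_pos * L_pos * f_pos > P_neg * L_neg * f_neg else "[-]")
```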

Probabilistic Neural Network Architecture

Input Layer
- Input vector X = (Xa, Xb), where a is the pulse score and b is the blood pressure score.
- Weights of edges = input vector.
[Figure: input units "Pulse" and "BP" feeding the pattern layer]

Pattern Layer
- Each training sample has a corresponding pattern unit.
- The kernel function K computes the Gaussian-weighted distance between the unit's stored sample and the input.

Summation Layer
- Each category has a corresponding summation unit, connected only to the pattern units of its own category.
- It sums the kernel outputs of the connected pattern units, giving f(X).

Output Layer
- In this case we have a binary output for the two categories.
- The prior probabilities and loss figures enter the decision as a weight on the [–] summation output:

C = - (P[–] L[–] / P[+] L[+]) × (N[+] / N[–])

- For training in proportion to the priors, C is the ratio of the losses; with equal losses we get an inverter, C = -1. (A sketch of the full forward pass follows.)

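A hedged, self-contained sketch of the full two-class forward pass described above; all data, priors, and losses are toy values, not from the slides.

```python
import numpy as np

def pnn_classify(X, pos, neg, sigma, P=(0.5, 0.5), L=(1.0, 1.0)):
    # Pattern layer: one Gaussian kernel per stored training sample
    # (the normalizing constant cancels between the two classes).
    k_pos = np.exp(-np.sum((pos - X) ** 2, axis=1) / (2 * sigma ** 2))
    k_neg = np.exp(-np.sum((neg - X) ** 2, axis=1) / (2 * sigma ** 2))
    # Summation layer: per-class sums of the pattern-unit outputs.
    s_pos, s_neg = k_pos.sum(), k_neg.sum()
    # Output layer: C folds in priors P = (P[+], P[-]), losses
    # L = (L[+], L[-]), and the sample-count ratio N[+]/N[-].
    C = -(P[1] * L[1]) / (P[0] * L[0]) * (len(pos) / len(neg))
    return "[+]" if s_pos + C * s_neg > 0 else "[-]"

pos = np.array([[70.0, 120.0], [72.0, 125.0], [68.0, 118.0]])
neg = np.array([[85.0, 140.0], [88.0, 145.0], [90.0, 150.0]])
print(pnn_classify(np.array([75.0, 128.0]), pos, neg, sigma=5.0))
```

With the default equal priors, equal losses, and equal sample counts, C reduces to -1, matching the inverter case above.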

Comparing with MLPs
Advantages:
- Virtually no time consumed to train.
- Relatively insensitive to outliers.
- Can generate probability scores.
- Can approach the Bayes optimal classifier.
Disadvantages:
- Testing time is long for a new sample.
- Needs a lot of memory to store the training data.

Smoothing Effect (σ)
The nature of the PDF varies as we change σ:
- Small σ creates distinct modes.
- Larger σ allows interpolation between points.
- Very large σ approximates the PDF with a Gaussian.

So how do we choose σ?
- Neither limiting case, σ → 0 or σ → ∞, is optimal.
- The number of neighbors to average over should depend on the density of the training samples.
- A good value is easy to find in practice and has a low effect on the error rate (a sketch of one simple search follows this list).
- The graph shows that any σ value between 4 and 10 gives close-to-optimal results.
[Figure: error rate vs. σ]
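One simple, hedged way to search for σ, using leave-one-out error on synthetic data; the slides do not prescribe a particular method, so this is an illustrative choice.

```python
# Leave-one-out search for sigma on synthetic two-class data.
import numpy as np

def loo_error(samples, labels, sigma):
    """Leave-one-out misclassification rate of a two-class PNN."""
    errors = 0
    for i in range(len(samples)):
        keep = np.arange(len(samples)) != i
        Xs, ys = samples[keep], labels[keep]
        k = np.exp(-np.sum((Xs - samples[i]) ** 2, axis=1) / (2 * sigma ** 2))
        pred = 1 if k[ys == 1].sum() > k[ys == 0].sum() else 0
        errors += pred != labels[i]
    return errors / len(samples)

rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
labels = np.array([1] * 20 + [0] * 20)
for sigma in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(sigma, loo_error(samples, labels, sigma))
```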

Further Modifications
Alternate estimators:
- Alternative univariate kernels can be used.
- This changes the activation function, but returns the same optimal result.
Associative memory:
- Maximize the PDF to estimate an unknown input variable (see the sketch below).
- For more than one unknown variable, use a generalized global PDF f(X').
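A hedged sketch of the associative-memory idea: recover an unknown blood pressure score from a known pulse score by maximizing the global PDF. The grid search and all numbers are illustrative assumptions, not the slides' method.

```python
# Associative recall: fill in the unknown coordinate of X by
# maximizing the global Parzen PDF over a grid of candidates.
import numpy as np

data = np.array([[70.0, 120.0], [72.0, 125.0], [85.0, 140.0], [88.0, 145.0]])
sigma = 3.0

def global_pdf(X):
    return np.sum(np.exp(-np.sum((data - X) ** 2, axis=1) / (2 * sigma ** 2)))

pulse = 71.0                           # known input variable
grid = np.linspace(110.0, 160.0, 501)  # candidate BP values
best_bp = grid[np.argmax([global_pdf(np.array([pulse, bp])) for bp in grid])]
print(best_bp)  # expect a value near 120-125 for pulse ~ 71
```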

Recent Applications of PNNs
- "Ship Identification Using Probabilistic Neural Networks (PNN)" by L. F. Araghi, Proceedings of IMECS.
- "Application of Probabilistic Neural Network Model in Evaluation of Water Quality" by Changjun Zhu and Zhenchun Hao, Environmental Science and Information.
- "A probabilistic neural network for earthquake magnitude prediction" by H. Adeli, Neural Networks, Vol. 22.
- "Detection of Resistivity for Antibiotics by Probabilistic Neural Networks" by Fatma Budak and Elif Derya, Journal of Medical Systems.

Conclusions
- PNNs are the neural-network way of implementing nonparametric PDF estimation for classification.
- PNNs are fast to train and approach the Bayes optimal classifier as the training set grows.
- It is vital to find an accurate smoothing parameter σ.
- PNNs today are being widely researched to find more efficient classification solutions.

Questions?