Deep Neural Networks

Introduction to Deep Neural Networks
Dr. Asifullah Khan, DCIS, PIEAS, Pakistan

Outline
• Journey from shallow to deep learning
• Shortcomings of BPNN
• Details of Deep NN
  • RBM
  • DBN
  • Autoencoders
  • CNN

Single Layer Perceptron for Pattern Classification
 Architecture

 Thus the neuron fires if $\text{net} = b + \sum_i x_i w_i = \mathbf{w}^T\mathbf{x} + b \ge 0$

 The discrimination hyperplane is $\mathbf{w}^T\mathbf{x} + b = 0$

 Thus −b can be thought of as a threshold which, when exceeded by the weighted sum $\mathbf{w}^T\mathbf{x}$, causes the neuron to fire
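A minimal sketch of this firing rule, assuming NumPy; the logical-AND example and the hand-picked weights and bias are illustrative choices, not from the slides.

```python
import numpy as np

def perceptron_fires(x, w, b):
    """Single-layer perceptron decision rule: fire iff w^T x + b >= 0,
    i.e. iff the weighted sum w^T x exceeds the threshold -b."""
    net = np.dot(w, x) + b
    return net >= 0

# Example: a 2-input neuron wired by hand to behave like logical AND
w = np.array([1.0, 1.0])
b = -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron_fires(np.array(x, dtype=float), w, b))
```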


Back Propagation
 Advantages
• Multi-layer networks trained by the back propagation algorithm can learn an arbitrary mapping between input and output
 What is wrong with back propagation?
• Requires labeled training data, yet almost all available data is unlabeled
• Learning time does not scale well: very slow with multiple hidden layers
• Vanishing gradients (see the sketch below)
• Overfitting
 In the 1990s, one important reason back propagation failed to give satisfactory results on complicated problems was that processing hardware was far less advanced than it is today.
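A small NumPy sketch (not from the slides) of the vanishing-gradient point: back-propagating through a stack of sigmoid layers multiplies the gradient by the sigmoid derivative at every layer, so its magnitude typically decays quickly with depth. The depth, width, and weight scale below are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_layers, width = 10, 50
weights = [rng.normal(0.0, 0.1, size=(width, width)) for _ in range(n_layers)]

# Forward pass through the stack, keeping every layer's activation
x = rng.normal(size=width)
activations = [x]
for W in weights:
    activations.append(sigmoid(W @ activations[-1]))

# Back-propagate a unit gradient and watch its mean magnitude shrink layer by layer
grad = np.ones(width)
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1.0 - a))  # chain rule: sigmoid'(z) = a * (1 - a)
    print(f"mean |grad| = {np.abs(grad).mean():.2e}")
```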


But We Still Need Deep Neural Networks


Before 2006: failure to train deep architectures


Slide from: https://deeplearningworkshopnips2010.files.wordpress.com/2010/09/nips10-workshop-tutorial-final.pdf



[2] Larochelle, Hugo, et al. (2009)


Contrastive Divergence

 Positive Phase
• An input sample v is presented to the visible layer
• v is fed forward to the hidden layer; the resulting hidden-layer activations are h
 Negative Phase
• h is propagated back to the visible layer, giving the reconstruction v′
• v′ is propagated forward to the hidden layer again, giving the activations h′
 Weight Update
• w(t+1) = w(t) + α(v hᵀ − v′ h′ᵀ) (a code sketch follows below)

Slide from: https://deeplearningworkshopnips2010.files.wordpress.com/2010/09/nips10-workshop-tutorial-final.pdf
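A minimal sketch of this CD-1 update, assuming a binary RBM with sigmoid units, a single training sample, and NumPy; the function name, the bias terms, and the sampling of the hidden states are illustrative choices not spelled out on the slide.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v, alpha=0.1, rng=None):
    """One contrastive-divergence (CD-1) step for a binary RBM.

    W has shape (n_visible, n_hidden); v has shape (n_visible,)."""
    rng = rng or np.random.default_rng(0)

    # Positive phase: clamp v on the visible layer, compute hidden activations h
    h_prob = sigmoid(W.T @ v + b_hid)                       # P(h = 1 | v)
    h = (rng.random(h_prob.shape) < h_prob).astype(float)   # sampled hidden states

    # Negative phase: propagate h back to get the reconstruction v', then h'
    v_prime = sigmoid(W @ h + b_vis)
    h_prime = sigmoid(W.T @ v_prime + b_hid)

    # Weight update: w(t+1) = w(t) + alpha * (v h^T - v' h'^T)
    W += alpha * (np.outer(v, h_prob) - np.outer(v_prime, h_prime))
    b_vis += alpha * (v - v_prime)
    b_hid += alpha * (h_prob - h_prime)
    return W, b_vis, b_hid
```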


[3]

[4]


[2]


[5, 6]


Deep Convolutional Neural Networks


Several Recent and Interesting Examples
• ResNet (by Microsoft); research.microsoft.com/enus/um/people/kahe/
• GoogLeNet (by Google); http://deeplearning.net/2014/09/19/googlesentry-to-imagenet-2014-challenge/



Summary: Deep Neural Networks (cont.)
 DNNs have both generative and discriminative abilities
 They offer good generalization, helped by unsupervised pre-training
 DNNs are capable of dynamic feature extraction
 They exploit hardware resources for parallel processing (GPUs, etc.): matrix multiplication, exploiting the absence of data dependencies

Thank You

References
[1] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.
[2] Larochelle, Hugo, et al. "Exploring strategies for training deep neural networks." Journal of Machine Learning Research 10 (2009): 1-40.
[3] Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18.7 (2006): 1527-1554.
[4] Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
[5] Bengio, Yoshua. "Learning deep architectures for AI." Foundations and Trends in Machine Learning 2.1 (2009): 1-127.
[6] Erhan, Dumitru, et al. "The difficulty of training deep architectures and the effect of unsupervised pre-training." International Conference on Artificial Intelligence and Statistics. 2009.
