Deep Neural Networks

Deep Neural Networks and Their Applications

Dr. Asifullah Khan, Professor, DCIS, PIEAS, Islamabad, Pakistan

• This talk is related to Artificial Neural Networks (ANN), specifically Deep Neural Networks.

• An ANN is a collection of simple, trainable mathematical units that can collectively learn complex functions.
• ANN is a field of Artificial Intelligence/Machine Learning.
• ANNs rely on basic mathematical concepts for information processing, training/learning, optimization, etc.

Outline

• ANN: Artificial Neurons and the Perceptron
• Multi-Layer Perceptron & BPNN
• Why Deep Learning
• Why GPUs
• Applications of Deep NN
• DGX-1™ Supercomputer
• Types of Deep Neural Networks
  • Deep Convolutional Neural Networks
  • Deep CNN architectures: case studies
  • Transfer Learning

Artificial Neural Network (ANN)
• ANN is an information-processing paradigm inspired by the way biological nervous systems, such as the brain, process information.
• It is composed of a large number of highly interconnected processing elements (neurons) working in harmony to solve specific problems.
• ANNs, like people, learn by example.
• Neural networks have a remarkable ability to derive meaning from complicated or imprecise data.
• ANNs can thus be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques.

Artificial Neural Network (ANN) contd.
• A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze.
• Two of its most important abilities are:
  • Adaptive learning: an ability to learn how to do tasks based on the data given for training.
  • Self-organization: an ANN can create its own organization or representation of the information it receives during learning.


Biological Neuron

Image from: http://hplusmagazine.com/2012/10/17/four-statements-about-the-future


Perceptron (Artificial Neuron and Its Simple Mathematical Model)

Image from: https://battleprogrammer.wordpress.com/2011/03/23/jaringan-syaraf-tiruan-apa-apa-apa/
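To make the diagram concrete, below is a minimal sketch of a perceptron with a step activation, trained with the classic perceptron learning rule (the slide shows only the diagram, so all names and the AND-function example here are illustrative):

```python
import numpy as np

def step(s):
    return 1 if s >= 0 else 0

def train_perceptron(X, t, lr=0.1, epochs=20):
    """X: (n_samples, n_features); t: binary targets (0/1)."""
    w = np.zeros(X.shape[1])   # weights
    b = 0.0                    # bias
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = step(np.dot(w, x) + b)    # weighted sum + activation
            w += lr * (target - y) * x    # updates only on mistakes
            b += lr * (target - y)
    return w, b

# Usage: learn the (linearly separable) AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, t)
print([step(np.dot(w, x) + b) for x in X])   # [0, 0, 0, 1]
```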


Multi-Layer Neural Networks
• Most real-world problems are not linearly separable.
• The perceptron is unable to create a nonlinear separating boundary.
• This limits its applicability to practical problems.
• This inhibited the growth of neural networks until the 1980s and the Generalized Delta Rule,
• developed by Rumelhart & Williams (1986a, 1986b) and McClelland & Rumelhart (1988).

Multi-Layer Neural Networks (BPNN)
• Consist of multiple layers; the hidden layer acts as a feature transformation.
• Activation functions: use of sigmoid functions.
• Nonlinear operation: the ability to solve practical problems.
• Differentiable: makes theoretical assessment easier.
• The derivative can be expressed in terms of the function itself: computational efficiency.

Hidden layer: $z_j = f(z\_in_j)$, where $z\_in_j = \sum_{i=0}^{n} x_i v_{ij}$, with $x_0 = 1$ and $j = 1 \dots p$.

Output layer: $y_k = f(y\_in_k)$, where $y\_in_k = \sum_{j=0}^{p} z_j w_{jk}$, with $z_0 = 1$ and $k = 1 \dots m$.
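As a concrete illustration, here is a minimal NumPy sketch of the feedforward pass defined by these equations (variable names follow the slide's notation; the layer sizes in the usage example are arbitrary):

```python
import numpy as np

def f(s):
    return 1.0 / (1.0 + np.exp(-s))     # binary sigmoid

def forward(x, v, v0, w, w0):
    z_in = v0 + x @ v    # z_in_j = sum_i x_i v_ij (bias plays the x0 = 1 role)
    z = f(z_in)          # hidden layer: z_j = f(z_in_j)
    y_in = w0 + z @ w    # y_in_k = sum_j z_j w_jk (bias plays the z0 = 1 role)
    y = f(y_in)          # output layer: y_k = f(y_in_k)
    return z, y

# Usage: n = 3 inputs, p = 4 hidden units, m = 2 outputs.
rng = np.random.default_rng(0)
v, v0 = rng.normal(size=(3, 4)), np.zeros(4)
w, w0 = rng.normal(size=(4, 2)), np.zeros(2)
_, y = forward(rng.normal(size=3), v, v0, w, w0)
print(y.shape)   # (2,)
```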

Activation functions used in BPNN

[Figure: the binary sigmoid f1(x) with its derivative f'1(x), and the bipolar sigmoid f2(x) with its derivative f'2(x), plotted over x in [-8, 8].]
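Written out, these are the standard BPNN forms; the derivative of each sigmoid is expressible in terms of the function itself, which is what makes backpropagation computationally cheap:

$$f_1(x) = \frac{1}{1+e^{-x}}, \qquad f_1'(x) = f_1(x)\,\big[1 - f_1(x)\big]$$

$$f_2(x) = \frac{2}{1+e^{-x}} - 1, \qquad f_2'(x) = \frac{1}{2}\,\big[1 + f_2(x)\big]\,\big[1 - f_2(x)\big]$$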

Learning Rule: Gradient Descent-Based Optimization

• A change in $w_{jk}$ affects only $y_k$.
• Use gradient descent minimization of the error.
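Written out for a squared-error cost (a standard derivation consistent with the slide's claim; $\alpha$ is the learning rate and $t_k$ the target):

$$E = \frac{1}{2}\sum_k (t_k - y_k)^2, \qquad \Delta w_{jk} = -\alpha \frac{\partial E}{\partial w_{jk}} = \alpha\,(t_k - y_k)\,f'(y\_in_k)\,z_j$$

Because a change in $w_{jk}$ affects only $y_k$, the chain rule involves just that one output unit.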

Backpropagation training cycle
• Feedforward → Backpropagation of error → Weight update, repeated each cycle (a sketch of one cycle follows below).
• Imagine architectures with 500 neurons per layer and 1,000 layers.
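A compact sketch of one such training cycle for a one-hidden-layer sigmoid network with squared error (XOR is used as the toy task; all names are illustrative):

```python
import numpy as np

def f(s):          # binary sigmoid
    return 1.0 / (1.0 + np.exp(-s))

def train_step(x, t, v, w, alpha=0.5):
    # 1. Feedforward
    z = f(x @ v)                                  # hidden activations
    y = f(z @ w)                                  # output activations
    # 2. Backpropagation of error (f' expressed via f itself)
    delta_out = (t - y) * y * (1 - y)             # output-layer deltas
    delta_hid = (delta_out @ w.T) * z * (1 - z)   # hidden-layer deltas
    # 3. Weight update (gradient descent)
    w += alpha * np.outer(z, delta_out)
    v += alpha * np.outer(x, delta_hid)
    return 0.5 * np.sum((t - y) ** 2)             # squared error

# Usage: learn XOR (bias folded in as a constant input of 1).
rng = np.random.default_rng(1)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
v, w = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))
for epoch in range(5000):
    err = sum(train_step(x, t, v, w) for x, t in zip(X, T))
print(err)   # typically falls near 0; may vary with the random seed
```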

Why is Deep Learning an Emerging Field? Three Driving Factors…


Deep Learning Everywhere


Deep Neural Networks (DNN)
• A DNN may have millions of neurons (each neuron acts as a mathematical mapping).
• A DNN has to perform forward and backward passes.
• Forward pass: weighted sums of inputs and activation functions.
• Backward pass: gradient-descent optimization over the weight space of millions of neurons (parameters).

DNN Approach
• DNNs can model complex mathematical functions.
• Imagine an image classification problem with one million images to be classified into 1,000 classes.
• It is a complex learning problem.


DNN (Matrix Multiplication)

• Forward processing: each layer computes a matrix product of its inputs and weights, followed by element-wise activation functions.
• Gradient descent: the weight updates are likewise computed with matrix operations.
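The slide's graphics are not reproduced here, but written in matrix form, consistent with the earlier per-neuron equations, the two passes are:

$$\mathbf{z} = f(V^{\top}\mathbf{x}), \qquad \mathbf{y} = f(W^{\top}\mathbf{z}) \qquad \text{(forward processing)}$$

$$W \leftarrow W - \alpha\,\nabla_{W}E, \qquad V \leftarrow V - \alpha\,\nabla_{V}E \qquad \text{(gradient descent)}$$

Both passes are dominated by matrix multiplications, which is exactly the workload the next slides show GPUs excel at.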

GPUs are really good at matrix multiplication

Deep Learning and GPUs

Rao M. Ume

CPU vs GPU

• CPU: few, fast cores (1–16); good at sequential processing.
• GPU: many slower cores (thousands); originally for graphics; good at parallel computation.


cuDNN

cuDNN: Efficient Primitives for Deep Learning

Spot the CPU!


Spot the GPU!


Deep Learning used to Learn Complex Mappings


Depth of a Deep NN


Revolution of Depth

ImageNet classification top-5 error (%) by ILSVRC year:

  ILSVRC'10  shallow                  28.2
  ILSVRC'11  shallow                  25.8
  ILSVRC'12  AlexNet    (8 layers)    16.4
  ILSVRC'13             (8 layers)    11.7
  ILSVRC'14  VGG        (19 layers)    7.3
  ILSVRC'14  GoogleNet  (22 layers)    6.7
  ILSVRC'15  ResNet     (152 layers)   3.57

Source: Deep residual learning for image recognition; slide credit: Noorul Wahab (26 Aug. 2016)

Potential Working Areas of Deep Learning in Pakistan
• Autonomous drones
• SUPARCO: satellite trajectory path planning
• Pakistan Meteorological Department (PMD): weather forecasting
• Power prediction
• Stock market analysis (financial mathematics)
• Medical image classification
• Cancer diagnostics


Measuring Photometric Redshifts Using Galaxy Images and Deep Neural Networks

• DNNs are used to estimate the photometric redshift of galaxies from the full galaxy image in each measured band.

Identifying the Higgs Boson with Convolutional Neural Networks

• A convolutional neural network architecture for identifying the Higgs boson subatomic particle from color-flow energy images.

AlphaGo
• The first computer program to beat a human Go professional.
• Mastering the Game of Go with Deep Neural Networks and Tree Search.

NVIDIA® DGX-1™
• The NVIDIA® DGX-1™ is a deep learning supercomputer built to meet the computing demands of artificial intelligence.
• The DGX-1 deep learning system is built on eight NVIDIA Tesla® P100 GPUs.
• It provides the throughput of 250 CPU-based servers, networking, cables, and racks, all in a single box.
• As neural nets become larger and larger, we need not only faster GPUs with larger and faster memory, but also much faster GPU-to-GPU communication.

NVIDIA® DGX-1™ Specification


NVIDIA® DGX-1 Performance Comparison


Types of Deep NN
• Deep Belief Networks (DBN)
• Deep Auto-Encoders (DAE)
• Deep Convolutional Neural Networks (CNN)
• Deep Neural Networks
• Deep Long Short-Term Memory Networks (LSTM)
• etc.

Convolution

The step size with which the mask (filter) slides over the image is known as the stride.
Image from: http://www.slideshare.net/uspace/ujavaorg-deep-learning-with-convolutional-neural-network
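A minimal NumPy sketch of a strided 2-D convolution with no zero padding (technically cross-correlation, the usual deep-learning convention; names are illustrative):

```python
import numpy as np

def conv2d(image, mask, stride=1):
    H, W = image.shape
    k, _ = mask.shape
    out_h = (H - k) // stride + 1     # featureMapSize formula, zero padding = 0
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * mask)   # element-wise multiply-add
    return out

# Usage: 8x8 image, 2x2 mask, stride 2 -> (8-2)//2+1 = 4, so a 4x4 output.
img = np.arange(64, dtype=float).reshape(8, 8)
print(conv2d(img, np.ones((2, 2)), stride=2).shape)   # (4, 4)
```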

One of Our Proposed CNN Architectures

Input: HPF image, 2084x2084x3
Preprocessing: rotation, flipping, and histogram equalization; resized to 80x80x3

Automatic feature extraction:
• C1: convolution 8x8, stride 2, ReLU → 128 feature maps @ 37x37
• C2: convolution 7x7, stride 2, ReLU → 256 feature maps @ 16x16
• S3: max pooling 2x2, stride 2 → 256 pooled maps @ 8x8
• C4: convolution 4x4, stride 2, ReLU → 512 feature maps @ 3x3

Classification:
• F5: 512 fully connected units → output layer

featureMapSize = [(inputSize − filterSize + 2 × zeroPadding) / stride] + 1
pooledMapSize = [(inputSize − filterSize) / stride] + 1
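The formulas can be checked against the layer sizes above with a few lines of Python (a sketch; the parameters are read off the architecture text):

```python
def feature_map_size(inp, filt, pad=0, stride=1):
    return (inp - filt + 2 * pad) // stride + 1

size = 80                                    # preprocessed input: 80x80x3
size = feature_map_size(size, 8, stride=2)   # C1: 8x8 conv, stride 2 -> 37
size = feature_map_size(size, 7, stride=2)   # C2: 7x7 conv, stride 2 -> 16
size = feature_map_size(size, 2, stride=2)   # S3: 2x2 max pool, stride 2 -> 8
size = feature_map_size(size, 4, stride=2)   # C4: 4x4 conv, stride 2 -> 3
print(size)   # 3, matching the 512@3x3 maps before the fully connected layers
```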

Motivation for CNN: Fewer Parameters

Motivation for CNN: Why use hierarchical multi-layered models?
• Biological vision is hierarchically organized.

Slide credit: Dr. Richard E. Turner presentation (2014)

Transfer Learning: Image Classification Example

Task One: a deep neural network (Model One) is trained from scratch; its early layers learn general-purpose features.

Transfer Learning: Image Classification Example (contd.)

Task Two (e.g., classifying motorcycles after training on cars) reuses the features learned for Task One: Model Two keeps the learned feature layers and retrains only the task-specific layers, as sketched below.
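A minimal transfer-learning sketch in PyTorch (the slides do not name a framework, so this choice, and ResNet-18 as the pre-trained "Model One", are assumptions):

```python
import torch.nn as nn
from torchvision import models

# Reuse the convolutional features of a network pre-trained on Task One
# (here: ImageNet) and retrain only a new head for Task Two.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():      # freeze the reused feature layers
    param.requires_grad = False

num_classes = 2                       # e.g., cars vs. motorcycles
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head

# Only model.fc.parameters() would now be passed to the optimizer and
# trained on Task Two's (typically much smaller) dataset.
```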

References
• http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/
• Tutorial on Deep Learning and Applications, Honglak Lee (University of Michigan)
• http://docs.gimp.org/en/plug-in-convmatrix.html
• https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
• http://www.slideshare.net/AhmedMahany/convolution-neural-networks
• http://www.robots.ox.ac.uk/~vgg/research/text/index.html

References contd.
• http://learning.eng.cam.ac.uk/pub/Public/Turner/Teaching/ml-lecture-3-slides.pdf
• LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
• Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision – ECCV 2014. Springer International Publishing, 2014. 818-833.
• Bengio, Yoshua. "Learning deep architectures for AI." Foundations and Trends® in Machine Learning 2.1 (2009): 1-127.
• Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
• He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).