Deep Neural Networks and its Applications
Dr. Asifullah Khan, Professor, DCIS, PIEAS, Islamabad, Pakistan
This talk is related to Artificial Neural Networks (ANN), specifically, Deep Neural Networks
• ANN is a collection of simple, trainable mathematical units that can collectively learn complex functions
• ANN is a field of Artificial Intelligence/Machine Learning
• In ANN, some basic mathematical concepts are utilized for information processing, training/learning, optimization, etc.
Outline
• ANN; Artificial Neurons and Perceptron
• Multi Layer Perceptron & BPNN
• Why Deep Learning
• Why GPUs
• Applications of Deep NN
• DGX-1™ Supercomputer
• Types of Deep Neural Networks
• Deep Convolutional Neural Networks
• Deep CNN architectures: case studies
• Transfer Learning
Artificial Neural Network (ANN)
ANN is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. It is composed of a large number of highly interconnected processing elements (neurons) working in harmony to solve specific problems. ANNs, like people, learn by example. Neural networks have a remarkable ability to derive meaning from complicated or imprecise data. ANN can thus be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques.
Artificial Neural Network (ANN) contd.
A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. Two of its most important abilities are:
• Adaptive learning: an ability to learn how to do tasks based on the data given for training.
• Self-organization: an ANN can create its own organization or representation of the information it receives during learning time.
Biological Neuron
Image from: http://hplusmagazine.com/2012/10/17/four-statements-aboutthe-future
Perceptron (Artificial Neuron and Its Simple Mathematical Model)
Image from: https://battleprogrammer.wordpress.com/2011/03/23/jaringan-syaraf-tiruanapa-apa-apa/
Multi Layer Neural Networks
• Most real-world problems are not linearly separable
• The perceptron and similar single-layer models cannot create a nonlinear separating boundary
• This limits their applicability to practical problems
• This inhibited the growth of neural networks until the Generalized Delta Rule arrived in the 1980s
• Developed by Rumelhart & Williams (1986a, 1986b) and McClelland & Rumelhart (1988)
Multi Layer Neural Networks (BPNN)
• Consists of multiple layers
• Hidden layer acts as a feature transformation
• Activation functions: use of sigmoid functions
  – Nonlinear operation: ability to solve practical problems
  – Differentiable: makes theoretical assessment easier
  – Derivative can be expressed in terms of the functions themselves: computational efficiency
• Hidden layer: $z\_in_j = \sum_{i=0}^{n} x_i v_{ij}$, $z_j = f(z\_in_j)$, with $x_0 = 1$, $j = 1 \ldots p$
• Output layer: $y\_in_k = \sum_{j=0}^{p} z_j w_{jk}$, $y_k = f(y\_in_k)$, with $z_0 = 1$, $k = 1 \ldots m$
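The two layer equations above can be sketched in plain Python. This is a minimal illustration, not the presenter's implementation: the function names, the nested-list weight layout (row 0 holding the bias weights for x0 = z0 = 1) and the choice of the binary sigmoid for f are assumptions made for the example.

```python
import math

def binary_sigmoid(x):
    """Assumed activation f: the logistic (binary sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, v, w, f=binary_sigmoid):
    """Forward pass of a one-hidden-layer network.

    x: list of n inputs
    v: (n + 1) x p hidden weights, row 0 is the bias row (x0 = 1)
    w: (p + 1) x m output weights, row 0 is the bias row (z0 = 1)
    Returns (z, y): hidden activations and output activations.
    """
    xs = [1.0] + list(x)                       # prepend x0 = 1
    p = len(v[0])
    # z_j = f(z_in_j), z_in_j = sum_i x_i * v_ij
    z = [f(sum(xs[i] * v[i][j] for i in range(len(xs)))) for j in range(p)]
    zs = [1.0] + z                             # prepend z0 = 1
    m = len(w[0])
    # y_k = f(y_in_k), y_in_k = sum_j z_j * w_jk
    y = [f(sum(zs[j] * w[j][k] for j in range(len(zs)))) for k in range(m)]
    return z, y

# With all-zero weights every net input is 0, so every activation is f(0) = 0.5.
v_zero = [[0.0, 0.0] for _ in range(3)]        # 2 inputs + bias, 2 hidden units
w_zero = [[0.0] for _ in range(3)]             # 2 hidden units + bias, 1 output
z_out, y_out = forward([1.0, 2.0], v_zero, w_zero)
```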
Activation functions used in BPNN
[Figure: two panels plotting y and dy against x for x from -8 to 8. Panel titles: "Bipolar Sigmoid and its derivative" and "Binary Sigmoid and its derivative". Curves: f1(x), f'1(x), f2(x), f'2(x).]
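As a hedged illustration of the two activation functions, the sketch below uses the common textbook definitions (binary sigmoid with range 0 to 1, bipolar sigmoid with range -1 to 1, both with unit steepness). The slides do not give the formulas, so these definitions and all function names are assumptions. Each derivative is written in terms of the function value itself, which is the computational-efficiency point made earlier.

```python
import math

def binary_sigmoid(x):
    """Logistic function, range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_sigmoid_prime(x):
    """Derivative expressed through the function value: f(x) * (1 - f(x))."""
    fx = binary_sigmoid(x)
    return fx * (1.0 - fx)

def bipolar_sigmoid(x):
    """Rescaled logistic function, range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def bipolar_sigmoid_prime(x):
    """Derivative expressed through the function value: 0.5 * (1 + f(x)) * (1 - f(x))."""
    fx = bipolar_sigmoid(x)
    return 0.5 * (1.0 + fx) * (1.0 - fx)

def numeric_derivative(f, x, h=1e-6):
    """Central-difference check used only to validate the closed forms."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

Because the derivative reuses the already computed activation, the backward pass needs no extra exponentials.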
Learning Rule; Gradient Descent based Optimization
• A change in w_jk affects only y_k
• Use of gradient descent minimization
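The slide's equations did not survive extraction, so the following is a sketch of the standard output-layer derivation rather than a copy of the original. It assumes a squared-error cost E, target values t_k and a learning rate α, none of which appear in the surviving text. The symbols y_k, y_in_k, z_j and w_jk are those of the BPNN slide.

```latex
E = \tfrac{1}{2} \sum_{k=1}^{m} (t_k - y_k)^2, \qquad
y_k = f(y\_in_k), \qquad
y\_in_k = \sum_{j=0}^{p} z_j w_{jk}

\frac{\partial E}{\partial w_{jk}} = -(t_k - y_k)\, f'(y\_in_k)\, z_j

\Delta w_{jk} = -\alpha \frac{\partial E}{\partial w_{jk}} = \alpha\, \delta_k\, z_j,
\qquad \delta_k = (t_k - y_k)\, f'(y\_in_k)
```

Only the k-th term of the sum depends on w_jk, which is why a change in w_jk affects only y_k.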
Backpropagation training cycle
• Feed forward
• Backpropagation
• Weight update
Imagine architectures with 5,00 neurons per layer and 1,000 layers.
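The three phases of the cycle can be sketched for a single training pattern as follows. This is a minimal illustration under stated assumptions: a binary sigmoid activation, a squared-error cost, a learning rate `alpha`, and a nested-list weight layout with row 0 as the bias row. The network size, the training pattern and all names are hypothetical.

```python
import math
import random

def f(x):
    """Assumed activation: binary sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, t, v, w, alpha=0.5):
    """One training cycle on one pattern; updates v and w in place.

    Returns the squared error measured before the update.
    """
    # 1. Feed forward
    xs = [1.0] + list(x)
    z = [f(sum(xs[i] * v[i][j] for i in range(len(xs)))) for j in range(len(v[0]))]
    zs = [1.0] + z
    y = [f(sum(zs[j] * w[j][k] for j in range(len(zs)))) for k in range(len(w[0]))]

    # 2. Backpropagation of the error (uses f'(a) = f(a) * (1 - f(a)))
    delta_k = [(t[k] - y[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
    delta_j = [z[j] * (1.0 - z[j]) *
               sum(delta_k[k] * w[j + 1][k] for k in range(len(y)))  # skip bias row 0
               for j in range(len(z))]

    # 3. Weight update
    for j in range(len(zs)):
        for k in range(len(y)):
            w[j][k] += alpha * delta_k[k] * zs[j]
    for i in range(len(xs)):
        for j in range(len(z)):
            v[i][j] += alpha * delta_j[j] * xs[i]

    return 0.5 * sum((t[k] - y[k]) ** 2 for k in range(len(y)))

# Hypothetical 2-2-1 network trained repeatedly on one pattern.
random.seed(0)
v = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w = [[random.uniform(-0.5, 0.5)] for _ in range(3)]
errors = [train_step([1.0, 0.0], [1.0], v, w) for _ in range(200)]
```

Every cycle touches every weight, so the cost of one cycle grows with the number of neurons per layer and the number of layers.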
Why Deep Learning is an Emerging Field? Three Driving Factors…
Deep Learning and GPUs
Rao M. Ume
Deep Learning Everywhere
Deep Neural Networks & DGX-1 Supercomputer
Deep Neural Networks (DNN)
• A DNN may have millions of neurons (each neuron acts as a mathematical mapping)
• A DNN has to perform forward and backward passes
• Forward pass: weighted sum of inputs, followed by activation functions
• Backward pass: gradient descent optimization of the weight space of millions of neurons (parameters)
DNN Approach
• A DNN can model complex mathematical functions
• Imagine an image classification problem with one million images to be classified into 1,000 classes: it is a complex learning problem
DNN (Matrix Multiplication)
Forward Processing:
Gradient Descent:
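The formulas on this slide were images and are not recoverable, so the sketch below is a generic illustration of why both passes reduce to matrix multiplication. The matrices `X`, `V` and `delta` and the helper names are hypothetical. A naive nested-list product stands in for the optimized routines a GPU library would provide.

```python
def matmul(a, b):
    """Naive product of a (r x n) and b (n x c), both nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Swap rows and columns of a nested-list matrix."""
    return [list(row) for row in zip(*a)]

# Forward processing: a batch of 2 patterns (rows) through a layer of 2 neurons.
X = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 0.0]]
V = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]
Z_in = matmul(X, V)            # one row of net inputs per pattern

# Gradient descent: the weight gradient for the whole batch is also one product,
# X^T times the per-pattern error terms (delta values chosen here for illustration).
delta = [[1.0, 0.0],
         [0.0, 1.0]]
grad_V = matmul(transpose(X), delta)
```

Each entry of the result is an independent sum of products, which is the kind of work that maps well onto many parallel cores.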
GPUs are really good at matrix multiplication
CPU vs GPU
• CPU: few, fast cores (1 - 16); good at sequential processing
• GPU: many, slower cores (thousands); originally for graphics; good at parallel computation
cuDNN
cuDNN: Efficient Primitives for Deep Learning
Spot the CPU!
Spot the GPU!
Deep Learning used to Learn Complex Mappings
Depth of a Deep NN
Revolution of Depth
ImageNet classification top-5 error (%):
• ILSVRC'10: 28.2 (shallow)
• ILSVRC'11: 25.8 (shallow)
• ILSVRC'12, AlexNet: 16.4 (8 layers)
• ILSVRC'13: 11.7 (8 layers)
• ILSVRC'14, VGG: 7.3 (19 layers)
• ILSVRC'14, GoogleNet: 6.7 (22 layers)
• ILSVRC'15, ResNet: 3.57 (152 layers)
Source: Deep residual learning for image recognition, Noorul Wahab (26 Aug. 2016)
Potential Working Areas of Deep Learning in Pakistan
• Autonomous Drones
• SUPARCO Satellite Trajectory Path Planning
• Pakistan Meteorological Department (PMD) Weather Forecasting
• Power Prediction
• Stock Market Analysis (Financial Mathematics)
• Medical Image Classification
• Cancer Diagnostics
Measuring photometric redshifts using galaxy images and Deep Neural Networks
DNNs used for estimating the photometric redshift of galaxies by using the full galaxy image in each measured band
Identifying the Higgs Boson with Convolutional Neural Networks
Convolutional Neural Network architecture for tackling the problem of identifying the Higgs boson subatomic particle from color flow energy images
AlphaGo: First Computer Program to Beat a Human Go Professional
Mastering the Game of Go with Deep Neural Networks and Tree Search
NVIDIA® DGX-1™
• The NVIDIA® DGX-1™ is a deep learning supercomputer built to meet the unlimited computing demands of artificial intelligence.
• The NVIDIA DGX-1 deep learning system is built on eight NVIDIA Tesla® P100 GPUs.
• It provides the throughput of 250 CPU-based servers, networking, cables and racks, all in a single box.
• As neural nets become larger and larger, we need not only faster GPUs with larger and faster memory, but also much faster GPU-to-GPU communication.
NVIDIA® DGX-1™ Specification
NVIDIA® DGX-1 Performance Comparison
Types of Deep NN
• Deep Belief Networks (DBN)
• Deep Auto Encoders (DAE)
• Deep Convolutional Neural Networks (CNN)
• Deep Neural Networks
• Deep Long Short-Term Memory Networks (LSTM)
• etc.
Convolution
The step of the mask is known as a stride.
Image from: http://www.slideshare.net/uspace/ujavaorg-deep-learning-with-convolutionalneural-network
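To make the role of the stride concrete, here is a minimal pure-Python sketch of sliding a square mask over an image without zero padding. The function name and the example data are assumptions for illustration. As is usual in CNNs, the mask is applied without flipping, so this is strictly a cross-correlation.

```python
def conv2d(image, mask, stride=1):
    """Slide a square mask over a 2-D image (nested lists), no zero padding.

    At each position the output is the sum of element-wise products of the
    mask and the image patch under it. The mask moves `stride` pixels per step.
    """
    k = len(mask)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [[sum(image[r * stride + i][c * stride + j] * mask[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical 4x4 image holding the values 1..16 and a 2x2 mask of ones.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
mask = [[1, 1],
        [1, 1]]

stride_1 = conv2d(image, mask, stride=1)   # 3x3 output, overlapping patches
stride_2 = conv2d(image, mask, stride=2)   # 2x2 output, non-overlapping patches
```

A larger stride visits fewer positions, so the output map shrinks.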
One of Our Proposed CNN Architectures
• Input: HPF, 2084x2084x3
• Preprocessing (rotation, flipping and histogram equalization): 80x80x3
• Automatic feature extraction:
  – Convolution 8x8, stride 2, ReLU: feature maps C1: 128@37x37
  – Convolution 7x7, stride 2, ReLU: feature maps C2: 256@16x16
  – Max pooling 2x2, stride 2: pooled maps S3: 256@8x8
  – Convolution 4x4, stride 2, ReLU: feature maps C4: 512@3x3
• Classification: fully connected layer F5: 512, then output layer
featureMapSize = [(inputSize – filterSize + 2 x zeroPadding) / stride] + 1
pooledMapSize = [(inputSize – filterSize) / stride] + 1
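The two size formulas can be checked against the map sizes listed for this architecture. The sketch below assumes zero padding of 0 (the slide does not state it) and reads the square brackets as rounding down. The function names are illustrative.

```python
def feature_map_size(input_size, filter_size, stride, zero_padding=0):
    """featureMapSize = floor((input - filter + 2 * padding) / stride) + 1"""
    return (input_size - filter_size + 2 * zero_padding) // stride + 1

def pooled_map_size(input_size, filter_size, stride):
    """pooledMapSize = floor((input - filter) / stride) + 1"""
    return (input_size - filter_size) // stride + 1

# Walk through the layers, starting from the 80x80 input.
c1 = feature_map_size(80, 8, 2)   # convolution 8x8, stride 2
c2 = feature_map_size(c1, 7, 2)   # convolution 7x7, stride 2
s3 = pooled_map_size(c2, 2, 2)    # max pooling 2x2, stride 2
c4 = feature_map_size(s3, 4, 2)   # convolution 4x4, stride 2
print(c1, c2, s3, c4)             # prints: 37 16 8 3
```

With no padding assumed, the computed sizes 37, 16, 8 and 3 agree with C1, C2, S3 and C4.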
Motivation for CNN; Fewer Parameters
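The saving can be illustrated with the first layer of the architecture described earlier in this talk (80x80x3 input, 128 filters of 8x8, output maps of 37x37). The comparison with a fully connected layer of the same output size is a hypothetical baseline, and one bias per filter or per unit is an assumption.

```python
def conv_params(num_filters, filter_size, in_channels):
    """Shared weights: each filter has filter_size^2 * in_channels weights plus one bias."""
    return num_filters * (filter_size * filter_size * in_channels + 1)

def dense_params(num_inputs, num_outputs):
    """Fully connected: every output unit has its own weight per input plus one bias."""
    return (num_inputs + 1) * num_outputs

conv_c1 = conv_params(num_filters=128, filter_size=8, in_channels=3)
dense_c1 = dense_params(num_inputs=80 * 80 * 3, num_outputs=128 * 37 * 37)
print(conv_c1)    # 24704
print(dense_c1)   # 3364629632
```

Weight sharing across image positions is what keeps the convolutional count independent of the image size.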
Motivation for CNN; Why use hierarchical multi-layered models?
• Biological vision is hierarchically organized
Slide credit: Dr. Richard E. Turner presentation (2014)
Transfer Learning; Image Classification example
[Figure: Task One: features are learned and used to train Model One.]
Deep Neural Network and Transfer Learning
Transfer Learning; Image Classification example
[Figure: Reuse. Features from Task One are reused as features for Task Two to train Model Two. Image labels: Cars, Motorcycles.]
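The idea of reusing features can be sketched in a few lines: the hidden layer learned on Task One is kept frozen and only a new output unit is trained for Task Two. Everything concrete here is hypothetical, including the hand-set weights standing in for a trained Model One, the toy Task Two data, the sigmoid units and the learning rate. It is a sketch of the idea, not the presenter's method.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def extract_features(x, v):
    """Frozen hidden layer reused from Model One; row 0 of v holds the biases."""
    xs = [1.0] + list(x)
    return [sigmoid(sum(xs[i] * v[i][j] for i in range(len(xs))))
            for j in range(len(v[0]))]

def train_new_head(data, v, epochs=500, alpha=0.5):
    """Train only a new output unit for Task Two; v is read but never modified."""
    w = [0.0] * (len(v[0]) + 1)
    for _ in range(epochs):
        for x, t in data:
            zs = [1.0] + extract_features(x, v)
            y = sigmoid(sum(wj * zj for wj, zj in zip(w, zs)))
            delta = (t - y) * y * (1.0 - y)          # squared-error delta rule
            w = [wj + alpha * delta * zj for wj, zj in zip(w, zs)]
    return w

def predict(x, v, w):
    zs = [1.0] + extract_features(x, v)
    return sigmoid(sum(wj * zj for wj, zj in zip(w, zs)))

# Hypothetical weights standing in for features learned on Task One.
v_task_one = [[0.0, 0.0], [4.0, -4.0], [-4.0, 4.0]]
# Hypothetical Task Two data: label 1 when the first input exceeds the second.
task_two = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0),
            ([0.9, 0.1], 1.0), ([0.2, 0.8], 0.0)]
w_task_two = train_new_head(task_two, v_task_one)
```

Only the small output layer is trained, so Task Two needs far fewer labelled examples and far less computation than training Model Two from scratch.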
References
• http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/
• Tutorial on Deep Learning and Applications, Honglak Lee (University of Michigan)
• http://docs.gimp.org/en/plug-in-convmatrix.html
• https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
• http://www.slideshare.net/AhmedMahany/convolutionneural-networks
• http://www.robots.ox.ac.uk/~vgg/research/text/index.html
• http://learning.eng.cam.ac.uk/pub/Public/Turner/Teaching/mllecture-3-slides.pdf
• LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
• Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 818-833.
• Bengio, Yoshua. "Learning deep architectures for AI." Foundations and Trends® in Machine Learning 2.1 (2009): 1-127.
• Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
• He, Kaiming, et al. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015).