A quaternion is a four-dimensional hypercomplex number that can represent an arbitrary orientation of frame B relative to frame A.
The coordinate system {A} is rotated by θ relative to the coordinate system {B}
Angle
normalized four-dimensional vector
the quaternion parametrization obeys
hypercomplex numbers {i,j,k}
A quaternion conjugate
A quaternion product ⊗
Hamilton rule
proof
Euler axis
Vector's Elements
Half Angle Identities
Norm
Inverse
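As a small illustration of the operations listed above (conjugate, Hamilton product, norm, inverse), here is a minimal NumPy sketch; the scalar-first component order q = [q0, q1, q2, q3] and the function names are assumptions made for illustration.

```python
import numpy as np

def quat_conj(q):
    # Conjugate negates the vector part: q* = [q0, -q1, -q2, -q3]
    return np.array([q[0], -q[1], -q[2], -q[3]])

def quat_prod(a, b):
    # Hamilton product a (x) b, scalar-first convention
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([
        a0*b0 - a1*b1 - a2*b2 - a3*b3,
        a0*b1 + a1*b0 + a2*b3 - a3*b2,
        a0*b2 - a1*b3 + a2*b0 + a3*b1,
        a0*b3 + a1*b2 - a2*b1 + a3*b0,
    ])

def quat_inv(q):
    # Inverse = conjugate / squared norm; equals the conjugate for unit quaternions
    return quat_conj(q) / np.dot(q, q)

# Unit quaternion for a rotation of theta about the z axis (half-angle form)
theta = np.pi / 3
q = np.array([np.cos(theta/2), 0.0, 0.0, np.sin(theta/2)])
print(quat_prod(q, quat_inv(q)))   # ~ [1, 0, 0, 0], the identity quaternion
```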
A three dimensional vector
Derivative of Quaternion
Orthogonal matrix
Transpose = Inverse
A three dimensional vector
Inner Product Space
Representation of orientation
The coordinate system {A} is rotated by θ relative to the coordinate system {B}
Rotation Matrix: Direction Cosine Matrix
Properties of rotation matrix
Z-Y-X Euler Angles (moving coordinate frame)
→ Since every rotation is a relative transformation about the moving coordinate frame, the corresponding transformation matrices are multiplied sequentially from the front.
Roll-Pitch-Yaw (fixed coordinate frame)
→ Rotate by φ about the X axis of the fixed reference frame {A}, then by θ about the Y axis of the fixed frame, then by ψ about its Z axis.
→ Since every rotation is an absolute transformation about the fixed reference frame, the corresponding transformation matrices are multiplied in reverse order, from back to front.
Rotation Matrix
→the three RPY angles are (θ∈(-π/2,π/2))
→or (θ∈(π/2,3 π/2))
x,y,z
Rotation Matrix
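A minimal sketch of the elementary rotation matrices about the x, y, and z axes and of the fixed-frame Roll-Pitch-Yaw composition described above; the example angle values and the extraction formulas for θ ∈ (-π/2, π/2) are illustrative assumptions.

```python
import numpy as np

def Rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

phi, theta, psi = 0.1, 0.5, -0.3   # roll, pitch, yaw (example values)

# Fixed-frame Roll-Pitch-Yaw: rotate about X, then Y, then Z of the fixed frame,
# so the matrices are multiplied from back to front: R = Rz @ Ry @ Rx.
R = Rz(psi) @ Ry(theta) @ Rx(phi)

# Recover the RPY angles for theta in (-pi/2, pi/2)
theta_r = np.arctan2(-R[2, 0], np.hypot(R[0, 0], R[1, 0]))
phi_r   = np.arctan2(R[2, 1], R[2, 2])
psi_r   = np.arctan2(R[1, 0], R[0, 0])
print(phi_r, theta_r, psi_r)   # ~ (0.1, 0.5, -0.3)
```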
Euler Angle (Roll φ, Pitch θ, Yaw ψ)
Rotation Matrix
Rotation Matrix
Orientation
Half-Angle Identities
Orientation from angular rate
A tri-axis gyroscope measures the angular rate
Quaternion
Measured at time t
Orientation of the earth frame at time t
The sampling period Δt
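A minimal sketch of integrating the orientation quaternion from the gyroscope angular rate over the sampling period Δt; the scalar-first quaternion convention, the quat_prod helper from the earlier sketch, and the example rates are assumptions.

```python
import numpy as np

def quat_prod(a, b):
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([
        a0*b0 - a1*b1 - a2*b2 - a3*b3,
        a0*b1 + a1*b0 + a2*b3 - a3*b2,
        a0*b2 - a1*b3 + a2*b0 + a3*b1,
        a0*b3 + a1*b2 - a2*b1 + a3*b0,
    ])

def integrate_gyro(q, omega, dt):
    """One integration step: q_t = q_{t-1} + q_dot * dt, then renormalize.
    omega is the angular rate [wx, wy, wz] in rad/s measured at time t."""
    q_dot = 0.5 * quat_prod(q, np.array([0.0, *omega]))   # quaternion derivative
    q = q + q_dot * dt                                     # numerical integration
    return q / np.linalg.norm(q)                           # keep it a unit quaternion

q = np.array([1.0, 0.0, 0.0, 0.0])        # initial orientation
omega = np.array([0.0, 0.0, np.pi / 2])   # 90 deg/s about z
dt = 0.01                                  # sampling period (s)
for _ in range(100):                       # integrate 1 second of data
    q = integrate_gyro(q, omega, dt)
print(q)   # ~ rotation of 90 degrees about z: [cos(45deg), 0, 0, sin(45deg)]
```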
Gyroscope bias drift compensation: the bias drifts with temperature and motion
Kalman-based approaches → estimate the gyroscope bias
Mahony et al. → compensate gyroscope bias drift through the integral feedback of the error
Normalized direction of the estimated error in the rate of change of orientation
Gyroscope bias
The integral gain ζ
DC component of
Gyroscope measurements
Filter gains
Filter gain
Estimated rate of gyroscope bias drift in each axis
Filter gain
Jacobian matrix and determinant
Gradient descent (Linear system)
Orientation from vector observations
A tri-axis accelerometer (linear accelerations due to motion)
A tri-axis magnetometer (Local magnetic flux and distortions)
Quaternion
Predefined reference direction
Measured direction
Objective function
Quaternion may be found
Gradient descent algorithm
Orientation estimation of
Step-size: α
Gradient of the solution surface (general form)
Objective function
Jacobian matrix
Gradient of the solution surface (General form)
Calculation induction
Direction of gravity (vertical axis, z axis)
Appropriate convention (the equations simplify)
Normalized accelerometer measurement
Objective function
Jacobian matrix
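A hedged sketch of the gravity-only objective function and its Jacobian, in the form used by Madgwick-style gradient-descent orientation filters (an assumption about which filter the outline follows); the scalar-first quaternion and the earth-frame gravity reference [0, 0, 1] are also assumptions.

```python
import numpy as np

def f_gravity(q, acc):
    """Objective function: rotated reference gravity minus the normalized
    accelerometer measurement acc = [ax, ay, az]. q = [q0, q1, q2, q3], scalar first."""
    q0, q1, q2, q3 = q
    ax, ay, az = acc
    return np.array([
        2.0 * (q1 * q3 - q0 * q2) - ax,
        2.0 * (q0 * q1 + q2 * q3) - ay,
        2.0 * (0.5 - q1**2 - q2**2) - az,
    ])

def J_gravity(q):
    """Jacobian of f_gravity with respect to the quaternion components."""
    q0, q1, q2, q3 = q
    return np.array([
        [-2.0 * q2,  2.0 * q3, -2.0 * q0, 2.0 * q1],
        [ 2.0 * q1,  2.0 * q0,  2.0 * q3, 2.0 * q2],
        [ 0.0,      -4.0 * q1, -4.0 * q2, 0.0     ],
    ])

q = np.array([1.0, 0.0, 0.0, 0.0])
acc = np.array([0.0, 0.0, 1.0])            # already normalized, pure gravity
grad = J_gravity(q).T @ f_gravity(q, acc)  # gradient of the solution surface
print(grad)                                # zero: q already aligns gravity with +z
```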
Earth's magnetic field (one horizontal axis, vertical axis)
Appropriate convention (the equations simplify)
Normalized magnetometer measurement
Objective function
Jacobian matrix
Magnetic distortion compensation
Declination errors: in the horizontal plane relative to the earth's surface (heading)
Inclination errors: in the vertical plane relative to the earth's surface (sensor's attitude)
The measured direction of the earth's magnetic field in the earth frame at time t
Solution surface → minimum
Objective function
Jacobian matrix
Gradient of the solution surface (General form)
Estimate orientation
Convergence rate governed by
Objective function gradient
Optimal value of the step-size
Filter fusion algorithm
Estimated orientation
η,
Magnitude of a quaternion derivative == Gyroscope measurement error
Optimal fusion
Simplify Estimated orientation
Estimated orientation rate
Rate of change of orientation measured by gyroscopes
Direction of the estimated error
Simplified to equation
Estimated rate of change of orientation
Quaternion derivative measured at time t
Direction of error of
Objective function
Jacobian matrix
Gradient of the solution surface (General form)
Calculation induction
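Putting the pieces together, a minimal sketch of one fusion step: the gyroscope-derived quaternion derivative corrected by the normalized gradient of the accelerometer objective, scaled by a filter gain β. The helpers repeat the earlier sketches, and the gain value, measurements, and function names are illustrative assumptions.

```python
import numpy as np

# Helpers from the earlier sketches (Hamilton product, gravity objective, Jacobian).
def quat_prod(a, b):
    a0, a1, a2, a3 = a; b0, b1, b2, b3 = b
    return np.array([a0*b0 - a1*b1 - a2*b2 - a3*b3,
                     a0*b1 + a1*b0 + a2*b3 - a3*b2,
                     a0*b2 - a1*b3 + a2*b0 + a3*b1,
                     a0*b3 + a1*b2 - a2*b1 + a3*b0])

def f_gravity(q, a):
    q0, q1, q2, q3 = q
    return np.array([2*(q1*q3 - q0*q2) - a[0],
                     2*(q0*q1 + q2*q3) - a[1],
                     2*(0.5 - q1**2 - q2**2) - a[2]])

def J_gravity(q):
    q0, q1, q2, q3 = q
    return np.array([[-2*q2, 2*q3, -2*q0, 2*q1],
                     [ 2*q1, 2*q0,  2*q3, 2*q2],
                     [ 0.0, -4*q1, -4*q2, 0.0]])

def fusion_step(q, gyro, acc, dt, beta=0.1):
    """One IMU fusion step: gyroscope integration corrected by the normalized
    gradient of the accelerometer objective, scaled by the filter gain beta."""
    acc = acc / np.linalg.norm(acc)                # normalized measurement
    grad = J_gravity(q).T @ f_gravity(q, acc)      # gradient of the solution surface
    grad = grad / np.linalg.norm(grad)             # direction of the estimated error
    q_dot = 0.5 * quat_prod(q, np.array([0.0, *gyro])) - beta * grad
    q = q + q_dot * dt                             # integrate the estimated rate
    return q / np.linalg.norm(q)

q = np.array([1.0, 0.0, 0.0, 0.0])
q = fusion_step(q, gyro=np.array([0.01, -0.02, 0.03]),
                acc=np.array([0.02, 0.01, 9.8]), dt=0.01)
print(q)
```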
IMU (Inertial Measurement Unit) algorithm
Least Squares LR (Linear Regression)
Multivariate Regression
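A minimal least-squares linear-regression sketch using the normal equations; the synthetic data and variable names are illustrative assumptions.

```python
import numpy as np

# Least-squares fit of y = w0 + w1*x via the normal equations w = (X^T X)^{-1} X^T y.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=50)   # noisy line

X = np.column_stack([np.ones_like(x), x])            # design matrix with bias column
w = np.linalg.solve(X.T @ X, X.T @ y)                # solves the normal equations
print(w)   # ~ [2.0, 0.5]
```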
Magnetic sensor: soft/hard iron distortion, elliptical sphere (ellipsoid) vs. sphere
Sphere
Elliptical Sphere
Artificial Intelligence (AI)
Machine Learning
Deep Learning
Function Composition
Fully-connected network (FC-net)
Convolutional neural network (CNN)
Recurrent neural network (RNN)
Gradient Explosion
Linear Models
Universal Approximation
Model = Approximation
Linear Models
Linear regression
Regularization
Binary Classification
Logistic Regression
Linear models for regression
Problem Setup
Basis Function
Linear models
Neural networks
Kernel regression
Least Squares Method
Regularization
Ridge Regression
Find λ
Illustration
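A minimal ridge-regression sketch: the closed-form solution with an L2 penalty λ, plus a simple held-out sweep to find λ; the data and names are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: w = (X^T X + lam*I)^{-1} X^T y.
    The penalty lam shrinks the weights and stabilizes ill-conditioned problems."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
w_true = np.array([1.0, 0.0, -2.0, 0.5, 0.0])
y = X @ w_true + rng.normal(scale=0.1, size=30)

# "Find lambda": sweep candidate values and compare held-out error.
for lam in [0.0, 0.1, 1.0, 10.0]:
    w = ridge_fit(X[:20], y[:20], lam)
    err = np.mean((X[20:] @ w - y[20:])**2)
    print(lam, err)
```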
Logistic Regression
Error function
Cross entropy error
Example
Logistic Regression:MLE
Logistic Regression:IRLS
Multiclass Extension: Softmax Regression
Feedforward Nets
Linear classification: a linear discriminant function, which has the form f[x] = wᵀx + b
Decision rule is given by sgn[f[x]]
Separating hyperplane
Perceptron: a single-layer neural network
The first iterative algorithm for learning linear classification
Perceptron convergence Theorem
Perceptron Criterion
objective function
Gradient descent
Perceptron Learning: A Basic Idea
Perceptron Learning: Algorithm Outline
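A minimal sketch of the perceptron learning algorithm outlined above (update on misclassified samples until a separating hyperplane is found); the toy data and names are illustrative assumptions.

```python
import numpy as np

def perceptron_train(X, y, lr=1.0, epochs=100):
    """Perceptron learning: for each misclassified sample (sgn(w.x) != y),
    update w <- w + lr * y * x. Converges if the data are linearly separable."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:      # misclassified (perceptron criterion)
                w += lr * yi * xi
                errors += 1
        if errors == 0:                       # separating hyperplane found
            break
    return w

# Toy linearly separable data; the first column is a constant bias feature.
X = np.array([[1, 2.0, 1.0], [1, 1.5, 2.0], [1, -1.0, -1.5], [1, -2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y)
print(w, np.sign(X @ w))   # predictions match y
```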
McCulloch-Pitts Model
Activation Functions
Multilayer Perceptron (MLP) Structure: bipartite
Square loss (error)
Semantic space
Backpropagation
Image Classification
Training Optimization for deep learning
Gradient descent
A first-order iterative optimization algorithm for finding a local minimum of the objective function J[θ]
Moves from the current values of parameters,
(Full) Batch Gradient Descent ⇔ Vanilla Gradient Descent: resorts to the entire training dataset to compute the gradient of the objective function
The accuracy of the parameter update is high, but it can be slow
Intractable for datasets that do not fit in memory
Does not allow us to update the model online (with new examples on the fly)
Mini-Batch Gradient Descent ⇔ Stochastic Gradient Descent (SGD)
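A minimal mini-batch SGD sketch contrasting with full-batch gradient descent: the gradient is computed on small shuffled batches rather than the entire dataset; the linear-regression objective and names are illustrative assumptions.

```python
import numpy as np

def sgd(grad_fn, theta, data, lr=0.1, batch_size=8, epochs=20, rng=None):
    """Mini-batch SGD: shuffle the data each epoch, then update the parameters
    with the gradient computed on one small batch at a time."""
    rng = rng or np.random.default_rng(0)
    n = len(data[0])
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = tuple(d[idx[start:start + batch_size]] for d in data)
            theta = theta - lr * grad_fn(theta, *batch)
    return theta

# Example objective: linear regression, J(theta) = mean (X theta - y)^2 / 2.
def grad_linreg(theta, X, y):
    return X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.05, size=200)
print(sgd(grad_linreg, np.zeros(3), (X, y)))   # ~ [1.0, -2.0, 0.5]
```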
Gradient=Steepest Descent Direction
By the Lagrangian method, we have
Convex function
Neural Network
Iterative methods
Problem
General form of iteration methods
Definition
Lemma
Exponentially Weighted Moving Average
Bias Correction
Gradient Descent with Momentum
Manhattan-Learning Rule
Resilient Backprop (Rprop)
AdaGrad: Adaptive Gradient
RMSProp
ADAM Optimization
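A minimal ADAM sketch showing the exponentially weighted moving averages, bias correction, and the adaptive per-parameter step; the quadratic objective and hyperparameter values are illustrative assumptions.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update: exponentially weighted averages of the gradient (m) and
    its square (v), bias-corrected, then a per-parameter adaptive step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)          # bias correction (EWMA starts at zero)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize a simple quadratic J(theta) = ||theta - target||^2.
target = np.array([3.0, -1.0])
theta = np.zeros(2)
m, v = np.zeros(2), np.zeros(2)
for t in range(1, 5001):
    grad = 2 * (theta - target)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
print(theta)   # ~ [3.0, -1.0]
```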
Learning Rate Decay (Step Size)
Dropout
Normalization
Batch Normalization
BN is applied to each dimension individually for each mini-batch of size M
Rescale and shift by learnable parameters
BN in Inference Phase
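A minimal batch-normalization sketch: per-dimension normalization over a mini-batch of size M with learnable rescale/shift, and the inference-phase variant that uses running statistics; names and values are illustrative assumptions.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-time BN: normalize each feature dimension over the mini-batch
    of size M, then rescale and shift by the learnable gamma and beta."""
    mu = x.mean(axis=0)                   # per-dimension mean over the batch
    var = x.var(axis=0)                   # per-dimension variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta, mu, var

def batch_norm_infer(x, gamma, beta, running_mu, running_var, eps=1e-5):
    """Inference-time BN: use running (population) statistics instead of batch ones."""
    x_hat = (x - running_mu) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta

M, D = 32, 4                              # mini-batch size and feature dimension
x = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(M, D))
y, mu, var = batch_norm_train(x, gamma=np.ones(D), beta=np.zeros(D))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))   # ~0 mean, ~1 std per dimension
```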
Layer Normalization
CNN (Convolutional Neural Network)
Pre-trained CNNs: freeze the earlier layers and change the output layer (transfer learning)
LeNet-5
Operations
ResNet
Inception Net
Convolutions
Padding
Strided Convolution
Convolution over Volume
1×1 Convolution
Max Pooling
Semantic Segmentation
Fully Convolutional Networks
Conv+Deconv
Upsampling
Nearest Neighbor
Bed of Nails
Max unpooling
Deconvolution or Transposed Convolution
Ex. 1
Ex. 2
Residual Block
Mitigate vanishing gradients:
Skip Connections: identity shortcuts
When dimensions change, the shortcut must be adapted to match the new dimensions
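A minimal residual-block sketch, using fully-connected layers for brevity (ResNet itself uses convolutions): the identity skip connection is added before the final nonlinearity; names and sizes are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Residual block: output = ReLU(F(x) + x), where F is two linear layers.
    The identity skip connection lets gradients flow directly, which
    mitigates vanishing gradients in deep stacks."""
    h = relu(x @ W1)          # first transformation
    f = h @ W2                # second transformation (same width as x)
    return relu(f + x)        # identity shortcut added before the nonlinearity

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(1, d))
W1 = rng.normal(scale=0.1, size=(d, d))
W2 = rng.normal(scale=0.1, size=(d, d))
print(residual_block(x, W1, W2).shape)   # (1, 8): same shape, so blocks can be stacked
```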
Inception Nets: Naive Version
With Dimension Reduction
Object Detection: Localization + Classification
R-CNN: Regions with CNN features
YOLO
CNN for Time Series Classification
RNN (Recurrent Neural Network)
IID Data (Independent and Identically Distributed)
Sequence Modeling
Non IID Data
Feedforward Net
Vanilla RNN: Unfolding Computational Graph
Hidden Markov Model
Many to Many: Encoder-Decoder
Seq-to-Seq Learning
Alignment Model
Vanilla RNN: Gradient Flow
Attention in Encoder and Decoder
Encoder
Decoder
Encoder-decoder attention
Hidden vector
Long Short Term Memory (LSTM)
Permutation-Equivariant Attention Modules (SAB & ISAB)
MAB (multihead attention block)
SAB (set attention block)
ISAB (induced set attention block)
Gates ∈ [0,1]
LSTM Cell Updates
LSTM: Gradient Flow
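A minimal sketch of the LSTM cell update with forget/input/output gates in [0, 1] and the additive cell-state path that helps gradient flow; the weight shapes and names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One LSTM cell update. The forget (f), input (i), and output (o) gates lie
    in [0, 1]; the cell state is c = f * c_prev + i * g, and h = o * tanh(c).
    W maps [x, h_prev] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h_prev]) + b
    H = len(h_prev)
    f = sigmoid(z[0:H])            # forget gate
    i = sigmoid(z[H:2*H])          # input gate
    o = sigmoid(z[2*H:3*H])        # output gate
    g = np.tanh(z[3*H:4*H])        # candidate cell update
    c = f * c_prev + i * g         # additive cell-state update (good gradient flow)
    h = o * np.tanh(c)
    return h, c

D, H = 3, 4                         # input and hidden sizes (example values)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):                  # run a short sequence through the cell
    h, c = lstm_cell(rng.normal(size=D), h, c, W, b)
print(h, c)
```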
Gated Recurrent Unit (GRU)
Generative RNN
Likelihood
Loss function
VAE, GAN
Generating Sequences
Training RNNs for sequence Prediction
Training
Teacher-Forcing
TF
Without TF
Attention in RNN-Encoder-Decoder
Visual Attention
Encoder
Decoder
Transformer models
RNN
CNN (ByteNet, ConvS2S)
Vanilla Transformer
Self-Attention: A sequence-to-sequence operation
Query, Key, Value
Scaled Dot-Product Attention
Multi-Head Attention
Position-wise Feedforward Networks
Positional Encoding
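A minimal sketch of scaled dot-product self-attention and sinusoidal positional encoding; this is single-head and omits the learned Q/K/V projections of multi-head attention, and all names and sizes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Each output row is a weighted average of the value rows, with weights given
    by the scaled dot products between the query and every key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: PE[pos, 2i] = sin(pos / 10000^(2i/d)),
    PE[pos, 2i+1] = cos(...). Added to embeddings so the model sees token order."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16)) + positional_encoding(5, 16)   # 5 tokens, d_model = 16
# Self-attention: queries, keys, and values all come from the same sequence X.
out = scaled_dot_product_attention(X, X, X)
print(out.shape)   # (5, 16)
```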
Reformer: Efficient Transformer
Transformer models
Reformer models
Locality Sensitive Hashing (LSH)
Reversible Network
Autoencoder
Permutation Invariance and Equivariance
Wanted
Definition (Permutation invariance)
Definition (Permutation equivariance)
Permutation Equivariant Functions
Amortized Clustering
Attention Operators
Dot-product attention
Multihead attention
Set Transformer: Encoder & Decoder
Encoder (X → Z)
Pooling by Multihead Attention (PMA)
Decoder ( Z → y )
Deep Generative models
Latent Space = Hidden Space = Invisible Space
Observed space
A powerful model for unsupervised learning
Image Inpainting
eCommerceGAN
Linear Generative Models: Earlier Days
Sparse Coding
Recognizing data (via discriminative models)
Creating data (via generative models)
Density Estimation
Prescribed models
Deep learning
Implicit models
Deep learning
Variational Autoencoders (VAE)
Autoencoder
Limitation
Variational Autoencoder
Training VAE with Reparameterization Trick
Variational Autoencoder
Probabilistic decoder (generator network)
Probabilistic encoder (inference network) for amortized variational inference
Training VAE
Variational lower-bound
Reconstruction cost
Penalty
Maximize the variational lower-bound on the average log-likelihood
Given
Goal
Model
Variational lower-bound
Two problems to be addressed
Stochastic Gradient Variational Bayes
Noisy Gradients
Reparameterization Trick
Score function gradients
Reparameterization gradients
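A minimal sketch of the reparameterization trick and the diagonal-Gaussian KL penalty used in the variational lower-bound; the encoder outputs here are stand-in values, and names are illustrative assumptions.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: instead of sampling z ~ N(mu, sigma^2) directly
    (which blocks gradients), sample eps ~ N(0, I) and set z = mu + sigma * eps.
    Gradients can then flow through mu and log_var to the encoder."""
    eps = rng.normal(size=mu.shape)           # noise independent of the parameters
    return mu + np.exp(0.5 * log_var) * eps   # deterministic transform of (mu, log_var, eps)

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, the penalty term of the ELBO."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu, log_var = np.array([0.5, -1.0]), np.array([0.1, -0.2])   # encoder outputs (example)
z = reparameterize(mu, log_var, rng)
print(z, kl_to_standard_normal(mu, log_var))
```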
VAE: Revisited
Practice of VAEs
Shortcomings of VAEs
Over-regularization
Heuristics
Practical Implementation of VAEs:Summary
Noise Injection
Regularization
Regularized Autoencoder (RAE)
Deterministic Regularized Autoencoders: no noise injection
RAE
The loss for RAE is given by
Examples of Tikhonov regularization
Gradient penalty
Spectral normalization
Ex-Post Density Estimation
No KL divergence term in RAE
Ex-post density estimation
ES-CVAE: Echo-State Conditional Variational Autoencoder
Echo State Networks
Our Model: ES-CVAE
Variational Lower-Bound F on Log[p[
Neural Statistician
Amortized inference
Variational inference = per-sample inference
Neural Statistician: A Bayesian Hierarchical Model
Neural Statistician = VAE for sets
5-way 1-shot
GAN (Generative Adversarial Network)
Adversarial training
Generative Adversarial Network
Generator, G[z;θ]:
Discriminator, D[x;φ]:
Training GAN
Training D
Training G
Two-player minimax game (for Nash equilibrium)
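A minimal sketch of the two alternating objectives in the minimax game, using the common non-saturating generator loss; the discriminator outputs are stand-in values, and names are illustrative assumptions.

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy for probabilities p and labels target (0 or 1)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

def d_loss(d_real, d_fake):
    """Discriminator objective: push D(x) toward 1 on real data and D(G(z)) toward 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def g_loss(d_fake):
    """Non-saturating generator objective: push D(G(z)) toward 1."""
    return bce(d_fake, np.ones_like(d_fake))

# Dummy discriminator outputs just to show the two alternating loss computations.
d_real = np.array([0.9, 0.8, 0.95])   # D on real samples
d_fake = np.array([0.1, 0.3, 0.2])    # D on generated samples
print(d_loss(d_real, d_fake), g_loss(d_fake))
```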
Unrolled GANs
Generating Images by GANs DCGAN
Progressive Growing of GANs
Interesting Applications of GANs
GAN for Single Image Super-Resolution
eCommerceGAN
Improved Techniques for Training GANs
Convergence Issues in Training GANs
Problems in GAN Training
Non-Convergence
Mode collapsing
Diminished gradient
Feature Matching to Train G
Denoising Auto-Encoder
GAN Trained with Denoising Feature Matching
Training G
An Information-Theoretic Extension of GAN
Disentangled Representation
Disentangled = Interpretable and Factorized
Information Maximization: InfoGAN
InfoGAN
Training InfoGAN
Train the discriminator D[x]
Train the generator G[z,c]
Variational Infomax
Conditional GAN
Generator
Discriminator
Optimization
Image-to-Image Translation
Map Edges to Photo via cGAN
Unpaired Image-to-Image Translation
Cycle-Consistency
Adversarial Loss + Cycle Consistency Loss
Adversarial loss for G: X→Y and F: Y→X
Cycle consistency loss
Summary
GANs with Encoder Networks
Adversarially Learned Inference
Encoder joint distribution
Decoder joint distribution
Match these two joint distributions
The minimax game
Semi-Supervised Learning with GANs
Small amount of labeled data
Semi-Supervised Learning with GAN
Classifier for K classes
GAN
Loss
Hyperparameter Optimization
Bayesian Optimization
Optimization of Black-Box Functions
Regret
Instantaneous regret
Cumulative regret (in the bandit setting)
Simple regret: (in the optimization setting)
No regret algorithms in the bandit setting:
Hyperparameter Optimization
Hyperparameters
SigOpt
Objective
Search space
Observations
Automated machine learning
Objective
Search space
Observations
Clinical drug trials in healthcare
Objective
Search space
Observations
Active user modeling
Objective
Search space (x)
Observations (f[x])
Hyperparameters
Model parameters
Hyperparameters
AutoML ⊃ Hyperparameter Optimization ⊃ Neural Architecture Search
Search over Configuration Space
Grid search
Random search
Any more efficient method in terms of the number of evaluations?
AutoML
Feature processing
Model/algorithm selection
Hyperparameter tuning
Many companies are using AutoML
Microsoft
Amazon
AutoGluon: Introduced by Amazon in January, 2020
Democratizes the task of ML
Bayesian Optimization
Surrogate function
Acquisition function
Surrogate model
GP regression
Random function = Gaussian Process
Choice of Kernels
Squared exponential kernel (Gaussian kernel)
Matern Kernel
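Minimal sketches of the squared exponential (Gaussian) kernel and a Matern 5/2 kernel; the lengthscale and variance values are illustrative assumptions.

```python
import numpy as np

def squared_exponential(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel: k(x, x') = s^2 exp(-|x - x'|^2 / (2 l^2))."""
    d2 = np.sum((x1[:, None, :] - x2[None, :, :])**2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def matern52(x1, x2, lengthscale=1.0, variance=1.0):
    """Matern 5/2 kernel, a common, less smooth alternative for Bayesian optimization."""
    d = np.sqrt(np.sum((x1[:, None, :] - x2[None, :, :])**2, axis=-1))
    a = np.sqrt(5.0) * d / lengthscale
    return variance * (1.0 + a + a**2 / 3.0) * np.exp(-a)

X = np.linspace(0, 1, 5).reshape(-1, 1)
print(squared_exponential(X, X).round(3))
print(matern52(X, X).round(3))
```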
A Few Things about GPs
Pros
Cons
Besides GP regression, as surrogate models, you can also use
Random forests
Neural networks
Alternative Surrogate Model: Random Forests
More alternative surrogate models include Mondrian forest regression
Neural processes
Algorithm Outline: Bayesian Optimization
Handling Categorical or Integer-Valued Variables (Spearmint)
Integer-valued variables
Categorical variables
A naive approach
BayesOpt
A naive approach
Acquisition Functions
Utility Function
Utility and Acquisition Functions
Probability of Improvement (PI)
Expected Improvement (EI)
GP Upper Confidence Bound (GP-UCB)
Expected Improvement
Exploration-Exploitation Trade-Off in EI
GP-UCB
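A minimal end-to-end sketch: a GP regression posterior over a 1-D black-box function and two acquisition scores (Expected Improvement and GP-UCB) used to pick the next evaluation point; the kernel, objective, and values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def rbf(x1, x2, l=0.2):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / l**2)

def gp_posterior(X_obs, y_obs, X_query, noise=1e-6):
    """GP regression posterior mean and standard deviation at the query points."""
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_query, X_obs)
    Kss = rbf(X_query, X_query)
    mu = Ks @ np.linalg.solve(K, y_obs)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mu, np.sqrt(np.clip(np.diag(cov), 1e-12, None))

def expected_improvement(mu, sigma, y_best):
    """EI for maximization: E[max(f - y_best, 0)] under the GP posterior."""
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

def gp_ucb(mu, sigma, kappa=2.0):
    """GP-UCB: optimistic score mu + kappa * sigma (explore where sigma is large)."""
    return mu + kappa * sigma

# Black-box objective observed at a few points; pick the next evaluation by max EI.
f = lambda x: np.sin(3 * x) + 0.5 * x
X_obs = np.array([0.1, 0.4, 0.9])
y_obs = f(X_obs)
X_query = np.linspace(0, 1, 200)
mu, sigma = gp_posterior(X_obs, y_obs, X_query)
ei = expected_improvement(mu, sigma, y_obs.max())
print("next point to evaluate:", X_query[np.argmax(ei)])
print("GP-UCB maximizer:      ", X_query[np.argmax(gp_ucb(mu, sigma))])
```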
Compressed Sensing
NAS in practice
Application to Soft-Voting in Ensemble
Neural Process
Generative Query Network (GQN)
Generalizations of the GQN framework
Wanted
Gaussian Processes
(non-Bayesian) deep neural networks (DNNs)
Neural processes
Generative Query Network
Conditional Neural Processes
Motivation: CNPs combine benefits of NNs and GPs
Supervised Learning: Data Description
Observed data
Target inputs
Underlying ground truth function
Task
Supervised Learning
CNP
Embedding
Aggregation
Parameterized approximating function (e.g., neural networks)
GP vs CNP
Gaussian processes
Conditional neural processes
CNP: Model
CNP: Architecture
The mean aggregation is used
Neural Processes
Motivation
Neural Networks vs Gaussian Processes
NNs
GPs
Architecture
Encoder
Aggregator
Conditional decoder
Training