What happens in high-dimensional space?
Pixel-based distance on high-dimensional data can be very unintuitive.
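A quick toy demo of why pixel distance is unintuitive (the images here are random arrays, purely for illustration): an image shifted by a couple of pixels is perceptually identical, yet it can be farther in L2 pixel distance than a visibly noisy copy.

```python
import numpy as np

np.random.seed(0)
img = np.random.rand(32, 32)                  # a toy "image"
shifted = np.roll(img, 2, axis=1)             # same content, shifted 2 pixels
noisy = img + 0.1 * np.random.randn(32, 32)   # visually similar, noisy copy

d_shift = np.linalg.norm(img - shifted)  # large: pixels no longer align
d_noise = np.linalg.norm(img - noisy)    # small: pixels still align
print(d_shift, d_noise)
```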
Linear Classification
1) define a score function that maps image pixels to class scores. Benefit: once the parameters are learned, there is no need to store all the training data
2) SVM and Softmax
3) a loss function that measures the quality of a particular set of parameters based on how well the induced scores agree with the ground-truth labels
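The two standard loss functions above can be sketched per-example in NumPy (the score values and labels below are made up for illustration):

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    """Multiclass SVM (hinge) loss for one example; y is the correct class."""
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0  # the correct class contributes no margin
    return margins.sum()

def softmax_loss(scores, y):
    """Cross-entropy loss for one example, with the usual max-shift for
    numerical stability before exponentiating."""
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[y]

scores = np.array([3.2, 5.1, -1.7])  # hypothetical class scores
print(svm_loss(scores, y=0))         # 2.9
print(softmax_loss(scores, y=0))
```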
optimization (SGD)
view the loss function as a high-dimensional optimization landscape in which we try to reach the bottom
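A minimal sketch of the SGD update on a toy quadratic loss L(w) = ||w - w_star||^2 (in a real network the gradient would come from backprop on minibatches; the learning rate and step count here are arbitrary):

```python
import numpy as np

w_star = np.array([1.0, -2.0])  # the (known) minimum of the toy loss
w = np.zeros(2)                 # initial parameters
lr = 0.1                        # learning rate

for step in range(200):
    grad = 2 * (w - w_star)  # analytic gradient of the toy loss
    w -= lr * grad           # step downhill in the loss landscape

print(w)  # converges to w_star
```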
BP (backpropagation)
Rectified linear unit (ReLU)
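ReLU's forward and backward passes are cheap, which is one reason it plays well with backprop; a minimal sketch (function names are my own):

```python
import numpy as np

def relu_forward(x):
    out = np.maximum(0, x)
    return out, x  # cache the input for the backward pass

def relu_backward(dout, x):
    # the gradient passes through only where the input was positive
    return dout * (x > 0)

x = np.array([-2.0, -0.5, 0.0, 1.5])
out, cache = relu_forward(x)
dx = relu_backward(np.ones_like(x), cache)
print(out)  # [0.  0.  0.  1.5]
print(dx)   # [0.  0.  0.  1. ]
```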
Neural Networks
if you train a small network, the loss surface has relatively few local minima and converges easily, but those minima tend to have high loss; if you train a large network, there are many different solutions, but the variance in final loss is much smaller. –> all solutions are roughly equally good, and the result relies less on the random initialization
in practice, use regularization techniques to control overfitting when training large networks
Data Preprocessing
1) mean subtraction
2) normalization
3) PCA & whitening
4) weight initialization
5) regularization
5.1) L-norm regularization
5.2) Dropout
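The preprocessing steps and inverted dropout above can be sketched as follows (the data is synthetic and p = 0.5 is just a common choice):

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(100, 3072) * 5 + 10  # toy data: N examples x D features

# 1) mean subtraction: center every feature at zero
X -= X.mean(axis=0)
# 2) normalization: scale every feature to unit variance
X /= X.std(axis=0)

# inverted dropout at train time: drop units with probability 1 - p and
# scale by 1/p, so expected activations match test time and the test-time
# forward pass needs no extra scaling
p = 0.5
h = np.random.randn(100, 50)                  # some hidden-layer activations
mask = (np.random.rand(*h.shape) < p) / p
h_train = h * mask
h_test = h                                    # test time: use h unchanged
```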
Hyperparameter optimization
1) initial learning rate
2) learning rate decay schedule
3) regularization strength (L2 penalty)
tips: decay the learning rate over the course of training; search for good hyperparameters with random search
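The tips above can be sketched as follows: sample learning rate and regularization strength on a log scale (they span orders of magnitude), and decay the learning rate on a step schedule. The ranges, decay factor, and toy scoring function are assumptions for illustration; in practice `evaluate` would train for a few epochs and return validation accuracy.

```python
import numpy as np

np.random.seed(0)

def sample_hyperparams():
    # uniform in log-space covers orders of magnitude evenly
    lr = 10 ** np.random.uniform(-6, -2)
    reg = 10 ** np.random.uniform(-5, 0)
    return lr, reg

def evaluate(lr, reg):
    # hypothetical stand-in for a short training run's validation accuracy
    return -abs(np.log10(lr) + 3) - abs(np.log10(reg) + 2)

trials = [sample_hyperparams() for _ in range(50)]
best_lr, best_reg = max(trials, key=lambda t: evaluate(*t))
print(best_lr, best_reg)

# step decay schedule: halve the learning rate every 10 epochs (assumed values)
lr0, decay, k = 1e-3, 0.5, 10
lr_at = lambda epoch: lr0 * decay ** (epoch // k)
```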
CNN
layers used to build ConvNet architectures:
1) Convolutional layer
2) ReLU layer
3) Pooling layer
4) Fully-connected layer
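For the Conv and Pooling layers above, the output spatial size follows the standard formula (W - F + 2P)/S + 1, where W is the input size, F the filter size, P the zero-padding, and S the stride; a quick helper:

```python
def conv_output_size(W, F, P, S):
    """Spatial output size of a conv/pool layer; the hyperparameters must
    tile the input evenly, otherwise they are inconsistent."""
    assert (W - F + 2 * P) % S == 0, "filter does not tile the input"
    return (W - F + 2 * P) // S + 1

# e.g. AlexNet's first conv layer: 227x227 input, 11x11 filters,
# stride 4, no padding -> 55x55 output
print(conv_output_size(227, 11, 0, 4))  # 55
# 3x3 filters with stride 1 and pad 1 preserve spatial size
print(conv_output_size(32, 3, 1, 1))    # 32
```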
case study:
LeNet
AlexNet
ZF Net
GoogLeNet
VGGNet
ResNet