image-classification-note

Problems:

  1. Semantic gap: There is a huge gap between the semantic idea of a cat and the raw pixel values the computer actually sees.
  2. Viewpoint variation: All pixels change when the camera moves.
  3. Illumination: Lighting conditions in the scene can vary drastically.
  4. Deformation: Cats can assume many different poses and positions.
  5. Occlusion: You might only see part of a cat.
  6. Background clutter: The foreground object (the cat) can look very similar in appearance to the background.
  7. Intraclass variation: Cats come in different shapes, sizes, colors, and ages.

An image classifier

```python
def classify_image(image):
    # Some magic here?
    return class_label
```

no obvious way to hard-code the algorithm for recognizing a cat, or other classes.

Data-Driven Approach

  1. Collect a dataset of images and labels
  2. Use Machine Learning to train a classifier
  3. Evaluate the classifier on new images
```python
def train(images, labels):
    # Machine learning!
    return model

def predict(model, test_images):
    # Use model to predict labels
    return test_labels
```

Rather than a single function that inputs an image and recognizes a cat, we have two functions: one called train, which inputs images and labels and outputs a model, and another called predict, which inputs the model and makes predictions for new images.

#Nearest Neighbor classifier

```python
import numpy as np

class NearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        """ X is N x D where each row is an example. y is 1-dimensional of size N """
        # the nearest neighbor classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """ X is N x D where each row is an example we wish to predict the label for """
        num_test = X.shape[0]
        # let's make sure that the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the L1 distance (sum of absolute value differences)
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)  # get the index with smallest distance
            Ypred[i] = self.ytr[min_index]    # predict the label of the nearest example
        return Ypred
```
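As a sanity check, the same L1 nearest-neighbor lookup can be run inline on a made-up toy dataset (the numbers below are illustrative, not real image data):

```python
import numpy as np

# Toy training set: five 4-pixel "images" in two classes (made-up data)
Xtr = np.array([[0, 0, 0, 0],
                [1, 1, 1, 1],
                [0, 0, 1, 1],
                [9, 9, 9, 9],
                [8, 8, 9, 9]])
ytr = np.array([0, 0, 0, 1, 1])

# One test image: the nearest training row under L1 distance decides its label
x_test = np.array([0, 1, 0, 1])
distances = np.sum(np.abs(Xtr - x_test), axis=1)  # L1 distance to every training row
pred = ytr[np.argmin(distances)]
print(pred)  # -> 0
```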

Q: With N examples, how fast are training and prediction?

A: Train O(1), predict O(N)

This is bad: we want classifiers that are fast at prediction; slow training is OK.

k-Nearest Neighbors

Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points.
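A minimal sketch of that voting step, assuming L1 distance and made-up toy data (the name `knn_predict` and the choice `k=3` are illustrative, not from the lecture code):

```python
import numpy as np

def knn_predict(Xtr, ytr, x, k=3):
    """Predict a label for x by majority vote among its k nearest training rows (L1 distance)."""
    distances = np.sum(np.abs(Xtr - x), axis=1)
    nearest = np.argsort(distances)[:k]       # indices of the k closest examples
    votes = ytr[nearest]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]          # most common label wins

# Toy data: two clusters of 1-D "images"
Xtr = np.array([[0.0], [0.5], [1.0], [9.0], [9.5], [10.0]])
ytr = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(Xtr, ytr, np.array([0.7])))  # -> 0
print(knn_predict(Xtr, ytr, np.array([9.2])))  # -> 1
```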

###Hyperparameters

  • What is the best value of k to use?
  • What is the best distance to use?

These are hyperparameters: choices about the algorithm that we set rather than learn.

_Very problem-dependent._

_Must try them all out and see what works best._

Setting Hyperparameters

  • Split data into train, val, and test; choose hyperparameters on val and evaluate on test.
  • Cross-validation: Split the training data into folds, try each fold as validation, and average the results. Useful for small datasets, but not used too frequently in deep learning.
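The cross-validation recipe can be sketched in plain NumPy. The fold-splitting logic is generic; the 1-NN evaluation function and toy data below are placeholder assumptions for illustration:

```python
import numpy as np

def cross_validate(X, y, train_and_eval, num_folds=5):
    """Average validation accuracy over num_folds splits of (X, y)."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accuracies = []
    for i in range(num_folds):
        # fold i is validation; the remaining folds are training
        X_val, y_val = X_folds[i], y_folds[i]
        X_train = np.concatenate(X_folds[:i] + X_folds[i+1:])
        y_train = np.concatenate(y_folds[:i] + y_folds[i+1:])
        accuracies.append(train_and_eval(X_train, y_train, X_val, y_val))
    return np.mean(accuracies)

def eval_1nn(X_train, y_train, X_val, y_val):
    # placeholder evaluator: 1-nearest-neighbor accuracy under L1 distance
    preds = [y_train[np.argmin(np.sum(np.abs(X_train - x), axis=1))] for x in X_val]
    return np.mean(np.array(preds) == y_val)

# Toy data: 20 one-pixel "images", label 1 for the upper half
X = np.arange(20).reshape(20, 1).astype(float)
y = (X[:, 0] >= 10).astype(int)
print(cross_validate(X, y, eval_1nn, num_folds=5))  # -> 1.0
```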

k-Nearest Neighbor on images is never used:

  • Very slow at test time
  • Distance metrics on pixels are not informative
  • Curse of dimensionality

k-Nearest Neighbors: Summary

  • In Image classification we start with a training set of images and labels, and must predict labels on the test set
  • The K-Nearest Neighbors classifier predicts labels based on the nearest training examples
  • Distance metric and K are hyperparameters
  • Choose hyperparameters using the validation set; only run on the test set once at the very end!

#Linear Classification

Deep neural networks are kind of like Legos, and the linear classifier is the most basic building block of these giant networks.

f(x, W) = Wx + b
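In code, the linear classifier is a single matrix-vector multiply plus a bias. The shapes below follow the CIFAR-10 convention (10 classes, 32x32x3 = 3072 pixels); the random weights and image are placeholders, not a trained model:

```python
import numpy as np

num_classes, num_pixels = 10, 3072  # e.g. CIFAR-10: 10 classes, 32*32*3 pixels

rng = np.random.default_rng(0)
W = rng.standard_normal((num_classes, num_pixels)) * 0.001  # weights: 10 x 3072
b = np.zeros(num_classes)                                   # bias: one per class
x = rng.random(num_pixels)                                  # a flattened "image"

scores = W @ x + b        # f(x, W) = Wx + b -> 10 class scores
pred = np.argmax(scores)  # the highest-scoring class is the prediction
print(scores.shape, pred)
```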