USPS data set
Data set
File: USPS_dataset9296.mat
This USPS data set contains all 10 digits as labels. Each input is an image stored as a 256-length row vector representing 16x16 pixels.
Instructions
Pre-processing
We will perform binary classification of the digits 3 (label 0) and 8 (label 1).
Split the data set into training and validation sets. The training set takes the first 80% of the images of each label, and the validation set takes the last 20% of each.
Add a bias pixel to form the feature u(x)=[1,x], expanding the input into a vector of size 257.
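The three pre-processing steps above can be sketched in MATLAB as follows. This is only an illustration under stated assumptions: the variable names X (an n-by-256 image matrix) and y (an n-by-1 vector of digit labels) are hypothetical, because the variable names stored in USPS_dataset9296.mat are not given here, and random stand-in data is generated so that the snippet runs by itself. Rounding the 80% point down with floor is also an assumption.

```matlab
% Stand-in data (hypothetical names); replace with the variables loaded from the .mat file.
rng(0);
n = 500;
X = rand(n, 256);          % assumed layout: one 16x16 image per row
y = randi([0 9], n, 1);    % assumed layout: digit label of each row

digitsUsed = [3 8];        % digit 3 -> label 0, digit 8 -> label 1
Utr = []; ttr = []; Uva = []; tva = [];
for c = 1:2
    Xc  = X(y == digitsUsed(c), :);   % images of this digit, in original order
    nc  = size(Xc, 1);
    ntr = floor(0.8 * nc);            % first 80% go to training (floor is an assumption)
    Utr = [Utr; Xc(1:ntr, :)];
    ttr = [ttr; (c - 1) * ones(ntr, 1)];
    Uva = [Uva; Xc(ntr+1:end, :)];    % last 20% go to validation
    tva = [tva; (c - 1) * ones(nc - ntr, 1)];
end

% Bias pixel: u(x) = [1, x], so each row grows from 256 to 257 entries.
Utr = [ones(size(Utr, 1), 1), Utr];
Uva = [ones(size(Uva, 1), 1), Uva];
```

The split is done per label before stacking, so both sets keep roughly the same proportion of 3s and 8s.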
Least Squares
For each 1<=M<=257, crop the feature vector u(x) to its first M features. Solve the least squares problem for theta_M using the cropped feature vector.
Produce a graph: training and validation quadratic errors of the predictor (as two curves), each evaluated over its entire data set, versus the number of components M.
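A minimal MATLAB sketch of the sweep over M described above. The matrices Utr, Uva and label vectors ttr, tva are hypothetical names filled with random stand-in data so the snippet is self-contained. Two choices here are assumptions rather than requirements of the assignment: pinv is used to obtain the minimum-norm least squares solution (it stays well defined if some pixel columns are constant), and the quadratic error is taken as the mean squared residual.

```matlab
% Stand-in data (hypothetical names); use the real 257-column features instead.
rng(0);
ntr = 400; nva = 100; D = 257;
Utr = [ones(ntr, 1), rand(ntr, D - 1)];  ttr = double(rand(ntr, 1) > 0.5);
Uva = [ones(nva, 1), rand(nva, D - 1)];  tva = double(rand(nva, 1) > 0.5);

errTr = zeros(D, 1);
errVa = zeros(D, 1);
for M = 1:D
    A     = Utr(:, 1:M);            % first M features of u(x)
    theta = pinv(A) * ttr;          % minimum-norm least squares solution theta_M
    errTr(M) = mean((A * theta - ttr).^2);              % training quadratic error
    errVa(M) = mean((Uva(:, 1:M) * theta - tva).^2);    % validation quadratic error
end

plot(1:D, errTr, 1:D, errVa);
xlabel('number of components M');
ylabel('quadratic error');
legend('training', 'validation');
```

Because the models are nested, the training error cannot increase with M; the validation error carries no such guarantee.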
PCA
Calculate the full 256x256 PCA dictionary matrix from the training data set (without the bias pixel). You can assume the data has no bias and needs no correction (that is, no mean subtraction is required).
Compress the training and validation data sets independently, using the dictionary matrix, for every number of principal components 1<=M<=256.
Solve a least squares problem to obtain the M-length model parameters theta, using as input the M-length representation vectors z from the training set and their true labels.
Produce a graph: training and validation quadratic errors of the PCA predictor (as two curves), versus number of components M.
Discuss, using a printout statement disp('…'), why this graph is different from the one in Section 2.
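The PCA steps above can be sketched in MATLAB as follows. The names Xtr, Xva, ttr, tva are hypothetical and hold random stand-in data. The sketch assumes the dictionary is formed from the eigenvectors of the uncentered second-moment matrix of the training set, sorted by decreasing eigenvalue, which is consistent with the stated assumption that no bias correction is needed; pinv and the mean squared residual are again illustrative choices.

```matlab
% Stand-in data (hypothetical names), without the bias pixel.
rng(0);
ntr = 400; nva = 100; D = 256;
Xtr = rand(ntr, D);  ttr = double(rand(ntr, 1) > 0.5);
Xva = rand(nva, D);  tva = double(rand(nva, 1) > 0.5);

% Dictionary from the training set only; no mean subtraction (assumed unnecessary).
C = (Xtr' * Xtr) / ntr;
[V, L] = eig(C);
[~, order] = sort(diag(L), 'descend');
W = V(:, order);                 % 256x256 dictionary, strongest component first

% Compress both sets with the same dictionary; the first M columns give the
% M-component representation z.
Ztr = Xtr * W;
Zva = Xva * W;

errTr = zeros(D, 1);
errVa = zeros(D, 1);
for M = 1:D
    theta = pinv(Ztr(:, 1:M)) * ttr;                    % M-length parameters
    errTr(M) = mean((Ztr(:, 1:M) * theta - ttr).^2);
    errVa(M) = mean((Zva(:, 1:M) * theta - tva).^2);
end

plot(1:D, errTr, 1:D, errVa);
xlabel('number of components M');
ylabel('quadratic error');
legend('training', 'validation');
```

Projecting once onto the full dictionary and then cropping columns avoids recomputing the compression for every M.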
Logistic Regression with one layer
Train a one-layer logistic regression using 25 iterations of stochastic gradient descent. The input has size 4: the PCA representation (M=3) plus a bias input (as in Section 1c). Use the following parameters: initial model parameter vector theta=1/sqrt(M+1)*randn(M+1,1); learning rate=0.1; and minibatch size of 10 (randperm(n,k) can be helpful).
Produce a graph of training and validation logistic losses vs iteration index.
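A minimal MATLAB sketch of the training loop described above, using the stated initialization, learning rate, and minibatch size. The names Ztr, Zva, ttr, tva are hypothetical and hold random stand-in data in place of the real 3-component PCA representation. The sketch assumes that one iteration means one minibatch update and that the logistic loss is the mean cross-entropy over a data set.

```matlab
% Stand-in data (hypothetical names): bias column plus M = 3 features.
rng(0);
M = 3; ntr = 400; nva = 100;
ttr = double(rand(ntr, 1) > 0.5);
tva = double(rand(nva, 1) > 0.5);
Ztr = [ones(ntr, 1), randn(ntr, M) + ttr * ones(1, M)];
Zva = [ones(nva, 1), randn(nva, M) + tva * ones(1, M)];

sigm = @(a) 1 ./ (1 + exp(-a));
% Mean cross-entropy (logistic) loss over a whole data set.
lossFn = @(Z, t, th) mean(-t .* log(sigm(Z * th)) - (1 - t) .* log(1 - sigm(Z * th)));

theta = 1 / sqrt(M + 1) * randn(M + 1, 1);   % initialization given in the task
eta = 0.1;                                   % learning rate
k   = 10;                                    % minibatch size
T   = 25;                                    % number of SGD iterations

Ltr = zeros(T, 1);
Lva = zeros(T, 1);
for it = 1:T
    idx = randperm(ntr, k);                  % k distinct training indices
    Zb  = Ztr(idx, :);
    tb  = ttr(idx);
    g   = Zb' * (sigm(Zb * theta) - tb) / k; % minibatch gradient of the loss
    theta = theta - eta * g;
    Ltr(it) = lossFn(Ztr, ttr, theta);
    Lva(it) = lossFn(Zva, tva, theta);
end

plot(1:T, Ltr, 1:T, Lva);
xlabel('iteration');
ylabel('logistic loss');
legend('training', 'validation');
```

The losses are evaluated on the full training and validation sets after each update, so the two curves are directly comparable even though each step uses only 10 samples.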
Quite a while ago, the standardized tag was designed. A standardized tag refers to marks of varying width, ranging from dark to clear, arranged according to certain encoding rules and used to express a set of graphic identifier information (Data ID, 2003). This development has endured for a very long time. To date, its impact on and contributions to trade and society have been enormous. This report aims to present the background of the scanner tag, examine the concepts from a theoretical perspective, and finally analyze and evaluate it critically.
- Foundation