Unsupervised Learning

The difference is the presence or absence of targets: supervised learning has a y label, while unsupervised learning has none, so given, say, 10 items we separate them by their similarities.

  • unlike the goal of supervised learning (predicting a y), unsupervised learning aims to:
  • discover subgroups in the data
  • find a better way to view the data
  • find a visualization that shows underlying information

Two main techniques: principal components analysis and clustering.

Challenge of Unsupervised learning

  • there is no simple goal such as prediction, and no response variable against which to check our answers
  • examples where it is useful: breast cancer patients grouped by gene expression measurements
  • shoppers characterized by their browsing and purchase histories
  • movies grouped by ratings assigned by movie viewers

Another advantage

  • unlabeled data are a lot easier to obtain than labeled data
  • e.g., there are lots of unlabeled images on the web
  • the raw data can be collected by machine, but labels usually require human judgment
  • e.g., labeling movie reviews by movie quality
  • it’s sometimes difficult for a machine to recognize sarcasm

Principal Components Analysis

  • a low-dimensional representation of a dataset
  • the 1st component is the direction of highest variance
  • the 2nd component is the direction of highest variance among those uncorrelated with the 1st
  • one of the most widely used tools
  • Z_1 is a linear combination of the features: Z_1 = φ_11 X_1 + φ_21 X_2 + … + φ_p1 X_p
  • we normalize so that the sum of squared loadings is 1: φ_11² + … + φ_p1² = 1 (see the sketch after this list)
  • in the figure, the green solid line is the direction of highest variance of the features
  • the blue dashed line has the highest variance among directions uncorrelated with the first component
  • if you only have 2 variables, you’ll only have two components
  • PCA needs the variables centered at mean 0
  • we are looking for the linear combination that has the highest variance
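A minimal sketch of these ideas in Python, assuming scikit-learn is available; the synthetic data and the names (rng, X, pca) are illustrative, not from the lecture.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # Synthetic correlated features so the components differ in variance
    X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])

    pca = PCA(n_components=2)       # scikit-learn centers the data internally
    Z = pca.fit_transform(X)        # scores Z_1, Z_2 for each observation

    phi1 = pca.components_[0]       # first loading vector
    print(np.sum(phi1 ** 2))        # sum of squared loadings is 1
    print(pca.explained_variance_)  # variance captured by each component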

Geometry of PCA

  • the loading vector φ_1 defines the direction of the 1st component
  • we replace each point with its score: the signed distance of its projection along the 1st component direction (see the sketch after this list)
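A short numpy sketch of the projection step; the points and the unit-norm loading vector phi1 below are made up for illustration.

    import numpy as np

    X = np.array([[2.0, 1.0], [0.0, 0.5], [-2.0, -1.5]])
    Xc = X - X.mean(axis=0)                      # center at mean 0, as PCA requires

    phi1 = np.array([2.0, 1.0]) / np.sqrt(5.0)   # hypothetical unit-norm loading vector

    z1 = Xc @ phi1                               # scores: signed distance along phi1
    proj = np.outer(z1, phi1)                    # each point replaced by its projection
    print(z1)
    print(proj)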

Further principal components

  • because each further component must be uncorrelated with the previous ones, we still look for the direction where the variance is highest
  • maximizes variance subject to being uncorrelated with all previous components (checked numerically in the sketch below)
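A quick numerical check of that uncorrelatedness, assuming scikit-learn; the random data are illustrative.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))

    Z = PCA().fit_transform(X)                    # all principal component scores
    print(np.corrcoef(Z, rowvar=False).round(6))  # off-diagonal correlations are ~0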

US Arrests

  • what are the loading vectors?
  • the 1st principal component has roughly equal positive loadings on the 3 types of crime, so it measures the overall crime rate (see the sketch below)
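A hedged sketch of this example: it assumes the USArrests data (which ship with R, not Python) have been exported to a local USArrests.csv, so the file path is an assumption.

    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Assumed local export of R's USArrests data: columns Murder, Assault,
    # UrbanPop, Rape, with state names as the index
    df = pd.read_csv("USArrests.csv", index_col=0)
    X = StandardScaler().fit_transform(df)  # variables are on different scales

    pca = PCA().fit(X)
    loadings = pd.DataFrame(pca.components_.T, index=df.columns,
                            columns=["PC1", "PC2", "PC3", "PC4"])
    print(loadings)  # signs are arbitrary; PC1 weights the three crimes similarly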

Another Interpretation of Principal Components

  • approximating the data using a lower-dimensional representation
  • we are looking for the hyperplane (e.g., the plane spanned by the 2 largest principal components) that lies closest to the data
  • equivalently, we want the projected data to be as spread out as possible
  • in linear regression, we minimize the vertical distance from each point to the fitted line; in PCA, we minimize the perpendicular distance to the hyperplane (see the sketch after this list)
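A numpy/scikit-learn sketch of the approximation view; the random data are illustrative.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
    Xc = X - X.mean(axis=0)

    pca = PCA(n_components=2).fit(Xc)
    Z = pca.transform(Xc)              # scores on the first two components
    X_hat = pca.inverse_transform(Z)   # rank-2 approximation of the data

    # Reconstruction error = sum of squared perpendicular distances to the plane
    print(np.sum((Xc - X_hat) ** 2))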

Scaling of the Variables Matters

  • we need to scale by standardizing, so every variable is measured on an equal scale
  • proportion of variance explained (PVE): if the first 2 principal components explain 96% of the variance, we can just use those 2 (see the sketch after this list)
  • cross-validation doesn’t directly help choose the number of components, because there is no y variable
  • but if the components are later fed into a supervised model, we can use cross-validation to decide how many to use; at that point we have a supervising response
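A sketch of PVE and the effect of scaling, assuming scikit-learn; the data, with one deliberately huge-scale feature, are made up for illustration.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 4)) * np.array([1.0, 1.0, 1.0, 100.0])

    print(PCA().fit(X).explained_variance_ratio_)  # unscaled: feature 4 dominates
    Xs = StandardScaler().fit_transform(X)
    pve = PCA().fit(Xs).explained_variance_ratio_  # standardized: balanced PVE
    print(pve)
    print(np.cumsum(pve))                          # cumulative PVE per component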

PCA vs Clustering

  • clustering looks for homogeneous subgroups among the observations
  • e.g., market segmentation: the task of splitting customers into subgroups via clustering
  • K-means: the number of subgroups K is specified in advance
  • hierarchical clustering: we don’t need to know the number of clusters in advance

K-means clustering

  • it finds clusters given a pre-specified number of clusters K
  • clusters are non-overlapping subsets: each observation belongs to exactly one cluster
  • we want the partition where the within-cluster variation is as small as possible
  • algorithm: randomly assign each observation a number from 1 to K, then compute each cluster’s centroid
  • We assign each observation to the cluster whose centroid is closest.
  • We then move each centroid to the mean of its respective cluster.
  • We then reassign observations using the new centroids, and repeat until the assignments stop changing.
  • Finds a local minimum (a valley), but not necessarily the global minimum: the objective function is not convex.
  • The clustering can differ depending on the random initial assignment, so run the algorithm several times and keep the best result (see the sketch below).
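A sketch of K-means with multiple random starts, assuming scikit-learn; n_init reruns the algorithm and keeps the solution with the lowest within-cluster sum of squares. The three-cluster data are made up.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(4)
    # Three illustrative clusters centered at -3, 0, and 3
    X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (-3.0, 0.0, 3.0)])

    km = KMeans(n_clusters=3, n_init=20, random_state=0).fit(X)
    print(km.inertia_)          # within-cluster sum of squares at the best start
    print(km.cluster_centers_)  # final centroids
    print(km.labels_[:10])      # cluster assignments of the first observations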

Hierarchical Clustering