#Matrix decomposition
##Recap
- Graphical models and Bayes nets => supposedly the future of ML.
- However, exact inference in general graphical models is considered an intractable problem.
##Matrix decomposition
+ In general, matrix decomposition is the factorization of a matrix into a product of matrices. In particular we talk about non-negative matrix factorization (NMF): we factorize a matrix into two matrices such that all three matrices are element-wise non-negative.
+ Many problems can be rephrased as matrix decomposition, making it a handy tool to have in the machine learning shed.
+ Data sets such as images are natural candidates for NMF (a code sketch follows below).
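The lecture doesn't spell out an algorithm, so here is a minimal NMF sketch in Python using the classic Lee-Seung multiplicative updates; the function name, toy data, and rank are illustrative, not from the lecture:

```python
import numpy as np

def nmf(X, k, n_iters=200, eps=1e-9):
    """Factorize non-negative X into W @ H with W, H element-wise non-negative,
    using Lee-Seung multiplicative updates for squared error."""
    n, m = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # update keeps H non-negative
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # update keeps W non-negative
    return W, H

X = np.abs(np.random.default_rng(1).normal(size=(40, 25)))  # toy non-negative data
W, H = nmf(X, k=5)
print(np.linalg.norm(X - W @ H))  # reconstruction error after the updates
```

The multiplicative form is the point here: starting from non-negative factors, each update multiplies by a non-negative ratio, so non-negativity is preserved without explicit projection.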
###Example
Consider the 2D data set
where y1 and y2 are drawn from Gaussian (normal) generators as follows:
+ y1 ~ g(0; 1^2)
+ y2 ~ g(0; 2^2)
where a Gaussian distribution is parameterised as g(mean; variance), the variance written as (std. dev.)^2
We also have the following energy functions for y1 and y2:
+ E(y1) = y1^2
+ E(y2) = (y2^2)/4
i.e. E(y) = y^2/σ^2 in the convention used here (the usual factor of 1/2 is dropped).
(For those wondering what all this talk of energy functions is about and what they have to do with statistics, this might help.)
We can visualise the data using a histogram:
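A minimal sketch generating the samples and the 2D histogram, assuming numpy and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
y1 = rng.normal(0, 1, size=10_000)  # y1 ~ g(0; 1^2)
y2 = rng.normal(0, 2, size=10_000)  # y2 ~ g(0; 2^2)

plt.hist2d(y1, y2, bins=60)         # axis-aligned elliptical cloud,
plt.xlabel("y1")                    # twice as spread along y2 as along y1
plt.ylabel("y2")
plt.show()
```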
The physics analogy here is the distribution of molecules in two different rooms. We want to consider them separately, then consider the joint distribution of molecules across the two rooms.
Thus the joint energy of a sample (y1, y2) is:
E(y1,y2) = (y1^2) + (y2^2)/4
Then we can get the probability of a sample x = (y1, y2) under the joint distribution:
p(x) = (1/Z) e^(-E(y1, y2))
where Z is the partition function (see previous lectures).
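For this particular energy, Z comes out in closed form: the exponent is a sum, so the integral factorises into ∫e^(-y1^2) dy1 * ∫e^(-y2^2/4) dy2 = sqrt(π) * 2 sqrt(π) = 2π. A quick numerical sanity check (a sketch using scipy, not part of the lecture):

```python
import numpy as np
from scipy.integrate import quad

Z1, _ = quad(lambda y: np.exp(-y**2), -np.inf, np.inf)      # = sqrt(pi)
Z2, _ = quad(lambda y: np.exp(-y**2 / 4), -np.inf, np.inf)  # = 2*sqrt(pi)
print(Z1 * Z2, 2 * np.pi)  # both ≈ 6.2832
```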
##Non-axial-parallel example
Now consider the case where the distribution is not axis-aligned (non-axial parallel), for example a 2D Gaussian whose principal directions are rotated away from the axes.
There are 2 sources of variation in the data, given by directions ci, where:
||ci|| = 1 // each ci is a unit vector
λi = the std. dev. of the data in direction ci
Thus we have another energy function, one that describes a 2D Gaussian with arbitrary orientation:
E(x) = Σi (ci^T x)^2 / λi^2
where:
ci^T cj = [i=j] // [i=j] is Knuth's (Iverson bracket) notation: 1 if i = j, else 0
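A minimal sketch of this energy function; the 45-degree directions and the std. devs are an illustrative choice:

```python
import numpy as np

def energy(x, C, lam):
    """E(x) = sum_i (ci^T x)^2 / lam_i^2, for orthonormal columns ci of C."""
    a = C.T @ x                  # a_i = ci^T x, coordinates of x in the ci basis
    return np.sum(a**2 / lam**2)

theta = np.pi / 4                # directions rotated 45 degrees off the axes
C = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # columns are c1, c2
lam = np.array([1.0, 2.0])       # std. devs along c1 and c2
print(energy(np.array([1.0, 1.0]), C, lam))      # x lies along c1, so E = 2.0
```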
##Expanding to n-dimensional Gaussians
How do we expand this concept to n-space?
When the distribution is axial parallel we have:
E(x) = Σi xi^2 / λi^2
When the distribution is not axial parallel, the same form holds in the basis of the ci's:
E(x) = Σi (ci^T x)^2 / λi^2
and the matrix C = [c1 ... cn], whose columns are the ci's, gives you the orientation of the distribution.
Going back to our 2D data set, we can express any x as a sum over this basis:
x = Σi ai ci, where ai = ci^T x
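A quick numerical sketch of this change of basis; the random orthonormal basis is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.linalg.qr(rng.normal(size=(2, 2)))[0]  # random orthonormal basis (c1, c2)
x = rng.normal(size=2)

a = C.T @ x                   # coefficients a_i = ci^T x
x_rec = C @ a                 # x = sum_i a_i * ci
print(np.allclose(x, x_rec))  # True: an orthonormal basis reconstructs x exactly
```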
###Example
Say we have a data matrix X whose M columns are 8 * 8 images flattened to 64-vectors (so X is 64 * M), a 64 * 12 matrix A (the ci's joined as columns) and a 12 * M matrix B.
We want to decompose X (approximately) s.t.:
X ≈ AB
We find A and B by minimising the squared reconstruction error:
min over A, B of ||X - AB||^2
i.e. the least squares approach,
with the constraint that the ci's (the columns of A) are pairwise orthogonal.
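With that orthogonality constraint, the least-squares optimum is given by the truncated SVD (the Eckart-Young result). A minimal sketch, with random data standing in for real images:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 100))    # stand-in for 100 flattened 8x8 images
k = 12

U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = U[:, :k]                      # 64 x 12, columns pairwise orthogonal (the ci's)
B = np.diag(s[:k]) @ Vt[:k]       # 12 x 100 coefficient matrix
print(np.linalg.norm(X - A @ B))  # minimal error among all rank-12 factorizations
```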
(Aside: all major contributions to the field of statistics have been made by psychologists, not statisticians.)
What if we relax the above constraint (i.e. that the ci's are pairwise orthogonal)?
=> No unique solution to the factorization X ≈ AB: for any invertible R, A' = AR and B' = R^-1 B give the same product A'B' = AB.
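A short demonstration of this non-uniqueness, with random matrices as stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 12))
B = rng.normal(size=(12, 100))

R = rng.normal(size=(12, 12))          # (almost surely) invertible
A2, B2 = A @ R, np.linalg.inv(R) @ B   # a different pair of factors...
print(np.allclose(A @ B, A2 @ B2))     # ...with exactly the same product
```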
For example, consider distributions where for any given sample usually only one of the sources of variation is non-zero. Sources with this sparse structure are non-Gaussian, which is what allows for independent component analysis (ICA).
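As a hedged sketch of this idea, scikit-learn's FastICA can separate such sparse, non-orthogonal sources; the mixing matrix and sparsity pattern below are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Sparse sources: for most samples only one source is far from zero
S = rng.laplace(size=(1000, 2)) * (rng.random((1000, 2)) < 0.3)
M = np.array([[1.0, 0.5],
              [0.3, 1.0]])            # non-orthogonal mixing directions
X = S @ M.T                           # observed mixtures, one row per sample

S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)
# S_hat recovers the sources up to permutation and scaling
```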