
Dot product optimization pass #20

Open · jklontz opened this issue Nov 2, 2013 · 16 comments

@jklontz (Member) commented Nov 2, 2013

At the heart of many computer vision algorithms (subspace learning, deep learning, wavelets) is a dot product between an incoming image and a filter constructed offline. The idea is to introduce a suite of LLVM optimization passes that exploit the fact that the filter is known at compile time. Specifically:

  1. Completely unroll the dot product loop based on the known filter size.
  2. Substitute memory access instructions with the known constant values for the filter.
  3. Eliminate instructions where the filter value is 0 (or perhaps near-zero).

Together these passes convert a generic dense dot product into a hard-coded sparse dot product.
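To make the intended transformation concrete, here is a hand-written C sketch (the filter values are invented for illustration; the real passes would perform this rewrite on LLVM IR rather than on source code):

```c
/* Before: a generic dense dot product; the filter x is a runtime value. */
int dot(const int *y, const int *x, int len)
{
    int response = 0;
    for (int i = 0; i < len; i++)
        response += y[i] * x[i];
    return response;
}

/* After: what the passes would effectively produce for a filter known at
 * compile time to be {3, 0, 0, 1}. The loop is fully unrolled (pass 1),
 * the loads of x are replaced by constants (pass 2), and the zero-valued
 * taps are eliminated entirely (pass 3). */
int dot_3001(const int *y)
{
    return y[0] * 3 + y[3];
}
```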

As a stretch goal:

  1. Further optimizations that become possible when the dot product only needs to be approximated within a pre-specified margin of error (one possible scheme is sketched below).
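One possible realization (my assumption; nothing above prescribes a particular scheme) is to greedily zero out filter taps whose worst-case contribution fits within the remaining error budget, after which pass 3 removes them:

```c
#include <math.h>

/* Hypothetical pruning rule: dropping tap i changes the response by at
 * most |x[i]| * max|y|, so by the triangle inequality the total error
 * stays within the pre-specified margin eps as long as the sum of the
 * dropped taps' worst cases does. */
void prune(float *x, int len, float max_abs_y, float eps)
{
    float budget = eps;
    for (int i = 0; i < len; i++) {
        float worst_case = fabsf(x[i]) * max_abs_y;
        if (worst_case <= budget) {
            budget -= worst_case;
            x[i] = 0; /* pass 3 then eliminates this tap entirely */
        }
    }
}
```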

This is a long-term research idea and could make a good paper on its own. It is also an example of the kind of interesting idea that becomes possible with a computer vision DSL.

@jklontz (Member Author) commented Nov 12, 2013

@mburge I think this is the optimization we discussed yesterday.

@jklontz (Member Author) commented Mar 12, 2014

caotto: so basically you build a look-up-table for convolution with a filter at compile time?
[1:26pm] caotto: jklontz: or do I not get the idea
[1:27pm] jklontz: I would go further than a LUT
[1:27pm] caotto: elaborate?
[1:27pm] jklontz: and inline the LUT into the code
[1:27pm] jklontz: for example
[1:29pm] jklontz: in our current scheme say we learn a 2-dimensional filter, x, with values {1, 2}
[1:29pm] jklontz: conventionally we would apply that filter to another vector y like:
[1:30pm] jklontz: int response = 0; for (int i=0; i<x.len; i++) response += y[i] * x[i]
[1:30pm] jklontz: however, since we know x at compile time now
[1:30pm] jklontz: the compiler can simplify this to
[1:30pm] jklontz: int response = y[0] * 1 + y[1] * 2;
[1:31pm] jklontz: pretty powerful, no?
[1:32pm] jklontz: this then opens the door to conventional algebraic simplification
[1:32pm] caotto: so it's not a LUT based on input values, rather you are inlining the math for doing the convolution
[1:32pm] jklontz: removing the *1, eliminating any operations that are *0
[1:32pm] jklontz: yeah
[1:32pm] caotto: I see, so it's not a LUT at all
[1:33pm] jklontz: and the compiler can turn that dense multiplication into a sparse one automatically
[1:33pm] jklontz: yeah I guess LUT is not the right word
[1:33pm] caotto: Well that sounds pretty interesting.
[1:34pm] jklontz: I think it could be a real game changer for deep learning which involves learning a lot of filters
[1:35pm] caotto: so this is basically saying, we do the convolution in time domain, and just optimize the hell out of it
[1:36pm] caotto: have you considered how this compares/would apply to fft based techniques
[1:36pm] jklontz: I haven't
[1:36pm] jklontz: but I think it's relevant to anything where one or both of the operands is known at training time
[1:37pm] caotto: so for example I'm pretty sure OpenCV's filter engine will do fft, apply the filter in frequency domain, then inverse fft
[1:37pm] caotto: I guess you could just do the fft at compile time?
[1:37pm] jklontz: yup
[1:37pm] caotto: and do something similar based on that
[1:37pm] jklontz: precisely

@jklontz (Member Author) commented Jul 10, 2014

@bklare This is one of the optimizations that I think would be particularly cool for a deep learning neural net.

@bklare-zz commented

This is a pretty cool idea. I am still getting up to speed on CNNs. Do you know if the filters are the same at each layer? That is, the first layer is likely some form of Gabor filters, to which you could apply your hard-coded, sparse approach. I am wondering if the same holds at the subsequent layers as well. If not, the benefit may be lessened.

@jklontz (Member Author) commented Jul 10, 2014

I think the filters are different at each layer, but they could be equally sparse?

@JordanCheney (Contributor) commented

https://www.facebook.com/publications/546316888800776/

If we wanted to build a deep learning network this could be a pretty cool one :)

@bklare-zz commented

@jklontz Yes, I think they could be equally sparse. I guess my concern is that the selected filters for the subsequent layers would be unknown until after training.

@bklare-zz commented

@JordanCheney Yeah, that paper has gotten a lot of buzz. It was presented at CVPR a few weeks ago as well. It is worth studying if you haven't already done so.

@JordanCheney (Contributor) commented

I was excited talking about this yesterday, and I was wondering whether there are any thoughts yet about implementing convolution in likely. I think it could be done in the likely standard library as a function that takes a base matrix, a filter matrix, and a stride value. A dot product could then be computed between the filter and successive slices of the base matrix (is slicing implemented in likely?), with the filter "moving" across the base matrix according to the specified stride. My only question is whether implementing this in the standard library would take advantage of all of the optimizations discussed above.
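In C-like terms, I am imagining something like the following 1-D sketch (the names and the 1-D, single-channel simplification are mine; the Likely version would be written in the standard library's own syntax):

```c
/* Strided "valid" correlation: slide the filter across the base signal in
 * steps of `stride`, computing one dot product per position. With a
 * compile-time-constant filter, each inner loop is exactly the dot product
 * that the passes proposed in this issue would specialize. */
void correlate(const float *base, int base_len,
               const float *filter, int filter_len,
               int stride, float *out)
{
    for (int pos = 0; pos + filter_len <= base_len; pos += stride) {
        float response = 0;
        for (int i = 0; i < filter_len; i++)
            response += base[pos + i] * filter[i];
        out[pos / stride] = response;
    }
}
```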

@jklontz (Member Author) commented Jul 20, 2014

I completely agree that it should be implemented in the standard library, and I have been working on 47d06c0 as a precursor. The convolution function should take only the image and filter as arguments and infer the rest. The main feature currently missing is the ability to access a specified location in the image.

@jklontz (Member Author) commented Jul 20, 2014

To follow up on this, the most immediate next step is to extend kernelArgument::evaluateOperator to support arguments. For example:

```
image = (read "../data/misc/lenna.tiff") ; currently works
(image 0 128 64)                         ; doesn't work, should return the value at column = 0, channel = 128, row = 64
```

@JordanCheney (Contributor) commented

are the column and channel arguments switched here?

@JordanCheney (Contributor) commented

Quick question: kernelArgument::evaluateOperator takes a likely_ast as one argument. I thought the point of an ast was that it could represent an arbitrarily sized list of arguments, which is the desired outcome here. Is there a limitation that I am not understanding?

@jklontz (Member Author) commented Jul 20, 2014

Yeah, I switched column and channel in the example!

You should first confirm that ast->is_list, and then you can access the parameters as ast->atoms[0] through ast->atoms[ast->num_atoms-1].

Note that in the example above, ast->atoms[0] == "image" and ast->atoms[1] == 0 and ast->num_atoms == 4. This is because the ast is isomorphic with the source code s-expression: (image 0 128 64).
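In other words, the argument handling might look roughly like this (a sketch only: it uses just the fields named above plus the corrected channel/column order, and everything else is hypothetical rather than existing API):

```c
/* Sketch of handling (image 0 128 64) inside kernelArgument::evaluateOperator. */
if (ast->is_list && ast->num_atoms == 4) {
    likely_ast matrix  = ast->atoms[0]; /* "image" */
    likely_ast channel = ast->atoms[1]; /* 0   */
    likely_ast column  = ast->atoms[2]; /* 128 */
    likely_ast row     = ast->atoms[3]; /* 64  */
    /* ... evaluate each index atom and emit the indexed load ... */
}
```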

@JordanCheney (Contributor) commented

A quick sanity and understanding check for me: my understanding is that likely functions as a stack of environments, each of which has certain operations registered in some type of LUT. The root of this stack holds all of the likely_standard_library functions. First, is this correct? And if so, does creating a new likely_matrix automatically register it in the current environment's LUT so that it can be processed later on? This matters here because "image" is a likely_matrix, not an operation, so I wanted to check whether ast->atoms[0] can be looked up using the same lookup() function or whether a different method must be used.

@jklontz (Member Author) commented Jul 23, 2014

The first point is correct, but as you've noticed, it's more of a "lookup linked list" than a "lookup table". The second point about being able to lookup an image will be true in a few days, as I still have a few more patches that need to land first. Right now if you look up image you will get back (read "../data/misc/lenna.tiff") unevaluated.
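For what it's worth, the "lookup linked list" can be pictured like this (a generic illustration only, not likely's actual types):

```c
#include <stddef.h>
#include <string.h>

/* Each definition is one node; lookup walks from the innermost binding
 * back toward the root, so inner definitions shadow outer ones. */
typedef struct Environment {
    const char *name;
    void *value;
    struct Environment *parent;
} Environment;

void *lookup(const Environment *env, const char *name)
{
    for (; env != NULL; env = env->parent)
        if (strcmp(env->name, name) == 0)
            return env->value;
    return NULL;
}
```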
