
Setting weight initialisation parameters to be consistent with weight norm constraints #64

Open
matt-graham opened this issue Mar 11, 2015 · 0 comments


We need to make sure that the random weight initialisation parameters we use (irange for uniform initialisation of convolutional layers and istdev for Gaussian initialisation of fully connected and softmax layers) do not force the initial weight kernel / matrix column norms above the max_kernel_norm and max_column_norm constraints in the model specification. Otherwise the norm constraint will severely distort the updates applied to the weights, making learning very hard and potentially unstable.

This appears to have been a potential issue particularly for the fully connected layer after the last convolutional layer. The input space to this layer, and hence the column dimension of the weight matrix, is very high, so even for very small istdev values the column norms of the randomly initialised weight matrix appear to exceed the maximum constraint. We think this might be at least part of the reason we are sometimes getting strange learning curves where the NLL suddenly jumps and/or gets stuck at some high value.
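To illustrate why high fan-in causes this, here is a minimal numpy sketch (the dimensions and istdev value below are hypothetical, not taken from our model): for entries drawn from N(0, istdev^2), the expected column norm grows like istdev * sqrt(fan_in), so a seemingly small istdev can still blow past a typical max_column_norm constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

fan_in = 9216   # hypothetical input dimension of the fully connected layer
n_out = 128     # hypothetical number of output units
istdev = 0.05   # a "small" initialisation standard deviation

# Gaussian-initialised weight matrix, one column per output unit.
W = rng.normal(0.0, istdev, size=(fan_in, n_out))

# Column norms as pylearn2 computes them for max_column_norm.
col_norms = np.sqrt((W ** 2).sum(axis=0))

# Expected column norm ~ istdev * sqrt(fan_in) = 0.05 * 96 = 4.8,
# well above a typical constraint like max_column_norm = 1.9.
print(col_norms.mean())
```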

A simple way to set these parameters is to find the weight kernel / matrix dimensions by running the print_model.py script on a model pickle (this gives the sizes of the input/output spaces of each of the layers, from which the relevant weight kernel / matrix dimensions can be inferred), then finding the irange / istdev value that gives an expected initial kernel / column norm equal to some scale factor in [0, 1] times the maximum constraint. Expected kernel / column norms for a given dimensionality and initialisation distribution can either be computed analytically or estimated with a simple Monte Carlo simulation. The kernel norms are calculated by pylearn2 as sqrt(sum(W**2, axis=(1, 2, 3))) and the column norms as sqrt(sum(W**2, axis=0)).
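The analytic route above can be sketched as follows (a rough helper, not part of pylearn2; the fan_in and max-norm values in the check are hypothetical): for Gaussian entries the expected column norm is about istdev * sqrt(fan_in), and for Uniform[-irange, irange] entries (variance irange^2 / 3) the expected kernel norm is about irange * sqrt(n_elems / 3), so we just invert these for a target fraction of the constraint.

```python
import numpy as np

def istdev_for_column_norm(fan_in, max_column_norm, scale=0.5):
    # E[||column||] ~ istdev * sqrt(fan_in) for N(0, istdev^2) entries,
    # so choose istdev giving an expected norm of scale * max_column_norm.
    return scale * max_column_norm / np.sqrt(fan_in)

def irange_for_kernel_norm(n_elems, max_kernel_norm, scale=0.5):
    # Uniform[-irange, irange] entries have variance irange^2 / 3, so
    # E[||kernel||] ~ irange * sqrt(n_elems / 3); invert for irange.
    # n_elems = in_channels * kernel_height * kernel_width.
    return scale * max_kernel_norm * np.sqrt(3.0 / n_elems)

# Monte Carlo sanity check of the Gaussian case.
rng = np.random.default_rng(0)
fan_in, max_norm = 9216, 1.9
s = istdev_for_column_norm(fan_in, max_norm)
W = rng.normal(0.0, s, size=(fan_in, 64))
print(np.sqrt((W ** 2).sum(axis=0)).mean())  # ~ 0.95 = 0.5 * 1.9
```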
