-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathmtimesx.m
279 lines (273 loc) · 11.9 KB
/
mtimesx.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
% mtimesx does a matrix multiply of two inputs (single, double, or sparse)
%******************************************************************************
%
% MATLAB (R) is a trademark of The Mathworks (R) Corporation
%
% Function: mtimesx
% Filename: mtimesx.m
% Programmer: James Tursa
% Version: 1.10
% Date: December 08, 2009
% Copyright: (c) 2009 by James Tursa, All Rights Reserved
%
% This code uses the BSD License:
%
% Redistribution and use in source and binary forms, with or without
% modification, are permitted provided that the following conditions are
% met:
%
% * Redistributions of source code must retain the above copyright
% notice, this list of conditions and the following disclaimer.
% * Redistributions in binary form must reproduce the above copyright
% notice, this list of conditions and the following disclaimer in
% the documentation and/or other materials provided with the distribution
%
% THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
% AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
% IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
% ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
% LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
% CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
% SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
% INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
% CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
% ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
% POSSIBILITY OF SUCH DAMAGE.
%
%--
%
% mtimesx is a fast general purpose matrix and scalar multiply routine that utilizes
% BLAS calls and custom code to perform the calculations. mtimesx also has extended
% support for n-Dimensional (nD, n > 2) arrays, treating these as arrays of 2D matrices
% for the purposes of matrix operations.
%
% "Doesn't MATLAB already do this?" For 2D matrices, yes, it does. However, MATLAB does
% not always implement the most efficent algorithms for memory access, and MATLAB does not
% always take full advantage of symmetric cases. The mtimesx 'SPEED' mode attempts to do
% both of these to the fullest extent possible. For nD matrices, MATLAB does not have
% direct support for this. One is forced to write loops to accomplish the same thing that
% mtimesx can do faster.
%
% The usage is as follows (arguments in brackets [ ] are optional):
%
% Syntax
%
% M = mtimesx( [mode] )
% C = mtimesx(A [,transa] ,B [,transb] [,mode])
%
% Description
%
% mtimesx performs the matrix calculation op(A) * op(B), where:
% A = A single or double or sparse scalar, matrix, or array.
% B = A single or double or sparse scalar, matrix, or array.
% transa = A character indicating a pre-operation on A:
% transb = A character indicating a pre-operation on B:
% The pre-operation can be any of:
% 'N' or 'n' = No operation (the default if trans_ is missing)
% 'T' or 't' = Transpose
% 'C' or 'c' = Conjugate Transpose
% 'G' or 'g' = Conjugate (no transpose)
% mode = 'MATLAB' or 'SPEED' (sets mode for current and future calculations,
% case insensitive, optional)
% M is a string indicating the current calculation mode, before setting the new one.
% C is the result of the matrix multiply operation.
%
% Examples:
%
% C = mtimesx(A,B) % performs the calculation C = A * B
% C = mtimesx(A,'T',B) % performs the calculation C = A.' * B
% C = mtimesx(A,B,'G') % performs the calculation C = A * conj(B)
% C = mtimesx(A,'C',B,'C') % performs the calculation C = A' * B'
%
% mtimesx has two modes:
%
% 'MATLAB' mode: This mode attempts to reproduce the MATLAB intrinsic function mtimes
% results exactly. When there was a choice between faster code that did not match the
% MATLAB intrinsic mtimes function results exactly vs slower code that did match the
% MATLAB intrinsic mtimes function results exactly, the choice was made to use the
% slower code. Speed improvements were only made in cases that did not cause a mismatch.
% Caveat: I have only tested on a PC with later versions of MATLAB. But MATLAB may use
% different algorithms for mtimes in earlier versions or on other machines that I was
% unable to test, so even this mode may not match the MATLAB intrinsic mtimes function
% exactly in some cases. This is the default mode when mtimesx is first loaded and
% executed (i.e., the first time you use mtimesx in your MATLAB session and the first
% time you use mtimesx after clearing it). You can set this mode for all future
% calculations with the command mtimesx('MATLAB') (case insensitive).
%
% 'SPEED' mode: This mode attempts to reproduce the MATLAB intrinsic function mtimes
% results closely, but not necessarily exactly. When there was a choice between faster
% code that did not exactly match the MATLAB intrinsic mtimes function vs slower code
% that did match the MATLAB intrinsic mtimes function, the choice was made to use the
% faster code. Speed improvements were made in all cases that I could identify, even
% if they caused a slight mismatch with the MATLAB intrinsic mtimes results.
% NOTE: The mismatches are the results of doing calculations in a different order and
% are not indicative of being less accurate. You can set this mode for all future
% calculations with the command mtimesx('SPEED') (case insensitive).
%
% Note: You cannot combine double sparse and single inputs, since MATLAB does not
% support a single sparse result. You also cannot combine sparse inputs with full
% nD (n > 2) inputs, since MATLAB does not support a sparse nD result. The only
% exception is a sparse scalar times an nD full array. In that special case,
% mtimesx will treat the sparse scalar as a full scalar and return a full nD result.
%
% Note: The ‘N’, ‘T’, and ‘C’ have the same meanings as the direct inputs to the BLAS
% routines. The ‘G’ input has no direct BLAS counterpart, but was relatively easy to
% implement in mtimesx and saves time (as opposed to computing conj(A) or conj(B)
% explicitly before calling mtimesx).
%
% mtimesx supports nD inputs. For these cases, the first two dimensions specify the
% matrix multiply involved. The remaining dimensions are duplicated and specify the
% number of individual matrix multiplies to perform for the result. i.e., mtimesx
% treats these cases as arrays of 2D matrices and performs the operation on the
% associated parings. For example:
%
% If A is (2,3,4,5) and B is (3,6,4,5), then
% mtimesx(A,B) would result in C(2,6,4,5)
% where C(:,:,i,j) = A(:,:,i,j) * B(:,:,i,j), i=1:4, j=1:5
%
% which would be equivalent to the MATLAB m-code:
% C = zeros(2,6,4,5);
% for m=1:4
% for n=1:5
% C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n);
% end
% end
%
% The first two dimensions must conform using the standard matrix multiply rules
% taking the transa and transb pre-operations into account, and dimensions 3:end
% must match exactly or be singleton (equal to 1). If a dimension is singleton
% then it is virtually expanded to the required size (i.e., equivalent to a
% repmat operation to get it to a conforming size but without the actual data
% copy). For example:
%
% If A is (2,3,4,5) and B is (3,6,1,5), then
% mtimesx(A,B) would result in C(2,6,4,5)
% where C(:,:,i,j) = A(:,:,i,j) * B(:,:,1,j), i=1:4, j=1:5
%
% which would be equivalent to the MATLAB m-code:
% C = zeros(2,6,4,5);
% for m=1:4
% for n=1:5
% C(:,:,m,n) = A(:,:,m,n) * B(:,:,1,n);
% end
% end
%
% When a transpose (or conjugate transpose) is involved, the first two dimensions
% are transposed in the multiply as you would expect. For example:
%
% If A is (3,2,4,5) and B is (3,6,4,5), then
% mtimesx(A,'C',B,'G') would result in C(2,6,4,5)
% where C(:,:,i,j) = A(:,:,i,j)' * conj( B(:,:,i,j) ), i=1:4, j=1:5
%
% which would be equivalent to the MATLAB m-code:
% C = zeros(2,6,4,5);
% for m=1:4
% for n=1:5
% C(:,:,m,n) = A(:,:,m,n)' * conj( B(:,:,m,n) );
% end
% end
%
% If A is a scalar (1,1) and B is (3,6,4,5), then
% mtimesx(A,'G',B,'C') would result in C(6,3,4,5)
% where C(:,:,i,j) = conj(A) * B(:,:,i,j)', i=1:4, j=1:5
%
% which would be equivalent to the MATLAB m-code:
% C = zeros(6,3,4,5);
% for m=1:4
% for n=1:5
% C(:,:,m,n) = conj(A) * B(:,:,m,n)';
% end
% end
%
% ---------------------------------------------------------------------------------------------------------------------------------
%
% The BLAS routines used are DDOT, DGEMV, DGEMM, DSYRK, and DSYR2K for double
% variables, and SDOT, SGEMV, SGEMM, SSYRK, and SSYR2K for single variables.
% These routines are (description taken from www.netlib.org):
%
% DDOT and SDOT:
%
% * forms the dot product of two vectors.
%
% DGEMV and SGEMV:
%
% * DGEMV performs one of the matrix-vector operations
% *
% * y := alpha*A*x + beta*y, or y := alpha*A'*x + beta*y,
% *
% * where alpha and beta are scalars, x and y are vectors and A is an
% * m by n matrix.
%
% DGEMM and SGEMM:
%
% * DGEMM performs one of the matrix-matrix operations
% *
% * C := alpha*op( A )*op( B ) + beta*C,
% *
% * where op( X ) is one of
% *
% * op( X ) = X or op( X ) = X',
% *
% * alpha and beta are scalars, and A, B and C are matrices, with op( A )
% * an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
%
% DSYRK and SSYRK:
%
% * DSYRK performs one of the symmetric rank k operations
% *
% * C := alpha*A*A' + beta*C,
% *
% * or
% *
% * C := alpha*A'*A + beta*C,
% *
% * where alpha and beta are scalars, C is an n by n symmetric matrix
% * and A is an n by k matrix in the first case and a k by n matrix
% * in the second case.
%
% DSYR2K and SSYR2K:
%
% * DSYR2K performs one of the symmetric rank 2k operations
% *
% * C := alpha*A*B' + alpha*B*A' + beta*C,
% *
% * or
% *
% * C := alpha*A'*B + alpha*B'*A + beta*C,
% *
% * where alpha and beta are scalars, C is an n by n symmetric matrix
% * and A and B are n by k matrices in the first case and k by n
% * matrices in the second case.
%
% Double sparse matrix operations are supported, but not always directly.
% For (matrix) * (scalar) operations, custom code is used to produce a result
% that minimizes memory access times. All other operations, such as
% (matrix) * (vector) or (matrix) * (matrix), or any operation involving a transpose
% or conjugate transpose, are obtained with calls back to the MATLAB intrinsic
% mtimes function. Thus for most non-scalar sparse operations, mtimesx is
% simply a thin wrapper around the intrinsic MATLAB function and you will see
% no speed improvement.
%
% ---------------------------------------------------------------------------------------------------------------------------------
%
% Examples:
%
% C = mtimesx(A,B) % performs the calculation C = A * B
% C = mtimesx(A,'T',B,'speed') % performs the calculation C = A.' * B
% % using a fast algorithm that may not
% % match MATLAB results exactly
% mtimesx('matlab') % sets calculation mode to match MATLAB
% C = mtimesx(A,B,'g') % performs the calculation C = A * conj(B)
% C = mtimesx(A,'c',B,'C') % performs the calculation C = A' * B'
%
% ---------------------------------------------------------------------------------------------------------------------------------
function varargout = mtimesx(varargin)
%\
% If you got here then mtimesx is not compiled yet, so go compile it first.
%/
mtimesx_build;
%\
% Call the mex routine mtimesx.
%/
[varargout{1:nargout}] = mtimesx(varargin{:});
end