% IndFeat.m - routine for selecting independent features for
% modeling problems
%
% code by Will Dwinnell ([email protected])
%
% Sig = IndFeat(X,Y)
%
% Calculate the significance level 'Sig' of the real variables (columns
% of matrix 'X'), based on their ability to distinguish the 2 categories
% in column vector 'Y' (typically, variables are kept when significance
% >= 2.0). For a continuous output variable, it is suggested that values
% For a continuous output variable, it is suggested that values be split
% into lower and upper 50%.
%
% Uses the Weiss/Indurkhya 'independent features' significance testing
% method; keep in mind that this is not intended to be the final
% variable selection, just a means of eliminating obviously uninteresting
% features. See Weiss and Indurkhya's book, "Predictive Data Mining".
%
% Example:
%
% X = randn(5e3,25);
% Y = double(sum(X(:,1:3),2) > 1.1);
% IndFeat(X,Y)
% figure
% stem(IndFeat(X,Y))
% grid on
% zoom on
%
% Last modified: Nov-14-2006
function Sig = IndFeat(X,Y)
% find the two class "names" (support for more classes may come later)
UniqueClass = unique(Y);
if numel(UniqueClass) ~= 2
    error('IndFeat requires exactly 2 distinct classes in Y.');
end
% logical indexes for both classes
ClassA = (Y == UniqueClass(1));
ClassB = (Y == UniqueClass(2));
% count of observations in each class
nA = sum(double(ClassA));
nB = sum(double(ClassB));
% significance = |difference of class means| / standard error of that
% difference (a two-sample t-like statistic, computed per column)
Sig = ...
    abs(mean(X(ClassA,:)) - mean(X(ClassB,:))) ./ ...
    sqrt(var(X(ClassA,:)) ./ nA + var(X(ClassB,:)) ./ nB);
% EOF
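For readers working outside MATLAB, the same statistic can be sketched in Python/NumPy. This is my own translation, not part of the original code, and the name `ind_feat` is hypothetical; `ddof=1` matches MATLAB's default `var` (normalization by N-1).

```python
import numpy as np

def ind_feat(X, Y):
    """Per-column significance of X for separating the two classes in Y.

    Mirrors IndFeat.m: |mean_A - mean_B| / sqrt(var_A/nA + var_B/nB),
    using the sample variance (ddof=1) to match MATLAB's var().
    """
    classes = np.unique(Y)
    if classes.size != 2:
        raise ValueError("Y must contain exactly 2 distinct classes")
    A = X[Y == classes[0]]          # rows belonging to the first class
    B = X[Y == classes[1]]          # rows belonging to the second class
    return np.abs(A.mean(axis=0) - B.mean(axis=0)) / np.sqrt(
        A.var(axis=0, ddof=1) / len(A) + B.var(axis=0, ddof=1) / len(B)
    )
```

As in the MATLAB version, this is only a screening statistic: columns scoring below the suggested 2.0 threshold are candidates for removal, not a final feature selection.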