\subsection{Boosted classifiers}\index{Boosted}
\label{sec:boosted}
Since generalised boosting is not yet available for regression in TMVA, we
restrict the following discussion to classification applications.
A boosted\index{Boosting} classifier is a collection of classifiers of
the same type, trained on the same sample but with different event
weights. The response of the boosted classifier is a weighted combination
of the responses of the individual classifiers in the collection. The
boosted classifier is potentially more powerful than a single classifier
and more stable with respect to statistical fluctuations in the training
sample. The gain in stability is particularly pronounced when bagging is
used as the ``boost'' algorithm (\cf Sec.~\ref{sec:bagging}, page~\pageref{sec:bagging}).
The following sections do not apply to decision trees. We refer to
Sec.~\ref{sec:bdt} (page~\pageref{sec:bdt}) for a description of boosted
decision trees. In the current version of TMVA, only the AdaBoost and Bagging
algorithms are implemented for boosting arbitrary classifiers. The boost
algorithms are described in detail in Sec.~\ref{sec:boost} on page
\pageref{sec:boost}.
\subsubsection{Booking options}
To book a boosted classifier, the boost options are added to the option
string of the regular classifier. The only required option is the number
of boost iterations, \code{Boost_Num}, which must be set to a value
larger than zero. Once the Factory detects \code{Boost_Num>0} in the
option string, it books a boosted classifier and passes all boost options
(recognised by the prefix \code{Boost_}) to the Boost method and all
other options to the classifier being boosted.
%The alternative and more explicit booking method is to book a
%MethodBoost first, and then to book the specific classifier to it:
\begin{codeexample}
\begin{tmvacode}
factory->BookMethod( TMVA::Types::kLikelihood, "BoostedLikelihood",
"Boost_Num=10:Boost_Type=Bagging:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Booking of the boosted classifier:
the first argument is the predefined enumerator, the
second argument is a user-defined string identifier, and the third
argument is the configuration options string. All options with the
prefix \code{Boost_} (in this example the first two options) are
passed on to the boost method, the other options are provided to the
regular classifier (which in this case is Likelihood). Individual
options are separated by a ':'. See Sec.~\ref{sec:usingtmva:booking}
for more information on the booking.
}
\end{codeexample}
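To make the routing of options concrete, the accompanying code example sketches the split of a booking option string into the options carrying the prefix \code{Boost_} (destined for the Boost method) and all remaining options (destined for the classifier being boosted). This is an illustration under our own assumptions, not TMVA source code: the helper \code{SplitBoostOptions} is hypothetical, and it omits the check for \code{Boost_Num>0}.

\begin{codeexample}
\begin{tmvacode}
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper (not part of TMVA): split a ':'-separated booking
// option string into the options carrying the "Boost_" prefix (first)
// and all remaining options (second). Order within each group is kept.
std::pair<std::string, std::string> SplitBoostOptions(const std::string& opts)
{
   std::string boostOpts, methodOpts;
   std::istringstream ss(opts);
   std::string item;
   while (std::getline(ss, item, ':')) {
      if (item.empty()) continue;      // tolerate "::" or a trailing ':'
      // rfind(prefix, 0) == 0 tests whether 'item' starts with the prefix
      std::string& target =
         (item.rfind("Boost_", 0) == 0) ? boostOpts : methodOpts;
      if (!target.empty()) target += ':';
      target += item;
   }
   return {boostOpts, methodOpts};
}
// Example: SplitBoostOptions("Boost_Num=10:Boost_Type=Bagging:Spline=2")
// yields {"Boost_Num=10:Boost_Type=Bagging", "Spline=2"}.
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Sketch (hypothetical helper, not part
of TMVA) of separating the \code{Boost_} options from the regular
classifier options in a booking option string.
}
\end{codeexample}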
The boost configuration options are given in Option
Table~\ref{opt:mva::boost}.
\begin{option}[!t]
\input optiontables/MVA__Boost.tex
\caption[.]{\optionCaptionSize Boosting configuration options. These
  options can simply be added to the option string of a regular classifier,
  or used to form the option string of an explicitly booked boosted
  classifier.}
\label{opt:mva::boost}
\end{option}
The options most relevant for the boost process are the number of boost
iterations, \code{Boost_Num}, and the choice of the boost algorithm,
\code{Boost_Type}. For \code{Boost_Type=AdaBoost}, the option
\code{Boost_Num} sets the maximum number of boosts: the algorithm
is iterated until an error rate of 0.5 is reached or until
\code{Boost_Num} iterations have been performed. If the algorithm terminates
after too few iterations, their number might be increased by decreasing the
$\beta$ parameter (option \code{Boost_AdaBoostBeta}). Within the
AdaBoost algorithm, each event must be classified as signal-like or
background-like, a decision that is otherwise left to the user. For some
classifiers, a cut on the MVA response that defines signal-like events is
straightforward to set. For the others, the cut on the MVA response is chosen
such that the misclassification rate is minimised. The option \code{Boost_RecalculateMVACut}
determines whether this cut is recomputed in every boosting iteration.
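
For classifiers without a natural cut, the choice of the cut that minimises the
misclassification rate can be sketched as a single scan over the sorted responses.
The sketch assumes that events with a response above the cut are called signal-like
and that the error rate is the weighted fraction of misclassified events; the
\code{Event} struct and the function \code{FindOptimalCut} are hypothetical names,
and the actual TMVA implementation may differ.

\begin{codeexample}
\begin{tmvacode}
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal event record (not a TMVA class).
struct Event {
   double mva;       // classifier response
   double weight;    // current event weight
   bool   isSignal;  // true class of the event
};

// Sketch: return the cut c minimising the weighted misclassification rate,
// calling events with mva > c signal-like and all others background-like.
// The rate at the optimum is written to *errOut if errOut is non-null.
double FindOptimalCut(std::vector<Event> evts, double* errOut = nullptr)
{
   assert(!evts.empty());
   std::sort(evts.begin(), evts.end(),
             [](const Event& a, const Event& b) { return a.mva < b.mva; });
   double wTot = 0.0, wSig = 0.0;
   for (const Event& e : evts) {
      wTot += e.weight;
      if (e.isSignal) wSig += e.weight;
   }
   assert(wTot > 0.0);
   // Start with a cut below all responses: every event is called signal,
   // so exactly the background weight is misclassified.
   double bestCut = evts.front().mva - 1.0;
   double misW    = wTot - wSig;
   double bestMis = misW;
   for (std::size_t i = 0; i < evts.size(); ++i) {
      // Raising the cut to evts[i].mva turns event i into background-like.
      misW += evts[i].isSignal ? evts[i].weight : -evts[i].weight;
      // Evaluate only after all events with the same response have moved.
      if (i + 1 < evts.size() && evts[i + 1].mva == evts[i].mva) continue;
      if (misW < bestMis) { bestMis = misW; bestCut = evts[i].mva; }
   }
   if (errOut) *errOut = bestMis / wTot;
   return bestCut;
}
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Sketch (hypothetical, not TMVA source)
of choosing the cut on the MVA response that minimises the weighted
misclassification rate.
}
\end{codeexample}
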
When Bagging is used as the boost algorithm, the number of boosting
iterations always equals \code{Boost_Num}.
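
The role of $\beta$ can be illustrated with a single AdaBoost reweighting step,
assuming the common AdaBoost form: with the weighted error rate $\epsilon$ of the
current classifier, the weights of misclassified events are multiplied by the boost
weight $\alpha=\left((1-\epsilon)/\epsilon\right)^{\beta}$, and all weights are then
renormalised. A smaller $\beta$ reweights more gently, so that $\epsilon$ approaches
0.5 more slowly, which is consistent with the statement above that decreasing $\beta$
can extend the number of iterations. The function \code{AdaBoostStep} is a
hypothetical sketch, not TMVA source; returning 0 to signal termination (also for
$\epsilon=0$, where $\alpha$ would diverge) is our own convention.

\begin{codeexample}
\begin{tmvacode}
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of one AdaBoost reweighting step (assumed common form, not TMVA
// source). misclassified[i] flags events the current classifier got wrong.
// Returns the boost weight alpha = ((1-err)/err)^beta, or 0 to signal that
// boosting should stop (err >= 0.5, or err == 0 where alpha would diverge).
double AdaBoostStep(std::vector<double>& weights,
                    const std::vector<bool>& misclassified, double beta)
{
   assert(!weights.empty() && weights.size() == misclassified.size());
   double wTot = 0.0, wMis = 0.0;
   for (std::size_t i = 0; i < weights.size(); ++i) {
      wTot += weights[i];
      if (misclassified[i]) wMis += weights[i];
   }
   const double err = wMis / wTot;
   if (err >= 0.5 || err <= 0.0) return 0.0;  // weights left unchanged
   const double alpha = std::pow((1.0 - err) / err, beta);
   double wNew = 0.0;
   for (std::size_t i = 0; i < weights.size(); ++i) {
      if (misclassified[i]) weights[i] *= alpha;  // boost misclassified events
      wNew += weights[i];
   }
   for (double& w : weights) w *= wTot / wNew;     // keep total weight fixed
   return alpha;
}
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Sketch (hypothetical, not TMVA source)
of one AdaBoost reweighting step with the exponent $\beta$.
}
\end{codeexample}
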
By default, the classifiers in the boost ensemble are combined as a weighted
average, with weights computed from their misclassification errors (option
\code{Boost_MethodWeightType=ByError}). The arithmetic average can be used
instead (\code{Boost_MethodWeightType=Average}).
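
The two combination modes can be sketched as a normalised weighted sum of the member
responses $y_k$. For \code{ByError} we assume the usual AdaBoost weights $\ln\alpha_k$,
where $\alpha_k$ is the boost weight of member $k$; for \code{Average} all weights are
equal. The function \code{CombineResponses} is hypothetical, and the sketch assumes
$\alpha_k>1$ (that is, error rates below 0.5) so that all weights are positive.

\begin{codeexample}
\begin{tmvacode}
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: combine the responses y_k of the boost ensemble members.
// byError == true : weighted average with weights ln(alpha_k)
//                   (assumed AdaBoost-style choice; needs alpha_k > 1)
// byError == false: plain arithmetic average
double CombineResponses(const std::vector<double>& responses,
                        const std::vector<double>& alphas, bool byError)
{
   assert(!responses.empty() && responses.size() == alphas.size());
   double num = 0.0, den = 0.0;
   for (std::size_t k = 0; k < responses.size(); ++k) {
      const double w = byError ? std::log(alphas[k]) : 1.0;
      assert(w > 0.0);
      num += w * responses[k];
      den += w;
   }
   return num / den;
}
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Sketch (hypothetical, not TMVA source)
of the \code{ByError} and \code{Average} combinations of the ensemble responses.
}
\end{codeexample}
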
\subsubsection{Boostable classifiers}
The boosting process was originally introduced for simple classifiers.
The most commonly boosted classifier is the decision tree (DT -- \cf Sec.~\ref{sec:bdt},
page~\pageref{sec:bdt}). Decision trees need to be boosted a few hundred
times to effectively stabilise the BDT response and achieve optimal
performance.
Another simple classifier in the TMVA package is the Fisher
discriminant~(\cf Sec.~\ref{sec:fisher}, page~\pageref{sec:fisher} -- which is equivalent
to the linear discriminant described in Sec.~\ref{sec:ld}). Because the output
of a Fisher discriminant is a linear combination of the input variables,
a linear combination of different Fisher discriminants is again a linear
discriminant. Hence linear boosting cannot improve the performance. It is
nevertheless possible to boost a linear discriminant effectively by applying
the linear combination not to the discriminant outputs themselves, but to the
classification decisions derived from them.\footnote
{
Note that in the TMVA standard example, which uses linearly correlated,
Gaussian-distributed input variables for signal and background, a
single Fisher discriminant already provides the theoretically maximal
separation power. Hence, for this example, no further gain can be
expected by boosting, no matter what ``tricks'' are applied.
}
This corresponds to a ``non-linear'' transformation of the
Fisher discriminant output according to a step function. The Boost
method in TMVA also features a fully non-linear transformation that is
directly applied to the classifier response value. Overall, the following
transformations are available:
\begin{itemize}
\item{\em linear:} no transformation is applied to the MVA output,
\item{\em step:} the output is $-1$ below the step
and $+1$ above (default setting),
\item{\em log:} logarithmic transformation of the output.
\end{itemize}
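
The effect of the transformation choice can be checked numerically. With the
{\em linear} transformation, a weighted sum of linear discriminants
$y_k(\mathbf{x})=a_{k,0}+\sum_i a_{k,i}x_i$ folds into a single linear discriminant,
$\sum_k w_k y_k(\mathbf{x}) = \sum_k w_k a_{k,0} + \sum_i \left(\sum_k w_k a_{k,i}\right) x_i$,
whereas the {\em step} transformation turns the ensemble into a weighted vote that is
a non-linear function of $\mathbf{x}$. The sketch implements only the linear and step
cases; the exact form of the log transformation is not specified here and is therefore
omitted. All names are hypothetical, and the step position \code{cut} is assumed to be
the MVA cut discussed above.

\begin{codeexample}
\begin{tmvacode}
#include <cassert>
#include <cstddef>
#include <vector>

// Linear (Fisher-type) discriminant y(x) = a0 + sum_i a[i] * x[i].
struct Linear {
   double a0;
   std::vector<double> a;
   double operator()(const std::vector<double>& x) const {
      assert(x.size() == a.size());
      double y = a0;
      for (std::size_t i = 0; i < a.size(); ++i) y += a[i] * x[i];
      return y;
   }
};

// "step" transformation: -1 below the step position, +1 above it.
double StepTransform(double y, double cut) { return y > cut ? 1.0 : -1.0; }

// With the "linear" transformation (response used unchanged), the weighted
// ensemble sum_k w[k] * y_k(x) is itself one linear discriminant:
// offsets and coefficients simply add up with the weights w[k].
Linear FoldLinearEnsemble(const std::vector<Linear>& ds,
                          const std::vector<double>& w)
{
   assert(!ds.empty() && ds.size() == w.size());
   Linear out{0.0, std::vector<double>(ds[0].a.size(), 0.0)};
   for (std::size_t k = 0; k < ds.size(); ++k) {
      assert(ds[k].a.size() == out.a.size());
      out.a0 += w[k] * ds[k].a0;
      for (std::size_t i = 0; i < out.a.size(); ++i)
         out.a[i] += w[k] * ds[k].a[i];
   }
   return out;
}

// Step-transformed ensemble: a weighted vote of +-1 decisions, which is a
// piecewise-constant function of x and cannot be folded in the same way.
double StepEnsemble(const std::vector<Linear>& ds, const std::vector<double>& w,
                    const std::vector<double>& x, double cut)
{
   assert(ds.size() == w.size());
   double sum = 0.0;
   for (std::size_t k = 0; k < ds.size(); ++k)
      sum += w[k] * StepTransform(ds[k](x), cut);
   return sum;
}
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Sketch (hypothetical, not TMVA source)
contrasting the linear and step transformations for an ensemble of linear
discriminants.
}
\end{codeexample}
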
The macro \code{Boost.C}, located in the \code{macros} directory of the
sourceforge version of TMVA (\code{test} directory in the ROOT version), provides
examples for the use of these transformations to boost a Fisher discriminant.
Note that the performance of a boosted classifier strongly depends on the
characteristics of the underlying classifier as well as on the nature of the
input data. The options must be adjusted carefully when AdaBoost is applied to
an arbitrary classifier, since otherwise boosting may even degrade the
performance relative to the unboosted classifier.
\subsubsection{Monitoring tools}
The current GUI provides figures to monitor the boosting process. Plotted are
the boost weights, the classifier weights in the boost ensemble, the classifier
error rates, and the classifier error rates using unboosted event weights.
In addition, when the option \code{Boost_MonitorMethod=T} is set,
monitoring histograms are created for each classifier in the boost ensemble.
The histograms generated during the boosting process provide useful insight
into the behaviour of the boosted classifiers and help to determine the optimal
number of boost iterations. These histograms are saved in a separate folder
in the output file, within the folder {\tt MethodBoost/<Title>/}.
Besides the specific classifier monitoring histograms, this
folder also contains the MVA response of the classifier for the training
and testing samples.
\subsubsection{Variable ranking}
The present boosted classifier implementation does not provide a ranking of
the input variables.