\subsection {Ten Pre-processors}

In this study, we investigate:
\bi
\item Three {\em simple preprocessors}: {\bf none}, {\bf norm}, and {\bf log};
\item One {\em feature synthesis} method: {\bf PCA};
\item Two {\em feature selection} methods: {\bf SFS} (sequential forward selection) and {\bf SWR} (stepwise regression);
\item Four {\em discretization} methods, based on equal-frequency or equal-width binning.
\ei
{\bf None} is the simplest preprocessor: all values are left unchanged.

With the {\bf norm} preprocessor,
numeric values are normalized
to the
0-1 interval using Equation \ref{equation:normalization}. Normalization ensures
that no variable has a greater influence than any other.
\begin{equation}
\small
normalizedValue = \frac{(actualValue - min(allValues))}{(max(allValues) - min(allValues))}
\label{equation:normalization}
\end{equation}
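A minimal sketch of Equation \ref{equation:normalization} follows (Python is used here purely for illustration; the experiments in this study were run in MATLAB):

```python
def normalize(values):
    """Min-max normalization of a column to the 0-1 interval.

    Assumes max(values) > min(values); a constant column would
    need special handling (division by zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```

After this transform, the smallest value maps to 0 and the largest to 1, so all columns share a common scale.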

With the {\bf log} preprocessor, all numeric values are replaced with their logarithms. This {\bf log}ging
procedure minimizes the effect of the occasional very large numeric value.
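As a sketch (assuming strictly positive values; real data containing zeros or negatives would need an offset first):

```python
import math

def log_transform(values):
    # Replace each numeric value with its (natural) logarithm,
    # damping the influence of occasional very large values.
    # Assumption: all values are strictly positive.
    return [math.log(v) for v in values]
```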

Principal component analysis~\cite{Alpaydin2004}, or
{\bf PCA}, is a {\em feature synthesis} preprocessor that
converts a number of possibly correlated variables into a smaller number of uncorrelated variables called components. The first component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.
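One standard way to compute such components is via the singular value decomposition of the mean-centered data; the sketch below (not the study's own implementation) projects the rows of a data matrix onto the top-$k$ components:

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components.

    Sketch: center the columns, take the SVD of the centered matrix;
    the rows of Vt are the component directions, ordered by the
    variance they explain. The returned scores are uncorrelated.
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```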

Some of the preprocessors aim at finding a subset of all features according to certain criteria,
namely
{\bf SFS} (sequential forward selection) and {\bf SWR} (stepwise regression).
{\bf SFS} adds features into an initially empty set until no improvement is possible with the addition of another feature. Whenever the selected feature set is enlarged, some oracle is called to assess the value
of that set of features. In this study,
we used the MATLAB \textit{objective} function (which reports
the mean-squared error of a simple linear regression on the training set).
One caution to be made here is that an exhaustive search over all feature subsets can be very time consuming ($2^n$ combinations for an \textit{n}-feature dataset); therefore SFS works only in the forward direction (no backtracking).
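The greedy loop can be sketched as follows, with a generic `score` callback standing in for the oracle (in this study, the training-set MSE of a linear regression); lower scores are assumed better:

```python
def sfs(features, score):
    """Sequential forward selection: greedily grow a feature set.

    `score(subset)` is the oracle; lower is better. The loop stops
    as soon as no single additional feature improves the score
    (forward direction only, no backtracking).
    """
    selected, best = [], float("inf")
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            return selected
        trial = min(candidates, key=lambda f: score(selected + [f]))
        trial_score = score(selected + [trial])
        if trial_score >= best:  # no improvement: stop
            return selected
        selected.append(trial)
        best = trial_score
```

Each pass scores at most $n$ candidate extensions, so the whole run costs $O(n^2)$ oracle calls rather than $2^n$.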

{\bf SWR} adds and removes features from a multilinear model.
Addition and removal are controlled by the p-value of an F-statistic.  At
each step, the F-statistics for two models (with and without
one feature) are
calculated.  If the feature is not yet in the model, the
null hypothesis is: ``the feature would have a zero coefficient in the
model, were it added''.  If the null hypothesis can be rejected,
then the feature is added to the model.  In the other scenario
(i.e., the feature is already in the model), the null hypothesis is:
``the feature has a zero coefficient''.  If we fail to reject the null
hypothesis, then the feature is removed.
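The add/remove loop can be sketched as below. The p-value computation is abstracted behind a callback, and the entry/exit thresholds (0.05 and 0.10) are the conventional defaults, assumed here for illustration:

```python
def stepwise(features, pvalue, p_enter=0.05, p_remove=0.10):
    """Stepwise regression skeleton (illustrative sketch only).

    `pvalue(f, included)` returns the F-test p-value for feature f's
    coefficient, given a model built on `included`. Thresholds are
    the conventional 0.05 entry / 0.10 exit values (an assumption).
    Note: production implementations also guard against cycling
    (repeatedly adding and removing the same feature).
    """
    included = []
    while True:
        changed = False
        # Forward step: add the most significant excluded feature.
        excluded = [f for f in features if f not in included]
        if excluded:
            best = min(excluded, key=lambda f: pvalue(f, included))
            if pvalue(best, included) < p_enter:
                included.append(best)
                changed = True
        # Backward step: drop the least significant included feature.
        if included:
            worst = max(included, key=lambda f: pvalue(f, included))
            if pvalue(worst, included) > p_remove:
                included.remove(worst)
                changed = True
        if not changed:
            return included
```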

%Basically the steps followed by the method are as follows:
%
%\begin{itemize}
%\item Fit an initial model.
%\item If some features have low p-values (lower than a pre-defined threshold, which is chosen to be $0.05$ in our study), i.e. if it is likely that they would have non-zero coefficients when added to the model, then add the one with the lowest p-value, then repeat this procedure. 
%If all terms have higher p-values than the threshold value, then proceed with the next step.
%\item If some of the features have p-values, which are greater than an exit threshold (the exit threshold we used in our study is \textit{0.10} \textit{AND} \textit{maximum p-value of previous steps}, then remove the feature with the highest p-values. 
%Remember that having a high p-value means that it is likely that we will fail to reject the null hypothesis of having a zero coefficient.
%\item Exit when none of the above steps improves the model.
%\end{itemize}

{\em Discretizers} are preprocessors that map every numeric value in a column of data
into a small number of discrete values:
\bi
\item {\bf width3bin:} This procedure clumps the data features into 3 bins, each of equal width, computed by
Equation \ref{equation:binning} (where $n$ is the number of bins).

\begin{equation}\small
binWidth = ceiling\left(\frac{max(allValues) - min(allValues)}{n}\right)
\label{equation:binning}
\end{equation}
\item {\bf width5bin:} Same as {\bf width3bin} except we use 5 bins.
\item {\bf freq3bin:} Generates 3 bins of equal population size.
\item {\bf freq5bin:} Same as {\bf freq3bin}, only this time we have {\em 5} bins.
\ei
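Both binning schemes can be sketched as follows (Python for illustration only); `width_bins` implements Equation \ref{equation:binning}, and `freq_bins` the equal-population variant:

```python
import math

def width_bins(values, n=3):
    """Equal-width binning: map each value to a bin index 0..n-1
    using the fixed bin width of Equation (binning).
    Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    width = math.ceil((hi - lo) / n)
    return [min(int((v - lo) // width), n - 1) for v in values]

def freq_bins(values, n=3):
    """Equal-frequency binning: each bin receives (roughly) the
    same number of values, assigned by sorted rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    size = math.ceil(len(values) / n)
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank // size
    return bins
```

With `n=5` the same two functions yield {\bf width5bin} and {\bf freq5bin}.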

