Numerous algorithmic methods exist to estimate effort.  While this paper cannot comprehensively assess all of them, it assesses a subset of algorithms using a combination-of-algorithms approach.  In this way, different data preprocessors are combined with different learner algorithms, so that each new preprocessor or learner generates a larger number of algorithm combinations.  Each combination sends the data to the preprocessor, and then sends that output to the learner to obtain an estimate.  Both the preprocessors and learners are from the WEKA\cite{weka} data mining toolkit.
\\
Multiple methods are used within the literature to assess the accuracy of estimation across a dataset.  As this paper seeks to evaluate these different measures, multiple measures of error will be used and compared.  In addition, error measures that are a synthesis of multiple error measures will be assessed to see if collective error measures are more stable.
\\ 
These different error measures will be compared using paired Mann-Whitney (Wilcoxon rank-sum) statistical tests to determine which algorithms perform better, that is, which have a significantly different error distribution.  In each possible comparison an algorithm can win, tie (if the two distributions are statistically similar), or lose.  Algorithms will be ranked on the win and loss counts from these comparisons.
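As an illustrative sketch of this win/tie/loss ranking (the function names and the use of median error to pick the better algorithm are our own choices; the statistical test is abstracted into a caller-supplied predicate):

```python
from statistics import median

def rank_algorithms(errors, significantly_different):
    """errors: algorithm name -> list of error scores (lower is better).
    `significantly_different` stands in for the paired statistical test.
    Returns name -> (wins, ties, losses); the better algorithm in a
    significant comparison is taken to be the one with the lower median
    error (an illustrative choice)."""
    record = {name: [0, 0, 0] for name in errors}
    names = list(errors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if not significantly_different(errors[a], errors[b]):
                record[a][1] += 1   # tie: similar distributions
                record[b][1] += 1
            elif median(errors[a]) < median(errors[b]):
                record[a][0] += 1   # a wins, b loses
                record[b][2] += 1
            else:
                record[b][0] += 1
                record[a][2] += 1
    return {name: tuple(counts) for name, counts in record.items()}
```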
\\
The ranking will also be done across multiple datasets, both to provide a broader reference and to potentially gain information about the datasets themselves.  Which algorithms perform well on a given dataset could be indicative of that dataset's terrain.
\\
This section will discuss the preprocessors, learners, and error measures used, as well as the datasets to which they were applied.

\subsection{Preprocessors}
Before being passed to the learners, the data was run through a preprocessor.  Some preprocessors change the values of the data, and others change the shape of the dataset by converting it into a representative model.  The specific preprocessors used are discussed below.
\subsubsection{None (none)} 
The data is passed to the learner without any preprocessing performed.\\
\subsubsection{Logarithmic (log)}
The features in the data are replaced with the natural log of their values.  This reduces the distance between values, affecting many of the learners the data is sent to.\\
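A minimal sketch of this transformation (assuming all values are positive):

```python
import math

def log_transform(rows):
    """Replace every numeric value with its natural logarithm
    (values are assumed positive)."""
    return [[math.log(v) for v in row] for row in rows]
```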
\subsubsection{Equal Frequency Discretization (freq3bin, freq5bin)}
The numeric data is discretized into a number of bins, with each bin holding an equal number of items, or frequency.  Items are assigned to bins based on their numeric value.  For example, a 2-bin frequency discretization of the data\\
$\{1, 4, 2, 8, 3, 9\}$\\
would produce a bin $\{1, 2, 3\}$ and a bin $\{4, 8, 9\}$.  Equal Frequency Discretization is performed with 3 bins and 5 bins in this experiment.\\
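A minimal sketch of equal frequency binning (WEKA's implementation may break ties between equal values differently):

```python
def equal_frequency_bins(values, n_bins):
    """Sort values and split them into n_bins bins of (near-)equal size;
    any remainder is spread over the earliest bins."""
    ordered = sorted(values)
    size, remainder = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < remainder else 0)
        bins.append(ordered[start:end])
        start = end
    return bins
```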
\subsubsection{Equal Width Discretization (width3bin, width5bin)}
The numeric data is discretized into a number of bins, with each bin spanning an equal range of values.  The width of each bin is computed as:\\
$\frac{\mbox{max value} - \mbox{min value}}{\mbox{number of bins}}$\\
To provide an example,\\
$\{1, 2, 3, 4, 8, 9\}$\\
passed through a 2-bin Equal Width Discretization would produce a bin containing $\{1, 2, 3, 4\}$ and a bin containing $\{8, 9\}$.  Equal Width Discretization is done in this experiment with 3 and 5 bins.\\
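A minimal sketch of equal width binning, reproducing the example above:

```python
def equal_width_bins(values, n_bins):
    """Split values into n_bins bins of equal numeric width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        # clamp the maximum value into the last bin; constant data
        # (width == 0) all falls into the first bin
        idx = min(int((v - lo) / width), n_bins - 1) if width else 0
        bins[idx].append(v)
    return bins
```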
\subsubsection{Normalization (norm)}
Each numeric entry in the data is replaced with a normalized value, computed as:\\
$\frac{\mbox{value} - \mbox{min value}}{\mbox{max value} - \mbox{min value}}$\\
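A minimal sketch of min-max normalization:

```python
def normalize(values):
    """Map each value into [0, 1] by min-max normalization."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```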
\subsubsection{Principal Component Analysis (pca)}
Principal Component Analysis transforms the features of a dataset into a new set of uncorrelated features, the principal components.  Each component is a linear combination of the original features, chosen to capture as much of the remaining variance as possible while being orthogonal to the components already selected.  Components are added until the variance they account for falls below a threshold, reducing the dimensionality of the data.
\subsubsection{BAMBOO}
All numeric values are logged, and the range of highest variance (standard deviation) for each column of data, or attribute, is found.  Rows that fall in those highest-variance ranges are removed until the number of rows remaining is $\sqrt{n}$, where $n$ is the original number of rows.
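The description above leaves the removal rule underspecified; the following is a speculative sketch, with the widest-spread rows interpreted as those farthest from the column means, measured in standard deviations:

```python
import math
import statistics

def bamboo_preprocess(rows):
    """Speculative sketch of the BAMBOO preprocessor: log all numerics,
    then repeatedly drop the row lying in the widest-spread region until
    sqrt(n) rows remain.  The removal rule here is our reading of the
    description, not a reference implementation."""
    logged = [[math.log(v) for v in row] for row in rows]
    target = round(math.sqrt(len(logged)))
    while len(logged) > target:
        cols = list(zip(*logged))
        means = [statistics.mean(c) for c in cols]
        sds = [statistics.pstdev(c) or 1.0 for c in cols]
        def spread(row):
            # total distance from the column means, in standard deviations
            return sum(abs(v - m) / s
                       for v, m, s in zip(row, means, sds))
        logged.remove(max(logged, key=spread))
    return logged
```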

\subsection{Learners}
After the data has been preprocessed, it is passed to a learner which makes an estimate on the effort required.
\subsubsection{Simple Linear Regression (SLReg)}
Simple Linear Regression applies $n$-dimensional linear regression to the data, attempting to determine a correlation between the attributes and the effort value they generate.  The instance to be tested is then placed on the regression to estimate its effort value.
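A sketch of the underlying least-squares fit on a single feature (attribute selection and WEKA's exact procedure are omitted; the function name is ours):

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept on one feature.
    Returns a function that estimates y for a new x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept
```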
\subsubsection{Partial Least Squares Regression (plsr)}
Partial Least Squares Regression separates the predicting features from the target feature of effort.  It treats the two as separate matrices, and tries to find a direction in the space of the predicting features that accounts for the variation in the target feature.  This direction serves as a regression model, on which the unknown project can be placed and an estimate obtained.
\subsubsection{$k$-Nearest Neighbor (1NN, ABE0)}
The $k$-Nearest Neighbor learner finds the $k$ projects in the data that are closest to the unknown project, and then reports the mean of those projects' efforts as the estimate for the new project's effort.  In this experiment, $k$ is set to $1$ (\emph{1NN}) and $5$ (\emph{ABE0}) for two different runs.
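A minimal sketch of this estimator:

```python
import math

def knn_estimate(train, efforts, query, k):
    """Mean effort of the k nearest (Euclidean) historical projects;
    k=1 gives 1NN and k=5 gives ABE0."""
    ranked = sorted((math.dist(row, query), effort)
                    for row, effort in zip(train, efforts))
    nearest = [effort for _, effort in ranked[:k]]
    return sum(nearest) / len(nearest)
```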
\subsubsection{Zero R (ZeroR)}
ZeroR takes the mean of all effort values in the historical instances and reports it as the estimate.
\subsubsection{Neural Net (nnet)}
A multilayer perceptron is applied to the historical instances, attempting to create a network of operations on the feature values that produces an estimate for the target feature.  This is done by a stochastic search process in which an initial network is created, and then a change is made to that network.  If the change reduced the error, the new network is kept and a further change is made to it.  Otherwise, the previous network is kept.  After a number of iterations, a stopping criterion is reached and a final network is produced.  The unknown project is fed into the inputs of this network, and the output is reported as the estimate.
\subsubsection{Decision Stump (DecisionStump)}
A decision stump is a decision tree with only a single level.  A single attribute is used to predict the effort.
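A sketch of a regression stump (the split selection shown, minimizing squared error over one attribute, is an illustrative choice):

```python
def decision_stump(xs, ys):
    """Fit a one-level regression tree on a single attribute: try each
    observed value as a split point, keep the split with the lowest
    squared error, and predict the mean effort on each side."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right) if right else left_mean
            best = (cost, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean
```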
\subsubsection{KStar (KStar)}
Similar to $k$-Nearest Neighbor, KStar is another instance-based learner.  It uses an entropic distance measure rather than Euclidean distance.
\subsubsection{BAMBOO}
BAMBOO can be used as a learner as well, preferably when BAMBOO (or some other form of row reduction) was also used as the preprocessor, because it would be very time consuming otherwise.  As a learner, it takes a test instance and ranks all points in the dataset by their Euclidean distance to that test point.  It then takes the $k$ nearest neighbors, skipping each with a 33\% chance.  It continues to take nearest neighbors, still with a 33\% chance of skipping, as long as the total variance of the chosen points decreases.  It then finds the median of those points.  This process is repeated 10 times, and the median of all those medians is chosen.  The effort value of that point is adopted as the test instance's estimated effort.
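The following is a speculative sketch of this procedure; the exact stopping rule and the handling of the round medians are our reading of the description:

```python
import math
import random
import statistics

def bamboo_estimate(train, efforts, query,
                    k=5, rounds=10, skip_p=0.33, rng=None):
    """Speculative sketch of the BAMBOO learner: rank rows by Euclidean
    distance, walk the ranking while skipping each row with probability
    skip_p, grow past k only while variance keeps dropping, and return
    the median of the per-round medians."""
    rng = rng or random.Random()
    ranked = sorted(zip(train, efforts),
                    key=lambda pair: math.dist(pair[0], query))
    medians = []
    for _ in range(rounds):
        chosen = []
        for _, effort in ranked:
            if rng.random() < skip_p:   # 33% chance of skipping a neighbor
                continue
            if len(chosen) >= k:
                # past the first k, keep taking neighbors only while the
                # variance of the chosen efforts keeps decreasing
                if (statistics.pvariance(chosen + [effort])
                        >= statistics.pvariance(chosen)):
                    break
            chosen.append(effort)
        # fall back to the nearest effort if every row was skipped
        medians.append(statistics.median(chosen or efforts[:1]))
    return statistics.median(medians)
```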

\subsection{Error Measures}
For each dataset, leave-one-out analysis is performed.  In this way, each instance in the dataset is removed in turn and its effort is estimated under all available preprocessor and learner combinations, with the estimates stored.  Once all estimates have been found, collective error measures are computed for each combination of preprocessor and learner on each dataset.  The error measures used are detailed below.
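The leave-one-out loop can be sketched as follows, with `make_estimate` standing in for any preprocessor-plus-learner combination:

```python
def leave_one_out(dataset, make_estimate):
    """Estimate each instance from all the others.
    make_estimate(train_rows, test_row) -> estimated effort."""
    estimates = []
    for i, row in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]   # hold out instance i
        estimates.append(make_estimate(train, row))
    return estimates
```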
\subsubsection{Mean Absolute Residual (MAR)}
The absolute residual error of an estimate is computed as:\\
$|actual - predicted|$ \\
After the absolute residuals have been calculated across a dataset, their mean is taken and reported for the error value.
\subsubsection{Mean Magnitude of Relative Error (MMRE)}
The magnitude of relative error is calculated across a dataset, computed as:\\
$\frac{|actual - predicted|}{actual}$\\
After the magnitude of relative error for each instance in the dataset is computed, their mean is reported.
\subsubsection{Mean Magnitude of Error Relative to the Estimate (MMER)}
The magnitude of relative error is computed, but in contrast to MRE it is computed relative to the estimate, as follows:\\
$\frac{|actual - predicted|}{predicted}$\\
After the MER is computed for each instance in the dataset, their mean is computed and reported.
\subsubsection{Median Magnitude of Relative Error (MDMRE)}
As MMRE, but the median of the MRE values across the dataset is computed and reported.
\subsubsection{Pred25}
The number of instances in a dataset whose estimate has an MRE of less than 25\% is divided by the total number of instances, and reported as the Pred25 score.
\subsubsection{Mean Balanced Relative Error (MBRE)}
Balanced relative error is computed as:\\
$\frac{|actual - predicted|}{\min(actual,\ predicted)}$\\
That is, the dividing term is the smaller of the actual and predicted values.  Once the BRE scores have been computed for each instance in the dataset, their mean is computed and reported.
\subsubsection{Mean Inverted Balanced Relative Error (MIBRE)}
Inverted balanced relative error is computed as:\\
$\frac{|actual - predicted|}{\max(actual,\ predicted)}$\\
That is, the dividing term is the larger of the actual and predicted values.  Once the IBRE scores have been computed for each instance in the dataset, their mean is computed and reported.
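The error measures above can be computed together as follows (actual and predicted efforts are assumed positive, so no division by zero occurs):

```python
from statistics import mean, median

def error_measures(actuals, predictions):
    """Compute the error measures defined above for paired actual and
    predicted efforts (both assumed positive)."""
    pairs = list(zip(actuals, predictions))
    ar = [abs(a - p) for a, p in pairs]            # absolute residuals
    mre = [abs(a - p) / a for a, p in pairs]       # relative to actual
    mer = [abs(a - p) / p for a, p in pairs]       # relative to estimate
    bre = [abs(a - p) / min(a, p) for a, p in pairs]
    ibre = [abs(a - p) / max(a, p) for a, p in pairs]
    return {
        "MAR": mean(ar),
        "MMRE": mean(mre),
        "MMER": mean(mer),
        "MDMRE": median(mre),
        "Pred25": sum(r < 0.25 for r in mre) / len(mre),
        "MBRE": mean(bre),
        "MIBRE": mean(ibre),
    }
```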


\subsection{Data Sets}
The datasets used in this experiment were obtained from the PROMISE data repository\cite{promise}, which provides freely available software engineering data from real-world projects.  The COMBA software platform used could not handle discrete elements in the data, so discrete elements were removed before the data was sent to the preprocessors.  Information about each dataset, after removing discrete elements, is provided below.
\begin{center}
\begin{tabular} {| l || c | r |}
\hline
Dataset & Features & Instances \\ \hline
Albrecht & 8 & 24 \\ \hline
China & 18 & 499 \\ \hline
Cocomo81 & 17 & 63 \\ \hline
Cocomo81e & 17 & 28 \\ \hline
Cocomo81o & 17 & 21 \\ \hline
Cocomo81s & 17 & 11 \\ \hline
Desharnais & 11 & 81 \\ \hline
DesharnaisL1 & 11 & 46 \\ \hline
DesharnaisL2 & 11 & 25 \\ \hline
DesharnaisL3 & 11 & 10 \\ \hline
Finnish & 8 & 38 \\ \hline
Kemerer & 7 & 15 \\ \hline
Maxwell & 27 & 62 \\ \hline
Miyazaki94 & 8 & 48 \\ \hline
Nasa93 & 17 & 88 \\ \hline
Nasa93\_center\_1 & 17 & 12 \\ \hline
Nasa93\_center\_2 & 17 & 37 \\ \hline
Nasa93\_center\_5 & 17 & 39 \\ \hline
SDR & 24 & 24 \\ \hline
Telecom1 & 3 & 18 \\ 
\hline
\end{tabular}
\end{center}
