\subsection{Nine Learners}\label{sec:learners}

Based on our reading of the effort estimation literature, we identified nine commonly used learners that divide
into
\bi
\item Two {\em instance-based} learners: {\bf ABE0-1NN, ABE0-5NN};
\item Two {\em iterative dichotomizers}: {\bf CART(yes), CART(no)};
\item A {\em neural net}: {\bf NNet};
\item Four {\em regression methods}: {\bf LReg, PCR, PLSR, SWReg}.
\ei
{\em Instance-based learning} can be used for analog-based estimation.
A large class of   ABE algorithms was described in \fig{cbr}. Since it is
not practical to experiment with the 6000 options defined in \fig{cbr},
we focus on two standard variants.
ABE0 is our name for
a very basic type of ABE that we derived from
various ABE studies~\cite{Mendes2003, Li2009, Kadoda2000}.
In {\bf ABE0-xNN}, features are first normalized to the 0-1 interval,
then the distance between the test instance and each training instance
is measured with the Euclidean distance function,
the \textit{x} nearest neighbors are chosen from the training set,
and finally the estimated value is found by taking the median of those
\textit{x} nearest neighbors (a step known as the adaptation procedure).
We explored two different values of \textit{x}:
\bi
\item {\bf ABE0-1NN:} Only the closest analogy is used. 
Since the median of a single value is itself, the 
estimated value in {\bf ABE0-1NN} is the actual effort value of the closest analogy.
\item {\bf ABE0-5NN:} The 5 closest analogies are used for adaptation.
\ei
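The ABE0-xNN procedure above can be sketched as follows. This is a minimal sketch, not the exact implementation used in the study; \texttt{abe0\_knn} is a hypothetical helper name, and the guard against constant-valued features during normalization is our own assumption:

```python
import numpy as np

def abe0_knn(train_X, train_y, test_x, k=1):
    """ABE0-kNN sketch: min-max normalize features to 0-1, rank training
    instances by Euclidean distance to the test instance, and return the
    median effort of the k nearest analogies (the adaptation step)."""
    train_X = np.asarray(train_X, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    lo, hi = train_X.min(axis=0), train_X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns (assumption)
    norm_train = (train_X - lo) / span
    norm_test = (test_x - lo) / span
    dists = np.sqrt(((norm_train - norm_test) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return float(np.median(np.asarray(train_y, dtype=float)[nearest]))
```

With $k{=}1$ the median of a single value is that value itself, so the sketch reproduces the ABE0-1NN behavior described above.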
\textit{Iterative Dichotomizers}
seek the best attribute-value $splitter$; i.e., the one that most simplifies the data that
falls into the different splits.
Each such splitter becomes the root of a tree.
Sub-trees are generated by
calling iterative dichotomization recursively
on each of the splits.
The CART iterative dichotomizer~\cite{Breimann1984} is defined for continuous target concepts 
and its  $splitters$ strive to reduce the GINI index of the data that
falls into
each split.
In this study, we use two variants:
\bi
\item {\bf CART (yes):} This version prunes the generated tree using cross-validation.
For each cross-validation run, an internal node is made into a leaf (thus pruning its sub-nodes).
The sub-tree that results in the lowest error rate is returned.
\item {\bf CART (no):} Uses the full tree (no pruning).
\ei
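The splitter search at the root of such a tree can be sketched as below. Since the target (effort) is continuous, this sketch scores candidate splits on a single numeric feature by the weighted variance of the two resulting subsets; \texttt{best\_split} is a name of our own invention, and a full tree learner would apply it recursively over all features:

```python
import numpy as np

def best_split(x, y):
    """Find the attribute-value splitter on one numeric feature that
    most simplifies the data: minimize the weighted variance of the
    target values falling into the two resulting splits."""
    order = np.argsort(x)
    x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
    best_cut, best_score = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                          # no cut between equal values
        cut = (x[i] + x[i - 1]) / 2.0         # midpoint candidate splitter
        left, right = y[:i], y[i:]
        score = len(left) * left.var() + len(right) * right.var()
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut
```

On data whose target values separate cleanly, the returned cut falls between the two groups.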

In \textit{Neural Nets}, or {\bf NNet},
an input layer of project details
is connected to zero or more ``hidden'' layers, which then connect
to an output node (the effort prediction). The connections are weighted.
If the signals arriving at a node sum to more than some
threshold, the node ``fires'' and a weight is propagated
across the network.  Learning in a neural net
compares the output value to the expected value, then applies some
correction method (e.g.,
back propagation) to improve the edge weights.
Our {\bf NNet} uses three layers.
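The forward pass and weight-correction step described above can be sketched for a single hidden layer feeding a linear output node. \texttt{predict} and \texttt{backprop\_step} are hypothetical names, and the sigmoid activation and learning rate are our own assumptions rather than the paper's exact NNet configuration:

```python
import numpy as np

def sigmoid(z):
    """Smooth threshold: a node 'fires' in proportion to its summed input."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W1, W2):
    """Forward pass: weighted inputs reach the hidden layer, whose
    activations feed a linear output node (the effort prediction)."""
    return float(W2 @ sigmoid(W1 @ x))

def backprop_step(x, target, W1, W2, lr=0.1):
    """Compare the output to the expected value, then correct the edge
    weights by one gradient (back-propagation) step."""
    h = sigmoid(W1 @ x)
    err = (W2 @ h) - target                  # prediction error
    W2_new = W2 - lr * err * h               # output-layer correction
    dh = err * W2 * h * (1.0 - h)            # error pushed back through sigmoid
    W1_new = W1 - lr * np.outer(dh, x)       # hidden-layer correction
    return W1_new, W2_new
```

A single small step reduces the squared error on the training example it was computed from, which is the essence of the correction method named above.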

This study also uses four
\textit{regression methods}.
{\bf LReg} is a simple linear regression algorithm.
Given the dependent variable, this learner calculates coefficient estimates for the independent variables.
{\bf SWReg} is the stepwise regression discussed above. Whereas above {\bf SWReg} was used to
select features for other learners, here we use {\bf SWReg} as a learner (that is, the predicted
value is the regression result using the features selected by the last step of {\bf SWReg}).
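As an illustration of the selection idea, a greedy forward variant of stepwise regression can be sketched as below. \texttt{forward\_stepwise} is a hypothetical name; real stepwise implementations also drop previously added features and use statistical entry/exit criteria rather than a fixed feature budget:

```python
import numpy as np

def forward_stepwise(X, y, max_features=2):
    """Greedy forward selection sketch: at each step, add the feature
    whose inclusion most reduces the residual sum of squares of a
    linear regression (with intercept) on the selected features."""
    X = np.asarray(X, float); y = np.asarray(y, float)
    selected, remaining = [], list(range(X.shape[1]))

    def rss(cols):
        A = np.column_stack([np.ones(len(y)), X[:, cols]])
        coef = np.linalg.lstsq(A, y, rcond=None)[0]
        r = y - A @ coef
        return float(r @ r)

    while remaining and len(selected) < max_features:
        best = min(remaining, key=lambda j: rss(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A learner version of SWReg would then fit one final regression on the columns this loop returns.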
Partial Least Squares Regression ({\bf PLSR}) and Principal Components Regression ({\bf PCR})
are both algorithms used to model a dependent variable.
While modeling the dependent variable, they both construct new independent variables as linear combinations of the original independent variables.
However, the ways they construct the new independent variables differ.
{\bf PCR} generates new independent variables that explain the observed variability in the original ones.
While generating these new variables, the dependent variable is not considered at all.
In that respect, {\bf PCR} amounts to selecting \textit{n-many} components via {\bf PCA} (the default number of components is 2, so we used that value) and then applying linear regression.
{\bf PLSR}, on the other hand,
also considers the dependent variable and picks the \textit{n-many} new components (again with a default value of 2) that yield the lowest error rate.
Due to this particular property of {\bf PLSR}, it usually results in a better fit.
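The PCR side of this comparison can be sketched as follows: the new independent variables (principal components) are built from the features alone, never consulting the dependent variable, and ordinary linear regression is then run on the component scores. \texttt{pcr\_fit\_predict} is a hypothetical name; PLSR would differ only in letting the dependent variable guide the component construction:

```python
import numpy as np

def pcr_fit_predict(X, y, X_new, n_components=2):
    """PCR sketch: build n_components new independent variables from X
    alone (y is never consulted), then apply linear regression to the
    component scores and predict for new instances."""
    X = np.asarray(X, float); y = np.asarray(y, float)
    mu = X.mean(axis=0)
    Xc = X - mu
    # principal directions from the SVD of the centered feature matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                  # (features, n_components)
    scores = Xc @ V                          # the new independent variables
    A = np.column_stack([np.ones(len(y)), scores])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    new_scores = (np.asarray(X_new, float) - mu) @ V
    return np.column_stack([np.ones(len(new_scores)), new_scores]) @ coef
```

When \textit{n-many} equals the number of original features, the components span the full feature space and PCR coincides with ordinary linear regression; the interesting behavior is at smaller \textit{n}.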
