This section reviews the effort estimation literature with regard to
(a) the
major estimation techniques used by empirical cost estimation studies within the last 15 years and
(b) the conclusion instability problem.

\subsection{Algorithmic Methods}
There are many algorithmic effort estimators.
For example, if we
restrict ourselves to just instance-based algorithms, \fig{cbr} shows
that there are thousands of options in that one sub-field alone.

As to non-instance methods, there are many proposed in the literature,
including various kinds of regression (simple, partial least squares,
stepwise, regression trees) and neural networks, to name just a
few. For notes on these non-instance methods, see \tion{learners}.

Note that instance-based and non-instance-based methods can be combined to create even more algorithms. For example,
once an instance-based method finds its nearest neighbors, those
neighbors might be summarized with regression or
neural nets~\cite{Li2009}.
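For concreteness, the shared core of these instance-based estimators can be sketched as follows. The feature set and historical figures below are purely illustrative, and the median summarizer stands in for the regression or neural-net summarizers cited above:

```python
# Minimal sketch of analogy-based (instance-based) effort estimation:
# find the k nearest past projects, then summarize their efforts.
# Feature names and historical data here are illustrative, not real.

import math

def knn_estimate(new_project, past_projects, k=3):
    """Estimate effort as the median effort of the k nearest past projects.

    new_project: list of numeric features (e.g. size, team experience).
    past_projects: list of (features, effort) pairs.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(past_projects, key=lambda p: dist(p[0], new_project))
    efforts = sorted(e for _, e in ranked[:k])
    mid = len(efforts) // 2
    if len(efforts) % 2:
        return efforts[mid]
    return (efforts[mid - 1] + efforts[mid]) / 2

# Illustrative history: ([size_kloc, experience_years], effort_months)
history = [([10, 2], 12.0), ([12, 3], 14.0), ([50, 5], 60.0),
           ([55, 4], 66.0), ([11, 2], 13.0)]

print(knn_estimate([11, 2], history, k=3))  # -> 13.0
```

Replacing the median with a regression model fit over the retrieved neighbors yields the kind of hybrid method described above.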


\subsection {Non-Algorithmic Methods}
An alternative to algorithmic approaches (e.g., the instance-based methods of \fig{cbr})
is to utilize the knowledge of an experienced expert.
Expert-based estimation \cite{Jor2004e} is a human-intensive approach and the one most commonly adopted in practice. Estimates are usually produced by a domain expert, rather than an estimation expert, based on their personal experience and recollection of similar past projects in the organization. It is flexible and intuitive in the sense that it can be applied in a variety of circumstances where other estimating techniques do not work (for example, when historical data is lacking). Furthermore, in many cases requirements are simply unavailable at the bidding stage of a project, when a rough estimate is required in a very short period of time.

Jorgensen \cite{Jor2005b} provides guidelines for producing realistic software development effort estimates, derived from industrial experience and empirical studies. One important finding was that {\em combined estimation} offers the most robust and accurate expert-based method: combining estimates (for example, an analogy-based estimate with an expert-based one) captures a broader range of information relevant to the target problem. Data and knowledge relevant to the project's context and characteristics are more likely to improve prediction accuracy.
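In its simplest form, combined estimation pools independent estimates into a single figure; a minimal sketch, where the estimates and weights are hypothetical:

```python
# Minimal sketch of combined estimation: pool independent estimates
# (e.g. one expert-based, one analogy-based) into a single figure.
# The estimates and weights below are hypothetical.

def combine(estimates, weights=None):
    """Weighted mean of independent effort estimates (person-months)."""
    if weights is None:
        weights = [1.0] * len(estimates)
    return sum(e * w for e, w in zip(estimates, weights)) / sum(weights)

expert_estimate = 110.0    # from a domain expert's judgment
analogy_estimate = 130.0   # from comparison with past projects
print(combine([expert_estimate, analogy_estimate]))  # -> 120.0
```

In practice the weights would reflect the assessed reliability of each source, rather than the equal weighting used here.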

%In contrast, software estimation by analogy \cite{shepperd96} is a more formal and systematic approach to expert based estimation using direct comparison with one or more past projects. The distinction between expert based and analogy based in current software engineering research is that the former is a human-intensive approach and can based on variety of different methods such as rules of thumbs, personal recollection of past experiences etc. and the later is a data-intensive approach based on one or more specified potential analogous projects, and can be automated and repeated. The general principle of data-intensive analogy is to reuse software development experience in the form of past project cases stored in a project repository or a database, which includes what are considered to be the important project features of those projects from the point of view of their possible effect on development effort . An estimate of the effort to complete a new software project is made by analogy with one or more previously completed projects, based on the {\em K-Nearest Neighbour Algorithm} \cite{shepperd96,shepperd97}.

Although widely used in industry, there are no standard methods for
expert-based estimation. Shepperd et al. \cite{shepperd96} do not
consider expert-based estimation an empirical method because the
means of deriving an estimate are not explicit and therefore neither
repeatable nor easily transferable to other staff. In addition,
knowledge relevance is a problem: an expert may be unable
to justify estimates, or their validity, for a new application
domain. Hence, the rest of this paper does not consider non-algorithmic methods.


%\subsection {Data quality and effort estimation}
%Software effort estimation researchers have focused on the development of advanced algorithms and optimizing models in order to achieve better prediction accuracy measured by different evaluation criteria, so that the result is comparable to many existing prediction accuracy results produced. In many cases, the prediction performance improvements are less than significant in a statistical sense. Keung \cite{keung08b} argues that estimates are probabilistic in nature, they represents the most likely values based on historical data, an error-free prediction in software cost estimation is both empirically and theoretically impossible. Using different prediction systems may result in similar outcomes only with a small variation in the prediction accuracy, this is because different quality and characteristics of the dataset determine the {\em Theoretical Maximum Prediction Accuracy} (TMPA)\cite{keung08b} of any prediction system being used. Some prediction system may be more suitable than the others on a dataset, this is because the  algorithm within a prediction system is more suitable for the dataset characteristic. In Keung \cite{keung08b}'s experiment, to optimize the prediction accuracy, one approach is to dynamically select a method that would produce a favourable result given the actual effort is known for comparison, he used a different number of k-nearest neighbours for each data point estimate, resulting in an improved overall performance accuracy using the entire dataset. It shows that variance in the dataset drastically changed the prediction accuracy, rather than using different prediction models.  
%
%The dataset quality and their characteristics are usually overlooked in the development of a better algorithm for software effort estimation. Studies shown the use of a data preprocessor in the estimation experiment generally show a significant improvement in the estimation accuracy. Dataset homogenization will also result in improved performance in estimation. \cite{mendes_04}\cite{kitchenham_07}
%
%Without looking into the eminent issue of data quality and their characteristics, the research into the development of a better effort estimator had reached its destination. 
%If the above statement sustains, then the selection of a single useful evaluation criteria and the prediction algorithm are less important than the dataset quality itself. Research effort should be more focusing on the evaluation of dataset quality and its characteristics, including variance in the dataset. 
%

\input{ekrem/data}
\subsection{Conclusion Instability}

To derive stable conclusions about which estimator is ``best'',
there have been several attempts to compare the prediction
performance of different effort estimation approaches. For example,
Shepperd and Kadoda \cite{shepperd01b} compared regression, rule
induction, nearest neighbor, and neural nets, exploring
the relationship between accuracy, choice of prediction
system, and dataset characteristics. They also reported that a number of
conflicting results exist in the literature as to which method
provides better prediction accuracy, and offered possible explanations:
both the choice of evaluation criteria (such as MMRE) and the
underlying characteristics of the problem dataset
have a strong influence on the relative effectiveness of different
prediction models. Their work was a {\em simulation study}
that took a single dataset, then generated very large artificial datasets
using the distributions seen in that data. They concluded that:
\bi
\item
{\em None}
of these existing estimators  were consistently ``best'';
\item
The accuracy of an estimate depends on the dataset characteristic
and a suitable prediction model for the dataset. 
\ei
They concluded that it is
generally {\em infeasible} to determine which prediction technique
is ``best''.
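The MMRE criterion mentioned above is the mean, over all test projects, of the magnitude of relative error $|actual - predicted| / actual$. A minimal implementation, with hypothetical figures, shows how it scores two estimators on the same test set:

```python
# MMRE (Mean Magnitude of Relative Error), a common accuracy criterion
# in effort estimation: mean of |actual - predicted| / actual.

def mmre(actuals, predictions):
    errors = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)

# Toy example: two estimators scored on the same (hypothetical) test set.
actual = [100.0, 40.0, 10.0]
est_a  = [110.0, 36.0, 10.0]   # MREs: 0.10, 0.10, 0.00
est_b  = [100.0, 40.0, 15.0]   # MREs: 0.00, 0.00, 0.50

print(mmre(actual, est_a), mmre(actual, est_b))
```

Because MMRE averages relative errors, a single large relative miss (as in the second estimator) can dominate the score, which is one reason the choice of evaluation criterion can change which estimator looks ``best''.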
 
Recent results suggest that it is appropriate to revisit the conclusion instability hypothesis.
Menzies et al.~\cite{menzies11} applied 158 estimators to various subsets of two
COCOMO datasets. In a result consistent with Shepperd and Kadoda,
they found that the precise ranking of the 158 estimators
changed according to the random number seeds used to generate train/test sets,
the performance evaluation criteria used, and which subset of the data was used.
However, they also found that four methods consistently outperformed the other 154
across all datasets, across five different random number seeds, and across
three different evaluation criteria.
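The sensitivity of measured accuracy to the train/test seed can be illustrated with a small harness. The two estimators and the synthetic data below are hypothetical stand-ins, intended only to show how the scores (and hence potentially the ranking) shift with the seed:

```python
# Sketch: measured estimator accuracy shifts with the random seed used
# to split data into train/test sets. Data and estimators are synthetic.

import random

# Synthetic (size, effort) projects: effort roughly 1.2 * size plus noise.
rng = random.Random(0)
data = [(s, 1.2 * s + rng.gauss(0, 3)) for s in range(10, 70, 2)]

def mmre(pairs):
    """Mean magnitude of relative error over (actual, predicted) pairs."""
    return sum(abs(a - p) / a for a, p in pairs) / len(pairs)

def ratio_model(train, test):
    """Predict effort = (mean effort/size ratio on train) * size."""
    r = sum(e / s for s, e in train) / len(train)
    return [(e, r * s) for s, e in test]

def median_model(train, test):
    """Predict the median training effort for every test project."""
    efforts = sorted(e for _, e in train)
    return [(e, efforts[len(efforts) // 2]) for _, e in test]

for seed in range(3):
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    train, test = shuffled[:20], shuffled[20:]
    print(seed,
          round(mmre(ratio_model(train, test)), 3),
          round(mmre(median_model(train, test)), 3))
```

Each seed yields different MMRE scores for the same two estimators on the same underlying data, which is the mechanism behind the ranking instability reported above.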

Also, many more public-domain datasets are now readily available for stability studies.
\fig{datasets} lists 20 datasets that have become available in the last
year at the PROMISE repository of reusable SE data\footnote{\url{http://promisedata.org/data}}.
Given the availability of this data, it is no longer necessary to work on
simulated data (as done by Shepperd and Kadoda~\cite{shepperd01b}) or to study merely two datasets (as done by Menzies et al.~\cite{menzies11}).
The rest of this paper explores conclusion stability over 20 datasets given in \fig{datasets}.

%The literature placed strong emphasis on {\em No General Conclusion}
%as the "accepted wisdom" in the field of software effort estimation.
%Given the instability and conflicting results of many experiments,
%and the covered algorithms and simulated datasets, we doubt that
%some of the results were not general enough to produce general
%stable conclusion. In this study, we use a large number of real
%project datasets against a large number of different algorithms to
%revisit this challenging issue.
%
%


