\documentclass[10pt,journal,letterpaper,compsoc]{IEEEtran}
\usepackage{ifpdf}
\ifCLASSOPTIONcompsoc
  % IEEE Computer Society needs nocompress option
  % requires cite.sty v4.0 or later (November 2003)
  \usepackage[nocompress]{cite}
\else
  % normal IEEE
  \usepackage{cite}
\fi
\ifCLASSINFOpdf
  \usepackage[pdftex]{graphicx}
  % declare the path(s) where your graphic files are
  % \graphicspath{{../pdf/}{../jpeg/}}
  % and their extensions so you won't have to specify these with
  % every instance of \includegraphics
  \DeclareGraphicsExtensions{.pdf,.jpeg,.png}
\else
  % or other class option (dvipsone, dvipdf, if not using dvips). graphicx
  % will default to the driver specified in the system graphics.cfg if no
  % driver is specified.
  \usepackage[dvips]{graphicx}
  % declare the path(s) where your graphic files are
  % \graphicspath{{../eps/}}
  % and their extensions so you won't have to specify these with
  % every instance of \includegraphics
  \DeclareGraphicsExtensions{.eps}
\fi
\usepackage[cmex10]{amsmath}
\usepackage{algorithmic}
\usepackage{array}
\usepackage{mdwmath}
\usepackage{mdwtab}
\usepackage{eqparbox}
\ifCLASSOPTIONcompsoc
  %\usepackage[caption=false]{caption}
  \usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
\else
  %\usepackage[caption=false]{caption}
  \usepackage[caption=false,font=footnotesize]{subfig}
\fi
\usepackage{fixltx2e}
\usepackage{stfloats}
\ifCLASSOPTIONcaptionsoff
	\usepackage[nomarkers]{endfloat}
	\let\MYoriglatexcaption\caption
	\renewcommand{\caption}[2][\relax]{\MYoriglatexcaption[#2]{#2}}
\fi
\usepackage{url}
\hyphenation{op-tical net-works semi-conduc-tor}

\pdfinfo{
/Title (NOVA: Search Techniques for Reducing Software Cost)
/Author (Steven Williams, Tim Menzies, and Michael Cole)}

\begin{document}

\title{NOVA: Search Techniques for Reducing Software Cost}
\author{Steven~Williams,
        Tim~Menzies,
        and~Michael~Cole}
\maketitle

\begin{abstract}
Software cost estimation is a problem because of local tuning requirements: estimation models usually require local data in order to produce accurate results. Previously, different search engines have been used successfully to explore the space of all possible tunings within a set of USC COCOMO-family models. These results led us to believe that some search algorithms are better than others at reducing software cost even when no local data are available. We checked this by running six search engines over the COCOMO-family models and comparing the results. Though our results need future refinement, our current work leads us to recommend beam search as the most effective engine for searching untuned COCOMO-family models.
\end{abstract}

\section{ASE Intro}

%In the $21^{st}$ century, it is common place to
%find
%large complex
%software systems running on multiple platforms being
%being maintained and extended by international communities, interacting only via electronic means:
%e.g. ECLIPSE or  the continually evolving LINUX project(s).
%Decades of research
%has yielded tools and techniques that allow even a single individual
%to built complex software: e.g. Stallman \& EMACS
%or Torvalds \& LINUX (v1.0) 
%or
%Matsumoto \&  RUBY.  
%None of this would have been possible with decades of research into automating, simplifying, and improving
%common software engineering (SE) tasks. 
%
%Having addressed the {\em generation} problem 
%the focus of the automated  SE community
%has changed.
Much current ASE research concerns
{\em automatic analysis} of source code or better
{\em execution-based testing tools}.
These tools might, say,
verify formal properties or
search for the fewest regression tests that exercise most of the system.
Some of these
tools are ready for industrial use; e.g., SPIN~\cite{holzmann97} or 
JPF~\cite{have00}, to name just two.

Recently, the authors were asked to make the business case for introducing some
of these new ASE tools into large NASA projects. This case proved difficult to make,
due to an {\em anti-automation bias} and the {\em local tuning problem}.

\begin{figure}[!b]
\scriptsize
\begin{center}
\begin{tabular}{l@{}|l@{~}r@{~}l}
%& & relative\\
id & features  & relative weight\\\hline
1&Personnel/team capability  & 3.53	&\rule{17mm}{2mm}\\
2&  Product complexity & 2.38&	\rule{12mm}{2mm}\\
3& Time constraint & 1.63&	\rule{8mm}{2mm}\\
4&    Required software reliability & 1.54&	\rule{8mm}{2mm}\\
5&    Multi-site development   &	1.53&	\rule{7mm}{2mm}\\
6&       Doc. match to life cycle  & 	1.52	&\rule{7mm}{2mm}\\
7&  Personnel continuity  &	1.51&	\rule{7mm}{2mm}\\
8&  Applications experience & 1.51	&\rule{7mm}{2mm}\\
9& Use of software tools & 1.50	&\rule{7mm}{2mm}\\
10& Platform volatility & 1.49&	\rule{7mm}{2mm}\\
11& Storage constraint & 1.46&	\rule{7mm}{2mm}\\
12& Process maturity  &	1.43	&\rule{7mm}{2mm}\\
13&    Language \& tools experience  &	1.43	&\rule{7mm}{2mm}\\
14&     Required dev. schedule & 1.43	&\rule{7mm}{2mm}\\
15&Data base size & 1.42&	\rule{7mm}{2mm}\\
16&  Platform experience & 1.40&	\rule{7mm}{2mm}\\
17&     Arch. \& risk resolution  &	1.39&	\rule{7mm}{2mm}\\
18&  Precedentedness  &	1.33&	\rule{6mm}{2mm}\\
19&  Developed for reuse  	&1.31	&\rule{6mm}{2mm}\\
20& Team cohesion  &	1.29&	\rule{6mm}{2mm}\\
21&    Development mode   &	1.32	&\rule{6mm}{2mm}\\
22& Development flexibility &	1.26	&\rule{6mm}{2mm}\\
\end{tabular}
\end{center}
\caption{Relative effects  on development effort.
Data from
a regression analysis of 161 projects~\protect\cite{boehm00a}.  
}\label{fig:coceffects}
\end{figure}

The {\em anti-automation bias} was seen
at an ICSE 2007 panel.
Tim Lister (a co-author of Peopleware~\cite{demarco87})
commented
that ``sociology beats technology
in terms of successfully completing projects,'' a notion endorsed by the other panelists.
That is,
software project managers should focus {\em less}
on new ASE tools
and {\em
more} on managing the sociological aspects of their team (e.g., decreasing
staff turnover).

Figure~\ref{fig:coceffects} offers some support for this bias.
This figure shows the known relative productivity effects
of changing project features.
According to this figure,
the benefits of automatic tools (ranked ninth in the list) can be equaled or bettered
via other means (e.g., any one of items 1 to 8 {\em or} any {\em pair} of items 
10 to 22).

Note that this support for the anti-automation bias is based solely
on development effort; i.e.,
Figure~\ref{fig:coceffects}
is blind to the impact of new ASE tools in reducing {\em defects}
and any other {\em threats} to the success of the project.
A complete business case should therefore 
study
predictors for effort {\em and} defects {\em and} threats using, for example:
\begin{itemize}
\item
The COQUALMO defect predictor~\cite[p254-268]{boehm00b};
\item
The  COCOMO effort predictor~\cite[p29-57]{boehm00b}; 
\item
The THREAT predictor for effort \& schedule overrun~\cite[p284-291]{boehm00b}.
\end{itemize}
Such a study is complicated by the {\em local tuning problem}.
These predictors are 
most accurate {\em after} they have been
tuned to local data.
Unfortunately, 
   the data required for
   local tuning are difficult to obtain, especially across multiple
   organizations~\cite{me06d}. This is due to the business sensitivity associated
   with the data as well as differences in how the metrics are defined,
   collected, and archived. In many cases the required data have not been
   archived at all. For example, after two years we were only able to
add seven records to our NASA-wide software cost metrics repository.

The premise of this paper is that, in terms of ranking different methods,
a {\em precise} local tuning is not required.
Based on research dating back to 1981~\cite{boehm81}, we assert that the
space of {\em
possible} tunings is well known. This space can be explored to
find features that minimize effort, defects, and threats.  

To test that premise, we built the STAR1 algorithm.
STAR1 uses a
simulated annealer
to sample the space of possible local tunings within
COCOMO, COQUALMO, and THREAT. 
A standard simulated annealer
returns constraints on all variables. STAR1, on the other hand,
uses a Bayesian method to rank those constraints 
according to how
well they reduce
effort/defects/threats.
Superfluous details can hence be deleted, and the algorithm can return
a minimal set of project decisions 
that most reduce effort {\em and} defects {\em and} threats.

The general contributions of this paper, hence, are:
\begin{itemize}
\item
A {\em novel search engine} for sampling, then pruning a space of options within a model.
\item
A {\em novel combination} of effort/defect/threat prediction models.
Previously, these three models have been analyzed one at a time or in pairs~\cite{me00e,me02f}.
\item 
A   {\em new solution} to the local tuning problem:
while
waiting for
local tuning data (which may never arrive),  
seek stable conclusions over the set of possible tunings.
%\item
%A removal of the {\em tuning bias} problem. Predictors tuned to historical data can be biased
%by that data. This work, on the other hand, makes new conclusions far less susceptible to historical
%biases since it reports conclusions from across the space of biases. 
\item
A {\em demonstration} that 
the relative merits of different project decisions can be assessed
{\em without} specific local tuning data.
\end{itemize}

More specifically, using STAR1
we found situations where  
new ASE tools were {\em optional} for
one NASA system but {\em essential} for another 
(measured in terms of reducing effort {\em and} defects
{\em and} threats). Also, we did not find an {\em either/or} situation
where we had to choose, as suggested by Lister, between sociology
and technology. When
ASE tools were useful, they were useful {\em in combination} with the sociological
features.



The rest of this paper documents the COCOMO, COQUALMO, and THREAT models, along with the STAR1 algorithm, two other estimation tools (SCAT and 2CEE), and tactical and strategic analysis. STAR1 is then applied to five NASA/JPL case studies for both tactical and strategic analysis. Here, our Bayesian ranking is verified by comparison against an alternate scheme, {\em energy ranking}, which ranks attributes according to their average results. This is followed by a complete analysis of these case studies by STAR1, whose results are compared to those of both SCAT and 2CEE. The paper concludes with notes on external validity and related work.

\section{Introduction}
NOVA \cite{Menzies2008a} is an automated software engineering library written in LISP at West Virginia University CSEE, based on original code from Tim Menzies. NOVA operates on a set of the USC COCOMO-family models \cite{Boehm2000}, which include parts of COCOMO, COQUALMO, and Madachy's heuristic schedule risk model \cite{Madachy1997}. This set of models is referred to as XOMO \cite{Menzies2006}. For this project, the NOVA space is searched using multiple search engines to determine which engine is best suited to the task.

\subsection{XOMO}
XOMO is the combination of the COCOMO effort estimation model, the COQUALMO defect model, and Madachy's schedule risk model. It includes predictions for:

\begin{itemize}
\item How long it will take to build the software, measured in elapsed months;
\item How many months of effort are required during those elapsed months (so team size is effort/months);
\item How many defects will be introduced into or removed from the software;
\item The number of project threats; i.e., how prone the software is to schedule overruns due to poor management decisions.
\end{itemize}

\subsubsection{COCOMO Effort Estimation Model.}
XOMO uses the effort estimation model from the COnstructive COst MOdel (COCOMO), which measures effort in person-months.
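For orientation, the post-architecture COCOMO~II effort equation has the form
\begin{equation}
\mathit{PM} = A \times \mathit{KSLOC}^{\,E} \times \prod_{i=1}^{17}\mathit{EM}_i,
\qquad
E = B + 0.01\sum_{j=1}^{5}\mathit{SF}_j,
\end{equation}
where the $\mathit{EM}_i$ are effort multipliers, the $\mathit{SF}_j$ are scale factors, and $A$ and $B$ are the constants that local tuning calibrates (approximately $2.94$ and $0.91$ in the COCOMO~II.2000 calibration)~\cite{Boehm2000}.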

\subsubsection{COQUALMO Defect Introduction and Removal Model.}
The COnstructive QUALity MOdel (COQUALMO) defect model predicts the number of residual defects per KSLOC (thousands of source lines of code).

\subsubsection{Madachy's Heuristic Schedule Risk Model.}
Raymond Madachy's Heuristic Schedule Risk Model identifies, categorizes, quantifies, and prioritizes project risks. It also detects cost estimate anomalies and provides risk control advice.

\subsection{Policies and Case Studies}
The COCOMO variables can be separated into two policies, tactical and strategic \cite{Menzies2008}. Tactical variables are those that can be changed within the scope of one project, while strategic variables require much more effort to change, such as a higher-level institutional change. The tactical COCOMO variables are flex, resl, team, tool, sced, data, cplx, ruse, docu, stor, auto, exec, and peer; the strategic COCOMO variables are prec, pmat, acap, pcap, pcon, aexp, pexp, ltex, site, auto, exec, and peer. An ``all'' policy that combines the strategic and tactical policies is also considered in our implementation.

For this exercise, there are four case studies: OSP, OSP2, FLIGHT, and GROUND \cite{Menzies2008}. When used in NOVA, each case study applies some initial constraints to the search space. Applying a case study reduces the search space and can lead to better results. A ``default'' case study is also implemented, allowing all variables to take on any valid value.

\subsection{Search Engines}
The goal of this project is to search over and optimize the XOMO space for a random software engineering plan with a random number of lines of code, using different search strategies. The engines implemented for comparison are iterative sampling (ISAMP), simulated annealing, MaxWalkSat, A*, beam search, and KEYS. A short description and the pseudocode for each search engine follow.

\subsubsection{ISAMP.}
Iterative sampling \cite{William1995} is a simple random search that is fast and incomplete. It repeatedly builds a random solution to the problem and compares it to the best solution seen so far. Due to its simple nature, ISAMP runs very quickly and only requires enough memory for two solutions (the current and the best). Figure~\ref{fig:isamp} contains the pseudocode for the ISAMP search engine.
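Since the pseudocode figures are reproduced as images, a minimal executable sketch may also help. This illustrates the general keep-the-best-sample loop only, not NOVA's LISP code; the dice-style `score`/`sample` functions are invented for the example.

```python
import random

def isamp(score, sample, iterations=1000):
    """Iterative sampling: repeatedly build a random candidate and
    keep the lowest-scoring (best) one seen so far."""
    best = sample()
    best_score = score(best)
    for _ in range(iterations):
        candidate = sample()       # build a fresh random solution
        s = score(candidate)
        if s < best_score:         # lower energy is better
            best, best_score = candidate, s
    return best, best_score

# toy usage: minimize the sum of five dice-like settings in 1..6
random.seed(1)
best, energy = isamp(score=sum,
                     sample=lambda: [random.randint(1, 6) for _ in range(5)])
```

As the loop shows, memory use is constant: only the current and best candidates are ever held.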

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{isamp_pseudocode.png} 
   \caption{ISAMP pseudocode.}
   \label{fig:isamp}
\end{figure}

\subsubsection{Simulated Annealing.}
Simulated annealing \cite{kirkpatrick83optimization} is a commonly used, lightweight search engine inspired by the metallurgic annealing process. It is an adaptation of the Metropolis-Hastings algorithm and was developed by Kirkpatrick, Gelatt, and Vecchi in 1983 and independently by \v{C}ern\'{y} in 1985. Figure~\ref{fig:sa} contains the pseudocode for the simulated annealing search engine. The random condition allows simulated annealing to escape local minima by switching to a neighbor even if it is not a better solution. The chance of passing the random condition decreases throughout the course of the search. Like ISAMP, simulated annealing is incomplete and requires very little memory (enough for three solutions), but it takes slightly longer to run due to its local search while switching to neighbors.
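Again as an illustration only (a generic annealer on a toy one-dimensional problem, not NOVA's implementation):

```python
import math
import random

def simulated_annealing(score, neighbor, start, steps=5000, t0=1.0):
    """Simulated annealing: accept worse neighbors with a probability
    that decays as the temperature cools, to escape local minima."""
    current = best = start
    for k in range(1, steps + 1):
        temp = t0 * (1 - k / steps)            # linear cooling schedule
        cand = neighbor(current)
        delta = score(cand) - score(current)
        if delta < 0 or (temp > 0 and random.random() < math.exp(-delta / temp)):
            current = cand                     # sometimes accept a worse move
        if score(current) < score(best):
            best = current
    return best

# toy usage: minimize x^2 over the integers, moving +/-1 per step
random.seed(2)
best = simulated_annealing(score=lambda x: x * x,
                           neighbor=lambda x: x + random.choice((-1, 1)),
                           start=40)
```

The `random.random() < math.exp(-delta / temp)` test is the "random condition" described above: early on (high temperature) uphill moves are often accepted; late in the run the search is effectively greedy.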

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{sa_pseudocode.png} 
   \caption{Simulated annealing pseudocode.}
   \label{fig:sa}
\end{figure}

\subsubsection{MaxWalkSat.}
MaxWalkSat, a variant of WalkSat designed to solve the weighted satisfiability problem, is a stochastic local search algorithm developed by Kautz, Selman, and Jiang. MaxWalkSat and the other WalkSat variants are widely used, owing to their simplicity and the performance and speed of stochastic local search. Figure~\ref{fig:mws} contains the pseudocode for the MaxWalkSat search engine. Like ISAMP, MaxWalkSat is incomplete and only requires enough memory to store two solutions (the current and the best).
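The weighted-satisfiability flavor can be sketched as follows; this is a generic MaxWalkSat-style loop rather than the Kautz-Selman-Jiang code, and the three-clause formula is a made-up toy.

```python
import random

def maxwalksat(clauses, n_vars, flips=10000, p=0.5):
    """MaxWalkSat sketch for weighted MAX-SAT. Each clause is
    (weight, literals); literal i asserts var i True, -i asserts False.
    Pick a random unsatisfied clause, then flip one of its variables:
    a random one with probability p, otherwise the greedy choice."""
    def sat(clause, a):
        return any((lit > 0) == a[abs(lit)] for lit in clause)

    def cost(a):                    # total weight of unsatisfied clauses
        return sum(w for w, c in clauses if not sat(c, a))

    assign = {i: random.random() < 0.5 for i in range(1, n_vars + 1)}
    best, best_cost = dict(assign), cost(assign)
    for _ in range(flips):
        unsat = [c for w, c in clauses if not sat(c, assign)]
        if not unsat:
            return assign, 0
        clause = random.choice(unsat)
        if random.random() < p:     # random-walk move
            var = abs(random.choice(clause))
        else:                       # greedy move: pick the cheapest flip
            var = min((abs(lit) for lit in clause),
                      key=lambda v: cost({**assign, v: not assign[v]}))
        assign[var] = not assign[var]
        if cost(assign) < best_cost:
            best, best_cost = dict(assign), cost(assign)
    return best, best_cost

# toy usage: three weighted clauses over three boolean variables
random.seed(3)
clauses = [(10, [1, 2]), (5, [-1, 3]), (1, [-2, -3])]
best, cost_left = maxwalksat(clauses, n_vars=3)
```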

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{maxwalksat_pseudocode.png} 
   \caption{MaxWalkSat pseudocode.}
   \label{fig:mws}
\end{figure}

\subsubsection{A*.}
A* \cite{HartJuly1968} is a graph search algorithm that finds a least-cost path from a given starting point to a goal. A* was developed by Hart, Nilsson, and Raphael in 1968 and was originally called algorithm A. Figure~\ref{fig:astar} contains the pseudocode for the A* search engine. In its basic form, A* is complete but has very large memory requirements (it must remember an exponential number of nodes). In our implementation, the memory requirement is capped at 1000 paths, and we limit the scope of the search by ending when the solution stabilizes.
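A generic A* can be sketched as follows (here on a toy 4x4 grid with a Manhattan-distance heuristic; the 1000-path cap and stabilization cutoff of our implementation are omitted):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A*: always expand the frontier node minimizing f = g + h, where
    g is the cost so far and h an (assumed admissible) estimate to goal."""
    frontier = [(h(start), 0, start, [start])]
    best_g = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue                       # already reached more cheaply
        best_g[node] = g
        for nxt, step_cost in neighbors(node):
            g2 = g + step_cost
            heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# toy usage: shortest 4-connected path from (0,0) to (3,3) on a 4x4 grid
def grid_neighbors(p):
    x, y = p
    return [((x + dx, y + dy), 1)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 4 and 0 <= y + dy < 4]

path, cost = a_star((0, 0), (3, 3), grid_neighbors,
                    h=lambda p: abs(3 - p[0]) + abs(3 - p[1]))
```

The open frontier is what makes basic A* memory-hungry: in the worst case it holds a node for every path generated so far.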

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{astar_pseudocode.png} 
   \caption{A* pseudocode.}
   \label{fig:astar}
\end{figure}

\subsubsection{Beam.}
Beam search is also a graph search algorithm, combining features of best-first and breadth-first search. Figure~\ref{fig:beam} contains the pseudocode for the beam search engine. Beam search considers several paths down the tree at once and chooses the best of those to expand next. The number of paths considered at once is the beam width, so beam search requires enough memory for beam-width solutions plus the best solution seen so far. Because not all possible paths are considered, beam search is incomplete \cite{Norvig1992}.
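The level-by-level pruning can be sketched as follows (the scoring function below, which rewards short digit strings of low digit sum, is invented for the example):

```python
def beam_search(start, expand, score, width=3, depth=10):
    """Beam search: at each level keep only the `width` lowest-scoring
    candidates, remembering the best candidate seen overall."""
    beam = [start]
    best = min(beam, key=score)
    for _ in range(depth):
        children = [c for node in beam for c in expand(node)]
        if not children:
            break
        beam = sorted(children, key=score)[:width]   # prune to beam width
        best = min([best] + beam, key=score)
    return best

# toy usage: grow a 5-character digit string minimizing the digit sum;
# unfinished strings are penalized 9 per missing character
expand = lambda s: [s + d for d in "123"] if len(s) < 5 else []
best = beam_search("", expand,
                   score=lambda s: sum(map(int, s)) + (5 - len(s)) * 9)
```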

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{beam_pseudocode.png} 
   \caption{Beam search pseudocode.}
   \label{fig:beam}
\end{figure}

\subsubsection{KEYS.}
KEYS \cite{Jalali2008} is a very fast, incomplete, stochastic search algorithm. It was developed in-house at West Virginia University and is widely used within the WVU CSEE department. Figure~\ref{fig:keys} contains the pseudocode for the KEYS search engine. KEYS works by generating sets of random mitigations and ranking each mitigation by its likelihood of appearing in the best 10\% of the sets. The top-ranked mitigation is applied to the solution, after which the process repeats until no more mitigations can be made. KEYS fully constrains the search space by the time it completes.
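The ranking-and-fixing loop can be sketched as follows; this is our reading of the published description, not the WVU code, and the three-decision `options` table is a toy.

```python
import random

def keys(options, score, samples=100):
    """KEYS sketch: each era, draw random settings for the still-open
    decisions, rank each (decision, value) pair by how often it appears
    in the best 10% of draws, then permanently fix the top-ranked pair."""
    fixed = {}
    open_keys = list(options)
    while open_keys:
        draws = []
        for _ in range(samples):
            setting = {k: random.choice(options[k]) for k in open_keys}
            draws.append((score({**fixed, **setting}), setting))
        draws.sort(key=lambda d: d[0])              # lowest energy first
        elite = draws[:max(1, samples // 10)]       # best 10% of the draws
        counts = {}
        for _, setting in elite:
            for k, v in setting.items():
                counts[(k, v)] = counts.get((k, v), 0) + 1
        (k, v), _ = max(counts.items(), key=lambda kv: kv[1])
        fixed[k] = v                                # constrain one decision
        open_keys.remove(k)
    return fixed, score(fixed)

# toy usage: three decisions, each in 1..3; "energy" is just their sum
random.seed(4)
options = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [1, 2, 3]}
best, energy = keys(options, score=lambda s: sum(s.values()))
```

Because exactly one decision is fixed per era, the loop terminates after one era per decision, which is why KEYS ends with the space fully constrained.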

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{keys_pseudocode.png} 
   \caption{KEYS pseudocode.}
   \label{fig:keys}
\end{figure}

\section{Related Work}
This report extends the tests and simulations done by Menzies et al.\ on justifying the case for automated software engineering \cite{Menzies07}. Because their studies searched the XOMO model space with only a single search engine, we were tasked with comparing the results of various other search engines across the same space. Another goal of our work was to produce results that let others evaluate the effectiveness and efficiency of the LISP implementation compared to the work previously done in C.

This paper is greatly influenced by Menzies's XOMO paper \cite{Menzies2006}, which defines XOMO as a combination of the COCOMO effort estimation model, the COQUALMO defect introduction and removal model, and Madachy's heuristic schedule risk model. The XOMO model is the core of the NOVA library, which was developed for and used in this study.

A major difference between our work and the previous work by Menzies et al.\ is in the understanding and implementation of constraints. We define a variable to have no constraints when all of its possible values are available to choose from, e.g., prec = (1 2 3 4 5 6). A variable becomes more constrained as possible values are removed from this default set; by fixing prec to 3, for example, five constraints have been introduced into the solution. The previous implementation worked in the opposite direction: an initially ``unconstrained'' variable contained all possible values, and as new values were added to a variable's set of possible values, the variable was said to become more constrained. Upon reaching the point of maximum constraints, e.g., prec = (1 2 3 4 5 6), one had circled back to the starting point. As a result, solutions that fully constrain the data set, such as those of KEYS, form a curve with a high initial value, a good solution in the middle, and final values close to where they started. For the most part, our algorithms reach similar results, but we feel the initial implementation of constraints causes much confusion, and our reworking should be considered for future work.
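Our convention is easy to state in code (an illustrative helper, not part of NOVA's API):

```python
def constraints(domain, remaining):
    """Constraint count under our convention: the number of values
    that have been removed from the variable's full domain."""
    return len(domain) - len(remaining)

# the prec example from the text
prec = [1, 2, 3, 4, 5, 6]
fixed_to_three = constraints(prec, [3])      # fixing prec to 3
unconstrained = constraints(prec, prec)      # full domain available
```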

\section{Results}

\subsection{Baselines}
Before running the search engines, each case study was analyzed to determine typical initial values for effort, months, defects, threat, and energy. These baseline values are shown in Figure~\ref{fig:baseline}, which gives median and quartile values for all of the case studies that were searched.

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{baseline.png} 
   \caption{Baselines.}
   \label{fig:baseline}
\end{figure}

These baselines provide a starting point for our search engines: to be effective, a search engine must reduce both the numeric value and the variability of these five statistics. Note that energy is calculated using normalized values for effort, months, defects, and threat. Energy values are therefore only valid within the context of a given case study; that is, comparing energy values between case studies makes no sense.
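The normalization step can be sketched as follows; the equal-weight mean and the made-up baseline ranges are assumptions for illustration, not NOVA's exact energy formula.

```python
def normalize(x, lo, hi):
    """Min-max normalize x into [0, 1] against a baseline range."""
    return (x - lo) / (hi - lo)

def energy(preds, baselines):
    """Illustrative energy: mean of the normalized effort, months,
    defects, and threat predictions (equal weights assumed)."""
    scores = [normalize(x, lo, hi) for x, (lo, hi) in zip(preds, baselines)]
    return sum(scores) / len(scores)

# toy usage: mid-range predictions against one case study's baselines
baselines = [(100, 1000), (10, 40), (50, 500), (0, 10)]
e = energy([550, 25, 275, 5], baselines)
```

Because `lo` and `hi` come from one case study's baseline runs, an energy value carries no meaning outside that case study.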

\subsection{Energy Comparison}
Once baselines were generated, the six search engines were run 20 times on each case study and policy described above. In A*, beam search, and KEYS, the algorithm proceeds in an ordered fashion from no constraints toward the maximum number of constraints. This yields Figures~\ref{fig:astarresults} through \ref{fig:keysresults}, which show the results of a single run searching the GROUND case study with all policies. Since ISAMP, MaxWalkSat, and simulated annealing are not similarly directed in their search, corresponding plots could not be generated for those engines.

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{a_star.png} 
   \caption{A* results.}
   \label{fig:astarresults}
\end{figure}

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{beam.png} 
   \caption{Beam search results.}
   \label{fig:beamresults}
\end{figure}

The A* algorithm takes into account both the estimated cost to the goal (the current energy) and the cost of the solution so far (the number of constraints imposed). A* solutions therefore tended to be less constrained at the expense of slightly less optimal energies. Compare the A* result (Figure~\ref{fig:astarresults}) with that of beam search (Figure~\ref{fig:beamresults}), which achieves a slightly lower energy only after applying many more constraints.

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{keys.png} 
   \caption{KEYS results.}
   \label{fig:keysresults}
\end{figure}

KEYS takes relatively large steps in the search space, constraining one input to a single value in each era. This causes the wide gaps between successive points in Figure~\ref{fig:keysresults}.

Figures~\ref{fig:energyscatter} and \ref{fig:energycandlestick} summarize the results of 20 runs of each search engine on the GROUND case study.

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{energy_scatter.png} 
   \caption{Energy vs. constraints.}
   \label{fig:energyscatter}
\end{figure}

In Figure~\ref{fig:energyscatter}, the result of each run is plotted as the minimum energy versus the number of constraints imposed. Since smaller values are desirable for both quantities, the optimum solution would appear close to the origin. However, many of the search engines achieved low energies by over-constraining their solutions (i.e., moving too far toward the right-hand side of the energy/constraints graph). ISAMP, MaxWalkSat, and simulated annealing tended to produce a wide range of solutions, often without significantly reducing the energy. A*, beam search, and KEYS produced solutions with significantly lower energies and much less variability.

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{energy_candlestick.png} 
   \caption{Energy comparison.}
   \label{fig:energycandlestick}
\end{figure}

Figure~\ref{fig:energycandlestick} shows the median, quartile, and extreme energies encountered by each search engine over 20 runs. This figure and the statistical analysis below confirm our qualitative observation from Figure~\ref{fig:energyscatter} that A*, beam search, and KEYS are much more effective than the other three search engines.

\subsection{Runtime Comparison}
Runtimes for the six search engines on the GROUND case study are shown in Figures~\ref{fig:runtimesend} and \ref{fig:runtimesmin}. All runtimes are in milliseconds.

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{runtime_end.png} 
   \caption{Runtime to end.}
   \label{fig:runtimesend}
\end{figure}

``Runtime to end'' is the time elapsed between the start and the end of a search, including any post-processing. KEYS runs significantly faster than the other searches, which is impressive considering the energy reduction it generally achieves.

\begin{figure}[htb!]
   \centering
   \includegraphics[width=3.3in]{runtime_min.png} 
   \caption{Runtime to minimum.}
   \label{fig:runtimesmin}
\end{figure}

``Runtime to minimum'' is the time elapsed between the start of the search and the point when minimum energy is reached. This may be significantly less than the runtime to end, especially if the solution stabilizes well before the search terminates or if post-processing is necessary.

\subsection{Statistical Analysis}
For the statistical analysis of the results, we had to choose between a paired or unpaired test and a parametric or nonparametric test. A paired test is appropriate only when the two sets of data match:
\begin{quote}
You should select a paired test whenever you expect a value in one group to be closer to a particular value in the other group than to a randomly selected value in the other group \cite{Motulsky1999}.
\end{quote}
Since our runs are independent, paired tests are not an option for analyzing the data, which removes the paired t-test and the Wilcoxon test from the list of candidates. Parametric tests assume that the data follow a Gaussian distribution, and since that cannot be assumed here, the test must be nonparametric. This leaves one choice, the nonparametric unpaired test: the Mann-Whitney test.
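For reference, the U statistic that the test ranks engines by is small enough to sketch directly from its definition (an illustration, not the statistics package we used; it omits the rank-sum machinery and normal approximation needed for p-values):

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U computed directly: over all (x, y) pairs, count
    1 when x > y and 0.5 for a tie. The usual rank-sum formula
    U = R1 - n1*(n1+1)/2 yields the same number."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# toy usage: energies of two engines over three runs each (lower is better)
u = mann_whitney_u([1, 2, 3], [4, 5, 6])
```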

Tables~\ref{tab:GROUNDALL} and \ref{tab:TOTALS} show the results of running Mann-Whitney tests on the GROUND/ALL data set and on the combination of all results across all data sets, respectively.

\begin{table}
	\centering
		\begin{tabular}{ | l | l | l | l |}
    	\hline
	    \textbf{Engine} & \textbf{Wins} & \textbf{Losses} & \textbf{Ties} \\ \hline
	    Beam & 4 & 0 & 1 \\ \hline
	    KEYS & 4 & 0 & 1 \\ \hline
	    A* & 3 & 2 & 0 \\ \hline
	    SA & 2 & 3 & 0 \\ \hline
	    MaxWalkSat & 1 & 4 & 0 \\ \hline
	    ISAMP & 0 & 5 & 0 \\
	    \hline
		\end{tabular}
	\caption{GROUND/ALL statistical analysis.}
	\label{tab:GROUNDALL}
\end{table}

\begin{table}
	\centering
		\begin{tabular}{ | l | l | l | l |}
    	\hline
	    \textbf{Engine} & \textbf{Wins} & \textbf{Losses} & \textbf{Ties} \\ \hline
	    Beam & 66 & 1 & 8 \\ \hline
	    KEYS & 47 & 14 & 14 \\ \hline
	    A* & 43 & 22 & 10 \\ \hline
	    SA & 29 & 38 & 8 \\ \hline
	    MaxWalkSat & 15 & 53 & 7 \\ \hline
	    ISAMP & 0 & 72 & 3 \\
	    \hline
		\end{tabular}
	\caption{Statistical analysis totals across all data sets.}
	\label{tab:TOTALS}
\end{table}

\section{Discussion}
Though the statistical analysis appears to indicate a lopsided victory for beam search, inspection of plots such as Figure~\ref{fig:energycandlestick} indicates that A* and KEYS are often nearly as effective in reducing software cost. ISAMP, MaxWalkSat, and simulated annealing were generally less successful and produced inconsistent results; evidence of this wide variability can be seen in Figure~\ref{fig:energyscatter}.

The common element among A*, beam search, and KEYS is that they are \emph{directed}; they add constraints to an initially unconstrained solution, doing some processing along the way to ensure that the solution is indeed improving. The \emph{undirected} algorithms are more random, often pursuing ``intelligent'' decisions only a fraction of the time. We may conclude that the large XOMO search space is more effectively explored by an algorithm that aggressively improves its current solution than by one that randomly thrashes through the space.

However, one significant problem with our directed searches was the tendency to over-constrain solutions. Marginally lower energies were pursued regardless of the number of constraints needed to achieve them. This resulted in solutions that would be prohibitively labor- and cost-intensive to actually implement in a software organization. Some future work would be necessary to alleviate this problem.

A final consideration in analyzing our results is runtime. KEYS consistently ran more quickly than all of our other search engines. Situations where fast processing time is critical might require the use of KEYS rather than a slower-running engine that more effectively reduces cost.

\section{Future Work}
We recommend that future enhancements to NOVA include a facility to reduce the possibility of over-constraining solutions. For A*, this could be achieved by tuning the search algorithm to weight additional constraints more heavily as the search progresses. For beam search and KEYS, each search run could be post-processed to prune constraints that provided a statistically negligible energy reduction. Reducing constraints would make a solution much less costly for a software organization to implement, and would hopefully move solutions toward the ``sweet spot'' of low energy and low constraints.

Future NOVA work could also include the implementation of additional search engines. The six engines discussed in this paper are by no means exhaustive; exploring additional algorithms might provide still better results for software cost reduction.

\section{Conclusion}
Based on our current results, we conclude that beam search is the most effective of the NOVA search engines. However, we encourage further work to tune the existing search engines and explore additional algorithms.

\bibliographystyle{IEEEtran}
\bibliography{IEEEabrv,report_cole_williams}

\end{document}