	\documentclass{llncs}
	\usepackage{rotating,wrapfig,cite,url,times,graphicx}
	\newcommand{\aive}{{\"aive}}

	\newenvironment{smallitem}
	 {\setlength{\topsep}{0pt}
	  \setlength{\partopsep}{0pt}
	  \setlength{\parskip}{0pt}
	  \begin{itemize}
	   \setlength{\leftmargin}{.2in}
	  \setlength{\parsep}{0pt}
	  \setlength{\parskip}{0pt}
	  \setlength{\itemsep}{0pt}}
	 {\end{itemize}}

	\newenvironment{smallenum}
	 {\setlength{\topsep}{0pt}
	  \setlength{\partopsep}{0pt}
	  \setlength{\parskip}{0pt}
	  \begin{enumerate}
	  \setlength{\leftmargin}{.2in}
	  \setlength{\parsep}{0pt}
	  \setlength{\parskip}{0pt}
	  \setlength{\itemsep}{0pt}}
	 {\end{enumerate}}


	 \newcommand{\bd}{\begin{description}}
	\newcommand{\ed}{\end{description}}
	 \newcommand{\bi}{\begin{smallitem}}
	\newcommand{\ei}{\end{smallitem}}
	 \newcommand{\be}{\begin{smallenum}}
	\newcommand{\ee}{\end{smallenum}}
	\newcommand{\fig}[1]{Figure~\ref{fig:#1}}
	\newcommand{\eq}[1]{Equation~\ref{eq:#1}}
	\newcommand{\tion}[1]{\S\ref{sec:#1}}

	\begin{document}

	%\CopyrightYear{2008} % Allows default copyright year (200X) to be over-ridden - IF NEED BE.
	%\crdata{0-12345-67-8/90/01}  % Allows default copyright data (0-89791-88-6/97/05) to be over-ridden - IF NEED BE.

	\title{
	Real-time Optimization of Requirements Models
	}

	\author{ Gregory Gay\inst{1}  \and
	Tim Menzies\inst{1} \and 
	Gregory Mundy\inst{2} \and 
	Omid Jalali\inst{1} \and 
	Martin Feather\inst{3} \and
	James Kiper\inst{4}}
	\authorrunning{Gay et al.}
	\institute{West Virginia University, Morgantown, WV, USA,\\
	\email{gregoryg@csee.wvu.edu, tim@menzies.us, ojalali@mix.wvu.edu},\\ 
	\and Alderson-Broaddus College, Philippi, WV, USA,\\
	\email{mundyge@ab.edu}
	\and Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA, \\
	\email{martin.s.feather@jpl.nasa.gov}
	\and Dept. of Computer Science and Systems Analysis, Miami University, Oxford, OH, USA,\\
	\email{kiperjd@muohio.edu }}

	\maketitle
	\begin{abstract}
	As requirements models grow larger, so too does the need for faster requirements
	optimization methods,
	particularly when models are used by a large room of debating experts as part of rapid interactive
	dialogues.
	Hence, there is a pressing need for 
	``real-time requirements optimization''; i.e.
	requirements optimizers  that can offer advice before
	an expert's attention wanders to other issues.

	One candidate technology for real-time requirements optimization is KEYS2.
	KEYS2 uses
	a very simple (hence, very fast) technique that
	identifies both succinct sets of useful mitigations
	and cost-attainment tradeoffs for partial solutions.
	This paper reports experiments demonstrating that  KEYS2 
	runs four orders of magnitude 
	faster than our previous implementations and outperforms
	standard search algorithms including a classic stochastic search
	(simulated annealing), a state-of-the art local search (MaxWalkSat),
	and a standard graph search (A*).

	\end{abstract}

	\pagestyle{plain}

	\section{Introduction}

	%*****************************
	% V7 content added by G. Mundy
	%*****************************
	Decades of experience have shown that developing large-scale software systems is an extremely high-risk, complicated, and tedious task. Many
	factors contribute to the complex nature of modern software development efforts; however, communication between stakeholders is viewed as
	one of the greatest barriers to success. Modern software projects often involve large groups of individuals working together. These
	individuals are often geographically dispersed and must rely on modern communication methods such as teleconferencing, email, and instant messaging.
	These methods, though extremely useful, do little to alleviate the underlying
	communication issues that arise from the sheer number of people involved with a project.

	To illustrate the threats to software development posed by these communication issues, consider the Team X meetings at the National Aeronautics and
	Space Administration (NASA) Jet Propulsion Laboratory (JPL), where experts in science, engineering, data collection, and communication work
	together to generate mission concept designs that may be the basis for millions to billions of dollars of subsequent development work. In these
	meetings, there is no time for everyone to talk to everyone else and, all too often, experts on one side of the room make decisions that hinder
	decisions made by experts on the other side of the room, with neither group aware of the conflict.

	This issue is not confined to NASA and Team X. It is a looming problem for all group-think tools and strategies, particularly given the nature of
	$21^{st}$-century communication, where individuals have grown accustomed to communicating simultaneously via
	multiple channels (such as talking to the whole room while texting an individual across it). In such environments, small sub-groups
	can evolve opinions or ideas that are not shared with the rest of the group.

	Miscommunication can often lead to inappropriate decisions, which can be fatal to software projects. For example, it is very common for budgets to
	be determined after some requirements analysis. Once set, it is difficult to revise those budgets later in the project, which can lead to problems
	such as lower-quality software produced by developers cutting corners to stay on budget.

	In addition to budget issues, miscommunication can lead to software being built around an incorrect set of assumptions. As noted by Boehm and
	Papaccio, it is far cheaper to rework software earlier in the lifecycle than later, ``by factors of 50 to 200''~\cite{boehm88}. The ideal when
	managing software projects is therefore to make the best decisions possible as early as possible.

	If requirements analysis is automatic and fast, then automated methods can help a group maintain focus within multi-channeled meetings. Individuals
	will focus on the entire group if the activity of the whole team commands their attention. For example, suppose a requirements analyzer finds a
	major problem or a novel, better solution. Such a result would command the attention of the whole group and cause them to interrupt their current
	deliberations to focus on the new finding.

	A premise of this approach is that requirements analyzers offer feedback on their models in ``real time'', which is defined for this purpose as the
	time before an expert's attention wanders to other issues. Nielsen~\cite[chp5]{neilson93} reports that ``the basic advice regarding response times
	has been about the same for thirty years; i.e.
	\bi
	\item 1 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay
	\item 10 seconds is the limit for keeping a user's attention focused on the dialogue''
	\ei
	For our purposes, we split the difference and demand a five-second response time for real-time requirements analysis.

	This is a challenging problem. Based on current projections from JPL, we predict that by 2013, requirements models will be ten times larger
	than today and contain nearly $N\approx 1000$ variables. This is an alarming growth rate since:
	\bi
	\item Requirements optimization is often a non-linear problem where possible benefits must be traded off against the increased cost of implementing
	those benefits;
	\item In such non-linear problems, the number of combined options to consider in a model of size $N$ can grow exponentially to $2^N$;
	\item Based on our JPL experience, experts often explore their models in the context of $W$ what-if scenarios, where combinations of these
	$2^N$ options must be studied in the context of $2^W$ possible assumptions. That is, the space of options to explore is $2^W \times 2^N$.
	Assuming that users explore just a handful of what-if queries (say, $W = 10$) over models of size $N \leq 1000$, the total search space to be explored
	in five seconds or less is $\geq 10^{300}$.
	\ei
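The arithmetic behind that $\geq 10^{300}$ bound can be checked directly. The snippet below merely reproduces the calculation above ($2^{W} \times 2^{N}$ with $W=10$, $N=1000$); it introduces nothing new:

```python
import math

# 2^W what-if assumptions times 2^N option combinations (values from the text)
W, N = 10, 1000
space = (2 ** W) * (2 ** N)

# order of magnitude of the combined search space
magnitude = math.floor(math.log10(space))
print(magnitude)  # 304, i.e. the space exceeds 10^300
```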

	Extrapolating forward in time, we can estimate how fast we need to process our \emph{current models} on our \emph{current
	hardware} in order to explore the models we expect to see in five years' time. The appendix of this paper offers a ``back of the envelope''
	calculation for that extrapolation. Based on that calculation, we assert that runs over our current models must terminate in $10^{-2}$ seconds if we wish to
	support five-second response times for the kinds of real-time requirements problems we expect to see now and in the near future.

	The field of Search-Based Software Engineering (SBSE) has evolved from the premise that many software engineering tasks can be formulated as
	optimization problems. The computational complexity of these problems, however, leads to the application of heuristic search techniques such as
	simulated annealing and genetic algorithms as methods of identifying near-optimal or ``good enough'' solutions.
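Since simulated annealing recurs throughout this paper as a baseline, a minimal sketch may help readers unfamiliar with it. The objective below is a toy placeholder (count the set bits) and the linear cooling schedule is a textbook default; neither reflects the requirements models or parameter settings used in our experiments:

```python
import math
import random

def score(bits):
    """Toy stand-in for a model's cost/benefit evaluation: count the 1s."""
    return sum(bits)

def anneal(n=40, steps=2000, seed=1):
    """Textbook simulated annealing over n boolean decisions."""
    rng = random.Random(seed)
    cur = [rng.randint(0, 1) for _ in range(n)]
    cur_s = best_s = score(cur)
    for t in range(1, steps + 1):
        temp = max(1.0 - t / steps, 1e-6)   # linear cooling toward zero
        cand = cur[:]
        cand[rng.randrange(n)] ^= 1         # neighbor: flip one decision
        cand_s = score(cand)
        # always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops
        if cand_s >= cur_s or rng.random() < math.exp((cand_s - cur_s) / temp):
            cur, cur_s = cand, cand_s
        best_s = max(best_s, cur_s)
    return best_s

print(anneal())  # close to the optimum of 40
```

Note that each accepted worse move early in the run is what lets the search escape local optima; it is also why simulated annealing can be slow, a point we return to later.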

	It is not difficult to see how SBSE can serve as an effective approach to requirements optimization problems and can aid
	in creating the kind of real-time requirements optimizer described previously. One may argue that conventional optimization techniques such as
	\emph{dynamic programming} and \emph{integer programming} would yield better solutions than their heuristic-based counterparts; however,
	given the size of the search space ($\geq 10^{300}$) that must be explored, these algorithms would be unable to solve the requirements optimization
	problems within the five-second window. As Clarke et al. remark, ``\ldots software engineers face problems which consist, not in finding the
	solution, but rather in engineering an acceptable or near-optimal solution from a large number of alternatives''~\cite{clarke03}.

	According to Harman~\cite{harman01}, many software engineering problems are over-constrained and no precise solution over all variables is
	achievable; therefore partial solutions based on heuristic search methods are preferred. When offering partial solutions, it is very important to
	also offer insight into the space of options around the proposed solution. Such neighborhood information is very useful for managers with only
	partial control over their projects, since it gives them confidence that even if only some of their recommendations are put into effect, at least
	the range of outcomes is well understood. Harman~\cite{harman07} comments that understanding the neighborhood of our solutions is an open and
	pressing issue in SBSE:
	\begin{quote}
		``In some software engineering applications, solution robustness may be as important as solution functionality. For example, it may be 
		better to locate an area of the search space that is rich in fit solutions, rather than identifying an even better solution that is 
		surrounded by a set of far less fit solutions.''
	\end{quote}

	\begin{quote}
		``Hitherto, research on SBSE has tended to focus on the production of the fittest possible results. However, many application areas require 
		solutions in a search space that may be subject to change. This makes robustness a natural second order property to which the research 
		community could and should turn its attention.''
	\end{quote}

	The main goal of this paper is to discuss two keys-based heuristic search algorithms, referred to as KEYS and KEYS2, and how they offer robust
	information around partial solutions within the $10^{-2}$ second time boundary previously discussed.
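To make ``keys-based'' search concrete before those algorithms are detailed, here is a deliberately simplified, hypothetical sketch of the underlying intuition: sample candidate solutions, rank each individual variable setting by how often it appears among the better-scoring samples, and greedily fix the top-ranked setting each round. The function and its parameters are illustrative only; this is not the paper's KEYS implementation:

```python
import random

def keys_search(n, score, samples=100, seed=1):
    """Greedily fix one variable setting per round, ranked by how often
    that setting appears among the better-scoring random candidates."""
    rng = random.Random(seed)
    fixed = {}                                 # variable -> forced value
    while len(fixed) < n:
        pool = []
        for _ in range(samples):
            cand = [fixed.get(i, rng.randint(0, 1)) for i in range(n)]
            pool.append((score(cand), cand))
        pool.sort(key=lambda sc: sc[0], reverse=True)
        counts = {}                            # (var, value) -> count in best half
        for _, cand in pool[: samples // 2]:
            for i, v in enumerate(cand):
                if i not in fixed:
                    counts[(i, v)] = counts.get((i, v), 0) + 1
        var, val = max(counts, key=counts.get)
        fixed[var] = val                       # commit the most promising setting
    return [fixed[i] for i in range(n)]

best = keys_search(12, sum)                    # toy objective: maximize the 1s
print(sum(best))
```

The point of the sketch is the cost profile: each round does only a linear amount of scoring and fixes one variable permanently, so the search over all $2^n$ settings collapses to $n$ cheap rounds when a few key variables dominate the score.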

	\subsection{Contributions of This Paper}
	The specific contribution of this paper is to define the problem of real-time requirements engineering. We offer materials that allow other 
	researchers to conduct further investigation into this problem. All the code, Makefiles, scripts, and so on used in this paper are available 
	online\footnote{See \url{http://unbox.org/wisp/tags/ddpExperiment/install}}. 
	
	A more general contribution of this paper is to demonstrate the advantages of keys-based search over other search methods that are in widespread use
	(simulated annealing) or considered state-of-the-art (MaxWalkSat). Our keys-based search algorithm (KEYS) is remarkably simple to
	implement and has been demonstrated to return better-quality solutions, faster, than simulated annealing and MaxWalkSat. In addition, our
	keys-based search finds solutions that constrain fewer variables than other methods (A*) while still achieving similar or slightly better scores.
	Furthermore, the incremental keys-based search algorithm (KEYS2) described in this paper offers extensive information about the space of effects
	around a particular solution. Alternative search methods offer point solutions that are inappropriate when managers do not have full control over
	which decisions are made, or when such control is prohibitively expensive.

	A third contribution of this paper is a set of baseline results showing that, for the models explored here, optimizations can be achieved
	in around $10^{-2}$ seconds. Such baseline results are important because they allow others in the field to demonstrate clear advances
	over (say) this paper.

	A final contribution of this paper is a comment on a standard search algorithm used widely in the field of SBSE. Clark et al. warn that
	simulated annealing may be quite computationally expensive~\cite{clark03}. But these warnings do not go further to say that, assuming
	simulated annealing terminates, its results may be worse than those found by other methods. More generally, apart from cautions on the runtime costs
	of simulated annealing, we are unaware of other cautionary notes in the software engineering literature on the relative value of simulated
	annealing's results. Such cautions should be offered since we demonstrate in this paper that the results of simulated annealing can be
	outperformed by a variety of alternative search engines, some of which are quite simple to code, such as our keys-based algorithms. This result is
	quite exciting because it implies that current results from simulated annealing (for example~\cite{mitchell06,boutif06,pan08,tracey98,bin07}) could
	be greatly improved simply by switching to an alternate search algorithm.

	%
	%
	%{\em Model-based requirements engineering} can support option exploration.
	%If a model exists of the requirements, and some automatic oracle can score model output,
	%then it is 
	%possible to conduct automatic trade studies. For each of
	%the $2^{W}$ scenarios, the constraints of that scenario,
	%are imposed on the  inputs to the requirements model  (and the other $N-W$ options are selected
	%at random).
	%Each of the $2^{W}$ scenarios are then ranked via their net effect
	%on (say) benefits and cost.  Such studies can find options in the trade space that
	%that human experts have overlooked.  For example: \bi \item Experts
	%may inappropriately focus on propulsion weight reduction; a better
	%solution may be to use smarter controlling algorithms, thus requiring
	%fewer thrusters.  \item A trade space analysis  might suggest
	%reducing the weight of a science payload by (say) smarter software
	%to control the remote sensing.  
	%\ei This ability to explore human blind-spots 
	%is a key feature of requirements optimization that adds value to
	%standard design practices.  


	%This paper shows that, at least for the case studies we explore here,   
%that three techniques let us achieve
%	runtimes
%	of $10^{-2}$.
%	This is a significant improvement (10,000 times faster)
%	 than  our prior
%	results in this area~\cite{fea02a} that used simulated annealing~\cite{kirkpatrick83}. 
%The three techniques are:
%	\be
%	\item Like many before us~\cite{shum94,mylo92,mylo99,maclean96,saaty80,akao90,corn01,feather08},
%	we advocate the use of {\em ultra-lightweight requirement models}.
%	Such ultra-lightweight models are the preferred options for very early life cycle
%	models when the details required for more elaborate models are not yet available.
%	Another advantage of such models is that, as shown in this article, there exists 
%	non-trivial 
%	ultra-lightweight models that can be processed fast enough
%	to support real-time requirements engineering.
%	\item
%	{\em Knowledge compilation} pushes some  percentage
%	of the query computational overhead into a compilation phase. This cost
%	is amortized over time as we increase the number of on-line queries.
%	Previous work with knowledge compilation has focused
%	on compilation languages utilizing CNF equations, state machines,
%	or BDD~\cite{darwiche02,biere99:bmc}.  These forms may be needlessly complex for ultra-lightweight models.
%	We show here that  a simple procedural
%	target language can suffice.
%	\item
%	{\em Keys-based search:} 
%	Many 
%	models contain a small number of {\em key variables} that set the remaining variables~\cite{Amarel86,me03l,kohavi97,williams03,crawford94}.
%	For models contain keys, the complexity of the searching the entire model reduces to just the complexity of searching
%	the key variables. The KEYS2 algorithm, proposed here, is a very fast method for finding the key variables.
%	\ee
%This
%	paper improves over an earlier publication~\cite{promise08} in three
%	significant ways: \be \item This paper reports a new version of
%	KEYS that runs two to four times faster than the previously reported
%	version.  \item Previously, we explored one search engine (KEYS);
%	this report compares that search engine against alternatives (KEYS2,
%	Simulated Annealing, A*, MaxWalkSat)\footnote{ KEYS2 evolved as
%	follows.  Clark and Menzies~\cite{clark05} proposed a support-based
%	Bayesian sampling method   to optimizing rule learning.
%	Jalali~\cite{promise08} speculated that a greedy hill-climbing
%	version of that sampling method could {\em replace}, rather than
%	merely {\em augment} rule learning.  This resulted in the original
%	KEYS algorithm~\cite{promise08}.  Gay and Menzies (this paper)
%	proposed an optimization to that process, called KEYS2, whereby the
%	algorithm starts out greedy but then takes progressive longer steps
%	into the search space.  }.  \item This paper takes care to ensure
%	that all our experiments repeatable and hence  and/or
%	refutable.  All of the models and algorithms required to repeat our
%	experiments are freely available on-line  under an open-source
%	license%\ee

	\section{Related Work} 
	\subsection{Other Requirements Tools}
	This work on real-time requirements analysis should not be interpreted as a rejection of other tools.
	There exist many powerful requirements analysis tools, including
	continuous simulation (also called system dynamics)~\cite{hamid91,sterman00},
	state-based simulation (including Petri net and data flow approaches)~\cite{akhavi93,harel90,martin00},
	hybrid simulation (combining discrete event simulation and systems dynamics)~\cite{martin01,donzelli01,setamanit07},
	logic-based and qualitative methods~\cite[chapter 20]{bratko01}~\cite{iwasaki89}, and rule-based
	simulations~\cite{mi90}.
	One can find these models being used in the requirements phase (e.g.,
	the DDP tool described below), design refactoring using patterns~\cite{france03}, software integration~\cite{denno03},
	model-based security~\cite{jurjens06}, and performance assessment~\cite{bals04}.
	Many researchers have proposed support environments to
	help explore the increasingly complex models that engineers are
	developing. Gray et al.~\cite{gray06} have developed the Constraint-Specification
	Aspect Weaver (C-Saw), which uses aspect-oriented approaches~\cite{filman04}
	to help engineers in the process of model transformation.
	Cai and Sullivan~\cite{cai05} describe a formal method and tool called
	{\em Simon} that ``supports interactive construction of formal models,
	derives and displays design structure matrices... and supports
	simple design impact analysis.''
	Other tools of note are lightweight formal methods such as ALLOY~\cite{jackson02}
	and SCR~\cite{heitmeyer02}, as well as various UML tools that allow
	for the execution of life-cycle specifications
	(e.g., the CADENA scenario editor~\cite{child06}).

	Many of the above tools were built to maximize the expressive power of the representation language or of the constraint language
	used to express invariants. What distinguishes our work is that we are
	{\em willing to trade off
	representational or constraint expressiveness
	for faster runtimes}.
	There exists a class of ultra-lightweight modeling languages which, as we show below, can be processed fast enough to support the
	real-time requirements engineering problem described above. Any of the tools listed in the last paragraph are also candidate
	solutions to the problem explored in this paper, if it can be shown that their processing terminates in hundredths of a second.

	\subsection{Other Optimizers}
	As documented in the search-based SE literature~\cite{clarke03,harman01,harman04,rela04}
and by Gu et al.~\cite{gu97},
there are many possible optimization methods.
	For example:
\bi
\item
	Sequential methods run on one CPU while parallel
	methods spread the work over a distributed CPU farm. 
\item
Discrete methods
	assume model variables have a finite range (that may be quite small)
	while continuous methods assume numeric values with a very large
	(possibly infinite) range.  
\item The search-based SE literature prefers meta-heuristic methods 
like simulated annealing, genetic algorithms and tabu search.
\item
Some methods map the discrete values true/false onto the numeric range 0/1
	and then use integer programming tools like CPLEX~\cite{mittle07}
	to achieve results.
\item
	Other methods find overlaps in goal expressions and generate a binary decision diagram (BDD)
	where parent nodes store the overlap of children nodes.
\ei
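To make the true/false to 0/1 mapping above concrete, consider a generic requirements-selection problem (an illustration only, not one of the models studied in this paper) phrased as a 0/1 integer program:
\[
\max \sum_{i=1}^{N} b_i x_i \quad \mbox{subject to} \quad \sum_{i=1}^{N} c_i x_i \leq C, \qquad x_i \in \{0,1\},
\]
where $x_i = 1$ denotes adopting option $i$, $b_i$ and $c_i$ are that option's (assumed) benefit and cost, and $C$ is the budget. Tools such as CPLEX accept exactly this form.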
	This list is hardly exhaustive: Gu et al. list hundreds of other methods and no
single paper can experiment with them all.
	All the algorithms studied here  are discrete and sequential.
	We are currently exploring parallel versions of our optimizers but, so far, the communication overhead
	outweighs the benefits of parallelism.
	As to using integer programming methods, 
	we do not explore them here for two reasons.
	First, Coarfa et
	al.~\cite{coarfa00} found that integer programming-based approaches
	ran two orders of magnitude slower than discrete methods like
	the 
	MaxWalkSat and KEYS2 algorithms we use below. Similar results were reported by Gu et al., where discrete
	methods ran 100 times faster than integer programming~\cite{gu97}.
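The discrete, sequential style of search typified by MaxWalkSat is simple to sketch. The following is a minimal illustration of the general strategy only (a random initial assignment, then a mix of random-walk and greedy variable flips over weighted clauses); it is not the exact implementation benchmarked in our experiments, and the clause encoding is an assumption:

```python
import random

def maxwalksat(clauses, weights, nvars, max_flips=10000, p=0.5, rng=random):
    """Sketch of MaxWalkSat-style weighted MAX-SAT local search.

    clauses: list of clauses; each clause is a list of signed ints
             (+i: variable i must be true, -i: variable i must be false).
    weights: one weight per clause.
    Returns the best assignment found and its cost
    (total weight of unsatisfied clauses).
    """
    def satisfied(clause, a):
        return any((lit > 0) == a[abs(lit)] for lit in clause)

    def cost(a):
        return sum(w for c, w in zip(clauses, weights) if not satisfied(c, a))

    # Random initial assignment over variables 1..nvars.
    a = {v: rng.random() < 0.5 for v in range(1, nvars + 1)}
    best, best_cost = dict(a), cost(a)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, a)]
        if not unsat:
            return a, 0                      # all clauses satisfied
        clause = rng.choice(unsat)
        if rng.random() < p:                 # random-walk move
            v = abs(rng.choice(clause))
        else:                                # greedy move: cheapest flip in the clause
            def flip_cost(v):
                a[v] = not a[v]
                c = cost(a)
                a[v] = not a[v]
                return c
            v = min((abs(lit) for lit in clause), key=flip_cost)
        a[v] = not a[v]
        if cost(a) < best_cost:
            best, best_cost = dict(a), cost(a)
    return best, best_cost
```

KEYS and KEYS2, discussed below, share this discrete, sequential character but search the space in a different way.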

Harman offers another reason to avoid integer programming methods.
In his search-based SE
manifesto, Harman~\cite{harman01} argues 
that many SE problems are
over-constrained, so there may exist no precise solution that
covers all constraints. 
A complete solution over all variables is hence impossible, and partial solutions based on 
heuristic search methods are preferred.
Such methods may not be complete; however, as Clarke et al. remark, ``...software engineers
face problems which consist, not in finding {\em the} solution,
but rather, in engineering an {\em acceptable} or 
{\em near-optimal solution} from a large number of alternatives''~\cite{clarke03}.

\subsection{Robust Solutions in SBSE}
\label{sec:other}
Another reason to avoid precise solutions that comment 
on all controllable variables is that
such solutions may be inappropriate for decision making in software projects.
In many cases, managers do not, or cannot, control all variables in a project:
\bi
\item
In any project, there is some expense associated with changing its current organization.
Hence, managers prefer smaller solutions to larger ones.
\item
In large projects, much of the development is performed by contractors and sub-contractors 
(and sub-sub-contractors). In such projects it 
is difficult
to monitor and control all process options. For example, in government contracts, it is common for
contractors to minimize the amount of contact they have with government personnel.
Such contractors
rigorously define a minimum set of monitoring and control points. Once that set is in place, it can be difficult 
(perhaps even impossible)
to change it.
\ei
When offering partial solutions, it is very important to also offer insight into the space of options
{\em around} the proposed solution. Such {\em neighborhood} information is very useful for managers
with only partial control over their projects, since it gives them confidence that even if only
some of the recommendations are implemented, the range of outcomes is still well understood.
Harman~\cite{harman07} comments that understanding the neighborhood of our solutions is an open and pressing issue in SBSE:
\begin{quote}
``In some software engineering applications, solution 
{\em robustness} may be as important as solution functionality. For 
example, it may be better to locate an area of the search 
space that is rich in fit solutions, rather than identifying an 
even better solution that is surrounded by a set of far less fit 
solutions. 

``Hitherto, research on SBSE has tended to focus on the 
production of the fittest possible results. However, many application areas require solutions in a search space that may 
be subject to change. This makes robustness a natural second order property to which the research community could 
and should turn its attention.''
\end{quote}
It is trivial for our preferred methods (KEYS and KEYS2) to offer such robust information
around partial solutions. As we shall see, robust neighborhood information cannot be
generated by other methods (e.g. simulated annealing) without extensive and time-consuming post-processing.
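Such neighborhood information can be approximated with a simple Monte Carlo probe: hold the decisions of a partial solution fixed, complete the remaining options at random many times, and summarize the outcomes. The sketch below assumes a hypothetical `score` function that evaluates one complete assignment (e.g. requirements attainment); it is illustrative only:

```python
import random

def neighborhood(score, fixed, free_vars, samples=1000, rng=random):
    """Probe the space of options *around* a partial solution.

    score:     hypothetical function mapping a complete {var: bool}
               assignment to a number (e.g. attainment).
    fixed:     the partial solution, e.g. {'m1': True, 'm7': False}.
    free_vars: the options the partial solution leaves open.
    Returns the median outcome and a non-parametric spread
    (75th percentile minus the median).
    """
    outcomes = []
    for _ in range(samples):
        a = dict(fixed)
        for v in free_vars:               # complete the solution at random
            a[v] = rng.random() < 0.5
        outcomes.append(score(a))
    outcomes.sort()
    median = outcomes[len(outcomes) // 2]
    q75 = outcomes[(3 * len(outcomes)) // 4]
    return median, q75 - median
```

A high median with a small spread marks a region that is ``rich in fit solutions'' in Harman's sense; a large spread warns that the partial solution is brittle.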

%As we shall see, 
%Keys-based searching also have another advantage. Many optimization methods  offer solutions
%that constrain all model options. Such {\em single point}
%offers little insight into the space of options
%around that solution. 
%Point solutions are inappropriate when managers do  
%not not have full control over which decisions are made, or when such control is prohibitively expensive. 
%Managers often seek the fewest interventions required to control their projects:
%\bi
%\item
%Smaller sets of decisions make fewer changes to a project than larger solutions and so may be cheaper and quicker to implement.
%\item
%For larger projects involving sub-contractors, such minimal sets
%can be used to define a contracts most informative set of monitoring and control points.
%\ei
%Software planners need minimal solutions (with fewest interventions). They also need to know the region of effects {\em around} their
%proposed decision. As we shall see, the KEYS2 algorithm does not offer point solutions. Rather, 
%it can very quickly describe the space of effects around proposed solutions.
%
%
%Bo
%This paper describes experiments in real-time trade space analysis of software requirements
%that are   effective, non-brittle, and reveal redundancies \& stability.
%Using a  new algorithm, 
%KEYS2, we can perform this analysis in $10^{-2}$ seconds; i.e. fast enough for real-time  trade space analysis. 
%We show that KEYS2 runs faster and identifies simpler solutions that other standard methods
%that are widely advocated such as simulated annealing, MaxWalkSat, and ASTAR.
%
%Note that we do not preclude additional ``slow time'' analyses augmenting real time analysis. 
%
%Requirements engineering decisions are vital to the success of a software project. In the usual case, there are many possible decisions. These space of options
%forms a ``trade space'' that
%Brantley et al.~\cite{brantley02} define as follows:
%\begin{quote}
%... the set of program and system parameters, attributes, and characteristics required to satisfy performance standards. Decision makers define and refine the developing system by making tradeoffs with regard to cost, schedule, risk, and performance; all of which fall within the systems trade space. 
%\end{quote}
%They argues for more use of trade space analysis, earlier in a project lifecycle:
%\begin{quote}
%Traditional trade space methodologies often support "stove-piped" programs that conduct multiple distributed but disparate analyses.
%However, by using models and simulations  throughout the system's life cycle, {\em particularly within the requirements generation phase}, 
%it is possible to expand the number of feasible alternatives within the trade space analysis.
%\end{quote}
%%Ideally, a trade space analysis returns not just one solution, but
%%also knowledge of the neighborhood of that solution. We say that a bad solution to  a trade space analysis is
%%\bd
%%\item[a) ``ineffective'']:  the solution achieves too few goals of the system;
%%\item[b) ``brittle'']:   small changes in the decisions lead to major changes in attainment.
%%\item[c) ``redundant'']:  the same solution can be achieved using just X\% of the decisions.
%%\item[d) ``unstable'']: the value of that  solution varies wildly according to other factors ignored by the X\% solution.
%%\item[e) ``delayed'']: the trade space analysis should not be the bottleneck in requirements discussions.
%%\ed
%%The last point requires some elaboration.
%\subsection{Contributions of this Work}
%The contribution of this paper is five-fold.
%Firstly, we show that a trade space analysis can be very fast for
%certain kinds of requirement models.
%
%Secondly, we propose a simple but powerful visualization for handling trade space analysis is multi-dimensional space.
%As shown in the next section,
%even advanced visualizations can only handle seven dimensions. Our ``rank chart'', on the other hand,
%can handle many more.
%
%Thirdly,
%we demonstrate the value of the ``keys effect'' for trade space analysis.
%As shown below, many 
%models contain a small number of {\em key variables} that set the remaining variables.
%When models contain keys, the complexity of the searching the entire model reduces to just the complexity of searching
%the master variables. Our KEYS and KEYS2 algorithms
%are both the {\em test} for the presence of key variables as well as a method to {\em exploit} such variables.
%KEYS and KEYS2 are deliberately designed to {\em fail} if a model does not contain keys.
%Hence, one way to test for the presence of key variables is to apply algorithms like KEYS and KEYS2. 
%Once we know a model contains keys, then we can exploit that knowledge for (e.g.) trade space analysis since, by restricting
%the analysis to just the key variables, the major influences on the trade space can be readily isolated.
%
%Fourthly, we advice an extension to search-based software engineering
%(SBSE) paradigm of Harman et.al.~\cite{harman04}.  The standard
%SBSE toolkit is some combination of simulated annealing and other
%meta-heuristic search engines such as genetic algorithms, simulated
%annealing, and tabu search.  Inspired by the search-based SE
%literature, our initial experiments with requirements optimization
%used simulated annealing. As shown above, this simulated annealing
%algorithm is out-performed by every other algorithm studied here.
%Perhaps researchers in SBSE might  it advantageous to look beyond
%standard search algorithms.
%
%Fifthly, we offer a methodological lesson that it can be valuable
%to explore a domain with very simple tools.  A student of the
%research literature might conclude that it is better to use someone
%else's search tool for their problem.  Such tools may be  the results
%of years of development and come complete with elaborate tool suites
%and a large on-line community of developers who can support the
%newcomer.  Nevertheless, there is a case for trying  new tools.  In
%his textbook on empirical methods, Cohen~\cite{cohen95}  advises that supposedly
%more sophisticated methods should be benchmarked against ``straw
%men''; i.e. seemingly naive alternate implementations.  In our
%experience, we have sometimes found that the ``straw men do not
%burn''; i.e. that very simple methods like KEYS2 perform remarkably
%well. 
%
%\section{Visualizing Trade Space}
%
%One method of  trade space analysis is to check for
%output performance curve changes results from input changes.
%For continuous linear models, this can be just  computing partial
%derivatives around the region of some proposed solution~\cite{saltelli00}. For non-continuous
%discrete models, this is not possible. Discrete models contain ``cliffs'' (discontinuities) 
%where the output performance can change dramatically.
%
%A standard method for understanding discontinuous model outputs is visualization~\cite{treinish98,lee04}.
%\begin{figure}[!t]
%\centerline{\includegraphics[width=3.5in]{mat_den_SD.pdf}}
%\caption{A visualization tool for scientific data. From~\cite{treinish98}.}
%\label{fig:treinish}
%\end{figure}
%Many researchers have developed impressive visual environments for
%decreasing the cognitive overload associated with exploring
%a multi-dimensional space. For example, \fig{treinish} shows a tool
%developed at IBM that augments a standard three-dimensional display
%with visual cues relating to higher-dimensional data; e.g.,
%supplementary dimensions are shown bottom right; blue circles around
%the axle show circular motion information; and a color key, shown
%bottom left, indicates how colors in  the display relate to density
%information~\cite{treinish98}. 
%While such visualizations help analysts
%explore visual information, they present their own challenges.
%There are still
%limits on how many dimensions can be displayed-  we have yet
%to see effective visualizations for more than a ten dimensionality space.
%
%Recently~\cite{promise08,me07f}, we have been exploring an alternative representation called``rank charts'' that can summarize the effects
%of dozens to hundreds of variables on some output goal.
%To generate a rank chart, we 
%start with  a requirements model with inputs $I_1....I_m$.
%The model produces outputs 
%$O_1,...O_n$ which an oracle can score \[attainment=oracle(O_1,...O_n)\] 
%We suppose that some algorithm (the details of which are discussed below) can  rank the input
%decisions  $I$ into a new sort
%$I'_1,...,I'_m$ where decision $I'_{i}$ has more impact on the outputs than $I'_{j>i}$.
%Using this oracle, we can draw  rank charts like \fig{egdat} as follows:
%\be
%\item Select the first $x$ items in $I'$ (i.e. the $x$ top-ranked inputs).
%\item Conduct $N$ simulations of the model selecting inputs as follows: (a) use the first $x$ decisions;
%(b)  select the remaining inputs at random. 
%\item  Collect statistics on the attainment including both the central value and the variance around that central value.
%\item Repeat the above for
%$1 \le x \le m$. Graph the results 
%\ee
%The $median$ curve of \fig{egdat} displays the 50\% percentile of the $attainments$ found at each $x$ point.
%The $spread$ curve is a 
%non-parametric
%measure of variance  that is the 75\%-50\% percentile interval.
%As we make more decisions (i.e. move right on the x-axis), the variance decreases since we are adding more constraints
%to the model inputs.
%
%\begin{figure}[!t]
%\begin{center}
%\includegraphics[width=2in]{egdat.pdf}
%\end{center}
%\caption{A rank chart representation of trade space. 
%The x-axis shows the space of options.
%The y-axis shows the x-axis shows attainments seen in $N$ simulations  after fixing all the 
%top $x$ top ranked decisions (i.e. all the decisions from 1 to $x$);
%then(b) selecting the other decisions at random.}\label{fig:egdat}
%\end{figure}
%Rank charts address the problem of
%{\em point decisions}.
%A point decision is a single solution that offers no insight into the space of options
%around that decision. 
%Simple optimization methods yield such point decisions if
%they fully constrain all controllable variables.
%Point solutions are inappropriate when managers do  
%not not have full control over which decisions are made, or when such control is prohibitively expensive. 
%Managers often seek the fewest interventions required to control their projects:
%\bi
%\item
%Smaller sets of decisions make fewer changes to a project than larger solutions and so may be cheaper and quicker to implement.
%\item
%For larger projects involving sub-contractors, such minimal sets
%can be used to define a contracts most informative set of monitoring and control points.
%\ei
%Software planners need minimal solutions (with fewest interventions). They also need to know the region of effects {\em around} their
%proposed decision.
%Rank charts offer that information- when assessing the merits
%of some solution at some point $X$ on a rank charts, a manager can
%readily see the effects of implementing twice, or half of that solution by looking 
%at the points  $2X$ or $\frac{X}{2}$ 
%
%Rank charts also let us comment on several important features of a trade space analysis.
%A bad trade space analysis offers solutions that are:
%\bd
%\item[a) brittle]:   Point decisions are particularly problematic for ``brittle'' solutions where small changes
%to the solution has a large impact on attainment.
%Brittle solutions can be detected in rank charts when 
% the median decisions-vs-attainment plot exhibits sudden increasing and decreasing cliffs.
%\item[b) ineffective]:   i.e. achieves poor median performance.
%\item[c) unstable]: i.e. the variance around the median is very large.
%\item[d) redundant]:  i.e. non-minimal. If the median decisions-vs-attainment plot plateaus, then all decisions after the start
%of the plateau are redundant; e.g.  for \fig{egdat}, decisions 41..100 are redundant.
%\ed
%The rest of this paper describes algorithms for the real time generation of generating 
%rank charts for finding trade space solutions that are 
%effective, non-brittle, succinct (i.e. non-redundant) and stable.
%Before describing the algorithms, the next  section describes the models the algorithms process.
%
%
%
%Model-based design is a well-accepted software engineering technique.
%According to Sendall and Kozacaynski, ``using concepts closer
%to the problem domain ...'' via modeling~\cite{sendall03} can lead
%to increased productivity and reduced time-to-market. 
%Hailpern and Tarr argue that  model-based design ``imposes structure and common
%vocabularies so that artifacts are useful for their main purpose
%in their particular stage in the life cycle''~\cite{hailpern06}.
%Model-based Software Engineering has become the norm at large
%organization such as Microsoft~\cite{greenfield04}; Lockheed Martin
%~\cite{waddington03}; and the Object Management Group~\cite{brown06}.
%
%Such models reveal important combinations of options that might otherwise be missed. 
%For example, at NASA, design teams use ``Defect Detection and Prevention'' (DDP) ~\cite{corn01,feather08} models 
%to find the {\em least} costly project options that {\em most} increase the chance to obtain more requirements. 
%DDP is used by teams of experts to display their (a)~current understanding of requirements;
%(b)~the known risks to requirements; as well as (c) the mitigations that might reduce the risks.  A typical DDP session lasts
%hours to days. A dozen (or more) experts meet in a large room. In that room, experts
%either sit together for
%large group discussions or break apart into small groups around the room.
%In order to assure cohesiveness between all the experts and all the groups, a DDP model showing the total
%space of options 
%is continuously updated and displayed on a screen at one end of the  room. 
%For more details on DDP, see \S2.
%
%Often, it is not possible for human experts to debate all $N$ options
%within a DDP model. Many models contain dozens to hundreds of options
%and human beings can not rigorously debate all $2^N$ combinations
%of those options.  Typically, humans focus on $W \ll N$ ``what-if''
%scenarios. For example, at JPL, it is common in  DDP sessions for
%experts to focus on (say) $W=2..10$ binary choices within a total
%space of $N > 100$ choices.  Note that this introduces a ``blind-spot''
%into the requirements analysis of the $N-W$ issues that is not included in the experts' debates.
%
%In order to probe those blind spots, we conduct optimization studies.
%For each of the $2^W$ scenarios, we impose the constraints
%of that scenario, then use an automatic optimizer to explore the
%other $N-W$ options. This exploration proposes settings to the other
%options that most increase the benefits of requirements coverage
%while minimizing their associated cost, for more details on that
%process, see \S2. Each of the $2^W$ scenarios are then ranked via
%their net effect on benefits and cost. 
%In an interesting number of cases, our optimization studies
%offer recommendations  from the space of $N-W$ options in the model
%that the human experts have overlooked.  For example: \bi \item
%Experts may inappropriately focus on propulsion weight reduction;
%a better solution may be to use smarter controlling algorithms, thus
%requiring fewer thrusters.  \item DDP optimization might suggest
%reducing the weight of a science payload by (say)
%smarter software to control the remote sensing, thus
%requiring fewer instruments.  \ei This ability to explore human
%blind-spots in requirements models is a key feature of DDP that
%adds value to standard design practices.
%
%To best support this blind-spot exploration, our DDP optimization should run fast enough that humans can
%get feedback on their models in a timely manner. 
%Previously, we have proposed various technologies for DDP optimization~\cite{feather08,fea02a}. 
%For example, after a Monte Carlo run of 1000 randomly selected options,
%rule learning can be used to identify which options were most associated with lower cost and higher attainment of
%requirements~\cite{fea02a}. The constraints offered by these rules can then be imposed on the DDP models, followed by another
%round of
%Monte Carlo simulation and rule learning.
%For a DDP model with 100 variables,
%it takes $\approx 300$ seconds before the rig terminates (i.e. no improvement is observed
%in round $i+1$ after round $i$).
%
%The premise of this paper is that our previous solutions to the  DDP optimization problem are  too slow.
%Ideally, we wish for requirements optimization to occur in ``real-time''; i.e. 
%to offer advice before an expert's attention wanders to other issues.
%Neilson~\cite[chp5]{neilson93} reports that ``the basic advice regarding response times has 
%been about the same for thirty years; i.e.
%\bi
%\item
%1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay
%\item
%10 seconds is the limit for keeping a user's attention focused on the dialogue.'' 
%\ei
%
%The rest of this paper describes our progress towards achieving these runtimes:
%\bi
%\item
%After a discussion of early life cycle ultra-lightweight modeling  (such as DDP),
%we present various search-based software engineering techniques to optimize DDP models.
%\item
%Our new KEYS2 algorithm will be discussed as well as various experiments that were conducted
%to  compare these techniques. 
%\ei
%In summary, KEYS2 is running in the order of $10^{-3}$ seconds, which
%makes it a candidate solution for real-time requirements optimization
%over $2^W$ scenarios of $W \le 10$.  However,
%further optimizations are desirable and our future work section
%offers some promising directions.
%
%%Accordingly, we are concerned with finding {\em faster} techniques that meet or exceed the solution requirements for these models. 
%%Nielson found that users are most productive when response time is less than one second and 
%%that they will stop paying attention if those response times are longer than ten seconds ~\cite{nielson97}.
%
%
%
%%We seek a tool that will fall in that ideal response-time window.  
%%Our theory is that by exploiting certain {\em key} variables, we may surpass the computational barrier. 
%%Setting these key variables reduces the option space in larger models to a more manageable size. 
%%Several of the techniques that we explored, including KEYS and ASTAR, represent massive improvements in the time taken to find solutions. 
%
%convergence before comparison
%XXX make the pseudo code line up with e bullet points
\section{Ultra-lightweight Requirements Models}

In early life cycle requirements engineering,
software engineers often pepper their discussions with
numerous hastily-drawn
sketches.
Kremer argues convincingly that such visual forms
have numerous advantages for acquiring knowledge
\cite{kremer98}.
Other researchers take a similar view: ``Pictures are superior to texts in a sense that they are abstract,
     instantly comprehensible, and universal''~\cite{hirakawa94}.

Normally, hastily-drawn sketches
are viewed as precursors to other, more precise
modeling techniques that support further analysis. Such
further
formalization may not always be practical.
The time and expertise required to build formal models
does not accommodate the pace with which humans tend to revise
their ideas.

Also, such further
formalization may not be advisable.
Goel~\cite{goel92} studied the effects of
diagram formalism on system
design.  In an {\em ill-structured diagram} (such as a freehand sketch using
pencil and paper), the denotation and type of each visual element is
ambiguous.  In a {\em well-structured diagram} (such as those created using MacDraw),
each visual element clearly denotes one thing of one class only.  Goel
found that ill-structured tools generated more design
variants --- more drawings, more ideas, more re-use of old ideas --- than
well-structured
tools. 

The value of ultra-lightweight ontologies
in early life cycle modeling
is widely recognized, as witnessed by the numerous frameworks that support them.
For example:
\bi
\item
Mylopoulos' {\em soft-goal} graphs~\cite{mylo92,mylo99}
represent
knowledge about non-functional requirements.
Primitives in soft goal modeling include statements of partial
influence such as {\em helps} and {\em hurts}.
\item  A common framework in the design rationale
community is a  ``questions-options-criteria'' (QOC)
graph~\cite{shum94}. In  QOC
graphs:
\bi
\item {\em Questions} suggest {\em options}.
 Deciding on one option can
raise other questions;   
\item Options   shown  in  a  box denote  {\em selected
options};  
\item
Options are assessed  by  {\em criteria}; 
\item Criteria are gradual
knowledge;  i.e. they {\em  tend to support}  or {\em  tend to reject}
options.  
\ei
QOCs   can succinctly  summarize   lengthy debates; e.g. the 480
sentences  uttered in  a  debate on  interface
options  can be    displayed    in  a   QOC   graph  on    a    single
page~\cite{maclean96}.
Saaty's Analytic Hierarchy Process (AHP)~\cite{saaty80} is a variant of QOC.
\item Another widely used framework is Quality
Function Deployment (QFD) diagrams~\cite{akao90}.
Whereas AHP and QOC propagate influences over hierarchies, QFD (and DDP) propagate influences over matrices.
\ei
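The matrix-style propagation of QFD (and DDP) can be illustrated with a short sketch. Here the per-requirement weights and the requirements-by-options relationship matrix are hypothetical inputs; each option is scored by its weighted column sum:

```python
def propagate(weights, matrix):
    """QFD-style influence propagation (sketch).

    weights: one importance weight per requirement (rows).
    matrix:  matrix[r][c] is the strength with which option c
             supports requirement r.
    Returns one score per option (the weighted column sums).
    """
    ncols = len(matrix[0])
    return [sum(weights[r] * matrix[r][c] for r in range(len(weights)))
            for c in range(ncols)]
```

For example, `propagate([2, 1], [[1, 0], [0, 3]])` scores the second option higher (3 versus 2), since the heavily dependent second requirement relies on it.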

Based on the above, Cornford and Feather designed the 
``Defect Detection and Prevention'' (DDP) tool~\cite{corn01,feather08}. DDP
provides a succinct ontology for representing this design process.
DDP models represent the goals, risks, and
risk-removing mitigations that belong to a specific project.  
Users of DDP explore combinations of mitigations that cost the least
and support the greatest number of requirements.

\begin{figure}[!t]
~\hrule~\\

DDP assertions are one of the following:

\bi
\item \textit{Requirements} (free text): the objectives
and constraints of the mission and its development process; 
\item \textit{Weights} (numbers): associated with requirements, reflecting
their relative importance; 
\item \textit{Risks} (free text): events that damage requirements; 
\item \textit{Mitigations} (free text): actions that reduce risks; 
\item \textit{Costs} (numbers): the effort associated with mitigations, and the repair costs for
correcting risks detected by mitigations; 
\item \textit{Mappings}: directed edges between
requirements, mitigations, and risks that capture quantitative relationships among them;
\item \textit{Part-of relations}: structure the collections of requirements, risks and mitigations.
\ei

~\hrule~
\caption{DDP's ontology}
\label{ddpont}
\end{figure}
\begin{figure}
{\scriptsize
\begin{tabular}{p{\linewidth}}\hline
\be
\item Requirement goals:
\bi
\item
Spacecraft ground-based testing \& flight problem monitoring
\item
 Spacecraft experiments with on-board Intelligent Systems Health Management (ISHM)
\ei
\item Risks:
	\bi
	\item Obstacles to spacecraft ground-based testing \& flight problem monitoring
			\bi
			\item
			Customer has no, or insufficient, money available for our use
			\item
			Difficulty of building the models / design tools
			\ei
	\item ISHM Experiment is a failure (without necessarily causing flight failure)
	\item Usability, User/Recipient-system interfaces undefined
	\item V\&V (certification path) untried and scope unknown
	\item Obstacles to Spacecraft experiments with on-board ISHM
	\bi
	\item Bug tracking / fixes / configuration management issues, Managing revisions and upgrades (multi-center tech. development issue)
	\item Concern about our technology interfering with in-flight mission
	\ei
	\ei
\item Mitigations:
	\bi
	\item  Mission-specific actions
		\bi
		\item Spacecraft ground-based testing \& flight problem monitoring
		\item  Become a team member on the operations team
		\item Use Bugzilla and CVS
		\ei
	\item  Spacecraft experiments with on-board ISHM
		\bi
		\item Become a team member on the operations team
		\item Utilize xyz's experience and guidance with certification of his technology
		\ei
	\ei
\ee\\\hline
\end{tabular}}
\caption{Sample DDP requirements, risks, mitigations.}\label{fig:ddpegeg}
\end{figure}
%DDP is based on the following principles:
%\bi
%\item {\em Earlier is Better:} The case for ``earlier is better'' is well established. Fark et al. ~\cite{fa07} noted that ``80\% of all design decisions are taken in the first 20\% of the design time.'' Design tools that do not operate early in the life cycle can only affect a small portion of the remaining decisions.
%Not only is earlier design critical, but rapid improvement of those designs is also critical. 
%Fixing defect costs exponentially more the longer those defects remain in the system. Boehm and Papaccioa advise that reworking software
%is cheaper earlier in the life cycle ``by factors of 50 to 200.'' ~\cite{boehm88}
%\item {\em Use Risk-driven SE:} In risk-driven SE ~\cite{boehm86}, there are three primary modeling constructs. The engineering {\em requirements} are mapped to a series of {\em risks},  a representation of any of the 
%factors that could prevent a goal from being reached. 
%{\em Mitigations} are measures that may be implemented in order to alleviate, or at least reduce, risks. 
%Hence, our DDP tool is a visual editor of a model with node types reflecting these three constructs. During interactive sessions, DDP projects its model on a screen where teams and designers review and extend the network.
%This information can influence the entire design process and may lead to a more efficient development cycle.
%\item {\em Use Value-based SE:} DDP is also a {\em value-based SE} ~\cite{bi05} tool where the {\em benefits} of a design variant are weighed against their {\em cost}. Each mitigation in DDP has an associated cost so that DDP can 
%guide and optimize decision making. For example, on a fixed budget, DDP can find the cheapest mitigations that retire the most risks and hence
%allow for the attainment of the most requirements. 
%DDP can repeat this process for different cost levels to generate a cost-benefit trade-space diagram.  
%\ei 
DDP is used as follows.
A dozen or more experts
gather for short, intensive knowledge
acquisition sessions (typically, three to four half-day sessions). These
sessions \textit{must} be short since it is hard to gather
these experts for more than a very short period of time. 
The DDP tool supports a graphical interface for the rapid entry of
assertions. Such rapid entry is essential: time is crucial and no tool should slow the debate. 
Assertions from the experts are expressed using the
 ontology of Figure~\ref{ddpont}.
The ontology must be lightweight since
only high-level assertions can be collected in short knowledge acquisition
sessions. 
If the assertions get more elaborate, then experts
may be unable to understand technical arguments from outside their
own field of expertise. 
These sessions generate a set of requirements, risks, and candidate mitigations (e.g. see \fig{ddpegeg})
linked together into an intricate network (e.g. see
\fig{DDPexample}).


\begin{figure}
\begin{center}
\includegraphics[angle=90, height=6in,width=5in]{fig2new.png}
\end{center}
\caption{An example of a model formed by the DDP tool. Red lines connect 
risks (middle) to requirements (left). Green lines connect mitigations (right)
to the risks.} 
\label{fig:DDPexample}
\end{figure}

Sometimes, we are asked if requirements optimization is a real
problem. The usual question is something like: ``With these
ultra-lightweight languages, aren't all open issues just obvious?''
Such a question is usually informed by the small model fragments
that appear in the ultra-lightweight modeling literature. Those
sample model fragments are typically selected for their
ability to fit on a page or to succinctly illustrate a point
the authors wish to make. Real world ultra-lightweight models can be much more
complex, paradoxically perhaps due to their simplicity: if a model
is easy to write then it is easy to write a lot of it.  \fig{DDPexample},
for example, was generated in under a week by four people discussing
one project.  It is complex and densely-connected (a close inspection
of the left and right hand sides of \fig{DDPexample} reveals the
requirements and fault trees that inter-connect concepts in this
model) and it is, by no means, the biggest or most complex DDP model
that has ever been built. 
Given the size of DDP
models, and the rich set of connections between concepts, requirements optimization defeats
manual analysis. Our task is to provide automatic (and rapid) support for this 
problem of {\em least cost, max effect mitigation selection}.


We base our experimentation around DDP for three reasons.
First,
one potential drawback with ultra-lightweight models is that they may be so lightweight that they contain no useful
information. DDP's models are demonstrably useful; that is, we are optimizing a real-world problem of genuine value.
Clear project improvements have been seen from DDP sessions at JPL. Cost savings in at least two sessions have exceeded \$1 million, while savings of over \$100,000 have resulted in others. Cost savings are not the only benefits of these DDP sessions. 
Numerous design improvements such as savings of power or mass have come out of DDP sessions.
Likewise, risks have shifted from uncertain architectural ones to more predictable and manageable ones.
At some of these meetings, non-obvious significant risks have been identified and subsequently mitigated. 

Our second reason to use DDP is that we
can access numerous large real-world requirements models
written in this format, both now and in the future.
The DDP tool can be used to document not just final decisions but also to review the rationale that led to those decisions.
Hence, DDP remains in use at JPL:
not only for its original purpose (group decision support),
but also as a design rationale tool to document decisions. 
Recent DDP sessions included:
\bi
\item An identification of the challenges of intelligent systems health management (ISHM) technology maturation
(to  determine the most cost-effective approach to achieving maturation)~\cite{feather08b};
\item A study on the selection and planning of deployment of prototype software~\cite{feather08c}.
\ei

Our third, and most important, reason to use DDP in our research is that the
tool is representative of other requirements modeling tools in widespread use.
At its core, DDP is a set of influences expressed in a hierarchy, augmented with the occasional equation. Edges
in the hierarchy have weights that strengthen or weaken influences that flow along those edges. At this level of abstraction, 
DDP
is just another form of QOC or QFD, or a quantitative variant of Mylopoulos's qualitative soft goal graphs.
In future work, we will extend our current implementation to those systems.

\section{Knowledge Compilation}
\begin{figure}
~\hrule~
{\scriptsize
\begin{verbatim}
#include "model.h"

#define M_COUNT 2
#define O_COUNT 3
#define R_COUNT 2

struct ddpStruct
{
    float oWeight[O_COUNT+1];
    float oAttainment[O_COUNT+1];
    float oAtRiskProp[O_COUNT+1];
    float rAPL[R_COUNT+1];
    float rLikelihood[R_COUNT+1];
    float mCost[M_COUNT+1];
    float roImpact[R_COUNT+1][O_COUNT+1];
    float mrEffect[M_COUNT+1][R_COUNT+1];
};

ddpStruct *ddpData;

void setupModel(void)
{
    ddpData = (ddpStruct *) malloc(sizeof(ddpStruct));
    ddpData->mCost[1]=11;
    ddpData->mCost[2]=22;
    ddpData->rAPL[1]=1;
    ddpData->rAPL[2]=1;
    ddpData->oWeight[1]=1;
    ddpData->oWeight[2]=2;
    ddpData->oWeight[3]=3;
    ddpData->roImpact[1][1] = 0.1;
    ddpData->roImpact[1][2] = 0.3;
    ddpData->roImpact[2][1] = 0.2;
    ddpData->mrEffect[1][1] = 0.9;
    ddpData->mrEffect[1][2] = 0.3;
    ddpData->mrEffect[2][1] = 0.4;
}

void model(float *cost, float *att, float m[])
{
    float costTotal, attTotal;
    ddpData->rLikelihood[1] = ddpData->rAPL[1] * (1 - m[1] * ddpData->mrEffect[1][1]) 
        			* (1 - m[2] * ddpData->mrEffect[2][1]);
    ddpData->rLikelihood[2] = ddpData->rAPL[2] * (1 - m[1] * ddpData->mrEffect[1][2]);
    ddpData->oAtRiskProp[1] = (ddpData->rLikelihood[1] * ddpData->roImpact[1][1]) 
        			+ (ddpData->rLikelihood[2] * ddpData->roImpact[2][1]);
    ddpData->oAtRiskProp[2] = (ddpData->rLikelihood[1] * ddpData->roImpact[1][2]);
    ddpData->oAtRiskProp[3] = 0;
    ddpData->oAttainment[1] = ddpData->oWeight[1] * (1 - minValue(1, ddpData->oAtRiskProp[1]));
    ddpData->oAttainment[2] = ddpData->oWeight[2] * (1 - minValue(1, ddpData->oAtRiskProp[2]));
    ddpData->oAttainment[3] = ddpData->oWeight[3] * (1 - minValue(1, ddpData->oAtRiskProp[3]));
    attTotal = ddpData->oAttainment[1] + ddpData->oAttainment[2] + ddpData->oAttainment[3];
    costTotal = m[1] * ddpData->mCost[1] + m[2] * ddpData->mCost[2];

    *cost = costTotal;
    *att = attTotal; 
}
\end{verbatim}}
~\hrule~
\caption{A trivial DDP model after knowledge compilation}
\label{fig:ddpExample}
\end{figure}
A simple {\em knowledge compiler} exports the DDP models into a form accessible by our search engines.
Knowledge compilation translates information from a source model into a target language that supports fast inference.
The compiled models can then be used to rapidly answer a large number of
queries while the main program is running~\cite{selman96knowledge,darwiche02}.

Knowledge compilation pushes some  percentage
of the computational overhead into the compilation phase. This cost
is amortized over time as we increase the number of on-line queries.
This is a very useful feature in our application since
some of the algorithms that we use make thousands of calls to these
DDP models. 
Previous work with knowledge compilation has focused
on compilation languages utilizing CNF equations, state machines,
or BDD~\cite{darwiche02,biere99:bmc}.  For this work, a procedural
target language was sufficient. Hence, the DDP models were compiled
into a structure used by the C programming language.  

%XXX Converting those models to the pre-compiled C structures provided a 130x increase in speed.

Our knowledge compilation process stores a flattened form of the DDP requirements tree. In standard DDP:
\bi
\item 
Requirements form a tree;
\item
The relative influence of each leaf requirement is computed via
a depth-first search from the root down to the leaves;
\item
This computation is repeated each time the relative influence of a requirement is needed.
\ei
In our compiled form, the computation is performed once 
and added as a constant to each reference of the requirement.
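The flattening step can be sketched as follows (our illustration; the type and function names are ours, not DDP's): a depth-first walk multiplies edge weights from the root down, so each leaf's relative influence becomes a constant that the compiled model references directly.

```c
#include <assert.h>

/* Hypothetical sketch of the flattening step: a depth-first walk
   multiplies weights from root to leaf once, so each leaf's relative
   influence becomes a constant in the compiled model. */
typedef struct Req {
    float weight;          /* weight on the edge from this node's parent */
    int nKids;
    struct Req *kids[4];
    float influence;       /* filled in by flatten(): product of weights */
} Req;

void flatten(Req *r, float acc) {
    r->influence = acc * r->weight;
    for (int i = 0; i < r->nKids; i++)
        flatten(r->kids[i], r->influence);
}

/* Tiny demo: root (weight 1.0) -> child (0.5) -> leaf (0.3). */
float demoLeafInfluence(void) {
    Req leaf  = {0.3f, 0, {0}, 0};
    Req child = {0.5f, 1, {&leaf}, 0};
    Req root  = {1.0f, 1, {&child}, 0};
    flatten(&root, 1.0f);
    return leaf.influence;    /* 1.0 * 0.5 * 0.3 = 0.15 */
}
```

In standard DDP this product is recomputed on every query; after compilation, only the final constants survive.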

For example, here is a trivial DDP model
where  {\tt mitigation1} costs \$10,000 to apply and each requirement is of equal
value (100): 
\[\overbrace{mitigation1}^{\$10,000} \underbrace{\rightarrow}_{0.9} risk1 \rightarrow
\left< \begin{array}{c} \overbrace{\rightarrow}^{0.1} (requirement1=100) \\ 
\underbrace{\rightarrow}_{0.99} (requirement2=100) \end{array} \right.\]
(The other numbers show the impact of mitigations on risks, and the impact of risks
on requirements).
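To make the semantics concrete, here is the arithmetic of this trivial model written out by hand (our sketch, mirroring the semantics of the generated {\tt model} function; the function name is ours): applying {\tt mitigation1} scales the risk's a-priori likelihood by $(1 - 0.9)$, and each requirement's attainment is its value times one minus likelihood times impact.

```c
/* Hand-computed semantics of the trivial model above: m1 is 1 if
   mitigation1 is applied, else 0. Risk likelihood starts at 1. */
float attainment(int m1) {
    float likelihood = 1.0f * (1.0f - m1 * 0.9f);
    float att1 = 100.0f * (1.0f - likelihood * 0.10f);  /* requirement1 */
    float att2 = 100.0f * (1.0f - likelihood * 0.99f);  /* requirement2 */
    return att1 + att2;
}
```

Without the mitigation, attainment is $90 + 1 = 91$ at zero cost; with it, attainment rises to $99 + 90.1 = 189.1$ at a cost of \$10,000.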

The knowledge compiler converts this trivial DDP model into {\tt setupModel} and {\tt model} functions 
similar to those in \fig{ddpExample}.
The {\tt setupModel} function is called only once and sets several constant values. 
The {\tt model} function is called whenever new cost and attainment values are needed.
The topology of the mitigation network is represented as terms in equations within these functions. 
As our models grow more complex, so do these equations. For example,
our biggest model, which contains 99 mitigations, generates 1427 lines of code.
\fig{ddpModels} compares this biggest model to four other real-world DDP models.

\begin{figure}[!t]
\small
\begin{center}
\begin{tabular}{|c|r|r|r|c|} \hline
Model&LOC&Objectives&Risks&Mitigations\\ \hline
model1.c&  55  &3         &2    &2          \\ \hline
model2.c& 272   &1         &30   &31         \\ \hline
model3.c& 72   &3         &2    &3         \\ \hline
model4.c& 1241   &50        &31   &58         \\ \hline
model5.c&1427    &32        &70   &99         \\ \hline
\end{tabular}
\end{center}
\caption{Details of Five DDP Models. }
\label{fig:ddpModels}
\end{figure}

While its runtime grows with model size, knowledge compilation is not the bottleneck
in our ability to handle real-time requirements optimization. Currently,
it takes
about two seconds to compile a model with 50 requirements, 31 risks, and 58 mitigations.
Note that:
\bi
\item This compilation only has to happen once, after which we will run our $2^W$ what-if scenarios.
\item
These runtimes come from an unoptimized Visual Basic implementation which could certainly
be sped up significantly.
\item
Usually, experts change a small portion of the model then run $2^W$ what-if scenarios to
understand the impact of that change. An incremental knowledge compiler
(that only updates changed portions) would run much faster than a full compilation of an
entire DDP model.
\ei

Initially, it was hoped that knowledge compilation by itself would suffice for real-time requirements
optimization.
However, on experimentation, we found that it reduced our runtimes by only one to two orders of magnitude.
While an impressive speed-up, this is still two orders of magnitude too slow
to achieve the $10^{-2}$
seconds response time required for real-time requirements optimization of $2^W$ ($W\le 10$)
what-if scenarios. Hence, in this paper, we combined knowledge compilation with AI search methods.


Note that knowledge compilation is useful for more than just runtime optimization.
Without it, the collaborative research that led to this paper would
have been impossible.  Knowledge compilation allows JPL to retain
their proprietary information while at the same time, allowing
outsiders to access these models.   For our experiments, JPL
ran the knowledge compiler and passed
models like those shown in \fig{ddpExample} to West Virginia University.
These models have been
anonymized to remove proprietary information while still maintaining
their computational nature.  In this way, JPL could assure its clients that any
project secrets would be safe while allowing outside researchers
to perform valuable experiments.

\section{Scoring Outputs}

When the {\tt model} function is called, a pairing of the total cost of the selected mitigations and the number of reachable requirements (attainment) is returned. 
All of our algorithms then use that information to obtain a ``score'' for the current set of mitigations. 
The two numbers are normalized to a single score that represents the distance to a {\em sweet spot} of maximum requirement attainment and minimum cost: 
\begin{equation}\label{eq:score}
score = \sqrt{\overline{cost}^2 + (\overline{attainment}-1)^2}
\end{equation}

Here, $\overline{x}$ is a normalized value
$0 \le \frac{x - min(x)}{max(x) - min(x)} \le 1$.
Hence, our scores range $0 \le score \le \sqrt{2}$ and, since the score measures distance to the sweet spot,
{\em lower} scores are {\em better}.


The search methods presented below can be categorized in many ways. However, for our purposes, the following
distinction will be insightful:
\bi
\item
One group proposes settings to all variables, at each step of their search. 
For such {\em full} solutions, scoring is a simple matter: just call the DDP model (e.g. \fig{ddpExample}) then execute \eq{score}
on the returned {\em cost} and {\em attainment} values.
\item
Another group of search methods proposes settings 
to some subset of the variables. For such {\em partial solutions}, the scoring procedure must be modified. Given {\em fixed}
settings to some of the variables, and {\em free} settings to the rest, call the DDP model and \eq{score} $N$ times (each time with
randomly selected settings to the free variables) and report the median of the generated scores.
\ei
In what follows, one observation will be important: scoring {\em partial}
solutions requires $N$ calls to the model, while
scoring {\em full} solutions requires only one call.
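The partial-solution scoring procedure can be sketched as follows (a hypothetical illustration: all names are ours, and {\tt scoreOf} stands in for a call to {\tt model} plus \eq{score}):

```c
#include <stdlib.h>

/* Hypothetical sketch: score a partial solution by completing the free
   mitigations at random n times and reporting the median score. */
static int cmpFloat(const void *a, const void *b) {
    float d = *(const float *)a - *(const float *)b;
    return (d > 0) - (d < 0);
}

/* Demo stand-in for model() plus scoring: fraction of mitigations on. */
static float demoScore(float *m, int nMits) {
    float sum = 0;
    for (int j = 1; j <= nMits; j++)
        sum += m[j];
    return sum / nMits;
}

float scorePartial(float (*scoreOf)(float *, int),
                   float m[], int fixed[], int nMits, int n) {
    float scores[n];                       /* C99 variable-length array */
    for (int i = 0; i < n; i++) {
        for (int j = 1; j <= nMits; j++)   /* mitigations are 1-indexed */
            if (!fixed[j])
                m[j] = (float)(rand() % 2); /* random free setting */
        scores[i] = scoreOf(m, nMits);
    }
    qsort(scores, n, sizeof(float), cmpFloat);
    return scores[n / 2];                   /* median */
}
```

When every mitigation is fixed, all $n$ completions are identical and the median collapses to a single deterministic score.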


\section{Keys-Based Search}
Our proposed solution to real-time requirements optimization assumes
that the behavior of a large system is
determined by a very small number of {\em key} variables. 
In a system with collars, the reachable portion of the state space {\em keys} to a small fraction of
the possible states.

The following small example illustrates key variables.
Within a model, there are chains of {\em reasons} that link inputs to the desired goals.
As one might imagine, some of the links in the chain clash with others. 
Some of those clashes are most upstream; they are not dependent on other clashes. 
In the following chains of reasoning 
the clashes are $\{e, \neg e\}$,
$\{g, \neg g\}$,
\& $\{j,\neg j\}$;
the most upstream clashes
are
$\{e, \neg e\}$
\& $\{g, \neg g\}$:
\[\begin{array}{r@{}l@{}l}
&a \longrightarrow b \longrightarrow c \longrightarrow d \longrightarrow e\\
input_1 \longrightarrow  &f \longrightarrow g \longrightarrow h \longrightarrow i \longrightarrow j & \longrightarrow goal\\
input_2 \longrightarrow  &k \rightarrow \neg g \longrightarrow l \longrightarrow m \rightarrow \neg j & \longrightarrow goal\\
&n \longrightarrow o \longrightarrow p \longrightarrow q \longrightarrow
\neg e
\end{array}
\]
In order to optimize decision making about this model, we must first decide about these {\em upstream clashing reasons}.
We refer to these decisions as {\em the collars} as they have the most impact on the rest of the model.

Returning to the above reasoning chains,
any of $\{a, b, \ldots, q\}$ are subject to discussion.
However, most of this model is completely irrelevant to the task of
$input_i\vdash~goal$. For example, 
the $\{e, \neg e\}$ clash is
unimportant to the decision making process
as no reason uses $e$ or $\neg e$. 
In the context of reaching $goal$ from 
$input_i$, the only important discussions are
the clashes $\{g,\neg g,j,\neg j\}$.
Further, since $\{j, \neg
j\}$ are dependent on $\{g,\neg g\}$, then the core decision
must be about variable $g$ with two disputed values: true and
false. 

We call 
$g$ 
the {\em collar} since it restricts the entire model.
The collar may be internal to a model and may not be directly controllable.
We refer to the controllable variables that can influence the collar as the {\em keys}.
In the previous example, those keys are $input_i$.

Using the keys to set the collars reduces the number of reachable states within the model.
Formally, the reachable states reduce to
the cross-product of the ranges of the collars.
We call this the {\em clumping} effect: only a small fraction of the possible states are actually reachable.
The effects of clumping can be quite dramatic.
Without knowledge of these chains and the collar, the above model has $2^{20} > 1,000,000$ possible consistent states.
However, in the context of 
$input_i\vdash~goal$,
those 1,000,000 states clump to just the following two states:
$\{input_1,f,g,h,i,j,goal\}$ or
$\{input_2,k,\neg g,l,m,\neg j,goal\}$.


Our reading
of the literature is that keys and collars have been discovered and rediscovered
many times. Elsewhere~\cite{me03l}, we have documented
dozens of papers that have reported this effect
under different names including {\em
narrows, master-variables, back doors,} and {\em feature subset
selection}:
\bi
\item
Amarel ~\cite{Amarel86} observed that search problems contain narrow sets of variables or collars that must be used in any solution.
In such a search space, what matters is not so much how you get to these collars, but what decision you make when you get there.
Amarel defined macros encoding paths between {\em narrows}, effectively permitting a search engine to jump between them.
\item
In a similar theoretical analysis,
Menzies \& Singh~\cite{me03l} 
computed the odds of a system selecting solutions
to goals using complex, or simpler, sets of preconditions.
In their
simulations, they found that a system will naturally select for tiny
sets of preconditions (a.k.a. the keys) at a very high probability.
\item
Numerous researchers have examined {\em feature subset selection};
i.e. what happens when a data miner deliberately ignores some of
the variables in the training data. For example, Kohavi and
John~\cite{kohavi97} showed in numerous datasets that as few as
20\% of the variables are {\em key}: the remaining 80\% of variables
can be ignored without degrading a learner's classification accuracy.
\item
Williams et al.~\cite{williams03} discuss how
to use keys  (which they call ``back doors'') to optimize search.
Constraining these back doors also constrains the rest of the program. 
So, to quickly search a program, they suggest imposing some set values on the key variables.
They showed that setting the keys
can reduce
the solution time of certain hard problems from exponential to polynomial, provided that the
keys can be cheaply located (an issue on which Williams et al.\ are curiously silent).
\item
Crawford and Baker ~\cite{crawford94} compared the performance of
a complete TABLEAU prover to a very simple randomized search engine
called ISAMP.  Both algorithms assign a value to one variable, then
infer some consequence of that assignment with forward checking.
If contradictions are detected, TABLEAU backtracks while ISAMP
simply starts over and re-assigns other variables randomly. Incredibly,
ISAMP took {\em less} time than TABLEAU to find {\em more} solutions
using just a small number of tries. Crawford and Baker hypothesized
that a small set of {\em master variables} set the rest and that
solutions are not uniformly distributed throughout the search space.
TABLEAU's depth-first search sometimes drove the algorithm into
regions containing no solutions.  On the other hand, ISAMP's
randomized sampling effectively searches in a smaller space.
\ei
In summary, the core assumption of our algorithms is supported in many domains.
If a model contains keys,  then a general search through a large space of
options is superfluous. 
A better (faster, simpler) approach
would be to just explore the keys.
KEYS uses support-based Bayesian sampling to quickly find these important variables. 


\subsection{The KEYS Algorithm, Version 1}

There are two main components to KEYS: a greedy search and the BORE ranking heuristic.

The greedy
search explores a space of $M$ 
mitigations over the course of 
$M$ ``eras''. 
Initially, the entire set of mitigations is set randomly. 
During each era, one more
mitigation is set to $M_i=X_j$, $X_j \in \{true,false\}$. 
In the original version of KEYS~\cite{promise08}, the greedy search fixes one variable per era.
This paper experimented with a newer version, called KEYS2, that fixes an increasing number of variables as the search progresses
(see below for details).

For both KEYS and KEYS2, each era $e$ generates
a set $<input,score>$ as follows:
\be
\item[1:]
$MaxTries$ times repeat:
\bi
\item
$Selected[1{\ldots}(e-1)]$ are settings from previous eras.
\item
	$Guessed$ are randomly selected values for unfixed mitigations.  
\item
	$Input = Selected \cup Guessed$.
\item
Call {\tt model} to compute $score=ddp(input)$;
\ei
\item[2:] The $MaxTries$ $score$s are divided into $\beta$\% ``best'' while
the remainder are sent to  ``rest''. 
\item[3:]
The mitigation values in the $input$ sets are then scored using BORE
(described below).
\item[4:]
The top ranked mitigations (the default is one, but the user may fix multiple mitigations at once) are fixed and stored in $selected[e]$.\ee
The search moves to era $e+1$ and repeats steps 1,2,3,4.
This process stops 
when every mitigation has a setting.  
The exact settings for $MaxTries$ and $\beta$ must be set via engineering judgment.
After some experimentation, we used $MaxTries=100$ and $\beta=10$.
For full details, see \fig{KEYS}. 
\begin{figure}
~\hrule~
{\scriptsize
\begin{verbatim}
1. Procedure KEYS
2. while FIXED_MITIGATIONS != TOTAL_MITIGATIONS
3.    for I:=1 to 100
4.       SELECTED[1...(I-1)] = best decisions up to this step
5.       GUESSED = random settings to the remaining mitigations
6.       INPUT = SELECTED + GUESSED
7.       SCORES= SCORE(INPUT)
8.    end for   
9.    for J:=1 to NUM_MITIGATIONS_TO_SET 
10.      TOP_MITIGATION = BORE(SCORES)       
11.      SELECTED[FIXED_MITIGATIONS++] = TOP_MITIGATION 
12.   end for  
13. end while          
14. return SELECTED \end{verbatim} }
~\hrule~
\caption{Pseudocode for KEYS}
\label{fig:KEYS}
\end{figure}

KEYS ranks mitigations using a novel support-based
Bayesian ranking measure.
BORE~\cite{clark05} (short for ``best or rest'')
divides the numeric scores seen over $K$ runs and stores the top 10\% in $best$ and the remaining 90\% of scores in the set $rest$
(the $best$ set is computed by studying the delta
of each score 
to the best score seen in any era).
It then computes the probability that a value is found in $best$
using
Bayes theorem.
The theorem uses
evidence $E$ and a prior probability $P(H)$ for
hypothesis $H\in\{best, rest\}$,
to calculate a  posteriori  probability
$P(H|E)=P(E|H) P(H)\; /\; P(E)$.
When applying the theorem, {\em likelihoods}
are computed from observed
frequencies. These likelihoods (called ``like'' below) are then normalized to create
probabilities. This normalization cancels out $P(E)$ in Bayes theorem.
%footnote{
%Such simple  Bayes classifiers are often called ``n\aive'' since they assume
%independence of each variable. 
%Domingos and Pazzini have shown
%that the independence assumption is a problem in a vanishingly small percent
%of cases~\cite{domingos97optimality}. This explains the repeated
%empirical result that seemingly n{\aive} Bayes classifiers perform as well
%as other more sophisticated schemes; e.g. see Table~1 in~\cite{domingos97optimality}.}.
For example, after $K=10,000$ runs are divided into  1,000 {\em best} solutions and 9,000
{\em rest}, the  value \mbox{$mitigation31=false$} might appear 10 times in the $best$ solutions, but only
5 times in the $rest$. Hence:\newline
\begin{minipage}{1.0\linewidth}
{\scriptsize
\begin{eqnarray}\nonumber
E&=& (mitigation31=false)\\\nonumber
P(best)&=&1000/10000 = 0.1\\\nonumber
P(rest)&=&9000/10000 = 0.9\\\nonumber
freq(E|best)&=&10/1000 = 0.01\\\nonumber
freq(E|rest)&=&5/9000 = 0.00056\\\nonumber
like(best|E)&=&freq(E|best) \cdot P(best)=0.001\\\nonumber
like(rest|E)&=&freq(E|rest) \cdot P(rest)= 0.000504\\\label{eq:ten}
P(best|E)&=& \frac{like(best|E)}{like(best|E) + like(rest|E)}=0.66
\end{eqnarray}}
\end{minipage}\newline
Previously~\cite{clark05},
we have found that  Bayes theorem
is a poor ranking heuristic
since it is easily distracted by low frequency evidence.
For example, note how
the probability of $E$ belonging to the best class
is moderately high even though its support is very low; i.e. $P(best|E)=0.66$ but
$freq(E|best) = 0.01$. 

To avoid the problem of unreliable low frequency evidence, we augment \eq{ten}
with a support term.
Support should {\em increase} as the frequency of a value {\em increases}; i.e.,
$like(best|E)$ is a valid support measure.
Hence, step 3 of our greedy search ranks values via
\begin{equation}\label{eq:ps3}\small
P(best|E) * support(best|E) = \frac{like(best|E)^2}{like(best|E) + like(rest|E)}
\end{equation}
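The effect of the support term on the worked example can be checked numerically (our sketch; the likelihood values come from the example above):

```c
/* Plain Bayes rank (the last line of the worked example) versus the
   support-weighted rank, given the two computed likelihoods. */
float bayesRank(float likeBest, float likeRest) {
    return likeBest / (likeBest + likeRest);
}

float supportedRank(float likeBest, float likeRest) {
    return (likeBest * likeBest) / (likeBest + likeRest);
}
```

For $like(best|E)=0.001$ and $like(rest|E)=0.000504$, the plain Bayes rank is about 0.66 while the support-weighted rank drops to about 0.00066, correctly demoting this low-frequency evidence.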
\subsection{KEYS2: a Second Version of Keys}

For each era, KEYS samples the DDP models and fixes the top $N=1$
setting.
The following observation suggested that $N=1$ is, perhaps, an overly conservative search policy.

At least for the DDP models, we have observed that the improvements in cost and attainment generally increase
with each era of KEYS.
This led to the speculation that we could jump further and faster into the solution space by
fixing $N\ge 1$ settings per era.
Such a {\em jump} policy 
can be implemented as a
small change to KEYS:
\bi
\item
Standard KEYS
assigns the value of one to  {\tt NUM\_MITIGATIONS\_TO\_SET} (see the pseudo-code of \fig{KEYS});
\item
Other variants of KEYS assign larger values.
\ei
We experimented with several variants before settling on the following:
era $i$ sets $i$ settings. We call this variant ``KEYS2''.
Note that, in era 1, KEYS2 behaves exactly the same as KEYS,
while in (say) era 3, KEYS2 will
fix the top three ranked settings.
Since it sets more variables in each era,
KEYS2 terminates earlier than KEYS.
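Since era $i$ fixes $i$ settings, after $e$ eras KEYS2 has fixed $e(e+1)/2$ mitigations, so roughly $\sqrt{2M}$ eras suffice for $M$ mitigations. A small sketch of this count (our code):

```c
/* Number of eras KEYS2 needs to fix all m mitigations,
   given that era i fixes i of them. */
int erasNeeded(int m) {
    int fixed = 0, e = 0;
    while (fixed < m) {
        e++;
        fixed += e;
    }
    return e;
}
```

For our largest model's 99 mitigations, KEYS2 finishes in 14 eras where KEYS would need 99.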

\section{Other Algorithms}
In order to assess KEYS and KEYS2, we benchmarked those algorithms against alternate algorithms.
We selected the comparison algorithms with care, since numerous researchers stress the difficulties associated
with comparing radically different algorithms.
For example, Uribe and Stickel~\cite{uribe94}
struggled valiantly to make some general statement about the value
of Davis-Putnam proof (DP) procedures or binary-decision diagrams
(BDD) for constraint satisfaction problems. In the end, they
doubted that it was fair to compare
algorithms that perform constraint satisfaction and no search (like BDDs) and
methods that perform search and no constraint satisfaction (like DP). 
For this reason, model checking researchers like Holzmann (pers. communication)
eschew comparisons of tools like SPIN~\cite{holzmann97}, which are search-based, with tools like NuSMV~\cite{nusmv}, which are BDD-based.
Hence we take care to compare algorithms similar to KEYS. 

In terms of the Gu et al. survey~\cite{gu97}, our selected
algorithms 
(simulated annealing, a-star and MaxWalkSat) share four properties with KEYS and KEYS2.
They are each discrete, sequential, {\em unconstrained} algorithms
(constrained algorithms
work towards a pre-determined number of possible solutions, while unconstrained methods are allowed to adjust the goal space).

Also, the selected algorithms have other properties that make them informative:
\bi
\item Simulated annealing is
the de facto standard in search-based software engineering, perhaps because it is so simple to implement.
By comparing KEYS with SA, we can see how well our new method improves over alternatives in widespread use.
\item SA is very old (first defined in the 1950s) while algorithms like MaxWalkSat are widely regarded as 
current state-of-the-art. 
\item 
ASTAR and KEYS2 offer full and partial solutions (respectively) over all or some of the model input variables.
This lets us comment on what is lost by KEYS2's partial solutions:
as we shall see, KEYS2's solutions are better than ASTAR's,
so there is no loss associated with using KEYS2's partial solutions.
\ei
\subsection{SimpleSA: Simulated Annealing}

Simulated Annealing is a classic stochastic search algorithm, first described in 1953~\cite{metropolis53} and refined in 1983~\cite{kirkpatrick83}.
The algorithm's namesake, annealing, is a technique from metallurgy in which a material is heated and
then cooled. The heat causes the atoms in the material to
wander randomly through different energy states, and the cooling process increases the chances of finding a state with a lower energy than the starting position.

\begin{figure}
~\hrule~
{\scriptsize
\begin{verbatim}
1. Procedure SimpleSA
2. MITIGATIONS:= set of mitigations
3. SCORE:= score of MITIGATIONS
4. while TIME < MAX_TIME && SCORE > MIN_SCORE //MIN_SCORE is a threshold; lower scores are better
5.    find a NEIGHBOR close to MITIGATIONS
6.    NEIGHBOR_SCORE:= score of NEIGHBOR
7.    if NEIGHBOR_SCORE < SCORE
8.        MITIGATIONS:= NEIGHBOR
9.        SCORE:= NEIGHBOR_SCORE
10.   else if prob(SCORE, NEIGHBOR_SCORE, TIME, temp(TIME, MAX_TIME)) > RANDOM
11.       MITIGATIONS:= NEIGHBOR
12.       SCORE:= NEIGHBOR_SCORE
13.   TIME++
14. end while
15. return MITIGATIONS
\end{verbatim}
}
~\hrule~
\caption{Pseudocode for SimpleSA}
\label{fig:simplesa}
\end{figure}

For each round, SimpleSA ``picks'' a neighboring set of mitigations. To calculate this neighbor, a function traverses the mitigation settings of the current state and randomly flips each mitigation (with a 5\% chance).
If the neighbor has a better score, SimpleSA will move to it and make it the current state. If it is not better, the algorithm decides whether to move to it anyway, based on the functions:
\begin{equation}\label{eq:ps1}\small
prob(w,x,y,temp(y,z)) = e^{((w-x) * \frac{y}{temp(y,z)})}
\end{equation}
\begin{equation}\label{eq:ps}\small
temp(y,z) = \frac{(z-y)}{z}
\end{equation}
If the value of the {\tt prob} function is greater than a randomly generated number, SimpleSA will move to that state anyway.
This randomness is the cornerstone of the simulated annealing algorithm. Initially, the ``atoms'' (current solutions)
will take large random jumps, sometimes even to sub-optimal new solutions. These random jumps allow simulated
annealing to sample a large part of the search space while avoiding being trapped in local optima.
Eventually,
the ``atoms'' will cool and stabilize, and the search will converge to a simple hill climber.

As shown in line 4 of \fig{simplesa}, the algorithm will continue to operate until the number of tries is exhausted or a score meets the threshold requirement. 

\subsection{MaxFunWalk}
The design of simulated annealing dates back to the 1950s. In
order to benchmark our new search engine (KEYS) against a more recent state-of-the-art
algorithm, we implemented the variant of 
MaxWalkSat described below.

WalkSat is a local search method designed to address the problem
of boolean satisfiability~\cite{kautz96}. MaxWalkSat is
a variant of WalkSat that applies weights to each clause in a
conjunctive normal form equation~\cite{selman96}. 
While WalkSat tries to satisfy the entire set of clauses, MaxWalkSat
tries to maximize the sum of the weights of the satisfied
clauses. 

In one respect, both algorithms can be viewed
as a variant of simulated annealing. Whereas simulated annealing always selects
the next solution randomly, the WalkSat algorithms will {\em sometimes}
perform random selection while, other times, conduct a local search to find the
next best setting to one variable.
 
We have adapted MaxWalkSat to the DDP problem as follows.
Our MaxFunWalk algorithm is a variant of MaxWalkSat:
\bi
\item
Like MaxWalkSat, MaxFunWalk makes decisions about settings to variables.
\item
Unlike MaxWalkSat, MaxFunWalk  scores those decisions by passing those
settings to some function. In the case of DDP optimization, that function
is the scoring function of \eq{score}. 
\ei

\begin{figure}
~\hrule~
{\scriptsize
\begin{verbatim}
1. Procedure MaxFunWalk
2. for TRIES:=1 to MAX-TRIES
3.    SELECTION:=A randomly generate assignment of mitigations
4.    for CHANGED:=1 to MAX-CHANGES
5.        if SCORE satisfies THRESHOLD return
6.        CHOSEN:= a random selection of mitigations from SELECTION
7.        with probability P
8.            flip a random setting in CHOSEN
9.        with probability (1-P)
10.           flip a setting in CHOSEN that most improves SCORE
11.    end for
12. end for
13. return BESTSCORE
\end{verbatim}
}
~\hrule~
\caption{Pseudocode for MaxFunWalk}
\label{fig:MWF}
\end{figure}


The MaxFunWalk procedure is shown in \fig{MWF}. When run, the user supplies an ideal cost and attainment. This ideal point is normalized, scored, and set as a goal threshold. If the current setting of mitigations satisfies that threshold, the algorithm terminates.

MaxFunWalk begins by randomly setting every mitigation. From there, it will attempt to make a {\em single} change until the threshold is met or the allowed number of changes runs out (100 by default). 
A random subset of the mitigations is chosen and a random number P between 0 and 1 is generated. The value of P decides the form that the change takes:
\bi
\item P$\le\alpha$: A stochastic decision is made; a setting within the chosen subset is flipped completely at random.
\item P$>\alpha$: Local search is used; each mitigation in the chosen subset is tested until one is found whose change improves the current score.
\ei

The best setting of $\alpha$ is a domain-specific engineering decision. For this study,
we used $\alpha=0.3$.

If the threshold is not met by the time the allowed number of changes is exhausted, the set of mitigations is completely reset and the algorithm starts over. This measure allows the algorithm to avoid becoming trapped in local maxima; for the DDP models, we found that the number of retries has little effect on solution quality.

If the threshold is never met, MaxFunWalk keeps resetting and retrying until the maximum number of allowed resets is exhausted, at which point it returns the best settings found. 

As an additional measure to improve the results found by MaxFunWalk, a heuristic was implemented to limit the number of mitigations that could be set at one time. If too many are set, the algorithm will turn off a few in an effort to bring the cost factor down while minimizing the effect on the attainment. 
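The procedure of \fig{MWF} can be sketched in executable form. The sketch below is illustrative rather than the implementation used in our experiments: the scoring function, the subset size, and the parameter defaults are stand-ins for the DDP-specific choices (in particular, {\tt score} here is any function mapping a setting to a number in $[0,1]$, not \eq{score} itself).

```python
import random

def max_fun_walk(score, n, alpha=0.3, max_tries=10, max_changes=100,
                 threshold=0.9):
    """Minimal sketch of MaxFunWalk over n boolean mitigations.

    `score` maps a list of booleans to a value in [0,1], higher is
    better; it stands in for the paper's normalized scoring function.
    """
    best, best_score = None, float("-inf")
    for _ in range(max_tries):
        # each retry restarts from a fresh random assignment
        selection = [random.random() < 0.5 for _ in range(n)]
        for _ in range(max_changes):
            s = score(selection)
            if s > best_score:
                best, best_score = selection[:], s
            if s >= threshold:            # goal threshold satisfied
                return best, best_score
            # a random subset of mitigations is considered for change
            chosen = random.sample(range(n), max(1, n // 4))
            if random.random() <= alpha:
                # stochastic move: flip one setting at random
                i = random.choice(chosen)
            else:
                # local search: flip the setting that most improves score
                def flipped(i):
                    trial = selection[:]
                    trial[i] = not trial[i]
                    return score(trial)
                i = max(chosen, key=flipped)
            selection[i] = not selection[i]
    return best, best_score
```

For instance, with {\tt score} set to the fraction of mitigations enabled, the sketch quickly drives the assignment towards all-enabled.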

\subsection{A* (ASTAR)}

 A* is a best-first path finding algorithm that uses distance from origin ($G$)
and estimated cost to goal ($H$)
to find the best path~\cite{hart68}. The algorithm is provably optimal when its heuristic never overestimates the distance to the goal~\cite{dechter85}, and it has seen widespread use in many fields~\cite{pearl84,norvig03,stout97,hui04}. 

A* is a natural choice for DDP optimization since the scoring
function described above
is actually a Euclidean distance measure
to the desired goal 
of maximum attainment and minimum costs.  
Hence, for H,  we can make direct use of \eq{score}.
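Since \eq{score} is defined earlier in the paper, we give here only a plausible stand-in (an assumption, not the exact published formula): treat a solution's normalized (cost, attainment) pair as a point and measure its Euclidean distance to the ideal corner of zero cost and full attainment.

```python
import math

def score_distance(cost, attainment, cost_max, att_max):
    """Hypothetical stand-in for the paper's scoring function:
    Euclidean distance from the normalized (cost, attainment) point
    to the ideal corner (0 cost, full attainment). Smaller is better,
    which makes it directly usable as A*'s heuristic H."""
    c = cost / cost_max           # normalize cost to [0,1]
    a = attainment / att_max      # normalize attainment to [0,1]
    return math.sqrt(c ** 2 + (1 - a) ** 2)
```

Under this stand-in, the ideal solution scores 0 and the worst possible solution scores $\sqrt{2}$.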


\begin{figure}
~\hrule~
{\scriptsize
\begin{verbatim}
Procedure ASTAR
CURRENT := starting assignment of mitigations
add CURRENT to CLOSED

while not END
   NEIGHBORS := states reachable by changing one mitigation
   for each NEIGHBOR in NEIGHBORS
       if NEIGHBOR is not in CLOSED
           G := distance from start
           H := distance to goal
           F := G + H
           if F < BEST_F
               BEST_F := F
               BEST_NEIGHBOR := NEIGHBOR
   end for
   CURRENT := BEST_NEIGHBOR
   add CURRENT to CLOSED
   if STUCK
       END := true
end while
return CURRENT
\end{verbatim}
}
~\hrule~
\caption{Pseudocode for ASTAR}
\label{fig:ASTAR}
\end{figure}

%Our implementation of A*, called ASTAR, allows for two possible starting configurations. Like all of the other algorithms used for this experiment, ASTAR can randomly make assignments to each mitigation. This was the setting used for most of our experiments. However, A* is a deterministic algorithm. Given a {\em fixed} starting position and an {\em unchanging} search space, it should always come up with the same final answer. To test this, our ASTAR technique can be set to start from a fixed state where cost and attainment are both equal to 0. 

The ASTAR algorithm keeps a {\em closed} list in order to prevent backtracking. We begin by adding the starting state to the closed list. In each ``round'', a list of neighbors is populated from the series of possible states reached by making a change to a single mitigation. 
If that neighbor is not on the closed list, two calculations are made: 
\bi
\item G = Distance from the start to the current state plus the additional distance between the current state and that neighbor.
\item H = Distance from that neighbor to the goal (an ideal spot, usually 0 cost and a high attainment determined by the model). 
For DDP models, we use \eq{score} to compute H.
\ei
The {\em best} neighbor is the one with the lowest F=G+H. The algorithm ``travels'' to that neighbor and adds it to the closed list. 
Part of A*'s optimality guarantee is that the distance to the goal is underestimated. 
Hence, the ideal goal point is never actually reached by ASTAR. 
Our implementation terminates once it stops
finding better solutions
for a total of ten rounds. This number was chosen to give ample time for ASTAR to become ``unstuck'' if it hits a corner early on.  
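The closed-list search just described can be sketched as follows. This is an illustrative reconstruction, assuming one unit of distance per mitigation flip for $G$ and an arbitrary heuristic function for $H$; the real implementation uses \eq{score} for $H$.

```python
import random

def astar_ddp(score_h, n, patience=10):
    """Sketch of the ASTAR variant over n boolean mitigations.

    `score_h` is the heuristic H (distance from a setting to the goal).
    G grows by one per flip. A closed set prevents backtracking; the
    search stops after `patience` rounds without an improved F = G + H.
    """
    current = tuple(random.random() < 0.5 for _ in range(n))
    closed = {current}
    g, best_f, stuck = 0, float("inf"), 0
    while stuck < patience:
        # neighbors: every state one mitigation-flip away, not yet visited
        neighbors = [current[:i] + (not current[i],) + current[i + 1:]
                     for i in range(n)]
        neighbors = [nb for nb in neighbors if nb not in closed]
        if not neighbors:
            break
        # travel to the neighbor minimizing F = G + H
        nxt = min(neighbors, key=lambda nb: (g + 1) + score_h(nb))
        g += 1
        f = g + score_h(nxt)
        if f < best_f:
            best_f, stuck = f, 0
        else:
            stuck += 1
        current = nxt
        closed.add(current)
    return current
```

For example, with {\tt score\_h} counting enabled mitigations (so the goal is all-disabled), each round greedily disables one more mitigation until the patience counter expires.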


\subsection{Scoring Costs}\label{sect:partials}
Returning now to the issue of scoring {\em full} versus {\em partial} solutions, we note that:
\bi
\item
At each era, KEYS and KEYS2 generate partial solutions that 
fix 
$1,2,3,\ldots,|V|$ variables (where $V$ is the set of variables).
\item
Simulated Annealing, MaxWalkSat, and MaxFunWalk
work with full solutions since, at each step, they offer settings to all 
$v\in V$ variables.
\item 
ASTAR could produce either full or partial solutions.
If implemented as a {\em full} solution generator, each member of the closed list will contain a solution
with fixed settings for all
$v\in V$ variables. If implemented as a {\em partial} solution generator, then the open list will contain the current
list of {\em free} variables and each step of the search will {\em fix} one more setting.
\ei
As mentioned above, such a partial solution generator will require $N$ calls to the model in order to score the current
solution; for example, KEYS and KEYS2 call the models 100 times in each era. Since our concern is with runtimes, for this study, we used
a full solution generator for ASTAR.  

\section{Results}

Each of the above algorithms was tested on the five  models of \fig{ddpModels}.
Final attainment and costs, runtimes, and (for KEYS/KEYS2), the convergence towards
the final result were recorded. That information is shown below.

Note that:
\bi
\item
Models 1 and 3 are trivially small and we used them to debug our code.
We report our results using
models 2, 4, and 5 since they are large enough to stress test real-time optimization.
\item
Model 4 was discussed in detail in~\cite{me03n}. 
The largest, model 5, was optimized previously in~\cite{fea02a}.
As mentioned in the introduction,
on contemporary machines, model 5 takes 300 seconds to optimize using our
old, very slow, rule learning method.
\ei

\subsection{Attainment and Costs}

We ran each of our algorithms 1000 times on each model. This number
was chosen because it yielded enough data points to give a clear
picture of the span of results. At the same time, it is a low enough
number that we can generate a set of results in a fairly short time
span.  

\begin{figure}[!b]\centering
\begin{center}\scriptsize
\begin{tabular}{c|c@{~}|c@{~}|c}
&Model 2&Model 4 &Model 5 \\
& (31 mitigations)& (58 mitigations)&(99 mitigations)\\ \hline
SimpleSA& \includegraphics[width=3.5cm]{WithLabel/2/SA.pdf} & \includegraphics[width=3.5cm]{WithLabel/4/SA.pdf} & \includegraphics[width=3.5cm]{WithLabel/5/SA.pdf} \\\hline 
MaxFunWalk& \includegraphics[width=3.5cm]{WithLabel/2/MaxFunWalk.pdf} & \includegraphics[width=3.5cm]{WithLabel/4/MaxFunWalk.pdf} & \includegraphics[width=3.5cm]{WithLabel/5/MaxFunWalk.pdf} \\ \hline

ASTAR&  \includegraphics[width=3.5cm]{WithLabel/2/AStar.pdf} & \includegraphics[width=3.5cm]{WithLabel/4/AStar.pdf} &  \includegraphics[width=3.5cm]{WithLabel/5/AStar.pdf} \\ \hline

 KEYS &  \includegraphics[width=3.5cm]{WithLabel/2/KEYS.pdf} & \includegraphics[width=3.5cm]{WithLabel/4/KEYS.pdf} & \includegraphics[width=3.5cm]{WithLabel/5/KEYS.pdf} \\ \hline

KEYS2& \includegraphics[width=3.5cm]{WithLabel/2/KEYS-L.pdf} & \includegraphics[width=3.5cm]{WithLabel/4/KEYS-L.pdf} & \includegraphics[width=3.5cm]{WithLabel/5/KEYS-L.pdf} \\\hline


\end{tabular}
\caption{1000 results of running five algorithms on three models (15,000 runs in all).
The y-axis shows
cost and  the x-axis shows attainment. 
The size of each model is measured in number of mitigations. 
Note that better solutions fall
towards the bottom right of each plot; i.e. lower costs and higher attainment.
Also better solutions exhibit less variance, i.e. are clumped tighter together.
\label{fig:finalModelResults}
}
\end{center}
\end{figure}

The results are pictured in~\fig{finalModelResults}. Attainment
is along the x-axis and cost (in thousands) is along the y-axis.
Recall that better solutions fall
towards the bottom right of each plot (i.e. lower costs and higher attainment)
and exhibit less variance (i.e. are clumped tighter together).

These graphs give a clear picture of the results obtained by our
five algorithms. Two methods are clearly inferior:
\bi
\item SimpleSA exhibits
the worst variance, lowest attainments, and highest costs.
\item MaxFunWalk is better than SimpleSA (less variance, lower costs, higher attainment)
but it is clearly inferior to the other methods.
\ei
As to the others:
\bi
\item On smaller models such as model2,
ASTAR produces higher attainments and lower variance than the KEYS algorithms.
However, this advantage disappears on the larger models.
\item
On larger models such as model4 and model5, KEYS and KEYS2 exhibit lower variance, lower costs, and higher attainments than ASTAR.
\ei

\begin{figure}[!h]
\small
\begin{center}
\begin{tabular}{c|c|c|c} 
&Model 2&Model 4 &Model 5 \\
&~(31 mitigations)~& ~(58 mitigations)~&~(99 mitigations)~\\ \hline
SimpleSA& 0.577   &1.258        &0.854  \\ \hline
MaxFunWalk& 0.122   &0.429   &0.398         \\ \hline
ASTAR&  0.003  &0.017         &0.048    \\ \hline
KEYS  & 0.011   &0.053         &0.115   \\ \hline
KEYS2 	&0.006  &0.018 		&0.038 \\ 
\end{tabular}
\end{center}
\caption{
 Runtimes in seconds, averaged over 100 runs, measured using the
``real time'' 
value from the Unix {\tt times} command. 
The size of each model is measured in number of mitigations (and for more details
on model size, see \fig{ddpModels}). 
}
\label{fig:runtimes}
\end{figure}
\subsection{Runtime Analysis}
Measured in terms of  attainment and cost, there is little difference between KEYS and KEYS2.
However, as shown by \fig{runtimes}, KEYS2 runs two to three times as fast as KEYS.
Interestingly, \fig{runtimes} ranks two of the algorithms in a similar order to \fig{finalModelResults}: 
\bi
\item
SimpleSA is clearly the slowest;
\item MaxFunWalk is somewhat better but not as fast as the other algorithms.
\ei
As to ASTAR versus KEYS or KEYS2:
\bi
\item
ASTAR is faster than KEYS;
\item
and  KEYS2 runs in time comparable to  ASTAR.
\ei
Measured in terms of runtimes, there is little to recommend KEYS2 over ASTAR. 
However, the two KEYS algorithms offer
certain functionality
which, if replicated in ASTAR, would make that algorithm much slower. To understand that functionality,
we return to the issue of solution {\em stability} within the {\em full} versus {\em partial} solutions discussed in \S3.2. 
\begin{figure}[!t]
\begin{center}
\includegraphics[width=4.5in]{model2.pdf}

{\bf \fig{in5}a:} Internal Decisions on Model 2.

~\\
\includegraphics[width=4.5in]{model4.pdf}

{\bf \fig{in5}b:} Internal Decisions on Model 4.

~\\
\includegraphics[width=4.5in]{model5.pdf}

{\bf \fig{in5}c:} Internal Decisions on Model 5.
\end{center}
\caption{Median and spread of partial solutions learned by KEYS and KEYS2. X-axis shows the number
of decisions made.  ``Median'' shows the 50-th percentile of the measured
value seen in 100 runs at each era. ``Spread'' shows the difference between the 75th  and 50th percentile. }
\label{fig:in5}
\end{figure}

 \fig{in5} shows the effects of the 
decisions made by KEYS and KEYS2.
For KEYS and KEYS2, at $x=0$, all of
the mitigations in the model are set at random. During each subsequent
era, more mitigations are fixed (KEYS sets one at a time, KEYS2 sets one or more at a time).  
The lines in each of these
plots show the median and spread seen in the 100 calls to the {\tt
model} function during each round. We define median as the 50th
percentile and spread (the measure of deviation around the median)
as (75th percentile - median).  
Interestingly, as we can see in the
plots, the spread is generally quite small compared to the median, particularly after 20 decisions.
This indicates that our median estimates are good descriptions of
the tendencies of the models.  
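The median and spread statistics used in \fig{in5} are easy to reproduce. The short sketch below uses quartiles computed with Python's standard library (the interpolation details differ slightly from any particular plotting package, so treat it as illustrative):

```python
import statistics

def median_and_spread(samples):
    """Summarize one era's model calls: median is the 50th percentile;
    spread is the 75th percentile minus the median."""
    q1, med, q3 = statistics.quantiles(samples, n=4)
    return med, q3 - med

# e.g. nine scores from one era
m, s = median_and_spread([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

A small spread relative to the median, as in the plots, indicates that the median is a good summary of the era's behavior.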

The plots for KEYS and KEYS2 are nearly identical:
\bi
\item
On termination (at maximum value of $x$), KEYS and KEYS2 
arrive at nearly identical median results (caveat:
for model2, KEYS2 attains slightly more requirements at a slightly higher
cost than KEYS). 
\item
The spread plots for both algorithms are almost indistinguishable (exception: in model2, the KEYS2 spread
is less than that of KEYS). 
\ei
Since these results are so similar, and KEYS2 runs faster than KEYS, we recommend KEYS2 over KEYS.

%Median attainment improve (i.e. increase)
%dramatically in model2, somewhat in model4, and only
%slightly in model5. What is significant about these changes is that the attainment
%improvements are achieved while cost is reducing: 
%in all results, median costs 
%improve (i.e. reduce) as KEYS2 makes more decisions. 
\section{Discussion}

Returning now to the issue of robust solutions for SBSE
discussed in \tion{other}, one key feature of KEYS and KEYS2 is the ease with which they support
{\em partial
solutions} as well as  {\em neighborhood} and {\em trade-off} information.
\bi
\item Suppose a manager is advocating
a policy based on the mitigations found by KEYS2 in some range $1..X$.
\fig{in5} lets them readily discuss the effects of solutions in the {\em neighborhood}
of $X$. For example,
the effects of applying half ($X/2$) or double ($2X$) that solution
are readily seen, just by glancing at \fig{in5}.
\item Neighborhood information
allows managers to explore
{\em trade offs} in how
they adjust their projects. If they lack sufficient resources to implement all
of KEYS2's recommendations, they can consider the merits of {\em partial solutions}.
For example, in model4 and model5, most of the improvement comes from making just 20 decisions.
Hence, for these models, managers might decide to implement a succinct partial solution
(i.e. all the recommendations $1\le x \le 20$).
\ei

KEYS2 and KEYS generate neighborhood information as a side-effect of their processing.
The other algorithms discussed here could also generate neighborhood information, but only at the expense
of greatly increased runtimes.
For example, as mentioned
above (end of \S6), ASTAR could be modified to produce partial solutions but this would
increase the runtimes of that algorithm since it would require multiple calls
to the scoring function. 

The lesson of KEYS is that such multiple scoring is avoidable, at least for some requirements models.
KEYS was originally designed as a {\em post-processor} to other algorithms.
We proposed a greedy forward select search to
prune solutions generated by other means. 
The order of that search was to have been informed by minimal instrumentation of other methods.
Then,
just as an experiment, we tried replacing (a) the other algorithms with a random input generator
and (b) the minimal instrumentation with the Bayes sampling methods of Equation 3.
As shown above, this approach ran  much faster. That is, at least for the models we have explored
so far, it may well be that the post-processor can replace the original processing.



%
%To use a search-based approach, software engineers have to reformulate their problem by:
%\bi
%\item Finding a {\em representation of the problem} that can be 
%symbolically manipulated (e.g. simulated or mutated). 
%Such representations always exist with model-based SE. In our work, DDP is the representation of the problem.
%\item
%Defining a {\em fitness function} (a.k.a. ``utility function'' or
%``objective function''); i.e.
%an ``oracle'' that
%scores a model configuration. Current model-based SE methods rarely offer
%such a function (exception: formal methods that generate temporal constraints). In our experience, generating such a fitness function is
%usually possible, albeit requiring days of work with the domain experts~\cite{me99j}. 
%Our fitness function is the calculation of cost and attainment from \eq{score}. 
%\item 
%Determining an appropriate set of {\em manipulation operators} 
%to select future searches based on the prior
%searches~\cite{harman01}.  Our manipulation operators are the various algorithms that we employ such as SimpleSA, ASTAR, MaxFunWalk, KEYS, KEYS2. 
%\ei
%Inspired by the search-based SE literature, our initial experiments with requirements optimization used simulated annealing.
%As shown above, this simulated annealing algorithm is out-performed by every other algorithm studied here.
%Perhaps
%researchers in search-based SE
%might find it advantageous to look beyond standard search algorithms (even
%supposedly state-of-the-art algorithms like MaxWalkSat).

%\section{External Validity}
%
%Our model examples come from software systems engineering at NASA's Jet Propulsion Laboratory (JPL).
%Our use of JPL models injects a sampling bias into our results. Therefore, it is important to consider the generality of lessons learned from JPL: 
%\bi
%\item JPL conducts large software system design tasks for complex systems (e.g., software for satellites, ground systems to process the information transferred back from the satellites, etc.).  
%An in-house team of over a hundred developers works on these systems, sometimes augmented with teams from external contractors or other NASA centers. 
%JPL's managers face the same challenges as other managers; e.g. handling large teams with a wide range of talent and domain experience, all the while reacting appropriately to budget cuts and project redirections.  
%\item We assert that industry could learn much from JPL systems engineering practices.  
%A manager in a conventional project has a ``delivery date'', which often can be renegotiated with the client depending on what functionality is ready, changing business conditions, etc.  
%A JPL software development manger, on the other hand, has fixed deliverable dates dictated by celestial mechanics in addition to the usual demanding programmatic constraints. Those dates may either be the launch date of the satellite or date at which the satellite will encounter the science event it is meant to process (for instance, the date when the satellite passes through the tail of a comet). Such dates are non-negotiable. JPL systems engineering methods have evolved to meet the demands of delivering complex systems, right when they are required for delivery. 
%\ei
%
\section{Conclusion}
As our requirements engineering
tools grow better, they will be used by larger groups who will
build and debate increasingly complex models.
In the net-enabled $21^{st}$ century, 
it is hard to keep a group's attention focused on  the same set of issues.

We propose a model-based solution that runs fast enough to keep up with 
the group's dialogue and debates.  
We define
{\em real-time requirement
optimization} as  the task of generating conclusions from requirements models
before an expert's 
attention wanders to other issues. 
Our hope is that these conclusions will be so riveting (either startling, encouraging, 
or worrying) that the group will stop entering into side-debates, focusing instead
on the conclusions from the requirements optimizer.


Based on a back-of-the-envelope calculation
(presented in the appendix), we argued that real-time requirements optimization
of current models running on contemporary machines must terminate within
$10^{-2}$ seconds.  The generated solution must have two properties:
\bi
\item {\em Partial solutions}, since software project managers may be unable or unwilling
to precisely control all aspects of a project; and 
\item {\em Robustness information} commenting
on the brittleness of that solution within the space of nearby solutions.
\ei

The experiments of this paper recommend knowledge compilation and KEYS2 for
real-time requirements optimization:
\bi
\item
Knowledge compilation speeds up optimization by one to two orders of magnitude. While this is not enough by itself to support (say) the JPL 
models of the kind we expect to see in the near future, it is certainly useful when combined with AI search algorithms.
\item 
KEYS2 runs faster than KEYS and four orders of magnitude faster than our previous rule-learning based approach
(for model5,
300 seconds from rule learning~\cite{fea02a}
to 0.038 seconds for KEYS2). That is, KEYS2 is fast enough for real-time
requirements optimization.
\item
KEYS2 is an incremental keys-based search algorithm. A log of its incremental
findings  can be presented
in simple diagrams like \fig{in5}.  Glancing at that diagram, it is simple to understand
the trade space around the neighborhood of partial solutions.
\ei

We have shown that, at least for the DDP models, 
KEYS2 out-performs other methods.
Compared to ASTAR, KEYS2 shows that ASTAR can over-specify
a solution: KEYS2's partial solutions achieve the same, or better, goals than ASTAR
but, as shown by the trade space of \fig{in5}, can often do so using only part of the
solutions proposed by ASTAR.
As to other search algorithms, SimpleSA and MaxFunWalk run slower than KEYS2 and result in solutions
with much larger variance.

We attribute the  success of KEYS2 to 
the presence of $keys$ in our models; i.e. variables that
control everything else.  These key variables were exploited by KEYS2 using two methods:
BORE finds promising keys with high probabilistic support; and
KEYS2's greedy search sets the most promising key,
before exploring the remaining options.

Elsewhere~\cite{me03l}, we have documented
dozens of papers that have reported the keys effect 
under different names including {\em
narrows, master-variables, back doors,} and {\em feature subset
selection}.
That is, in theory, a keys-aware search like KEYS2 could optimize the processing
of many other models.
Our intent for this paper is to motivate other keys-based
optimizations for other kinds of models.
%\section{Future Work}
%DDP is an example of a class of ultra-lightweight early life cycle requirements engineering
%frameworks. In future work, we plan to explore real-time requirements optimization
%using other frameworks such as
%soft-goals, QOC, AHP, and QFD.
%
%As to further improvements for DDP,
%in terms of attainment and cost, it may be hard to improve over the
%KEYS2 (and KEYS) results.
%Observe in \fig{finalModelResults} how the
%results from these algorithms are
%very close to maximum attainment and minimum cost (the lower right-hard corner of
%these plots). 
%Further improvements in real-time optimization of DDP models may be restricted to issues of speed
%and stability.
%
%While KEYS2 is fast enough for real-time requirements optimization of \mbox{$1 \le W \le 10$} what-ifs over  DDP-like models, further optimizations
%would enable the processing of more what-ifs or more complex models.
%For example,
%our current implementation runs over binary variables and many problems require variables with
%larger ranges.
%
%
%The stochastic nature of KEYS2 lends itself
%naturally to optimization by parallelism. We are currently exploring grid computing methods, with each node in the grid running its own version of KEYS2.
%In this scheme, at random time intervals, results are dispatched from the nodes to a centralized learner. This learner reports
%back
%to the grid on how to best bias future simulations. To date, we have found that the communication overhead
%between processors out-weights the benefits of using multiple processors. However,
%this is a potentially very fruitful approach.
% 
%Another  possible project would be to improve ASTAR. 
%If ASTAR could be given the stability guarantee of KEYS while maintaining its runtime advantage,
%we could potentially address far more complex models. 
%For example, 
%we could enhance ASTAR  with the BORE heuristic found in KEYS2; e.g.
%BORE could be used as a sub-routine that biases ASTAR's selection of where to search next. 
%
%More generally, KEYS2 could be a useful sub-routine for improving the random selections made by other algorithms.
%Stochastic SAT solvers such as MaxWalkSat or the Markov Logic reasoning within ALCHEMY ~\cite{rich05}
%could be biased by KEYS2 to find $key$ variables within the CNF clauses.
%The effects of such an optimization
%could be dramatic- recall from the above discussion that Williams et.al.~\cite{williams03}
%reported that setting ``back doors'' (the keys) can reduce the solution time of hard problems
%from exponential time to polytime.
%

%Earlier ~\cite{promise08}, the authors advocated KEYS as an embodiment of the 
%``simplicity-first'' recommendations of Holte~\cite{holte93}. 
%Sophistication is unimportant if simpler methods perform as well as complex methods.
%Holte found that extremely small trees were often as accurate as more complex trees. 
%He cautioned the machine learning community to benchmark against simpler methods.
%
%This point still holds: all of our methods are far less sophisticated than our prior rule-based learner. 
%KEYS2 and ASTAR both yielded cost-attainment results that met or exceeded those of the previous work while running orders of magnitude faster.
%Rather, we should revise our original statement. Sophistication is unimportant if simpler methods perform as well as complex methods while yielding some sort of stability guarantee. SimpleSA, MaxFunWalk, and ASTAR are all simpler methods than KEYS2. Perhaps they are too simple. 
%Perhaps there is some threshold point where, at one side, simple methods beat complex methods; on the other end, those methods are too simple to use.
%
%We advocate the use of KEYS not only because it is competitive with TAR3, but because it offers a stability guarantee that the other algorithms in this paper do not.
 %If KEYS2 is cut off at any point, a user would know what decisions had been made and the effect of those decisions. 
%A project lead could find the minimal set of keys to set for the greatest effect by looking at the mitigations fixed during the first few rounds. 

\section{Acknowledgments}
This research was conducted at West Virginia University, the Jet Propulsion
Laboratory under a contract with the National Aeronautics and Space Administration, and Miami University. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does 
not constitute or imply its endorsement by the United States Government.
\bibliographystyle{abbrv}
\bibliography{refs,refsgreg} 
\appendix
\section*{Appendix}
This appendix shows a ``back of the envelope'' calculation on how
fast trade space analysis needs to run on current machines in order
to handle larger requirement models of the kind we expect to see
in five years time.  We will assume that those models are in the
DDP format and that our goal is five seconds to complete  a $2^W$
trade space analysis of $W\le10$ DDP scenarios.

\begin{figure}[!b]
\centering
\begin{tabular}{c|rl}
year & \multicolumn{2}{c}{\#variables} \\\hline
2004 & 30 &\rule{3mm}{1.5mm}\\
2008 & 100& \rule{10mm}{1.5mm}\\
2010 & 300&\rule{30mm}{1.5mm}\\
2013 & 800 &\rule{80mm}{1.5mm}\\
\end{tabular}
\caption{Growth trends: number of variables in the data dictionaries of NASA's requirements models. } 
\label{fig:variables}
\end{figure}
%Prior experiments with simulated annealing~\cite{feather08} and treatment learning~\cite{fea02a} took minutes to hours to terminate.
%This is a potential problem. Success with these model-based tools ~\cite{fea02a,me03c,feather08} has led to high expectations from the user community.
\begin{figure}
\begin{center}
\includegraphics[width=2.2in]{runs_exp.pdf}
\end{center}
\caption{Estimated time required on contemporary machines in order to handle $W$ what-if scenarios
of NASA requirements models in 2013 (when the models are assumed to be
eight times larger than those currently seen at JPL).
The x-axis represents a range of assumptions on the size of the exponential scale up factor $X^8$.}
\label{fig:goal}
\end{figure}
\fig{variables} shows the growth rate (historical and projected) of variables within DDP models. Note that by 2013, it
is expected that DDP models
will be eight times larger than today. 

It is possible to estimate the required speed of 
our algorithms on contemporary computers, such that by
the year 2013, DDP optimization will terminate in 5 seconds.
Assuming a doubling of CPU speed every two years, by 2013
our computers will run $2^{(2013-2008)/2}\approx560\%$ faster. 
However, 
DDP optimization time tends to grow
exponentially with the number of variables since
larger models have more interconnections.

\fig{goal} shows those calculations assuming that we are exploring
 $2^W$  scenarios in the space of \mbox{$1 \le W \le 10$} what-ifs. We assume
the larger models are slower to process by some  amount 
$X^8$ where  
$X$ is an exponential scale-up factor.
Since we do not know the exact value of $X$,
\fig{goal} explores a range of possible values.

These plots show
that
our current runtimes are orders of magnitude too slow for
real-time requirements optimization.
If we run only one what-if query, the $W=1$ curve shows that
for even modest scale-up factors ($X\ge2$), we require runtimes on
contemporary machines of $10^{-1}$ seconds; i.e.
four orders of magnitude faster than the rule learning approach we used in 2002~\cite{fea02a}.
Worse, handling multiple what-ifs (e.g. $W\le10$) 
requires response times
of $10^{-2}$ seconds or faster.
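The arithmetic behind these curves can be reconstructed as follows. The exact formula used to generate \fig{goal} is not stated, so the sketch below is a plausible reconstruction under the stated assumptions (5-second target in 2013, CPUs doubling every two years, models $X^8$ times slower, $2^W$ what-ifs):

```python
def required_runtime_today(X, W, target=5.0, years=5, growth=2.0,
                           size_factor=8):
    """Reconstructed back-of-the-envelope bound: a query that must
    finish in `target` seconds in 2013, over 2**W what-if scenarios,
    on models X**size_factor times slower, with CPUs
    growth**(years/2) times faster, must run today in at most
    this many seconds per query."""
    cpu_speedup = growth ** (years / 2)   # roughly 5.6x by 2013
    return target * cpu_speedup / (X ** size_factor * 2 ** W)
```

For example, with $X=2$ and a single what-if ($W=1$), this reconstruction gives a budget on the order of $10^{-1}$ seconds on contemporary machines; the budget shrinks rapidly as $X$ or $W$ grows.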


\end{document}