\begin{figure}
{\footnotesize
\begin{center}
\begin{tabular}{|l|c|c|c|c|c|c|c|}
		& 
		& \multicolumn{3}{c|}{Nighthawk v1.0}
		& \multicolumn{3}{c|}{Nighthawk v1.1}
	 \\
Source file	& SLOC
		&  Nt	& RCt & Coverage
		&  Nt	& RCt & Coverage
	 \\
\hline
ArrayList	& 150
		& 48	& 15
		& 140 (.93)
		& 93	& 12
		& 140 (.93)
	 \\
\hline
EnumMap		& 239
		& 5	& 8
		& 7 (.03)
		& 20	& 10
		& 12 (.05)
	 \\
\hline
HashMap		& 360
		& 176	& 30
		& 347 (.96)
		& 136	& 25
		& 347 (.96)
	 \\
\hline
HashSet		& 46
		& 39	& 22
		& 44 (.96)
		& 125	& 21
		& 44 (.96)
	 \\
\hline
Hashtable	& 355
		& 157	& 25
		& 325 (.92)
		& 252	& 26
		& 329 (.93)
	 \\
\hline
IHashMap	& 392
		& 134	& 34
		& 333 (.85)
		& 182	& 17
		& 335 (.85)
	 \\
\hline
LHashMap	& 103
		& 129	& 25
		& 96 (.93)
		& 153	& 24
		& 96 (.93)
	 \\
\hline
LHashSet	& 9
		& 24	& 16
		& 9 (1.0)
		& 69	& 15
		& 9 (1.0)
	 \\
\hline
LinkedList	& 227
		& 53	& 17
		& 225 (.99)
		& 172	& 18
		& 225 (.99)
	 \\
\hline
PQueue		& 203
		& 103	& 13
		& 155 (.76)
		& 120	& 14
		& 147 (.72)
	 \\
\hline
Properties	& 249
		& 47	& 18
		& 102 (.41)
		& 79	& 35
		& 102 (.41)
	 \\
\hline
Stack		& 17
		& 26	& 8
		& 17 (1.0)
		& 45	& 7
		& 17 (1.0)
	 \\
\hline
TreeMap		& 562
		& 106	& 26
		& 526 (.94)
		& 227	& 28
		& 525 (.93)
	 \\
\hline
TreeSet		& 62
		& 186	& 26
		& 59 (.95)
		& 124	& 27
		& 59 (.95)
	 \\
\hline
Vector		& 200
		& 176	& 20
		& 195 (.98)
		& 36	& 19
		& 196 (.98)
	 \\
\hline
WHashMap	& 338
		& 110	& 21
		& 300 (.89)
		& 201	& 24
		& 300 (.89)
	 \\
\hline
\hline
Total		& 3512
		& 1519	& 324
		& 2880 (.82)
		& 2034	& 322
		& 2883 (.82)
	 \\
\hline
Per unit	& 220
		& 95	& 20
		& 
		& 127	& 20
		&
	 \\
\hline
\end{tabular}
\end{center}
}
\caption{
  Results of running Nighthawk v1.0 and v1.1
  on the 16 {\tt java.util} Collection and Map classes.
  SLOC: number of source lines of code contained in the {\tt .java}
  file of the unit, including inner classes.
  Nt: time (sec) taken by Nighthawk to find the winning chromosome.
  RCt: time (sec) taken by RunChromosome to generate and run 10 test cases
  based on the winning chromosome.
  Coverage: source lines of code covered by the 10 test cases run
  by RunChromosome, as measured by Cobertura
  (the number in parentheses is the ratio of lines covered).
}
\label{collection-map-cov}
\end{figure}

\begin{figure}
{\footnotesize
\begin{center}
\begin{tabular}{|l|c|c|c|c|}
Source file		& SLOC
			& Nt	& RCt
	& Coverage \\
\hline
ArrayStack		& 37
			& 6		& 6
	& 37 (1.0) \\
\hline
BagUtils		& 14
			& 52		& 7
	& 14 (1.0) \\
\hline
BeanMap			& 212
			& 19		& 76
	& 138 (0.65) \\
\hline
BinaryHeap		& 149
			& 123		& 18
	& 148 (0.99) \\
\hline
BoundedFifoBuffer	& 89
			& 6		& 11
	& 89 (1.0) \\
\hline
BufferOverflowException	& 9
			& 3 		& 4
	& 9 (1.0) \\
\hline
BufferUnderflowException	& 9
			& 3		& 5
	& 9 (1.0) \\
\hline
BufferUtils		& 12
			& 2		& 6
	& 12 (1.0) \\
\hline
ClosureUtils		& 34
			& 3		& 10
	& 22 (0.65) \\
\hline
CollectionUtils		& 329
			& 301		& 45
	& 211 (0.64) \\
\hline
ComparatorUtils		& 33
			& 13		& 10
	& 33 (1.0) \\
\hline
CursorableLinkedList	& 528
			& 170		& 41
	& 496 (0.94) \\
\hline
DefaultMapEntry		& 25
			& 2		& 5
	& 24 (0.96) \\
\hline
DoubleOrderedMap	& 521
			& 164		& 26
	& 480 (0.92) \\
\hline
EnumerationUtils	& 3
			& 2		& 7
	& 3 (1.0) \\
\hline
FactoryUtils		& 8
			& 2		& 7
	& 8 (1.0) \\
\hline
FastArrayList		& 519
			& 334		& 49
	& 481 (0.93) \\
\hline
FastHashMap		& 258
			& 167		& 24
	& 223 (0.86) \\
\hline
FastTreeMap		& 288
			& 239		& 37
	& 262 (0.91) \\
\hline
FunctorException	& 36
			& 4		& 12
	& 29 (0.81) \\
\hline
HashBag			& 5
			& 8		& 9
	& 5 (1.0) \\
\hline
IteratorUtils		& 114
			& 455		& 54
	& 97 (0.85) \\
\hline
LRUMap			& 44
			& 76		& 28
	& 44 (1.0) \\
\hline
ListUtils		& 64
			& 53		& 10
	& 28 (0.44) \\
\hline
MultiHashMap		& 138
			& 36		& 20
	& 128 (0.93) \\
\hline
PredicateUtils		& 31
			& 64		& 9
	& 31 (1.0) \\
\hline
ReferenceMap		& 297
			& 161		& 23
	& 247 (0.83) \\
\hline
SequencedHashMap	& 236
			& 105		& 41
	& 232 (0.98) \\
\hline
SetUtils		& 30
			& 42		& 8
	& 19 (0.63) \\
\hline
StaticBucketMap		& 214
			& 324		& 36
	& 199 (0.93) \\
\hline
SynchronizedPriorityQueue	& 11
			& 2	& 4
	& 9 (0.82) \\
\hline
TransformerUtils	& 39
			& 3		& 11
	& 27 (0.69) \\
\hline
TreeBag			& 10
			& 10		& 10
	& 10 (1.0) \\
\hline
UnboundedFifoBuffer	& 81
			& 32		& 48
	& 81 (1.0) \\
\hline
\hline
Total			& 4427
			& 2986		& 717
	& 3885 (0.88) \\
\hline
Per unit		& 130
			& 88		& 21
	& \\
\hline
\end{tabular}
\end{center}
}

% Even for the 8 units of 200 SLOC or greater,
% this represents 2631 / 2976 SLOC covered, or 88.44%

\caption{
  Results of running Nighthawk v1.1 using the 8 highest-ranked gene types
  on the 34 Apache Commons Collections classes studied.
  Column headings are as in Figure \ref{collection-map-cov}.
}
\label{apache-cov}
\end{figure}

We now present statistics from runs of two versions of Nighthawk
on two subject software packages, both to evaluate how
cost-effective Nighthawk is and to compare version 1.0
of Nighthawk with version 1.1.

We collect statistics on Nighthawk using the following
procedure.  For each subject unit, we run Nighthawk for 50
generations.  In order to give engineers an accurate sense of
how long Nighthawk typically takes to find its highest coverage,
we record the time that Nighthawk reported when it first reached
the highest coverage of the run.  We then take the winning chromosome
from Nighthawk and run 10 test cases based on that chromosome
using RunChromosome.  Finally, we run Cobertura's report
generation script; we calculate how many lines of code are
measurable by Cobertura in the source file for the unit, and
how many lines are reported by Cobertura as having been covered.
These counts include inner classes.

In order to compare Nighthawk v1.0 with v1.1 (the version using
the more accurate initial value for {\tt numberOfCalls}), we
ran the above procedure using v1.0 on the {\tt java.util}
Collection and Map classes, and again using v1.1.  The v1.0
results were reported originally in \cite{andrews07};
Figure \ref{collection-map-cov} collects these statistics and
also those for v1.1.

A glance at the Nt columns in Figure \ref{collection-map-cov}
suggests that version 1.1 of Nighthawk takes longer than version
1.0, but statistical tests do not show a statistically
significant difference in run times (a paired Wilcoxon test
yields $p=0.07$; a Shapiro-Wilk normality test on the difference
between the columns yields $p=0.14$, not rejecting the
hypothesis of normality; and a paired $t$ test yields $p=0.08$).
There is no statistically significant difference in either the
runtime needed by RunChromosome (a paired Wilcoxon test yields
$p=0.98$) or the coverage achieved (a paired Wilcoxon test
yields $p=0.60$).
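
As a rough check on the paired $t$ test, the statistic can be
recomputed from the Nt columns of Figure \ref{collection-map-cov}.
This sketch assumes that the test was applied to the times as
printed (rounded to whole seconds) and that each difference $d_i$
is the v1.1 time minus the v1.0 time for unit $i$, with $n=16$
units and $s_d$ the sample standard deviation of the $d_i$:
\[
  \bar{d} = \frac{2034 - 1519}{16} \approx 32.2, \qquad
  s_d \approx 68.7, \qquad
  t = \frac{\bar{d}}{s_d/\sqrt{n}}
    \approx \frac{32.2}{17.2} \approx 1.87 .
\]
With 15 degrees of freedom, this value lies between the two-sided
10\% and 5\% critical values (about 1.75 and 2.13), which is
consistent with the reported $p=0.08$.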

We conclude that for these subject units, the corrected value of
{\tt numberOfCalls} did not result in either significant extra
runtime or significant extra coverage.  Nighthawk was able to
find a winning chromosome after an average of at most 127
seconds per unit, or 58 seconds per 100 source lines of code.
RunChromosome was able to generate and run ten test cases in an
average of 20 seconds per unit, or 9.2 seconds per 100 source
lines of code.  These tests covered an average of 82\% of the
source lines of code; this ratio is held down mainly by
Nighthawk's poor performance on the large {\tt EnumMap}
unit.  (In \cite{andrews07} we discuss why Nighthawk did
poorly on {\tt EnumMap} and what we did to correct the problem.)
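
The per-100-line and coverage figures can be reproduced from the
Total row for v1.1 in Figure \ref{collection-map-cov}, on the
assumption that they are ratios of column totals rather than means
of per-unit ratios:
\[
  \frac{2034}{3512} \times 100 \approx 58, \qquad
  \frac{322}{3512} \times 100 \approx 9.2, \qquad
  \frac{2883}{3512} \approx 0.82 .
\]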

We also ran the evaluation procedure on the Apache Commons
Collections classes that we used in our optimization research.
As discussed in Section XXX, there was strong evidence that the
two low-ranked gene types {\tt candidateBitSet} and
{\tt chanceOfNull} were not cost-effective.
For this study, we therefore ran version 1.1 of Nighthawk using
only the remaining 8, higher-ranked gene types.
Figure \ref{apache-cov} shows the resulting data.
Nighthawk was able to find a winning chromosome after an average
of 88 seconds per unit, or 67 seconds per 100 source lines of code.
RunChromosome was able to generate and run ten test cases in an
average of 21 seconds per unit, or 16 seconds per 100 source
lines of code.  These tests covered an average of 88\% of the
source lines of code.  Inspection of the units on which
Nighthawk did less well indicates that some of the missing
coverage was due to code that tested whether arguments were
instances of specific subclasses of {\tt java.util.Collection},
such as {\tt java.util.Set} or {\tt java.util.Map}.
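
The timing and coverage averages for this study can likewise be
checked against the Total row of Figure \ref{apache-cov}.  The
sketch below assumes that the per-unit values divide the totals by
the 34 units, and that the per-100-line and coverage values are
ratios of column totals:
\[
  \frac{2986}{34} \approx 88, \qquad
  \frac{717}{34} \approx 21, \qquad
  \frac{2986}{4427} \times 100 \approx 67, \qquad
  \frac{717}{4427} \times 100 \approx 16, \qquad
  \frac{3885}{4427} \approx 0.88 .
\]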

In summary, for the widely used subject units we studied,
Nighthawk is able to find randomized testing parameters that
achieve high coverage in a reasonable amount of time; the
parameters that it finds allow programmers to quickly generate
many distinct test cases that achieve high coverage of the units
under test.

