\section{CLIFF Assessment on Standard Data Sets}
%\label{section:assess}
In this chapter, we evaluate CLIFF as a prototype learner on standard data sets in cross-validation experiments. The following sections present results that show the probability of detection (pd) and the probability of false alarm (pf) before and after the use of CLIFF.

\subsection{Experimental Method}
\label{section:brit}

%The data set used in this work is donated by \cite{Karslake09}. It contains 37 samples each with five(5) replicates (37 x 5 = 185 instances). Each instance has 1151 infrared measurements ranging from 1800-650cm-1. (Further details of this algorithm can be found elsewhere \cite{Karslake09}). For our experiments we took the original data set and created four (4) data sets each with a different number of clusters (3, 5, 10 and 20) or groups. These clusters were created using the K-means algorithm (\fig{kmeans}).

The effectiveness of CLIFF is measured using pd, pf and brittleness level (high, low). The first two are computed as follows. Let A, B, C and D denote the true negatives, false negatives, false positives and true positives, respectively. Then \emph{pd}, also known as recall, is the number of true positives divided by the sum of false negatives and true positives: \emph{D / (B + D)}. Similarly, \emph{pf} is the number of false positives divided by the sum of true negatives and false positives: \emph{C / (A + C)}. The $pd$ and $pf$ values range from 0 to 1. When there are no false alarms, $pf$ = 0, and at 100\% detection, $pd$ = 1.
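
As a worked illustration of these definitions, suppose a single test run yields the hypothetical counts $A = 50$, $B = 5$, $C = 10$ and $D = 35$. These values are assumed purely for illustration and are not drawn from any data set in this chapter. Then
\[
pd = \frac{D}{B + D} = \frac{35}{5 + 35} = 0.875,
\qquad
pf = \frac{C}{A + C} = \frac{10}{50 + 10} \approx 0.167 .
\]
A learner with these counts would detect 87.5\% of the target class while raising false alarms on roughly 16.7\% of the non-target instances.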

%The results were visualized using \emph{quartile charts}. To generate these charts the performance measures for an analysis are sorted to isolate the median and lower and upper quartile of numbers. In our quartile charts, the upper and lower quartiles are marked with black lines; the median is marked with a black dot; and the vertical bars are added to mark the 50\% percentile value. 
%need to include examples

%The brittleness level measure is conducted as follows: First we calculate Euclidean distances between the validation or testing set which has already been validated and the training set. For each instance in the validation set the distance from its nearest like neighbor (NLN) and its nearest unlike neighbor (NUN) is found. Using these NLN and NUN distances from the entire validation set a Mann-Whitney U test was used to test for statistical difference between the NLN and NUN distances. 

The following sections describe the experiment and discuss the results.

\subsection{Is CLIFF Viable as a Prototype Learner for NNC?}

The goal here is to see whether the performance of CLIFF is comparable to or better than that of the plain $k$-nearest neighbor (KNN) algorithm. In this experiment we therefore compare the performance of predicting the target class using the entire training set against using only the prototypes generated by CLIFF. To accomplish this, our experiment design follows the pseudocode given in \fig{knnexp1} for the standard data sets. For each data set, tests were built from 20\% of the data, selected at random. The models were learned from the remaining 80\% of the data.

This procedure was repeated 5 times, randomizing the order of the data in each data set each time. In the end CLIFF is trained and tested 25 times for each data set.

\begin{figure}[h!]
\small
\begin{center}
\begin{tabular}{ p{7cm} }
\hline
\begin{verbatim}
DATA = [bc bs heart iris lym pima]
LEARNER = [KNN]
STAT_TEST = [Mann Whitney]

REPEAT 5 TIMES
 FOR EACH data IN DATA
  TRAIN = random 80% of data
  TEST = data - TRAIN

  // Construct model from TRAIN data
  MODEL = Train LEARNER with TRAIN
  // Evaluate model on TEST data
  [pd, pf] = MODEL on TEST
 END
END
\end{verbatim}
 \\ \hline
    \end{tabular}
\end{center}
\caption{Pseudocode for the experiment.}\label{fig:knnexp1}
\end{figure}

\subsubsection{Results from Experiment}

%\fig{result1} shows the 25\%, 50\% and 100\% percentile values of the $pd$, $pf$ and position values in each data set when r=1 (upper table) and r=2 (lower table. Next to these is the brittleness signal where $high$ signals an unacceptable level of brittleness and $low$ signals an acceptable level of brittleness. The results show that the brittleness level for each data set is $low$. The $pd$ and $pf$ results are promising showing that 50\% of the pd values are at or above 95\% for the data set with 3 clusters and at 100\% for the other data sets. While 50\% of the pf values are at 3\% for 3 clusters and 0\% for the others. These results show that our model is highly discriminating and can be used successfully in the evaluation of trace evidence.


%\end{center}
%\caption{Results for Experiment 1 for the 4 data sets distinguished by the number of clusters. Here for the upper and lower tables f=4 is used while r=1 is used for the upper table and r=2 for the lower table.}\label{fig:result1}
%\end{figure}

\begin{figure}[ht]
\begin{center}
\small
\scalebox{0.8}{
\begin{tabular}{ l }
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}|c}
ir &Treatment& 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd&BEFORE & 91& 100& 100&  \boxplot{0.0}{90.9}{100.0}{100.0}{0.0}  \\
 &AFTER & 73& 100& 100&  \boxplot{0.0}{72.7}{100.0}{100.0}{0.0}  \\ \hline
pf&BEFORE & 0& 0& 5&  \boxplot{0.0}{0.0}{0.0}{4.5}{95.5} \\
&AFTER & 0& 0& 8&  \boxplot{0.0}{0.0}{0.0}{8.3}{91.7}  \\
\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\\  \hline \hline
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}|c}
bc &Treatment& 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd&BEFORE & 29& 47& 85&  \boxplot{0.0}{29.4}{47.1}{85.4}{14.6}  \\
 &AFTER & 23& 40& 95&  \boxplot{0.0}{23.1}{40.0}{95.1}{4.9}  \\ \hline
pf&BEFORE & 11& 21& 69&  \boxplot{0.0}{11.1}{20.6}{68.8}{31.2} \\
&AFTER & 5& 12& 73&  \boxplot{0.0}{4.8}{11.6}{72.7}{27.3}  \\
\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\\  \hline \hline
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}|c}
bs &Treatment& 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd&BEFORE & 0& 68& 79&  \boxplot{0.0}{0.0}{68.4}{79.0}{21.0}  \\
 &AFTER & 44& 53& 61&  \boxplot{0.0}{44.4}{52.6}{61.0}{39.0}  \\ \hline
pf&BEFORE & 3& 18& 25&  \boxplot{0.0}{3.2}{18.2}{24.6}{75.4} \\
&AFTER & 12& 20& 29&  \boxplot{0.0}{12.1}{19.7}{28.8}{71.2}  \\
\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\\  \hline \hline
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}|c}
ht &Treatment& 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd&BEFORE & 8& 21& 50&  \boxplot{0.0}{8.3}{21.4}{50.0}{50.0}  \\
 &AFTER & 0& 0& 50&  \boxplot{0.0}{0.0}{0.0}{50.0}{50.0}  \\ \hline
pf&BEFORE & 6& 11& 20&  \boxplot{0.0}{5.7}{10.7}{20.0}{80.0} \\
&AFTER & 0& 4& 20&  \boxplot{0.0}{0.0}{3.6}{20.4}{79.6}  \\
\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\\  \hline \hline
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}|c}
ly &Treatment& 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd&BEFORE & 0& 71& 88&  \boxplot{0.0}{0.0}{71.4}{87.5}{12.5}  \\
 &AFTER & 0& 69& 100&  \boxplot{0.0}{0.0}{68.8}{100.0}{0.0}  \\ \hline
pf&BEFORE & 0& 0& 24&  \boxplot{0.0}{0.0}{0.0}{23.8}{76.2} \\
&AFTER & 0& 0& 25&  \boxplot{0.0}{0.0}{0.0}{25.0}{75.0}  \\
\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\\  \hline \hline
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}|c}
pm &Treatment& 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd&BEFORE & 40& 58& 72&  \boxplot{0.0}{40.0}{57.9}{71.9}{28.1}  \\
 &AFTER & 24& 46& 87&  \boxplot{0.0}{24.0}{46.4}{86.5}{13.5}  \\ \hline
pf&BEFORE & 28& 36& 60&  \boxplot{0.0}{28.0}{35.6}{60.0}{40.0} \\
&AFTER & 13& 31& 76&  \boxplot{0.0}{13.3}{30.5}{75.5}{24.5}  \\
\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\end{tabular}}
\caption{Probability of detection (pd) and probability of false alarm (pf) results, in percent, before and after the use of CLIFF.}
\label{fig:results100}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\begin{tabular}{ | l | l | l | l | l | l |}
\hline
group & pd (Before $\rightarrow$ After) & pf (Before $\rightarrow$ After) & position (Before $\rightarrow$ After) & data & $\left| data \right|$   \\ \hline
a & same & same & same & pm & 1 \\ \hline
b & same & same & increase & bc ir ly & 3  \\ \hline
c & decrease & decrease & increase & ht & 1 \\ \hline
d & decrease & increase & increase & bs & 1 \\ \hline
\end{tabular}
\end{center}
\caption{Summary of Mann-Whitney U test results (95\% confidence): moving from Before to After.}
\label{fig:man1}
\end{figure}    


\begin{figure}[ht!]
%  \begin{center}
  \scalebox{0.85}{
    \begin{tabular}{l}
      \resizebox{100mm}{!}{\includegraphics{bc1}} 
      \resizebox{100mm}{!}{\includegraphics{bs1}} \\
      \resizebox{100mm}{!}{\includegraphics{ht1}} 
      \resizebox{100mm}{!}{\includegraphics{ir1}} \\
      \resizebox{100mm}{!}{\includegraphics{ly1}} 
      \resizebox{100mm}{!}{\includegraphics{pm1}} \\
    \end{tabular}}
    \caption{Position of values in the `before' and `after' population with data set at 3, 5, 10 and 20 clusters. The first row shows the results for r=1 while the second row shows the results for r=2.}
    \label{fig:charts1}
 % \end{center}
\end{figure}


%include - as the number of clusters increase...
%assume the brittleness is yes

%\subsection{Experiment 2: Can CLIFF increase the level of tolerance for noise?}