\section{CLIFF Design and Operation}

If standard methods are brittle, what can we do? We seek our answer in the work of \cite{Karslake09}, exploring the intuition that, to reduce brittleness, data with dissimilar outcomes should not be close neighbors. This section discusses the details of CLIFF's core procedure and tools, including a sub-section that further explores our intuition for brittleness reduction and the tool born from that intuition: the CLIFF selector.

The design of CLIFF is deeply rooted in the work of \cite{Karslake09}, whose analysis uses Chemometrics, the application of mathematical, statistical and/or computer science techniques to chemistry. In that work, Chemometrics is applied, using computer science techniques, to analyze the infrared spectra of the clear-coat layer of a range of cars. The analysis proceeded as follows:

\begin{itemize}
\item Agglomerative hierarchical clustering (AHC) for grouping the data into classes
\item Principal component analysis (PCA) for reducing dimensions of the data
\item Discriminant analysis for classification i.e. associating an unknown sample to a group or region
\end{itemize}

This technique produced a strong model which achieved 100\% accuracy; i.e., when validated by removing random samples from the model, all the samples were correctly assigned. The goal of CLIFF is not only to create a strong forensic model but also to show how strong that model is. To achieve this, CLIFF includes a brittleness measure as well as a method to reduce brittleness. Also, in an effort to keep CLIFF simple, we substituted different tools to perform the analysis done in \cite{Karslake09}: K-means is used instead of AHC for grouping the data into classes, FastMap is used for dimensionality reduction, and K-nearest neighbor is used for classification. The basic operation of CLIFF is shown in \fig{process}. The data is collected and its dimensions are reduced if necessary. Clusters are then created from the data and classification is done along with a brittleness measure (further discussed in Section \ref{subsection:bm}). Finally, we test whether brittleness can be reduced using a novel prototype learning technique (instance selection).

\begin{figure}[h!]
\begin{center}
\includegraphics[scale=0.32]{process}
\end{center}
\caption{Proposed procedure for the forensic evaluation of data}\label{fig:process}
\end{figure}

\subsection{Dimensionality Reduction}

The data used in our experiments contains 1,151 attributes and 185 instances. Using the data set as is would produce a model that is computationally expensive and likely to yield unacceptable results, such as high false positive rates caused by redundant and noisy data. To avoid this problem, we turn to dimensionality reduction.

Dimensionality reduction refers to transforming high-dimensional data into low-dimensional data by summarizing it with fewer terms. While this reduces the overall information available, and thus a level of precision, it allows for easy visualization of data that is otherwise impossible to visualize. Algorithms that can be used for dimensionality reduction include Principal Component Analysis (PCA) and FastMap.

To perform an analysis on this data set we must first reduce the number of variables used. In \cite{Karslake09}, PCA is used to perform dimensionality reduction. PCA can be defined as ``the orthogonal projection of the data onto a lower dimensional linear space'' \cite{pca}. In other words, our goal is to project the data onto a space of dimensionality M < 1,151 while maximizing the variance of the projected data. In \cite{Karslake09}, two techniques, Pearson correlation and covariance, were compared in order to determine an appropriate value for M (M = 4).

To speed things up, our model uses \emph{FastMap} to reduce the dimensions of the data set. Each reduction step in FastMap applies the cosine law to the triangle formed by an object in the feature space and the two objects that are furthest apart in the current (pre-reduction) space. These two objects are referred to as the pivot objects of that step (M pivot object sets in total). Finding the two furthest-apart points exactly is an $O(N^2)$ problem (where N is the total number of objects), and this is where the heuristic nature of FastMap comes into play. Instead of finding the absolute furthest-apart points, FastMap takes a shortcut: it randomly selects an object from the set, finds the object furthest from it, and makes that the first pivot. It then finds the point furthest from the first pivot and uses it as the second pivot. The line formed by these two pivots becomes the line onto which all other points are mapped in the new M-dimensional space. (Further details of this algorithm can be found elsewhere \cite{fastmap}.)
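The pivot heuristic and cosine-law projection can be made concrete with a minimal Python sketch. This is our illustration, not CLIFF's implementation: the function names are ours, and we assume plain Euclidean distance over tuples, whereas FastMap proper works from an arbitrary distance function and repeats this step once per output dimension.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def choose_pivots(objects):
    """FastMap's linear-time pivot heuristic: start from an arbitrary
    object, take the object furthest from it as the first pivot, then
    take the object furthest from that pivot as the second pivot."""
    seed = objects[0]                      # arbitrary starting object
    p1 = max(objects, key=lambda o: dist(seed, o))
    p2 = max(objects, key=lambda o: dist(p1, o))
    return p1, p2

def project(obj, p1, p2):
    """Coordinate of obj on the pivot line, via the cosine law:
    x = (d(p1,o)^2 + d(p1,p2)^2 - d(p2,o)^2) / (2 * d(p1,p2))."""
    d12 = dist(p1, p2)
    return (dist(p1, obj) ** 2 + d12 ** 2 - dist(p2, obj) ** 2) / (2 * d12)
```

Repeating `choose_pivots` and `project` M times (on residual distances) yields the M-dimensional embedding.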

To determine an appropriate value for M using FastMap, we experimented with different values for M. \fig{exp1} shows results for various K-nearest neighbor classifiers (discussed further in Sections \ref{subsection:knn} and \ref{section:assess}), with M fixed at 2, 4, 8 and 16. When M is 2 or 4, 100\% of the validation samples are predicted correctly (pd) and 0\% are predicted incorrectly (pf). For this reason, our model is analysed using M = 4.

\begin{figure}
  \begin{center}
  \scalebox{0.95}{
    \begin{tabular}{c}
      \resizebox{60mm}{!}{\includegraphics{varspd}} \\
      \resizebox{60mm}{!}{\includegraphics{varspf}} \\
      
    \end{tabular}}
\caption{Probability of detection (pd) and probability of false alarm (pf) using fixed values for dimensions and fixed k values for k-nearest neighbor}
    \label{fig:exp1}
  \end{center}
\end{figure}


\subsection{Clustering}

Clustering is the second step in the CLIFF tool and can be defined as the grouping of samples into groups whose members are similar in some way, while samples belonging to different clusters are dissimilar. The major goal of clustering is to determine the intrinsic grouping in a set of unlabelled data. In most clustering techniques, distance is the major criterion: two objects are similar if they are close according to the given distance.

CLIFF clusters using K-means; \fig{kmeans} presents pseudocode for the algorithm. K-means starts by assuming some arbitrary number of centroids; each object is then associated with its nearest centroid, and the centroids are moved to the centers of their clusters. These steps are repeated until a suitable level of convergence is attained.

\begin{figure}[h!]
\small
\begin{center}
\begin{tabular}{ p{7cm} }
\hline
\begin{verbatim}

DATA = [3, 5, 10, 20]
k = [1, ..., Number of clusters]
STOP = [Stopping criteria]
        
FOR EACH data IN DATA
 N = count(data)
 WHILE STOP IS FALSE
  // Calculate membership in clusters
  FOR EACH data point X IN data
   FIND NEAREST CENTROID_k
   ADD TO CLUSTER_k
  END
 
  // Recompute the centroids  
  FOR EACH CLUSTER
   FIND NEW CENTROIDS
  END
  
  // Check stopping criteria
  [TRUE or FALSE] = STOP
 END
END
\end{verbatim}
 \\ \hline
    \end{tabular}
\end{center}
\caption{Pseudo code for K-means}\label{fig:kmeans}
\end{figure}
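The steps in \fig{kmeans} can be sketched as a small, self-contained Python function. This is a hedged illustration, not CLIFF's code: the random initialization, seed handling, and the "assignments stop changing" stopping criterion are our assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means: pick k points as initial centroids, assign
    each point to its nearest centroid, move each centroid to the
    mean of its cluster, and repeat until assignments stabilize."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Calculate membership: nearest centroid for each point
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:   # stopping criterion met
            break
        assignment = new_assignment
        # Recompute the centroids as cluster means
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignment
```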

\subsection{Classification with KNN}
\label{subsection:knn}
%from wiki
%1973 Duda and Hart
K-nearest neighbor (KNN) classification is a simple classification method usually used when there is little or no prior knowledge about the distribution of the data. KNN is described in \cite{knn} as follows: store the complete training data, then classify new samples by choosing the majority class among the k closest examples in the training data. For our particular problem, we used the Euclidean (i.e., sum of squares) distance to measure the distance between samples. Finally, to determine a value for k, we investigated the performance of KNN classifiers with k fixed at 2, 4, 8 and 16. \fig{exp1} shows the results, which indicate that with k equal to 4, 8 or 16, validation of samples is 100\%. For CLIFF, k = 4 is used.
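The classification rule just described fits in a few lines of Python. This is a minimal sketch under our own naming (`knn_classify`, training data as `(features, label)` pairs); the Euclidean distance and majority vote follow the text.

```python
import math
from collections import Counter

def knn_classify(train, query, k=4):
    """Classify `query` by majority vote among its k nearest
    training samples, using Euclidean distance.
    `train` is a list of (feature_tuple, label) pairs."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with three training samples near the origin labeled one class and three far away labeled another, a query near the origin is assigned the first class even at k = 4, since three of its four nearest neighbors share that label.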

\subsection{The Brittleness Measure}
\label{subsection:bm}
Calculating the brittleness measure is a novel operation of CLIFF. We use it to determine whether the results of CLIFF come from a region where all the possible results are (dis)similar; for the purpose of this work, the optimal result comes from a region of similar results. To make this determination, once each sample from a validation set has been classified, we record its distance to its nearest unlike neighbor (NUN), i.e., the nearest sample with a different class, and its distance to its nearest like neighbor (NLN), i.e., the nearest sample with the same class. Recall that brittleness means a small change can produce a different outcome, so the closer the NUN distances are to the NLN distances, the more brittle the model. An ideal result will therefore have the greatest separation between NUNs and NLNs.

The brittleness measure outputs either $high$ or $low$: $high$ indicates that there is no significant difference between the NUN and NLN values, while $low$ indicates the opposite. The significance of these values is calculated using the Mann-Whitney U test, a non-parametric test which replaces the distance values with their rank (position) inside the population of all sorted values.

\eq{bm} embodies our definition of brittleness: if the NUN distances are not significantly greater than the NLN distances, then an unacceptable level of brittleness is present in the model.

\begin{equation}
\mathit{NUN} \leq \mathit{NLN} \implies \mathit{brittleness}
\label{eq:bm}
\end{equation}
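A sketch of the whole measure follows: gather NLN/NUN distances per validation sample, then compare the two samples of distances. This is our illustration, with two loud assumptions: we use the large-sample normal approximation to the Mann-Whitney U test with a conventional 1.96 cutoff, since the text does not specify which variant of the test CLIFF uses, and all function names are ours.

```python
import math

def nln_nun_distances(validation, train):
    """For each classified validation sample (features, label), record
    the distance to its nearest like neighbor (same class) and its
    nearest unlike neighbor (different class) in the training set."""
    nln, nun = [], []
    for feats, label in validation:
        like = [math.dist(feats, f) for f, l in train if l == label]
        unlike = [math.dist(feats, f) for f, l in train if l != label]
        nln.append(min(like))
        nun.append(min(unlike))
    return nln, nun

def mann_whitney_u(xs, ys):
    """U statistic for xs vs ys, counting ties as 1/2 each."""
    u = 0.0
    for x in xs:
        for y in ys:
            u += 1.0 if x > y else (0.5 if x == y else 0.0)
    return u

def brittleness(nln, nun, z_crit=1.96):
    """'high' when the NUN and NLN distances are NOT significantly
    different (normal approximation to Mann-Whitney, an assumption),
    'low' otherwise."""
    n, m = len(nun), len(nln)
    u = mann_whitney_u(nun, nln)
    mu = n * m / 2.0
    sigma = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mu) / sigma
    return "high" if abs(z) < z_crit else "low"
```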

\subsection{The CLIFF Selector}
\label{subsection:selector}
The CLIFF selector is the last major operation of CLIFF. It is a prototype learning tool based on a novel algorithm defined by its purpose: to reduce a training set, via various selection and/or creation methods, to a set of $good$ prototypes which increase the distance between NUNs and NLNs. These $good$ prototypes are therefore a representation of the original training data that maintains comparable or improved classification performance.

In an effort to reduce the brittleness of our model, the CLIFF selector selects the samples from a training set which best represent their respective target class. To accomplish this, a Bayesian ranking measure along with a support measure is used. First we divide the target classes into the current class, $best$, and all other classes, $rest$. This makes it easy to find the attribute ranges of values which have a high probability of belonging to the current \emph{best} class. The attribute ranges are found using an equal frequency binning algorithm which sorts attribute values into $N$ equal-frequency regions (making sure that no number is repeated across a region boundary). The rank values are generated by applying \eq{one} to each attribute range. The remainder of this section further discusses how the attribute ranges and their corresponding ranks are generated.

In \eq{one}, $like$ denotes likelihood:
\begin{equation}
P(best|E) * support(best|E) = \frac{like(best|E)^2}{like(best|E) + like(rest|E)} 
\label{eq:one}
\end{equation}

\begin{figure}[h!]
\small
\begin{center}
\begin{tabular}{ p{8cm} }
\hline
\begin{verbatim}
        
    Outlook   Temp(F)  Humidity Windy? Class
01. Sunny      69        70     False  Lots
02. Sunny      75        70     True   Lots
03. Overcast   83        88     False  Lots
04. Overcast   64        65     True   Lots
05. Overcast   72        90     True   Lots
06. Overcast   81        75     False  Lots
07. Sunny      85        86     False  None
08. Sunny      80        90     True   None
09. Sunny      72        95     False  None
10. Rain       65        70     True   None
11. Rain       71        96     True   None
12. Rain       70        96     False  Some
13. Rain       68        80     False  Some
14. Rain       75        80     False  Some
\end{verbatim}
 \\ \hline
    \end{tabular}
\end{center}
\caption{A log of some golf-playing behavior}\label{fig:golf}
\end{figure} 

The algorithm used in this work is a Support-Based Bayesian Ranking (SBBR) algorithm; its pseudocode is shown in \fig{sbbra}. The \emph{best-rest} function divides the data set $D$ into $best$ and $rest$ (for example, in \fig{golf}, if class $Lots$ is selected as $best$ then samples 1-6 would be in the best set while samples 7-14 would be in the rest set). As explained earlier, the $EqualFrequencyBinning$ function in line 2 finds the attribute ranges for each attribute in the data set. For instance, looking at the attribute values for $Humidity$, if we want three (3) bins the following would be generated: \newline \emph{65 70 70 70 | 75 80 80 86 | 88 90 90 95 96 96}.
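One possible form of the binning step is sketched below. This is our illustration: the paper does not give the exact rule used to keep boundary values from repeating across bins, so the "push the cut point forward past ties" rule here is an assumption, and it may split the Humidity values slightly differently than the example above.

```python
def equal_frequency_bins(values, n_bins):
    """Split sorted values into n_bins of (roughly) equal size,
    shifting a cut point forward when the same number would
    otherwise appear on both sides of it (assumed tie rule)."""
    vals = sorted(values)
    size = len(vals) / n_bins
    bins, start = [], 0
    for i in range(1, n_bins):
        cut = round(i * size)
        # avoid repeating a boundary value across two bins
        while cut < len(vals) and vals[cut] == vals[cut - 1]:
            cut += 1
        bins.append(vals[start:cut])
        start = cut
    bins.append(vals[start:])
    return bins
```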

Once the ranges for each attribute are found, lines 5-12 are applied to each range to produce a rank value. This rank value represents the probability that a sample containing a value from a particular attribute range will be in the $best$ class.
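The rank computation of \eq{one} can be checked with a few lines of Python. The function name and argument layout are ours; the arithmetic follows the equation. For the golf data of \fig{golf} with $Lots$ as $best$ (6 of 14 samples), the range Outlook=Overcast appears 4 times in $best$ and never in $rest$, giving $like(rest|bin) = 0$ and a rank equal to $like(best|bin) = (4/6)(6/14) = 4/14$.

```python
def sbbr_rank(bin_in_best, n_best, bin_in_rest, n_rest, n_total):
    """Rank of one attribute range, following Equation (1):
    like(best|bin)^2 / (like(best|bin) + like(rest|bin)),
    where like(H|bin) = freq(bin|H) * P(H)."""
    p_best = n_best / n_total
    p_rest = n_rest / n_total
    like_best = (bin_in_best / n_best) * p_best
    like_rest = (bin_in_rest / n_rest) * p_rest
    denom = like_best + like_rest
    return 0.0 if denom == 0 else like_best ** 2 / denom
```

Note how the squared numerator rewards support: a range seen rarely in $best$ gets a low rank even when it never appears in $rest$.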

Finally, to choose the samples which best represent a class, a criterion is established for each class: samples containing attribute-range values with the highest $r$ rank values of the highest $n$ attributes are selected. For example, looking at the golf data set in \fig{golf}, if r=1 and n=4, the criterion for choosing the best samples in the $Lots$ class could be [Overcast, 81-83, 75-88, False]; with this criterion, samples 3 and 6 would be selected. Note that the attributes are ordered from most important to least important according to the highest rank value found in their respective attribute ranges, and the highest $n$ attributes are chosen according to this order. \fig{exp3} shows the $pd$ and $pf$ results (further discussed in Section \ref{subsubsection:brit}) from CLIFF when n = [1, 2, 3, 4] and r = [1, 2]. These results seem to indicate that r=2 would be the best value to use in our experiments. However, the table in \fig{instances}, which reveals the number of instances left after applying just the CLIFF selector to the data sets, shows that r=1 discards more instances than r=2 while still maintaining relatively high $pd$ and low $pf$ values (\fig{exp3}). With these results in mind, Experiments 1 and 2 in this work use n=4 and r=1,2 to evaluate CLIFF.

\begin{figure*}[ht!]
  \centering  
  \scalebox{0.95}{
    \begin{tabular}{c}
      \resizebox{60mm}{!}{\includegraphics{jpdk1}} 
      \resizebox{60mm}{!}{\includegraphics{jpfk1}} \\
      \resizebox{60mm}{!}{\includegraphics{jpdk2}} 
      \resizebox{60mm}{!}{\includegraphics{jpfk2}} \\
    \end{tabular}}
    \caption{pd and pf results for n=[1,2,3,4] and r=[1,2]}
    \label{fig:exp3} 
\end{figure*}

\begin{figure}
\begin{center}
\begin{tabular}{l@{~}|l@{~}|c@{~}| c@{~}|}
%\begin{tabular}{l@{~}|l@{~}|r@{~}|r@{~}|}
%\multicolumn{2}{c}{~}&\multicolumn{5}{c}{quartiles}\\\cline{3-5}
%&min& & median & &max\\\cline{3-5}
%\multicolumn{2}{c}{~}&\multicolumn{2}{c}{Before}\\
%\multicolumn{2}{c}{~}&\multicolumn{2}{c}{percentiles} \\\cline{3-4}
%Clusters& Type & Before & After \\\hline
%\multicolumn{2}{c}{~}&\multicolumn{3}{c}{Before}\\
%\multicolumn{2}{c}{~}&\multicolumn{1}{c}{# of Instances}\\ %
\cline{1-4}
Clusters & r & Number of Instances & Remaining\% \\\hline
\multirow{2}{*}{3} & r=1 & 36 & 19  \\
 & r=2 & 100 & 54 \\
 \hline
\multirow{2}{*}{5} & r=1 & 42 & 23 \\
 & r=2 & 99  & 54 \\
  \hline
\multirow{2}{*}{10} & r=1 & 58 & 31 \\
 & r=2 & 112 & 61\\
 \hline
\multirow{2}{*}{20} & r=1 & 88 & 48 \\
 & r=2 & 159 & 86 \\
  \hline 
%\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100 
\end{tabular}
\end{center}
\caption{Instance selection using the CLIFF selector. The Remaining\% column shows the percentage of the original data set remaining after selection. (All the original data sets contain 185 instances.)}\label{fig:instances}
\end{figure}



\begin{figure}[h!]
\small
\begin{center}
\begin{tabular}{ p{8cm} }
\hline
\begin{verbatim}
DATA = [3, 5, 10, 20]
A = [Attributes]
BEST = [Instances with current best class]
REST = [Instances with other classes]
FREQ = [frequency]
LIKE = [likelihood]
EFB = [EqualFrequencyBinning]
BIN = [Values within Attribute range]
P_BEST = [Probability of BIN in BEST]
P_REST = [Probability of BIN in REST]
        
FOR EACH data IN DATA
 BIN_DATA = EFB on data
 FOR EACH attribute in A{
  FOR EACH BIN IN attribute{
   P_BEST = count(BEST) / count(data)
   P_REST = count(REST) / count(data)
   FREQ(BIN|BEST) = count(BIN in BEST) / count(BEST)
   FREQ(BIN|REST) = count(BIN in REST) / count(REST)                       
   LIKE(BEST|BIN) = FREQ(BIN|BEST) x P_BEST
   LIKE(REST|BIN) = FREQ(BIN|REST) x P_REST
   LIKE_BEST_REST = LIKE(BEST|BIN) + LIKE(REST|BIN)
   RANK = LIKE(BEST|BIN)^2 / LIKE_BEST_REST
   RETURN [BIN, RANK]
  END
 END
END
\end{verbatim}
 \\ \hline
    \end{tabular}
\end{center}
\caption{Pseudo code for Support Based Bayesian Ranking algorithm}\label{fig:sbbra}
\end{figure}       

%The result of applying the second visualization technique i.e. charting the probabilities of those measurements from $x$ in certain ranges would match corresponding measurements in $y$ with the help of a Contrast Set Learning (CSL) algorithm, is shown in \fig{tableseh}.

%\fig{tableseh1} shows a snippet of the result of applying SBBR to a data set generated from the Seheult model. The attribute values of $x$ have been binned using equal frequency binning, and so the $Range$ row represents the minimum and maximum values of each bin. The $Rank$ values represents the level of importance for each range of $x$ values. Here a rank of 1.0 is optimum indicating that any $x$ value in the range of 0.95 to 0.98 is more likely to have a higher LR than $x$ values from 0.94 to 0.95 and 0.98 to 1.0. %need to change to actual values

%For the purpose of this work, \fig{tableseh1} and Figures \ref{fig:tableseh} - \ref{fig:tablewal}, offer a perfect visual of brittleness in these forensic models. Recall our definition for brittleness as a tiny change in data can lead to a change in interpretation i.e. the rank value can change from 0.23 to 1.0 if x=0.945 changes to x=0.955. An example of little or no brittleness is shown in \fig{tableseh2}. There is no change in the rank if y should change from 0.923 to 0.932 and a small change if y=0.932 to y=0.944. Additionally, without even looking at the numbers, a casual glance at the SBBR charts in this paper shows the clear contrasts of neighboring ranges where a dark block (low LR) sits next to a light block (high LR).










\section{CLIFF Assessment}
\label{section:assess}

In this section, we evaluate CLIFF as a forensic model, in cross-validation experiments, on a data set donated by \cite{Karslake09}. First, we describe the data set and experimental procedures. Next we present results showing the probability of detection (pd), probability of false alarm (pf) and brittleness level of CLIFF before and after the use of the selector.

\subsection{Data Set and Experimental Method}
\label{subsubsection:brit}

The data set used in this work is donated by \cite{Karslake09}. It contains 37 samples, each with five (5) replicates (37 x 5 = 185 instances). Each instance has 1,151 infrared measurements ranging from 1800--650 cm$^{-1}$. (Further details of the data set can be found elsewhere \cite{Karslake09}.) For our experiments we took the original data set and created four (4) data sets, each with a different number of clusters (3, 5, 10 and 20) or groups. These clusters were created using the K-means algorithm (\fig{kmeans}).

The effectiveness of CLIFF is measured using pd, pf and brittleness level (high, low), computed as follows. Letting A, B, C and D represent true negatives, false negatives, false positives and true positives respectively, \emph{pd}, also known as recall, is the number of true positives divided by the sum of false negatives and true positives, \emph{D / (B + D)}, while pf is \emph{C / (A + C)}. The $pd$ and $pf$ values range from 0 to 1: when there are no false alarms, $pf$ = 0, and at 100\% detection, $pd$ = 1.
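The two measures translate directly into code; this small helper (our naming, following the A, B, C, D convention in the text) makes the definitions checkable.

```python
def pd_pf(tn, fn, fp, tp):
    """pd (recall) = TP / (FN + TP); pf = FP / (TN + FP),
    with A=TN, B=FN, C=FP, D=TP as in the text."""
    pd = tp / (fn + tp)
    pf = fp / (tn + fp)
    return pd, pf
```

For instance, 18 true positives against 2 false negatives gives pd = 0.9, and zero false positives gives pf = 0.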

%The results were visualized using \emph{quartile charts}. To generate these charts the performance measures for an analysis are sorted to isolate the median and lower and upper quartile of numbers. In our quartile charts, the upper and lower quartiles are marked with black lines; the median is marked with a black dot; and the vertical bars are added to mark the 50\% percentile value. 
%need to include examples
The brittleness level measure is computed as follows. First we calculate the Euclidean distances between the (already classified) validation or testing set and the training set. For each instance in the validation set, the distance to its nearest like neighbor (NLN) and its nearest unlike neighbor (NUN) is found. A Mann-Whitney U test is then applied to the NLN and NUN distances from the entire validation set to test for a statistically significant difference between them. The following sections describe two experiments and discuss their results.

\subsection{Experiment 1: KNN as a forensic model?}

Our goal is to determine whether KNN is an adequate model for forensic evaluation; in other words, can it be used in preference to current statistical models? To answer this question, our experimental design follows the pseudocode given in \fig{knnexp1} for the four (4) data sets created from the original data set. For each data set, tests were built from 20\% of the data, selected at random, and the models were learned from the remaining 80\% of the data.

This procedure was repeated 5 times, randomizing the order of the data in each data set each time. In the end, CLIFF is trained and tested 25 times for each data set.
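The repeated random split described above can be sketched as follows. This is a hedged illustration: the 20\% test fraction follows the text (the pseudocode figure shows a different fraction), and the shuffle-and-slice mechanics are our assumption.

```python
import random

def cross_val_splits(data, test_frac=0.2, repeats=5, seed=0):
    """Yield (train, test) splits: each repeat shuffles the data and
    holds out a random test_frac of it (20% test / 80% train)."""
    rng = random.Random(seed)
    data = list(data)
    n_test = max(1, int(len(data) * test_frac))
    for _ in range(repeats):
        rng.shuffle(data)
        yield data[n_test:], data[:n_test]
```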

\begin{figure}[h!]
\small
\begin{center}
\begin{tabular}{ p{7cm} }
\hline
\begin{verbatim}
DATA = [3, 5, 10, 20]
LEARNER = [KNN]
STAT_TEST = [Mann Whitney]

REPEAT 5 TIMES
 FOR EACH data IN DATA
  TRAIN = random 90% of data
  TEST = data - TRAIN
		
  // Construct model from TRAIN data
  MODEL = Train LEARNER with TRAIN
  // Evaluate model on test data
  [pd, pf] = MODEL on TEST
  [brittleness] = STAT_TEST on NLN and NUN
 END
END	
\end{verbatim}
 \\ \hline
    \end{tabular}
\end{center}
\caption{Pseudo code for Experiment 1}\label{fig:knnexp1}
\end{figure}

\subsubsection{Results from Experiment 1}

\fig{result1} shows the 25\%, 50\% and 75\% percentile values of the $pd$, $pf$ and position values in each data set when r=1 (upper table) and r=2 (lower table). Next to these is the brittleness signal, where $high$ signals an unacceptable level of brittleness and $low$ an acceptable level. The results show that the brittleness level for each data set is $low$. The $pd$ and $pf$ results are promising: 50\% of the pd values are at or above 95\% for the data set with 3 clusters and at 100\% for the other data sets, while 50\% of the pf values are at 3\% for 3 clusters and 0\% for the others. These results show that our model is highly discriminating and can be used successfully in the evaluation of trace evidence.


\begin{figure}
\begin{center}
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}| c@{~}|}
%\begin{tabular}{l@{~}|l@{~}|r@{~}|r@{~}|}
%\multicolumn{2}{c}{~}&\multicolumn{5}{c}{quartiles}\\\cline{3-5}
%&min& & median & &max\\\cline{3-5}
%\multicolumn{2}{c}{~}&\multicolumn{2}{c}{Before}\\
%\multicolumn{2}{c}{~}&\multicolumn{2}{c}{percentiles} \\\cline{3-4}
%Clusters& Type & Before & After \\\hline
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{Before}\\
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{percentiles}\\\cline{3-5}
Clusters & Types & 25\%& 50\% & 75\%  & Brittleness Level\\\hline
\multirow{3}{*}{3} & pd & 90& 95& 100 & \multirow{3}{*}{low}  \\
 & pf & 0& 3& 4 &\\
 & position & 264& 614& 1068 &\\ 
 \hline
\multirow{3}{*}{5} & pd & 94& 100& 100 & \multirow{3}{*}{low}\\
 & pf & 0& 0& 0  & \\
 & position & 374& 855& 1225 &\\
  \hline
\multirow{3}{*}{10} & pd & 75& 100& 100  & \multirow{3}{*}{low}\\
 & pf & 0& 0& 0 &\\
 & position & 361& 783& 1254 & \\
 \hline
\multirow{3}{*}{20} & pd & 0& 100& 100 & \multirow{3}{*}{low}\\
 & pf & 0& 0& 3 &\\
 & position & 377& 762& 1256 & \\
  \hline 
%\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100 
\end{tabular}

\\ \\ \hline

\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}| c@{~}|}
%\begin{tabular}{l@{~}|l@{~}|r@{~}|r@{~}|}
%\multicolumn{2}{c}{~}&\multicolumn{5}{c}{quartiles}\\\cline{3-5}
%&min& & median & &max\\\cline{3-5}
%\multicolumn{2}{c}{~}&\multicolumn{2}{c}{Before}\\
%\multicolumn{2}{c}{~}&\multicolumn{2}{c}{percentiles} \\\cline{3-4}
%Clusters& Type & Before & After \\\hline
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{Before}\\
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{percentiles}\\\cline{3-5}
Clusters & Types & 25\%& 50\% & 75\%  & Brittleness Level\\\hline
\multirow{3}{*}{3} & pd & 89& 94& 100 & \multirow{3}{*}{low}  \\
 & pf & 0& 0& 4 &\\
 & position & 419& 905& 1351 &\\ 
 \hline
\multirow{3}{*}{5} & pd & 94& 100& 100 & \multirow{3}{*}{low}\\
 & pf & 0& 0& 0  & \\
 & position & 437& 903& 1297 &\\
  \hline
\multirow{3}{*}{10} & pd & 50& 100& 100  & \multirow{3}{*}{low}\\
 & pf & 0& 0& 3 &\\
 & position & 442& 908& 1354 & \\
 \hline
\multirow{3}{*}{20} & pd & 0& 67& 100 & \multirow{3}{*}{low}\\
 & pf & 0& 0& 0 &\\
 & position & 437& 896& 1345 & \\
  \hline 
%\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100 
\end{tabular}

\end{center}
\caption{Results for Experiment 1 for the 4 data sets distinguished by the number of clusters. Here for the upper and lower tables n=4 is used while r=1 is used for the upper table and r=2 for the lower table.}\label{fig:result1}
\end{figure}



%include - as the number of clusters increase...
%assume the brittleness is yes

\subsection{Experiment 2: Can brittleness be reduced?}

The first experiment shows that KNN creates strong models for forensic evaluation, with high pd's, low pf's and low brittleness levels. With Experiment 2 we want to find out whether these results can be improved by reducing brittleness further. Since we believe that it is the nearness of unlike neighbors which causes brittleness (see \eq{bm}), in this section we evaluate the CLIFF selector, which selects a subset of data from each cluster that best represents the cluster, in the hope that this increases the distance between unlike neighbors and therefore decreases brittleness, while maintaining pd and pf results comparable to Experiment 1. We also expect the position values to be greater than those in Experiment 1.

The design for this experiment can be seen in \fig{knnexp2}. It is similar to that in \fig{knnexp1}; however, the CLIFF selector, described in Section \ref{subsection:selector}, is included.

\begin{figure}[h!]
\small
\begin{center}
\begin{tabular}{ p{8cm} }
\hline
\begin{verbatim}
DATA = [3, 5, 10, 20]
LEARNER = [KNN]
STAT_TEST = [Mann Whitney]
SELECTOR = [CLIFF selector]

REPEAT 5 TIMES
 FOR EACH data IN DATA
  TRAIN = random 90% of data
  TEST = data - TRAIN
		
  // CLIFF selector: select best from clusters
  N_TRAIN = SELECTOR with TRAIN

  // Construct model from N_TRAIN data
  MODEL = Train LEARNER with N_TRAIN
  // Evaluate model on test data
  [pd, pf] = MODEL on TEST
  [brittleness] = STAT_TEST on NLN and NUN
 END
END	
\end{verbatim}
 \\ \hline
    \end{tabular}
\end{center}
\caption{Pseudo code for Experiment 2}\label{fig:knnexp2}
\end{figure}       

\subsubsection{Results from Experiment 2}                                                                


\fig{result2} shows that for 5 and 10 clusters the median (50\%) pd and pf values remain the same, while for 3 and 20 clusters the median pd's have decreased to 82\% and 67\% respectively. The brittleness level also remains low for each data set. \fig{result2} alone does not show how the low brittleness levels of \fig{result1} and \fig{result2} differ, but the model remains strong. \fig{dist3} illustrates the reduction in brittleness after the CLIFF selector is applied. The Mann-Whitney U test was also applied to these results to check for a statistical difference between the before and after results. The test indicated that the $after$ results are better than the $before$ results (see \fig{result3}). So brittleness can be reduced while maintaining comparable results.
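The before/after comparison rests on the Mann-Whitney U statistic, which can be computed directly by counting pairwise wins between the two samples. The brittleness values below are illustrative, not the paper's data:

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample a versus sample b:
    the number of pairs (x, y) with x > y, counting ties as half.
    O(n*m), which is fine for small samples."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)

# Illustrative brittleness samples (lower is better).
before = [0.42, 0.38, 0.45, 0.40, 0.39]
after  = [0.21, 0.25, 0.19, 0.23, 0.22]
u_after = mann_whitney_u(after, before)
# u_after == 0: every 'after' value is below every 'before' value,
# the strongest possible separation between the two treatments.
```

An extreme U (near 0 or near $n \times m$) is what drives the significance verdicts reported in \fig{result3}.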

In summary, CLIFF avoids inappropriate statistical assumptions about the data, and we found a successful way to reduce any brittleness present, yielding strong forensic evaluation models. One further point is worth noting: to evaluate data sets with multiple variables, a host of new statistical models has been built \cite{09Zadora, 09aZadora, 06Aitken, 04Aitken, 02Koons, 99Koons}. Forensic scientists have built such models for glass interpretation when using the elemental composition of glass rather than just the refractive indices. With CLIFF, by contrast, an increase in the number of variables does not require a new model; it works with any data set.
\begin{figure}[ht!]
\begin{center}
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}| c@{~}|}
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{After}\\
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{percentiles}\\\cline{3-5}
Clusters & Types & 25\%& 50\% & 75\% & Brittleness Level\\\hline
\multirow{3}{*}{3} & pd & 49& 82& 100 & \multirow{3}{*}{low} \\
 & pf & 0& 9& 20 &\\
 & position & 787& 1228& 1609 &\\
  \hline
\multirow{3}{*}{5} & pd & 94& 100& 100 & \multirow{3}{*}{low} \\
 & pf & 0& 0& 0  & \\
 & position & 563& 988& 1532 &\\ 
  \hline
\multirow{3}{*}{10} & pd & 60& 100& 100 & \multirow{3}{*}{low} \\
 & pf & 0& 0& 3 & \\
 & position & 578& 1048& 1463 &\\
  \hline
\multirow{3}{*}{20} & pd & 0& 67& 100 & \multirow{3}{*}{low} \\
 & pf & 0& 0& 3 & \\
 & position & 601& 1081& 1481 &\\
  \hline 
\end{tabular}

\\ \\ \hline
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}| c@{~}|}
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{After}\\
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{percentiles}\\\cline{3-5}
Clusters & Types & 25\%& 50\% & 75\% & Brittleness Level\\\hline
\multirow{3}{*}{3} & pd & 89& 100& 100 & \multirow{3}{*}{low} \\
 & pf & 0& 0& 5 &\\
 & position & 633& 1047& 1432 &\\
  \hline
\multirow{3}{*}{5} & pd & 90& 100& 100 & \multirow{3}{*}{low} \\
 & pf & 0& 0& 0  & \\
 & position & 507& 982& 1465 &\\ 
  \hline
\multirow{3}{*}{10} & pd & 100& 100& 100 & \multirow{3}{*}{low} \\
 & pf & 0& 0& 0 & \\
 & position & 506& 968& 1426 &\\
  \hline
\multirow{3}{*}{20} & pd & 0& 80& 100 & \multirow{3}{*}{low} \\
 & pf & 0& 0& 0 & \\
 & position & 495& 957& 1424 &\\
  \hline 
\end{tabular}

\end{center}
\caption{Results for Experiment 2 for the 4 data sets distinguished by the number of clusters. Both tables use n=4; the upper table uses r=1 and the lower table uses r=2.}\label{fig:result2}
\end{figure}



\begin{figure*}[ht!]
%  \begin{center}
  \scalebox{0.85}{
    \begin{tabular}{l}
      \resizebox{50mm}{!}{\includegraphics{bd3r1}} 
      \resizebox{50mm}{!}{\includegraphics{bd5r1}} 
      \resizebox{50mm}{!}{\includegraphics{bd10r1}}
      \resizebox{50mm}{!}{\includegraphics{bd20r1}} \\
      \resizebox{50mm}{!}{\includegraphics{bd3r2}} 
      \resizebox{50mm}{!}{\includegraphics{bd5r2}} 
      \resizebox{50mm}{!}{\includegraphics{bd10r2}}
      \resizebox{50mm}{!}{\includegraphics{bd20r2}} \\
    \end{tabular}}
    \caption{Position of values in the `before' and `after' populations for the data sets with 3, 5, 10 and 20 clusters. The first row shows the results for r=1; the second row shows the results for r=2.}
    \label{fig:dist3}
 % \end{center}
\end{figure*}

\begin{figure}[ht!]
\begin{center}
\begin{tabular}{l@{~}|l@{~}| c@{~}|}
Clusters & Treatments &  Significance\\\hline
\multirow{2}{*}{3} & before &  \multirow{2}{*}{-1} \\
 & after  &\\
  \hline
\multirow{2}{*}{5} & before & \multirow{2}{*}{-1} \\
 & after   & \\
  \hline
\multirow{2}{*}{10} & before  & \multirow{2}{*}{-1} \\
 & after  & \\
  \hline
\multirow{2}{*}{20} & before  & \multirow{2}{*}{-1} \\
 & after  & \\
  \hline 
\end{tabular}
\end{center}
\caption{Mann-Whitney U test results for Experiment 2, comparing the before and after treatments. A significance of -1 indicates that the after results are better than the before results.}\label{fig:result3}
\end{figure}






%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
