\section{Introduction}
The results in Chapter \ref{chapter:assess} are a good indication that CLIFF can be beneficial in the field of forensic interpretation. To explore the use of CLIFF as part of a proposed forensic model for the interpretation of trace evidence, a case study is conducted involving spectral data collected from the clear coat paint of cars. Specifically, CLIFF is used as a means to reduce brittleness in our proposed forensic model.

The principal goal of forensic interpretation models is to determine whether evidence found at a crime scene is (dis)similar to evidence found on a suspect. In creating these models, attention is given to the significance level of the solution; however, the \emph{brittleness} level is never considered. The \emph{brittleness} level is a measure of whether a solution comes from a region of similar solutions or from a region of dissimilar solutions. We contend that a solution coming from a region with a low level of brittleness, i.e. a region of similar solutions, is much better than one from a region with a high level of brittleness, i.e. a region of dissimilar solutions. This is because, intuitively, a solution from a region of low brittleness is less likely to signal a false alarm.

The concept of \emph{brittleness} is not a stranger to the world of forensic science; in fact, it is recognized as the ``fall-off-the-cliff effect'', a term coined by Ken Smalldon. In other words, Smalldon recognized that tiny changes in input data could lead to a massive change in the output. Although Walsh \cite{Walsh94} worked on reducing the brittleness in his model, to the best of our knowledge, no work has been done to quantify brittleness in current forensic models or to recognize and eliminate the causes of brittleness in these models.

In our studies of forensic evaluation models, particularly in the sub-field of glass forensics, we conjecture that brittleness is caused by the following:

\be
\item Tiny errors in the collection of data;
\item Inappropriate statistical assumptions, such as assuming that the distribution of the refractive index of glass collected at a crime scene or from a suspect obeys the properties of a normal distribution; and
\item The use of measured parameters from surveys to calculate the \emph{frequency of occurrence} of trace evidence in a population.
\ee

In this study we eliminate the latter two causes of brittleness by using a simple classification method, k-nearest neighbor (KNN), which is concerned with neither the distribution of the data nor the frequency of occurrence of the data in a population. To reduce the effects of errors in data collection, CLIFF is used to augment KNN (we refer to this combination as the \emph{CLIFF Avoidance Model}, or CAM). As explained in Chapter \ref{chapter:cliff}, CLIFF selects the samples from the data which best represent the region or neighborhood they come from. In other words, we expect that samples which contain errors would be poor representatives and would therefore be eliminated from further analysis. This leads to neighborhoods with different outcomes being further apart from each other.
%In the end our goal for this work is threefold. First we want to show the forensic scientist the importance of reporting the brittleness level of their models. Second, to encourage them to seek out and eliminate the causes of brittleness in their models and third, for those causes which cannot be eliminated

In the end our goal for this case study is threefold. First, we want to develop a new generation of forensic models which avoid inappropriate statistical assumptions. Second, the new models must not be \emph{brittle}; that is, they must not change their interpretation without sufficient evidence. Third, the models must provide not only an interpretation of the evidence but also a measure of how reliable that interpretation is; in other words, the brittleness level of the model.

Our research is guided by the following research questions:

\begin{itemize}
\item Is CAM a strong forensic model?
\item Does CAM reduce brittleness?
\end{itemize}


\section{Motivation}
This work is in part motivated by a recent National Academy of Sciences report titled ``Strengthening Forensic Science'' \cite{09NAS}. This report took special notice of forensic interpretation models, stating:

\begin{quote}
With the exception of nuclear DNA analysis, ...no forensic method has been rigorously shown to have the capacity to consistently, and with a high degree of certainty, demonstrate a connection between evidence and a specific individual or source. \cite{09NAS}
\end{quote}

The concern voiced in that statement is exactly what CLIFF is meant to alleviate: solving the problem of a lack of $consistency$ by reducing brittleness. Before exploring our proposed model, CAM, we will look at four of the early standard glass forensic models, which are prone to high levels of brittleness.

\subsection{Glass Forensic Models}
This section provides an overview of the following glass forensic models, which are used in this work to demonstrate brittleness.

\begin{enumerate}
\item The 1978 Seheult model \cite{Seheult78}
\item The 1980 Grove model \cite{Grove80}
\item The 1995 Evett model \cite{Evett94}
\item The 1996 Walsh model \cite{Walsh94}
\end{enumerate}

\subsubsection{Seheult 1978}
\label{subsection:seh}
Seheult \cite{Seheult78} examines and simplifies Lindley's sixth equation \cite{77Lindley} for real-world application of refractive index (RI) analysis. According to Seheult:
\begin{quote}A measurement $x$, with normal error having known standard deviation $\sigma$, is made on the unknown refractive index $\theta_1$ of the glass at the scene of the crime. Another measurement $y$, made on the glass found on the suspect, is also assumed to be normal but with mean $\theta_2$ and the same standard deviation as $x$. The refractive indices $\theta$ are assumed to be normally distributed with known mean $\mu$ and known standard deviation $\tau$. If $I$ is the event that the two pieces of glass come from the same source ($\theta_1 = \theta_2$) and $\bar{I}$ the contrary event, Lindley suggests that the odds on identity should be multiplied by the factor
\begin{equation}
\frac{p(x,y|I)}{p(x,y|\bar{I})} \label{eq:lin1}
\end{equation}
In this special case, it follows from Lindley's 6th equation that the factor is
\begin{equation}
\frac{1+\lambda^2}{\lambda(2+\lambda^2)^{1/2}} \exp\left\{-\frac{u^2-v^2}{2(1+\lambda^2)}\right\} \label{eq:lin2}
\end{equation}
Where
\begin{equation*}
\lambda = \frac{\sigma}{\tau}, \qquad
u = \frac{x-y}{\sigma\sqrt{2}}, \qquad
v = \frac{z-\mu}{\tau\left(1+\frac{1}{2}\lambda^2\right)^{1/2}}, \qquad
z = \frac{1}{2}(x+y)
\end{equation*}
\end{quote}
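
To make the behavior of this factor concrete, the following minimal Python sketch (our illustration, not Seheult's code) evaluates \eq{lin2} directly from the definitions above. All numeric values are invented for illustration only:

\begin{verbatim}
from math import exp, sqrt

def seheult_factor(x, y, sigma, mu, tau):
    # x, y   : RI measurements from the scene and suspect glass
    # sigma  : known measurement standard deviation
    # mu, tau: mean and standard deviation of the RI population
    lam = sigma / tau
    z = 0.5 * (x + y)
    u = (x - y) / (sigma * sqrt(2))
    v = (z - mu) / (tau * sqrt(1 + 0.5 * lam ** 2))
    return (1 + lam ** 2) / (lam * sqrt(2 + lam ** 2)) * \
        exp(-(u ** 2 - v ** 2) / (2 * (1 + lam ** 2)))

# two near-identical measurements: large factor (odds on identity)
print(seheult_factor(1.5185, 1.5186, 0.0001, 1.5182, 0.004))
# shift y by a few measurement errors: the factor collapses
print(seheult_factor(1.5185, 1.5191, 0.0001, 1.5182, 0.004))
\end{verbatim}

Note how moving $y$ by only a few multiples of $\sigma$ collapses the factor; this is the brittleness visualized later in \fig{models}.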

\subsubsection{Grove 1980}
\label{subsection:gro}

By adopting a model used by Lindley and Seheult, Grove proposed a non-Bayesian approach based on likelihood ratios to solve the forensic problem. The problem of deciding whether the fragments have come from a common source is distinguished from the problem of deciding the guilt or innocence of the suspect. To explain his method, Grove first reviewed Lindley's method. He argued that we should, where possible, avoid parametric assumptions about the underlying distributions. Hence, in discussing the respective roles of $\theta_1$ and $\theta_2$, Grove did not attribute any probability distribution to an unknown parameter without special justification. So when considering $\theta_1 \neq \theta_2$, $\bar{I}$ can be interpreted as saying that the fragments are present by chance, entailing a random choice of value for $\theta_2$. The simplified likelihood ratio obtained from Grove's derivation is:
\begin{equation}
\frac{\tau}{\sigma}\cdot e^{\big\{-\frac{(X-Y)^2}{4\sigma^{2}} + \frac{(Y-\mu)^2}{2\tau^2}\big\}} \label{eq:gro2} 
\end{equation}

We are of course only concerned with the evidence about $I$ and $\bar{I}$ in so far as it has a bearing on the guilt or innocence of the suspect. Grove therefore also considered the event of guilt, $G$, in the calculation of the likelihood ratio (LR). The LR now becomes 
\begin{equation} \frac{p(X,Y|G)}{p(X,Y|\bar{G})}\end{equation}
In expanding this ratio, Grove introduces event $T$, that fragments were transferred from the broken window to the suspect and persisted until discovery, and event $A$, that the suspect came into contact with glass from some other source; $S$ denotes the finding of the fragments on the suspect. Here $p(A|G)=p(A|\bar{G})=P_a$ and $p(T|G)=P_t$. The resulting expression is

  \begin{equation} 
    \frac{P(X,Y,S|G)}{P(X,Y,S|\bar{G})} = 1+P_t\Big\{\Big(\frac{1}{P_a}-1\Big)\frac {p(X,Y|I)}{p(X,Y|\bar{I})}-1\Big\}\label{eq:gro4}  
  \end{equation} 
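
A minimal Python sketch of these two equations follows; as with the Seheult example, the function names and numeric values are our own illustrative choices:

\begin{verbatim}
from math import exp

def grove_lr(x, y, sigma, mu, tau):
    # simplified likelihood ratio of the equation above (eq:gro2)
    return (tau / sigma) * exp(-(x - y) ** 2 / (4 * sigma ** 2)
                               + (y - mu) ** 2 / (2 * tau ** 2))

def grove_guilt_lr(lr, pt, pa):
    # fold the transfer probability Pt and the chance-contact
    # probability Pa into the likelihood ratio for guilt (eq:gro4)
    return 1 + pt * ((1 / pa - 1) * lr - 1)

lr = grove_lr(1.5185, 1.5186, 0.0001, 1.5182, 0.004)
print(lr, grove_guilt_lr(lr, pt=0.6, pa=0.05))
\end{verbatim}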
  
\subsubsection{Evett 1995}
\label{subsection:eve}

Evett et al. used data from forensic surveys to create a Bayesian approach for determining the statistical significance of finding glass fragments and groups of glass fragments on individuals associated with a crime \cite{Evett94}.

Evett proposes that likelihood ratios are well suited for explaining the existence of glass fragments on a person suspected of a crime. A likelihood ratio is defined in the context of this paper as the ratio of the probability of the existing evidence given that the suspected person is guilty to the probability of that evidence given that the suspected person is innocent. The evidence, as it applies to Evett's approach, includes the number of different sets of glass and the number of fragments in each unique group of glass.

The Lambert, Satterthwaite and Harrison (LSH) \cite{lsh95} survey used empirical evidence to supply probabilities relevant to Evett's proposal. The LSH survey inspected individuals and collected glass fragments from each of them. These fragments were placed into groups based on their refractive index (RI) and other distinguishing physical properties. The number of fragments and the number of sets of fragments were recorded, and the discrete probabilities were published.  In particular, there are two unique probabilities that are of great interest in calculating Evett's proposed likelihood ratio.
\begin{itemize}
\item S, the probability of finding N glass \emph{fragments} per group
\item P, the probability of finding M \emph{groups} on an individual.
\end{itemize}

%\clearpage
%\subsection{Mathematical Formulae}
The following symbols are used by Evett to express his equations:
\begin{itemize}
\item $P_n$ is the probability of finding $n$ groups of glass on the surface of a person's
clothes
\item $T_n$ is the probability that $n$ fragments of glass would be transferred, retained
and found on the suspect's clothing if he had smashed the scene window
\item $S_n$ is the probability that a group of glass fragments on a person's clothing
consists of $n$ fragments
\item $f$ is the probability that a group of fragments on a person's
clothing would match the control sample
\item $\lambda$ is the expected number of glass fragments remaining at time $t$
\end{itemize}
Evett utilizes the following equations to determine the likelihood ratio for the first case described in his 1994 paper. In this case, a single window is broken, and a single group of glass fragments is expected to be recovered.

\begin{equation}
LR = \frac{{P_0}{T_n}}{{P_1}{S_n}{f}}+{T_0}
\end{equation}

\begin{equation} 
T_n = \frac{e^{-\lambda}{\lambda^n}}{n!}
\end{equation}
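
A minimal Python sketch of these two equations follows; the probabilities passed in are illustrative placeholders, not values from the LSH survey:

\begin{verbatim}
from math import exp, factorial

def transfer_prob(n, lam):
    # T_n: Poisson probability that n fragments are transferred,
    # retained and found, given an expected count lam
    return exp(-lam) * lam ** n / factorial(n)

def evett_lr(n, lam, p0, p1, s_n, f):
    # likelihood ratio for a single recovered group of n fragments
    return (p0 * transfer_prob(n, lam)) / (p1 * s_n * f) \
        + transfer_prob(0, lam)

# illustrative values only; real values come from surveys such as LSH
print(evett_lr(n=4, lam=3.0, p0=0.64, p1=0.24, s_n=0.10, f=0.05))
\end{verbatim}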

\subsubsection{Walsh 1996}
\label{subsection:wal}

The equation presented by Walsh \cite{Walsh94} is similar to one of Evett's. The difference is that Walsh argues that instead of incorporating grouping and matching, only grouping should be included. Walsh says this is because the match/non-match decision is really just an arbitrary line. He examines the use of this technique in interpreting the glass evidence of a specific case. The technique is as follows:
\begin{equation}
\frac{T_L P_0 p(\bar{X},\bar{Y}|S_y,S_x)}{P_1S_Lf_1}
\end{equation}
Where 

\begin{itemize}
\item
$T_L$ = the probability of 3 or more glass fragments being transferred from the crime scene to the person.
\item
$P_0$ = the probability of a person having no glass on their clothing
\item
$P_1$ = the probability of a person having one group of glass on their clothing
\item
$S_L$ = the probability that a group of glass on clothing is 3 or more fragments
\item
$\bar{X}$ and $\bar{Y}$ are the mean of the control and recovered groups respectively
\item
$S_x$ and $S_y$ are the sample standard deviations of the control and recovered groups respectively
\item
$f_1$ is the value of the probability density for glass at the mean of the recovered sample
\item
$p(\bar{X},\bar{Y}|S_y,S_x)$ is the value of the probability density for the difference between the sample means
\end{itemize} 
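
Since every term in this ratio is a probability or a density supplied from outside the equation, the computation itself is a one-liner; the following Python sketch (with invented input values) makes that explicit:

\begin{verbatim}
def walsh_lr(t_l, p0, p1, s_l, f1, density_diff):
    # t_l : P(3 or more fragments transferred)
    # p0, p1 : P(no glass), P(one group of glass) on clothing
    # s_l : P(a group consists of 3 or more fragments)
    # f1  : density of the glass population at the recovered mean
    # density_diff : density of the difference in sample means
    return (t_l * p0 * density_diff) / (p1 * s_l * f1)

# illustrative values only
print(walsh_lr(t_l=0.5, p0=0.64, p1=0.24, s_l=0.10,
               f1=5.0, density_diff=120.0))
\end{verbatim}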
%%%%

\subsection{Visualization of Brittleness in These Models}
%\subsection{The First Technique}

Visualizations of the models described above are shown in \fig{models}. For the first two models, the $x$ and $y$ axes represent the mean refractive index (RI) values of evidence from a crime scene and suspect respectively. The $x$ axis of the Walsh model represents $f_1$, the value of the probability density for glass at the mean of the recovered sample, while the $y$ axis represents the value of the probability density for the difference between the sample means. The $x$ and $y$ axes of the Evett model represent $\lambda$ and $f$ values respectively. The green, red and blue points in the charts represent the likelihood ratios (LR) generated from these models, in other words, the significance of the match/non-match of evidence to an individual or source. Here the $green$ and $blue$ points have an LR of 10 (match) while the $red$ points have an LR of 0 (non-match).

Using data donated by the Royal Canadian Mounted Police (RCMP), values such as the RI ranges and their means were extracted to generate random samples for the forensic glass models. In all four models, $1000$ samples are randomly generated for the variables in each model. For instance, in the Seheult model, each sample looks like this: [$x$, $y$, $\sigma$, $\mu$, $\tau$]. The symbols are explained in Section \ref{subsection:seh}.

In \fig{models}, the Seheult and Grove models clearly demonstrate brittleness, or Smalldon's ``fall-off-the-cliff effect''. These models, proposed by Seheult (Section \ref{subsection:seh}) and Grove (Section \ref{subsection:gro}) respectively, show how the likelihood ratio changes as we try different values of the refractive index of glass from two sources ($x$ and $y$). These models could lead to incorrect interpretations if minor errors are made when measuring the refractive index of glass samples taken from a suspect's clothes. Note, for both Seheult and Grove, how tiny changes in the blue points can lead to a dramatic change in the likelihood ratio from ten to zero.

The Evett model shows brittleness similar to that of the Seheult and Grove models. However, Walsh \cite{Walsh94} is the clear exception. As mentioned earlier, Walsh's model is the only one of these models which attempted to reduce $brittleness$. As shown in \fig{models}, the result is the creation of a clear boundary between the LR regions. Hence $brittleness$ exists only at the boundary.

\begin{figure}[h!]
  \begin{center}
  \scalebox{0.97}{
    \begin{tabular}{c}
      \resizebox{90mm}{!}{\includegraphics{ns1}} 
      \resizebox{90mm}{!}{\includegraphics{ng}} \\
      \resizebox{90mm}{!}{\includegraphics{nw}} 
      \resizebox{90mm}{!}{\includegraphics{ne}} \\
    \end{tabular}}
    \caption{Visualization of four glass forensic models.}
    \label{fig:models}
  \end{center}
\end{figure}

From these visualizations it is obvious that the concern of the National Academy of Sciences report \cite{09NAS} mentioned earlier in this section is a valid one. So how can this concern be alleviated? We propose not only adding a \emph{brittleness} measure to forensic methods, but also moving away from forensic models which use surveys \cite{Seheult78, Evett84, Evett90, Evett94, Walsh94} and statistical assumptions \cite{Seheult78, Grove80, Walsh94}.

The following sections give details of CAM (the CLIFF Avoidance Model) as well as the data set used to evaluate the models.




%\section{CLIFF Design and Operation}
\section{The CLIFF Avoidance Model (CAM)}
\begin{quote}
\emph{If standard forensic interpretation models are brittle what can we do?}
\end{quote}

We found the answer to this question in the work of \cite{Karslake09}, and also in our exploration of the intuition that, to reduce $brittleness$, data with dissimilar outcomes should not be close neighbors. To that end we introduce CAM, a forensic interpretation model designed to reduce $brittleness$ by avoiding inappropriate statistical assumptions, and to present a measure of how strong the model is.

The design of CAM is deeply rooted in the work of \cite{Karslake09}. In their study, analysis is done using chemometrics, the application of mathematical, statistical and/or computer science techniques to chemistry. The chemometric analysis done by \cite{Karslake09} uses computer science techniques to analyze the absorbance spectra of the clear coat layer of a range of cars. The analysis proceeded as follows:

\begin{itemize}
\item Agglomerative hierarchical clustering (AHC) for grouping the data into classes
\item Principal component analysis (PCA) for reducing dimensions of the data
\item Discriminant analysis for classification i.e. associating an unknown sample to a group or region
\end{itemize}

This technique produced a strong model which achieved an overall classification accuracy of 91.61\%. This encouraging result and other insights gained from this study led to the design of CAM. The goal of CAM is not only to create a strong forensic model but also to show how strong the model is. To achieve these goals, CAM includes a brittleness measure as well as a method to reduce brittleness: CLIFF. Also, in an effort to keep CAM simple, we substituted different tools to perform the analysis done in \cite{Karslake09}. For instance, $K-means$ is used instead of AHC for grouping the data into classes, $FastMap$ \cite{fastmap} is used for dimensionality reduction, and K-nearest neighbor is used for classification. The basic operation of CAM is shown in \fig{process}. The data is collected and its dimensions are reduced if necessary. Clusters are then created from the data and classification is done along with a brittleness measure (further discussed in Section \ref{subsection:bm}). Finally, we test if brittleness can be reduced by CLIFF (instance selection).

The details of CAM are discussed below.

\begin{figure}[h!]
\begin{center}
\includegraphics[scale=0.40]{process}
\end{center}
\caption{Proposed procedure for the forensic evaluation of data}\label{fig:process}
\end{figure}

\subsection{Dimensionality Reduction}

The data used in our experiments contains 1151 attributes and 185 instances. Using the data set as is would cause us to create a model that is computationally expensive and likely to produce unacceptable results, such as high false positive rates caused by redundant and noisy data. To avoid this foreseeable problem, we turn to dimensionality reduction.

Dimensionality reduction refers to reducing high-dimensional data to low-dimensional data. This is accomplished by attempting to summarize the data using fewer terms. While this reduces the overall information available, and thus a level of precision, it allows for easy visualization of data otherwise impossible to visualize. Algorithms that can be used for dimensionality reduction include Principal Component Analysis (PCA) and FastMap. These are discussed below. 

\subsubsection{Principal Component Analysis}

PCA can be defined as ``the orthogonal projection of the data onto a lower dimensional linear space". In other words, looking at our data set, our goal is to project the data onto a space having dimensionality that is less than 1,151 (M $<$ 1,151) while maximizing the variance of the projected data \cite{pca}. The result of this serves two main purposes:

\be
\item To simplify analysis and
\item To aid in the visualization of the data
\ee  

To achieve this goal, the data set is transformed to a new set of variables which are not correlated and which are ordered so that the first few principal components (PCs) retain most of the variation present in all of the original variables \cite{joll02}. %Let us look at an example. \fig{iris} shows a visualization of Fisher's five-dimensional iris data set on a scatter plot. First, PCs are extracted from the four continuous variables (sepal-width, sepal-length, petal-width, and petal-length). Second, these variables are projected onto the subspace formed by the first two components extracted. Finally this two-dimensional data is shown on a scatter-plot in \fig{iris}. The fifth dimension (species) is represented by the color of the points on the scatter-plot.
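
For concreteness, a minimal PCA sketch in Python, via the singular value decomposition (our illustration, not the procedure of \cite{Karslake09}), might look as follows:

\begin{verbatim}
import numpy as np

def pca_project(X, m):
    # center each attribute, then project onto the m orthogonal
    # directions of maximum variance (the first m principal
    # components)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T

# e.g. 185 spectra x 1151 wavenumbers reduced to M = 4 scores
X = np.random.rand(185, 1151)
print(pca_project(X, 4).shape)   # (185, 4)
\end{verbatim}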

%\begin{figure}[h!]
%\begin{center}
%\includegraphics[scale=0.7]{iris}
%\end{center}
%\caption{PCA for iris data set}\label{fig:iris}
%\end{figure}

%The data used in this work contains 1,151 variables and 185 samples. To perform an analysis on this data set we must first reduce the number of variables used. 

%In \cite{Karslake09}, PCA is used to perform dimensionality reduction. Two techniques - Pearson correlation and covariance for comparison of the two, were used to determine an appropriate value for M (M = 4). 

In the interest of speed, in CAM we use \emph{FastMap} to reduce the dimensions of the data set to $M=4$. 

\subsubsection{FastMap}
In FastMap \cite{fastmap}, the basis of each reduction is the cosine law applied to the triangle formed by an object in the feature space and the two objects that are furthest apart in the current (pre-reduction) space (see~\fig{fm1}). According to the algorithm, these two objects are referred to as the pivot objects of that step in the reduction phase ($M$ total pivot object sets). FastMap, however, instead of finding the absolute furthest-apart points, first randomly selects an object from the set, then finds the object that is furthest from it and sets this object as the first pivot point. After the first pivot point is selected, FastMap finds the point farthest from it and uses that as the second pivot point. The line formed by these two points becomes the line onto which all of the other points are mapped in the new $M$-dimensional space. (Further details of this algorithm can be found elsewhere \cite{fastmap}.)

\begin{figure}
\begin{center}
\includegraphics[width=200pt]{fastmap1.png}
\end{center}
\caption{Example of using the cosine law to find the position of $O_i$ in
dimension $k$. Extracted from \cite{fastmap}.}
\label{fig:fm1}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=200pt]{fastmap2.png}
\end{center}
\caption{Projections of points $O_i$ and $O_j$ onto the hyper-plane perpendicular to the line $O_a$$O_b$. Extracted from \cite{fastmap}.}
\label{fig:fm2}
\end{figure}

FastMap uses the following equation to calculate $x_i$, the position of object $O_i$ in the reduced space:
\begin{equation}
{x_i = \frac{d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2}{2d_{a,b}} }
\end{equation}
This technique can be visualized by imagining the hyper-plane perpendicular to the line formed by the pivot points, $O_a$ and $O_b$, and projecting the new point onto this plane (see~\fig{fm2}). FastMap requires only two passes over the data for each of the $M$ output dimensions.
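
The following Python sketch is a minimal version of this procedure, written by us from the description above (the pivot search is the simple heuristic, not an exhaustive search):

\begin{verbatim}
import numpy as np

def fastmap(X, m):
    # m reduction passes; each pass picks two far-apart pivot
    # objects and places every object on the pivot line via the
    # cosine law
    sq = (X ** 2).sum(axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0, None)
    coords = np.zeros((len(X), m))
    for k in range(m):
        # heuristic pivots: the object furthest from an arbitrary
        # start, then the object furthest from that one
        a = np.argmax(d2[0])
        b = np.argmax(d2[a])
        if d2[a, b] == 0:
            break
        # cosine law: x_i = (d_ai^2 + d_ab^2 - d_bi^2) / (2 d_ab)
        x = (d2[a] + d2[a, b] - d2[b]) / (2 * np.sqrt(d2[a, b]))
        coords[:, k] = x
        # residual distances on the hyper-plane perpendicular to
        # the pivot line: d'^2(i,j) = d^2(i,j) - (x_i - x_j)^2
        d2 = np.clip(d2 - (x[:, None] - x[None, :]) ** 2, 0, None)
    return coords

print(fastmap(np.random.rand(185, 1151), 4).shape)   # (185, 4)
\end{verbatim}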

\begin{comment}
To determine the appropriate value for M using FastMap, we experimented experimented with different values for M. \fig{exp1} shows results for various K-nearest neighbor classifiers (discussed further in Sections \ref{subsection:knn} and \ref{assess}), with M fixed at 2, 4, 8 and 16. When M is 2 or 4 100\% of the validation samples are predicted correctly (pd) and 0\% are predicted incorrectly (pd). For this reason, our model model is analysed using M = 4.

\begin{figure}
  \begin{center}
  \scalebox{0.95}{
    \begin{tabular}{c}
      \resizebox{100mm}{!}{\includegraphics{varspd}} \\
      \resizebox{100mm}{!}{\includegraphics{varspf}} \\      
    \end{tabular}}
    \caption{Probability of detection (pd) and Probability of False alarms (pf) using fixed values for dimensions and fixed k values for k-nearest neighbor}
    \label{fig:exp1}
  \end{center}
\end{figure}
\end{comment}

\subsection{Clustering}

Clustering is the second step in CAM and can be defined as the grouping of samples into groups whose members are similar in some way, while samples that belong to two different clusters are dissimilar. The major goal of clustering is to determine the intrinsic grouping in a set of unlabelled data. In most clustering techniques, distance is the major criterion: two objects are similar if they are close according to the given distance.

CAM clusters using K-means. \fig{kmeans} presents the pseudo code for the K-means algorithm. K-means begins by assuming some arbitrary set of centroids; each object is then assigned to its nearest centroid, and the centroids are moved to the centers of the resulting clusters. These steps are repeated until a suitable level of convergence is attained. A runnable sketch follows the pseudo code.

\begin{figure}[h!]
\small
\begin{center}
\begin{tabular}{ p{7cm} }
\hline
\begin{verbatim}


k = [Number of clusters]
CENTROIDS = [k instances chosen at random]
STOP = [Stopping criteria]

WHILE STOP IS FALSE
 // Assign each instance to its nearest centroid
 FOR EACH instance X IN DATA
  FIND NEAREST CENTROID_k
  ADD X TO CLUSTER_k
 END

 // Recompute the centroids
 FOR EACH CLUSTER
  FIND NEW CENTROID
 END

 // Check stopping criteria
 [TRUE or FALSE] = STOP
END
\end{verbatim}
 \\ \hline
    \end{tabular}
\end{center}
\caption{Pseudo code for K-means}\label{fig:kmeans}
\end{figure}
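
A minimal Python version of this pseudo code might look as follows (a sketch, assuming convergence of the centroids as the stopping criterion; not the exact implementation used in CAM):

\begin{verbatim}
import numpy as np

def kmeans(X, k, max_iter=100, seed=1):
    # choose k initial centroids at random from the data
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # assign each instance to its nearest centroid
        dist = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dist.argmin(axis=1)
        # recompute each centroid as the mean of its cluster
        new = np.array([X[labels == j].mean(axis=0)
                        if (labels == j).any() else centroids[j]
                        for j in range(k)])
        # stopping criterion: the centroids no longer move
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

labels, centroids = kmeans(np.random.rand(185, 4), k=5)
\end{verbatim}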

\subsection{Classification with KNN}
\label{subsection:knn}
%from wiki
%1973 Duda and Hart
K-nearest neighbor (KNN) classification is a simple classification method usually used when there is little or no prior knowledge about the distribution of the data. KNN is described in \cite{knn} as follows: the complete training data is stored, and new samples are classified by choosing the majority class among the $k$ closest examples in the training data. For our particular problem, we used the Euclidean distance to measure the distance between samples, with $k = 1$.
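
A minimal KNN sketch in Python, assuming numpy arrays for the training data and labels (our illustration):

\begin{verbatim}
import numpy as np

def knn_predict(train_X, train_y, test_X, k=1):
    # majority class among the k nearest training instances
    # under Euclidean distance (CAM uses k = 1)
    preds = []
    for x in test_X:
        nearest = np.argsort(
            np.linalg.norm(train_X - x, axis=1))[:k]
        votes = list(train_y[nearest])
        preds.append(max(set(votes), key=votes.count))
    return np.array(preds)
\end{verbatim}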

\subsection{The Brittleness Measure}
\label{subsection:bm}
Calculating the brittleness measure is a novel operation of CAM. We use the brittleness measure in this work to determine if the results of CAM come from a region where all the possible results are (dis)similar. For the purpose of this work, the optimal result will come from a region of similar results. To make this determination, using the 1NN classifier as a baseline for CAM, each instance from a test set is assigned to a target class. Next, the distance from its nearest unlike neighbor (NUN), i.e. the distance to the nearest instance with a different class, is recorded. The result is two lists of distances, one for 1NN and one for CAM. Recall that brittleness means a small change can result in a different outcome; so here, the closer the distances of CAM are to those of 1NN, the more brittle the model. An ideal result will therefore have the greatest separation between the CAM and 1NN distances.

The brittleness measure gives an output of either $high$ or $low$: $high$ indicates that there is no significant difference between the CAM and 1NN distances, while $low$ indicates the opposite. The significance of these values is calculated using the Mann-Whitney U test, a non-parametric test which replaces the distance values with their rank, or position, inside the population of all sorted values.

\eq{bm} embodies our definition of brittleness: if the CAM distances are not significantly greater than the 1NN distances, then an unacceptable level of brittleness is present in the model. 

\begin{equation}
[\,CAM \leq 1NN\,] \Rightarrow BRITTLENESS
\label{eq:bm}
\end{equation}
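
A minimal sketch of this measure in Python, using the Mann-Whitney U test from \texttt{scipy} (the helper names are ours):

\begin{verbatim}
import numpy as np
from scipy.stats import mannwhitneyu

def nun_distances(X, y):
    # distance from each instance to its nearest unlike neighbor
    # (the nearest instance with a different class)
    dists = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        dists.append(d[y != y[i]].min())
    return np.array(dists)

def brittleness(cam_nun, knn_nun, alpha=0.05):
    # 'low' if CAM's NUN distances are significantly larger than
    # 1NN's (one-sided Mann-Whitney U test), 'high' otherwise
    _, p = mannwhitneyu(cam_nun, knn_nun, alternative='greater')
    return 'low' if p < alpha else 'high'
\end{verbatim}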


%\section{CLIFF Assessment}
%\label{section:assess}

In this chapter, we evaluate CAM as a forensic model on a data set donated by \cite{Karslake09} in cross-validation experiments. First, we describe the data set and experimental procedures. Next, we present results which show the probability of detection (pd), probability of false alarm (pf) and brittleness level of CAM.

\section{Data Set and Experimental Method}
\label{section:brit}

The data set used in this work contains 37 samples, each with five replicates (37 $\times$ 5 = 185 instances). Each instance has 1151 infrared measurements ranging from 1800--650 cm$^{-1}$. For our experiments we took the original data set, FastMapped it to 4 dimensions, and then created four data sets, each with a different number of clusters (3, 5, 10 and 20) or groups. These clusters were created using the K-means algorithm (\fig{kmeans}). The effectiveness of CAM vs. 1NN is later measured using pd, pf and brittleness level. The brittleness measure is computed as described in Section \ref{subsection:bm}. The following sections describe two experiments and discuss their results.

\subsection{Experiment 1: CAM as a forensic model?}

Our goal is to determine if CAM is an adequate model for forensic evaluation. In other words, can it be used in preference to current statistical models? To answer this question, our experiment design follows the pseudo code given in \fig{knnexp1} for the four data sets created from the original data set. For each data set, test sets were built from 20\% of the data, selected at random. The models were learned from the remaining 80\% of the data.

This procedure was repeated 5 times, randomizing the order of the data each time. In the end, CAM is tested and trained 25 times for each data set.

\begin{figure}[h!]
\small
\begin{center}
\begin{tabular}{ p{7cm} }
\hline
\begin{verbatim}
DATA = [3, 5, 10, 20]
LEARNER = [1NN]

REPEAT 5 TIMES
 FOR EACH data IN DATA
  TRAIN = random 80% of data
  TEST = data - TRAIN
		
  // Construct model from TRAIN data
  MODEL = Train LEARNER with TRAIN
  // Evaluate model on test data
  [pd, pf] = MODEL on TEST
 END
END	
\end{verbatim}
 \\ \hline
    \end{tabular}
\end{center}
\caption{Pseudo code for Experiment 1}\label{fig:knnexp1}
\end{figure}

\subsubsection{Results from Experiment 1}

\fig{results1} shows the 25th, 50th and 75th percentile values of $pd$ and $pf$ for 1NN and CAM, as well as the $size$ and $rank$ results, which indicate the percentage of the training set used and the statistical ranking of 1NN versus CAM respectively. The $pd$ and $pf$ results for both 1NN and CAM are promising, with CAM showing that 50\% of the pd values are at or above 91\% for the data set with 3 clusters and at 100\% for the other data sets, while 1NN shows all at 100\%. For pf, 50\% of the values for CAM are at 4\% for 3 clusters and 0\% for the others, while 1NN shows all at 0\%. These results show that our model is highly discriminating and can be used successfully in the evaluation of trace evidence.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[ht]
\begin{center}
\small
\scalebox{1}{
\begin{tabular}{ l }
\begin{tabular}{l@{~}| l@{~}| c@{~}| r@{~}|r@{~}r@{~}@{~}r@{~}|c}
3 & model & rank & size\% & 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd
	&1nn      & 1& 100& 90& 100& 100&  \boxplot{0.0}{90.0}{100.0}{100.0}{0.0}  \\
	&cam      & 1&  25& 63& 91& 100&  \boxplot{0.0}{62.9}{90.9}{100.0}{0.0}  \\
	\hline
pf
	&1nn      & 1& 100& 0& 0& 4&  \boxplot{0.0}{0.0}{0.0}{4.0}{96.0} \\
	&cam      & 2&  25& 0& 4& 16&  \boxplot{0.0}{0.0}{4.0}{15.8}{84.2} \\
	\hline
\multicolumn{7}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\\  \hline \hline
\begin{tabular}{l@{~}| l@{~}| c@{~}| r@{~}|r@{~}r@{~}@{~}r@{~}|c}
5 & model & rank & size\% & 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd
	&1nn      & 1& 100& 91& 100& 100&  \boxplot{0.0}{90.9}{100.0}{100.0}{0.0}  \\
	&cam      & 1&  30& 88& 100& 100&  \boxplot{0.0}{87.5}{100.0}{100.0}{0.0}  \\
	\hline
pf
	&1nn      & 1& 100& 0& 0& 3&  \boxplot{0.0}{0.0}{0.0}{2.9}{97.1} \\
	&cam      & 1&  30& 0& 0& 3&  \boxplot{0.0}{0.0}{0.0}{2.9}{97.1} \\
\hline
\multicolumn{7}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\\  \hline \hline
\begin{tabular}{l@{~}| l@{~}| c@{~}| r@{~}|r@{~}r@{~}@{~}r@{~}|c}
10 & model & rank & size\% & 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd
	&1nn      & 1& 100& 92& 100& 100&  \boxplot{0.0}{100.0}{100.0}{100.0}{0.0}  \\
	&cam      & 1&  36& 67& 100& 100&  \boxplot{0.0}{66.7}{100.0}{100.0}{0.0}  \\
	\hline
pf
	&1nn      & 1& 100& 0& 0& 0&  \boxplot{0.0}{0.0}{0.0}{0.0}{100.0} \\
	&cam      & 1&  36& 0& 0& 3&  \boxplot{0.0}{0.0}{0.0}{2.8}{97.2} \\
\hline
\multicolumn{7}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\\  \hline \hline
\begin{tabular}{l@{~}| l@{~}| c@{~}| r@{~}|r@{~}r@{~}@{~}r@{~}|c}
20 & model & rank & size\% & 25\%& 50\% & 75\%&Q1 median Q3\\\hline 
pd
	&1nn      & 1& 100& 33& 100& 100&  \boxplot{0.0}{33.3}{100.0}{100.0}{0.0}  \\
	&cam      & 1&  53& 75& 100& 100&  \boxplot{0.0}{75.0}{100.0}{100.0}{0.0}  \\
	\hline
pf
	&1nn      & 1& 100& 0& 0& 0&  \boxplot{0.0}{0.0}{0.0}{0.0}{100.0} \\
	&cam      & 1&  53& 0& 0& 3&  \boxplot{0.0}{0.0}{0.0}{2.8}{97.2} \\
\hline
\multicolumn{7}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
\end{tabular}}
\caption{Results for Experiment 1 for the 4 data sets distinguished by the number of clusters 3, 5, 10 and 20.}
\label{fig:results1}
\end{center}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\end{center}
%\caption{Results for Experiment 1 for the 4 data sets distinguished by the number of clusters.}\label{fig:result1}
%\end{figure}



%include - as the number of clusters increase...
%assume the brittleness is yes

\subsection{Experiment 2: Does CAM reduce brittleness?}

The first experiment shows that 1NN and CAM create strong models for forensic interpretation, with high pd's and low pf's. However, it gives no indication of whether or not CAM reduces $brittleness$. We address this with Experiment 2. The design for this experiment can be seen in \fig{knnexp2}. It is similar to that in \fig{knnexp1}; however, CLIFF, the instance selector in CAM (described in Chapter \ref{chapter:cliff}), is included.

\begin{figure}[h!]
\small
\begin{center}
\begin{tabular}{ p{8cm} }
\hline
\begin{verbatim}
DATA = [3, 5, 10, 20]
LEARNER = [1NN]
STAT_TEST = [Mann Whitney]
SELECTOR = [CLIFF]

REPEAT 5 TIMES
 FOR EACH data IN DATA
  TRAIN = random 90% of data
  TEST = data - TRAIN
		
  // CLIFF selector: select best from clusters
  N_TRAIN = SELECTOR with TRAIN

  // Construct model from N_TRAIN data
  MODEL = Train LEARNER with N_TRAIN
  // Evaluate model on test data
  [pd, pf] = MODEL on TEST
  // Compare the CAM and 1NN lists of NUN distances
  [brittleness] = STAT_TEST on the CAM and 1NN lists
 END
END	
\end{verbatim}
 \\ \hline
    \end{tabular}
\end{center}
\caption{Pseudo code for Experiment 2}\label{fig:knnexp2}
\end{figure}       

\subsubsection{Results from Experiment 2}                                                                

%\fig{} shows the 25\%, 50\% and 100\% percentile values of the $pd$ and $pf$ values in each data set. Next to these is the brittleness signal where $yes$ signals an unacceptable level of brittleness and $no$ signals an acceptable level of brittleness. The $pd$ and $pf$ results are promising showing that 50\% of the pd values are at or above 95\% for the data set with 3 clusters and at 100\% for the other data sets. While 50\% of the pf values are at 3\% for 3 clusters and 0\% for the others. This shows that our model is highly discriminating and can be used successfully in the evaluation of trace evidence.
%include - as the number of clusters increase...
%assume the brittleness is yes

In \fig{dist3} and \fig{result3} the pattern is clear. The test-set distances for CAM must grow further before their predicted target classes change than those of 1NN (see \fig{dist3}). This result is further established by \fig{result3}, which shows that for all the data sets the brittleness level is low, i.e. CAM's list of distances is significantly better than 1NN's list of distances.

\begin{figure*}[ht!]
  \begin{center}
  \scalebox{1}{
    \begin{tabular}{l}
      \resizebox{90mm}{!}{\includegraphics{p3}} 
      \resizebox{90mm}{!}{\includegraphics{p5}} \\
      \resizebox{90mm}{!}{\includegraphics{p10}}
      \resizebox{90mm}{!}{\includegraphics{p20}} \\      
    \end{tabular}}
    \caption{Position of values in 1NN and CAM population with data set at 3, 5, 10 and 20 clusters.}
    \label{fig:dist3}
  \end{center}
\end{figure*}

\begin{figure}[ht!]
\begin{center}
\begin{tabular}{l@{~}|c@{~}| c@{~}|}
Clusters & Forensic Models &  Significance\\\hline
\multirow{2}{*}{3} & 1nn &  \multirow{2}{*}{low} \\
 & cam  &\\
  \hline
\multirow{2}{*}{5} & 1nn & \multirow{2}{*}{low} \\
 & cam   & \\
  \hline
\multirow{2}{*}{10} & 1nn  & \multirow{2}{*}{low} \\
 & cam  & \\
  \hline
\multirow{2}{*}{20} & 1nn  & \multirow{2}{*}{low} \\
 & cam  & \\
  \hline 
\end{tabular}
\end{center}
\caption{Summary of Mann-Whitney U-test results for Experiment 2 (95\% confidence): In the Significance column, $low$ indicates that CAM is better than just 1NN}\label{fig:result3}
\end{figure}

\subsection{Summary}
In summary, by using CAM, inappropriate statistical assumptions about the data are avoided, and we found a successful way to reduce brittleness and create strong forensic interpretation models. One important point to note here is this: in order to evaluate data sets with multiple attributes, a host of new statistical models has been built \cite{09Zadora, 09aZadora, 06Aitken, 04Aitken, 02Koons, 99Koons}. This has been the case with forensic scientists building models for glass interpretation using the elemental composition of glass rather than just the refractive index. With CAM, on the other hand, an increase in the number of attributes does not signal the need to create a new model; CAM works with any data set.
%brittleness?


