%\section{Introduction}
Since the creation of the Nearest Neighbor algorithm in 1967 (Hart \cite{Hart67}), a large number of prototype learning schemes (PLS) have appeared to remedy the five major drawbacks associated with the algorithm and its variations. First, the computational cost is high, since each test sample must compute its distance to every training sample. Second, the storage requirement is large, since the entire data set must be kept in memory. Third, outliers can degrade the accuracy of the classifier. Fourth, data sets with non-separable and/or overlapping classes degrade performance. Fifth, the tolerance to noise is low. PLS address these issues: their main purpose is to reduce a training set, via various selection and/or creation methods, to produce \emph{good prototypes} (\fig{selection}). \emph{Good} here means that the prototypes represent the original training set well enough to maintain comparable or improved performance of a nearest neighbor classifier while simultaneously eliminating the effects of the drawbacks mentioned above.

\begin{figure*}[ht!]
  \begin{center}
  \scalebox{1}{
    \begin{tabular}{l}
      \resizebox{100mm}{!}{\includegraphics{selection}} 
      \end{tabular}}
    \caption{The Instance Selection Process.}
    \label{fig:selection}
  \end{center}
\end{figure*}
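To make the first two drawbacks concrete, the following minimal sketch (not an implementation from this thesis) shows a plain 1-nearest-neighbor prediction: every query requires a linear scan over all stored training instances, which is precisely the cost a PLS reduces by shrinking the training set to prototypes.

```python
import math

def one_nn_predict(train, query):
    """Classify `query` by a linear scan over every training instance.

    Each prediction costs one distance computation per stored instance,
    and the whole training set must stay in memory -- the two costs that
    prototype learning schemes aim to reduce.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy training set: (feature vector, class label) pairs.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b")]
print(one_nn_predict(train, (0.9, 0.8)))  # -> b
```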

A review of the literature on prototype learning for this thesis has yielded at least 40 PLS, each presenting experimental evidence that its particular design is comparable to or better than the standard schemes; published surveys of PLS can be found in \cite{Bezdek2000, Bezdek2001, Kim2003, lot2010}. However, many of these schemes suffer from at least one of the following disadvantages:

\bi
\item high computational cost
\item order effects (where the resulting prototypes are unreasonably affected by the order of the original training set)
\item overfitting
\ei

The goal of this thesis is not to be unduly critical of the PLS that succumb to any of the above disadvantages (particularly since many of them have proven successful), but rather to present a novel approach to this field of study which overcomes these disadvantages while reporting little or no loss in the recognition performance of a nearest neighbor classifier (NNC).

Thus, this thesis presents CLIFF, a prototype learning scheme which runs in linear time, is independent of the order of the training set, and avoids overfitting. A novel feature of CLIFF is that instead of removing or adding (un)qualified prototypes from/to a ``prototype list'' based on misclassification or clustering, selection criteria are generated for each target class. These criteria are created using a ranking algorithm called BORE (Best Or Rest) \cite{jalali08}, also known as SBBR (Support-Based Bayesian Ranking), which finds the value of each attribute that best represents the specific target class, i.e.\ the value with the highest rank. Any instance in the training set that satisfies some or all of the criteria is selected as a prototype. With this structure, the percentage reduction of the training set is directly related to the number of constraints applied during prototype selection: the more constraints applied, the smaller the resulting prototype set. In this work, to ensure that each class is represented in the final list of prototypes, instances are selected using one constraint at a time until the prototype list is reduced as much as possible without becoming empty.
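The rank-then-select mechanism can be sketched as follows. This is an illustrative, simplified stand-in for CLIFF, and the support-based score below is only an approximation of the BORE/SBBR ranking; the function names `rank_criteria` and `select_prototypes` are our own, not from the published algorithm.

```python
from collections import Counter

def rank_criteria(data, labels, target):
    """Score each (attribute, value) pair by how strongly it supports the
    `target` class versus the rest (a simplified, support-based stand-in
    for the BORE ``best or rest'' ranking)."""
    best = [row for row, y in zip(data, labels) if y == target]
    rest = [row for row, y in zip(data, labels) if y != target]
    scores = {}
    for attr in range(len(data[0])):
        rest_counts = Counter(row[attr] for row in rest)
        for value, count in Counter(row[attr] for row in best).items():
            p_best = count / len(best)
            p_rest = rest_counts.get(value, 0) / max(len(rest), 1)
            # High when the value is frequent in `target` and rare elsewhere.
            scores[(attr, value)] = p_best * p_best / (p_best + p_rest)
    return sorted(scores, key=scores.get, reverse=True)

def select_prototypes(data, labels, target, n_constraints=1):
    """Keep target-class instances satisfying the top-ranked criteria;
    more constraints yield a smaller prototype set."""
    criteria = rank_criteria(data, labels, target)[:n_constraints]
    return [row for row, y in zip(data, labels)
            if y == target and all(row[a] == v for a, v in criteria)]

data = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["play", "play", "stay", "stay"]
print(select_prototypes(data, labels, "play"))  # -> the two "sunny" rows
```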

After describing the design and operation of CLIFF, its performance is demonstrated by evaluating it in cross-validation experiments on a wide variety of standard data sets from the UCI repository \cite{Frank+Asuncion:2010}. %Our experiments also use FastMap \cite{fastmap} \footnote{A Fast Algorithm for Indexing, Data Mining, and Visualization of Traditional and Multimedia Datasets} and Feature Subset Selection (FSS) to avoid the curse of dimensionality \cite{01cod} which negatively affects Nearest Neighbor Classifiers (NNC). 

Next we describe how CLIFF can be used as part of a tool/model for the evaluation of trace forensic evidence. The principal goal of forensic evaluation models is to assess whether evidence found at a crime scene is (dis)similar to evidence found on a suspect. In our studies of forensic evaluation models, particularly in the sub-field of glass forensics, we conjecture that many of these models succumb to the following flaws:

\be
\item Sensitivity to tiny errors in the collection of data;
\item Inappropriate statistical assumptions, such as assuming that the distributions of refractive indices of glass collected at a crime scene or from a suspect obey the properties of a normal distribution;
\item The use of parameters measured in surveys to calculate the \emph{frequency of occurrence} of trace evidence in a population.
\ee

In this work we show that CLIFF plays an effective role in the evaluation of forensic trace evidence.

Our research is guided by the following research question:

\begin{itemize}
\item Is CLIFF viable as a Prototype Learner for NNC? \newline
The goal here is to determine whether the performance of CLIFF is comparable to or better than that of the plain k-nearest neighbor (KNN) algorithm and three other PLS. In our first experiment, we therefore compare the performance of predicting the target class using the entire training set against using only the prototypes generated by CLIFF and the other PLS.

%\item Can CLIFF be successful as part of a forensic evaluation model? \newline
%With this forensic case study we introduce the concept of \emph{brittleness} to forensic models. \emph{Brittleness} here is described as tiny changes in input data could lead to a massive change in the output. So our goal here is to show that by incorporating CLIFF as part of a forensic model that \emph{brittleness} will be reduced thereby improving the results of the model.
\end{itemize}
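The shape of that first experiment can be sketched with a leave-one-out comparison between a classifier trained on the full set and one trained on a reduced set. Note the reducer below is a deliberately trivial placeholder (one centroid per class), not CLIFF or any of the compared PLS, and all names here are our own illustrative choices.

```python
import math
from collections import defaultdict

def one_nn(train, query):
    """Plain 1-nearest-neighbor prediction over (features, label) pairs."""
    return min(train, key=lambda inst: math.dist(inst[0], query))[1]

def centroids(train):
    """Placeholder prototype learner (NOT CLIFF): one mean point per class."""
    groups = defaultdict(list)
    for features, label in train:
        groups[label].append(features)
    return [(tuple(sum(dim) / len(rows) for dim in zip(*rows)), label)
            for label, rows in groups.items()]

def loo_accuracy(dataset, reduce=None):
    """Leave-one-out accuracy, optionally reducing each training fold first."""
    hits = 0
    for i, (features, label) in enumerate(dataset):
        fold = dataset[:i] + dataset[i + 1:]
        if reduce is not None:
            fold = reduce(fold)  # train on prototypes instead of the full fold
        hits += one_nn(fold, features) == label
    return hits / len(dataset)

data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(loo_accuracy(data), loo_accuracy(data, centroids))
```

On well-separated data like this toy set, the reduced classifier matches the full one; the experiments in this thesis ask whether CLIFF preserves that parity on realistic UCI data.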

%\section{Statement of Thesis}

\section{Contributions of this Thesis}
The contributions of this thesis are:

\bi
\item The CLIFF algorithm, a linear time instance selector;
\item A measure for the concept of \emph{brittleness}, i.e.\ how far a test instance has to move before it changes class;
\item A viable method to reduce one effect of noise when performing instance selection;
\item CAM, a forensic interpretation model with CLIFF at its core.
\ei
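For a 1-NN classifier, one simple proxy for this notion can be sketched as half the gap between the distances to the nearest other-class and nearest same-class instances. This is only an illustrative proxy under that 1-NN assumption, not the exact measure developed later in this thesis, and the name `brittleness` is our own.

```python
import math

def brittleness(train, instance, label):
    """Proxy for the brittleness of a 1-NN decision: half the gap between
    the distance to the nearest other-class instance and the nearest
    same-class instance.  A small value means a tiny perturbation of
    `instance` could flip its predicted class.  (Illustrative proxy only,
    not the exact measure developed in this thesis.)"""
    d_same = min(math.dist(f, instance) for f, y in train if y == label)
    d_other = min(math.dist(f, instance) for f, y in train if y != label)
    return (d_other - d_same) / 2

toy = [((0, 0), "a"), ((4, 0), "b")]
print(brittleness(toy, (1, 0), "a"))  # -> 1.0: the margin before the flip
```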

It is partly our intent that this work open the eyes of forensic scientists to the real problem of \emph{brittleness} that exists in current forensic models. We hope that in the future, when verifying a model, scientists will include a brittleness measure along with their evaluation of forensic evidence, as is done in this work. This will allow them to be confident that their result comes from a region or neighborhood of similar rather than dissimilar interpretation.

\clearpage
\section{Structure of this Thesis}

The remaining chapters of this thesis are structured as follows:

\bi
\item Chapter 2 provides a survey of Prototype Learning Schemes over the years;
\item Chapter 3 describes the design and operation of CLIFF;
%\item Chapter 4 describes the data sets used along with the preprocessing tools used to avoided the curse of dimensionality \cite{01cod}
\item Chapter 4 presents a detailed description of the experimental procedure followed to analyze standard data-sets using CLIFF;
\item Chapter 5 examines a case study in which CLIFF is used as part of a forensic interpretation model to reduce the \emph{brittleness} of published forensic models;
\item Chapter 6 presents conclusions and future work.
\ei
