%\section{Introduction}
Since the creation of the Nearest Neighbour algorithm in 1967 (Hart \cite{Hart67}), a great many prototype learning schemes (PLS) have appeared to remedy the five major drawbacks associated with the algorithm and its variations. First, computation costs are high, because each test sample must have its distance computed to every training sample. Second, the storage requirement is large, since the entire training set must be kept in memory. Third, outliers can degrade the accuracy of the classifier. Fourth, data sets with non-separable and/or overlapping classes hurt performance. Fifth, the algorithm has a low tolerance to noise. PLS address these issues: their main purpose is to reduce a training set, via various selection and/or creation methods, to a set of \emph{good prototypes}. \emph{Good} here means that the prototypes represent the original training data well enough to maintain comparable or improved performance of a nearest neighbour classifier, while simultaneously mitigating the drawbacks mentioned previously.
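To make the first two drawbacks concrete, consider the following minimal sketch of the 1-nearest-neighbour rule (the toy data and names are illustrative only, not part of any scheme discussed in this thesis). Every prediction scans the whole stored training set, so each query costs one distance computation per training sample, and all samples must remain in memory:

```python
import math

def nn_classify(query, train_X, train_y):
    """Classify `query` with the 1-nearest-neighbour rule.

    Each call scans the entire training set: O(n) distance
    computations per query, with all n samples held in memory.
    """
    best_dist, best_label = float("inf"), None
    for x, y in zip(train_X, train_y):
        d = math.dist(query, x)  # Euclidean distance
        if d < best_dist:
            best_dist, best_label = d, y
    return best_label

# Toy data: two one-dimensional clusters.
train_X = [(0.0,), (0.2,), (1.0,), (1.2,)]
train_y = ["a", "a", "b", "b"]
print(nn_classify((0.1,), train_X, train_y))  # -> a
```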

A review of the literature on prototype learning for this thesis has yielded at least 40 PLS, each presenting experimental evidence that its particular design is comparable to or better than the standard schemes; published surveys of PLS can be found in \cite{Bezdek2000, Bezdek2001, Kim2003}. However, many of these schemes suffer from at least one of the following disadvantages:

\bi
\item high computational cost
\item order effects (where the resulting prototypes are unreasonably affected by the order of the original training set)
\item overfitting
\ei

The goal of this thesis is not to be unduly critical of the PLS that succumb to any of the above disadvantages (particularly since many of them have proven successful), but rather to present a novel approach to this field of study which overcomes these disadvantages while reporting little or no loss in the recognition accuracy of nearest neighbour classifiers (NNC).

Thus, this thesis presents CLIFF, a prototype learning scheme which runs in linear time, is independent of the order of the training set, and avoids overfitting. A novel feature of CLIFF is that instead of removing or adding (un)qualified prototypes from/to a `prototype list', a conjunction of constraints is generated for each target class. These conjunctions are created using a ranking algorithm called BORE (Best Or Rest) \cite{jalali08}, which finds, for each attribute, the range of values that best represents the specific target class, i.e.\ has the highest rank. Any instances in the training set which satisfy all the constraints are selected as prototypes. With this structure, the percentage reduction of the training set is governed by the number of constraints used in the selection process, since each additional constraint further restricts the set of qualifying instances. Hence there is no need to predetermine the number of prototypes desired when using CLIFF.
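The flavour of this constraint-based selection can be sketched as follows. Note that this is an illustrative stand-in only: the equal-width binning and the simple target-class frequency score below are simplifications, not the published BORE ranking, and all names are our own.

```python
def select_prototypes(rows, labels, target, n_bins=3):
    """Illustrative constraint-based prototype selection.

    For each attribute, split its values into equal-width bins and
    keep the bin whose members most often belong to the target class
    (a crude stand-in for BORE's best-or-rest ranking).  Instances
    that fall inside the winning range of EVERY attribute -- i.e.
    satisfy the whole conjunction -- are selected as prototypes.
    """
    n_attrs = len(rows[0])
    constraints = []
    for a in range(n_attrs):
        vals = [r[a] for r in rows]
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / n_bins or 1.0
        def bin_of(v, lo=lo, width=width):
            return min(int((v - lo) / width), n_bins - 1)
        # Score each bin by how strongly it favours the target class.
        best = max(range(n_bins), key=lambda b: sum(
            1 for r, y in zip(rows, labels)
            if bin_of(r[a]) == b and y == target))
        constraints.append((lo + best * width, lo + (best + 1) * width))
    # The conjunction: an instance must satisfy every attribute range.
    return [r for r in rows
            if all(c_lo <= r[a] <= c_hi
                   for a, (c_lo, c_hi) in enumerate(constraints))]

rows = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
labels = ["a", "a", "b", "b"]
print(select_prototypes(rows, labels, "b", n_bins=2))
# -> [(0.9, 1.0), (1.0, 0.9)]
```

Note how the number of prototypes falls out of the constraints rather than being fixed in advance: tightening or adding constraints shrinks the set of instances that satisfy the whole conjunction.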

After describing the design and operation of CLIFF, its performance is demonstrated through cross-validation experiments on a wide variety of standard data sets from the UCI repository \cite{Frank+Asuncion:2010}. Our experiments also use FastMap \cite{fastmap}\footnote{A Fast Algorithm for Indexing, Data Mining, and Visualization of Traditional and Multimedia Datasets.} and Feature Subset Selection (FSS) to avoid the curse of dimensionality \cite{01cod}, which negatively affects Nearest Neighbour Classifiers (NNC).
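The core of FastMap is a projection that needs only pairwise distances: two far-apart pivot objects are chosen heuristically, and every object is projected onto the line through them via the cosine law. The sketch below shows a single FastMap coordinate (the full algorithm recurses on residual distances to produce $k$ coordinates); the function name and toy data are ours.

```python
import math

def fastmap_coordinate(points):
    """First FastMap coordinate: project points onto the line through
    two far-apart pivot objects, using only pairwise distances."""
    dist = math.dist
    # Heuristic pivot choice: from an arbitrary start, take the
    # farthest point as pivot b, then the point farthest from b as a.
    b = max(points, key=lambda p: dist(points[0], p))
    a = max(points, key=lambda p: dist(b, p))
    d_ab = dist(a, b)
    # Cosine law: x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 d(a,b)).
    return [(dist(a, p) ** 2 + d_ab ** 2 - dist(b, p) ** 2) / (2 * d_ab)
            for p in points]

coords = fastmap_coordinate([(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)])
print(coords)  # -> [1.8, 0.0, 5.0]
```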

Next we describe how CLIFF can be used as part of a tool/model for the evaluation of trace forensic evidence. The principal goal of forensic evaluation models is to determine whether evidence found at a crime scene is (dis)similar to evidence found on a suspect. In our studies of forensic evaluation models, particularly in the sub-field of glass forensics, we conjecture that many of these models succumb to the following flaws:

\be
\item Sensitivity to tiny errors in the collection of data;
\item Inappropriate statistical assumptions, such as assuming that the distributions of refractive indices of glass collected at a crime scene or on a suspect obey the properties of a normal distribution;
\item The use of measured parameters from surveys to calculate the \emph{frequency of occurrence} of trace evidence in a population.
\ee

In this work we show that CLIFF plays an effective role in the evaluation of forensic trace evidence.

Our research is guided by the following research question:

\begin{itemize}
\item Is CLIFF viable as a Prototype Learner for NNC? \newline
The goal here is to see whether the performance of CLIFF is comparable to or better than the plain k-nearest neighbour (KNN) algorithm. So in our first experiment we compare the performance of predicting the target class using the entire training set against using only the prototypes generated by CLIFF.

%\item Can CLIFF be successful as part of a forensic evaluation model? \newline
%With this forensic case study we introduce the concept of \emph{brittleness} to forensic models. \emph{Brittleness} here is described as tiny changes in input data could lead to a massive change in the output. So our goal here is to show that by incorporating CLIFF as part of a forensic model that \emph{brittleness} will be reduced thereby improving the results of the model.
\end{itemize}
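The comparison behind this research question can be framed as a simple harness, sketched below. The prototype subset here is hand-picked purely for illustration; in the actual experiments it would be supplied by CLIFF, and the toy data and names are our own.

```python
import math

def nn_accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test samples the 1-NN rule labels correctly."""
    correct = 0
    for q, truth in zip(test_X, test_y):
        pred = min(zip(train_X, train_y),
                   key=lambda xy: math.dist(q, xy[0]))[1]
        correct += pred == truth
    return correct / len(test_y)

# Toy comparison: full training set vs a reduced "prototype" subset.
train_X = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
train_y = ["a"] * 3 + ["b"] * 3
test_X, test_y = [(0.15,), (1.05,)], ["a", "b"]
protos_X, protos_y = [(0.1,), (1.1,)], ["a", "b"]  # one per class

full = nn_accuracy(train_X, train_y, test_X, test_y)
reduced = nn_accuracy(protos_X, protos_y, test_X, test_y)
print(full, reduced)  # -> 1.0 1.0 on this toy data
```

A prototype learner succeeds, in these terms, when `reduced` stays close to `full` while the stored set shrinks substantially.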

\section{Statement of Thesis}

\section{Contributions of this Thesis}

\section{Structure of this Thesis}

The remaining chapters of this thesis are structured as follows:

\bi
\item Chapter 2 provides a survey of Prototype Learning Schemes over the years
\item Chapter 3 describes the design and operation of CLIFF
\item Chapter 4 describes the data sets used, along with the preprocessing tools used to avoid the curse of dimensionality \cite{01cod}
\item Chapter 5 presents a detailed description of the experimental procedure followed to analyze the data using CLIFF
\item Chapter 6 examines a case study in which CLIFF is used to reduce the \emph{brittleness} of a forensic model
\item Chapter 7 presents the conclusions of this thesis
\ei
