The principal goal of this study is to show forensic scientists the importance of knowing the brittleness of their results, so that they can take it into account in their evaluations. We also consider knn.

What is the measurement for brittleness here?

From fig(?) brittleness can be seen clearly in the Seheult and Grove models. In Evett (need more) and Walsh, however, a smoothing factor was added, so brittleness cannot be seen there. We can use CSL for these models (show BORE formula).

So far in this study we have used generated data; let us now test on real-world data. (We contend that these methods can be used on other forms of evidentiary data.)

The causes of brittleness: since brittleness occurs with small changes, we conjecture from the studies done that brittleness is caused by statistical assumptions, errors in data collection, and the use of surveys.

Obviously we cannot change the measurement techniques, but we can avoid statistical assumptions and surveys (frequency of occurrence).

The following section investigates the use of data mining techniques (chemo) for evaluation.

%need col of fig for the 4 models
%forget about the bayesian approach, there are different ways to find lr - all have advantages and disadvantages

%---------------------

The goal here is first to reduce brittleness by avoiding statistical assumptions and surveys (foo), and then to reduce it further with prototype learning. We focus on the former in this section.

Using the data described in section(?), our analysis proceeds in three steps: 1) reduce dimensions with FastMap, 2) cluster with kmeans, and 3) analyze with knn (fig?).

In data mining it is common practice to reduce the dimensions of a high-dimensional data set, both to avoid the curse of dimensionality and to lessen the complexity of the model. The classic method is PCA, which captures linear correlation. For this study we used FastMap, a faster alternative to PCA: it avoids PCA's covariance and eigen decomposition, requiring only distances to a few pivot objects per output dimension. More details about this algorithm are presented elsewhere.
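As a concrete illustration, here is a minimal FastMap sketch. It assumes Euclidean distances in the original space; the function names and the farthest-pair pivot heuristic shown are our own illustration, not taken from the cited description.

```python
import numpy as np

def fastmap(data, k, rng=None):
    """Project `data` (n x d) into k dimensions with FastMap.

    Each output dimension needs only distances to two pivot objects,
    which is what makes FastMap cheaper than an eigen decomposition.
    """
    rng = np.random.default_rng(rng)
    n = len(data)
    out = np.zeros((n, k))

    def dist2(i, j, col):
        # squared original distance, minus the projections already made
        d2 = np.sum((data[i] - data[j]) ** 2)
        d2 -= np.sum((out[i, :col] - out[j, :col]) ** 2)
        return max(d2, 0.0)

    for col in range(k):
        # pivot heuristic: start random, then jump to the farthest point twice
        a = int(rng.integers(n))
        b = max(range(n), key=lambda j: dist2(a, j, col))
        a = max(range(n), key=lambda j: dist2(b, j, col))
        dab2 = dist2(a, b, col)
        if dab2 == 0:
            break  # all residual distances are zero; nothing left to project
        for i in range(n):
            # cosine-law projection of point i onto the pivot line (a, b)
            out[i, col] = (dist2(a, i, col) + dab2 - dist2(b, i, col)) / (2 * np.sqrt(dab2))
    return out
```

Each pass projects the points onto the line through the two pivots, then the next pass works with the residual distances in the hyperplane orthogonal to that line.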

Concern has been voiced (Koon) that reducing dimensions reduces the discrimination ability of the model. To ease this concern, we run tests to decide how many features the data should be reduced to, choosing the number that works best.

In cases where the forensic scientist must consider the possibility that evidence collected from a suspect comes from several sources, not all of which may be associated with the case at hand, or where one is trying to identify a manufacturer, clustering (grouping) has been used. The most popular methods in the forensic field are ACH and divisive(?) clustering. In this work we use the kmeans clustering algorithm. It works as follows...
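A minimal sketch of the standard Lloyd iteration behind kmeans (assuming Euclidean distance; all names are illustrative):

```python
import numpy as np

def kmeans(data, k, iters=100, rng=None):
    """Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid recomputation until the centroids stop moving."""
    rng = np.random.default_rng(rng)
    # initialize centroids with k distinct random samples
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # assign each sample to its nearest centroid
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid
        new = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break  # converged
        centroids = new
    return labels, centroids
```

Like any kmeans implementation, this sketch converges only to a local optimum, so in practice it is rerun from several random initializations.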

Finally, once the data has been clustered and each sample assigned to a cluster or group, we apply the knn classifier. Knn is a very simple classifier that uses a distance metric to determine the class of an unknown sample from its nearest labeled neighbors.
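The classifier can be sketched in a few lines (Euclidean distance and majority vote assumed; function names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training samples under Euclidean distance."""
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```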

To further reduce brittleness, this work proposes the application of prototype learning. Using the CSL process described in section(?), samples are selected according to the top-ranked ranges, and attributes are selected according to their top ranks. In other words, for each class the best samples are selected.
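The per-class selection step can be sketched as follows. This assumes a per-sample score has already been computed by the CSL ranking (that ranking is described in section(?), not here), and shows only the top-m selection within each class:

```python
from collections import defaultdict

def select_prototypes(labels, scores, m):
    """Keep, for each class, the m samples with the highest
    (precomputed) CSL-style score; return their sorted indices."""
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    keep = []
    for c, idx in by_class.items():
        idx.sort(key=lambda i: scores[i], reverse=True)
        keep.extend(idx[:m])  # best m samples of class c
    return sorted(keep)
```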

%---------------------------
Assessment

cross-validation, pd, pf, quartiles, and brittleness

Brittleness measure

Once each test sample has been classified, its distance from the nearest unlike neighbor (NUN) - i.e. the nearest sample with a different class - is recorded. Recall that brittleness means a small change can result in a different outcome; hence the closer a test sample is to its NUN, the more brittle the result. An ideal result therefore has a large distance from the NUN.
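The measure above can be computed directly (Euclidean distance assumed; names are illustrative):

```python
import numpy as np

def nun_distance(X, y, i):
    """Distance from sample i to its nearest unlike neighbor (NUN).

    Larger is better: a sample far from any differently-labeled
    sample needs a larger perturbation to change outcome,
    so its classification is less brittle.
    """
    other = np.flatnonzero(y != y[i])      # indices of unlike samples
    d = np.linalg.norm(X[other] - X[i], axis=1)
    return d.min()
```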

%--------------------------

Results and Discussion

To begin the evaluation process, we first need to decide on parameter values such as k in knn and the number of features retained from FastMap. These values were chosen based on the results shown in fig(?).
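One common way to make such a choice is a small validation sweep; the sketch below picks k for knn by leave-one-out accuracy. This is our illustration of the general procedure, not the exact selection behind fig(?); the candidate values and names are assumptions.

```python
import numpy as np
from collections import Counter

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of a brute-force k-nn classifier."""
    hits = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the held-out sample
        nearest = np.argsort(d)[:k]
        pred = Counter(y[j] for j in nearest).most_common(1)[0][0]
        hits += pred == y[i]
    return hits / len(X)

def pick_k(X, y, candidates=(1, 3, 5, 7)):
    """Choose the candidate k with the best leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(X, y, k))
```

The number of FastMap features can be chosen the same way, by sweeping the target dimensionality and keeping the value that scores best.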

