When first experimenting with {\W}, we had to fix several internal parameters arbitrarily. For example, to decide which historical cases were relevant to a given project, we used the standard CBR method of taking the $k$ nearest neighbors of the defined query, ranked by Euclidean distance. Given the size of our datasets, we arbitrarily chose $k=20$ as our definition of the closest neighbors.

This proved problematic in two respects. First, the $k$-nearest-neighbor calculation required $O(n^2)$ time, which limited its applicability to very large datasets. Second, the fixed selection of 20 cases (separated into the 5 ``best'' and 15 ``rest'') often selected too large a subset of the data for certain datasets. For example, if data were provided for only 12 historical cases, a 66\% training split leaves only 8 cases. Since this is fewer than $k=20$, no relevancy filtering is performed, and the entire space is selected for learning.

To resolve this, a non-static metric for relevancy was devised. Instead of selecting a fixed number of cases, cases are ranked and selected according to how well they are contained within the query space. Each attribute of a case is compared to the project query. If the case's value falls within the query, the case scores one ``point'' for that attribute. The points are summed per case, and the cases are ranked by their totals. For example, if a case within the $nasa93$ dataset falls within the Orbital Space Plane (OSP) case study query for 16 of its attributes and fails for the other 7, it is said to be approximately $70\%$ contained. The cases with the highest containment (``Best Overlap'') are then selected for contrast set reasoning.
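
The containment score described above can be written compactly. The notation here is our own shorthand, introduced only for illustration: $A$ is the set of attributes, $c_a$ is the value of case $c$ on attribute $a$, and $Q_a$ is the set of values the query admits for $a$. Under those assumptions, one way to express the score is
\[
  \mathrm{containment}(c, Q) \;=\; \frac{\left|\{\, a \in A : c_a \in Q_a \,\}\right|}{|A|} .
\]
For the $nasa93$ example, the case matches 16 attributes and fails 7, so
\[
  \mathrm{containment}(c, Q) \;=\; \frac{16}{16 + 7} \;=\; \frac{16}{23} \;\approx\; 0.70 .
\]
Under this reading, computing the score touches each attribute of each case once.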

The performance of this new method is shown in Figure~\ref{fig:bestoverlap-vs-knn}, where KNN denotes the old $O(n^2)$ method of relevancy filtering and BestOverlap denotes the new method. In all but one case, BestOverlap performs better. Even in the one case where BestOverlap performs slightly worse, it still outperforms KNN in spread reduction.

\input{figs/w2-runtimes.tex}

\input{figs/knn-vs-bestoverlap.tex}
