

\documentclass{svjour3}                     % onecolumn (standard format)
%\documentclass[smallextended]{svjour3}     % onecolumn (second format)
%\documentclass[twocolumn]{svjour3}         % twocolumn
%
\smartqed  % flush right qed marks, e.g. at end of proof
%
\usepackage{graphicx}
\usepackage{verbatim}
\usepackage{fancyvrb}
\usepackage{algorithmic}
\usepackage{cite}
\usepackage{multirow}
\usepackage{rotating}
\usepackage{subfigure}
\usepackage{float}
\restylefloat{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\bi}{\begin{itemize}}
\newcommand{\ei}{\end{itemize}}
\newcommand{\be}{\begin{enumerate}}
\newcommand{\ee}{\end{enumerate}}
\newcommand{\tion}[1]{\S\ref{tion:#1}}
\newcommand{\eq}[1]{Equation~\ref{eq:#1}}
\newcommand{\fig}[1]{Figure~\ref{fig:#1}}
\newenvironment{smallitem}
 {\setlength{\topsep}{0pt}
  \setlength{\partopsep}{0pt}
  \setlength{\parskip}{0pt}
  \begin{itemize}
   \setlength{\leftmargin}{.2in}
  \setlength{\parsep}{0pt}
  \setlength{\parskip}{0pt}
  \setlength{\itemsep}{0pt}}
 {\end{itemize}}
\newenvironment{smallenum}
 {\setlength{\topsep}{0pt}
  \setlength{\partopsep}{0pt}
  \setlength{\parskip}{0pt}
  \begin{enumerate}
  \setlength{\leftmargin}{.2in}
  \setlength{\parsep}{0pt}
  \setlength{\parskip}{0pt}
  \setlength{\itemsep}{0pt}}
 {\end{enumerate}}

\begin{document}

\title{STAT745 Term Project - Part 2%\thanks{Grants or other notes
%about the article that should go on the front page should be
%placed here. General acknowledgments should be placed at the end of the article.}
}
\subtitle{Unsupervised Dataset: BOSTON-HOUSING}

%\titlerunning{Short form of title}        % if too long for running head

%\author{Ekrem Kocaguneli        \and
%        Tim Menzies \and
%        Jacky W. Keung
%}

%\authorrunning{Short form of author list} % if too long for running head

%\institute{E. Kocaguneli and T. Menzies\at
%              Lane Department of Computer Science and Electrical Engineering \\
%              West Virginia University\\
%              Morgantown, WV 26505, USA\\
%              \email{ekocagun@mix.wvu.edu, tim@menzies.us}
%           \and
%           Jacky W. Keung \at
%              School of Computer Science and Engineering\\
%              University of New South Wales\\
%              Sydney, Australia\\
%              \email{jacky.keung@nicta.com.au}
%}

%\date{Received: date / Accepted: date}
% The correct dates will be entered by the editor
\institute{Ekrem Kocaguneli\at
              Lane Department of Computer Science and Electrical Engineering \\
              West Virginia University\\
              Morgantown, WV 26505, USA\\
              \email{ekocagun@mix.wvu.edu}
}

\maketitle

\begin{abstract}
In this part, we focus on the unsupervised dataset.
The dataset assigned to me is Boston-Housing.
The difficulty with unsupervised datasets is that there is no supervisor providing the labels (for classification) or the numeric target values (for regression).
To derive meaning from the data, we use multi-dimensional scaling and rank the data instances to provide an interpretation.
\end{abstract}

\section{Introduction}
\label{sec:introduction}

We use an unsupervised dataset called Boston-Housing (or simply Housing).
As in the previous case, I first inspected the data manually to understand the features and to check for missing values.
The Boston-Housing data consists of $506$ instances, each described by $14$ attributes.

We apply multi-dimensional scaling (MDS) to the dataset to observe similarities between instances and to provide a ranking of the data.
The MDS algorithm takes the proximities between instances and uses them to show which instances are similar and which are dissimilar.
To compute these proximities, R provides a function called \textit{``dist''}, which returns a distance matrix reporting the distance between every pair of instances in the dataset.
Furthermore, MDS is a dimensionality reduction technique that helps us visualize the data in fewer dimensions.
The R language provides \textit{``cmdscale''}, which implements the dimensionality reduction according to MDS.
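As a sketch of what these two functions compute, assuming the Euclidean default of \textit{``dist''} and the classical (Torgerson) form of MDS, and using notation introduced here for illustration only ($x_{ik}$ for the value of attribute $k$ of instance $i$, with $n = 506$ instances and $p = 14$ attributes), the distance between instances $i$ and $j$ is
\begin{equation}
d_{ij} = \sqrt{\sum_{k=1}^{p} \left(x_{ik} - x_{jk}\right)^{2}} .
\end{equation}
Classical MDS then double-centers the matrix of squared distances $\mathbf{D}^{(2)} = [d_{ij}^{2}]$,
\begin{equation}
\mathbf{B} = -\frac{1}{2}\,\mathbf{J}\,\mathbf{D}^{(2)}\,\mathbf{J}, \qquad \mathbf{J} = \mathbf{I} - \frac{1}{n}\,\mathbf{1}\mathbf{1}^{\top},
\end{equation}
and takes the eigendecomposition $\mathbf{B} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{\top}$ with the eigenvalues sorted as $\lambda_{1} \geq \lambda_{2} \geq \cdots$.
The 2-dimensional coordinates are
\begin{equation}
\mathbf{Y} = \mathbf{V}_{2}\,\mathbf{\Lambda}_{2}^{1/2},
\end{equation}
where $\mathbf{V}_{2}$ holds the eigenvectors belonging to the two largest eigenvalues, $\mathbf{\Lambda}_{2} = \mathrm{diag}(\lambda_{1}, \lambda_{2})$, and row $i$ of $\mathbf{Y}$ is the position of instance $i$ in the plane.
Whether the attributes were rescaled before the distances were computed is not stated here, so the raw attribute values are assumed.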

In our case, we represent the dataset in 2 dimensions.
\fig{distance} shows the $506$ instances on a 2-dimensional plot.
The x and y axes of \fig{distance} are the new dimensions.
As can be seen, when reduced to a 2-dimensional representation, the instances (i.e., houses) show proximity to one another. Notice how the instances from $400$ to $500$ lie along one line, whereas the remaining houses lie along another, and how those two lines (or clusters) are separate from one another.



\begin{figure}
\includegraphics[scale=0.8]{distance-matrix.pdf}
\caption{The 2-dimensional representation of the Boston-Housing data. Notice that there are two clusters of houses and that each cluster resembles a straight line. Also notice that, when projected onto a 2D plot, the proximity between the houses is easier to see. The numbers in the plot correspond to instance IDs in the actual dataset.}
\label{fig:distance}
\end{figure}





\bibliographystyle{abbrv}
\bibliography{myref}

\end{document}

