\section{Initial Data Exploration}
Before conducting elaborate data mining experiments, it is good practice to first check whether simpler methods
would work adequately.

Accordingly, after discretizing the data\footnote{Discretization
is the process of dividing a continuous range into a small number
of bins. A repeated result is that discretization improves the
performance of data miners~\cite{Dou95,Yang09}.} using the minimum
description length criterion~\citep{Fay1992}, we performed visual
data analysis to find interesting patterns in the data before running
any learners. Figure~\ref{figParentsEdLevelRET3} shows the third-year
retention percentage for different education levels of parents. The circles on that diagram show the
probability that some attribute range selects for third-year retention. From that figure,
it is clear that the retention percentage increases with the parent's
education level.
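
One way to make the plotted quantities precise, under notation assumed here purely for illustration, is as follows. Let $S$ be the set of students, let $S_{a=v} \subseteq S$ be the students whose discretized attribute $a$ falls in range $v$, and let $R \subseteq S$ be the students retained in the third year. Assuming the baseline is the retention percentage over all students, the retention percentage of a range and the baseline are then
\[
\mathrm{RET3}(a{=}v) = 100 \cdot \frac{|S_{a=v} \cap R|}{|S_{a=v}|},
\qquad
\mathrm{RET3}_{\mathrm{base}} = 100 \cdot \frac{|R|}{|S|}.
\]
Under this reading, a range is of interest when $\mathrm{RET3}(a{=}v)$ lies well above or below $\mathrm{RET3}_{\mathrm{base}}$, provided that $|S_{a=v}|$ is large enough for the estimate to be stable.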


\begin{figure}[t]
\centering
\includegraphics[scale=0.7]{ParentEducationvsRET3}
\caption{Parent's education level vs. RET3 percentage. The red dashed line represents the baseline RET3 percentage.}
\label{figParentsEdLevelRET3}
\end{figure}

\begin{figure}[t]
\centering
\includegraphics[scale=0.7]{ParentHouseHoldSizeRET3}
\caption{Parent's household size vs. RET3 percentage (left), and distribution of parent's household size (right). The red dashed line represents the baseline RET3 percentage. Note that the ``0'' household size denotes missing and auto-computed entries.}
\label{figParentsHHSizeRET3}
\end{figure}

As shown in Figure~\ref{figParentsHHSizeRET3}, parent's household size had an effect on third-year retention. This trend is counterintuitive, as a bigger household would suggest less parental attention and more sharing of resources.

\begin{figure}[t]
\centering
\includegraphics[scale=0.7]{RET3HSGPAS_TA}
\caption{Student tax form type vs. RET3 percentage, grouped by high school GPA. The red dashed line represents the baseline RET3 percentage. On the right, a log-frequency histogram of student tax form type.}
\label{figRET3HSGPAS_TA}
\end{figure}

Figure~\ref{figRET3HSGPAS_TA} shows the positive effect of high school GPA on the third-year retention percentage. In addition, for tax form types 3 and 4, the retention percentages are very high compared to those for tax form types 1 and 2.


