\section{Introduction}


This article uses data mining to find patterns of student retention at American universities. Such an analysis is urgently needed. In our work, we have seen a disconnect between accepted best practices and the data available to support those practices:
\begin{itemize}
\item Based on our discussions with university administrators, we observed broad {\em informal} agreement on the factors that influence retention: after the student's high-school GPA, the student's financial status is generally considered the most important factor.
\item However, as shown below, when we look at recent experiments with student records, we find little clear support for that informal belief. Meanwhile, many universities we know of try to improve retention through a wide range of programs, such as:
\begin{itemize}
\item attracting students with high performance indicators (such as test scores);
\item developing student success programs (such as first-year experience programs);
\item encouraging tenured faculty to teach undergraduates.
\end{itemize}
\end{itemize}

Given the substantial public funding allocated to universities, it is important to check the validity of these {\em informal} intuitions, as well as the utility of retention programs such as those conducted at Kent State.

To that end, this article applies data mining methods to the study of student retention. Our general conclusions are:
\begin{itemize}
\item Data mining can find patterns of student retention;
\item Previous data mining studies of these student records can be greatly improved by using discretization, attribute selection, and cross-validation across multiple learning algorithms.
\end{itemize}

More specifically, we will show that data mining can uncover rich detail about individual universities. For example, when mining data from Kent State, we found:
\begin{itemize}
\item A small, specific population of students at high risk of dropping out of university.
\item That the programs listed above (using tenured faculty for lecturing and focusing on student performance indicators) are far less important than a student's financial status.
\end{itemize}

Hence, we would recommend:
\begin{itemize}
\item Focusing more resources on the high-risk group of students to improve their chances of completing a university degree.
\item Discontinuing retention programs that primarily focus on student performance indicators or that advocate using tenured faculty for lecturing.
\end{itemize}

While these conclusions are specific to Kent State, the method used to find them is general and could be applied at other universities to find their own most specific and most important student retention patterns. We welcome contact from other researchers who wish to repeat our analysis on their local data.





%Our results are based on data from Kent State University, Ohio, USA from 2001 to 2007.
%Using that data we conclude that the following factors predict student retention during the first three years
%of an American undergraduate degree:
%\begin{itemize}
%\item family background and family's social-economic status;
%\item  high school GPA;
%\item  test scores.
%\end{itemize}

%We also found that some current policies at Kent State aimed at increasing student retention are {\em not} useful (specifically, 
%using experienced research faculty as instructors).

%It is important to stress that these results could just be local to Kent State.
%For example, we show below a case where students enrolled in a specific first-year class (Kent State's English 10000 subject)
%had a dramatically higher rate of non-retention. Clearly, this result is very specific to that Kent State subject.
%
%Nevertheless, we think that our analysis method
%could be followed by any institution to find
%predictive theories along with the top attributes.
%The data mining methods used in this paper can be used to provide detailed feedback on retention patterns at university. 
%For example, we could isolate specific high-risk groups within the overall student 
%body (those students enrolled in Kent State's English 10000 subject).  
%We welcome contacts
%from other researchers who wish to repeat our analysis on their
%local data.


%The rest of this paper is structured as follows: XXXX