A Gentle Intro to Bayesian Statistics for College Admissions
Predictive analytics is becoming increasingly important in college admissions modeling. It helps predict the likelihood that a student will enroll, if admitted. Predictive analytics is one of the reasons why demonstrated interest has become a key factor in college admissions decisions.
Bayesian statistics, which was first developed in 1763, is a key component of the mathematics behind predictive analytics. It begins with the concept of a conditional probability. For example, what is the probability that a student will enroll if admitted, given the student’s GPA and zip code?
The conditional probability of an event A, given an event B, is written as P( A | B ). This is the same as the probability that both A and B occur, divided by the independent probability of B. Thus,
P( A | B ) = P( A and B ) / P( B )
This definition can be used to prove Bayes’ Theorem, which relates the conditional probability of A, given B, to the conditional probability of B, given A. Thus,
P( A | B ) = P( B | A ) * P( A ) / P( B )
For example, suppose that A occurs half the time and B one-quarter of the time, and A and B occur together one-tenth of the time. Then, we can use these equations to calculate the conditional probability of A, given B, as (1/10) / (1/4) = 2/5.
Actual Bayesian analysis is a bit more complicated than this, but in effect uses known information to reason about the probabilities of unknown information. You’ll often hear practitioners talking about the Monte Carlo method and Markov Chains.
There’s one other important thing you need to know about Bayesian Statistics, and that’s that a lot of Bayesian methods involve an independence assumption. Two probability distributions can be combined if the variables are independent of each other. For example, among applicants for college admission, age and gender are usually independent of each other. The mutual information between the two variables is very small. However, age and year-in-school are not independent. There is a high degree of mutual information between the two variables. If you know a student’s age, you can usually predict their year in school with a high degree of accuracy.
One of the core benefits of the Cappex system is that it provides student inquiries based on demonstrated interest. As these students are more likely to enroll if admitted, admissions departments can more accurately predict student behaviors and optimize their overall recruiting efforts.