Introduction
In econometrics, many real-world situations involve binary outcomes: for example, whether a person purchases a product (yes or no), passes an exam (pass/fail), or defaults on a loan (default/no default). Models with a binary dependent variable require special treatment, because the outcome takes only two values and the quantity of interest is a probability between 0 and 1. One of the most widely used models for such data is the probit model.
What is the Probit Model?
The probit model is a type of regression where the dependent variable is binary, typically taking values of 0 and 1. The model assumes there is an underlying (unobserved) continuous latent variable that determines the observed binary outcome.
Latent Variable Framework
Let the model be defined as:
Yi* = Xiβ + εi
Where:
- Yi* is an unobserved (latent) variable
- Xi is a vector of explanatory variables
- β is a vector of coefficients
- εi ~ N(0,1) (standard normal error term; the variance is normalized to 1 because the scale of Yi* cannot be identified from a binary outcome)
The observed binary variable Yi is defined as:
- Yi = 1 if Yi* > 0
- Yi = 0 if Yi* ≤ 0
Thus, using the symmetry of the standard normal distribution, the probability that Yi = 1 is:
P(Yi = 1 | Xi) = P(εi > −Xiβ) = 1 − Φ(−Xiβ) = Φ(Xiβ)
Where Φ(.) is the cumulative distribution function (CDF) of the standard normal distribution.
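The latent variable framework above can be checked with a small simulation. The following is a minimal sketch using only the Python standard library; the coefficient values, the single observation x, and the helper names are assumptions chosen for illustration, not part of the model definition.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def probit_probability(x, beta):
    """Return P(Y = 1 | X) = Phi(X beta) for one observation."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    return STD_NORMAL.cdf(index)

def simulate_outcome(x, beta, rng):
    """Draw the latent Y* = X beta + eps and return the observed binary Y."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    y_star = index + rng.gauss(0.0, 1.0)  # eps ~ N(0, 1)
    return 1 if y_star > 0 else 0

rng = random.Random(0)
beta = [-0.5, 1.0]   # hypothetical coefficients: intercept and one slope
x = [1.0, 0.8]       # first entry is the constant term

p = probit_probability(x, beta)
draws = [simulate_outcome(x, beta, rng) for _ in range(20000)]
share_of_ones = sum(draws) / len(draws)
print(round(p, 4), round(share_of_ones, 4))
```

With many draws, the share of simulated outcomes equal to 1 should be close to Φ(Xiβ), which is what the latent variable derivation predicts.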
Why Use the Probit Model?
The probit model is preferred when we believe the unobserved error term follows a normal distribution. It is particularly useful for modeling decisions that follow a latent propensity structure, such as consumer choice or risk behavior.
Parameter Estimation in Probit Model
The coefficients in a probit model cannot be estimated using ordinary least squares (OLS), because the dependent variable is binary rather than continuous and the probability Φ(Xiβ) is a nonlinear function of β. Instead, we use maximum likelihood estimation (MLE).
1. Likelihood Function
Let Yi be the observed binary outcome (0 or 1). Each observation contributes one of two probabilities to the likelihood:
- P(Yi = 1 | Xi) = Φ(Xiβ)
- P(Yi = 0 | Xi) = 1 − Φ(Xiβ)
Assuming the observations are independent, the likelihood for the full sample is the product over all observations i:
L(β) = ∏ [Φ(Xiβ)]^Yi × [1 − Φ(Xiβ)]^(1 − Yi)
2. Log-Likelihood Function
To simplify computation, the log of the likelihood function is taken:
ln L(β) = Σ [Yi × ln Φ(Xiβ) + (1 − Yi) × ln(1 − Φ(Xiβ))]
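As an illustration, the log-likelihood can be written as a short function. This is a sketch under assumptions: the function name, the tiny data set, and the two trial coefficient vectors are hypothetical, and the clamp on the probability is a numerical safeguard that is not part of the formula.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def probit_log_likelihood(beta, X, y):
    """Sum of Yi*ln(Phi(Xi beta)) + (1 - Yi)*ln(1 - Phi(Xi beta))."""
    total = 0.0
    for xi, yi in zip(X, y):
        index = sum(xj * bj for xj, bj in zip(xi, beta))
        # Keep the probability strictly inside (0, 1) so log() is defined.
        p = min(max(STD_NORMAL.cdf(index), 1e-12), 1 - 1e-12)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

# Hypothetical data: a constant term and one explanatory variable.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

ll_zero = probit_log_likelihood([0.0, 0.0], X, y)     # every probability is 0.5
ll_better = probit_log_likelihood([-0.5, 1.0], X, y)  # fits the pattern in y
print(ll_zero, ll_better)
```

A coefficient vector that assigns higher probabilities to the outcomes actually observed yields a larger (less negative) log-likelihood, which is the quantity MLE maximizes.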
3. Maximization
The first-order conditions of the log-likelihood have no closed-form solution, so it is maximized with respect to β using numerical optimization algorithms (e.g., the Newton-Raphson method) to obtain the maximum likelihood estimates.
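The maximization step can be sketched with a hand-written Newton-Raphson iteration. To keep the example short, it assumes a model with only an intercept and one slope, so the 2×2 Hessian can be inverted explicitly; the simulated data, the "true" coefficients, the seed, and the function names are all assumptions made for illustration. The gradient and Hessian are the standard analytic expressions for the probit log-likelihood.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def score_and_hessian(beta, xs, ys):
    """Gradient and Hessian of the probit log-likelihood (intercept + one slope)."""
    b0, b1 = beta
    g0 = g1 = 0.0
    h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        z = b0 + b1 * x
        q = 2 * y - 1                            # +1 if y = 1, -1 if y = 0
        cdf = max(STD_NORMAL.cdf(q * z), 1e-12)  # guard against division by zero
        lam = q * STD_NORMAL.pdf(q * z) / cdf    # derivative of the log-likelihood w.r.t. z
        w = lam * (lam + z)                      # curvature weight (positive)
        g0 += lam
        g1 += lam * x
        h00 -= w
        h01 -= w * x
        h11 -= w * x * x
    return (g0, g1), (h00, h01, h11)

def newton_raphson_probit(xs, ys, tol=1e-8, max_iter=50):
    """Iterate beta <- beta - H^{-1} g until the step is negligible."""
    beta = (0.0, 0.0)
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = score_and_hessian(beta, xs, ys)
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det      # first element of H^{-1} g
        step1 = (h00 * g1 - h01 * g0) / det      # second element of H^{-1} g
        beta = (beta[0] - step0, beta[1] - step1)
        if abs(step0) < tol and abs(step1) < tol:
            break
    return beta

# Simulate data from the latent variable model with hypothetical coefficients.
rng = random.Random(42)
true_beta = (-0.5, 1.0)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [1 if true_beta[0] + true_beta[1] * x + rng.gauss(0.0, 1.0) > 0 else 0
      for x in xs]

beta_hat = newton_raphson_probit(xs, ys)
print(beta_hat)
```

In practice a statistical package would normally be used instead. The sketch is only meant to show that each iteration updates β using the gradient and curvature of the log-likelihood, and that the estimates land near the coefficients used to generate the data.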
Interpretation of Coefficients
The coefficients (β) from a probit model do not represent direct marginal effects. However, the sign of β tells us the direction of the relationship between X and the probability of success (Y=1).
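For a continuous explanatory variable Xj, the marginal effect on the probability is:
∂P(Y=1)/∂Xj = φ(Xβ) × βj
Where φ(.) is the standard normal probability density function (PDF). Because φ(Xβ) is always positive, the marginal effect has the same sign as βj, but its size depends on the values of X. It is therefore commonly evaluated at the sample means of X or averaged over all observations.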
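The marginal effect formula can be illustrated with a few lines of code. This is a sketch under assumptions: the coefficients and the four observations are hypothetical, and the two summary measures shown (averaging the effects over observations, and evaluating the effect at the mean of X) are common conventions rather than something the formula itself prescribes.

```python
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def marginal_effect(x, beta, j):
    """Return phi(X beta) * beta_j for one observation x."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return STD_NORMAL.pdf(index) * beta[j]

beta = [-0.5, 1.0]   # hypothetical estimates: intercept and one slope
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 2.0]]

# Effect of the slope variable (position 1) at each observation.
effects = [marginal_effect(x, beta, 1) for x in X]

# Two common summaries: the average of the effects, and the effect at the mean of X.
average_marginal_effect = fmean(effects)
x_bar = [fmean(column) for column in zip(*X)]
effect_at_mean = marginal_effect(x_bar, beta, 1)
print(effects, average_marginal_effect, effect_at_mean)
```

The effect is largest where Xβ is close to 0, since φ peaks there, and shrinks toward zero as the predicted probability approaches 0 or 1.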
Applications
- Modeling purchase decisions
- Default risk modeling in banking
- Labor market participation
Conclusion
The probit model is a powerful tool for analyzing binary outcomes. It assumes a normally distributed error term in the underlying latent variable equation and uses maximum likelihood estimation to determine the model parameters. While the coefficients indicate only the direction of an effect, marginal effects give the approximate change in the probability of Y = 1 for a small change in a continuous explanatory variable. The probit model is widely applied across economics, finance, and the social sciences for understanding decision-making processes.