What is the underlying idea behind the probit model? Explain how parameters are estimated in the probit model.

Introduction

In econometrics, many real-world situations involve binary outcomes: for example, whether a person purchases a product (yes or no), passes an exam (pass/fail), or defaults on a loan (default/no default). Because such a dependent variable takes only two values, it calls for a model designed for binary outcomes. One of the most widely used is the probit model.

What is the Probit Model?

The probit model is a type of regression where the dependent variable is binary, typically taking values of 0 and 1. The model assumes there is an underlying (unobserved) continuous latent variable that determines the observed binary outcome.

Latent Variable Framework

Let the model be defined as:

Y_i* = X_iβ + ε_i

Where:

Y_i* is an unobserved (latent) variable
X_i is a vector of explanatory variables
β is a vector of coefficients
ε_i ~ N(0,1) is a standard normal error term (its variance is fixed at 1 because the scale of Y_i* cannot be identified from binary data)

The observed binary variable Y_i is defined as:

Y_i = 1 if Y_i* > 0
Y_i = 0 if Y_i* ≤ 0

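Because ε_i is standard normal, and therefore symmetric about zero, the probability that Y_i = 1 follows directly:

P(Y_i = 1 | X_i) = P(Y_i* > 0 | X_i) = P(ε_i > −X_iβ) = Φ(X_iβ)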

Where Φ(.) is the cumulative distribution function (CDF) of the standard normal distribution.
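
As a quick sanity check on the latent-variable argument, the following sketch simulates Y_i* = X_iβ + ε_i for one fixed value of the index X_iβ and compares the share of draws with Y_i* > 0 to Φ(X_iβ). The index value 0.5, the number of draws, the seed, and the function name simulate_share are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def simulate_share(index, n_draws=200_000, seed=0):
    """Share of simulated draws where the latent variable index + eps is positive."""
    rng = random.Random(seed)
    # eps ~ N(0, 1); the observed outcome is 1 exactly when index + eps > 0
    hits = sum(1 for _ in range(n_draws) if index + rng.gauss(0.0, 1.0) > 0)
    return hits / n_draws


index = 0.5  # hypothetical value of X_i * beta
empirical = simulate_share(index)
theoretical = NormalDist().cdf(index)  # Phi(index)
print(f"simulated share: {empirical:.4f}   Phi(index): {theoretical:.4f}")
```

The two numbers should agree up to simulation noise, which shrinks as the number of draws grows.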

Why Use the Probit Model?

The probit model is a natural choice when the unobserved error term can plausibly be treated as normally distributed. It is particularly useful for modeling decisions driven by a latent propensity, such as consumer choice or risk behavior.

Parameter Estimation in the Probit Model

The coefficients in a probit model are not estimated by ordinary least squares (OLS). The probability Φ(X_iβ) is nonlinear in β, and a linear fit to a 0/1 outcome can produce predicted probabilities outside the interval [0, 1]. Instead, we use the Maximum Likelihood Estimation (MLE) method.

1. Likelihood Function

Let Y_i be the observed binary outcome (0 or 1). The probability of each outcome for a single observation is:

P(Y_i = 1 | X_i) = Φ(X_iβ)
P(Y_i = 0 | X_i) = 1 − Φ(X_iβ)

Assuming the observations are independent, the likelihood for the full sample is the product of the individual contributions:

L(β) = ∏_i [Φ(X_iβ)]^{Y_i} × [1 − Φ(X_iβ)]^{1 − Y_i}

2. Log-Likelihood Function

Because the logarithm is a monotonic function, maximizing ln L(β) yields the same β as maximizing L(β), and the product becomes a sum that is easier to work with:

ln L(β) = Σ_i [Y_i × ln Φ(X_iβ) + (1 − Y_i) × ln(1 − Φ(X_iβ))]
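
The log-likelihood above can be written as a short function. This is a minimal sketch using only the Python standard library; the function name probit_log_likelihood and the three-observation data set are hypothetical, and the first column of X is assumed to be a constant term.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit_log_likelihood(beta, X, y):
    """Sum over observations of y*ln(Phi(x.beta)) + (1 - y)*ln(1 - Phi(x.beta))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        index = sum(b * v for b, v in zip(beta, x_i))  # X_i * beta
        p = STD_NORMAL.cdf(index)                      # P(Y_i = 1 | X_i)
        # Note: for very large |index|, p can round to 0 or 1 and the log fails;
        # this sketch does not guard against that case.
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total


# Hypothetical data: a constant and one regressor per row
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 1.5]]
y = [1, 0, 1]

ll_at_zero = probit_log_likelihood([0.0, 0.0], X, y)   # every probability is 0.5
ll_at_guess = probit_log_likelihood([0.0, 1.0], X, y)  # a candidate beta
print(ll_at_zero, ll_at_guess)
```

A candidate β that fits the data better gives a larger (less negative) log-likelihood, which is what the maximization step exploits.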

3. Maximization

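The first-order conditions have no closed-form solution, so the log-likelihood function is maximized with respect to β using a numerical optimization algorithm (e.g., the Newton-Raphson method) to obtain the maximum likelihood estimates. The probit log-likelihood is globally concave in β, so such algorithms typically converge quickly to the unique maximum when one exists.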
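
To illustrate the maximization step, here is a minimal Newton-Raphson sketch for a probit model with an intercept and a single regressor. It uses the standard analytic score and Hessian of the probit log-likelihood, written with q_i = 2Y_i − 1 and λ_i = q_i φ(q_i z_i) / Φ(q_i z_i), where z_i is the index. The simulated data, the "true" values 0.3 and 0.8, the seed, and all function names are assumptions made for this example only.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def score_and_hessian(b0, b1, xs, ys):
    """Gradient and Hessian of the probit log-likelihood for the index b0 + b1*x."""
    g0 = g1 = 0.0
    h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        q = 2 * y - 1                      # +1 when y == 1, -1 when y == 0
        z = b0 + b1 * x                    # index for this observation
        lam = q * STD_NORMAL.pdf(q * z) / STD_NORMAL.cdf(q * z)
        w = -lam * (lam + z)               # second derivative w.r.t. the index (negative)
        g0 += lam
        g1 += lam * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def newton_raphson(xs, ys, tol=1e-10, max_iter=50):
    """Iterate beta_new = beta - H^{-1} g until the step is smaller than tol."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = score_and_hessian(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # H^{-1} g for the symmetric 2x2 Hessian, written out explicitly
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 - step0, b1 - step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


# Simulated data from the latent-variable model (hypothetical parameter values)
rng = random.Random(42)
true_b0, true_b1 = 0.3, 0.8
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [1 if true_b0 + true_b1 * x + rng.gauss(0.0, 1.0) > 0 else 0 for x in xs]

b0_hat, b1_hat = newton_raphson(xs, ys)
print(f"estimated intercept: {b0_hat:.3f}, estimated slope: {b1_hat:.3f}")
```

The estimates should land close to the values used to generate the data, with the gap shrinking as the sample grows. In applied work one would normally rely on an established routine, such as the Probit class in statsmodels, rather than hand-coded iterations; the sketch only makes the mechanics visible.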

Interpretation of Coefficients

The coefficients (β) from a probit model are not themselves marginal effects, because the probability Φ(Xβ) is nonlinear in X. However, the sign of β_j gives the direction of the relationship between X_j and the probability of success (Y=1).

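For a continuous explanatory variable X_j, the marginal effect is:

∂P(Y=1 | X)/∂X_j = φ(Xβ) × β_j

Where φ(.) is the standard normal probability density function (PDF). Because φ(Xβ) depends on X, the marginal effect differs across observations; it is commonly reported either at the sample means of X or as an average over the sample.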
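
The marginal-effect formula can be evaluated at a single point or averaged over a sample. The sketch below does both; the coefficient values, the three data rows, and the function names are hypothetical, and the first column of each row is assumed to be a constant (whose "marginal effect" has no substantive meaning and is listed only for completeness).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def marginal_effects(beta, x):
    """phi(x.beta) * beta_j for every coefficient, evaluated at the row x."""
    index = sum(b * v for b, v in zip(beta, x))
    density = STD_NORMAL.pdf(index)  # phi(X * beta)
    return [density * b for b in beta]


def average_marginal_effects(beta, X):
    """Average of the observation-level marginal effects over all rows of X."""
    totals = [0.0] * len(beta)
    for x in X:
        for j, effect in enumerate(marginal_effects(beta, x)):
            totals[j] += effect
    return [t / len(X) for t in totals]


beta = [0.3, 0.8]                            # hypothetical estimates: intercept, slope
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 2.0]]    # constant and one regressor

me_at_point = marginal_effects(beta, [1.0, 0.0])
ame = average_marginal_effects(beta, X)
print("marginal effect of the regressor at x = 0:", me_at_point[1])
print("average marginal effect of the regressor:", ame[1])
```

Since φ(.) is largest at zero, the effect of a given coefficient is strongest for observations whose predicted probability is near 0.5 and fades toward zero in the tails.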

Applications

Modeling purchase decisions
Default risk modeling in banking
Labor market participation

Conclusion

The probit model is a powerful tool for analyzing binary outcomes. It assumes that the error term of the underlying latent variable is normally distributed and uses maximum likelihood estimation to determine the model parameters. While the coefficients do not directly measure changes in probability, marginal effects give the approximate change in the probability of Y = 1 for a small change in an explanatory variable. The probit model is widely applied across economics, finance, and the social sciences for understanding decision-making processes.