Bank Loan Decision Prediction
Developed interpretable ML models (Logistic Regression, Random Forest, XGBoost) on 10K loan records with SMOTE and threshold tuning for class imbalance. Applied SHAP, LIME, and permutation feature importance to surface top default drivers and deliver regulator-ready explanations. Audited fairness across borrower demographics to inform equitable loan-approval policies.
Overview
Built interpretable ML models to predict loan defaults on 10K records, focusing on class-imbalance handling, model explainability, and fairness auditing. The project delivered regulator-ready explanations and informed equitable, regulation-compliant loan-approval policies.
Problem
Loan default prediction is a high-stakes classification problem where the cost of false negatives (missed defaults) is asymmetric — far more expensive than false positives. The dataset exhibited severe class imbalance, and any deployed model needed to be explainable to regulators and fair across protected demographics.
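The cost asymmetry above can be made concrete with a small worked example. This is an illustrative sketch only: the cost figures and error counts are hypothetical, not taken from the project.

```python
# Illustrative expected-cost comparison under asymmetric error costs.
# All dollar figures and error counts are hypothetical examples.
COST_FN = 10_000  # missed default: lost principal
COST_FP = 500     # wrongly rejected applicant: lost interest margin

def expected_cost(fn, fp):
    """Total misclassification cost for given false-negative and false-positive counts."""
    return fn * COST_FN + fp * COST_FP

# A conservative (high) threshold misses more defaults:
print(expected_cost(fn=40, fp=20))   # 410000
# A lower threshold trades many cheap FPs for a few expensive FNs:
print(expected_cost(fn=8, fp=150))   # 155000
```

Even though the second operating point makes far more total errors, it is cheaper under this cost structure, which is why recall on defaults, not accuracy, drives the threshold choice.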
Approach
- Modeling: Trained Logistic Regression, Random Forest, and XGBoost; applied SMOTE for minority oversampling and threshold tuning to optimize the recall/precision tradeoff under the specific business cost structure.
- Explainability: Applied SHAP, LIME, and permutation feature importance to surface the top default drivers (debt-to-income ratio, credit-history length) and deliver clear, regulator-ready explanations.
- Fairness auditing: Audited model fairness across borrower demographics, maintaining parity in false-negative rates and informing equitable loan-approval policies.
- Variance reduction: Used bagging (bootstrap aggregation) to reduce model variance, improving PR-AUC to 0.84.
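The oversampling-plus-threshold-tuning step can be sketched as below. This is a minimal illustration on synthetic data, not the project's pipeline: the project used SMOTE (from imbalanced-learn), for which naive minority oversampling stands in here, and the 0.92 recall target is taken from the reported results.

```python
# Sketch: rebalance the training set, then tune the decision threshold
# to hit a recall target on defaults. Data and model are stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority (default) class to balance the training set.
# The project used SMOTE, which synthesizes new minority points instead
# of duplicating existing ones as resample() does here.
minority_idx = np.flatnonzero(y_tr == 1)
n_majority = int((y_tr == 0).sum())
dup_idx = resample(minority_idx, n_samples=n_majority, random_state=0)
X_bal = np.vstack([X_tr[y_tr == 0], X_tr[dup_idx]])
y_bal = np.concatenate([np.zeros(n_majority), np.ones(n_majority)])

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
probs = clf.predict_proba(X_te)[:, 1]

# Among thresholds that achieve the recall target, pick the one with
# the best precision -- recall is binding because missed defaults are
# the expensive error.
prec, rec, thresholds = precision_recall_curve(y_te, probs)
ok = rec[:-1] >= 0.92
best = thresholds[ok][np.argmax(prec[:-1][ok])] if ok.any() else 0.5
preds = (probs >= best).astype(int)
```

Threshold tuning on a held-out set is what turns a probability model into a policy aligned with the business cost structure.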
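Of the three explainability methods, permutation feature importance is the easiest to show in a few lines. The sketch below uses scikit-learn on synthetic data; the feature names are illustrative placeholders, not the project's actual schema.

```python
# Minimal permutation feature importance: measure how much held-out
# score drops when each feature is shuffled. Feature names are
# illustrative stand-ins for the loan dataset's columns.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=6,
                           n_informative=3, random_state=0)
names = ["debt_to_income", "credit_history_len", "income",
         "loan_amount", "age", "utilization"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)

# Rank features by the mean score drop when each is shuffled.
ranking = sorted(zip(names, result.importances_mean), key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name:>20s}: {score:.3f}")
```

Unlike SHAP and LIME, which explain individual predictions, permutation importance gives a global ranking, which is the form regulators typically ask for first.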
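The fairness audit reduces to comparing false-negative rates across groups. A minimal sketch, assuming synthetic labels, predictions, and a two-group protected attribute (the 0.10 tolerance is illustrative, not the project's threshold):

```python
# Sketch of a false-negative-rate parity check across a protected
# attribute. Data, groups, and the tolerance are synthetic stand-ins.
import numpy as np

def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP): the share of actual defaults the model misses."""
    positives = y_true == 1
    return float(((y_pred == 0) & positives).sum() / positives.sum())

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1_000)
y_pred = np.where(rng.random(1_000) < 0.9, y_true, 1 - y_true)  # ~90%-accurate stand-in
group = rng.choice(["A", "B"], size=1_000)

fnr_by_group = {g: false_negative_rate(y_true[group == g], y_pred[group == g])
                for g in ("A", "B")}
gap = abs(fnr_by_group["A"] - fnr_by_group["B"])
print(fnr_by_group, f"gap={gap:.3f}")
```

FNR parity is the natural fairness criterion here: a gap would mean defaults in one demographic are missed (and thus loans approved) at a systematically different rate than in another.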
Results & Impact
- Achieved 92% recall on defaults with PR-AUC of 0.84
- Identified debt-to-income ratio and credit-history length as top default drivers via SHAP/LIME
- Maintained demographic parity in false-negative rates across protected groups
- Delivered regulator-ready explanations that built stakeholder trust
Lessons Learned
- In high-stakes ML, the metric choice is the most important modeling decision — optimizing for accuracy would have missed the asymmetric cost of defaults
- Explainability is not optional in regulated domains — SHAP and LIME are not post-hoc add-ons but core requirements
- Fairness auditing should be part of the evaluation pipeline, not an afterthought