Bank Loan Decision Prediction
Developed interpretable ML models (Logistic Regression, Random Forest, XGBoost) on 10K loan records with SMOTE and threshold tuning for class imbalance. Applied SHAP, LIME, and permutation feature importance to surface top default drivers and deliver regulator-ready explanations. Audited fairness across borrower demographics to inform equitable loan-approval policies.
Overview
Built interpretable ML models to predict loan defaults on 10K records, focusing on class-imbalance handling, model explainability, and fairness auditing. The project delivered regulator-ready explanations and informed equitable, regulation-compliant loan-approval policies.
Problem
Loan default prediction is a high-stakes classification problem where the cost of false negatives (missed defaults) is asymmetric — far more expensive than false positives. The dataset exhibited severe class imbalance, and any deployed model needed to be explainable to regulators and fair across protected demographics.
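The cost asymmetry above can be made concrete with a small worked example. This is an illustrative sketch only: the cost figures and error counts are hypothetical, not taken from the project.

```python
# Illustrative expected-cost comparison under asymmetric error costs.
# All dollar figures and error counts are hypothetical examples.
COST_FN = 10_000  # missed default: lost principal
COST_FP = 500     # wrongly rejected applicant: lost interest margin

def expected_cost(fn, fp):
    """Total misclassification cost for given false-negative and false-positive counts."""
    return fn * COST_FN + fp * COST_FP

# A conservative (high) threshold misses more defaults:
print(expected_cost(fn=40, fp=20))   # 410000
# A lower threshold trades many cheap FPs for a few expensive FNs:
print(expected_cost(fn=8, fp=150))   # 155000
```

Even though the second operating point makes far more total errors, it is cheaper under this cost structure, which is why recall on defaults, not accuracy, drives the threshold choice.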
Approach
- Modeling: Trained Logistic Regression, Random Forest, and XGBoost; applied SMOTE for minority oversampling and threshold tuning to optimize the recall/precision tradeoff under the specific business cost structure.
- Explainability: Applied SHAP, LIME, and permutation feature importance to surface the top default drivers (debt-to-income ratio, credit-history length) and deliver clear, regulator-ready explanations.
- Fairness auditing: Audited model fairness across borrower demographics, maintaining parity in false-negative rates and informing equitable loan-approval policies.
- Variance reduction: Used bagging (bootstrap aggregation) to reduce model variance, improving PR-AUC to 0.84.
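The oversampling-plus-threshold-tuning step can be sketched as below. This is a minimal illustration on synthetic data, not the project's pipeline: the project used SMOTE (from imbalanced-learn), for which naive minority oversampling stands in here, and the 0.92 recall target is taken from the reported results.

```python
# Sketch: rebalance the training set, then tune the decision threshold
# to hit a recall target on defaults. Data and model are stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority (default) class to balance the training set.
# The project used SMOTE, which synthesizes new minority points instead
# of duplicating existing ones as resample() does here.
minority_idx = np.flatnonzero(y_tr == 1)
n_majority = int((y_tr == 0).sum())
dup_idx = resample(minority_idx, n_samples=n_majority, random_state=0)
X_bal = np.vstack([X_tr[y_tr == 0], X_tr[dup_idx]])
y_bal = np.concatenate([np.zeros(n_majority), np.ones(n_majority)])

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
probs = clf.predict_proba(X_te)[:, 1]

# Among thresholds that achieve the recall target, pick the one with
# the best precision -- recall is binding because missed defaults are
# the expensive error.
prec, rec, thresholds = precision_recall_curve(y_te, probs)
ok = rec[:-1] >= 0.92
best = thresholds[ok][np.argmax(prec[:-1][ok])] if ok.any() else 0.5
preds = (probs >= best).astype(int)
```

Threshold tuning on a held-out set is what turns a probability model into a policy aligned with the business cost structure.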
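Of the three explainability methods, permutation feature importance is the easiest to show in a few lines. The sketch below uses scikit-learn on synthetic data; the feature names are illustrative placeholders, not the project's actual schema.

```python
# Minimal permutation feature importance: measure how much held-out
# score drops when each feature is shuffled. Feature names are
# illustrative stand-ins for the loan dataset's columns.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=6,
                           n_informative=3, random_state=0)
names = ["debt_to_income", "credit_history_len", "income",
         "loan_amount", "age", "utilization"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)

# Rank features by the mean score drop when each is shuffled.
ranking = sorted(zip(names, result.importances_mean), key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name:>20s}: {score:.3f}")
```

Unlike SHAP and LIME, which explain individual predictions, permutation importance gives a global ranking, which is the form regulators typically ask for first.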
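The fairness audit reduces to comparing false-negative rates across groups. A minimal sketch, assuming synthetic labels, predictions, and a two-group protected attribute (the 0.10 tolerance is illustrative, not the project's threshold):

```python
# Sketch of a false-negative-rate parity check across a protected
# attribute. Data, groups, and the tolerance are synthetic stand-ins.
import numpy as np

def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP): the share of actual defaults the model misses."""
    positives = y_true == 1
    return float(((y_pred == 0) & positives).sum() / positives.sum())

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1_000)
y_pred = np.where(rng.random(1_000) < 0.9, y_true, 1 - y_true)  # ~90%-accurate stand-in
group = rng.choice(["A", "B"], size=1_000)

fnr_by_group = {g: false_negative_rate(y_true[group == g], y_pred[group == g])
                for g in ("A", "B")}
gap = abs(fnr_by_group["A"] - fnr_by_group["B"])
print(fnr_by_group, f"gap={gap:.3f}")
```

FNR parity is the natural fairness criterion here: a gap would mean defaults in one demographic are missed (and thus loans approved) at a systematically different rate than in another.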
Results & Impact
- Achieved 92% recall on defaults with PR-AUC of 0.84
- Identified debt-to-income ratio and credit-history length as top default drivers via SHAP/LIME
- Maintained demographic parity in false-negative rates across protected groups
- Delivered regulator-ready explanations that built stakeholder trust
Lessons Learned
- In high-stakes ML, the metric choice is the most important modeling decision — optimizing for accuracy would have missed the asymmetric cost of defaults
- Explainability is not optional in regulated domains — SHAP and LIME are not post-hoc add-ons but core requirements
- Fairness auditing should be part of the evaluation pipeline, not an afterthought