Predictive Analytics: Predicting Loan Outcomes Based on Origination Characteristics
Mortgage borrowers have the option of early prepayment (inducing interest rate risk for lenders) and the option of default (inducing credit risk for lenders). Therefore, lenders, financial institutions, and investors seek high-quality data analysis that can help them predict the probabilities of these loan outcomes. Understanding the drivers and magnitudes of these risks helps lenders and investors determine criteria for conforming loans, price these loans to reflect their inherent risks, and accurately value mortgage-backed securities.
Task outline
Download the train / test sets described in this section.
Prepare your dataset: Define a new target variable column named “ZBC_status” and compute it as follows:
- Set “ZBC_status” to “Prepaid” if “Zero Balance Code” equals “01”;
- Set “ZBC_status” to “Defaulted” if “Zero Balance Code” equals “02”, “03”, “09”, or “15”;
- Set “ZBC_status” to “Other” if “Zero Balance Code” equals “06”, “16”, or “96”;
- Set “ZBC_status” to “Current”, otherwise.
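The mapping rules above can be sketched as a small lookup function. This is a minimal sketch, not a prescribed implementation: the helper names `normalize_code` and `zbc_status` are hypothetical, and it assumes that codes may arrive either as zero-padded strings ("01") or as numbers (1, 1.0) depending on how the file was read, and that a missing code falls under the "otherwise" rule.

```python
# Mapping copied from the task outline; any other value falls through to "Current".
ZBC_TO_STATUS = {
    "01": "Prepaid",
    "02": "Defaulted", "03": "Defaulted", "09": "Defaulted", "15": "Defaulted",
    "06": "Other", "16": "Other", "96": "Other",
}


def normalize_code(code):
    """Return the code as a two-character string, or None if it is missing.

    Assumption: numeric codes (1, 1.0) should be treated like "01".
    """
    if code is None:
        return None
    if isinstance(code, float):
        if code != code:  # NaN is the only float that differs from itself
            return None
        code = int(code)
    text = str(code).strip()
    return text.zfill(2) if text else None


def zbc_status(code):
    """Map one raw "Zero Balance Code" value to its ZBC_status label."""
    return ZBC_TO_STATUS.get(normalize_code(code), "Current")


if __name__ == "__main__":
    for raw in ["01", 9, "96", None, "42"]:
        print(repr(raw), "->", zbc_status(raw))
```

If the data is held in a pandas DataFrame (here called `train`, a hypothetical name), the column could then be built with `train["ZBC_status"] = train["Zero Balance Code"].map(zbc_status)`.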
Inspect the train dataset. In particular, examine:
- The extent and pattern of missing (NA) values;
- The value counts of records belonging to each ZBC_status category.
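One possible way to run this inspection with pandas is sketched below. The function name `summarize` is hypothetical, and the toy frame (including the column names "Credit Score" and "DTI") is made up purely for illustration; it is not taken from the actual dataset.

```python
import pandas as pd


def summarize(df, target="ZBC_status"):
    """Return (share of NA values per column, class counts and shares)."""
    # Fraction of missing values in each column, largest first.
    na_share = df.isna().mean().sort_values(ascending=False)
    # Records per target category; dropna=False also surfaces unlabeled rows.
    counts = df[target].value_counts(dropna=False)
    class_table = pd.DataFrame({"count": counts, "share": counts / len(df)})
    return na_share, class_table


# Toy data for illustration only; not real loan records.
toy = pd.DataFrame({
    "Credit Score": [720, None, 680, 700],
    "DTI": [35.0, 41.0, None, None],
    "ZBC_status": ["Current", "Prepaid", "Current", "Defaulted"],
})

na_share, class_table = summarize(toy)
print(na_share)
print(class_table)
```

The share table shows at a glance how unevenly the records are spread across categories, which bears directly on the choice of evaluation metric later on.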
Plan your analysis (carefully):
- Identify the main challenges involved in classifying loan outcomes based on the dataset you constructed and the predictive variables you have identified above;
- Describe how you will tackle these challenges in your analysis. What method (or sequence of methods) will you implement? Assume you have only a couple of hours to produce a “first-iteration” predictive model;
- How do you plan to evaluate the quality of your predictive model? Be precise.
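As one example of a precise evaluation plan, the sketch below computes per-class precision, recall, and F1 together with their unweighted (macro) average. Macro-F1 is only one candidate metric, chosen here on the assumption that the ZBC_status classes turn out to be imbalanced, in which case plain accuracy can look good while rare classes are missed entirely. The function name `per_class_report` and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return ({label: {precision, recall, f1}}, macro-averaged F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative

    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}

    macro_f1 = sum(row["f1"] for row in report.values()) / len(report)
    return report, macro_f1


# Toy labels for illustration only.
y_true = ["Current", "Current", "Prepaid", "Defaulted"]
y_pred = ["Current", "Current", "Current", "Defaulted"]
report, macro_f1 = per_class_report(y_true, y_pred)
print(report)
print("macro F1:", macro_f1)
```

In the toy example three of four predictions are correct, yet the "Prepaid" class is never predicted and scores an F1 of zero, which pulls the macro average well below the accuracy.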
Execute (Run) and review your analysis:
- Construct and run your analysis in a notebook. Try running different ML models and compare their test set predictive performance.
- Examine any unexpected results or anomalies. Revise your analysis as necessary.
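A first-iteration model comparison could look like the sketch below, which uses scikit-learn. Everything here is an assumption made for illustration: the data is synthetic (generated by `make_classification` with four imbalanced classes standing in for the ZBC_status categories), the two models are arbitrary candidates, and macro-F1 is just one possible metric. In the actual notebook, the provided train and test sets would replace the synthetic split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced 4-class data; a stand-in for the real loan features.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, n_classes=4,
    n_clusters_per_class=1, weights=[0.7, 0.2, 0.07, 0.03], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Candidate models; class_weight="balanced" reweights the rare classes.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0,
    ),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), average="macro")

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: macro F1 = {score:.3f}")
```

Keeping every candidate behind the same fit/predict/score loop makes it straightforward to add further models and to compare them on an equal footing.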
Report initial findings:
- Produce one or more exhibits summarizing your main findings. What is the best-fitting model? What is its performance on the test set (based on your selected metric)?
- Report on any shortcomings or limitations of your model in predicting the outcome of loans originating next month. What ideas, if any, do you have for improving your analysis?