Data Consolidation

Why Consolidate?

For the most part, the questions addressed in this case do not require a detailed examination of the monthly time series of each loan. It suffices to work with a consolidated dataset, consisting of one row per loan, with fields capturing only the acquisition characteristics and the final or current status of each loan. This consolidation requires some initial pre-processing of the raw data and entails the loss of some detail, but it has the advantage of vastly reducing the size (number of rows) of the datasets required for analysis.

How to Consolidate Mortgage Level Data

Fannie Mae refers to this loan consolidation process in its Loan Performance Data Tutorial and provides R code (Primary dataset), perhaps somewhat outdated, to consolidate the data.

It is instructive for students with sufficient time and computing resources to write their own code to consolidate the mortgage loans. We recommend Python, especially the aggregation functionality of pandas. The process is conceptually simple: for each loan, extract the earliest available data for static fields and the latest available data for dynamic fields. See this section for a definition of the static and dynamic fields. Alternatively, we provide samples of consolidated datasets in the next sections.
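
As a rough illustration of the "earliest static, latest dynamic" rule, the following is a minimal pandas sketch, not the Fannie Mae code. The column names (loan_id, report_date, orig_rate, credit_score, current_upb, delinquency_status) and the toy data are hypothetical placeholders chosen for this example; the real files use Fannie Mae's own field names and contain many more fields.

```python
import pandas as pd

# Hypothetical field lists; substitute the actual static and dynamic
# field names from the Fannie Mae file layout.
STATIC_FIELDS = ["orig_rate", "credit_score"]
DYNAMIC_FIELDS = ["current_upb", "delinquency_status"]


def consolidate(monthly: pd.DataFrame) -> pd.DataFrame:
    """Collapse monthly loan records to one row per loan."""
    # Sort so that "first" and "last" within each loan follow calendar order.
    ordered = monthly.sort_values(["loan_id", "report_date"])
    # GroupBy "first"/"last" return the first/last non-null value per column,
    # which matches "earliest available" and "latest available".
    agg_spec = {col: "first" for col in STATIC_FIELDS}
    agg_spec.update({col: "last" for col in DYNAMIC_FIELDS})
    agg_spec["report_date"] = "last"  # date of the most recent record
    return ordered.groupby("loan_id").agg(agg_spec).reset_index()


# Toy data: loan A has three monthly records, loan B has two.
monthly = pd.DataFrame(
    {
        "loan_id": ["A", "A", "A", "B", "B"],
        "report_date": pd.to_datetime(
            ["2020-01-01", "2020-02-01", "2020-03-01", "2020-01-01", "2020-02-01"]
        ),
        "orig_rate": [None, 3.5, 3.5, 4.0, 4.0],  # missing in A's first month
        "credit_score": [700, 700, 700, 650, 650],
        "current_upb": [100000.0, 99000.0, 98000.0, 200000.0, 199000.0],
        "delinquency_status": ["0", "0", "1", "0", "0"],
    }
)

consolidated = consolidate(monthly)
print(consolidated)
```

The result has one row per loan. For the full dataset, the monthly files may be too large to load at once; in that case the same aggregation can be applied file by file, with a second pass combining the partial results per loan.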