Data Pre-Processing: Loan-Level Consolidation

Task outline

  1. Download from Fannie Mae’s Data Dynamics portal the single-family historical loan performance datasets corresponding to loans acquired in the most recent years (say 2018-2022). For more extensive analysis, download all the datasets starting year 2000. (Warning: The datasets are large). We recommend you first examine the Loan Performance Data Tutorial, the Sample File, and CRT Glossary and File Layout all available here.

  2. Filter the data to consider only on 30 year mortgages (the data already excludes adjustable rate and other less common mortgage types)

  3. Identify the static fields (those related to acquisition characteristics and unlikley to change my reporting period), the dynamic fileds (those likely to change by reporting period), and non-applicable fields that do not apply to fixed-rate mortgages.

  4. Consolidate (aggregate) the data by Loan Identifier to create a single record per loan. In particular, extract the the first (available) data for static fields and the last available data for dynamic fields for each loan. Drop all non-applicable fields.

  5. Save the consolidated files by acquisition year.

  6. The year Fannie Mae acquired a loan may differ from the year the loan originated. For our purposes, it is more meaningful to conduct analysis by loan vintage (origination year). Therefore, you nneed to re-read the files saved in Step 5, split and re-combine them by acquisition year, and save the resuting files.