Fannie Mae’s Data

The Dataset

Fannie Mae publishes granular data representative of a portion of the single-family home mortgages it acquired since year 2000. The data is available on their website here: Fannie Mae Single-Family Loan Performance Data. The types of mortgages included in the dataset, and various exclusions and limitations, are described in items 1 and 2 of this FAQ document.

Throughout this case, we refer to the July 27, 2023 release1 of the primary2 dataset. This dataset covers approximately 55 million mortgage loans. Monthly performance data is provided for each loan starting from the time of acquisition through to the end of the first quarter of year 2023 or until the loan reaches zero balance (through prepayment, liquidation, etc.) whichever is earlier. Therefore, each record (row) of this dataset is a snapshot of a particular loan at a particular month. The full dataset consists of approximately 2.5 billion records (rows) and approximately 65-70 applicable attributes (columns).

Access

To access the Fannie Mae Single-Family Loan Performance data you need to register on Data Dynamics. Data Dynamics is an online tool provided by Fannie Mae to access and explore its loan performance, and other, datasets.

Details

The data is subdivided into quarterly files. Data on all loans acquired by Fannie Mae in a given quarter are included in that quarter’s file. That is, each loan appears in exactly one quarterly file.

A quarterly file contains all monthly reporting periods for the loans acquired in that quarter. That file provides the full monthly performance history of each loan starting from its acquisition quarter until the release date of the dataset (end of Q1 of year 2023) or until the loan reaches zero balance (through prepayment, liquidation, etc.) whichever is earlier.

Note that some loans acquired in a given quarter may have originated in earlier quarters. Note also that, while the files are labelled as “csv”, the fields are in fact separated by vertical line “|” delimiters, not by commas3.

Fannie Mae provides a (tiny) sample file consisting of a handful of loan to illustrate the structure of the dataset. The data files contain no column headings. However, a detailed description of the fields, identified positionally, is available in the accompanying “CRT Glossary and File Layout” document [pdf][Excel].

There is a total of 108 columns in each quarterly data file. The following ordered list of column names can be helpful in importing the data:

col_names = ["Reference Pool ID", "Loan Identifier", "Monthly Reporting Period", "Channel", "Seller Name", 
             "Servicer Name", "Master Servicer", "Original Interest Rate", "Current Interest Rate", "Original UPB", 
             "UPB at Issuance", "Current Actual UPB", "Original Loan Term", "Origination Date", "First Payment Date", 
             "Loan Age", "Remaining Months to Legal Maturity", "Remaining Months To Maturity", "Maturity Date", "Original Loan to Value Ratio (LTV)", 
             "Original Combined Loan to Value Ratio (CLTV)", "Number of Borrowers", "Debt-To-Income (DTI)", "Borrower Credit Score at Origination", "Co-Borrower Credit Score at Origination", 
             "First Time Home Buyer Indicator", "Loan Purpose", "Property Type", "Number of Units", "Occupancy Status", 
             "Property State", "Metropolitan Statistical Area (MSA)", "Zip Code Short", "Mortgage Insurance Percentage", "Amortization Type", 
             "Prepayment Penalty Indicator", "Interest Only Loan Indicator", "Interest Only First Principal And Interest Payment Date", "Months to Amortization", "Current Loan Delinquency Status", 
             "Loan Payment History", "Modification Flag", "Mortgage Insurance Cancellation Indicator", "Zero Balance Code", "Zero Balance Effective Date",
             "UPB at the Time of Removal", "Repurchase Date", "Scheduled Principal Current", "Total Principal Current", "Unscheduled Principal Current", 
             "Last Paid Installment Date", "Foreclosure Date", "Disposition Date", "Foreclosure Costs", "Property Preservation and Repair Costs",
             "Asset Recovery Costs", "Miscellaneous Holding Expenses and Credits", "Associated Taxes for Holding Property", "Net Sales Proceeds", "Credit Enhancement Proceeds", 
             "Repurchase Make Whole Proceeds", "Other Foreclosure Proceeds", "Modification-Related Non-Interest Bearing UPB", "Principal Forgiveness Amount", "Original List Start Date", 
             "Original List Price", "Current List Start Date", "Current List Price", "Borrower Credit Score At Issuance", "Co-Borrower Credit Score At Issuance", 
             "Borrower Credit Score Current", "Co-Borrower Credit Score Current", "Mortgage Insurance Type", "Servicing Activity Indicator", "Current Period Modification Loss Amount",
             "Cumulative Modification Loss Amount", "Current Period Credit Event Net Gain or Loss", "Cumulative Credit Event Net Gain or Loss", "Special Eligibility Program", "Foreclosure Principal Write-off Amount", 
             "Relocation Mortgage Indicator", "Zero Balance Code Change Date", "Loan Holdback Indicator", "Loan Holdback Effective Date", "Delinquent Accrued Interest", 
             "Property Valuation Method", "High Balance Loan Indicator", "ARM Initial Fixed-Rate Period  ≤ 5 YR Indicator", "ARM Product Type", "Initial Fixed-Rate Period", 
             "Interest Rate Adjustment Frequency", "Next Interest Rate Adjustment Date", "Next Payment Change Date", "Index", "ARM Cap Structure", 
             "Initial Interest Rate Cap Up Percent", "Periodic Interest Rate Cap Up Percent", "Lifetime Interest Rate Cap Up Percent", "Mortgage Margin", "ARM Balloon Indicator",
             "ARM Plan Number", "Borrower Assistance Plan", "High Loan to Value (HLTV) Refinance Option Indicator", "Deal Name", "Repurchase Make Whole Proceeds Flag", 
             "Alternative Delinquency Resolution", "Alternative Delinquency Resolution Count", "Total Deferral Amount"
            ]

The fields “Loan Identifier” and “Monthly Reporting Period”, when used in conjunction, serve as unique key identifier to the dataset records.

We also provide a Python dictionary of suggested data types for each field that can be optionally used when importing the data:

col_dtypes = {"Reference Pool ID": "string", "Loan Identifier": "string", "Monthly Reporting Period": "object", "Channel": "string", "Seller Name": "string", 
              "Servicer Name": "string", "Master Servicer": "string", "Original Interest Rate": "float64", "Current Interest Rate": "float64", "Original UPB": "float64", 
              "UPB at Issuance": "float64", "Current Actual UPB": "float64", "Original Loan Term": "float64", "Origination Date": "object", "First Payment Date": "object", 
              "Loan Age": "float64", "Remaining Months to Legal Maturity": "float64", "Remaining Months To Maturity": "float64", "Maturity Date": "object", "Original Loan to Value Ratio (LTV)": "float64", 
              "Original Combined Loan to Value Ratio (CLTV)": "float64", "Number of Borrowers": "Int64", "Debt-To-Income (DTI)": "float64", "Borrower Credit Score at Origination": "float64", "Co-Borrower Credit Score at Origination": "float64", 
              "First Time Home Buyer Indicator": "string", "Loan Purpose": "string", "Property Type": "string", "Number of Units": "float64", "Occupancy Status": "string", 
              "Property State": "string", "Metropolitan Statistical Area (MSA)": "string", "Zip Code Short": "string", "Mortgage Insurance Percentage": "float64", "Amortization Type": "string", 
              "Prepayment Penalty Indicator": "string", "Interest Only Loan Indicator": "string", "Interest Only First Principal And Interest Payment Date": "object", "Months to Amortization": "float64", "Current Loan Delinquency Status": "string", 
              "Loan Payment History": "string", "Modification Flag": "string", "Mortgage Insurance Cancellation Indicator": "string", "Zero Balance Code": "string", "Zero Balance Effective Date": "object",
              "UPB at the Time of Removal": "float64", "Repurchase Date": "object", "Scheduled Principal Current": "float64", "Total Principal Current": "float64", "Unscheduled Principal Current": "float64", 
              "Last Paid Installment Date": "object", "Foreclosure Date": "object", "Disposition Date": "object", "Foreclosure Costs": "float64", "Property Preservation and Repair Costs": "float64",
              "Asset Recovery Costs": "float64", "Miscellaneous Holding Expenses and Credits": "float64", "Associated Taxes for Holding Property": "float64", "Net Sales Proceeds": "float64", "Credit Enhancement Proceeds": "float64", 
              "Repurchase Make Whole Proceeds": "float64", "Other Foreclosure Proceeds": "float64", "Modification-Related Non-Interest Bearing UPB": "float64", "Principal Forgiveness Amount": "float64", "Original List Start Date": "object", 
              "Original List Price": "float64", "Current List Start Date": "object", "Current List Price": "float64", "Borrower Credit Score At Issuance": "float64", "Co-Borrower Credit Score At Issuance": "float64", 
              "Borrower Credit Score Current": "float64", "Co-Borrower Credit Score Current": "float64", "Mortgage Insurance Type": "string", "Servicing Activity Indicator": "string", "Current Period Modification Loss Amount": "float64",
              "Cumulative Modification Loss Amount": "float64", "Current Period Credit Event Net Gain or Loss": "float64", "Cumulative Credit Event Net Gain or Loss": "float64", "Special Eligibility Program": "string", "Foreclosure Principal Write-off Amount": "float64", 
              "Relocation Mortgage Indicator": "string", "Zero Balance Code Change Date": "object", "Loan Holdback Indicator": "string", "Loan Holdback Effective Date": "object", "Delinquent Accrued Interest": "float64", 
              "Property Valuation Method": "string", "High Balance Loan Indicator": "string", "ARM Initial Fixed-Rate Period  ≤ 5 YR Indicator": "string", "ARM Product Type": "string", "Initial Fixed-Rate Period": "float64", 
              "Interest Rate Adjustment Frequency": "float64", "Next Interest Rate Adjustment Date": "object", "Next Payment Change Date": "object", "Index": "string", "ARM Cap Structure": "string", 
              "Initial Interest Rate Cap Up Percent": "float64", "Periodic Interest Rate Cap Up Percent": "float64", "Lifetime Interest Rate Cap Up Percent": "float64", "Mortgage Margin": "float64", "ARM Balloon Indicator": "string",
              "ARM Plan Number": "float64", "Borrower Assistance Plan": "string", "High Loan to Value (HLTV) Refinance Option Indicator": "string", "Deal Name": "string", "Repurchase Make Whole Proceeds Flag": "string", 
              "Alternative Delinquency Resolution": "string", "Alternative Delinquency Resolution Count": "float64", "Total Deferral Amount": "float64"
             }

Several columns do not apply to, or do not vary across, the loans included in this dataset and can be excluded from analysis4. These columns are:

cols_na = ["Reference Pool ID", "Master Servicer", "UPB at Issuance", "Prepayment Penalty Indicator", "Interest Only Loan Indicator", 
           "Interest Only First Principal And Interest Payment Date", "Months to Amortization""Mortgage Insurance Cancellation Indicator", "Repurchase Date", "Scheduled Principal Current", "Unscheduled Principal Current", 
           "Original List Start Date", "Original List Price", "Current List Start Date", "Current List Price", "Borrower Credit Score At Issuance", 
           "Co-Borrower Credit Score At Issuance", "Borrower Credit Score Current", "Co-Borrower Credit Score Current", "Current Period Modification Loss Amount", "Cumulative Modification Loss Amount", 
           "Current Period Credit Event Net Gain or Loss", "Cumulative Credit Event Net Gain or Loss", "Zero Balance Code Change Date", "Loan Holdback Indicator", "Loan Holdback Effective Date", 
           "Delinquent Accrued Interest", "ARM Initial Fixed-Rate Period  ≤ 5 YR Indicator", "ARM Product Type", "Initial Fixed-Rate Period", "Interest Rate Adjustment Frequency", 
           "Next Interest Rate Adjustment Date", "Next Payment Change Date", "Index", "ARM Cap Structure", "Initial Interest Rate Cap Up Percent", 
           "Periodic Interest Rate Cap Up Percent", "Lifetime Interest Rate Cap Up Percent", "Mortgage Margin", "ARM Balloon Indicator", "ARM Plan Number", 
           "Deal Name"
          ]

The remaining columns or fields, aside from “Loan Identifier” and “Monthly Reporting Period”, can be partitioned into two sets: static and dynamic. In general, a loan’s static columns fields do not change over reporting periods and typically denote loan or borrower characteristics at the time of origination or acquisition; e.g. “Original UPB”. Dynamic fields, on the other hand, can or are expected to change over time; e.g. “Current Actual UPB”. We provide below a list of static and of dynamic column names:

col_static = ["Channel", "Seller Name", "Original Interest Rate", "Original UPB", "Original Loan Term", 
              "Origination Date", "First Payment Date", "Original Loan to Value Ratio (LTV)", "Original Combined Loan to Value Ratio (CLTV)", "Number of Borrowers", 
              "Debt-To-Income (DTI)", "Borrower Credit Score at Origination", "Co-Borrower Credit Score at Origination", "First Time Home Buyer Indicator", "Loan Purpose", 
              "Property Type", "Number of Units", "Occupancy Status", "Property State", "Metropolitan Statistical Area (MSA)", 
              "Zip Code Short", "Mortgage Insurance Percentage", "Amortization Type", "Mortgage Insurance Type", "Special Eligibility Program", 
              "Relocation Mortgage Indicator", 
             ]
col_dynamic = ["Servicer Name", "Current Interest Rate", "Current Actual UPB", "Loan Age", "Remaining Months to Legal Maturity", 
               "Remaining Months To Maturity", "Maturity Date", "Current Loan Delinquency Status", "Loan Payment History", "Modification Flag", 
               "Zero Balance Code", "Zero Balance Effective Date", "UPB at the Time of Removal", "Total Principal Current", "Last Paid Installment Date", 
               "Foreclosure Date", "Disposition Date", "Foreclosure Costs", "Property Preservation and Repair Costs", "Asset Recovery Costs", 
               "Miscellaneous Holding Expenses and Credits", "Associated Taxes for Holding Property", "Net Sales Proceeds", "Credit Enhancement Proceeds", "Repurchase Make Whole Proceeds", 
               "Other Foreclosure Proceeds", "Modification-Related Non-Interest Bearing UPB", "Principal Forgiveness Amount", "Servicing Activity Indicator", "Foreclosure Principal Write-off Amount", 
               "Property Valuation Method", "High Balance Loan Indicator", "Borrower Assistance Plan", "High Loan to Value (HLTV) Refinance Option Indicator", "Repurchase Make Whole Proceeds Flag", 
               "Alternative Delinquency Resolution", "Alternative Delinquency Resolution Count", "Total Deferral Amount"
               ] 

Note: We listed “Metropolitan Statistical Area (MSA)” as a static column though, strictly speaking, MSA definitions do change over time.

Finally, it is important to recognize date columns and parse them as such during file import:

col_dates = ["Monthly Reporting Period", "Origination Date", "First Payment Date", "Maturity Date", "Interest Only First Principal And Interest Payment Date",  
             "Zero Balance Effective Date", "Repurchase Date", "Last Paid Installment Date", "Foreclosure Date", "Disposition Date", 
             "Original List Start Date", "Current List Start Date", "Zero Balance Code Change Date", "Loan Holdback Effective Date", "Next Interest Rate Adjustment Date", 
             "Next Payment Change Date"]

Helpful resources

Fannie Mae provides various resources to help access, understand, and summarize the data. In particular, start by examining this Loan Performance Data Tutorial and browse the other resources available on their website here.

You may also find this case expository note Working with Large Datasets helpful.

Footnotes

  1. Fannie Mae updates the dataset quarterly↩︎

  2. We ignore the accompanying HARP dataset↩︎

  3. The “|” delimiter is preferable when some values include commas↩︎

  4. Fannie Mae provides them for compatibility with other dataset disclosures↩︎