On February 16, 2017, the Consumer Financial Protection Bureau (“CFPB”) issued a request for information (“RFI”) regarding the potential benefits and risks of relying on alternative data and modeling techniques in the consumer credit marketplace.

In contrast with information that has traditionally defined a consumer’s credit history—such as debt repayment history and the existence of current or past loans, mortgages, and debt collection actions—the RFI describes “alternative data” as encompassing an array of nontraditional sources that lenders might consider when assessing consumers, such as:

  • Payment data relating to non-loan products requiring regular payments, such as rent, insurance, utilities, or cell phone bills.
  • Information about a consumer’s assets, which could include the regularity of a consumer’s cash inflows and outflows or information about prior income or expense shocks.
  • Data that may be related to a consumer’s stability, such as the frequency of changes in residences, employment, phone numbers, or email addresses.
  • Information about a consumer’s educational or occupational attainment, such as schools attended, degrees obtained, and job positions held.
  • Behavioral data, such as how consumers interact with a web interface or answer specific questions, or data about how they shop, browse, use devices, or move about their daily lives.
  • Data about consumers’ social and professional connections, including on social media.

As examples of “alternative modeling techniques,” the RFI lists the following machine learning methods: “decision trees, random forests, artificial neural networks, k-nearest neighbor, genetic programming, ‘boosting’ algorithms, etc.”

In remarks at a field hearing on alternative data, CFPB Director Richard Cordray stated that alternative data and modeling techniques could help expand access to financial products and services to some of the 26 million Americans the agency estimates are “credit invisible” due to a lack of credit history and to some of the additional 19 million Americans who lack sufficient traditional data for generating a credit score.

The RFI identifies the following as potential benefits from expanded use of alternative data and modeling techniques:

  • Greater credit access, particularly for the unbanked and underbanked.
  • Enhanced creditworthiness predictions, including for consumers who already have usable credit histories.
  • More timely information, lower costs, and better service and convenience, largely through the increased automation of credit-checking tasks.

As potential risks or challenges, the RFI highlights:

  • Discrimination risks, given that alternative data can potentially serve as proxies for membership in groups protected by antidiscrimination laws.
  • Privacy issues, in terms of both data collection and sharing.
  • Data quality issues, arising either from the nature of the data used or from the absence of the accuracy and quality obligations that commonly apply to traditional data.
  • Issues with transparency, controlling and correcting data, and consumer education, particularly given the difficulty of explaining complex machine learning processes in consumer-friendly terms.
  • Increased difficulties in improving one’s credit standing, especially when the underlying calculation includes data outside the consumer’s control, such as data relating to behavior by peers or broader consumer segments.
  • Unintended or undesirable effects, such as the possibility of an algorithm mistaking a servicemember’s frequent changes in base assignments for a sign of residential instability.
  • Risks associated with other legal obligations, such as CRA requirements or prohibitions on unfair, deceptive, or abusive acts or practices (“UDAAP”).