
How Fintechs Can Employ Benford’s Law to Detect Fraud

April 7, 2023
5 minutes

As we continue to identify sources and expand our data library, we are always looking for ways to validate our data, both to improve decisioning accuracy and to detect fraud.

For risk professionals leveraging technology, the goal is efficiency and accuracy, while further enhancing the reliability of the data being consumed.

Our investigations, model testing and recent integration of Benford's law into our own work have proven beneficial, and its wide acceptance and use among the financial regulatory and services communities is encouraging.

Benford's law is a statistical tool that can be used to detect potential fraud or anomalies in numerical datasets. It is based on the observation that in many naturally occurring datasets, the first digit of the numbers follows a specific pattern, with the digit 1 being the most common, followed by 2, 3, and so on, up to 9.

Astronomer Simon Newcomb first recognized the pattern in 1881 (his paper is here [1]). However, knowledge and use of the law only began to take hold after 1938, when physicist Frank Benford tested Newcomb's hypothesis against 20 sets of data and published a scholarly paper verifying it (a great abstract is here [2]). Credit sometimes goes to the person who popularizes an idea rather than the one who first observes it, as it did here, and the pattern is now commonly referred to as Benford's Law.

Benford’s Law, Distribution of the First Digit

Digit (d)    Ideal frequency
1            30.10%
2            17.60%
3            12.50%
4            9.70%
5            7.90%
6            6.70%
7            5.80%
8            5.10%
9            4.60%
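
The ideal percentages above follow from the standard statement of the law, under which a leading digit d occurs with probability log10(1 + 1/d). A minimal Python sketch that reproduces them (the function name `benford_expected` is ours, chosen for illustration):

```python
import math

def benford_expected():
    """Return {digit: expected share} for leading digits 1-9 under Benford's law."""
    # P(d) = log10(1 + 1/d); the nine shares sum to 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, share in benford_expected().items():
    print(f"{digit}  {share:.2%}")
```

The printed values agree with the table to one decimal place; for example, digit 2 comes out at 17.61% against the table's rounded 17.60%.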

As we began to build models to test our own datasets, we were encouraged to see this pattern recurring. At the borrower level, we found these models helpful in analyzing bank data, company financials and tax returns. At the macroeconomic level, running these models on industry data, economic health and population data proved valuable as a test of the accuracy of the data we use in our models.

Benford’s Law, Looking at Transaction Data

Digit (d)    Bank 1    Bank 2
1            31.00%    19.00%
2            16.00%    17.20%
3            14.00%    12.50%
4            10.30%    26.80%
5            8.00%     5.00%
6            7.00%     6.20%
7            0.01%     4.10%
8            4.50%     1.50%
9            4.20%     7.70%

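In this magnified example, Bank 1's share for digit 7 (0.01%) and Bank 2's shares for digits 1 and 4 (19.00% and 26.80%) sit far from the ideal distribution. The point is to identify datasets that require extra attention.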
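
One simple way to turn a table like this into a screening signal is to compare each digit's observed share with its expected share and flag the large gaps. The sketch below is illustrative only: the 5-percentage-point threshold and the helper name `flag_digits` are assumptions, not a description of our production models, and the bank figures are the magnified ones from the table.

```python
import math

# Expected leading-digit shares under Benford's law, in percent.
EXPECTED = {d: math.log10(1 + 1 / d) * 100 for d in range(1, 10)}

# Observed shares (percent) for digits 1-9, copied from the table.
BANK_1 = [31.0, 16.0, 14.0, 10.3, 8.0, 7.0, 0.01, 4.5, 4.2]
BANK_2 = [19.0, 17.2, 12.5, 26.8, 5.0, 6.2, 4.1, 1.5, 7.7]

def flag_digits(observed_pct, threshold=5.0):
    """Return digits whose observed share is more than `threshold`
    percentage points away from the expected share (threshold is assumed)."""
    flagged = []
    for d, obs in zip(range(1, 10), observed_pct):
        if abs(obs - EXPECTED[d]) > threshold:
            flagged.append(d)
    return flagged

print("Bank 1:", flag_digits(BANK_1))
print("Bank 2:", flag_digits(BANK_2))
```

With the table's figures this prints `[7]` for Bank 1 and `[1, 4]` for Bank 2. A flag is a prompt for review, not proof of manipulation.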

Take care, as Benford's law has some limitations, which include:

Sample size: Benford's law is more reliable when applied to large datasets whose numbers have three, four or more digits, and may not be reliable when applied to small datasets.

Selection bias: If the data is not representative of the entire population, such as credit scores with lower and upper limits, or is deliberately manipulated, as is the case with the calculated percentage of revenue available for debt service, this law provides little benefit.
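
The sample-size caveat can be enforced before any comparison is made. A minimal sketch, assuming the raw amounts are available as finite numbers; the helper names and the 1,000-record floor (`MIN_RECORDS`) are hypothetical, chosen for illustration rather than as an established cutoff:

```python
from collections import Counter

MIN_RECORDS = 1000  # assumed illustrative floor, not an established cutoff

def leading_digit(amount):
    """Return the first non-zero digit of a finite number, or None for zero."""
    # In scientific notation the first character is the leading digit.
    text = f"{abs(amount):.15e}"
    return int(text[0]) if text[0] != "0" else None

def first_digit_shares(amounts, min_records=MIN_RECORDS):
    """Return {digit: share} for digits 1-9, or None if the sample is too small."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    if len(digits) < min_records:
        return None  # too few usable records to compare against the law
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

sample = [1234.50, 0.0456, 987, 0, 15.75]
print([leading_digit(x) for x in sample])
print(first_digit_shares(sample))
```

For the five-value sample the leading digits are 1, 4, 9, none (for zero) and 1, and `first_digit_shares` returns `None` because the sample falls below the floor. Selection bias, by contrast, is a judgment about where the data came from and is not something a check like this can detect.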

Overall, while Benford's law is a useful tool for detecting anomalies in datasets, it should be used with caution and in conjunction with other safeguards (chain-of-custody) and analytical methods (overlap and inferences among sources) to ensure its validity.

If you’re interested in diving deeper into the sources used to write this, please see here [3] for an explanation and derivation of the law, and here [4] for a practical example using Microsoft Excel.


Footnotes

1. Note on the Frequency of Use of the Different Digits in Natural Numbers, Simon Newcomb, 1881.
2. The Law of Anomalous Numbers, Frank Benford, 1938.
3. A Quick Introduction to Benford’s Law, Steven J. Miller, 2015.
4. Using Excel and Benford’s Law to detect fraud, J. Carlton Collins, CPA, April 1, 2017.

Sherif Hassan is the principal of Syh Strategies, a financial and technology services advisory firm based in New York City. Among the services they provide in Lending are business strategy, portfolio analysis, credit modeling, product and pricing optimization, and operations architecture for lenders and brokers of all sizes and at all stages of development. He can be contacted at sherif@syhstrategies.com.
