Intuitor.com
by Tom Rogers

Benford's Law Part 1 - How to Spot Tax Fraud

Most people assume that, because our number system uses the digits 1 through 9, the odds of randomly obtaining any one of them as the first significant digit of a number are 1/9, or about 11%. (First significant digit means we ignore leading zeros.) This assumption works well for fake data generated with a random number generator or for the type of data an embezzler would create. With naturally occurring data it generally isn't true. The odds of obtaining a 1 as the first significant digit of a number are much higher than the odds of obtaining any other digit, as shown below:

 

Digit                                 1     2     3     4     5     6     7     8     9
Odds of Obtaining as 1st Digit (%)    30.1  17.6  12.5  9.7   7.9   6.7   5.8   5.1   4.6

This rather amazing fact was discovered in 1881 by the American astronomer Simon Newcomb. Pocket calculators were not even a dream at that time. Calculations were made using pencil and paper. Books with page after page of logarithm tables were used for complex calculations. Newcomb noticed that the pages of the logarithm books containing numbers starting with 1 were much more worn than the other pages. After analyzing several sets of naturally occurring data Newcomb went on to derive what later became Benford's law. Newcomb was rewarded for his effort by being ignored.


In 1938 a physicist, Dr. Frank Benford, made the same discovery. However, he studied a much larger amount of data than Newcomb. He analyzed 20,229 observations drawn from 20 different sets of data, including the areas of rivers, baseball statistics, numbers in magazine articles and the street addresses of the first 342 people listed in the book "American Men of Science" (ref 1). Unlike Newcomb, Benford was recognized for his contributions, and the relationship he derived was eventually named Benford's law in his honor.

When the logarithms of the digits 1 through 9 are plotted they look like the number line shown below:

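[Figure: Logarithmic Scale. A number line running from 1 to 10 with marks at the digits 1 through 9. The segment that starts at each digit takes up the following share of the total length: 1 = 30.1%, 2 = 17.6%, 3 = 12.5%, 4 = 9.7%, 5 = 7.9%, 6 = 6.7%, 7 = 5.8%, 8 = 5.1%, 9 = 4.6%.]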

This means that all numbers starting with a "1" will occupy 30.1% of the total length of the scale. Numbers like 1.23784, 1.5, or 1.879 would fall in this region.

Note that these relative distances are independent of the power of ten a number is multiplied by. For example, the distance between 0.001 and 0.002 on a logarithmic scale is identical to the distance between 1000 and 2000. In other words, the distance between 1 x 10^-3 and 2 x 10^-3 is identical to the distance between 1 x 10^3 and 2 x 10^3. Again, the power of ten makes no difference on a logarithmic scale.
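As a quick check of this point, the short Python sketch below measures both distances on a base-10 logarithmic scale. The helper name log_distance is made up for this illustration.

```python
import math

def log_distance(a, b):
    """Distance from a to b on a base-10 logarithmic scale."""
    return math.log10(b) - math.log10(a)

# Same leading digits, very different powers of ten.
small = log_distance(0.001, 0.002)
large = log_distance(1000, 2000)

print(round(small, 4), round(large, 4))  # prints: 0.301 0.301
```

Both distances equal log10(2), which is the 30.1% share of the scale taken up by numbers starting with 1.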

Leading zeros in a decimal fraction are not considered first significant digits because they are only placeholders that indicate the location of the decimal point. For example, 0.001 would be written as 1 x 10^-3, so 1 is its first significant digit.
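One simple way to pull the first significant digit out of a number is to scan its text form for the first nonzero digit. This is only a sketch: the function name first_significant_digit is hypothetical, and it assumes the value is an ordinary Python int or float.

```python
def first_significant_digit(value):
    """Return the first nonzero digit of value, ignoring sign and leading zeros."""
    # str() may produce forms like '0.001', '2000' or '7.89e-05'. In every
    # case the first nonzero digit character belongs to the leading digit,
    # because any exponent comes after the mantissa.
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no first significant digit")

for sample in (0.001, 1.23784, 2000, 0.0456):
    print(sample, "->", first_significant_digit(sample))
```

Scanning the text avoids the rounding surprises that can appear when a value is divided by a power of ten in floating point.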

Benford reasoned that the length of the distance from one number to the next divided by the length of the entire scale would give the probability of the digit being the first one in a given data value. Mathematically this is expressed as follows for base 10 numbers:

$$
\frac{\log_{10}(n+1) - \log_{10} n}{\log_{10} 10 - \log_{10} 1} = \log_{10}(n+1) - \log_{10} n = \log_{10}\left(1 + \frac{1}{n}\right)
$$

where: n = the first significant digit of a number
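The relationship is easy to evaluate numerically. The sketch below computes Log10 (1+1/n) for each first digit; the function name benford_probability is chosen here for illustration only.

```python
import math

def benford_probability(n):
    """Probability that n (1 through 9) is the first significant digit."""
    return math.log10(1 + 1 / n)

for digit in range(1, 10):
    print(digit, f"{100 * benford_probability(digit):.1f}%")

# The nine probabilities cover the whole scale, so they add up to 1.
total = sum(benford_probability(d) for d in range(1, 10))
```

The printed percentages are 30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1 and 4.6, matching the table near the top of the page.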

Notice that if a data entry (base 10) begins with a 1, the entry has to be at most doubled to have a first significant digit of 2. However, if a data entry begins with a 9, it only has to be increased by, at most, 11% to change the first significant digit into a 1. This once again illustrates that a first significant digit of 1 is more likely to occur than a 9.
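The arithmetic behind this comparison can be sketched in a few lines of Python. The helper name max_increase_to_next_digit is hypothetical; it takes the smallest value with a given first digit (for example 1.000 or 9.000) and reports the percent increase needed to reach the next first digit.

```python
def max_increase_to_next_digit(n):
    """Percent increase that carries n.000... up to (n + 1).000..."""
    return 100 * ((n + 1) / n - 1)

for digit in (1, 9):
    print(digit, f"{max_increase_to_next_digit(digit):.1f}%")
# prints:
# 1 100.0%
# 9 11.1%
```

A value starting with 9 moves on to the next first digit (the 1 in 10) after a much smaller relative change than a value starting with 1 needs to reach 2.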

Benford's law has been used as a method for spotting fraudulent accounting data by looking at the first significant digit of each data entry and comparing the actual frequency of occurrence with the predicted frequency. Most white-collar criminals are unaware of Benford's law and will use each digit about equally often (roughly 11% of the time) for the first significant digit in a number.
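A comparison of this kind might look like the Python sketch below. It is an illustration under stated assumptions, not a description of how auditors actually do it: the chi-square goodness-of-fit statistic is one common way to compare observed and predicted counts, the two data sets are made up (powers of 2 stand in for Benford-like data, and the integers 100 through 999 stand in for evenly spread invented entries), and all function and variable names are hypothetical.

```python
import math
from collections import Counter

def first_digit(value):
    """First nonzero digit of value, or None if the value is zero."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

def benford_chi_square(values):
    """Chi-square statistic of observed first digits against Benford's law."""
    digits = [d for d in map(first_digit, values) if d is not None]
    counts = Counter(digits)
    statistic = 0.0
    for n in range(1, 10):
        expected = len(digits) * math.log10(1 + 1 / n)
        statistic += (counts[n] - expected) ** 2 / expected
    return statistic

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CRITICAL_5_PERCENT = 15.507

benford_like = [2 ** k for k in range(1, 301)]  # made-up stand-in data
evenly_spread = list(range(100, 1000))          # made-up "invented" entries

for name, data in (("benford_like", benford_like), ("evenly_spread", evenly_spread)):
    statistic = benford_chi_square(data)
    verdict = "suspicious" if statistic > CRITICAL_5_PERCENT else "consistent"
    print(name, round(statistic, 1), verdict)
```

A statistic above the critical value only flags a data set for a closer look; it is not proof of fraud.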

Benford's law doesn't work for numbers controlled to a specific value, nor does it work for truly random numbers such as those generated by a random number generator. Benford's law also doesn't work well for small sample sizes. However, it holds true in a surprising number of situations. Benford's law shows that natural processes can be remarkably resistant to complete randomness. 
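The random number generator case is easy to try out. The sketch below draws integers evenly from 1 through 999 with Python's standard random module and tallies the first digits. The seed, the range and the sample size are arbitrary choices made for this illustration.

```python
import random
from collections import Counter

def first_digit(value):
    """First nonzero digit of a nonzero number."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no first significant digit")

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [rng.randint(1, 999) for _ in range(90_000)]

counts = Counter(first_digit(v) for v in draws)
frequencies = {n: counts[n] / len(draws) for n in range(1, 10)}

for n in range(1, 10):
    print(n, f"{100 * frequencies[n]:.1f}%")
```

Each digit turns up close to 1/9 of the time (about 11%), nowhere near the 30.1% that Benford's law predicts for the digit 1.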

References:

1. Malcolm W. Browne, "Following Benford's Law, or Looking Out for No. 1", The New York Times, Tuesday, August 4, 1998.

2. T. P. Hill, "The First-Digit Phenomenon", American Scientist, July-August 1998.

Copyright © 1996-2001 Intuitor.com, all rights reserved
on the web since April 2, 1996