How to spot a faker!

November 6, 2019

In a fascinating article published in the New York Times, Malcolm Browne relates how Dr. Theodore Hill would ask his mathematics students to go home and either toss a coin 200 times and record the results, or else pretend that they had done so. Either way, they were to hand in the results of their (real or imaginary) coin-tossing experiment.

Dr. Hill’s purpose was to show just how difficult it is to fake data convincingly: it simply isn’t easy to make up a random sequence. Armed with this knowledge, he would astound his students by picking out the fakers from the genuine tossers almost unerringly!

One way he did this was to look for runs of six or more heads, or six or more tails, in a row. In 200 genuine tosses such a run is overwhelmingly likely (the probability is better than 95%). Most of his students found such long runs counter-intuitive, an example of what is often termed the Gambler’s Fallacy: the mistaken belief that independent random events will balance out over time, so that, for example, a long run of heads makes a tail more likely than a head on the next toss. The fakers, susceptible to the Fallacy, avoided long runs and were thus easily exposed. Ordinary people, even mathematics students, simply can’t help introducing patterns into what is random noise.
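
Hill’s “overwhelmingly likely” can be checked exactly. The sketch below is an illustration, not his method: `prob_run_at_least` (a hypothetical helper) assumes a fair coin and tracks, toss by toss, the probability of each current streak length, while `longest_run` (also hypothetical) measures the longest streak in a submitted sheet written as a string of ‘H’ and ‘T’.

```python
def prob_run_at_least(n_tosses: int, run: int) -> float:
    """Exact probability that n fair tosses contain `run` or more identical faces in a row."""
    if run < 2:
        raise ValueError("run must be at least 2")
    # state[k]: probability that no qualifying run has happened yet and the
    # current streak (of whichever face came up last) has length k
    state = [0.0] * run
    state[1] = 1.0  # after the first toss the streak has length 1
    hit = 0.0       # probability that a qualifying run has already occurred
    for _ in range(n_tosses - 1):
        new_state = [0.0] * run
        for k in range(1, run):
            p = state[k]
            if k + 1 == run:
                hit += p / 2               # same face again completes the run
            else:
                new_state[k + 1] += p / 2  # same face again: streak grows
            new_state[1] += p / 2          # other face: streak restarts at 1
        state = new_state
    return hit


def longest_run(tosses: str) -> int:
    """Length of the longest streak of identical characters, e.g. in 'HHTHTTT'."""
    best = current = 0
    previous = None
    for toss in tosses:
        current = current + 1 if toss == previous else 1
        previous = toss
        best = max(best, current)
    return best


print(prob_run_at_least(200, 6))
print(longest_run("HTTHHHHHHT"))  # 6
```

For 200 tosses the first call should come out at roughly 0.96 to 0.97, so a sheet whose `longest_run` is below 6 is a strong hint of faking, though not proof.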

The same insight, that invented numbers rarely share the statistical fingerprint of genuine ones, lies behind a broader analysis usually referred to as Benford’s Law. Essentially, it states that if we randomly select a number from a table of real-life data, its first digit is not equally likely to be any of 1 to 9: small leading digits turn up far more often than large ones. For example, the probability that the first digit will be a ‘1’ is about 30%, rather than the 11% (one in nine) we might intuitively expect if all nine possible leading digits were equally likely. Benford’s Law applies to the leading digits of many naturally occurring quantities, such as the populations of different countries or the heights of mountains.

To try it yourself, take a newspaper with a lot of numbers in it and circle those that occur naturally, such as stock prices. The lengths of rivers or the areas of lakes could be included, but not artificial numbers like telephone numbers. Around 30% of the circled numbers will start with a 1, and it doesn’t matter what units they are in: the lengths of rivers could be given in kilometres, miles, feet or centimetres without changing the frequency distribution of the leading digits.
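
The claim that units don’t matter can be tried in a quick simulation. The sketch below uses a stand-in, not real river data: it assumes, purely for illustration, that “naturally occurring” lengths can be modelled by a log-normal distribution spread over several orders of magnitude. It counts leading digits, converts the same values from kilometres to miles and to centimetres, and counts again. The names `leading_digit`, `digit_frequencies` and `lengths_km` are hypothetical.

```python
import math
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:e}"[0])


def digit_frequencies(values):
    """Share of values whose leading digit is 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


# Illustrative assumption: synthetic "lengths in km", log-normally distributed
# over several orders of magnitude. These are not real measurements.
rng = random.Random(2019)
lengths_km = [rng.lognormvariate(5, 3) for _ in range(100_000)]

KM_TO_MILES = 0.621371
KM_TO_CM = 100_000

for label, factor in [("km", 1), ("miles", KM_TO_MILES), ("cm", KM_TO_CM)]:
    freqs = digit_frequencies([v * factor for v in lengths_km])
    print(label, " ".join(f"{d}:{freqs[d]:.3f}" for d in range(1, 10)))

print("Benford", " ".join(f"{d}:{math.log10(1 + 1 / d):.3f}" for d in range(1, 10)))
```

All three rows should sit close to the Benford row, with about 30% of values starting with a 1 whichever unit is used.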

The empirical support for this proportion can be traced to the man after whom the Law is named, the physicist Dr. Frank Benford, who in 1938 published a paper called ‘The Law of Anomalous Numbers’. In it he examined 20,229 numbers drawn from 20 very different sources, from baseball statistics and the areas of rivers to numbers appearing in magazine articles, and found that about 30% of them began with a 1. For comparison, the probability of a leading ‘2’ is 17.6%, and of a leading ‘9’ just 4.6%. Strictly speaking, Benford’s Law describes leading digits; further along a number the digits become progressively more evenly spread, so each possible last digit of genuine figures should turn up roughly 10% of the time. Both patterns make digit counts a great way of checking the veracity of receipts. If, for example, an unusually large number of amounts end in a ‘7’, there’s a decent chance that the figures are cooked.
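
One common way to turn “an unusual number of 7s” into a number is Pearson’s chi-square goodness-of-fit test. The sketch below is illustrative only: the amounts are assumed to be positive whole numbers of pence, the “genuine” data are simulated from a wide log-normal distribution (an assumption, not real receipts), the two fake data sets are invented to mimic a faker’s habits, and the function names are hypothetical. It compares leading digits with Benford’s proportions and last digits with a flat 10% each.

```python
import math
import random
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical values of the chi-square distribution
CRIT_LEADING = 15.51  # 9 leading digits -> 8 degrees of freedom
CRIT_LAST = 16.92     # 10 last digits -> 9 degrees of freedom


def leading_digit_chi2(amounts):
    """Pearson chi-square of leading digits against Benford's Law."""
    n = len(amounts)
    counts = Counter(int(str(a)[0]) for a in amounts)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def last_digit_chi2(amounts):
    """Pearson chi-square of last digits against a uniform 10% each."""
    n = len(amounts)
    counts = Counter(a % 10 for a in amounts)
    expected = n / 10
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


rng = random.Random(1938)
# Assumed stand-in for genuine amounts in pence: log-normal over several decades.
genuine = [a for a in (int(rng.lognormvariate(12, 2.5)) for _ in range(2000)) if a >= 100]
# Invented fakes: every leading digit equally likely...
flat_fakes = [rng.randint(100, 9999) for _ in range(2000)]
# ...and amounts that all end in 7.
seven_fakes = [rng.randint(10, 999) * 10 + 7 for _ in range(2000)]

for name, data in [("genuine", genuine), ("flat", flat_fakes), ("sevens", seven_fakes)]:
    lead, last = leading_digit_chi2(data), last_digit_chi2(data)
    print(f"{name:8} leading {lead:8.1f} ({'suspicious' if lead > CRIT_LEADING else 'ok'})"
          f"  last {last:8.1f} ({'suspicious' if last > CRIT_LAST else 'ok'})")
```

The flat fakes should fail the leading-digit test and the sevens the last-digit test (they fail the leading-digit one too). The genuine sample would be expected to pass both, although at a 5% threshold a genuine data set still fails about one time in twenty, so a high score is a reason to look more closely rather than proof of fraud.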

To explain the basis of Benford’s Law, take £1 and assume it grows at 10% per day, compounding:

£1.10, £1.21, £1.33, £1.46, £1.61, £1.77, £1.95, £2.14, £2.36, £2.59, £2.85, £3.14, £3.45, £3.80, £4.18, £4.59, £5.05, £5.56, £6.12, £6.73, £7.40, £8.14, £8.95, £9.85, £10.83, £11.92, £13.11, £14.42, £15.86, £17.45, £19.19, £21.11, £23.23, £25.55, £28.10, £30.91, £34.00, £37.40, £41.14, £45.26, £49.79, £54.76, £60.24, £66.26, £72.89, £80.18, £88.20, £97.02 …

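So we see that the total spends a long time with a leading 1 (seven days between £1 and £2, and seven more between £10 and £20), less time with a leading 2 (four days between £2 and £3, and four between £20 and £30), and so on up to the 90s, a pattern that continues through three digits and beyond. Benford noticed that the probability that a number starts with the digit $n$ is $P(n) = \log_{10}(n+1) - \log_{10} n$.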

NB: $\log_{10} 1 = 0$; $\log_{10} 2 \approx 0.301$; $\log_{10} 3 \approx 0.4771$; …; $\log_{10} 10 = 1$.
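
One way to see where the formula comes from (a standard heuristic, not a proof, and not necessarily Benford’s own argument) is to assume, as in the growth example, that the logarithm of a value is spread evenly across each power of ten. A number $x$ has leading digit $n$ exactly when the fractional part of $\log_{10} x$ lies between $\log_{10} n$ and $\log_{10}(n+1)$, so:

$$
\begin{aligned}
P(n) &= \log_{10}(n+1) - \log_{10} n = \log_{10}\!\left(1 + \frac{1}{n}\right), \\
P(1) &= \log_{10} 2 - \log_{10} 1 \approx 0.301, \qquad P(9) = \log_{10} 10 - \log_{10} 9 \approx 0.046, \\
\sum_{n=1}^{9} P(n) &= \log_{10} 10 - \log_{10} 1 = 1.
\end{aligned}
$$

The last line is a telescoping sum: every intermediate term cancels, so the nine probabilities add up to exactly 1. Changing units multiplies every value by a constant $c$, which adds $\log_{10} c$ to every logarithm; an even spread stays even, which is why the units don’t matter.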

| Leading digit | Probability |
|---|---|
| 1 | 30.1% |
| 2 | 17.6% |
| 3 | 12.5% |
| 4 | 9.7% |
| 5 | 7.9% |
| 6 | 6.7% |
| 7 | 5.8% |
| 8 | 5.1% |
| 9 | 4.6% |
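
The table can be checked against the growth example. In the sketch below (the function names are illustrative), the leading digit of 1.1 raised to the power of the day number is read off the fractional part of day × log10(1.1), which avoids overflowing floating-point numbers over long periods. It counts leading digits over the first 48 days, then over 10,000 days, and prints the long-run shares beside Benford’s formula.

```python
import math
from collections import Counter

DAILY_GROWTH = 1.10  # £1 growing at 10% per day, as in the example
LOG_STEP = math.log10(DAILY_GROWTH)


def leading_digit_after(days: int) -> int:
    """Leading digit of 1.1**days, computed from the fractional part of its log10."""
    mantissa = (days * LOG_STEP) % 1.0
    return int(10 ** mantissa)


def count_leading_digits(n_days: int) -> Counter:
    """How many of days 1..n_days have each leading digit."""
    return Counter(leading_digit_after(day) for day in range(1, n_days + 1))


first_48 = count_leading_digits(48)
print("First 48 days:", dict(sorted(first_48.items())))

N = 10_000
long_run = count_leading_digits(N)
for d in range(1, 10):
    print(f"{d}: observed {long_run[d] / N:.3f}  Benford {math.log10(1 + 1 / d):.3f}")
```

Over the first 48 days, 14 of the totals start with a 1 (about 29%); over 10,000 days the observed shares settle very close to the table.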

 

Tax authorities are alert to this, or ought to be, which should make fraudulent activity that little bit easier to detect, especially when the fraudster is unaware of the Benford distribution. For all right-minded citizens, we can call that Benford’s Bonus.

Links:

http://www.rexswain.com/benford.html

http://www.jstor.org/pss/984802

 
