Showing how Benford's law applies to real-life data

Introduction

We have been discussing in TOK whether mathematics is discovered or invented, and I have always thought that it was invented. But when I heard that mathematics can explain things in the real world such as magnetism and waves, I immediately wanted to see with my own eyes whether mathematics really applies to the real world. And even if we can "see" mathematics in nature, does it actually explain anything, or can we use it somehow? Finding mathematical explanations in nature is great, but if we could use that information in real life, that would be glorious. Studying Benford's law is interesting because the law applies to so many real-life lists of numbers, such as house prices, population numbers, and death rates.
To do that, I will investigate 3 sets of data, calculate the incidence of the leading digits, and see how well it matches the distribution of leading digits predicted by the law. For me, the aim of this exploration is to see how mathematics can be seen in nature. My sources are mostly electronic, as literature on the topic of this exploration was not available in my library in my country. One of the main sources is Wikipedia, as it contained the most detailed information about the law and many other sources recommended it for more detail.
I also used one university-level essay, which is more detailed than Wikipedia, but the mathematics in it is above IB standard level.

Benford's law

In 1881, Simon Newcomb was reading through logarithm tables when he observed that the earlier pages were more worn than the later pages. In 1938, physicist Frank Benford tested the observation on real data that included numbers taken from newspapers, population sizes, air pressure measurements and much else, and introduced the law in a more detailed way to other mathematicians.¹ Benford's law is a mathematical rule that describes the distribution of leading digits.
The leading digit is the first digit of a number, so, for example, the leading digit of 1099 is 1, and 1, 2, 3, 4, 5, 6, 7, 8 and 9 are all the possible leading digits. When looking at a list of numbers, we often assume that the leading digits are evenly distributed, so that the digit 1 occurs as often as the digit 9. Benford's law states that the digit 1 is actually the first digit with a probability of 30.1%, which is a lot greater than the first guess of 11.1%.² Benford's law makes predictions for the distribution of other digits too, and so it can be used to describe the distribution of leading digits in sets of numbers. This distribution of first digits can be seen in the bar graph below:

Graph 1. Distribution of first digits ³ (This graph is not mine.)

Each bar represents a digit, and the height of the bar is the percentage of numbers that start with that digit.
The graph shows that the digit 1 has the greatest probability of appearing as a leading digit, and the probability then gradually decreases as the digits get bigger, until the digit 9, which has the smallest probability of appearing as a leading digit.

The probability of the first digit (d) in a set of numbers that satisfies Benford's law can also be represented by the formula:

$P(d) = \log_{10}(d + 1) - \log_{10}(d)$

Formula for the probability of the first digit (d), where $d \in \{1, \dots, 9\}$ ⁴

And it can be simplified as

$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$

Here we have base 10, but Benford's law also works with any other base b when b > 1.⁵ Now we can use this formula to calculate the probability of a leading digit of 2 (or any other leading digit):

$P(2) = \log_{10}\left(1 + \frac{1}{2}\right) = \log_{10}(1.5) \approx 0.176 = 17.6\%$ (rounded)

This way we can calculate the distribution of leading digits and show it in a table.

Table 1. Distribution of leading digits calculated using the formula

This table shows the same thing as graph 1, but in tabular form, with the same trend: the probability gets smaller as the leading digit gets bigger.

Logarithmic scale ⁶ (This scale is not mine.)

One way to explain Benford's law is to look at the logarithmic scale. If we take a number, for example 5645, we find that $\log_{10}(5645) \approx 3.75$. On the scale, the fractional part 0.75 lies between $\log_{10}(5) \approx 0.70$ and $\log_{10}(6) \approx 0.78$, so the number 5645 has a leading digit of 5.⁷ Also, the distance between neighbouring values gets shorter as you move along the scale. The width of each section is proportional to $\log_{10}(d + 1) - \log_{10}(d)$.⁸

Now we can take this scale to the next level and have a scale below where the colored areas on the logarithmic scale show the probability of each leading digit (the table below it shows which color represents which leading digit):⁹ (This scale is not mine.)

The table below shows the leading digit, its probability, and the color representing it.¹⁰ (This table is not mine.)

Restrictions of the law

It should be noted that Benford's law does not apply to all sets of numbers. The law only works if the values are distributed across multiple orders of magnitude, and it works best with large sets of numbers.¹¹ The order of magnitude is a measure of the size of a number, and values distributed across multiple orders of magnitude differ a lot from each other. So Benford's law would not work with, for example, the heights of humans, as these values are not distributed across multiple orders of magnitude: nearly all adults are between one and two meters tall.¹²

Analysis

Now I can take a look at data found online and see whether the law works there and, if it does, how closely the distribution of leading digits matches table 1. Each of my sets of numbers will contain about 200 values or fewer. I will copy the values from the source one by one into Excel and then use Excel to count the incidences. I use data sets of about 200 values because Benford's law works best with a large set of numbers (100 and above): the bigger the data set, the smaller the difference between the observed incidences and the values in table 1. The raw data can be found at the end of this essay in the Appendix.
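Before any digits are counted, the expected values from the formula and the logarithmic-scale reading of 5645 can be checked with a short Python sketch. This is only an illustration and not part of the Excel workflow used in this essay; the function names `benford_probability`, `position_on_log_scale` and `leading_digit` are made up for this example.

```python
import math

def benford_probability(d: int, base: int = 10) -> float:
    """P(d) = log_base(1 + 1/d) for a leading digit d in 1..base-1."""
    if not 1 <= d < base:
        raise ValueError("d must be between 1 and base - 1")
    return math.log(1 + 1 / d) / math.log(base)

def position_on_log_scale(x: float) -> float:
    """Fractional part of log10(x): where x sits within one decade of the scale."""
    return math.log10(x) % 1

def leading_digit(x: float) -> int:
    """Leading digit of a number x >= 1, found by dividing out the power of ten."""
    exponent = math.floor(math.log10(x))
    return int(x / 10 ** exponent)

# Expected distribution of leading digits in percent (the kind of values table 1 holds).
for d in range(1, 10):
    print(d, f"{100 * benford_probability(d):.1f}%")

# The example from the text: the position of 5645 lies between log10(5) and log10(6).
print(round(position_on_log_scale(5645), 2), leading_digit(5645))
```

The nine probabilities add up to 1 because the sum of log10(d + 1) - log10(d) telescopes to log10(10) - log10(1).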
First, I will show you step by step how Excel was used. I learned these steps from a document that has instructions on how to apply Benford's law; a link to the document can be found in footnote 13.¹³

1. Start with an Excel sheet that has all the values you need.
2. In the cell (box) next to the first number, type in =LEFT( then select the first number, type in ) and press the Enter key. (With no length given, LEFT returns the first character.)
3. Now the leading digit of the first number will appear. Click it, hold the left mouse button and drag the cursor down until you reach the end of the list of numbers. Then release the mouse button.
4. Click the A-Z button in the top menu.
5. When the Sort Warning window appears, select Expand the selection and click Sort.
6. All the values will now be arranged by their first digit.
7. Select the part of the column containing all of the leading digit 1s. Click on Data in the top menu and choose Subtotal.
8. Choose Use Function in the Subtotal window and click Count. Then click OK.
9. Now you will have the total number of leading digit 1s.
10. Repeat steps 7 and 8 for the other leading digits.
11. Make a table. I calculated the incidence % for each leading digit by dividing the number of incidences by the total number of incidences and multiplying by 100%.
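The same counting can be sketched in Python as a cross-check of the Excel steps above. This is a minimal sketch and not the method used for the trials; the helper names `leading_digit` and `incidence_table` and the values in `sample` are invented for illustration and are not taken from the raw data. Unlike Excel's LEFT, the helper skips a sign and leading zeros, and it ignores the value 0.

```python
from collections import Counter

def leading_digit(value):
    """Like =LEFT(cell) in Excel: the first character of the number as text.
    Returns None for 0, which has no leading digit."""
    text = str(value).lstrip("-0.")  # drop a sign, leading zeros and the decimal point
    return int(text[0]) if text else None

def incidence_table(values):
    """Map each digit 1..9 to (number of incidences, incidence in percent)."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return {
        d: (counts[d], 100 * counts[d] / total if total else 0.0)
        for d in range(1, 10)
    }

# Hypothetical values, only to show the shape of the result.
sample = [1099, 250, 13, 1800, 92, 310, 1, 47]
table = incidence_table(sample)
for d, (count, percent) in table.items():
    print(d, count, f"{percent:.1f}%")
```

Here `Counter` plays the role of the sort and subtotal steps, and the percentage is the count divided by the total number of values, multiplied by 100.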
Trial 1.

First, we are going to look at the populations of 200 different countries and see whether the law works here and, if it does, how closely the incidences of the leading digits obey the law. I will go through all the values in Excel, count how many times each leading digit occurs in the data, and then calculate the incidences as percentages.

Table 2. Number of incidences and the incidence as a percentage for each leading digit in data containing populations of 200 countries

When the incidences as percentages in this table are compared with the distribution of leading digits in table 1 or graph 1, we can observe that the incidence percentage for leading digit 3 is exactly the same, and almost the same for leading digits 1, 9 and 6.
Overall the values obey the law almost perfectly, and the incidence does gradually decrease as the leading digit gets bigger. The only exception is between leading digits 7 and 8, where the incidence is greater for leading digit 8 than for 7. This is remarkable, as we can see that the mathematical law exists in the real world, and I can now say that Benford's law does apply to this real set of numbers.

Graph 2. The distribution of leading digits for populations of countries

Each bar in this graph represents a leading digit, and its height is the incidence percentage. The trendline shows the trend of decreasing incidence as the leading digit gets bigger. The trend is not as smooth here as in graph 1 obtained from Benford's law, and the reason for that may be the size of the data set. As Benford's law works best with large sets of numbers, the bigger the set of numbers, the more the graph should look like graph 1.
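The comparison made in this trial, the difference from the formula for each digit and the places where the decreasing trend breaks, can be written as a small Python sketch. The percentages in `example` are hypothetical numbers chosen to resemble the pattern described above; they are not the values of table 2, and the function names are invented for this illustration.

```python
import math

def compare_with_benford(observed_percent):
    """Observed minus expected incidence for each digit, in percentage points."""
    return {
        d: observed_percent[d] - 100 * math.log10(1 + 1 / d)
        for d in range(1, 10)
    }

def trend_breaks(observed_percent):
    """Pairs (d, d + 1) where the bigger digit occurs more often than the smaller one."""
    return [
        (d, d + 1)
        for d in range(1, 9)
        if observed_percent[d + 1] > observed_percent[d]
    ]

# Hypothetical incidence percentages (they add up to 100), not the data of table 2.
example = {1: 30.0, 2: 18.0, 3: 12.5, 4: 10.0, 5: 8.0, 6: 6.5, 7: 4.5, 8: 6.0, 9: 4.5}

for d, diff in compare_with_benford(example).items():
    print(d, f"{diff:+.1f}")
print(trend_breaks(example))
```

A list of trend breaks with a single pair such as (7, 8) corresponds to the kind of exception described in this trial.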
Trial 2.

We can also look at 199 countries listed by their total area and see whether Benford's law works here and, if so, how precisely. My gut already tells me that it works, but maybe not as precisely as in trial 1 for leading digit 1, as it is difficult for me to imagine why it would occur so often here. Let's see if I am right.

Table 3. Number of incidences and the incidence as a percentage for each leading digit in data containing total areas of 200 countries

When we compare the incidences in this table with the values from graph 1 or the distribution of leading digits in table 1, we observe that the values are really close to each other. For example, the difference for first digit 5 between this table and graph 1 is only 2.4 percentage points, and the difference is even smaller for the other digits. For some reason, this data did not obey the law as clearly as in trial 1, but we can still say that the law works here. Also, in graph 1 we saw a trend where the incidence gets gradually smaller as the leading digit gets bigger, but unfortunately here we see that trend for only the first 4 leading digits. The total areas of countries existing today were set by wars and by history in general, so I think it is pretty remarkable that the law works even here, as it is not natural to think that a mathematical theory could match history.
Graph 3. The distribution of leading digits for total areas of countries

Each bar in this graph represents a leading digit, and its height is the incidence percentage. The trendline shows the general trend of decreasing incidence as the leading digit gets bigger, but some bars are higher than the bars of smaller leading digits; for example, the bar for leading digit 9 is higher than the bars for leading digits 8, 7 and 5.

Trial 3.

We can then take a look at something in nature. It could be linked to rivers, lakes, altitudes etc., but I have chosen to look at the elevations of 166 countries, as I found enough data to examine them. I will have my elevations in meters, but the law should work just as well with other units.
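The remark about units can be illustrated with a rough Python sketch. It does not use the elevation data of this trial: the values are simulated so that their logarithms are spread evenly over four orders of magnitude, which is an assumption made only for this illustration, and they are converted with 1 meter = 3.28084 feet. The names `first_digit` and `digit_percentages` are invented here.

```python
import random
from collections import Counter

def first_digit(x: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.17e}"[0])

def digit_percentages(values):
    """Incidence of each leading digit 1..9 in percent."""
    counts = Counter(first_digit(v) for v in values)
    return {d: 100 * counts[d] / len(values) for d in range(1, 10)}

random.seed(0)
# Simulated values between 1 and 10 000 (assumed, not real elevations).
meters = [10 ** random.uniform(0, 4) for _ in range(100_000)]
feet = [m * 3.28084 for m in meters]

in_meters = digit_percentages(meters)
in_feet = digit_percentages(feet)
largest_gap = max(abs(in_meters[d] - in_feet[d]) for d in range(1, 10))

for d in range(1, 10):
    print(d, f"{in_meters[d]:.1f}%", f"{in_feet[d]:.1f}%")
```

For data of this simulated kind, the two columns should stay close to each other, which is what the claim about units predicts.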
The original data contained elevations of 200 countries, but some of them had an elevation of 0 meters, and I will not include them in my investigation, as Benford's law does not consider 0 to be a leading digit.

Table 4. Number of incidences and the incidence as a percentage for each leading digit in data containing elevations of countries

From this table, we can see that the incidences are similar to the incidences predicted by Benford's law, but they definitely differ a lot more than in trials 1 and 2.
I am not sure why, but possible reasons could be the smaller size of the data set and possible human manipulation of the numbers, as Benford's law works best with numbers that are natural and not changed by humans.

Graph 4. The distribution of leading digits for elevations of countries

Each bar in this graph represents a leading digit, and its height is the incidence percentage. From this graph, we can see the trend of decreasing incidence as the leading digit gets bigger, but the trend is scattered and a lot less precise than in Benford's graph (graph 1) or in graph 2. For some reason, some bars break the trend; an example of this would be the bars for leading digits 4 and 5.

Applications

One of the reasons Benford's law is so amazing is the variety of its applications. The most famous one is its use in fraud detection.
If a person makes up numbers to cheat, for example, the government or the tax system, the person is likely to distribute the digits uniformly. As Benford's law shows, this should not happen naturally in large sets of numbers, so fraud can be detected by comparing the values with the distribution according to Benford's law. The law can also be used when checking the reliability of election results, and it was used to look for fraud in the 2009 Iranian election. It should be noted, however, that some experts do not consider the law reliable in the case of elections. There are also other cases where Benford's law has been used to catch fraud.
For example, some years after Greece joined the eurozone, the macroeconomic data it had used to get into the eurozone was shown to be false using the law.¹⁴

Conclusion

Overall, the data I tested matched the distribution of leading digits obtained from the formula (the values in table 1) almost perfectly, and best showed the incidence of leading digit 1 being always about 30%. The trend of decreasing probability as the leading digit gets bigger is not shown as clearly and there were exceptions, but overall the results still show the trend. These deviations could have been reduced by using larger data sets, as Benford's law works best with large sets of numbers. It would also have been interesting to examine the law with smaller sets of numbers (about 100 values), which is significantly fewer than in trials 1 and 2. I would like to see whether decreasing the number of values so much causes the distribution of leading digits to move further away from the values obtained using the formula (table 1).
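The question of how sample size affects the match can be explored with a small simulation. The sketch below is not based on the data of this essay: it assumes leading digits that follow Benford's law exactly in the long run (generated as the first digit of 10 raised to a random power between 0 and 1) and measures the largest deviation from the formula for samples of different sizes. The function names and the seed are arbitrary choices for this illustration.

```python
import math
import random
from collections import Counter

# Expected incidence of each leading digit in percent, from P(d) = log10(1 + 1/d).
EXPECTED = {d: 100 * math.log10(1 + 1 / d) for d in range(1, 10)}

def simulated_leading_digits(n, rng):
    """n leading digits drawn from an idealised Benford-distributed source."""
    # 10 ** u with u in [0, 1) lies in [1, 10); min() guards against rounding up to 10.
    return [min(9, int(10 ** rng.random())) for _ in range(n)]

def largest_deviation(digits):
    """Largest gap between observed and expected incidence, in percentage points."""
    counts = Counter(digits)
    n = len(digits)
    return max(abs(100 * counts[d] / n - EXPECTED[d]) for d in range(1, 10))

rng = random.Random(2018)
for n in (100, 200, 20_000):
    sample = simulated_leading_digits(n, rng)
    print(n, f"{largest_deviation(sample):.1f}")
```

Running the loop several times with different seeds gives a feeling for how much of the scatter in a graph of 100 or 200 values could come from sample size alone.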
As mentioned before, Benford's law works best with large sets of numbers, but about 100 values should still be enough to see the law working.

The importance of the law can be seen from its applications, arguably most importantly in catching fraud against tax systems and governments. To me, the importance of this exploration was to understand and "see with my own eyes" that math is discovered, not invented: if it were invented, I don't think it would be possible for mathematical theories to be seen in so many real-life scenarios and to have such a variety of applications.

¹ Jamain, Adrian. "Benford's Law." Imperial College of London, Sept. 2001, www.bing. pdf=DevEx,5037.1., 30.12.17
² "Benford's Law." From Wolfram MathWorld, mathworld.wolfram.com/BenfordsLaw.html, 30.12.17
³ "Benford's Law." Wikipedia, Wikimedia Foundation, 9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History, 30.12.17
⁴ Corn, Patrick. "Benford's Law." Brilliant Math & Science Wiki, brilliant.org/wiki/benfords-law/, 01.01.18
⁵ "Benford's Law." Wikipedia, Wikimedia Foundation, 9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History, 03.01.18
⁶ Corn, Patrick. "Benford's Law." Brilliant Math & Science Wiki, brilliant.org/wiki/benfords-law/, 20.01.18
⁷ Corn, Patrick. "Benford's Law." Brilliant Math & Science Wiki, brilliant.org/wiki/benfords-law/, 20.01.18
⁸ Berry, Nick. "Benford's Law." Benford's Law, datagenetics.com/blog/march52012/index.html, 20.01.18
⁹ Berry, Nick. "Benford's Law." Benford's Law, datagenetics.com/blog/march52012/index.html, 20.01.18
¹⁰ Berry, Nick. "Benford's Law." Benford's Law, datagenetics.com/blog/march52012/index.html, 20.01.18
¹¹ "Number 1 and Benford's Law – Numberphile." Numberphile, 20 Jan. 2018
¹² "Benford's Law." Wikipedia, Wikimedia Foundation, 9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History, 31.12.17
¹³ "APPLYING BENFORD'S LAW." Benford's Law, datagenetics.com/blog/march52012/index.html, 20.01.18
¹⁴ "Benford's Law." Wikipedia, Wikimedia Foundation, 9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History, 01.01.18