Charles Petzold on writing books, reading books, and exercising the internal UTM

De-Obfuscating the Statistics of Mass Shootings

July 5, 2015
Roscoe, N.Y.

After the horrifying killings at the Mother Emanuel African Methodist Episcopal Church in Charleston, South Carolina, President Obama once more had to speak publicly about a mass shooting. "Let's be clear," he said. "At some point, we as a country will have to reckon with the fact that this type of mass violence does not happen in other advanced countries. It doesn't happen in other places with this kind of frequency."

Of course, those people whose function in life is to contradict everything this President says or does were quick to note that other countries do have mass shootings. Some right-wing web sites even went a step further by posting statistics that seem to suggest that when mass shootings are corrected for population, the United States doesn't come out too badly. One such article on IJReview included the following chart:

The web site triumphantly exclaimed "Boom, here we go."

The table shows data for 12 of the 34 countries that comprise the Organisation for Economic Co-operation and Development (OECD). These countries are generally considered to be examples of "industrialized" or "advanced" countries and can legitimately be compared.

The first four columns of the table show (not in this order) the number of rampage shootings in these 12 countries during the five-year period from 2009 through 2013, the number of fatalities in those shootings, and the per-capita rates of each per million of population. Regardless of whether you look at the number of shooting incidents or the number of fatalities, the United States ranks 6th after Norway, Finland, Slovakia, Israel, and Switzerland. IJReview obtained its numbers from a defunct web site called RampageShooting, but an archived page is available that lists the other 22 countries of the OECD and their populations. (Six additional countries had one rampage shooting each during this five-year period but were not listed in the IJReview summary.) The site even highlights the five countries with higher rates than the U.S. with graphics that form guns out of the countries' flags:

Do you see an American flag here? The graphic emphasizes that the United States has lower rates of mass shootings than these five countries. In this analysis, we're not number one.

I'm not going to argue with the validity of the data themselves. I'm going to assume that all the numbers are correct. But I am going to question the validity of ranking countries in this way and drawing conclusions from that ranking.

For this analysis, I'll focus exclusively on the number of incidents of mass shootings, and not the number of people killed in these mass shootings. The second figure seems to me to involve a second variable, which relates to the average number of people killed in such shootings.

Here is my table that reproduces the countries that experienced mass shootings, ordered by rate of mass shootings per million of population:

Country             Rampage Shootings     Population  Shootings Per Million
Finland                             2      5,421,827                  0.369
Israel                              2      7,941,900                  0.252
Switzerland                         2      8,000,000                  0.250
Norway                              1      5,033,675                  0.199
Slovakia                            1      5,445,325                  0.184
United States                      38    314,941,000                  0.121
Hungary                             1      9,942,000                  0.101
Greece                              1     10,787,690                  0.093
Belgium                             1     11,041,266                  0.091
Netherlands                         1     16,751,323                  0.060
Canada                              2     35,010,000                  0.057
Germany                             3     81,799,600                  0.037
Spain                               1     47,190,493                  0.021
Italy                               1     60,813,326                  0.016
United Kingdom                      1     62,262,000                  0.016
France                              1     65,350,000                  0.015
Mexico                              1    113,910,608                  0.009
Japan                               1    126,659,683                  0.008

Here's a second version of the same table including the other countries that comprise the OECD. These countries are again sorted by rate of mass shootings per million of population, and then by population:

Country             Rampage Shootings     Population  Shootings Per Million
Finland                             2      5,421,827                  0.369
Israel                              2      7,941,900                  0.252
Switzerland                         2      8,000,000                  0.250
Norway                              1      5,033,675                  0.199
Slovakia                            1      5,445,325                  0.184
United States                      38    314,941,000                  0.121
Hungary                             1      9,942,000                  0.101
Greece                              1     10,787,690                  0.093
Belgium                             1     11,041,266                  0.091
Netherlands                         1     16,751,323                  0.060
Canada                              2     35,010,000                  0.057
Germany                             3     81,799,600                  0.037
Spain                               1     47,190,493                  0.021
Italy                               1     60,813,326                  0.016
United Kingdom                      1     62,262,000                  0.016
France                              1     65,350,000                  0.015
Mexico                              1    113,910,608                  0.009
Japan                               1    126,659,683                  0.008
Turkey                              0     74,724,269                  0.000
South Korea                         0     50,004,441                  0.000
Poland                              0     38,186,860                  0.000
Australia                           0     22,841,921                  0.000
Chile                               0     16,572,475                  0.000
Portugal                            0     10,581,949                  0.000
Czech Republic                      0     10,512,208                  0.000
Sweden                              0      9,540,065                  0.000
Austria                             0      8,414,638                  0.000
Ireland                             0      6,399,152                  0.000
Denmark                             0      5,580,413                  0.000
New Zealand                         0      4,445,436                  0.000
Slovenia                            0      2,055,496                  0.000
Estonia                             0      1,340,194                  0.000
Luxembourg                          0        524,853                  0.000
Iceland                             0        320,060                  0.000

Total                              61  1,250,346,146                  0.049
Total Non-US                       23    935,405,146                  0.025

This table, however, includes two summary lines at the bottom that neither IJReview nor RampageShooting bothered with. These are totals with and without the United States. Perhaps there was a reason why these obviously important totals were excluded.

Just to reiterate: These are the 34 countries that comprise the OECD, with the population and number of mass shootings from the years 2009 through 2013 taken directly from the web site. This is all the information that I'll be analyzing.
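The two summary lines can be recomputed directly from the table. What follows is a minimal Python sketch, not part of the original analysis; the names `OECD` and `per_million` are introduced here purely for illustration.

```python
# (country, rampage shootings 2009-2013, population), copied from the table above.
OECD = [
    ("Finland", 2, 5_421_827), ("Israel", 2, 7_941_900),
    ("Switzerland", 2, 8_000_000), ("Norway", 1, 5_033_675),
    ("Slovakia", 1, 5_445_325), ("United States", 38, 314_941_000),
    ("Hungary", 1, 9_942_000), ("Greece", 1, 10_787_690),
    ("Belgium", 1, 11_041_266), ("Netherlands", 1, 16_751_323),
    ("Canada", 2, 35_010_000), ("Germany", 3, 81_799_600),
    ("Spain", 1, 47_190_493), ("Italy", 1, 60_813_326),
    ("United Kingdom", 1, 62_262_000), ("France", 1, 65_350_000),
    ("Mexico", 1, 113_910_608), ("Japan", 1, 126_659_683),
    ("Turkey", 0, 74_724_269), ("South Korea", 0, 50_004_441),
    ("Poland", 0, 38_186_860), ("Australia", 0, 22_841_921),
    ("Chile", 0, 16_572_475), ("Portugal", 0, 10_581_949),
    ("Czech Republic", 0, 10_512_208), ("Sweden", 0, 9_540_065),
    ("Austria", 0, 8_414_638), ("Ireland", 0, 6_399_152),
    ("Denmark", 0, 5_580_413), ("New Zealand", 0, 4_445_436),
    ("Slovenia", 0, 2_055_496), ("Estonia", 0, 1_340_194),
    ("Luxembourg", 0, 524_853), ("Iceland", 0, 320_060),
]


def per_million(shootings, population):
    """Incidents per million inhabitants."""
    return shootings / population * 1_000_000


total_shootings = sum(s for _, s, _ in OECD)
total_population = sum(p for _, _, p in OECD)
us_shootings, us_population = next(
    (s, p) for name, s, p in OECD if name == "United States"
)
non_us_shootings = total_shootings - us_shootings
non_us_population = total_population - us_population

for label, s, p in [
    ("Total", total_shootings, total_population),
    ("Total Non-US", non_us_shootings, non_us_population),
    ("United States", us_shootings, us_population),
]:
    print(f"{label:14s} {s:3d} {p:>15,} {per_million(s, p):.3f}")
```

The same `per_million` helper reproduces every rate in the per-country rows, so the ranking and the totals all come from the same three columns.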

Statistical Significance

The primary purpose of statistics is to help us understand various phenomena of the real world and possibly to predict what might happen in the future. How meaningful is the fact that Finland tops the chart with a rate of 0.369 mass shootings per million of population over a five-year period? Does it tell us anything significant about Finland? Does it mean that Finland is the mass shooting capital of the world? How could it, with only two mass-shooting incidents in five years? Does it mean that Finland will continue to have two mass shootings every five years? Not necessarily. The numbers are too small to tell us anything.

Tiny numbers do not make good statistics. Yet, all the countries in this table (except one) experienced just three mass shootings or fewer. These are very tiny numbers and their statistical significance is pretty much negligible.

What's additionally interesting is that the top five countries in this table all have populations under 10 million:

Country             Rampage Shootings     Population  Shootings Per Million
Finland                             2      5,421,827                  0.369
Israel                              2      7,941,900                  0.252
Switzerland                         2      8,000,000                  0.250
Norway                              1      5,033,675                  0.199
Slovakia                            1      5,445,325                  0.184

Only seven of the other countries in the OECD have populations less than 8 million. Keep in mind that for a given number of incidents, the lower the population, the higher the per-capita rate. So we're dealing here not only with tiny numbers of incidents (because mass killings are not overall very common) but also with small populations.

There is a phenomenon in statistics called "regression towards the mean." As you examine larger and larger populations, they tend to gravitate towards the average. Smaller populations are statistically more erratic and unstable because they are more susceptible to random fluctuations. For a small country, 1 or 2 additional mass shootings in a five-year period can propel it to the top of the list.
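To put a number on that sensitivity, here is a small Python sketch (an illustration added here, with populations taken from the table) showing how much a single additional incident moves the per-million rate for a small, a medium, and a large country.

```python
def per_million(shootings, population):
    """Incidents per million inhabitants."""
    return shootings / population * 1_000_000


# Populations copied from the table above.
populations = {
    "Finland": 5_421_827,
    "Germany": 81_799_600,
    "United States": 314_941_000,
}

# Rate change caused by exactly one extra incident in five years.
bump = {name: per_million(1, pop) for name, pop in populations.items()}

for name, delta in bump.items():
    print(f"{name:14s} one extra shooting adds {delta:.3f} per million")
```

One incident shifts Finland's rate by more than the entire gap between the United States and the OECD average, while it barely registers for the United States.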

Suppose we were to plot a graph with a horizontal axis based on ranges of rates of mass shootings. For each range of rates, the graph shows the total population of the countries that fit into that range. What should we expect?

We would expect the larger countries to cluster towards the range of tiny rates of mass shootings. By contrast, the smaller countries are the outliers where 1 or 2 mass shootings affect the rate a great deal. These smaller countries should be further from the average and tend more towards extremes, but with small heights in the graph because the populations are so small. In other words, we should expect a graph like this with a long but minuscule tail:

The four tiny bumps to the right of 0.100 are the five countries with the highest rates of mass shootings.

But the problem with that graph is that it doesn't include the United States. Let's add the United States to the graph:

And now we see a bar in this graph with much more statistical significance because the population is very large, but which at the same time is also quite removed from the average established by the other OECD countries.

While it's interesting to examine comparisons of mass-shooting incidents in various countries, it is statistically invalid to compare these countries based on rankings that result from 1 or 2 or 3 mass shootings in the five-year period. When medical statistics are compiled, rates based on fewer than a certain number of incidents of a particular disease or injury are considered to be unreliable. Here's a web page from the New York Department of Health that answers the question "Why are rates based on fewer than 20 cases marked as being unreliable?" The conclusion is that "When the rates are based on only a few cases or deaths, it is almost impossible to distinguish random fluctuation from true changes in the underlying risk of disease or injury."

Most of the countries in the tables posted by IJReview and RampageShooting have far fewer than 20 incidents of mass shootings. Claiming that these data have statistical validity is either deliberately deceitful or ignorantly deceptive.

In the entire table of mass shooting statistics, only three lines meet any type of criteria for being statistically meaningful. Here they are:

Country             Rampage Shootings     Population  Shootings Per Million
United States                      38    314,941,000                  0.121
All Other Countries                23    935,405,146                  0.025

Total                              61  1,250,346,146                  0.049

If you want a quick takeaway, the United States has a population that is one-quarter of the total population of the OECD countries, but accounts for more than half of the mass-shooting incidents. That is the truest statement that can be deduced from these data.
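That takeaway is simple arithmetic on the three summary lines. A short Python sketch (variable names are mine) makes the shares explicit and also gives the ratio between the US rate and the rate for all other OECD countries combined.

```python
# Figures from the three-line summary table above.
US_SHOOTINGS, US_POPULATION = 38, 314_941_000
OTHER_SHOOTINGS, OTHER_POPULATION = 23, 935_405_146

population_share = US_POPULATION / (US_POPULATION + OTHER_POPULATION)
incident_share = US_SHOOTINGS / (US_SHOOTINGS + OTHER_SHOOTINGS)

# How many times higher the US per-capita rate is than the rest of the OECD.
rate_ratio = (US_SHOOTINGS / US_POPULATION) / (OTHER_SHOOTINGS / OTHER_POPULATION)

print(f"US share of OECD population: {population_share:.1%}")
print(f"US share of OECD incidents:  {incident_share:.1%}")
print(f"US rate / non-US rate:       {rate_ratio:.1f}")
```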

Nevertheless, let's continue the analysis to understand why a tiny number of incidents is usually treated as statistically insignificant.

A Computer Simulation

This talk about statistical stability and fluctuation of course prompts us to wonder if any of these data are valid. Let's explore this a bit by doing a few computer simulations. Here is an image showing the relative populations of the 34 OECD countries arranged alphabetically from left to right:

During the five-year period from 2009 through 2013 there occurred 61 incidents of mass shootings. Let us randomly distribute those 61 shootings throughout these countries. The implicit assumption is that the rate of mass shooting for each country is the same as the overall actual rate. Each shooting incident is symbolized as a black vertical bar:

Now let's put the results in a table, ordered by the rate of shootings per million:

Random Shootings (Seed = 10097, Incidents = 61)

Country             Rampage Shootings     Population  Shootings Per Million
Slovakia                            1      5,445,325                  0.184
Chile                               3     16,572,475                  0.181
Denmark                             1      5,580,413                  0.179
Ireland                             1      6,399,152                  0.156
Switzerland                         1      8,000,000                  0.125
Mexico                             11    113,910,608                  0.097
Belgium                             1     11,041,266                  0.091
Australia                           2     22,841,921                  0.088
France                              4     65,350,000                  0.061
South Korea                         3     50,004,441                  0.060
Netherlands                         1     16,751,323                  0.060
Turkey                              4     74,724,269                  0.054
United States                      15    314,941,000                  0.048
Japan                               6    126,659,683                  0.047
Spain                               2     47,190,493                  0.042
United Kingdom                      2     62,262,000                  0.032
Canada                              1     35,010,000                  0.029
Poland                              1     38,186,860                  0.026
Italy                               1     60,813,326                  0.016

Total                              61  1,250,346,146                  0.049
Total Non-US                       46    935,405,146                  0.049

This table doesn't look much like the table of the actual numbers. Many more countries have mass shootings, and some of them have quite a few. While the United States still has more than anyone else (it is, after all, the largest country here), the rate of mass shootings isn't nearly as high as the actual figure of 0.121.

Since these data were generated from a pseudo-random sequence of numbers that began with a "seed" number indicated in the heading, maybe a different seed will produce different results. Let's try another:

Here's the table with the results:

Random Shootings (Seed = 37542, Incidents = 61)

Country             Rampage Shootings     Population  Shootings Per Million
Ireland                             1      6,399,152                  0.156
Australia                           3     22,841,921                  0.131
Switzerland                         1      8,000,000                  0.125
Sweden                              1      9,540,065                  0.105
Greece                              1     10,787,690                  0.093
Belgium                             1     11,041,266                  0.091
South Korea                         4     50,004,441                  0.080
United States                      24    314,941,000                  0.076
Germany                             5     81,799,600                  0.061
Netherlands                         1     16,751,323                  0.060
Canada                              2     35,010,000                  0.057
Spain                               2     47,190,493                  0.042
Turkey                              3     74,724,269                  0.040
Italy                               2     60,813,326                  0.033
Japan                               4    126,659,683                  0.032
France                              2     65,350,000                  0.031
Mexico                              3    113,910,608                  0.026
Poland                              1     38,186,860                  0.026

Total                              61  1,250,346,146                  0.049
Total Non-US                       37    935,405,146                  0.040

This demonstrates that simple random fluctuation can produce very different results when not very many incidents are involved. Now the United States has a rate of shootings per million that is 50% higher than the average (but still not as high as its actual value). Let's try it again:

And here's the table:

Random Shootings (Seed = 8422, Incidents = 61)

Country             Rampage Shootings     Population  Shootings Per Million
Sweden                              3      9,540,065                  0.314
Slovakia                            1      5,445,325                  0.184
Switzerland                         1      8,000,000                  0.125
Chile                               2     16,572,475                  0.121
Poland                              4     38,186,860                  0.105
Hungary                             1      9,942,000                  0.101
Belgium                             1     11,041,266                  0.091
Canada                              3     35,010,000                  0.086
Turkey                              5     74,724,269                  0.067
Japan                               8    126,659,683                  0.063
France                              4     65,350,000                  0.061
Mexico                              6    113,910,608                  0.053
United States                      14    314,941,000                  0.044
Spain                               2     47,190,493                  0.042
United Kingdom                      2     62,262,000                  0.032
Germany                             2     81,799,600                  0.024
South Korea                         1     50,004,441                  0.020
Italy                               1     60,813,326                  0.016

Total                              61  1,250,346,146                  0.049
Total Non-US                       47    935,405,146                  0.050

And now the U.S. is lower than the average. That's the way randomness works. You really can't anticipate what can happen. But these irregularities are accentuated when small numbers are involved.
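For readers who want to try this without the WPF program, here is a rough Python sketch of the same idea: each of the 61 incidents is assigned to a country with probability proportional to its population. This is my own approximation of the procedure, not the author's code, and because Python's random number generator differs from the one used for the tables above, the same seed will not reproduce the same counts. The names `POPULATION` and `simulate` are hypothetical.

```python
import random
from collections import Counter

# Populations of the 34 OECD countries, copied from the table earlier in the article.
POPULATION = {
    "Finland": 5_421_827, "Israel": 7_941_900, "Switzerland": 8_000_000,
    "Norway": 5_033_675, "Slovakia": 5_445_325, "United States": 314_941_000,
    "Hungary": 9_942_000, "Greece": 10_787_690, "Belgium": 11_041_266,
    "Netherlands": 16_751_323, "Canada": 35_010_000, "Germany": 81_799_600,
    "Spain": 47_190_493, "Italy": 60_813_326, "United Kingdom": 62_262_000,
    "France": 65_350_000, "Mexico": 113_910_608, "Japan": 126_659_683,
    "Turkey": 74_724_269, "South Korea": 50_004_441, "Poland": 38_186_860,
    "Australia": 22_841_921, "Chile": 16_572_475, "Portugal": 10_581_949,
    "Czech Republic": 10_512_208, "Sweden": 9_540_065, "Austria": 8_414_638,
    "Ireland": 6_399_152, "Denmark": 5_580_413, "New Zealand": 4_445_436,
    "Slovenia": 2_055_496, "Estonia": 1_340_194, "Luxembourg": 524_853,
    "Iceland": 320_060,
}


def simulate(seed, incidents=61):
    """Scatter `incidents` shootings over the countries, weighted by population."""
    rng = random.Random(seed)
    names = list(POPULATION)
    weights = [POPULATION[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=incidents))


counts = simulate(seed=10097)

# Print the countries that were hit, ordered by shootings per million.
by_rate = sorted(counts, key=lambda c: counts[c] / POPULATION[c], reverse=True)
for name in by_rate:
    rate = counts[name] / POPULATION[name] * 1_000_000
    print(f"{name:15s} {counts[name]:3d} {POPULATION[name]:>13,} {rate:.3f}")
```

Running `simulate` with a handful of different seeds shows the same qualitative behavior as the tables above: the small countries that happen to be hit jump to the top of the ranking, and the US count wanders around a value far below 38.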

But where are these random "seeds" coming from? Am I making them up or experimenting with different values to see which ones will tell a particular story?

Not at all. The seeds that I'm using are from the first several entries in the famous book A Million Random Digits with 100,000 Normal Deviates. I'm using these seeds to generate random numbers and draw the results in a WPF program that you can download and experiment with yourself.

If we keep trying different random distributions of 61 mass shootings, will we ever find a case where 38 of the shootings are in the United States? Perhaps. But it should be clear by this time that the incidence of mass shootings in the United States is intrinsically different from the other OECD countries taken in aggregate.
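The "perhaps" can be quantified. Under the simulation's assumption that each incident independently lands in a country with probability proportional to population, the number falling in the United States follows a binomial distribution with 61 trials. The Python sketch below (my own addition, with illustrative names) sums the upper tail exactly.

```python
from math import comb

US_POPULATION = 314_941_000
OECD_POPULATION = 1_250_346_146
INCIDENTS = 61

# Chance that any single randomly placed incident lands in the United States.
p_us = US_POPULATION / OECD_POPULATION


def prob_at_least(k_min, n=INCIDENTS, p=p_us):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


print(f"Expected US incidents: {INCIDENTS * p_us:.1f}")
print(f"P(38 or more of 61 in the US): {prob_at_least(38):.2e}")
```

The resulting probability is so small that no practical number of reseeded runs would be expected to turn up a table with 38 US incidents.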

One approach to see the difference is to artificially inflate the population of the United States by a factor of 4 and then distribute the 61 mass shootings among this artificial population. Because the United States is now 4 times its normal size (and larger than all the other countries combined) it gets more of the random shootings:

And here's the table summarizing the results:

Random Shootings (Seed = 99019, Incidents = 61)
US Population Increased by Factor of 4

Country             Rampage Shootings     Population  Shootings Per Million
New Zealand                         1      4,445,436                  0.225
Czech Republic                      2     10,512,208                  0.190
Denmark                             1      5,580,413                  0.179
Hungary                             1      9,942,000                  0.101
Australia                           2     22,841,921                  0.088
Spain                               3     47,190,493                  0.064
Canada                              2     35,010,000                  0.057
Italy                               2     60,813,326                  0.033
United States                      40  1,259,764,000                  0.032
Turkey                              2     74,724,269                  0.027
Poland                              1     38,186,860                  0.026
South Korea                         1     50,004,441                  0.020
United Kingdom                      1     62,262,000                  0.016
Mexico                              1    113,910,608                  0.009
Japan                               1    126,659,683                  0.008

Total                              61  2,195,169,146                  0.028
Total Non-US                       21    935,405,146                  0.022

This table looks a lot like the one with the real data. The other countries in the table all have incidences of 1, 2, or 3 mass shootings while the United States has 40 mass shootings. The actual figure is 38.

Let's try another random number seed:

And here's the table summarizing the results:

Random Shootings (Seed = 12807, Incidents = 61)
US Population Increased by Factor of 4

Country             Rampage Shootings     Population  Shootings Per Million
Slovakia                            1      5,445,325                  0.184
Greece                              1     10,787,690                  0.093
Australia                           2     22,841,921                  0.088
South Korea                         3     50,004,441                  0.060
Turkey                              4     74,724,269                  0.054
Mexico                              4    113,910,608                  0.035
United Kingdom                      2     62,262,000                  0.032
Japan                               4    126,659,683                  0.032
France                              2     65,350,000                  0.031
United States                      33  1,259,764,000                  0.026
Poland                              1     38,186,860                  0.026
Germany                             2     81,799,600                  0.024
Spain                               1     47,190,493                  0.021
Italy                               1     60,813,326                  0.016

Total                              61  2,195,169,146                  0.028
Total Non-US                       28    935,405,146                  0.030

Now there are a few countries with 4 mass shooting incidents and the United States is down to 33. Shall we try one more? Here goes:

And here's the table summarizing the results:

Random Shootings (Seed = 32533, Incidents = 61)
US Population Increased by Factor of 4

Country             Rampage Shootings     Population  Shootings Per Million
Sweden                              2      9,540,065                  0.210
Turkey                              5     74,724,269                  0.067
United Kingdom                      4     62,262,000                  0.064
Poland                              2     38,186,860                  0.052
Spain                               2     47,190,493                  0.042
Germany                             3     81,799,600                  0.037
Mexico                              4    113,910,608                  0.035
France                              2     65,350,000                  0.031
United States                      33  1,259,764,000                  0.026
Japan                               3    126,659,683                  0.024
Italy                               1     60,813,326                  0.016

Total                              61  2,195,169,146                  0.028
Total Non-US                       28    935,405,146                  0.030

Again, 33 in the United States.

But we are now generating tables of random mass shootings that generally resemble the table of actual mass shootings.

In other words, mass shootings among the OECD countries seem to resemble a random distribution but only if the United States is assumed to have a population that is four times its actual size.
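The counts of 40, 33, and 33 are what the arithmetic predicts. As a rough check (a sketch added here, with a hypothetical helper named `expected_us_incidents`), the expected number of US incidents is simply 61 times the US share of the total population, with and without the inflation factor.

```python
US_POPULATION = 314_941_000
NON_US_POPULATION = 935_405_146
INCIDENTS = 61


def expected_us_incidents(factor):
    """Expected US count when the US population is multiplied by `factor`."""
    us = US_POPULATION * factor
    return INCIDENTS * us / (us + NON_US_POPULATION)


for factor in (1, 4):
    print(f"US population x{factor}: expect about "
          f"{expected_us_incidents(factor):.1f} of {INCIDENTS} incidents")
```

With the real population the expectation is in the mid-teens; with the population quadrupled it is about 35, close to the actual figure of 38.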

Probability Distributions

Let's come at this analysis from another direction. If we know the probability of a particular event, we can also calculate the probability that a population of a certain size will experience a specific number of those events.

For example, consider a six-sided die. Toss it ten times. What is the probability that it will land 4 every time in these ten tosses? The probability of landing 4 just once is 1/6, so the probability of ten tosses in a row landing 4 is $(1/6)^{10}$.

If you toss a die ten times, what is the probability of it landing on 4 only once, and something else the other nine times? The probability of it landing on 4 is 1/6, and the probability of it landing on something other than 4 is 5/6. The probability of that happening nine times is $(5/6)^9$. However, there are ten ways this can happen: the first toss can land on 4, or the second, or the third, etc., so the complete probability is $10\,(1/6)\,(5/6)^9$. For the probability of two 4's coming up in ten tosses of a die, you have to figure out the number of ways that can happen, which is 45, so the probability is $45\,(1/6)^2\,(5/6)^8$.
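These three dice probabilities can be checked with a few lines of Python. This is an illustrative sketch; `binomial` is a name chosen here, and `math.comb` supplies the count of combinations (10 and 45 in the examples above).

```python
from math import comb


def binomial(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


P_FOUR = 1 / 6  # chance of a 4 on one toss of a fair die

for k in (10, 1, 2):
    print(f"P({k:2d} fours in 10 tosses) = {binomial(10, k, P_FOUR):.3g}")
```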

In general, for n trials where the probability of a "success" is p, the probability of k successes is given by the binomial probability formula:

$$P(k) = \binom{n}{k}\, p^k\, (1-p)^{n-k}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

Let's assume that the probability of a mass shooting over a five-year period is the overall OECD average of 0.049 per million of population. The probability is actually 0.000000049 per person. That's the value p. What is the probability of 1 mass shooting in a population of 10,000,000, which is roughly characteristic of countries like Switzerland and Sweden? The variable n is 10,000,000 and the value k is 1. We can actually calculate the probabilities of 0 mass shootings, 1 mass shooting, 2, and so forth, and put them in a graph:

The dark bars show the probabilities. The probability of there being no shootings is a bit over 60% while the probability of there being just one shooting is a bit less than 30%. The gray bars show the accumulated probability, which is often useful. The probability of there being 0 or more shootings is obviously 1 or 100%, while the probability of there being 1 or more shootings is close to 40%.
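The bar heights can be approximated numerically. The sketch below is my own and makes one assumption worth flagging: it uses the unrounded OECD rate (61 incidents over the total population of 1,250,346,146) rather than the rounded 0.049, so the values may differ from the graph in the third decimal place.

```python
from math import comb

N = 10_000_000               # population size
P = 61 / 1_250_346_146       # incidents per person over five years (assumed unrounded rate)


def binomial(n, k, p):
    """Probability of exactly k events among n people."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Dark bars: probability of exactly k shootings.
pmf = [binomial(N, k, P) for k in range(6)]
# Gray bars: probability of k or more shootings.
at_least = [1 - sum(pmf[:k]) for k in range(6)]

for k in range(6):
    print(f"k={k}: exactly {pmf[k]:.3f}, {k} or more {at_least[k]:.3f}")
```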

Here's a similar graph for a population of 25,000,000, which is (very roughly) the population of Australia:

Now the most likely outcome is one mass shooting in a five-year period. Here's the distribution for a population of 50,000,000, such as Italy, Spain, France, the UK, and South Korea:

You can pretty much anticipate which will be the highest bar by just multiplying the rate of 0.049 per million by the population in millions. For this example it's about 2.5, which is the expected number of mass shootings and lies close to the highest bar.

Here's a population of 100,000,000, which is about the size of Mexico and Japan:

Now we're seeing a likelihood that is closer to 4 or 5 mass shootings. In reality, both Mexico and Japan had just one mass shooting in the five-year period. Why the big difference?

The probability of 0.049 per million that is being used for these graphs is the overall rate of mass shootings for the OECD countries, and that number is distorted by the high rate of mass shooting in the United States. For the non-US countries, the rate is actually 0.025. Let's try that with a population of 100,000,000:

And now we get something much closer to reality.
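The shift between these graphs can be summarized by two numbers per case: the expected count (rate times population) and the most likely count (the tallest bar). The following Python sketch is an illustration under the rounded rates quoted in the text; the helper names are mine, and the binomial is evaluated in log space only to keep the arithmetic stable for very large n.

```python
from math import exp, log, log1p


def binomial(n, k, p):
    """Binomial probability of exactly k events, computed in log space."""
    log_comb = sum(log(n - i) - log(i + 1) for i in range(k))
    return exp(log_comb + k * log(p) + (n - k) * log1p(-p))


def most_likely(n, p, k_max=60):
    """The count k (0..k_max) with the highest probability."""
    return max(range(k_max + 1), key=lambda k: binomial(n, k, p))


OECD_RATE = 0.049e-6     # per person, all OECD countries (rounded, from the text)
NON_US_RATE = 0.025e-6   # per person, OECD excluding the United States

cases = [
    (25_000_000, OECD_RATE),
    (50_000_000, OECD_RATE),
    (100_000_000, OECD_RATE),
    (100_000_000, NON_US_RATE),
]
for n, p in cases:
    print(f"population {n:>11,}  rate {p * 1e6:.3f}/million  "
          f"expected {n * p:.2f}  most likely {most_likely(n, p)}")
```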

Finally, let's jump up to a population of 300,000,000, which applies to the United States. Here's the distribution using the total OECD mass shooting rate of 0.049:

In reality the United States had 38 mass shootings. This graph is telling us that the likelihood of that happening is essentially zero.

Again, the problem is that we're using the overall OECD rate of 0.049. If we instead use the US rate of 0.121, then we see something quite different:

But this isn't telling us anything that we didn't already know — that the mass shooting rate in the United States is much higher than the other OECD countries.
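To see how stark the contrast is, the probability of exactly 38 incidents in a population of 300,000,000 can be computed under both rates. This is a sketch under the rounded rates from the text, with names of my choosing; the tail sum is cut off at 200 incidents, beyond which the terms are negligible.

```python
from math import exp, log, log1p


def binomial(n, k, p):
    """Binomial probability of exactly k events, computed in log space."""
    log_comb = sum(log(n - i) - log(i + 1) for i in range(k))
    return exp(log_comb + k * log(p) + (n - k) * log1p(-p))


N_US = 300_000_000
OECD_RATE = 0.049e-6   # per person, all OECD countries
US_RATE = 0.121e-6     # per person, United States only

p38_oecd = binomial(N_US, 38, OECD_RATE)
p38_us = binomial(N_US, 38, US_RATE)
tail_oecd = sum(binomial(N_US, k, OECD_RATE) for k in range(38, 200))

print(f"P(exactly 38) at the OECD rate: {p38_oecd:.2e}")
print(f"P(38 or more) at the OECD rate: {tail_oecd:.2e}")
print(f"P(exactly 38) at the US rate:   {p38_us:.3f}")
```

At the OECD-wide rate the outcome is a rare fluke; at the US rate it is one of the most probable outcomes, which is just another way of stating that the two rates describe different populations.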

At the other extreme, here is the distribution for countries with a population of 5,000,000, the approximate population of the three countries at the top of the IJReview and RampageShooting rankings. This uses the rate characteristic of the total OECD countries excluding the United States:

To be sure, it is expected that these countries will have no mass shootings, but there's a 10% probability that they will have at least one.
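The chance of at least one incident is the complement of the chance of none. A minimal Python sketch, assuming the rounded non-US rate of 0.025 per million (the function name is hypothetical):

```python
from math import expm1, log1p


def prob_at_least_one(n, p):
    """1 - (1 - p)**n, evaluated stably for tiny p and huge n."""
    return -expm1(n * log1p(-p))


NON_US_RATE = 0.025e-6   # per person, OECD excluding the United States (rounded)
p_small_country = prob_at_least_one(5_000_000, NON_US_RATE)

print(f"P(at least one shooting in five years): {p_small_country:.1%}")
```

The result is a little over one in ten, so among a handful of countries of this size it would be unsurprising for one or two to record an incident purely by chance.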


To get meaningful information from data concerning mass shootings, it is necessary to be aware of statistical fluctuations that result from an insufficient number of incidents. Once that is done, it becomes obvious that the rate of mass shootings in the United States is significantly higher than in the other OECD countries.

Of course, this isn't an academic exercise. Nobody will be surprised to learn that there is political motivation behind these attempts to demonstrate that the United States doesn't have horrendous incidences of mass shootings and other gun crimes. If the United States has levels of gun violence comparable with the rest of the world, there is certainly no need for gun-safety legislation.

Our political arena is open enough to debate these issues. But the debate should not involve the abuse of statistics. If people are opposed to gun-safety legislation, they should own the consequences of that opposition rather than try to hide those consequences behind a bogus interpretation of statistics.

Actual lives are at stake.

(c) Copyright Charles
