Probability and Significance

We often see headlines that say “No Link Between Cancer and Nuclear Plants” or claim that nuclear power is safe. One reason for this is a failure to understand the results of various studies, or even gross misrepresentation of those results. This post gives some basic information about probability and the statistical significance of scientific studies.

If you really want to understand the risks of nuclear accidents or the effects of radiation, then I would recommend that you look into probability in much more depth. For example, see the Khan Academy video series on probability.

This post is not about risk, which is a related topic that I will probably (I will not give a numerical probability at this point) write about later.

Anyway here is a basic introduction.

What Is Probability

We are often faced with uncertainty, but that uncertainty is not always unlimited. For example, if I throw a normal die I do not know what number will come up, but I do know that it will be a number between one and six.

What are my chances of throwing a three? Since there are six equally likely outcomes, I have a one in six chance of getting a three. This basically means that if I throw the die many, many, many times then about one sixth (0.167 or 16.7%) of these throws will be a three.

Dice Throwing — Results of throwing a dice 10, 100 and 1000 times

The graph above shows the outcomes of different numbers of dice throws.

As you can see with only ten throws I did not get any sixes but I did get three fives and three fours. What we would expect is that after many throws we would get approximately an equal number of each outcome.

Notice that even after 1000 throws there is still a noticeable difference between the numbers of 1s, 2s, 3s, 4s, 5s and 6s thrown, but the proportion of each is approaching the expected 1/6 = 0.167.
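If you want to try this yourself, here is a minimal sketch in Python that throws a simulated fair die and reports what fraction of throws showed each face. The function names and the random seed are my own choices for illustration, and your counts will differ from the graph because the throws are random.

```python
import random
from collections import Counter


def throw_die(n_throws, rng):
    """Throw a fair six-sided die n_throws times and count each face."""
    return Counter(rng.randint(1, 6) for _ in range(n_throws))


def proportions(counts):
    """Convert face counts into the fraction of throws showing each face."""
    total = sum(counts.values())
    return {face: counts[face] / total for face in range(1, 7)}


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed (arbitrary) so a run can be repeated
    for n in (10, 100, 1000, 100_000):
        props = proportions(throw_die(n, rng))
        row = "  ".join(f"{face}: {props[face]:.3f}" for face in range(1, 7))
        print(f"{n:>7} throws  {row}")
```

With more throws the fractions should settle closer and closer to 0.167, although they will never match it exactly.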

What is the probability of throwing two 6s in a row?

Because one throw does not affect the next, this is just the probability of getting a 6 on the first throw (one in six) times the probability of getting a 6 on the second throw (again one in six), that is 1/6 x 1/6 = 1/36.

What about five 6s in a row? That would be 1/6 x 1/6 x 1/6 x 1/6 x 1/6 = 1/7776.
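The same multiplication can be checked with exact fractions. This is only a small sketch, and the function name is my own.

```python
from fractions import Fraction


def prob_all_sixes(n_throws):
    """Probability that n_throws independent throws of a fair die are all 6s."""
    return Fraction(1, 6) ** n_throws


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, "throws:", prob_all_sixes(n))
```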

Having thrown five 6s in a row, what is the probability of throwing another 6? Surely it is extremely unlikely. Well, the probability of getting a 6 on the next throw is still 1/6. We have already thrown the five 6s: it has actually happened (i.e. its probability is now 1), and the die has no memory of it.
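One way to convince yourself of this is to list every possible sequence of six throws, keep only those that start with five 6s, and see how many of them end in a 6 as well. The sketch below does that by brute force, assuming a fair die; the function name is my own.

```python
from fractions import Fraction
from itertools import product


def prob_six_after_run(run_length):
    """Among all equally likely sequences of run_length + 1 throws, look only
    at those that begin with run_length 6s and return the fraction of them
    whose final throw is also a 6."""
    starts_with_run = 0
    also_ends_with_six = 0
    for seq in product(range(1, 7), repeat=run_length + 1):
        if all(face == 6 for face in seq[:run_length]):
            starts_with_run += 1
            if seq[-1] == 6:
                also_ends_with_six += 1
    return Fraction(also_ends_with_six, starts_with_run)


if __name__ == "__main__":
    print(prob_six_after_run(5))
```

Six sequences start with five 6s (one for each possible last throw) and exactly one of them ends in a 6, so the answer is 1/6 however long the run was.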

Is The Die Fair

I throw the die ten times and do not notice anything unusual. I throw one hundred times and notice that I seem to get quite a few 6s. I begin to wonder if the die is fair. I throw one thousand times and now I am pretty sure that the die is not fair i.e. I am getting too many 6s.

However, I cannot be totally sure, since it could still be chance that I get this result. In fact, even with millions of throws I cannot be sure that the die is not fair, because I may get a large number of 6s purely by chance.

Significance

However, it is possible for me to quantify my uncertainty about how fair the die is. I will not go into the mathematics of this (this is better explained elsewhere) but I will show how this can be done.

First of all I will assume that the die is fair. This is called my Null Hypothesis. Now I can work out the probability of getting a result at least as extreme as the one I saw if the die really is fair. Say that I do the calculation and get a probability of 0.05 (5%). I may use that to decide the die is not fair.

However, I may want to be more certain and only conclude that the die is not fair if the probability of getting my result from random chance alone is 0.01 (1%) or less.

This probability is often used in statistics and is called the p value.
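For the die, this calculation is short enough to sketch in Python. The version below is one reasonable choice rather than the only one: it assumes I am only worried about too many 6s (a one-sided test) and adds up the exact binomial probabilities of getting at least the observed number of 6s from a fair die. The function name and the example counts (22 sixes in 100 throws, 200 sixes in 1000 throws) are made up for illustration.

```python
from math import comb


def p_value_too_many_sixes(n_throws, n_sixes):
    """Probability of at least n_sixes sixes in n_throws throws of a FAIR die.

    Each way of getting exactly k sixes has probability (1/6)^k * (5/6)^(n-k),
    and there are comb(n, k) such ways. Integer arithmetic keeps the sum exact
    until the final division.
    """
    favourable = sum(
        comb(n_throws, k) * 5 ** (n_throws - k)
        for k in range(n_sixes, n_throws + 1)
    )
    return favourable / 6 ** n_throws


if __name__ == "__main__":
    # Hypothetical observations, both a bit above the expected one in six.
    print("22 sixes in 100 throws:   p =", p_value_too_many_sixes(100, 22))
    print("200 sixes in 1000 throws: p =", p_value_too_many_sixes(1000, 200))
```

Both examples have roughly the same excess of 6s as a proportion, but the larger number of throws gives a much smaller p value, which is why I become more suspicious of the die the longer I keep throwing.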

If we get a p value of 0.05 (5%) then that does not mean that there is a 95% probability that the Null Hypothesis is false. We calculated this figure assuming that the Null Hypothesis is true. All it tells us is that there is a 5% chance of getting a result at least this extreme assuming that the Null Hypothesis is true.

Neither does a very low p value mean that the study is correct. The study can have many other faults which would invalidate the results.
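Because the p value is worked out assuming the Null Hypothesis is true, a perfectly fair die will still give a "significant" result now and then. The sketch below repeats an experiment of 60 throws many times with a simulated fair die and counts how often the p value comes out at 0.05 or less. The experiment size, the number of repeats, the one-sided test for too many 6s and the function names are all my own assumptions.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def p_value(n_throws, n_sixes):
    """Probability of at least n_sixes sixes in n_throws throws of a fair die."""
    favourable = sum(
        comb(n_throws, k) * 5 ** (n_throws - k)
        for k in range(n_sixes, n_throws + 1)
    )
    return favourable / 6 ** n_throws


def false_alarm_rate(n_experiments, n_throws, alpha, rng):
    """Fraction of fair-die experiments whose p value is alpha or less."""
    alarms = 0
    for _ in range(n_experiments):
        sixes = sum(rng.randint(1, 6) == 6 for _ in range(n_throws))
        if p_value(n_throws, sixes) <= alpha:
            alarms += 1
    return alarms / n_experiments


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed so the run can be repeated
    print(false_alarm_rate(2000, 60, 0.05, rng))
```

The fraction should be no more than about 5%, and every one of those "significant" results comes from a die that is fair by construction.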

As an example of how the p value can be used, consider a study of cancer clusters near a nuclear plant. The clustering could just be due to chance, so you take this as your Null Hypothesis. You can then work out the p value: the probability of seeing clustering at least this strong if chance alone were at work, with no effect from the nuclear plant or whatever you are looking at.

Now you have to decide at what point you reject the Null Hypothesis (i.e. that the clustering is not due to the nuclear plant). Do you reject it if the p value is 5% (p=0.05) or 10% (p=0.1)? Personally I would use a much higher figure and say that even if there was a 50% chance of seeing such clustering from random variation alone, this is an important finding. (I would not necessarily use this alone to reject nuclear power on health grounds. It is a bit more complicated and I will go into this when I write about risk.)

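Unfortunately, what often happens is that a study is done and the significance of the data does not meet the required level (5%). This is then reported as ‘No Link Between Cancer and Nuclear Plants’, even though failing to reach significance only means the result could plausibly be due to chance, not that a link has been ruled out.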
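The die can illustrate why "not significant" is not the same as "no effect". Suppose the die really is loaded so that a 6 comes up one time in five instead of one in six (a made-up figure, purely for illustration). The sketch below works out how often an experiment with that loaded die would reach the 5% level, using a one-sided exact binomial test. The function names are my own and this is only one way of setting up the calculation.

```python
from fractions import Fraction
from math import comb


def upper_tail(n, k, p):
    """Exact P(at least k sixes in n throws) when P(six) = p (a Fraction)."""
    a, b = p.numerator, p.denominator
    total = sum(comb(n, i) * a ** i * (b - a) ** (n - i) for i in range(k, n + 1))
    return Fraction(total, b ** n)


def critical_count(n, p_null, alpha):
    """Smallest number of sixes whose p value under the null is alpha or less."""
    a, b = p_null.numerator, p_null.denominator
    denominator = b ** n
    cumulative = 0
    # Add up probabilities from the most extreme count downwards and stop
    # as soon as the running tail probability exceeds alpha.
    for k in range(n, -1, -1):
        cumulative += comb(n, k) * a ** k * (b - a) ** (n - k)
        if Fraction(cumulative, denominator) > alpha:
            return k + 1
    return 0


def chance_of_detection(n, p_null, p_true, alpha):
    """Probability that an experiment with the loaded die reaches significance."""
    return float(upper_tail(n, critical_count(n, p_null, alpha), p_true))


if __name__ == "__main__":
    fair = Fraction(1, 6)
    loaded = Fraction(1, 5)  # hypothetical loaded die
    for n in (100, 1000):
        print(n, "throws:", chance_of_detection(n, fair, loaded, Fraction(1, 20)))
```

With only 100 throws, most experiments with the loaded die fail to reach the 5% level, so most of them would be reported as finding nothing, even though the die is not fair. A small study that does not reach significance tells you very little either way.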