How many times has someone asked you, “Is that result statistically significant?”  As a researcher I’ve probably heard it 6,321 times, +/- 30, with a p-value of .02.

I’m arguing here for some humility and some qualitative thinking on testing.  First, the argument for humility, aimed especially at those who consider themselves data analysts or quant jocks.  Suppose your test comes back significant at p = 0.01.  Mark each statement below true or false.

  • You have absolutely disproved the null hypothesis (that is, there is no difference between the population means). [] true/false []
  • You have found the probability of the null hypothesis being true. [] true/false []
  • You have absolutely proved your experimental hypothesis (that there is a difference between the population means). [] true/false []
  • You can deduce the probability of the experimental hypothesis being true. [] true/false []
  • You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision. [] true/false []
  • You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions. [] true/false []

This quiz was given to 44 psychology students, 39 professors and lecturers of psychology, and 30 statistics teachers.  Every professor taught null-hypothesis testing and every student had successfully passed one or more statistics courses in which it was taught.

All six statements are false.  [Chart: percentage of participants in each group who endorsed one or more of the six false statements about the meaning of “p = 0.01” (Gigerenzer et al. 2004; Haller and Krauss 2002)]

Here is an accurate way to think about significance testing.  It has the benefit of being non-technical and humble, and it fosters better conclusions and better next steps with donor dollars.

Saying something is statistically significant is akin to saying there is some reason to believe the test idea works.   The operational meaning is that we should repeat the test.

This does happen, or at least it did, in the ol’ days of direct mail, when the larger-volume players ran test, re-test and rollout.
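For a rough sense of why the re-test earns its keep, here is a back-of-the-envelope sketch in Python.  It is not from the studies cited above; it is my own illustration, and it leans on a generous assumption: that the true effect is exactly as big as the one the first test observed, and that the re-test is the same size.  The function name `replication_probability` is made up for this post.  Under those assumptions it estimates how often an identical re-test would come back significant in the same direction.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def replication_probability(p_observed: float, alpha: float = 0.05) -> float:
    """Chance an identical re-test is significant at `alpha` (two-sided
    threshold, same direction as the first result).

    ASSUMPTION (illustrative only): the true effect equals the effect seen
    in the first test, and the re-test has the same sample size.
    """
    # z-score that corresponds to the first test's two-sided p-value
    z_observed = std_normal.inv_cdf(1 - p_observed / 2)
    # z-score the re-test has to beat to be called significant
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # Under the assumption, the re-test z-score is Normal(z_observed, 1)
    return 1 - std_normal.cdf(z_critical - z_observed)


# First test came in at p = 0.01
print(round(replication_probability(0.01, alpha=0.05), 2))  # 0.73
print(round(replication_probability(0.01, alpha=0.01), 2))  # 0.5
```

Even with that generous assumption, a p = 0.01 winner repeats at the usual 0.05 threshold only about three times in four, and clears the 0.01 bar again about half the time.  That is a long way from the 99% in the last quiz statement, and real first-test winners tend to overstate the true effect, so actual re-tests are likely to do worse.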

One reason all this matters?  Too many wannabe behavioral scientists mistake one experiment that produced “statistically significant” results for proof of a universal law and established truth.  Nonsense.  People are messy and complicated; misinterpreting statistical significance and declaring victory suggests they aren’t.

Kevin

 
