P-value
In statistical hypothesis testing, the p-value of a random variable T used as a test statistic is the probability that T will assume a value at least as extreme as the observed value t_observed, given that the null hypothesis under consideration is true. "More extreme" means less favorable to the null hypothesis: depending on the test, that can mean greater than the observed value, less than it, or further away from a specified center.
For example, suppose that a simple null hypothesis is rejected if a test statistic T exceeds a critical value c, and that in a particular case T was observed to equal t_observed. Then the p-value in that case is the probability, computed under the null hypothesis, that T would equal or exceed t_observed.
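The three senses of "at least as extreme" can be made concrete with a small sketch. It assumes, purely for illustration, that T follows a standard normal distribution under the null hypothesis; the function names and the `tail` parameter are hypothetical and not taken from any particular library.

```python
import math

def normal_upper_tail(t):
    """P(T >= t) for a standard normal T, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def p_value(t_observed, tail="greater"):
    """p-value of t_observed, assuming T is standard normal under the null.

    tail="greater":   extreme means larger than observed
    tail="less":      extreme means smaller than observed
    tail="two-sided": extreme means further from the center 0
    """
    if tail == "greater":
        return normal_upper_tail(t_observed)
    if tail == "less":
        return normal_upper_tail(-t_observed)
    if tail == "two-sided":
        return 2.0 * normal_upper_tail(abs(t_observed))
    raise ValueError("tail must be 'greater', 'less' or 'two-sided'")

if __name__ == "__main__":
    for tail in ("greater", "less", "two-sided"):
        print(tail, p_value(1.96, tail))
```

With t_observed = 1.96, the two-sided p-value comes out close to 0.05 and the upper-tail p-value close to 0.025, which matches the familiar normal-table values.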
The p-value does not depend on unobservable parameters, but only on the data; that is, it is observable, and so it is itself a "statistic." In classical frequentist inference, one rejects the null hypothesis if the p-value is smaller than a number called the level of the test. In effect, the p-value itself is then being used as the test statistic. If the level is 0.05, then the probability that the p-value is less than 0.05, given that the null hypothesis is true, is 0.05, provided the test statistic has a continuous distribution. In that case, the p-value is uniformly distributed on the interval from 0 to 1 if the null hypothesis is true.
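The uniformity claim can be checked by simulation. The sketch below assumes a standard normal test statistic and an upper-tail test, draws many values of T under the null hypothesis, and converts each to a p-value; the function names, the number of experiments and the seed are arbitrary illustrative choices.

```python
import math
import random

def upper_tail_p(t):
    """P(T >= t) for a standard normal T (assumed null distribution)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def simulate_null_p_values(n_experiments=20000, seed=0):
    """Draw T under the null hypothesis and return the resulting p-values."""
    rng = random.Random(seed)
    return [upper_tail_p(rng.gauss(0.0, 1.0)) for _ in range(n_experiments)]

p_values = simulate_null_p_values()

# Fraction of experiments that would be rejected at the 0.05 level.
rejection_rate = sum(p < 0.05 for p in p_values) / len(p_values)

# Count p-values falling in each tenth of the interval [0, 1].
decile_counts = [0] * 10
for p in p_values:
    decile_counts[min(int(p * 10), 9)] += 1

if __name__ == "__main__":
    print("rejection rate at level 0.05:", rejection_rate)
    print("share of p-values per decile:",
          [count / len(p_values) for count in decile_counts])
```

Under these assumptions the rejection rate should be near 0.05 and each decile should hold roughly one tenth of the p-values, up to simulation noise.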
Frequent misunderstandings
There are several common misunderstandings about p-values. All of the following statements are FALSE:
a) The p-value is the probability that the null hypothesis is true, which justifies the "rule" of treating p-values close to 0 (zero) as significant.
Comment: In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. Comparison of Bayesian and classical approaches shows that p can be very close to zero while the posterior probability of the null is very close to unity. This is the Jeffreys-Lindley paradox.
b) The p-value is the probability of falsely rejecting the null hypothesis. (This error is a version of the prosecutor's fallacy.)
Comment: Suppose one selects the 5% significance level. The 5% Type I error rate is an average over all outcomes that lead to rejection, that is, over all p-values in the range 0 to 0.05. It is not the error rate attached to any one observed p-value. If, after carrying out the calculation, the p-value turns out to be, say, 0.049999, then by the calibration of Sellke, Bayarri and Berger (2001) the error rate conditional on that observation is at least about 29%. On the other hand, if the p-value is very close to zero, then the conditional error rate is much lower than 5%.
c) The p-value is the probability that a replicating experiment would not yield the same conclusion.
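The figure of about 29% in the comment on (b) can be reproduced with a short calculation. The sketch below assumes that the figure comes from the calibration in the reference cited at the end of this article, which, as recalled here, bounds the Bayes factor in favor of the null hypothesis from below by -e p ln(p) for p < 1/e and converts that bound into a conditional error rate. The function names are illustrative only.

```python
import math

def bayes_factor_bound(p):
    """Lower bound -e * p * ln(p) on the Bayes factor in favor of the null.

    Assumed form of the calibration; only meaningful for 0 < p < 1/e.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("calibration assumed valid only for 0 < p < 1/e")
    return -math.e * p * math.log(p)

def conditional_error_bound(p):
    """Lower bound on the error rate conditional on observing p-value p."""
    b = bayes_factor_bound(p)
    return b / (1.0 + b)

if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(p, round(conditional_error_bound(p), 3))
```

For p = 0.05 the bound is roughly 0.29, while for p = 0.001 it falls below 0.05, in line with the remark that very small p-values correspond to error rates well under 5%.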
Reference
Sellke, T., Bayarri, M.J. and Berger, J. (2001). "Calibration of P-values for Testing Precise Null Hypotheses". The American Statistician 55, 62-71.