GitS 2 : John Friend – A lack of Confidence

This post, on confidence intervals, is by frequent commenter, John Friend. It is the second of our guest posts; the first, by Anthony Harradine, is here. A version of John’s post is also available as a PDF, here.

A Lack of Confidence

The Wald Interval, commonly called the ‘Standard Interval’, is the confidence interval most commonly presented in introductory statistics textbooks for the population proportion \boldsymbol{p}. It is the confidence interval prescribed in the VCE Study Design for Mathematical Methods (Word), and takes the form

    \[\boldsymbol{\left(\hat p - k\sqrt{\frac{\hat p\left(1-\hat p\right)}{n}},\ \hat p + k\sqrt{\frac{\hat p\left(1-\hat p\right)}{n}}\right)\,.}\]

Here, \boldsymbol{n} is the sample size, \boldsymbol{\hat p} is the observed sample proportion and \boldsymbol{{\mbox{\bf Pr\!}}\,(-k < Z <k) = \frac{C}{100}}, where \boldsymbol{C} is the confidence level and \boldsymbol{Z} is the standard normal random variable.
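For readers who want to see where the familiar value of \boldsymbol{k} comes from, here is a minimal Python sketch that finds \boldsymbol{k} from the confidence level \boldsymbol{C}. The function name critical_value is just a label chosen for this example; the calculation uses the inverse normal function in Python's standard library, and a CAS inverse normal command would do the same job.

```python
from statistics import NormalDist

def critical_value(confidence_percent):
    """Return k such that Pr(-k < Z < k) = C/100 for a standard normal Z."""
    # A central area of C/100 leaves (1 - C/100)/2 in each tail,
    # so k is the quantile at 1/2 + C/200.
    return NormalDist().inv_cdf(0.5 + confidence_percent / 200)

print(round(critical_value(95), 3))  # 1.96
print(round(critical_value(99), 3))  # 2.576
```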

Problems with the Wald interval

The Wald interval possesses a number of defects:

1. In the special cases \boldsymbol{\hat p = 0} or \boldsymbol{\hat p = 1}, the Wald Interval has zero width, and thus disappears. The Wald Interval also performs very poorly when \boldsymbol{\hat p \approx 0} or \boldsymbol{\hat p \approx 1}.

2. Intervals can have ‘overshoot’. For example, when \boldsymbol{n = 30} and \boldsymbol{\hat p = 0.9}, the approximate 95% Wald interval is (0.793,1.007).

3. The Wald Interval often performs very poorly in practical scenarios, in that the coverage probability (see below) is often less than the nominal confidence level (e.g. the coverage of the 95% Wald Interval is often less than 95%). That is not good, since a confidence interval is only useful if its coverage is reasonably close to the level claimed for it.
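The first two defects are easy to reproduce. The following is a minimal Python sketch of the Wald formula given earlier; the function name wald_interval and the default \boldsymbol{k = 1.96} (for a nominal 95% level) are choices made for this illustration only.

```python
from math import sqrt

def wald_interval(p_hat, n, k=1.96):
    """Wald interval: p_hat -/+ k * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = k * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)

# Defect 2: with n = 30 and p_hat = 0.9 the upper endpoint overshoots 1.
low, high = wald_interval(0.9, 30)
print(round(low, 3), round(high, 3))  # 0.793 1.007

# Defect 1: the interval has zero width when p_hat is 0 (or 1).
print(wald_interval(0.0, 30))  # (0.0, 0.0)
```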

Note: The probability that an interval contains or covers the true value of an unknown parameter is called the coverage probability. It is a property of the procedure that produces the interval. The interval produced for a particular sample, using a procedure with coverage probability \boldsymbol{\frac{C}{100}}, is said to have a confidence level of \boldsymbol{C}. The coverage probability of intervals such as the Wald Interval can be investigated by simulating random sampling from a population with a known value of \boldsymbol{p}. A confidence interval is constructed for each of the random samples to see how many such confidence intervals actually ‘cover’ (include) \boldsymbol{p}.
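The simulation described in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name simulated_wald_coverage, the number of trials, the seed, and the choice of \boldsymbol{n = 20} are all arbitrary, and it is not the code used to produce the plots in this post.

```python
import random
from math import sqrt

def simulated_wald_coverage(p, n, k=1.96, trials=20_000, seed=0):
    """Estimate the coverage probability of the Wald interval by simulation."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        # Draw one random sample of size n from a population with proportion p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = k * sqrt(p_hat * (1 - p_hat) / n)
        # Does the interval built from this sample contain the true p?
        if p_hat - margin < p < p_hat + margin:
            covered += 1
    return covered / trials

for p in (0.05, 0.2, 0.5):
    print(p, simulated_wald_coverage(p, n=20))
```

With \boldsymbol{p = 0.05} and \boldsymbol{n = 20}, more than a third of samples contain no successes at all, and each of those produces a zero-width interval that misses \boldsymbol{p}, so the estimated coverage falls far below the nominal 95%.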

Simulated coverage probability of the nominal 95% Wald Interval

(From Five Confidence Intervals For Proportions You Should Know About – Dr Dennis Robert Aug 2020.)

Alternatives to the Wald Interval

Given the imposition of statistical inference onto Mathematical Methods by VCAA, the ubiquity of CAS technology in VCE mathematics and the ‘black box’ (or should that be ‘black CAS’) approach to calculating confidence intervals, it is puzzling that the Wald Interval is the only confidence interval mentioned in the VCE Study Design for Mathematical Methods, and even more puzzling that its defects are not mentioned.

Elsewhere, it has been strongly recommended that instructors present the Wilson Interval (see the Appendix) as a better alternative:

    \[\boldsymbol{\left( \tilde{p} - \frac{ k\sqrt{\frac{\hat p\left(1-\hat p\right)}{n} + \frac{k^2}{4n^2} } } {1+\frac{k^2}{n}}    ,\ \tilde{p} + \frac{ k\sqrt{\frac{\hat p\left(1-\hat p\right)}{n} + \frac{k^2}{4n^2} } } {1+\frac{k^2}{n}}\right)\,,}\]

where the midpoint, \boldsymbol{\tilde{p}}, of the Wilson Interval is given by

    \[\boldsymbol{\tilde{p} = \frac{\hat p+\frac{k^2}{2n}}{1+ \frac{k^2}{n}}\,.}\]

See, for example, Approximate is Better Than “Exact” for Interval Estimation of Binomial Proportions, by Alan Agresti and Brent Coull (The American Statistician, 52: 2, 119-126, 1998).

The Wilson Interval does not suffer from any of the above-stated defects of the Wald Interval, and in particular its coverage probability is superior:

Simulated coverage probability of the nominal 95% Wilson Interval

(From Five Confidence Intervals For Proportions You Should Know About – Dr Dennis Robert Aug 2020.)
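As a rough illustration of the Wilson formula, here is a minimal Python sketch. The function name wilson_interval and the default \boldsymbol{k = 1.96} (for a nominal 95% level) are choices made for this example.

```python
from math import sqrt

def wilson_interval(p_hat, n, k=1.96):
    """Wilson interval, written as midpoint -/+ half-width."""
    denom = 1 + k**2 / n
    midpoint = (p_hat + k**2 / (2 * n)) / denom
    half_width = k * sqrt(p_hat * (1 - p_hat) / n + k**2 / (4 * n**2)) / denom
    return (midpoint - half_width, midpoint + half_width)

# Same sample as the Wald overshoot example: n = 30, p_hat = 0.9.
low, high = wilson_interval(0.9, 30)
print(round(low, 3), round(high, 3))  # 0.744 0.965

# The interval keeps a positive width even when p_hat = 0.
print(wilson_interval(0.0, 30))
```

For \boldsymbol{n = 30} and \boldsymbol{\hat p = 0.9} the upper endpoint stays below 1, and for \boldsymbol{\hat p = 0} the interval runs from (essentially) 0 to about 0.11 rather than collapsing to a point.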

Nonetheless, many instructors might hesitate to present such a complicated formula in an elementary statistics course. A simpler alternative is the Agresti-Coull Interval:

    \[\boldsymbol{\left(\tilde p - k\sqrt{\frac{\tilde p\left(1-\tilde p\right)}{n}},\ \tilde p + k\sqrt{\frac{\tilde p\left(1-\tilde p\right)}{n}}\right)\,,}\]

where \boldsymbol{\tilde{p}} is again the midpoint of the Wilson Interval, given above.

The Agresti-Coull Interval never has zero width, although it can still overshoot slightly when \boldsymbol{\hat p} is at or very near 0 or 1. Its coverage probability is superior to that of the Wald Interval, although not as good as that of the Wilson Interval:

Simulated coverage probability of the nominal 95% Agresti-Coull Interval

(From Five Confidence Intervals For Proportions You Should Know About – Dr Dennis Robert Aug 2020.)
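The Agresti-Coull formula is just as easy to compute. The sketch below follows the formula as written in this post, with \boldsymbol{n} under the root. Agresti and Coull's paper uses \boldsymbol{n + k^2} in that position, so the sketch offers that as an option. The function name agresti_coull_interval and the flag use_adjusted_n are hypothetical names chosen for this illustration.

```python
from math import sqrt

def agresti_coull_interval(p_hat, n, k=1.96, use_adjusted_n=False):
    """Agresti-Coull interval centred on the Wilson midpoint p_tilde.

    use_adjusted_n=False puts n under the root, as in this post's formula;
    use_adjusted_n=True puts n + k**2 there, as in Agresti and Coull's paper.
    """
    p_tilde = (p_hat + k**2 / (2 * n)) / (1 + k**2 / n)
    size = n + k**2 if use_adjusted_n else n
    margin = k * sqrt(p_tilde * (1 - p_tilde) / size)
    return (p_tilde - margin, p_tilde + margin)

# Same sample as the Wald overshoot example: n = 30, p_hat = 0.9.
print(agresti_coull_interval(0.9, 30))
print(agresti_coull_interval(0.9, 30, use_adjusted_n=True))
```

For this sample both versions give an upper endpoint below 1, and the version with \boldsymbol{n + k^2} is slightly narrower.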

Conclusion

Although VCAA has imposed statistical inference onto Mathematical Methods, the Wald Interval is the only confidence interval mentioned in the Study Design. Given the obvious defects of the Wald Interval and the ubiquity of CAS technology, it is bewildering that superior intervals, such as the Wilson Interval and the Agresti-Coull Interval, are not also considered.

 

Appendix:  Derivation of the Confidence Interval Formulae 

Here, we derive the \boldsymbol{C\%} confidence intervals for the population proportion \boldsymbol{p}. The following two assumptions on the sample size \boldsymbol{n} are made:

1) The sample size is ‘small’ relative to the population size. Under this assumption, the distribution of the sample proportion \boldsymbol{\widehat P}, which is a random variable, can be approximated by the binomial distribution.

2) The sample size is ‘large’ enough (see below) that the Normal approximation to the Binomial distribution can be used:

    \[\boldsymbol{\widehat P \sim \mbox{\bf Norm}\!\left(\mu = p,\sigma = \sqrt{\frac{p(1-p)}{n}} \right)}\,.\]

Note: The standard conditions for the Normal distribution to be a good approximation to the Binomial distribution are \boldsymbol{np>5} and \boldsymbol{n(1-p)>5} (or, even better, \boldsymbol{np>10} and \boldsymbol{n(1-p)>10}). It will not be known if these conditions are met, however, because the population proportion \boldsymbol{p} is not known.

In summary, the sample size is assumed to be simultaneously small enough that the binomial approximation can be used, and large enough so that the normal approximation to the binomial distribution can be used.

Now, let

    \[\boldsymbol{\mbox{\bf Pr}(-k<Z<k) = \frac{C}{100}\,,}\]

where \boldsymbol{Z} is the standard normal random variable. Then, with the assumptions above, we can substitute

    \[\boldsymbol{Z = \frac{\widehat P - \mu}{\sigma}= \frac{\widehat P - p}{\sqrt{\frac{p(1-p)}{n}}}\,,}\]

giving

    \[\boldsymbol{\mbox{\bf Pr}\left(-k<\frac{\widehat P - p}{\sqrt{\frac{p(1-p)}{n}}}<k\right) = \frac{C}{100}\,.}\]

The idea now is to somehow ‘invert’ the inequalities

(1)   \[\boldsymbol{-k<\frac{\widehat P - p}{\sqrt{\frac{p(1-p)}{n}}}<k}\]

in order to ‘trap’ the population proportion \boldsymbol{p} between lower and upper values, \boldsymbol{L} and \boldsymbol{U}:

(2)   \[\boldsymbol{\mbox{\bf Pr}\left(L(\widehat P,k,n)<p<U(\widehat P,k,n)\right)\,.}\]

(2) is not a standard probability statement, because the population proportion \boldsymbol{p} is not a random variable. Rather, (2) defines a random interval

    \[\boldsymbol{\left(L(\widehat P,k,n),U(\widehat P,k,n)\right)\,,}\]

which contains the fixed but unknown population proportion \boldsymbol{p} with probability \boldsymbol{\frac{C}{100}}.

The substitution into this random interval of an observed value \boldsymbol{\hat p} of \boldsymbol{\widehat P} (calculated from a sample) gives the \boldsymbol{C\%} confidence interval for \boldsymbol{p}. This constitutes the realisation of this random interval. The differing methods of inverting (1), which underlie this realisation, result in differing confidence interval formulas.

The Wald Interval

Approximate (the unknown) \boldsymbol{\sqrt{p(1-p)}} in (1) by \boldsymbol{\sqrt{\hat p\left(1-\hat p\right)}}:

    \[\boldsymbol{-k<\frac{\widehat P - p}{\sqrt{\frac{\hat p\left(1-\hat p\right)}{n}}}<k\,.}\]

Note: This is not a realisation of a random interval. It is an approximation that is used solely to avoid the cumbersome algebra in solving exactly for \boldsymbol{p}, and is unnecessary when CAS technology is so ubiquitous.

Solving the inequalities for \boldsymbol{p} gives the random interval

    \[\boldsymbol{\left(\widehat P - k\sqrt{\frac{\hat p\left(1-\hat p\right)}{n}},\ \widehat P + k\sqrt{\frac{\hat p\left(1-\hat p\right)}{n}}\right)\,.}\]

The realisation of this random interval, by substituting the observed value \boldsymbol{\hat p} for \boldsymbol{\widehat P}, gives the Wald or ‘Standard’ Interval.

The Wilson Interval

To realise the Wilson Interval, we exactly invert the inequalities (1) for \boldsymbol{p}:

    \[\boldsymbol{\begin{aligned}& && -k<\frac{\widehat P - p}{\sqrt{\frac{p(1-p)}{n}}}<k\\[3\jot] &\Longleftrightarrow && -k\sqrt{\frac{p(1-p)}{n}}< \widehat P - p <k\sqrt{\frac{p(1-p)}{n}}\\[3\jot] &\Longleftrightarrow && \left(\widehat P - p\right)^2 <k^2\frac{p(1-p)}{n}\\[3\jot] &\Longleftrightarrow && {\widehat P}^2 -2p\widehat P + p^2 < \frac{k^2}{n}p - \frac{k^2}{n}p^2\\[3\jot] &\Longleftrightarrow && \left(1+ \frac{k^2}{n}\right)p^2 - \left(2\widehat P+\frac{k^2}{n}\right)p + {\widehat P}^2 < 0\,.\end{aligned}}\]

This is a standard quadratic inequality for \boldsymbol{p}, with an interval \boldsymbol{(L,U)} of solutions. The endpoints of the interval are obtained by solving the corresponding quadratic equation for \boldsymbol{p}:

    \[\boldsymbol{p = \frac{\left(2\widehat P+\frac{k^2}{n}\right)\pm \sqrt{\left(2\widehat P+\frac{k^2}{n}\right)^2 - 4\left(1+ \frac{k^2}{n}\right){\widehat P}^2}}{2\left(1+ \frac{k^2}{n}\right)}\,.}\]

Expanding, cancelling and factorising, the quantity within the root simplifies to \boldsymbol{4k^2\!\left(\frac{\widehat P\left(1-\widehat P\right)}{n}+ \frac{k^2}{4n^2}\right)}. Then, taking the \boldsymbol{4k^2} out of the root, and dividing top and bottom by \boldsymbol{2}, we obtain

    \[\boldsymbol{p = \frac{  \left(\widehat P+\frac{k^2}{2n}\right)  \pm    k\sqrt{ \frac{\widehat P\left(1-\widehat P\right)}{n} + \frac{k^2}{4n^2} } }{\left(1+ \frac{k^2}{n}\right)}\,.    }\]

The realisation of this random interval, by substituting the observed value \boldsymbol{\hat p} for \boldsymbol{\widehat P}, produces the Wilson Interval, as given above.
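For anyone wishing to check the ‘expanding, cancelling and factorising’ step by hand, the quantity within the root in the quadratic formula above can be worked through term by term as follows:

    \[\boldsymbol{\begin{aligned}\left(2\widehat P+\frac{k^2}{n}\right)^2 - 4\left(1+ \frac{k^2}{n}\right){\widehat P}^2 &= 4{\widehat P}^2 + \frac{4k^2\widehat P}{n} + \frac{k^4}{n^2} - 4{\widehat P}^2 - \frac{4k^2{\widehat P}^2}{n}\\[3\jot] &= \frac{4k^2\,\widehat P\left(1-\widehat P\right)}{n} + \frac{k^4}{n^2}\\[3\jot] &= 4k^2\!\left(\frac{\widehat P\left(1-\widehat P\right)}{n}+ \frac{k^2}{4n^2}\right)\,.\end{aligned}}\]

The \boldsymbol{4{\widehat P}^2} terms cancel, and the two remaining terms in \boldsymbol{\widehat P} combine to give the factor \boldsymbol{\widehat P\left(1-\widehat P\right)}.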

The Agresti-Coull Interval

Substitute the midpoint \boldsymbol{\tilde{p} = \frac{\hat p+\frac{k^2}{2n}}{1+ \frac{k^2}{n}}} of the Wilson Interval into (1) before inverting. The realisation of the resulting random interval produces the Agresti-Coull Interval, as given above.

13 Replies to “GitS 2 : John Friend – A lack of Confidence”

  1. Thank you for this post, JF.

    “The interval produced for a particular sample, using a procedure with coverage probability \frac{C}{100}, is said to have a confidence level of C.” – To my knowledge, it is standard to take confidence levels to lie between 0 and 1, not 0 and 100. Perhaps one should also clarify in this context that the confidence level of a procedure for a confidence interval is always supposed to be given a priori; as is implied by the more complete term “nominal confidence level” used elsewhere in the text.

    “(2) is not a standard probability statement, because the population proportion p is not a random variable.” – I find this statement a little confusing. Why should a statement of the sort {\mathrm P}(X < a < Y), with X,Y random (which is the situation we are in here, as the subsequent sentence makes clear), not be "standard" in probability?

    One should be aware – even if this may not be suitable for a secondary-level classroom situation – that a big part of the problem that motivates those interval corrections comes not from the different ways of inverting an equation, but from the (non-)closeness of the standardized/normalized sums obtained from the binomial distribution to the normal. It is just more unstable than one may think when used to docile Galton boards.

    The post uses sometimes {\widehat p} and sometimes {\widehat P}. Perhaps this is an issue introduced in transcription onto this page. I tend to slightly favor the lower-case version, if only this has become standard.

    A final note: I once encountered the topic of this post (as a novice!) when, together with others, I was contemplating a revised textbook for introductory statistics teaching at university (in Australia). The corrected intervals found their way into the revision. I recall clearly that the author told me personally at that time that they included the "more complicated" (non-Wald) interval so that students who use whatever statistical software could make sense of why the results from those may differ from what the Wald procedure gives. I found this argument convincing. (We did not adopt the revision though, for other reasons.)

    1. Thanks for your comments, Christian. I was hoping this blog might stimulate some discussion. You probably know a lot more stats than I do. Nevertheless I will try to address the points you’ve raised and am happy to be corrected in the process.

      1) From my reading, confidence levels are typically referred to as 95% CI, 99% CI etc. So a 95% CI corresponds to a coverage probability of 95/100. And when the 95% is nominal (and my understanding of this is that it is – roughly – a claim that 95 out of 100 intervals constructed from 100 different samples will contain the parameter), for the Wald interval the coverage probability is often less than 95%.

      2) My understanding is that, at the introductory level, a ‘standard’ probability statement has the form Pr(a < X < b). As opposed to the probability of a parameter being trapped between two random variables. Perhaps I should have chosen a different phrasing to indicate that it's not the sort of probability statement that one might meet in, for example, secondary school mathematics.

      3) I agree that interval corrections mainly arise from "the (non-)closeness of the standardized/normalized sums obtained from the binomial distribution to the normal". With the Wald Interval you are using an approximation of an approximation of an approximation. This is certainly not made clear in the VCE Study Design or textbooks, where one is led to believe that the interval is infallible.

      4) P-hat is a random variable, p-hat is a number. I'm not sure what favouring p-hat means … What happens to the random variable P-hat?

      But … the main point of this particular blog is that there are better confidence intervals than the standard Wald interval. If a syllabus such as that of VCE Mathematical Methods is going to include/impose confidence intervals, then I think it should either include/mention the better intervals or, at the very least, mention the shortcomings of the Wald interval. The Maths Methods syllabus could easily do this but it doesn't. I wonder if the people who wrote the syllabus are even aware of any of this.

      Some questions: Have Maths Methods teachers come across questions where the, say, 95% CI for a proportion has an endpoint either less than 0 or greater than 1? Are teachers aware this can happen? Have teachers had students ask about this? Have teachers contemplated what implications a sample proportion p-hat = 0 or 1 has for the Wald Interval? Or had students ask?

      Statistical inference is a much more complicated business than the VCE Study Design would have students and teachers believe. I think it's ridiculous to include statistical inference in VCE mathematics and again I wonder whether VCAA has the faintest idea how totally dumbed down and appalling its syllabus is.

      And I didn't even mention the Clopper-Pearson interval, which is certainly accessible within the scope of the course and using a CAS.

  2. Thank you for your reply, John. Your intent of the post was clear and my focus on those relatively minor points was perhaps rather tangential; yet I hope that I was staying on the right side of pedantry! My final paragraph in my last post will have at least given “some” hint of how the discussion about those corrected intervals made its way into statistical education in large classes (such as the one I was teaching back then). While I am not familiar with what happened on that front in Australia in the past decade, I am sure that what you write is at least a pretty good approximation to the (lamentable) truth.

    Quickly on your responses to the points:
    1) I think that 95% means 0.95. Once one accepts this, our issue seems to disappear.
    2) I understand that you used “standard” probability in a loose way, without a definition of “standard”. There is of course nothing wrong with that. (Slightly off-topic, in this sense, even a probability such as {\mathrm P}(X<Y), which is definitely useful, should probably be termed "non-standard" – and requires the consideration of bivariate distributions.)
    4) I seem to have seen {\widehat P} very rarely in print, if at all. It is perhaps because of (i) tradition (it is different with {\widebar X} and {\widebar x} in the case of continuous random variables X_1,X_2,\ldots) and (ii) the sloppiness in using the same notation for a random confidence interval and its realisation, with fixed numbers as its limits, is not bothering people too much.

    1. Thanks, Christian.

      1) I’ve rarely seen, say, a 95% CI referred to as a 0.95 CI … Even Wikipedia (admittedly not always totally reliable) talks about 95% CI etc. Maybe it’s a ‘generational’ thing …? But I would agree it’s a less ‘misleading’ name.

      2) Re: “\displaystyle {\mathrm P}(X<Y)." Yes, I must admit that these statements do occur in VCE, but they quickly (and correctly) get converted into statements such as \displaystyle {\mathrm P}(X-Y<0). What's not seen in VCE is the probability that a parameter is trapped between two random variables. I think this is a pity because in the context of confidence intervals it shows how the CI should be interpreted (via where the interval actually comes from). As I remarked earlier, the inclusion of statistical inference into Mathematical Methods, as set out in the Study Design, is diabolical and does more harm than good if the goal is for students to have an understanding of the concept. Such things should be learnt in a specialised statistics subject.

      4) Re: "(ii) the sloppiness in using the same notation for a random confidence interval and its realisation, with fixed numbers as its limits, is not bothering people too much."

      This bothered/confused me for a long time. It was only when I first read the textbook Into Statistics by Peter J Smith (an Australian textbook, Smith was a lecturer at RMIT) that I saw the clear distinction made and the idea of realising the interval. Things I'd read in other textbooks made sense after this because I could see what was missing.

  3. Thanks, John.
    Regarding 1), I tend to think that this may be a difference between a theoretical setup, which ultimately draws on probability – and probabilists tend to prefer to give probabilities as numbers between 0 and 1, not percentages – and practical language. I agree that a statement like “0.95 CI” would sound odd. It would be much less odd to say “CI [or confidence region in higher dimensions] at level p = 0.95” in a numerical study in some statistics paper. Others may disagree with me here.
    Regarding 2), perhaps we have found another example of why a good grounding in probability does help with statistics – an issue that Marty highlighted on some occasion (or several) in this blog. Unless one has that (or deals with independent normally distributed random variables, say), the gain obtained by writing {\mathrm P}(X<Y) as {\mathrm P}(X-Y<0) is IMHO negligible. (An idea for a post by Marty?)
    Sorry that the overbars in {\bar X } and {\bar x} didn't render correctly in my previous post.
    Thanks for your indulgence in all this side-tracking. I hope some core issues of John’s post will be discussed by others.

    1. Hi Christian.

      Not side-tracking at all!

      Re: “It would be much less odd to say “CI [or confidence region in higher dimensions] at level p = 0.95” in a numerical study in some statistics paper.”
      Can you give an example of where you’ve seen this language used (I’m not disagreeing with you, I’ve just never seen it said like this anywhere).

      Re: “the gain obtained by writing \displaystyle {\mathrm P}(X<Y) as \displaystyle {\mathrm P}(X-Y<0) is IMHO negligible."

      Yes, that will often be so in undergraduate courses. But in VCE (Specialist Maths), there is significant gain. X and Y will typically be independent normal random variables, in which case X – Y is normal with a readily calculated mean and standard deviation. Then computing {\mathrm P}(X<Y) as {\mathrm P}(X-Y<0) is a standard calculation. Although the calculation will be done using a CAS, the background behind the calculation is clear. (Unlike many questions where buttons get pressed with little understanding as to why.) Of course, in undergraduate courses X and Y will often NOT be independent and often not normal, in which case I agree that there's no advantage. But that's due to other techniques – not learnt in VCE – being able to be used. Which of course is yet another argument for why this stuff should be taught as a separate subject rather than an ad hoc add-on to Methods and Specialist. When it comes to these sorts of things, what's taught in both those subjects is NOT mathematics. Statistical inference is completely misrepresented in VCE mathematics.

      1. Hi John,

        here is an example of a, well, approximate usage of “confidence level given as a number in the interval [0,1]” as you requested; see Figure 2 in that paper. (It uses the erroneous spelling “significant level” instead of “significance level”; the duality, or equivalence, of testing and confidence intervals/regions is used and, I hope, not confusing. At least, with “confidence” being mentioned in the figure caption, it does seem to me that we are within the parlance that we discussed here.)

        https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3852254/figure/pone-0081179-g002/

        I agree with what you say about probabilities such as {\mathrm P}(X<Y). Life in the "independence" world is much easier and going beyond it in secondary-level mathematics is, for the very most of the students, probably a Pandora's box. I guess some of the characters severely criticized in this blog would say that "convolution" (which is what we really have here, at least if we think of the replacement of Y by (-Y) as minor) cannot be marketed due to its negative connotations… leading them to do the right thing for the wrong reasons.

        1. PS: A clarification to my post: convolution is of course what’s behind the “independent” case, and the “non-independent” case is often far worse (unless we are in the normally-distributed vectors world) or even intractable.

        2. Thanks for this example, Christian.
          I – kind of – understand “the confidence region is constructed by setting the significant [sic] level for each test at …”

          I’ve never seen this language used when it’s referring to a confidence interval. What would we say for confidence intervals? Perhaps

          The confidence interval is constructed by setting the significance level for the test at 0.05.

          But problems with this would include:

          i) What test? A test is not used to construct a confidence interval. A sample is collected, a proportion calculated and substitution into a formula occurs. (I’d have to read more of the article to get a sense of the “tests” it’s using and how this relates back to the region).

          ii) It seems a complicated way of simply saying 95% confidence interval.

          iii) Is the 95% Wald interval significant at the 0.05 level of significance? In reality, I’m not sure it always is …

          Maybe
          The confidence interval is constructed at the 0.05 level of significance. As I said, a complicated way of saying 95% confidence interval …

          Anyway, I suppose the semantics is small beer when it’s all said and done. The point is that statistical inference in VCE mathematics is grossly misrepresented. I have no confidence in it or VCAA.

          The only meaning convolution can have when juxtaposed with VCAA is complicated and muddled. VCAA is expert at convolution.
