This one is like complaining about the deck chairs on the Titanic, but what the Hell. The WitCH is courtesy of John the Merciless. It is from the 2018 Specialist Mathematics Exam 2:
The Examiners’ Report notes the intended answer:
H0: μ = 150, H1: μ < 150
The Report indicates that 70% of students gave the intended answer, and the Report comments on students’ answers:
The question was answered well. Common errors included: poor notation such as H0 = 150 or similar, and not understanding the nature of a one-tailed test, evidenced by answers such as H1: μ ≠ 150.
23 Replies to “WitCH 20: Tattletail”
Nowhere in the question is “μ” used, but it appears to be expected in the answer. While it *is* convention to use “μ” to denote the mean of a normal distribution, it’s just like any other unknown; you could use “m” or “q” or “H_0” if you wanted, as long as you stated that you were referring to the mean (which neither the question nor the model answer does).
Thanks, Eddie. The Examiners tend to be nitpicking nitwits, but I imagine that any expression of the form “mean is” would have sufficed.
Why do a statistical test at all? Can’t we just ask the person (or persons) who somehow discovered that the population heights are normally distributed with a standard deviation of 15 cm? Surely they know the population mean as well. Or maybe that person wrote the question…
Thanks, SRK, and I agree entirely. This kind of nonsense seems standard in VCE’s appalling version of this appalling topic. However, there’s something else as well.
I also entirely agree, SRK.
There are dozens of contexts involving some sort of ‘treatment’ that could be used and hence validate actually doing a statistical test. There are certainly enough contexts to last (for both the NHT and November exams) until some new muppet comes along, decides that statistics is no longer the flavour of the month, deletes it from the syllabus and replaces it with some new clump of irrelevancy (perish the thought that mechanics would get beefed up again to include moments of inertia, coefficient of friction, limiting equilibrium, conservation of momentum etc. like it was back in the 70’s).
Now if the question had at least paid lip service to the idea that H0 is a “no effect” statement and H1 is an “effect” hypothesis, then there would at least be one less piece of crap.
Wouldn’t a two-tailed test be more appropriate? There’s no reason to hypothesise that the population mean differs in one direction only, in this case less than.
I absolutely agree. For me, that is a major piece of crap in the question. It’s possible that students who answered
H1: μ ≠ 150 cm
have a better understanding of hypothesis testing than whoever wrote the question (or the Examiners Report).
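To make the one-tailed versus two-tailed distinction concrete, here is a minimal sketch of a Z-test with known standard deviation. The values μ0 = 150 and σ = 15 come from the discussion above; the sample mean and sample size are made up for illustration, since the exam's figures are not reproduced here. The function name is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_values(sample_mean, mu0, sigma, n):
    """Return (z, p_lower, p_two_sided) for a Z-test with known sigma."""
    # Standardise the sample mean under H0: mu = mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    p_lower = std_normal.cdf(z)                # p-value for H1: mu < mu0
    p_two_sided = 2 * std_normal.cdf(-abs(z))  # p-value for H1: mu != mu0
    return z, p_lower, p_two_sided


# mu0 and sigma are from the thread; sample_mean and n are assumed values.
z, p_lower, p_two_sided = z_test_p_values(
    sample_mean=145.0, mu0=150.0, sigma=15.0, n=50
)
print(f"z = {z:.3f}")
print(f"one-tailed p (H1: mu < 150)  = {p_lower:.4f}")
print(f"two-tailed p (H1: mu != 150) = {p_two_sided:.4f}")
```

When the sample mean falls below μ0, the two-tailed p-value is exactly double the one-tailed one, so a result can be "significant" under H1: μ < 150 and not under H1: μ ≠ 150 at the same level. That is why the choice of alternative should be settled before looking at the data.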
While I was still at Monash, teaching a basic stats course with exam questions roughly such as this, a senior peer noted to me in personal conversation: students often confuse data with the underlying distribution. This confusion seems to be at the core of the question: measured data cannot differ (in)significantly from a distribution (at least in decent scientific parlance as ought to be taught in schools); only another distribution can. Characteristics of distributions, such as the mean in this case, may be used to assess the difference. But I repeat, it’s not about the actual measured data — which “sometimes” (even if rarely) may throw up a result suggesting the opposite of what’s true.
Marty, the sceptic in me thinks (and has since 2016) that the statistics questions on Specialist papers in particular are rarely much more than recipe-following in order to have the student average look good and thus reduce the complaints about the new material.
But on another note, I really wonder how someone thinks to “claim” that the mean height is 150cm.
Surely a more appropriate process would be:
1. Sample multiple times.
2. Determine a reasonable point-estimate for the heights.
3. Have someone else, who hasn’t seen the original data, take some measurements and make a similar estimate.
4. Then do your statistical tests, changing H0 if required to improve the estimate of both mean and variance.
RF, I agree with your observation that “… the statistics questions on Specialist papers in particular are rarely much more than recipe-following”. However I think the reason is more sinister:
The material is so trivial and mundane that it’s difficult to write an exam question worth 8 marks or so that is anything other than recipe-driven. Particularly, I suspect, for the exam writing panel (whose expertise in statistics I would seriously question …) Unless of course you start asking questions such as
“…. Find the minimum sample size such that the probability of a type 2 error is smaller than ….”
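A question of that kind might be sketched as follows. This is only an illustration under stated assumptions: a lower-tailed Z-test of H0: μ = 150 with σ = 15 (both from the discussion above), and a hypothetical true mean of 145, significance level of 0.05 and type 2 error bound of 0.1, none of which appear in the exam. The function names are made up.

```python
from math import sqrt
from statistics import NormalDist


def type2_error(n, mu0, mu1, sigma, alpha):
    """P(fail to reject H0 | true mean is mu1) for a lower-tailed Z-test."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean falls below this critical value.
    critical = mu0 + NormalDist().inv_cdf(alpha) * se
    # Under the true mean mu1, the sample mean is N(mu1, se^2).
    return 1 - NormalDist(mu1, se).cdf(critical)


def min_sample_size(mu0, mu1, sigma, alpha, beta_max):
    """Smallest n whose type 2 error probability is below beta_max."""
    if mu1 >= mu0:
        raise ValueError("a lower-tailed test needs mu1 < mu0")
    n = 1
    while type2_error(n, mu0, mu1, sigma, alpha) >= beta_max:
        n += 1
    return n


# mu1, alpha and beta_max are assumed values chosen for illustration.
n_min = min_sample_size(mu0=150.0, mu1=145.0, sigma=15.0, alpha=0.05, beta_max=0.1)
print(n_min, type2_error(n_min, 150.0, 145.0, 15.0, 0.05))
```

The brute-force search is deliberate: it mirrors what a student would do by trial on a calculator, and it agrees with the closed form n ≥ ((z_α + z_β)σ / (μ0 − μ1))² for this test.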
As for the ‘process’… I’d have less objection to the question’s ‘process’ if it said something like:
“… It was claimed several years ago that the mean height …. To decide whether the claim about the mean height is currently true …. A two-tailed statistical test …..”
Actually, there’s a further change of wording I’d use, but that would take away the fun of finding more crap in the question ….
As an aside, I find it irritating that the type of test put on the Specialist Maths course is not given its name: The Z-test. In my view, the constant use of the general phrase “statistical test” is likely to generate a misconception among students that this test is the only test. IF you’re going to put this stuff in the Study design, it wouldn’t have hurt to have included some qualitative understanding of the t-test (a simple statement that this is a test used when the standard deviation of the population is unknown and the sample size is small), as well as mentioning some other tests and when they might be used. You’d do this simply to ensure that students understood that there are many tests apart from the specific one taught.
Thanks JF. I have a family member (Marty may well remember meeting them) who is rather good at statistics and I recently showed him the NHT Specialist statistics question. He thought it was a reasonably good question until I showed him how to use the modern calculators that are not allowed at university level where he did most of his teaching. Technology also plays a big part in the problem (but what else is new)
Re: 2019 NHT exam (and 2018 Nov exam and most others). And because the Study Design simply states the nebulous “Errors in hypothesis testing”, making no explicit mention of Type 1 and Type 2 errors, you get the verbal diarrhea of Q6 part (f).
Yeah, that part seems forever gimmicky. I originally thought (the first time such a question appeared) that it was just to get an extra mark to make the total. Now, it seems a bit of a fixture.
Thanks, Everyone. The question is dumb, and pointless, in the ways you all indicate. I’ll update in the near future (using this blog’s definition of “near future”).
It also seems like crap to decide the alternative hypothesis AFTER taking the sample.
I totally agree, HA. Since when is it good statistical practice to collect your data first and then decide on what your hypotheses will be …? And note that the level of significance is never stated until *after* the sample is collected – very poor practice once again.
I’d like to expand on the earlier comment implying that probability and statistics in Specialist is a “… clump of irrelevancy …”
It’s very clear that no sensible thought was given to *how* the statistics in Specialist would mesh with the Statistics in Maths Methods – an understanding of the normal distribution is an essential pre-requisite and yet the normal distribution is not taught in Methods until after it is needed in Specialist (given that statistics is taught before mechanics) ….
Yes, there are ways around this but they are contrived and inconvenient:
1. Teaching normal distribution earlier in Methods doesn’t preserve the natural ‘flow’ of topics in Methods.
2. One could argue that Probability and Statistics could be taught last in Specialist so that the normal distribution has been covered in Methods – but the problem with this is that it essentially leaves Mechanics as the only topic mandated in the Study design for SAC 3 …. And mechanics has been cuckolded so severely by VCAA that trying to write a decent non-trivial mechanics SAC is doomed (at least probability and statistics has more creative scope ….)
For a bit of light relief check out this satirical cartoon on misuse of p-values leading to a ‘false positive’ family-wise error.
I’d like to see some statistical questions requiring lateral thought but I guess they’d be harder to design and mark….
Unless it’s outside the scope of Specialist, perhaps something involving ‘family-wise’ errors on false positives and misuse of p-values would be more interesting.
E.g. green jelly beans may cause acne?
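The family-wise error idea can be put in numbers with a short sketch. It assumes m independent tests, each run at level α, with every null hypothesis true; the choice of 20 tests at α = 0.05 is an illustrative assumption, and the function names are made up.

```python
def family_wise_error_rate(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_level(alpha, m):
    """Per-test level that keeps the family-wise rate at or below alpha."""
    return alpha / m


alpha, m = 0.05, 20  # assumed values for illustration
fwer = family_wise_error_rate(alpha, m)
corrected = family_wise_error_rate(bonferroni_level(alpha, m), m)
print(f"uncorrected family-wise error rate: {fwer:.3f}")
print(f"with Bonferroni correction:         {corrected:.3f}")
```

With these numbers the chance of at least one spurious "significant" result is well over one half, which is the whole joke: test enough jelly bean colours and one of them will appear to cause acne.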
Steve, the whole topic is outside the scope of Specialist. That’s why the questions are so predictably shallow and flawed.
On the SAC 3 I use the Probability and Statistics topic. Questions I put on it include the concepts of type 2 errors, the power of the test (although I don’t use the term power) and rejection regions for sample means (again, I don’t use the term rejection region although I *do* when teaching this stuff). I also include questions that require finding sample sizes given various criteria. I also include questions requiring the use of the Central Limit Theorem when samples are drawn from given populations with given distributions that are not normal.
I find using the concept of power is a good way of addressing the forced ‘generalisation’ whereby students have to choose their own values and ‘explore’ (the muppets at VCAA clearly have no comprehension of how difficult and time-consuming it is to mark such questions. It’s pretty easy to sit in the ivory tower and make proclamations when you’re not personally affected by such pontifications).
Next year I hope to include a scaffolded question that leads students through a proof that the sum of two independent standard normal random variables is another normal random variable. This result is obviously *given* – unproved – to students but I try to give a sense of the fact that it’s NOT true in general that the pdf of W = X1 + X2 belongs to the same ‘family’ as the pdf of X (for what it’s worth it’s only true if X is a *stable* random variable, e.g. a normal random variable).
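One possible route for such a proof, offered only as a sketch and not necessarily the scaffolding intended here, goes through moment generating functions. It assumes X1 and X2 are independent, and it relies on the standard fact that an MGF finite in a neighbourhood of 0 determines the distribution.

```latex
% Assumption: X_1 ~ N(\mu_1, \sigma_1^2) and X_2 ~ N(\mu_2, \sigma_2^2) are independent.
\begin{align*}
M_{X_i}(t) &= E\!\left[e^{tX_i}\right]
            = \exp\!\left(\mu_i t + \tfrac{1}{2}\sigma_i^2 t^2\right), \\
M_{X_1+X_2}(t) &= E\!\left[e^{tX_1}e^{tX_2}\right]
            = M_{X_1}(t)\,M_{X_2}(t) \quad \text{(by independence)} \\
           &= \exp\!\left((\mu_1+\mu_2)t + \tfrac{1}{2}(\sigma_1^2+\sigma_2^2)t^2\right).
\end{align*}
```

The last line is the MGF of a normal distribution with mean μ1 + μ2 and variance σ1² + σ2², so for two standard normals W = X1 + X2 is N(0, 2). By contrast, the sum of two independent uniform random variables on [0, 1] has a triangular pdf, which is not uniform.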
I have found more creative scope using this topic for SAC 3 than the Mechanics topic, which has been laid waste by VCAA. But it’s a shame that the topic is not more of a ‘mathematical statistics’ nature …. (For example, it wouldn’t be hard to include the moment generating function if some of the other stuff was deleted).
Related to all this, VCAA are ‘reviewing’ the curriculum …. How hard would it be to break the subjects up into Semester modules – You could have a calculus module, a functions module etc. and students could mix and match (perhaps under some given criteria of having to include modules of a specific nature). How good would it be to have a Calculus A module that covered all of the calculus done in Methods and Specialist so that there was no double dipping between subjects …. This would make for much more efficient teaching and you could actually do more as a result of this efficiency. It would take a lot of careful thought to set up, but I think the end result would be worth it. And of course you could have a Probability and Statistics A module, again being able to avoid the double dipping. You would allocate twice the teaching time to each module that is currently allocated to a single subject like Methods …. Anyway ….
In statistical problems, the phrase “differs significantly from” usually suggests a two-tailed test. I would have written “whether the sample mean height is significantly less than 150”.