This WitCH come courtesy of a smart VCE student. It concerns the newly instituted VCE subject of Algorithmics, and comes from the 2017 exam:
The Examiners’ Report indicates that half of students gave the intended answer of A, and notes
It is important for students to understand that Big-O notation describes an upper bound, and so is only used for analysis of worst-case running times.
18 Replies to “WitCH 21: Just Following Orders”
Where do I start? The worst of many inanities is the first phrase: “For extremely large values of n” As well as being meaningless (“large” is so context dependent that it means nothing here), big-O notation and its siblings all refer to limits; so that f(n) ∈ O(g(n)) iff there exists M > 0 and y so that |f(n)| y. For this question we have |f(n)| y, for some (integer valued) y. There is thus a colossal confusion about what big-O actually means. An algorithm is either in O(n) or it is not. Actual values of n are irrelevant to the definition.,
Talking about an algorithm that “runs in O(n) time” is a mis-use of the notation, and a misunderstanding. Big-O refers to an upper bound, so algorithms that run in log(n), log(log(n)), sqrt(n) time are all in O(n). You would never talk about an algorithm running in log(log(n)) time as being in O(n); it is true but useless. You could just as well say that an algorithm which runs in log(log(n)) time is in O(n^100). or in O(e^x) which are also true.
This is a clumsy, badly worded question which assumes that students must be as woolly in their thinking and understanding as the examiners.
Thanks, amca01. (Do you want me to edit your comment to fix the inequalities?) For you, does “algorithm running in O(n)” mean the same as “algorithm that is [in] O(n)”, or something different, or nothing?
I meant to quote the question, but I was so annoyed I misquoted! Do please feel free to make whatever changes you like.Also I think the editor, or the text box, ate some of my symbols, especially all text between two inequality symbols. I’ve only just noticed this. I’ll check it and aim for a fix tomorrow.
Thanks, amca01. I’ll let you take a look and will correct however once you’ve looked. I’m also still curious about the exam’s use of language, and your objection to it. What does it mean for an algorithm to be/be-in/be-running-in O(n)? Of course usage can vary, and since this is not my world, I’m not making judgment on good/bad or usual/unusual usage. But usage within a particular culture should be clear and consistent. And, for the life of me, I have no idea what VCAA Algorithmics thinks O(n) means in their context, or how they justify their usage.
Ok with A but less so with examiners note of ‘only worse case scenarios ‘
I think Rob Bell’s article provides a good introduction to Big O Notation from a programming perspective in terms of comparing the efficiency of algorithms on very large data sets
For mathematicians one application is
the Prime Number Theorem and the probability of N being prime for large N approaches 1/ln N
Thanks to Steve and amca01 for the comments. For me, the central question is, is answer B wrong? If so, then on the basis of what definition, and specifically on what definition that VCE students would be presumed to have access to?
I keep forgetting to set people right: where it says “|f(n)| y” in my original comment, what is meant is: |f(n)| is less than Mg(n) for all n greater than y. The trouble is that using the inequality signs here ate the middle part of the definition.
In any case O(g(n)) represents a set of functions: a function f(n) is either in the set O(g(n)) or it isn’t. It is thus meaningless to talk about an algorithm that “runs in O(n) time.” You could say that it runs in linear time, which is probably what the examiners meant, but O(n) is not a fancy alternative definition for linear.
Then there is the use of Omega in the last part of the question. I have no idea which competing definition of Omega the examiners might be using or mis-using; but in any case they would be better using Theta, which provides both upper and lower bounds, and so is tighter than big-O.
All very sloppy and inadequate.
Marti et al
I think A is the only correct answer …but have not read the ‘ curriculum definition of Big O ‘ so I could be wrong?
With B a linear search O(n) is likely to be slower than a binary search O(log n) for large n for an ordered data set for example . worse case scenario is n for linear
eg think finding your shirt size in a retailer’s rack. Better to start in the middle and binary chop rather than from the smallest size sequentially upwards in my case
Steve, “likely to be” has no logical weight. In any case, we agree it is a matter of (VCAA’s and others’) definition: what, if anything, is the meaning of “an algorithm running in O(n) time”?
running with the ‘worse case ‘ definitions given in Rob Bell’s beginners guide to O notation (link in previous comment)
an algorithm running in linear time O(n) on a lookup in a very large data set of size 2N will take approximately double the time as to a lookup on a dataset of size N Whereas one running in logarithmic time O(Logn) the time taken will only increase by approximately 1.
Sorry for being vague ‘likely to be’ ….but you can get lucky and pick the correct sized shirt from the size sorted rack first time with both methods in the retailer analogy but on average and worse case scenarios the binary chop will be quicker.
Thanks, Steve. Again, these are imprecise notions. “Very large” is not defined, and “running in linear O(n) time” is either redundant or (as seems to be suggested amca01), self-contradictory.
I’m not trying to be pedantic here, but this seems a case where formal limit definitions are required. VCE Algorithmics has Os and Thetas and Omegas, but I don’t see clear definitions. I’m not yet convinced that “algorithm running in O(n) time” has a sufficiently universally accepted definition. And, more to the point, I’m not convinced VCE Algorithmics students work with a clear definition. (Note the “upper bound” comment above on the Examiner’s Report. A 2018 exam question refers to “the big-O for the best case running time” of an algorithm.)
Marti et al,
My commercial experience of “large” tables/datasets has been querying warehouse fact tables ( > 10^12)
which have to be joined to several smaller dimension tables (<10^6).
Most queries on these would use a hashing function ( to create an ordered key) as well as partitioning ( chopping the sorted table into sections and using many cores to subdivide the processing task)
The overhead on creating the hash etc would be more than offset the gains by the O(log n) lookup
So even in this relatively small case a linear lookup O(n) would be too slow .
These days most querying software comes with a tuning workbench where you can trial different different methods of joining the various tables to gain a satisfactory result.
So yes you are correct …
These are inexact notions where empirical testing of your datasets will determine
the definition of large in each case
For Mathematicians checking for Mersene Primes with a billion digits then the efficiency of the testing algorithms is obviously critical . An application for Quantum Computing perhaps
Hi, Steve. I didn’t mean to imply that large and small don’t have (variable) meaning in the real world. The VCAA, for example, is a large pain in the ass. But for a senior level algorithmics subject using mathematical notation, you really want some clear definitions, and here that requires limits and/or strict bounds, not approximations. Which brings us back to the VCAA. Did I mention they’re a large pain in the ass?
How can students be expected to understand asymptotic classes (like a function or algorithm being in O(n)) without limits?
Thanks, Glen (and hi). At least here they can use upper and lower bounds. But, they don’t seem to, at least not clearly and consistently.
It appears that the examiners don’t understand O notation themselves, or have invested it with their own meaning. Either way, it’s shoddy treatment of both the mathematics and the students.
As a person who has done Algorithmics myself, I initially didn’t question the dubious treatment of the big O/Omega/Theta notation (with the concerns only being brought up by a smart IIO friend who pointed out the incorrectness). I cannot vouch for the credibility of it’s university equivalent subjects however it seems prevalent that programmers slot O notation to “worst case growth”, Omega to “best case growth” and Theta to “guaranteed case growth”. None of the formal mathematical definitions are given in the study design or any resources which amca01 succinctly summarizes. The statement “and so is only used for analysis of worst-case running times” was what VCAA attempts to convey as “the order of growth for the longest execution of an algorithm”. I’ve only recently begun trying to piece together my own definitive understanding of the notation with the help of a document I found online: http://web.mit.edu/16.070/www/lecture/big_o.pdf. The sheer irony (in my eyes) is that VCAA finds it appropriate to throw in a proof by induction pertaining to the sum of odd squares yet dismiss the mathematics required for rigorously defining the O notation. Scrutinizing the study design gives little to teachers/students as they are just considered “concepts”. Finally while other commenters have commented on it, the use of O notation can be summarized as a relation between the input size and how the time it takes to execute grows. The problem lies within the lack of rigor defining the actual system so students see it as a “sophisticated maths term” to answer the question (that is how I see it).
Thanks, MK. It really is a mess. Of course different academic communities can use words and notation in different ways, with more or less consistency and rigour, as the community deems worthwhile. But it does appear that common usage of big O for algorithms differs from the mathematical definition of big O, at least in some sub-communities. The pure mathematicians (combinatoricists) I talked to, however, were firmly of the opinion that big O means big O. Whatever the general understanding, of course you’re right, that it is the responsibility of the VCAA to clearly state the usage in their subject, and to be consistent with that usage.