This is just a straight post, framed around answering the question:
How does one introduce-explain integration by substitution to high school students?
That is the question, but I’ll declare from the outset that I cannot answer it. What I will do is explain as clearly as I can why integration by substitution works in the form(s) in which we use it. It is then up to the teacher to decide how much of this “why” message, if any, is required or helpful for their students. (It is not at all clear to me that delving into the proper “why” of substitution will have much meaning or benefit for more than a few school students.)
The post was motivated by a related request on a recent WitCH. Also, having pondered and hunted through the blog, I notice that frequent commenter SRK made a similar request long ago, and there was a related WitCH. The extensive discussion on those posts may be of interest.
THE BASIC MEANINGS
Just so we’re all on the same page, the only things we’re considering in this post are antiderivatives: there is no calculation of areas, no fundamental theorem of calculus. I shall use the term “integration” and integral notation because it is common to do so, but the word and notation properly refer to the summing up of bits, which is not what we’re doing here.
So, the function F is an antiderivative of f if F’ = f. We then use the integral sign to represent the general antiderivative:
$$\int f(x)\,dx = F(x) + c.$$
(Just as a function may be referred to as f(x) or simply as f, the dx notation in integrals is optional, and I’ll use it or not as seems to be clearer.)
An alternative name for this general antiderivative is indefinite integral. Then, the definite integral indicates for us the evaluation of the antiderivative at the “endpoints”:
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
Again, there is no “integration” here, no computation of areas. It is almost solely definition and notation. The only substantive point is to recognise that any two antiderivatives of f differ by a constant, which is intuitive but takes a proof. Then, this +c, whatever it is, cancels out in the evaluation of the definite integral, implying it doesn’t matter which antiderivative we happened to choose.
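To make the cancelling of the +c concrete, here is a small illustrative example; the function 2x and the endpoints 1 and 3 are chosen purely for convenience. Since x² is an antiderivative of 2x, every antiderivative of 2x has the form x² + c, and
$$\int 2x\,dx = x^2 + c, \qquad \int_1^3 2x\,dx = \left(3^2 + c\right) - \left(1^2 + c\right) = 8.$$
Whatever c we pick, the answer is 8.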
STATEMENTS OF INTEGRATION BY SUBSTITUTION
Integration by substitution in indefinite form is standardly presented as,
$$\int f(u)\,\frac{du}{dx}\,dx = \int f(u)\,du.$$
Here, on the left hand side, u is some (differentiable) function of x. On the right side, there is a double think: we antidifferentiate thinking of u as any old variable, and then, when done, we think of u as the given function of x.
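As an illustration of the double think, with the functions chosen purely for convenience, take u = x² + 1 and f(u) = u³, so that du/dx = 2x:
$$\int (x^2+1)^3\,2x\,dx = \int u^3\,\frac{du}{dx}\,dx = \int u^3\,du = \frac{u^4}{4} + c = \frac{(x^2+1)^4}{4} + c.$$
In the third equality we antidifferentiate u³ as if u were any old variable; only in the last equality do we remember that u = x² + 1.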
If we write u = g(x) explicitly as a function of x then substitution takes the form,
$$\int f(g(x))\,g'(x)\,dx = \int f(u)\,du.$$
And, the definite integral version takes the form,
$$\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du.$$
Note that the definite integral form requires no double-think: the u on the right hand side is simply a who-cares variable of integration. We can also do without x and u entirely, writing the definite integral equation more simply, more purely and less helpfully as
$$\int_a^b (f\circ g)\,g' = \int_{g(a)}^{g(b)} f.$$
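For a sketch of how the endpoints change, take g(x) = x² + 1 and f(u) = u³ on the interval from 0 to 1 (choices made here only for illustration). Then g(0) = 1 and g(1) = 2, and
$$\int_0^1 (x^2+1)^3\,2x\,dx = \int_1^2 u^3\,du = \frac{2^4}{4} - \frac{1^4}{4} = \frac{15}{4}.$$
There is no need to return to x: the right hand side is a computation in u alone.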
Finally, a quick word on the intermediary, dodgy line:
$$du = \frac{du}{dx}\,dx.$$
Whether or not one permits the dodgy line is really just a detail, since it is immediately followed by a non-dodgy line. It is, however, better to permit the dodgy line, because: (a) it works; (b) it helps; (c) it really annoys people who object to it.
JUSTIFICATION OF INTEGRATION BY SUBSTITUTION
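At its heart, of course, integration by substitution is simply the chain rule in reverse. The chain rule for the composition h(g(x)) is,
$$\frac{d}{dx}\,h(g(x)) = h'(g(x))\,g'(x).$$
The chain rule can then be written in antidifferentiation form as,
$$\int h'(g(x))\,g'(x)\,dx = h(g(x)) + c.$$
Or, with u = g(x), we can write the anti-chain rule as,
$$\int h'(u)\,\frac{du}{dx}\,dx = h(u) + c.$$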
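As a quick illustrative instance of the anti-chain rule (the functions are chosen only for convenience), the chain rule gives the derivative of e^{x³}, and reading that equation backwards gives an antiderivative:
$$\frac{d}{dx}\,e^{x^3} = e^{x^3}\cdot 3x^2 \quad\Longrightarrow\quad \int e^{x^3}\,3x^2\,dx = e^{x^3} + c.$$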
But now, given an integral of the form $\int f(u)\,\frac{du}{dx}\,dx$, it is easy to apply the anti-chain rule. All we need is to give a name to the antiderivative of f.
So, let’s write F for the (an) antiderivative of f: that is, F’ = f. Then, by the anti-chain rule,
$$\int f(u)\,\frac{du}{dx}\,dx = \int F'(u)\,\frac{du}{dx}\,dx = F(u) + c.$$
But also, just thinking of F as a straight antiderivative of f, we have,
$$\int f(u)\,du = F(u) + c.$$
Combining the two lines, and keeping in mind we think of u = g(x) after antidifferentiating, we have integration by substitution:
$$\int f(u)\,\frac{du}{dx}\,dx = \int f(u)\,du.$$
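To see the role of F in a specific case, chosen here only for illustration, take f = cos, so that F = sin, and g(x) = x². The anti-chain rule and the straight antiderivative give
$$\int \cos(x^2)\,2x\,dx = \sin(x^2) + c \qquad\text{and}\qquad \int \cos u\,du = \sin u + c,$$
and the two right hand sides agree once we set u = x². Here F = sin is visible throughout; in the general formula it does its job and then vanishes.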
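The other forms of the formula can be thought of and derived similarly. For example, again setting F’ = f, the definite integral form can be justified as follows:
$$\int_a^b f(g(x))\,g'(x)\,dx = \Big[F(g(x))\Big]_a^b = F(g(b)) - F(g(a)) = \Big[F(u)\Big]_{g(a)}^{g(b)} = \int_{g(a)}^{g(b)} f(u)\,du.$$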
Will this help? Probably not: the introduction of (and then disappearance of) the antiderivative F is not so easy to understand. So, it is not necessarily wrong to take a “looks kinda right” Leibniz shortcut, or to focus upon a specific chain rule or two. But, ideally, teachers should have some sense of why things are true, even if they then decide to not try to convey this sense to their students. And the sense, as best as I can express it, is the above.