## Saturday, May 14, 2011

### Big O Notation for Calculus

A brief explanation: I am teaching calculus, and it has been a problem how to introduce various concepts. The best way to approach things, I have found, is to

1. introduce big O notation as a way of indicating "higher order terms" in a more precise way, and
2. use the identity exp(ix)=cos(x)+i·sin(x) for calculating anything trig-related.

I am obliged by scholarly tradition to document other sources which have taken a similar approach. Donald Knuth's letter "Teach Calculus with Big O" (Notices of the American Mathematical Society 45 (6): 687, unabridged version) suggests heuristically introducing an expedient "big A" function. More importantly, Knuth writes:

Students will be motivated to use O notation for two important reasons. First, it significantly simplifies calculations because it allows us to be sloppy—but in a satisfactorily controlled way. Second, it appears in the power series calculations of symbolic algebra systems like Maple and Mathematica, which today's students will surely be using.

The blog Mathematics under the Microscope's April 14, 2008 post Donald Knuth: Calculus via O notation discusses the advantages and disadvantages of Big O in calculus (Kevin's comment on April 14, 2008 at 4:01 pm extensively notes the disadvantages when using "big A" as Knuth describes).

I take a completely different approach that is more historical. We have this number, i, which is not a real number but satisfies

 i2 = -1 (1)
which IS a real number!

Likewise, what if we introduce a "number" ε that's not zero but really small. Recall, if we have a number x between 0 and 1, we have

 0 < x2 < x (2)
but if we had something "smallest" we expect
 0 = ε2 < ε. (3)
Now, we should ask ourselves: is this really a good idea?

The serious problem is that we cannot divide by ε, which causes problems later on. There are two questions that come to mind:

1. How would we use ε even if we could divide by it?
2. We want to be able to divide by all of our numbers, so how do we rectify this problem?

Lets consider a really simple function

 f(x) = xn (4)
for some natural number nℕ. So what is f(x+ε) in terms of ε?

Suppose we didn't know. What can we do? Cheat! Well, mathematicians call it "induction on n", but it amounts to thinking about what happens when n is 1, 2, 3, and so on until we get a general formula. The trick is to treat ε as a variable, so when we expand f(x+ε), we just use high school algebra to do the calculation.

 (x+ε) = x+ε (5a) (x+ε)2 = x2+2εx+ε2 (5b)

But note that by equation (3) we have ε2=0, so really (5b) is

 (x+ε)2 = x2+2εx (5b')

We then consider

 (x+ε)3 = (x+ε)2(x+ε) (5c)

We plug in our result (5b') in the right hand side to find

 (x+ε)3 = (x2+2εx)(x+ε). (5c.i)

Then we just multiply the right hand side out as if they were polynomials

 (x+ε)3 = (x2+2εx)x+(x2+2εx)ε. (5c.ii)

We have

 (x2+2εx)x = x3+2εx2 (5c.iii)

for the first term, and

 (x2+2εx)ε = x2ε+2ε2x (5c.iv)

for the second term. Notice, however, we can simplify this

 x2ε+2ε2x = x2ε (5c.v)

since ε2=0. We can combine these results to conclude

 (x+ε)3 = x3+3εx2 (5c.vi)

It seems like the general pattern appears to be

 (x+ε)n = xn+nεxn-1 (6)

If we knew the binomial theorem, then immediately we can recognize this as true.

At any rate, the deviation from f(x) is then

 (x+ε)n - xn = nεxn-1 (7)

So why do we need this fancy number ε anyways? Well, if we didn't have it, then we would be working with some finite (positive or negative real) number

 Δx≠0. (8)

Let us try to reconstruct equation (7) with Δx instead of ε! We get

 (x+Δx)n - xn = nΔx·xn-1+(nonzero terms). (9)

Like anything bought from Ikea, we end up with bonus parts. But we don't want bonus parts! That's the problem.

The solution is to do what we just described, actually, to write "bonus parts" on the right hand side of the equation. But we can be a little more rigorous than this. We can describe the bonus parts. That's the beauty of big O notation.

Revisiting equation (9) we use big O notation to write it as

 (x+Δx)n - xn = nΔx·xn-1+O([Δx]2). (9')

The square brackets in O(...) are just to indicate we are considering (Δx)2 instead of Δ(x2). Here we indicate the behavior of the bonus parts. How? Well, what we are really saying is

 (x+Δx)n - xn = nΔx·xn-1+(Δx)2(cxn-2+...). (10)

For us, the parenthetic term on the right is a polynomial in Δx. But we don't care about it because Δx is "small" (or more precisely, close to 0).

So, what to do? We sweep it under the rug. We collect it all into a single term, which we simply denote by "O([Δx]2)".

Exercise: Prove that O(x2)=O(x)2.

Exercise: Prove although O(x)=O(x2) we cannot write O(x2)=O(x). That is, equality for big O terms turns out to be not reflexive.

Just to be absolutely clear, we are considering "small" Δx. That permits us to consider the big O notation for infinitesimal uses. The terminology is very much classical, it is beyond me to call it anything else. But the contributions of (Δx)n gets smaller and smaller as n gets bigger, so we can neglect the terms with factors of Δxn for n>1. How to be rigorously neglectful? By using O([Δx]2).

## The Derivative

So is this it? No! We can do more!

With a linear function, we can meaningfully talk about the slope (the rate of change). It turns out that the slope is the same everywhere. Can we talk about the slope of xn?

Lets try! We see that we really have introduced

 Δf(x) := f(x+Δx) - f(x) (11)

which we see is another function. The idea is that Δ is a "function" that turns a given function, say f(x), into another function precisely as described in equation (11).

Now we can consider

 Δf(x)/Δx = g(x)+O(Δx) (12)
for an attempt at describing the slope of f(x). Is this a good definition? Not really.

Why? Well, we still have the O(Δx) term to consider. So why not take Δx→0? Why not? Well, because we haven't introduced the notion of the limit yet!

So, if we are calculators (since we are studying calculus), we don't mind. We just write the left hand side of (12) as df/dx, and for the right hand side we set Δx=0 which kills the O(Δx) term.

Exercise: why does setting Δx to be 0 kill a O(Δx) term?

This is a bit dangerous, but symbolically calculating it out...this is what we do in practice.

Is this a good idea? It depends who you are. Historically, this is what was done more or less. People got concerned if it was rigorous and correct, which is how real analysis started out: rigorously define limits so we can take derivatives without accidentally dividing by zero.

But now, using this notion for the derivative, we can easily prove many theorems regarding its behavior. We will do this next time.

### Appendix: What is O(x)?

It turns out that O(x) is not a function. It's a set! More precisely, it's a convex cone, an exotic structured set that is important in functional analysis.