A brief explanation: I am teaching calculus, and it has been a problem how to introduce various concepts. The best way to approach things, I have found, is to

- introduce big O notation as a way of indicating "higher order terms" in a more precise way, and
- use the identity exp(i
*x*)=cos(*x*)+i·sin(*x*) for calculating anything trig-related.

I am obliged by scholarly tradition to document other sources which have taken a similar approach. Donald Knuth's letter "Teach Calculus with Big O" (Notices of the American Mathematical Society **45** (6): 687, unabridged version) suggests heuristically introducing an expedient "big A" function. More importantly, Knuth writes:

Students will be motivated to use O notation for two important reasons. First, it significantly simplifies calculations because it allows us to be sloppy—but in a satisfactorily controlled way. Second, it appears in the power series calculations of symbolic algebra systems like Maple and Mathematica, which today's students will surely be using.

The blog *Mathematics under the Microscope*'s April 14, 2008 post Donald Knuth: Calculus via O notation discusses the advantages and disadvantages of Big O in calculus (Kevin's comment on April 14, 2008 at 4:01 pm extensively notes the disadvantages when using "big A" as Knuth describes).

I take a completely different approach that is more historical. We have this number, i, which is not a real number but satisfies

i^{2} = -1 |
(1) |

*IS*a real number!

Likewise, what if we introduce a "number" ε that's not zero but really small. Recall, if we have a number *x* between 0 and 1, we have

0 < x^{2} < x |
(2) |

0 = ε^{2} < ε. |
(3) |

*really*a good idea?

The serious problem is that we cannot divide by ε, which causes problems later on. There are two questions that come to mind:

- How would we use ε even if we could divide by it?
- We want to be able to divide by all of our numbers, so how do we rectify this problem?

Lets consider a really simple function

f(x) = x^{n} |
(4) |

*n*∈ℕ. So what is

*f(x+ε)*in terms of ε?

Suppose we didn't know. What can we do? Cheat! Well, mathematicians call it "induction on *n*", but it amounts to thinking about what happens when *n* is 1, 2, 3, and so on until we get a general formula. The trick is to treat ε as a variable, so when we expand *f(x+ε)*, we just use high school algebra to do the calculation.

(x+ε) = x+ε | (5a) |

(x+ε)^{2} = x^{2}+2εx+ε^{2} | (5b) |

But note that by equation (3) we have ε^{2}=0, so really (5b) is

(x+ε)^{2} = x^{2}+2εx | (5b') |

We then consider

(x+ε)^{3} = (x+ε)^{2}(x+ε) | (5c) |

We plug in our result (5b') in the right hand side to find

(x+ε)^{3} = (x^{2}+2εx)(x+ε). | (5c.i) |

Then we just multiply the right hand side out as if they were polynomials

(x+ε)^{3} = (x^{2}+2εx)x+(x^{2}+2εx)ε. | (5c.ii) |

We have

(x^{2}+2εx)x = x^{3}+2εx^{2} | (5c.iii) |

for the first term, and

(x^{2}+2εx)ε = x^{2}ε+2ε^{2}x | (5c.iv) |

for the second term. Notice, however, we can simplify this

x^{2}ε+2ε^{2}x = x^{2}ε | (5c.v) |

since ε^{2}=0. We can combine these results to conclude

(x+ε)^{3} = x^{3}+3εx^{2}
| (5c.vi) |

It seems like the general pattern *appears* to be

(x+ε)^{n} = x^{n}+nεx^{n-1}
| (6) |

If we knew the binomial theorem, then immediately we can recognize this as true.

At any rate, the *deviation* from *f*(*x*) is then

(x+ε)^{n} - x^{n} = nεx^{n-1}
| (7) |

So why do we need this fancy number ε anyways? Well, if we didn't have it, then we would be working with some finite (positive or negative real) number

Δx≠0.
| (8) |

Let us try to reconstruct equation (7) with Δ*x* instead of ε! We get

(x+Δx)^{n} - x^{n} = nΔx·x^{n-1}+(nonzero terms).
| (9) |

Like anything bought from Ikea, we end up with bonus parts. But *we don't want bonus parts!* That's the problem.

The solution is to do what we just described, actually, to write "bonus parts" on the right hand side of the equation. But we can be a little more rigorous than this. We can describe the bonus parts. That's the beauty of big O notation.

Revisiting equation (9) we use big O notation to write it as

(x+Δx)^{n} - x^{n} = nΔx·x^{n-1}+O([Δx]^{2}).
| (9') |

The square brackets in O(...) are just to indicate we are considering (Δ*x*)^{2} instead of Δ(*x*^{2}).
Here we indicate the behavior of the bonus parts. How? Well, what we are really saying is

(x+Δx)^{n} - x^{n} = nΔx·x^{n-1}+(Δx)^{2}(cx^{n-2}+...).
| (10) |

For us, the parenthetic term on the right is a polynomial in Δ*x*. But we don't care about it because Δ*x* is "small" (or more precisely, close to 0).

So, what to do? We sweep it under the rug. We collect it all into a single term, which we simply denote by "O([Δ*x*]^{2})".

**Exercise:** Prove that O(*x*^{2})=O(*x*)^{2}.

**Exercise:** Prove although O(*x*)=O(*x*^{2}) we *cannot* write O(*x*^{2})=O(*x*). That is, equality for big O terms turns out to be not reflexive.

Just to be absolutely clear, we are considering "small" Δ*x*. That permits us to consider the big O notation for *infinitesimal* uses. The terminology is very much classical, it is beyond me to call it anything else. But the contributions of (Δ*x*)^{n} gets smaller and smaller as *n* gets bigger, so we can neglect the terms with factors of Δ*x*^{n} for *n*>1. How to be rigorously neglectful? By using O([Δ*x*]^{2}).

## The Derivative

So is this it? No! We can do more!

With a linear function, we can meaningfully talk about the slope (the rate of change). It turns out that the slope is the same everywhere. Can we talk about the slope of *x*^{n}?

Lets try! We see that we really have introduced

Δf(x) := f(x+Δx) - f(x)
| (11) |

which we see is another function. The idea is that Δ is a "function" that turns a given function, say *f*(*x*), into another function precisely as described in equation (11).

Now we can consider

Δf(x)/Δx = g(x)+O(Δx)
| (12) |

*f*(

*x*). Is this a good definition? Not really.

Why? Well, we still have the O(Δ*x*) term to consider. So why not take Δ*x*→0? Why not? Well, because we haven't introduced the notion of the limit yet!

So, if we are calculators (since we are studying *calcul*us), we don't mind. We just write the left hand side of (12) as d*f*/d*x*, and for the right hand side we **set** Δ*x*=0 which kills the O(Δ*x*) term.

**Exercise:** why does setting Δ*x* to be 0 kill a O(Δ*x*) term?

This is a bit dangerous, but symbolically calculating it out...this is what we do in practice.

Is this a good idea? It depends who you are. Historically, this is what was done more or less. People got concerned if it was rigorous and correct, which is how real analysis started out: rigorously define limits so we can take derivatives without accidentally dividing by zero.

But now, using this notion for the derivative, we can easily prove many theorems regarding its behavior. We will do this next time.

### Appendix: What is O(*x*)?

It turns out that O(*x*) is not a function. It's a set! More precisely, it's a *convex cone*, an exotic structured set that is important in functional analysis.

## No comments:

## Post a Comment