Thursday, May 26, 2011

Calculus program

I brought up the problem of coming up with calculus exercises previously. Out of laziness, I used Python and the SymPy package. The following program still needs to be modified so that the weighting scheme actually works better.

Right now it produces a specified number of complicated exercises, some of which are not even solvable (the integral exercises need to be reconsidered). But it's fun regardless!

It's dawned on me that it would be better if it had default options, so one could just hit [enter] to get to the daily dose of calculus. I'll revise the code to produce exercises that vary in difficulty and have this option added...

At any rate, here's the code:


from sympy import *
from sympy.abc import x # so x is a variable, we'll work with fns of it
import random

# returns a nonzero integer constant from -10 to 10
def constant():
    return random.choice([n for n in range(-10, 11) if n != 0])

# This method will generate a polynomial with random integer
# coefficients of a specified degree.
def polynomial(deg=7):
    if deg==0:
        return constant()
    elif deg>0:
        return (x**deg)*constant()+polynomial(deg-1)
    elif deg<0:
        return (x**deg)*constant()+polynomial(deg+1)

# There are 13 special functions:
# arccos, arcsin, arctan, cos, cosh, cot, coth, exp, ln,
# sin, sinh, tan, tanh
specialFn = {0 : acos,
             1 : asin,
             2 : atan,
             3 : cos,
             4 : cosh,
             5 : cot,
             6 : coth,
             7 : exp,
             8 : ln,
             9 : sin,
             10 : sinh,
             11 : tan,
             12 : tanh }

# Need to fix this method, which should produce exercises
# depending on the weight
def term(weight=30):
    if weight<=0:
        return constant()
    if random.randrange(0,10,1)%3==0:
        # randrange excludes its upper bound, so 13 is needed to reach tanh
        tmp = specialFn[random.randrange(0,13,1)]
        return term(weight-10)+tmp(polynomial(3+(weight%4)))
    elif random.randrange(0,10,1)%3==0:
        # integer division keeps the weights integral
        return term(2*weight//3)**term(weight//2)
    else:
        return term(weight-5)+polynomial()

# when printing stuff out, be sure to use "latex(...)"
try:
    numDeriv = int(input("How many derivative exercises do you want? "))
    numInt = int(input("How many integral exercises do you want? "))
except ValueError:
    print("Sorry, invalid input.")
    raise SystemExit(1)

# print out the exercises
print("\n")

# mode='equation*' is the keyword newer SymPy releases use in place of
# the old inline=False
if numDeriv>0:
    print("Calculate the derivatives of the following functions:")
    print("\n")
    for i in range(numDeriv):
        print(latex(term(), mode='equation*'))
    print("\n")

if numInt>0:
    print("Calculate the following integrals:")
    print("\n")
    for i in range(numInt):
        print(latex(Integral(term(),x), mode='equation*'))
    print("\n")

TODO: rewrite this so there is a better notion of "weight" to the problem. That is, the difficulty rating is more appropriate.

TODO: rewrite this so it produces a .tex file, which has two sections (problems and solutions), and the problems section has various subsections (e.g. "Derivatives," "Integrals," etc.).
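As a rough sketch of that second TODO (not a finished design), the layout could be assembled from plain strings. The name `build_tex` and its arguments are hypothetical; the LaTeX snippets would come from `latex(...)` in the program above, but here they are typed in by hand so the sketch stands on its own.

```python
def build_tex(sections, solutions):
    """Return LaTeX source with a Problems section and a Solutions section.

    sections:  list of (subsection title, [latex string, ...]) pairs.
    solutions: list of latex strings, in the same order as the problems.
    (Both argument shapes are assumptions made for this sketch.)
    """
    lines = [r"\documentclass{article}", r"\begin{document}",
             r"\section{Problems}"]
    for title, problems in sections:
        lines.append(r"\subsection{%s}" % title)
        lines.append(r"\begin{enumerate}")
        for problem in problems:
            lines.append(r"\item $\displaystyle %s$" % problem)
        lines.append(r"\end{enumerate}")
    lines.append(r"\section{Solutions}")
    lines.append(r"\begin{enumerate}")
    for solution in solutions:
        lines.append(r"\item $\displaystyle %s$" % solution)
    lines.append(r"\end{enumerate}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


# Hand-written sample data, only for illustration.
source = build_tex(
    [("Derivatives", [r"x^{2}", r"\sin(x)"]),
     ("Integrals", [r"\int 2x\,dx"])],
    [r"2x", r"\cos(x)", r"x^{2} + C"],
)
print(source)
```

Writing `source` to a file such as `exercises.tex` (a made-up name) is then a single `open(...).write(source)` call.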

Monday, May 16, 2011

Math Problems

I've been looking for a copy of the Kourovka notebook. It's a collection of group theory problems ranging from PhD thesis level to open research problems.

Born February 16, 1965, the first copy was handwritten at the First All-Union Symposium on Group Theory.

The name derives from the Kourovka tourist centre near Sverdlovsk (where the Symposium took place).

However, sadly enough, I cannot find a modern edition of the Kourovka notebook! I would be eternally grateful to anyone who can point me in the right direction toward obtaining a copy...

In its place, however, I have stumbled upon a few open source-ish gems:

  1. Open Problem Garden, a collection of open problems (it's a wiki: help it grow!)
  2. Open Problems In Mathematics And Physics, what appears to be an index of other people's "open questions."
  3. Wikipedia's Page, good old Wikipedia!
  4. Robert Wilson's Research problems, a collection of group theory problems. (Warning: JavaScript should be enabled for proper rendering, since MathJax is used!)

Sunday, May 15, 2011

Calculus problems

I have been wondering how to come up with good calculus problems. Well, a mathematical exercise in general is either (a) proof-based or (b) calculation-based. Sadly, calculus is a calculation-based subject. So it requires producing a larger quantity of problems rather than higher-quality problems.

But such a thing could be automated, couldn't it? Why not?

After reading Stefan Weinzierl's "Computer Algebra in Particle Physics" (arXiv:hep-ph/0209234v1) which documents the construction of a toy computer algebra system, it seems quite easy to come up with a program that would generate functions the student is expected to differentiate.

Conversely, by coming up with a method to carry out differentiation, it seems equally trivial to come up with such a problem, differentiate, then ask the student to integrate.

By a strange coincidence, Haskell has the ability to do symbolic differentiation (see SymbolicDifferentiation.hs or a better implementation Yrogirg describes on Enteropia).

Let's not get caught up with the particular Computer Algebra implementation details. I just wish to sketch some ideas.

We could then consider constructing a function in some manner. E.g., we allot some "weight" to a function depending on how time intensive it would be to calculate it.

The heavier a function is, the longer it takes to compute its derivative. So composing functions results in a heavier function than adding functions. The trick is to relate the weight to the number of operations it would take, and make certain that it is "reasonable" for undergraduates to do.

So a polynomial with k terms in it would have weight k; a special function would have weight wt(function)·wt(argument of special function), which depends on the function and its argument. Composing functions would just multiply their weights.

So the program would take in some positive integer, and return a function the student should differentiate. The bigger the integer, the harder the problem.
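To make the weighting idea concrete, here is a minimal sketch. The tuple encoding of expressions, the `FN_WEIGHT` table and its numbers, and the choice to add weights for sums are all my own assumptions for illustration; only the "k terms weigh k" and "multiply on composition" rules come from the description above.

```python
# Hypothetical per-function weights; the numbers are made up.
FN_WEIGHT = {"sin": 2, "cos": 2, "exp": 2, "ln": 3}


def weight(expr):
    """Weight of an expression given as nested tuples.

    ("poly", k)       -> polynomial with k terms, weight k
    ("fn", name, arg) -> special function, wt(function) * wt(argument)
    ("add", a, b)     -> sum, assumed here to add the weights
    """
    kind = expr[0]
    if kind == "poly":
        return expr[1]
    if kind == "fn":
        return FN_WEIGHT[expr[1]] * weight(expr[2])
    if kind == "add":
        return weight(expr[1]) + weight(expr[2])
    raise ValueError("unknown node: %r" % (kind,))


cubic = ("poly", 3)
print(weight(("fn", "sin", cubic)))                              # 2 * 3 = 6
print(weight(("add", cubic, ("fn", "ln", ("fn", "exp", cubic)))))  # 3 + 3*(2*3) = 21
```

With something like this, a generator could keep composing until the requested weight is reached, which is what makes composition "heavier" than addition.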

Conversely, if we can symbolically take derivatives, then we can present integration exercises by taking a derivative problem, computing its derivative, and giving that to the student to integrate. I think there should be a better way to come up with integration exercises though...
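For polynomials this "differentiate first, then ask for the integral" trick is easy to sketch without any computer algebra system. The coefficient-list representation and the function names below are assumptions for illustration only.

```python
import random


def random_poly(degree, rng):
    """Coefficient list [c0, c1, ..., c_degree] with a nonzero leading term."""
    coeffs = [rng.randint(-9, 9) for _ in range(degree)]
    coeffs.append(rng.choice([c for c in range(-9, 10) if c != 0]))
    return coeffs


def derivative(coeffs):
    """Differentiate term by term: c_k x^k becomes k c_k x^(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def integration_exercise(degree, rng):
    """Return (integrand, answer); the answer is unique up to its constant term."""
    answer = random_poly(degree, rng)
    return derivative(answer), answer


rng = random.Random(0)          # fixed seed so the run is repeatable
problem, answer = integration_exercise(4, rng)
print("integrate:", problem)
print("answer   :", answer, "(up to a constant)")
```

The exercises produced this way are always solvable, which is exactly what the random integrands in the program above do not guarantee.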


There are a number of wonderful resources on the internet with excellent problems; I'll just note a few of them:

  1. The Calculus Page Problems List
  2. Pi Project
  3. Exercises in Calculus

Differentiation with Big O notation

This is a follow up on the post regarding Big O Notation for Calculus. You will need MathML enabled in order to see this post properly...


1 Weird Numbers

1.1 Slope

Consider some linear function

g(x) = mx + b (1.1)

for some nonzero real number m, and an arbitrary real number b. We can calculate the slope by considering

$\Delta x \neq 0$ (1.2)

some constant "shift" in x, and using this to figure out the change in g

$\Delta g(x) = g(x + \Delta x) - g(x).$ (1.3)

What is this? Well, we plug in the definition of g to find

$\Delta g(x) = (m(x + \Delta x) + b) - (mx + b)$ (1.4)

which reduces to

Δg(x) = mΔx. (1.5)

Thus we may write the slope of g as

$m = \frac{\Delta g(x)}{\Delta x}$ (1.6)

which is independent of both x and the choice of Δx.
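A quick sanity check of that independence, using exact rational arithmetic so there is no rounding noise. The particular values of m and b are arbitrary choices of mine.

```python
from fractions import Fraction


def slope(g, x, dx):
    """Change in g divided by the shift dx, as in (1.6)."""
    return (g(x + dx) - g(x)) / dx


m, b = Fraction(3), Fraction(-2)     # arbitrary illustrative values


def g(x):
    return m * x + b


samples = [slope(g, Fraction(x), Fraction(dx))
           for x in (-5, 0, 7)
           for dx in (1, 2, Fraction(1, 10))]
print(samples)   # every entry equals m
```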

Can we do this in general for a polynomial

$f(x) = x^n$ (1.7)

for some $n \in \mathbb{N}$? Let us try! We consider some nonzero Δx term, and we write (for n > 1)

$f(x + \Delta x) = (x + \Delta x)^n = x^n + n x^{n-1} \Delta x + (\Delta x)^2 (\text{bonus parts})$ (1.8)

where the "bonus parts" are other stuff. Actually by the binomial theorem, it would have to be a polynomial in Δx with a nonzero constant term. This information is really encoded in

$(\Delta x)^2 (\text{bonus parts}) = O((\Delta x)^2)$ (1.9)

where O() is a more rigorous way of saying "bonus parts at least quadratic in Δx." This gives us a more precise way to specify the error when writing out terms at most linear in Δx.

Problem 1.1. What is $\Delta x\, O(\Delta x)$? What is $(\Delta x)^{-1} O((\Delta x)^2)$?

We see that we are abusing notation and writing

$h(x)\Delta x + (\Delta x)^2 (\text{bonus parts}) = h(x)\Delta x + O((\Delta x)^2)$ (1.10)

for some h(x). So by dividing through by Δx we obtain

h(x) + (Δx)( bonus parts) = h(x) + O(Δx). (1.11)

This implies

$(\Delta x)^{-1} O((\Delta x)^2) = O(\Delta x)$ (1.12)

and similar reasoning suggests

$(\Delta x)\, O((\Delta x)^2) = O((\Delta x)^3).$ (1.13)

So let us go on with our considerations.

We then have

$f(x + \Delta x) = f(x) + n x^{n-1} \Delta x + O((\Delta x)^2),$ (1.14)

where we were slick and noted the definition of f in order to plug it in. So, we can rewrite this as

$f(x + \Delta x) - f(x) = n x^{n-1} \Delta x + O((\Delta x)^2)$ (1.15)

and we want to divide both sides by Δx. But we know how to do this now! First we will write

$\Delta f(x) = f(x + \Delta x) - f(x)$ (1.16)

as shorthand, and rewrite our equation as

$\Delta f(x) = n x^{n-1} \Delta x + O((\Delta x)^2).$ (1.17)

We divide both sides by Δx

$\frac{\Delta f(x)}{\Delta x} = n x^{n-1} + O(\Delta x).$ (1.18)

But we have a problem that we didn't have before: the slope depends on Δx and x.

Historically, people noted that we were working with a term $O((\Delta x)^2)$. If we could make that term equal to 0, then everything would work out nicely. How do we do this? Well, we formally invent a number ε and use it instead of a finite nonzero number Δx.

1.2 ε and √−1

We know that we have a "number" i satisfying

$i^2 = -1.$ (1.19)

There is no real number which satisfies this, but we can "adjoin" i to $\mathbb{R}$. That is, we pretend that i is a variable satisfying equation (1.19), then we have polynomials of the form

p(x,y) = x + i y. (1.20)

Of course, we can formally multiply these polynomials together, and we end up with the number system ("ring") of complex numbers (we would have to prove that $1/i$ exists to make it a field).

Problem 1.2. Why do we not have higher order terms in i? That is, a general polynomial $x + i y + i^2 z + \cdots$?

Let's consider it. Suppose we did have

$p(x,y) = x + i y + i^2 z.$ (1.21)

Then we plug in (1.19) to find

$p(x,y) = x + i y + (-1) z$ (1.22)

which simplifies to merely

$p(x,y) = (x - z) + i y.$ (1.23)

But this is precisely of the form we described: there is some term which is a multiple of i (the imaginary term) and another independent of i (the real term).
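The same reduction works for any number of higher-order terms. Here is a small sketch (the function name is my own) that collapses a list of coefficients of 1, i, i², i³, ... down to a real part and an imaginary part using nothing but i² = −1.

```python
def reduce_powers_of_i(coeffs):
    """Collapse c0 + c1*i + c2*i**2 + ... to (real, imag) using i**2 = -1."""
    real = imag = 0
    for k, c in enumerate(coeffs):
        # i**k cycles through 1, i, -1, -i, so the sign flips every two steps.
        sign = -1 if (k // 2) % 2 else 1
        if k % 2 == 0:
            real += sign * c
        else:
            imag += sign * c
    return real, imag


# x + i y + i^2 z with x = 1, y = 2, z = 3 gives (x - z) + i y.
print(reduce_powers_of_i([1, 2, 3]))   # (-2, 2)
```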

Let's consider a similar problem. We want a nonzero "number" ε which is the "smallest" number possible. What would this mean? Suppose we have a "small" finite number

0 < x < 1. (1.24)

Then we see the property specifying that x is small would be

$0 < x^2 < x.$ (1.25)

But if we had the smallest number, then the general argument is we expect

$\varepsilon^2 = 0.$ (1.26)

We call such an ε an "infinitesimal" number. If we formally consider such an ε (i.e., pretend it exists and obeys this relationship), then we can run into some problems. For example: what is $\varepsilon^{-1}$?

1.3 Division by Zero?

The problem is: what is $\varepsilon^{-1}$? The answer is: we don't know.

However, why would ε ever be useful? We can consider

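$f(x) = x^n$ (1.27)

for some $n \in \mathbb{N}$. Then

$f(x + \varepsilon) = (x + \varepsilon)^n$ (1.28)

can be simplified to what? Let's consider the n = 2 case:

$(x + \varepsilon)^2 = x^2 + 2\varepsilon x + \varepsilon^2.$ (1.29)

But the $\varepsilon^2$ term vanishes, so

$(x + \varepsilon)^2 = x^2 + 2\varepsilon x.$ (1.30)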

We see that

$(x + \varepsilon)^3 = (x^2 + 2\varepsilon x)(x + \varepsilon)$ (1.31)

can be carried out as if it were polynomial multiplication. We then obtain

$x^2 (x + \varepsilon) + 2\varepsilon x (x + \varepsilon) = x^3 + x^2 \varepsilon + 2\varepsilon x^2 + 2\varepsilon^2 x$ (1.32)

and again, the $\varepsilon^2$ term vanishes. We thus obtain

$(x + \varepsilon)^3 = x^3 + 3\varepsilon x^2.$ (1.33)

Indeed the general pattern appears to be

$(x + \varepsilon)^n = x^n + n\varepsilon x^{n-1}.$ (1.34)

We would like to write

$(x + \varepsilon)^n - x^n = n\varepsilon x^{n-1}.$ (1.35)

Notice the difference this time: we don't have any $O(\varepsilon^2)$ terms. The only price we paid is we cannot get rid of the factor ε.
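The formal rule ε² = 0 is simple enough to hand to a computer. Below is a minimal sketch of such "numbers" of the form a + bε; the class name `Dual` and its layout are my own choices, not anything from a library.

```python
class Dual:
    """Number a + b*eps with the formal rule eps**2 = 0."""

    def __init__(self, real, eps=0):
        self.real, self.eps = real, eps

    def __add__(self, other):
        return Dual(self.real + other.real, self.eps + other.eps)

    def __mul__(self, other):
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, because eps**2 = 0
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    def __pow__(self, n):
        # Repeated multiplication, for a non-negative integer n only.
        result = Dual(1)
        for _ in range(n):
            result = result * self
        return result

    def __repr__(self):
        return "Dual(%r, %r)" % (self.real, self.eps)


x = 5
print(Dual(x, 1) ** 3)   # Dual(125, 75), i.e. x^3 + 3 x^2 eps
```

Note that no division is defined here, which matches the point above: we never needed ε⁻¹ to read off the coefficient of ε.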

1.4 Big O for the Bonus Parts

The take home moral is that O() enables us to rigorously consider infinitesimals. How? Well, the most significant terms are written out explicitly, and the rest are swept under the rug with O(). For our example of

$f(x) = x^n$ (1.36)

we saw we could write

$f(x + \Delta x) - f(x) = n x^{n-1} \Delta x + O((\Delta x)^2)$ (1.37)

which tells us the error of "truncating," or cutting off the polynomial to be explicitly first order plus some change. This change we consider to be in effect "infinitesimal" in comparison to the Δx term.
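One way to see what "at least quadratic in Δx" buys us: divide the leftover by (Δx)² and watch it stay bounded as Δx shrinks. A small sketch with exact fractions, where n = 3 and x = 2 are arbitrary choices of mine:

```python
from fractions import Fraction


def remainder_ratio(n, x, dx):
    """(f(x+dx) - f(x) - n*x**(n-1)*dx) / dx**2 for f(x) = x**n."""
    def f(t):
        return t ** n
    return (f(x + dx) - f(x) - n * x ** (n - 1) * dx) / dx ** 2


x = Fraction(2)
ratios = [remainder_ratio(3, x, Fraction(1, 10 ** k)) for k in range(1, 6)]
for r in ratios:
    print(r)    # for n = 3 this is exactly 3x + dx, so it tends to 6
```

For n = 3 the "bonus parts" are 3x + Δx, which is why the ratio settles near 6 instead of blowing up.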

2 Derivative

We still have these bonus parts when considering the slope. That is, for some nonzero Δx and arbitrary f(x), we have

$f(x + \Delta x) - f(x) = h(x)\Delta x + O((\Delta x)^2)$ (2.1)

which gives us

$\frac{\Delta f(x)}{\Delta x} = h(x) + O(\Delta x).$ (2.2)

We want to get rid of that Δx on the right hand side. How to do this?

Let's be absolutely clear before moving on. We want to consider the slope of our function f. To do this we considered a nonzero Δx, and then constructed

$\Delta f(x) = f(x + \Delta x) - f(x).$ (2.3)

This function described the difference between the values of f at x + Δx and at x. So, to describe the rate of change we take

$\frac{\Delta f(x)}{\Delta x} = h(x) + O(\Delta x).$ (2.4)

But we want to describe the instantaneous rate of change. Although this sounds scary, it really means we don't want to work with some extra parameter Δx. We want to consider the rate of change and describe it in such a way that it doesn't depend on Δx.

So what do we do? Well, the first answer is to set Δx to be 0. This is tempting, but wrong, because we end up with

$\frac{f(x + \Delta x) - f(x)}{\Delta x} \to \frac{f(x + 0) - f(x)}{0}$ (2.5)

which is not well-defined. The second answer is to consider the limit Δx → 0, so we can avoid division-by-zero errors. This is better, and we write

$\lim_{\Delta x \to 0} \frac{\Delta f(x)}{\Delta x} = \frac{df(x)}{dx}$ (2.6)

following Leibniz's notation. This is the definition of the derivative of f.
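Numerically, the two "answers" look like this. Plugging in Δx = 0 fails outright, while shrinking Δx gives difference quotients that close in on a value. The function f(x) = x³ and the point x = 2 are just my choices for the illustration.

```python
def difference_quotient(f, x, dx):
    """The rate of change (f(x + dx) - f(x)) / dx for a nonzero shift dx."""
    return (f(x + dx) - f(x)) / dx


def f(t):
    return t ** 3


# First answer: set dx = 0. Python refuses, just as the algebra does.
try:
    difference_quotient(f, 2.0, 0.0)
except ZeroDivisionError as err:
    print("dx = 0 fails:", err)

# Second answer: let dx shrink and watch the quotient approach 3 * 2**2 = 12.
for dx in (1e-1, 1e-3, 1e-5):
    print(dx, difference_quotient(f, 2.0, dx))
```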

2.1 Divide by Zero, and You Go To Hell!

Well, formally, we need to take the limit Δx → 0. What does that mean for the left hand side? Could we accidentally be dividing by Δx and get infinities? This is a problem we have to seriously consider.

The first claim is that

f(x + Δx) = f(x) + O(Δx). (2.7)

This would imply that

$\frac{\Delta f(x)}{\Delta x} = h(x) + O(\Delta x)$ (2.8)

for some function h(x). There would be no division by zero errors, but still we have to prove that equation (2.7) is true in general, i.e. for every function f(x). So far, we have seen it is true only for polynomials.

So, let us consider a function

$F(x) = \frac{1}{x^n}$ (2.9)

for some $n \in \mathbb{N}$. What to do? Well, let's consider what happens when x → x + Δx, that is, when we change x to be x + Δx. We have

$F(x + \Delta x) = \frac{1}{(x + \Delta x)^n}$ (2.10)

by definition of F. We would expect then

$\Delta F(x) = F(x + \Delta x) - F(x) = \frac{1}{(x + \Delta x)^n} - \frac{1}{x^n}.$ (2.11)

What to do? Well, let's gather the terms together

$\Delta F(x) = \frac{x^n}{x^n (x + \Delta x)^n} - \frac{(x + \Delta x)^n}{x^n (x + \Delta x)^n}$ (2.12)

which we can do, since we multiply both terms by 1 (the first term by $x^n / x^n$, the second term by $(x + \Delta x)^n / (x + \Delta x)^n$). We can then add the fractions together

$\Delta F(x) = \frac{x^n - (x + \Delta x)^n}{x^n (x + \Delta x)^n}$ (2.13)

and consider expanding the numerator and denominators out. We see that to first order, we have

$x^n - (x + \Delta x)^n = -n x^{n-1} \Delta x + O((\Delta x)^2)$ (2.14)

which shouldn't be surprising (we've proven this many times so far!). The denominator expands out to be

$x^n (x + \Delta x)^n = x^n (x^n + O(\Delta x))$ (2.15)

which, for nonzero x, cannot be made 0.

We combine these results to write

$\Delta F(x) = \frac{-n x^{n-1} \Delta x + O((\Delta x)^2)}{x^n (x^n + O(\Delta x))}.$ (2.16)

We observe that we can factor out a Δx in the numerator (the upstairs part of the fraction) and then we can divide both sides by it:

$\frac{\Delta F(x)}{\Delta x} = \frac{-n x^{n-1} + O(\Delta x)}{x^n (x^n + O(\Delta x))}.$ (2.17)

So what happens if we set Δx = 0 on the right hand side? Do we run into problems? Well, we run into problems on the left hand side, but not on the right hand side.

So what to do? Well, the formal mathematical procedure is to take the limit Δx → 0, which then lets us write

$\lim_{\Delta x \to 0} \frac{\Delta F(x)}{\Delta x} = \frac{dF(x)}{dx}$ (2.18)

for the left hand side. For the right hand side, we can symbolically just set Δx = 0. This is sloppy, because it's not quite true. But this is what's done in practice. We get

$\lim_{\Delta x \to 0} \frac{-n x^{n-1} + O(\Delta x)}{x^n (x^n + O(\Delta x))} = \frac{-n x^{n-1}}{x^n (x^n + 0)}.$ (2.19)

Observe that we can combine these results to write

$\frac{dF(x)}{dx} = \frac{-n}{x^{2n - (n-1)}} = -n x^{-n-1}.$ (2.20)

There was no risk of dividing by zero anywhere.
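Here is a small check of that result with exact fractions, so that any discrepancy is the O(Δx) term and not rounding. The point x = 2 and the power n = 3 are arbitrary picks of mine.

```python
from fractions import Fraction


def F(x, n):
    return 1 / x ** n


def quotient(x, n, dx):
    """Difference quotient of F at x for a nonzero shift dx."""
    return (F(x + dx, n) - F(x, n)) / dx


x, n = Fraction(2), 3
exact = -n * x ** (-n - 1)          # the claimed derivative, here -3/16
approximations = [quotient(x, n, Fraction(1, 10 ** k)) for k in (1, 3, 5)]
for a in approximations:
    print(float(a), "error:", float(abs(a - exact)))
```

The error shrinks roughly in proportion to Δx, as the O(Δx) term in (2.17) suggests.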

2.2 Product Rule

Suppose we have two arbitrary functions f(x) and g(x). Let's define a new function

h(x) = f(x)g(x), (2.21)

then what's

Δh(x) =? (2.22)

I don't know, let us look. We see that we first pick some nonzero Δx and then consider

$\Delta h(x) = h(x + \Delta x) - h(x).$ (2.23)

Now we plug equation (2.21), the equation where we defined h, into this expression, and we find

$h(x + \Delta x) - h(x) = f(x + \Delta x) g(x + \Delta x) - f(x) g(x).$ (2.24)

We do the following trick: add to both sides

$0 = f(x) g(x + \Delta x) - f(x) g(x + \Delta x)$ (2.25)

and we obtain

$\Delta h(x) = f(x + \Delta x) g(x + \Delta x) - f(x) g(x) + f(x) g(x + \Delta x) - f(x) g(x + \Delta x).$ (2.26)

We can gather terms together

$\Delta h(x) = (f(x + \Delta x) - f(x)) g(x + \Delta x) + f(x)(-g(x) + g(x + \Delta x))$ (2.27)

which simplifies to

Δh(x) = Δf(x) g(x + Δx) + f(x) Δg(x). (2.28)

As usual, we divide both sides by Δx

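$\frac{\Delta h(x)}{\Delta x} = \frac{\Delta f(x)}{\Delta x} g(x + \Delta x) + f(x) \frac{\Delta g(x)}{\Delta x}.$ (2.29)

By taking the limit Δx → 0 we end up with

$\frac{dh(x)}{dx} = \frac{df(x)}{dx} g(x) + f(x) \frac{dg(x)}{dx}.$ (2.30)

Notice that we implicitly noted

$\lim_{\Delta x \to 0} g(x + \Delta x) = g(x).$ (2.31)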

Of course, we assume that g is continuous at x, which turns out to be correct since differentiability implies continuity (we will prove this at some other time).
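For polynomials we can check (2.30) exactly, since multiplying and differentiating coefficient lists needs no limits at all. The helper names and the two sample polynomials below are my own; the term-by-term derivative uses the power rule from section 1.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_diff(p):
    """Differentiate term by term: c_k x^k becomes k c_k x^(k-1)."""
    return [k * c for k, c in enumerate(p)][1:] or [0]


f = [1, 2, 3]       # 1 + 2x + 3x^2
g = [0, -1, 0, 4]   # -x + 4x^3
lhs = poly_diff(poly_mul(f, g))                                  # d(fg)/dx
rhs = poly_add(poly_mul(poly_diff(f), g), poly_mul(f, poly_diff(g)))
print(lhs)
print(rhs)          # both are [-1, -4, 3, 32, 60]
```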

Theorem 2.1 (Product Rule). Let f, g be differentiable and

h(x) = f(x)g(x), (2.32)


then

$\frac{dh(x)}{dx} = \frac{df(x)}{dx} g(x) + f(x) \frac{dg(x)}{dx}$ (2.33)

is the derivative.

We've already proven this. So let's consider an example.

$f(x) = x^{n-1}$ (2.34)

where $(n - 1) \in \mathbb{N}$, and

g(x) = x. (2.35)

Then

$h(x) = x^n.$ (2.36)

The claim is that

$\frac{dh(x)}{dx} = n x^{n-1}.$ (2.37)

Is this surprising? No, but the surprising part is that it is a consequence of the product rule. How to prove this? Well, we need to do it by induction on n.

Base Case (n = 2) we see that

f(x) = x (2.38)

and we can see immediately that

$\frac{dh(x)}{dx} = \frac{df(x)}{dx} g(x) + f(x) \frac{dg(x)}{dx} = 1 \cdot x + x \cdot 1 = 2x.$ (2.39)

So this proves the base case.

Inductive Hypothesis: suppose this will work for arbitrary n.

Inductive Case: for n + 1, we have

$\frac{dh(x)}{dx} = \frac{d(x^{n-1} x)}{dx} g(x) + x^n \frac{dg(x)}{dx}$ (2.40)

Observe we can consider the first term and apply the product rule again

$\frac{d(x^{n-1} x)}{dx} g(x) = \frac{d(x^{n-1})}{dx} x g(x) + x^{n-1} \frac{d(x)}{dx} g(x)$ (2.41)

which is then

$\frac{d(x^{n-1} x)}{dx} g(x) = (n - 1)(x^{n-2}) x g(x) + x^{n-1} g(x) = n x^{n-1} g(x).$ (2.42)

The second term is (recall g(x) = x) simpler

$x^n \frac{dg(x)}{dx} = x^n.$ (2.43)

We add both of these together to find

$\frac{dh(x)}{dx} = n x^n + x^n = (n + 1) x^n.$ (2.44)

But this is precisely what we wanted! And that concludes the inductive proof.
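The induction can be mirrored as a recursion that only ever uses the product rule and d(x)/dx = 1. This is a sketch under my own conventions: I start the recursion at n = 1 instead of n = 2, and a monomial c·xᵉ is stored as the pair (c, e).

```python
def d_power(n):
    """Return (c, e) with d(x**n)/dx = c * x**e, using only the product rule."""
    if n == 1:
        return (1, 0)                 # d(x)/dx = 1
    c, e = d_power(n - 1)             # d(x**(n-1))/dx = c * x**e
    # Product rule on x**(n-1) * x:
    #   d(x**(n-1))/dx * x  +  x**(n-1) * d(x)/dx
    # = c * x**(e+1)        +  1 * x**(n-1)
    assert e + 1 == n - 1             # both terms are multiples of x**(n-1)
    return (c + 1, n - 1)


for n in range(1, 6):
    print(n, d_power(n))              # (n, n - 1) each time
```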

2.3 Chain Rule

We can combine functions together through composition. This looks like

h(x) = g(f(x)). (2.45)

The question is: what's the derivative (rate of change) of h in terms of the derivatives of g and f?

Here we really take advantage of big-O notation. Observe for some nonzero Δx we have

h(x + Δx) = g(f(x + Δx)) (2.46)

but we argued that

$f(x + \Delta x) = f(x) + F(x)\Delta x + O((\Delta x)^2).$ (2.47)

Let's plug this in

$h(x + \Delta x) = g(f(x) + F(x)\Delta x + O((\Delta x)^2)).$ (2.48)

So we conclude that

$\Delta h(x) = g(f(x) + F(x)\Delta x + O((\Delta x)^2)) - g(f(x)).$ (2.49)

We can divide both sides by Δx simply

$\frac{\Delta h(x)}{\Delta x} = \frac{g(f(x) + F(x)\Delta x + O((\Delta x)^2)) - g(f(x))}{\Delta x}.$ (2.50)

Now what to do?

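Well, we can do the following trick (assuming Δf(x) is nonzero): multiply both sides by

$1 = \frac{\Delta f(x)}{\Delta f(x)}.$ (2.51)

This would give us

$\frac{\Delta h(x)}{\Delta x} = \frac{g(f(x) + F(x)\Delta x + O((\Delta x)^2)) - g(f(x))}{\Delta f(x)} \cdot \frac{\Delta f(x)}{\Delta x}.$ (2.52)

But what is Δf(x)? We recall equation (2.47) and write

$\Delta f(x) = f(x + \Delta x) - f(x) = F(x)\Delta x + O((\Delta x)^2).$ (2.53)

Using this, we can simplify our equation

$\frac{\Delta h(x)}{\Delta x} = \frac{g(f(x) + \Delta f(x)) - g(f(x))}{\Delta f(x)} \cdot \frac{\Delta f(x)}{\Delta x}.$ (2.54)

Observe that we may take the limit as Δx → 0, which gives us

$\frac{dh(x)}{dx} = \frac{dg(f(x))}{df(x)} \frac{df(x)}{dx}$ (2.55)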

which intuitively looks like fractions cancelling out to give the right answer. Although this is the intuitive idea, DO NOT cancel terms!

Moreover, we should really clarify what is meant by

$\frac{dg(f(x))}{df(x)} = \,?$ (2.56)

Let us first consider

y = f(x). (2.57)

Then really

$\frac{dg(f(x))}{df(x)} = \frac{dg(y)}{dy}$ (2.58)

describes what we should do. Namely, first take the derivative of g and then evaluate it at y = f(x).
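That recipe ("differentiate g, evaluate at y = f(x), multiply by the derivative of f") can be tested exactly on polynomials, where we can also expand g(f(x)) by brute force and differentiate it term by term. The helper names and the sample f and g are my own choices for this sketch.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_eval(p, x):
    """Evaluate a coefficient list at x using Horner's scheme."""
    result = 0
    for c in reversed(p):
        result = result * x + c
    return result


def poly_diff(p):
    """Differentiate term by term: c_k x^k becomes k c_k x^(k-1)."""
    return [k * c for k, c in enumerate(p)][1:] or [0]


def poly_compose(g, f):
    """Coefficients of g(f(x)), expanded by brute force (Horner on lists)."""
    result = [0]
    for c in reversed(g):
        result = poly_mul(result, f)
        result[0] += c
    return result


f = [1, 0, 2]       # f(x) = 1 + 2x^2
g = [0, 3, 0, 1]    # g(y) = 3y + y^3
h_prime = poly_diff(poly_compose(g, f))      # expand first, then differentiate


def chain(x):
    # dg/dy evaluated at y = f(x), times df/dx
    return poly_eval(poly_diff(g), poly_eval(f, x)) * poly_eval(poly_diff(f), x)


for x in range(-3, 4):
    print(x, poly_eval(h_prime, x), chain(x))   # the two columns agree
```

Agreement at seven points is enough here, because both sides are polynomials of degree five.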

Theorem 2.2. Let f, g be differentiable at x, and let

h(x) = g(f(x)). (2.59)


Then

$\frac{dh(x)}{dx} = \frac{dg(f(x))}{df(x)} \frac{df(x)}{dx}$ (2.60)

describes the derivative of h at x.

Again, we also proved this, which concludes this post.