
Saturday, May 14, 2011

Big O Notation for Calculus

A brief explanation: I am teaching calculus, and figuring out how to introduce various concepts has been a problem. The best way to approach things, I have found, is to

  1. introduce big O notation as a way of indicating "higher order terms" in a more precise way, and
  2. use the identity exp(ix)=cos(x)+i·sin(x) for calculating anything trig-related.

I am obliged by scholarly tradition to document other sources which have taken a similar approach. Donald Knuth's letter "Teach Calculus with Big O" (Notices of the American Mathematical Society 45 (6): 687, unabridged version) suggests heuristically introducing an expedient "big A" function. More importantly, Knuth writes:

Students will be motivated to use O notation for two important reasons. First, it significantly simplifies calculations because it allows us to be sloppy—but in a satisfactorily controlled way. Second, it appears in the power series calculations of symbolic algebra systems like Maple and Mathematica, which today's students will surely be using.

The blog Mathematics under the Microscope's April 14, 2008 post Donald Knuth: Calculus via O notation discusses the advantages and disadvantages of Big O in calculus (Kevin's comment on April 14, 2008 at 4:01 pm extensively notes the disadvantages when using "big A" as Knuth describes).

I take a completely different approach that is more historical. We have this number, i, which is not a real number but satisfies

i² = -1 (1)
which IS a real number!

Likewise, what if we introduce a "number" ε that's not zero but really small. Recall, if we have a number x between 0 and 1, we have

0 < x² < x (2)
but if we had something "smallest" we expect
0 = ε² < ε. (3)
Now, we should ask ourselves: is this really a good idea?

The serious problem is that we cannot divide by ε, which causes problems later on. There are two questions that come to mind:

  1. How would we use ε even if we could divide by it?
  2. We want to be able to divide by all of our numbers, so how do we rectify this problem?

Let's consider a really simple function

f(x) = xⁿ (4)
for some natural number n ∈ ℕ. So what is f(x+ε) in terms of ε?

Suppose we didn't know. What can we do? Cheat! Well, mathematicians call it "induction on n", but it amounts to thinking about what happens when n is 1, 2, 3, and so on until we get a general formula. The trick is to treat ε as a variable, so when we expand f(x+ε), we just use high school algebra to do the calculation.

(x+ε)¹ = x+ε (5a)
(x+ε)² = x²+2εx+ε² (5b)

But note that by equation (3) we have ε²=0, so really (5b) is

(x+ε)² = x²+2εx (5b')

We then consider

(x+ε)³ = (x+ε)²(x+ε) (5c)

We plug in our result (5b') in the right hand side to find

(x+ε)³ = (x²+2εx)(x+ε). (5c.i)

Then we just multiply the right hand side out as if they were polynomials

(x+ε)³ = (x²+2εx)x+(x²+2εx)ε. (5c.ii)

We have

(x²+2εx)x = x³+2εx² (5c.iii)

for the first term, and

(x²+2εx)ε = x²ε+2ε²x (5c.iv)

for the second term. Notice, however, we can simplify this

x²ε+2ε²x = x²ε (5c.v)

since ε²=0. We can combine these results to conclude

(x+ε)³ = x³+3εx² (5c.vi)

The general pattern appears to be

(x+ε)ⁿ = xⁿ+nεxⁿ⁻¹ (6)

If we knew the binomial theorem, we could immediately recognize this as true.

At any rate, the deviation from f(x) is then

(x+ε)ⁿ - xⁿ = nεxⁿ⁻¹ (7)
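
Incidentally, this arithmetic is easy to play with on a computer. Here is a minimal Python sketch (my own illustration, not part of the course) of a number type a + bε in which ε² = 0 holds by construction; the class name and interface are made up:

    # Track only (a, b) in a + b*eps; every eps**2 term is dropped by design.
    class Dual:
        def __init__(self, a, b=0.0):
            self.a, self.b = a, b               # real part, eps-coefficient

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.a + other.a, self.b + other.b)

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            # (a1+b1 eps)(a2+b2 eps) = a1*a2 + (a1*b2 + b1*a2) eps, since eps**2 = 0.
            return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

        def __pow__(self, n):                   # repeated multiplication, n a natural number
            result = Dual(1.0)
            for _ in range(n):
                result = result * self
            return result

    x = Dual(2.0, 1.0)                          # this is x + eps at x = 2
    print((x ** 5).b)                           # 80.0 = 5 * 2**4, the n*x**(n-1) of (6)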

So why do we need this fancy number ε anyways? Well, if we didn't have it, then we would be working with some finite (positive or negative real) number

Δx≠0. (8)

Let us try to reconstruct equation (7) with Δx instead of ε! We get

(x+Δx)ⁿ - xⁿ = nΔx·xⁿ⁻¹+(nonzero terms). (9)

Like anything bought from Ikea, we end up with bonus parts. But we don't want bonus parts! That's the problem.

The solution is to do what we just described: actually write "bonus parts" on the right hand side of the equation. But we can be a little more rigorous than this; we can describe the bonus parts. That's the beauty of big O notation.

Revisiting equation (9) we use big O notation to write it as

(x+Δx)ⁿ - xⁿ = nΔx·xⁿ⁻¹+O([Δx]²). (9')

The square brackets in O(...) are just to indicate we are considering (Δx)² instead of Δ(x²). Here we indicate the behavior of the bonus parts. How? Well, what we are really saying is

(x+Δx)ⁿ - xⁿ = nΔx·xⁿ⁻¹+(Δx)²(cxⁿ⁻²+...). (10)

For us, the parenthetic term on the right is a polynomial in Δx. But we don't care about it because Δx is "small" (or more precisely, close to 0).

So, what to do? We sweep it under the rug. We collect it all into a single term, which we simply denote by "O([Δx]²)".
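
If you have sympy at hand, the bonus parts can be exhibited concretely (a quick sketch of my own; the exponent n = 4 is an arbitrary choice):

    # Remove x**4 and the first-order term; what is left over (the "bonus
    # parts") carries a factor of (Delta x)**2, as equation (10) claims.
    import sympy as sp

    x, dx = sp.symbols('x Deltax')
    bonus = sp.expand((x + dx)**4 - x**4 - 4*dx*x**3)
    print(sp.factor(bonus))   # Deltax**2*(Deltax**2 + 4*Deltax*x + 6*x**2)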

Exercise: Prove that O(x²)=O(x)².

Exercise: Prove that although O(x²)=O(x), we cannot write O(x)=O(x²). That is, equality for big O terms turns out to be not symmetric.

Just to be absolutely clear, we are considering "small" Δx. That permits us to use big O notation for infinitesimal purposes. The terminology is very much classical; it is beyond me to call it anything else. But the contributions of (Δx)ⁿ get smaller and smaller as n gets bigger, so we can neglect the terms with factors of (Δx)ⁿ for n>1. How to be rigorously neglectful? By using O([Δx]²).

The Derivative

So is this it? No! We can do more!

With a linear function, we can meaningfully talk about the slope (the rate of change). It turns out that the slope is the same everywhere. Can we talk about the slope of xⁿ?

Let's try! We see that we really have introduced

Δf(x) := f(x+Δx) - f(x) (11)

which we see is another function. The idea is that Δ is a "function" that turns a given function, say f(x), into another function precisely as described in equation (11).

Now we can consider

Δf(x)/Δx = g(x)+O(Δx) (12)
for an attempt at describing the slope of f(x). Is this a good definition? Not really.

Why? Well, we still have the O(Δx) term to consider. So why not take Δx→0? Because we haven't introduced the notion of the limit yet!

So, if we are calculators (since we are studying calculus), we don't mind. We just write the left hand side of (12) as df/dx, and for the right hand side we set Δx=0 which kills the O(Δx) term.

Exercise: why does setting Δx to be 0 kill an O(Δx) term?

This is a bit dangerous, but symbolically calculating it out like this is what we do in practice.
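
This symbolic procedure is easy to imitate in, say, sympy (a sketch of my own; the choice n = 7 is arbitrary): form Δf/Δx as in equation (12), expand, then set Δx = 0.

    # The difference quotient for f(x) = x**7: expanding cancels the 1/Deltax,
    # and substituting Deltax = 0 kills every O(Deltax) term.
    import sympy as sp

    x, dx = sp.symbols('x Deltax')
    n = 7
    quotient = sp.expand(((x + dx)**n - x**n) / dx)
    print(quotient.subs(dx, 0))   # 7*x**6, i.e. n*x**(n-1)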

Is this a good idea? It depends who you are. Historically, this is more or less what was done. People grew concerned about whether it was rigorous and correct, which is how real analysis started: rigorously define limits so we can take derivatives without accidentally dividing by zero.

But now, using this notion for the derivative, we can easily prove many theorems regarding its behavior. We will do this next time.

Appendix: What is O(x)?

It turns out that O(x) is not a function. It's a set! More precisely, it's a convex cone, an exotic structured set that is important in functional analysis.
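
For definiteness, here is one standard way to write the definition down (a sketch in LaTeX; the constant names C and δ are arbitrary, and the small-x regime matches our infinitesimal usage):

    O(g) = \{ f : \exists C > 0,\ \exists \delta > 0,\ |f(x)| \le C\,|g(x)| \text{ whenever } 0 < |x| < \delta \}

Checking that this set is closed under addition and under multiplication by nonnegative scalars is exactly what makes it a convex cone.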

Tuesday, May 10, 2011

Bibliographic information

Mary-Claire van Leunen's Handbook for Scholars argues against bibliographic footnotes: avoiding them saves space, makes a "handsomer" page, and avoids interrupting the reader's attention (all of these are "automatic benefits").

The main change is to embed the citation. But embedding is not a mechanical process.

Knuth embeds citations entirely in The Art of Computer Programming, which tends to be frustrating (since he uses an old-fashioned approach, with abbreviations harder to crack than the Purple Code). Just remember that the volume number is always in bold...

The basic elements of a bibliography item amount to three things:

1. Author;
2. Title;
3. Bibliographic Information.

There are also two optional extra pieces of data that could be provided:

4. More bibliographic information;
5. Annotation.

The bibliographic data in (3) describes the work you have before you that you are quoting and referring to. The bibliographic data in (4) describes some other version of the work that you know about and that your reader may find more accessible than your version.

So, for example, consider this paper on the arXiv. How should we format its bibliography entry? Well, it would be:

Author John C. Baez, James Dolan.
Title "From Finite Sets to Feynman Diagrams."
Bibliographic Information I In Björn Engquist and Wilfried Schmid, editors, Mathematics Unlimited – 2001 and Beyond 1, pages 29–50, Springer 2001.
Bibliographic Information II arXiv:math/0004133v1 [math.QA]
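
For what it's worth, in BibTeX this entry might be encoded as follows (a sketch; the citation key and the choice of fields are my own):

    @incollection{BaezDolan2001,
      author    = {Baez, John C. and Dolan, James},
      title     = {From Finite Sets to Feynman Diagrams},
      booktitle = {Mathematics Unlimited -- 2001 and Beyond},
      editor    = {Engquist, Bj{\"o}rn and Schmid, Wilfried},
      volume    = {1},
      pages     = {29--50},
      publisher = {Springer},
      year      = {2001},
      note      = {arXiv:math/0004133v1 [math.QA]},
    }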

Regarding the extra bibliographic information, van Leunen advises:

Are you referring to a rare work? Your inexperienced reader will want to know how to get hold of another copy. Are you referring to an expensive hardcover? Your impoverished reader will bless you for mentioning the cheap paperback as well. Are you referring to a pirated Taiwan edition? Your reader of tender conscience will appreciate knowing about a version on the up-and-up.

Mary-Claire van Leunen, A Handbook for Scholars
Random House (1978) page 174

Now what's with the last bit of information? Well, it's the annotation, which may be useful. Is the book factually incorrect but has useful diagrams to refer to? Is the author of the eprint a crackpot? What should we know? Why are we using the reference anyway?

Personally, the rule of thumb I try to abide by is: throw in all the details you can about the reference you are using. Don't use ambiguous and cryptic abbreviations. And, in my own notes, when I refer to a library book...I also tend to put the call number down (and the library I got it from).

Sunday, May 8, 2011

The "Mathematical Vernacular"

I'd like to write some remarks about Wiedijk's Mathematical Vernacular article. Wiedijk credits de Bruijn's "The mathematical vernacular, a language for mathematics with typed sets" (in P. Dybjer et al.'s Proceedings of the Workshop on Programming Languages, Marstrand, Sweden 1987) as the origin of a formal language — a "mathematical vernacular" — underlying mathematical proofs.

Wiedijk departs from de Bruijn by inspecting declarative proof assistants (e.g. Mizar or Isar) and using this as the foundation for the mathematical vernacular.

There are four "syntactic categories" in Wiedijk's mathematical vernacular: label, variable, term and formula.

Let us recall from mathematical logic that terms are expressions which can be built up from constants and variables by applying functions. I will be a little bit sloppy here, since "we all know" what a variable and a formula are.

A label is a generalisation of the notion of equation numbering. It is used to indicate references to specific locations in the proof, typically to justify some reasoning in a proof.

Introducing the Formal Grammar

The mathematical vernacular can be pictured as a programming language for proofs. So it has a formal grammar. Let's quickly run through it.

statement = proposition justification ;
proposition = [ label : ] formula
justification =
  empty
| by reference {, reference}
| proof {step} [ cases ] end
reference =
  label
| -
step =
  statement
| thus statement
| let variable {, variable} ;
| assume proposition ;
| consider variable {, variable} such that proposition justification ;
| take term {, term} ;
cases = per cases justification ; {suppose proposition ; {step}}
empty =

Note that the keywords are typed in a different font. Also note that when justifying the current step by the previous line, we indicate this as "by - ;".

Most of these are straightforward in their design, but let's consider an example of how to translate a given proof into the vernacular.

Theorem 43 (Pythagoras' Theorem). √2 is irrational.

The traditional proof ascribed to Pythagoras runs as follows. If √2 is rational, then the equation

a² = 2b² (4.3.1)

is soluble in integers a, b with gcd(a, b) = 1. Hence a² is even, and therefore a is even. If a = 2c, then 4c² = 2b², 2c² = b², and b is also even, contrary to the hypothesis that gcd(a, b) = 1.

G.H. Hardy and E.M. Wright An Introduction to the Theory of Numbers,
4th edition, Clarendon Press, Oxford, 1960. Pages 39–40

Let's try to consider this in the mathematical vernacular. We are assuming for contradiction that √2 is rational, then we obtain a contradiction.

Theorem 43: sqrt(2) ∉ ℚ
proof
    assume sqrt(2) ∈ ℚ;
    consider a, b ∈ ℤ such that
(4.3.1): a² = 2b² and
    gcd(a, b)=1;
    thus a² is even by (4.3.1);
    thus a is even by -;
    consider c ∈ ℤ such that a=2c;
    thus 4c²=2b² by -, (4.3.1);
    thus 2c²=b² by -;
    thus b is even by -;
    thus ⊥;
end;

Note we use the notation ⊥ for "falsum," or contradiction.

Of course, here we are a little bit sloppy since the assertion that a² being even implies a being even is not proven. But it is elementary arithmetic that may be verified quickly.

Also we are a little bit sloppy since the label "Theorem 43" should really be "Theorem_43", but still little is affected by that. We are humans!

It's actually kind of fun to start verifying proofs in the mathematical vernacular, because it makes any mistakes in a proof manifestly evident. You also get the same feeling of rigor you get if you tried putting the proofs in a purely symbolic way (cf. Russell and Whitehead's Principia Mathematica).

Explaining the Formal Grammar

Wiedijk does a wonderful job explaining the logical structure of each of these statements, but I'll give a "user's manual" explanation of what these steps mean in practice.

let step
This has always confused me, at least in distinguishing it from the consider…such that…; step. It becomes clear how to use it once you think of it as introducing a variable ranging over a given collection: the let step declares a variable (in the computer science sense), and the consider step initializes it to a value.

If we want to prove, e.g., that a set X satisfies some property P(x) for all x ∈ X, it's done in two steps: let x ∈ X; [and then prove that P(x) is true].
assume step
This can be used in several different ways.
  1. Suppose we wanted to prove a statement of the form "If p, then q"; then our proof takes the form "assume p; [prove q logically follows]".
  2. Proof by contradiction: to prove "If p, then q" this way, we "assume p and ¬q; [prove a contradiction]".
thus statement
This is really the meat and potatoes of a proof; it is the step that carries out deductions.
per cases statement
Often in practice, we end up reducing our proof to several simple cases. This is precisely what we do in this step.
consider step
This step initializes a variable to a certain value, or introduces a constant.
take step
This introduces a "term", with an existential quantifier. That is to say, if we want to prove "∃x P(x)", we first "take a;" for our "x". Then we will prove (constructively!) that P(a) is true.

Alternatively, if we use the approach of proof by contradiction, we assume "⊤ and ¬(∃x P(x))", which is logically equivalent to "⊤ and ∀x ¬P(x)", and then we could use the let step. More precisely it would be "let x; assume ¬P(x); [prove a contradiction]". We should note that ⊤ is always assumed at each step, so we don't need to explicitly state "assume ⊤ and ¬P(x);".

Although to be truthful, I don't believe I've used the take step. This may be due to my lack of work applying the vernacular to proofs.

Automatic Proof Checker?

If anyone were crazy enough to want to write an automatic proof checker using the mathematical vernacular as the language, one would first have to consider writing a first-order theorem prover.
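
If one did, a natural first step is to transcribe the grammar above into data types. Here is a minimal sketch in Python (entirely my own illustration; the class names simply mirror the grammar's productions, and only a fragment is covered):

    # A fragment of the vernacular grammar as Python dataclasses.
    from dataclasses import dataclass, field
    from typing import List, Optional, Union

    @dataclass
    class Proposition:            # proposition = [ label : ] formula
        formula: str
        label: Optional[str] = None

    @dataclass
    class By:                     # justification = by reference {, reference}
        references: List[str]     # each entry a label, or "-" for the previous line

    @dataclass
    class Statement:              # statement = proposition justification ;
        proposition: Proposition
        justification: Optional[Union[By, "Proof"]] = None   # None means empty

    @dataclass
    class Proof:                  # justification = proof {step} end (no cases here)
        steps: List[Statement] = field(default_factory=list)

    # For instance, `thus a is even by (4.3.1);` might be represented as:
    example = Statement(Proposition("a is even"), By(["(4.3.1)"]))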

I will not do this, but I will give a few references for the curious among us, namely David Amos's A Theorem Prover for Intuitionistic Propositional Logic. L.C. Paulson's eprint "Designing a Theorem Prover" would be a useful reference as well.

Perhaps Marnix Klooster's A Haskell module for sound Ghilbert-style proofs would be useful to examine too.

Problems

I've decided to keep a list of problems to think about that are not solved to my knowledge, or that I want to work out explicitly for myself. These are random in subject, but interesting...

I've noted some other people's math problems while looking for the Kourovka notebook that may be interesting to some people. I would just like people to note the wiki Open Problem Garden: help it grow!

List 1 (5:02 PM on Sunday, May 8, 2011)

  1. Stuff, Structure, and Properties related Problems
    1. Bourbaki formalised the notion of structure, specifically there are three mother structures: topological, algebraic, or ordered. It is easy to see that algebraic and ordered structures fit into the "stuff, structure, properties" paradigm; what about topological structure?

      Sunday August 28, 2011 at 05:36:04PM (PDT): It appears that if we have a mathematical gadget and consider the category Gad consisting of these gadgets and "gadgetomorphisms", we may equip a given mathematical gadget with some topological structure if Gad is a topos. (Is there a weaker condition for equipping some gadget with a topology?)

      We can construct the topology by considering all subobjects of a given object, and demanding the inclusions from the pullback diagram to the subobject classifier are "continuous" with respect to the topology...then construct the topology in this manner.
    2. Bourbaki's Set Theory (Springer–Verlag, 2004) notes that "A given species of structure therefore does not imply a well-defined notion of morphisms" (p. 272). Is this true for the Baez–Dolan notion of "stuff, structure, properties"?
    3. Stuff, structure, and properties enables internalisation in the "obvious way" — the first problem is to make this rigorous. The second problem: can we internalise topological structure?

      Sunday August 28, 2011 at 05:39:51PM (PDT): one problem is that topological spaces are "nonfirstorderizable". See Anand Pillay's "First Order Topological Structures and Theories" (Journal of Symbolic Logic 52 3 (1987) 763–778, JSTOR) for more on making topology something first order (ish), and that would be a step towards internalisation.

      Tuesday October 25, 2011 at 08:19:24AM (PST): figured out a solution to this, posted it to my notebook [code.google.com].
  2. In a few papers by R J Low (e.g. "Twistor linking and causal relations" Class. Quantum Grav. 7 (1990) 177; "Spaces of causal paths and naked singularities" Class. Quantum Grav. 7 (1990) 943; and "Twistor linking and causal relations in exterior Schwarzschild space" Class. Quantum Grav. 11 (1994) 453) linking is related to causality. Arnold pointed out this would be interesting to investigate with knot theory. What are knots telling us?

    Additionally, Roger Penrose's Techniques of Differential Topology in Relativity (J. W. Arrowsmith Ltd., Bristol: England 1972) discusses the space of causal curves in section 6. This is a worthwhile supplement (complement?) to Low's papers.

    Answer: Vladimir Chernov (Tchernov) and Yuli B. Rudyak's "Linking and causality in globally hyperbolic spacetimes" (arXiv:math/0611111v3 [math.GT]) appears to solve this problem.
  3. Categorification (Vertically) of Various Mathematical Gadgets
    1. Can we categorify number theory? Specifically the Dedekind η function?

      Tom Leinster discusses the Möbius inversion of a category at the n-Category Cafe.
    2. Can we categorify the Jacobi theta function?
    3. When we (vertically) categorify a sheaf, we get a stack. If we consider a vertex operator algebra as a functor, we may view it as almost like a Vect-valued sheaf. If we look at it in this light, can we vertically categorify a vertex operator algebra analogous to categorifying a sheaf to obtain a stack?

      Moreover, Baez and Dolan's "From Finite Sets to Feynman Diagrams" (in Mathematics Unlimited — 2001 and Beyond 1, eds. Bjorn Engquist and Wilfried Schmid, Springer, Berlin 2001, pp. 29–50 arXiv:math/0004133v1 [math.QA]) gives us a categorification of ladder operators; can we pull-back the construction of vertex operator algebras to obtain the same categorification of them? Would it make a big, nice diagram commute between these different approaches?
  4. Is there any hope of classifying 2-groups?
  5. Concerning the idea that the whole topology of X is encoded in C(X) = Hom(X,ℂ).
    1. Consider the group Hom(X,ℂ) of continuous complex-valued functions on a surface X. What data can we recover, and how do we recover the topological information of X from this C(X) algebra alone?
    2. A related problem: study Hom(𝕆ℙ²,ℂ) of continuous complex-valued functions on the octonionic plane 𝕆ℙ². What sort of topological information do we recover?
    3. If X is a smooth manifold and we restrict C(X) to real-valued functions — how does it relate to Morse theory? Is this whole scheme really just a "poor man's Morse theory"?
    Just a few thoughts about this. We know that the set of Morse functions is dense in the set of smooth functions on a manifold. So the set of Morse functions is dense in a subset of Re(C(X)).

    Answer: Gennadi Sardanashvily provided a solution to this confusion "relating Morse functions with the smooth real-valued functions on a manifold" idea:
    No, there is no direct relation between the differential calculus over the ring of smooth real functions on a smooth manifold X and the Morse theory. Of course, the both constructions deal with smooth functions, but in different aspects.

Sunday May 8, 2011 at 10:17:14PM (PDT)

  1. Concerning infinite-dimensional differential geometry and Lie theory.
    1. How do we define an infinite-dimensional manifold? What does infinite-dimensional differential geometry look like?
    2. How do we define an infinite-dimensional Lie group? What is its tangent space to the identity element? And, of course, the structure theory of infinite-dimensional Lie theory, and the problem of categorifying all of this…
    3. With L∞-algebras (which are vertical categorifications of Lie algebras — that is, we work with chain complexes and relax the equalities to homotopy equivalences), how do they relate to categorification of Kac–Moody Algebras? How does all this relate to categorifying Vertex Operator Algebras as outlined before?
  2. It seems that vertical categorification amounts to internalisation in Cat. (a) Is this a good definition? (b) Would providing a "rigorous method" to do one induce a method to do the other?

    Answer to (a): this gives us a strict categorification, a better approach would be: consider categorifying a "mathematical gadget". We have the category Gad of "mathematical gadgets". We consider a category object internal to Gad, and that gives us a (vertically) categorified version of our mathematical gadget. If we consider a groupoid object internal to Gad, we get a groupoidified (or "horizontally categorified") version of the gadget.
  3. Concerning biology (or what Baez calls, I think, "Green Mathematics").
    1. Vernon Avila's Biology: Investigating life on earth (Jones and Bartlett (1995) pp. 11–18) gives a number of axioms for biology, namely (1) Cells are the basic unit of life; (2) New species and inherited traits are the product of evolution; (3) Genes are the basic unit of heredity; (4) An organism regulates its internal environment to maintain a stable and constant condition; (5) Living organisms consume and transform energy.

      This is a little bit sloppy, but could we make it (more) rigorous? Is there some stronger axiomatic schema which describes biology? What theorems can we derive from this?

      For example, how would we model a cell? It is tempting to model it as a category, so that we can really use the objects as various states of the cell. This is kind of like modeling it as a dynamical system (see Lawvere's Conceptual Mathematics for this). How do we model an organism? Tissues and organs? Species (of life)? Etc. etc. etc.
    2. What is the topology of DNA? What does it tell us? Can we use the machinery of knot theory to say anything interesting with DNA? Doesn't DNA exhibit a sort of hysteresis? On a related note...what would the moduli space of DNA look like?

Notes on Set Theory

So, I've been writing notes about mathematics, and have been pondering the problem "What is a mathematician's notebook like?"

Elements of Mathematics

I've decided to start writing a series of cohesive notes on mathematics. This is partially inspired by Bourbaki, among others.

So, I just finished some notes on set theory (Fascicles 0: Naive Foundations). It's around 120 pages of notes, and 10 pages of front matter.

I think that, like Donald Knuth, I'll try to write fascicles that are roughly 128 pages in length. This will limit my focus, and allow self-contained "quanta" of mathematical writing to be released.

One issue that came up: how to come up with good exercises?

I asked one of my professors. He said, "Ah, that's a good question Alex. I don't have a good answer. However, some times I add exercises which are examples I cannot get to in class. The real answer is: you'll know when you start teaching."

However, in Atish Bagchi and Charles Wells' Varieties of Mathematical Prose, they note that there are really six types of examples in mathematical writing:

  1. An "Easy Example" is one that can be verified quickly.
  2. A "Motivating Example" is one that is given before some definition. In my notes, for example, the discussion of a span of sets is a motivational example for binary relations.
  3. A "Delimiting Example" is an example with the smallest possible number of elements or an example with degenerate structure.
  4. A "Deceptive Non-Example" is one which innocently resembles something true, but beneath the surface is really false.
  5. An "Elucidating Example" is one that really clarifies various aspects to a notion or definition.
  5. An "Elucidating Example" is one that really clarifies various aspects of a notion or definition.

I suppose that exercises follow this as well? It seems like the best exercises should have some sort of "moral" or "lesson". Frequently one finds "Prove or find a counter-example: ..." type of exercises, but is there any trick to coming up with good exercises?

A Mathematician's Notebook

It is the stuff of folklore that all mathematicians have cherished notebooks. But what do they do with them? I started reading The Kourovka Notebook (a notebook of unsolved problems in group theory), and noticed that V. I. Arnold had Online Problem Lists.

I also observed that Robert Wilson also keeps online a list of Research Problems.

I suppose, in short, a mathematician's notebook is focused more on problems than notes. This is what characterizes a good mathematician: the ability to come up with good exercises, good problems, good questions.

Of course, presenting them and coming up with solutions are also vital. It's not good if no one understands what you are asking! Donald Knuth, et al.'s Mathematical Writing lecture notes are an excellent introduction; they tackle various rules to bear in mind on pp. 3–8.

Gel'fand wrote in Pavel Etingof et al.'s The Unity of Mathematics: In Honor of the Ninetieth Birthday of I.M. Gelfand (Progress in Mathematics 244 (2006) p. xiv) that:

The next question was: How can I work at my age? The answer is very simple. I am not a great mathematician. I speak seriously. I am just a student all my life. From the very beginning of my life I was trying to learn. And for example now, when listening to the talks and reading notes of this conference, I discover how much I still do not know and have to learn. Therefore, I am always learning. In this sense I am a student—never a "Führer."

I forgot who said it, but I think it's best to have some applications in mind. Even if it's something along the lines of "How do I [accomplish some pure mathematical calculation]?" My use of "applications" is far more liberal than it appears!

Arnold's lecture On teaching mathematics had the opinion that: "Since scholastic mathematics that is cut off from physics is fit neither for teaching nor for application in any other science, the result was the universal hate towards mathematicians — both on the part of the poor schoolchildren (some of whom in the meantime became ministers) and of the users."

More liberally I think it is fair to say that mathematics is the language of science. So to really come up with good exercises, a mathematician should be constantly reading scientific literature — even popular texts on scientific topics.

But the short, and honest, answer is: I don't know how to come up with good problems. If anyone has any tips or advice, please share!

Case Studies

Curiously, Knuth writes on p. 21 of his notes on mathematical writing:

Exercises are some of the most difficult parts of a book to write. Since an exercise has very little context, ambiguity can be especially deadly; a bit of carefully chosen redundancy can be especially important. For this reason, exercises are also the hardest technical writing to translate to other languages.

Copyright law has changed, making it technically necessary to give credit to all previously published exercises. Don says that crediting sources is probably sufficient (he doesn't plan to write every person referenced in the exercises for his new book, unless the publisher insists). Tracing the history of even well-known theorems can be difficult, because mathematicians have tended to omit citations. He recently spent four hours looking through the collected works of Lagrange trying to find the source of "Lagrange's inequality," but he was unsuccessful. Considering the benefit to future authors and readers, he's not too unhappy with the new law.

We can dispense with some of our rhetorical guidelines when writing the answers to exercises. Answers that are quick and pithy, and answers that start with a symbol, are quite acceptable.

Curiously, Terry Tao writes on page 2 that:

A good problem should be inviting enough to naturally draw the student in to attempt it, challenging enough that the student recognises that something nontrivial has to be done, and clean enough that the insight provided by the solution is not obscured by a mass of time-consuming calculations. It is here that a student can finally get a taste of how one's mathematical power increases when a tool is understood and used correctly, and how one can use the current class material to revisit earlier subjects and clarify them even further.

Inviting, challenging, clean — these are the mark of a good exercise. Perhaps, then, it is not too far of a stretch to suggest these are also the mark of a good example?

In an attempt to answer my own questions, let me list a number of "problem books".

Kirby's Problems in Low-Dimensional Topology
A collection of problems in knot theory, surfaces, 3-manifolds, 4-manifolds, complexes, graphs, TQFTs, etc.
The Open Problems Project
Open mathematical problems relating to computational geometry.
Arnold's Problem List (February 1998).
11 Problems on various analysis topics.
Arnold's Problem List (September 1998).
Dealing with Pseudoring of Lie Algebras, Quaternionic matrices, Quaternionic vector bundles, Contact version of Liouville's theorem, Hofer fields, Uncertainty Principle for Lattices, Singularities of curves in contact geometry, causality in terms of linking, triangulation of knots, "Nekrasov's Discriminant", among other topics covered. I like how it is simple, but covers slightly different disciplines (e.g. relativity and knot theory).
Two Problems by Arnold (September 2001).
Dealing with Betti numbers of parabolic sets, and caustics of periodic functions.
Arnold's Problem List (January 2002).
12 pages of fascinating problems, mostly relating to aspects of curves, etc.