Saturday, November 27, 2021

ISO-2145 numbering in Plain TeX

I've been writing terse notes using something like the ISO-2145 standard. Could some TeX macros be written to support writing such notes? Of course!

Like amsart's subsections and subsubsections, I wanted my sections to be "runin style" (i.e., an unindented paragraph with a leading number, possibly with some label in bold, which is separated by vertical space from the previous paragraph). Following Knuth's WEB macros, I use \M to create a new unlabeled "section", and \N{label} to create a new labeled "section". These macros increment the section number, makes the section number (and label, if present) in bold, and does not indent the section's first paragraph.

But I wanted to allow arbitrary depths for "section numbers". What I did was treat the section number like a stack. When I wanted to work on a new subsection level, I call \M[1] (or, for labeled subsections, \N[1]{My label for the new subsection.}).

If I want to "ascend" back to the parent section, I call \M[-1] (and \N[-1]{...}) which will pop the section stack, then increment the top of the stack.

Should I want to move more than one level up or down, I simply call \M[n] (or \N[n]{...}) for any nonzero integer n. If called with n=0, then it's treated as \M (and \N{...}, resp.).

Retrospective

If I were to do this all over again, I would probably use different variable names (for example, I would probably use \chunkctr instead of \section for the counter name). But I just wanted to see if it was possible.

Also, this code could work in LaTeX, but needs some small modifications. (Well, mostly dealing with making \chunkctr a bona fide counter, so \refstepcounter could use its magic for referencing things properly.)

I guess if I wanted to be fully general, I would have some additional parameters to allow customizing the vertical spacing for separating the sections, the horizontal space between the section number and the next symbol, etc.

Code and Worked Example

\newcount\section
\section=0

% technically, sectionDepth is not needed, but it's to avoid popping off
% "too many sections"
\newcount\sectionDepth \sectionDepth=0

% \secprefix will be redefined to append the current section number
% followed by a period
\def\secprefix{}

% \pushSection simply appends "\the\section." to \secprefix's current value
\def\pushSection{
  \global\advance\sectionDepth by1
  \let\sectmp\secprefix
  \xdef\secprefix{\sectmp\the\section.}
}

% update \secprefix from "a.b.(...).c.d" to "a.b.(...).c"
% and set \section to "d" (as a number)
\def\popSection{
  \def\newPrefix{}
  \global\advance\sectionDepth by-1
  \def\updatePrefix##1.##2\endUpdate{
    \ifx\relax##2 % if we are at the last number
      % globally set \secprefix to be the leading (n-1) section numbers
      \xdef\secprefix{\newPrefix}
      % globally set \section to be the last section number from \secprefix
      \global\section=\number##1
    \else
      \xdef\newPrefix{\newPrefix##1.}
      \updatePrefix##2\endUpdate
    \fi}
  \ifx\secprefix\relax % if \secprefix is empty
  \else
    \expandafter\updatePrefix\secprefix\relax\endUpdate
  \fi
}

\def\thesection{\secprefix\the\section.}
\def\stepSec{\global\advance\section by1}

\def\downOne{
  \pushSection
  \global\section=0
}

\def\upOne{
  \popSection
}

\newcount\secIter \secIter=0

% \up ascends back to one of the parent sections, ascending #1 levels
\def\up#1{ % #1 is a negative integer
  \ifnum\sectionDepth>0%
    \secIter=#1%
    \multiply\secIter by-1
    % but \sectionDepth is a non-negative integer, so transform it
    \ifnum\secIter>\sectionDepth%
      \global\secIter=\sectionDepth
    \fi%
    \loop
      \ifnum\secIter>0
      \upOne
      \advance\secIter by-1
    \repeat%
  \fi%
}

% down appends #1 subsections to the given section number
\def\down#1{ % #1 is a positive integer
  \global\secIter=#1%
  \loop
    \ifnum\secIter>0
    \downOne
    \advance\secIter by-1
  \repeat%
}

% move up or down #1 layers, does nothing if #1 is zero
\def\adjustSection#1{\ifnum#1<0\up{#1}\else{\ifnum#1>0\down{#1}\fi}\fi}

% \M is used for unlabeled paragraphs
\def\Mlabel{\medbreak\noindent\ignorespaces%
  {\bf\thesection\ignorespaces}\enspace\ignorespaces}
\def\Mnone{\stepSec\Mlabel}
\def\Mmany[#1]{\adjustSection{#1}\Mnone}
\def\xM{\ifx\tmp[\expandafter\Mmany\else\Mnone\fi}
\def\M{\futurelet\tmp\xM}

% \N is used for labeled paragraphs
\def\Nlabel#1{\medbreak\noindent\ignorespaces%
  {\bf\thesection\ignorespaces\enspace#1\ignorespaces}\enspace\ignorespaces}
\def\Nnone#1{\stepSec\Nlabel{#1}}
\def\Nmany[#1]#2{\adjustSection{#1}\Nnone{#2}}
\def\xN{\ifx\tmp[\expandafter\Nmany\else\Nnone\fi}
\def\N{\futurelet\tmp\xN}

\M
This is a test to see if this works.

\M I hope it works.

\M[1]
And this should be \S{2.1}, I hope.

\N[1]{Claim:} This should be \S{2.1.1}.

\M[-1]
This should be \S{2.2}.

\M[3]
This should be \S{2.2.0.0.1}, I hope?

\N[-3]{Theorem.}%
This is \S3, yes?

\M[-1]
No, this is \S3.

\bye

Addendum: in LaTeX

It turns out to be a straightforward generalization to LaTeX:

\documentclass{article}
\usepackage{fullpage}
\newcounter{chunk}

\def\setSection#1{\setcounter{chunk}{#1}}
\def\stepSec{\refstepcounter{chunk}}

% technically, sectionDepth is not needed, but it's to avoid popping off
% "too many sections"
\newcount\sectionDepth \sectionDepth=0

% \secprefix will be redefined to append the current section number
% followed by a period
\def\secprefix{}
\renewcommand\thechunk{\secprefix\arabic{chunk}.}

% \pushSection simply appends "\arabic{chunk}." to \secprefix's current value
\def\pushSection{
  \global\advance\sectionDepth by1
  \let\sectmp\secprefix
  \xdef\secprefix{\sectmp\arabic{chunk}.}
}

% update \secprefix from "a.b.(...).c.d" to "a.b.(...).c"
% and set \chunk counter to "d" (as a number)
\def\popSection{
  \def\newPrefix{}
  \global\advance\sectionDepth by-1
  \def\updatePrefix##1.##2\endUpdate{
    \ifx\relax##2 % if we are at the last number
      % globally set \secprefix to be the leading (n-1) section numbers
      \xdef\secprefix{\newPrefix}
      % globally set \chunk to be the last section number from \secprefix
      \setSection{\number##1}
    \else
      \xdef\newPrefix{\newPrefix##1.}
      \updatePrefix##2\endUpdate
    \fi}
  \ifx\secprefix\relax % if \secprefix is empty
  \else
    \expandafter\updatePrefix\secprefix\relax\endUpdate
  \fi
}

\def\downOne{\pushSection\setSection{0}}

\def\upOne{\popSection}

\newcount\secIter \secIter=0

% \up ascends back to one of the parent sections, ascending #1 levels
\def\up#1{ % #1 is a negative integer
  \ifnum\sectionDepth>0%
    \secIter=#1%
    % but \sectionDepth is a non-negative integer, so transform it
    \multiply\secIter by-1
    \ifnum\secIter>\sectionDepth%
      \global\secIter=\sectionDepth
    \fi%
    \loop
      \ifnum\secIter>0
      \upOne
      \advance\secIter by-1
    \repeat%
  \fi%
}

% down appends #1 subsections to the given section number
\def\down#1{ % #1 is a positive integer
  \global\secIter=#1%
  \loop
    \ifnum\secIter>0
    \downOne
    \advance\secIter by-1
  \repeat%
}
\def\adjustSection#1{\ifnum#1<0\up{#1}\else{\ifnum#1>0\down{#1}\fi}\fi}

%% \M is used for unlabeled paragraphs
\newcommand\M[1][0]{\adjustSection{#1}\medbreak%
  \stepSec\noindent\ignorespaces%
  \textbf{\thechunk\ignorespaces}\enspace\ignorespaces}

%% \N is used for labeled paragraphs
\newcommand\N[2][0]{\adjustSection{#1}\medbreak%
  \stepSec\noindent\ignorespaces%
  \textbf{\thechunk\ignorespaces\enspace#2\ignorespaces}\enspace\ignorespaces}

\begin{document}

\M
This is a test to see if this works.

\M I hope it works.

\M[1]
And this should be \S{2.1}, I hope.

\N[1]{Claim:} This should be \S{2.1.1}.

\M[-1]
This should be \S{2.2}.

\M[3]
This should be \S{2.2.0.0.1}, I hope?

\N[-3]{Theorem.}%
This is \S3, yes?

\M[-1]
No, this is \S3.

\M[0] This should be \S4.

\N[0]{Corollary.} This is \S5?
\end{document}

Wednesday, November 24, 2021

Mathematical Language and Soft Type Systems

1. After writing some thoughts about the next generation of provers, I'd like to draw your attention to a couple papers:

F. Wiedijk,
"Mizar's Soft Type System."
In: K. Schneider & J. Brandt (eds.), Theorem Proving in Higher Order Logics 2007, LNCS 4732, 383-399, 2007; pdf
Cezary Kaliszyk, Florian Rabe,
"A Survey of Languages for Formalizing Mathematics".
arXiv:2005.12876, 19 pages

I mention these papers as worth reading, for a couple reasons: first, they discuss an example of what a "higher level of abstraction" actually looks like; second, Kaliszyk and Rabe's paper complements Avigad's paper quite well.

2. Definition. Recall that type theory works by making typing judgements part of the metalanguage. A Soft Type System introduces typing judgements as propositions (and types as predicates) in the object language.

2.1. Example. For a concrete example of soft types in predicate logic, a dependent type with $n$ parameters is represented as an $n+1$-ary predicate $T(x, c_{1}, \dots, c_{n})$ interpreted as $x : T~c_{1}\dots c_{n}$. Subtyping is interpreted as $\forall x, S(x)\implies T(x)$ to assert $S$ is a subtype of $T$, and so on.

2.2. Remark (Soft Types in Type Theory). This is not limited to "non-type theoretic foundations". For example C-Corn in Coq uses a version of a soft type system, as a means to increase the level of abstraction.

2.3. I bring this up because soft types increase the level of abstraction for formalizing mathematics. It's a non-obvious step, which approximates mathematical practice, and demonstrates the sort of "extensibility" I think the next generation of provers may need. Amazingly enough, it can encode the axiom of choice (see, e.g., slides about the [type] usage in Mizar for encoding choice).

3. Mathematical Language. The only addendum I have to add from my last post is just a reference to Kaliszyk and Rabe's paper. It works in a "dual" manner to Avigad's paper, trying to think about a "pseudocode" for formalizing mathematics by surveying what different provers do.

The only qualm I have with their paper is precisely addressed by Avigad's paper: it's useful to give a lot of "small snippets" when trying to think about designing a language. Working in the abstract is, well, abstract. And hard.

Tuesday, November 23, 2021

The Next Generation of Theorem Provers

1. A parable. Back in the 1980s and '90s, there was a Cambrian explosion of CPU architectures: SPARC, POWER, PA-RISC, MIPS, and presumably many others. Each processor architecture had its own customized UNIX-like operating system (SPARC had Solaris née SunOS, POWER had AIX, PA-RISC had HP-UX, MIPS had IRIX, etc.).

If you wrote a program and compiled it for one architecture, but I worked on a computer with a different architecture, then you couldn't share your program with me: you had to send me your source code, and hopefully I had a compiler for the language you used, then I could compile your program for my computer.

Of course, this was a nuisance. Some compilers had special features which hindered porting programs. Thus motivated the Java programming language with the motto, "Write once, run anywhere".

2. Theorem Provers. The situation with theorem provers reminds me of the situation with competing CPU architectures in the '80s and '90s. If you wrote a proof in Lean (or Coq or HOL or...), and I work with Coq (or HOL or anything else), then I can't verify your proof in my system. Worse: I can't even "compile it" to my theorem prover. I would have to rewrite it from scratch.

2.1. Terminology. I'm going to generically refer to proof checkers, interactive theorem provers, and automatic theorem provers under the umbrella term Prover. This is non-standard, but the field is so small that any choice would be non-standard.

3. "Generations" of Provers. For the sake of clarity, I think it's useful to introduce a notion of "Generations of Provers", analogous to generations of programming languages.

"First generation provers" include: Automath, Metamath, possibly Twelf, and others lacking automation (or auto-generation of infrastructure, like inductive types which [when defined] introduce an induction principle). First generation provers require you do this manually.

"Second generation provers" range from LCF to HOL, Coq, and Lean (and, I would argue, Mizar). There is some degree of automation, the proof assistant does some of the heavy lifting for you, but you may be required to explicitly state some proof steps.

And that's where we are today: after about 70 years, we've got two "generations" of provers. (This isn't too bad, considering how few people have worked on theorem provers and proof assistants.) We emphasize that "generation" is used metaphorically for the "level of abstraction" afforded by the prover.

4. Hitting a moving target. Unlike the situation facing CPU architects or programming language designers, we who think about theorem provers are trying to formalize an evolving language (mathematics).

We could berate mathematicians for not using theorem provers. But then we have our "tower of Babel" problem, where your prover can't talk to mine. (~~Fanatics~~ Evangelists resolve this problem by insisting on using their "one true Prover", but that's unrealistic and unmerited.)

A better alternative would be to try to come up with the "next generation" of provers. This is easy to say, hard to accomplish. And mathematicians would just say, "Well, can't we make it like LaTeX with custom macros for proving results?" Again: easier said than done.

Designing a language is very hard, even with simple requirements. But a step towards the next generation would be (I believe Freek Wiedijk pointed this out in some article) analogous to how compilers "eat in" a high level language, process it, and then "spit out" assembly. What's the "assembly language" of mathematics? Some first generation prover, like Metamath or Automath.

5. Designing a mathematical language. The way we have learned to design a programming language (from many years of trial and error): write many small programs in the proposed language, and in comments specify the intended outcome. This strategy applies to provers, as well: write down many examples of mathematics [definitions, theorems, proofs or proof sketches] in the hypothetical prover's vernacular, and comment on what it should be doing.

There is much freedom when designing a new prover's vernacular, and it's easy to "cheat". Simply say, "...and somehow the prover will magically do the following: ...".

Perhaps we can turn this on its head, and begin thinking about how to do mathematics as a controlled natural language? This is how I stumbled into theorem proving, running across Freek Wiedijk's note "The Mathematical Vernacular".

6. Semiformal mathematics: controlling the vernacular. I just started jotting these ideas down when I came across Jeremy Avigad's interesting recent paper, "The design of mathematical language". Although I agree with much of the paper, and I especially praise the use of "many small examples from mathematical literature" to guide the discussion, I thought it was limited in exploring the "design space" and eager to accept Lean's design decisions.

For example, consider the definition of a "normal subgroup". Are we defining an adjective ("normal") or a type modifier (since "subgroup" is a type, a "normal subgroup" would be a subtype of "subgroup") or a new type with implicit coercions (any "normal subgroup" is automatically a "subgroup", and satisfy the properties that a subgroup would have)?

Another undiscussed dimension to mathematical definitions (as a domain specific language) would be examining the "stuff, structure, and properties" idea of Baez and Dolan. This template gives us a lot "for free", and can be made somewhat "independent of foundations" by using objects in toposes. It's a small thing, but seldom discussed among provers.

The discussion also lacked adequate examples from applied mathematics (which, for me, would be things like weak solutions to PDEs, or theorems about regularity of PDEs, or something like that...my mind is focused on the Navier-Stokes equations). Granted, this is a domain where many mathematicians flinch and say, "Ouch, this is an unpleasant field." I do not intend it as a criticism, but an observation.

6.1. Analysis, Synthesis. The next natural step in Avigad's paper would be to move from analyzing examples to synthesizing some semi-formal "vernacular".

I honestly don't have a "semi-formal vernacular" at hand to present, but I have been using something similar to Wiedijk's vernacular for the past year. Since April or so of 2021, I have been transcribing my notes into a Zettelkasten (i.e., onto A6 size slips of paper). Proofs and proof sketches have been written using a slight variant of Wiedijk's vernacular.

It works fine for proofs, and forces me to fill in missing steps (or at least, make note of them). But I think proofs as a domain-specific language don't require much thought: we've had very smart people think about it for a century, and they came up with pretty good ideas (natural deduction, judgements, etc.). Now my task boils down to coming up with the right "reserved keywords" corresponding to the proof steps found in mathematical logic.

For definitions, in Field theory and Galois theory, most definitions don't neatly fit the "stuff, structure, properties" mould. This is great! Now I have some ready-to-present examples where it fails, namely whenever a definition looks like: "We call a gadget $G$ Hairy if [hairy property] holds." This occurs a lot in field theory and Galois theory. It also forces me to think deeply about what "species" of definition this is, and what's "really being defined" here.

6.2. Future of Provers? The short answer is, I don't know. The next steps may be to think about the "next generation" of provers, and try to move towards a "foundation independent framework" which resembles how working mathematicians actually operate.

But I think it's wrong-headed to say, as Leslie Lamport put it, mathematicians are "doing proofs wrong" because it resembles Euclid (and Euclid is old, thus bad). It's probably equally as misguided as suggesting that the design of the input language for provers is irrelevant, as any compiler designer will tell you, since this alienates the user-base and hinders adoptation.

Taken together, this suggests we need to make provers resemble "ordinary mathematics" as published and practiced by working mathematicians. Thus investigations along the lines of Avigad's article is well worth pursuing.

My intuition tells me that the third or fourth generation of Provers will resemble compilers, accepting more human intelligible input, producing "assembly code"-like output for a particular foundations of mathematics. Changing foundations would be a "one liner", if that. But what does the input language look like? That's hard to say.

Group Morphisms

1. There is some "Kabuki theater" whenever introducing a new mathematical gadget, thanks to category theory: we have our new gadget, then we could ask about morphisms (very important), "subgadgets", and universal constructions (products, quotients, etc.). The exciting thing, as in Kabuki theater, is the order and manner of presentation.

2. Definition. Let ${G}$ and ${H}$ be groups. We define a Group Morphism to consist of a function ${\varphi\colon G\to H}$ of the underlying sets such that

group operation is preserved: for any ${g_{1}}$, ${g_{2}\in G}$, we have ${\varphi(g_{1}g_{2})=\varphi(g_{1})\varphi(g_{2})}$;
identity element is preserved: if ${e_{G}\in G}$ is the identity element of ${G}$ and ${e_{H}\in H}$ is the identity element of ${H}$, then ${\varphi(e_{G})=e_{H}}$;
inverse is preserved: for any ${g\in G}$, ${\varphi(g^{-1})=\varphi(g)^{-1}}$.

2.1. Remark. Older texts refer to group morphisms as "group homomorphisms". After category theory became popular and part of the standard curriculum, the "homo-" prefix was dropped because group morphisms live in the same category, so it was redundant.

3. Example. One "low hanging fruit" for morphism examples is the identity morphism. We should check, for any group ${G}$, the identity function ${\mathrm{id}\colon G\to G}$ is a bona fide group morphism.

We see it preserves the group operation. For any ${g_{1}}$, ${g_{2}\in G}$, we have ${\mathrm{id}(g_{1}g_{2})=g_{1}g_{2}}$ by definition of the identity function. But this is also equal to ${\mathrm{id}(g_{1})\mathrm{id}(g_{2})}$. Thus the group operation is preserved.

The identity element is preserved ${\mathrm{id}(e_{G})=e_{G}}$.

Inversion is also preserved ${\mathrm{id}(g^{-1})=g^{-1}=\mathrm{id}(g)^{-1}}$ for any ${g\in G}$.

Thus taken together, it follows the identity mapping satisfies the axioms of a group morphism.

4. Example. Let ${G}$ be any group, and consider ${\mathbb{Z}}$ equipped with addition as a group. For each ${g\in G}$, we have a group morphism ${\varphi\colon\mathbb{Z}\to G}$ sending ${1\in\mathbb{Z}}$ to ${g\in G}$. Is this really a group morphism?

We can check that the properties are (or, ought to be) satisfied. If the group operation is preserved, then ${\varphi(1+1)=\varphi(1)\varphi(1)=g^{2}}$ and more generally, for any ${m\in\mathbb{Z}}$, we have ${\varphi(m+1)=\varphi(1)^{m}=g^{m}}$.

For the identity element being preserved, that means ${\varphi(0)=e_{G}}$, which is fine: it corresponds to ${g^{0}=e_{G}}$.

Group inverses would be ${\varphi(-m)=\varphi(m)^{-1}=(g^{m})^{-1}}$. And we know this is precisely the same as ${g^{-m}}$.

5. Example. Consider the group ${\mathrm{GL}(2,\mathbb{R})}$ and the multiplicative group ${\mathbb{R}^{\times}}$ of nonzero real numbers. Then the determinant $$ \det\colon\mathrm{GL}(2,\mathbb{R})\to\mathbb{R}^{\times} \tag{1}$$ is a group morphism. Let us prove it!

We see, for any matrices ${M}$, ${N\in\mathrm{GL}(2,\mathbb{R})}$ we have $$ \det(MN)=\det(M)\det(N). \tag{2}$$ This is a familiar fact in linear algebra. But for us, it tells us the group operation is preserved.

The identity element must be mapped to the identity element. We see the identity matrix ${I\in\mathrm{GL}(2,\mathbb{R})}$ has ${\det(I)=1}$. Thus the determinant preserves the group identity element.

As far as the group inverse, well, this follows from previous results, right? After all, if ${M\in\mathrm{GL}(2,\mathbb{R})}$, then ${M^{-1}\in\mathrm{GL}(2,\mathbb{R})}$, and $$ I = M^{-1}M \tag{3}$$ so $$ \det(M^{-1}M)=\det(M^{-1})\det(M)=1 \tag{4}$$ and thus by division $$ \det(M^{-1})=\det(M)^{-1}. \tag{5}$$ Thus the group inverse operator is preserved.

6. Definition. Let ${\varphi\colon G\to H}$ be a group morphism. We define the Kernel of ${\varphi}$ to be the pre-image of the identity element of ${H}$: \begin{equation*} \ker(\varphi)=\{g\in G|\varphi(g)=e_{H}\}. \end{equation*}

7. Example. For the group morphism ${\det\colon\mathrm{GL}(2,\mathbb{R})\to\mathbb{R}^{\times}}$, the kernel would be $$ \ker(\det)=\{M\in\mathrm{GL}(2,\mathbb{R})|\det(M)=1\}. \tag{6}$$ That is to say, it consists of matrices with unit determinant. Observe, this is a group under matrix multiplication: if two matrices have unit determinant, their product has unit determinant; the identity matrix is in the kernel; and it's closed under inverses. This is an important group called the Special Linear Group, denoted ${\mathrm{SL}(2,\mathbb{R})}$.

1. Properties of Morphisms

8. Proposition. Let ${\varphi\colon G\to H}$ be a group morphism. If ${g\in G}$ is any element, then ${\varphi(g^{-1})=\varphi(g)^{-1}}$.

Proof: Let ${g\in G}$ (so ${\varphi(g)\in H}$). We find ${\varphi(g\cdot g^{-1})=\varphi(g)\varphi(g^{-1})=e_{H}}$, thus multiplying on the left by ${\varphi(g)^{-1}}$ gives the result. ∎

9. Proposition. Let ${\varphi\colon G\to H}$ be a group morphism. If ${g\in G}$ is any element and ${n\in\mathbb{Z}}$ is any integer, then ${\varphi(g^{n})=\varphi(g)^{n}}$.

Proof: Per cases since ${n<0}$ or ${n=0}$ or ${n>0}$. The ${n=0}$ case is obvious.

For ${n>0}$, by induction. The base case ${n=1}$ gives ${\varphi(g^{1})=\varphi(g)^{1}}$, which is obvious. Assume this holds for arbitrary ${n}$. Then the inductive case ${n+1}$ is $$ \varphi(g^{n+1})=\varphi(g^{n}g)=\varphi(g)^{n}\varphi(g)=\varphi(g)^{n+1}. \tag{7}$$ Thus we have proven the result for non-negative ${n}$.

For negative ${n\in\mathbb{Z}}$, the proof is analogous. ∎

10. Theorem. The composition of group morphisms is a group morphism. More explicitly, if ${\varphi\colon G\to H}$ and ${\psi\colon H\to K}$ are group morphisms, then ${\psi\circ\varphi\colon G\to K}$ is a group morphism.

11. Theorem. Let ${\varphi\colon G\to H}$ be a group morphism. If ${\ker(\varphi)=\{e_{G}\}}$, then ${\varphi}$ is injective.

Proof: Assume ${\ker(\varphi)=\{e_{G}\}}$. Let ${g_{1}}$, ${g_{2}\in G}$ be completely arbitrary. (We want to show if ${\varphi(g_{1})=\varphi(g_{2})}$, then ${g_{1}=g_{2}}$.) Assume ${\varphi(g_{1})=\varphi(g_{2})}$. Then ${\varphi(g_{1})\varphi(g_{2})^{-1}=e_{H}}$ by multiplying both sides on the right by ${\varphi(g_{2})^{-1}}$. And ${\varphi(g_{1})\varphi(g_{2}^{-1})=\varphi(g_{1}\cdot g_{2}^{-1})=e_{H}}$. Thus ${g_{1}\cdot g_{2}^{-1}\in\ker(\varphi)}$ by definition of the kernel. But we assumed the only member of the kernel was identity element. Thus ${g_{1}\cdot g_{2}^{-1}=e_{G}}$, and moreover ${g_{1}=g_{2}}$. Hence ${\varphi}$ is injective. ∎

2. Exercises

Exercise 1. Let ${G}$ be a group, ${n\in\mathbb{N}}$ be a fixed positive integer. Prove or find a counter-example: ${\varphi\colon G\to G}$, sending ${g}$ to ${\varphi(g)=g^{n}}$ is a group morphism.

Exercise 2. Prove or find a counter-example: the matrix trace ${\mathop{\rm tr}\nolimits\colon\mathrm{GL}(2,\mathbb{R})\to\mathbb{R}}$ is a group morphism.

Exercise 3. Is the exponential function on the real numbers a group morphism ${\exp\colon\mathbb{R}\to\mathbb{R}^{\times}}$?

Exercise 4. Is the matrix exponential a group morphism ${\exp\colon\mathrm{Mat}_{2}(\mathbb{R})\to\mathrm{GL}(2,\mathbb{R})}$? [Hint: $\mathrm{Mat}_{2}(\mathbb{R})$ has its group operation be matrix addition. Is this preserved?]

Monday, November 15, 2021

Typography of Lie groups, Lie algebras

This is a note for myself as I review my notes on finite groups, Lie groups (both the classic infinite ones and the finite simple groups of Lie type), Lie algebras, and specifically the typography for them.

First, no one agrees completely, and everyone's conventions is slightly different. What I mean by this: although everyone agrees, e.g., that the A_n family of Lie algebras refers to the same thing, some people use serif font for the "A", others use bold, some italicize, others do not, etc. Each field uses its own notation with good reason, but I think we can standardize notation a bit better in group theory.

I'm inclined to follow Robert Wilson's conventions from his book The Finite Simple Groups (2009).

Families of Groups

Families of Simple Lie groups: use upright, teletype font for the family and "math italics" for the subscript if it's a variable, e.g., $\mathtt{A}_{n}$, $\mathtt{B}_{5}$, $\mathtt{C}_{m}$, $\mathtt{D}_{n^{2}}$, $\mathtt{E}_{8}$

Exceptional Finite Simple Groups of Lie Type: don't treat the formatting as special, so, e.g., the Steinberg groups would be ${}^{2}A_{n}(q^{2})$, ${}^{2}D_{n}(q^{2})$, ${}^{2}E_{6}(q^{2})$, ${}^{3}D_{4}(q^{3})$.

Sporadic simple group: these should be made upright, e.g., the Suzuki group is $\mathrm{Suz}$, the Matthieu groups look like $\mathrm{M}_{11}$, the Conway groups $\mathrm{Co}_{1}$ and $\mathrm{Co}_{2}$, and so on. BUT the exception to this rule is that the Monster group is written $\mathbb{M}$ and the Baby Monster $\mathbb{B}$.

Alternating, Cyclic, Symmetric group. These are just written as $A_{n}$, $C_{n}$, or $S_{n}$. The dihedral group, too, is $D_{n}$.

Classical Lie Groups: Here there is a double standard. For classical Lie groups over the reals or complex numbers, we write something of the form $\mathrm{GL}(n, \mathbb{F})$, $\mathrm{SL}(n, \mathbb{F})$, $\mathrm{U}(n, \mathbb{F})$, $\mathrm{SU}(n, \mathbb{F})$, $\mathrm{O}(n, \mathbb{F})$, $\mathrm{SO}(n, \mathbb{F})$, $\mathrm{Sp}(n)=\mathrm{USp}(n)$ for the compact Symplectic group, $\mathrm{Sp}(2n,\mathbb{F})$ for the generic Symplectic group.

The finite groups corresponding to these guys are written a little differently in my notes: the $n$ parameter is pulled out as a subscript, because frequently we write $q$ instead of $\mathbb{F}_{q}$ for finite fields...and then looking at $\mathrm{SL}(8,9)$ is far more confusing than $\mathrm{SL}_{8}(9)$. Thus we have $\mathrm{GL}_{n}(q)$, and so on.

Projective classical groups: the projective classical groups are prefixed by a "P", not a blackboard bold $\mathbb{P}$. E.g., $\mathrm{PSL}_{2}(7)$. At present, the projective orthogonal group wikipedia page seems to agree with this convention.

Operations

For finite groups: the Atlas of finite groups seems to have set the standard conventions for finite groups, Wilson changes them slightly. We'll find $G = N{:}H$ for the semidirect product $G = N \rtimes H = H\ltimes N$. Also $A\mathop{{}^{\textstyle .}}\nolimits B = A{\,}^{\textstyle .} B$ for a non-split extension with quotient $B$ and normal subgroup $A$, but no subgroup $B$. And $A{.}B$ is an unspecified extension.

Just a quick remark concerning $A^{\textstyle{.}}B$, I was unsatisfied with this, and asked TeX.stackexchange for help. The answers were fantastic! I'll quote the two solutions submitted:

% solution one
\makeatletter
\newcommand*{\udot}{{\mathpalette\ud@t\relax}} % or \mathrel/\mathbin/etc?
\newcommand*{\ud@t}[2]{%
   \sbox\z@{\m@th$#1.$}%
   \sbox\tw@{$#1:$}%
   \raise\dimexpr\ht\tw@-\ht\z@\relax\box\z@
}
% Example usage:
%
% $A . A {:} A \udot A$\par
% $\scriptstyle A . A {:} A \udot A$\par
% $\scriptscriptstyle A . A {:} A \udot A$
\makeatother

% solution two
\makeatletter
\newcommand{\udot}{\mathpalette\udot@\relax}
\newcommand{\udot@}[2]{%
  \begingroup
  \sbox\z@{$#1{:}$}%
  \sbox\tw@{$#1{.}$}%
  \raisebox{\dimexpr\ht\z@-\ht\tw@}{$\m@th#1.$}%
  \endgroup
}
\makeatother
\newcommand{\genericextension}[1]{%
  \mathbin{%
    \nonscript\mkern-0.75\medmuskip
    {#1}%
    \nonscript\mkern-0.75\medmuskip
  }%
}
\DeclareRobustCommand{\gext}{\genericextension{.}}
\DeclareRobustCommand{\sdp}{\genericextension{:}}
\DeclareRobustCommand{\nsext}{\genericextension{\udot}}
% Example usage:
%
% $A\gext B$ is a group extension
%
% $A\sdp B$ is a semidirect product
% 
% $A\nsext B$ is a nonsplit group extension
%
% ${:}{\udot}$ for checking the alignment
% 
% $X_{A\gext B} \quad X_{A\sdp B} \quad X_{A\nsext B}$

Lie algebras. Writing notes on paper, for a given Lie group $G$, I write $\mathrm{Lie}(G)$ as its Lie algebra. (It turns out to be a functor...neat!) If I have to write Fraktur by hand, I approximate it using Pappus's caligraphy tutorial.

Friday, November 12, 2021

Charts and Atlases, Manifolds and Smooth Structures

1. Manifolds

We will introduce the machinery necessary for defining a smooth manifold.

1.1. Charts

1. Definition. Let ${X\subset M}$ be some set. An ${n}$-dimensional Chart consists of

an open subset ${U\subset\mathbb{R}^{n}}$
a map ${\varphi\colon U\to X}$

such that ${\varphi}$ is an appropriate isomorphism (for topological manifolds, it is a homeomorphism; smooth manifolds require a diffeomorphism; and so on).

2. Remark. We call ${\varphi\colon U\to X}$ a Parametrization of ${X}$, and ${\varphi^{-1}\colon X\to U}$ a Local System of Coordinates.

3. Remark. Since ${\varphi}$ is an isomorphism, the literature mixes up using ${U\to X}$ and ${X\to U}$. Milner uses ${\varphi\colon U\to X}$, but John Lee uses the opposite convention.

4. Definition. Let ${(U,\varphi)}$, ${(V,\psi)}$ be two charts. We say they are Compatible if

the set ${(\varphi^{-1}\circ\psi)(V)\subset U}$ is an open set;
the set ${(\psi^{-1}\circ\varphi)(U)\subset V}$ is an open set;
the map ${\psi^{-1}\circ\varphi\colon\varphi^{-1}(\psi(V))\to\psi^{-1}(\varphi(U))}$ is smooth; and
the map ${\varphi^{-1}\circ\psi\colon\psi^{-1}(\varphi(U))\to\varphi^{-1}(\psi(V))}$ is smooth.

In particular, the charts are compatible if ${\varphi(U)\cap\psi(V)=\emptyset}$ is disjoint.

5. Remark. We refer to the maps ${\psi^{-1}\circ\varphi}$ as Transition Functions. The condition of smooth is ${C^{\infty}(\mathbb{R}^{n})}$, but different manifolds have different conditions (we could have ${C^{k}}$ charts, or ${C^{0}}$ charts, or analytic ${C^{\omega}}$ charts, or...).

In the older literature (e.g., Kobayashi and Nomizu's Foundations of Differential Geometry), the collection of transition functions form a gadget called a Pseudogroup.

6. Remark. We abuse notation, and could be more explicit by writing $$ \psi^{-1}\circ\varphi\colon\varphi^{-1}(\varphi(U)\cap\psi(V))\to\psi^{-1}(\varphi(U)\cap\psi(V)) \tag{1}$$

▸ Exercise 1. Prove chart compatibility is an equivalence relation.

1.2. Atlases

7. Definition. Let ${M}$ be a set. An (${n}$-dimensional) Atlas consists of a collection ${\{(U_{\alpha},\varphi_{\alpha})\mid\alpha\in A\}}$ of ${n}$-dimensional charts on ${M}$ such that

Covers ${M}$: $\displaystyle{\bigcup_{\alpha\in A}\varphi_{\alpha}(U_{\alpha})=M}$
Pairwise compatible: for any ${\alpha}$, ${\beta\in A}$ the charts ${(U_{\alpha},\varphi_{\alpha})}$ and ${(U_{\beta},\varphi_{\beta})}$ are compatible.

8. Definition. Two ${n}$-dimensional atlases on ${M}$, ${\mathcal{A}}$ and ${\mathcal{B}}$, are called Equivalent if their union ${\mathcal{A}\cup\mathcal{B}}$ is also an atlas. That is to say, if any chart of ${\mathcal{A}}$ is compatible with any chart of ${\mathcal{B}}$.

9. Remark. Remember: charts are compatible, but atlases are equivalent.

10. Lemma. Let ${\mathcal{B}}$ be an atlas, let ${(U,\varphi)}$ and ${(V,\psi)}$ be two charts not contained in ${\mathcal{B}}$. If ${(U,\varphi)}$ is compatible with every chart of ${\mathcal{B}}$, and if ${(V,\psi)}$ is compatible with every chart of ${\mathcal{B}}$, then ${(U,\varphi)}$ is compatible with ${(V,\psi)}$.

11. Theorem. Equivalence of atlases is an equivalence relation.

Proof: Let ${\mathcal{A}}$, ${\mathcal{B}}$, ${\mathcal{C}}$ be arbitrary atlases on ${M}$.

Reflexivity: ${\mathcal{A}}$ is equivalent to itself, since by definition any pair of charts in ${\mathcal{A}}$ are compatible.
Symmetry: let ${\mathcal{A}}$ and ${\mathcal{B}}$ be equivalent atlases, then ${\mathcal{B}}$ and ${\mathcal{A}}$ are equivalent atlases.
Transitivity: this is the nontrivial part. Let ${\mathcal{A}}$ and ${\mathcal{B}}$ be equivalent atlases, and ${\mathcal{B}}$ be equivalent to ${\mathcal{C}}$. Then transitivity follows by considering arbitrary charts ${(U,\varphi)\in\mathcal{A}}$ and ${(V,\psi)\in\mathcal{C}}$, then applying Lemma 10.

Thus "equivalence of atlases" forms an equivalence relation. ∎

12. Proposition. The collection of atlases on a given set ${M}$ is a set, not a proper class.

Proof: The class of atlases is a subcollection of $$ \mathcal{X}=\mathcal{P}\left(\bigcup_{U\in\mathcal{P}(\mathbb{R}^{n})}\mathop{\rm Hom}\nolimits(U,M)\right) \tag{2}$$ where ${\mathop{\rm Hom}\nolimits(U,\mathbb{R}^{n})}$ is the collection of (appropriately smooth, or continuous, or holomorphic, or...) functions from ${U}$ to ${M}$. By ZF axioms, ${\mathcal{X}}$ is a set. ∎

1.3. Manifolds

13. Definition. Let ${M}$ be a set, let ${\mathcal{A}}$ be an ${n}$-dimensional atlas on ${M}$. We call a subset ${B\subset M}$ Open (with respect to ${\mathcal{A}}$) if for any chart ${(U,\varphi)\in\mathcal{A}}$ the preimage ${\varphi^{-1}(B)}$ is open (in ${U}$, and thus open in ${\mathbb{R}^{n}}$). In particular, the images ${\varphi(U)}$ are open.

14. Theorem. If two atlases ${\mathcal{A}_{1}}$ and ${\mathcal{A}_{2}}$ on ${M}$ are equivalent, then a subset ${B\subset M}$ is open with respect to ${\mathcal{A}_{1}}$ if and only it is open with respect ${\mathcal{A}_{2}}$.

15. Remark. This theorem shows an equivalence class of atlases on ${M}$ makes ${M}$ a topological space. We may therefore meaningfully speak about topological properties of ${M}$ (like compactness, connectedness, and so forth).

16. Corollary. Let ${\mathcal{A}}$ be an ${n}$-dimensional atlas for ${M}$. Then the collection of open sets with respect to ${\mathcal{A}}$ form a topology on ${M}$.

17. Definition. Let ${M}$ be a fixed set. A ${n}$-Dimensional Differential Structure (or ${n}$-Dimensional Smooth Structure) on ${M}$ consists of an equivalence class ${\mathfrak{D}}$ of ${n}$-dimensional atlases on ${M}$ such that

Second-Countable: ${\mathfrak{D}}$ contains an at most countable atlas;
Hausdorff: for any distinct ${p,q\in M}$, there exists disjoint open neighborhoods ${U,V\subset M}$ such that ${p\in U}$ and ${q\in V}$.

18. Remark (Smooth Structure using a Maximal Atlas). Equivalence classes are awkward to work with, and so it is more popular to consider maximal atlases. An atlas ${\mathcal{A}}$ is maximal if it contains all charts compatible with every chart in ${\mathcal{A}}$. Given an equivalence class ${\mathfrak{A}}$ of atlases, we may obtain a maximal atlas by considering $$ \mathcal{A}_{\text{max}} = \bigcup_{\mathcal{A}\in\mathfrak{A}}\mathcal{A}. \tag{3}$$ This may be used instead of an equivalence class of atlases in defining a differential structure, provided the second-countable axiom is reworded as: ${\mathcal{A}_{\text{max}}}$ contains an at most countable subatlas.

19. Remark (Convenient Fiction). No one actually constructs either a maximal atlas or a differential structure. We typically construct a smooth atlas on ${M}$, then announce we are working with the differential structure containing our atlas. Thus maximal atlases and, to some degree, differential structures are a convenient fiction.

20. Definition. A (Smooth) ${n}$-Dimensional Manifold consists of a set ${M}$ equipped with an ${n}$-dimensional differential structure.

21. Puzzle. Is this definition correct? By this, I mean: is an "${n}$-Dimensional Differential Structure" actually structure (in the sense of "stuff, structure, and properties")?

Monday, November 8, 2021

Introducing Groups to Beginners

[This is an experiment to see if some software to translate LaTeX to html works.]

1. Introduction. We will do some group theory. Here "group" refers to a "group of symmetry transformations", and we should think of elements of the group as functions mapping an object to itself in some particularly symmetric way.

2. Definition. A Group consists of a set ${G}$ equipped with

a law of composition ${\circ\colon G\times G\to G}$,
an identity element ${e\in G}$, and
an inverse operator ${(-)^{-1}\colon G\to G}$

such that

Associativity: For any ${g_{1}}$, ${g_{2}}$, ${g_{3}\in G}$, ${(g_{1}\circ g_{2})\circ g_{3}=g_{1}\circ(g_{2}\circ g_{3})}$
Unit law: For any ${g\in G}$, ${g\circ e=e\circ g=g}$
Inverse law: For any ${g\in G}$, ${g^{-1}\circ g=g\circ g^{-1}=e}$.

3. Effective Thinking Principle: Create Examples. Whenever encountering a new definition, it's useful to construct examples. Plus, it's fun. Now let us consider a bunch of examples!

4. Example (Trivial). One strategy is to find the most boring example possible. We can't use ${G=\emptyset}$ since a group must contain at least one element: the identity element ${e\in G}$. Thus the next most boring candidate is the group containing only the identity element ${G=\{e\}}$. This is the Trivial Group.

5. Example (Dihedral). Consider the regular ${n}$-gon in the plane ${X\subset\mathbb{R}^{2}}$ with vertices located at ${(\cos(k2\pi/n), \sin(k2\pi/n))}$ for ${k=0,1,\dots,n-1}$. We also require ${n\geq3}$ to form a non-degenerate polygon (${n=2}$ is just a line segment, and ${n=1}$ is one dot).

We can rotate the polygon by multiples of ${2\pi/n}$ radians. There are several ways to visualize this, I suppose we could consider rotations of the plane by ${2\pi/n}$ radians: $$ r\colon\mathbb{R}^{2}\to\mathbb{R}^{2} \tag{1}$$ which acts like the linear transformation $$ r \begin{pmatrix} x\\ y \end{pmatrix} := \begin{pmatrix} \cos(2\pi/n) & -\sin(2\pi/n)\\ \sin(2\pi/n) & \cos(2\pi/n) \end{pmatrix} \begin{pmatrix} x\\ y \end{pmatrix}. \tag{2}$$ We see that the image of our ${n}$-gon under this transformation ${r(X)=X}$ remains invariant.

The other transformation worth exploring is reflecting about the ${x}$-axis, ${s\colon\mathbb{R}^{2}\to\mathbb{R}^{2}}$ which may be defined by $$ s \begin{pmatrix} x\\ y \end{pmatrix} := \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix} \begin{pmatrix} x\\ y \end{pmatrix}. \tag{3}$$ This transformation also leaves our polygon invariant ${s(X)=X}$.

We can compose these two types of transformations. Observe that ${s\circ s=\mathrm{id}}$ and the ${n}$-fold composition ${r^{n}=r\circ\dots\circ r=\mathrm{id}}$ both yield the identity transformation ${\mathrm{id}(x)=x}$ for all ${x\in\mathbb{R}^{2}}$. Then we have ${2n}$ symmetry transformations: ${\mathrm{id}}$, ${r}$, ..., ${r^{n-1}}$; and ${s}$, ${s\circ r}$, ..., ${s\circ r^{n-1}}$. What about, say, ${r\circ s}$? We find ${s\circ r^{k}\circ s=r^{-k}}$, so ${r^{k}\circ s = s\circ r^{-k}}$. Thus it's contained in our list of symmetry transformations.

The symmetry group thus constructed is called the Dihedral Group. Geometers denote it by ${D_{n}}$, algebraists denote it by ${D_{2n}}$, and we denote it by ${D_{n}}$.

6. Example (Rotations of regular polygon). We can restrict our attention, working with the previous example further, to only rotations of the regular ${n}$-gon by multiples of ${2\pi/n}$ radians. We can describe this group as "generated by a single element", i.e., symmetries are of the form ${r^{k}}$ for ${k\in\mathbb{Z}}$. This is an example of a Cyclic Group. In particular, it is commutative: any symmetries ${r_{1}}$ and ${r_{2}}$ satisfy ${r_{1}\circ r_{2}=r_{2}\circ r_{1}}$. These are special situations, let us carve out space to define these concepts explicitly.

7. Definition. We call a group ${G}$ Abelian if it is commutative, i.e., for any transformations ${f}$, ${g\in G}$ we have ${f\circ g = g\circ f}$. In this case, we write ${f\circ g}$ as ${f+g}$, using the plus sign to stress commutativity.

8. Definition. We call a group ${G}$ Cyclic if there is at least one element ${g\in G}$ such that ${\{g^{n}\mid n\in\mathbb{Z}\}=G}$ the entire group consists of iterates of ${g}$ and ${g^{-1}}$.

9. Example (Number Systems). Another few examples the reader may know are the familiar number systems under addition: the integers ${\mathbb{Z}}$, the rational numbers ${\mathbb{Q}}$, the real numbers ${\mathbb{R}}$, and the complex numbers ${\mathbb{C}}$. They are commutative groups.

10. Example (Infinite dihedral). We can take the infinite limit of the dihedral group to get the infinite dihedral group ${D_{\infty}}$. We formally describe it as consisting of "rotations" ${r}$ and "reflections" ${s}$ such that

${r^{m}\circ r^{n} = r^{m+n}}$ for any ${m}$, ${n\in\mathbb{Z}}$;
${s\circ r^{m}\circ s = r^{-m}}$ for any ${m\in\mathbb{Z}}$;
${s\circ s = e}$;
${r^{n}\circ r^{-n} = r^{-n}\circ r^{n} = e}$ for any ${n\in\mathbb{Z}}$, in particular ${r^{0}=e}$.

In this sense, the "infinite limit" turns rotations into something like the integers.

11. Example (Circular dihedral). A more intuitive "infinite limit" of the dihedral group is the symmetries of the unit circle ${S^{1}}$ in the plane ${\mathbb{R}^{2}}$. These are anti-clockwise rotations and reflection about the ${x}$-axis, but rotations are parametrized by a real parameter (the "angle"): $$ r_{\theta}~``=\!\!\mbox{"} \begin{pmatrix}\cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{pmatrix}. \tag{4}$$ Here we write an "equals" sign in quotes because this is the intuition. A group is abstract, whereas the matrix is a concrete realization of the symmetry.

The reader should verify the axioms for a group are satisfied, with the hint that ${r_{\theta}\circ r_{\phi} = r_{\theta+\phi}}$ and the usual relation between reflection and rotation holds.

This group is called the Orthogonal Group in 2-dimensions.

Exercises

Exercise 1. Is ${\mathbb{Z}}$ a cyclic group? Is ${\mathbb{C}}$ a cyclic group?

Exercise 2. Is the non-negative integers ${\mathbb{N}_{0}}$ a group under addition? Under multiplication?

Exercise 3. Are the positive real numbers ${\mathbb{R}_{\text{pos}}}$ a group under multiplication?

Exercise 4. Pick your favorite polyhedron in 3-dimensions. Determine its symmetry group.

Exercise 5. Complex conjugation acts on ${\mathbb{C}}$ by sending ${x+i\cdot y}$ to ${x-i\cdot y}$. Does this give us a symmetry group?

Exercise 6 (challenging). If we consider polynomials with coefficients in, say, rational numbers (denoted ${\mathbb{Q}[x]}$ for polynomials with the unknown ${x}$), then how can we form a symmetry group of ${\mathbb{Q}[x]}$?

Exercise 7 (General Linear Group). Take ${n\in\mathbb{N}}$ to be a fixed positive integer, preferably ${n\geq2}$. Consider the collection of invertible ${n}$-by-${n}$ matrices with entries which are rational numbers $${\mathrm{GL}(n, \mathbb{Q}) = \{ M\in\mathrm{Mat}(n\times n, \mathbb{Q}) \mid \det(M)\neq0\}.}$$ Prove this is a group under matrix multiplication.

TeX-nical Stuff

Pages

CSS