From the Wallis formula to the Gaussian distribution
From the Wallis formula to the Gaussian distribution Mark Reeder Department of Mathematics, Boston College Chestnut Hill, MA 02467 February 11, 2012
The Wallis formula for π
The basic integrals
The Binomial Theorem
Wallis and coin-tossing
The Gaussian integral and factorial of 1/2
Wallis and the binomial distribution
The function erf(x).
Wallis and the ellipse
The Wallis formula for π
And he made a molten sea, ten cubits from the one brim to the other: it was round all about, and his height was five cubits: and a line of thirty cubits did compass it round
about. (I Kings 7, v.23).
This ancient text says that π = 3.0. Actually, much better approximations were known prior to this; for example, the Egyptians used the approximation 3.16. Now, of course, billions of digits of π are known. The first 57 of them are

\[ \pi = 3.14159265358979323846264338327950288419716939937510582097\ldots \]

These digits do not repeat themselves, and have no recognized pattern. However, John Wallis (1616-1703, Savilian Professor of Geometry at Oxford) discovered that there is a pattern if we write π as a product of fractions, instead of a sum of powers of 10. Wallis found that

\[ \frac{\pi}{2} = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdots \tag{1} \]
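Note that we have written the right side as a product of little fractions; we have not written

\[ \frac{\pi}{2} \overset{?}{=} \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdots}{1\cdot 3\cdot 3\cdot 5\cdot 5\cdot 7\cdots} \]

because we might be tempted to cancel the odds below by their doubles above, and get

\[ \frac{\pi}{2} \overset{?!}{=} 2\cdot 2\cdot 4\cdot 2\cdot 2\cdot 8\cdot 8\cdot 2\cdot 2\cdots \]

which is absurd. To be clear, let us define the Wallis fractions:

\[ W_1 = \frac{2}{1}\cdot\frac{2}{3} = \frac{4}{3}, \qquad W_2 = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5} = \frac{64}{45}, \qquad W_k = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdots\frac{2k}{2k-1}\cdot\frac{2k}{2k+1}. \]

Then the precise statement of Wallis’ formula is

\[ \lim_{k\to\infty} W_k = \frac{\pi}{2}. \]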
Let us try to convince ourselves of this, using a computer. We’ll actually compute $2\cdot W_k$, which approaches the more recognizable decimal π. We find that

\[ 2W_3 = 2.92571\ldots, \quad 2W_4 = 2.97215\ldots, \quad 2W_5 = 3.00218\ldots, \quad 2W_{100} = 3.13379\ldots, \quad 2W_{1000} = 3.14081\ldots \]
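The computer experiment can be reproduced with a short script. The following is a minimal sketch in Python, not the program used for the numbers in the text; the function name `wallis_fraction` is our own, and exact rational arithmetic is used so that no rounding creeps in before the final conversion to a decimal.

```python
from fractions import Fraction


def wallis_fraction(k):
    """Return W_k = (2/1)(2/3)(4/3)(4/5)...(2k/(2k-1))(2k/(2k+1)) exactly."""
    w = Fraction(1)
    for j in range(1, k + 1):
        # Each step multiplies in the pair of "little fractions" for 2j.
        w *= Fraction(2 * j, 2 * j - 1) * Fraction(2 * j, 2 * j + 1)
    return w


for k in (3, 4, 5, 100, 1000):
    # 2 * W_k should creep up towards pi, very slowly.
    print(k, float(2 * wallis_fraction(k)))
```

The slow convergence is visible directly: going from k = 100 to k = 1000 gains only about one more correct decimal digit.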
So it seems to be getting there, but very slowly. When we say “limit as k goes to infinity” we mean it! The above approximations are so weak that it seems no one armed with just a quill pen and expensive parchment would be able to guess that the sequence $2W_k$ actually converges to π.

Wallis arrived at his formula for π by a wild and creative path, guided by guessing and intuition, along with lots of persistence. It appears as Proposition 191 in his book Arithmetica Infinitorum (Arithmetic of Infinitesimals), published in 1656. This book was Wallis’ effort to find systematic methods to compute areas under curves, based on the then-recent ideas of Cavalieri, as expounded by Torricelli. In those days, finding area was called “quadrature”, from the Latin quadratus, meaning “square”. The aim was to express the area of curved regions in terms of square units. In Wallis’ day, the most famous quadrature was Archimedes’ Quadrature of the Parabola, which set the standard. In his preface to Arithmetica Infinitorum [1], Wallis writes:

“And indeed if the quadrature of one parabola rendered so much fame to Archimedes (so that then all mathematicians since that time placed him as though on the columns of Hercules), I felt it would be welcome enough to the mathematical world if I taught the quadrature also of infinitely many kinds of figures of this sort.”

When Wallis showed his formula to Christiaan Huygens (1629-1695, inventor of the pendulum clock), the latter was highly skeptical until Wallis could demonstrate that the right side of (1) agreed with π/2 to at least nine decimal places. Although $W_{1000}$ is nowhere close to this, William Brouncker (1620-1684, first President of the Royal Society) showed Wallis a remarkable way to approximate Wallis’ product using continued fractions. This occupies the mystifying last few pages of Arithmetica Infinitorum, and we’ll eventually understand this via the much later work of Thomas Stieltjes (1856-1894). Let us see how Wallis found his formula.
First of all, the letter π, standing for the “periphery” of a circle with unit diameter, seems to have been first used by the Welsh mathematician William Jones in his textbook Synopsis Palmariorum Matheseos, printed in 1706, just after Wallis’ death in 1703. So Wallis had his own private notation; he wrote $\square$ to stand for what we call $4/\pi$. Thus, Wallis wrote

\[ \int_0^1 (1 - x^2)^{1/2}\,dx = \frac{1}{\square} = \frac{\pi}{4}. \tag{2} \]

[1] After three and a half centuries, the Arithmetica Infinitorum has finally been translated into English, by Jacqueline Stedall: The Arithmetic of Infinitesimals, Springer-Verlag, 2004. See also [commentaries]
The goal is to compute the integral explicitly, to get a formula for π. Wallis does not have the Binomial Theorem (that must wait for Newton), so he cannot expand the integrand and integrate term by term. However, if he switched the 2 and the 1/2, he could expand:

\[ \int_0^1 (1 - x^{1/2})^2\,dx = \int_0^1 \left(1 - 2x^{1/2} + x\right) dx = 1 - 2\cdot\frac{2}{3} + \frac{1}{2} = \frac{1}{6}. \]

Wallis sees that for any integers $p, q$, he can similarly compute the integral

\[ \int_0^1 (1 - x^{1/p})^q\,dx \tag{3} \]
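by expanding the integrand. He does this for various $p, q$ and arrives at the first table below. The integral (3) is always 1 divided by an integer, and he just writes the integer, so the entry in row $p$ and column $q$ in the table below is

\[ \left[\int_0^1 (1 - x^{1/p})^q\,dx\right]^{-1}. \tag{4} \]

\[
\begin{array}{c|rrrrrrrrrr}
p\backslash q & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\
2 & 1 & 3 & 6 & 10 & 15 & 21 & 28 & 36 & 45 & 55 \\
3 & 1 & 4 & 10 & 20 & 35 & 56 & 84 & 120 & 165 & 220 \\
4 & 1 & 5 & 15 & 35 & 70 & 126 & 210 & 330 & 495 & 715 \\
5 & 1 & 6 & 21 & 56 & 126 & 252 & 462 & 792 & 1287 & 2002
\end{array}
\]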
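Wallis’ table can be regenerated by carrying out the expansion he used. The sketch below is an illustration in Python with function names of our own choosing (`wallis_integral`, `table_entry`). It expands $(1 - x^{1/p})^q$ with the ordinary finite binomial formula and integrates each power $x^{j/p}$ over $[0,1]$, which gives $p/(j+p)$. The row $p = 0$ is left out because the exponent $1/p$ is undefined there.

```python
from fractions import Fraction
from math import comb


def wallis_integral(p, q):
    """Exact integral of (1 - x**(1/p))**q over [0, 1] for integers p >= 1, q >= 0.

    Term j of the expansion is (-1)**j * C(q, j) * x**(j/p), and the
    integral of x**(j/p) over [0, 1] equals p / (j + p).
    """
    return sum(Fraction((-1) ** j * comb(q, j) * p, j + p) for j in range(q + 1))


def table_entry(p, q):
    """Reciprocal of the integral: the integer Wallis wrote in row p, column q."""
    return 1 / wallis_integral(p, q)


for p in range(1, 6):
    print(p, [int(table_entry(p, q)) for q in range(10)])
```

Every value printed is a whole number, which is the observation that let Wallis record only the denominators.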
Actually, Wallis computed many more entries than this, because, by examining this table, he hoped to discover a general formula, in terms of $p$ and $q$, for the integral (3). Then, if this formula still held true even if $p$ and $q$ were not integers, he could substitute $p = q = 1/2$ (since those are the numbers in his original integral (2)) and he would have a formula for $\square$, hence a formula for π! However, things did not go so smoothly. Wallis writes:
“Although no small hope seemed to shine, what we have in hand is slippery, like Proteus, who in the same way, often escaped, and disappointed hope.”

You may have already noticed the Binomial Triangle in Wallis’ table: the entry in row $p$, column $q$ is

\[ a_{p,q} = \left[\int_0^1 (1 - x^{1/p})^q\,dx\right]^{-1} = \frac{(p+q)!}{p!\,q!}. \tag{5} \]

For example, the numbers in the second row are

\[ a_{2,q} = \tfrac{1}{2}(q+1)(q+2), \tag{6} \]

and in the third row we have

\[ a_{3,q} = \tfrac{1}{6}(q+1)(q+2)(q+3), \tag{7} \]
and so on. But remember that Wallis wants $p$ and $q$ to be half-integers. The above formulas for $a_{p,q}$ make sense for any number $q$. So Wallis attempts the method of interpolation, where you look for a formula that is known to hold for integers, and whose terms make sense for all numbers. Then you hope that the formula holds for all numbers. The trouble is that Wallis wants to compute $a_{\frac12,\frac12}$, but the formula (5) gives

\[ a_{\frac12,\frac12} = \left(\tfrac{1}{2}!\right)^{-2}, \]

which does not make sense to Wallis. So it looks as though his idea is doomed. It is at this point that Wallis shows his courage as a mathematician. He does not give up, and begins work on a new table, by optimistically adding new columns and rows for $p = \tfrac12, \tfrac32$, etc., and similarly for $q$. If only Wallis can numerically compute $a_{\frac12,\frac12}$ (which is $\square$ in the next table), he’ll be done.
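\[
\begin{array}{c|ccccccc}
p\backslash q & 0 & \tfrac12 & 1 & \tfrac32 & 2 & \tfrac52 & \cdots \\ \hline
0 & 1 & 1 & 1 & 1 & 1 & 1 & \cdots
\end{array}
\]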
He can fill in the entries as long as $p$ is an integer, using formulas like (6) and (7), which make sense for any $q$. For example,

\[ a_{2,\frac12} = \frac{1}{2}\left(\frac{1}{2}+1\right)\left(\frac{1}{2}+2\right) = \frac{15}{8}, \qquad a_{3,\frac52} = \frac{1}{6}\left(\frac{5}{2}+1\right)\left(\frac{5}{2}+2\right)\left(\frac{5}{2}+3\right) = \frac{231}{16}. \]

(Wallis is guessing here: he does not know for sure that formulas (6) and (7) actually give the value of the integral when $p$ or $q$ are not integers. Luckily, they do.) Then he notices the formula

\[ a_{p,q} = \frac{p+q}{q}\,a_{p,q-1}. \tag{8} \]
Here, we have only a recursive formula, but at least both sides make sense for any numbers $p, q$. Again, Wallis only knows (8) for integers (you can check it yourself, using factorials) and he just assumes that (8) is true for non-integers. With these scruples happily tossed, formula (8) lets him move two steps to the right in any row (remember that the steps are now by halves), and this allows him to fill in the blank spaces in table two, using $\square$. In row $p = \tfrac12$, for example, he gets

\[ a_{\frac12,\frac32} = \frac{\frac12+\frac32}{\frac32}\,a_{\frac12,\frac12} = \frac{4}{3}\,\square. \]
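Continuing like this, he eventually completes row $p = \tfrac12$ as follows:

\[
\begin{array}{c|cccccc}
p\backslash q & 1 & \tfrac32 & 2 & \tfrac52 & 3 & \tfrac72 \\ \hline
\tfrac12 & \dfrac{3}{2} & \dfrac{4}{3}\,\square & \dfrac{3\cdot 5}{2\cdot 4} & \dfrac{2\cdot 4}{5}\,\square & \dfrac{3\cdot 5\cdot 7}{2\cdot 4\cdot 6} & \dfrac{2\cdot 2\cdot 4\cdot 4}{5\cdot 7}\,\square
\end{array}
\]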
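Wallis remembers these entries are reciprocals of integrals, and the integrals get smaller as $q$ increases, so the entries get larger as $q$ increases. So for example, we have

\[ \frac{2\cdot 4}{5}\,\square < \frac{3\cdot 5\cdot 7}{2\cdot 4\cdot 6} < \frac{2\cdot 2\cdot 4\cdot 4}{5\cdot 7}\,\square, \]

which, after remembering that $\square = 4/\pi$, can be written

\[ \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6}{3\cdot 3\cdot 5\cdot 5\cdot 7} < \frac{\pi}{2} < \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6}{3\cdot 3\cdot 5\cdot 5\cdot 7}\cdot\frac{8}{7}. \]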
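The filling-in of the row $p = \tfrac12$ can be mimicked mechanically. In the sketch below (an illustration, with names of our own such as `half_row` and `square`), each entry is stored as a rational coefficient together with a flag saying whether it is multiplied by Wallis’ unknown $a_{\frac12,\frac12}$. The recursion $a_{p,q} = \frac{p+q}{q}a_{p,q-1}$ is applied exactly as Wallis assumed it, jumping two half-steps at a time. The modern value $4/\pi$ is substituted only at the end, to check that the entries really increase.

```python
from fractions import Fraction
from math import pi


def half_row(num_entries):
    """Entries a_{1/2, q} for q = 0, 1/2, 1, 3/2, ... as (q, coefficient, has_square).

    has_square is True when the entry equals coefficient * square, where
    `square` stands for the unknown a_{1/2, 1/2}.
    """
    p = Fraction(1, 2)
    rows = [(Fraction(0), Fraction(1), False),    # a_{1/2, 0} = 1
            (Fraction(1, 2), Fraction(1), True)]  # a_{1/2, 1/2} = square
    while len(rows) < num_entries:
        q_prev, coeff_prev, has_square = rows[-2]  # the entry two half-steps back
        q = q_prev + 1
        rows.append((q, coeff_prev * (p + q) / q, has_square))
    return rows[:num_entries]


square = 4 / pi  # modern value, used here only as a numerical check
for q, coeff, has_square in half_row(8):
    value = float(coeff) * (square if has_square else 1.0)
    print(q, coeff, "* square" if has_square else "", round(value, 5))
```

The coefficients 15/8, 8/5, 35/16 and 64/35 produced for q = 2, 5/2, 3, 7/2 are the products $\frac{3\cdot5}{2\cdot4}$, $\frac{2\cdot4}{5}$, $\frac{3\cdot5\cdot7}{2\cdot4\cdot6}$ and $\frac{2\cdot2\cdot4\cdot4}{5\cdot7}$ in lowest terms.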
Thus we see Wallis’ formula emerging. The factor $\tfrac87$ says this approximation of $\tfrac{\pi}{2}$ is accurate to within a factor of $\tfrac17$. If you go farther out in the row, the error becomes smaller than any given number. Wallis did not have the idea of limit, but instead invoked the argument of Euclid that two quantities whose difference is less than any given number are equal. Incidentally, Wallis’ value for $\square$ seems to imply the mysterious-looking formula

\[ \tfrac{1}{2}! = \frac{1}{\sqrt{\square}} = \frac{\sqrt{\pi}}{2}. \tag{9} \]

This will be explained later by Euler. Though he gets credit for the discovery, is there not some question whether Wallis actually proved his formula? After all, he did make some reckless assumptions along the way. In fact, Wallis’ intuition and assumptions turn out to be correct. Even his method can be made rigorous, as we show in the next section.
The basic integrals

Let us revisit the $p = \tfrac12$ row in Wallis’ table. Its $q$-th term $a_{\frac12,q}$ is the reciprocal of an integral:

\[ \frac{1}{a_{\frac12,q}} = \int_0^1 (1 - x^2)^q\,dx. \]
We are especially interested in the case where $q = m/2$ is a half-integer. Making the substitution $x = \cos\theta$, we get

\[ \int_0^1 (1 - x^2)^{m/2}\,dx = \int_0^{\pi/2} \sin^{m+1}\theta\,d\theta. \]
It is convenient to shift the indices a bit and consider the integrals

\[ I_n = \int_0^{\pi/2} \sin^n x\,dx, \qquad n = 0, 1, 2, 3, \ldots \]

which give the area under the graphs of $y = \sin^n x$.

[Figure: graphs of the $n$th power of $\sin(x)$ on $[0, \pi/2]$.]
Without computing these integrals $I_n$ yet, we can already see that

\[ I_{n+1} < I_n. \tag{11} \]

This is because

\[ 0 \le \sin x \le 1 \quad \text{for } x \in \left[0, \tfrac{\pi}{2}\right]. \]

This implies that for any integer $n \ge 0$, we have

\[ \sin^{n+1} x \le \sin^n x. \]
There is therefore less area under the graph of $\sin^{n+1} x$ than under $\sin^n x$. On the other hand, we can actually compute $I_n$ using integration by parts (we postpone this to the end of the section). The results depend on whether $n$ is even or odd, as follows. It will help to have some new notation to express the answer. Define the double factorials by $0!! = 1 = (-1)!!$, and for a positive integer $k$,

\[ (2k)!! = 2\cdot 4\cdot 6\cdots(2k), \qquad (2k+1)!! = 1\cdot 3\cdot 5\cdots(2k+1). \]

Then for all integers $k \ge 0$ we have

\[ I_{2k+1} = \int_0^{\pi/2} \sin^{2k+1} x\,dx = \frac{(2k)!!}{(2k+1)!!}, \qquad I_{2k} = \int_0^{\pi/2} \sin^{2k} x\,dx = \frac{(2k-1)!!}{(2k)!!}\cdot\frac{\pi}{2}. \tag{13} \]
Note that Wallis’ fraction can also be expressed in terms of double factorials:

\[ W_k = \left[\frac{(2k)!!}{(2k-1)!!}\right]^2 \frac{1}{2k+1}. \tag{14} \]

So all the parts are fitting together. Now recall that Wallis used inequalities from three successive entries in his last table. We can do the same thing with our integrals. Applying (11) twice, we have

\[ I_{2k+2} < I_{2k+1} < I_{2k}. \]

Using the formulas (13) for $I_{2k+2}$, $I_{2k+1}$, $I_{2k}$ respectively, this chain of inequalities says

\[ \frac{(2k+1)!!}{(2k+2)!!}\cdot\frac{\pi}{2} < \frac{(2k)!!}{(2k+1)!!} < \frac{(2k-1)!!}{(2k)!!}\cdot\frac{\pi}{2}. \]

Multiplying by the reciprocal of the rightmost term and using (14), we get

\[ \frac{2k+1}{2k+2} < \frac{2W_k}{\pi} < 1. \tag{15} \]

As $k \to \infty$, we have $\frac{2k+1}{2k+2} \to 1$, and $\frac{2W_k}{\pi}$ is trapped inside, so $\frac{2W_k}{\pi} \to 1$ as well. Thus,

\[ \lim_{k\to\infty} W_k = \frac{\pi}{2}, \]

as we wished to show. This completes the proof of Wallis’ formula. We followed his same steps, but with more precision, because we had the formulas (13) for the integrals $I_n$.
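The squeeze in (15) is easy to watch numerically. The following sketch (Python, with helper names `double_factorial` and `wallis_from_double_factorials` that are our own) computes $W_k$ from the double-factorial formula (14) in exact arithmetic and prints the lower bound $\frac{2k+1}{2k+2}$ next to $\frac{2W_k}{\pi}$. The value of π comes from the standard `math` module, so this is a check of the inequality and not an independent computation of π.

```python
from fractions import Fraction
from math import pi


def double_factorial(n):
    """n!! with the convention 0!! = (-1)!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def wallis_from_double_factorials(k):
    """W_k computed from formula (14)."""
    ratio = Fraction(double_factorial(2 * k), double_factorial(2 * k - 1))
    return ratio ** 2 / (2 * k + 1)


for k in (1, 10, 100, 1000):
    lower = (2 * k + 1) / (2 * k + 2)
    middle = 2 * float(wallis_from_double_factorials(k)) / pi
    # Inequality (15): lower < middle < 1.
    print(k, lower, middle, lower < middle < 1)
```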
We postponed the calculation of these integrals, but now it is time to do it. At the beginning, it doesn’t matter if $n$ is even or odd. Recall that

\[ I_n = \int_0^{\pi/2} \sin^n x\,dx. \]

Use integration by parts with

\[ u = \sin^{n-1} x, \qquad dv = \sin x\,dx, \]
\[ du = (n-1)\sin^{n-2} x\cos x\,dx, \qquad v = -\cos x. \]

You should be able to see that

\[ uv\,\Big|_0^{\pi/2} = -\cos x\,\sin^{n-1} x\,\Big|_0^{\pi/2} = 0. \]

So

\[
\begin{aligned}
\int_0^{\pi/2} \sin^n x\,dx &= 0 - (n-1)\int_0^{\pi/2} (\sin^{n-2} x\cos x)(-\cos x)\,dx \\
&= (n-1)\int_0^{\pi/2} \sin^{n-2} x\,(\cos^2 x)\,dx \\
&= (n-1)\int_0^{\pi/2} \sin^{n-2} x\,(1 - \sin^2 x)\,dx \\
&= (n-1)\int_0^{\pi/2} (\sin^{n-2} x - \sin^n x)\,dx \\
&= (n-1)\int_0^{\pi/2} \sin^{n-2} x\,dx - (n-1)\int_0^{\pi/2} \sin^n x\,dx \\
&= (n-1)I_{n-2} - (n-1)I_n.
\end{aligned}
\]

So $I_n = (n-1)I_{n-2} - (n-1)I_n$. Add $(n-1)I_n$ to both sides and get $nI_n = (n-1)I_{n-2}$. This gives us the recursion formula

\[ I_n = \frac{n-1}{n}\,I_{n-2}. \tag{18} \]
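To get it started, we have the initial values

\[ I_0 = \int_0^{\pi/2} dx = \frac{\pi}{2}, \qquad I_1 = \int_0^{\pi/2} \sin x\,dx = 1. \]

Then the recursion formula (18) takes over:

\[ I_2 = \frac{2-1}{2}\,I_0 = \frac{1}{2}\cdot\frac{\pi}{2}, \qquad I_3 = \frac{3-1}{3}\,I_1 = \frac{2}{3}, \]
\[ I_4 = \frac{4-1}{4}\,I_2 = \frac{3}{4}\cdot\frac{1}{2}\cdot\frac{\pi}{2}, \qquad I_5 = \frac{5-1}{5}\,I_3 = \frac{4}{5}\cdot\frac{2}{3}, \]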
and so on. On the other hand, consider the numbers $J_n$ defined by

\[ J_0 = \frac{\pi}{2}, \qquad J_1 = 1, \]

and for larger subscripts,

\[ J_{2k+1} = \frac{(2k)!!}{(2k+1)!!}, \qquad J_{2k} = \frac{(2k-1)!!}{(2k)!!}\cdot\frac{\pi}{2}. \]

These satisfy

\[ J_n = \frac{n-1}{n}\,J_{n-2} \tag{20} \]

for all $n \ge 2$. This is the same recursion as (18) with $J_n$ instead of $I_n$. Since the $I_n$’s begin the same way as the $J_n$’s, and have the same recursion, we have $I_n = J_n$ for all $n \ge 0$.
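As a sanity check on the recursion (18), the values it produces can be compared against a direct numerical estimate of the area under $\sin^n x$. The sketch below is only an illustration: the names `I_recursive` and `I_numeric` are ours, and composite Simpson’s rule is a quadrature method chosen for convenience, not something used in the text.

```python
from math import pi, sin


def I_recursive(n):
    """I_n from I_0 = pi/2, I_1 = 1 and the recursion I_n = (n - 1)/n * I_{n-2}."""
    value = pi / 2 if n % 2 == 0 else 1.0
    for m in range(2 + n % 2, n + 1, 2):
        value *= (m - 1) / m
    return value


def I_numeric(n, steps=2000):
    """Composite Simpson's rule for the integral of sin(x)**n over [0, pi/2].

    `steps` must be even.
    """
    h = (pi / 2) / steps
    total = sin(0.0) ** n + sin(pi / 2) ** n
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * sin(i * h) ** n
    return total * h / 3


for n in range(8):
    print(n, I_recursive(n), I_numeric(n))
```

The two columns agree to many decimal places, and both decrease as n grows, in line with (11).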
Exercise 1.1: Use $u = \pi/2 - x$ and the identity $\cos(x) = \sin(\pi/2 - x)$ to show that

\[ \int_0^{\pi/2} \cos^k x\,dx = \int_0^{\pi/2} \sin^k x\,dx. \]

Then show that

\[ \int_0^{\pi} \sin^k x\,dx = 2\int_0^{\pi/2} \sin^k x\,dx. \]

Exercise 1.2: Using the substitution $u = 1 - 2x$, show that

\[ \int_0^1 (x - x^2)^q\,dx = \frac{1}{4^q}\int_0^1 (1 - x^2)^q\,dx \]

for any $q \ge 0$. Now suppose $m \ge 0$ is an integer, and calculate

\[ \int_0^1 (x - x^2)^{m/2}\,dx. \]

Check your result for $m = 1$ by considering the graph of $(x - x^2)^{1/2}$.

Exercise 1.3: Here’s another approach to $I_{2k}$, using complex numbers and Euler’s formula $e^{ix} = \cos x + i\sin x$. Note that $\sin x = (e^{ix} - e^{-ix})/2i$.
a) Compute $\int_0^{\pi} e^{imx}\,dx$ for any integer $m$.
b) Compute $\int_0^{\pi} (e^{ix} - e^{-ix})^n\,dx$ by expanding the integrand.
c) Compute $I_{2k}$. What happens for $I_{2k+1}$?

Exercise 1.4: Use the identity $\cos^2 x = 1 - \sin^2 x$ to calculate

\[ \int_0^{\pi/2} \sin^6 x\cos^4 x\,dx \]

(answer: $I_6 - 2I_8 + I_{10}$. The same method works when the powers on $\sin x$ and $\cos x$ are both even. If, say, $\cos x$ appears with odd power, you can split off a $\cos x$, write the rest of the $\cos x$’s in terms of $\sin^2 x$, and use the substitution $u = \sin x$, instead of $I_n$.)

Exercise 1.5: Letting $u = 1 - x$, it is easy to show that

\[ \int_0^1 x^a(1-x)^b\,dx = \int_0^1 x^b(1-x)^a\,dx. \]

If $a$ or $b$ is an integer, you can expand $(1-x)$ in one of these integrals, and integrate term by term, for example:

\[ \int_0^1 x^{3/2}(1-x)^2\,dx = \int_0^1 \left(x^{3/2} - 2x^{5/2} + x^{7/2}\right) dx = \frac{2}{5} - \frac{4}{7} + \frac{2}{9} = \frac{16}{315}. \]

But how to do it if neither $a$ nor $b$ is an integer? Let $x = \sin^2\theta$ and show that

\[ \int_0^1 x^b(1-x)^a\,dx = 2\int_0^{\pi/2} \cos^{2a+1}\theta\,\sin^{2b+1}\theta\,d\theta. \]

Then use the method of Exercise 1.4 to calculate

\[ \int_0^1 x^{5/2}(1-x)^{3/2}\,dx. \]

Exercise 1.6: Let $p > 1$ be a constant. Show that

\[ \int_0^{\infty} \frac{1}{(1+x^2)^p}\,dx = \int_0^{\pi/2} \cos^{2p-2}\theta\,d\theta \]

and then calculate this explicitly for $p = m/2$, where $m > 1$ is an integer. What happens if $m = 1$?

Exercise 1.7: Prove the formula used to make Wallis’ first table:

\[ \frac{p!\,q!}{(p+q)!} = \int_0^1 (1 - x^{1/p})^q\,dx, \]

where $p$ and $q$ are positive integers. Hint: Think of $p$ as fixed, and let $A_q$ and $B_q$ be the left and right sides, respectively. First show that $A_1 = B_1$. Then show that

\[ A_q = \frac{q}{p+q}\,A_{q-1}, \quad\text{and}\quad B_q = \frac{q}{p+q}\,B_{q-1}. \]

The result for $A_q$ follows from the definition. Use integration by parts for $B_q$.

Exercise 1.8: Use (15) and the fact that $\pi < 4$ to show that

\[ 0 < \frac{\pi}{2} - W_n < \frac{1}{n+1}. \]

Check this with $W_{1000}$ as computed above.

Exercise 1.9: Here is a similar way to approximate $e$ with fractions. Let

\[ L_n = \int_1^e (\ln x)^n\,dx. \]

Use integration by parts to show that $L_n = e - nL_{n-1}$. Then show that $L_n \to 0$ by looking at the graph of $\ln x$, and conclude that $nL_{n-1} \to e$. Calculate a few $L_n$’s and approximate $e$.

Exercise 1.10: What number is this?

\[ \left(\frac{2}{1}\right)^{1/2}\left(\frac{2\cdot 4}{3\cdot 3}\right)^{1/4}\left(\frac{4\cdot 6\cdot 6\cdot 8}{5\cdot 5\cdot 7\cdot 7}\right)^{1/8}\cdots \]

(Perhaps you would like to use a machine and guess. The correct guess can be proved using the inequality ??????)
The Binomial Theorem
Wallis’ Arithmetica Infinitorum was published when Isaac Newton was ?? years old. Newton was a late bloomer, mathematically. When he was ?? he bought an astrology book but could not understand the trigonometry, so he tried to learn trig, but could not follow the geometry, so he turned to Euclid, which he found boring, until encountering the result that parallelograms with the same base and height have the same area. Finally understanding enough to be impressed, Newton returned to the beginning of Euclid and, standing there, he moved the world. The Arithmetica Infinitorum was also part of Newton’s education; we have his notes on it from 1664 [3]. Newton made the observation that Wallis’ definite integrals would contain more information if they were replaced by indefinite integrals. So Newton reworked Wallis’ tables with

\[ \int_0^X (1 - x^{1/p})^q\,dx \quad\text{instead of}\quad \int_0^1 (1 - x^{1/p})^q\,dx, \]

obtaining new tables whose entries were polynomials in $X$ that reduced to Wallis’ tables when $X = 1$. The coefficients of these polynomials contained new patterns that were hidden in Wallis’ tables.
Wallis and coin-tossing
The number π is used for more than measuring circles, because it appears in many different areas of mathematics. Likewise, the Wallis formula for π has many applications. In this section, we show how Wallis is related to sequences of random 0/1 events, like coin-tossing. This includes a probabilistic interpretation of the integrals $I_n$ used to prove Wallis’ formula. First, we need a bit of background on binomial coefficients. Recall that the factorial of a positive integer $n$ is defined as

\[ n! = n(n-1)(n-2)\cdots 2\cdot 1. \]

We also define $0! = 1$. Why define $0!$ this way? For now, just accept that we define $0! = 1$ to make the formulas come out right. We will give a better reason in the next section.

[3] Annotation out of Dr Wallis his Arithmetica Infinitorum, The mathematical papers of Isaac Newton, vol. I, pp. 96-115.
There are many interpretations of $n!$. It is the number of ways to:
• put n letters in n mailboxes,
• arrange n people in a row,
• paint n houses with n colors,
• marry n men to n women,
• permute n distinct objects.

Now, if $n$ and $k$ are integers with $0 \le k \le n$, we define the binomial coefficient by

\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \]

and call it “n choose k”, because $\binom{n}{k}$ is the number of ways to choose $k$ objects from $n$ objects. From a group of $n$ people, you can form $\binom{n}{k}$ possible teams of $k$ members. For example, from a class of 30 people, you can make

\[ \binom{30}{5} = \frac{30!}{5!\,25!} = 142506 \]

possible basketball teams. Proof: make the class line up in all $30!$ possible ways, and each time take the first five for your team. You will get all possible teams this way, but you will get the same team several times, so we have to divide $30!$ by the number of times each team occurs. We get the same team from different lines by either permuting the first 5 members of the line or permuting the 25 members in the rest of the line. Thus, a total of $5!\,25!$ lines give the same team.

There are many other interpretations of binomial coefficients. For example, in algebra, we have the binomial expansion

\[ (1+x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k, \tag{21} \]

because the coefficient of $x^k$ in $(1+x)^n$ is the number of ways to choose $k$ $x$’s from the $n$ factors $(1+x)$.

The Wallis formula has to do with the following interpretation of $\binom{n}{k}$. If we label our $n$ objects as $1, 2, \ldots, n$, then a choice of $k$ objects can be expressed as a sequence of $k$ 1’s and $n-k$ 0’s, where the 1’s correspond to the chosen objects. For example, the $\binom{4}{2} = 6$ teams of two from a group of 4 are the sequences

1100, 1010, 1001, 0110, 0101, 0011.

Such sequences are also the outcomes of an experiment of $n$ coin tosses, where 1 means heads and 0 means tails. For example, if we toss the coin 4 times, and get heads, tails, tails, heads, this outcome is 1001. Thus, when we toss the coin $n$ times, the number of possible $k$-head outcomes is $\binom{n}{k}$. The probability of getting $k$ heads is

\[ \frac{\text{number of possible } k\text{-head outcomes}}{\text{number of all possible outcomes}} = \frac{1}{2^n}\binom{n}{k}. \]

In particular, the probability of getting $k$ heads from $2k$ tosses is the number we will call $P_k$, defined by

\[ P_k := \frac{1}{2^{2k}}\binom{2k}{k} = \frac{1}{2^{2k}}\,\frac{(2k)!}{(k!)^2} = \frac{(2k-1)!!}{(2k)!!}. \]

We have seen this number before. The formula for $I_{2k}$ in (13) may be written as

\[ \frac{2}{\pi}\int_0^{\pi/2} \sin^{2k} x\,dx = P_k. \]

This means that $P_k$ is also the average of the function $\sin^{2k} x$ on the interval $[0, \tfrac{\pi}{2}]$. Recall that in the proof of Wallis’ formula, we used the fact that $\sin^{2k} x \to 0$ as $k \to \infty$. Hence

\[ \lim_{k\to\infty} P_k = 0. \]
This means that the probability of getting half heads in a large even number of tosses is essentially nil. This may seem strange, since half-heads is the most likely outcome. But there are more and more outcomes that take their share of the probability. We will examine this more closely in later sections. The number $P_k$ is given explicitly above. For large $k$ it is very small, but its numerators and denominators are very big. This makes $P_k$ impractical to compute directly from the formula when $k$ is large. However, the Wallis formula may be viewed as an approximation to $P_k$ for large $k$. Recall that Wallis says that

\[ \lim_{k\to\infty} W_k = \frac{\pi}{2}, \quad\text{where}\quad W_k = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdots\frac{2k}{2k-1}\cdot\frac{2k}{2k+1}. \]

By (14), we have

\[ \frac{1}{W_k} = (2k+1)P_k^2. \]

So Wallis’ formula for π can be written in the probabilistic form

\[ \lim_{k\to\infty} P_k\sqrt{2k+1} = \sqrt{\frac{2}{\pi}}. \]
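For moderate $k$ a machine with exact integer arithmetic can still evaluate $P_k$ directly, which lets us watch the probabilistic form of Wallis’ formula at work. The sketch below is an illustration in Python; `half_heads_probability` is a name of our own, and `math.comb` is the standard-library binomial coefficient. It also reproduces the basketball-team count from earlier in this section.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def half_heads_probability(k):
    """P_k = C(2k, k) / 2**(2k): the chance of exactly k heads in 2k fair tosses."""
    return Fraction(comb(2 * k, k), 4 ** k)


print(comb(30, 5))  # number of 5-person teams from a class of 30

for k in (1, 10, 100, 1000):
    p_k = float(half_heads_probability(k))
    # P_k shrinks to 0, while P_k * sqrt(2k + 1) settles near sqrt(2 / pi).
    print(k, p_k, p_k * sqrt(2 * k + 1), sqrt(2 / pi))
```

The exact fractions grow quickly in size, so for very large $k$ the approximation supplied by Wallis remains the practical tool.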
This means that for large $k$, we have the approximation

\[ P_k \sim \frac{c}{\sqrt{2k+1}}, \]

where $c = \sqrt{2/\pi}$ is a constant. Thus, Wallis tells us how fast the odds of getting half heads go to zero.

Exercise 2.1: Explain why it is a good idea to define $0! = 1$, by giving a coin-tossing interpretation of $\binom{n}{0}$ and $\binom{n}{n}$.

Exercise 2.2: Use the formula for $\binom{n}{k}$ to prove that

\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}. \]

Exercise 2.3: Explain in two ways why

\[ \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^n: \]

first, using the formula for $(1+x)^n$, then using the coin-tossing interpretation.

Exercise 2.4: The binomial triangle (often called “Pascal’s triangle” even though it was known in various parts of the world many centuries before Pascal) has $\binom{n}{k}$ in the $k$-th entry of row $n$ from the top. Starting at the top of the binomial triangle, and moving downward at each step, what is the number of ways to get to the entry $\binom{n}{k}$? (Hint: interpret a choice of path as a sequence of coin-tosses.)

Exercise 2.5: Cancelling $(n-k)!$ from $n!$, we can write

\[ \binom{n}{k} = \frac{1}{k!}\,n(n-1)\cdots(n-k+1), \]

an expression which makes sense for any number $n$ and positive integer $k$. Show that

\[ \binom{-1/2}{k} = (-1)^k P_k, \quad\text{and}\quad \binom{1/2}{k} = (-1)^{k-1}\frac{P_k}{2k-1}. \]

Exercise 2.6: The Binomial Theorem, proved by Isaac Newton, is the expansion

\[ (1+x)^q = \sum_{k=0}^{\infty} \binom{q}{k} x^k, \tag{23} \]

which is valid for any number $q$ and $|x| < 1$. Show that formula (23) is consistent with formula (21) when $q$ is an integer $\ge 0$.

Exercise 2.7: Show that

\[ \arcsin x = \sum_{k=0}^{\infty} P_k\,\frac{x^{2k+1}}{2k+1}. \]

Hint: Use the fact that $\arcsin x = \int (1 - x^2)^{-1/2}\,dx$.

The next four exercises follow Euler, who used the series for $\arcsin x$ to calculate the sum $1 + \tfrac19 + \tfrac1{25} + \tfrac1{49} + \cdots$. There is a long story behind this sum, and Euler’s calculation of it made him famous. More about this sum later.

Exercise 2.8: Show that

\[ \int_0^1 \frac{x^{2k+1}}{\sqrt{1-x^2}}\,dx = \frac{1}{P_k(2k+1)}. \]

Hint: make the substitution $u = x^2$ and use results from chapter 1.

Exercise 2.9: Use the previous two exercises to show that

\[ \int_0^1 \frac{\arcsin x}{\sqrt{1-x^2}}\,dx = 1 + \frac{1}{9} + \frac{1}{25} + \frac{1}{49} + \cdots. \]

Exercise 2.10: Show that

\[ \int_0^t \frac{\arcsin x}{\sqrt{1-x^2}}\,dx = \tfrac12(\arcsin t)^2. \]

Exercise 2.11: Combine the previous two exercises to compute the sum

\[ 1 + \frac{1}{9} + \frac{1}{25} + \frac{1}{49} + \cdots. \]
The Gaussian integral and factorial of 1/2
We have been talking about factorials of integers, which are the building blocks of binomial coefficients. But we also saw that Wallis’ pursuit of his formula for π led him to repeated confrontations with the strange number $\tfrac12!$, leading to the version

\[ \tfrac{1}{2}! = \frac{\sqrt{\pi}}{2} \]

of Wallis’ formula that we mentioned in (9). This raises the question: What is $x!$ if $x$ is not an integer?

The answer was given by Euler. What follows uses only math that we already know, but it may seem tricky. That’s because it took 200 years to discover. It is also not very well known. I found it by accident in a paper of Thomas Stieltjes (1856-1894), a Dutch mathematician remembered today mainly for the “Stieltjes integral”, which you would encounter in graduate school.

Before going into the tricky part, let’s examine the difficulty of computing $\tfrac12!$. Recall that for a positive integer $n$, the factorial $n!$ is defined as the product

\[ n! = n\cdot(n-1)\cdots 2\cdot 1 \tag{25} \]

of all positive integers $\le n$. Formula (25) only makes sense if $n$ is a positive integer. The first step, due to Euler, is to find a different formula for $n!$ that makes sense even if $n$ is not a positive integer. Then we can just plug $n = \tfrac12$ into this formula to compute $\tfrac12!$. Right?

The first observation is that $n!$ can be defined recursively by the two rules

\[ n! = n\cdot(n-1)!, \qquad 1! = 1. \tag{26} \]

Next, Euler observed that

\[ \int_0^{\infty} e^{-x}\,dx = 1 \]

and that integration by parts (with $u = x^n$ and $dv = e^{-x}dx$) shows that

\[ \int_0^{\infty} x^n e^{-x}\,dx = n\cdot\int_0^{\infty} x^{n-1} e^{-x}\,dx. \]

So the integral $\int_0^{\infty} x^n e^{-x}\,dx$ satisfies the same recursion formulas (26) as $n!$, hence must equal $n!$:

\[ n! = \int_0^{\infty} x^n e^{-x}\,dx. \tag{27} \]
This is a formula for $n!$ that does not depend on $n$ being a positive integer. We take (27) as a new definition of $n!$. It agrees with the old definition (26) when $n$ is a positive integer, but it can also be used for other $n$. We just have to make sure the integral converges. There is no problem with the limit $\infty$, since $e^{-x}$ crushes any power of $x$ as $x \to \infty$. But at the limit 0 there could be a problem, if $n < 0$. Since $e^0 = 1$, we have the approximation $x^n e^{-x} \sim x^n$ for $x$ near 0, which shows that $\int_0^{\infty} x^n e^{-x}\,dx$ converges only when $n > -1$. So we can use formula (27) to compute $n!$ for any real number $n > -1$. For example, formula (27) says that

\[ 0! = \int_0^{\infty} x^0 e^{-x}\,dx = \int_0^{\infty} e^{-x}\,dx = 1, \]

so we get $0! = 1$ from the formula, instead of by fiat, as before. Most interesting to us now, however, is that formula (27) says

\[ \tfrac{1}{2}! = \int_0^{\infty} x^{1/2} e^{-x}\,dx, \quad\text{and}\quad \left(-\tfrac{1}{2}\right)! = \int_0^{\infty} x^{-1/2} e^{-x}\,dx. \]
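Formula (27) can be tested numerically before any of the integrals is evaluated in closed form. The sketch below is an illustration only: `factorial_integral` is our own name, the midpoint rule and the cut-off `upper=50` are convenient assumptions (the tail beyond 50 is negligible because of the factor $e^{-x}$), and `math.gamma` is Python’s standard gamma function, which satisfies `gamma(n + 1) == n!` for whole numbers and is used here purely as an outside reference value.

```python
from math import exp, gamma


def factorial_integral(n, upper=50.0, steps=100_000):
    """Midpoint-rule estimate of the integral of x**n * exp(-x) over [0, upper].

    Midpoints never touch x = 0, so fractional n such as 1/2 causes no trouble.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** n * exp(-x)
    return total * h


for n in (0, 1, 2, 3, 5, 0.5):
    print(n, factorial_integral(n), gamma(n + 1))
```

For whole numbers the first column reproduces 1, 1, 2, 6, 120; for n = 1/2 it gives a definite number, which the rest of the section identifies.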
All very nice, but what are the values of these integrals? They are actually famous integrals in disguise, used in many areas of mathematics. Let’s work on the second one, starting with the substitution $u = x^{1/2}$. We get $dx = 2u\,du$, so

\[ \left(-\tfrac{1}{2}\right)! = \int_0^{\infty} x^{-1/2} e^{-x}\,dx = 2\int_0^{\infty} u^{-1} e^{-u^2} u\,du = \int_{-\infty}^{\infty} e^{-u^2}\,du, \]

where the last step uses that $e^{-u^2}$ is an even function. So whatever it is, the number $(-\tfrac12)!$ is the area under the whole graph of $e^{-x^2}$, which is the famous “Bell Curve” used in probability. And

\[ \tfrac{1}{2}! = \tfrac{1}{2}\cdot\left(-\tfrac{1}{2}\right)! = \int_0^{\infty} e^{-x^2}\,dx \tag{28} \]

is half of this area.
We keep talking about $\tfrac12!$ without actually computing it. That’s because the integral (28) cannot be computed by finding an antiderivative of $e^{-x^2}$ (go on, try it!). If only there were an extra $x$, we could do it. Namely, if instead of the integral in (28), we had

\[ \int_0^{\infty} x e^{-x^2}\,dx, \]

then taking $u = x^2$ would turn it into

\[ \int_0^{\infty} x e^{-x^2}\,dx = \tfrac{1}{2}\int_0^{\infty} e^{-u}\,du = \tfrac{1}{2}. \]

If we had an extra $x^2$, and did $u = x^2$ again, we’d get

\[ \int_0^{\infty} x^2 e^{-x^2}\,dx = \tfrac{1}{2}\int_0^{\infty} u^{1/2} e^{-u}\,du = \tfrac{1}{2}\cdot\tfrac{1}{2}!, \]
which is back to the hard integral we started with. A more clever idea is to use integration by parts, with

\[ u = x, \qquad dv = x e^{-x^2}\,dx, \]

since we just integrated $dv$. However, this will lead to the same hard integral again. But at least we get back to the same hard integral, and not some new one. To analyze why some of these integrals are hard and some are easy, let us define

\[ G_n = \int_0^{\infty} x^n e^{-x^2}\,dx. \]

The integral $G_0 = \int_0^{\infty} e^{-x^2}\,dx$ is the famous Gaussian integral. So far, we know that

\[ G_0 = \tfrac{1}{2}! = \,?, \qquad G_1 = \tfrac{1}{2}, \qquad G_2 = \tfrac{1}{2}\cdot\tfrac{1}{2}! = \,?. \]
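Even though we know it won’t work completely, let’s try integration by parts on $G_n$ for $n \ge 2$ and see what happens. With $u = x^{n-1}$ and $dv = x e^{-x^2}dx$, we get

\[
\begin{aligned}
G_n &= \int_0^{\infty} x^n e^{-x^2}\,dx \\
&= -\tfrac{1}{2}\,x^{n-1} e^{-x^2}\,\Big|_0^{\infty} + \frac{n-1}{2}\int_0^{\infty} x^{n-2} e^{-x^2}\,dx \\
&= \frac{n-1}{2}\,G_{n-2},
\end{aligned}
\]

since $n \ge 2$ and $\lim_{x\to\infty} x^{n-1} e^{-x^2} = 0$. So we have the recursion formula

\[ G_n = \frac{n-1}{2}\,G_{n-2}, \]

with the initial values

\[ G_0 = \tfrac{1}{2}! \ \text{(hard)}, \qquad G_1 = \tfrac{1}{2} \ \text{(easy)}. \]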
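This means the even $G$’s are hard, and the odd $G$’s are easy:

\[
\begin{aligned}
G_{2k} &= \frac{2k-1}{2}\cdot G_{2k-2} \\
&= \frac{2k-1}{2}\cdot\frac{2k-3}{2}\cdot G_{2k-4} \\
&\ \ \vdots \\
&= \frac{2k-1}{2}\cdot\frac{2k-3}{2}\cdots\frac{3}{2}\cdot\frac{1}{2}\cdot G_0 \\
&= k!\cdot P_k\cdot G_0,
\end{aligned}
\tag{30}
\]

where we recall from the previous section that

\[ P_k = \frac{1}{4^k}\binom{2k}{k} = \frac{1\cdot 3\cdots(2k-1)}{2\cdot 4\cdots(2k)} \]

is the probability of getting $k$ heads in $2k$ coin-tosses. So all the even $G$’s boil down to the single hard integral $G_0$. On the other hand, for the odd $G$’s, we get an actual answer:

\[
\begin{aligned}
G_{2k+1} &= \frac{2k}{2}\cdot G_{2k-1} \\
&= \frac{2k}{2}\cdot\frac{2k-2}{2}\cdot G_{2k-3} \\
&\ \ \vdots \\
&= \frac{2k}{2}\cdot\frac{2k-2}{2}\cdots\frac{2}{2}\cdot G_1 \\
&= \frac{k!}{2}.
\end{aligned}
\tag{31}
\]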
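Both closed forms, $G_{2k+1} = k!/2$ and $G_{2k} = k!\,P_k\,G_0$, can be checked against direct numerical integration without knowing the value of $G_0$ in advance. The following is a sketch under assumptions of our own: the name `gaussian_moment`, the midpoint rule, and the cut-off `upper=10` (beyond which $e^{-x^2}$ is far below machine precision) are illustrative choices, not part of the argument.

```python
from math import comb, exp, factorial


def gaussian_moment(n, upper=10.0, steps=100_000):
    """Midpoint-rule estimate of G_n, the integral of x**n * exp(-x*x) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** n * exp(-x * x)
    return total * h


def half_heads_probability(k):
    """P_k = C(2k, k) / 4**k."""
    return comb(2 * k, k) / 4 ** k


g0 = gaussian_moment(0)  # the "hard" integral, estimated numerically
for k in range(4):
    even_formula = factorial(k) * half_heads_probability(k) * g0
    odd_formula = factorial(k) / 2
    print(k, gaussian_moment(2 * k), even_formula,
          gaussian_moment(2 * k + 1), odd_formula)
```

In each row the numerical value and the formula agree closely, with every even case expressed through the single number `g0`.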
What have we achieved so far? Almost nothing, apparently. We have just been analyzing our difficulties. We want to compute $G_0$, and we have just seen that all the integrals $G_{2k}$ boil down to $G_0$, whereas the integrals $G_{2k+1}$ can be computed exactly. But now comes the tricky part, that eluded Wallis and Euler, and was finally discovered by Stieltjes. The idea is to express the hard integral $G_{2k}$ in terms of the easy integrals $G_{2k-1}$ and $G_{2k+1}$. This cannot be done by means of an equality, but instead using an inequality: We will show that

\[ G_n^2 < G_{n-1}G_{n+1}, \qquad n \ge 1. \tag{32} \]
The idea of (32) seems to me a spark of genius; I cannot explain how Stieltjes thought of it, but once thought of, it is not hard to prove. Think: where have we seen something like $B^2 - AC$? The quadratic formula, of course. The quadratic polynomial that gives rise to the terms in (32) is

\[ p(t) = t^2 G_{n-1} + 2tG_n + G_{n+1}. \]

Remember that the $G$’s are just numbers (which we happen not to know completely), and they are the coefficients in the polynomial $p(t)$. Of course, the $G$’s are integrals:

\[ G_{n-1} = \int_0^{\infty} x^{n-1} e^{-x^2}\,dx, \qquad G_n = \int_0^{\infty} x^n e^{-x^2}\,dx, \qquad G_{n+1} = \int_0^{\infty} x^{n+1} e^{-x^2}\,dx, \]

so $p(t)$ is a sum of integrals. Let’s combine these into one integral (note that $t$ is a constant with respect to $dx$, so can be moved inside the integrals):

\[
\begin{aligned}
p(t) &= t^2\int_0^{\infty} x^{n-1} e^{-x^2}\,dx + 2t\int_0^{\infty} x^n e^{-x^2}\,dx + \int_0^{\infty} x^{n+1} e^{-x^2}\,dx \\
&= \int_0^{\infty} \left(t^2 x^{n-1} + 2t x^n + x^{n+1}\right) e^{-x^2}\,dx \\
&= \int_0^{\infty} x^{n-1}(t + x)^2 e^{-x^2}\,dx.
\end{aligned}
\]

This integral is the worst one yet, but we are not going to compute it. Just note that the integrand (as a function of $x$) is $\ge 0$ and is equal to zero at no more than two points. So there is positive area under the integrand, and the integral is positive. Hence, $p(t) > 0$ for all $t$.
This means $p(t)$ has no real roots. Now, if a quadratic polynomial $At^2 + 2Bt + C$ has no real roots, then from the quadratic formula we must have $B^2 - AC < 0$. So

\[ G_n^2 - G_{n-1}G_{n+1} < 0, \]

which is the inequality (32) that we wanted to prove. Where does this brilliant step take us? We now have one equality (easy)

\[ G_n = \frac{n-1}{2}\,G_{n-2} \tag{33} \]

and one inequality (brilliant)

\[ G_n^2 < G_{n-1}G_{n+1}, \tag{34} \]
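and believe it or not, the hard work is over. We are just going to use (33) and (34) and then Wallis to get $G_0$. Applying (34) to $n = 2k+1$ and then $n = 2k$, with (33) in between, we get

\[ G_{2k+1}^2 \underset{(34)}{<} G_{2k}G_{2k+2} \underset{(33)}{=} \frac{2k+1}{2}\,G_{2k}^2 \underset{(34)}{<} \frac{2k+1}{2}\,G_{2k-1}G_{2k+1}. \tag{35} \]

Now plug our computations (30) and (31),

\[ G_{2k} = k!\,P_k\,G_0, \qquad G_{2k+1} = \frac{k!}{2}, \]

into (35) to get

\[ \left(\frac{k!}{2}\right)^2 < \frac{2k+1}{2}\,(k!\,P_k\,G_0)^2 < \frac{2k+1}{2}\cdot\frac{(k-1)!}{2}\cdot\frac{k!}{2}. \]

Dividing everything by $(k!/2)^2$, we get

\[ 1 < 2(2k+1)P_k^2G_0^2 < \frac{2k+1}{2k}. \tag{36} \]

We have seen that Wallis’ formula can be written

\[ \lim_{k\to\infty} (2k+1)P_k^2 = \frac{2}{\pi}, \]

and clearly

\[ \lim_{k\to\infty} \frac{2k+1}{2k} = 1. \]

So taking $k \to \infty$ in (36), we get

\[ \frac{4}{\pi}\,G_0^2 = 1, \]

meaning that

\[ G_0 = \frac{\sqrt{\pi}}{2}, \]

and we are done.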
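Inequality (36) can be rearranged into explicit bounds on the unknown integral: solving for $G_0$ gives $\dfrac{1}{P_k\sqrt{2(2k+1)}} < G_0 < \dfrac{1}{2P_k\sqrt{k}}$. The sketch below (Python; `g0_bounds` is a name of our own) evaluates these two bounds for a few values of $k$ and compares them with $\sqrt{\pi}/2$ taken from the `math` module, as a check on the squeeze.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def g0_bounds(k):
    """Lower and upper bounds on G_0 obtained by solving inequality (36) for G_0."""
    p_k = float(Fraction(comb(2 * k, k), 4 ** k))  # P_k, computed exactly then rounded
    lower = 1 / (p_k * sqrt(2 * (2 * k + 1)))
    upper = 1 / (2 * p_k * sqrt(k))
    return lower, upper


target = sqrt(pi) / 2
for k in (1, 10, 100, 1000):
    lower, upper = g0_bounds(k)
    print(k, lower, upper, lower < target < upper)
```

The gap between the bounds shrinks roughly like $1/k$, the same leisurely pace seen in the Wallis products themselves.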
Done with what? Let’s review: Recall that $G_0$ is the Gaussian integral, and is also $\tfrac12!$:

\[ G_0 = \int_0^{\infty} e^{-x^2}\,dx = \tfrac{1}{2}!. \]

By computing that $G_0 = \sqrt{\pi}/2$, we have shown that

\[ \int_0^{\infty} e^{-x^2}\,dx = \tfrac{1}{2}! = \frac{\sqrt{\pi}}{2}. \]

In other words, the area under the whole graph of the Bell Curve $e^{-x^2}$ is exactly $\sqrt{\pi}$. This computation combined the work of Wallis, Euler and Stieltjes, from the 17th, 18th and 19th centuries, respectively. The limits 0 and $\infty$ were essential: We never computed the antiderivative $\int e^{-x^2}\,dx$.

Since π is involved, you might guess that $G_0^2$ is somehow related to the area of a circle. This is true, and leads to another way to compute $G_0$, which is easier than what we just did (and was known long before Stieltjes), but requires double integrals and is beyond our course. In the other direction, knowing $G_0$ allows one to compute the volume of a sphere in any dimension. This requires even more multiple integrals. Since all the even $G$’s boiled down to $G_0$, we have actually computed many other integrals. We leave this to the exercises.
Exercise 3.1: Use the recursion formula $n! = n\cdot(n-1)!$ to compute $(-1/2)!$, $(3/2)!$, and $(5/2)!$ in terms of $\sqrt{\pi}$.

Exercise 3.2: Give a formula for $(k - \tfrac12)!$ for any integer $k \ge 0$.

Exercise 3.3: We have seen that $G_0 = \tfrac12\cdot(-\tfrac12)!$. What is the analogous formula for $G_{2k}$? (Hint: use equation (33).)

In the remaining exercises, make a substitution to turn the integral into the factorial integral, then compute it. Leave all answers as fractions.

Exercise 3.4: Calculate $\int_0^{\infty} x^6 e^{-2x}\,dx$.

Exercise 3.5: Calculate $\int_0^{\infty} x^6 e^{-4x^2}\,dx$.

Exercise 3.6: Calculate $\int_0^{\infty} \sqrt{x}\,e^{-x^3}\,dx$.

Exercise 3.7: Calculate $\int_0^{\infty} 3^{-4x^2}\,dx$.

Exercise 3.8: Calculate $\int_0^{\infty} e^{-ax^2}\,dx$ ($a$ is a positive constant).

Exercise 3.10: Show that $\int_0^{\infty} e^{-x^p}\,dx = \tfrac{1}{p}!$.

Exercise 3.11: Calculate

\[ \int_0^1 \frac{dx}{\sqrt{-\log x}}. \]
Wallis and the binomial distribution
If we graph the outcomes of $n$ coin tosses, we get pictures like

[Figure: dot plots for n = 4 (columns of 1, 4, 6, 4, 1 dots) and for n = 6 (columns of 1, 6, 15, 20, 15, 6, 1 dots).]

where the number of dots in the $k$-th column is the number $\binom{n}{k}$ of outcomes with $k$ heads. These look like discrete versions of bell curves. In the previous section, we saw that the middle column, which is the most likely outcome, and which grows as $n$ increases, nevertheless becomes a negligible proportion of the total number of dots. Moreover, Wallis told us precisely how fast this proportion goes to zero. In this section we will see that Wallis also tells us the parameters of the bell curve which approximates the binomial distributions above.

As we have mentioned, the basic bell curve is the graph of $e^{-x^2}$. This has a maximum of $1$ at $x = 0$, and inflection points at $\pm 1/\sqrt{2}$. The latter are the points where the graph of $e^{-x^2}$ begins to flare outwards. The point $0$ is the mean, and $1/\sqrt{2}$ is the standard deviation of this bell curve. However, we need bell curves with an arbitrary mean $\mu$ and standard deviation $\sigma$. Such a curve is the graph of
\[ \exp\left(-\frac12\left(\frac{x-\mu}{\sigma}\right)^2\right). \tag{37} \]
This is basically just the graph of $e^{-x^2}$, but shifted to have its maximum at $\mu$, and stretched to have its inflection points at $\mu \pm \sigma$.

Exercise 4.1: Show that the function in (37) has its maximum at $\mu$ and inflection points at $\mu \pm \sigma$.

Exercise 4.2: Use our formula for the Gaussian integral $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$ to show that
\[ \int_{-\infty}^{\infty} \exp\left(-\frac12\left(\frac{x-\mu}{\sigma}\right)^2\right) dx = \sigma\sqrt{2\pi}. \]

The function
\[ \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac12\left(\frac{x-\mu}{\sigma}\right)^2\right) \]
is called the Gaussian (or “normal”) distribution. It approximates, in an easily understood visual way, many different occurrences of discrete random behavior that may be very difficult to compute one at a time. You just have to adjust $\mu$ and $\sigma$ to the case at hand. For example, it is very difficult to compute the binomial coefficients necessary to determine the exact proportion of outcomes with $k$ heads from $n$ coin tosses, if $n$ is large. Instead, we can use the approximation
\[ \frac{1}{2^n}\binom{n}{k} \sim \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac12\left(\frac{k-\mu}{\sigma}\right)^2\right) \qquad (n \text{ large}). \tag{38} \]
We just have to determine $\mu$ and $\sigma$. The maximum of the right side of (38), which occurs at $\mu$, should be the most likely outcome of the left side, which is $n/2$, so
\[ \mu = \frac{n}{2}. \]
That was easy. What about the standard deviation $\sigma$? By now it is clear that the Wallis formula knows everything about large binomial coefficients, so it is no surprise that Wallis will tell us $\sigma$. Taking $k = \mu = n/2$ in (38), we get the approximation
\[ \frac{1}{2^n}\binom{n}{\mu} \sim \frac{1}{\sigma\sqrt{2\pi}} \]
for large $n$. On the other hand, by Wallis (see equation (22)), we have
\[ \frac{1}{2^n}\binom{n}{\mu} \sim \sqrt{\frac{2}{\pi}} \cdot \frac{1}{\sqrt{n+1}} \sim \sqrt{\frac{2}{n\pi}} \]
for large $n$. So we should have
\[ \frac{1}{\sigma\sqrt{2\pi}} = \sqrt{\frac{2}{n\pi}}, \]
meaning that $\sigma = \frac12\sqrt{n}$.

In summary, for large $n$, we have the approximation (as functions of $k$)
\[ \frac{1}{2^n}\binom{n}{k} \sim \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac12\left(\frac{k-\mu}{\sigma}\right)^2\right), \qquad \text{where } \mu = \frac{n}{2}, \quad \sigma = \frac{\sqrt{n}}{2}. \tag{39} \]
This is one of many instances of a discrete function being approximated by a continuous function. It may seem paradoxical, but the latter is easier to work with, as we will see in the next chapter.
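To get a feel for how good (39) is, one can tabulate both sides for a moderate $n$. The sketch below is only an illustration; the function names are invented here, and `math.comb` (Python 3.8 or later) supplies the exact binomial coefficients.

```python
import math

def binomial_proportion(n, k):
    """Exact left side of (39): C(n, k) / 2^n."""
    return math.comb(n, k) / 2 ** n

def gaussian_approx(n, k):
    """Right side of (39) with mu = n/2 and sigma = sqrt(n)/2."""
    mu = n / 2
    sigma = math.sqrt(n) / 2
    z = (k - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

n = 100
for k in (50, 55, 60, 65):
    exact = binomial_proportion(n, k)
    approx = gaussian_approx(n, k)
    print(f"k = {k}:  exact = {exact:.6f}   approx = {approx:.6f}")
```

Near the middle column the two values should agree to within a fraction of a percent for $n = 100$; the relative agreement is expected to degrade for $k$ far out in the tails.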
The function erf(x).
If you make a large number $n$ of coin tosses, we have seen that the probability of getting any particular outcome is almost nil. One is more interested in the probability that a certain range of outcomes will occur. For example, if we toss a coin 100 times, the probability of getting exactly 50 heads is almost zero, but what is the probability of getting between 48 and 52 heads? To answer this we could compute five very large binomial coefficients, add them up, and divide by $2^{100}$. It is much easier to use our formula (39) in the previous section. The probability of getting between $a$ and $b$ heads in $n$ coin tosses is exactly
\[ \frac{1}{2^n}\sum_{k=a}^{b}\binom{n}{k}. \]
By (39), this probability is approximately
\[ \frac{1}{\sigma\sqrt{2\pi}} \int_a^b \exp\left(-\frac12\left(\frac{x-\mu}{\sigma}\right)^2\right) dx. \]
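To handle this integral, we define the error function:
\[ \operatorname{erf}(x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2}\,dt. \]
Here $x$ can be any real number. So $\operatorname{erf}(x)$ is the part of the area under a certain bell curve, between $0$ and $x$.

Exercise 5.1: Show that $\operatorname{erf}(x)$ has the following properties.
1. $\operatorname{erf}(0) = 0$.
2. $\lim_{x\to\infty} \operatorname{erf}(x) = \tfrac12$.
3. $\operatorname{erf}(x)$ is an odd function. That is, $\operatorname{erf}(-x) = -\operatorname{erf}(x)$.
4. $\operatorname{erf}(x)$ is always increasing, is concave up for $x < 0$, and concave down for $x > 0$.

Exercise 5.2: Show that
\[ \frac{1}{\sigma\sqrt{2\pi}} \int_a^b \exp\left(-\frac12\left(\frac{x-\mu}{\sigma}\right)^2\right) dx = \operatorname{erf}\left(\frac{b-\mu}{\sigma}\right) - \operatorname{erf}\left(\frac{a-\mu}{\sigma}\right). \]

Thus, if you make a large number $n$ of tosses, the probability of getting between $a$ and $b$ heads is approximately
\[ \operatorname{erf}\left(\frac{b-\mu}{\sigma}\right) - \operatorname{erf}\left(\frac{a-\mu}{\sigma}\right), \qquad \mu = \frac{n}{2}, \quad \sigma = \frac{\sqrt{n}}{2}. \]
You can look up values of erf in tables or on your calculator, just as you would with trig or log functions.

Example: If we toss 100 times, what is the approximate probability of getting between 48 and 52 heads? We have
\[ n = 100, \qquad \mu = 50, \qquad \sigma = 5, \]
so the approximate probability is
\[ \operatorname{erf}\left(\frac{52-50}{5}\right) - \operatorname{erf}\left(\frac{48-50}{5}\right) = 2\operatorname{erf}(.4) \simeq .3108\ldots. \]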
With these 100 tosses, what is the approximate probability of getting at least 40 heads? Here, $n$, $\mu$, $\sigma$ are unchanged, but $a = 40$, $b = \infty$, so the approximate probability is
\[ \operatorname{erf}\left(\frac{\infty-50}{5}\right) - \operatorname{erf}\left(\frac{40-50}{5}\right) = \operatorname{erf}(\infty) - \operatorname{erf}(-2) \simeq .5 + .4772\ldots. \]

Exercise 5.3: For $n = 100$ tosses, find the approximate probabilities of getting
a) between 45 and 55 heads (answer: .6826)
b) between 50 and 60 heads
c) at least 45 heads

Exercise 5.4: For $n = 10{,}000$ tosses, find the approximate probabilities of getting
a) between 4950 and 5050 heads (answer: .6826 again)
b) between 4900 and 5100 heads
c) no more than 4500 heads

Exercise 5.5: In this problem, you are not given $n$, so you won't know $\mu$ and $\sigma$ either. Nevertheless, please find the approximate probabilities of getting
a) between $\mu - \sigma$ and $\mu + \sigma$ heads
b) between $\mu - 2\sigma$ and $\mu + 2\sigma$ heads.

Besides coin tossing, the same method can compute approximate probabilities in any situation where there are two possible results, equally likely, and the trial is repeated a large number of times.
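The two worked examples can be reproduced in a few lines of Python. One caution: the erf of this section is not the function of the same name in most software. Substituting $t = \sqrt{2}\,s$ in the defining integral shows that the erf used here equals one half of the standard library's `math.erf` evaluated at $x/\sqrt{2}$. The names `erf_notes`, `approx_prob` and `exact_prob` below are hypothetical, chosen only for this sketch.

```python
import math

def erf_notes(x):
    """erf as defined in this section: (1/sqrt(2 pi)) * integral_0^x e^(-t^2/2) dt.

    Expressed through the standard-library erf, which uses a different
    normalization: erf_notes(x) = 0.5 * math.erf(x / sqrt(2)).
    """
    return 0.5 * math.erf(x / math.sqrt(2))

def approx_prob(n, a, b):
    """Approximate probability of between a and b heads in n tosses."""
    mu = n / 2
    sigma = math.sqrt(n) / 2
    return erf_notes((b - mu) / sigma) - erf_notes((a - mu) / sigma)

def exact_prob(n, a, b):
    """Exact probability: sum of C(n, k) for a <= k <= b, divided by 2^n."""
    return sum(math.comb(n, k) for k in range(a, b + 1)) / 2 ** n

# Between 48 and 52 heads in 100 tosses.
print(approx_prob(100, 48, 52), exact_prob(100, 48, 52))

# At least 40 heads in 100 tosses (b = infinity in the approximation).
print(approx_prob(100, 40, math.inf), exact_prob(100, 40, 100))
```

For the narrow range 48 to 52 the exact sum comes out noticeably larger than the integral approximation, because only five columns are involved and each has width 1. Widening the interval by one half at each end (a standard refinement that these notes do not use) brings the two values much closer together.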
Exercise 5.6: (A story problem) The town of Bumpkin has the shape of a triangle, with BankBumpkin at the northern peak, in Upper Bumpkin, where all the money resides. The streets of Bumpkin were made long ago by well-organized cows, and the map of Bumpkin looks like this:

        BB
        /\
       /\/\
      /\/\/\
     /\/\/\/\
    ..      ..
   .          .

Below Upper Bumpkin lies Middle Bumpkin, and even further south is Lower Bumpkin (not shown), where the map is very complicated indeed.

Jack and Jill grew up in Lower Bumpkin. After Jack fell down and broke his crown, he was never the same, and, after a series of Failures in Life, poor Jack had turned to a life of crime. And so one day, Jack went north to rob BankBumpkin. He tied up everyone in the bank, took all the cash he could find, and dashed out the door, making a run for it down the streets toward the labyrinth of Lower Bumpkin. Jack knew that if he got far enough south, the cops would never find him. Unfortunately, one of the tellers had recognized Jack, and had gotten loose and called the police.

As for Jill, who came tumbling down after Jack in that famous accident, she had made a full recovery, and being very clever, went on to become a Pure Mathematician. But since Jill was not interested either in teaching or in “practical applications” of math, she was forced to support herself as a police dispatcher, which in the usually quiet town of Bumpkin allowed plenty of time for research, and many of her mathematical discoveries were made while contemplating the map of Bumpkin on the station wall.

And of course it was Jill who answered the phone after the bank robbery. Jill knew Jack pretty well from the old days. She knew he could run fast, and that he was thinking only of getting to Lower Bumpkin as quickly as possible. Also he was surely panicked, and making a random choice at each intersection, though always heading in a southerly direction.
So Jill, temporarily shelving her disdain for what everyone else called the “real world” (as though they knew what that meant, she would snort to herself), performed some brief calculations (which were amusing in and of themselves). She figured that by the time the police got rolling, old Jack would be nearing his 100th intersection. But which one? There were 101 possibilities, and not nearly that many cops on the Bumpkin beat, so Jill suggested a deployment of just enough police to have a 95 percent chance of catching Jack. With one uniform at each intersection, how many police did she deploy?
Wallis and the ellipse
If an ant moves in the plane, the distance travelled is the integral of the ant's speed. Suppose our ant moves on a curve, and at time $t$, she is at the point $(x(t), y(t))$. Then her velocity vector is $(x'(t), y'(t))$ and her speed is
\[ v(t) = \sqrt{x'(t)^2 + y'(t)^2}. \]
Over a time interval $[t_1, t_2]$, the distance travelled by our ant is then
\[ \int_{t_1}^{t_2} v(t)\,dt = \int_{t_1}^{t_2} \sqrt{x'(t)^2 + y'(t)^2}\,dt. \]
Suppose her coordinates are given by
\[ x(t) = a\cos t, \qquad y(t) = b\sin t. \]
Then her path is an ellipse, with equation
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \]
and she traverses the ellipse exactly once during the interval $[0, 2\pi]$. The quadrants divide the ellipse into four equal parts, so let us just consider our ant's journey in the first quadrant, for $0 \le t \le \pi/2$. Her speed is
\[ v(t) = \sqrt{a^2\sin^2 t + b^2\cos^2 t}, \]
so the length of our quarter-ellipse is given by
\[ L = \int_0^{\pi/2} \sqrt{a^2\sin^2 t + b^2\cos^2 t}\,dt. \]
If the ellipse were actually a circle, that is, if $a$ were equal to $b$, this integral would be just
\[ \int_0^{\pi/2} \sqrt{a^2\sin^2 t + a^2\cos^2 t}\,dt = \int_0^{\pi/2} a\,dt = \frac{a\pi}{2}. \]
If $a \ne b$, then the integral cannot be evaluated in an elementary way. In his textbook Calculus Integralis Euler shows how to calculate the integral $L$ as a power series in a number $\epsilon$ which measures the failure of our ellipse to be a circle. His method depends on the basic integrals $I_{2k}$ used by Wallis, and his answer is expressed in terms of Wallis' fractions. Euler actually uses a different parametrization of the ellipse (via rational functions) and his calculations are more complicated than what follows, but we'll arrive at the same result he did.

Euler takes $\epsilon$ so that $b^2 = (1-\epsilon)a^2$. In other words, he defines
\[ \epsilon = 1 - \frac{b^2}{a^2}. \]
Then we have
\[ a^2\sin^2 t + b^2\cos^2 t = a^2(1-\cos^2 t) + a^2(1-\epsilon)\cos^2 t = a^2(1-\epsilon\cos^2 t), \]
so
\[ L = a\int_0^{\pi/2} \sqrt{1-\epsilon\cos^2 t}\,dt. \]
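Now apply the Binomial Theorem to the integrand. (Euler does this too, with his different parametrization.) We get
\[ \sqrt{1-\epsilon\cos^2 t} = \sum_{k=0}^{\infty} \binom{1/2}{k} (-\epsilon\cos^2 t)^k = 1 - \sum_{k=1}^{\infty} \frac{(2k-3)!!}{(2k)!!}\,\epsilon^k\cos^{2k} t, \]
where $(-1)!! = 1$. Integrating term by term over $[0, \pi/2]$ and multiplying by $a$, we get
\[ L = \frac{a\pi}{2} - a\sum_{k=1}^{\infty} \frac{(2k-3)!!}{(2k)!!} \left[\int_0^{\pi/2} \cos^{2k} t\,dt\right] \epsilon^k. \]
This is our basic integral $I_{2k}$ again, in cosine form. Recall that
\[ \int_0^{\pi/2} \cos^{2k} t\,dt = \frac{\pi}{2}\,\frac{(2k-1)!!}{(2k)!!}. \]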
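Both ingredients of this step, the binomial coefficients of the square root and the cosine integral, are easy to spot-check by machine. The following is a small Python sketch with made-up helper names; it assumes the convention $(-1)!! = 1$ and uses a plain midpoint rule for the integral.

```python
import math

def double_factorial(m):
    """m!! for integers m >= -1, with the convention (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def binom_half(k):
    """Generalized binomial coefficient C(1/2, k) as a product of k factors."""
    c = 1.0
    for j in range(k):
        c *= (0.5 - j) / (j + 1)
    return c

def cos_power_integral(k, steps=10_000):
    """Midpoint-rule estimate of the integral of cos^(2k) t over [0, pi/2]."""
    h = (math.pi / 2) / steps
    return h * sum(math.cos((i + 0.5) * h) ** (2 * k) for i in range(steps))

for k in range(1, 6):
    coeff = (-1) ** k * binom_half(k)  # coefficient of (eps cos^2 t)^k
    closed = -double_factorial(2 * k - 3) / double_factorial(2 * k)
    wallis = (math.pi / 2) * double_factorial(2 * k - 1) / double_factorial(2 * k)
    print(k, coeff, closed, cos_power_integral(k), wallis)
```

In each row the second and third numbers should coincide (the series coefficient), and so should the fourth and fifth (the cosine integral).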
Putting this into the summation, we get
\[ L = \frac{a\pi}{2}\left[1 - \sum_{k=1}^{\infty} \left(\frac{(2k-1)!!}{(2k)!!}\right)^2 \frac{\epsilon^k}{2k-1}\right]. \]
Using the probabilistic Wallis fractions (see (14))
\[ P_k = \frac{(2k-1)!!}{(2k)!!} \]
we can write the summation more succinctly as
\[ L = \frac{a\pi}{2}\left[1 - \sum_{k=1}^{\infty} \frac{P_k^2}{2k-1}\,\epsilon^k\right] = \frac{a\pi}{2}\left[1 - \frac{1\cdot 1}{2\cdot 2}\cdot\frac{\epsilon}{1} - \frac{1\cdot 1\cdot 3\cdot 3}{2\cdot 2\cdot 4\cdot 4}\cdot\frac{\epsilon^2}{3} - \frac{1\cdot 1\cdot 3\cdot 3\cdot 5\cdot 5}{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6}\cdot\frac{\epsilon^3}{5} - \text{etc.}\right] \tag{42} \]
This is Euler's formula for the arclength of the quarter-ellipse. It is a power series in $\epsilon$ whose coefficients are the Wallis fractions, except that the last odd number in the denominator is $2k-1$ instead of $2k+1$. Note that for $\epsilon = 0$, the circle case, we get $L = a\pi/2$ as before. If we fix $a$ and flatten the circle by decreasing $b$, then $\epsilon$ increases from 0 to 1. The series measures the decrease in arclength as the circle is flattened.

If we flatten all the way to $b = 0$, then $\epsilon = 1$ and our quarter ellipse is just the line segment $[0, a]$. So it appears that
\[ a = \frac{a\pi}{2}\left[1 - \sum_{k=1}^{\infty} \frac{P_k^2}{2k-1}\right], \tag{43} \]
provided the series converges. From the probabilistic version of Wallis' formula (22), we have
\[ \frac{P_k^2}{2k-1}\cdot\frac{4k^2-1}{1} = (2k+1)P_k^2 \to \frac{2}{\pi}. \]
Hence the series (43) converges by limit comparison with
\[ \sum_{k=1}^{\infty} \frac{1}{4k^2-1} \]
and gives the formula
\[ \sum_{k=1}^{\infty} \frac{P_k^2}{2k-1} = 1 - \frac{2}{\pi}. \]
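Euler's series and the final formula can both be tested numerically. The sketch below is illustrative only: the function names, the truncation at 200 terms, and the sample semi-axes $a = 2$, $b = 1$ are arbitrary choices made here, and the direct integral is again estimated with a midpoint rule. It assumes $a \ge b > 0$, so that the flattening parameter lies in $[0, 1)$.

```python
import math

def quarter_ellipse_series(a, b, terms=200):
    """Euler's series for the quarter-ellipse arclength, truncated after `terms`.

    Assumes a >= b > 0, so that eps = 1 - b^2/a^2 lies in [0, 1).
    """
    eps = 1 - (b / a) ** 2
    p = 1.0          # running value of P_k
    eps_power = 1.0  # running value of eps^k
    total = 0.0
    for k in range(1, terms + 1):
        p *= (2 * k - 1) / (2 * k)
        eps_power *= eps
        total += p * p * eps_power / (2 * k - 1)
    return (a * math.pi / 2) * (1 - total)

def quarter_ellipse_numeric(a, b, steps=100_000):
    """Midpoint-rule estimate of the arclength integral over [0, pi/2]."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.sqrt((a * math.sin(t)) ** 2 + (b * math.cos(t)) ** 2)
    return total * h

def flat_sum(terms):
    """Partial sum of P_k^2 / (2k - 1), the eps = 1 case of the series."""
    p = 1.0
    total = 0.0
    for k in range(1, terms + 1):
        p *= (2 * k - 1) / (2 * k)
        total += p * p / (2 * k - 1)
    return total

print(quarter_ellipse_series(2.0, 1.0), quarter_ellipse_numeric(2.0, 1.0))
print(flat_sum(100_000), 1 - 2 / math.pi)
```

For $\epsilon$ well below 1 the series converges geometrically, so a couple of hundred terms are plenty. At $\epsilon = 1$ the terms decay only like $1/k^2$, which is why the last line needs many more terms to approach $1 - 2/\pi$.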