
Review problems with solutions

Class 21, 18.05, Spring 2014

Jeremy Orloﬀ and Jonathan Bloom

1 Summary

• Data: x_1, . . . , x_n
• Basic statistics: sample mean, sample variance, sample median
• Likelihood, maximum likelihood estimate (MLE)
• Bayesian updating: prior, likelihood, posterior, predictive probability, probability intervals; prior and likelihood can be discrete or continuous
• NHST: H_0, H_A, significance level, rejection region, power, type 1 and type 2 errors, p-values.

2 Basic statistics

Data: x_1, . . . , x_n.

sample mean = x̄ = (x_1 + · · · + x_n)/n

sample variance = s^2 = Σ_{i=1}^n (x_i − x̄)^2 / (n − 1)

sample median = middle value

Example. Data: 1, 2, 3, 6, 8.

x̄ = 4,   s^2 = (9 + 4 + 1 + 4 + 16)/4 = 8.5,   median = 3.
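As a quick check, the three statistics for the example data can be computed with Python's standard library (a sketch; note that `statistics.variance` uses the n − 1 denominator, matching the formula above):

```python
# Sketch: computing the three sample statistics for the example data
# with Python's standard library (n-1 denominator for the variance).
from statistics import mean, median, variance

data = [1, 2, 3, 6, 8]
x_bar = mean(data)       # (1 + 2 + 3 + 6 + 8)/5 = 4
s2 = variance(data)      # (9 + 4 + 1 + 4 + 16)/4 = 8.5
med = median(data)       # middle value of the sorted data: 3
print(x_bar, s2, med)
```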

3 Likelihood

x = data
θ = parameter of interest or hypotheses of interest

Likelihood: p(x | θ) (discrete distribution), f(x | θ) (continuous distribution)


Log likelihood: ln(p(x | θ)) or ln(f(x | θ)).

Likelihood examples. Find the likelihood function for each of the following.
1. Coin with probability of heads θ. Toss 10 times, get 3 heads.
2. Wait time follows exp(λ). In 5 trials the waits are 3, 5, 4, 5, 2.
3. The usual 5 dice (4-, 6-, 8-, 12-, and 20-sided). Two rolls: 9, 5. (Likelihood given in a table.)
4. x_1, . . . , x_n ∼ N(µ, σ^2)
5. x = 6 drawn from uniform(0, θ)
6. x ∼ uniform(0, θ)

Solutions.
1. Let x be the number of heads in 10 tosses.

P(x = 3 | θ) = C(10, 3) θ^3 (1 − θ)^7.

2. f(data | λ) = λ^5 e^{−λ(3+5+4+5+2)} = λ^5 e^{−19λ}

3.
Hypothesis θ | Likelihood P(data | θ)
4-sided      | 0
6-sided      | 0
8-sided      | 0
12-sided     | 1/144
20-sided     | 1/400

4. f(data | µ, σ) = (1/(√(2π) σ))^n e^{−[(x_1−µ)^2 + (x_2−µ)^2 + · · · + (x_n−µ)^2]/(2σ^2)}

5. f(x = 6 | θ) = 0 if θ < 6;  1/θ if θ ≥ 6.

6. f(x | θ) = 0 if x < 0 or x > θ;  1/θ if 0 ≤ x ≤ θ.

3.1 Maximum likelihood estimates (MLE)

Methods for finding the maximum likelihood estimate (MLE):

• Discrete hypotheses: compute each likelihood
• Discrete hypotheses: maximum is obvious
• Continuous parameter: compute derivative (often use log likelihood)
• Continuous parameter: maximum is obvious

Examples. Find the MLE for each of the examples in the previous section.


Solutions.
1. ln(P(x = 3 | θ)) = ln(C(10, 3)) + 3 ln(θ) + 7 ln(1 − θ).

Take the derivative and set it to 0:

3/θ − 7/(1 − θ) = 0  ⇒  θ̂ = 3/10.

2. ln(f(data | λ)) = 5 ln(λ) − 19λ. Take the derivative and set it to 0:

5/λ − 19 = 0  ⇒  λ̂ = 5/19.
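Both MLEs can be checked numerically. The sketch below (standard library only; the grid-search helper is ours, not the text's) maximizes the two log likelihoods derived above, dropping the constant ln C(10, 3) since it does not move the maximum:

```python
# Sketch: checking both MLEs with a simple grid search over the
# open interval (lo, hi). Standard library only.
import math

def argmax_on_grid(f, lo, hi, steps=100_000):
    """Return the interior grid point of (lo, hi) that maximizes f."""
    grid = (lo + (hi - lo) * k / steps for k in range(1, steps))
    return max(grid, key=f)

# Solution 1: 3 heads in 10 tosses -> theta_hat = 3/10
theta_hat = argmax_on_grid(lambda t: 3 * math.log(t) + 7 * math.log(1 - t), 0, 1)

# Solution 2: five exponential waiting times -> lambda_hat = 5/19
lam_hat = argmax_on_grid(lambda lam: 5 * math.log(lam) - 19 * lam, 0, 2)

print(theta_hat, lam_hat)  # ≈ 0.3 and ≈ 0.263
```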

3. Read directly from the table: MLE = 12-sided die.

4. For the exam do not focus on the calculation here. You should understand the idea: we set the partial derivatives with respect to µ and σ to 0 and solve for the critical point (µ̂, σ̂^2). The result is

µ̂ = x̄,   σ̂^2 = Σ(x_i − µ̂)^2 / n.

5. Because of the factor 1/θ, the likelihood is maximized when θ is as small as possible. answer: θ̂ = 6.

6. This is identical to problem 5 except the exact value of x is not given. answer: θ̂ = x.
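Solution 5 can be illustrated numerically: on a grid of θ values the likelihood is 0 below 6 and 1/θ above, so it peaks at the smallest allowed value (a sketch; the grid is ours):

```python
# Sketch: numeric illustration of solution 5. The likelihood is 0
# for theta < 6 and 1/theta for theta >= 6, so it is maximized at
# the smallest allowed value, theta_hat = 6.
thetas = [5 + k / 1000 for k in range(11000)]        # grid on [5, 16)
likelihood = lambda t: 0.0 if t < 6 else 1.0 / t
theta_hat = max(thetas, key=likelihood)
print(theta_hat)  # 6.0
```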

4 Bayesian updating

4.1 Bayesian updating: discrete prior-discrete likelihood

Jon has one 4-sided, two 6-sided, two 8-sided, two 12-sided, and one 20-sided die. He picks one at random and rolls a 7.
1. For each type of die, find the posterior probability that Jon chose that type.
2. What are the posterior odds that Jon chose the 20-sided die?
3. Compute the prior predictive probability of rolling a 7 on the first roll.
4. Compute the posterior predictive probability of rolling an 8 on the second roll.

Solutions.

1. Make a table. (We include the last two columns to answer question 4.)

Hypothesis θ | Prior P(θ) | Likelihood P(x_1 = 7 | θ) | Unnorm. posterior | Posterior P(θ | x_1 = 7) | Likelihood P(x_2 = 8 | θ) | Unnorm. posterior
4-sided      | 1/8        | 0                         | 0                 | 0                        | 0                         | 0
6-sided      | 1/4        | 0                         | 0                 | 0                        | 0                         | 0
8-sided      | 1/4        | 1/8                       | 1/32              | 1/(32c)                  | 1/8                       | 1/(256c)
12-sided     | 1/4        | 1/12                      | 1/48              | 1/(48c)                  | 1/12                      | 1/(576c)
20-sided     | 1/8        | 1/20                      | 1/160             | 1/(160c)                 | 1/20                      | 1/(3200c)
Total        | 1          |                           | c = 1/32 + 1/48 + 1/160 | 1                  |                           |

The posterior probabilities are given in the 5th column of the table. The total probability c = 7/120 is also the answer to problem 3.
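The whole table can be reproduced exactly with the standard library's rational arithmetic (a sketch; the helper names are ours, not the text's):

```python
# Sketch: reproducing the dice update table exactly with Fractions.
from fractions import Fraction as F

sides = [4, 6, 8, 12, 20]
prior = {4: F(1, 8), 6: F(1, 4), 8: F(1, 4), 12: F(1, 4), 20: F(1, 8)}

def like(roll, n):
    """Probability of seeing `roll` on a fair n-sided die."""
    return F(1, n) if roll <= n else F(0)

# Bayesian update on the first roll, x1 = 7
unnorm = {n: prior[n] * like(7, n) for n in sides}
c = sum(unnorm.values())                  # prior predictive P(x1 = 7)
post = {n: unnorm[n] / c for n in sides}  # posterior P(theta | x1 = 7)

# Posterior odds for the 20-sided die
odds = post[20] / (1 - post[20])

# Posterior predictive probability of rolling an 8 next
pred8 = sum(post[n] * like(8, n) for n in sides)

print(c, odds, pred8)  # 7/120, 3/25, 49/480
```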

2. Odds(20-sided | x_1 = 7) = P(20-sided | x_1 = 7) / P(not 20-sided | x_1 = 7)
                            = (1/(160c)) / (1/(32c) + 1/(48c))
                            = (1/160) / (5/96)
                            = 96/800
                            = 3/25.

3. P(x_1 = 7) = c = 7/120.

4. See the last two columns in the table.

P(x_2 = 8 | x_1 = 7) = 1/(256c) + 1/(576c) + 1/(3200c) = 49/480.

4.2 Bayesian updating: conjugate priors

Beta prior, binomial likelihood
Data: x ∼ binomial(n, θ), θ unknown.
Prior: f(θ) ∼ beta(a, b)
Posterior: f(θ | x) ∼ beta(a + x, b + n − x)

1. Suppose x ∼ binomial(30, θ), x = 12. If we have a prior f(θ) ∼ beta(1, 1), find the posterior for θ.

Beta prior, geometric likelihood
Data: x ∼ geometric(θ)
Prior: f(θ) ∼ beta(a, b)
Posterior: f(θ | x) ∼ beta(a + x, b + 1).

2. Suppose x ∼ geometric(θ), x = 6. If we have a prior f(θ) ∼ beta(4, 2), find the posterior for θ.

Normal prior, normal likelihood

a = 1/σ_prior^2,   b = n/σ^2,   µ_post = (a µ_prior + b x̄)/(a + b),   σ_post^2 = 1/(a + b).

3. In the population, IQ is normally distributed: θ ∼ N(100, 15^2). An IQ test finds a person's 'true' IQ plus random error ∼ N(0, 10^2). Someone takes the test and scores 120. Find the posterior pdf for this person's IQ.

Solutions.
1. f(θ) ∼ beta(1, 1), x ∼ binomial(30, θ), x = 12, so f(θ | x = 12) ∼ beta(13, 19).

2. f(θ) ∼ beta(4, 2), x ∼ geometric(θ), x = 6, so f(θ | x = 6) ∼ beta(10, 3).
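Both conjugate updates reduce to parameter arithmetic, so they can be checked with two tiny functions (a sketch; the function names are ours, and the geometric rule is exactly the one stated above):

```python
# Sketch: the two conjugate beta updates as parameter arithmetic.

def update_beta_binomial(a, b, x, n):
    """beta(a, b) prior, x successes in n binomial trials."""
    return (a + x, b + n - x)

def update_beta_geometric(a, b, x):
    """beta(a, b) prior, geometric data x, per the rule in the text."""
    return (a + x, b + 1)

# Problem 1: beta(1, 1) prior, x = 12 out of n = 30 -> beta(13, 19)
print(update_beta_binomial(1, 1, 12, 30))

# Problem 2: beta(4, 2) prior, x = 6 -> beta(10, 3)
print(update_beta_geometric(4, 2, 6))
```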

3. Prior: f(θ) ∼ N(100, 15^2); likelihood: x ∼ N(θ, 10^2).

So we have µ_prior = 100, σ_prior^2 = 15^2, σ^2 = 10^2, n = 1, x̄ = x = 120.

Applying the normal-normal update formulas: a = 1/15^2, b = 1/10^2. This gives

µ_post = (100/15^2 + 120/10^2) / (1/15^2 + 1/10^2) ≈ 113.8,   σ_post^2 = 1/(1/15^2 + 1/10^2) ≈ 69.2.
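The normal-normal update formulas can be wrapped in a small function (the function name is ours) and applied to the IQ numbers above:

```python
# Sketch: the normal-normal update formulas from the text.

def normal_update(mu_prior, sigma2_prior, sigma2, n, xbar):
    """Return (mu_post, sigma2_post) for a normal prior and likelihood."""
    a = 1 / sigma2_prior          # a = 1/sigma_prior^2
    b = n / sigma2                # b = n/sigma^2
    mu_post = (a * mu_prior + b * xbar) / (a + b)
    sigma2_post = 1 / (a + b)
    return mu_post, sigma2_post

# Prior N(100, 15^2), one score x = 120 with error variance 10^2
mu_post, sigma2_post = normal_update(100, 15**2, 10**2, 1, 120)
print(mu_post, sigma2_post)  # ≈ 113.8 and ≈ 69.2
```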

4.3 Bayesian updating: continuous prior-continuous likelihood

Examples. Update from prior to posterior for each of the following with the given data. Graph the prior and posterior in each case.


1. Romeo is late: likelihood x ∼ U(0, θ), prior θ ∼ U(0, 1), data: 0.3, 0.4, 0.4.

2. Waiting times: likelihood x ∼ exp(λ), prior λ ∼ exp(2), data: 1, 2.

3. Waiting times: likelihood x ∼ exp(λ), prior λ ∼ exp(2), data: x_1, x_2, . . . , x_n.

Solutions.
1. In the update table we split the hypotheses into the two cases θ < 0.4 and θ ≥ 0.4:

hyp.    | prior f(θ) dθ | likelihood f(data | θ) | unnorm. posterior | posterior f(θ | data)
θ < 0.4 | dθ            | 0                      | 0                 | 0
θ ≥ 0.4 | dθ            | 1/θ^3                  | (1/θ^3) dθ        | (1/(T θ^3)) dθ
Tot.    | 1             |                        | T                 | 1

The total probability is

T = ∫_{0.4}^{1} dθ/θ^3 = [−1/(2θ^2)]_{0.4}^{1} = 21/8 = 2.625.

We use 1/T as a normalizing factor to make the total posterior probability equal to 1.
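The normalizing constant T = 21/8 can also be checked numerically by integrating the unnormalized posterior 1/θ^3 over [0.4, 1] (a sketch using a simple midpoint rule, standard library only):

```python
# Sketch: midpoint-rule check of T = integral of 1/theta^3 over
# [0.4, 1], which should come out to 21/8 = 2.625.
N = 1_000_000
lo, hi = 0.4, 1.0
h = (hi - lo) / N
T = sum(h / (lo + (k + 0.5) * h) ** 3 for k in range(N))
print(T)  # ≈ 2.625
```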

[Figure: prior and posterior for θ on [0, 1]; prior in red, posterior in cyan.]

2. This follows the same pattern as problem 1. The likelihood is

f(data | λ) = λ e^{−λ·1} · λ e^{−λ·2} = λ^2 e^{−3λ}.
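The update table for problem 2 is cut off here. As a hedged continuation of the same pattern, the total probability T (the integral of prior times likelihood) can be checked numerically; the target value 2 · 2!/5^3 = 4/125 is our own calculation, not taken from the truncated text:

```python
# Sketch: numeric check that T = integral over lam > 0 of
# prior(lam) * likelihood(lam) = 2 * 2!/5^3 = 4/125 = 0.032.
import math

prior = lambda lam: 2 * math.exp(-2 * lam)        # exp(2) prior pdf
like = lambda lam: lam**2 * math.exp(-3 * lam)    # likelihood from above

N, hi = 1_000_000, 20.0     # integrand is negligible past lam = 20
h = hi / N
T = sum(h * prior((k + 0.5) * h) * like((k + 0.5) * h) for k in range(N))
print(T)  # ≈ 0.032
```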
