
Lecture 3: Differences-in-Differences

Fabian Waldinger (Warwick)

Topics Covered in Lecture

1. Review of fixed effects regression models.
2. Differences-in-Differences Basics: Card & Krueger (1994).
3. Regression Differences-in-Differences.
4. Synthetic Controls: Abadie & Gardeazabal (2003).
5. Combining Differences-in-Differences with IV: Waldinger (2010).

Very Brief Review of Fixed Effects Models - Introductory Example

Suppose you are interested in whether union workers earn higher wages. Problem: unionized workers may be different (e.g. higher skilled, more experienced) from non-unionized workers. Many of these factors will not be observable to the econometrician (standard omitted variable bias problem). Therefore the error term and union status will be correlated and OLS will be biased.

Very Brief Review of Fixed Effects Models

We are interested in whether \(Y_{it}\) (earnings) is affected by \(D_{it}\) (union status), which we assume to be as good as randomly assigned once we condition on the controls below. We also have time-varying covariates \(X_{it}\) (such as experience) and unobserved but fixed confounders \(A_i\) (e.g. ability).

\[ E[Y_{0it} \mid A_i, X_{it}, t] = \alpha + \lambda_t + A_i'\gamma + X_{it}'\beta \]

Assuming that the causal effect of union membership is additive and constant we also have:

\[ E[Y_{1it} \mid A_i, X_{it}, t] = E[Y_{0it} \mid A_i, X_{it}, t] + \rho \]

Together with the previous equation this implies:

\[ E[Y_{it} \mid A_i, X_{it}, t, D_{it}] = \alpha + \lambda_t + \rho D_{it} + A_i'\gamma + X_{it}'\beta \]

Estimation of Fixed Effects Models

This equation implies the following regression equation:

\[ Y_{it} = \alpha_i + \lambda_t + \rho D_{it} + X_{it}'\beta + \varepsilon_{it} \qquad (1) \]

where \(\varepsilon_{it} = Y_{0it} - E[Y_{0it} \mid A_i, X_{it}, t]\) and \(\alpha_i = \alpha + A_i'\gamma\).

Suppose you simply estimate this model with OLS (without including individual fixed effects). You therefore estimate:

\[ Y_{it} = \text{constant} + \lambda_t + \rho D_{it} + X_{it}'\beta + \underbrace{\alpha_i + \varepsilon_{it}}_{u_{it}} \]

As \(\alpha_i\) is correlated with union status \(D_{it}\), \(D_{it}\) is correlated with the error term \(u_{it}\). This will lead to biased OLS estimates.

2 Ways of Estimating Fixed Effects Models

A fixed effects model would address this problem because \(\alpha_i\) would be included in the regression. \(D_{it}\) and the error term would therefore be uncorrelated and you would obtain an unbiased estimate of \(\rho\). In practice there are two ways of estimating this fixed effects model:

1. Demeaning (sometimes called "within estimator").
2. First differencing.

Within-Estimator

With demeaning you (or the computer) first calculate individual averages of the dependent variable and all explanatory variables. You then subtract these averages from regression equation (1):

\[ Y_{it} - \bar{Y}_i = \lambda_t - \bar{\lambda} + \rho (D_{it} - \bar{D}_i) + (X_{it} - \bar{X}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i) \]

Thus \(\alpha_i\) drops out and therefore the error and the regressor would no longer be correlated.

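The demeaning step can be illustrated with a minimal sketch. It assumes a single regressor D, no covariates, no time effects and no noise; the toy panel and the helper names `slope` and `demean_by_unit` are hypothetical and not part of the lecture. Workers with a higher fixed effect are unionised more often, so pooled OLS is biased upwards while the within estimator recovers the assumed effect of 0.1.

```python
from collections import defaultdict

def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_unit(ids, values):
    """Subtract each unit's own average from its observations."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return [v - sums[i] / counts[i] for i, v in zip(ids, values)]

# Hypothetical panel: Y_it = alpha_i + rho * D_it, three workers, three years.
rho = 0.1
alpha = {1: 1.0, 2: 2.0, 3: 3.0}          # unobserved fixed effects
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
D = [0, 0, 1, 0, 1, 1, 1, 1, 0]           # union status, correlated with alpha
Y = [alpha[i] + rho * d for i, d in zip(ids, D)]

pooled = slope(D, Y)                                            # ignores alpha_i
within = slope(demean_by_unit(ids, D), demean_by_unit(ids, Y))  # alpha_i removed

print(f"pooled OLS: {pooled:.3f}, within estimator: {within:.3f}")
```

With real data one would also include time effects and covariates, and use a regression package rather than hand-rolled slopes.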

First-Differencing

An alternative way of estimating the fixed effects model is first differencing, which would also get rid of the \(\alpha_i\):

\[ \Delta Y_{it} = \Delta\lambda_t + \rho \Delta D_{it} + \Delta X_{it}'\beta + \Delta\varepsilon_{it} \]

With 2 periods the two methods are algebraically the same. With more than 2 periods they are not. Both should work, but if the \(\varepsilon_{it}\) are serially uncorrelated, first differencing introduces serial correlation into the differenced error terms \(\Delta\varepsilon_{it}\). Therefore demeaning is usually the best option.

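The two-period equivalence can be checked numerically. The sketch below assumes no time effects and no covariates, so both regressions are run through the origin; the data and the helper `slope_through_origin` are hypothetical.

```python
def slope_through_origin(x, y):
    """Regression slope without intercept: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical two-period panel: one (period 1, period 2) pair per worker.
D = [(0, 1), (1, 1), (0, 0), (1, 0), (0, 1)]
Y = [(1.0, 1.4), (2.1, 2.0), (0.7, 0.9), (1.8, 1.5), (1.2, 1.1)]

# First differences: one observation per worker.
dD = [d2 - d1 for d1, d2 in D]
dY = [y2 - y1 for y1, y2 in Y]
rho_fd = slope_through_origin(dD, dY)

# Within transformation: two demeaned observations per worker.
wD, wY = [], []
for (d1, d2), (y1, y2) in zip(D, Y):
    d_bar, y_bar = (d1 + d2) / 2, (y1 + y2) / 2
    wD += [d1 - d_bar, d2 - d_bar]
    wY += [y1 - y_bar, y2 - y_bar]
rho_within = slope_through_origin(wD, wY)

print(rho_fd, rho_within)
```

With two periods each demeaned observation is plus or minus half the first difference, which is why the two slopes coincide.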

The Effect of Unionization on Wages - OLS vs. FE

Freeman (1984) analyzed unionization comparing OLS and FE models for a number of datasets:

Survey        OLS    Fixed Effects
CPS 74-75     0.19   0.09
NLSY 70-78    0.28   0.19
PSID 70-79    0.23   0.14
QES 73-77     0.14   0.16

In three of the four surveys the fixed effects estimate is smaller than the OLS estimate. These results suggest that union workers are positively selected.

Measurement Error and Fixed Effects Models

OLS results were larger than FE → selection may be important. Another plausible explanation is measurement error. Measurement error introduces attenuation bias. As the signal-to-noise ratio is smaller with fixed effects (we only use the deviations from the individual mean as signal), measurement error is typically a more important problem in fixed effects models. In this case union status may be misreported for some individuals in each year. Observed year-to-year changes in union status for one individual may thus be mostly noise.

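A back-of-the-envelope sketch of why attenuation is worse with fixed effects. It assumes classical measurement error, under which the estimate is scaled by signal variance / (signal variance + noise variance), and it treats the noise variance as unchanged by demeaning, which is only approximately right. Misreported binary union status is not classical error, so this is indicative only. All variances and the helper `attenuation_factor` are hypothetical.

```python
def attenuation_factor(signal_var, noise_var):
    """Share of the observed regressor's variance that is true signal."""
    return signal_var / (signal_var + noise_var)

# Hypothetical variances, chosen only for illustration.
total_signal_var = 0.20    # variance of true union status across all observations
within_signal_var = 0.02   # variance left after removing individual means
noise_var = 0.02           # variance of the reporting error

ols_factor = attenuation_factor(total_signal_var, noise_var)
fe_factor = attenuation_factor(within_signal_var, noise_var)

print(f"OLS keeps {ols_factor:.0%} of the effect, FE keeps {fe_factor:.0%}")
```

Because few workers truly change union status, the within variation in the true regressor is small while the reporting noise is not, so the fixed effects estimate is pulled further towards zero.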

Differences-in-Differences: Card & Krueger (1994)

Suppose you are interested in the effect of minimum wages on employment (a classic and controversial question in labour economics). In a competitive labour market, increases in the minimum wage would move us up a downward-sloping labour demand curve. → employment would fall.

Differences-in-Differences: Card & Krueger (1994)

Card & Krueger (1994) analyse the effect of a minimum wage increase in New Jersey using a differences-in-differences methodology. On April 1, 1992 NJ increased the state minimum wage from $4.25 to $5.05. Pennsylvania's minimum wage stayed at $4.25.

They surveyed about 400 fast food stores in NJ and PA, both before (February 1992) and after (November 1992) the minimum wage increase in NJ.

Differences-in-Differences Strategy

DD is a version of fixed effects estimation. To see this more formally:

\(Y_{1ist}\): employment at restaurant i, state s, time t with a high \(w_{min}\).
\(Y_{0ist}\): employment at restaurant i, state s, time t with a low \(w_{min}\).

In practice of course we only see one or the other. We then assume that:

\[ E[Y_{0ist} \mid s, t] = \gamma_s + \lambda_t \]

In the absence of a minimum wage change, employment is determined by the sum of a time-invariant state effect \(\gamma_s\) and a year effect \(\lambda_t\) that is common across states. Let \(D_{st}\) be a dummy for high-minimum wage states and periods. Assuming \(E[Y_{1ist} - Y_{0ist} \mid s, t] = \delta\) is the treatment effect, observed employment can be written:

\[ Y_{ist} = \gamma_s + \lambda_t + \delta D_{st} + \varepsilon_{ist} \]

Differences-in-Differences Strategy II

In New Jersey:

Employment in February is: \( E[Y_{ist} \mid s = NJ, t = Feb] = \gamma_{NJ} + \lambda_{Feb} \)
Employment in November is: \( E[Y_{ist} \mid s = NJ, t = Nov] = \gamma_{NJ} + \lambda_{Nov} + \delta \)
The difference between February and November is:

\[ E[Y_{ist} \mid s = NJ, t = Nov] - E[Y_{ist} \mid s = NJ, t = Feb] = \lambda_{Nov} - \lambda_{Feb} + \delta \]

In Pennsylvania:

Employment in February is: \( E[Y_{ist} \mid s = PA, t = Feb] = \gamma_{PA} + \lambda_{Feb} \)
Employment in November is: \( E[Y_{ist} \mid s = PA, t = Nov] = \gamma_{PA} + \lambda_{Nov} \)
The difference between February and November is:

\[ E[Y_{ist} \mid s = PA, t = Nov] - E[Y_{ist} \mid s = PA, t = Feb] = \lambda_{Nov} - \lambda_{Feb} \]

Differences-in-Differences Strategy

The differences-in-differences strategy amounts to comparing the change in employment in NJ to the change in employment in PA. The population differences-in-differences are:

\[ \big( E[Y_{ist} \mid s = NJ, t = Nov] - E[Y_{ist} \mid s = NJ, t = Feb] \big) - \big( E[Y_{ist} \mid s = PA, t = Nov] - E[Y_{ist} \mid s = PA, t = Feb] \big) = \delta \]

This is estimated using the sample analog of the population means.

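The sample analog is just four cell means. A minimal sketch, using made-up store-level employment figures (these are not Card & Krueger's data):

```python
from statistics import mean

# Hypothetical employment per store, by state and survey wave.
employment = {
    ("NJ", "Feb"): [20.0, 21.0, 19.0],
    ("NJ", "Nov"): [21.0, 21.5, 20.5],
    ("PA", "Feb"): [23.0, 24.0, 22.0],
    ("PA", "Nov"): [21.0, 22.0, 20.0],
}

cell = {key: mean(values) for key, values in employment.items()}

change_nj = cell[("NJ", "Nov")] - cell[("NJ", "Feb")]   # lambda_Nov - lambda_Feb + delta
change_pa = cell[("PA", "Nov")] - cell[("PA", "Feb")]   # lambda_Nov - lambda_Feb
dd = change_nj - change_pa                              # estimate of delta

print(f"NJ change: {change_nj:+.2f}, PA change: {change_pa:+.2f}, DD: {dd:+.2f}")
```

The state effects drop out of each within-state change, and the common time effect drops out when the PA change is subtracted from the NJ change.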

Differences-in-Differences Table

Surprisingly, employment rose in NJ relative to PA after the minimum wage change.

Regression DD

We can estimate the differences-in-differences estimator in a regression framework. Advantages:

- It is easy to calculate standard errors.
- We can control for other variables, which may reduce the residual variance (and lead to smaller standard errors).
- It is easy to include multiple periods.
- We can study treatments with different treatment intensity (e.g. varying increases in the minimum wage for different states).

The typical regression model that we estimate is:

\[ \text{Outcome}_{it} = \beta_1 + \beta_2 \text{Treat}_i + \beta_3 \text{Post}_t + \beta_4 (\text{Treat} \times \text{Post})_{it} + \varepsilon_{it} \]

Treat = a dummy equal to 1 if the observation is in the treatment group.
Post = a dummy equal to 1 if the observation is from the post-treatment period.

Regression DD - Card & Krueger

In the Card & Krueger case the equivalent regression model would be:

\[ Y_{ist} = \alpha + \gamma NJ_s + \lambda d_t + \delta (NJ_s \times d_t) + \varepsilon_{ist} \]

NJ is a dummy which is equal to 1 if the observation is from NJ. d is a dummy which is equal to 1 if the observation is from November (post).

This equation takes the following values:

PA Pre:  \(\alpha\)
PA Post: \(\alpha + \lambda\)
NJ Pre:  \(\alpha + \gamma\)
NJ Post: \(\alpha + \gamma + \lambda + \delta\)

Differences-in-Differences estimate: (NJ Post - NJ Pre) - (PA Post - PA Pre) = \(\delta\)

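A sketch of the same regression on made-up data (not Card & Krueger's), to show that the interaction coefficient reproduces the difference of the four cell means. The small solver `ols` is a hypothetical helper that solves the normal equations directly; in practice one would use a regression package.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical (state, wave, employment) observations.
data = [
    ("NJ", "Feb", 20.0), ("NJ", "Feb", 21.0), ("NJ", "Feb", 19.0),
    ("NJ", "Nov", 21.0), ("NJ", "Nov", 21.5), ("NJ", "Nov", 20.5),
    ("PA", "Feb", 23.0), ("PA", "Feb", 24.0), ("PA", "Feb", 22.0),
    ("PA", "Nov", 21.0), ("PA", "Nov", 22.0), ("PA", "Nov", 20.0),
]

X, y = [], []
for state, wave, emp in data:
    nj = 1.0 if state == "NJ" else 0.0
    post = 1.0 if wave == "Nov" else 0.0
    X.append([1.0, nj, post, nj * post])   # constant, NJ, d, NJ x d
    y.append(emp)

alpha, gamma, lam, delta = ols(X, y)
print(f"alpha={alpha:.2f} gamma={gamma:.2f} lambda={lam:.2f} delta={delta:.2f}")
```

Here alpha is the PA pre-period mean, gamma the NJ-PA gap before the change, lambda the PA change over time, and delta the DD estimate.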

Graph - Observed Data

Graph - DD

\[ Y_{ist} = \alpha + \gamma NJ_s + \lambda d_t + \delta (NJ_s \times d_t) + \varepsilon_{ist} \]

Key Assumption of Any DD Strategy: Common Trends

The key assumption for any DD strategy is that the outcome in treatment and control group would follow the same time trend in the absence of the treatment. This does not mean that they have to have the same mean of the outcome! The common trend assumption is difficult to verify, but one often uses pre-treatment data to show that the trends are the same. Even if pre-trends are the same, one still has to worry about other policies changing at the same time.

Regression DD Including Leads and Lags

Including leads into the DD model is an easy way to analyze pre-trends. Lags can be included to analyze whether the treatment effect changes over time after treatment. The estimated regression would be:

\[ Y_{ist} = \gamma_s + \lambda_t + \sum_{\tau=-q}^{-1} \delta_\tau D_{s\tau} + \sum_{\tau=0}^{m} \delta_\tau D_{s\tau} + X_{ist}'\beta + \varepsilon_{ist} \]

- Treatment occurs in year 0.
- The first sum includes q leads (anticipatory effects).
- The second sum includes m lags (post-treatment effects).

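One way to build the lead and lag dummies is sketched below. The function name `event_time_dummies` and its arguments are hypothetical, and the sketch assumes that observations outside the window of q leads and m lags, as well as never-treated states, get zeros throughout (no binning of the end points).

```python
def event_time_dummies(year, adoption_year, q, m):
    """Return {tau: 0 or 1} for tau = -q, ..., m.

    The dummy for tau is 1 when `year` lies tau years after the adoption year
    (negative tau = lead, tau >= 0 = lag). Never-treated states are passed
    adoption_year=None and get zeros for every tau.
    """
    if adoption_year is None:
        return {tau: 0 for tau in range(-q, m + 1)}
    event_time = year - adoption_year   # adoption year normalised to 0
    return {tau: int(tau == event_time) for tau in range(-q, m + 1)}

# A state adopting in 1985, observed in 1987: the two-year lag is switched on.
print(event_time_dummies(1987, 1985, q=2, m=3))
# The same state one year before adoption: the one-year lead is switched on.
print(event_time_dummies(1984, 1985, q=2, m=3))
```

Each dummy then enters the regression with its own coefficient, so the estimated lead coefficients can be inspected for pre-trends and the lag coefficients for effect dynamics.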

Study Including Leads and Lags - Autor (2003)

Autor (2003) includes both leads and lags in a DD model analyzing the effect of increased employment protection on firms' use of temporary help workers. In the US employers can usually hire and fire workers at will. Some state courts have made exceptions to this employment-at-will rule and have thus increased employment protection. Different states have adopted these exceptions at different points in time. The standard thing to do is to normalize the adoption year to 0. Autor then analyzes the effect of these exceptions on the use of temporary help workers.

Results

The leads are very close to 0. → no evidence for anticipatory effects (good news for the common trends assumption). The lags show that the effect increases during the first years of the treatment and then remains relatively constant.

Standard Errors in DD Strategies

Many papers using a DD strategy use data from many years (not only 1 pre and 1 post period). The variables of interest in many of these setups only vary at a group level (say state) and outcome variables are often serially correlated. In the Card and Krueger study for example, it is very likely that employment in each state is not only correlated within the state but also serially correlated. As Bertrand, Duflo, and Mullainathan (2004) point out, conventional standard errors often severely understate the standard deviation of the estimators.

Standard Errors in DD Strategies - Practical Solutions

Bertrand, Duflo, and Mullainathan propose the following solutions:

1. Block bootstrapping standard errors (if you analyze states, the block should be the state and you would sample whole states with replacement for the bootstrapping).
2. Clustering standard errors at the group level (in Stata one would simply add cl(state) to the regression command if one analyzes state-level variation).
3. Aggregating the data into one pre and one post period. Literally works only if there is only one treatment date. With staggered treatment dates one should adopt the following procedure:
   - Regress \(Y_{st}\) on state FE, year FE, and relevant covariates.
   - Obtain residuals from the treatment states only and divide them into 2 groups: pre and post treatment.
   - Then regress the two groups of residuals on a post dummy.

Correct treatment of standard errors sometimes makes the number of groups very small: in the Card and Krueger study the number of groups is only 2.

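A minimal sketch of the block bootstrap idea, resampling whole states with replacement. For brevity each state is already collapsed to one pre and one post mean; with the full panel one would carry along all of a state's observations whenever the state is drawn. The data, the function names and the choice to skip draws that lack a treated or a control state are assumptions made for illustration.

```python
import random
from statistics import mean, stdev

def dd_estimate(states):
    """DD from state-level pre/post means: treated change minus control change."""
    treated = [s["post"] - s["pre"] for s in states if s["treated"]]
    control = [s["post"] - s["pre"] for s in states if not s["treated"]]
    return mean(treated) - mean(control)

def block_bootstrap_se(states, draws=500, seed=0):
    """Standard deviation of the DD estimate across resampled sets of states."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < draws:
        sample = [rng.choice(states) for _ in states]   # whole states, with replacement
        n_treated = sum(s["treated"] for s in sample)
        if 0 < n_treated < len(sample):                 # need both groups for a DD
            estimates.append(dd_estimate(sample))
    return stdev(estimates)

# Hypothetical state-level means.
states = [
    {"treated": True, "pre": 10.0, "post": 12.5},
    {"treated": True, "pre": 11.0, "post": 12.5},
    {"treated": True, "pre": 9.0, "post": 11.0},
    {"treated": True, "pre": 12.0, "post": 14.0},
    {"treated": False, "pre": 10.0, "post": 10.5},
    {"treated": False, "pre": 11.0, "post": 10.5},
    {"treated": False, "pre": 9.5, "post": 9.5},
    {"treated": False, "pre": 12.0, "post": 12.0},
]

point_estimate = dd_estimate(states)
se = block_bootstrap_se(states)
print(f"DD = {point_estimate:.2f}, block-bootstrap SE = {se:.3f}")
```

Because the state is the resampling unit, any correlation of outcomes within a state, including serial correlation, is preserved in every bootstrap sample.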

Synthetic Control Methods

In some cases, treatment and potential control groups do not follow parallel trends. → Standard DD method would lead to biased estimates. Abadie & Gardeazabal (2003) pioneered a synthetic control method when estimating the effects of the terrorist conflict in the Basque Country using other Spanish regions as a comparison group. (Card (1990) implicitly used a very similar approach in his Mariel boatlift paper investigating the effect of immigration on employment of natives.)

The basic idea behind synthetic controls is that a combination of units often provides a better comparison for the unit exposed to the intervention than any single unit alone.

Abadie & Gardeazabal (2003) - The Effect of Terrorism on Growth

They want to evaluate whether terrorism in the Basque Country had a negative effect on growth. They cannot use a standard DD method because none of the other Spanish regions followed the same time trend as the Basque Country. They therefore take a weighted average of other Spanish regions as a synthetic control group.

The Basque Country is Different from the Rest of Spain

The Synthetic Control Method

They have J available control regions (the 16 Spanish regions other than the Basque Country). They want to assign a (J x 1) vector of weights \(W = (w_1, ..., w_J)'\) to these regions, with \(w_j \geq 0\) and \(\sum_j w_j = 1\) (this ensures that there is no extrapolation outside the support of the growth predictors for the control regions). The weights are chosen so that the synthetic Basque Country most closely resembles the actual one before terrorism.

The Synthetic Control Method - Details

Let \(X_1\) be a (K x 1) vector of pre-terrorism values of K economic growth predictors (i.e. the values in the previous table: investment ratio, population density, ...) in the Basque Country. Let \(X_0\) be a (K x J) matrix which contains the values of the same variables for the J possible control regions. Let V be a diagonal matrix with nonnegative components reflecting the relative importance of the different growth predictors. The vector of weights \(W^*\) is then chosen to minimize:

\[ (X_1 - X_0 W)' V (X_1 - X_0 W) \]

They choose the matrix V such that the real per capita GDP path for the Basque Country during the 1960s (pre-terrorism) is best reproduced by the resulting synthetic Basque Country.

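The minimisation over the weights can be sketched as follows. To stay dependency-free the sketch uses a plain grid search over the simplex for exactly J = 3 donor regions, which is a stand-in for the constrained optimiser one would use in practice and is not the authors' procedure. The predictor values, V and the function names are hypothetical.

```python
def sc_loss(w, X1, X0, V):
    """(X1 - X0 w)' V (X1 - X0 w) with diagonal V given as a list of K weights."""
    loss = 0.0
    for k, v in enumerate(V):
        gap = X1[k] - sum(X0[k][j] * w[j] for j in range(len(w)))
        loss += v * gap * gap
    return loss

def best_weights(X1, X0, V, steps=100):
    """Grid search over w >= 0 with sum(w) = 1; assumes exactly 3 donor regions."""
    best = None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            w = (a / steps, b / steps, (steps - a - b) / steps)
            loss = sc_loss(w, X1, X0, V)
            if best is None or loss < best[0]:
                best = (loss, w)
    return best[1]

# Hypothetical data: K = 2 predictors, J = 3 donor regions.
X0 = [[1.0, 2.0, 4.0],    # predictor 1 in regions 1..3
      [3.0, 1.0, 2.0]]    # predictor 2 in regions 1..3
X1 = [1.5, 2.0]           # treated region; equals an equal mix of regions 1 and 2
V = [1.0, 1.0]            # equal importance for both predictors

w_star = best_weights(X1, X0, V)
print(w_star)
```

The nonnegativity and adding-up constraints are built into the grid, so the synthetic unit is always a convex combination of the donor regions.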

The Synthetic Control Method - Details

The optimal weights they get are: Catalonia: 0.8508, Madrid: 0.1492, and all other regions: 0. Alternatively they could have just chosen the weights to reproduce only the pre-terrorism growth path for the Basque Country (and not the growth predictors as well). In that case they would have minimized:

\[ (Z_1 - Z_0 W)'(Z_1 - Z_0 W) \]

\(Z_1\) is the (10 x 1) vector of pre-terrorism (1960-1969) GDP values for the Basque Country. \(Z_0\) is the (10 x J) matrix of pre-terrorism (1960-1969) GDP values for the J potential control regions.

The Synthetic Basque Country Looks Similar

Constructing the Counterfactual Using the Weights

\(Y_1\) is a (T x 1) vector whose elements are the values of real per capita GDP for T years in the Basque Country. \(Y_0\) is a (T x J) matrix whose elements are the values of real per capita GDP for T years in the control regions. They then constructed the counterfactual GDP (in the absence of terrorism) as:

\[ Y_1^* = Y_0 W^* \]

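The counterfactual is a weighted average of the control regions' GDP paths, year by year. The sketch below uses the weights reported in the lecture (regions with zero weight are omitted), while all GDP figures are invented for illustration and are not the paper's data.

```python
# Weights reported for the synthetic Basque Country; all other regions get 0.
weights = {"Catalonia": 0.8508, "Madrid": 0.1492}

# Hypothetical real per capita GDP paths over T = 3 years.
gdp_controls = {
    "Catalonia": [5.0, 5.5, 6.0],
    "Madrid": [6.0, 6.5, 7.0],
}
gdp_basque = [5.2, 5.6, 5.9]

T = len(gdp_basque)
# Y1* = Y0 W*: one weighted average per year.
synthetic = [sum(weights[r] * gdp_controls[r][t] for r in weights) for t in range(T)]
# Estimated effect: actual minus counterfactual GDP.
gap = [actual - counterfactual for actual, counterfactual in zip(gdp_basque, synthetic)]

for t in range(T):
    print(f"year {t}: synthetic = {synthetic[t]:.4f}, gap = {gap[t]:+.4f}")
```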

Growth in the Basque Country with and without Terrorism

Terrorist Activity and Estimated GDP Gap

Combining DD and IV

Sometimes combining DD and IV methods can be quite useful. In a recent paper (Waldinger, 2010), I have done that to estimate the effect of faculty quality on the outcomes of PhD students. Estimating the effect of faculty quality on PhD student outcomes is challenging because of:

1. Selection of good students into good universities.
2. Omitted variables affecting both faculty quality and student outcomes.
3. Measurement error in faculty quality.

I address these issues by using the dismissal of scientists in Nazi Germany as an exogenous shock to faculty quality. The dismissal affected some departments very strongly, while other departments were not affected.

Historical Background

Germany was the leading country for scientific research at the beginning of the 20th century. Immediately after gaining power in 1933 the new Nazi government dismissed all Jewish and 'politically unreliable' scholars from the German universities.

Dismissed Professors Across German Universities

Dismissed Professors Across German Universities II

Effect of Dismissals on Department Size

Effect of Dismissals on Faculty Quality

Panel Data on PhD graduates from German Universities

I obtained a panel dataset of all mathematics PhD students graduating from all German universities between 1923 and 1938 and use the dismissal as exogenous variation in faculty quality. The empirical strategy essentially compares changes in outcomes of PhD students in affected departments before and after 1933 to changes in outcomes in unaffected departments. I investigate the following outcomes:

1. Whether former PhD student publishes dissertation in a top journal.
2. Whether former PhD student ever becomes full professor.
3. Number of lifetime citations.
4. Positive lifetime citations.

Reduced Form Graphical Analysis - Publishing Dissertation

Reduced Form Graphical Analysis - Full Professor

Reduced Form Graphical Analysis - Lifetime Citations

Reduced Form Estimates

The reduced form of the dismissal effect is essentially a DD estimator.

\[ \text{Outcome}_{idt} = \beta_1 + \beta_2 (\text{Dismissal-induced reduction in faculty quality})_{dt} + \beta_3 (\text{Dismissal-induced increase in student/faculty ratio})_{dt} + \beta_4 \text{Female}_{idt} + \beta_5 \text{Foreign}_{idt} + \beta_6 \text{CohortFE}_t + \beta_7 \text{DepFE}_d + \varepsilon_{idt} \]

- Dismissal-induced reduction in faculty quality is 0 until 1933 and equal to the dismissal-induced fall in faculty quality after 1933 (and remains 0 in departments without dismissals).
- Dismissal-induced increase in student/faculty ratio is also 0 until 1933 but equal to the dismissal-induced increase in student/faculty ratio after 1933.

→ essentially a differences-in-differences estimator but with different treatment intensities.

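The treatment-intensity variable replaces the usual Treat x Post dummy with a continuous dose. A minimal sketch of how such a regressor could be coded; the function name, the department losses and the convention that 1933 itself still counts as "before" are assumptions made for illustration.

```python
def dismissal_intensity(year, department_loss, treatment_year=1933):
    """0 up to and including the treatment year, the department's loss afterwards.

    `department_loss` is the dismissal-induced fall in faculty quality for the
    department (0.0 for departments without dismissals).
    """
    return department_loss if year > treatment_year else 0.0

# Hypothetical departments: one strongly affected, one unaffected.
loss = {"affected_dept": 0.8, "unaffected_dept": 0.0}

for dept, dept_loss in loss.items():
    path = [dismissal_intensity(year, dept_loss) for year in (1930, 1933, 1935)]
    print(dept, path)
```

In a standard DD the regressor would jump from 0 to 1 for every treated unit; here it jumps from 0 to the size of the department's own shock, so the coefficient is identified from differences in dose across departments.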

Reduced Form Estimates

Common Robustness Check for Parallel Trend Assumption

Only look at pre-period data and move a placebo treatment some years back. Here I move a placebo treatment to 1930.

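The placebo check can be sketched with a simple DD of means: drop all data after the true treatment date, pretend treatment happened in 1930, and see whether a "treatment effect" appears. The rows, the outcome values and the helper `dd` are hypothetical; a year counts as post-treatment when it is strictly later than the cutoff.

```python
from statistics import mean

def dd(rows, cutoff):
    """(affected after - affected before) - (unaffected after - unaffected before)."""
    def cell_mean(affected, post):
        return mean(y for year, a, y in rows
                    if a == affected and (year > cutoff) == post)
    return ((cell_mean(True, True) - cell_mean(True, False))
            - (cell_mean(False, True) - cell_mean(False, False)))

# Hypothetical (graduation year, affected department?, outcome) rows.
rows = [
    (1927, True, 0.30), (1929, True, 0.30), (1931, True, 0.30), (1935, True, 0.10),
    (1927, False, 0.20), (1929, False, 0.20), (1931, False, 0.20), (1935, False, 0.20),
]

pre_period = [r for r in rows if r[0] <= 1933]   # keep only pre-dismissal cohorts
placebo = dd(pre_period, cutoff=1930)            # pretend treatment happened in 1930
actual = dd(rows, cutoff=1933)                   # DD around the true date

print(f"placebo DD = {placebo:+.2f}, actual DD = {actual:+.2f}")
```

A placebo estimate close to zero is what one would expect if affected and unaffected departments were on parallel trends before 1933.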

Use Dismissal as IV

OLS model for the effect of university quality on PhD student outcomes:

\[ \text{Outcome}_{idt} = \beta_1 + \beta_2 (\text{Avg. Faculty Quality})_{dt-1} + \beta_3 (\text{Student/Faculty Ratio})_{dt-1} + \beta_4 \text{Female}_{idt} + \beta_5 \text{Foreign}_{idt} + \beta_6 \text{CohortFE}_t + \beta_7 \text{DepFE}_d + \varepsilon_{idt} \]

University quality and student/faculty ratio are endogenous → use dismissal as IV. 2 endogenous variables → 2 first stage regressions:

1. \( \text{Avg. Faculty Quality}_{idt} = \gamma_1 + \gamma_2 (\text{Dismissal-induced reduction in faculty quality})_{dt} + \gamma_3 (\text{Dismissal-induced increase in student/faculty ratio})_{dt} + \gamma_4 \text{Female}_{idt} + \gamma_5 \text{Foreign}_{idt} + \gamma_6 \text{CohortFE}_t + \gamma_7 \text{DepFE}_d + \varepsilon_{idt} \)
2. \( \text{Student/Faculty Ratio}_{idt} = \delta_1 + \delta_2 (\text{Dismissal-induced reduction in faculty quality})_{dt} + \delta_3 (\text{Dismissal-induced increase in student/faculty ratio})_{dt} + \delta_4 \text{Female}_{idt} + \delta_5 \text{Foreign}_{idt} + \delta_6 \text{CohortFE}_t + \delta_7 \text{DepFE}_d + \varepsilon_{idt} \)

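The logic of combining the reduced form with the first stage is easiest to see in a simplified case. The sketch below assumes one endogenous regressor, one instrument and no controls, where the IV estimate is the reduced-form coefficient divided by the first-stage coefficient. The lecture's setting has two endogenous regressors and two instruments, which requires matrix-based 2SLS; the data and the names `z`, `x`, `y` and `cov` are hypothetical.

```python
def cov(a, b):
    """Population covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

# Hypothetical data.
z = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]   # instrument: dismissal-induced quality loss
x = [5.0, 5.2, 4.1, 3.9, 3.0, 3.2]   # endogenous regressor: faculty quality
y = [2.6, 2.5, 2.0, 2.1, 1.6, 1.5]   # outcome of the PhD student

first_stage = cov(z, x) / cov(z, z)    # effect of the instrument on x
reduced_form = cov(z, y) / cov(z, z)   # effect of the instrument on y
beta_iv = reduced_form / first_stage   # equals cov(z, y) / cov(z, x)

print(f"first stage = {first_stage:.2f}, reduced form = {reduced_form:.2f}, IV = {beta_iv:.2f}")
```

The reduced form is the DD estimate of the dismissal on outcomes; dividing by the first stage rescales it into the effect of a one-unit change in faculty quality.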

First Stages

To test for weak instruments one cannot simply look at the first stage F-statistics because here we have 2 endogenous regressors and 2 IVs. → use the Cragg-Donald eigenvalue (EV) statistic; here the critical value is 7.03.

OLS and IV

View more...
Waldinger ()

1 / 55

Topics Covered in Lecture

1

Review of …xed e¤ects regression models.

2

Di¤erences-in-Di¤erences Basics: Card & Krueger (1994).

3

Regression Di¤erences-in-Di¤erences.

4

Synthetic Controls: Abadie & Gardeazabal (2003).

5

Combining Di¤erences-in-Di¤erences with IV: Waldinger (2010).

Waldinger (Warwick)

2 / 55

Very Brief Review of Fixed E¤ects Models - Introductory Example

Suppose you are interested in the question whether union workers earn higher wages. Problem: unionized workers may be di¤erent (e.g. higher skilled, more experienced) from non-unionized workers. Many of these factors will not be observable to the econometrician (standard omitted variable bias problem). Therefore the error term and union status will be correlated and OLS will be biased.

Waldinger (Warwick)

3 / 55

Very Brief Review of Fixed E¤ects Models We are interested whether Yit (earnings) is a¤ected by Dit (union status) which we assume to be randomly assigned. We also have time varying covariates Xii (such as experience) and unobserved but …xed confounders Ai (e.g. ability). E [Y0it jAi , Xit , t ] = α + λt + Ai0 γ + Xit0 β Assuming that the causal e¤ect of union membership is additive and constant we also have: E [Y1it jAi , Xit , t ] = E [Y0it jAi , Xit , t ] + ρ Together with the previous equation this implies: E [Yit jAi , Xit , t ] = α + λt + ρDit + Ai0 γ + Xit0 β Waldinger (Warwick)

4 / 55

Estimation of Fixed E¤ects Models This equation implies the following regression equation: Yit = αi + λt + ρDit + Xit0 β + εit where εit =Y0it

(1)

E [Y0it jAi , Xit , t ] and αi = α + Ai0 γ

Suppose you simply estimate this model with OLS (without including individual …xed e¤ects). You therefore estimate: Yit =constant + λt + ρDit + Xit0 β + αi + εit | {z } uit

As αi is correlated with union status Dit there is a correlation of Dit with the error term. This will lead to biased OLS estimates. Waldinger (Warwick)

5 / 55

2 Ways of Estimating Fixed E¤ects Models

A …xed e¤ect model would address this problem because αi would be included in the regression. Dit and the error term would therefore be uncorrelated and you would obtain an unbiased estimate of ρ. In practice there are two ways of estimating this …xed e¤ects model: 1 2

Demeaning (sometimes called "within estimator"). First di¤erencing.

Waldinger (Warwick)

6 / 55

Within-Estimator

With demeaning you (or the computer) …rst calculate individual averages of the dependent variable and all explanatory variables. You then substract these averages from regression equation (1): Yit

Y i = λt

λ + ρ(Dit

D i ) + (Xit

X i )0 β + (εit

εi )

Thus αi drops out and therefore the error and the regressor would no longer be correlated.

Waldinger (Warwick)

7 / 55

First-Di¤erencing

An alternative way of estimating the …xed e¤ects model is …rst di¤erencing which would also get rid of the αi . ∆Yit = ∆λt + ρ∆Dit + ∆Xit0 β + ∆εit With 2 periods the two methods are algebraically the same. Otherwise not. Both should work, but with …rst di¤erencing you introduce serial correlation of the error terms. Therefore demeaning is usually the best option.

Waldinger (Warwick)

8 / 55

The E¤ect of Unionization on Wages - OLS vs. FE

Freeman (1984) analyzed unionization comparing OLS and FE models for a number of datasets: Survey CPS 74-75 NLSY 70-78 PSID 70-79 QES 73-77

OLS 0.19 0.28 0.23 0.14

Fixed E¤ects 0.09 0.19 0.14 0.16

These results suggest that union workers are positively selected.

Waldinger (Warwick)

9 / 55

Measurement Error and Fixed E¤ects Models

OLS results were larger than FE ! selection may be important. Another plausible explanation is measurement error. Measurement error introduces attenuation bias. As the signal to noise ratio is smaller with …xed e¤ects (as we just use the deviations from the mean as signal) measurement error is typically a more important problem in …xed e¤ect models. In this case union status may be misreported for some individuals in each year. Observed year to year changes in union status for one individual may thus be mostly noise.

Waldinger (Warwick)

10 / 55

Di¤erences-in-Di¤erences: Card & Krueger (1994)

Suppose you are interested in the e¤ect of minimum wages on employment (a classic and controversial question in labour economics). In a competitive labour market, increases in the minimum wage would move us up a downward-sloping labour demand curve. ! employment would fall.

Waldinger (Warwick)

11 / 55

Di¤erences-in-Di¤erences: Card & Krueger (1994) Card & Krueger (1994) analyse the e¤ect of a minimum wage increase in New Jersey using a di¤erences-in-di¤erences methodology. In February 1992 NJ increased the state minimum wage from $4.25 to $5.05. Pennsylvania’s minimum wage stayed at $4.25.

They surveyed about 400 fast food stores both in NJ and in PA both before and after the minimum wage increase in NJ. Waldinger (Warwick)

12 / 55

Di¤erences-in-Di¤erences Strategy DD is a version of …xed e¤ects estimation. To see this more formally: Y1ist : employment at restaurant i, state s, time t with a high wmin . Y0ist : employment at restaurant i, state s, time t with a low wmin . In practice of course we only see one or the other. We then assume that: E [Y0ist js, t ] = γs + λt In the absence of a minimum wage change, employment is determined by the sum of a time-invariant state e¤ect γs and a year e¤ect λt that is common across states. Let Dst be a dummy for high-minimum wage states and periods. Assuming E [Y1ist Y0ist js, t ] = δ is the treatment e¤ect, observed employment can be written: Yist = γs + λt + δDst + εist Waldinger (Warwick)

13 / 55

Di¤erences-in-Di¤erences Strategy II In New Jersey: Employment in February is: E [Yist js = NJ, t = Feb ] = γNJ + λFeb Employment in November is: E [Yist js = NJ, t = Nov ] = γNJ + λNov + δ the di¤erence between February and November is: E [Yist js = NJ, t = N ] E [Yist js = NJ, t = F ] = λN

λF + δ

In Pennsylvania: Employment in February is: E [Yist js = PA, t = Feb ] = γPA + λFeb Employment in November is: E [Yist js = PA, t = Nov ] = γPA + λNov the di¤erence between February and November is: E [Yist js = PA, t = Nov ] E [Yist js = PA, t = Feb ] = λNov Waldinger (Warwick)

λFeb

14 / 55

Di¤erences-in-Di¤erences Strategy

The di¤erences-in-di¤erences strategy amounts to comparing the change in employment in NJ to the change in employment in PA. The population di¤erences-in-di¤erences are: E [Yist js = NJ, t = N ] E [Yist js = NJ, t = F ] E [Yist js = PA, t = Nov ] E [Yist js = PA, t = Feb ] = δ This is estimated using the sample analog of the population means.

Waldinger (Warwick)

15 / 55

Di¤erences-in-Di¤erences Table

Surprisingly, employment rose in NJ relative to PA after the minimum wage change. Waldinger (Warwick)

16 / 55

Regression DD We can estimate the di¤erences-in-di¤erences estimator in a regression framework. Advantages: It is easy to calculate standard errors. We can control for other variables which may reduce the residual variance (lead to smaller standard errors). It is easy to include multiple periods. We can study treatments with di¤erent treatment intensity. (e.g. varying increases in the minimum wage for di¤erent states).

The typical regression model that we estimate is: Outcomeit = β1 + β2 Treati + β3 Postt + β4 (Treat * Post)it + ε Treatment = a dummy if the observation is in the treatment group Post = post treatment dummy Waldinger (Warwick)

17 / 55

Regression DD - Card & Krueger

In the Card & Krueger case the equivalent regression model would be: Yist = α + γNJs + λdt + δ(NJs dt ) + εist NJ is a dummy which is equal to 1 if the observation is from NJ. d is a dummy which is equal to 1 if the observation is from November (post).

This equation takes the following values. PA Pre: α PA Post: α + λ NJ Pre: α + γ NJ Post: α + γ + λ + δ

Di¤erences-in-Di¤erences estimate: (NJ Post - NJ Pre) - (PA Post PA Pre) = δ

Waldinger (Warwick)

18 / 55

Graph - Observed Data

Waldinger (Warwick)

19 / 55

Graph - DD Yist = α + γNJs + λdt + δ(NJs

Waldinger (Warwick)

dt ) + εist

20 / 55

Graph - DD Yist = α + γNJs + λdt + δ(NJs

Waldinger (Warwick)

dt ) + εist

21 / 55

Graph - DD Yist = α + γNJs + λdt + δ(NJs

Waldinger (Warwick)

dt ) + εist

22 / 55

Graph - DD Yist = α + γNJs + λdt + δ(NJs

Waldinger (Warwick)

dt ) + εist

23 / 55

Key Assumption of Any DD Strategy: Common Trends The key assumption for any DD strategy is that the outcome in treatment and control group would follow the same time trend in the absence of the treatment. This does not mean that they have to have the same mean of the outcome! Common trend assumption is di¢ cult to verify but one often uses pre-treatment data to show that the trends are the same. Even if pre-trends are the same one still has to worry about other policies changing at the same time.

Waldinger (Warwick)

24 / 55

Regression DD Including Leads and Lags

Including leads into the DD model is an easy way to analyze pre-trends. Lags can be included to analyze whether the treatment e¤ect changes over time after treatment. The estimated regression would be: 1

m

Yist = γs + λt + ∑ δτ Ds τ + ∑ δτ Ds τ + Xist + εist τ= q

τ =0

treatment occurs in year 0. includes q leads or anticipatory e¤ects. includes m leads or post treatment e¤ects.

Waldinger (Warwick)

25 / 55

Study Including Leads and Lags - Author (2003)

Autor (2003) includes both leads and lags in a DD model analyzing the e¤ect of increased employment protection on the …rm’s use of temporary help workers. In the US employers can usually hire and …re workers at will. Some states courts have made some exceptions to this employment at will rule and have thus increased employment protection. Di¤erent states have passed these exeptions at di¤erent points in time. The standard thing to do is to normalize the adoption year to 0. Autor then analyzes the e¤ect of these exeptions on the use of temporary help workers.

Waldinger (Warwick)

26 / 55

Results

The leads are very close to 0. ! no evidence for anticipatory e¤ects (good news for the common trends assumption). The lags show that the e¤ect increases during the …rst years of the treatment and then remains relatively constant.

Waldinger (Warwick)

27 / 55

Standard Errors in DD Strategies

Many papers using a DD strategy use data from many years (not only 1 pre and 1 post period). The variables of interest in many of these setups only vary at a group level (say state) and outcome variables are often serially correlated. In the Card and Krueger study for example, it is very likely that employment in each state is not only correlated within the state but also serially correlated. As Bertrand, Du‡o, and Mullainathan (2004) point out, conventional standard errors often severely understate the standard deviation of the estimators.

Waldinger (Warwick)

28 / 55

Standard Errors in DD Strategies - Practical Solutions Bertrand, Du‡o, and Mullainathan propose the following solutions: 1

2

3

Block bootstrapping standard errors (if you analyze states the block should be the states and you would sample whole states with replacing for the bootstrapping). Clustering standard errors at the group level. (in STATA one would simply add cl(state) to the regression equation if one analyzes state level variation). Aggregating the data into one pre and one post period. Literally works only if there is only one treatment date. With staggered treatment dates one should adopt the following procedure: Regress Yst on state FE, year FE, and relevant covariates. Obtain residuals from the treatment states only and divide them into 2 groups: pre and post treatment. Then regress the two groups of residuals on a post dummy.

Correct treatment of standard errors sometimes makes the number of groups very small: in the Card and Krueger study the number of groups is only 2. Waldinger (Warwick)

29 / 55

Synthetic Control Methods

In some cases, treatment and potential control groups do not follow parallel trends. ! Standard DD method would lead to biased estimates. Abadie & Gardeazabal (2003) pioneered a synthetic control method when estimating the e¤ects of the terrorist con‡ict in the Basque Country using other Spanish regions as a comparison group. (Card (1990) implicitly used a very similar approach in his Mariel boatlift paper investigating the e¤ect of immigration on employment of natives).

The basic idea behind synthetic controls is that a combination of units often provides a better comparison for the unit exposed to the intervention than any single unit alone.

Abadie & Gardeazabal (2003) - The Effect of Terrorism on Growth

They want to evaluate whether terrorism in the Basque Country had a negative effect on growth. They cannot use a standard DD method because none of the other Spanish regions followed the same time trend as the Basque Country. They therefore take a weighted average of other Spanish regions as a synthetic control group.

The Basque Country is Different from the Rest of Spain

The Synthetic Control Method

They have J available control regions (the 16 Spanish regions other than the Basque Country). They want to assign a (J × 1) vector of weights $W = (w_1, \dots, w_J)'$, one weight per region, with $w_j \ge 0$ and $\sum_j w_j = 1$; this ensures that there is no extrapolation outside the support of the growth predictors for the control regions. The weights are chosen so that the synthetic Basque Country most closely resembles the actual one before terrorism.

The Synthetic Control Method - Details

Let $X_1$ be a (K × 1) vector of pre-terrorism values of K economic growth predictors (i.e. the values in the previous table: investment ratio, population density, ...) in the Basque Country. Let $X_0$ be a (K × J) matrix which contains the values of the same variables for the J possible control regions. Let $V$ be a diagonal matrix with nonnegative components reflecting the relative importance of the different growth predictors. The vector of weights $W^*$ is then chosen to minimize:

$$(X_1 - X_0 W)' \, V \, (X_1 - X_0 W)$$

They choose the matrix $V$ such that the real per capita GDP path for the Basque Country during the 1960s (pre-terrorism) is best reproduced by the resulting synthetic Basque Country.
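The constrained minimization above can be sketched numerically. The following is one possible solver (projected gradient descent onto the weight simplex), chosen here only because it needs no external libraries; it is not claimed to be the routine used in the paper. The predictor values, the diagonal of V, and the three control regions "A", "B", "C" are toy numbers with no empirical meaning, and V is taken as given rather than chosen to fit the GDP path.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(X1, X0, v_diag, n_iter=5000):
    """Minimize (X1 - X0 W)' V (X1 - X0 W) subject to w_j >= 0, sum w_j = 1."""
    K, J = len(X0), len(X0[0])
    # Step size 1/L, where L = 2 * trace(X0' V X0) bounds the curvature.
    trace = sum(v_diag[k] * X0[k][j] ** 2 for k in range(K) for j in range(J))
    step = 1.0 / (2.0 * trace)
    w = [1.0 / J] * J  # start from equal weights
    for _ in range(n_iter):
        resid = [X1[k] - sum(X0[k][j] * w[j] for j in range(J)) for k in range(K)]
        grad = [
            -2.0 * sum(v_diag[k] * X0[k][j] * resid[k] for k in range(K))
            for j in range(J)
        ]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(J)])
    return w


# Toy example: K = 4 predictors (rows), J = 3 control regions A, B, C (columns).
X0 = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
# Treated unit built as 0.85 * A + 0.15 * B, so the solver should recover that mix.
X1 = [0.85, 0.15, 0.0, 1.0]
v_diag = [2.0, 1.0, 1.0, 1.0]  # hypothetical predictor importances

weights = synthetic_control_weights(X1, X0, v_diag)
print([round(w, 4) for w in weights])
```

The simplex projection is what enforces the no-extrapolation restriction: every iterate is a convex combination of the control regions.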

The Synthetic Control Method - Details

The optimal weights they get are: Catalonia: 0.8508, Madrid: 0.1492, and all other regions: 0. Alternatively, they could have chosen the weights to reproduce only the pre-terrorism growth path for the Basque Country (and not the growth predictors as well). In that case they would have minimized:

$$(Z_1 - Z_0 W)' (Z_1 - Z_0 W)$$

$Z_1$ is the (10 × 1) vector of pre-terrorism (1960-1969) GDP values for the Basque Country. $Z_0$ is the (10 × J) matrix of pre-terrorism (1960-1969) GDP values for the J potential control regions.

The Synthetic Basque Country Looks Similar

Constructing the Counterfactual Using the Weights

$Y_1$ is a (T × 1) vector whose elements are the values of real per capita GDP for T years in the Basque Country. $Y_0$ is a (T × J) matrix whose elements are the values of real per capita GDP for T years in the control regions. They then construct the counterfactual GDP (in the absence of terrorism) as: $Y_1^* = Y_0 W^*$
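As a small worked illustration of $Y_1^* = Y_0 W^*$: the two weights below are the ones reported on the slides, while every GDP figure is made up for the example. The sign convention for the gap (actual minus synthetic) is also an assumption.

```python
# Weights reported on the slides; all other regions have weight 0.
weights = {"Catalonia": 0.8508, "Madrid": 0.1492}

# Hypothetical rows of Y0 (one per year) and the matching entries of Y1.
y0 = [
    {"Catalonia": 5.0, "Madrid": 6.0},
    {"Catalonia": 5.5, "Madrid": 6.5},
    {"Catalonia": 6.0, "Madrid": 7.0},
]
y1_actual = [5.1, 5.4, 5.7]

# Y1* = Y0 W*: the synthetic path is a weighted average of the control regions.
y1_synthetic = [sum(weights[r] * row[r] for r in weights) for row in y0]

# Assumed convention: gap = actual GDP minus synthetic (counterfactual) GDP.
gdp_gap = [actual - synth for actual, synth in zip(y1_actual, y1_synthetic)]

for year, (synth, gap) in enumerate(zip(y1_synthetic, gdp_gap), start=1):
    print(f"year {year}: synthetic = {synth:.4f}, gap = {gap:.4f}")
```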

Growth in the Basque Country with and without Terrorism

Terrorist Activity and Estimated GDP Gap

Combining DD and IV

Sometimes combining DD and IV methods can be quite useful. In a recent paper (Waldinger, 2010), I have done that to estimate the effect of faculty quality on the outcomes of PhD students. Estimating the effect of faculty quality on PhD student outcomes is challenging because of:

1. Selection of good students into good universities.
2. Omitted variables affecting both faculty quality and student outcomes.
3. Measurement error in faculty quality.

I address these issues by using the dismissal of scientists in Nazi Germany as an exogenous shock to faculty quality. The dismissal affected some departments very strongly, while other departments were not affected.

Historical Background

Germany was the leading country for scientific research at the beginning of the 20th century. Immediately after gaining power in 1933, the new Nazi government dismissed all Jewish and 'politically unreliable' scholars from the German universities.

Dismissed Professors Across German Universities

Dismissed Professors Across German Universities II

Effect of Dismissals on Department Size

Effect of Dismissals on Faculty Quality

Panel Data on PhD graduates from German Universities

I obtained a panel dataset of all mathematics PhD students graduating from all German universities between 1923 and 1938 and use the dismissal as exogenous variation in faculty quality. The empirical strategy essentially compares changes in outcomes of PhD students in affected departments before and after 1933 to changes in outcomes in unaffected departments. I investigate the following outcomes:

1. Whether the former PhD student publishes the dissertation in a top journal.
2. Whether the former PhD student ever becomes a full professor.
3. Number of lifetime citations.
4. Positive lifetime citations.

Reduced Form Graphical Analysis - Publishing Dissertation

Reduced Form Graphical Analysis - Full Professor

Reduced Form Graphical Analysis - Lifetime Citations

Reduced Form Estimates

The reduced form of the dismissal effect is essentially a DD estimator.

$$
\begin{aligned}
\text{Outcome}_{idt} = {} & \beta_1 + \beta_2 (\text{Dismissal-induced Reduction in Faculty Quality})_{dt} \\
& + \beta_3 (\text{Dismissal-induced Increase in Student/Faculty Ratio})_{dt} \\
& + \beta_4 \text{Female}_{idt} + \beta_5 \text{Foreign}_{idt} + \beta_6 \text{CohortFE}_{t} + \beta_7 \text{DepFE}_{d} + \varepsilon_{idt}
\end{aligned}
$$

Dismissal-induced Reduction in Faculty Quality is 0 until 1933 and equal to the dismissal-induced fall in faculty quality after 1933 (and remains 0 in departments without dismissals). Dismissal-induced Increase in Student/Faculty Ratio is also 0 until 1933 but equal to the dismissal-induced increase in the student/faculty ratio after 1933. This is essentially a differences-in-differences estimator, but with different treatment intensities.
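To make "different treatment intensities" concrete, here is a deliberately simplified sketch. It assumes the data are collapsed to one pre-1933 and one post-1933 mean per department, uses a single intensity variable, and drops the individual covariates and cohort fixed effects; under those assumptions, differencing removes the department fixed effect and the DD coefficient is the slope from regressing the before/after change on intensity. The numbers and the function name `intensity_dd` are hypothetical.

```python
import statistics

# Hypothetical department-level data:
# (dismissal-induced fall in faculty quality, mean outcome before 1933,
#  mean outcome after 1933). Intensity is 0 for departments without dismissals.
departments = [
    (0.0, 0.30, 0.32),
    (0.0, 0.50, 0.52),
    (0.5, 0.40, 0.37),
    (1.0, 0.60, 0.52),
    (2.0, 0.70, 0.52),
]


def intensity_dd(rows):
    """Slope from regressing the before/after change on treatment intensity."""
    intensity = [q for q, _, _ in rows]
    change = [after - before for _, before, after in rows]
    q_bar = statistics.fmean(intensity)
    c_bar = statistics.fmean(change)
    cov = sum((q - q_bar) * (c - c_bar) for q, c in zip(intensity, change))
    var = sum((q - q_bar) ** 2 for q in intensity)
    return cov / var


beta = intensity_dd(departments)
print(f"estimated effect per unit of intensity: {beta:.3f}")
```

In this toy data every department shares a common change of 0.02, and the outcome falls by 0.1 per unit of intensity, so the slope isolates the intensity effect from the common time effect.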

Reduced Form Estimates

Common Robustness Check for Parallel Trend Assumption

Only look at pre-period data and move a placebo treatment some years back. Here I move a placebo treatment to 1930.
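The placebo logic can be sketched on made-up data. The years (1923-1938, dismissal in 1933, placebo in 1930) follow the slides; the outcome values, the size of the effect, and the function name `dd` are hypothetical, and the simple 2x2 comparison of means stands in for the full regression.

```python
import statistics


def dd(rows, treatment_year):
    """2x2 DD on rows of (affected, year, outcome) for a given cutoff year."""

    def cell(affected, post):
        return statistics.fmean(
            [
                y
                for aff, year, y in rows
                if aff == affected and (year >= treatment_year) == post
            ]
        )

    return (cell(1, True) - cell(1, False)) - (cell(0, True) - cell(0, False))


# Hypothetical outcomes: a common linear trend (parallel trends hold) and a
# drop of 0.2 in affected departments from 1933 onwards.
rows = [
    (aff, year, 0.5 + 0.1 * aff + 0.01 * (year - 1923) - 0.2 * aff * (year >= 1933))
    for aff in (0, 1)
    for year in range(1923, 1939)
]

actual = dd(rows, treatment_year=1933)

# Placebo: keep only pre-1933 data and pretend treatment started in 1930.
pre_period = [r for r in rows if r[1] < 1933]
placebo = dd(pre_period, treatment_year=1930)

print(f"actual DD: {actual:.3f}, placebo DD: {placebo:.3f}")
```

A placebo estimate close to zero is what one would expect if affected and unaffected departments were on parallel trends before 1933; a sizeable placebo effect would cast doubt on the design.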

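Use Dismissal as IV

OLS model for the effect of university quality on PhD student outcomes:

$$
\begin{aligned}
\text{Outcome}_{idt} = {} & \beta_1 + \beta_2 (\text{Avg. Faculty Quality})_{d,t-1} + \beta_3 (\text{Student/Faculty Ratio})_{d,t-1} \\
& + \beta_4 \text{Female}_{idt} + \beta_5 \text{Foreign}_{idt} + \beta_6 \text{CohortFE}_{t} + \beta_7 \text{DepFE}_{d} + \varepsilon_{idt}
\end{aligned}
$$

University quality and student/faculty ratio are endogenous, so use the dismissal as IV. 2 endogenous variables imply 2 first stage regressions:

1.
$$
\begin{aligned}
\text{Avg. Faculty Quality}_{idt} = {} & \gamma_1 + \gamma_2 (\text{Dismissal-induced Reduction in Faculty Quality})_{dt} \\
& + \gamma_3 (\text{Dismissal-induced Increase in Student/Faculty Ratio})_{dt} \\
& + \gamma_4 \text{Female}_{idt} + \gamma_5 \text{Foreign}_{idt} + \gamma_6 \text{CohortFE}_{t} + \gamma_7 \text{DepFE}_{d} + \varepsilon_{idt}
\end{aligned}
$$

2.
$$
\begin{aligned}
\text{Student/Faculty Ratio}_{idt} = {} & \delta_1 + \delta_2 (\text{Dismissal-induced Reduction in Faculty Quality})_{dt} \\
& + \delta_3 (\text{Dismissal-induced Increase in Student/Faculty Ratio})_{dt} \\
& + \delta_4 \text{Female}_{idt} + \delta_5 \text{Foreign}_{idt} + \delta_6 \text{CohortFE}_{t} + \delta_7 \text{DepFE}_{d} + \varepsilon_{idt}
\end{aligned}
$$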
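A stripped-down numerical sketch of the just-identified case with 2 endogenous regressors and 2 instruments may help. It ignores the covariates and fixed effects, uses the textbook formula $\hat\beta_{IV} = (Z'X)^{-1} Z'y$ on demeaned variables, and runs on toy numbers with no empirical meaning; the names `iv_two_by_two`, `z1`, `z2`, `x1`, `x2` and the unobserved confounder `u` are assumptions made for the example.

```python
def demean(values):
    """Subtract the sample mean (stands in for including a constant)."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def iv_two_by_two(y, x1, x2, z1, z2):
    """Just-identified IV, beta = (Z'X)^{-1} Z'y, with all variables demeaned."""
    y, x1, x2, z1, z2 = (demean(v) for v in (y, x1, x2, z1, z2))
    a, b = dot(z1, x1), dot(z1, x2)
    c, d = dot(z2, x1), dot(z2, x2)
    det = a * d - b * c  # must be non-zero: the instruments must be relevant
    zy1, zy2 = dot(z1, y), dot(z2, y)
    return (d * zy1 - b * zy2) / det, (a * zy2 - c * zy1) / det


# Toy design: instruments z1, z2 are exactly uncorrelated with the confounder u.
rows = [(a, b, c) for a in (0.0, 1.0) for b in (0.0, 1.0) for c in (-1.0, 1.0)]
z1 = [r[0] for r in rows]  # stands in for the faculty-quality instrument
z2 = [r[1] for r in rows]  # stands in for the student/faculty-ratio instrument
u = [r[2] for r in rows]   # unobserved confounder

# "First stages": both endogenous regressors depend on both instruments and on u.
x1 = [2.0 * a + 0.5 * b + c for a, b, c in rows]
x2 = [0.5 * a + 1.5 * b - c for a, b, c in rows]

# Outcome with true coefficients 1.0 and -0.5; u also enters directly.
y = [1.0 * q - 0.5 * s + 4.0 * c for q, s, c in zip(x1, x2, u)]

b1, b2 = iv_two_by_two(y, x1, x2, z1, z2)
print(f"IV estimates: {b1:.3f}, {b2:.3f}")
```

Because the confounder moves both regressors and the outcome, OLS on this toy data would not recover the true coefficients, whereas the IV formula does since the instruments are orthogonal to `u` by construction.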

First Stages

To test for weak instruments one cannot simply look at the first stage F-statistics, because here we have 2 endogenous regressors and 2 IVs. Instead, use the Cragg-Donald eigenvalue (EV) statistic; here the critical value is 7.03.

OLS and IV
