Small Samples

August 1st, 2013

I haven’t been surprised by the extensive discussion of the recent paper by Michael Simkovic and Frank McIntyre. The paper deserves attention from many readers. I have been surprised, however, by the number of scholars who endorse the paper, and even scorn skeptics, while acknowledging that they don’t understand the methods underlying Simkovic and McIntyre’s results. An empirical paper is only as good as its method; it’s essential for scholars to engage with that method.

I’ll discuss one methodological issue here: the small sample sizes underlying some of Simkovic and McIntyre’s results. Those sample sizes undercut the strength of some claims that Simkovic and McIntyre make in the current draft of the paper.

What Is the Sample in Simkovic & McIntyre?

Simkovic and McIntyre draw their data from the Survey of Income and Program Participation, a very large survey of U.S. households. The authors, however, don’t use all of the data in the survey; they focus on (a) college graduates whose highest degree is the BA, and (b) JD graduates. SIPP provides a large sample of the former group: Each of the four panels yielded information on 6,238 to 9,359 college graduates, for a total of 31,556 BAs in the sample. (I obtained these numbers, as well as the ones for JD graduates, from Frank McIntyre. He and Mike Simkovic have been very gracious in answering my questions.)

The sample of JD graduates, however, is much smaller. Those totals range from 282 to 409 for the four panels, yielding a total of 1,342 law school graduates. That’s still a substantial sample size, but Simkovic and McIntyre need to examine subsets of the sample to support their analyses. To chart changes in the financial premium generated by a law degree, for example, they need to examine reported incomes for each of the sixteen years in the sample. Those small groupings generate the uncertainty I discuss here.

Confidence Intervals

Statisticians express the uncertainty in sample-based estimates by generating confidence intervals. The confidence interval, often summarized as a “margin of error” on either side of the estimate, does two things. First, it reminds us that numbers plucked from samples are just estimates; they are not precise reflections of the underlying population. If we collect income data from 1,342 law school graduates, as SIPP did, we can then calculate the means, medians, and other statistics about those incomes. The median income for the 1,342 JDs in the Simkovic & McIntyre study, for example, was $82,400 in 2012 dollars. That doesn’t mean that the median income for all JDs was exactly $82,400; the sample offers an estimate.

Second, the confidence interval gives us a range in which the true number (the one for the underlying population) is likely to fall. The confidence interval for JD income, for example, might be plus-or-minus $5,000. If that were the confidence interval for the median given above, then we could be relatively sure that the true median lay somewhere between $77,400 and $87,400. ($5,000 is a ballpark estimate of the confidence interval, used here for illustrative purposes; it is not the precise interval.)

Small samples generate large confidence intervals, while larger samples produce smaller ones. That makes intuitive sense: the larger our sample, the more precisely it will reflect patterns in the underlying population. We have to exercise particular caution when interpreting small samples, because they are more likely to offer a distorted view of the population we’re trying to understand. Confidence intervals make sure we exercise that caution.
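To make the link between sample size and interval width concrete, here is a rough sketch in Python. It uses the textbook approximation for a 95% interval around a sample mean (about 1.96 standard errors on either side). The standard deviation of $60,000 is a hypothetical figure chosen for illustration; it does not come from SIPP or from the paper. The sample sizes are JD counts mentioned in this post.

```python
import math

def margin_of_error(std_dev, n, z=1.96):
    """Approximate 95% margin of error for a sample mean: z * s / sqrt(n)."""
    return z * std_dev / math.sqrt(n)

# Hypothetical spread of incomes, for illustration only (not a SIPP figure).
ASSUMED_STD_DEV = 60_000

# JD sample sizes mentioned in this post: two age 25-29 subgroups,
# the smallest and largest panels, and the full JD sample.
for n in (16, 26, 282, 409, 1342):
    moe = margin_of_error(ASSUMED_STD_DEV, n)
    print(f"n = {n:>5}: plus-or-minus ${moe:,.0f}")
```

Because the margin shrinks with the square root of the sample size, a sample must be four times larger to cut the margin in half. That is why subgroups of a few dozen people produce such wide intervals even when the overall survey is large.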

Our brains, unfortunately, are not wired for confidence intervals. When someone reports the estimate from a sample, we tend to focus on that particular reported number while ignoring the confidence interval. Considering the confidence interval, however, is essential. If a political poll reports that Dewey is leading Truman, 51% to 49%, with a margin of error of 3 percentage points, then the race is too close to call. Based on this poll, actual support for Dewey could be as low as 48% (3 points lower than the reported value) or as high as 54% (3 points higher than the reported value). Dewey might win decisively, the result might be a squeaker, or Truman might win.
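The poll arithmetic can be sketched in a few lines of Python. The first function simply applies the reported margin; the second uses the standard approximation for the margin of error of a proportion. The respondent count of 1,067 is a hypothetical number, picked because it yields a margin of roughly 3 points; the example above does not specify a poll size.

```python
import math

def poll_interval(share, margin):
    """Range implied by a reported share and its margin of error."""
    return share - margin, share + margin

def proportion_margin(share, n, z=1.96):
    """Approximate 95% margin of error for a proportion from n respondents."""
    return z * math.sqrt(share * (1 - share) / n)

low, high = poll_interval(0.51, 0.03)

# The race is "too close to call" when 50% falls inside the interval.
too_close = low < 0.50 < high
print(f"Dewey's support: {low:.0%} to {high:.0%}; too close to call: {too_close}")

# Hypothetical poll size, for illustration only.
print(f"Margin with 1,067 respondents: {proportion_margin(0.51, 1067):.1%}")
```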

Is the Earnings Premium Cyclical?

Now let’s look at Figure 5 in the Simkovic and McIntyre paper. This figure shows the earnings premium for a JD compared to a BA over a range of 16 years. The shape of the solid line is somewhat cyclical, leading to the Simkovic/McIntyre suggestion that “[t]he law degree earnings premium is cyclical,” together with their observation that recent changes in income levels are due to “ordinary cyclicality.” (pp. 49, 32)

But what lies behind that somewhat cyclical solid line in Figure 5? The line ties together sixteen points, each of which represents the estimated premium for a single year. Each point draws upon the incomes of a few hundred graduates, a relatively small group. Those small sample sizes produce relatively large confidence intervals around each estimate. Simkovic & McIntyre show those confidence intervals with dotted lines above and below the solid line. The estimated premium for 1996, for example, is about .54, but the confidence interval stretches from about .42 to about .66. We can be quite confident that JD graduates, on average, enjoyed a financial premium over BAs in 1996, but we’re much less certain about the size of the premium. The coefficient for this premium could be as low as .42 or as high as .66.
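A quick calculation shows how much uncertainty the 1996 interval implies. The sketch below takes the three approximate values read from Figure 5 above and backs out the half-width and an implied standard error. It assumes the dotted lines mark a conventional 95% interval built as the estimate plus or minus 1.96 standard errors; this post does not state the confidence level, so treat the standard error as illustrative.

```python
def half_width(low, high):
    """Half the distance between the interval's endpoints."""
    return (high - low) / 2

# Approximate values for 1996, as read from Figure 5.
estimate, low, high = 0.54, 0.42, 0.66

hw = half_width(low, high)

# Assumption: a 95% interval of the form estimate +/- 1.96 * standard error.
implied_se = hw / 1.96

# How large the uncertainty is relative to the estimate itself.
relative_uncertainty = hw / estimate

print(f"Half-width: {hw:.2f}")
print(f"Implied standard error: {implied_se:.3f}")
print(f"Half-width as a share of the estimate: {relative_uncertainty:.0%}")
```

Under those assumptions, the plausible range runs to roughly a fifth of the estimate in either direction, which is large compared with the year-to-year movements that give the solid line its cyclical shape.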

So what? As long as the premiums were positive, how much do we care about their size? Remember that Simkovic and McIntyre suggest that the earnings premium is cyclical. They rely on that cyclicality, in turn, to suggest that any recent downturns in earnings are part of an ordinary cycle.

The results reported in Figure 5, however, cannot confirm cyclicality. The specific estimates look cyclical, but the confidence intervals urge caution. Figure 5 shows those intervals as lines that parallel the estimated values, but the confidence intervals belong to each point, not to the line as a whole. The real premium for each year most likely falls somewhere within the confidence interval for that year, but we can’t say where.

Simkovic and McIntyre could supplement their analysis by testing the relationship among these estimates; it’s possible that, statistically, they could reject the hypothesis that the earnings premium was stable. They might even be able to establish cyclicality with more certainty. We can’t reach those conclusions from Figure 5 and the currently reported analyses, however; the confidence intervals are too wide for certain interpretation. All of the internet discussion of the cyclicality of the earnings premium has been premature.
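One simple form such a test could take is sketched below; it is not the authors’ method, and the inputs are hypothetical placeholders rather than values from the paper. The sketch computes an inverse-variance weighted average of the yearly estimates and a chi-square statistic measuring how far the estimates stray from that common value. It assumes the yearly estimates are independent, which is unlikely to hold exactly in a panel survey that follows the same people across years, so a real analysis would need something more careful.

```python
def stability_statistic(estimates, std_errors):
    """Return the pooled estimate and a chi-square statistic for the
    hypothesis that every period shares one true premium.
    Assumes the period estimates are independent of one another."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, q

# Hypothetical estimates and standard errors for four periods.
# These are placeholders for illustration, not numbers from the paper.
hypothetical_estimates = [0.54, 0.60, 0.50, 0.58]
hypothetical_std_errors = [0.06, 0.06, 0.06, 0.06]

# 5% critical value of the chi-square distribution with 3 degrees of freedom
# (number of periods minus one).
CHI2_95_DF3 = 7.815

pooled, q = stability_statistic(hypothetical_estimates, hypothetical_std_errors)
reject_stability = q > CHI2_95_DF3
print(f"Pooled premium: {pooled:.3f}, Q = {q:.2f}, reject stability: {reject_stability}")
```

With these placeholder numbers the statistic falls well short of the critical value, so the data could not rule out a constant premium. With sixteen yearly estimates the test would have 15 degrees of freedom and a 5% critical value of about 25.0.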

Recent Graduates

Similar problems affect Simkovic and McIntyre’s statements about recent graduates. In Figure 6, they depict the earnings premium for law school graduates aged 25-29 in four different time periods. The gray bars show the estimated premium for each time period, with the vertical lines indicating the confidence interval. Notice how long those confidence intervals are: The interval for 1996-1999 stretches from about 0.04 through about 0.54. The other periods show similarly extended intervals.

Those large confidence intervals reflect very small sample sizes. The 1996 panel offered income information on just sixteen JD graduates aged 25-29; the 2001 panel included twenty-five of those graduates; the 2004 panel, seventeen; and the 2008 panel, twenty-six. With such small samples, we have very little confidence (in both the everyday and statistical senses) that the premium estimates are correct.

It seems likely that the premium was positive throughout this period, although the very small sample sizes and possible bimodality of incomes could undermine even that conclusion. We can’t, however, say much more than that. If we take confidence intervals into account, the premium might have declined steadily throughout this period, from about 0.54 in the earliest period to 0.33 in the most recent one. Or it might have risen, from a very modest 0.05 in the first period to a robust 0.80 more recently. Again, we just don’t know.
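The two scenarios above come straight from the interval endpoints, and the arithmetic can be sketched as follows. The endpoints are the approximate values quoted in the preceding paragraph. Comparing endpoints this way is a crude bracketing exercise, not a formal test of the difference between periods, but it shows why the direction of change is undetermined.

```python
def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one value."""
    return max(a[0], b[0]) <= min(a[1], b[1])

# Approximate interval endpoints for the earliest and most recent periods,
# as quoted in the paragraph above.
earliest = (0.05, 0.54)
latest = (0.33, 0.80)

# Most pessimistic reading: top of the early interval to bottom of the late one.
largest_decline = latest[0] - earliest[1]

# Most optimistic reading: bottom of the early interval to top of the late one.
largest_rise = latest[1] - earliest[0]

print(f"Intervals overlap: {intervals_overlap(earliest, latest)}")
print(f"Change in premium could range from {largest_decline:+.2f} to {largest_rise:+.2f}")
```

Because the range of possible changes spans zero, the figure is consistent with a decline, a rise, or no change at all.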

It would be useful for Simkovic and McIntyre to acknowledge the small number of recent law school graduates in their sample; that would help ground readers in the data. When writing a paper like this, especially for an interdisciplinary audience, it’s difficult to anticipate what kind of information the audience may need. I’m surprised that so many legal scholars enthusiastically endorsed these results without noting the large confidence intervals.

Onward

There has been much talk during the last two weeks about Kardashians, charlatans, and even the Mafia. I’m not sure any legal academic leads quite that exciting a life; I know I don’t. As a professor who has taught Law and Social Science, I think the critics of the Simkovic/McIntyre paper raised many good questions. Empirical analyses need testing, and it is especially important to examine the assumptions that lie behind a quantitative study.

The questions weren’t all good. Nor, I’m afraid, were all of the questions I’ve heard about other papers over the years. That’s the nature of academic debate and refining hypotheses: sometimes we have to ask questions just to figure out what we don’t know.

Endorsements of the paper, similarly, spanned a spectrum. Some were thoughtful; others seemed reflexive. I was disappointed at how few of the paper’s supporters engaged fully with the paper’s method, asking questions like the ones I have raised about sample size and confidence intervals.

I hope to write a bit more on the Simkovic and McIntyre paper; there are more questions to raise about their conclusions. I may also try to offer some summaries of other research that has been done on the career paths of law school graduates and lawyers. We don’t have nearly enough research in the field, but there are some other studies worth knowing.


About Law School Cafe

Cafe Manager & Co-Moderator
Deborah J. Merritt

Cafe Designer & Co-Moderator
Kyle McEntee

ABA Journal Blawg 100 Honoree

Law School Cafe is a resource for anyone interested in changes in legal education and the legal profession.
