Law School Cafe

Equating, Scaling, and Civil Procedure

April 16th, 2015 / By Deborah J. Merritt

Still wondering about the February bar results? I continue that discussion here. As explained in my previous post, NCBE premiered its new Multistate Bar Exam (MBE) in February. That exam covers seven subjects, rather than the six tested on the MBE for more than four decades. Given the type of knowledge tested by the MBE, there is little doubt that the new exam is harder than the old one.

If you have any doubt about that fact, try this experiment: Tell any group of third-year students that the bar examiners have decided to offer them a choice. They may study for and take a version of the MBE covering the original six subjects, or they may choose a version that covers those subjects plus Civil Procedure. Which version do they choose?

After the students have eagerly indicated their preference for the six-subject test, you will have to apologize profusely to them. The examiners are not giving them a choice; they must take the harder seven-subject test.

But can you at least reassure the students that NCBE will account for this increased difficulty when it scales scores? After all, NCBE uses a process of equating and scaling scores that is designed to produce scores with a constant meaning over time. A scaled score of 136 in 2015 is supposed to represent the same level of achievement as a scaled score of 136 in 2012. Is that still true, despite the increased difficulty of the test?

Unfortunately, no. Equating works only for two versions of the same exam. As the word “equating” suggests, the process assumes that the exam drafters attempted to test the same knowledge on both versions of the exam. Equating can account for inadvertent fluctuations in difficulty that arise from constructing new questions that test the same knowledge. It cannot, however, account for changes in the content or scope of an exam.

This distinction is widely recognized in the testing literature–I cite numerous sources at the end of this post. It appears, however, that NCBE has attempted to “equate” the scores of the new MBE (with seven subjects) to older versions of the exam (with just six subjects). This treated the February 2015 examinees unfairly, leading to lower scores and pass rates.

To understand the problem, let’s first review the process of equating and scaling.

Equating

First, remember why NCBE equates exams. To avoid security breaches, NCBE must produce a different version of the MBE every February and July. Testing experts call these different versions “forms” of the test. For each of the MBE forms, the designers attempt to create questions that impose the same range of difficulty. Inevitably, however, some forms are harder than others. It would be unfair for examinees one year to get lower scores than examinees the next year, simply because they took a harder form of the test. Equating addresses this problem.

The process of equating begins with a set of “control” questions or “common items.” These are questions that appear on two forms of the same exam. The February 2015 MBE, for example, included a subset of questions that had also appeared on some earlier exam. For this discussion, let’s assume that there were 30 of these common items and 160 new questions that counted toward each examinee’s score. (Each MBE also includes 10 experimental questions that do not count toward the test-taker’s score but that help NCBE assess items for future use.)

When NCBE receives answer sheets from each version of the MBE, it is able to assess the examinees’ performance on the common items and new items. Let’s suppose that, on average, earlier examinees got 25 of the 30 common items correct. If the February 2015 test-takers averaged only 20 correct answers to those common items, NCBE would know that those test-takers were less able than previous examinees. That information would then help NCBE evaluate the February test-takers’ performance on the new test items. If the February examinees also performed poorly on those items, NCBE could conclude that the low scores were due to the test-takers’ abilities rather than to a particularly hard version of the test.

Conversely, if the February test-takers did very well on the new items–while faring poorly on the common ones–NCBE would conclude that the new items were easier than questions on earlier tests. The February examinees racked up points on those questions, not because they were better prepared than earlier test-takers, but because the questions were too easy.

The actual equating process is more complicated than this. NCBE, for example, can account for the difficulty of individual questions rather than just the overall difficulty of the common and new items. The heart of equating, however, lies in this use of “common items” to compare performance over time.

Scaling

Once NCBE has compared the most recent batch of exam-takers with earlier examinees, it converts the current raw scores to scaled ones. Think of the scaled scores as a rigid yardstick; these scores have the same meaning over time. 18 inches this year is the same as 18 inches last year. In the same way, a scaled score of 136 has the same meaning this year as last year.

How does NCBE translate raw points to scaled scores? The translation depends upon the results of equating. If a group of test-takers performs well on the common items, but not so well on the new questions, the equating process suggests that the new questions were harder than the ones on previous versions of the test. NCBE will “scale up” the raw scores for this group of exam takers to make them comparable to scores earned on earlier versions of the test.

Conversely, if examinees perform well on new questions but poorly on the common items, the equating process will suggest that the new questions were easier than ones on previous versions of the test. NCBE will then scale down the raw scores for this group of examinees. In the end, the scaled scores will account for small differences in test difficulty across otherwise similar forms.

Changing the Test

Equating and scaling work well for test forms that are designed to be as similar as possible. The processes break down, however, when test content changes. You can see this by thinking about the data that NCBE had available for equating the February 2015 bar exam. It had a set of common items drawn from earlier tests; these would have covered the six original subjects. It also had answers to 190 new items; these would have included both the original subjects and the new one (Civil Procedure).

With these data, NCBE could make two comparisons:

1. It could compare performance on the common items. It undoubtedly found that the February 2015 test-takers performed less well than previous test-takers on these items. That’s a predictable result of having a seventh subject to study. This year’s examinees spread their preparation among seven subjects rather than six. Their mastery of each subject was somewhat lower, and they would have performed less well on the common items testing those subjects.

2. NCBE could also compare performance on the new Civil Procedure items with performance on old and new items in other subjects. NCBE won’t release those comparisons, because it no longer discloses raw scores for subject areas. I predict, however, that performance on Civil Procedure items was the same as on Evidence, Property, or other subjects. Why? Because Civil Procedure is not intrinsically harder than these other subjects, and the examinees studied all seven subjects.

Neither of these comparisons, however, would address the key change in the MBE: Examinees had to prepare seven subjects rather than six. As my previous post suggested, this isn’t just a matter of taking all seven subjects in law school and remembering key concepts for the MBE. Because the MBE is a closed-book exam that requires recall of detailed rules, examinees devote 10 weeks of intense study to this exam. They don’t have more than 10 weeks, because they’re occupied with law school classes, extracurricular activities, and part-time jobs before mid-May or mid-December.

There’s only so much material you can cram into memory during ten weeks. If you try to memorize rules from seven subjects, rather than just six, some rules from each subject will fall by the wayside.

When Equating Doesn’t Work

Equating is not possible for a test like the new MBE, which has changed significantly in content and scope. The test places new demands on examinees, and equating cannot account for those demands. The testing literature is clear that, under these circumstances, equating produces misleading results. As Robert L. Brennan, a distinguished testing expert, wrote in a prominent guide: “When substantial changes in test specifications occur, either scores should be reported on a new scale or a clear statement should be provided to alert users that the scores are not directly comparable with those on earlier versions of the test.” (See p. 174 of Linking and Aligning Scores and Scales, cited more fully below.)

“Substantial changes” is one of those phrases that lawyers love to debate. The hypothetical described at the beginning of this post, however, seems like a common-sense way to identify a “substantial change.” If the vast majority of test-takers would prefer one version of a test over a second one, there is a substantial difference between the two.

As Brennan acknowledges in the chapter I quote above, test administrators dislike re-scaling an exam. Re-scaling is both costly and time-consuming. It can also discomfort test-takers and others who use those scores, because they are uncertain how to compare new scores to old ones. But when a test changes, as the MBE did, re-scaling should take the place of equating.

The second best option, as Brennan also notes, is to provide a “clear statement” to “alert users that the scores are not directly comparable with those on earlier versions of the test.” This is what NCBE should do. By claiming that it has equated the February 2015 results to earlier test results, and that the resulting scaled scores represent a uniform level of achievement, NCBE is failing to give test-takers, bar examiners, and the public the information they need to interpret these scores.

The February 2015 MBE was not the same as previous versions of the test, it cannot be properly equated to those tests, and the resulting scaled scores represent a different level of achievement. The lower scaled scores on the February 2015 MBE reflect, at least in part, a harder test. To the extent that the test-takers also differed from previous examinees, it is impossible to separate that variation from the difference in the tests themselves.

Conclusion

Equating was designed to detect small, unintended differences in test difficulty. It is not appropriate for comparing a revised test to previous versions of that test. In my next post on this issue, I will discuss further ramifications of the recent change in the MBE. Meanwhile, here is an annotated list of sources related to equating:

Michael T. Kane & Andrew Mroch, Equating the MBE, The Bar Examiner, Aug. 2005, at 22. This article, published in NCBE’s magazine, offers an overview of equating and scaling for the MBE.

Neil J. Dorans, et al., Linking and Aligning Scores and Scales (2007). This is one of the classic works on equating and scaling. Chapters 7-9 deal specifically with the problem of test changes. Although I’ve linked to the Amazon page, most university libraries should have this book. My library has the book in electronic form so that it can be read online.

Michael J. Kolen & Robert L. Brennan, Test Equating, Scaling, and Linking:
Methods and Practices (3d ed. 2014). This is another standard reference work in the field. Once again, my library has a copy online; check for a similar ebook at your institution.

CCSSO, A Practitioner’s Introduction to Equating. This guide was prepared by the Council of Chief State School Officers to help teachers, principals, and superintendents understand the equating of high-stakes exams. It is written for educated lay people, rather than experts, so it offers a good introduction. The source is publicly available at the link.

Data, Teaching, Bar Exam, Equating, MBE, NCBE

Equating, Scaling, and Civil Procedure

About Law School Cafe

Around the Cafe

Subscribe

Categories

Recent Comments

Recent Posts

Monthly Archives

Participate

Past and Present Guests

Equating, Scaling, and Civil Procedure

Share:

About Law School Cafe

Around the Cafe

Subscribe

Categories

Recent Comments

Recent Posts

Monthly Archives

Participate

Past and Present Guests