A Better Bar Exam—Look to Upper Canada?

July 25th, 2017 / By

 

Today, tens of thousands of aspiring lawyers across the United States sit for the bar exam in a ritual that should be designed to identify who has the ability to be a competent new lawyer. Yet a growing chorus of critics questions whether the current knowledge-focused exam is the best way to draw that line and protect the public. As Professor Deborah Merritt has noted, “On the one hand, the exam forces applicants to memorize hundreds of black-letter rules that they will never use in practice. On the other hand, the exam licenses lawyers who don’t know how to interview a client, compose an engagement letter, or negotiate with an adversary.”

For years, the response to critiques of the bar exam has been, in effect: “It’s not perfect, but it’s the best we can do if we want a psychometrically defensible exam.” The Law Society of Upper Canada (LSUC), which governs the law licensing process for the province of Ontario, developed a licensing exam that calls that defense into question.

Overview of Law Society of Upper Canada Licensing Exam

The LSUC uses a 7-hour multiple-choice test consisting of 220 to 240 multiple-choice questions to test a wide range of competencies. For barristers (the litigating branch of the profession), that includes ethical and professional responsibilities; knowledge of the law; establishing and maintaining the lawyer-client relationship; problem/issue Identification, analysis, and assessment; alternative dispute resolution; litigation process; and practice management issues. A 2004 report explains how the LSUC identified key competencies and developed a licensing test based upon them.

Unlike the US exams, the LSUC exam is open-book, so it tests the ability to find and process relevant information rather than the ability to memorize rules. Most important, it tests a wider range of lawyering competencies than US exams, and it does so in the context of how lawyers address real client problems rather than as abstract analytical problems.

Below, we discuss how these differences address many of the critiques of the current US bar exams and make the LSUC exam an effective test of new lawyer competence. We also provide sample questions from both the LSUC and the US exam.

Open-Book Exam

Like all bar licensing exams in the United States (with the New Hampshire Daniel Webster Scholars Program as the sole exception), the LSUC exam is a pencil-and-paper timed exam. However, unlike any United States exam, including the Uniform Bar Exam, the LSUC licensing exam is open book.

The LSUC gives all candidates online access to materials that address all competencies the exam tests and encourages candidates to bring those materials to the exam. To help them navigate the materials, candidates are urged to create and bring to the exam tabbing or color-coding systems, short summaries of selected topics, index cards, and other study aids.

Lawyering is an open-book profession. Indeed, it might be considered malpractice to answer a legal problem without checking sources! As we have previously noted, good lawyers “…know enough to ask the right questions, figure out how to approach the problem and research the law, or know enough to recognize that the question is outside of their expertise and should be referred to a lawyer more well-versed in that area of law.” Actually referring a problem to someone else isn’t a feasible choice in the context of the bar exam, of course, but accessing the relevant knowledge base is.

The open-book LSUC exam tests a key lawyering competency untested by the US exam—the ability to find the appropriate legal information—and it addresses a significant critique of the current U.S. exams: that they test memorization of legal rules, a skill unrelated to actual law practice.

Candidates for the bar in Canada no doubt pore over the written material to learn the specifics, just as US students do, but they are also able to rely on that material to remind them of the rules as they answer the questions, just as a lawyer would do.

Testing More Lawyering Competencies

Like all bar exams in the US, the LSUC exam assesses legal knowledge and analytical skills. However, unlike US bar exams, the LSUC exam also assesses competencies that relate to fundamental lawyering skills beyond the ability to analyze legal doctrine.

As Professor Merritt has noted, studies conducted by the National Conference of Bar Examiners [NCBE] and the Institute for the Advancement of the American Legal System confirm the gaps between the competencies new lawyers need and what the current US bar exams test, citing the absence of essential lawyering competencies such as interviewing principles; client communication; information gathering; case analysis and planning; alternative dispute resolution; negotiation; the litigation process; and practice management issues.

The NCBE has justified their absence by maintaining that such skills cannot be tested via multiple-choice questions. However, as illustrated below, the LSUC exam does just that, while also raising professional responsibility questions as part of the fact patterns testing those competencies.

Testing Competencies in Context of How Lawyers Use Information

The LSUC exam attempts to capture the daily work of lawyers. Rather than test knowledge of pure doctrine to predict a result as the US exams tend to do, the LSUC used Bloom’s taxonomy to develop questions that ask how knowledge of the law informs the proper representation of the client.

The LSUC questions seek information such as: what a client needs to know; how a lawyer would respond to a tribunal if asked “x”; where a lawyer would look to find the relevant information to determine the steps to be taken; and what issues a lawyer should research. That testing methodology replicates how lawyers use the law in practice much more effectively than do the US exams.

The LSUC exam format and content addresses a significant critique of US bar exams—that those exams ask questions that are unrelated to how lawyers use legal doctrine in practice and that the US exams fail to assess many of the key skills lawyers need.

Sample Questions from the LSUC and the MBE

Here is a sampling of LSUC questions that test for lawyering skills in a manner not addressed in US exams. These and other sample questions are available on the Law Society of Upper Canada’s website:

  1. Gertrude has come to Roberta, a lawyer, to draw up a power of attorney for personal care. Gertrude will be undergoing major surgery and wants to ensure that her wishes are fulfilled should anything go wrong. Gertrude’s husband is quite elderly and not in good health, so she may want her two adult daughters to be the attorneys. The religion of one of her daughters requires adherents to protect human life at all costs. Gertrude’s other daughter is struggling financially. What further information should Roberta obtain from Gertrude?
(a) The state of her daughters’ marriages.
(b) The state of Gertrude’s marriage.
(c) Gertrude’s personal care wishes.
(d) Gertrude’s health status.
  1. Tracy was charged with Assault Causing Bodily Harm. She has instructed her lawyer, Kurt, to get her the fastest jury trial date possible. The Crown has not requested a preliminary inquiry. Kurt does not believe that a preliminary inquiry is necessary because of the quality of the disclosure. How can Kurt get Tracy the fastest trial date?
(a) Waive Tracy’s right to a preliminary inquiry and set the trial date.
(b) Bring an 11(b) Application to force a quick jury trial date.
(c) Conduct the preliminary inquiry quickly and set down the jury trial.
(d) Elect on Tracy’s behalf trial by a Provincial Court Judge.
  1. Peyton, a real estate lawyer, is acting for a married couple, Lara and Chris, on the purchase of their first home. Lara’s mother will be lending the couple some money and would like to register a mortgage on title. Lara and Chris have asked Peyton to prepare and register the mortgage documentation. They are agreeable to Peyton acting for the three of them. Chris’ brother is also lending them money but Lara and Chris have asked Peyton not to tell Lara’s mother this fact. Should Peyton act?
(a) Yes, because the parties consented.
(b) No, because there is a conflict of interest.
(c) Yes, because the parties are related.
(d) No, because she should not act on both the purchase and the mortgage.
  1. Prior to the real estate closing, in which jurisdiction should the purchaser’s lawyer search executions?
(a) Where the seller previously resided.
(b) Where the seller’s real property is located.
(c) Where the seller’s personal property is located.
(d) Where the seller is moving.

[These questions test the applicant’s understanding of: the information a lawyer needs from the client or other sources, strategic and effective use of trial process, ethical responsibilities, and knowledge of the real property registration system, all in the service of proper representation of a client. Correct answers: c, a, b, b.]

Compare these questions to typical MBE questions, which focus on applying memorized elements of legal rules to arrive at a conclusion about which party likely prevails. [More available here.]

  1. A woman borrowed $800,000 from a bank and gave the bank a note for that amount secured by a mortgage on her farm. Several years later, at a time when the woman still owed the bank $750,000 on the mortgage loan, she sold the farm to a man for $900,000. The man paid the woman $150,000 in cash and specifically assumed the mortgage note. The bank received notice of this transaction and elected not to exercise the optional due-on-sale clause in the mortgage. Without informing the man, the bank later released the woman from any further personal liability on the note. After he had owned the farm for a number of years, the man defaulted on the loan. The bank properly accelerated the loan, and the farm was eventually sold at a foreclosure sale for $500,000. Because there was still $600,000 owing on the note, the bank sued the man for the $100,000 deficiency. Is the man liable to the bank for the deficiency?
(a) No, because the woman would have still been primarily liable for payment, but the bank had released her from personal liability.
(b) No, because the bank’s release of the woman from personal liability also released the man.
(c) Yes, because the bank’s release of the woman constituted a clogging of the equity of redemption.
(d) Yes, because the man’s personal liability on the note was not affected by the bank’s release of the woman.
  1. A man arranged to have custom-made wooden shutters installed on the windows of his home. The contractor who installed the shutters did so by drilling screws and brackets into the exterior window frames and the shutters. The man later agreed to sell the home to a buyer. The sales agreement did not mention the shutters, the buyer did not inquire about them, and the buyer did not conduct a walkthrough inspection of the home before the closing. The man conveyed the home to the buyer by warranty deed. After the sale closed, the buyer noticed that the shutters and brackets had been removed from the home and that the window frames had been repaired and repainted. The buyer demanded that the man return the shutters and pay the cost of reinstallation, claiming that the shutters had been conveyed to him with the sale of the home. When the man refused, the buyer sued. Is the buyer likely to prevail?
(a) No, because the sales agreement did not mention the shutters.
(b) No, because the window frames had been repaired and repainted after removal of the shutters.
(c) Yes, because the shutters had become fixtures.
(d) Yes, because the man gave the buyer a warranty deed and the absence of the shutters violated a covenant of the deed

[Correct answers: d, c]

We Can Build a Better Bar Exam

As illustrated above, the LSUC exam shows that it is possible to test a far wider range of competencies than those tested in US bar exams.

Does the LSUC exam address all of the flaws of US bar exams? No—one problem that persists for both the LSUC and US exams is the requirement for rapid answers (less than 2 minutes per question), which rewards an ability and practice not associated with effective lawyering.

Does the LSUC exam fully address experiential skills? No—LSUC also requires applicants to “article” (a kind of apprenticeship with a law firm) or participate in the Law Practice Program (a four-month training course and a four-month work placement).

But the exam does what the NCBE has told us cannot be done. It is a psychometrically valid exam that assesses skills far beyond the competencies tested on US bar exams: skills such as interviewing, negotiating, counseling, fact investigation, and client-centered advocacy. And its emphasis is on lawyering competencies—using doctrine in the context of client problems.

Eileen Kaufman is Professor of Law at the Touro College, Jacob D. Fuchsberg Law Center.

Andi Curcio is Professor of Law at the Georgia State University College of Law.

Carol Chomsky is Professor of Law at the University of Minnesota Law School.

, No Comments Yet

The Latest Issue of the Bar Examiner

March 15th, 2016 / By

The National Conference of Bar Examiners (NCBE) has released the March 2016 issue of their quarterly publication, the Bar Examiner. The issue includes annual statistics about bar passage rates, as well as several other articles. For those who lack time to read the issue, here are a few highlights:

Bar-Academy Relationships

In his Letter from the Chair, Judge Thomas Bice sounds a disappointingly hostile note towards law students. Quoting Justice Edward Chavez of the New Mexico Supreme Court, Bice suggests that “those who attend law school have come to have a sense of entitlement to the practice of law simply as a result of their education.” Against this sentiment, he continues, bar examiners “are truly the gatekeepers of this profession.” (P. 2)

NCBE President Erica Moeser, who has recently tangled with law school deans, offers a more conciliatory tone on her President’s Page. After noting the importance of the legal profession and the challenges facing law schools, she concludes: “In many ways, we are all in this together, and certainly all of us wish for better times.” (P. 5)

(more…)

, View Comment (1)

ExamSoft: New Evidence from NCBE

July 14th, 2015 / By

Almost a year has passed since the ill-fated July 2014 bar exam. As we approach that anniversary, the National Conference of Bar Examiners (NCBE) has offered a welcome update.

Mark Albanese, the organization’s Director of Testing and Research, recently acknowledged that: “The software used by many jurisdictions to allow their examinees to complete the written portion of the bar examination by computer experienced a glitch that could have stressed and panicked some examinees on the night before the MBE was administered.” This “glitch,” Albanese concedes, “cannot be ruled out as a contributing factor” to the decline in MBE scores and pass rates.

More important, Albanese offers compelling new evidence that ExamSoft played a major role in depressing July 2014 exam scores. He resists that conclusion, but I think the evidence speaks for itself. Let’s take a look at the new evidence, along with why this still matters.

LSAT Scores and MBE Scores

Albanese obtained the national mean LSAT score for law students who entered law school each year from 2000 through 2011. He then plotted those means against the average MBE scores earned by the same students three years later. The graph (Figure 10 on p. 43 of his article) looks like this:

2000-2011_LSAT-Mean-Score

As the black dots show, there is a strong linear relationship between scores on the LSAT and those for the MBE. Entering law school classes with high LSAT scores produce high MBE scores after graduation. For the classes that began law school from 2000 through 2010, the correlation is 0.89–a very high value.

Now look at the triangle toward the lower right-hand side of the graph. That symbol represents the relationship between mean LSAT score and mean MBE score for the class that entered law school in fall 2011 and took the bar exam in July 2014. As Albanese admits, this dot is way off the line: “it shows a mean MBE score that is much lower than that of other points with similar mean LSAT scores.”

Based on the historical relationship between LSAT and MBE scores, Albanese calculates that the Class of 2014 should have achieved a mean MBE score of 144.0. Instead, the mean was just 141.4, producing elevated bar failure rates across the country. As Albanese acknowledges, there was a clear “disruption in the relationship between the matriculant LSAT scores and MBE scores with the July 2014 examination.”

Professors Jerry Organ and Derek Muller made similar points last fall, but they were handicapped by their lack of access to LSAT means. The ABA releases only median scores, and those numbers are harder to compile into the type of persuasive graph that Albanese produced. Organ and Muller made an excellent case with their data–one that NCBE should have heeded–but they couldn’t be as precise as Albanese.

But now we have NCBE’s Director of Testing and Research admitting that “something happened” with the Class of 2014 “that disrupted the previous relationship between MBE scores and LSAT scores.” What could it have been?

Apprehending a Suspect

Albanese suggests a single culprit for the significant disruption shown in his graph: He states that the Law School Admission Council (LSAC) changed the manner in which it reported scores for students who take the LSAT more than once. Starting with the class that entered in fall 2011, Albanese writes, LSAC used the high score for each of those test takers; before then, it used the average scores.

At first blush, this seems like a possible explanation. On average, students who retake the LSAT improve their scores. Counting only high scores for these test takers, therefore, would increase the mean score for the entering class. National averages calculated using high scores for repeaters aren’t directly comparable to those computed with average scores.

But there is a problem with Albanese’s rationale: He is wrong about when LSAC switched its method for calculating national means. That occurred for the class that matriculated in fall 2010, not the one that entered in fall 2011. LSAC’s National Decision Profiles, which report these national means, state that quite clearly.

Albanese’s suspect, in other words, has an alibi. The change in LSAT reporting methods occurred a year earlier; it doesn’t explain the aberrational results on the July 2014 MBE. If we accept LSAT scores as a measure of ability, as NCBE has urged throughout this discussion, then the Class of 2014 should have received higher scores on the MBE. Why was their mean score so much lower than their LSAT test scores predicted?

NCBE has vigorously asserted that the test was not to blame; they prepared, vetted, and scored the July 2014 MBE using the same professional methods employed in the past. I believe them. Neither the test content nor the scoring algorithms are at fault. But we can’t ignore the evidence of Albanese’s graph: something untoward happened to the Class of 2014’s MBE scores.

The Villain

The villain almost certainly is the suspect who appeared at the very beginning of the story: ExamSoft. Anyone who has sat through the bar exam, who has talked to test-takers during those days, or who has watched students struggle to upload a single law school exam knows this.

I still remember the stress of the bar exam, although 35 years have passed. I’m pretty good at legal writing and analysis, but the exam wore me out. Few other experiences have taxed me as much mentally and physically as the bar exam.

For a majority of July 2014 test-takers, the ExamSoft “glitch” imposed hours of stress and sleeplessness in the middle of an already exhausting process. The disruption, moreover, occurred during the one period when examinees could recoup their energy and review material for the next day’s exam. It’s hard for me to imagine that ExamSoft’s failure didn’t reduce test-taker performance.

The numbers back up that claim. As I showed in a previous post, bar passage rates dropped significantly more in states affected directly by the software crash than in other states. The difference was large enough that there is less than a 0.001 probability that it occurred by chance. If we combine that fact with Albanese’s graph, what more evidence do we need?

Aiding and Abetting

ExamSoft was the original culprit, but NCBE aided and abetted the harm. The testing literature is clear that exams can be equated only if both the content and the test conditions are comparable. The testing conditions on July 29-30, 2014, were not the same as in previous years. The test-takers were stressed, overtired, and under-prepared because of ExamSoft’s disruption of the testing procedure.

NCBE was not responsible for the disruption, but it should have refrained from equating results produced under the 2014 conditions with those from previous years. Instead, it should have flagged this issue for state bar examiners and consulted with them about how to use scores that significantly understated the ability of test takers. The information was especially important for states that had not used ExamSoft, but whose examinees suffered repercussions through NCBE’s scaling process.

Given the strong relationship between LSAT scores and MBE performance, NCBE might even have used that correlation to generate a second set of scaled scores correcting for the ExamSoft disruption. States could have chosen which set of scores to use–or could have decided to make a one-time adjustment in the cut score. However states decided to respond, they would have understood the likely effect of the ExamSoft crisis on their examinees.

Instead, we have endured a year of obfuscation–and of blaming the Class of 2014 for being “less able” than previous classes. Albanese’s graph shows conclusively that diminished ability doesn’t explain the abnormal dip in July 2014 MBE scores. Our best predictor of that ability, scores earned on the LSAT, refutes that claim.

Lessons for the Future

It’s time to put the ExamSoft debacle to rest–although I hope we can do so with an even more candid acknowledgement from NCBE that the software crash was the primary culprit in this story. The test-takers deserve that affirmation.

At the same time, we need to reflect on what we can learn from this experience. In particular, why didn’t NCBE take the ExamSoft crash more seriously? Why didn’t NCBE and state bar examiners proactively address the impact of a serious flaw in exam administration? The equating and scaling process is designed to assure that exam takers do not suffer by taking one exam administration rather than another. The July 2014 examinees clearly did suffer by taking the exam during the ExamSoft disruption. Why didn’t NCBE and the bar examiners work to address that imbalance, rather than extend it?

I see three reasons. First, NCBE staff seem removed from the experience of bar exam takers. The psychometricians design and assess tests, but they are not lawyers. The president is a lawyer, but she was admitted through Wisconsin’s diploma privilege. NCBE staff may have tested bar questions and formats, but they lack firsthand knowledge of the test-taking experience. This may have affected their ability to grasp the impact of ExamSoft’s disruption.

Second, NCBE and law schools have competing interests. Law schools have economic and reputational interests in seeing their graduates pass the bar; NCBE has economic and reputational interests in disclaiming any disruption in the testing process. The bar examiners who work with NCBE have their own economic and reputational interests: reducing competition from new lawyers. Self interest is nothing to be ashamed of in a market economy; nor is self interest incompatible with working for the public good.

The problem with the bar exam, however, is that these parties (NCBE and bar examiners on one side, law schools on the other) tend to talk past one another. Rather than gain insights from each other, the parties often communicate after decisions are made. Each seems to believe that it protects the public interest, while the other is driven purely by self interest.

This stand-off hurts law school graduates, who get lost in the middle. NCBE and law schools need to start listening to one another; both sides have valid points to make. The ExamSoft crisis should have prompted immediate conversations between the groups. Law schools knew how the crash had affected their examinees; the cries of distress were loud and clear. NCBE knew, as Albanese’s graph shows, that MBE scores were far below outcomes predicted by the class’s LSAT scores. Discussion might have generated wisdom.

Finally, the ExamSoft debacle demonstrates that we need better coordination–and accountability–in the administration and scoring of bar exams. When law schools questioned the July 2014 results, NCBE’s president disclaimed any responsibility for exam administration. That’s technically true, but exam administration affects equating and scaling. Bar examiners, meanwhile, accepted NCBE’s results without question; they assumed that NCBE had taken all proper factors (including any effect from a flawed administration) into account.

We can’t rewind administration of the July 2014 bar exam; nor can we redo the scoring. But we can create a better system for exam administration going forward, one that includes more input from law schools (who have valid perspectives that NCBE and state bar examiners lack) as well as more coordination between NCBE and bar examiners on administration issues.

, View Comments (4)

On the Bar Exam, My Graduates Are Your Graduates

May 12th, 2015 / By

It’s no secret that the qualifications of law students have declined since 2010. As applications fell, schools started dipping further into their applicant pools. LSAT scores offer one measure of this trend. Jerry Organ has summarized changes in those scores for the entering classes of 2010 through 2014. Based on Organ’s data, average LSAT scores for accredited law schools fell:

* 2.3 points at the 75th percentile
* 2.7 points at the median
* 3.4 points at the 25th percentile

Among other problems, this trend raises significant concerns about bar passage rates. Indeed, the President of the National Conference of Bar Examiners (NCBE) blamed the July 2014 drop in MBE scores on the fact that the Class of 2014 (which entered law school in 2011) was “less able” than earlier classes. I have suggested that the ExamSoft debacle contributed substantially to the score decline, but here I focus on the future. What will the drop in student quality mean for the bar exam?

Falling Bar Passage Rates

Most observers agree that bar passage rates are likely to fall over the coming years. Indeed, they may have already started that decline with the July 2014 and February 2015 exam administrations. I believe that the ExamSoft crisis and MBE content changes account for much of those slumps, but there is little doubt that bar passage rates will remain depressed and continue to fall.

A substantial part of the decline will stem from examinees with very low LSAT scores. Prior studies suggest that students with low scores (especially those with scores below 145) are at high risk of failing the bar. As the number of low-LSAT students increases at law schools, the number (and percentage) of bar failures probably will mount as well.

The impact, however, will not be limited just to those students. As I explained in a previous post, NCBE’s process of equating and scaling the MBE can drag down scores for all examinees when the group as a whole performs poorly. This occurs because the lower overall performance prompts NCBE to “scale down” MBE scores for all test-takers. Think of this as a kind of “reverse halo” effect, although it’s one that depends on mathematical formulas rather than subjective impressions.

State bar examiners, unfortunately, amplify the reverse-halo effect by the way in which they scale essay and MPT answers to MBE scores. I explain this process in a previous post. In brief, the MBE performance of each state’s examinees sets the curve for scoring other portions of the bar exam within that state. If Ohio’s 2015 examinees perform less well on the MBE than the 2013 group did, then the 2015 examinees will get lower essay and MPT scores as well.

The law schools that have admitted high-risk students, in sum, are not the only schools that will suffer lower bar passage rates. The processes of equating and scaling will depress scores for other examinees in the pool. The reductions may be small, but they will be enough to shift examinees near the passing score from one side to another. Test-takers who might have passed the bar in 2013 will not pass in 2015. In addition to taking a harder exam (i.e. a 7-subject MBE), these unfortunate examinees will suffer from the reverse-halo effect describe above.

On the bar exam, the performance of my graduates affects outcomes for your graduates. If my graduates perform less well than in previous years, fewer of your graduates will pass: my graduates are your graduates in this sense. The growing number of low-LSAT students attending Thomas Cooley and other schools will also affect the fate of our graduates. On the bar exam, Cooley’s graduates are our graduates.

Won’t NCBE Fix This?

NCBE should address this problem, but they have shown no signs of doing so. The equating/scaling process used by NCBE assumes that test-takers retain roughly the same proficiency from year to year. That assumption undergirds the equating process. Psychometricians recognize that, as abilities shift, equating becomes less reliable.* The recent decline in LSAT scores suggests that the proficiency of bar examinees will change markedly over the next few years. Under those circumstances, NCBE should not attempt to equate and scale raw scores; doing so risks the type of reverse-halo effect I have described.

The problem is particularly acute with the bar exam because scaling occurs at several points in the process. As proficiency declines, equating and scaling of MBE performance will inappropriately depress those scores. Those scores, in turn, will lower scores on the essay and MPT portions of the exam. The combined effect of these missteps is likely to produce noticeable–and undeserved–declines in scores for examinees who are just as qualified as those who passed the bar in previous years.

Remember that I’m not referring here to graduates who perform well below the passing score. If you believe that the bar exam is a proper measure of entry-level competence, then those test-takers deserve to fail. The problem is that an increased number of unqualified examinees will drag down scores for more able test-takers. Some of those scores will drop enough to push qualified examinees below the passing line.

Looking Ahead

NCBE, unfortunately, has not been responsive on issues related to their equating and scaling processes. It seems unlikely that the organization will address the problem described here. There is no doubt, meanwhile, that entry-level qualifications of law students have declined. If bar passage rates fall, as they almost surely will, it will be easy to blame all of the decline on less able graduates.

This leaves three avenues for concerned educators and policymakers:

1. Continue to press for more transparency and oversight of NCBE. Testing requires confidentiality, but safeguards are essential to protect individual examinees and public trust of the process.

2. Take a tougher stand against law schools with low bar passage rates. As professionals, we already have an obligation to protect aspirants to our ranks. Self interest adds a potent kick to that duty. As you view the qualifications of students matriculating at schools with low bar passage rates, remember: those matriculants will affect your school’s bar passage rate.

3. Push for alternative ways to measure attorney competence. New lawyers need to know basic doctrinal principles, and law schools should teach those principles. A closed-book, multiple-choice exam covering seven broad subject areas, however, is not a good measure of doctrinal knowledge. It is even worse when performance on that exam sets the curve for scores on other, more useful parts of the bar exam (such as the performance tests). And the situation is worse still when a single organization, with little oversight, controls scoring of that crucial multiple-choice exam.

I have some suggestions for how we might restructure the bar exam, but those ideas must wait for another post. For now, remember: On the bar exam, all graduates are your graduates.

* For a recent review of the literature on changing proficiencies, see Sonya Powers & Michael J. Kolen, Evaluating Equating Accuracy and Assumptions for Groups That Differ in Performance, 51 J. Educ. Measurement 39 (2014). A more reader-friendly overview is available in this online chapter (note particularly the statements on p. 274).

, View Comments (6)

ExamSoft Update

April 21st, 2015 / By

In a series of posts (here, here, and here) I’ve explained why I believe that ExamSoft’s massive computer glitch lowered performance on the July 2014 Multistate Bar Exam (MBE). I’ve also explained how NCBE’s equating and scaling process amplified the damage to produce a 5-point drop in the national bar passage rate.

We now have a final piece of evidence suggesting that something untoward happened on the July 2014 bar exam: The February 2015 MBE did not produce the same type of score drop. This February’s MBE was harder than any version of the test given over the last four decades; it covered seven subjects instead of six. Confronted with that challenge, the February scores declined somewhat from the previous year’s mark. The mean scaled score on the February 2015 MBE was 136.2, 1.8 points lower than the February 2014 mean scaled score of 138.0.

The contested July 2014 MBE, however, produced a drop of 2.8 points compared to the July 2013 test. That drop was 35.7% larger than the February drop. The July 2014 shift was also larger than any other year-to-year change (positive or negative) recorded during the last ten years. (I treat the February and July exams as separate categories, as NCBE and others do.)

The shift in February 2015 scores, on the other hand, is similar in magnitude to five other changes that occurred during the last decade. Scores dropped, but not nearly as much as in July–and that’s despite taking a harder version of the MBE. Why did the July 2014 examinees perform so poorly?

It can’t be a change in the quality of test takers, as NCBE’s president, Erica Moeser, has suggested in a series of communications to law deans and the profession. The February 2015 examinees started law school at about the same time as the July 2014 ones. As others have shown, law student credentials (as measured by LSAT scores) declined only modestly for students who entered law school in 2011.

We’re left with the conclusion that something very unusual happened in July 2014, and it’s not hard to find that unusual event: a software problem that occupied test-takers’ time, aggravated their stress, and interfered with their sleep.

On its own, my comparison of score drops does not show that the ExamSoft crisis caused the fall in July 2014 test performance. The other evidence I have already discussed is more persuasive. I offer this supplemental analysis for two reasons.

First, I want to forestall arguments that February’s performance proves that the July test-takers must have been less qualified than previous examinees. February’s mean scaled score did drop, compared to the previous February, but the drop was considerably less than the sharp July decline. The latter drop remains the largest score change during the last ten years. It clearly is an outlier that requires more explanation. (And this, of course, is without considering the increased difficulty of the February exam.)

Second, when combined with other evidence about the ExamSoft debacle, this comparison adds to the concerns. Why did scores fall so precipitously in July 2014? The answer seems to be ExamSoft, and we owe that answer to test-takers who failed the July 2014 bar exam.

One final note: Although I remain very concerned about both the handling of the ExamSoft problem and the equating of the new MBE to the old one, I am equally concerned about law schools that admit students who will struggle to pass a fairly administered bar exam. NCBE, state bar examiners, and law schools together stand as gatekeepers to the profession and we all owe a duty of fairness to those who seek to join the profession. More about that soon.

, No Comments Yet

Equating, Scaling, and Civil Procedure

April 16th, 2015 / By

Still wondering about the February bar results? I continue that discussion here. As explained in my previous post, NCBE premiered its new Multistate Bar Exam (MBE) in February. That exam covers seven subjects, rather than the six tested on the MBE for more than four decades. Given the type of knowledge tested by the MBE, there is little doubt that the new exam is harder than the old one.

If you have any doubt about that fact, try this experiment: Tell any group of third-year students that the bar examiners have decided to offer them a choice. They may study for and take a version of the MBE covering the original six subjects, or they may choose a version that covers those subjects plus Civil Procedure. Which version do they choose?

After the students have eagerly indicated their preference for the six-subject test, you will have to apologize profusely to them. The examiners are not giving them a choice; they must take the harder seven-subject test.

But can you at least reassure the students that NCBE will account for this increased difficulty when it scales scores? After all, NCBE uses a process of equating and scaling scores that is designed to produce scores with a constant meaning over time. A scaled score of 136 in 2015 is supposed to represent the same level of achievement as a scaled score of 136 in 2012. Is that still true, despite the increased difficulty of the test?

Unfortunately, no. Equating works only for two versions of the same exam. As the word “equating” suggests, the process assumes that the exam drafters attempted to test the same knowledge on both versions of the exam. Equating can account for inadvertent fluctuations in difficulty that arise from constructing new questions that test the same knowledge. It cannot, however, account for changes in the content or scope of an exam.

This distinction is widely recognized in the testing literature–I cite numerous sources at the end of this post. It appears, however, that NCBE has attempted to “equate” the scores of the new MBE (with seven subjects) to older versions of the exam (with just six subjects). This treated the February 2015 examinees unfairly, leading to lower scores and pass rates.

To understand the problem, let’s first review the process of equating and scaling.

Equating

First, remember why NCBE equates exams. To avoid security breaches, NCBE must produce a different version of the MBE every February and July. Testing experts call these different versions “forms” of the test. For each of the MBE forms, the designers attempt to create questions that impose the same range of difficulty. Inevitably, however, some forms are harder than others. It would be unfair for examinees one year to get lower scores than examinees the next year, simply because they took a harder form of the test. Equating addresses this problem.

The process of equating begins with a set of “control” questions or “common items.” These are questions that appear on two forms of the same exam. The February 2015 MBE, for example, included a subset of questions that had also appeared on some earlier exam. For this discussion, let’s assume that there were 30 of these common items and 160 new questions that counted toward each examinee’s score. (Each MBE also includes 10 experimental questions that do not count toward the test-taker’s score but that help NCBE assess items for future use.)

When NCBE receives answer sheets from each version of the MBE, it is able to assess the examinees’ performance on the common items and new items. Let’s suppose that, on average, earlier examinees got 25 of the 30 common items correct. If the February 2015 test-takers averaged only 20 correct answers to those common items, NCBE would know that those test-takers were less able than previous examinees. That information would then help NCBE evaluate the February test-takers’ performance on the new test items. If the February examinees also performed poorly on those items, NCBE could conclude that the low scores were due to the test-takers’ abilities rather than to a particularly hard version of the test.

Conversely, if the February test-takers did very well on the new items–while faring poorly on the common ones–NCBE would conclude that the new items were easier than questions on earlier tests. The February examinees racked up points on those questions, not because they were better prepared than earlier test-takers, but because the questions were too easy.

The actual equating process is more complicated than this. NCBE, for example, can account for the difficulty of individual questions rather than just the overall difficulty of the common and new items. The heart of equating, however, lies in this use of “common items” to compare performance over time.

Scaling

Once NCBE has compared the most recent batch of exam-takers with earlier examinees, it converts the current raw scores to scaled ones. Think of the scaled scores as a rigid yardstick; these scores have the same meaning over time. 18 inches this year is the same as 18 inches last year. In the same way, a scaled score of 136 has the same meaning this year as last year.

How does NCBE translate raw points to scaled scores? The translation depends upon the results of equating. If a group of test-takers performs well on the common items, but not so well on the new questions, the equating process suggests that the new questions were harder than the ones on previous versions of the test. NCBE will “scale up” the raw scores for this group of exam takers to make them comparable to scores earned on earlier versions of the test.

Conversely, if examinees perform well on new questions but poorly on the common items, the equating process will suggest that the new questions were easier than ones on previous versions of the test. NCBE will then scale down the raw scores for this group of examinees. In the end, the scaled scores will account for small differences in test difficulty across otherwise similar forms.

Changing the Test

Equating and scaling work well for test forms that are designed to be as similar as possible. The processes break down, however, when test content changes. You can see this by thinking about the data that NCBE had available for equating the February 2015 bar exam. It had a set of common items drawn from earlier tests; these would have covered the six original subjects. It also had answers to 190 new items; these would have included both the original subjects and the new one (Civil Procedure).

With these data, NCBE could make two comparisons:

1. It could compare performance on the common items. It undoubtedly found that the February 2015 test-takers performed less well than previous test-takers on these items. That’s a predictable result of having a seventh subject to study. This year’s examinees spread their preparation among seven subjects rather than six. Their mastery of each subject was somewhat lower, and they would have performed less well on the common items testing those subjects.

2. NCBE could also compare performance on the new Civil Procedure items with performance on old and new items in other subjects. NCBE won’t release those comparisons, because it no longer discloses raw scores for subject areas. I predict, however, that performance on Civil Procedure items was the same as on Evidence, Property, or other subjects. Why? Because Civil Procedure is not intrinsically harder than these other subjects, and the examinees studied all seven subjects.

Neither of these comparisons, however, would address the key change in the MBE: Examinees had to prepare seven subjects rather than six. As my previous post suggested, this isn’t just a matter of taking all seven subjects in law school and remembering key concepts for the MBE. Because the MBE is a closed-book exam that requires recall of detailed rules, examinees devote 10 weeks of intense study to this exam. They don’t have more than 10 weeks, because they’re occupied with law school classes, extracurricular activities, and part-time jobs before mid-May or mid-December.

There’s only so much material you can cram into memory during ten weeks. If you try to memorize rules from seven subjects, rather than just six, some rules from each subject will fall by the wayside.

When Equating Doesn’t Work

Equating is not possible for a test like the new MBE, which has changed significantly in content and scope. The test places new demands on examinees, and equating cannot account for those demands. The testing literature is clear that, under these circumstances, equating produces misleading results. As Robert L. Brennan, a distinguished testing expert, wrote in a prominent guide: “When substantial changes in test specifications occur, either scores should be reported on a new scale or a clear statement should be provided to alert users that the scores are not directly comparable with those on earlier versions of the test.” (See p. 174 of Linking and Aligning Scores and Scales, cited more fully below.)

“Substantial changes” is one of those phrases that lawyers love to debate. The hypothetical described at the beginning of this post, however, seems like a common-sense way to identify a “substantial change.” If the vast majority of test-takers would prefer one version of a test over a second one, there is a substantial difference between the two.

As Brennan acknowledges in the chapter I quote above, test administrators dislike re-scaling an exam. Re-scaling is both costly and time-consuming. It can also discomfort test-takers and others who use those scores, because they are uncertain how to compare new scores to old ones. But when a test changes, as the MBE did, re-scaling should take the place of equating.

The second best option, as Brennan also notes, is to provide a “clear statement” to “alert users that the scores are not directly comparable with those on earlier versions of the test.” This is what NCBE should do. By claiming that it has equated the February 2015 results to earlier test results, and that the resulting scaled scores represent a uniform level of achievement, NCBE is failing to give test-takers, bar examiners, and the public the information they need to interpret these scores.

The February 2015 MBE was not the same as previous versions of the test, it cannot be properly equated to those tests, and the resulting scaled scores represent a different level of achievement. The lower scaled scores on the February 2015 MBE reflect, at least in part, a harder test. To the extent that the test-takers also differed from previous examinees, it is impossible to separate that variation from the difference in the tests themselves.

Conclusion

Equating was designed to detect small, unintended differences in test difficulty. It is not appropriate for comparing a revised test to previous versions of that test. In my next post on this issue, I will discuss further ramifications of the recent change in the MBE. Meanwhile, here is an annotated list of sources related to equating:

Michael T. Kane & Andrew Mroch, Equating the MBE, The Bar Examiner, Aug. 2005, at 22. This article, published in NCBE’s magazine, offers an overview of equating and scaling for the MBE.

Neil J. Dorans, et al., Linking and Aligning Scores and Scales (2007). This is one of the classic works on equating and scaling. Chapters 7-9 deal specifically with the problem of test changes. Although I’ve linked to the Amazon page, most university libraries should have this book. My library has the book in electronic form so that it can be read online.

Michael J. Kolen & Robert L. Brennan, Test Equating, Scaling, and Linking:
Methods and Practices (3d ed. 2014). This is another standard reference work in the field. Once again, my library has a copy online; check for a similar ebook at your institution.

CCSSO, A Practitioner’s Introduction to Equating. This guide was prepared by the Council of Chief State School Officers to help teachers, principals, and superintendents understand the equating of high-stakes exams. It is written for educated lay people, rather than experts, so it offers a good introduction. The source is publicly available at the link.

, No Comments Yet

The February 2015 Bar Exam

April 12th, 2015 / By

States have started to release results of the February 2015 bar exam, and Derek Muller has helpfully compiled the reports to date. Muller also uncovered the national mean scaled score for this February’s MBE, which was just 136.2. That’s a notable drop from last February’s mean of 138.0. It’s also lower than all but one of the means reported during the last decade; Muller has a nice graph of the scores.

The latest drop in MBE scores, unfortunately, was completely predictable–and not primarily because of a change in the test takers. I hope that Jerry Organ will provide further analysis of the latter possibility soon. Meanwhile, the expected drop in the February MBE scores can be summed up in five words: seven subjects instead of six. I don’t know how much the test-takers changed in February, but the test itself did.

MBE Subjects

For reasons I’ve explained in a previous post, the MBE is the central component of the bar exam. In addition to contributing a substantial amount to each test-taker’s score, the MBE is used to scale answers to both essay questions and the Multistate Performance Test (MPT). The scaling process amplifies any drop in MBE scores, leading to substantial drops in pass rates.

In February 2015, the MBE changed. For more than four decades, that test has covered six subjects: Contracts, Torts, Criminal Law and Procedure, Constitutional Law, Property, and Evidence. Starting with the February 2015 exam, the National Conference of Bar Examiners (NCBE) added a seventh subject, Civil Procedure.

Testing examinees’ knowledge of Civil Procedure is not itself problematic; law students study that subject along with the others tested on the exam. In fact, I suspect more students take a course in Civil Procedure than in Criminal Procedure. The difficulty is that it’s harder to memorize rules drawn from seven subjects than to learn the rules for six. For those who like math, that’s an increase of 16.7% in the body of knowledge tested.

Despite occasional claims to the contrary, the MBE requires lots of memorization. It’s not solely a test of memorization; the exam also tests issue spotting, application of law to fact, and other facets of legal reasoning. Test-takers, however, can’t display those reasoning abilities unless they remember the applicable rules: the MBE is a closed-book test.

There is no other context, in school or practice, where we expect lawyers to remember so many legal principles without reference to codes, cases, and other legal materials. Some law school exams are closed-book, but they cover a single subject that has just been studied for a semester. The “closed book” moments in practice are much fewer than many observers assume. I don’t know any trial lawyers who enter the courtroom without a copy of the rules of evidence and a personalized cribsheet reminding them of common objections and responses.

This critique of the bar exam is well known. I repeat it here only to stress the impact of expanding the MBE’s scope. February’s test takers answered the same number of multiple choice questions (190 that counted, plus 10 experimental ones) but they had to remember principles from seven fields of law rather than six.

There’s only so much that the brain can hold in memory–especially when the knowledge is abstract, rather than gained from years of real-client experience. I’ve watched many graduates prepare for the bar over the last decade: they sit in our law library or clinic, poring constantly over flash cards and subject outlines. Since states raised passing scores in the 1990s and early 2000s, examinees have had to memorize many more rules in order to answer enough questions correctly. From my observation, their memory banks were already full to overflowing.

Six to Seven Subjects

What happens, then, when the bar examiners add a seventh subject to an already challenging test? Correct answers will decline, not just in the new subject, but across all subjects. The February 2015 test-takers, I’m sure, studied just as hard as previous examinees. Indeed, they probably studied harder, because they knew that they would have to answer questions drawn from seven bodies of legal knowledge rather than six. But their memories could hold only so much information. Memorized rules of Civil Procedure took the place of some rules of Torts, Contracts, or Property.

Remember that the MBE tests only a fraction of the material that test-takers must learn. It’s not a matter of learning 190 legal principles to answer 190 questions. The universe of testable material is enormous. For Evidence, a subject that I teach, the subject matter outline lists 64 distinct topics. On average, I estimate that each of those topics requires knowledge of three distinct rules to answer questions correctly on the MBE–and that’s my most conservative estimate.

It’s not enough, for example, to know that there’s a hearsay exemption for some prior statements by a witness, and that the exemption allows the fact-finder to use a witness’s out-of-court statements for substantive purposes, rather than merely impeachment. That’s the type of general understanding I would expect a new lawyer to have about Evidence, permitting her to research an issue further if it arose in a case. The MBE, however, requires the test-taker to remember that a grand jury session counts as a “proceeding” for purposes of this exemption (see Q 19). That’s a sub-rule fairly far down the chain. In fact, I confess that I had to check my own book to refresh my recollection.

In any event, if Evidence requires mastering 200 sub-principles of this detail, and the same is true of the other five traditional MBE subjects, that was 1200 very specific rules to memorize and keep in memory–all while trying to apply those rules to new fact patterns. Adding a seventh subject upped the ante to 1400 or more detailed rules. How many things can one test-taker remember without checking a written source? There’s a reason why humanity invented writing, printing, and computers.

But They Already Studied Civil Procedure

Even before February, all jurisdictions (to my knowledge) tested Civil Procedure on their essay exams. So wouldn’t examinees have already studied those Civ Pro principles? No, not in the same manner. Detailed, comprehensive memorization is more necessary for the MBE than for traditional essays.

An essay allows room to display issue spotting and legal reasoning, even if you get one of the sub-rules wrong. In the Evidence example given above, an examinee could display considerable knowledge by identifying the issue, noting the relevant hearsay exemption, and explaining the impact of admissibility (substantive use rather than simply impeachment). If the examinee didn’t remember the correct status of grand jury proceedings under this particular rule, she would lose some points. She wouldn’t, however, get the whole question wrong–as she would on a multiple-choice question.

Adding a new subject to the MBE hit test-takers where they were already hurting: the need to memorize a large number of rules and sub-rules. By expanding the universe of rules to be memorized, NCBE made the exam considerably harder.

Looking Ahead

In upcoming posts, I will explain why NCBE’s equating/scaling process couldn’t account for the increased difficulty of this exam. Indeed, equating and scaling may have made the impact worse. I’ll also explore what this means for the ExamSoft discussion and what (if anything) legal educators might do about the increased difficulty of the MBE. To start the discussion, however, it’s essential to recognize that enhanced level of difficulty.

, View Comments (2)

ExamSoft and NCBE

April 6th, 2015 / By

I recently found a letter that Erica Moeser, President of the National Conference of Bar Examiners (NCBE) wrote to law school deans in mid-December. The letter responds to a formal request, signed by 79 law school deans, that NCBE “facilitate a thorough investigation of the administration and scoring of the July 2014 bar exam.” That exam suffered from the notorious ExamSoft debacle.

Moeser’s letter makes an interesting distinction. She assures the deans that NCBE has “reviewed and re-reviewed” its scoring, equating, and scaling of the July 2014 MBE. Those reviews, Moeser attests, revealed no flaw in NCBE’s process. She then adds that, to the extent the deans are concerned about “administration” of the exam, they should “note that NCBE does not administer the examination; jurisdictions do.”

Moeser doesn’t mention ExamSoft by name, but her message seems clear: If ExamSoft’s massive failure affected examinees’ performance, that’s not our problem. We take the bubble sheets as they come to us, grade them, equate the scores, scale those scores, and return the numbers to the states. It’s all the same to NCBE if examinees miss points because they failed to study, law schools taught them poorly, or they were groggy and stressed from struggling to upload their essay exams. We only score exams, we don’t administer them.

But is the line between administration and scoring so clear?

The Purpose of Equating

In an earlier post, I described the process of equating and scaling that NCBE uses to produce final MBE scores. The elaborate transformation of raw scores has one purpose: “to ensure consistency and fairness across the different MBE forms given on different test dates.”

NCBE thinks of this consistency with respect to its own test questions; it wants to ensure that some test-takers aren’t burdened with an overly difficult set of questions–or conversely, that other examinees don’t benefit from unduly easy questions. But substantial changes in exam conditions, like the ExamSoft crash, can also make an exam more difficult. If they do, NCBE’s equating and scaling process actually amplifies that unfairness.

To remain faithful to its mission, it seems that NCBE should at least explore the possible effects of major blunders in exam administration. This is especially true when a problem affects multiple jurisdictions, rather than a single state. If an incident affects a single jurisdiction, the examining authorities in that state can decide whether to adjust scores for that exam. When the problem is more diffuse, as with the ExamSoft failure, individual states may not have the information necessary to assess the extent of the impact. That’s an even greater concern when nationwide equating will spread the problem to states that did not even contract with ExamSoft.

What Should NCBE Have Done?

NCBE did not cause ExamSoft’s upload problems, but it almost certainly knew about them. Experts in exam scoring also understand that defects in exam administration can interfere with performance. With knowledge of the ExamSoft problem, NCBE had the ability to examine raw scores for the extent of the ExamSoft effect. Exploration would have been most effective with cooperation from ExamSoft itself, revealing which states suffered major upload problems and which ones experienced more minor interference. But even without that information, NCBE could have explored the raw scores for indications of whether test takers were “less able” in ExamSoft states.

If NCBE had found a problem, there would have been time to consult with bar examiners about possible solutions. At the very least, NCBE probably should have adjusted its scaling to reflect the fact that some of the decrease in raw scores stemmed from the software crash rather than from other changes in test-taker ability. With enough data, NCBE might have been able to quantify those effects fairly precisely.

Maybe NCBE did, in fact, do those things. Its public pronouncements, however, have not suggested any such process. On the contrary, Moeser seems to studiously avoid mentioning ExamSoft. This reveals an even deeper problem: we have a high-stakes exam for which responsibility is badly fragmented.

Who Do You Call?

Imagine yourself as a test-taker on July 29, 2014. You’ve been trying for several hours to upload your essay exam, without success. You’ve tried calling ExamSoft’s customer service line, but can’t get through. You’re worried that you’ll fail the exam if you don’t upload the essays on time, and you’re also worried that you won’t be sufficiently rested for the next day’s MBE. Who do you call?

You can’t call the state bar examiners; they don’t have an after-hours call line. If they did, they probably would reassure you on the first question, telling you that they would extend the deadline for submitting essay answers. (This is, in fact, what many affected states did.) But they wouldn’t have much to offer on the second question, about getting back on track for the next day’s MBE. Some state examiners don’t fully understand NCBE’s equating and scaling process; those examiners might even erroneously tell you “not to worry because everyone is in the same boat.”

NCBE wouldn’t be any more help. They, as Moeser pointed out, don’t actually administer exams; they just create and score them.

Many distressed examinees called law school staff members who had helped them prepare for the bar. Those staff members, in turn, called their deans–who contacted NCBE and state bar examiners. As Moeser’s letters indicate, however, bar examiners view deans with some suspicion. The deans, they believe, are too quick to advocate for their graduates and too worried about their own bar pass rates.

As NCBE and bar examiners refused to respond, or shifted responsibility to the other party, we reached a stand-off: no one was willing to take responsibility for flaws in a very high-stakes test administered to more than 50,000 examinees. That is a failure as great as the ExamSoft crash itself.

, No Comments Yet

About Law School Cafe

Cafe Manager & Co-Moderator
Deborah J. Merritt

Cafe Designer & Co-Moderator
Kyle McEntee

ABA Journal Blawg 100 HonoreeLaw School Cafe is a resource for anyone interested in changes in legal education and the legal profession.

Around the Cafe

Subscribe

Enter your email address to receive notifications of new posts by email.

Categories

Recent Comments

Recent Posts

Monthly Archives

Participate

Have something you think our audience would like to hear about? Interested in writing one or more guest posts? Send an email to the cafe manager at merritt52@gmail.com. We are interested in publishing posts from practitioners, students, faculty, and industry professionals.

Past and Present Guests