Dr. David DiBattista, professor of Psychology at Brock University and faculty associate of the Centre for Pedagogical Innovation, gave a workshop today on how to determine the quality of your multiple choice questions.
Starting with the fundamental assumptions:
- Our goal is to help students learn
- We test students to find out how much they have learned
- Students who do well on your tests have demonstrated that they have learned
Beyond automatically grading student tests, using bubble sheets (e.g. Scantron™) benefits instructors by providing aggregate data on the whole class in the form of a detailed item analysis.
A multiple choice question consists of a question (the stem) and the answer choices (the options).
The correct answer is known as the keyed option.
By analyzing the percentage of students who choose the keyed option, you can determine the item difficulty. A difficulty in the range of 40-80% is considered appropriate.
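As a quick illustration (the responses below are made up, not data from the workshop), item difficulty is simply the proportion of students who chose the keyed option:

```python
# Hypothetical responses from ten students to one question; 'B' is the keyed option.
responses = ['B', 'B', 'A', 'B', 'C', 'B', 'D', 'B', 'B', 'A']
keyed = 'B'

# Item difficulty: fraction of the class that chose the keyed option.
difficulty = sum(r == keyed for r in responses) / len(responses)
print(f"Item difficulty: {difficulty:.0%}")  # prints "Item difficulty: 60%"
```

Here 60% of students answered correctly, which falls inside the 40-80% range mentioned above.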
Another very important factor is item discrimination: how well the question separates your strong students from your weak students. It is assessed by comparing how the students who chose a given option performed on the test overall against the class mean.
Mathematically, this is determined by calculating the point-biserial correlation, which ranges from -1 to +1. For the keyed option, a good item discrimination score falls between +0.2 and +0.8.
Anything below that is not a good discriminator. For your incorrect answers (i.e. the distractors), you want the opposite: students who choose a distractor should tend to be your weaker students, so a distractor's point-biserial should be negative.
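A minimal sketch of that calculation, using the standard point-biserial formula (the mean total score of students who chose the option, compared against the overall mean and scaled by the spread of scores); the function name and the sample data are my own, for illustration only:

```python
from math import sqrt

def point_biserial(chose_option, total_scores):
    """Point-biserial correlation between a 0/1 indicator (1 = student chose
    this option) and the students' total test scores."""
    n = len(total_scores)
    p = sum(chose_option) / n                    # proportion who chose the option
    if p in (0.0, 1.0):
        return 0.0                               # no variance: undefined, treat as 0
    # Mean total score of students who chose the option, vs. the class mean.
    mean_chosen = sum(s for c, s in zip(chose_option, total_scores) if c) / sum(chose_option)
    mean_all = sum(total_scores) / n
    sd = sqrt(sum((s - mean_all) ** 2 for s in total_scores) / n)  # population SD
    return (mean_chosen - mean_all) / sd * sqrt(p / (1 - p))

# Hypothetical data: the stronger students chose the keyed option.
chose_key = [1, 1, 1, 0, 1, 0, 0, 1]
totals    = [90, 85, 80, 60, 75, 55, 50, 88]
r = point_biserial(chose_key, totals)
print(f"Point-biserial: {r:+.2f}")  # strongly positive: good discrimination
```

Running the same function on a distractor's 0/1 indicator should yield a negative value if the distractor is doing its job.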
Dr. DiBattista has also given a workshop specifically addressing the construction of your multiple choice tests, as I’ve drawn before:
I attend these workshops with a healthy dose of cynicism about multiple choice testing and its historical influence on standardized testing. This opinion was heavily informed by chapter 5 of Cathy N. Davidson's Now You See It, "How We Measure." So much so, that I also drew it:
Reading that chapter fuelled my outrage for some of the nonsense that is closely associated with multiple choice testing.
However, it must be said that David DiBattista’s thoughtful approach to teaching, learning and assessment forces me to not paint all multiple choice tests with the same broad brush. Not all multiple choice tests are created equal and not all questions are created equal.
Using the detailed item analysis can help instructors with extremely large classes build multiple choice assessments that are more reliable and valid. Increasing the number of assessments (more often throughout the term) and the variety of assessments (i.e. not just multiple choice) will also help ensure you are getting a reliable picture of how your students are performing.
It is also important to look at item difficulty in the context of your teaching: whether you appropriately addressed that topic and, more importantly, whether the topic is accurately represented by your question.
For more information on Tips for Constructing Multiple Choice Tests, see:
DiBattista, D., & Kurzawa, L. (2011). Examination of the quality of multiple-choice items on classroom tests. Canadian Journal for the Scholarship of Teaching and Learning, 2(2), 1.