Kellie Ottoboni and Philip B. Stark
The truth will set you free, but first it will piss you off.
Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:
– SET are biased against female instructors by an amount that is large and statistically significant
– the bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded
– the bias varies by discipline and by student gender, among other things
– it is not possible to adjust for the bias, because it depends on so many factors
– SET are more sensitive to students’ gender bias and grade expectations than they are to teaching effectiveness
– gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors.
These findings are based on nonparametric statistical tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and
43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university.
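To illustrate what a nonparametric test of this kind can look like, the following is a minimal sketch of a two-sample permutation test for a difference in mean ratings. It is an assumption-laden illustration, not the authors' exact procedure: the function name `permutation_test`, the difference-in-means statistic, and the ratings are all hypothetical, and the numbers are made up rather than taken from either dataset.

```python
import random

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in mean ratings.

    Hypothetical helper for illustration; it assumes ratings are
    exchangeable between the two groups under the null hypothesis.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign the group labels by shuffling the pooled ratings.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Count the observed labeling too, so the estimate is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value

# Made-up ratings on a 1-5 scale, for illustration only (not study data).
ratings_perceived_male = [4, 5, 4, 3, 5, 4]
ratings_perceived_female = [3, 4, 3, 3, 4, 2]
diff, p = permutation_test(ratings_perceived_male, ratings_perceived_female)
print(f"difference in means: {diff:.2f}, estimated p-value: {p:.4f}")
```

A test like this makes no assumption that ratings follow a normal distribution. It relies only on the group labels being exchangeable under the null hypothesis, which is what makes it nonparametric.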
In two very different universities and in a broad range of course topics, SET measure students’ gender biases better than they measure the instructor’s teaching effectiveness. Overall, SET disadvantage female instructors. There is no evidence that this is the exception rather than the rule. Hence, the onus should be on universities that rely on SET for employment decisions to provide convincing affirmative evidence that such reliance does not have disparate impact on women, under-represented minorities, or other protected groups. Because the bias varies by course and institution, affirmative evidence needs to be specific to a given course in a given department in a given university. Absent such specific evidence, SET should not be used for personnel decisions.