Thinking, Fast and Slow

Kahneman’s best-selling book describes the brain’s two systems of decision-making:

  • System 1 is fast and reflexive and particularly susceptible to our emotions and intuitions
  • System 2 is slower, deliberate, and utilizes conscious calculations and reasoning

Frederick (2005) was motivated by Tversky and Kahneman’s research, recognized with the Nobel Prize in Economics, which showed that consumers often do not act in their rational best interests because of cognitive biases.

The Cognitive Reflection Test (CRT) is a proxy for how well an individual can inhibit System 1 and engage System 2 (cognitive reflection). The CRT includes “trick” questions like:

“A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?”
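The intuitive (System 1) answer is 10 cents, but that would make the total $1.20. A minimal sketch of the arithmetic follows, working in cents to avoid floating-point rounding; the function name is only illustrative.

```python
def ball_price_cents(total_cents: int, difference_cents: int) -> int:
    """Solve ball + (ball + difference) = total for the ball's price."""
    return (total_cents - difference_cents) // 2


ball = ball_price_cents(110, 100)
bat = ball + 100
print(ball, bat, ball + bat)  # 5 105 110

# The intuitive answer of 10 cents fails the same check.
intuitive_ball = 10
print(intuitive_ball + (intuitive_ball + 100))  # 120, not 110
```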

In behavioral economics, CRT scores were found to be inversely correlated with susceptibility to:

  • Gambler’s fallacy
  • Sunk cost fallacy

In what ways might “Cognitive Reflection” play a role in software engineering?

Perhaps in verifying code?

  • Code Review
  • Testing

Does this function produce the described behavior?

Unit Testing Exam

  • Following a 3-week module on unit testing
  • Exam to test and implement a class
  • Class interface and specification provided
  • Classified each implementation as acceptable or buggy
  • Ran each student’s tests against each implementation
  • Analyzed test accuracy for each student
Accuracy = 
  (True Positives + True Negatives) /
  (True Positives + True Negatives + False Positives + False Negatives)
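A minimal sketch of how this accuracy could be computed for one student's test suite. It assumes a "positive" means the suite flagged an implementation as buggy (at least one test failed); the slides do not state which outcome counts as positive, and the data below is hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(TP + TN) / (TP + TN + FP + FN), as defined above."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no implementations were evaluated")
    return (tp + tn) / total


def score_test_suite(is_buggy: list[bool], suite_failed: list[bool]) -> float:
    """Compare a suite's verdict on each implementation with its true label."""
    pairs = list(zip(is_buggy, suite_failed))
    tp = sum(b and f for b, f in pairs)          # buggy, and the suite failed it
    tn = sum(not b and not f for b, f in pairs)  # acceptable, and the suite passed it
    fp = sum(not b and f for b, f in pairs)      # acceptable, but the suite failed it
    fn = sum(b and not f for b, f in pairs)      # buggy, but the suite passed it
    return accuracy(tp, tn, fp, fn)


# Hypothetical data: five implementations, three of them buggy.
is_buggy = [True, True, True, False, False]
suite_failed = [True, False, True, False, True]
print(score_test_suite(is_buggy, suite_failed))  # 0.6
```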

Unit Testing Exam - Accuracy


CRT vs Inspection

  • CRT was not a significant predictor (p=0.329) of students’ (n=102) affirmation of acceptable code.
  • CRT was a significant predictor of students rejecting defective code (p<0.0001) with the log odds of correctly rejecting the defective code increasing by 2.94 (95% CI 1.56-4.50).
    • When considering only students who proposed cases that caused a defect, we found that CRT was a significant predictor (p<0.001) of students identifying a defective case with the log odds of doing so increasing by 2.37 (95% CI 1.05-3.86)
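Log odds can be hard to read directly. As a rough aid, exponentiating a log-odds coefficient gives an odds ratio. The sketch below applies that to the figures reported above; it assumes the coefficients are per one unit of the CRT predictor as scaled in the study, which the slides do not specify.

```python
import math


def odds_ratio(log_odds: float) -> float:
    """Convert a change in log odds into a multiplicative change in odds."""
    return math.exp(log_odds)


# (estimate, CI lower bound, CI upper bound) as reported above.
reported = {
    "rejecting defective code": (2.94, 1.56, 4.50),
    "identifying a defective case": (2.37, 1.05, 3.86),
}

for outcome, (estimate, low, high) in reported.items():
    print(
        f"{outcome}: odds ratio ~{odds_ratio(estimate):.1f} "
        f"(95% CI ~{odds_ratio(low):.1f} to ~{odds_ratio(high):.1f})"
    )
```

For the first result this gives an odds ratio of roughly 19, with a wide interval of roughly 5 to 90.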

CRT vs Testing

  • CRT was not correlated with test accuracy (ρ=0.008, p=0.940).
    • Test effectiveness (M=0.66, sd=0.27) was not correlated with CRT (ρ=0.140, p=0.159)
    • Test affirmation (M=0.76, sd=0.17) was not correlated with CRT (ρ=-0.199, p=0.045)
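For readers unfamiliar with ρ, here is a minimal pure-Python sketch of a rank correlation. It assumes ρ denotes Spearman's rank correlation, which the slides do not state explicitly, and the scores below are hypothetical, not study data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical CRT scores and test accuracies for six students.
crt_scores = [0, 1, 1, 2, 3, 3]
test_accuracy = [0.5, 0.7, 0.6, 0.8, 0.9, 0.4]
print(round(spearman_rho(crt_scores, test_accuracy), 3))
```

In practice a library routine such as `scipy.stats.spearmanr` would also report the p-value.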

Threats to Validity

  • Potential previous exposure to CRT
  • Was the phenomenon particular to the coding problems given?

Replication Study

  • Replaced the CRT with similar (but newer, less common) variants from Thomson & Oppenheimer (2016) and Primi (2016)
  • Adopted new Unit Testing Exam and new code inspection question
  • Studied students (n=38) in the following term

Replication Results

  • Significant, positive correlation between alternate-CRT and manual verification (ρ=0.478, p<0.01)
    • Significant regression equation for predicting rejection of the buggy implementation: -0.3138 + 1.0239 × alt-CRT score (F(1,36)=9.106, p<.01), with an R^2 of 0.1797
    • Not a significant predictor of affirmation alone (p=0.729).
  • No correlation between alternate-CRT and test accuracy (ρ=0.113, p=0.498).
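The regression result above can be sketched in code. The scale of the alt-CRT score is not stated on the slides, so the inputs to `predicted_rejection` are illustrative only. The second half is a consistency check, not a claim from the slides: the reported F statistic implies an unadjusted R^2 of about 0.20, and the reported 0.1797 matches what the adjusted R^2 would be for n=38 and one predictor.

```python
INTERCEPT = -0.3138  # reported intercept
SLOPE = 1.0239       # reported coefficient on the alt-CRT score


def predicted_rejection(alt_crt_score: float) -> float:
    """Fitted value from the reported regression equation."""
    return INTERCEPT + SLOPE * alt_crt_score


def r_squared_from_f(f_stat: float, df_model: int, df_resid: int) -> float:
    """Recover R^2 from an F statistic and its degrees of freedom."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


def adjusted_r_squared(r2: float, n: int, predictors: int) -> float:
    """Standard adjustment of R^2 for sample size and predictor count."""
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)


r2 = r_squared_from_f(9.106, df_model=1, df_resid=36)
adj_r2 = adjusted_r_squared(r2, n=38, predictors=1)
print(round(r2, 4), round(adj_r2, 4))  # 0.2019 0.1797
```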


  • Cognitive Reflection has a moderate positive correlation with manually identifying bugs in code
  • Cognitive Reflection is not associated with unit testing accuracy

How would you interpret the results?

