RESEARCH PAPER

Multi-Select Causal Reasoning in LLMs 

A New Framework for Evaluating Complex AI Behavior 

In Summary

  • Most models under-select valid causes or over-select irrelevant ones 
  • Chain-of-thought prompting showed no consistent performance benefit 
  • Models display distinct behavioral patterns across causal and non-causal tasks