:page_facing_up: New preprint on a challenge benchmark for LLM reasoning over conflicting evidence