Skip to content

scalable oversight via recursive self-critiquing

๐Ÿ“„ Paper

Xueru Wen, Jie Lou, Xinyu Lu, Junjie Yang, Yanjiang Liu, Yaojie Lu, Debing Zhang, Xing Yu ยท 2025-02-07

View Original โ†—

Abstract

As AI capabilities increasingly surpass human proficiency in complex tasks, current alignment techniques including SFT and RLHF face fundamental challenges in ensuring reliable oversight. These methods rely on direct human assessment and become untenable when AI outputs exceed human cognitive thresholds. In response to this challenge, we explore two hypotheses: (1) \textit{Critique of critique can be easier than critique itself}, extending the widely-accepted observation that verification is easier than generation to the critique domain, as critique itself is a specialized form of generation; (2) \textit{This difficulty relationship is recursively held}, suggesting that when direct evaluation is infeasible, performing high-order critiques (e.g., critique of critique of critique) offers a more tractable supervision pathway. We further conduct Human-AI and AI-AI experiments to investigate the potential of utilizing recursive self-critiquing for AI supervision. Our results highlight recursive critique as a promising approach for scalable AI oversight.

Cited By (0 articles)

Not currently cited by any articles in the knowledge base.

โ† Back to Resources