RE-Bench: Evaluating frontier AI R&D capabilities