

Yes, the LLMs received credit for each level even if they didn’t complete the entire environment.
They have some replays of tasks on their website: https://arcprize.org/tasks
Here’s one where the human completed all 9 levels in 1458 actions, but the LLM completed only one level in 24 actions, then struggled for 190 actions until it timed out, I guess. The LLM scored 2.8%, I think because of the weighted average. I didn’t take the time to do all the math, and I’m not sure the replay action count is accurate, but it gives you an idea.
Human: https://arcprize.org/replay/0d461c1c-21e5-4dc8-b263-9922332a6485
LLM: https://arcprize.org/replay/cc821983-3975-4ae4-a70b-e031f6807bb0




Why is this so downvoted?