Global score: 8.1

deepseek-r1:32b

Judge: gemma4:31b · 144/160 tests · 1 h 13 min · 10.9 tok/s

32.8B · Q4_K_M · 18.5 GB · 131K ctx

Thinking

Category breakdown

surprise 10.0
long-context 9.9
agentic 9.4
instruction 8.7
code 8.4
reasoning 8.2
math 8.0
multilingual 7.9
frontend 7.8
safety 7.6
organization 7.5
roleplay 7.4
writing 6.1