Wolfram Ravenwolf: o3-mini takes 2nd place, right behind DeepSeek-R1, ahead of o1-mini, Claude and o1-preview. Not only is it better than o1-mini+preview, it's also much cheaper: A single benchmark run with o3-mini cost $2.27, while one run with o1-mini cost $6.24 and with o1-preview even $45.68!

Wolfram Ravenwolf wolfram.ravenwolf.ai · Feb 10
Here's a quick update on my recent work: Completed MMLU-Pro CS benchmarks of o3-mini, Gemini 2.0 Flash and several quantized versions of Mistral Small 2501 and its API. As always, benchmarking revealed some surprising anomalies and unexpected results worth noting:

Feb 10, 2025 22:37