Here's a quick update on my recent work: Completed MMLU-Pro CS benchmarks of o3-mini, Gemini 2.0 Flash and several quantized versions of Mistral Small 2501 and its API. As always, benchmarking revealed some surprising anomalies and unexpected results worth noting:
Gemini 2.0 Flash is almost exactly on par with 1.5 Pro, but faster and cheaper. Looks like Gemini version 2.0 completely obsoletes the 1.5 series. This now also powers my smart home so my AI PC doesn't have to run all the time.
Feb 10, 2025 22:37