- Something I don't have my head fully around with things like OpenAI Codex (extensive system prompting and agent structure overlaid on GPT models): how much does 4o vs o3 matter here? Like: it's doing a facsimile of reasoning regardless, just b/c it's an agent, right? (May 6, 2025 23:36)
- Some facsimiles are better than others. BTW, there's an open-source agentic framework called aider that has plugins for nearly all major LLMs, if you want to compare.
- I use Aider! I'm mostly comparing Gemini 2.5 to OpenAI's models, which I can use for free (for myself) b/c work.
- I don’t know what Codex means today, but originally Codex was important (and still is) because it’s a fill-in-the-middle (FIM) model: it gets extra training with a few extra control tokens so it can be used efficiently for tab completion.
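- A sketch of what FIM prompt assembly looks like. The control-token names below follow the StarCoder-style convention and are illustrative only, not Codex's actual vocabulary; the point is just that the code before and after the cursor gets rearranged so the model generates the missing middle last:

```python
# Hypothetical FIM control tokens (names vary by model).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the cursor so the model
    completes the missing middle after the <fim_middle> token."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

# Cursor sits after "return " — the model fills in "a + b".
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))\n",
)
```

This is why FIM matters for tab completion: a plain left-to-right model only sees the prefix, while a FIM-trained model also conditions on everything after the cursor.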
- Totally unrelated (confusingly) to the original Codex; today OpenAI Codex is just their equivalent of Claude Code: a CLI tool written in Node.js, currently being rewritten in Rust.
- Oh! That’s also a topic I know a bit about (cough github.com/boldsoftware/sketch), though I am embarrassingly behind on others’ product launches. Yes, agent loops and “thinking” models have a lot in common. In the limit they may be the same, esp as they put web searches into the thinking phase.
- There are definitely efficiency advantages to training thinking in directly (a different token space for reasoning, treating logits differently, etc.), and some papers point to better outcomes. But the mechanics overlap heavily with agent loops.