- Something I don't have my head fully around with things like OpenAI Codex (extensive system prompting and agent structure overlaid on GPT models): how much does 4o vs o3 matter here? Like: it's doing a facsimile of reasoning regardless, just b/c it's an agent, right? (May 6, 2025 23:36)
- Some facsimiles are better than others. BTW, there's an open-source agentic framework called aider that has plugins for nearly all major LLMs, if you want to compare.
- I use Aider! I'm mostly comparing Gemini 2.5 to OpenAI's models, which I can use for free (for myself) b/c work.
- I don’t know what Codex means today, but originally Codex was important (and still is) because it’s a fill-in-the-middle (FIM) model: it gets extra training with a few extra control tokens so it can be used efficiently for tab completion.
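- A sketch of what FIM prompt assembly looks like. The control-token names below follow the StarCoder-style convention and are illustrative only, not Codex's actual vocabulary; the point is just that the code before and after the cursor gets rearranged so the model generates the missing middle last:

```python
# Hypothetical FIM control tokens (names vary by model).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the cursor so the model
    completes the missing middle after the <fim_middle> token."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

# Cursor sits after "return " — the model fills in "a + b".
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))\n",
)
```

This is why FIM matters for tab completion: a plain left-to-right model only sees the prefix, while a FIM-trained model also conditions on everything after the cursor.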
- Totally unrelated (confusingly) to the original Codex; today OpenAI Codex is just their equivalent of Claude Code: a CLI tool written in Node.js, currently being rewritten in Rust.
- Oh! That’s also a topic I know a bit about (cough github.com/boldsoftware/sketch), though I am embarrassingly behind on others’ product launches. Yes, agent loops and “thinking” models have a lot in common. In the limit they may be the same, esp as they put web searches into the thinking phase.
- There are definitely efficiency advantages to training thinking in directly (a different token space for reasoning, treating logits differently, etc.), and some papers point to better outcomes. But the mechanics overlap heavily with agent loops.