From Ideas Guy to Planager
Since the latest batch of LLM kool-aid was delivered — especially the orange-y flavour of Claude Code — many have declared that the world now belongs to the ideas guy. It’s a drink I have sampled, then consumed unabated, and come to share that sentiment deeply. I have now taken some distance from the intoxication of this apparatus that is sometimes a jukebox and at other times a slot machine, to notice my withdrawals and work past them. While I have not been able to drain the swamp of ideas that my head continues to swim in, a brief weekend of abstinence showed me the perils of the racing mind of an ideas guy on a walk who, in the midst of this machination, becomes a planager.
More projects have been started than have been finished because there is just one more thing I need to tweak before that digital thing which can make all the other digital things is perfectly tuned. Ah, then I shall have this orchestrator controlling its minions to conveniently do my bidding, triggered into action from my phone right before bedtime. Ah, the folly and false promises.
The more days I have spent copy/pasting plans between Claude Code and Codex so they can agentically peer review each other, the more I have come to accept that this is the era of the planager: he who manages the most plans, wins.
To win the planager cup, one must minimise the usage of the mouse; the inventors of Vim knew best.
Botference is a Terminal User Interface environment where I, a human, can chat with multiple LLMs in a “council”; or, when I feel it’s better to have the agents hash out the technicalities of a topic or implementation, I send them into a “caucus”. The project is constantly evolving; early on, the main stable outputs were planning artifacts like implementation-plan.md and checkpoint.md. Botference chats start in plan mode; once the planning artifacts are created, a build mode can be run in either interactive or headless mode.
Botference isn’t an “assistant for the user” but “a system for getting multiple models and a human to converge on what should be built.” Or, if one likes LLM-ese SWE jargon, it’s a planning/control surface. Its build mode can orchestrate parallel agents, building on the Howler project’s research companion. I now find the value is in structured deliberation, model disagreement, convergence, and artifact generation before execution.
architecture (old and maybe incorrect as I edit this piece, but there might be useful bits)
The markdown files in .claude/agents/ are just prompts — they define the agent’s persona, instructions, and constraints. They don’t do tool calling. The actual tool-calling orchestration lives in the Python layer: botference_agent.py owns the loop: it loads an agent’s tool registry, sends the prompt to the API, receives tool-call requests back, executes them, returns results, and repeats. The markdown file just gets injected as the system prompt.
fallback_agent_mcp.py does the same thing but inverted — instead of owning the loop, it exposes the tool registry as an MCP server and lets claude -p (the CLI) drive the loop. Same tools, same boundaries, different thing running the cycle.
You’re right, my three-layer model was wrong. There are actually four layers, and bash is one of them:
1. Agent definition (markdown) — who the agent is
2. Tool registry (Python) — what tools each agent gets
3. Inner loop (Python — botference_agent.py) — the tool-calling loop: send prompt → receive tool call → execute → return result → repeat
4. Outer loop (Bash — botference) — the iteration loop: detect agent → dispatch to the right model/path → monitor context → handle retries/circuit breakers → merge results → advance to next task
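To make the split concrete, here is a hedged sketch of the inner loop (layer 3) in Python. The function name, message shapes, and the call_model stub are all illustrative, not botference's actual API:

```python
# Hypothetical sketch of the inner tool-calling loop. The message format and
# reply shape are invented for illustration; the real loop talks to an API.
def run_agent(system_prompt, user_prompt, tools, call_model, max_turns=10):
    """Drive the cycle: send prompt -> receive tool call -> execute -> repeat."""
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = call_model(messages)          # the API request would go here
        if reply.get("tool") is None:         # no tool requested: final answer
            return reply["content"]
        # Execute the requested tool locally and feed the result back.
        result = tools[reply["tool"]](**reply.get("args", {}))
        messages.append({"role": "tool", "name": reply["tool"],
                         "content": str(result)})
    raise RuntimeError("agent exceeded max_turns")
```

The outer bash loop never sees any of this; it only decides which agent and model this loop runs for, and what to do when it finishes.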
The bash script doesn’t mediate between model and tools — that’s what botference_agent.py and the MCP fallback do. Bash mediates between iterations. It’s the supervisor, not the executor.
Philosophy
Botference has two main modes:
- Plan mode (botference plan at the terminal) launches both Claude Code and Codex in an interactive TUI session simultaneously — the “council” session, if you will. This is intended to generate the implementation-plan.md via agentic peer review first and my steering after.
- Build mode implements the plan in headless mode (botference build -p) and allows using either Claude Code or Codex.
Other users might wish to add other models that they work with; for now, I do not intend to bring in more models in plan mode because I’d like to be building more things using botference. In fact, one of botference’s early achievements is demonstrated in the “codewalk” below; if you scroll through any of the code files below, you will see annotated explanations of the code. I think this is a practice I will adopt — and will encourage others to adopt — moving ahead, so I can make LLM code more legible to myself; there are many cases, particularly in research, where not reading the code is simply not advisable. The code IS the artifact that needs to be examined when no visual or interactive testing is possible; it is why I think research automation is not on the horizon any time soon in non-data-centric fields. AI-human teams might (I am tempted to use “will”, but I want to be measured) end up doing amazing things.
The connection between agent name and tools happens through a chain that starts in the plan file.
This is in detect.sh, which you sourced but didn’t share. But the implementation plan presumably has tasks formatted something like:
- [ ] 1.1 Research prior work — **scout**
- [ ] 1.2 Write module — **coder**
- [ ] 2.1 Review draft — **critic**
The bold word that detect_agent_from_checkpoint extracts is the agent name. Then everything fans out from that single string:
- Prompt: resolve_agent_path "$CURRENT_AGENT" → .claude/agents/coder.md
- Model: resolve_model "$CURRENT_AGENT" → looks up context-budgets.json
- Tools: get_tools_for_agent(agent_name) in Python → checks the AGENT_TOOLS dict first, then parses the ## Tools section from the agent’s markdown
This is where plan mode launches both Claude and Codex into the TUI simultaneously for a “council” session.
Build mode is where the model-swapping happens. The script calls resolve_model per agent, and then branches:
```bash
if is_openai_model "$CLAUDE_MODEL"; then
    # try codex CLI, fall back to botference_agent.py
elif is_anthropic_model "$CLAUDE_MODEL" && ! has_anthropic_api_key; then
    # MCP fallback: claude -p with botference's tools as MCP server
else
    # direct API via botference_agent.py
fi
```
So it’s three execution paths, but the agent definition and tool scoping are shared across all of them. The same build_claude_system_prompt and build_mcp_config functions construct the context regardless of which model runs.
What’s interesting is the asymmetry that still exists. When it uses the Codex CLI interactively, it passes the prompt directly to codex --full-auto — which means Codex uses its own native tools, not botference’s scoped tool registry. But when it uses botference_agent.py for Codex, it does go through the shared registry. So the unification is incomplete: the CLI paths for each model still diverge in what tools are available.
The MCP fallback path is the closest thing here to the unified architecture you’re proposing. It takes any agent definition, exposes the right tools via MCP, and lets the CLI drive. The gap is that this only kicks in when there’s no API key. If you generalized that path to be the default — always run through MCP regardless of auth method — you’d have the model-agnostic agent directory you’re describing. The model becomes a flag, the agent definition stays the same, and the tool surface is identical.
The other thing worth noting: resolve_model per agent means the plan file or config can assign different models to different agents. So you could have your coder agent on Claude and your reviewer on Codex within the same build loop. That’s already closer to “unified agent directory, model is just a parameter” than most setups.
mcp loading via exec.sh
Yes — look at the fallback path inside the pipe mode block:
```bash
if $USE_CLAUDE_FALLBACK; then
    MCP_CONFIG=$(build_mcp_config "$CURRENT_AGENT")
    resolve_model_and_effort "$CLAUDE_MODEL" "$CURRENT_AGENT"
    echo "$PROMPT" | claude --model "$CLI_MODEL" \
        $EFFORT_FLAG \
        --tools "" \
        --mcp-config "$MCP_CONFIG" \
        --append-system-prompt "$AGENT_SYSTEM_PROMPT" \
        --output-format stream-json \
        --verbose \
        --dangerously-skip-permissions \
        | python3 "${BOTFERENCE_HOME}/lib/stream-filter.py" ...
fi
```
There’s no literal -p flag, but piping the prompt via echo "$PROMPT" | claude … with --output-format stream-json achieves the same thing — non-interactive execution. The --mcp-config points to whatever build_mcp_config generates for that agent, which presumably writes a JSON config that launches fallback_agent_mcp.py as a stdio MCP server.
The --tools "" is notable too — it blanks out Claude’s native tools so the only tools available are the ones coming through the MCP server. That’s how it enforces the per-agent tool scoping even though Claude CLI is driving the loop instead of botference_agent.py.
So the chain is: bash script → build_mcp_config for the agent → claude CLI reads that config → launches fallback_agent_mcp.py as MCP server → agent gets exactly the tools its registry permits.
coding agents without web search
That’s a sharp catch. There’s a potential gap here. The MCP server explicitly skips server-side tools:
active_tools = [n for n in tool_names if n in TOOLS and n not in SERVER_TOOLS]
The comment says: “Server-side tools (e.g. web_search) are skipped — Claude handles those internally.” But then the CLI invocation does:

--tools ""

So if --tools "" blanks out all of Claude’s native tools, and the MCP server skips web_search assuming Claude has it natively… then web search falls through both cracks. Neither side provides it. Whether that’s a bug or intentional is hard to say without seeing what --tools "" actually does in the Claude CLI. It might only disable the built-in file/code tools (Read, Write, Edit, Bash) while leaving server-side capabilities like web search intact — those might be a separate category that can’t be toggled off. But I genuinely don’t know, and I’d be guessing. It’s worth checking: if you have Claude Code available, claude --help on the --tools flag would clarify whether it affects server-side capabilities or just the local tool set. If it does kill web search, then any agent that needs it would silently lose that capability on the fallback path.
exec.sh
Good, now we can see the full picture. On the --tools "" question — yes, they absolutely could use --allowedTools per agent instead. Look at how they already do it in plan mode:
```bash
PLAN_ALLOWED_TOOLS=(
    "Read" "Glob" "Grep" "Bash"
    "Edit(/checkpoint.md)"
    "Write(/implementation-plan.md)"
    ...
)
```
That’s precise, per-agent scoping. But in the build fallback path they switch to --tools "" plus MCP for every agent. The reason is probably practical: they already have per-agent tool registries defined in Python (the AGENT_TOOLS dict and get_tools_for_agent function), so rather than duplicating that logic in bash as --allowedTools arrays for each agent, they blank the native tools and let the MCP server be the single source of truth for what each agent can do. One place to maintain it, not two. It’s a reasonable engineering tradeoff — but you’re right that it’s blunt. An agent that legitimately needs web search loses it silently on this path. On your second question — there it is:
```bash
build_mcp_config() {
    ...
    cat > "$config_file" <<EOF
{
  "mcpServers": {
    "botference-tools": {
      "command": "${py}",
      "args": ["${BOTFERENCE_HOME}/core/fallback_agent_mcp.py", "${agent_name}"${extra_args}]
    }
  }
}
EOF
}
```
So the bash script never references fallback_agent_mcp.py directly — it generates a JSON config file that tells the Claude CLI to spawn it. The CLI reads the config and launches the MCP server as a subprocess. That’s why you couldn’t find the reference — it’s embedded inside a heredoc that writes to a temp file at runtime.
The build_claude_system_prompt function is also worth noting. It resolves the agent markdown (workspace first, then framework), prepends path context and file layout preambles, and concatenates everything. So the full prompt the model sees is: path preamble + file layout + agent markdown. The tools come separately via MCP. Clean separation.
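A hedged Python rendering of that assembly (the real build_claude_system_prompt lives in bash, and the preamble strings here are invented):

```python
from pathlib import Path

# Hypothetical sketch of the prompt assembly described above.
# The preamble contents are illustrative, not botference's actual strings.
def build_system_prompt(agent_name, workspace=".", framework="/opt/botference"):
    # Resolve the agent markdown: workspace copy wins over the framework's.
    for root in (workspace, framework):
        candidate = Path(root) / ".claude" / "agents" / f"{agent_name}.md"
        if candidate.is_file():
            agent_md = candidate.read_text()
            break
    else:
        raise FileNotFoundError(f"no agent definition for {agent_name}")
    path_preamble = f"Working directory: {workspace}"
    file_layout = "Plan: implementation-plan.md, progress: checkpoint.md"
    # Order matters: path preamble + file layout + agent markdown.
    return "\n\n".join([path_preamble, file_layout, agent_md])
```

The tools never appear in this string; they arrive separately via the MCP config, which is what keeps the prompt and the tool surface independently swappable.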
How plan works
So the mapping isn’t filtering or word matching against the prompt content. It’s a lookup by name: the plan says “use coder,” the system finds coder.md, coder’s tool list, and coder’s model. The agent name is the join key across all four layers. There’s no clever matching or lookup table. It’s just parsing the last bold word from a task line. Look at extract_agent_from_task_block in detect.sh:
```bash
agent=$(printf "%s\n" "$task_block" \
    | grep -o '\*\*[^*][^*]*\*\*' \
    | tail -1 \
    | sed 's/\*\*//g' \
    | sed 's/[^a-zA-Z0-9_-]//g')
```
This chunk of code in extract_agent_from_task_block grabs the last bold word from a task in an implementation-plan. So, for example, a line like the one below:
- [ ] 1.2 Write the authentication module — **coder**
yields coder. Keep scrolling down in the code to line 209.
Then resolve_agent_path checks whether that agent string corresponds to an actual file:
```bash
if [ -f ".claude/agents/${agent_name}.md" ]; then
    echo ".claude/agents/${agent_name}.md"
elif [ -f "${BOTFERENCE_HOME}/.claude/agents/${agent_name}.md" ]; then
    echo "${BOTFERENCE_HOME}/.claude/agents/${agent_name}.md"
fi
```
If no file matches, it returns empty and the agent is rejected. That validation step at the bottom of detect_agent_from_checkpoint is important — it prevents random bold text like TDD or IMPORTANT from being mistaken for an agent name. So the full flow is: checkpoint says “Next Task” → extract it → if it’s terminal or prose, fall back to the first unchecked task in the plan → grab the last bold word → check if .claude/agents/{word}.md exists → if yes, that’s your agent.
The convention is entirely structural. The plan file is the routing table, bold formatting is the syntax, and the filesystem is the validator. No mapping file, no config, no registry. If someone puts **reviewer** at the end of a task line and reviewer.md exists in the agents directory, it runs. If the file doesn’t exist, it skips.
Elegant in its simplicity, fragile in that a plan author needs to know the convention. One typo in the bold word and the task gets skipped with a warning.
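The whole convention fits in a few lines of Python — the helper name here is mine, and the sanitisation mirrors the sed calls above:

```python
import re
from pathlib import Path

# Python rendering of the bash convention: grab the last bold word from a
# task line, then let the filesystem validate it. Helper name is invented.
def detect_agent(task_line, agents_dir=".claude/agents"):
    bold = re.findall(r"\*\*([^*]+)\*\*", task_line)
    if not bold:
        return None
    # Last bold word, sanitised like the sed 's/[^a-zA-Z0-9_-]//g' step.
    name = re.sub(r"[^a-zA-Z0-9_-]", "", bold[-1])
    if (Path(agents_dir) / f"{name}.md").is_file():  # filesystem is the validator
        return name
    return None  # stray bold like **IMPORTANT** or a typo: task is skipped
```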
This is the ghuntley architecture they reference in the README. The interesting design choice is that the tool scoping is done in code rather than declaratively in the agent markdown, which gives more control but makes the agent definitions less portable.
How build works
mcp fallback
An MCP server exposes a tool interface that any model can call; Botference is set up right now for Codex and Claude (others can add more models if needed). All a model needs to participate is:
- Markdown agent definitions that both models can read
- An MCP server that both models can call tools through
- A thin generic prompt saying “read the agents directory and execute”

The tool scoping being in Python doesn’t make the agent definitions less portable — it makes the tool boundaries enforced consistently regardless of which model is driving. Where I went wrong was conflating two things: the agent definition (which is just a prompt, and is portable) and the execution infrastructure (which is shared via MCP, and is also portable). I was treating tool-scoping-in-code as a portability problem when it’s actually the opposite — it’s what enables portability, because neither model needs to understand the other’s native tool system.

The one real question your architecture raises isn’t about portability but about prompt effectiveness. The same markdown instructions might work better for one model than the other. But that’s a tuning issue, not a structural one. And arguably a single well-written agent prompt that both models can follow is better than maintaining two divergent versions anyway. So yes — the architecture you’re describing is sound, and it’s basically what Botference’s MCP fallback path already demonstrates.
Nothing in this file references Claude or Codex specifically. It takes an agent name, builds a tool set, and speaks MCP over stdio.
The get_tools_for_agent function is passed a specific agent type; it first checks the hardcoded tool registry in __init__.py, and can fall back to parsing a ## Tools section from that agent’s markdown file — a backup which is less deterministic since it’s prompt-based. The filtering line is also telling:
It strips out server-side tools (like web search) because those are capabilities the model already has natively. The MCP server only exposes the tools that the model doesn’t have — file operations, search over the codebase, etc. This is a clean separation: the model brings its own capabilities, the MCP server supplements them. You could point any MCP-compatible model at it. The agent markdown provides the prompt, the Python provides the tools, and the model is just whatever’s on the other end of the pipe.
TOOLS is defined here and repopulated below, depending on the agent. AGENT_TOOLS just limits the tools that I felt specific agents relevant to writing a research paper should have.
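As a hedged sketch (the registry contents and helper signature are illustrative, not the real module), the lookup order described above might look like:

```python
import re

# Hypothetical sketch of the lookup order: a hardcoded registry first,
# then the agent markdown's "## Tools" section as a fallback.
AGENT_TOOLS = {"coder": ["read_file", "write_file", "run_tests"]}

def get_tools_for_agent(agent_name, agent_markdown=""):
    if agent_name in AGENT_TOOLS:   # deterministic: code is the registry
        return AGENT_TOOLS[agent_name]
    # Fallback: parse a "## Tools" section out of the prompt file.
    # This is prompt-based, hence less deterministic than the dict.
    match = re.search(r"^## Tools\n(.*?)(?=^## |\Z)", agent_markdown,
                      re.MULTILINE | re.DOTALL)
    if not match:
        return []
    return [line.lstrip("- ").strip()
            for line in match.group(1).splitlines() if line.strip()]
```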
botference.sh
This is really revealing. Look at the dispatch logic — it’s exactly the architecture you were describing, just implemented pragmatically rather than as a clean abstraction. The key decision tree:

- Plan mode (interactive): if BOTFERENCE_MODE is true, it launches both Claude and Codex into the TUI simultaneously — the “council” session. Otherwise it falls back to solo Claude via the CLI.
- Build mode: this is where the model-swapping happens. The script calls resolve_model per agent and then branches into the three execution paths covered above.
bash vs Python
You’re right to push back. Bash is faster for what most of this script actually does — spawning processes, piping between them, checking files, managing PIDs, and orchestrating CLI tools. Python would add startup latency on every invocation, and in a loop that’s polling every 5 seconds and managing multiple background processes, that adds up.
And the logic behind staying in bash is probably more specific than just speed. Look at what this script is really doing: it’s gluing together claude, codex, git, python3, node, pdflatex, and its own MCP servers. Bash is the native language for process orchestration — backgrounding with &, wait, trap, PID management, signal handling. Doing the equivalent in Python means subprocess.Popen everywhere, which is more verbose and arguably no clearer.
The git worktree operations are also natural in bash. git worktree add, git merge, git diff — these are CLI commands. Wrapping them in Python’s subprocess.run() doesn’t improve anything, it just adds a layer.
The parts that shouldn’t be in bash are specifically:
- The inline python3 -c JSON parsing — that’s already Python, just badly located Python. It should be a proper module they import.
- The awk-based checkpoint merging — that’s genuinely fragile string manipulation that would benefit from a real parser.
- parse_orchestrator_dispatch — extracting structured data from model output is exactly what Python is for.

So I was wrong to say “rewrite in Python.” The more honest recommendation is: keep the orchestration in bash, but extract the data manipulation into proper Python scripts that the bash calls, instead of scattering python3 -c one-liners throughout.
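For example, the first item might become a small module like this — the module name, field names, and helper are all invented for illustration:

```python
# checkpoint_util.py — hypothetical example of the recommendation above:
# JSON parsing moved out of inline `python3 -c` one-liners into a module
# the bash script can invoke. The schema here is invented.
import json

def next_task(checkpoint_json):
    """Return the id of the first task not marked done, or None."""
    data = json.loads(checkpoint_json)
    for task in data.get("tasks", []):
        if not task.get("done"):
            return task.get("id")
    return None
```

With a small stdin entry point added, the bash side would run something like `NEXT=$(python3 checkpoint_util.py < checkpoint.json)` — one testable module instead of a one-liner scattered through the script.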
limitations with codex
Codex doesn’t have --allowedTools.
There’s an open issue from October 2025 specifically requesting the ability to disable built-in tools for MCP-only execution, and it hasn’t been resolved. As one user put it: “It’s an all or nothing thing with them. You can give them full access to your system, but you cannot limit where they go.”  Codex’s permission model works at the sandbox level — filesystem write access, network access, approval policies — not at the individual tool level.
The closest Codex gets is sandbox_mode = "read-only" on subagents, and web_search = "disabled" in config.toml. But there’s no equivalent to Claude’s --allowedTools "Read" "Glob" "Grep" "Edit(/specific-path.md)". You can’t say “this agent gets file read but not shell execution.”
This explains why botference uses codex --full-auto when it hits the Codex interactive path — there’s nothing more granular available. And it’s a real gap for the unified agent directory idea.
On whether they could share .claude/agents/: The markdown agent definitions themselves are just prompts — both models can read and follow them. And Codex already reads .claude/ as a fallback location. Codex subagents are defined in config.toml with fields like sandbox_mode, model, model_reasoning_effort, and developer_instructions  — a different format, but the content maps cleanly to what’s in the markdown files.
The real blocker is exactly what we just found: tool scoping. A Claude agent definition that says “this agent only gets Read, Glob, Grep, and Write(/checkpoint.md)” can be enforced by Claude Code natively. The same constraint applied to Codex would need to go through the MCP path, because Codex has no native way to restrict its own tools at that granularity.
So the answer is: yes, they could share the same agent definitions in .claude/agents/ — and botference essentially proves this works — but only if you route Codex through the MCP server for tool enforcement, rather than using Codex’s native tools. Which is exactly what the fallback path does. The “fallback” is actually the more architecturally sound path for unified agents. It just needs to stop being treated as a fallback.
I need to correct myself on all three points.
- Codex reading .claude/ as fallback: That’s not in any code you gave me. I was conflating two things. From the earlier search results, Codex reads AGENTS.md as its instruction file, and Claude Code reads AGENTS.md as a fallback if no CLAUDE.md exists. Some tools like OpenCode scan .claude/skills/ for skill discovery. But Codex does not natively read .claude/agents/ for agent definitions. In botference, the bash script reads .claude/agents/, extracts the prompt, and passes it to whichever model it’s dispatching to. The framework is doing the bridging, not Codex.
- config.toml in the repo: No. Looking at the repo structure from the README, there’s no .codex/config.toml. They use context-budgets.json for per-agent model and effort configuration, and the bash libraries handle all the dispatch logic. Codex is invoked as a CLI tool, not configured as a persistent agent system.
- MCP for Codex in interactive mode: This is the real problem. Codex does support MCP servers — you can add them in config.toml or via codex mcp add. So you could attach botference’s MCP server to an interactive Codex session. But as we just established, you can’t disable Codex’s built-in tools. So even if you provide scoped tools via MCP, Codex still has its native file editing, shell execution, and web search available. The agent would see both the MCP tools and its own built-in tools, and there’s nothing stopping it from using the built-ins instead. That’s fundamentally different from the Claude path where --tools "" kills the native tools and --allowedTools gives granular control. For Codex in interactive mode, the only tool restriction is sandbox-level: read-only vs workspace-write vs full access, and network on or off. You can’t say “use only these MCP tools.”
So to answer your actual question: no, you couldn’t replicate plan mode’s precise tool scoping for Codex interactively. You’d have to rely on the agent prompt saying “only use these tools” and trust the model to comply — which is prompt-level enforcement, not system-level enforcement. A meaningful gap.
clarifications on codex
No on both counts.
context-budgets.json only does two things. Look at how it’s used in exec.sh:
```bash
model=$(jq -r --arg a "$agent_name" '.[$a].model // empty' "$budgets_file")
effort=$(jq -r --arg a "$agent_name" '.[$a].effort // empty' "$budgets_file")
```
That’s it — per-agent model selection and reasoning effort. Something like:
```json
{
  "coder": { "model": "claude-sonnet-4-6", "effort": "high" },
  "scout": { "model": "gpt-5.4", "effort": "medium" },
  "plan":  { "model": "claude-opus-4-6" }
}
```
It’s not a toml replacement. It doesn’t touch tools, permissions, sandbox settings, or anything else config.toml would handle. It’s just a routing table: which model runs which agent, and how hard it should think.
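The same routing-table lookup rendered in Python, as a hedged sketch (the default model is an assumption for illustration, not botference’s actual fallback):

```python
import json

# Python equivalent of the jq lookups: per-agent model and effort, with
# missing keys falling through, mirroring `.[$a].model // empty`.
def resolve_model(agent_name, budgets, default_model="claude-sonnet-4-6"):
    entry = budgets.get(agent_name, {})
    return entry.get("model", default_model), entry.get("effort")

budgets = json.loads(
    '{"coder": {"model": "claude-sonnet-4-6", "effort": "high"},'
    ' "scout": {"model": "gpt-5.4", "effort": "medium"}}'
)
```

Because the lookup is keyed per agent, nothing stops a plan from putting its coder on one vendor and its reviewer on another within the same build loop.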
And in build mode, Codex tools are not limited. Look at the interactive build path again:
```bash
if command -v codex >/dev/null 2>&1; then
    codex --model "$CLAUDE_MODEL" --full-auto "$PROMPT"
fi
```
That’s raw --full-auto with no scoping at all. The only restriction is what the agent prompt asks it to do. Compare that to the Claude plan mode path with its explicit --allowedTools array — completely different level of enforcement.
So in this codebase, Claude agents get system-level tool boundaries. Codex agents get the honor system.