From Ideas Guy to Planager

A Group Chat for Planning with Claude Code and Codex

Since the latest batch of LLM kool-aid was delivered — especially the orange-y flavour of Claude Code — many have declared that the future now belongs to the ideas guy. It’s a drink I sampled, then consumed unabatedly, until I developed much the same sentiment. I have since taken some distance from the intoxication of this apparatus, which is sometimes a jukebox and at other times a slot machine, to notice my withdrawals and work past them. While I have not been able to drain the swamp of ideas my head continues to swim in, a brief abstinence of one weekend showed me the perils of the racing mind of an ideas guy on a walk who, in the midst of this machination, becomes a planager.

More projects have been started than have been finished because there is just one more thing I need to tweak before that digital thing which can make all the other digital things is perfectly tuned. Ah, then I shall have this orchestrator controlling its minions to conveniently do my bidding, triggered into action from my phone right before bedtime. Ah, the folly and false promises.

The more days I have spent copy/pasting plans between Claude Code and Codex so they can agentically peer-review each other, the more I have come to accept that this is the era of the planager; he who manages the most plans, wins.

To win the planager cup, one must minimise the usage of the mouse; the inventors of Vim knew best.

Botference is a Terminal User Interface (TUI) environment where I, a human, can chat with multiple LLMs in a “council”; or, when I feel it is better to have the agents hash out the technicalities of a topic or implementation, I send them into a “caucus”. The project is constantly evolving; early on, the main stable outputs were planning artifacts like implementation-plan.md and checkpoint.md. Botference chats are started in plan mode; once the planning artifacts are created, a build mode can be run in either interactive or headless mode.

Botference isn’t an “assistant for the user,” but “a system for getting multiple models and a human to converge on what should be built.” Or, if one likes LLM-ese SWE jargon, it’s a planning/control surface. Its build mode can orchestrate parallel agents, building on the Howler project’s research companion. The value I now find is in structured deliberation, model disagreement, convergence, and artifact generation before execution.

Architecture (old and possibly incorrect as I edit this piece; but there might be useful bits)

The markdown files in .claude/agents/ are just prompts — they define the agent’s persona, instructions, and constraints. They don’t do tool calling. The actual tool-calling orchestration lives in the Python layer: botference_agent.py owns the loop: it loads an agent’s tool registry, sends the prompt to the API, receives tool-call requests back, executes them, returns results, and repeats. The markdown file just gets injected as the system prompt.
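As an illustration of that loop, here is a toy Python sketch. It is not the actual botference_agent.py: the model call is mocked and every name is invented, but it shows the send → tool call → execute → repeat cycle the text describes.

```python
# Toy sketch of the inner tool-calling loop; fake_model stands in for
# a real provider API call.
TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def fake_model(messages):
    # Pretend the model requests one tool call, then finishes.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "read_file", "args": {"path": "plan.md"}}}
    return {"text": "done"}

def run_agent(system_prompt, task):
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": task}]
    while True:
        reply = fake_model(messages)                    # send prompt
        call = reply.get("tool_call")
        if call is None:
            return reply["text"]                        # no tool call: loop ends
        result = TOOLS[call["name"]](**call["args"])    # execute the tool
        messages.append({"role": "tool", "content": result})  # return result, repeat
```

Calling run_agent("plan agent", "summarise plan.md") returns "done" after one mocked tool round-trip; the real loop differs only in that the model reply comes from an API.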

fallback_agent_mcp.py does the same thing but inverted — instead of owning the loop, it exposes the tool registry as an MCP server and lets claude -p (the CLI) drive the loop. Same tools, same boundaries, different thing running the cycle.

You’re right, my three-layer model was wrong. There are actually four layers, and bash is one of them:

1. Agent definition (markdown) — who the agent is
2. Tool registry (Python) — what tools each agent gets
3. Inner loop (Python — botference_agent.py) — the tool-calling loop: send prompt → receive tool call → execute → return result → repeat
4. Outer loop (Bash — botference) — the iteration loop: detect agent → dispatch to the right model/path → monitor context → handle retries/circuit breakers → merge results → advance to the next task

The bash script doesn’t mediate between model and tools — that’s what botference_agent.py and the MCP fallback do. Bash mediates between iterations. It’s the supervisor, not the executor.

Philosophy

Botference has two main modes:

  • Plan mode (botference plan at the terminal) launches both Claude Code and Codex simultaneously in an interactive TUI session — the “council” session, if you will. This is intended to generate implementation-plan.md via agentic peer review first, with my steering after.
  • Build mode implements the plan headlessly (botference build -p) and can use either Claude Code or Codex.

Other users might wish to add other models that they work with; for now, I do not intend to bring more models into plan mode because I’d like to be building more things using botference. In fact, one of botference’s early achievements is going to be demonstrated with the “codewalk” below; if you scroll through any of the code files below, you will see annotated explanations of the code. I think this is a practice I am going to adopt — and will encourage others to adopt — moving ahead, so I can make LLM code more legible to myself. There are many cases, particularly in research, where not reading the code is simply not advisable: the code IS the artifact that needs to be examined when no visual or interactive testing is possible. It is why I think research automation is not on the horizon any time soon in non-data-centric fields. AI-human teams might (I am tempted to use “will”, but I want to be measured) end up doing amazing things.

botference.sh
#!/usr/bin/env bash
set -euo pipefail
 
# ── Bootstrap ────────────────────────────────────────────────
# botference must locate the framework root before any abstraction exists.
# This is the one intentionally hardcoded path resolution in the system.
if [ -z "${BOTFERENCE_HOME:-}" ]; then
  BOTFERENCE_HOME="$(cd "$(dirname "$0")" && pwd)"
fi
if [ ! -f "${BOTFERENCE_HOME}/core/botference_agent.py" ]; then
  echo "Error: BOTFERENCE_HOME (${BOTFERENCE_HOME}) does not contain core/botference_agent.py" >&2
  exit 1
fi
export BOTFERENCE_HOME
 
BOTFERENCE_PROJECT_ROOT="$(pwd -P)"
export BOTFERENCE_PROJECT_ROOT
 
source "${BOTFERENCE_HOME}/lib/config.sh"
source "${BOTFERENCE_HOME}/lib/detect.sh"
source "${BOTFERENCE_HOME}/lib/monitor.sh"
source "${BOTFERENCE_HOME}/lib/post-run.sh"
source "${BOTFERENCE_HOME}/lib/exec.sh"
 
parse_loop_args "$@"
export BOTFERENCE_ACTIVE_MODE="$LOOP_MODE"
 
if $SHOW_HELP; then
show_help
exit 0
fi
if [ "$LOOP_MODE" = "init" ]; then
python3 "${BOTFERENCE_HOME}/scripts/init_project.py" --profile "$INIT_PROFILE"
exit 0
fi
init_botference_paths
 
if ! validate_project_agents; then
exit 1
fi
if ! mode_is_allowed "$LOOP_MODE"; then
echo "Error: $LOOP_MODE is disabled by ${BOTFERENCE_PROJECT_CONFIG_FILE}." >&2
exit 1
fi
if [ -n "$CLI_MODEL" ]; then
export ANTHROPIC_MODEL="$CLI_MODEL"
fi
if $PIPE_MODE && { [ "$LOOP_MODE" = "plan" ] || [ "$LOOP_MODE" = "research-plan" ]; }; then
echo "Error: $LOOP_MODE mode is interactive only — remove the -p flag."
exit 1
fi
ARCH_MODE=$(resolve_arch_mode_from_plan "$ARCH_MODE" "$BOTFERENCE_PLAN_FILE")
export ARCH_MODE
 
if [ -n "$PROMPT_FILE" ]; then
PROMPT_FILE="${BOTFERENCE_HOME}/${PROMPT_FILE}"
if [ ! -f "$PROMPT_FILE" ]; then
echo "Error: $PROMPT_FILE not found"
exit 1
fi
fi
if [ "$LOOP_MODE" = "archive" ]; then
bash "${BOTFERENCE_HOME}/scripts/archive.sh"
exit 0
fi
CONTEXT_THRESHOLD=45 # default for <1M windows; overridden to 20 for 1M windows below
CTX_FILE="$BOTFERENCE_RUN/context-pct"
YIELD_FILE="$BOTFERENCE_RUN/yield"
BUDGET_FILE="$BOTFERENCE_RUN/budget-info"
PLAN_AUDIT_FILE="$BOTFERENCE_RUN/plan-audit-failed"
POLL_INTERVAL=5
BACKOFF=60
USAGE_LOG="$BOTFERENCE_LOGS_DIR/usage.jsonl"
AGENT_MAX_RETRIES=3
AGENT_RETRY_DELAYS=(5 15 45)
CB_FILE="$BOTFERENCE_RUN/circuit-breaker"
CB_THRESHOLD=5
CB_CONSECUTIVE_FAILURES=0
COUNTER_FILE="$BOTFERENCE_COUNTER_FILE"
HEARTBEAT_INTERVAL=90
 
CLAUDE_PID=""
MONITOR_PID=""
JSONL_MONITOR_PID=""
LAST_CTRL_C=0
 
restore_circuit_breaker_state
restore_iteration_counter
 
ensure_ink_ui_dist() {
  local ink_dir="${BOTFERENCE_HOME}/ink-ui"
  local dist_bin="${ink_dir}/dist/bin.js"
  local install_cmd="cd ink-ui && npm install"
  local rebuild=false
  local src

  if [ ! -f "$dist_bin" ]; then
    rebuild=true
  else
    for src in \
      "$ink_dir/build.mjs" \
      "$ink_dir/package.json" \
      "$ink_dir/package-lock.json"
    do
      if [ "$src" -nt "$dist_bin" ]; then
        rebuild=true
        break
      fi
    done
    if ! $rebuild; then
      while IFS= read -r src; do
        if [ "$src" -nt "$dist_bin" ]; then
          rebuild=true
          break
        fi
      done < <(find "$ink_dir/src" -type f)
    fi
  fi
  if $rebuild; then
    if ! command -v node >/dev/null 2>&1 || ! command -v npm >/dev/null 2>&1; then
      echo "Error: Ink UI requires Node.js and npm." >&2
      echo "Run this once after cloning:" >&2
      echo " ${install_cmd}" >&2
      exit 1
    fi
    if [ ! -d "$ink_dir/node_modules" ] || [ ! -e "$ink_dir/node_modules/esbuild" ]; then
      echo "Error: Ink UI dependencies are not installed." >&2
      echo "Run this once after cloning:" >&2
      echo " ${install_cmd}" >&2
      if [ -f "$ink_dir/package-lock.json" ]; then
        echo "If you want the lockfile-pinned install instead, run:" >&2
        echo " cd ink-ui && npm ci" >&2
      fi
      exit 1
    fi
    echo " Building Ink UI"
    (
      cd "$ink_dir"
      node build.mjs
    )
  fi
}
 
BUILD_AUDIT_SNAPSHOT=""
BUILD_AUDIT_ALLOWED=""
BUILD_AUDIT_VIOLATIONS=""
 
begin_build_audit() {
  if [ -d "${BOTFERENCE_PROJECT_DIR:-}" ]; then
    BUILD_AUDIT_SNAPSHOT=$(mktemp)
    BUILD_AUDIT_ALLOWED=$(mktemp)
    BUILD_AUDIT_VIOLATIONS=$(mktemp)
    plan_write_state_snapshot "$BUILD_AUDIT_SNAPSHOT"
  fi
}

cleanup_build_audit() {
  rm -f "${BUILD_AUDIT_SNAPSHOT:-}" "${BUILD_AUDIT_ALLOWED:-}" "${BUILD_AUDIT_VIOLATIONS:-}"
  BUILD_AUDIT_SNAPSHOT=""
  BUILD_AUDIT_ALLOWED=""
  BUILD_AUDIT_VIOLATIONS=""
}

enforce_build_audit() {
  [ -n "${BUILD_AUDIT_SNAPSHOT:-}" ] || return 0

  if ! audit_mode_changed_files "build" "$BUILD_AUDIT_SNAPSHOT" "$BUILD_AUDIT_ALLOWED" "$BUILD_AUDIT_VIOLATIONS"; then
    echo ""
    echo "✗ Build audit failed — unauthorized files changed:"
    sed 's/^/ - /' "$BUILD_AUDIT_VIOLATIONS"
    cleanup_build_audit
    return 1
  fi
  cleanup_build_audit
  return 0
}
 
trap 'handle_interrupt_signal' INT
 
if ! is_interactive_plan_mode; then
print_loop_banner
fi
 
# --- Pre-loop plan validation (safety net) ---
# Planner commit gates are the primary enforcement point for TDD structure.
# This build-start check is a safety net: fail fast before wasting an iteration
# on a plan that would fail commit gates anyway.
if [ -f "$BOTFERENCE_PLAN_FILE" ]; then
if ! validate_plan_tdd_structure "$BOTFERENCE_PLAN_FILE"; then
echo "✗ Plan validation failed — fix TDD task structure before running build."
exit 1
fi
fi
LOOP_EXIT_CODE=0
while true; do
# --- Circuit breaker check ---
if cb_is_open; then
echo ""
echo "╔══════════════════════════════════════════════════════════╗"
echo "║ CIRCUIT BREAKER OPEN — $CB_CONSECUTIVE_FAILURES consecutive failures"
echo "║ Halting to avoid wasting tokens. ║"
echo "╠══════════════════════════════════════════════════════════╣"
echo "║ To resume: ║"
echo "║ rm $CB_FILE && botference -p ║"
echo "║ Or investigate logs/usage.jsonl for error patterns. ║"
echo "╚══════════════════════════════════════════════════════════╝"
break
fi
if ! is_interactive_plan_mode; then
echo "=== Iteration $((ITERATION + 1)) ==="
fi
rm -f "$CTX_FILE" "$YIELD_FILE" "$BUDGET_FILE"
if ! is_interactive_plan_mode; then
sleep 3 # let any dying statusline process finish writing, then clear again
rm -f "$CTX_FILE"
fi
 
# --- Pre-iteration gate check ---
if [ -f "$BOTFERENCE_REVIEW_FILE" ]; then
_gate_template="${BOTFERENCE_HOME}/templates/HUMAN_REVIEW_NEEDED.md"
if ! diff -q "$BOTFERENCE_REVIEW_FILE" "$_gate_template" >/dev/null 2>&1; then
echo ""
echo "╔══════════════════════════════════════════════════════════╗"
echo "║ HUMAN REVIEW STILL PENDING ║"
echo "╚══════════════════════════════════════════════════════════╝"
echo ""
cat "$BOTFERENCE_REVIEW_FILE"
echo ""
echo "To continue: review above, then:"
echo " rm $BOTFERENCE_REVIEW_FILE && botference -p"
break
fi
fi
ITER_START=$(date +%s)
IGNORE_UNTIL=$(( ITER_START + 15 )) # ignore context readings for first 15s (stale cache)
 
# --- Reflection trigger (every 5th iteration) ---
rm -f "$BOTFERENCE_RUN/reflect"
if [ "$ITERATION" -gt 0 ] && [ $(( ITERATION % 5 )) -eq 0 ]; then
touch "$BOTFERENCE_RUN/reflect"
echo " Reflection iteration (mod 5)"
fi
 
# --- Detect thread and agent ---
CURRENT_THREAD=$(extract_thread)
CURRENT_AGENT=$(detect_agent_from_checkpoint "$BOTFERENCE_CHECKPOINT_FILE" "$BOTFERENCE_PLAN_FILE")
if [ "$LOOP_MODE" = "build" ]; then
cleanup_build_audit
begin_build_audit
fi
 
# --- Plan mode: one-shot interactive session, early exit ---
if [ "$LOOP_MODE" = "plan" ] || [ "$LOOP_MODE" = "research-plan" ]; then
CURRENT_AGENT="${CURRENT_AGENT:-plan}"
 
# --- Botference mode: Claude + Codex TUI ---
if $BOTFERENCE_MODE; then
CLAUDE_MODEL_RESOLVED=$(resolve_model "plan")
resolve_model_and_effort "$CLAUDE_MODEL_RESOLVED" "plan"
OPENAI_MODEL="${OPENAI_MODEL:-gpt-5.4}"
OPENAI_REASONING_EFFORT="${OPENAI_REASONING_EFFORT:-high}"
 
if [ -n "$PROMPT_FILE" ]; then
PROMPT=$(cat "$PROMPT_FILE")
else
PROMPT=""
fi
if [ -s "$BOTFERENCE_INBOX_FILE" ]; then
echo " 📬 Absorbing operator notes from inbox.md"
PROMPT="[Operator notes]"$'\n'"$(cat "$BOTFERENCE_INBOX_FILE")"$'\n\n'"$PROMPT"
: > "$BOTFERENCE_INBOX_FILE"
fi
if [ "$LOOP_MODE" = "research-plan" ]; then
PLAN_AGENT_PATH=$(resolve_agent_path "plan")
PLAN_SYSTEM="$(cat "${PLAN_AGENT_PATH:-${BOTFERENCE_HOME}/.claude/agents/plan.md}")"
else
PLAN_SYSTEM=""
fi
PLAN_SNAPSHOT=$(mktemp); PLAN_ALLOWED=$(mktemp); PLAN_VIOLATIONS=$(mktemp)
plan_write_state_snapshot "$PLAN_SNAPSHOT" "plan"
 
# Build debug-panes flag correctly (avoid passing "false" as truthy)
DEBUG_FLAG=""
if $DEBUG_PANES; then
DEBUG_FLAG="--debug-panes"
fi
 
# Load API keys from .env if not already in environment
if [ -f "${BOTFERENCE_HOME}/.env" ]; then
if [ -z "${OPENAI_API_KEY:-}" ]; then
_val=$(grep -m1 '^OPENAI_API_KEY=' "${BOTFERENCE_HOME}/.env" 2>/dev/null | cut -d= -f2 | tr -d "'" | tr -d '"' || true)
[ -n "${_val:-}" ] && export OPENAI_API_KEY="$_val"
unset _val
fi
if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
_val=$(grep -m1 '^ANTHROPIC_API_KEY=' "${BOTFERENCE_HOME}/.env" 2>/dev/null | cut -d= -f2 | tr -d "'" | tr -d '"' || true)
[ -n "${_val:-}" ] && export ANTHROPIC_API_KEY="$_val"
unset _val
fi
fi
 
# Ensure codex has API key auth if available
if [ -n "${OPENAI_API_KEY:-}" ]; then
echo "$OPENAI_API_KEY" | codex login --with-api-key 2>/dev/null || true
fi
echo "Launching Botference Council - Claude=$CLI_MODEL Codex=$OPENAI_MODEL${EFFORT_FLAG:+ effort=${EFFORT_FLAG#--effort }}${OPENAI_REASONING_EFFORT:+ openai-effort=$OPENAI_REASONING_EFFORT}${DEBUG_FLAG:+ debug=on} ui=$UI_MODE"
if [ "$UI_MODE" = "ink" ]; then
ensure_ink_ui_dist
# Pass large strings via temp files to avoid arg-length/escaping issues
_ink_sys=$(mktemp); _ink_task=$(mktemp)
printf '%s' "$PLAN_SYSTEM" > "$_ink_sys"
printf '%s' "$PROMPT" > "$_ink_task"
node "${BOTFERENCE_HOME}/ink-ui/dist/bin.js" \
--anthropic-model "$CLI_MODEL" \
--openai-model "$OPENAI_MODEL" \
--openai-effort "$OPENAI_REASONING_EFFORT" \
${EFFORT_FLAG:+--claude-effort ${EFFORT_FLAG#--effort }} \
--system-prompt-file "$_ink_sys" \
--task-file "$_ink_task" \
$DEBUG_FLAG
rm -f "$_ink_sys" "$_ink_task"
else
python3 "${BOTFERENCE_HOME}/core/botference.py" \
--anthropic-model "$CLI_MODEL" \
--openai-model "$OPENAI_MODEL" \
--openai-effort "$OPENAI_REASONING_EFFORT" \
${EFFORT_FLAG:+--claude-effort ${EFFORT_FLAG#--effort }} \
--system-prompt "$PLAN_SYSTEM" \
--task "$PROMPT" \
$DEBUG_FLAG
fi
EXIT_CODE=$?
 
if ! plan_audit_changed_files "$PLAN_SNAPSHOT" "$PLAN_ALLOWED" "$PLAN_VIOLATIONS"; then
echo ""
echo "✗ Plan audit failed — unauthorized files changed:"
sed 's/^/ - /' "$PLAN_VIOLATIONS"
echo " Build is blocked until these changes are resolved."
EXIT_CODE=1
elif [ "$EXIT_CODE" -eq 0 ]; then
if ! plan_commit_and_push_changes "$PLAN_ALLOWED"; then
EXIT_CODE=1
fi
fi
rm -f "$PLAN_SNAPSHOT" "$PLAN_ALLOWED" "$PLAN_VIOLATIONS"
 
if [ "$EXIT_CODE" -eq 0 ]; then
echo ""
echo "=== Council session complete. Run 'build' to start executing. ==="
fi
break
fi
 
# Build prompt
if [ -n "$PROMPT_FILE" ]; then
PROMPT=$(cat "$PROMPT_FILE")
else
PROMPT=""
fi
if [ -s "$BOTFERENCE_INBOX_FILE" ]; then
echo " 📬 Absorbing operator notes from inbox.md"
PROMPT="## Operator Notes (read and act on these first)"$'\n\n'"$(cat "$BOTFERENCE_INBOX_FILE")"$'\n\n'"$PROMPT"
: > "$BOTFERENCE_INBOX_FILE"
fi
CLAUDE_MODEL=$(resolve_model "$CURRENT_AGENT")
 
# Start context monitor
monitor_context "$$" "$ITER_START" "$IGNORE_UNTIL" "$CURRENT_AGENT" &
MONITOR_PID=$!
 
# Archive check: if all tasks done, ask user before launching plan agent
CHECKED=$(grep -c '^\- \[x\]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null) || CHECKED=0
UNCHECKED=$(grep -c '^\- \[ \]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null) || UNCHECKED=0
if [ "$CHECKED" -gt 0 ] && [ "$UNCHECKED" -eq 0 ]; then
echo " All tasks in implementation-plan.md are complete."
read -r -p " Archive this thread and start fresh? (y/n): " answer < /dev/tty
if [[ "$answer" =~ ^[Yy] ]]; then
bash "${BOTFERENCE_HOME}/scripts/archive.sh"
echo " Archived. Starting fresh."
fi
fi
PLAN_SNAPSHOT=$(mktemp)
PLAN_ALLOWED=$(mktemp)
PLAN_VIOLATIONS=$(mktemp)
plan_write_state_snapshot "$PLAN_SNAPSHOT" "plan"
 
# Run plan agent via claude CLI
if [ "$LOOP_MODE" = "research-plan" ]; then
PLAN_AGENT_PATH=$(resolve_agent_path "plan")
PLAN_SYSTEM="$(cat "${PLAN_AGENT_PATH:-${BOTFERENCE_HOME}/.claude/agents/plan.md}")"
else
PLAN_SYSTEM=""
fi
SYS_ARGS=()
if [ -n "$PLAN_SYSTEM" ]; then
SYS_ARGS=(--append-system-prompt "$PLAN_SYSTEM")
fi
PLAN_CLAUDE_SETTINGS=$(mktemp)
python3 - "$BOTFERENCE_PROJECT_ROOT" "$BOTFERENCE_WORK_DIR" "$PLAN_CLAUDE_SETTINGS" <<'PY'
import json
import os
import sys
from pathlib import Path

project_root = Path(sys.argv[1]).resolve()
work_dir = Path(sys.argv[2]).resolve()
out_path = Path(sys.argv[3])
config_name = os.environ.get("BOTFERENCE_PROJECT_DIR_NAME", "botference")
project_config = project_root / config_name / "project.json"
raw_roots = os.environ.get("BOTFERENCE_PLAN_EXTRA_WRITE_ROOTS", "").strip()
roots = []
if raw_roots:
    for root in raw_roots.split(","):
        root = root.strip().strip("/")
        if root:
            roots.append((project_root / root).resolve())
elif not project_config.exists():
    roots = [work_dir]
allow = ["Read", "Glob", "Grep", "Bash", "WebSearch", "WebFetch"]
seen = set()
for root in roots:
    root_abs = root.as_posix().lstrip("/")
    if root_abs in seen:
        continue
    seen.add(root_abs)
    allow.extend([
        f"Edit(//{root_abs})",
        f"Edit(//{root_abs}/*)",
        f"Edit(//{root_abs}/**)",
    ])
settings = {
    "permissions": {
        "defaultMode": "dontAsk",
        "allow": allow,
    },
    "sandbox": {
        "enabled": True,
        "allowUnsandboxedCommands": False,
    },
}
out_path.write_text(json.dumps(settings))
PY
PLAN_CLAUDE_CWD="$BOTFERENCE_PROJECT_ROOT"
CLAUDE_DIR_ARGS=()
if [ -n "${BOTFERENCE_PLAN_EXTRA_WRITE_ROOTS:-}" ]; then
IFS=',' read -r -a _plan_roots <<< "$BOTFERENCE_PLAN_EXTRA_WRITE_ROOTS"
first_root=""
for _raw_root in "${_plan_roots[@]}"; do
_clean_root="${_raw_root#/}"
_clean_root="${_clean_root%/}"
[ -n "$_clean_root" ] || continue
_abs_root="$BOTFERENCE_PROJECT_ROOT/$_clean_root"
if [ -z "$first_root" ]; then
first_root="$_abs_root"
PLAN_CLAUDE_CWD="$_abs_root"
else
CLAUDE_DIR_ARGS+=(--add-dir "$_abs_root")
fi
done
if [ "$PLAN_CLAUDE_CWD" != "$BOTFERENCE_PROJECT_ROOT" ]; then
CLAUDE_DIR_ARGS+=(--add-dir "$BOTFERENCE_PROJECT_ROOT")
fi
elif [ ! -f "$BOTFERENCE_PROJECT_CONFIG_FILE" ] && [ "$BOTFERENCE_WORK_DIR" != "$BOTFERENCE_PROJECT_ROOT" ]; then
PLAN_CLAUDE_CWD="$BOTFERENCE_WORK_DIR"
CLAUDE_DIR_ARGS=(--add-dir "$BOTFERENCE_PROJECT_ROOT")
fi
SESSION_ID=$(uuidgen | tr '[:upper:]' '[:lower:]')
resolve_model_and_effort "$CLAUDE_MODEL" "plan"
echo " Model: $CLI_MODEL ($LOOP_MODE — claude CLI${EFFORT_FLAG:+, effort: ${EFFORT_FLAG#--effort }})"
if [ -n "$PROMPT" ]; then
(
cd "$PLAN_CLAUDE_CWD"
echo "$PROMPT" | claude --model "$CLI_MODEL" \
$EFFORT_FLAG \
"${SYS_ARGS[@]}" \
--session-id "$SESSION_ID" \
--name "${CURRENT_THREAD:-botference-plan}" \
--settings "$PLAN_CLAUDE_SETTINGS" \
"${CLAUDE_DIR_ARGS[@]}"
)
 
else
(
cd "$PLAN_CLAUDE_CWD"
claude --model "$CLI_MODEL" \
$EFFORT_FLAG \
"${SYS_ARGS[@]}" \
--session-id "$SESSION_ID" \
--name "${CURRENT_THREAD:-botference-plan}" \
--settings "$PLAN_CLAUDE_SETTINGS" \
"${CLAUDE_DIR_ARGS[@]}"
)
 
fi
EXIT_CODE=$?
rm -f "$PLAN_CLAUDE_SETTINGS"
 
cleanup_pid "$MONITOR_PID"; MONITOR_PID=""
 
# Log interactive session usage
log_interactive_session "$SESSION_ID" "$LOOP_MODE"
 
if ! plan_audit_changed_files "$PLAN_SNAPSHOT" "$PLAN_ALLOWED" "$PLAN_VIOLATIONS"; then
echo ""
echo "✗ Plan audit failed — unauthorized files changed:"
sed 's/^/ - /' "$PLAN_VIOLATIONS"
echo " Build is blocked until these changes are resolved."
EXIT_CODE=1
elif [ "$EXIT_CODE" -eq 0 ]; then
if ! plan_commit_and_push_changes "$PLAN_ALLOWED"; then
EXIT_CODE=1
fi
fi
rm -f "$PLAN_SNAPSHOT" "$PLAN_ALLOWED" "$PLAN_VIOLATIONS"
 
# Plan mode is one-shot — exit after this session
if [ "$EXIT_CODE" -eq 0 ]; then
echo ""
echo "=== Planning session complete. Run 'build' to start executing. ==="
fi
break
fi
 
# --- Build mode: increment iteration counter ---
ITERATION=$((ITERATION + 1))
echo "$ITERATION" > "$COUNTER_FILE"
 
if [ -f "$PLAN_AUDIT_FILE" ]; then
if DIRTY_VIOLATIONS=$(plan_violation_paths_still_dirty); then
echo ""
echo "✗ Build blocked — unresolved plan-mode file violations remain:"
printf '%s\n' "$DIRTY_VIOLATIONS" | sed 's/^/ - /'
echo " Resolve or discard those changes, then rerun plan/build."
break
fi
rm -f "$PLAN_AUDIT_FILE"
fi
 
# --- Build mode: detect agent ---
if [ -n "$CURRENT_AGENT" ] && [ "$CURRENT_AGENT" != "" ]; then
AGENT_PATH=$(resolve_agent_path "$CURRENT_AGENT")
if [ -z "$AGENT_PATH" ]; then
# Checkpoint has a bad Next Task (agent wrote prose instead of a task line).
# Fall back to the implementation plan's first unchecked task.
echo " ⚠ Agent '$CURRENT_AGENT' not found — falling back to implementation plan"
CURRENT_AGENT=$(detect_agent_from_checkpoint /dev/null "$BOTFERENCE_PLAN_FILE")
if [ -n "$CURRENT_AGENT" ]; then
AGENT_PATH=$(resolve_agent_path "$CURRENT_AGENT")
fi
if [ -z "$AGENT_PATH" ]; then
echo " Could not resolve agent from plan either. Skipping."
sleep 5
continue
fi
# Fix the checkpoint so this doesn't repeat
plan_next=$(grep '^\- \[ \]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null \
| grep -v '<task description>\|<agent>' \
| head -1 | sed 's/^- \[ \] //')
if [ -n "$plan_next" ]; then
awk -v task="$plan_next" '
/^## Next Task/ { print; print ""; print task; skip=1; next }
skip && /^## / { skip=0 }
!skip { print }
' "$BOTFERENCE_CHECKPOINT_FILE" > "$BOTFERENCE_CHECKPOINT_FILE.tmp" && mv "$BOTFERENCE_CHECKPOINT_FILE.tmp" "$BOTFERENCE_CHECKPOINT_FILE"
echo " Fixed checkpoint Next Task → $CURRENT_AGENT"
fi
fi
echo " Agent detected: $CURRENT_AGENT (${AGENT_PATH})"
else
# Check if this is a completed plan (all tasks checked off) vs empty template
CHECKED=$(grep -c '^\- \[x\]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null) || CHECKED=0
UNCHECKED=$(grep -c '^\- \[ \]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null) || UNCHECKED=0
if [ "$CHECKED" -gt 0 ] && [ "$UNCHECKED" -eq 0 ]; then
echo ""
echo "╔══════════════════════════════════════════════════════════╗"
echo "║ BUILD COMPLETE — all tasks done ║"
echo "╠══════════════════════════════════════════════════════════╣"
echo "║ To archive this thread, run: ║"
echo "║ bash scripts/archive.sh ║"
echo "║ ║"
echo "║ This will move to ${BOTFERENCE_ARCHIVE_DIR}/<date>_<thread>/: ║"
echo "║ checkpoint.md, implementation-plan.md, ║"
echo "║ ai-generated-outputs/<thread>/, reflections/, ║"
echo "║ ${BOTFERENCE_CHANGELOG_FILE##*/}, inbox.md ║"
echo "║ and restore blank templates. ║"
echo "╚══════════════════════════════════════════════════════════╝"
 
# Auto-compile LaTeX if main.tex exists
if [ -f "main.tex" ]; then
echo "║ Compiling LaTeX..."
compile_result=$(pdflatex -interaction=nonstopmode main.tex 2>&1 && \
bibtex main 2>&1 && \
pdflatex -interaction=nonstopmode main.tex 2>&1 && \
pdflatex -interaction=nonstopmode main.tex 2>&1)
if [ -f "main.pdf" ]; then
echo "║ PDF generated: main.pdf"
else
echo "║ WARNING: LaTeX compilation failed"
echo "$compile_result" | grep "^!" | head -5
fi
fi
else
echo " No task found in checkpoint.md — nothing to do."
echo " Run 'botference plan' to plan next steps,"
echo " or 'bash scripts/archive.sh' to archive."
fi
break
fi

The connection between agent name and tools happens through a chain that starts in the plan file: the bash script reads the next task line and pulls the agent name out of it.

This is in detect.sh, which you sourced but didn’t share. But the implementation plan presumably has tasks formatted something like:

- [ ] 1.1 Research prior work — **scout**
- [ ] 1.2 Write module — **coder**
- [ ] 2.1 Review draft — **critic**

The bold word that detect_agent_from_checkpoint extracts is the agent name. Everything then fans out from that single string:

  • Prompt: resolve_agent_path "$CURRENT_AGENT" → .claude/agents/coder.md
  • Model: resolve_model "$CURRENT_AGENT" → looks up context-budgets.json
  • Tools: get_tools_for_agent(agent_name) in Python → checks AGENT_TOOLS dict first, then parses ## Tools section from the agent’s markdown
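To make the fan-out concrete, here is a toy shell sketch. The function bodies are invented stand-ins (the real ones live in lib/ and the Python layer); the point is only that one string, the agent name, keys every lookup.

```shell
# Stand-in resolvers: one agent name keys the prompt, model, and tool lookups.
resolve_agent_path()  { echo ".claude/agents/$1.md"; }
resolve_model()       { echo "model-for-$1"; }           # real one reads context-budgets.json
get_tools_for_agent() { echo "read_file write_file"; }   # real one is Python (AGENT_TOOLS)

CURRENT_AGENT="coder"
echo "prompt: $(resolve_agent_path "$CURRENT_AGENT")"
echo "model:  $(resolve_model "$CURRENT_AGENT")"
echo "tools:  $(get_tools_for_agent "$CURRENT_AGENT")"
```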

This is where plan mode launches both Claude and Codex into the TUI simultaneously for a “council” session.

Build mode is where the model-swapping happens. The script calls resolve_model per agent, and then branches:

if is_openai_model "$CLAUDE_MODEL"; then
    # try codex CLI, fall back to botference_agent.py
elif is_anthropic_model "$CLAUDE_MODEL" && ! has_anthropic_api_key; then
    # MCP fallback: claude -p with botference's tools as MCP server
else
    # direct API via botference_agent.py
fi
So it’s three execution paths, but the agent definition and tool scoping are shared across all of them. The same build_claude_system_prompt and build_mcp_config functions construct the context regardless of which model runs.

What’s interesting is the asymmetry that still exists. When it uses the Codex CLI interactively, it passes the prompt directly to codex --full-auto — which means Codex uses its own native tools, not botference’s scoped tool registry. But when it uses botference_agent.py for Codex, it does go through the shared registry. So the unification is incomplete: the CLI paths for each model still diverge in what tools are available.

The MCP fallback path is the closest thing here to the unified architecture you’re proposing. It takes any agent definition, exposes the right tools via MCP, and lets the CLI drive. The gap is that this only kicks in when there’s no API key. If you generalized that path to be the default — always run through MCP regardless of auth method — you’d have the model-agnostic agent directory you’re describing. The model becomes a flag, the agent definition stays the same, and the tool surface is identical.

The other thing worth noting: resolve_model per agent means the plan file or config can assign different models to different agents. So you could have your coder agent on Claude and your reviewer on Codex within the same build loop. That’s already closer to “unified agent directory, model is just a parameter” than most setups.

mcp loading in it is via exec.sh

Yes — look at the fallback path inside the pipe mode block:

if $USE_CLAUDE_FALLBACK; then
    MCP_CONFIG=$(build_mcp_config "$CURRENT_AGENT")
    resolve_model_and_effort "$CLAUDE_MODEL" "$CURRENT_AGENT"
    echo "$PROMPT" | claude --model "$CLI_MODEL" \
      $EFFORT_FLAG \
      --tools "" \
      --mcp-config "$MCP_CONFIG" \
      --append-system-prompt "$AGENT_SYSTEM_PROMPT" \
      --output-format stream-json \
      --verbose \
      --dangerously-skip-permissions \
      | python3 "${BOTFERENCE_HOME}/lib/stream-filter.py" ...

There’s no literal -p flag, but piping the prompt via echo "$PROMPT" | claude … with --output-format stream-json achieves the same thing — non-interactive execution. The --mcp-config points to whatever build_mcp_config generates for that agent, which presumably writes a JSON config that launches fallback_agent_mcp.py as a stdio MCP server.

The --tools "" is notable too — it blanks out Claude’s native tools so the only tools available are the ones coming through the MCP server. That’s how it enforces the per-agent tool scoping even though the Claude CLI is driving the loop instead of botference_agent.py. So the chain is: bash script → build_mcp_config for the agent → claude CLI reads that config → launches fallback_agent_mcp.py as MCP server → agent gets exactly the tools its registry permits.

That’s a sharp catch. There’s a potential gap here. The MCP server explicitly skips server-side tools:

active_tools = [n for n in tool_names if n in TOOLS and n not in SERVER_TOOLS]

The comment says: “Server-side tools (e.g. web_search) are skipped — Claude handles those internally.” But then the CLI invocation does:

--tools ""

So if --tools "" blanks out all of Claude’s native tools, and the MCP server skips web_search assuming Claude has it natively… then web search falls through both cracks. Neither side provides it. Whether that’s a bug or intentional is hard to say without seeing what --tools "" actually does in the Claude CLI. It might only disable the built-in file/code tools (Read, Write, Edit, Bash) while leaving server-side capabilities like web search intact — those might be a separate category that can’t be toggled off. But I genuinely don’t know, and I’d be guessing. It’s worth checking. If you have Claude Code available, claude --help on the --tools flag would clarify whether it affects server-side capabilities or just the local tool set. If it does kill web search, then any agent that needs it would silently lose that capability on the fallback path.
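The skip-filter quoted above is easy to demo in isolation; the registries here are placeholders, not botference's real tool tables.

```python
# Placeholder registries demonstrating the quoted filter: web_search is
# dropped on the assumption that the CLI provides it natively.
TOOLS = {"read_file": object(), "write_file": object(), "web_search": object()}
SERVER_TOOLS = {"web_search"}
tool_names = ["read_file", "web_search", "write_file"]

active_tools = [n for n in tool_names if n in TOOLS and n not in SERVER_TOOLS]
print(active_tools)  # ['read_file', 'write_file']
```

If the CLI side also withholds web search, the agent ends up with neither copy of the tool — which is exactly the gap being discussed.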

exec.sh

Good, now we can see the full picture. On the --tools "" question — yes, they absolutely could use --allowedTools per agent instead. Look at how they already do it in plan mode:

PLAN_ALLOWED_TOOLS=(
    "Read" "Glob" "Grep" "Bash"
    "Edit(/checkpoint.md)"
    "Write(/implementation-plan.md)"
    ...
)

That’s precise, per-agent scoping. But in the build fallback path they switch to --tools "" plus MCP for every agent. The reason is probably practical: they already have per-agent tool registries defined in Python (the AGENT_TOOLS dict and get_tools_for_agent function), so rather than duplicating that logic in bash as --allowedTools arrays for each agent, they blank the native tools and let the MCP server be the single source of truth for what each agent can do. One place to maintain it, not two. It’s a reasonable engineering tradeoff — but you’re right that it’s blunt. An agent that legitimately needs web search loses it silently on this path.

On your second question — there it is:

build_mcp_config() {
    ...
    cat > "$config_file" <<EOF
{
  "mcpServers": {
    "botference-tools": {
      "command": "${py}",
      "args": ["${BOTFERENCE_HOME}/core/fallback_agent_mcp.py", "${agent_name}"${extra_args}]
    }
  }
}
EOF

So the bash script never references fallback_agent_mcp.py directly — it generates a JSON config file that tells the Claude CLI to spawn it. The CLI reads the config and launches the MCP server as a subprocess. That’s why you couldn’t find the reference — it’s embedded inside a heredoc that writes to a temp file at runtime.

The build_claude_system_prompt function is also worth noting. It resolves the agent markdown (workspace first, then framework), prepends path context and file layout preambles, and concatenates everything. So the full prompt the model sees is: path preamble + file layout + agent markdown. The tools come separately via MCP. Clean separation.
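As a guess at the shape of that concatenation (the function and variable names below are illustrative, not the real build_claude_system_prompt), the assembly might look like:

```shell
# Hypothetical sketch: path preamble, then file layout, then agent markdown.
assemble_system_prompt() {
  local agent_md="$1" file_layout="$2"
  printf 'Project root: %s\n\n' "$PWD"   # path context preamble
  printf 'Files:\n%s\n\n' "$file_layout" # file layout preamble
  cat "$agent_md"                        # agent persona/instructions last
}
```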

How plan works

So the mapping isn’t filtering or word matching against the prompt content. It’s a lookup by name: the plan says “use coder,” the system finds coder.md, coder’s tool list, and coder’s model. The agent name is the join key across all four layers. And the extraction itself isn’t clever either — no lookup table, just parsing the last bold word from a task line. Look at extract_agent_from_task_block in detect.sh:

agent=$(printf "%s\n" "$task_block" \
    | grep -o '\*\*[^*][^*]*\*\*' \
    | tail -1 \
    | sed 's/\*\*//g' \
    | sed 's/[^a-zA-Z0-9_-]//g')

detect.sh
#!/usr/bin/env bash
 
extract_next_task_from_checkpoint() {
    local checkpoint_path=$1
    local next_task=""

    next_task=$(grep -i '^\*\*Next Task\*\*:\|^Next Task:' "$checkpoint_path" 2>/dev/null \
        | head -1 | sed 's/.*: *//' | sed 's/\*//g')
    if [ -z "$next_task" ]; then
        next_task=$(awk '/^## Next Task/{found=1; next} found && /[^ ]/{print; exit}' \
            "$checkpoint_path" 2>/dev/null)
    fi
    echo "$next_task" | sed 's/([^)]*)//g; s/\*//g; s/^ *//; s/ *$//'
}

extract_first_unchecked_task_block() {
    local plan_path=$1
    awk '
        /^- \[ \]/ {
            if (in_block) exit
            in_block=1
            print
            next
        }
        in_block {
            if (/^- \[[ x]\]/ || /^###+? /) exit
            print
        }
    ' "$plan_path" 2>/dev/null
}

extract_agent_from_task_block() {
    local task_block=$1
    local agent=""

    agent=$(printf "%s\n" "$task_block" \
        | grep -o '\*\*[^*][^*]*\*\*' \
        | tail -1 \
        | sed 's/\*\*//g' \
        | sed 's/[^a-zA-Z0-9_-]//g')

    if [ -z "$agent" ]; then
        agent=$(printf "%s\n" "$task_block" \
            | tail -1 \
            | sed 's/^ *//; s/ *$//')
        agent="${agent##* }"
        agent=$(echo "$agent" | sed 's/[^a-zA-Z0-9_-]//g')
    fi
    echo "$agent"
}

detect_agent_from_checkpoint() {
    local checkpoint_path=$1
    local plan_path=${2:-}
    local next_task=""

    next_task=$(extract_next_task_from_checkpoint "$checkpoint_path")

    # Determine if next_task is a terminal/non-task state.
    # Structured task lines start with a digit or checkbox ("- [").
    # Anything else is prose (e.g. "Thread ready for review") or an
    # explicit terminal marker — fall through to the plan.
    local is_terminal=false
    case "$next_task" in
        none*|None*|"<"*|""|[Aa]ll\ tasks\ complete*|*ready\ to\ archive*|[Ss][Tt][Aa][Gg][Ee]\ [Gg][Aa][Tt][Ee]*)
            is_terminal=true
            ;;
        *)
            if ! echo "$next_task" | grep -qE '^[0-9]|^- \['; then
                is_terminal=true
            fi
            ;;
    esac

    if $is_terminal; then
        if [ -n "$plan_path" ]; then
            next_task=$(extract_first_unchecked_task_block "$plan_path")
            if printf "%s\n" "$next_task" | grep -q '<task description>\|<agent>'; then
                next_task=""
            fi
        else
            next_task=""
        fi
    fi
    if [ -z "$next_task" ]; then
        echo ""
        return
    fi
    local agent
    agent=$(extract_agent_from_task_block "$next_task")

    # Validate: extracted word must correspond to a real agent file.
    # Prevents annotations (e.g. "TDD") from being mistaken for agents.
    if [ -n "$agent" ]; then
        local agent_file
        agent_file=$(resolve_agent_path "$agent")
        if [ -z "$agent_file" ]; then
            echo ""
            return
        fi
    fi
    echo "$agent"
}

detect_current_phase() {
    local plan_path=$1
    local in_phase=""
    while IFS= read -r line; do
        if echo "$line" | grep -q '^## Phase'; then
            in_phase="$line"
        fi
        if echo "$line" | grep -q '^\- \[ \]'; then
            echo "$in_phase"
            return
        fi
    done < "$plan_path"
    echo ""
}

is_parallel_phase() {
    local phase_line=$1
    echo "$phase_line" | grep -qi '(parallel)'
}

collect_phase_tasks() {
    local plan_path=$1
    local target_phase=$2
    local in_target=false
    while IFS= read -r line; do
        if echo "$line" | grep -q '^## Phase'; then
            if [ "$line" = "$target_phase" ]; then
                in_target=true
            elif $in_target; then
                break
            fi
        fi
        if $in_target && echo "$line" | grep -q '^\- \[ \]'; then
            local task_desc
            task_desc=$(echo "$line" | sed 's/^- \[ \] [0-9]*\. *//' | sed 's/\*//g')
            local agent_name="${task_desc##* }"
            agent_name=$(echo "$agent_name" | sed 's/[^a-zA-Z0-9_-]//g')
            echo "${agent_name}|${task_desc}"
        fi
    done < "$plan_path"
}

validate_phase_dependencies() {
    # Checks if all (depends: N) dependencies in a parallel phase are satisfied ([x]).
    # Returns 0 if safe to parallelize, 1 if any dependency is unsatisfied.
    local plan_path=$1
    local target_phase=$2
    local violations=0

    while IFS= read -r line; do
        if ! echo "$line" | grep -q '^\- \[ \]'; then
            continue
        fi
        # Extract depends annotation: (depends: 1,2,3)
        local deps
        deps=$(echo "$line" | grep -o '(depends: [0-9,]*)')
        if [ -z "$deps" ]; then
            continue
        fi
        # Extract the task number
        local task_num
        task_num=$(echo "$line" | sed 's/^- \[ \] \([0-9]*\)\..*/\1/')
        # Parse dependency numbers
        local dep_nums
        dep_nums=$(echo "$deps" | sed 's/(depends: //; s/)//' | tr ',' ' ')
        for dep in $dep_nums; do
            # Check if dependency task is completed ([x])
            if ! grep -q "^\- \[x\] ${dep}\." "$plan_path" 2>/dev/null; then
                echo " ⚠ Task $task_num depends on uncompleted task $dep"
                violations=$((violations + 1))
            fi
        done
    done < <(collect_phase_tasks_raw "$plan_path" "$target_phase")

    return $((violations > 0 ? 1 : 0))
}

collect_phase_tasks_raw() {
    # Like collect_phase_tasks but returns raw lines (not agent|desc format)
    local plan_path=$1
    local target_phase=$2
    local in_target=false
    while IFS= read -r line; do
        if echo "$line" | grep -q '^## Phase'; then
            if [ "$line" = "$target_phase" ]; then
                in_target=true
            elif $in_target; then
                break
            fi
        fi
        if $in_target && echo "$line" | grep -q '^\- \[ \]'; then
            echo "$line"
        fi
    done < "$plan_path"
}

resolve_agent_path() {
    local agent_name=$1
    local project_agent_path="${BOTFERENCE_PROJECT_AGENT_DIR}/${agent_name}.md"
    local compat_path=".claude/agents/${agent_name}.md"
    local framework_path="${BOTFERENCE_HOME}/.claude/agents/${agent_name}.md"

    if reserved_agent_names | grep -qx "$agent_name" && ! project_agent_override_allowed "$agent_name"; then
        if [ -f "$framework_path" ]; then
            echo "$framework_path"
        fi
        return
    fi
    if [ -f "$project_agent_path" ]; then
        echo "$project_agent_path"
    elif [ -f "$compat_path" ]; then
        echo "$compat_path"
    elif [ -f "$framework_path" ]; then
        echo "$framework_path"
    fi
}

This chunk of code in extract_agent_from_task_block grabs the last bold word from a task in an implementation-plan. So, for example, a line like the one below:

- [ ] 1.2 Write the authentication module — **coder**

yields coder. Keep scrolling down in the code to resolve_agent_path.

Then resolve_agent_path checks whether that string corresponds to an actual agent file:

if [ -f ".claude/agents/${agent_name}.md" ]; then
    echo ".claude/agents/${agent_name}.md"
elif [ -f "${BOTFERENCE_HOME}/.claude/agents/${agent_name}.md" ]; then
    echo "${BOTFERENCE_HOME}/.claude/agents/${agent_name}.md"
fi

If no file matches, it returns empty and the agent is rejected. That validation step at the bottom of detect_agent_from_checkpoint is important — it prevents random bold text like TDD or IMPORTANT from being mistaken for an agent name. So the full flow is: checkpoint says “Next Task” → extract it → if it’s terminal or prose, fall back to the first unchecked task in the plan → grab the last bold word → check if .claude/agents/{word}.md exists → if yes, that’s your agent.
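That full flow is small enough to mirror in Python. The sketch below is a hypothetical miniature of the convention, not Botference's actual code (which is bash); function names and the regex are my own:

```python
import os
import re

def extract_agent(task_block: str) -> str:
    # Mirror the grep/sed chain: take the last **bold** token, else the
    # last word of the block, then strip anything outside [a-zA-Z0-9_-].
    bolds = re.findall(r"\*\*([^*]+)\*\*", task_block)
    word = bolds[-1] if bolds else task_block.strip().split()[-1]
    return re.sub(r"[^a-zA-Z0-9_-]", "", word)

def resolve_agent(agent: str, agents_dir: str = ".claude/agents"):
    # The filesystem is the validator: no matching file, no agent.
    path = os.path.join(agents_dir, f"{agent}.md")
    return path if os.path.isfile(path) else None
```

A line like `- [ ] 1.2 Write the authentication module — **coder**` yields `coder`; if no `coder.md` exists, the lookup returns nothing and the task is rejected, just as the validation step in detect_agent_from_checkpoint does.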

The convention is entirely structural. The plan file is the routing table, bold formatting is the syntax, and the filesystem is the validator. No mapping file, no config, no registry. If someone puts **reviewer** at the end of a task line and reviewer.md exists in the agents directory, it runs. If the file doesn’t exist, it skips. Elegant in its simplicity, fragile in that a plan author needs to know the convention. One typo in the bold word and the task gets skipped with a warning.

This is the ghuntley architecture they reference in the README. The interesting design choice is that the tool scoping is done in code rather than declaratively in the agent markdown, which gives more control but makes the agent definitions less portable.

How build works

mcp fallback

An MCP server exposes a tool interface that any model can call; Botference is set up right now for Codex and Claude (others can add more models if needed). As long as you have:

  • Markdown agent definitions that both models can read
  • An MCP server that both models can call tools through
  • A thin generic prompt saying “read the agents directory and execute”

The tool scoping being in Python doesn’t make the agent definitions less portable — it makes the tool boundaries enforced consistently regardless of which model is driving. Where I went wrong was conflating two things: the agent definition (which is just a prompt, and is portable) and the execution infrastructure (which is shared via MCP, and is also portable). I was treating tool-scoping-in-code as a portability problem when it’s actually the opposite — it’s what enables portability, because neither model needs to understand the other’s native tool system. The one real question your architecture raises isn’t about portability but about prompt effectiveness. The same markdown instructions might work better for one model than the other. But that’s a tuning issue, not a structural one. And arguably a single well-written agent prompt that both models can follow is better than maintaining two divergent versions anyway. So yes — the architecture you’re describing is sound, and it’s basically what Botference’s MCP fallback path already demonstrates.

Nothing in this file references Claude or Codex specifically. It takes an agent name, builds a tool set, and speaks MCP over stdio.

#!/usr/bin/env python3
"""Fallback agent runner — MCP server exposing botference's per-agent tool registry.
Usage: python3 core/fallback_agent_mcp.py <agent_name> [--cwd <dir>]
This is the fallback execution path used when no API key is available.
It wraps the tool registry as an MCP stdio server so that `claude -p
--mcp-config <config>` can call botference's tools natively — preserving
truncation, redaction, and per-agent tool boundaries.
Peer of botference_agent.py (the primary agent runner that calls the
Anthropic/OpenAI API directly).
Server-side tools (e.g. web_search) are skipped — Claude handles those
internally.
"""
 
import asyncio
import sys
import os
 
# Ensure botference's root is on the path
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
 
from tools import TOOLS, AGENT_TOOLS, DEFAULT_TOOLS, SERVER_TOOLS, execute_tool, get_tools_for_agent
 
_LOG_FILE = os.environ.get("BOTFERENCE_MCP_LOG", "")
 
 
def _log(msg: str):
    if _LOG_FILE:
        with open(_LOG_FILE, "a") as f:
            f.write(f"[MCP] {msg}\n")


def build_server(agent_name: str) -> Server:
    """Create an MCP server with tools scoped to the given agent."""
    # Use get_tools_for_agent which checks hardcoded registry first,
    # then parses ## Tools from the agent's .md file for custom agents.
    tool_names, _ = get_tools_for_agent(agent_name)

    # Filter to client-side tools that exist in the registry
    active_tools = [n for n in tool_names if n in TOOLS and n not in SERVER_TOOLS]

tools/__init__.py

from __future__ import annotations
 
"""Tool registry for botference_agent.py.
Collects tool definitions from submodules and provides per-agent registries.
Adding a tool = adding it to the right submodule's TOOLS dict + AGENT_TOOLS here.
"""
 
import os
import re
from pathlib import Path
from typing import Optional
 
from tools.core import TOOLS as _core_tools
from tools.checks import TOOLS as _checks_tools
from tools.pdf import TOOLS as _pdf_tools
from tools.search import TOOLS as _search_tools
from tools.download import TOOLS as _download_tools
from tools.claims import TOOLS as _claims_tools
from tools.interact import TOOLS as _interact_tools
from tools.github import TOOLS as _github_tools
from tools.latex import TOOLS as _latex_tools
from tools.verify import TOOLS as _verify_tools
 
# ── Merged registry ───────────────────────────────────────────
TOOLS = {}
TOOLS.update(_core_tools)
TOOLS.update(_checks_tools)
TOOLS.update(_pdf_tools)
TOOLS.update(_search_tools)
TOOLS.update(_download_tools)
TOOLS.update(_claims_tools)
TOOLS.update(_interact_tools)
TOOLS.update(_github_tools)
TOOLS.update(_latex_tools)
TOOLS.update(_verify_tools)
 
# ── Per-agent tool registries ─────────────────────────────────
# Every agent gets the essentials: read_file, write_file, git_commit, list_files, code_search
# Only agents that genuinely need full shell access get bash.
_ESSENTIALS = ["read_file", "write_file", "bash", "git_commit", "git_push", "list_files", "code_search"]
 
# Server-side tools — executed by the API, not locally.
# Keyed by tool name; values are the raw tool definitions sent to the API.
SERVER_TOOLS = {
    "web_search": {"type": "web_search_20250305", "name": "web_search"},
}
 
AGENT_TOOLS = {
    "paper-writer": _ESSENTIALS + ["check_language", "citation_lint", "compile_latex"],
    "critic": _ESSENTIALS + ["check_language", "check_journal", "check_figure", "check_claims", "citation_verify_all", "verify_cited_claims", "build_cited_tracker_from_tex"],
    "scout": _ESSENTIALS + ["web_search", "pdf_metadata", "citation_lookup", "citation_verify", "citation_verify_all", "citation_manifest", "citation_download"],
    "deep-reader": _ESSENTIALS + ["pdf_metadata", "extract_figure", "view_pdf_page"],
    "research-coder": _ESSENTIALS,
    "figure-stylist": _ESSENTIALS + ["check_figure", "view_pdf_page"],
    "editor": _ESSENTIALS + ["check_claims", "check_language", "citation_lint", "citation_verify_all", "verify_cited_claims", "build_cited_tracker_from_tex"],
    "coherence-reviewer": _ESSENTIALS + ["check_claims", "check_language"],
    "provocateur": _ESSENTIALS + [],
    "synthesizer": _ESSENTIALS + ["citation_lint", "citation_verify_all"],
    "triage": _ESSENTIALS + ["pdf_metadata", "citation_verify_all"],
    "coder": _ESSENTIALS + ["gh"],
    # plan mode uses claude CLI (not botference_agent.py) — no tool registry needed
}

The get_tools_for_agent function is passed a specific agent type; it first checks the hardcoded tool registry in __init__.py, and can fall back to parsing a ## Tools section from that agent’s markdown file — a less deterministic path, since it trusts prompt-authored text. The filtering line is also telling: it strips out server-side tools (like web search) because those are capabilities the model already has natively. The MCP server only exposes the tools that the model doesn’t have — file operations, search over the codebase, etc. This is a clean separation: the model brings its own capabilities, the MCP server supplements them. Nothing in this file references Claude or Codex specifically; you could point any MCP-compatible model at it. The agent markdown provides the prompt, the Python provides the tools, and the model is just whatever’s on the other end of the pipe.
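The registry lookup plus server-side filtering reduces to a few lines. This is a miniature with made-up tool names, not the real dicts:

```python
# Hypothetical miniature of the registry; Ellipsis stands in for tool definitions.
TOOLS = {"read_file": ..., "write_file": ..., "web_search": ...}
SERVER_TOOLS = {"web_search"}                         # executed by the API, not locally
AGENT_TOOLS = {"scout": ["read_file", "web_search"]}  # per-agent scoping
DEFAULT_TOOLS = ["read_file", "write_file"]

def client_side_tools(agent: str) -> list:
    # Registry lookup first, then drop server-side tools the model has natively.
    names = AGENT_TOOLS.get(agent, DEFAULT_TOOLS)
    return [n for n in names if n in TOOLS and n not in SERVER_TOOLS]
```

Here scout nominally has web_search, but the MCP server would only advertise read_file; web search is left to the model's own capability.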

TOOLS is the merged registry of every tool definition; it is then narrowed per agent below.

AGENT_TOOLS simply limits which tools each specific agent gets; the set here reflects the agents I felt were relevant to writing a research paper.

botference.sh

This is really revealing. Look at the dispatch logic — it’s exactly the architecture you were describing, just implemented pragmatically rather than as a clean abstraction. The key decision tree is:

  • Plan mode (interactive): If BOTFERENCE_MODE is true, it launches both Claude and Codex into the TUI simultaneously — the “council” session. Otherwise it falls back to solo Claude via the CLI.
  • Build mode: This is where the model-swapping happens. The script calls resolve_model per agent, and then branches:
if is_openai_model "$CLAUDE_MODEL"; then
    # try codex CLI, fall back to botference_agent.py
elif is_anthropic_model "$CLAUDE_MODEL" && ! has_anthropic_api_key; then
    # MCP fallback: claude -p with botference's tools as MCP server
else
    # direct API via botference_agent.py
fi

So it’s three execution paths, but the agent definition and tool scoping are shared across all of them. The same build_claude_system_prompt and build_mcp_config functions construct the context regardless of which model runs.

What’s interesting is the asymmetry that still exists. When it uses the Codex CLI interactively, it passes the prompt directly to codex --full-auto — which means Codex uses its own native tools, not botference’s scoped tool registry. But when it uses botference_agent.py for Codex, it does go through the shared registry. So the unification is incomplete: the CLI paths for each model still diverge in what tools are available.

The MCP fallback path is the closest thing here to the unified architecture you’re proposing. It takes any agent definition, exposes the right tools via MCP, and lets the CLI drive. The gap is that this only kicks in when there’s no API key. If you generalized that path to be the default — always run through MCP regardless of auth method — you’d have the model-agnostic agent directory you’re describing. The model becomes a flag, the agent definition stays the same, and the tool surface is identical. The other thing worth noting: resolve_model per agent means the plan file or config can assign different models to different agents. So you could have your coder agent on Claude and your reviewer on Codex within the same build loop. That’s already closer to “unified agent directory, model is just a parameter” than most setups.
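The three-way branch condenses into a short Python sketch. The predicate names come from the bash above; the model-prefix heuristics are my assumptions, not the actual implementations:

```python
def pick_execution_path(model: str, has_anthropic_api_key: bool) -> str:
    # Assumed prefix checks standing in for is_openai_model / is_anthropic_model.
    is_openai = model.startswith(("gpt-", "o1", "o3", "codex"))
    is_anthropic = model.startswith("claude")

    if is_openai:
        return "codex"         # try codex CLI, fall back to botference_agent.py
    if is_anthropic and not has_anthropic_api_key:
        return "mcp-fallback"  # claude -p with botference's tools as MCP server
    return "direct-api"        # botference_agent.py calls the API directly
```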

mcp loading via exec.sh

Yes — look at the fallback path inside the pipe mode block:

if $USE_CLAUDE_FALLBACK; then
    MCP_CONFIG=$(build_mcp_config "$CURRENT_AGENT")
    resolve_model_and_effort "$CLAUDE_MODEL" "$CURRENT_AGENT"
    echo "$PROMPT" | claude --model "$CLI_MODEL" \
      $EFFORT_FLAG \
      --tools "" \
      --mcp-config "$MCP_CONFIG" \
      --append-system-prompt "$AGENT_SYSTEM_PROMPT" \
      --output-format stream-json \
      --verbose \
      --dangerously-skip-permissions \
      | python3 "${BOTFERENCE_HOME}/lib/stream-filter.py" ...

There’s no literal -p flag, but piping the prompt via echo "$PROMPT" | claude … with --output-format stream-json achieves the same thing — non-interactive execution. The --mcp-config points to whatever build_mcp_config generates for that agent, which presumably writes a JSON config that launches fallback_agent_mcp.py as a stdio MCP server.

The --tools "" is notable too — it blanks out Claude’s native tools so the only tools available are the ones coming through the MCP server. That’s how it enforces the per-agent tool scoping even though Claude CLI is driving the loop instead of botference_agent.py. So the chain is: bash script → build_mcp_config for the agent → claude CLI reads that config → launches fallback_agent_mcp.py as MCP server → agent gets exactly the tools its registry permits.

coding agents without web search

That’s a sharp catch. There’s a potential gap here. The MCP server explicitly skips server-side tools:

active_tools = [n for n in tool_names if n in TOOLS and n not in SERVER_TOOLS]

The comment says: “Server-side tools (e.g. web_search) are skipped — Claude handles those internally.” But then the CLI invocation does:

--tools ""

So if --tools "" blanks out all of Claude’s native tools, and the MCP server skips web_search assuming Claude has it natively… then web search falls through both cracks. Neither side provides it. Whether that’s a bug or intentional is hard to say without seeing what --tools "" actually does in the Claude CLI. It might only disable the built-in file/code tools (Read, Write, Edit, Bash) while leaving server-side capabilities like web search intact — those might be a separate category that can’t be toggled off. But I genuinely don’t know, and I’d be guessing. It’s worth checking. If you have Claude Code available, claude --help on the --tools flag would clarify whether it affects server-side capabilities or just the local tool set. If it does kill web search, then any agent that needs it would silently lose that capability on the fallback path.

exec.sh

Good, now we can see the full picture. On the --tools "" question — yes, they absolutely could use --allowedTools per agent instead. Look at how they already do it in plan mode:

PLAN_ALLOWED_TOOLS=(
    "Read" "Glob" "Grep" "Bash"
    "Edit(/checkpoint.md)"
    "Write(/implementation-plan.md)"
    ...
)

That’s precise, per-agent scoping. But in the build fallback path they switch to --tools "" plus MCP for every agent. The reason is probably practical: they already have per-agent tool registries defined in Python (the AGENT_TOOLS dict and get_tools_for_agent function), so rather than duplicating that logic in bash as --allowedTools arrays for each agent, they blank the native tools and let the MCP server be the single source of truth for what each agent can do. One place to maintain it, not two. It’s a reasonable engineering tradeoff — but you’re right that it’s blunt. An agent that legitimately needs web search loses it silently on this path. On your second question — there it is:

build_mcp_config() {
    ...
    cat > "$config_file" <<EOF
{
  "mcpServers": {
    "botference-tools": {
      "command": "${py}",
      "args": ["${BOTFERENCE_HOME}/core/fallback_agent_mcp.py", "${agent_name}"${extra_args}]
    }
  }
}
EOF

So the bash script never references fallback_agent_mcp.py directly — it generates a JSON config file that tells the Claude CLI to spawn it. The CLI reads the config and launches the MCP server as a subprocess. That’s why you couldn’t find the reference — it’s embedded inside a heredoc that writes to a temp file at runtime.

The build_claude_system_prompt function is also worth noting. It resolves the agent markdown (workspace first, then framework), prepends path context and file layout preambles, and concatenates everything. So the full prompt the model sees is: path preamble + file layout + agent markdown. The tools come separately via MCP. Clean separation.
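In outline, that resolution-plus-concatenation step might look like this. A Python analogue with illustrative names and preamble text; the real function is bash and its exact preambles are not shown here:

```python
from pathlib import Path

def build_system_prompt(agent: str, workspace: Path, framework: Path) -> str:
    # Resolve the agent markdown workspace-first, then fall back to the framework.
    md = workspace / ".claude" / "agents" / f"{agent}.md"
    if not md.is_file():
        md = framework / ".claude" / "agents" / f"{agent}.md"
    # Prepend path context, then the agent definition; tools arrive via MCP.
    preamble = f"Working directory: {workspace}\nAgent definition follows.\n\n"
    return preamble + md.read_text()
```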

bash vs Python

You’re right to push back. Bash is faster for what most of this script actually does — spawning processes, piping between them, checking files, managing PIDs, and orchestrating CLI tools. Python would add startup latency on every invocation, and in a loop that’s polling every 5 seconds and managing multiple background processes, that adds up.

And the logic behind staying in bash is probably more specific than just speed. Look at what this script is really doing: it’s gluing together claude, codex, git, python3, node, pdflatex, and its own MCP servers. Bash is the native language for process orchestration — backgrounding with &, wait, trap, PID management, signal handling. Doing the equivalent in Python means subprocess.Popen everywhere, which is more verbose and arguably no clearer.

The git worktree operations are also natural in bash. git worktree add, git merge, git diff — these are CLI commands. Wrapping them in Python’s subprocess.run() doesn’t improve anything, it just adds a layer.

The parts that shouldn’t be in bash are specifically:

  • The inline python3 -c JSON parsing — that’s already Python, just badly located Python. It should be a proper module they import.
  • The awk-based checkpoint merging — that’s genuinely fragile string manipulation that would benefit from a real parser
  • parse_orchestrator_dispatch — extracting structured data from model output is exactly what Python is for.

So I was wrong to say “rewrite in Python.” The more honest recommendation is: keep the orchestration in bash, but extract the data manipulation into proper Python scripts that the bash calls, instead of scattering python3 -c one-liners throughout.
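For example, the python3 -c one-liners could collapse into one small module that bash invokes. File and function names here are hypothetical:

```python
"""jsonq.py: a tiny jq-like helper so bash stops inlining python3 -c."""

def query(doc: dict, path: str, default: str = ""):
    # Walk a dotted path like "coder.model", mirroring jq's `// empty` fallback.
    cur = doc
    for key in path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]
    return cur
```

Bash would then run something like `python3 jsonq.py coder.model < context-budgets.json` instead of an inline snippet, and the parsing logic lives in one testable place.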

limitations with codex

Codex doesn’t have --allowedTools.

There’s an open issue from October 2025 specifically requesting the ability to disable built-in tools for MCP-only execution, and it hasn’t been resolved. As one user put it: “It’s an all or nothing thing with them. You can give them full access to your system, but you cannot limit where they go.”  Codex’s permission model works at the sandbox level — filesystem write access, network access, approval policies — not at the individual tool level.

The closest Codex gets is sandbox_mode = "read-only" on subagents, and web_search = "disabled" in config.toml. But there’s no equivalent to Claude’s --allowedTools "Read" "Glob" "Grep" "Edit(/specific-path.md)". You can’t say “this agent gets file read but not shell execution.”

This explains why botference uses codex --full-auto when it hits the Codex interactive path — there’s nothing more granular available. And it’s a real gap for the unified agent directory idea.

On whether they could share .claude/agents/: The markdown agent definitions themselves are just prompts — both models can read and follow them. And Codex already reads .claude/ as a fallback location. Codex subagents are defined in config.toml with fields like sandbox_mode, model, model_reasoning_effort, and developer_instructions  — a different format, but the content maps cleanly to what’s in the markdown files.

The real blocker is exactly what we just found: tool scoping. A Claude agent definition that says “this agent only gets Read, Glob, Grep, and Write(/checkpoint.md)” can be enforced by Claude Code natively. The same constraint applied to Codex would need to go through the MCP path, because Codex has no native way to restrict its own tools at that granularity.

So the answer is: yes, they could share the same agent definitions in .claude/agents/ — and botference essentially proves this works — but only if you route Codex through the MCP server for tool enforcement, rather than using Codex’s native tools. Which is exactly what the fallback path does. The “fallback” is actually the more architecturally sound path for unified agents. It just needs to stop being treated as a fallback.

I need to correct myself on all three points.

  • Codex reading .claude/ as fallback: That’s not in any code you gave me. I was conflating two things. From the earlier search results, Codex reads AGENTS.md as its instruction file, and Claude Code reads AGENTS.md as a fallback if no CLAUDE.md exists. Some tools like OpenCode scan .claude/skills/ for skill discovery. But Codex does not natively read .claude/agents/ for agent definitions. In botference, the bash script reads .claude/agents/, extracts the prompt, and passes it to whichever model it’s dispatching to. The framework is doing the bridging, not Codex.

  • config.toml in the repo: No. Looking at the repo structure from the README, there’s no .codex/config.toml. They use context-budgets.json for per-agent model and effort configuration, and the bash libraries handle all the dispatch logic. Codex is invoked as a CLI tool, not configured as a persistent agent system.

  • MCP for Codex in interactive mode: This is the real problem. Codex does support MCP servers — you can add them in config.toml or via codex mcp add. So you could attach botference’s MCP server to an interactive Codex session. But as we just established, you can’t disable Codex’s built-in tools. So even if you provide scoped tools via MCP, Codex still has its native file editing, shell execution, and web search available. The agent would see both the MCP tools and its own built-in tools, and there’s nothing stopping it from using the built-ins instead. That’s fundamentally different from the Claude path where --tools "" kills the native tools and --allowedTools gives granular control. For Codex in interactive mode, the only tool restriction is sandbox-level: read-only vs workspace-write vs full access, and network on or off. You can’t say “use only these MCP tools.”

So to answer your actual question: no, you couldn’t replicate plan mode’s precise tool scoping for Codex interactively. You’d have to rely on the agent prompt saying “only use these tools” and trust the model to comply — which is prompt-level enforcement, not system-level enforcement. A meaningful gap.

clarifications on codex

No on both counts.

context-budgets.json only does two things. Look at how it’s used in exec.sh:

model=$(jq -r --arg a "$agent_name" '.[$a].model // empty' "$budgets_file")
effort=$(jq -r --arg a "$agent_name" '.[$a].effort // empty' "$budgets_file")

That’s it — per-agent model selection and reasoning effort. Something like:

{
  "coder": { "model": "claude-sonnet-4-6", "effort": "high" },
  "scout": { "model": "gpt-5.4", "effort": "medium" },
  "plan": { "model": "claude-opus-4-6" }
}

It’s not a config.toml replacement. It doesn’t touch tools, permissions, sandbox settings, or anything else config.toml would handle. It’s just a routing table: which model runs which agent, and how hard it should think. And in build mode, Codex tools are not limited. Look at the interactive build path again:

if command -v codex >/dev/null 2>&1; then
    codex --model "$CLAUDE_MODEL" --full-auto "$PROMPT"

That’s raw --full-auto with no scoping at all. The only restriction is what the agent prompt asks it to do. Compare that to the Claude plan mode path with its explicit --allowedTools array — completely different level of enforcement.

So in this codebase, Claude agents get system-level tool boundaries. Codex agents get the honor system.
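For completeness, the two jq lookups against context-budgets.json amount to this in Python. A sketch against the JSON shape shown earlier; the function name mirrors the bash resolve_model but is not its implementation:

```python
import json
from pathlib import Path

def resolve_model(agent: str, budgets_file: str):
    # Per-agent model and reasoning effort, empty string on any miss,
    # matching `.[$a].model // empty` and `.[$a].effort // empty`.
    budgets = json.loads(Path(budgets_file).read_text())
    entry = budgets.get(agent, {})
    return entry.get("model", ""), entry.get("effort", "")
```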


