Learning Friction at Inference Speed

Towards Deeper Understanding Alongside Coding Agents

Human authors: Angadh Nanjangud
AI authors: Claude Opus 4.7 & GPT-5.4

Completion: 80%
3671 words
Essay first planted: April 20, 2026 • Last updated: April 21, 2026

I am a year late to vibe coding personal software and resolving its issues with an “Accept All” mentality. This has worked remarkably well if I think only about getting to usable software prototypes, but it has also induced a dull headache whose source I am still trying to localise. I have felt similar pains during prolonged learning activities (manually massaging equations on paper, handcrafting and debugging code, and redesigning parts of code), but those pains informed a deeper understanding of concepts; the agentic version of this ache has yet to bring any obvious understanding to light. So, while one can ship many personally usable things at inference speed, and even scale them for others’ use, what hasn’t shipped at that speed is an intimate understanding of what lies under the hood. On account of this, sometime in late March, I decided it was time to sober up from this deep intelligence inebriation and re-scope how I worked with coding agents.

In my experience, learning speed trades off against deep understanding, and deep understanding comes from having some learning friction. Agentic chats feel like teleportation; issues are resolved before I can wrap my head around what happened where, yet I find myself approximately where I wanted to be. To go beyond vaguely appreciating the need for learning frictions, I am now figuring out which frictions are worth retaining. Classical approaches, like reading or writing notes for myself, serve as personal distillation activities that preserve desirable learning friction; disseminating my learnings about agentic coding to others¹ is another useful distillation, but it requires both finding opportunities for public speaking and overcoming the fear of it. Using agents to speed up the production of these self-learning materials is detrimental; outsourcing my capacity for translating thoughts into words to any other entity, agent or human, is a disservice to myself.

When I read about people clamouring for newer IDEs, I think the problem they are thinking about is, “How do I introduce agentic learning frictions?” Newer ways of reading code are waiting to be uncovered and could be one way to solve part of the problem. There might also be better ways to use existing tools and LLMs. And so I am trying to find agentic versions of learning friction that retain the benefits of speed without compromising my long-term understanding.

Below, I demonstrate some things I am having these agents build that feel like solutions to this problem. At this moment, they feel like the kinds of things that are helping me understand how to work with them: by reading and annotating their code using pre-LLM tools — like Obsidian and static websites — in my pipeline.

Botference: Agentic Review Loops

The first builds on the well-trodden idea of agentic review loops, in which one agent reviews another’s work; this is sometimes fruitful, but more often entertaining. I suspect such loops are more token-efficient when reviewing implemented code, but I now prefer to also use them to review plans prior to coding.

Previously, I used a manual approach of copypasta-ing Claude’s plans to Codex (and vice versa); this helped me see Codex as the more detail-oriented engineer, with a bias towards testing everything and reading code deeply, alongside Claude’s bias towards prototyping with less thorough codebase exploration. The qualitative gains felt like accelerations to me, but the overhead of flipping between terminals induced migraines. So, I got the two to scheme on Botference, a Terminal User Interface (TUI) that hosts both models in the same terminal session. The general premise was to collaboratively ideate in a main chatroom with the three of us; this is called Council. However, if we get to a point where changes need to be made to an unfamiliar-to-me codebase² or they hold differing opinions on some matter, I send them into a private room, called a Caucus, where I can’t chat. The two hash things out in Caucus over about ten messages and then return to the Council to tell me whether they have converged on a path forward and which of them will lead plan authoring; the other agent automatically becomes the reviewer of this plan over as many rounds as I need. Botference’s interface is below:

Council is on the left panel, where I talk to Claude and Codex.

Caucus is the right panel, where Claude and Codex talk uninterrupted by us humans.

Though it is difficult to prove that this is a better way to plan code than the copypasta approach, I have found it cognitively far more manageable to read a serialised chat between two agents than to read their thoughts in two terminal windows placed side by side; the latter felt like reading two pages of a book at the same time. In the copypasta approach, the doubling of plan files was also cumbersome to read, so I quickly defaulted to letting agents decide rather than steering them towards my decisions. This combination of multi-terminal swapping and multi-file spawning was one cause of a headache that is less frequent now, if we ignore the sheer volume of text still being generated inside Botference.
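The Council-to-Caucus hand-off described above can be sketched in a few lines. This is a minimal illustration and not Botference's actual implementation: the `Agent` callable, `CaucusResult`, `run_caucus`, and the rule for choosing the lead are all hypothetical names and assumptions; only the two-agent exchange, the budget of roughly ten messages, and the lead/reviewer outcome come from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical agent interface: given the transcript so far, return
# (message, agrees), where agrees means "I accept the current proposal".
Agent = Callable[[List[str]], Tuple[str, bool]]


@dataclass
class CaucusResult:
    converged: bool
    lead: Optional[str]        # who authors the plan (None if no convergence)
    reviewer: Optional[str]    # the other agent reviews it
    transcript: List[str] = field(default_factory=list)


def run_caucus(agents: Dict[str, Agent], max_messages: int = 10) -> CaucusResult:
    """Alternate between two agents until both agree or the budget runs out."""
    names = list(agents)
    if len(names) != 2:
        raise ValueError("this sketch assumes exactly two agents")
    agreed = {name: False for name in names}
    transcript: List[str] = []
    for turn in range(max_messages):
        speaker = names[turn % 2]
        other = names[(turn + 1) % 2]
        message, agrees = agents[speaker](transcript)
        transcript.append(f"{speaker}: {message}")
        agreed[speaker] = agrees
        if all(agreed.values()):
            # Assumption: the agent whose position was just accepted leads.
            return CaucusResult(True, lead=other, reviewer=speaker,
                                transcript=transcript)
    # Budget exhausted: report back to the Council without a lead.
    return CaucusResult(False, None, None, transcript)
```

Capping the exchange keeps the Caucus readable as a single serialised chat, and returning an explicit "not converged" result leaves the decision with the human in Council.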

The main artifact from a Botference chat in plan mode is an implementation-plan.md (though additional files can be requested from the agents within the Botference workspace). I usually implement that plan from a second terminal with botference build -p. The -p flag means headless: Botference spawns custom coding agents and gives them tools either through a direct API call or, when I am using my Claude Max subscription rather than an Anthropic API key, through a scoped MCP server. Dropping -p launches a regular interactive Claude Code session instead.

Using two terminals — one to plan, another to build — prevents the build phase from gobbling up the planning chat’s context window, and lets me steer a build more deliberately than I can in a Claude Code or Codex session where plan-and-build share a single chat. This is especially useful when a plan contains human-review gates. Those gates are planned into Botference to force me to navigate the codebase — a deliberate, albeit marginal, learning friction that helps me understand how the code is laid out even if I am not principally writing it.
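A human-review gate can be as simple as a file check before each build iteration. The sketch below mirrors the shape of the pre-iteration gate in botference.sh shown later in this post (a review file compared against its blank template), but the function name `review_pending` and the Python form are illustrative assumptions, not the code Botference runs.

```python
from pathlib import Path


def review_pending(review_file: Path, template: Path) -> bool:
    """True when a human-review file exists and is not just the blank template."""
    if not review_file.is_file():
        return False          # no gate has been raised
    if not template.is_file():
        return True           # cannot compare, so err on the side of stopping
    # Any difference from the template means an agent wrote a blocker.
    return review_file.read_bytes() != template.read_bytes()
```

A build loop would call this at the top of every iteration and stop, printing the file, whenever it returns True; deleting the file after reviewing is what lets the next run proceed.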

The next section discusses a feature made with Botference that helps me better read and agentically annotate a codebase for my understanding.

Codetalk: Making Agent Code Legible to Myself

I have derived the most use from Botference in my Obsidian vault, which I consider a brownfield codebase: it has years of accumulated notes that inform generation of LLM Knowledge Bases, and it is also where this site’s contents are written and then pushed to GitHub for deployment. In other words, Obsidian is indispensable to my workflow and, fortunately, a tool I love to edit in. But I seldom use Markdown to explain code snippets in my blog, as I find it an inappropriate format for explaining large chunks of code.

In this agentic era, there is way more code being written and forgotten about than ever before; this means code is either unimportant to read or super-important to read. For this latter case, I had Botference engineer Codetalk as a new way for me to read and annotate code-specific files from Obsidian that I then review from a local build of my site.

Codetalk allows me to drop a spotlight on specific lines of code and make annotations beside them; the rest of the lines are dimmed while an annotation is in view. When a section involves multiple files, they appear as tabs you can switch between. The goal is not to annotate every line, but to trace a path through the parts that matter for a particular discussion. This possibly makes for (or can be further tweaked to offer) a more informative and less linear code-reading experience for other humans than, for example, something like Jupyter Books (I love them nonetheless) allows.
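One way to picture the spotlight-and-dim behaviour is as a small data model. This is a hedged sketch only: `Annotation`, `render_states`, and `tabs` are hypothetical names, and Codetalk's real storage format is not described in this post.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    file: str    # the tab this annotation belongs to
    start: int   # first spotlighted line, 1-indexed, inclusive
    end: int     # last spotlighted line, inclusive
    note: str    # the text shown beside the code


def render_states(total_lines: int, active: Annotation) -> List[str]:
    """Mark each line of the active file as spotlighted or dimmed."""
    return [
        "spotlight" if active.start <= n <= active.end else "dim"
        for n in range(1, total_lines + 1)
    ]


def tabs(annotations: List[Annotation]) -> List[str]:
    """One tab per file, in order of first appearance in the walkthrough."""
    seen: List[str] = []
    for a in annotations:
        if a.file not in seen:
            seen.append(a.file)
    return seen
```

Because an annotation names only a file and a line range, the same list can be written by a human or by an agent, and the reading order is simply the order of the list.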

I have introduced it into my workflow in a bid to better review and understand agent-written code. Sometimes I am writing the annotations, at other times the annotations are agentic; I often ask agents to spotlight certain behaviours of a codebase and which files are integral to them so they add annotations for me to read on my browser. This practice becomes not just about reading every line, but understanding which are the right lines to spotlight; this might inform later refactors of a large codebase, or will benefit some other researcher or agent in grokking the core idea in a codebase that is not merely scaffolding.

For a blog post such as this one, the editorial choice of which lines to spotlight and explain remains mine. As a demonstration, Codetalk is used to present three aspects of Botference: the first explores its architecture; the second is a multi-tab control loop that moves between botference.sh and detect.sh; and the third is a multi-tab exploration of how Botference’s build agents get their tools through exec.sh, fallback_agent_mcp.py, and __init__.py.

Botference Architecture

The file tree below follows the execution path of a single Botference run, from the moment you type botference plan to the final archived output once you have completed building what was in the plan. Each section maps to a phase of the system’s lifecycle and so is not a raw ls on the contents of Botference. Scroll through the files in the tabs below to get a sense of how Codetalk works in its current iteration.

architecture.txt

# ── 1. You run botference ────────────────────────────────────────────
botference                              ← the entry point (bash script)
.env                                    ← API keys: ANTHROPIC_API_KEY, OPENAI_API_KEY
context-budgets.json                    ← which model each agent uses, token limits

# ── 2. It sets up the environment ────────────────────────────────────
lib/
  config.sh                             ← loads .env, resolves BOTFERENCE_HOME
  detect.sh                             ← reads checkpoint.md to find the current agent
  exec.sh                               ← the main dispatch loop (34K of orchestration)
  monitor.sh                            ← tracks token usage, context %, yield signals
  post-run.sh                           ← after each agent: archive, handoff, cleanup
  stream-filter.py                      ← filters and formats live agent output

# ── 3. It picks an agent from the plan ───────────────────────────────
.claude/agents/
  agent-base.md                         ← shared rules all agents inherit
  agent-template.md                     ← blank template for creating new agents
  # Planning
  plan.md                               ← interactive planning (council mode)
  orchestrator.md                       ← AI decides which agents run in what order
  # Research loop
  scout.md                              ← searches for papers, scores relevance
  triage.md                             ← deduplicates corpus, builds a reading plan
  deep-reader.md                        ← reads PDFs in 5-page chunks, extracts claims
  critic.md                             ← assesses structure, checks compliance
  provocateur.md                        ← stress-tests via negative space and inversions
  synthesizer.md                        ← merges findings into a narrative + outline
  # Writing loop
  paper-writer.md                       ← drafts sections from the outline
  editor.md                             ← edits with evidence backing
  coherence-reviewer.md                 ← checks for contradictions and drift
  # Code + figures
  coder.md                              ← writes application code (red/green TDD)
  refactorer.md                         ← restructures code without changing behavior
  research-coder.md                     ← simulations, data analysis, figure scripts
  figure-stylist.md                     ← reviews figure clarity for print
  # Utilities
  security-auditor.md                   ← read-only security review
  role-analyst.md                       ← job posting analysis and CV fitting

# ── 4. The agent reads its instructions ──────────────────────────────
specs/
  grading-rubric.md                     ← how agent output quality is scored
  writing-style.md                      ← prose rules agents must follow
  publication-requirements.md           ← venue-specific constraints (ICML, NeurIPS, etc.)
  banned-phrases.txt                    ← words and phrases agents must never use
  scout-output-format.md                ← structured output schema for scout
  triage-output-format.md               ← structured output schema for triage
  deep-reader-output-format.md          ← structured output schema for deep-reader
  critic-output-format.md               ← structured output schema for critic
  provocateur-output-format.md          ← structured output schema for provocateur
  synthesizer-output-format.md          ← structured output schema for synthesizer
  paper-writer-output-format.md         ← structured output schema for paper-writer
  editor-output-format.md               ← structured output schema for editor
  coherence-reviewer-output-format.md   ← structured output schema for coherence-reviewer
  research-coder-output-format.md       ← structured output schema for research-coder
  figure-stylist-output-format.md       ← structured output schema for figure-stylist

# ── 5. The agent reads the current state ─────────────────────────────
work/                                   ← runtime state for the current thread
  checkpoint.md                         ← knowledge state table + next task
  implementation-plan.md                ← task list with dependencies and gates
  inbox.md                              ← operator notes for the current agent
  iteration_count                       ← how many loops have run
  HUMAN_REVIEW_NEEDED.md                ← blocker: agent cannot proceed
templates/
  checkpoint.md                         ← blank checkpoint for new threads
  implementation-plan.md                ← blank plan for new threads
  handoff.md                            ← agent transition record format
  HUMAN_REVIEW_NEEDED.md                ← blocker template

# ── 6. The agent uses tools to do its work ───────────────────────────
tools/
  __init__.py                           ← tool registry: which agent gets which tools
  core.py                               ← file read, write, list, search
  citations.py                          ← citation lookup, verification, manifest
  claims.py                             ← fact-checking claims against corpus
  pdf.py                                ← PDF metadata and figure extraction
  search.py                             ← semantic search over papers
  download.py                           ← download papers from URLs
  latex.py                              ← LaTeX compilation and checks
  check_language.py                     ← prose style and grammar validation
  check_journal.py                      ← venue formatting rules
  check_figure.py                       ← figure clarity and print readiness
  verify.py                             ← citation and claim verification
  redact.py                             ← sensitive content redaction
  github.py                             ← GitHub API integration
  interact.py                           ← human interaction prompts
  fmt.py                                ← output formatting
  cli.py                                ← CLI argument parsing

# ── 7. The Python core runs the models ───────────────────────────────
core/
  botference.py                         ← main loop: dispatch agents, manage turns (2500 LOC)
  botference_agent.py                   ← bridges agent markdown specs to API calls
  providers.py                          ← model abstraction (Anthropic API, OpenAI API)
  cli_adapters.py                       ← adapts CLI commands to model-specific formats
  handoff.py                            ← agent-to-agent handoff protocol
  paths.py                              ← resolves BOTFERENCE_HOME vs project paths
  session_store.py                      ← persists sessions across runs
  room_prompts.py                       ← builds prompts for council and caucus modes
  fallback_agent_mcp.py                 ← MCP server: exposes tools over stdio

Everything starts with the botference bash script in the terminal; botference plan starts a chat with the models and botference build -p implements the plan. The shell script reads context-budgets.json to know which model and token budget an agent uses. Auth is resolved later: API keys in .env enable direct API calls, while headless Anthropic builds without an API key fall back to the Claude OAuth subscription path.

Under lib/ are shell scripts that set up the environment via config.sh and provide additional scaffolding for much of the work done during build phase (e.g., detect.sh determines which agent must be used; exec.sh contains build helpers for model resolution, prompt construction, and MCP setup; monitor.sh watches token budgets and triggers new sessions so that tasks are always completed by agents in the supposed smart zone).
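The token-budget watch that monitor.sh performs can be illustrated with a tiny threshold check. The two percentages (45 for windows under 1M tokens, 20 for 1M windows) come from a comment in botference.sh; everything else here, including the function names and the choice of `>=` for the comparison, is an assumption made for illustration.

```python
def context_threshold(window_tokens: int) -> int:
    """Percent of the context window after which an agent should yield."""
    # Values taken from the CONTEXT_THRESHOLD comment in botference.sh.
    return 20 if window_tokens >= 1_000_000 else 45


def should_yield(used_tokens: int, window_tokens: int) -> bool:
    """True once usage reaches the threshold for this window size."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    used_pct = 100 * used_tokens / window_tokens
    return used_pct >= context_threshold(window_tokens)
```

A lower percentage for the larger window keeps the absolute amount of context in use bounded, which is one plausible reading of keeping agents in the "smart zone".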

The .claude/agents/ directory is basically how Claude Code recognises user-defined agents; these are more specifically tailored to research paper writing tasks so when I run botference research-plan, both Claude and Codex know about the agents before a single chat message is sent. agent-base.md defines the shared protocol that all agents inherit — checkpoint discipline, yield behaviour, incremental commits.

There is a (mostly untested in Botference) Orchestrator agent in orchestrator.md to decide which agents to dispatch and in what order, used in the orchestrated architecture mode for non-serialised builds.

work/ is the live state of the current thread of work. At first, it contains the outputs of a planning discussion: implementation-plan.md, which states the sequence of tasks and assigned agents, and checkpoint.md, which tracks the next build task. Once build begins, checkpoint.md also accumulates handoff notes from the outgoing agent to the incoming one. inbox.md lets me leave additional notes for the next agent without interrupting the loop.
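To make the role of implementation-plan.md concrete, here is a minimal sketch of reading task state from Markdown checkboxes. The `- [x]` and `- [ ]` patterns match the grep calls in botference.sh; the function names and the sample task text are hypothetical, and the real plan files carry more structure (agents, dependencies, gates) than this handles.

```python
import re
from typing import Optional, Tuple

CHECKED = re.compile(r"^- \[x\]")
UNCHECKED = re.compile(r"^- \[ \]\s*(.*)$")


def plan_progress(plan_text: str) -> Tuple[int, int, Optional[str]]:
    """Return (checked, unchecked, text of the first unchecked task)."""
    checked = unchecked = 0
    next_task: Optional[str] = None
    for line in plan_text.splitlines():
        if CHECKED.match(line):
            checked += 1
            continue
        m = UNCHECKED.match(line)
        if m:
            unchecked += 1
            if next_task is None:
                next_task = m.group(1).strip()
    return checked, unchecked, next_task


def all_done(plan_text: str) -> bool:
    """Same condition the shell loop uses before offering to archive."""
    checked, unchecked, _ = plan_progress(plan_text)
    return checked > 0 and unchecked == 0
```

Keeping state in plain checkboxes means the plan stays readable and editable by hand in Obsidian while still being trivially machine-checkable.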

Under tools/, Botference defines a shared tool registry. __init__.py maps each agent type to the tools it is allowed to use. The individual files are tool modules. core.py contains the basic local primitives — reading and writing files, running shell commands, and committing or pushing with git. I put this in because I wanted to limit capabilities of build agents, which I discussed in RalPhD.
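The idea of a registry that scopes tools per agent can be sketched as a plain mapping plus a guard. The agent names below appear in the architecture listing, but the tool names, `TOOL_REGISTRY`, `tools_for`, and `call_tool` are hypothetical stand-ins; the actual contents of tools/__init__.py are not shown in this post.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical registry: agent type -> names of tools it may call.
TOOL_REGISTRY: Dict[str, FrozenSet[str]] = {
    "coder": frozenset({"read_file", "write_file", "run_shell", "git_commit"}),
    # The security auditor is described as read-only, so no write tools.
    "security-auditor": frozenset({"read_file", "list_files", "search_code"}),
}


def tools_for(agent: str) -> FrozenSet[str]:
    """Unknown agents get no tools rather than all tools."""
    return TOOL_REGISTRY.get(agent, frozenset())


def call_tool(agent: str, tool: str,
              handlers: Dict[str, Callable[..., object]], **kwargs) -> object:
    """Run a tool only if the registry allows it for this agent."""
    if tool not in tools_for(agent):
        raise PermissionError(f"{agent} may not use {tool}")
    return handlers[tool](**kwargs)
```

Defaulting to an empty set is the design choice that matters here: a new or misspelled agent name fails closed instead of inheriting every capability.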

Some research-specific modules are highlighted here as examples of how the same registry can grow: claims.py can check manuscript claims against evidence; pdf.py can inspect PDFs or render pages; download.py downloads PDFs of papers where available; and latex.py can compile LaTeX or build citation trackers. The important architectural point is not the full list; it is that agents get a scoped subset of these capabilities. A non-research module, search.py, handles file listing and code search; it shows that the registry is not only for research-paper tooling.

The Python core in core/ runs the models: botference.py manages the orchestration loop, botference_agent.py bridges agent markdown specs to direct Anthropic/OpenAI API calls, and providers.py abstracts the model APIs so the rest of the system doesn’t care whether it’s talking to Anthropic or OpenAI.

There are two ways Botference exposes tools to an agent during build. In the direct API path, botference_agent.py passes tool schemas from tools/__init__.py to the model provider and executes returned tool calls locally. In the Claude CLI/subscription fallback, fallback_agent_mcp.py wraps the same registry as an MCP stdio server.
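The choice between the two tool paths can be summarised as a small decision function. This is a sketch of the behaviour as described in the prose, not of Botference's code: `choose_build_path` and its return labels are made-up names, and the real resolution logic in exec.sh is likely more involved.

```python
from typing import Mapping


def choose_build_path(headless: bool, env: Mapping[str, str]) -> str:
    """Pick how a build agent gets its tools, per the description above."""
    if not headless:
        # Without -p, a regular interactive Claude Code session is launched.
        return "interactive-claude-code"
    if env.get("ANTHROPIC_API_KEY"):
        # Tool schemas go to the provider; tool calls are executed locally.
        return "direct-api"
    # No API key: fall back to the subscription path, where the same
    # registry is exposed through an MCP stdio server.
    return "mcp-stdio-fallback"
```

For example, `choose_build_path(True, {})` selects the MCP fallback, while adding an `ANTHROPIC_API_KEY` entry to the mapping switches the same headless build to direct API calls.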

This is not necessarily an elegant or ideal architecture, but it is one approach to steering the build agents to evaluate or reconsider their work.

The control loop

Botference has two planning modes (plan, for general work, has no system prompt; research-plan loads high-level details about built-in academic research agents) and two build paths: headless (-p) or interactive. botference.sh is the shared entry point for all four; detect.sh only enters once build mode takes over, reading the plan/checkpoint state to decide which agent runs to complete the next unchecked task. I co-annotated architecture.txt and botference.sh with Botference, but the remaining files of this post were annotated by agents as I peppered them with questions. Note that these annotations do not imply that the code was assessed for quality or concision, but they do give me a stronger sense of what is happening where in the code.

#!/usr/bin/env bash

set -euo pipefail

# ── Bootstrap ────────────────────────────────────────────────

# botference must locate the framework root before any abstraction exists.

# This is the one intentionally hardcoded path resolution in the system.

if [ -z "${BOTFERENCE_HOME:-}" ]; then
    BOTFERENCE_HOME="$(cd "$(dirname "$0")" && pwd)"
fi

if [ ! -f "${BOTFERENCE_HOME}/core/botference_agent.py" ]; then
    echo "Error: BOTFERENCE_HOME (${BOTFERENCE_HOME}) does not contain core/botference_agent.py" >&2
    exit 1
fi

export BOTFERENCE_HOME
BOTFERENCE_PROJECT_ROOT="$(pwd -P)"
export BOTFERENCE_PROJECT_ROOT

source "${BOTFERENCE_HOME}/lib/config.sh"

source "${BOTFERENCE_HOME}/lib/detect.sh"

source "${BOTFERENCE_HOME}/lib/monitor.sh"

source "${BOTFERENCE_HOME}/lib/post-run.sh"

source "${BOTFERENCE_HOME}/lib/exec.sh"

parse_loop_args "$@"

export BOTFERENCE_ACTIVE_MODE="$LOOP_MODE"

if $SHOW_HELP; then
    show_help
    exit 0
fi

if [ "$LOOP_MODE" = "init" ]; then
    python3 "${BOTFERENCE_HOME}/scripts/init_project.py" --profile "$INIT_PROFILE"
    exit 0
fi

init_botference_paths

if ! validate_project_agents; then
    exit 1
fi

if ! mode_is_allowed "$LOOP_MODE"; then
    echo "Error: $LOOP_MODE is disabled by ${BOTFERENCE_PROJECT_CONFIG_FILE}." >&2
    exit 1
fi

if [ -n "$CLI_MODEL" ]; then
    export ANTHROPIC_MODEL="$CLI_MODEL"
fi

# Planning modes are interactive; reject the headless flag early.
if $PIPE_MODE && { [ "$LOOP_MODE" = "plan" ] || [ "$LOOP_MODE" = "research-plan" ]; }; then
    echo "Error: $LOOP_MODE mode is interactive only — remove the -p flag."
    exit 1
fi

ARCH_MODE=$(resolve_arch_mode_from_plan "$ARCH_MODE" "$BOTFERENCE_PLAN_FILE")
export ARCH_MODE

if [ -n "$PROMPT_FILE" ]; then
    PROMPT_FILE="${BOTFERENCE_HOME}/${PROMPT_FILE}"
    if [ ! -f "$PROMPT_FILE" ]; then
        echo "Error: $PROMPT_FILE not found"
        exit 1
    fi
fi

if [ "$LOOP_MODE" = "archive" ]; then
    bash "${BOTFERENCE_HOME}/scripts/archive.sh"
    exit 0
fi

CONTEXT_THRESHOLD=45 # default for <1M windows; overridden to 20 for 1M windows below

CTX_FILE="$BOTFERENCE_RUN/context-pct"

YIELD_FILE="$BOTFERENCE_RUN/yield"

BUDGET_FILE="$BOTFERENCE_RUN/budget-info"

PLAN_AUDIT_FILE="$BOTFERENCE_RUN/plan-audit-failed"

POLL_INTERVAL=5

BACKOFF=60

USAGE_LOG="$BOTFERENCE_LOGS_DIR/usage.jsonl"

AGENT_MAX_RETRIES=3

AGENT_RETRY_DELAYS=(5 15 45)

CB_FILE="$BOTFERENCE_RUN/circuit-breaker"

CB_THRESHOLD=5

CB_CONSECUTIVE_FAILURES=0

COUNTER_FILE="$BOTFERENCE_COUNTER_FILE"

HEARTBEAT_INTERVAL=90

CLAUDE_PID=""

MONITOR_PID=""

JSONL_MONITOR_PID=""

LAST_CTRL_C=0

restore_circuit_breaker_state

restore_iteration_counter

# Rebuild the Ink UI bundle when it is missing or older than its sources.
ensure_ink_ui_dist() {
    local ink_dir="${BOTFERENCE_HOME}/ink-ui"
    local dist_bin="${ink_dir}/dist/bin.js"
    local install_cmd="cd ink-ui && npm install"
    local rebuild=false
    local src

    if [ ! -f "$dist_bin" ]; then
        rebuild=true
    else
        # Build config newer than the bundle?
        for src in \
            "$ink_dir/build.mjs" \
            "$ink_dir/package.json" \
            "$ink_dir/package-lock.json"
        do
            if [ "$src" -nt "$dist_bin" ]; then
                rebuild=true
                break
            fi
        done

        # Any source file newer than the bundle?
        if ! $rebuild; then
            while IFS= read -r src; do
                if [ "$src" -nt "$dist_bin" ]; then
                    rebuild=true
                    break
                fi
            done < <(find "$ink_dir/src" -type f)
        fi
    fi

    if $rebuild; then
        if ! command -v node >/dev/null 2>&1 || ! command -v npm >/dev/null 2>&1; then
            echo "Error: Ink UI requires Node.js and npm." >&2
            echo "Run this once after cloning:" >&2
            echo " ${install_cmd}" >&2
            exit 1
        fi

        if [ ! -d "$ink_dir/node_modules" ] || [ ! -e "$ink_dir/node_modules/esbuild" ]; then
            echo "Error: Ink UI dependencies are not installed." >&2
            echo "Run this once after cloning:" >&2
            echo " ${install_cmd}" >&2
            if [ -f "$ink_dir/package-lock.json" ]; then
                echo "If you want the lockfile-pinned install instead, run:" >&2
                echo " cd ink-ui && npm ci" >&2
            fi
            exit 1
        fi

        echo " Building Ink UI"
        (
            cd "$ink_dir"
            node build.mjs
        )
    fi
}

BUILD_AUDIT_SNAPSHOT=""

BUILD_AUDIT_ALLOWED=""

BUILD_AUDIT_VIOLATIONS=""

# Snapshot project state so changes made during a build can be audited.
begin_build_audit() {
    if [ -d "${BOTFERENCE_PROJECT_DIR:-}" ]; then
        BUILD_AUDIT_SNAPSHOT=$(mktemp)
        BUILD_AUDIT_ALLOWED=$(mktemp)
        BUILD_AUDIT_VIOLATIONS=$(mktemp)
        plan_write_state_snapshot "$BUILD_AUDIT_SNAPSHOT"
    fi
}

cleanup_build_audit() {
    rm -f "${BUILD_AUDIT_SNAPSHOT:-}" "${BUILD_AUDIT_ALLOWED:-}" "${BUILD_AUDIT_VIOLATIONS:-}"
    BUILD_AUDIT_SNAPSHOT=""
    BUILD_AUDIT_ALLOWED=""
    BUILD_AUDIT_VIOLATIONS=""
}

# Fail if the build touched files it was not authorized to change.
enforce_build_audit() {
    [ -n "${BUILD_AUDIT_SNAPSHOT:-}" ] || return 0

    if ! audit_mode_changed_files "build" "$BUILD_AUDIT_SNAPSHOT" "$BUILD_AUDIT_ALLOWED" "$BUILD_AUDIT_VIOLATIONS"; then
        echo ""
        echo "✗ Build audit failed — unauthorized files changed:"
        sed 's/^/ - /' "$BUILD_AUDIT_VIOLATIONS"
        cleanup_build_audit
        return 1
    fi

    cleanup_build_audit
    return 0
}

trap 'handle_interrupt_signal' INT

if ! is_interactive_plan_mode; then

print_loop_banner

# --- Pre-loop plan validation (safety net) ---

# Planner commit gates are the primary enforcement point for TDD structure.

# This build-start check is a safety net: fail fast before wasting an iteration

# on a plan that would fail commit gates anyway.

if [ -f "$BOTFERENCE_PLAN_FILE" ]; then

if ! validate_plan_tdd_structure "$BOTFERENCE_PLAN_FILE"; then

echo "✗ Plan validation failed — fix TDD task structure before running build."

exit 1

LOOP_EXIT_CODE=0

while true; do

    # --- Circuit breaker check ---
    if cb_is_open; then
        echo ""
        echo "╔══════════════════════════════════════════════════════════╗"
        echo "║ CIRCUIT BREAKER OPEN — $CB_CONSECUTIVE_FAILURES consecutive failures"
        echo "║ Halting to avoid wasting tokens. ║"
        echo "╠══════════════════════════════════════════════════════════╣"
        echo "║ To resume: ║"
        echo "║ rm $CB_FILE && botference -p ║"
        echo "║ Or investigate logs/usage.jsonl for error patterns. ║"
        echo "╚══════════════════════════════════════════════════════════╝"
        break
    fi

    if ! is_interactive_plan_mode; then
        echo "=== Iteration $((ITERATION + 1)) ==="
    fi

    rm -f "$CTX_FILE" "$YIELD_FILE" "$BUDGET_FILE"

    if ! is_interactive_plan_mode; then
        sleep 3 # let any dying statusline process finish writing, then clear again
        rm -f "$CTX_FILE"
    fi

    # --- Pre-iteration gate check ---
    if [ -f "$BOTFERENCE_REVIEW_FILE" ]; then
        _gate_template="${BOTFERENCE_HOME}/templates/HUMAN_REVIEW_NEEDED.md"
        # A review file that differs from the blank template is a live blocker.
        if ! diff -q "$BOTFERENCE_REVIEW_FILE" "$_gate_template" >/dev/null 2>&1; then
            echo ""
            echo "╔══════════════════════════════════════════════════════════╗"
            echo "║ HUMAN REVIEW STILL PENDING ║"
            echo "╚══════════════════════════════════════════════════════════╝"
            echo ""
            cat "$BOTFERENCE_REVIEW_FILE"
            echo ""
            echo "To continue: review above, then:"
            echo " rm $BOTFERENCE_REVIEW_FILE && botference -p"
            break
        fi
    fi

    ITER_START=$(date +%s)
    IGNORE_UNTIL=$(( ITER_START + 15 )) # ignore context readings for first 15s (stale cache)

    # --- Reflection trigger (every 5th iteration) ---
    rm -f "$BOTFERENCE_RUN/reflect"
    if [ "$ITERATION" -gt 0 ] && [ $(( ITERATION % 5 )) -eq 0 ]; then
        touch "$BOTFERENCE_RUN/reflect"
        echo " Reflection iteration (mod 5)"
    fi

    # --- Detect thread and agent ---
    CURRENT_THREAD=$(extract_thread)
    CURRENT_AGENT=$(detect_agent_from_checkpoint "$BOTFERENCE_CHECKPOINT_FILE" "$BOTFERENCE_PLAN_FILE")

    if [ "$LOOP_MODE" = "build" ]; then
        cleanup_build_audit
        begin_build_audit
    fi

# --- Plan mode: one-shot interactive session, early exit ---

if [ "$LOOP_MODE" = "plan" ] || [ "$LOOP_MODE" = "research-plan" ]; then

CURRENT_AGENT="${CURRENT_AGENT:-plan}"

# --- Botference mode: Claude + Codex TUI ---

if $BOTFERENCE_MODE; then

CLAUDE_MODEL_RESOLVED=$(resolve_model "plan")

resolve_model_and_effort "$CLAUDE_MODEL_RESOLVED" "plan"

OPENAI_MODEL="${OPENAI_MODEL:-gpt-5.4}"

OPENAI_REASONING_EFFORT="${OPENAI_REASONING_EFFORT:-high}"

if [ -n "$PROMPT_FILE" ]; then

PROMPT=$(cat "$PROMPT_FILE")

else

PROMPT=""

if [ -s "$BOTFERENCE_INBOX_FILE" ]; then

echo " 📬 Absorbing operator notes from inbox.md"

PROMPT="[Operator notes]"$'\n'"$(cat "$BOTFERENCE_INBOX_FILE")"$'\n\n'"$PROMPT"

: > "$BOTFERENCE_INBOX_FILE"

if [ "$LOOP_MODE" = "research-plan" ]; then

PLAN_AGENT_PATH=$(resolve_agent_path "plan")

PLAN_SYSTEM="$(cat "${PLAN_AGENT_PATH:-${BOTFERENCE_HOME}/.claude/agents/plan.md}")"

else

PLAN_SYSTEM=""

PLAN_SNAPSHOT=$(mktemp); PLAN_ALLOWED=$(mktemp); PLAN_VIOLATIONS=$(mktemp)

plan_write_state_snapshot "$PLAN_SNAPSHOT" "plan"

# Build debug-panes flag correctly (avoid passing "false" as truthy)

DEBUG_FLAG=""

if $DEBUG_PANES; then

DEBUG_FLAG="--debug-panes"

# Load API keys from .env if not already in environment

if [ -f "${BOTFERENCE_HOME}/.env" ]; then

if [ -z "${OPENAI_API_KEY:-}" ]; then

_val=$(grep -m1 '^OPENAI_API_KEY=' "${BOTFERENCE_HOME}/.env" 2>/dev/null | cut -d= -f2 | tr -d "'" | tr -d '"' || true)

[ -n "${_val:-}" ] && export OPENAI_API_KEY="$_val"

unset _val

if [ -z "${ANTHROPIC_API_KEY:-}" ]; then

_val=$(grep -m1 '^ANTHROPIC_API_KEY=' "${BOTFERENCE_HOME}/.env" 2>/dev/null | cut -d= -f2 | tr -d "'" | tr -d '"' || true)

[ -n "${_val:-}" ] && export ANTHROPIC_API_KEY="$_val"

unset _val

# Ensure codex has API key auth if available

if [ -n "${OPENAI_API_KEY:-}" ]; then

echo "$OPENAI_API_KEY" | codex login --with-api-key 2>/dev/null || true

echo "Launching Botference Council - Claude=$CLI_MODEL Codex=$OPENAI_MODEL${EFFORT_FLAG:+ effort=${EFFORT_FLAG#--effort }}${OPENAI_REASONING_EFFORT:+ openai-effort=$OPENAI_REASONING_EFFORT}${DEBUG_FLAG:+ debug=on} ui=$UI_MODE"

if [ "$UI_MODE" = "ink" ]; then

ensure_ink_ui_dist

# Pass large strings via temp files to avoid arg-length/escaping issues

_ink_sys=$(mktemp); _ink_task=$(mktemp)

printf '%s' "$PLAN_SYSTEM" > "$_ink_sys"

printf '%s' "$PROMPT" > "$_ink_task"

node "${BOTFERENCE_HOME}/ink-ui/dist/bin.js" \

--anthropic-model "$CLI_MODEL" \

--openai-model "$OPENAI_MODEL" \

--openai-effort "$OPENAI_REASONING_EFFORT" \

${EFFORT_FLAG:+--claude-effort ${EFFORT_FLAG#--effort }} \

--system-prompt-file "$_ink_sys" \

--task-file "$_ink_task" \

$DEBUG_FLAG

rm -f "$_ink_sys" "$_ink_task"

else

python3 "${BOTFERENCE_HOME}/core/botference.py" \

--anthropic-model "$CLI_MODEL" \

--openai-model "$OPENAI_MODEL" \

--openai-effort "$OPENAI_REASONING_EFFORT" \

${EFFORT_FLAG:+--claude-effort ${EFFORT_FLAG#--effort }} \

--system-prompt "$PLAN_SYSTEM" \

--task "$PROMPT" \

$DEBUG_FLAG

EXIT_CODE=$?

if ! plan_audit_changed_files "$PLAN_SNAPSHOT" "$PLAN_ALLOWED" "$PLAN_VIOLATIONS"; then

echo ""

echo "✗ Plan audit failed — unauthorized files changed:"

sed 's/^/ - /' "$PLAN_VIOLATIONS"

echo " Build is blocked until these changes are resolved."

EXIT_CODE=1

elif [ "$EXIT_CODE" -eq 0 ]; then

if ! plan_commit_and_push_changes "$PLAN_ALLOWED"; then

EXIT_CODE=1

rm -f "$PLAN_SNAPSHOT" "$PLAN_ALLOWED" "$PLAN_VIOLATIONS"

if [ "$EXIT_CODE" -eq 0 ]; then

echo ""

echo "=== Council session complete. Run 'build' to start executing. ==="

break

# Build prompt

if [ -n "$PROMPT_FILE" ]; then

PROMPT=$(cat "$PROMPT_FILE")

else

PROMPT=""

if [ -s "$BOTFERENCE_INBOX_FILE" ]; then

echo " 📬 Absorbing operator notes from inbox.md"

PROMPT="## Operator Notes (read and act on these first)"$'\n\n'"$(cat "$BOTFERENCE_INBOX_FILE")"$'\n\n'"$PROMPT"

: > "$BOTFERENCE_INBOX_FILE"

CLAUDE_MODEL=$(resolve_model "$CURRENT_AGENT")

# Start context monitor

monitor_context "$$" "$ITER_START" "$IGNORE_UNTIL" "$CURRENT_AGENT" &

MONITOR_PID=$!

# Archive check: if all tasks done, ask user before launching plan agent

CHECKED=$(grep -c '^\- \[x\]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null) || CHECKED=0

UNCHECKED=$(grep -c '^\- \[ \]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null) || UNCHECKED=0

if [ "$CHECKED" -gt 0 ] && [ "$UNCHECKED" -eq 0 ]; then

echo " All tasks in implementation-plan.md are complete."

read -r -p " Archive this thread and start fresh? (y/n): " answer < /dev/tty

if [[ "$answer" =~ ^[Yy] ]]; then

bash "${BOTFERENCE_HOME}/scripts/archive.sh"

echo " Archived. Starting fresh."

PLAN_SNAPSHOT=$(mktemp)

PLAN_ALLOWED=$(mktemp)

PLAN_VIOLATIONS=$(mktemp)

plan_write_state_snapshot "$PLAN_SNAPSHOT" "plan"

# Run plan agent via claude CLI

if [ "$LOOP_MODE" = "research-plan" ]; then

PLAN_AGENT_PATH=$(resolve_agent_path "plan")

PLAN_SYSTEM="$(cat "${PLAN_AGENT_PATH:-${BOTFERENCE_HOME}/.claude/agents/plan.md}")"

else

PLAN_SYSTEM=""

SYS_ARGS=()

if [ -n "$PLAN_SYSTEM" ]; then

SYS_ARGS=(--append-system-prompt "$PLAN_SYSTEM")

PLAN_CLAUDE_SETTINGS=$(mktemp)

python3 - "$BOTFERENCE_PROJECT_ROOT" "$BOTFERENCE_WORK_DIR" "$PLAN_CLAUDE_SETTINGS" <<'PY'
import json
import os  # needed for the os.environ lookups below
import sys
from pathlib import Path

project_root = Path(sys.argv[1]).resolve()
work_dir = Path(sys.argv[2]).resolve()
out_path = Path(sys.argv[3])

config_name = os.environ.get("BOTFERENCE_PROJECT_DIR_NAME", "botference")
project_config = project_root / config_name / "project.json"
raw_roots = os.environ.get("BOTFERENCE_PLAN_EXTRA_WRITE_ROOTS", "").strip()

# Directories the planner may edit: explicit extra roots, else the work dir
# when the project has no config file.
roots = []
if raw_roots:
    for root in raw_roots.split(","):
        root = root.strip().strip("/")
        if root:
            roots.append((project_root / root).resolve())
elif not project_config.exists():
    roots = [work_dir]

# Read-only tools are always allowed; Edit is granted per writable root.
allow = ["Read", "Glob", "Grep", "Bash", "WebSearch", "WebFetch"]
seen = set()
for root in roots:
    root_abs = root.as_posix().lstrip("/")
    if root_abs in seen:
        continue
    seen.add(root_abs)
    allow.extend([
        f"Edit(//{root_abs})",
        f"Edit(//{root_abs}/*)",
        f"Edit(//{root_abs}/**)",
    ])

settings = {
    "permissions": {
        "defaultMode": "dontAsk",
        "allow": allow,
    },
    "sandbox": {
        "enabled": True,
        "allowUnsandboxedCommands": False,
    },
}

out_path.write_text(json.dumps(settings))
PY

PLAN_CLAUDE_CWD="$BOTFERENCE_PROJECT_ROOT"

CLAUDE_DIR_ARGS=()

if [ -n "${BOTFERENCE_PLAN_EXTRA_WRITE_ROOTS:-}" ]; then

IFS=',' read -r -a _plan_roots <<< "$BOTFERENCE_PLAN_EXTRA_WRITE_ROOTS"

first_root=""

for _raw_root in "${_plan_roots[@]}"; do

_clean_root="${_raw_root#/}"

_clean_root="${_clean_root%/}"

[ -n "$_clean_root" ] || continue

_abs_root="$BOTFERENCE_PROJECT_ROOT/$_clean_root"

if [ -z "$first_root" ]; then

first_root="$_abs_root"

PLAN_CLAUDE_CWD="$_abs_root"

else

CLAUDE_DIR_ARGS+=(--add-dir "$_abs_root")

done

if [ "$PLAN_CLAUDE_CWD" != "$BOTFERENCE_PROJECT_ROOT" ]; then

CLAUDE_DIR_ARGS+=(--add-dir "$BOTFERENCE_PROJECT_ROOT")

elif [ ! -f "$BOTFERENCE_PROJECT_CONFIG_FILE" ] && [ "$BOTFERENCE_WORK_DIR" != "$BOTFERENCE_PROJECT_ROOT" ]; then

PLAN_CLAUDE_CWD="$BOTFERENCE_WORK_DIR"

CLAUDE_DIR_ARGS=(--add-dir "$BOTFERENCE_PROJECT_ROOT")

SESSION_ID=$(uuidgen | tr '[:upper:]' '[:lower:]')

resolve_model_and_effort "$CLAUDE_MODEL" "plan"

echo " Model: $CLI_MODEL ($LOOP_MODE — claude CLI${EFFORT_FLAG:+, effort: ${EFFORT_FLAG#--effort }})"

if [ -n "$PROMPT" ]; then

(

cd "$PLAN_CLAUDE_CWD"

echo "$PROMPT" | claude --model "$CLI_MODEL" \

$EFFORT_FLAG \

"${SYS_ARGS[@]}" \

--session-id "$SESSION_ID" \

--name "${CURRENT_THREAD:-botference-plan}" \

--settings "$PLAN_CLAUDE_SETTINGS" \

"${CLAUDE_DIR_ARGS[@]}"

)

else

(

cd "$PLAN_CLAUDE_CWD"

claude --model "$CLI_MODEL" \

$EFFORT_FLAG \

"${SYS_ARGS[@]}" \

--session-id "$SESSION_ID" \

--name "${CURRENT_THREAD:-botference-plan}" \

--settings "$PLAN_CLAUDE_SETTINGS" \

"${CLAUDE_DIR_ARGS[@]}"

)

EXIT_CODE=$?

rm -f "$PLAN_CLAUDE_SETTINGS"

cleanup_pid "$MONITOR_PID"; MONITOR_PID=""

# Log interactive session usage

log_interactive_session "$SESSION_ID" "$LOOP_MODE"

if ! plan_audit_changed_files "$PLAN_SNAPSHOT" "$PLAN_ALLOWED" "$PLAN_VIOLATIONS"; then

echo ""

echo "✗ Plan audit failed — unauthorized files changed:"

sed 's/^/ - /' "$PLAN_VIOLATIONS"

echo " Build is blocked until these changes are resolved."

EXIT_CODE=1

elif [ "$EXIT_CODE" -eq 0 ]; then

if ! plan_commit_and_push_changes "$PLAN_ALLOWED"; then

EXIT_CODE=1

rm -f "$PLAN_SNAPSHOT" "$PLAN_ALLOWED" "$PLAN_VIOLATIONS"

# Plan mode is one-shot — exit after this session

if [ "$EXIT_CODE" -eq 0 ]; then

echo ""

echo "=== Planning session complete. Run 'build' to start executing. ==="

break

# --- Build mode: increment iteration counter ---

ITERATION=$((ITERATION + 1))

echo "$ITERATION" > "$COUNTER_FILE"

if [ -f "$PLAN_AUDIT_FILE" ]; then

if DIRTY_VIOLATIONS=$(plan_violation_paths_still_dirty); then

echo ""

echo "✗ Build blocked — unresolved plan-mode file violations remain:"

printf '%s\n' "$DIRTY_VIOLATIONS" | sed 's/^/ - /'

echo " Resolve or discard those changes, then rerun plan/build."

break

rm -f "$PLAN_AUDIT_FILE"

# --- Build mode: detect agent ---

if [ -n "$CURRENT_AGENT" ]; then

AGENT_PATH=$(resolve_agent_path "$CURRENT_AGENT")

if [ -z "$AGENT_PATH" ]; then

# Checkpoint has a bad Next Task (agent wrote prose instead of a task line).

# Fall back to the implementation plan's first unchecked task.

echo " ⚠ Agent '$CURRENT_AGENT' not found — falling back to implementation plan"

CURRENT_AGENT=$(detect_agent_from_checkpoint /dev/null "$BOTFERENCE_PLAN_FILE")

if [ -n "$CURRENT_AGENT" ]; then

AGENT_PATH=$(resolve_agent_path "$CURRENT_AGENT")

if [ -z "$AGENT_PATH" ]; then

echo " Could not resolve agent from plan either. Skipping."

sleep 5

continue

# Fix the checkpoint so this doesn't repeat

plan_next=$(grep '^\- \[ \]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null \

| grep -v '<task description>\|<agent>' \

| head -1 | sed 's/^- \[ \] //')

if [ -n "$plan_next" ]; then

awk -v task="$plan_next" '

/^## Next Task/ { print; print ""; print task; skip=1; next }

skip && /^## / { skip=0 }

!skip { print }

' "$BOTFERENCE_CHECKPOINT_FILE" > "$BOTFERENCE_CHECKPOINT_FILE.tmp" && mv "$BOTFERENCE_CHECKPOINT_FILE.tmp" "$BOTFERENCE_CHECKPOINT_FILE"

echo " Fixed checkpoint Next Task → $CURRENT_AGENT"

echo " Agent detected: $CURRENT_AGENT (${AGENT_PATH})"

else

# Check if this is a completed plan (all tasks checked off) vs empty template

CHECKED=$(grep -c '^\- \[x\]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null) || CHECKED=0

UNCHECKED=$(grep -c '^\- \[ \]' "$BOTFERENCE_PLAN_FILE" 2>/dev/null) || UNCHECKED=0

if [ "$CHECKED" -gt 0 ] && [ "$UNCHECKED" -eq 0 ]; then

echo ""

echo "╔══════════════════════════════════════════════════════════╗"

echo "║ BUILD COMPLETE — all tasks done ║"

echo "╠══════════════════════════════════════════════════════════╣"

echo "║ To archive this thread, run: ║"

echo "║ bash scripts/archive.sh ║"

echo "║ ║"

echo "║ This will move to ${BOTFERENCE_ARCHIVE_DIR}/<date>_<thread>/: ║"

echo "║ checkpoint.md, implementation-plan.md, ║"

echo "║ ai-generated-outputs/<thread>/, reflections/, ║"

echo "║ ${BOTFERENCE_CHANGELOG_FILE##*/}, inbox.md ║"

echo "║ and restore blank templates. ║"

echo "╚══════════════════════════════════════════════════════════╝"

# Auto-compile LaTeX if main.tex exists

if [ -f "main.tex" ]; then

echo "║ Compiling LaTeX..."

compile_result=$(pdflatex -interaction=nonstopmode main.tex 2>&1 && \

bibtex main 2>&1 && \

pdflatex -interaction=nonstopmode main.tex 2>&1 && \

pdflatex -interaction=nonstopmode main.tex 2>&1)

if [ -f "main.pdf" ]; then

echo "║ PDF generated: main.pdf"

else

echo "║ WARNING: LaTeX compilation failed"

echo "$compile_result" | grep "^!" | head -5

else

echo " No task found in checkpoint.md — nothing to do."

echo " Run 'botference plan' to plan next steps,"

echo " or 'bash scripts/archive.sh' to archive."

break

# --- Orchestrated mode: AI-driven dispatch ---

if [ "$ARCH_MODE" = "orchestrated" ]; then

ORCH_RC=0

run_orchestrated_phase || ORCH_RC=$?

if [ "$ORCH_RC" -eq 2 ]; then

# Orchestrator says all tasks complete

echo ""

echo "╔══════════════════════════════════════════════════════════╗"

echo "║ BUILD COMPLETE — orchestrator confirmed all tasks done ║"

echo "╚══════════════════════════════════════════════════════════╝"

break

elif [ "$ORCH_RC" -eq 0 ] && [ -n "$CURRENT_AGENT" ]; then

# Orchestrator set CURRENT_AGENT for serial dispatch — fall through to build prompt

echo " Agent dispatched by orchestrator: $CURRENT_AGENT"

elif [ "$ORCH_RC" -eq 0 ]; then

# Orchestrator handled everything (parallel dispatch or adaptation)

if ! enforce_build_audit; then

break

capture_eval_metrics || true

echo ""

echo "=== Iteration $ITERATION complete (orchestrated). Fresh context in 3s... ==="

sleep 3

if [ -n "${MAX_ITERATIONS:-}" ] && [ "$ITERATION" -ge "$MAX_ITERATIONS" ]; then

echo "=== Max iterations ($MAX_ITERATIONS) reached. Exiting. ==="

break

continue

else

# Orchestrator failed — fall through to plan-driven logic

echo " Falling back to plan-driven execution"

# --- Parallel mode: run all tasks in current phase concurrently ---

if [ "$ARCH_MODE" = "parallel" ]; then

CURRENT_PHASE=$(detect_current_phase "$BOTFERENCE_PLAN_FILE")

if [ -n "$CURRENT_PHASE" ] && is_parallel_phase "$CURRENT_PHASE"; then

# Validate dependencies before running in parallel

if ! validate_phase_dependencies "$BOTFERENCE_PLAN_FILE" "$CURRENT_PHASE"; then

echo " ⚠ Dependency check failed — falling back to serial execution"

else

run_parallel_phase "$CURRENT_PHASE"

PARALLEL_RC=$?

if ! enforce_build_audit; then

break

capture_eval_metrics || true

echo ""

echo "=== Iteration $ITERATION complete (parallel phase). Fresh context in 3s... ==="

sleep 3

if [ -n "${MAX_ITERATIONS:-}" ] && [ "$ITERATION" -ge "$MAX_ITERATIONS" ]; then

echo "=== Max iterations ($MAX_ITERATIONS) reached. Exiting. ==="

break

continue

# Not a parallel phase — fall through to serial execution

echo " Phase not marked (parallel) — running serially"

# --- Snapshot plan for single-task enforcement ---

PLAN_BEFORE_SNAPSHOT="$BOTFERENCE_RUN/plan-before-${ITERATION}.md"

cp "$BOTFERENCE_PLAN_FILE" "$PLAN_BEFORE_SNAPSHOT"

# --- Build prompt ---

PROMPT=$(cat "$PROMPT_FILE")

# Inbox: absorb operator notes if present

if [ -s "$BOTFERENCE_INBOX_FILE" ]; then

echo " 📬 Absorbing operator notes from inbox.md"

PROMPT="## Operator Notes (read and act on these first)"$'\n\n'"$(cat "$BOTFERENCE_INBOX_FILE")"$'\n\n'"$PROMPT"

: > "$BOTFERENCE_INBOX_FILE"

if $PIPE_MODE; then

# Non-interactive: pipe prompt, background claude, poll for context via JSONL monitor

CLAUDE_MODEL=$(resolve_model "$CURRENT_AGENT")

echo " Model: $(resolve_cli_model "$CLAUDE_MODEL")"

# Create start marker for JSONL monitor (before launching claude)

touch "$BOTFERENCE_RUN/monitor-start"

# Launch JSONL context monitor in background

# Search: BOTFERENCE_HOME first (framework), then GITHUB_WORKSPACE, then CWD

MONITOR_SCRIPT=""

for _s in "${BOTFERENCE_HOME}/.github/scripts/botference-monitor.sh" \

"${GITHUB_WORKSPACE:-.}/.github/scripts/botference-monitor.sh" \

".github/scripts/botference-monitor.sh"; do

if [ -x "$_s" ]; then MONITOR_SCRIPT="$_s"; break; fi

done

if [ -n "$MONITOR_SCRIPT" ]; then

CONTEXT_WINDOW=$(resolve_context_window "$CLAUDE_MODEL")

# 1M windows yield earlier; smaller windows use a 45% threshold.

if [ "$CONTEXT_WINDOW" -ge 1000000 ] 2>/dev/null; then

CONTEXT_THRESHOLD=20

else

CONTEXT_THRESHOLD=45

bash "$MONITOR_SCRIPT" "$CONTEXT_THRESHOLD" "$CONTEXT_WINDOW" &

JSONL_MONITOR_PID=$!

echo " JSONL context monitor started (pid $JSONL_MONITOR_PID)"

# Auth-detection: Anthropic model but no API key → use claude -p fallback (OAuth/Max plan)

USE_CLAUDE_FALLBACK=false

AGENT_SYSTEM_PROMPT=""

if is_anthropic_model "$CLAUDE_MODEL" && ! has_anthropic_api_key; then

USE_CLAUDE_FALLBACK=true

echo " No API key detected — using claude -p fallback (OAuth/Max plan)"

AGENT_SYSTEM_PROMPT=$(build_claude_system_prompt "$CURRENT_AGENT")

# Launch agent runner with bash-level retries for transient failures

AGENT_ATTEMPT=0

while true; do

rm -f "$BOTFERENCE_RUN/output.json"

if $USE_CLAUDE_FALLBACK; then

MCP_CONFIG=$(build_mcp_config "$CURRENT_AGENT")

resolve_model_and_effort "$CLAUDE_MODEL" "$CURRENT_AGENT"

echo "$PROMPT" | claude --model "$CLI_MODEL" \

$EFFORT_FLAG \

--tools "" \

--mcp-config "$MCP_CONFIG" \

--append-system-prompt "$AGENT_SYSTEM_PROMPT" \

--output-format stream-json \

--verbose \

--dangerously-skip-permissions \

| python3 "${BOTFERENCE_HOME}/lib/stream-filter.py" "$BOTFERENCE_RUN/output.json" &

CLAUDE_PID=$!

else

echo "$PROMPT" | python3 "${BOTFERENCE_HOME}/core/botference_agent.py" --agent "$CURRENT_AGENT" --task - --model "$CLAUDE_MODEL" --output-json "$BOTFERENCE_RUN/output.json" &

CLAUDE_PID=$!

# Wait for first context reading (after ignore window) and print it

for i in $(seq 1 30); do

now=$(date +%s)

if [ "$now" -ge "$IGNORE_UNTIL" ] && [ -f "$CTX_FILE" ]; then

if [[ "$OSTYPE" == "darwin"* ]]; then

file_time=$(stat -f %m "$CTX_FILE" 2>/dev/null || echo 0)

else

file_time=$(stat -c %Y "$CTX_FILE" 2>/dev/null || echo 0)

if [ "$file_time" -ge "$IGNORE_UNTIL" ]; then

start_pct=$(cat "$CTX_FILE" 2>/dev/null | tr -d '[:space:]')

if [ -n "$start_pct" ] 2>/dev/null; then

echo " Starting context: ${start_pct}%"

break

sleep 2

done

monitor_context "$CLAUDE_PID" "$ITER_START" "$IGNORE_UNTIL" "$CURRENT_AGENT" &

MONITOR_PID=$!

wait "$CLAUDE_PID" 2>/dev/null

EXIT_CODE=$?

CLAUDE_PID=""

cleanup_pid "$MONITOR_PID"; MONITOR_PID=""

# Success — break out of retry loop

if [ "$EXIT_CODE" -eq 0 ]; then

break

# Check if the failure looks transient (no output JSON = crash before any work)

AGENT_ATTEMPT=$((AGENT_ATTEMPT + 1))

if [ "$AGENT_ATTEMPT" -ge "$AGENT_MAX_RETRIES" ]; then

echo " Agent $CURRENT_AGENT failed after $((AGENT_ATTEMPT + 1)) attempts (exit $EXIT_CODE)"

break

delay=${AGENT_RETRY_DELAYS[$((AGENT_ATTEMPT - 1))]:-45}

echo " [agent retry] $CURRENT_AGENT exited $EXIT_CODE, attempt $((AGENT_ATTEMPT + 1))/$((AGENT_MAX_RETRIES + 1)), waiting ${delay}s..."

sleep "$delay"

done

# Stop JSONL monitor if running

cleanup_pid "$JSONL_MONITOR_PID"; JSONL_MONITOR_PID=""

# Log post-run usage summary from --output-format json

if [ -f "$BOTFERENCE_RUN/output.json" ]; then

print_output_json_summary "$BOTFERENCE_RUN/output.json"

# Persist usage data to logs/usage.jsonl

# Extract agent name from checkpoint.md "**Last agent:**" field

AGENT_NAME=$(extract_agent_name)

log_usage_from_output_json "$BOTFERENCE_RUN/output.json" "$ITERATION" "$AGENT_NAME" "$LOOP_MODE" "$CURRENT_THREAD" \

&& echo " Usage logged to $USAGE_LOG" \

|| echo " (could not log usage data)"

# --- Eval capture ---

post_iteration

if ! enforce_build_audit; then

EXIT_CODE=1

else

# Interactive: monitor runs in background, agent gets the terminal

monitor_context "$$" "$ITER_START" "$IGNORE_UNTIL" "$CURRENT_AGENT" &

MONITOR_PID=$!

CLAUDE_MODEL=$(resolve_model "$CURRENT_AGENT")

if is_openai_model "$CLAUDE_MODEL"; then

# OpenAI models: use codex CLI for interactive TUI (uses codex's own tools),

# fall back to botference_agent.py (uses botference's per-agent tool registry)

rm -f "$BOTFERENCE_RUN/output.json"

if command -v codex >/dev/null 2>&1; then

echo " Model: $CLAUDE_MODEL (OpenAI — using codex CLI)"

codex --model "$CLAUDE_MODEL" --full-auto "$PROMPT"

EXIT_CODE=$?

else

echo " Model: $CLAUDE_MODEL (OpenAI — codex CLI not found, using botference_agent.py)"

echo "$PROMPT" | python3 "${BOTFERENCE_HOME}/core/botference_agent.py" --agent "$CURRENT_AGENT" --task - --model "$CLAUDE_MODEL" --output-json "$BOTFERENCE_RUN/output.json"

EXIT_CODE=$?

cleanup_pid "$MONITOR_PID"; MONITOR_PID=""

# Log usage

AGENT_NAME=$(extract_agent_name)

if [ -f "$BOTFERENCE_RUN/output.json" ]; then

print_output_json_summary "$BOTFERENCE_RUN/output.json"

log_usage_from_output_json "$BOTFERENCE_RUN/output.json" "$ITERATION" "$AGENT_NAME" "$LOOP_MODE" "$CURRENT_THREAD" \

&& echo " Usage logged to $USAGE_LOG" \

|| echo " (could not log usage data)"

else

echo " (codex interactive — usage not logged)"

else

# Anthropic models: use claude CLI for interactive TUI

AGENT_SYSTEM_PROMPT=$(build_claude_system_prompt "$CURRENT_AGENT")

SESSION_ID=$(uuidgen | tr '[:upper:]' '[:lower:]')

resolve_model_and_effort "$CLAUDE_MODEL" "$CURRENT_AGENT"

echo " Model: $CLI_MODEL (interactive build — claude CLI${EFFORT_FLAG:+, effort: ${EFFORT_FLAG#--effort }})"

echo "$PROMPT" | claude --model "$CLI_MODEL" $EFFORT_FLAG \

--append-system-prompt "$AGENT_SYSTEM_PROMPT" \

--session-id "$SESSION_ID" --dangerously-skip-permissions

EXIT_CODE=$?

#!/usr/bin/env bash

extract_next_task_from_checkpoint() {

local checkpoint_path=$1

local next_task=""

next_task=$(grep -i '^\*\*Next Task\*\*:\|^Next Task:' "$checkpoint_path" 2>/dev/null \

| head -1 | sed 's/.*: *//' | sed 's/\*//g')

if [ -z "$next_task" ]; then

next_task=$(awk '/^## Next Task/{found=1; next} found && /[^ ]/{print; exit}' \

"$checkpoint_path" 2>/dev/null)

echo "$next_task" | sed 's/([^)]*)//g; s/\*//g; s/^ *//; s/ *$//'

}

extract_first_unchecked_task_block() {

local plan_path=$1

awk '

/^- \[ \]/ {

if (in_block) exit

in_block=1

}

in_block {

if (/^- \[[ x]\]/ || /^###+? /) exit

}

' "$plan_path" 2>/dev/null

}

extract_agent_from_task_block() {

local task_block=$1

local agent=""

agent=$(printf "%s\n" "$task_block" \

| grep -o '\*\*[^*][^*]*\*\*' \

| tail -1 \

| sed 's/\*\*//g' \

| sed 's/[^a-zA-Z0-9_-]//g')

if [ -z "$agent" ]; then

agent=$(printf "%s\n" "$task_block" \

| tail -1 \

| sed 's/^ *//; s/ *$//' )

agent="${agent##* }"

agent=$(echo "$agent" | sed 's/[^a-zA-Z0-9_-]//g')

echo "$agent"

}

detect_agent_from_checkpoint() {

local checkpoint_path=$1

local plan_path=${2:-}

local next_task=""

next_task=$(extract_next_task_from_checkpoint "$checkpoint_path")

# Determine if next_task is a terminal/non-task state.

# Structured task lines start with a digit or checkbox ("- [").

# Anything else is prose (e.g. "Thread ready for review") or an

# explicit terminal marker — fall through to the plan.

local is_terminal=false

case "$next_task" in

is_terminal=true

;;

if ! echo "$next_task" | grep -qE '^[0-9]|^- \['; then

is_terminal=true

;;

esac

if $is_terminal; then

if [ -n "$plan_path" ]; then

next_task=$(extract_first_unchecked_task_block "$plan_path")

if printf "%s\n" "$next_task" | grep -q '<task description>\|<agent>'; then

next_task=""

else

next_task=""

if [ -z "$next_task" ]; then

echo ""

return

local agent

agent=$(extract_agent_from_task_block "$next_task")

# Validate: extracted word must correspond to a real agent file.

# Prevents annotations (e.g. "TDD") from being mistaken for agents.

if [ -n "$agent" ]; then

local agent_file

agent_file=$(resolve_agent_path "$agent")

if [ -z "$agent_file" ]; then

echo ""

return

echo "$agent"

(shared entry) parse_loop_args reads the command-line arguments and sets LOOP_MODE — this is the variable that determines whether Botference runs in plan, research-plan, build, or init mode.

(plan / research-plan) The interactive-only guard. plan and research-plan reject the -p (pipe/headless) flag. You cannot plan headlessly because the whole point of these modes is human steering.

(plan / research-plan) The plan mode entry. If LOOP_MODE is plan or research-plan, CURRENT_AGENT defaults to “plan”. The script resolves models for both Claude and Codex — Claude’s model comes from resolve_model("plan"), while Codex defaults to gpt-5.4.

(research-plan only) The research-plan fork. If the mode is research-plan, the script loads the plan.md agent file as a system prompt — this gives both models awareness of the research agents (scout, deep-reader, critic, etc.) before the session starts. The else branch sets PLAN_SYSTEM="", so no system prompt is loaded in plan mode.

(plan / research-plan) The Ink TUI council launch. node ink-ui/dist/bin.js starts the React/Ink terminal UI with both models. The system prompt and task are written to temp files to avoid shell escaping issues.

(plan / research-plan) The Python fallback. If the Ink UI is not available, python3 core/botference.py launches the same council session with the same model flags. Both backends produce the same output artifact: an implementation-plan.md.

(handoff: plan ends, build begins) Plan mode is one-shot. After the session, the script prints “Planning session complete. Run ‘build’ to start executing.” and breaks out of the loop. Build mode picks up from here: the iteration counter increments, and the loop begins detecting agents from the checkpoint. The detection logic is annotated in the detect.sh file tab, but there are more botference.sh annotations below.

(headless build: API vs MCP choice) Auth detection for headless builds. If the model is Anthropic but no API key is found, the script assumes it should use the Claude CLI/OAuth subscription path instead of the direct API path. It sets USE_CLAUDE_FALLBACK=true and builds the system prompt. This is where the MCP path gets activated. The larger dispatch logic is supported by three predicates defined in exec.sh — is_openai_model, has_anthropic_api_key, and is_anthropic_model — which are covered in the agent tool surface section below.
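The decision described above can be sketched in Python. This is only an illustrative translation of the three shell predicates in exec.sh; `choose_headless_path` and its return labels are hypothetical names introduced here, not part of Botference.

```python
from fnmatch import fnmatchcase

# Same glob patterns as the case statement in is_openai_model (exec.sh).
OPENAI_PATTERNS = ("gpt-*", "o1*", "o3*", "o4*")


def is_openai_model(model: str) -> bool:
    return any(fnmatchcase(model, pattern) for pattern in OPENAI_PATTERNS)


def is_anthropic_model(model: str) -> bool:
    # Anything not matched as OpenAI is treated as Anthropic.
    return not is_openai_model(model)


def has_anthropic_api_key(key: str) -> bool:
    # Regular API keys start with sk-ant-api; OAuth tokens (sk-ant-oat*) do not count.
    return key.startswith("sk-ant-api")


def choose_headless_path(model: str, api_key: str) -> str:
    """Hypothetical helper: label the path a headless build would take."""
    if is_anthropic_model(model) and not has_anthropic_api_key(api_key):
        return "mcp-fallback"
    return "direct-api"
```

Note that an OAuth token behaves like a missing key here, which is why a subscription login lands on the MCP path.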

(headless build: MCP fallback) The MCP fallback command. The prompt is piped into the Claude CLI with --tools "" (blanking native tools) and --mcp-config pointing to a generated config that starts fallback_agent_mcp.py. The agent gets exactly the tools its registry permits, nothing more. The MCP server itself is annotated under fallback_agent_mcp.py below.
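For orientation, here is a rough Python sketch of the config that build_mcp_config writes, based on the heredoc in exec.sh. The function signature and the example path `/opt/botference` are assumptions made for illustration; the real function is shell and also picks a Python interpreter of version 3.10 or newer.

```python
import json


def build_mcp_config(framework_home, agent_name, python="python3", work_dir=None):
    """Return the MCP config dict for one agent (illustrative sketch)."""
    args = [f"{framework_home}/core/fallback_agent_mcp.py", agent_name]
    server = {"command": python, "args": args}
    if work_dir:
        # Worktree case: run the server there and tell it so explicitly.
        server["cwd"] = work_dir
        args += ["--cwd", work_dir]
    return {"mcpServers": {"botference-tools": server}}


if __name__ == "__main__":
    # /opt/botference is a made-up install location.
    print(json.dumps(build_mcp_config("/opt/botference", "coder"), indent=2))
```

The only per-agent input is the agent name, so the tool boundary is decided inside the MCP server rather than in the config.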

(headless build: direct API) The direct API command. When an API key is available, the prompt goes through botference_agent.py, which calls the Anthropic/OpenAI API directly with the shared tool registry.

(interactive build) What happens when you drop -p. Without the headless flag, Botference launches an interactive claude session with --append-system-prompt. No MCP, no tool blanking — the human is in the loop and can steer directly.

extract_agent_from_task_block grabs the last bold word from a task in implementation-plan.md. A line like - [ ] 1.2 Write the auth module — **coder** yields coder. This single string is the join key across all four layers: prompt, model, tools, agent file.
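A small Python rendering of that extraction may make the rule concrete. It mirrors the grep/sed pipeline in detect.sh as I read it (last bold span, else last word of the last line, then strip anything outside letters, digits, underscore and hyphen); treat it as a sketch, not the shipped code.

```python
import re


def extract_agent_from_task_block(task_block: str) -> str:
    # Bold spans are matched per line, like grep -o in the shell version.
    bold = re.findall(r"\*\*([^*\n]+)\*\*", task_block)
    if bold:
        candidate = bold[-1]
    else:
        # Fallback: last whitespace-separated word of the last line.
        lines = task_block.splitlines()
        words = lines[-1].split() if lines else []
        candidate = words[-1] if words else ""
    # Keep only characters that can appear in an agent name.
    return re.sub(r"[^a-zA-Z0-9_-]", "", candidate)
```

For example, `extract_agent_from_task_block("- [ ] 1.2 Write the auth module **coder**")` returns `coder`.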

The detection flow. detect_agent_from_checkpoint reads the “Next Task” field from checkpoint.md. If that field contains end-of-run markers or prose instead of a real task line, it falls through to the first unchecked task in implementation-plan.md.
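The fall-through can be sketched as follows. Only the prose check and the template-placeholder check are modelled; the explicit end-of-run markers are not reproduced because their patterns are not shown in the listing. `pick_task` is a hypothetical name.

```python
import re

# Structured task lines start with a digit or a checkbox ("- [").
STRUCTURED_TASK = re.compile(r"^[0-9]|^- \[")


def pick_task(next_task: str, first_unchecked_block: str) -> str:
    """Return the task text to extract an agent from, or "" if there is none."""
    if STRUCTURED_TASK.search(next_task):
        return next_task
    # Prose or empty: fall back to the plan, unless it is still the blank template.
    if "<task description>" in first_unchecked_block or "<agent>" in first_unchecked_block:
        return ""
    return first_unchecked_block
```

An empty result corresponds to the shell function echoing an empty string, which the main loop treats as "nothing to do".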

Validation. The extracted agent name is passed to resolve_agent_path, which checks three locations in precedence order: project-local first, then .claude/agents/ in the working directory, then the framework’s own agents directory. The three-tier order matters when Botference runs in a brownfield codebase that has its own pre-defined agents. Project-local comes first because Botference lets me add agents beyond its built-in ones; the downside is potential name clashes, so I typically add those additions as explicit tasks in the plan. If no file matches in any of the three locations, the agent is rejected.
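As a sketch of the three-tier lookup, assuming the directory layout shown in the path preamble (a project agent directory, `.claude/agents/` in the working directory, and `.claude/agents/` under the framework home). The parameter names are mine, and the reserved-name rule that build_claude_system_prompt applies is left out for brevity.

```python
from pathlib import Path
from typing import Optional


def resolve_agent_path(name, project_agent_dir, work_dir, framework_home) -> Optional[Path]:
    """Return the first existing agent file in precedence order, else None."""
    candidates = [
        Path(project_agent_dir) / f"{name}.md",                      # project-local
        Path(work_dir) / ".claude" / "agents" / f"{name}.md",        # compat path
        Path(framework_home) / ".claude" / "agents" / f"{name}.md",  # built-in
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` plays the role of the empty string in the shell version: the caller rejects the agent.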

The agent tool surface

These files are the machinery that botference.sh calls into when a build agent needs to run:

exec.sh resolves the model and constructs the prompt and MCP config.
fallback_agent_mcp.py adapts the tool registry for CLI execution. It is a model-agnostic bridge: its logic never branches on Claude or Codex; it takes an agent name, builds a tool set, and speaks MCP over stdio.
__init__.py is the shared tool registry. This mapping feeds both the direct API runner and the MCP fallback.

Together, these files implement the coding-agent pattern discussed in RalPhD: each agent is bound to a role-specific tool set rather than the full kit.
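To make "role-specific tool set" concrete, here is a toy version of the filtering step that the MCP bridge performs. The three dictionaries below are cut-down stand-ins with invented contents, not the real registry, and the lookup of custom agents from their markdown files is omitted.

```python
# Toy registry: tool name -> definition (contents are placeholders).
TOOLS = {
    "read_file": {"description": "Read a file"},
    "pdf_metadata": {"description": "Inspect a PDF"},
    "gh": {"description": "GitHub CLI wrapper"},
}

# Tools executed by the API provider rather than locally.
SERVER_TOOLS = {"web_search": {"name": "web_search"}}

# Per-agent allow-lists (abbreviated for the example).
AGENT_TOOLS = {
    "scout": ["read_file", "web_search", "pdf_metadata"],
    "coder": ["read_file", "gh"],
}


def client_side_tools(agent_name: str) -> list:
    """Tools the MCP server would expose for this agent."""
    names = AGENT_TOOLS.get(agent_name, [])
    return [n for n in names if n in TOOLS and n not in SERVER_TOOLS]
```

With this toy data, `client_side_tools("scout")` yields `["read_file", "pdf_metadata"]`: `web_search` is dropped because it is a server-side tool, and `gh` never appears because it is not on the scout's list.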

#!/usr/bin/env bash

resolve_model() {
    local agent_name="${1:-}"
    local budgets_file="${BOTFERENCE_HOME}/context-budgets.json"
    local model=""

    # ANTHROPIC_MODEL is a global override: when set, it wins over per-agent config.
    # This lets `ANTHROPIC_MODEL=gpt-5.4 botference -p build` run all agents on GPT-5.4.
    if [ -n "${ANTHROPIC_MODEL:-}" ]; then
        echo "$ANTHROPIC_MODEL"
        return
    fi

    # Otherwise check per-agent model in context-budgets.json
    if [ -n "$agent_name" ] && [ -f "$budgets_file" ] && command -v jq >/dev/null 2>&1; then
        model=$(jq -r --arg a "$agent_name" '.[$a].model // empty' "$budgets_file" 2>/dev/null || true)
    fi

    if [ -z "$model" ]; then
        model="${ANTHROPIC_MODEL:-claude-opus-4-6}"
    fi

    echo "$model"
}

resolve_effort() {
    local agent_name="${1:-}"
    local budgets_file="${BOTFERENCE_HOME}/context-budgets.json"
    local effort=""

    if [ -n "$agent_name" ] && [ -f "$budgets_file" ] && command -v jq >/dev/null 2>&1; then
        effort=$(jq -r --arg a "$agent_name" '.[$a].effort // empty' "$budgets_file" 2>/dev/null || true)
    fi

    echo "$effort"
}

resolve_model_and_effort() {
    # Resolves CLI model name and effort flag for a given agent.
    # Sets globals: CLI_MODEL, EFFORT_FLAG
    # Usage: resolve_model_and_effort <model> <agent_name>
    local model="${1:-}"
    local agent_name="${2:-}"
    local effort

    CLI_MODEL=$(resolve_cli_model "$model")
    effort=$(resolve_effort "$agent_name")

    EFFORT_FLAG=""
    if [ -n "$effort" ]; then
        EFFORT_FLAG="--effort $effort"
    fi
}

is_openai_model() {

local model="${1:-}"

case "$model" in

gpt-*|o1*|o3*|o4*) return 0 ;;

*) return 1 ;;

esac

}

# Returns 0 if ANTHROPIC_API_KEY is set to a regular API key (sk-ant-api*).

# OAuth tokens (sk-ant-oat*) and missing keys both return 1.

has_anthropic_api_key() {

local key="${ANTHROPIC_API_KEY:-}"

[ -z "$key" ] && return 1

case "$key" in

sk-ant-api*) return 0 ;;

*) return 1 ;;

esac

}

# Returns 0 if the model is an Anthropic model (anything not matched by is_openai_model).

is_anthropic_model() {

local model="${1:-}"

is_openai_model "$model" && return 1

return 0

}

# Build the full system prompt for claude -p headless mode.

# Outputs to stdout: path preamble (if needed) + file layout preamble + agent .md.

build_claude_system_prompt() {

local agent_name="${1:-}"

local project_agent_path="${BOTFERENCE_PROJECT_AGENT_DIR}/${agent_name}.md"

local compat_path=".claude/agents/${agent_name}.md"

local framework_path="${BOTFERENCE_HOME}/.claude/agents/${agent_name}.md"

local is_reserved=false

if reserved_agent_names | grep -qx "$agent_name"; then

is_reserved=true

# Resolve agent file with the same precedence as botference_agent.py/tools/__init__.py

local agent_file=""

if $is_reserved && ! project_agent_override_allowed "$agent_name"; then

agent_file="$framework_path"

elif [ -f "$project_agent_path" ]; then

agent_file="$project_agent_path"

elif [ -f "$compat_path" ]; then

agent_file="$compat_path"

elif [ -f "$framework_path" ]; then

agent_file="$framework_path"

else

echo "Error: agent '${agent_name}' not found in workspace or framework" >&2

return 1

# Build path preamble (mirrors botference_agent.py:build_path_preamble)

local cwd rh

cwd=$(pwd -P)

rh=$(cd "$BOTFERENCE_HOME" && pwd -P)

if [ "$rh" != "$cwd" ]; then

cat <<PREAMBLE_EOF

## Path Context

botference is running as an engine on a separate project.

- **BOTFERENCE_HOME** (framework): \`${rh}\`

- **Working directory** (project): \`${cwd}\`

File paths in this prompt use short names. Resolve them as follows:

- **Framework files** — prefix with BOTFERENCE_HOME:

\`specs/*\`, \`templates/*\`, \`prompt-*.md\`

Example: \`specs/writing-style.md\` → \`${rh}/specs/writing-style.md\`

- **Agent files** — project-local first: \`botference/agents/{name}.md\`,

then \`.claude/agents/{name}.md\`, then BOTFERENCE_HOME built-ins

- **Project files** — relative to working directory

PREAMBLE_EOF

local work_rel build_rel

work_rel=$(python3 - <<'PY'
import os
from pathlib import Path

project = Path(os.environ["BOTFERENCE_PROJECT_ROOT"]).resolve()
work = Path(os.environ["BOTFERENCE_WORK_DIR"]).resolve()
print(os.path.relpath(work, project))
PY
)

build_rel=$(python3 - <<'PY'
import os
from pathlib import Path

project = Path(os.environ["BOTFERENCE_PROJECT_ROOT"]).resolve()
build = Path(os.environ["BOTFERENCE_BUILD_DIR"]).resolve()
print(os.path.relpath(build, project))
PY
)

# File layout preamble — always emitted (mirrors _build_file_layout_preamble)

cat <<LAYOUT_EOF

## File Layout

Thread state files and generated outputs live in dedicated directories.

The build system resolves paths automatically.

Use bare names in conversation and plans — the mapping is:

- **Thread files** (\`checkpoint.md\`, \`implementation-plan.md\`, \`inbox.md\`,

\`HUMAN_REVIEW_NEEDED.md\`, \`iteration_count\`):

Under \`${work_rel}/\`.

- **Generated outputs** (\`AI-generated-outputs/\`, \`logs/\`, \`run/\`):

Under \`${build_rel}/\`.

LAYOUT_EOF

# Agent .md content

cat "$agent_file"

# Tools are exposed via MCP server (core/fallback_agent_mcp.py), not via bash template.

# No tool-via-bash appendix needed.

}

# Generate a temporary MCP config JSON pointing to core/fallback_agent_mcp.py for the given agent.

# Outputs the path to the config file.

build_mcp_config() {

local agent_name="${1:-}"

local work_dir="${2:-}"

local config_file="${BOTFERENCE_RUN}/mcp-${agent_name}.json"

# mcp requires Python ≥3.10; find the best available interpreter

local py="python3"

for candidate in python3.13 python3.12 python3.11 python3.10; do

if command -v "$candidate" >/dev/null 2>&1; then

py="$candidate"

break

done

# If a work_dir is specified (worktree), set cwd so the MCP server

# resolves file paths relative to the worktree, not the main project.

local abs_work_dir=""

local cwd_line=""

local extra_args=""

if [ -n "$work_dir" ] && [ "$work_dir" != "." ]; then

abs_work_dir=$(cd "$work_dir" && pwd)

cwd_line="\"cwd\": \"${abs_work_dir}\","

extra_args=", \"--cwd\", \"${abs_work_dir}\""

cat > "$config_file" <<EOF

{

"mcpServers": {

"botference-tools": {

${cwd_line}

"command": "${py}",

"args": ["${BOTFERENCE_HOME}/core/fallback_agent_mcp.py", "${agent_name}"${extra_args}]

}

EOF

#!/usr/bin/env python3

"""Fallback agent runner — MCP server exposing botference's per-agent tool registry.

Usage: python3 core/fallback_agent_mcp.py <agent_name> [--cwd <dir>]

This is the fallback execution path used when no API key is available.

It wraps the tool registry as an MCP stdio server so that `claude -p

--mcp-config <config>` can call botference's tools natively — preserving

truncation, redaction, and per-agent tool boundaries.

Peer of botference_agent.py (the primary agent runner that calls the

Anthropic/OpenAI API directly).

Server-side tools (e.g. web_search) are skipped — Claude handles those

internally.

"""

import asyncio

import sys

import os

# Ensure botference's root is on the path

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from mcp.server import Server

from mcp.server.stdio import stdio_server

from mcp.types import Tool, TextContent

from tools import TOOLS, AGENT_TOOLS, DEFAULT_TOOLS, SERVER_TOOLS, execute_tool, get_tools_for_agent

_LOG_FILE = os.environ.get("BOTFERENCE_MCP_LOG", "")

def _log(msg: str):
    if _LOG_FILE:
        with open(_LOG_FILE, "a") as f:
            f.write(f"[MCP] {msg}\n")


def build_server(agent_name: str) -> Server:
    """Create an MCP server with tools scoped to the given agent."""
    # Use get_tools_for_agent which checks hardcoded registry first,
    # then parses ## Tools from the agent's .md file for custom agents.
    tool_names, _ = get_tools_for_agent(agent_name)

    # Filter to client-side tools that exist in the registry
    active_tools = [n for n in tool_names if n in TOOLS and n not in SERVER_TOOLS]

    server = Server(f"botference-{agent_name}")

    @server.list_tools()
    async def list_tools():
        return [
            Tool(
                name=name,
                description=TOOLS[name].get("description", ""),
                inputSchema=TOOLS[name].get("input_schema", {
                    "type": "object", "properties": {}
                }),
            )
            for name in active_tools
        ]

    @server.call_tool()
    async def call_tool(name: str, arguments: dict):
        _log(f"tool_call: {name} args={arguments}")
        result = execute_tool(name, arguments)
        _log(f"tool_done: {name} result_len={len(str(result))}")
        return [TextContent(type="text", text=str(result))]

    return server

from __future__ import annotations

"""Tool registry for botference_agent.py.

Collects tool definitions from submodules and provides per-agent registries.

Adding a tool = adding it to the right submodule's TOOLS dict + AGENT_TOOLS here.

"""

import os

import re

from pathlib import Path

from typing import Optional

from tools.core import TOOLS as _core_tools

from tools.checks import TOOLS as _checks_tools

from tools.pdf import TOOLS as _pdf_tools

from tools.search import TOOLS as _search_tools

from tools.download import TOOLS as _download_tools

from tools.claims import TOOLS as _claims_tools

from tools.interact import TOOLS as _interact_tools

from tools.github import TOOLS as _github_tools

from tools.latex import TOOLS as _latex_tools

from tools.verify import TOOLS as _verify_tools

# ── Merged registry ───────────────────────────────────────────

TOOLS = {}

TOOLS.update(_core_tools)

TOOLS.update(_checks_tools)

TOOLS.update(_pdf_tools)

TOOLS.update(_search_tools)

TOOLS.update(_download_tools)

TOOLS.update(_claims_tools)

TOOLS.update(_interact_tools)

TOOLS.update(_github_tools)

TOOLS.update(_latex_tools)

TOOLS.update(_verify_tools)

# ── Per-agent tool registries ─────────────────────────────────

# Every agent gets the essentials listed in _ESSENTIALS below.

# Note: bash and git_push are currently part of that shared set, so every agent has shell access.

_ESSENTIALS = ["read_file", "write_file", "bash", "git_commit", "git_push", "list_files", "code_search"]

# Server-side tools — executed by the API, not locally.

# Keyed by tool name; values are the raw tool definitions sent to the API.

SERVER_TOOLS = {

"web_search": {"type": "web_search_20250305", "name": "web_search"},

}

AGENT_TOOLS = {

"paper-writer": _ESSENTIALS + ["check_language", "citation_lint", "compile_latex"],

"critic": _ESSENTIALS + ["check_language", "check_journal", "check_figure", "check_claims", "citation_verify_all", "verify_cited_claims", "build_cited_tracker_from_tex"],

"scout": _ESSENTIALS + ["web_search", "pdf_metadata", "citation_lookup", "citation_verify", "citation_verify_all", "citation_manifest", "citation_download"],

"deep-reader": _ESSENTIALS + ["pdf_metadata", "extract_figure", "view_pdf_page"],

"research-coder": _ESSENTIALS,

"figure-stylist": _ESSENTIALS + ["check_figure", "view_pdf_page"],

"editor": _ESSENTIALS + ["check_claims", "check_language", "citation_lint", "citation_verify_all", "verify_cited_claims", "build_cited_tracker_from_tex"],

"coherence-reviewer": _ESSENTIALS + ["check_claims", "check_language"],

"provocateur": _ESSENTIALS + [],

"synthesizer": _ESSENTIALS + ["citation_lint", "citation_verify_all"],

"triage": _ESSENTIALS + ["pdf_metadata", "citation_verify_all"],

"coder": _ESSENTIALS + ["gh"],

# plan mode uses claude CLI (not botference_agent.py) — no tool registry needed

}

resolve_model — per-agent model selection. Checks ANTHROPIC_MODEL as a global override first, then looks up per-agent config in context-budgets.json. This means different agents can run on different models within the same build loop.
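The precedence described above can be sketched in a few lines of Python. This is a minimal illustration, not Botference's actual code: the JSON layout (`{"<agent>": {"model": "<id>"}}`), the `DEFAULT_MODEL` fallback, and the function signature are all assumptions made for the example.

```python
import json
import os
from pathlib import Path

# Hypothetical fallback; the real default is not shown in this essay.
DEFAULT_MODEL = "default-model"


def resolve_model(agent: str, config_path: Path, env=None) -> str:
    """Pick a model: global env override first, then per-agent config."""
    env = os.environ if env is None else env
    override = env.get("ANTHROPIC_MODEL")
    if override:
        return override
    if config_path.is_file():
        budgets = json.loads(config_path.read_text())
        # Assumed layout: {"<agent>": {"model": "<model id>"}}
        model = budgets.get(agent, {}).get("model")
        if model:
            return model
    return DEFAULT_MODEL
```

Passing `env` explicitly keeps the lookup testable without touching the real environment; called with the default, it reads `os.environ`.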

build_claude_system_prompt resolves the agent’s markdown file from one of three locations: project-local, .claude/agents/, or the framework. This mirrors the same project-first precedence used elsewhere in Botference.

After the path and file-layout preambles are emitted, the resolved agent markdown is appended to the prompt. Tools are not embedded in the prompt — they come separately through the shared registry³.
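A rough Python sketch of that project-first lookup and the append step follows. The real logic lives in Botference's own scripts and is not reproduced here; the directory names other than `.claude/agents/`, the `.md` file naming, and both function names are assumptions for illustration.

```python
from pathlib import Path
from typing import List, Optional


def find_agent_markdown(agent: str, project_dir: Path, framework_dir: Path) -> Optional[Path]:
    """Return the first existing agent file, checking the project before the framework."""
    candidates = [
        project_dir / "agents" / f"{agent}.md",              # project-local (assumed layout)
        project_dir / ".claude" / "agents" / f"{agent}.md",  # Claude agents directory
        framework_dir / "agents" / f"{agent}.md",            # framework default (assumed layout)
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None


def build_system_prompt(preambles: List[str], agent_file: Path) -> str:
    """Emit the preambles first, then append the resolved agent markdown."""
    return "\n\n".join([*preambles, agent_file.read_text()])
```

Note that the prompt carries no tool definitions, which matches the separation described above: scoping happens in the registry, not in the text the model reads.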

The punchline of build_mcp_config: a heredoc that writes a temporary JSON config telling the Claude CLI to spawn fallback_agent_mcp.py as an MCP stdio server. botference.sh only passes --mcp-config; the concrete Python server path is generated here at runtime and cleaned up after.
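The same idea, expressed in Python rather than a shell heredoc, might look like the sketch below. It assumes the `mcpServers` layout used by Claude Code's MCP configuration files; the server label `botference-tools`, the function name, and the argument order passed to the script are hypothetical.

```python
import json
import os
import sys
import tempfile


def write_mcp_config(server_script: str, agent: str) -> str:
    """Write a temporary MCP config file and return its path."""
    config = {
        "mcpServers": {
            # "botference-tools" is a hypothetical label for the stdio server.
            "botference-tools": {
                "command": sys.executable,
                "args": [server_script, agent],  # argument order is an assumption
            }
        }
    }
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(config, handle)
    return path
```

The caller would hand the returned path to `--mcp-config` and delete the file with `os.remove(path)` once the CLI exits, mirroring the cleanup step mentioned above.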

build_server creates an MCP server scoped to the given agent. It calls get_tools_for_agent to determine which tools this agent is allowed, then filters out server-side tools like web_search — those are handled by the model natively, not by local Python handlers.
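To make the filtering step concrete, here is a small sketch with toy registries. The body of `get_tools_for_agent` is not shown in this walkthrough, so the version below (including its empty-list behaviour for unknown agents) is an assumption, and `local_tools_for_agent` is a hypothetical helper name.

```python
# Toy stand-ins for the registries shown in this walkthrough.
SERVER_TOOLS = {"web_search": {"type": "web_search_20250305", "name": "web_search"}}
AGENT_TOOLS = {
    "scout": ["read_file", "web_search", "pdf_metadata"],
    "research-coder": ["read_file"],
}


def get_tools_for_agent(agent: str) -> list:
    """Return the tool names registered for an agent (empty if unknown)."""
    return list(AGENT_TOOLS.get(agent, []))


def local_tools_for_agent(agent: str) -> list:
    """Drop server-side tools, keeping only the ones executed locally."""
    return [name for name in get_tools_for_agent(agent) if name not in SERVER_TOOLS]
```

With these toy values, `local_tools_for_agent("scout")` keeps `read_file` and `pdf_metadata` and drops `web_search`, which is the set the MCP server would then advertise.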

The MCP server’s read side. list_tools returns only the scoped tool set for the current agent.

The MCP server’s execution side. call_tool executes a requested tool and returns the result using the same execute_tool dispatcher that the direct API path uses.
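A shared dispatcher of this kind could be as small as the sketch below. The registry entries in this essay show `description` and `input_schema` keys; the `function` key holding the handler, the `echo` tool, and the error-string format are assumptions for illustration rather than Botference's actual implementation.

```python
# Toy registry with a single tool; the "function" key is an assumption.
TOOLS = {
    "echo": {
        "description": "Return the given text unchanged.",
        "input_schema": {"type": "object", "properties": {"text": {"type": "string"}}},
        "function": lambda args: args["text"],
    },
}


def execute_tool(name: str, arguments: dict) -> str:
    """Look up a tool by name and run its handler, reporting errors as text."""
    entry = TOOLS.get(name)
    if entry is None:
        return f"Unknown tool: {name}"
    try:
        return str(entry["function"](arguments))
    except Exception as exc:
        # Return the failure as text so the caller can show it to the model.
        return f"Error in {name}: {exc}"
```

Because both the MCP wrapper and the direct API path would call the same function, a tool behaves identically regardless of which route invoked it.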

_ESSENTIALS and SERVER_TOOLS. Every agent gets the essentials (read_file, write_file, bash, git_commit, git_push, list_files, code_search). SERVER_TOOLS defines capabilities like web_search that the model handles natively — the MCP wrapper skips these.

AGENT_TOOLS — the per-agent scoping. scout gets web search, citation, and PDF tools; critic gets language checks, figure checks, and claim verification; coder gets just the essentials plus GitHub. The closing comment in the dict notes that plan mode uses the Claude CLI directly, so it does not need this build-agent registry.

End

The Codetalk you just scrolled through is itself a Botference artifact — planned in its Council and Caucus, later built via build -p (with some minor touch-ups from within Claude Code) and then annotated from within Obsidian. The annotations are not exhaustive. I did not spotlight every function or trace every edge case. I chose specific lines — the mode guard, the bold-word convention, the MCP heredoc — because those are where the architectural ideas live.

This is the practice I want to carry forward when agents write code — especially where no visual or interactive testing is possible. I asked the agents to annotate their output here, and then exercised the editorial judgment of further titrating because they wanted to highlight all of the code and write verbose explanations.

AI-human teams might (I am tempted to use “will”, but I want to be measured) end up doing amazing things; for that we will need to find better ways to work together. Maybe things like Botference will help; maybe they will hinder. But the introduction of learning frictions will matter once agent-written code accelerates us toward new potentialities. If you are curious to try Botference, you can do so here.

GitHub - angadhn/botference: A TUI chat for you, Claude Code, and Codex to collaborate

¹ I have given a talk to engineers at GitHub Next on engineering experimental harnesses towards hallucination-free PhD-level research paper-writing, and have run internal workshops for academics and students on agentic tools in physical sciences and engineering.
² Not if, but when, because it is only a matter of time before they surpass my understanding, either by using coding jargon I am unfamiliar with or by deciding on what might be a better implementation architecture for an idea.
³ The asymmetry is interesting: Claude agents get system-level tool boundaries via --tools "" plus MCP scoping, while Codex agents get the honor system: codex --full-auto with no granular tool restriction. An open issue has been requesting per-tool control since October 2025.
