Architecture Design¶

TL;DR¶

A summary of the system architecture, component design, and data flows of Agenter.

The SDK follows a simple layered design pattern from top user-facing layers to bottom coding agent backends:

Adapters (LangGraph, PydanticAI)
→ Facade (AutonomousCodingAgent)
→ Runtime (CodingSession)
→ Backends (Anthropic, Claude Code, Codex, OpenHands, ACP).

System Overview¶

┌─────────────────────────────────────────────────────────────┐
│                    User Applications                        │
├─────────────────────────────┬───────────────────────────────┤
│  LangGraph Adapter          │  PydanticAI Adapter           │
│  (adapters/langgraph.py)    │  (adapters/pydantic_ai.py)    │
└─────────────────────────────┴───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│              AutonomousCodingAgent (Facade)                 │
│  - execute(request) -> CodingResult                         │
│  - stream_execute() -> AsyncIterator[CodingEvent]           │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                   CodingSession                             │
│  - Manages iteration loop (code → validate → fix → retry)   │
│  - Emits events for observability                           │
│  - Enforces budget limits                                   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                    CodingBackend Protocol (Abstract)                                  │
├───────────────────────┬───────────────────────┬─────────────────────┬───────────────────┬─────────────┤
│  AnthropicSDKBackend  │   ClaudeCodeBackend   │    CodexBackend     │ OpenHandsBackend  │  ACPBackend │
│  (anthropic SDK)      │  (claude-code-sdk)    │  (openai-agents)    │  (openhands-sdk)  │  (ACP CLI)  │
└───────────────────────┴───────────────────────┴─────────────────────┴───────────────────┴─────────────┘

Layer Responsibilities¶

Layer 1: User Applications (Adapters)¶

The top layer represents adapters for open-ended agentic coding frameworks that integrate the SDK into different agentic workflows.

Adapter	Framework	Use Case
`langgraph.py`	LangGraph	`create_coding_node()` for StateGraph workflows
`pydantic_ai.py`	PydanticAI	`CodingAgent` for direct execution with PydanticAI-like interface

Adapters are thin wrappers and facades that translate between framework conventions and the SDK’s unified interface.

Layer 2: AutonomousCodingAgent (Facade)¶

The public API surface. Users interact only with this class.

Responsibilities:

Instantiate and configure coding agent backends
Provide sync/async execution methods
Aggregate results from coding sessions
Hide all internal complexity of the layers below

Interface:

execute(request: CodingRequest) -> CodingResult
stream_execute(request: CodingRequest) -> AsyncIterator[CodingEvent]

Layer 3: CodingSession¶

The core orchestration logic for a coding session. Manages the iteration loop within the coding session.

Responsibilities:

Execute a backend with a prompt
Run validators on outputs
Retry on validation failures (with error context in the updated prompt)
Enforce budget limits (tokens, cost, time, iterations)
Emit events for observability

Iteration Loop:

┌─────────────────────────────────────────────────────────────┐
│                    CodingSession.run()                      │
├─────────────────────────────────────────────────────────────┤
│  FOREACH iteration (up to max_iterations):                  │
│    1. Check budget limits                                   │
│    2. Execute backend.execute(prompt)                       │
│    3. Collect files modified                                │
│    4. Run validators (e.g. syntax or security)              │
│    5. IF validation passed THEN return COMPLETED            │
│    6. IF budget exceeded THEN return BUDGET_EXCEEDED        │
│    7. Prepare a retry prompt with validation errors         │
│  END → return FAILED                                        │
└─────────────────────────────────────────────────────────────┘

Layer 4: CodingBackend Protocol¶

Abstract interfaces implemented for each distinct coding agent backend.

Responsibilities:

Connect to a backend (subprocess, SDK client)
Execute prompts and obtain stream responses
Track files modified
Report token usage and estimated cost

Backend Comparison:

Backend	Wraps	Agent Logic	Runtime	Key Feature
AnthropicSDKBackend	`anthropic`	Custom tool-use loop	HTTP (async client)	Full control, custom tools, AWS Bedrock
ClaudeCodeBackend	`claude-code-sdk`	Claude Code	Claude Code CLI	Battle-tested tools, AWS Bedrock, Google Vertex
CodexBackend	`openai-agents`	Codex MCP server	MCP over stdio	OpenAI models, custom MCP tools, sandbox modes
OpenHandsBackend	`openhands-sdk`	OpenHands agent	litellm	Any model, no sandbox (full access)
ACPBackend	`agent-client-protocol`	Any ACP-compatible agent	JSON-RPC over stdio	Interoperability with ACP agents

Which backend to use?

Use Case and Scenario	Recommended Backend	Config
Custom tools, full control	`AnthropicSDKBackend`	`backend="anthropic-sdk"` (default)
Battle-tested Claude Code tools	`ClaudeCodeBackend`	`backend="claude-code", model="claude-sonnet-4-5-20250929", claude_max_thinking_tokens=8192`
Need skills, slash commands, MCP	`ClaudeCodeBackend`	`backend="claude-code"`
OpenAI models (o3, GPT-5-Codex)	`CodexBackend`	`backend="codex", model="gpt-5.4", codex_reasoning_effort="high"`
Custom MCP tools with OpenAI	`CodexBackend`	`backend="codex", codex_mcp_servers=[...]`
Any model via litellm	`OpenHandsBackend`	`backend="openhands", sandbox=False`
Any ACP-compatible agent process	`ACPBackend`	`backend="acp", acp_command="codex-acp", acp_args=["-c", "model=\"gpt-5.4\"", "-c", "model_reasoning_effort=\"high\""]`

Note: All backends default to sandbox=True. Use sandbox=False for unrestricted access.

Backend SDK Patterns¶

Each backend SDK presents a different API surface. Here’s how they work and how we abstract them:

Anthropic SDK Backend (Implemented)¶

# Actual implementation uses anthropic SDK directly
import anthropic

client = anthropic.AsyncAnthropic()

response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16384,
    system="You are an autonomous coding agent...",
    tools=[...],  # File tools: read_file, write_file, edit_file
    messages=[{"role": "user", "content": "Fix the bug"}],
)

# Tool use loop handles file operations
for block in response.content:
    if block.type == "tool_use":
        result = execute_tool(block.name, block.input)
        # Continue conversation with tool result

Key Abstractions:

anthropic.AsyncAnthropic() → async HTTP client
Custom tool-use loop manages file operations
Supports both Anthropic API and AWS Bedrock via boto3
No subprocess or CLI dependency

Claude Code SDK Backend (Implemented)¶

# Uses claude-code-sdk (Claude Code as a library)
from claude_code_sdk import query, ClaudeCodeOptions

# Safe mode with native sandbox (default)
options = ClaudeCodeOptions(
    cwd="/path/to/project",
    model="claude-sonnet-4-5-20250929",
    max_thinking_tokens=8192,
    allowed_tools=["Read", "Edit", "Write", "Bash", "Glob"],
    sandbox={"enabled": True, "autoAllowBashIfSandboxed": True},
    permission_mode="default",
)

async for message in query(prompt="Fix the bug", options=options):
    # message is AssistantMessage, ToolResultMessage, or ResultMessage
    print(message)

Key Abstractions:

query() → streams messages as AsyncIterator
Built-in tools: Read, Edit, Write, Bash, Glob, Grep, etc.
Native OS-level sandbox support
Supports AWS Bedrock (CLAUDE_CODE_USE_BEDROCK=1), Google Vertex, Microsoft Foundry
Same tools that power Claude Code

When to use:

You want production-ready tools maintained by Anthropic
You need Claude Code features (skills, slash commands, MCP servers)
You want native OS-level sandboxing

Codex Backend (Implemented)¶

# Uses openai-agents SDK MCPServerStdio for MCP communication
from agents.mcp import MCPServerStdio

mcp_server = MCPServerStdio(
    name="codex",
    params={"command": "codex", "args": ["mcp-server"]},
)
await mcp_server.connect()

# Call codex tool to start a session
result = await mcp_server.call_tool("codex", {
    "prompt": "Fix the bug",
    "cwd": "/path/to/project",
    "approval-policy": "never",
    "sandbox": "workspace-write",
    "model": "o3",
    "config": {"model_reasoning_effort": "high"},
})

# Continue with codex-reply for subsequent messages
result = await mcp_server.call_tool("codex-reply", {
    "prompt": "Now add tests",
    "conversationId": result["conversationId"],
})

Key Abstractions:

MCPServerStdio → manages subprocess lifecycle
MCP tools: codex (start session) and codex-reply (continue)
Configurable sandbox and approval policies
Custom MCP servers passed via config parameter

When to use:

You want to use OpenAI reasoning models (o3, GPT-5-Codex)
You need specific approval policies (untrusted, on-request, on-failure, never)
You want to extend Codex with custom MCP tools

Codex Backend Tool Limitations¶

The Codex backend runs custom tools in a subprocess via MCP. Tools are serialized using cloudpickle.

Note: Tools which capture unpicklable state will fail silently.

When tools fail to pickle:

Warning logged: custom_tools_not_picklable
Tools are dropped — agent runs without them
No error raised (silent failure)

Writing Codex-compatible tools:

✅ Do:

Use module-level functions (not lambdas or closures)
Import dependencies inside the function
Pass dynamic state via environment variables or temp files

❌ Don’t:

Capture self or instance variables in closures
Capture async clients, locks, or trace recorders
Use lambda functions that capture outer scope

Example — BAD (captures self):

def _create_tools(self):
    async def wrapper(inputs):
        return await self.some_method(inputs)  # Captures self - NOT picklable!
    return [FunctionTool(func=wrapper)]

Example — GOOD (stateless module-level function):

# Module level - no closure, fully picklable
async def _stateless_wrapper(inputs: dict) -> str:
    from mymodule import some_function  # Import inside function
    return await some_function(inputs)

def _create_tools(self):
    return [FunctionTool(func=_stateless_wrapper)]

Note: For tools that need dynamic state (such as mas_code that changes per call), use environment variables or temporary files to pass data to the subprocess.

ACP Backend (Implemented)¶

agent = AutonomousCodingAgent(
    backend="acp",
    acp_command="codex-acp",
    acp_args=["-c", 'model="gpt-5.4"', "-c", 'model_reasoning_effort="high"'],
    acp_permission_policy="allow",
)

Key Abstractions:

Spawns an ACP-compatible agent subprocess over stdio
Creates an ACP session with the request working directory
Sends prompts through session/prompt
Serves ACP client file reads and writes inside the request working directory
Adds Agenter’s autonomous backend contract by default and auto-continues once if an interactive ACP agent asks for confirmation instead of editing
Converts ACP session updates into Agenter backend messages
Tracks modified files by comparing workspace snapshots

When to use:

You want to run an ACP-compatible agent through Agenter’s validation and event model
You need an agent that already speaks ACP but does not have a native Agenter backend

Security note: ACP agents are external processes. Agenter observes file changes after execution, but sandbox enforcement depends on the launched ACP agent and its own configuration.

Abstraction Strategy¶

The CodingBackend protocol normalizes backend differences:

┌─────────────────────────────────────────────────────────────────────────────────┐
│                           CodingBackend Protocol                                │
├─────────────────────────────────────────────────────────────────────────────────┤
│  connect(cwd, allowed_write_paths, resume_session_id, output_type, system_prompt)│
│  execute(prompt: str)     → AsyncIterator[BackendMessage]                       │
│  modified_files()         → ModifiedFiles                                       │
│  usage()                  → Usage (tokens, cost)                                │
│  structured_output()      → BaseModel | None                                    │
│  refusal()                → RefusalMessage | None                               │
│  disconnect()             → Cleanup                                             │
└─────────────────────────────────────────────────────────────────────────────────┘
                              │
    ┌─────────────┬───────────┼───────────┬─────────────┐
    │             │           │           │             │
    ▼             ▼           ▼           ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────────────┐
│ Anthropic │ │ ClaudeAgt │ │  Codex    │ │   OpenHands      │
│ Backend   │ │ Backend   │ │ Backend   │ │   Backend        │
│ messages. │ │ query()   │ │ call_tool │ │ Conversation.run │
│ create()  │ │           │ │           │ │                  │
└───────────┘ └───────────┘ └───────────┘ └──────────────────┘

Protocol Method Mappings:

Protocol Method	AnthropicSDKBackend	ClaudeCodeBackend	CodexBackend	OpenHandsBackend
`connect(cwd)`	Set `PathResolver(cwd)`	Set `cwd` option	Set in MCP params	Set in Conversation
`execute(prompt)`	`messages.create()` + tool loop	`query(prompt, options)`	`call_tool("codex")`	`Conversation.run()`
`modified_files()`	Track via FileOperations	Parse from messages	Parse from result	Extract from events
`usage()`	Track tokens + litellm pricing	Extract from SDK	Extract from result	Track via litellm
`structured_output()`	Parse from tool call	Parse from tool call	Parse from response text	Parse from response text
`refusal()`	Capture Refusal tool	Capture Refusal tool	Capture Refusal tool	Capture Refusal tool

Data Flow¶

Execution Flow¶

┌──────────────────────────────────────────────────────────────────┐
│                         CodingRequest                            │
│  prompt, cwd, system_prompt, budget, output_type                 │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                    AutonomousCodingAgent.execute()               │
│  1. Create CodingSession with config                             │
│  2. Initialize backend                                           │
│  3. Run session.run(request)                                     │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                      CodingSession.run()                         │
│  LOOP (max_iterations):                                          │
│    backend.execute(prompt) → stream BackendMessages              │
│    Collect files modified                                        │
│    Run validators → ValidationResult                             │
│    IF passed THEN COMPLETED                                      │
│    IF budget exceeded THEN BUDGET_EXCEEDED                       │
│    Prepare retry prompt with errors                              │
│  END LOOP → FAILED                                               │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                         CodingResult                             │
│  status, files, summary, iterations, metrics                     │
└──────────────────────────────────────────────────────────────────┘

Event Streams¶

For observability, the session emits events throughout its execution:

Event Type	When	Emitted Data
`SESSION_START`	Session initialized	cwd, model, max_iterations
`ITERATION_START`	Beginning iteration N	iteration number
`BACKEND_MESSAGE`	Message from backend	message_type, content, tool_name
`VALIDATION_START`	About to run validators	validators list, file_count
`VALIDATION_RESULT`	Result from one validator	validator name, passed, errors
`ITERATION_END`	Iteration complete	iteration, passed, files_modified, tokens, cost
`COMPLETED`	Task completed successfully	files, summary, iterations, tokens, cost
`FAILED`	Task failed	status, files, summary, iterations, tokens, cost
`REFUSED`	LLM declined the request	refusal reason, category
`SESSION_END`	Session finished (always emitted)	status, iterations, tokens, cost

Event lifecycle:

SESSION_START → ITERATION_START → BACKEND_MESSAGE* →
VALIDATION_START → VALIDATION_RESULT* → ITERATION_END →
(repeat iterations) → COMPLETED/FAILED/REFUSED → SESSION_END

Validation Framework¶

Validator Protocol¶

Validators run against modified files and return pass/fail with errors.

Validator	Checks	Blocking	When to Use
SyntaxValidator	Python AST parsing	Yes	Always (default, instant feedback)
SecurityValidator	Bandit static analysis	No (advisory)	Security-sensitive code

Note: Validators are configurable via AutonomousCodingAgent(validators=[...]). Default is [SyntaxValidator()]. SecurityValidator uses Bandit to detect vulnerabilities (eval, hardcoded secrets, SQL injection, etc.) and is non-blocking by default. Custom validators can be added by implementing the Validator protocol.

Validation Flow¶

Files Modified
      │
      ▼
┌─────────────┐
│   Syntax    │
│  Validator  │
└─────────────┘
      │
      ▼
   Errors?
      │
      ▼
ValidationResult
(passed, errors)

Validators run sequentially via ValidatorChain. Blocking validators (like SyntaxValidator) short-circuit on failure.

Budget Enforcement¶

Budget Criteria¶

Criterion	Unit	Enforcement
Tokens	count	Sum across iterations, hard stop
Cost	USD	Estimated from token usage, hard stop
Time	seconds	Wall clock from session start, hard stop
Iterations	count	Loop limit, hard stop

Budget Checkpoints¶

Budget is checked:

before each iteration starts
after each backend execution completes
before retry prompt is sent

If any limit is exceeded, the session returns BUDGET_EXCEEDED status with an explanation.

Configuration¶

Config Hierarchy¶

AutonomousCodingAgent
├── Backend Selection
│   └── backend: "anthropic-sdk" | "claude-code" | "codex" | "openhands" | "acp"
├── Security
│   └── sandbox: bool = True  (unified sandbox control)
├── Backend-Specific (anthropic-sdk backend)
│   ├── model: str  (e.g., "claude-sonnet-4-20250514")
│   ├── tools: list[Tool]  (custom tools)
│   └── use_anthropic_tools: bool  (use text_editor_20250728)
├── Backend-Specific (claude-code backend)
│   ├── allowed_tools: ["Read", "Edit", "Write", "Bash", ...]
│   ├── setting_sources: ["project", "user"]
│   └── claude_max_thinking_tokens: int
├── Backend-Specific (codex backend)
│   ├── codex_approval_policy: "never" | "on-request" | "on-failure" | "untrusted"
│   ├── codex_mcp_servers: list[CodexMCPServer]  (custom MCP tools)
│   └── codex_reasoning_effort: "minimal" | "low" | "medium" | "high"
├── Backend-Specific (openhands backend)
│   ├── model: str  (litellm format, e.g., "openai/gpt-4o")
│   └── sandbox: False  (required - no sandbox support)
├── Backend-Specific (acp backend)
│   ├── acp_command: str
│   ├── acp_args: list[str]
│   ├── acp_env: dict[str, str]
│   ├── acp_mcp_servers: list[Any]
│   ├── acp_permission_policy: "deny" | "allow"
│   └── acp_autonomous: bool
├── Safeguards
│   ├── max_iterations: 5
│   ├── max_tokens: 500_000
│   ├── max_cost_usd: 10.0
│   └── max_time_seconds: 3600
└── Validation
    └── validators: ["syntax", "security"]  (security is non-blocking by default)

Error Handling¶

Exception vs. Status¶

Exceptions are raised from unrecoverable errors that prevent execution:

ConfigurationError: Invalid configuration (wrong backend name, missing keys)
BackendError: Backend connection or execution failure
BudgetExceededError: Only if raise_on_budget_exceeded=True

Status codes are returned for expected completion states:

CodingStatus.COMPLETED: Task finished successfully within budget
CodingStatus.COMPLETED_WITH_LIMIT_EXCEEDED: Task succeeded but exceeded budget limits
CodingStatus.BUDGET_EXCEEDED: Stopped before completion due to limits (default behavior)
CodingStatus.REFUSED: LLM declined the request
CodingStatus.FAILED: Task couldn’t complete (validation never passed)

Category	Handling	Recovery
Invalid configuration	Raise `ConfigurationError`	Fix config, retry
Backend connection failure	Raise `BackendError`	Check credentials/network
Backend execution error	Raise `BackendError`	Check backend health
Validation failure	Retry iteration with context	Max iterations limit
Budget exceeded	Return `BUDGET_EXCEEDED` status	Caller decides
LLM refusal	Return `REFUSED` status	Modify prompt

Security Considerations¶

Unified Sandbox Mode¶

All backends default to sandbox=True for safe operation:

Backend	`sandbox=True` (default)	`sandbox=False`
AnthropicSDKBackend	PathResolver enforces `allowed_write_paths`	Writes anywhere in cwd
ClaudeCodeBackend	Native OS-level sandbox	`bypassPermissions` mode
CodexBackend	`workspace-write` mode	`danger-full-access` mode
OpenHandsBackend	Not supported (raises error)	Full filesystem access
ACPBackend	Depends on launched ACP agent	Depends on launched ACP agent

Usage Examples¶

# Safe mode (default) - all backends sandboxed
agent = AutonomousCodingAgent(backend="claude-code")

# Disable sandbox for full access
agent = AutonomousCodingAgent(backend="claude-code", sandbox=False)

File System Scope¶

sandbox=True: Operations restricted based on a given backend’s sandbox implementation
sandbox=False: Full filesystem access

API Key Management¶

Keys passed via environment variables or config
Never logged or included in error messages
Backend-specific key names: ANTHROPIC_API_KEY, OPENAI_API_KEY

Framework Adapters¶

LangGraph Adapter¶

create_coding_node() returns an async function compatible with StateGraph.add_node().

Input: State dict with prompt key (and optional cwd)

Output: State update with coding_result dict containing CodingResult fields

from agenter.adapters.langgraph import create_coding_node, CodingState

graph = StateGraph(CodingState)
graph.add_node("coder", create_coding_node(cwd="./workspace"))

PydanticAI Adapter¶

CodingAgent provides direct execution with a PydanticAI-like interface without extra LLM layers.

Input: Prompt and cwd via run() method

Output: CodingResult with status, files modified, and summary

from agenter.adapters.pydantic_ai import CodingAgent

agent = CodingAgent(backend="anthropic-sdk")
result = await agent.run("Implement the feature", cwd="./workspace")