Architecture Design

TL;DR

A summary of the system architecture, component design, and data flows of Agenter.

The SDK follows a simple layered design pattern from top user-facing layers to bottom coding agent backends:

  • Adapters (LangGraph, PydanticAI)

  • Facade (AutonomousCodingAgent)

  • Runtime (CodingSession)

  • Backends (Anthropic, Claude Code, Codex, OpenHands, ACP).


System Overview

┌─────────────────────────────────────────────────────────────┐
│                    User Applications                        │
├─────────────────────────────┬───────────────────────────────┤
│  LangGraph Adapter          │  PydanticAI Adapter           │
│  (adapters/langgraph.py)    │  (adapters/pydantic_ai.py)    │
└─────────────────────────────┴───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│              AutonomousCodingAgent (Facade)                 │
│  - execute(request) -> CodingResult                         │
│  - stream_execute() -> AsyncIterator[CodingEvent]           │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                   CodingSession                             │
│  - Manages iteration loop (code → validate → fix → retry)   │
│  - Emits events for observability                           │
│  - Enforces budget limits                                   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                    CodingBackend Protocol (Abstract)                                  │
├───────────────────────┬───────────────────────┬─────────────────────┬───────────────────┬─────────────┤
│  AnthropicSDKBackend  │   ClaudeCodeBackend   │    CodexBackend     │ OpenHandsBackend  │  ACPBackend │
│  (anthropic SDK)      │  (claude-code-sdk)    │  (openai-agents)    │  (openhands-sdk)  │  (ACP CLI)  │
└───────────────────────┴───────────────────────┴─────────────────────┴───────────────────┴─────────────┘

Layer Responsibilities

Layer 1: User Applications (Adapters)

The top layer represents adapters for open-ended agentic coding frameworks that integrate the SDK into different agentic workflows.

Adapter

Framework

Use Case

langgraph.py

LangGraph

create_coding_node() for StateGraph workflows

pydantic_ai.py

PydanticAI

CodingAgent for direct execution with PydanticAI-like interface

Adapters are thin wrappers and facades that translate between framework conventions and the SDK’s unified interface.

Layer 2: AutonomousCodingAgent (Facade)

The public API surface. Users interact only with this class.

Responsibilities:

  • Instantiate and configure coding agent backends

  • Provide sync/async execution methods

  • Aggregate results from coding sessions

  • Hide all internal complexity of the layers below

Interface:

  • execute(request: CodingRequest) -> CodingResult

  • stream_execute(request: CodingRequest) -> AsyncIterator[CodingEvent]

Layer 3: CodingSession

The core orchestration logic for a coding session. Manages the iteration loop within the coding session.

Responsibilities:

  • Execute a backend with a prompt

  • Run validators on outputs

  • Retry on validation failures (with error context in the updated prompt)

  • Enforce budget limits (tokens, cost, time, iterations)

  • Emit events for observability

Iteration Loop:

┌─────────────────────────────────────────────────────────────┐
│                    CodingSession.run()                      │
├─────────────────────────────────────────────────────────────┤
│  FOREACH iteration (up to max_iterations):                  │
│    1. Check budget limits                                   │
│    2. Execute backend.execute(prompt)                       │
│    3. Collect files modified                                │
│    4. Run validators (e.g. syntax or security)              │
│    5. IF validation passed THEN return COMPLETED            │
│    6. IF budget exceeded THEN return BUDGET_EXCEEDED        │
│    7. Prepare a retry prompt with validation errors         │
│  END → return FAILED                                        │
└─────────────────────────────────────────────────────────────┘

Layer 4: CodingBackend Protocol

Abstract interfaces implemented for each distinct coding agent backend.

Responsibilities:

  • Connect to a backend (subprocess, SDK client)

  • Execute prompts and obtain stream responses

  • Track files modified

  • Report token usage and estimated cost

Backend Comparison:

Backend

Wraps

Agent Logic

Runtime

Key Feature

AnthropicSDKBackend

anthropic

Custom tool-use loop

HTTP (async client)

Full control, custom tools, AWS Bedrock

ClaudeCodeBackend

claude-code-sdk

Claude Code

Claude Code CLI

Battle-tested tools, AWS Bedrock, Google Vertex

CodexBackend

openai-agents

Codex MCP server

MCP over stdio

OpenAI models, custom MCP tools, sandbox modes

OpenHandsBackend

openhands-sdk

OpenHands agent

litellm

Any model, no sandbox (full access)

ACPBackend

agent-client-protocol

Any ACP-compatible agent

JSON-RPC over stdio

Interoperability with ACP agents

Which backend to use?

Use Case and Scenario

Recommended Backend

Config

Custom tools, full control

AnthropicSDKBackend

backend="anthropic-sdk" (default)

Battle-tested Claude Code tools

ClaudeCodeBackend

backend="claude-code", model="claude-sonnet-4-5-20250929", claude_max_thinking_tokens=8192

Need skills, slash commands, MCP

ClaudeCodeBackend

backend="claude-code"

OpenAI models (o3, GPT-5-Codex)

CodexBackend

backend="codex", model="gpt-5.4", codex_reasoning_effort="high"

Custom MCP tools with OpenAI

CodexBackend

backend="codex", codex_mcp_servers=[...]

Any model via litellm

OpenHandsBackend

backend="openhands", sandbox=False

Any ACP-compatible agent process

ACPBackend

backend="acp", acp_command="codex-acp", acp_args=["-c", "model=\"gpt-5.4\"", "-c", "model_reasoning_effort=\"high\""]

Note: All backends default to sandbox=True. Use sandbox=False for unrestricted access.


Backend SDK Patterns

Each backend SDK presents a different API surface. Here’s how they work and how we abstract them:

Anthropic SDK Backend (Implemented)

# Actual implementation uses anthropic SDK directly
import anthropic

client = anthropic.AsyncAnthropic()

response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16384,
    system="You are an autonomous coding agent...",
    tools=[...],  # File tools: read_file, write_file, edit_file
    messages=[{"role": "user", "content": "Fix the bug"}],
)

# Tool use loop handles file operations
for block in response.content:
    if block.type == "tool_use":
        result = execute_tool(block.name, block.input)
        # Continue conversation with tool result

Key Abstractions:

  • anthropic.AsyncAnthropic() → async HTTP client

  • Custom tool-use loop manages file operations

  • Supports both Anthropic API and AWS Bedrock via boto3

  • No subprocess or CLI dependency

Claude Code SDK Backend (Implemented)

# Uses claude-code-sdk (Claude Code as a library)
from claude_code_sdk import query, ClaudeCodeOptions

# Safe mode with native sandbox (default)
options = ClaudeCodeOptions(
    cwd="/path/to/project",
    model="claude-sonnet-4-5-20250929",
    max_thinking_tokens=8192,
    allowed_tools=["Read", "Edit", "Write", "Bash", "Glob"],
    sandbox={"enabled": True, "autoAllowBashIfSandboxed": True},
    permission_mode="default",
)

async for message in query(prompt="Fix the bug", options=options):
    # message is AssistantMessage, ToolResultMessage, or ResultMessage
    print(message)

Key Abstractions:

  • query() → streams messages as AsyncIterator

  • Built-in tools: Read, Edit, Write, Bash, Glob, Grep, etc.

  • Native OS-level sandbox support

  • Supports AWS Bedrock (CLAUDE_CODE_USE_BEDROCK=1), Google Vertex, Microsoft Foundry

  • Same tools that power Claude Code

When to use:

  • You want production-ready tools maintained by Anthropic

  • You need Claude Code features (skills, slash commands, MCP servers)

  • You want native OS-level sandboxing

Codex Backend (Implemented)

# Uses openai-agents SDK MCPServerStdio for MCP communication
from agents.mcp import MCPServerStdio

mcp_server = MCPServerStdio(
    name="codex",
    params={"command": "codex", "args": ["mcp-server"]},
)
await mcp_server.connect()

# Call codex tool to start a session
result = await mcp_server.call_tool("codex", {
    "prompt": "Fix the bug",
    "cwd": "/path/to/project",
    "approval-policy": "never",
    "sandbox": "workspace-write",
    "model": "o3",
    "config": {"model_reasoning_effort": "high"},
})

# Continue with codex-reply for subsequent messages
result = await mcp_server.call_tool("codex-reply", {
    "prompt": "Now add tests",
    "conversationId": result["conversationId"],
})

Key Abstractions:

  • MCPServerStdio → manages subprocess lifecycle

  • MCP tools: codex (start session) and codex-reply (continue)

  • Configurable sandbox and approval policies

  • Custom MCP servers passed via config parameter

When to use:

  • You want to use OpenAI reasoning models (o3, GPT-5-Codex)

  • You need specific approval policies (untrusted, on-request, on-failure, never)

  • You want to extend Codex with custom MCP tools

Codex Backend Tool Limitations

The Codex backend runs custom tools in a subprocess via MCP. Tools are serialized using cloudpickle.

Note: Tools which capture unpicklable state will fail silently.

When tools fail to pickle:

  • Warning logged: custom_tools_not_picklable

  • Tools are dropped — agent runs without them

  • No error raised (silent failure)

Writing Codex-compatible tools:

Do:

  • Use module-level functions (not lambdas or closures)

  • Import dependencies inside the function

  • Pass dynamic state via environment variables or temp files

Don’t:

  • Capture self or instance variables in closures

  • Capture async clients, locks, or trace recorders

  • Use lambda functions that capture outer scope

Example — BAD (captures self):

def _create_tools(self):
    async def wrapper(inputs):
        return await self.some_method(inputs)  # Captures self - NOT picklable!
    return [FunctionTool(func=wrapper)]

Example — GOOD (stateless module-level function):

# Module level - no closure, fully picklable
async def _stateless_wrapper(inputs: dict) -> str:
    from mymodule import some_function  # Import inside function
    return await some_function(inputs)

def _create_tools(self):
    return [FunctionTool(func=_stateless_wrapper)]

Note: For tools that need dynamic state (such as mas_code that changes per call), use environment variables or temporary files to pass data to the subprocess.

ACP Backend (Implemented)

agent = AutonomousCodingAgent(
    backend="acp",
    acp_command="codex-acp",
    acp_args=["-c", 'model="gpt-5.4"', "-c", 'model_reasoning_effort="high"'],
    acp_permission_policy="allow",
)

Key Abstractions:

  • Spawns an ACP-compatible agent subprocess over stdio

  • Creates an ACP session with the request working directory

  • Sends prompts through session/prompt

  • Serves ACP client file reads and writes inside the request working directory

  • Adds Agenter’s autonomous backend contract by default and auto-continues once if an interactive ACP agent asks for confirmation instead of editing

  • Converts ACP session updates into Agenter backend messages

  • Tracks modified files by comparing workspace snapshots

When to use:

  • You want to run an ACP-compatible agent through Agenter’s validation and event model

  • You need an agent that already speaks ACP but does not have a native Agenter backend

Security note: ACP agents are external processes. Agenter observes file changes after execution, but sandbox enforcement depends on the launched ACP agent and its own configuration.

Abstraction Strategy

The CodingBackend protocol normalizes backend differences:

┌─────────────────────────────────────────────────────────────────────────────────┐
│                           CodingBackend Protocol                                │
├─────────────────────────────────────────────────────────────────────────────────┤
│  connect(cwd, allowed_write_paths, resume_session_id, output_type, system_prompt)│
│  execute(prompt: str)     → AsyncIterator[BackendMessage]                       │
│  modified_files()         → ModifiedFiles                                       │
│  usage()                  → Usage (tokens, cost)                                │
│  structured_output()      → BaseModel | None                                    │
│  refusal()                → RefusalMessage | None                               │
│  disconnect()             → Cleanup                                             │
└─────────────────────────────────────────────────────────────────────────────────┘
                              │
    ┌─────────────┬───────────┼───────────┬─────────────┐
    │             │           │           │             │
    ▼             ▼           ▼           ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────────────┐
│ Anthropic │ │ ClaudeAgt │ │  Codex    │ │   OpenHands      │
│ Backend   │ │ Backend   │ │ Backend   │ │   Backend        │
│ messages. │ │ query()   │ │ call_tool │ │ Conversation.run │
│ create()  │ │           │ │           │ │                  │
└───────────┘ └───────────┘ └───────────┘ └──────────────────┘

Protocol Method Mappings:

Protocol Method

AnthropicSDKBackend

ClaudeCodeBackend

CodexBackend

OpenHandsBackend

connect(cwd)

Set PathResolver(cwd)

Set cwd option

Set in MCP params

Set in Conversation

execute(prompt)

messages.create() + tool loop

query(prompt, options)

call_tool("codex")

Conversation.run()

modified_files()

Track via FileOperations

Parse from messages

Parse from result

Extract from events

usage()

Track tokens + litellm pricing

Extract from SDK

Extract from result

Track via litellm

structured_output()

Parse from tool call

Parse from tool call

Parse from response text

Parse from response text

refusal()

Capture Refusal tool

Capture Refusal tool

Capture Refusal tool

Capture Refusal tool


Data Flow

Execution Flow

┌──────────────────────────────────────────────────────────────────┐
│                         CodingRequest                            │
│  prompt, cwd, system_prompt, budget, output_type                 │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                    AutonomousCodingAgent.execute()               │
│  1. Create CodingSession with config                             │
│  2. Initialize backend                                           │
│  3. Run session.run(request)                                     │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                      CodingSession.run()                         │
│  LOOP (max_iterations):                                          │
│    backend.execute(prompt) → stream BackendMessages              │
│    Collect files modified                                        │
│    Run validators → ValidationResult                             │
│    IF passed THEN COMPLETED                                      │
│    IF budget exceeded THEN BUDGET_EXCEEDED                       │
│    Prepare retry prompt with errors                              │
│  END LOOP → FAILED                                               │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                         CodingResult                             │
│  status, files, summary, iterations, metrics                     │
└──────────────────────────────────────────────────────────────────┘

Event Streams

For observability, the session emits events throughout its execution:

Event Type

When

Emitted Data

SESSION_START

Session initialized

cwd, model, max_iterations

ITERATION_START

Beginning iteration N

iteration number

BACKEND_MESSAGE

Message from backend

message_type, content, tool_name

VALIDATION_START

About to run validators

validators list, file_count

VALIDATION_RESULT

Result from one validator

validator name, passed, errors

ITERATION_END

Iteration complete

iteration, passed, files_modified, tokens, cost

COMPLETED

Task completed successfully

files, summary, iterations, tokens, cost

FAILED

Task failed

status, files, summary, iterations, tokens, cost

REFUSED

LLM declined the request

refusal reason, category

SESSION_END

Session finished (always emitted)

status, iterations, tokens, cost

Event lifecycle:

SESSION_START → ITERATION_START → BACKEND_MESSAGE* →
VALIDATION_START → VALIDATION_RESULT* → ITERATION_END →
(repeat iterations) → COMPLETED/FAILED/REFUSED → SESSION_END

Validation Framework

Validator Protocol

Validators run against modified files and return pass/fail with errors.

Validator

Checks

Blocking

When to Use

SyntaxValidator

Python AST parsing

Yes

Always (default, instant feedback)

SecurityValidator

Bandit static analysis

No (advisory)

Security-sensitive code

Note: Validators are configurable via AutonomousCodingAgent(validators=[...]). Default is [SyntaxValidator()]. SecurityValidator uses Bandit to detect vulnerabilities (eval, hardcoded secrets, SQL injection, etc.) and is non-blocking by default. Custom validators can be added by implementing the Validator protocol.

Validation Flow

Files Modified
      │
      ▼
┌─────────────┐
│   Syntax    │
│  Validator  │
└─────────────┘
      │
      ▼
   Errors?
      │
      ▼
ValidationResult
(passed, errors)

Validators run sequentially via ValidatorChain. Blocking validators (like SyntaxValidator) short-circuit on failure.


Budget Enforcement

Budget Criteria

Criterion

Unit

Enforcement

Tokens

count

Sum across iterations, hard stop

Cost

USD

Estimated from token usage, hard stop

Time

seconds

Wall clock from session start, hard stop

Iterations

count

Loop limit, hard stop

Budget Checkpoints

Budget is checked:

  1. before each iteration starts

  2. after each backend execution completes

  3. before retry prompt is sent

If any limit is exceeded, the session returns BUDGET_EXCEEDED status with an explanation.


Configuration

Config Hierarchy

AutonomousCodingAgent
├── Backend Selection
│   └── backend: "anthropic-sdk" | "claude-code" | "codex" | "openhands" | "acp"
├── Security
│   └── sandbox: bool = True  (unified sandbox control)
├── Backend-Specific (anthropic-sdk backend)
│   ├── model: str  (e.g., "claude-sonnet-4-20250514")
│   ├── tools: list[Tool]  (custom tools)
│   └── use_anthropic_tools: bool  (use text_editor_20250728)
├── Backend-Specific (claude-code backend)
│   ├── allowed_tools: ["Read", "Edit", "Write", "Bash", ...]
│   ├── setting_sources: ["project", "user"]
│   └── claude_max_thinking_tokens: int
├── Backend-Specific (codex backend)
│   ├── codex_approval_policy: "never" | "on-request" | "on-failure" | "untrusted"
│   ├── codex_mcp_servers: list[CodexMCPServer]  (custom MCP tools)
│   └── codex_reasoning_effort: "minimal" | "low" | "medium" | "high"
├── Backend-Specific (openhands backend)
│   ├── model: str  (litellm format, e.g., "openai/gpt-4o")
│   └── sandbox: False  (required - no sandbox support)
├── Backend-Specific (acp backend)
│   ├── acp_command: str
│   ├── acp_args: list[str]
│   ├── acp_env: dict[str, str]
│   ├── acp_mcp_servers: list[Any]
│   ├── acp_permission_policy: "deny" | "allow"
│   └── acp_autonomous: bool
├── Safeguards
│   ├── max_iterations: 5
│   ├── max_tokens: 500_000
│   ├── max_cost_usd: 10.0
│   └── max_time_seconds: 3600
└── Validation
    └── validators: ["syntax", "security"]  (security is non-blocking by default)

Error Handling

Exception vs. Status

Exceptions are raised from unrecoverable errors that prevent execution:

  • ConfigurationError: Invalid configuration (wrong backend name, missing keys)

  • BackendError: Backend connection or execution failure

  • BudgetExceededError: Only if raise_on_budget_exceeded=True

Status codes are returned for expected completion states:

  • CodingStatus.COMPLETED: Task finished successfully within budget

  • CodingStatus.COMPLETED_WITH_LIMIT_EXCEEDED: Task succeeded but exceeded budget limits

  • CodingStatus.BUDGET_EXCEEDED: Stopped before completion due to limits (default behavior)

  • CodingStatus.REFUSED: LLM declined the request

  • CodingStatus.FAILED: Task couldn’t complete (validation never passed)

Category

Handling

Recovery

Invalid configuration

Raise ConfigurationError

Fix config, retry

Backend connection failure

Raise BackendError

Check credentials/network

Backend execution error

Raise BackendError

Check backend health

Validation failure

Retry iteration with context

Max iterations limit

Budget exceeded

Return BUDGET_EXCEEDED status

Caller decides

LLM refusal

Return REFUSED status

Modify prompt


Security Considerations

Unified Sandbox Mode

All backends default to sandbox=True for safe operation:

Backend

sandbox=True (default)

sandbox=False

AnthropicSDKBackend

PathResolver enforces allowed_write_paths

Writes anywhere in cwd

ClaudeCodeBackend

Native OS-level sandbox

bypassPermissions mode

CodexBackend

workspace-write mode

danger-full-access mode

OpenHandsBackend

Not supported (raises error)

Full filesystem access

ACPBackend

Depends on launched ACP agent

Depends on launched ACP agent

Usage Examples

# Safe mode (default) - all backends sandboxed
agent = AutonomousCodingAgent(backend="claude-code")

# Disable sandbox for full access
agent = AutonomousCodingAgent(backend="claude-code", sandbox=False)

File System Scope

  • sandbox=True: Operations restricted based on a given backend’s sandbox implementation

  • sandbox=False: Full filesystem access

API Key Management

  • Keys passed via environment variables or config

  • Never logged or included in error messages

  • Backend-specific key names: ANTHROPIC_API_KEY, OPENAI_API_KEY


Framework Adapters

LangGraph Adapter

create_coding_node() returns an async function compatible with StateGraph.add_node().

Input: State dict with prompt key (and optional cwd)

Output: State update with coding_result dict containing CodingResult fields

from agenter.adapters.langgraph import create_coding_node, CodingState

graph = StateGraph(CodingState)
graph.add_node("coder", create_coding_node(cwd="./workspace"))

PydanticAI Adapter

CodingAgent provides direct execution with a PydanticAI-like interface without extra LLM layers.

Input: Prompt and cwd via run() method

Output: CodingResult with status, files modified, and summary

from agenter.adapters.pydantic_ai import CodingAgent

agent = CodingAgent(backend="anthropic-sdk")
result = await agent.run("Implement the feature", cwd="./workspace")