# Architecture Design

## TL;DR

A summary of the system architecture, component design, and data flows of **Agenter**.

The SDK follows a simple layered design pattern from top user-facing layers to bottom coding agent backends:

* **Adapters** (LangGraph, PydanticAI)
* → **Facade** (AutonomousCodingAgent)
* → **Runtime** (CodingSession)
* → **Backends** (Anthropic, Claude Code, Codex, OpenHands, ACP).

---

## System Overview

```text
┌─────────────────────────────────────────────────────────────┐
│                    User Applications                        │
├─────────────────────────────┬───────────────────────────────┤
│  LangGraph Adapter          │  PydanticAI Adapter           │
│  (adapters/langgraph.py)    │  (adapters/pydantic_ai.py)    │
└─────────────────────────────┴───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│              AutonomousCodingAgent (Facade)                 │
│  - execute(request) -> CodingResult                         │
│  - stream_execute() -> AsyncIterator[CodingEvent]           │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                   CodingSession                             │
│  - Manages iteration loop (code → validate → fix → retry)   │
│  - Emits events for observability                           │
│  - Enforces budget limits                                   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                    CodingBackend Protocol (Abstract)                                  │
├───────────────────────┬───────────────────────┬─────────────────────┬───────────────────┬─────────────┤
│  AnthropicSDKBackend  │   ClaudeCodeBackend   │    CodexBackend     │ OpenHandsBackend  │  ACPBackend │
│  (anthropic SDK)      │  (claude-code-sdk)    │  (openai-agents)    │  (openhands-sdk)  │  (ACP CLI)  │
└───────────────────────┴───────────────────────┴─────────────────────┴───────────────────┴─────────────┘
```

---

## Layer Responsibilities

### Layer 1: User Applications (Adapters)

The top layer represents adapters for open-ended agentic coding frameworks that integrate the SDK into different agentic workflows.

| Adapter | Framework | Use Case |
|---------|-----------|----------|
| `langgraph.py` | LangGraph | `create_coding_node()` for StateGraph workflows |
| `pydantic_ai.py` | PydanticAI | `CodingAgent` for direct execution with PydanticAI-like interface |

Adapters are thin wrappers and facades that translate between framework conventions and the SDK's unified interface.

### Layer 2: AutonomousCodingAgent (Facade)

The public API surface. Users interact only with this class.

**Responsibilities**:

- Instantiate and configure coding agent backends
- Provide sync/async execution methods
- Aggregate results from coding sessions
- Hide all internal complexity of the layers below

**Interface**:

- `execute(request: CodingRequest)` -> `CodingResult`
- `stream_execute(request: CodingRequest)` -> `AsyncIterator[CodingEvent]`

### Layer 3: CodingSession

The core orchestration logic for a coding session. Manages the iteration loop within the coding session.

**Responsibilities**:

- Execute a backend with a prompt
- Run validators on outputs
- Retry on validation failures (with error context in the updated prompt)
- Enforce budget limits (tokens, cost, time, iterations)
- Emit events for observability

**Iteration Loop**:

```text
┌─────────────────────────────────────────────────────────────┐
│                    CodingSession.run()                      │
├─────────────────────────────────────────────────────────────┤
│  FOREACH iteration (up to max_iterations):                  │
│    1. Check budget limits                                   │
│    2. Execute backend.execute(prompt)                       │
│    3. Collect files modified                                │
│    4. Run validators (e.g. syntax or security)              │
│    5. IF validation passed THEN return COMPLETED            │
│    6. IF budget exceeded THEN return BUDGET_EXCEEDED        │
│    7. Prepare a retry prompt with validation errors         │
│  END → return FAILED                                        │
└─────────────────────────────────────────────────────────────┘
```

### Layer 4: CodingBackend Protocol

Abstract interfaces implemented for each distinct coding agent backend.

**Responsibilities**:

- Connect to a backend (subprocess, SDK client)
- Execute prompts and obtain stream responses
- Track files modified
- Report token usage and estimated cost

**Backend Comparison**:

| Backend | Wraps | Agent Logic | Runtime | Key Feature |
|---------|-------|-------------|---------|-------------|
| AnthropicSDKBackend | [`anthropic`](https://github.com/anthropics/anthropic-sdk-python) | Custom tool-use loop | HTTP (async client) | Full control, custom tools, AWS Bedrock |
| ClaudeCodeBackend | [`claude-code-sdk`](https://github.com/anthropics/claude-code-sdk-python) | Claude Code | Claude Code CLI | Battle-tested tools, AWS Bedrock, Google Vertex |
| CodexBackend | [`openai-agents`](https://github.com/openai/openai-agents-python) | Codex MCP server | MCP over stdio | OpenAI models, custom MCP tools, sandbox modes |
| OpenHandsBackend | [`openhands-sdk`](https://github.com/All-Hands-AI/OpenHands) | OpenHands agent | litellm | Any model, no sandbox (full access) |
| ACPBackend | [`agent-client-protocol`](https://agentclientprotocol.github.io/python-sdk/) | Any ACP-compatible agent | JSON-RPC over stdio | Interoperability with ACP agents |

**Which backend to use?**

| Use Case and Scenario | Recommended Backend | Config |
|----------|---------|--------|
| Custom tools, full control | `AnthropicSDKBackend` | `backend="anthropic-sdk"` (default) |
| Battle-tested Claude Code tools | `ClaudeCodeBackend` | `backend="claude-code", model="claude-sonnet-4-5-20250929", claude_max_thinking_tokens=8192` |
| Need skills, slash commands, MCP | `ClaudeCodeBackend` | `backend="claude-code"` |
| OpenAI models (o3, GPT-5-Codex) | `CodexBackend` | `backend="codex", model="gpt-5.4", codex_reasoning_effort="high"` |
| Custom MCP tools with OpenAI | `CodexBackend` | `backend="codex", codex_mcp_servers=[...]` |
| Any model via litellm | `OpenHandsBackend` | `backend="openhands", sandbox=False` |
| Any ACP-compatible agent process | `ACPBackend` | `backend="acp", acp_command="codex-acp", acp_args=["-c", "model=\"gpt-5.4\"", "-c", "model_reasoning_effort=\"high\""]` |

> **Note**: All backends default to `sandbox=True`. Use `sandbox=False` for unrestricted access.

---

## Backend SDK Patterns

Each backend SDK presents a different API surface. Here's how they work and how we abstract them:

### Anthropic SDK Backend (Implemented)

```python
# Actual implementation uses anthropic SDK directly
import anthropic

client = anthropic.AsyncAnthropic()

response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16384,
    system="You are an autonomous coding agent...",
    tools=[...],  # File tools: read_file, write_file, edit_file
    messages=[{"role": "user", "content": "Fix the bug"}],
)

# Tool use loop handles file operations
for block in response.content:
    if block.type == "tool_use":
        result = execute_tool(block.name, block.input)
        # Continue conversation with tool result
```

**Key Abstractions**:
- `anthropic.AsyncAnthropic()` → async HTTP client
- Custom tool-use loop manages file operations
- Supports both Anthropic API and AWS Bedrock via boto3
- No subprocess or CLI dependency

### Claude Code SDK Backend (Implemented)

```python
# Uses claude-code-sdk (Claude Code as a library)
from claude_code_sdk import query, ClaudeCodeOptions

# Safe mode with native sandbox (default)
options = ClaudeCodeOptions(
    cwd="/path/to/project",
    model="claude-sonnet-4-5-20250929",
    max_thinking_tokens=8192,
    allowed_tools=["Read", "Edit", "Write", "Bash", "Glob"],
    sandbox={"enabled": True, "autoAllowBashIfSandboxed": True},
    permission_mode="default",
)

async for message in query(prompt="Fix the bug", options=options):
    # message is AssistantMessage, ToolResultMessage, or ResultMessage
    print(message)
```

**Key Abstractions**:
- `query()` → streams messages as `AsyncIterator`
- Built-in tools: Read, Edit, Write, Bash, Glob, Grep, etc.
- Native OS-level sandbox support
- Supports AWS Bedrock (`CLAUDE_CODE_USE_BEDROCK=1`), Google Vertex, Microsoft Foundry
- Same tools that power Claude Code

**When to use**:
- You want production-ready tools maintained by Anthropic
- You need Claude Code features (skills, slash commands, MCP servers)
- You want native OS-level sandboxing

### Codex Backend (Implemented)

```python
# Uses openai-agents SDK MCPServerStdio for MCP communication
from agents.mcp import MCPServerStdio

mcp_server = MCPServerStdio(
    name="codex",
    params={"command": "codex", "args": ["mcp-server"]},
)
await mcp_server.connect()

# Call codex tool to start a session
result = await mcp_server.call_tool("codex", {
    "prompt": "Fix the bug",
    "cwd": "/path/to/project",
    "approval-policy": "never",
    "sandbox": "workspace-write",
    "model": "o3",
    "config": {"model_reasoning_effort": "high"},
})

# Continue with codex-reply for subsequent messages
result = await mcp_server.call_tool("codex-reply", {
    "prompt": "Now add tests",
    "conversationId": result["conversationId"],
})
```

**Key Abstractions**:
- `MCPServerStdio` → manages subprocess lifecycle
- MCP tools: `codex` (start session) and `codex-reply` (continue)
- Configurable sandbox and approval policies
- Custom MCP servers passed via `config` parameter

**When to use**:
- You want to use OpenAI reasoning models (o3, GPT-5-Codex)
- You need specific approval policies (untrusted, on-request, on-failure, never)
- You want to extend Codex with custom MCP tools

### Codex Backend Tool Limitations

The Codex backend runs custom tools in a **subprocess** via MCP. Tools are serialized using `cloudpickle`. 

**Note**: **Tools which capture unpicklable state will fail silently.**

**When tools fail to pickle:**
- Warning logged: `custom_tools_not_picklable`
- Tools are dropped — agent runs without them
- No error raised (silent failure)

**Writing Codex-compatible tools:**

✅ **Do:**
- Use module-level functions (not lambdas or closures)
- Import dependencies inside the function
- Pass dynamic state via environment variables or temp files

❌ **Don't:**
- Capture `self` or instance variables in closures
- Capture async clients, locks, or trace recorders
- Use lambda functions that capture outer scope

**Example — BAD (captures self):**
```python
def _create_tools(self):
    async def wrapper(inputs):
        return await self.some_method(inputs)  # Captures self - NOT picklable!
    return [FunctionTool(func=wrapper)]
```

**Example — GOOD (stateless module-level function):**
```python
# Module level - no closure, fully picklable
async def _stateless_wrapper(inputs: dict) -> str:
    from mymodule import some_function  # Import inside function
    return await some_function(inputs)

def _create_tools(self):
    return [FunctionTool(func=_stateless_wrapper)]
```

**Note**: For tools that need dynamic state (such as `mas_code` that changes per call), use environment variables or temporary files to pass data to the subprocess.

### ACP Backend (Implemented)

```python
agent = AutonomousCodingAgent(
    backend="acp",
    acp_command="codex-acp",
    acp_args=["-c", 'model="gpt-5.4"', "-c", 'model_reasoning_effort="high"'],
    acp_permission_policy="allow",
)
```

**Key Abstractions**:
- Spawns an ACP-compatible agent subprocess over stdio
- Creates an ACP session with the request working directory
- Sends prompts through `session/prompt`
- Serves ACP client file reads and writes inside the request working directory
- Adds Agenter's autonomous backend contract by default and auto-continues once
  if an interactive ACP agent asks for confirmation instead of editing
- Converts ACP session updates into Agenter backend messages
- Tracks modified files by comparing workspace snapshots

**When to use**:
- You want to run an ACP-compatible agent through Agenter's validation and event model
- You need an agent that already speaks ACP but does not have a native Agenter backend

**Security note**: ACP agents are external processes. Agenter observes file changes after execution, but sandbox enforcement depends on the launched ACP agent and its own configuration.

### Abstraction Strategy

The `CodingBackend` protocol normalizes backend differences:

```text
┌─────────────────────────────────────────────────────────────────────────────────┐
│                           CodingBackend Protocol                                │
├─────────────────────────────────────────────────────────────────────────────────┤
│  connect(cwd, allowed_write_paths, resume_session_id, output_type, system_prompt)│
│  execute(prompt: str)     → AsyncIterator[BackendMessage]                       │
│  modified_files()         → ModifiedFiles                                       │
│  usage()                  → Usage (tokens, cost)                                │
│  structured_output()      → BaseModel | None                                    │
│  refusal()                → RefusalMessage | None                               │
│  disconnect()             → Cleanup                                             │
└─────────────────────────────────────────────────────────────────────────────────┘
                              │
    ┌─────────────┬───────────┼───────────┬─────────────┐
    │             │           │           │             │
    ▼             ▼           ▼           ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────────────┐
│ Anthropic │ │ ClaudeAgt │ │  Codex    │ │   OpenHands      │
│ Backend   │ │ Backend   │ │ Backend   │ │   Backend        │
│ messages. │ │ query()   │ │ call_tool │ │ Conversation.run │
│ create()  │ │           │ │           │ │                  │
└───────────┘ └───────────┘ └───────────┘ └──────────────────┘
```

**Protocol Method Mappings**:

| Protocol Method | AnthropicSDKBackend | ClaudeCodeBackend | CodexBackend | OpenHandsBackend |
|-----------------|---------------------|-------------------|--------------|------------------|
| `connect(cwd)` | Set `PathResolver(cwd)` | Set `cwd` option | Set in MCP params | Set in Conversation |
| `execute(prompt)` | `messages.create()` + tool loop | `query(prompt, options)` | `call_tool("codex")` | `Conversation.run()` |
| `modified_files()` | Track via FileOperations | Parse from messages | Parse from result | Extract from events |
| `usage()` | Track tokens + litellm pricing | Extract from SDK | Extract from result | Track via litellm |
| `structured_output()` | Parse from tool call | Parse from tool call | Parse from response text | Parse from response text |
| `refusal()` | Capture Refusal tool | Capture Refusal tool | Capture Refusal tool | Capture Refusal tool |

---

## Data Flow

### Execution Flow

```text
┌──────────────────────────────────────────────────────────────────┐
│                         CodingRequest                            │
│  prompt, cwd, system_prompt, budget, output_type                 │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                    AutonomousCodingAgent.execute()               │
│  1. Create CodingSession with config                             │
│  2. Initialize backend                                           │
│  3. Run session.run(request)                                     │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                      CodingSession.run()                         │
│  LOOP (max_iterations):                                          │
│    backend.execute(prompt) → stream BackendMessages              │
│    Collect files modified                                        │
│    Run validators → ValidationResult                             │
│    IF passed THEN COMPLETED                                      │
│    IF budget exceeded THEN BUDGET_EXCEEDED                       │
│    Prepare retry prompt with errors                              │
│  END LOOP → FAILED                                               │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                         CodingResult                             │
│  status, files, summary, iterations, metrics                     │
└──────────────────────────────────────────────────────────────────┘
```

### Event Streams

For observability, the session emits events throughout its execution:

| Event Type | When | Emitted Data |
|------------|------|------|
| `SESSION_START` | Session initialized | cwd, model, max_iterations |
| `ITERATION_START` | Beginning iteration N | iteration number |
| `BACKEND_MESSAGE` | Message from backend | message_type, content, tool_name |
| `VALIDATION_START` | About to run validators | validators list, file_count |
| `VALIDATION_RESULT` | Result from one validator | validator name, passed, errors |
| `ITERATION_END` | Iteration complete | iteration, passed, files_modified, tokens, cost |
| `COMPLETED` | Task completed successfully | files, summary, iterations, tokens, cost |
| `FAILED` | Task failed | status, files, summary, iterations, tokens, cost |
| `REFUSED` | LLM declined the request | refusal reason, category |
| `SESSION_END` | Session finished (always emitted) | status, iterations, tokens, cost |

Event lifecycle:
```text
SESSION_START → ITERATION_START → BACKEND_MESSAGE* →
VALIDATION_START → VALIDATION_RESULT* → ITERATION_END →
(repeat iterations) → COMPLETED/FAILED/REFUSED → SESSION_END
```

---

## Validation Framework

### Validator Protocol

Validators run against modified files and return pass/fail with errors.

| Validator | Checks | Blocking | When to Use |
|-----------|--------|----------|-------------|
| SyntaxValidator | Python AST parsing | Yes | Always (default, instant feedback) |
| SecurityValidator | Bandit static analysis | No (advisory) | Security-sensitive code |

**Note:** Validators are configurable via `AutonomousCodingAgent(validators=[...])`. Default is `[SyntaxValidator()]`. `SecurityValidator` uses Bandit to detect vulnerabilities (eval, hardcoded secrets, SQL injection, etc.) and is non-blocking by default. Custom validators can be added by implementing the `Validator` protocol.

### Validation Flow

```text
Files Modified
      │
      ▼
┌─────────────┐
│   Syntax    │
│  Validator  │
└─────────────┘
      │
      ▼
   Errors?
      │
      ▼
ValidationResult
(passed, errors)
```

Validators run sequentially via `ValidatorChain`. Blocking validators (like `SyntaxValidator`) short-circuit on failure.

---

## Budget Enforcement

### Budget Criteria

| Criterion | Unit | Enforcement |
|-----------|------|-------------|
| Tokens | count | Sum across iterations, hard stop |
| Cost | USD | Estimated from token usage, hard stop |
| Time | seconds | Wall clock from session start, hard stop |
| Iterations | count | Loop limit, hard stop |

### Budget Checkpoints

Budget is checked:

1. before each iteration starts
2. after each backend execution completes
3. before retry prompt is sent

If any limit is exceeded, the session returns `BUDGET_EXCEEDED` status with an explanation.

---

## Configuration

### Config Hierarchy

```text
AutonomousCodingAgent
├── Backend Selection
│   └── backend: "anthropic-sdk" | "claude-code" | "codex" | "openhands" | "acp"
├── Security
│   └── sandbox: bool = True  (unified sandbox control)
├── Backend-Specific (anthropic-sdk backend)
│   ├── model: str  (e.g., "claude-sonnet-4-20250514")
│   ├── tools: list[Tool]  (custom tools)
│   └── use_anthropic_tools: bool  (use text_editor_20250728)
├── Backend-Specific (claude-code backend)
│   ├── allowed_tools: ["Read", "Edit", "Write", "Bash", ...]
│   ├── setting_sources: ["project", "user"]
│   └── claude_max_thinking_tokens: int
├── Backend-Specific (codex backend)
│   ├── codex_approval_policy: "never" | "on-request" | "on-failure" | "untrusted"
│   ├── codex_mcp_servers: list[CodexMCPServer]  (custom MCP tools)
│   └── codex_reasoning_effort: "minimal" | "low" | "medium" | "high"
├── Backend-Specific (openhands backend)
│   ├── model: str  (litellm format, e.g., "openai/gpt-4o")
│   └── sandbox: False  (required - no sandbox support)
├── Backend-Specific (acp backend)
│   ├── acp_command: str
│   ├── acp_args: list[str]
│   ├── acp_env: dict[str, str]
│   ├── acp_mcp_servers: list[Any]
│   ├── acp_permission_policy: "deny" | "allow"
│   └── acp_autonomous: bool
├── Safeguards
│   ├── max_iterations: 5
│   ├── max_tokens: 500_000
│   ├── max_cost_usd: 10.0
│   └── max_time_seconds: 3600
└── Validation
    └── validators: ["syntax", "security"]  (security is non-blocking by default)
```

---

## Error Handling

### Exception vs. Status

**Exceptions** are raised from unrecoverable errors that prevent execution:
- `ConfigurationError`: Invalid configuration (wrong backend name, missing keys)
- `BackendError`: Backend connection or execution failure
- `BudgetExceededError`: Only if `raise_on_budget_exceeded=True`

**Status codes** are returned for expected completion states:
- `CodingStatus.COMPLETED`: Task finished successfully within budget
- `CodingStatus.COMPLETED_WITH_LIMIT_EXCEEDED`: Task succeeded but exceeded budget limits
- `CodingStatus.BUDGET_EXCEEDED`: Stopped before completion due to limits (default behavior)
- `CodingStatus.REFUSED`: LLM declined the request
- `CodingStatus.FAILED`: Task couldn't complete (validation never passed)

| Category | Handling | Recovery |
|----------|----------|----------|
| Invalid configuration | Raise `ConfigurationError` | Fix config, retry |
| Backend connection failure | Raise `BackendError` | Check credentials/network |
| Backend execution error | Raise `BackendError` | Check backend health |
| Validation failure | Retry iteration with context | Max iterations limit |
| Budget exceeded | Return `BUDGET_EXCEEDED` status | Caller decides |
| LLM refusal | Return `REFUSED` status | Modify prompt |

---

## Security Considerations

### Unified Sandbox Mode

All backends default to `sandbox=True` for safe operation:

| Backend | `sandbox=True` (default) | `sandbox=False` |
|---------|--------------------------|-----------------|
| AnthropicSDKBackend | PathResolver enforces `allowed_write_paths` | Writes anywhere in cwd |
| ClaudeCodeBackend | Native OS-level sandbox | `bypassPermissions` mode |
| CodexBackend | `workspace-write` mode | `danger-full-access` mode |
| OpenHandsBackend | Not supported (raises error) | Full filesystem access |
| ACPBackend | Depends on launched ACP agent | Depends on launched ACP agent |

### Usage Examples

```python
# Safe mode (default) - all backends sandboxed
agent = AutonomousCodingAgent(backend="claude-code")

# Disable sandbox for full access
agent = AutonomousCodingAgent(backend="claude-code", sandbox=False)
```

### File System Scope

- `sandbox=True`: Operations restricted based on a given backend's sandbox implementation
- `sandbox=False`: Full filesystem access

### API Key Management

- Keys passed via environment variables or config
- Never logged or included in error messages
- Backend-specific key names: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`

---

## Framework Adapters

### LangGraph Adapter

`create_coding_node()` returns an async function compatible with `StateGraph.add_node()`.

**Input**: State dict with `prompt` key (and optional `cwd`)

**Output**: State update with `coding_result` dict containing `CodingResult` fields

```python
from agenter.adapters.langgraph import create_coding_node, CodingState

graph = StateGraph(CodingState)
graph.add_node("coder", create_coding_node(cwd="./workspace"))
```

### PydanticAI Adapter

`CodingAgent` provides direct execution with a PydanticAI-like interface without extra LLM layers.

**Input**: Prompt and cwd via `run()` method

**Output**: `CodingResult` with status, files modified, and summary

```python
from agenter.adapters.pydantic_ai import CodingAgent

agent = CodingAgent(backend="anthropic-sdk")
result = await agent.run("Implement the feature", cwd="./workspace")
```