Architecture Design¶
TL;DR¶
A summary of the system architecture, component design, and data flows of Agenter.
The SDK follows a simple layered design pattern from top user-facing layers to bottom coding agent backends:
Adapters (LangGraph, PydanticAI)
→ Facade (AutonomousCodingAgent)
→ Runtime (CodingSession)
→ Backends (Anthropic, Claude Code, Codex, OpenHands, ACP).
System Overview¶
┌─────────────────────────────────────────────────────────────┐
│ User Applications │
├─────────────────────────────┬───────────────────────────────┤
│ LangGraph Adapter │ PydanticAI Adapter │
│ (adapters/langgraph.py) │ (adapters/pydantic_ai.py) │
└─────────────────────────────┴───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ AutonomousCodingAgent (Facade) │
│ - execute(request) -> CodingResult │
│ - stream_execute() -> AsyncIterator[CodingEvent] │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ CodingSession │
│ - Manages iteration loop (code → validate → fix → retry) │
│ - Emits events for observability │
│ - Enforces budget limits │
└─────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CodingBackend Protocol (Abstract) │
├───────────────────────┬───────────────────────┬─────────────────────┬───────────────────┬─────────────┤
│ AnthropicSDKBackend │ ClaudeCodeBackend │ CodexBackend │ OpenHandsBackend │ ACPBackend │
│ (anthropic SDK) │ (claude-code-sdk) │ (openai-agents) │ (openhands-sdk) │ (ACP CLI) │
└───────────────────────┴───────────────────────┴─────────────────────┴───────────────────┴─────────────┘
Layer Responsibilities¶
Layer 1: User Applications (Adapters)¶
The top layer consists of adapters that integrate the SDK into agentic frameworks and their workflows.
| Adapter | Framework | Use Case |
|---|---|---|
| | LangGraph | |
| | PydanticAI | |
Adapters are thin wrappers and facades that translate between framework conventions and the SDK’s unified interface.
Layer 2: AutonomousCodingAgent (Facade)¶
The public API surface. Users interact only with this class.
Responsibilities:
Instantiate and configure coding agent backends
Provide sync/async execution methods
Aggregate results from coding sessions
Hide all internal complexity of the layers below
Interface:
execute(request: CodingRequest) -> CodingResult
stream_execute(request: CodingRequest) -> AsyncIterator[CodingEvent]
Layer 3: CodingSession¶
The core orchestration logic. It manages the iteration loop for a single coding session.
Responsibilities:
Execute a backend with a prompt
Run validators on outputs
Retry on validation failures (with error context in the updated prompt)
Enforce budget limits (tokens, cost, time, iterations)
Emit events for observability
Iteration Loop:
┌─────────────────────────────────────────────────────────────┐
│ CodingSession.run() │
├─────────────────────────────────────────────────────────────┤
│ FOREACH iteration (up to max_iterations): │
│ 1. Check budget limits │
│ 2. Execute backend.execute(prompt) │
│ 3. Collect files modified │
│ 4. Run validators (e.g. syntax or security) │
│ 5. IF validation passed THEN return COMPLETED │
│ 6. IF budget exceeded THEN return BUDGET_EXCEEDED │
│ 7. Prepare a retry prompt with validation errors │
│ END → return FAILED │
└─────────────────────────────────────────────────────────────┘
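The loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not the actual CodingSession code: `run_loop`, `LoopResult`, `Status`, and the three callables are hypothetical stand-ins for the backend, the validators, and the budget check.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Awaitable, Callable


class Status(Enum):
    COMPLETED = "completed"
    BUDGET_EXCEEDED = "budget_exceeded"
    FAILED = "failed"


@dataclass
class LoopResult:
    status: Status
    iterations: int
    errors: list = field(default_factory=list)


async def run_loop(
    execute: Callable[[str], Awaitable[list]],  # hypothetical backend call: prompt -> file contents
    validate: Callable[[list], list],           # hypothetical validator: files -> error strings
    budget_exceeded: Callable[[], bool],        # hypothetical budget check
    prompt: str,
    max_iterations: int = 5,
) -> LoopResult:
    task = prompt
    errors: list = []
    for iteration in range(1, max_iterations + 1):
        if budget_exceeded():                    # step 1: check budget before starting
            return LoopResult(Status.BUDGET_EXCEEDED, iteration - 1, errors)
        files = await execute(prompt)            # steps 2-3: run backend, collect files
        errors = validate(files)                 # step 4: run validators
        if not errors:                           # step 5: validation passed
            return LoopResult(Status.COMPLETED, iteration)
        if budget_exceeded():                    # step 6: check budget before retrying
            return LoopResult(Status.BUDGET_EXCEEDED, iteration, errors)
        # step 7: retry prompt carries the validation errors as context
        prompt = task + "\n\nFix these validation errors:\n" + "\n".join(errors)
    return LoopResult(Status.FAILED, max_iterations, errors)


async def _demo() -> LoopResult:
    attempts = []

    async def fake_execute(prompt: str) -> list:
        attempts.append(prompt)
        # First attempt returns broken code, the retry returns valid code.
        return ["def f(:"] if len(attempts) == 1 else ["def f(): pass"]

    def fake_validate(files: list) -> list:
        return [f"syntax error in {src!r}" for src in files if "(:" in src]

    return await run_loop(fake_execute, fake_validate, lambda: False, "Write f")


demo_result = asyncio.run(_demo())
print(demo_result.status, demo_result.iterations)
```

In this sketch the retry prompt is rebuilt from the original task each time, so error context does not accumulate across iterations; whether the real session does the same is not stated here.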
Layer 4: CodingBackend Protocol¶
An abstract interface that each coding agent backend implements.
Responsibilities:
Connect to a backend (subprocess, SDK client)
Execute prompts and stream the responses
Track files modified
Report token usage and estimated cost
Backend Comparison:
| Backend | Wraps | Agent Logic | Runtime | Key Feature |
|---|---|---|---|---|
| AnthropicSDKBackend | | Custom tool-use loop | HTTP (async client) | Full control, custom tools, AWS Bedrock |
| ClaudeCodeBackend | | Claude Code | Claude Code CLI | Battle-tested tools, AWS Bedrock, Google Vertex |
| CodexBackend | | Codex MCP server | MCP over stdio | OpenAI models, custom MCP tools, sandbox modes |
| OpenHandsBackend | | OpenHands agent | litellm | Any model, no sandbox (full access) |
| ACPBackend | | Any ACP-compatible agent | JSON-RPC over stdio | Interoperability with ACP agents |
Which backend to use?
| Use Case and Scenario | Recommended Backend | Config |
|---|---|---|
| Custom tools, full control | | |
| Battle-tested Claude Code tools | | |
| Need skills, slash commands, MCP | | |
| OpenAI models (o3, GPT-5-Codex) | | |
| Custom MCP tools with OpenAI | | |
| Any model via litellm | | |
| Any ACP-compatible agent process | | |
Note: Backends default to sandbox=True. Use sandbox=False for unrestricted access. OpenHandsBackend has no sandbox support and requires sandbox=False.
Backend SDK Patterns¶
Each backend SDK presents a different API surface. Here’s how they work and how we abstract them:
Anthropic SDK Backend (Implemented)¶
# Actual implementation uses anthropic SDK directly
import anthropic
client = anthropic.AsyncAnthropic()
response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16384,
    system="You are an autonomous coding agent...",
    tools=[...],  # File tools: read_file, write_file, edit_file
    messages=[{"role": "user", "content": "Fix the bug"}],
)
# Tool use loop handles file operations
for block in response.content:
    if block.type == "tool_use":
        result = execute_tool(block.name, block.input)
        # Continue conversation with tool result
Key Abstractions:
anthropic.AsyncAnthropic() → async HTTP client
Custom tool-use loop manages file operations
Supports both Anthropic API and AWS Bedrock via boto3
No subprocess or CLI dependency
Claude Code SDK Backend (Implemented)¶
# Uses claude-code-sdk (Claude Code as a library)
from claude_code_sdk import query, ClaudeCodeOptions
# Safe mode with native sandbox (default)
options = ClaudeCodeOptions(
    cwd="/path/to/project",
    model="claude-sonnet-4-5-20250929",
    max_thinking_tokens=8192,
    allowed_tools=["Read", "Edit", "Write", "Bash", "Glob"],
    sandbox={"enabled": True, "autoAllowBashIfSandboxed": True},
    permission_mode="default",
)
async for message in query(prompt="Fix the bug", options=options):
    # message is AssistantMessage, ToolResultMessage, or ResultMessage
    print(message)
Key Abstractions:
query() → streams messages as AsyncIterator
Built-in tools: Read, Edit, Write, Bash, Glob, Grep, etc.
Native OS-level sandbox support
Supports AWS Bedrock (CLAUDE_CODE_USE_BEDROCK=1), Google Vertex, Microsoft Foundry
Same tools that power Claude Code
When to use:
You want production-ready tools maintained by Anthropic
You need Claude Code features (skills, slash commands, MCP servers)
You want native OS-level sandboxing
Codex Backend (Implemented)¶
# Uses openai-agents SDK MCPServerStdio for MCP communication
from agents.mcp import MCPServerStdio
mcp_server = MCPServerStdio(
    name="codex",
    params={"command": "codex", "args": ["mcp-server"]},
)
await mcp_server.connect()
# Call codex tool to start a session
result = await mcp_server.call_tool("codex", {
    "prompt": "Fix the bug",
    "cwd": "/path/to/project",
    "approval-policy": "never",
    "sandbox": "workspace-write",
    "model": "o3",
    "config": {"model_reasoning_effort": "high"},
})
# Continue with codex-reply for subsequent messages
result = await mcp_server.call_tool("codex-reply", {
    "prompt": "Now add tests",
    "conversationId": result["conversationId"],
})
Key Abstractions:
MCPServerStdio → manages subprocess lifecycle
MCP tools: codex (start session) and codex-reply (continue)
Configurable sandbox and approval policies
Custom MCP servers passed via config parameter
When to use:
You want to use OpenAI reasoning models (o3, GPT-5-Codex)
You need specific approval policies (untrusted, on-request, on-failure, never)
You want to extend Codex with custom MCP tools
Codex Backend Tool Limitations¶
The Codex backend runs custom tools in a subprocess via MCP. Tools are serialized using cloudpickle.
Note: Tools which capture unpicklable state will fail silently.
When tools fail to pickle:
Warning logged: custom_tools_not_picklable
Tools are dropped; the agent runs without them
No error raised (silent failure)
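Because the failure is silent, a pre-flight check can surface the problem before a run. The sketch below is only an illustration of the drop-and-warn behavior described above: it uses the standard library `pickle` rather than cloudpickle, checks captured state objects rather than tool functions, and `is_picklable` and `keep_picklable` are hypothetical helpers, not part of the SDK.

```python
import logging
import pickle
import threading

logger = logging.getLogger("codex_tools_sketch")  # hypothetical logger name


def is_picklable(obj: object) -> bool:
    """Return True if the standard library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


def keep_picklable(tool_state: dict) -> dict:
    """Drop entries that cannot be pickled, logging a warning for each one."""
    kept = {}
    for name, state in tool_state.items():
        if is_picklable(state):
            kept[name] = state
        else:
            logger.warning("custom_tools_not_picklable: %s", name)
    return kept


candidates = {
    "plain_config": {"timeout": 30},  # plain data pickles fine
    "holds_lock": threading.Lock(),   # locks cannot be pickled
}
usable = keep_picklable(candidates)
print(sorted(usable))
```

Running a check like this in tests makes a dropped tool visible as a failing assertion instead of a log line.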
Writing Codex-compatible tools:
✅ Do:
Use module-level functions (not lambdas or closures)
Import dependencies inside the function
Pass dynamic state via environment variables or temp files
❌ Don’t:
Capture self or instance variables in closures
Capture async clients, locks, or trace recorders
Use lambda functions that capture outer scope
Example — BAD (captures self):
def _create_tools(self):
    async def wrapper(inputs):
        return await self.some_method(inputs)  # Captures self - NOT picklable!
    return [FunctionTool(func=wrapper)]
Example — GOOD (stateless module-level function):
# Module level - no closure, fully picklable
async def _stateless_wrapper(inputs: dict) -> str:
    from mymodule import some_function  # Import inside function
    return await some_function(inputs)

def _create_tools(self):
    return [FunctionTool(func=_stateless_wrapper)]
Note: For tools that need dynamic state (such as mas_code that changes per call), use environment variables or temporary files to pass data to the subprocess.
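One way to pass per-call state without capturing it in a closure is a temporary JSON file whose path travels through the environment. This is a minimal sketch under assumptions: the variable name `AGENTER_TOOL_STATE_FILE`, the helper `publish_state`, and the shape of the state dict are all hypothetical, and the tool is called in-process here only to show the read path.

```python
import json
import os
import tempfile

STATE_ENV_VAR = "AGENTER_TOOL_STATE_FILE"  # hypothetical variable name


def publish_state(state: dict) -> str:
    """Write state to a temp JSON file and expose its path via the environment."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.environ[STATE_ENV_VAR] = path
    return path


def stateless_tool(inputs: dict) -> str:
    """Module-level tool: reads per-call state from the file named in the environment."""
    with open(os.environ[STATE_ENV_VAR], encoding="utf-8") as handle:
        state = json.load(handle)
    return f"{state['mas_code']}:{inputs['query']}"


state_path = publish_state({"mas_code": "v2"})
tool_output = stateless_tool({"query": "lint"})
os.remove(state_path)  # clean up once the call has finished
print(tool_output)
```

The environment variable must be set before the subprocess is spawned, since a child process only inherits the environment that exists at launch time.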
ACP Backend (Implemented)¶
agent = AutonomousCodingAgent(
    backend="acp",
    acp_command="codex-acp",
    acp_args=["-c", 'model="gpt-5.4"', "-c", 'model_reasoning_effort="high"'],
    acp_permission_policy="allow",
)
Key Abstractions:
Spawns an ACP-compatible agent subprocess over stdio
Creates an ACP session with the request working directory
Sends prompts through session/prompt
Serves ACP client file reads and writes inside the request working directory
Adds Agenter’s autonomous backend contract by default and auto-continues once if an interactive ACP agent asks for confirmation instead of editing
Converts ACP session updates into Agenter backend messages
Tracks modified files by comparing workspace snapshots
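Snapshot comparison, the last item above, can be illustrated with content hashes taken before and after a run. This is a sketch of the general technique, not the ACP backend's actual code; `snapshot` and `diff_snapshots` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file path (relative to root) to a SHA-256 digest of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify paths as created, modified, or deleted between two snapshots."""
    return {
        "created": sorted(after.keys() - before.keys()),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
        "deleted": sorted(before.keys() - after.keys()),
    }


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "app.py").write_text("print('v1')\n")
    (workspace / "old.py").write_text("pass\n")
    before = snapshot(workspace)

    # Simulate an agent run that edits, creates, and deletes files.
    (workspace / "app.py").write_text("print('v2')\n")
    (workspace / "new.py").write_text("pass\n")
    (workspace / "old.py").unlink()
    changes = diff_snapshots(before, snapshot(workspace))

print(changes)
```

Hashing every file is simple but reads the whole workspace twice; comparing size and modification time first would be a cheaper variant for large trees.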
When to use:
You want to run an ACP-compatible agent through Agenter’s validation and event model
You need an agent that already speaks ACP but does not have a native Agenter backend
Security note: ACP agents are external processes. Agenter observes file changes after execution, but sandbox enforcement depends on the launched ACP agent and its own configuration.
Abstraction Strategy¶
The CodingBackend protocol normalizes backend differences:
┌─────────────────────────────────────────────────────────────────────────────────┐
│ CodingBackend Protocol │
├─────────────────────────────────────────────────────────────────────────────────┤
│ connect(cwd, allowed_write_paths, resume_session_id, output_type, system_prompt)│
│ execute(prompt: str) → AsyncIterator[BackendMessage] │
│ modified_files() → ModifiedFiles │
│ usage() → Usage (tokens, cost) │
│ structured_output() → BaseModel | None │
│ refusal() → RefusalMessage | None │
│ disconnect() → Cleanup │
└─────────────────────────────────────────────────────────────────────────────────┘
│
┌─────────────┬───────────┼───────────┬─────────────┐
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────────────┐
│ Anthropic │ │ ClaudeAgt │ │ Codex │ │ OpenHands │
│ Backend │ │ Backend │ │ Backend │ │ Backend │
│ messages. │ │ query() │ │ call_tool │ │ Conversation.run │
│ create() │ │ │ │ │ │ │
└───────────┘ └───────────┘ └───────────┘ └──────────────────┘
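The protocol box above maps naturally onto a `typing.Protocol`. The sketch below is an assumption-laden illustration: the method names come from the diagram, but the exact signatures, which methods are coroutines, and the message and usage shapes are guesses. `CodingBackendSketch`, `EchoBackend`, and `run_once` are hypothetical.

```python
import asyncio
from typing import Any, AsyncIterator, Optional, Protocol, runtime_checkable


@runtime_checkable
class CodingBackendSketch(Protocol):
    """Structural interface mirroring the method names in the diagram."""

    async def connect(self, cwd: str, **options: Any) -> None: ...
    def execute(self, prompt: str) -> AsyncIterator[Any]: ...
    def modified_files(self) -> Any: ...
    def usage(self) -> Any: ...
    def structured_output(self) -> Optional[Any]: ...
    def refusal(self) -> Optional[Any]: ...
    async def disconnect(self) -> None: ...


class EchoBackend:
    """Toy backend that satisfies the protocol without calling any model."""

    def __init__(self) -> None:
        self.cwd: Optional[str] = None

    async def connect(self, cwd: str, **options: Any) -> None:
        self.cwd = cwd

    async def execute(self, prompt: str) -> AsyncIterator[Any]:
        yield {"type": "text", "content": f"echo: {prompt}"}

    def modified_files(self) -> Any:
        return []

    def usage(self) -> Any:
        return {"tokens": 0, "cost_usd": 0.0}

    def structured_output(self) -> Optional[Any]:
        return None

    def refusal(self) -> Optional[Any]:
        return None

    async def disconnect(self) -> None:
        self.cwd = None


async def run_once(backend: CodingBackendSketch, prompt: str, cwd: str) -> list:
    """Connect, stream all messages for one prompt, and always disconnect."""
    await backend.connect(cwd)
    try:
        return [message async for message in backend.execute(prompt)]
    finally:
        await backend.disconnect()


echo_messages = asyncio.run(run_once(EchoBackend(), "Fix the bug", "."))
print(echo_messages)
```

Because the protocol is structural, a backend needs no base class; any object with these methods can be driven by the same session code.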
Protocol Method Mappings:
| Protocol Method | AnthropicSDKBackend | ClaudeCodeBackend | CodexBackend | OpenHandsBackend |
|---|---|---|---|---|
| | Set | Set | Set in MCP params | Set in Conversation |
| | | | | |
| | Track via FileOperations | Parse from messages | Parse from result | Extract from events |
| | Track tokens + litellm pricing | Extract from SDK | Extract from result | Track via litellm |
| | Parse from tool call | Parse from tool call | Parse from response text | Parse from response text |
| | Capture Refusal tool | Capture Refusal tool | Capture Refusal tool | Capture Refusal tool |
Data Flow¶
Execution Flow¶
┌──────────────────────────────────────────────────────────────────┐
│ CodingRequest │
│ prompt, cwd, system_prompt, budget, output_type │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ AutonomousCodingAgent.execute() │
│ 1. Create CodingSession with config │
│ 2. Initialize backend │
│ 3. Run session.run(request) │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ CodingSession.run() │
│ LOOP (max_iterations): │
│ backend.execute(prompt) → stream BackendMessages │
│ Collect files modified │
│ Run validators → ValidationResult │
│ IF passed THEN COMPLETED │
│ IF budget exceeded THEN BUDGET_EXCEEDED │
│ Prepare retry prompt with errors │
│ END LOOP → FAILED │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ CodingResult │
│ status, files, summary, iterations, metrics │
└──────────────────────────────────────────────────────────────────┘
Event Streams¶
For observability, the session emits events throughout its execution:
| Event Type | When | Emitted Data |
|---|---|---|
| | Session initialized | cwd, model, max_iterations |
| | Beginning iteration N | iteration number |
| | Message from backend | message_type, content, tool_name |
| | About to run validators | validators list, file_count |
| | Result from one validator | validator name, passed, errors |
| | Iteration complete | iteration, passed, files_modified, tokens, cost |
| | Task completed successfully | files, summary, iterations, tokens, cost |
| | Task failed | status, files, summary, iterations, tokens, cost |
| | LLM declined the request | refusal reason, category |
| | Session finished (always emitted) | status, iterations, tokens, cost |
Event lifecycle:
SESSION_START → ITERATION_START → BACKEND_MESSAGE* →
VALIDATION_START → VALIDATION_RESULT* → ITERATION_END →
(repeat iterations) → COMPLETED/FAILED/REFUSED → SESSION_END
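The lifecycle can be exercised with a toy async generator and a consumer that tallies event types. This sketch assumes only the event names listed in the lifecycle; `EventType`, `fake_session`, and `collect` are hypothetical and do not reflect how the real session builds or delivers events.

```python
import asyncio
from collections import Counter
from enum import Enum, auto
from typing import AsyncIterator


class EventType(Enum):
    SESSION_START = auto()
    ITERATION_START = auto()
    BACKEND_MESSAGE = auto()
    VALIDATION_START = auto()
    VALIDATION_RESULT = auto()
    ITERATION_END = auto()
    COMPLETED = auto()
    SESSION_END = auto()


async def fake_session(iterations: int) -> AsyncIterator[EventType]:
    """Emit events in lifecycle order for a run that eventually succeeds."""
    yield EventType.SESSION_START
    for _ in range(iterations):
        yield EventType.ITERATION_START
        yield EventType.BACKEND_MESSAGE
        yield EventType.VALIDATION_START
        yield EventType.VALIDATION_RESULT
        yield EventType.ITERATION_END
    yield EventType.COMPLETED
    yield EventType.SESSION_END


async def collect(iterations: int) -> list:
    """Consume the stream the way an observer would, keeping every event."""
    return [event async for event in fake_session(iterations)]


events = asyncio.run(collect(2))
counts = Counter(events)
print(counts[EventType.ITERATION_START], events[-1].name)
```

Since SESSION_END is always emitted, a consumer can use it as the single place to flush metrics or close resources.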
Validation Framework¶
Validator Protocol¶
Validators run against modified files and return pass/fail with errors.
| Validator | Checks | Blocking | When to Use |
|---|---|---|---|
| SyntaxValidator | Python AST parsing | Yes | Always (default, instant feedback) |
| SecurityValidator | Bandit static analysis | No (advisory) | Security-sensitive code |
Note: Validators are configurable via AutonomousCodingAgent(validators=[...]). Default is [SyntaxValidator()]. SecurityValidator uses Bandit to detect vulnerabilities (eval, hardcoded secrets, SQL injection, etc.) and is non-blocking by default. Custom validators can be added by implementing the Validator protocol.
Validation Flow¶
Files Modified
│
▼
┌─────────────┐
│ Syntax │
│ Validator │
└─────────────┘
│
▼
Errors?
│
▼
ValidationResult
(passed, errors)
Validators run sequentially via ValidatorChain. Blocking validators (like SyntaxValidator) short-circuit on failure.
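A sequential chain with blocking short-circuit can be sketched with the standard library `ast` module. Everything here is hypothetical: `SyntaxCheck`, `NoEvalCheck` (a tiny stand-in for an advisory security check, not Bandit), `run_chain`, and the `blocking` attribute are illustrative names, and the real Validator protocol may differ.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    passed: bool
    errors: list = field(default_factory=list)


class SyntaxCheck:
    """Blocking check: every file must parse as Python."""
    blocking = True

    def validate(self, files: dict) -> ValidationResult:
        errors = []
        for name, source in files.items():
            try:
                ast.parse(source, filename=name)
            except SyntaxError as exc:
                errors.append(f"{name}:{exc.lineno}: {exc.msg}")
        return ValidationResult(passed=not errors, errors=errors)


class NoEvalCheck:
    """Advisory check: reports eval() calls but never fails the chain."""
    blocking = False

    def validate(self, files: dict) -> ValidationResult:
        errors = []
        for name, source in files.items():
            for node in ast.walk(ast.parse(source)):
                is_eval = (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "eval"
                )
                if is_eval:
                    errors.append(f"{name}:{node.lineno}: eval() call")
        return ValidationResult(passed=not errors, errors=errors)


def run_chain(validators: list, files: dict) -> ValidationResult:
    """Run validators in order; stop at the first failing blocking validator."""
    all_errors = []
    for validator in validators:
        result = validator.validate(files)
        all_errors.extend(result.errors)
        if not result.passed and validator.blocking:
            return ValidationResult(False, all_errors)  # short-circuit
    return ValidationResult(True, all_errors)


chain = [SyntaxCheck(), NoEvalCheck()]
broken = run_chain(chain, {"a.py": "def f(:\n"})
advisory = run_chain(chain, {"a.py": "x = eval('1')\n"})
print(broken.passed, advisory.passed, advisory.errors)
```

Ordering matters in this sketch: the advisory check parses the source itself, so it relies on the blocking syntax check having stopped the chain for unparseable files.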
Budget Enforcement¶
Budget Criteria¶
| Criterion | Unit | Enforcement |
|---|---|---|
| Tokens | count | Sum across iterations, hard stop |
| Cost | USD | Estimated from token usage, hard stop |
| Time | seconds | Wall clock from session start, hard stop |
| Iterations | count | Loop limit, hard stop |
Budget Checkpoints¶
Budget is checked:
before each iteration starts
after each backend execution completes
before retry prompt is sent
If any limit is exceeded, the session returns BUDGET_EXCEEDED status with an explanation.
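A budget tracker that can be queried at each checkpoint might look like the sketch below. The class `BudgetSketch` and its methods are hypothetical; the default limits mirror the safeguard values listed in this document's configuration section, and the iteration cap is left to the loop bound rather than checked here.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BudgetSketch:
    max_tokens: int = 500_000
    max_cost_usd: float = 10.0
    max_time_seconds: float = 3600.0
    tokens: int = 0
    cost_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def record(self, tokens: int, cost_usd: float) -> None:
        """Add the usage reported by one backend execution."""
        self.tokens += tokens
        self.cost_usd += cost_usd

    def exceeded(self) -> Optional[str]:
        """Return a short explanation if any limit is exceeded, else None."""
        if self.tokens > self.max_tokens:
            return f"tokens: {self.tokens} > {self.max_tokens}"
        if self.cost_usd > self.max_cost_usd:
            return f"cost: {self.cost_usd:.2f} > {self.max_cost_usd:.2f} USD"
        elapsed = time.monotonic() - self.started_at
        if elapsed > self.max_time_seconds:
            return f"time: {elapsed:.0f}s > {self.max_time_seconds:.0f}s"
        return None


budget = BudgetSketch(max_tokens=1000)
budget.record(tokens=600, cost_usd=0.01)
first_check = budget.exceeded()   # still within limits
budget.record(tokens=600, cost_usd=0.01)
second_check = budget.exceeded()  # token total now above the limit
print(first_check, second_check)
```

`time.monotonic()` is used for the wall-clock limit so that system clock adjustments cannot shorten or extend a session.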
Configuration¶
Config Hierarchy¶
AutonomousCodingAgent
├── Backend Selection
│ └── backend: "anthropic-sdk" | "claude-code" | "codex" | "openhands" | "acp"
├── Security
│ └── sandbox: bool = True (unified sandbox control)
├── Backend-Specific (anthropic-sdk backend)
│ ├── model: str (e.g., "claude-sonnet-4-20250514")
│ ├── tools: list[Tool] (custom tools)
│ └── use_anthropic_tools: bool (use text_editor_20250728)
├── Backend-Specific (claude-code backend)
│ ├── allowed_tools: ["Read", "Edit", "Write", "Bash", ...]
│ ├── setting_sources: ["project", "user"]
│ └── claude_max_thinking_tokens: int
├── Backend-Specific (codex backend)
│ ├── codex_approval_policy: "never" | "on-request" | "on-failure" | "untrusted"
│ ├── codex_mcp_servers: list[CodexMCPServer] (custom MCP tools)
│ └── codex_reasoning_effort: "minimal" | "low" | "medium" | "high"
├── Backend-Specific (openhands backend)
│ ├── model: str (litellm format, e.g., "openai/gpt-4o")
│ └── sandbox: False (required - no sandbox support)
├── Backend-Specific (acp backend)
│ ├── acp_command: str
│ ├── acp_args: list[str]
│ ├── acp_env: dict[str, str]
│ ├── acp_mcp_servers: list[Any]
│ ├── acp_permission_policy: "deny" | "allow"
│ └── acp_autonomous: bool
├── Safeguards
│ ├── max_iterations: 5
│ ├── max_tokens: 500_000
│ ├── max_cost_usd: 10.0
│ └── max_time_seconds: 3600
└── Validation
└── validators: ["syntax", "security"] (security is non-blocking by default)
Error Handling¶
Exception vs. Status¶
Exceptions are raised for unrecoverable errors that prevent execution:
ConfigurationError: Invalid configuration (wrong backend name, missing keys)
BackendError: Backend connection or execution failure
BudgetExceededError: Only if raise_on_budget_exceeded=True
Status codes are returned for expected completion states:
CodingStatus.COMPLETED: Task finished successfully within budget
CodingStatus.COMPLETED_WITH_LIMIT_EXCEEDED: Task succeeded but exceeded budget limits
CodingStatus.BUDGET_EXCEEDED: Stopped before completion due to limits (default behavior)
CodingStatus.REFUSED: LLM declined the request
CodingStatus.FAILED: Task couldn’t complete (validation never passed)
| Category | Handling | Recovery |
|---|---|---|
| Invalid configuration | Raise | Fix config, retry |
| Backend connection failure | Raise | Check credentials/network |
| Backend execution error | Raise | Check backend health |
| Validation failure | Retry iteration with context | Max iterations limit |
| Budget exceeded | Return | Caller decides |
| LLM refusal | Return | Modify prompt |
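From the caller's side, the split between exceptions and statuses suggests a two-stage handler: catch what prevents execution, then branch on the returned status. The classes below are local stand-ins that reuse the names from this section; their enum values and the `next_action` helper are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable


class CodingStatus(Enum):  # stand-in; values are assumed
    COMPLETED = "completed"
    COMPLETED_WITH_LIMIT_EXCEEDED = "completed_with_limit_exceeded"
    BUDGET_EXCEEDED = "budget_exceeded"
    REFUSED = "refused"
    FAILED = "failed"


class ConfigurationError(Exception):  # stand-in for the SDK exception
    pass


class BackendError(Exception):  # stand-in for the SDK exception
    pass


def next_action(run: Callable[[], CodingStatus]) -> str:
    """Map one run's outcome to the recovery step suggested in the table."""
    try:
        status = run()
    except ConfigurationError:
        return "fix config, then retry"
    except BackendError:
        return "check credentials, network, or backend health"
    if status in (CodingStatus.COMPLETED, CodingStatus.COMPLETED_WITH_LIMIT_EXCEEDED):
        return "accept result"
    if status is CodingStatus.BUDGET_EXCEEDED:
        return "caller decides"
    if status is CodingStatus.REFUSED:
        return "modify prompt"
    return "inspect validation errors"


def _bad_config() -> CodingStatus:
    raise ConfigurationError("unknown backend name")


actions = [
    next_action(_bad_config),
    next_action(lambda: CodingStatus.REFUSED),
    next_action(lambda: CodingStatus.COMPLETED),
]
print(actions)
```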
Security Considerations¶
Unified Sandbox Mode¶
All backends default to sandbox=True for safe operation:
| Backend | | |
|---|---|---|
| AnthropicSDKBackend | PathResolver enforces | Writes anywhere in cwd |
| ClaudeCodeBackend | Native OS-level sandbox | |
| CodexBackend | | |
| OpenHandsBackend | Not supported (raises error) | Full filesystem access |
| ACPBackend | Depends on launched ACP agent | Depends on launched ACP agent |
Usage Examples¶
# Safe mode (default) - all backends sandboxed
agent = AutonomousCodingAgent(backend="claude-code")
# Disable sandbox for full access
agent = AutonomousCodingAgent(backend="claude-code", sandbox=False)
File System Scope¶
sandbox=True: Operations restricted based on a given backend’s sandbox implementation
sandbox=False: Full filesystem access
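For backends that restrict writes at the path level, the core check is whether a resolved target stays inside an allowed root. The sketch below shows that general technique with `pathlib` (Python 3.9+ for `is_relative_to`); `is_write_allowed` is a hypothetical helper, and this is not a description of how any specific backend, or an OS-level sandbox, enforces its limits.

```python
import tempfile
from pathlib import Path


def is_write_allowed(target: str, allowed_roots: list) -> bool:
    """Return True if target, once resolved, lies inside one of the allowed roots."""
    resolved = Path(target).resolve()
    return any(resolved.is_relative_to(Path(root).resolve()) for root in allowed_roots)


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    workspace.mkdir()
    inside = is_write_allowed(str(workspace / "src" / "app.py"), [str(workspace)])
    # A ".." segment that climbs out of the workspace must be rejected.
    escaped = is_write_allowed(str(workspace / ".." / "escape.py"), [str(workspace)])

print(inside, escaped)
```

Resolving both sides before comparing is what defeats `..` segments and symlinked roots; a plain string prefix check would accept some escaping paths.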
API Key Management¶
Keys passed via environment variables or config
Never logged or included in error messages
Backend-specific key names: ANTHROPIC_API_KEY, OPENAI_API_KEY
Framework Adapters¶
LangGraph Adapter¶
create_coding_node() returns an async function compatible with StateGraph.add_node().
Input: State dict with prompt key (and optional cwd)
Output: State update with coding_result dict containing CodingResult fields
from agenter.adapters.langgraph import create_coding_node, CodingState
graph = StateGraph(CodingState)
graph.add_node("coder", create_coding_node(cwd="./workspace"))
PydanticAI Adapter¶
CodingAgent provides direct execution with a PydanticAI-like interface without extra LLM layers.
Input: Prompt and cwd via run() method
Output: CodingResult with status, files modified, and summary
from agenter.adapters.pydantic_ai import CodingAgent
agent = CodingAgent(backend="anthropic-sdk")
result = await agent.run("Implement the feature", cwd="./workspace")