# ADK Agent

Contop's execution agent is built on Google ADK (Agent Development Kit), a Python framework for building autonomous AI agents with tool-calling capabilities.
## Two-Tier Agent Architecture

- **Conversation Agent (mobile)** — A multi-provider LLM (Gemini, OpenAI, Anthropic, OpenRouter) classifies user intent. Text responses are handled locally; tool-requiring commands are routed to the server.
- **Execution Agent (server)** — An ADK `LlmAgent` with 30+ tools for autonomous multi-step execution (40+ with optional skills).
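The routing split above can be sketched as a simple decision on the mobile side. The intent labels and message fields below are illustrative assumptions, not Contop's actual schema:

```python
def route_user_message(intent: str, text: str) -> dict:
    """Route a classified intent: plain chat stays local, tool work goes to the server.
    The "chat" label and the message shape are hypothetical."""
    if intent == "chat":
        # Text-only response: the mobile conversation agent answers directly
        return {"target": "local", "text": text}
    # Tool-requiring command: forward to the server-side execution agent
    return {"target": "server", "message": {"type": "user_intent", "text": text}}
```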
## ADK Configuration

The execution agent is configured in `core/execution_agent.py` and `core/agent_config.py`:

```python
agent = LlmAgent(
    model=execution_model,      # Default: "gemini-2.5-flash"
    name="contop_executor",
    instruction=system_prompt,  # From prompts/execution-agent.md
    tools=[...],                # 30+ tool functions (plain callables)
)
```
### Key Parameters

| Parameter | Default | Purpose |
|---|---|---|
| `model` | `gemini-2.5-flash` | LLM for reasoning and tool selection |
| `max_iterations` | 50 | Maximum tool-calling loops |
| `max_execution_time` | 600 s (10 min) | Wall-clock safety cap |
## Callback System
Three callbacks enforce security, observability, and image management:
### `before_tool_callback`

The mandatory security gate. Every tool call passes through `DualToolEvaluator.classify()` before execution. This callback:

- Classifies the command as host, sandbox, or confirmation-required
- Routes it to the appropriate execution environment (host or Docker sandbox)
- Blocks forbidden commands from reaching host execution

Registering tools on the agent without the `before_tool_callback` is forbidden. This callback **is** the structural enforcement of the security gate.
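A minimal sketch of such a gate, with a toy classifier standing in for `DualToolEvaluator.classify()` (the deny-list patterns and helper names are assumptions):

```python
FORBIDDEN_PATTERNS = ("rm -rf /", "mkfs", ":(){ :|:& };:")  # toy deny-list

def classify_command(cmd: str) -> str:
    """Toy stand-in for DualToolEvaluator.classify()."""
    if any(pat in cmd for pat in FORBIDDEN_PATTERNS):
        return "blocked"
    head = cmd.split()[0] if cmd.strip() else ""
    if head in {"ls", "cat", "git"}:
        return "host"        # known-safe commands may run on the host
    return "sandbox"         # everything else goes to the Docker sandbox

def before_tool_callback(tool_name: str, args: dict):
    """Vetoes or reroutes a tool call before it executes."""
    if tool_name == "execute_cli":
        verdict = classify_command(args.get("command", ""))
        if verdict == "blocked":
            # Returning a result short-circuits the tool: it never runs
            return {"error": "command forbidden by security policy"}
        args["environment"] = verdict  # route to "host" or "sandbox"
    return None  # None lets the (possibly modified) call proceed
```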
### `after_tool_callback`

Runs after every tool execution to:

- Send `agent_progress` messages to the mobile app via the data channel
- Record the action in the ring buffer (50-entry action history)
- Write audit log entries (JSONL)

Silent tool calls (no progress update sent) are forbidden.
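The three duties can be sketched like this. The send function and entry shape are hypothetical, and a list stands in for the on-disk JSONL audit file:

```python
from collections import deque
import json
import time

ACTION_HISTORY: deque = deque(maxlen=50)   # 50-entry ring buffer
AUDIT_LOG: list[str] = []                  # stand-in for the JSONL audit file

def send_progress(update: dict) -> None:
    """Stand-in for the data-channel send to the mobile app."""
    pass  # the real implementation pushes an agent_progress message

def after_tool_callback(tool_name: str, args: dict, result: dict) -> None:
    entry = {"ts": time.time(), "tool": tool_name, "ok": "error" not in result}
    send_progress({"type": "agent_progress", "tool": tool_name})  # never silent
    ACTION_HISTORY.append(entry)           # oldest entry evicted past 50
    AUDIT_LOG.append(json.dumps(entry))    # one JSON object per line (JSONL)
```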
### `before_model_callback` (Image + Memory Management)

Runs before every LLM call to manage images and context:

- **Strip pass** — Remove all previously-injected inline JPEG `Part` objects from `llm_request.contents` (~100–150 KB per screenshot)
- **Inject pass** — Add only the latest screenshot as an inline `Part` (decoded from base64)
- **Memory processors** — For non-Gemini models, apply `ToolCallFilter` and `TokenLimiter` to fit within the model's context window

This bridges an ADK limitation: Gemini cannot interpret base64 strings in a `FunctionResponse` as images — only inline `Part` objects work. See Vision Routing for details on the 9 vision backends.
## Planning as a Tool

Planning is not a separate pre-execution phase — it's a tool (`generate_plan`) the execution agent can invoke mid-run when it decides a task needs upfront planning.

When the agent calls `generate_plan(task_description)`:
- An ephemeral planning sub-agent is created (its own `LlmAgent` instance with `InMemorySessionService`)
- The sub-agent receives the same tools as the execution agent (minus `generate_plan` itself, to prevent recursion)
- It runs with a budget of 15 LLM calls to analyze the task and produce a structured plan
- The plan is sent to the user's phone for approval — the user can approve or reject it
- The tool returns the plan and approval status to the execution agent, which proceeds accordingly

The planning sub-agent uses the prompt from `prompts/planning-agent.md` and applies `BuiltInPlanner` with a thinking config when thinking mode is enabled.
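Under those rules, the tool's control flow reduces to roughly the following. The helper functions are toy stand-ins for the ephemeral sub-agent and the phone round-trip:

```python
EXECUTION_TOOLS = ["execute_cli", "read_file", "generate_plan"]  # illustrative

def run_planning_subagent(task: str, tools: list[str], max_llm_calls: int = 15) -> list[str]:
    """Stand-in for the ephemeral LlmAgent + InMemorySessionService run."""
    return [f"1. analyze: {task}", "2. execute the steps", "3. verify the result"]

def request_user_approval(plan: list[str]) -> bool:
    """Stand-in for the approve/reject round-trip to the user's phone."""
    return True

def generate_plan(task_description: str) -> dict:
    # Sub-agent gets the same tools minus generate_plan itself (no recursion)
    tools = [t for t in EXECUTION_TOOLS if t != "generate_plan"]
    plan = run_planning_subagent(task_description, tools, max_llm_calls=15)
    approved = request_user_approval(plan)
    # The execution agent receives both the plan and the user's decision
    return {"plan": plan, "approved": approved}
```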
## Per-Tool OS-Specific Instructions

The `_TOOL_INSTRUCTIONS` registry in `core/agent_config.py` injects platform-aware guidance into the execution agent's system prompt. Each tool can define instructions for `_all` (shared), Windows, Darwin, and Linux:

- **Path-sensitive tools** (`execute_cli`, `save_dialog`, `open_dialog`, `read_file`, `edit_file`, `find_files`) — backslash vs. forward-slash path conventions
- **App naming tools** (`launch_app`, `close_app`, `window_focus`) — `.exe` names vs. macOS bundle names vs. Linux binaries
- **Accessibility tools** (`execute_accessible`) — UIA (Windows), AX (macOS), AT-SPI (Linux) control types
- **System tools** (`install_app`, `set_env_var`, `change_setting`) — OS-specific mechanisms

Dynamic resolution is supported — for example, `execute_cli` on Windows detects whether Git Bash is available and adjusts its shell guidance accordingly. Instructions are formatted as a `## Tool-Specific Instructions ({PLATFORM})` section injected via the `{tool_instructions}` placeholder in the execution prompt template.
## Session Persistence

The execution agent uses ADK's `DatabaseSessionService` backed by SQLite at `~/.contop/data/sessions.db`. Every tool call, tool result, and model response is persisted — the full execution history survives server restarts.

- **Session reuse** — The same `session_id` is reused across intents within a conversation, giving the agent multi-turn memory
- **Session restore** — On WebRTC reconnect, the mobile app sends its stored `adk_session_id`; the server restores the ADK session from SQLite before processing the next intent
- **Session cleanup** — Sessions older than 7 days are automatically deleted on server startup via `cleanup_old_sessions()`
- **Planning sub-agent** — Uses an ephemeral `InMemorySessionService` (one-shot, discarded after each planning run)

For the full context flow, including offset tracking and tool summary persistence, see Context Flow.
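The 7-day cleanup can be sketched against a plain SQLite table. The schema below is illustrative, not ADK's actual sessions.db layout:

```python
import sqlite3
import time

def cleanup_old_sessions(db_path: str, max_age_days: int = 7) -> int:
    """Delete sessions last updated more than max_age_days ago.
    Returns the number of rows removed. Table/column names are assumptions."""
    cutoff = time.time() - max_age_days * 86_400
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute("DELETE FROM sessions WHERE updated_at < ?", (cutoff,))
        return cur.rowcount   # rows deleted; commit happens on context exit
```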
## Session Lifecycle

- **Lazy initialization** — The agent is created on the first `user_intent` message, not on WebRTC connection
- **Execution loop** — The agent iterates through observe → plan → execute → report cycles
- **Cancellation** — Supported via the stop button; kills running processes immediately
- **Completion** — The agent sends `agent_result` with the answer, step count, tool summary, and duration
- **Transfer** — On reconnect, execution is detached from the old peer and adopted by the new one
Related: Tool Layers · Dual-Tool Evaluator · Agent Execution · Context Flow