Model Selection

Contop uses a multi-model architecture with three distinct model roles. You can configure each independently.

Three Model Roles

| Role | Where It Runs | Default | Purpose |
| --- | --- | --- | --- |
| Conversation Model | Mobile (phone) | gemini-2.5-flash | Classifies user intent, generates conversational responses, polishes execution results |
| Execution Model | Server (desktop) | gemini-2.5-flash | Controls the autonomous ADK agent for multi-step task execution |
| Computer Use Backend | Server (desktop) | omniparser | Screen understanding and UI element detection |

Available Providers

The execution model supports multiple LLM providers via LiteLLM routing:

| Provider | Example Models | API Key Required |
| --- | --- | --- |
| Google Gemini | gemini-2.5-flash, gemini-2.5-pro, gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-2.5-flash-lite | gemini_api_key |
| OpenAI | openai/gpt-5.4, openai/gpt-4.1, openai/gpt-4.1-mini, openai/o3, openai/o4-mini | openai_api_key |
| Anthropic | anthropic/claude-opus-4-6, anthropic/claude-sonnet-4-6, anthropic/claude-haiku-4-5 | anthropic_api_key |
| OpenRouter | Grok, Devstral, Qwen, Nemotron, Phi-4, MiniMax, and any model on OpenRouter | openrouter_api_key |
Tip: Non-Gemini models must use the provider prefix (e.g., openai/gpt-5.4, anthropic/claude-sonnet-4-6); Gemini models use bare names. Community models from providers like Groq, Mistral, and DeepSeek are accessed through OpenRouter.
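The naming rule above can be sketched as a small validator. This is an illustrative helper, not Contop's actual code; the prefix list and error wording are assumptions based on the table of providers.

```python
# Naming rule: Gemini models use bare names; every other provider's model
# must carry a LiteLLM-style prefix such as "openai/" or "anthropic/".
def needs_prefix(model: str) -> bool:
    """True if this model name is missing a required provider prefix."""
    return not model.startswith("gemini-") and "/" not in model

def validate_model(model: str) -> str:
    """Return the model name unchanged, or raise if the prefix is missing."""
    if needs_prefix(model):
        raise ValueError(
            f"Non-Gemini model {model!r} must use a provider prefix, "
            "e.g. 'openai/gpt-4.1' or 'anthropic/claude-haiku-4-5'"
        )
    return model
```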

Switching Models at Runtime

Change models from the mobile app's AI Settings without restarting:

  1. Open the session menu → AI Settings
  2. Select your preferred model for each role
  3. Changes take effect on the next command
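Why no restart is needed can be sketched as follows: the active model is read from a mutable settings store at the start of every command rather than cached at startup. The store shape and function names here are hypothetical.

```python
# Hypothetical settings store; the mobile app mutates this at runtime.
settings = {"conversation_model": "gemini-2.5-flash"}

def handle_command(text: str) -> str:
    # Re-read the setting per command, so a change applies to the next one.
    model = settings["conversation_model"]
    return f"routing {text!r} to {model}"
```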

Thinking Mode

Toggle extended thinking for supported models (Gemini 2.5/3.x, OpenAI o3/o4-mini, Claude Opus/Sonnet):

  • Enabled — Model uses chain-of-thought reasoning (slower but more accurate for complex tasks)
  • Disabled — Faster responses for simple tasks
  • Default — Uses the model's built-in default
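The three-way toggle can be modeled as a small enum that maps to per-request options. The enum and option names below are illustrative assumptions, not Contop's actual API.

```python
from enum import Enum

class ThinkingMode(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    DEFAULT = "default"

def thinking_options(mode: ThinkingMode) -> dict:
    """Translate the toggle into hypothetical per-request options."""
    if mode is ThinkingMode.DEFAULT:
        return {}  # omit the option; the model uses its built-in default
    return {"extended_thinking": mode is ThinkingMode.ENABLED}
```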

Cost/Capability Tradeoffs

| Model Tier | Speed | Cost | Best For |
| --- | --- | --- | --- |
| Flash (e.g., Gemini Flash) | Fast | Low | Simple tasks, quick commands |
| Pro (e.g., Gemini Pro, GPT-5.4) | Medium | Medium | Complex multi-step tasks |
| Large (e.g., Claude Opus) | Slow | High | Nuanced reasoning, code generation |

Subscription Mode

Instead of API keys, you can use your existing LLM subscription (Claude Pro/Max, Gemini Pro, ChatGPT Pro). In subscription mode, requests route through a local CLI proxy on the desktop that wraps the provider's official CLI tool.

  • The mobile app shows a SUB badge on model chips when subscription mode is active for that provider
  • A NO KEY badge appears when no API key is configured for a provider (the model can still be used via subscription)
  • The mobile use_subscription flag is authoritative — the phone decides per-request whether to use subscription or API key mode
  • Vision limitation: CLI tools accept text only — in subscription mode, the execution agent's LLM vision fallback (direct screenshot analysis) is unavailable. The agent relies on local vision backends instead.
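The per-request routing rule above can be sketched as a pure decision function. This is a simplified illustration of the "mobile flag is authoritative" behavior; the function and return values are assumptions.

```python
def choose_route(use_subscription: bool, has_api_key: bool) -> str:
    """Decide routing for one request: 'subscription' or 'api_key'."""
    if use_subscription:
        return "subscription"  # the phone's flag wins, even if a key exists
    if has_api_key:
        return "api_key"
    raise RuntimeError("no API key configured and subscription mode is off")
```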

See Configuration — Subscription Mode for setup instructions.

Vision Backends

The computer use backend determines how the agent understands your screen:

| Backend | Speed | Accuracy | Notes |
| --- | --- | --- | --- |
| OmniParser | Medium | High | Default; local-first, privacy-preserving, GPU/CPU adaptive |
| UI-TARS | Fast | High | Single API call via OpenRouter |
| Gemini Computer Use | Medium | High | Gemini-native computer use with stateful history |
| Accessibility Tree | Fast | Variable | Deterministic; best for native apps |
| Kimi / Qwen / Phi / Molmo / Holotron | Fast | Variable | Alternative VLMs via OpenRouter |
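Combining this with the subscription-mode vision limitation noted earlier, backend selection for a single screenshot can be sketched as an ordered candidate list. The backend identifiers and fallback order here are illustrative assumptions, not Contop's actual logic.

```python
# Hypothetical local-first fallbacks (no API key or CLI proxy involved).
LOCAL_BACKENDS = ["omniparser", "accessibility_tree"]

def vision_candidates(configured: str, subscription_mode: bool) -> list:
    """Ordered backends to try for one screenshot."""
    candidates = [configured]
    if not subscription_mode:
        # Direct-screenshot LLM analysis; unavailable behind the CLI proxy.
        candidates.append("llm_vision_fallback")
    candidates += [b for b in LOCAL_BACKENDS if b != configured]
    return candidates
```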

Related: Configuration · Agent Execution · ADK Agent