Thinking Models: Claude and Gemini 3 Reasoning Capabilities
What You'll Learn
- Configure thinking budget for Claude Opus 4.5 and Sonnet 4.5 Thinking models
- Use thinking levels (minimal/low/medium/high) for Gemini 3 Pro/Flash
- Flexibly adjust thinking intensity through OpenCode's variant system
- Understand interleaved thinking (reasoning mechanism during tool calls)
- Master thinking block retention strategy (keep_thinking configuration)
Your Current Struggles
You want AI models to perform better on complex tasks—like multi-step reasoning, code debugging, or architectural design. But you know:
- Regular models answer too quickly without deep reasoning
- Claude's official thinking features are restricted and hard to access
- Gemini 3's thinking level configuration is unclear
- Not sure how much thinking is enough (what should the budget be)
- Signature errors when reading thinking block content
When to Use This
Use Cases:
- Complex problems requiring multi-step reasoning (algorithm design, system architecture)
- Code review or debugging that requires careful consideration
- Deep analysis of long documents or codebases
- Tool-intensive tasks (need interleaved thinking)
Not Suitable For:
- Simple Q&A (wastes thinking quota)
- Rapid prototyping (speed prioritized)
- Factual queries (no reasoning needed)
🎒 Prerequisites
Prerequisite Check
- Plugin installation and authentication completed: Refer to Quick Install and First Authentication
- Basic model usage understood: Refer to First Request
- Access to Thinking models available: Ensure your account has access to Claude Opus 4.5/Sonnet 4.5 Thinking or Gemini 3 Pro/Flash
Core Concepts
What Are Thinking Models
Thinking models perform internal reasoning (thinking blocks) before generating the final answer. These thinking contents:
- Not billed: Thinking tokens don't count toward regular output quota (actual billing rules per Antigravity official policy)
- Improves reasoning quality: More thinking → more accurate, logical responses
- Consumes time: Thinking increases response latency but delivers better results
Key Differences:
| Regular Models | Thinking Models |
|---|---|
| Directly generate answer | Think first → then generate answer |
| Fast but potentially shallow | Slow but deeper |
| Suitable for simple tasks | Suitable for complex tasks |
Two Thinking Implementations
Antigravity Auth plugin supports two Thinking implementations:
Claude Thinking (Opus 4.5, Sonnet 4.5)
- Token-based budget: Control thinking amount with numbers (e.g., 8192, 32768)
- Interleaved thinking: Can think before/after tool calls
- Snake_case keys: Uses
include_thoughts,thinking_budget
Gemini 3 Thinking (Pro, Flash)
- Level-based: Control thinking intensity with strings (minimal/low/medium/high)
- CamelCase keys: Uses
includeThoughts,thinkingLevel - Model differences: Flash supports all 4 levels, Pro only supports low/high
Follow Along
Step 1: Configure Thinking Models via Variant
OpenCode's variant system lets you select thinking intensity directly from the model selector without memorizing complex model names.
Check Existing Configuration
View your model configuration file (usually in .opencode/models.json or system config directory):
## View model configuration
cat ~/.opencode/models.jsonConfigure Claude Thinking Models
Find antigravity-claude-sonnet-4-5-thinking or antigravity-claude-opus-4-5-thinking, add variants:
{
"antigravity-claude-sonnet-4-5-thinking": {
"name": "Claude Sonnet 4.5 Thinking",
"limit": { "context": 200000, "output": 64000 },
"modalities": { "input": ["text", "image", "pdf"], "output": ["text"] },
"variants": {
"low": { "thinkingConfig": { "thinkingBudget": 8192 } },
"medium": { "thinkingConfig": { "thinkingBudget": 16384 } },
"max": { "thinkingConfig": { "thinkingBudget": 32768 } }
}
}
}Configuration Notes:
low: 8192 tokens - lightweight thinking, suitable for medium-complexity tasksmedium: 16384 tokens - balanced thinking and speedmax: 32768 tokens - maximum thinking, suitable for most complex tasks
Configure Gemini 3 Thinking Models
Gemini 3 Pro (only supports low/high):
{
"antigravity-gemini-3-pro": {
"name": "Gemini 3 Pro (Antigravity)",
"limit": { "context": 1048576, "output": 65535 },
"modalities": { "input": ["text", "image", "pdf"], "output": ["text"] },
"variants": {
"low": { "thinkingLevel": "low" },
"high": { "thinkingLevel": "high" }
}
}
}Gemini 3 Flash (supports all 4 levels):
{
"antigravity-gemini-3-flash": {
"name": "Gemini 3 Flash (Antigravity)",
"limit": { "context": 1048576, "output": 65536 },
"modalities": { "input": ["text", "image", "pdf"], "output": ["text"] },
"variants": {
"minimal": { "thinkingLevel": "minimal" },
"low": { "thinkingLevel": "low" },
"medium": { "thinkingLevel": "medium" },
"high": { "thinkingLevel": "high" }
}
}
}Configuration Notes:
minimal: Minimum thinking, fastest response (Flash only)low: Lightweight thinkingmedium: Balanced thinking (Flash only)high: Maximum thinking (slowest but most in-depth)
You should see: In OpenCode's model selector, after selecting a Thinking model, you can see a variant dropdown menu.
Step 2: Make Requests with Thinking Models
After configuration, you can select models and variants through OpenCode:
## Use Claude Sonnet 4.5 Thinking (max)
opencode run "Help me design a distributed caching system architecture" \
--model=google/antigravity-claude-sonnet-4-5-thinking \
--variant=max
## Use Gemini 3 Pro (high)
opencode run "Analyze performance bottlenecks in this code" \
--model=google/antigravity-gemini-3-pro \
--variant=high
## Use Gemini 3 Flash (minimal - fastest)
opencode run "Quickly summarize the content of this file" \
--model=google/antigravity-gemini-3-flash \
--variant=minimalYou should see: The model will first output thinking blocks (thinking content), then generate the final answer.
Step 3: Understand Interleaved Thinking
Interleaved thinking is a special capability of Claude models—it can think before and after tool calls.
Scenario Example: When asking AI to use tools (like file operations, database queries) to complete tasks:
Thinking: I need to first read the config file, then decide next steps based on the content...
[Calling tool: read_file("config.json")]
Tool Result: { "port": 8080, "mode": "production" }
Thinking: Port is 8080, production mode. I need to verify the configuration is correct...
[Calling tool: validate_config({ "port": 8080, "mode": "production" })]
Tool Result: { "valid": true }
Thinking: Configuration is valid. Now I can start the service.
[Generate final answer]Why it matters:
- Thinking before/after tool calls → smarter decisions
- Adapts to tool results → dynamic strategy adjustment
- Avoids blind execution → reduces errors
Plugin Auto-Handles
You don't need to manually configure interleaved thinking. Antigravity Auth plugin automatically detects Claude Thinking models and injects system instructions:
- "Interleaved thinking is enabled. You may think between tool calls and after receiving tool results before deciding on next action or final answer."
Step 4: Control Thinking Block Retention Strategy
By default, the plugin strips thinking blocks to improve reliability (avoid signature errors). If you want to read thinking content, configure keep_thinking.
Why Strip by Default?
Signature Error Issue:
- Thinking blocks in multi-turn conversations require signature matching
- Keeping all thinking blocks may cause signature conflicts
- Stripping thinking blocks is a more stable solution (but loses thinking content)
Enable Thinking Block Retention
Create or edit configuration file:
Linux/macOS: ~/.config/opencode/antigravity.json
Windows: %APPDATA%\opencode\antigravity.json
{
"$schema": "https://raw.githubusercontent.com/NoeFabris/opencode-antigravity-auth/main/assets/antigravity.schema.json",
"keep_thinking": true
}Or use environment variable:
export OPENCODE_ANTIGRAVITY_KEEP_THINKING=1You should see:
keep_thinking: false(default): Only see final answer, thinking blocks hiddenkeep_thinking: true: See complete thinking process (may encounter signature errors in some multi-turn conversations)
Recommended Practice
- Production: Use default
keep_thinking: false, ensure stability - Debugging/Learning: Temporarily enable
keep_thinking: true, observe thinking process - If signature errors occur: Disable
keep_thinking, plugin auto-recovers
Step 5: Check Max Output Tokens
Claude Thinking models require larger output token limits (maxOutputTokens), otherwise thinking budget may not be fully utilized.
Plugin Auto-Handles:
- If you set thinking budget, plugin automatically adjusts
maxOutputTokensto 64,000 - Source location:
src/plugin/transform/claude.ts:78-90
Manual Configuration (Optional):
If you manually set maxOutputTokens, ensure it's greater than thinking budget:
{
"antigravity-claude-sonnet-4-5-thinking": {
"variants": {
"max": {
"thinkingConfig": { "thinkingBudget": 32768 },
"maxOutputTokens": 64000 // Must be >= thinkingBudget
}
}
}
}You should see:
- If
maxOutputTokenstoo small, plugin automatically adjusts to 64,000 - Debug log shows "Adjusted maxOutputTokens for thinking model"
Checkpoint ✅
Verify your configuration is correct:
1. Verify Variant Visibility
In OpenCode:
- Open model selector
- Select
Claude Sonnet 4.5 Thinking - Check if variant dropdown menu exists (low/medium/max)
Expected Result: You should see 3 variant options.
2. Verify Thinking Content Output
opencode run "Think 3 steps: 1+1=? Why?" \
--model=google/antigravity-claude-sonnet-4-5-thinking \
--variant=maxExpected Result:
- If
keep_thinking: true: See detailed thinking process - If
keep_thinking: false(default): Directly see answer "2"
3. Verify Interleaved Thinking (requires tool calls)
## Use task requiring tool calls
opencode run "Read the current directory's file list, then summarize file types" \
--model=google/antigravity-claude-sonnet-4-5-thinking \
--variant=mediumExpected Result:
- See thinking content before and after tool calls
- AI adjusts strategy based on tool return results
Troubleshooting
❌ Error 1: Thinking Budget Exceeds Max Output Tokens
Problem: Set thinkingBudget: 32768, but maxOutputTokens: 20000
Error Message:
Invalid argument: max_output_tokens must be greater than or equal to thinking_budgetSolution:
- Let plugin auto-handle (recommended)
- Or manually set
maxOutputTokens >= thinkingBudget
❌ Error 2: Gemini 3 Pro Uses Unsupported Level
Problem: Gemini 3 Pro only supports low/high, but you try using minimal
Error Message:
Invalid argument: thinking_level "minimal" not supported for gemini-3-proSolution: Only use levels supported by Pro (low/high)
❌ Error 3: Signature Errors in Multi-Turn Conversations (keep_thinking: true)
Problem: After enabling keep_thinking: true, encounter errors in multi-turn conversations
Error Message:
Signature mismatch in thinking blocksSolution:
- Temporarily disable
keep_thinking:export OPENCODE_ANTIGRAVITY_KEEP_THINKING=0 - Or start a new conversation
❌ Error 4: Variant Not Displayed
Problem: Configured variants, but can't see them in OpenCode model selector
Possible Causes:
- Wrong configuration file path
- JSON syntax error (missing commas, quotes)
- Model ID mismatch
Solution:
- Check configuration file path:
~/.opencode/models.jsonor~/.config/opencode/models.json - Validate JSON syntax:
cat ~/.opencode/models.json | python -m json.tool - Check if model ID matches what plugin returns
Summary
Thinking models improve response quality for complex tasks by performing internal reasoning before generating answers:
| Feature | Claude Thinking | Gemini 3 Thinking |
|---|---|---|
| Configuration | thinkingBudget (number) | thinkingLevel (string) |
| Levels | Custom budget | minimal/low/medium/high |
| Keys | snake_case (include_thoughts) | camelCase (includeThoughts) |
| Interleaved | ✅ Supported | ❌ Not supported |
| Max Output | Auto-adjusted to 64,000 | Uses default value |
Key Configurations:
- Variant System: Configure via
thinkingConfig.thinkingBudget(Claude) orthinkingLevel(Gemini 3) - keep_thinking: Default false (stable), true (readable thinking content)
- Interleaved thinking: Auto-enabled, no manual configuration needed
Next Lesson Preview
Next, we'll learn Google Search Grounding.
You'll learn:
- Enable Google Search retrieval for Gemini models
- Configure dynamic retrieval thresholds
- Improve factual accuracy, reduce hallucinations
Appendix: Source Code Reference
Click to view source code locations
Last updated: 2026-01-23
| Feature | File Path | Lines |
|---|---|---|
| Claude Thinking Config Build | src/plugin/transform/claude.ts | 62-72 |
| Gemini 3 Thinking Config Build | src/plugin/transform/gemini.ts | 163-171 |
| Gemini 2.5 Thinking Config Build | src/plugin/transform/gemini.ts | 176-184 |
| Claude Thinking Model Detection | src/plugin/transform/claude.ts | 34-37 |
| Gemini 3 Model Detection | src/plugin/transform/gemini.ts | 137-139 |
| Interleaved Thinking Hint Injection | src/plugin/transform/claude.ts | 96-138 |
| --- | --- | --- |
| keep_thinking Config Schema | src/plugin/config/schema.ts | 78-87 |
| Claude Thinking Apply Transform | src/plugin/transform/claude.ts | 324-366 |
| Gemini Thinking Apply Transform | src/plugin/transform/gemini.ts | 372-434 |
Key Constants:
CLAUDE_THINKING_MAX_OUTPUT_TOKENS = 64_000: Maximum output tokens for Claude Thinking modelsCLAUDE_INTERLEAVED_THINKING_HINT: Interleaved thinking hint injected into system instructions
Key Functions:
buildClaudeThinkingConfig(includeThoughts, thinkingBudget): Build Claude Thinking configuration (snake_case keys)buildGemini3ThinkingConfig(includeThoughts, thinkingLevel): Build Gemini 3 Thinking configuration (level string)buildGemini25ThinkingConfig(includeThoughts, thinkingBudget): Build Gemini 2.5 Thinking configuration (numeric budget)ensureClaudeMaxOutputTokens(generationConfig, thinkingBudget): Ensure maxOutputTokens is large enoughappendClaudeThinkingHint(payload, hint): Inject interleaved thinking hintisClaudeThinkingModel(model): Detect if it's a Claude Thinking modelisGemini3Model(model): Detect if it's a Gemini 3 model