Thinking mode
Thinking mode enables a reasoning pass before the model produces its visible answer. This trades time-to-first-token for substantially better multi-step reasoning, planning, and tool-use decisions.
Toggle
In the REPL:
/thinking on
/thinking off
/thinking # toggle current state
The setting persists for the current session only. Restarting the CLI falls back to the server default.
The default ON/OFF is set server-side. By default IsonForge ships with thinking ON.
When to use
- Multi-step refactors that touch many files.
- Debugging where the root cause isn't obvious.
- Architecture decisions ("should this be a queue or a webhook?").
- Long planning chains (5+ tool calls).
- Anything where being right matters more than being fast.
When to turn off
- Quick single-file edits ("rename this variable everywhere").
- Tight loops on print-mode pipelines where latency matters.
- Status queries ("what's the diff?").
- Trivial syntax fixes.
What you see
When thinking is on, IsonForge shows a live "🧠 Thinking" panel above the output:
╭─ 🧠 Thinking ─╮
│ Looking at auth/session.py first to understand the current...
│ The constructor takes a TTL config. I'll need to preserve...
│ Redis with pipelining would be cleanest. Let me check if...
╰──────────────╯
It scrolls a 12-line rolling window so it doesn't fill your terminal. When the model finishes reasoning and starts producing the visible answer, the panel collapses into a "Thinking..." entry in your scrollback (expandable via /sessions later).
Latency
Thinking typically adds 30-60 seconds on tool-heavy turns, and sometimes more when the reasoning is deep. This is the cost: the model does more compute before responding.
For interactive sessions this is usually worth it. For batch scripts in -p mode where the same task runs 100 times, consider --effort low (turns off thinking + uses lower sampling) instead.
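To get a rough sense of what that adds up to in a batch, here is a back-of-the-envelope sketch. It assumes the runs execute serially, that every run is a tool-heavy turn, and that the 30-60 second figure applies to each run; the function name is illustrative only.

```shell
# Rough estimate of the extra wall-clock time thinking adds to a serial batch.
# Assumption: every run pays the per-turn overhead exactly once.
added_minutes() {
  runs=$1        # number of runs in the batch
  per_run_s=$2   # added seconds per run
  echo $(( runs * per_run_s / 60 ))
}

added_minutes 100 30   # prints 50
added_minutes 100 60   # prints 100
```

Under those assumptions, 100 runs pick up somewhere between 50 and 100 minutes of added latency, which is why turning thinking off for repetitive batch work can pay off.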
Programmatic control
Print mode:
# Default depends on server, override explicitly:
ISONFORGE_THINKING=1 isonforge -p "complex task"
ISONFORGE_THINKING=0 isonforge -p "quick task"
The settings.json file does not currently expose a thinking field directly; use the env var or the session toggle instead.
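For scripts that mix heavy and light tasks, one option is to wrap the env var in a small helper. This is a minimal sketch that relies only on the ISONFORGE_THINKING variable and the -p flag shown above. The helper name run_task and the ISONFORGE_BIN override are hypothetical, added here so the wrapper can be dry-run with a stand-in command.

```shell
# Hypothetical wrapper: run one print-mode task with thinking forced on or off.
# ISONFORGE_BIN is not an IsonForge setting; it exists only in this sketch
# so that a stand-in command can be substituted for dry runs.
run_task() {
  mode=$1      # "on" or "off"
  prompt=$2    # the task text passed to -p
  case "$mode" in
    on)  flag=1 ;;
    off) flag=0 ;;
    *)   echo "usage: run_task on|off PROMPT" >&2; return 2 ;;
  esac
  ISONFORGE_THINKING=$flag "${ISONFORGE_BIN:-isonforge}" -p "$prompt"
}

# Example calls (commented out so the sketch can be sourced safely):
# run_task on  "complex task"
# run_task off "quick task"
```

Keeping the choice explicit per call avoids depending on whatever the server default happens to be at the time the script runs.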
Interaction with effort levels
The --effort flag controls thinking too:
| Effort | Thinking |
|---|---|
| low | OFF |
| medium | follows server default |
| high | ON |
| xhigh | ON + more output tokens |
| max | ON + use full token budget |
See Effort levels.
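The table above can be read as a simple lookup. The sketch below encodes it as a shell function for scripts that need to know whether a given effort level implies thinking. The function name and its output strings are illustrative and not part of IsonForge.

```shell
# Illustrative lookup mirroring the effort table; not an IsonForge command.
thinking_for_effort() {
  case "$1" in
    low)            echo "off" ;;
    medium)         echo "server-default" ;;
    high|xhigh|max) echo "on" ;;
    *)              echo "unknown effort: $1" >&2; return 1 ;;
  esac
}

thinking_for_effort low     # prints off
thinking_for_effort xhigh   # prints on
```

The sketch deliberately ignores the token-budget differences between high, xhigh, and max, since it only tracks whether thinking is on.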
Mid-stream abort
If you press Ctrl+C during the thinking phase, IsonForge persists what it has captured so far as a "Thinking..." entry in scrollback. No data is lost.
Caveats
- The reasoning text is the model's internal chain-of-thought. It can be wrong, contradict the final answer, or contain tentative ideas. Don't rely on it as the answer; the visible answer that follows is what counts.
- Long reasoning chains consume tokens against your context window. Heavy thinking + heavy tool output = /compact becomes worth running.
- Reasoning output is persisted to session files. /export includes it. If you share an export, you share the thinking.