# Effort levels
Effort controls how much compute IsonForge spends per turn. Five levels.
| Level | Thinking | Sampling | Max output | Use for |
|---|---|---|---|---|
| low | OFF | tighter (lower temp) | short cap | Quick lookups, simple edits |
| medium | server default | normal | normal | Default daily work |
| high | ON | normal | normal | Multi-step tasks, complex bugs |
| xhigh | ON | normal | extended | Long generation (full files, large diffs) |
| max | ON | normal | unlimited (full ctx) | Heavy refactors, full project audits |
## Set effort

At startup:

```bash
isonforge --effort high
isonforge -p --effort low "quick check"
```

Mid-session:

```
/effort high
/effort low
/effort        # show current
```

Default in settings:

```json
{
  "effort": "medium"
}
```
The `--effort` CLI flag always wins over the settings file.
## What each level changes

### low
- Reasoning mode forced OFF (fast answer, no thinking pass).
- Temperature trimmed (more deterministic).
- Max output tokens capped low (~1024).
Best for: trivial fixes, quick syntax questions, scripted batch jobs where latency matters.
### medium
- Follows server defaults for reasoning + sampling.
- Standard output budget.
Best for: daily coding work. Most things should be medium.
### high
- Reasoning ON (deeper analysis).
- Standard sampling.
- Standard output budget.
Best for: multi-file refactors, debugging, design questions, anything where being right matters.
### xhigh
- Reasoning ON.
- Output budget bumped (~8K+).
Best for: generating long files, writing many tests at once, producing extensive documentation, long-form analysis.
### max
- Reasoning ON.
- Output budget = whatever's left of the context window after input.
Best for: full repo audits, generating an entire migration plan in one turn, "go through every file and report issues."
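The five levels above amount to a lookup table of knobs. A hypothetical internal representation (names and structure are assumptions; the values come from the table and notes above, with `None` meaning "follow the server default" for thinking and "whatever is left of the context window" for output):

```python
# Hypothetical representation of the documented effort levels.
# thinking=None  -> follow the server default
# max_output=None -> remainder of the context window after input
EFFORT_LEVELS = {
    "low":    {"thinking": False, "sampling": "tighter", "max_output": 1024},
    "medium": {"thinking": None,  "sampling": "normal",  "max_output": "standard"},
    "high":   {"thinking": True,  "sampling": "normal",  "max_output": "standard"},
    "xhigh":  {"thinking": True,  "sampling": "normal",  "max_output": 8192},
    "max":    {"thinking": True,  "sampling": "normal",  "max_output": None},
}

def thinking_enabled(level: str, server_default: bool) -> bool:
    """Is the thinking pass on for this level, given the server default?"""
    flag = EFFORT_LEVELS[level]["thinking"]
    return server_default if flag is None else flag
```

Note that only `medium` depends on the server default; every other level pins thinking explicitly.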
## Cost vs latency tradeoffs
| Level | Time per turn | Token cost | Quality |
|---|---|---|---|
| low | ~5s | low | OK for trivial |
| medium | ~10-30s | medium | Good |
| high | ~30-90s | higher | Better |
| xhigh | ~60-120s | high | Better + longer output |
| max | up to several min | very high | Best for huge contexts |
Self-hosted IsonAI doesn't bill per token, but you still pay in wall-clock time and context-window pressure. Higher effort means more thinking tokens, so `/compact` becomes worth running sooner.
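The context-window pressure is plain arithmetic: thinking tokens occupy the same window as everything else, so heavier turns exhaust it in fewer steps. A toy sketch (the window size and per-turn figures are purely illustrative, not IsonAI's real numbers):

```python
# Toy model: every turn's tokens (input + thinking + output) accumulate
# in the context window; /compact is needed once it no longer fits.
CONTEXT_WINDOW = 128_000  # illustrative, not a real IsonAI limit

def turns_until_compact(tokens_per_turn: int) -> int:
    """How many whole turns fit before the window overflows."""
    used, turns = 0, 0
    while used + tokens_per_turn <= CONTEXT_WINDOW:
        used += tokens_per_turn
        turns += 1
    return turns

print(turns_until_compact(2_000))   # light low-effort turns -> 64 fit
print(turns_until_compact(12_000))  # heavy thinking turns -> only 10 fit
```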
## Recipe: mid-session escalation

Start cheap, escalate if needed:

```
> /effort low
> what's in auth/session.py?
[fast answer]

> /effort high
> refactor it to use redis with proper connection pooling
[thoughtful plan + execution]

> /effort low
> commit with a good message
[fast]
```
## Recipe: budget for print mode

```bash
isonforge -p \
  --effort low \
  --max-turns 3 \
  --max-budget-usd 0.01 \
  "summarize this file" < big-file.md
```
Cheap, bounded, suitable for CI.
## Interaction with /thinking

`/thinking on|off` overrides whatever `--effort` set: if you start with `--effort low` and then run `/thinking on`, thinking is on (your manual toggle wins) until you change effort again. Running `/effort high` afterwards clears the toggle, and thinking follows that level's default, i.e. back on.
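The interaction reads as a small state machine: a manual `/thinking` toggle shadows the level default, and a later `/effort` clears it. A sketch of that behavior as described above (the class, the override field, and the per-level defaults here are assumptions for illustration; `medium` actually follows the server default, shown as off):

```python
class Session:
    """Sketch of the documented /effort-vs-/thinking precedence."""
    # Assumed per-level defaults; "medium" really follows the server
    # default, modeled as off here for the sketch.
    DEFAULTS = {"low": False, "medium": False,
                "high": True, "xhigh": True, "max": True}

    def __init__(self, effort: str = "medium"):
        self.effort = effort
        self.override = None            # set by /thinking on|off

    def set_thinking(self, on: bool):   # /thinking on|off
        self.override = on

    def set_effort(self, level: str):   # /effort <level> clears the toggle
        self.effort = level
        self.override = None

    @property
    def thinking(self) -> bool:
        if self.override is not None:   # manual toggle wins
            return self.override
        return self.DEFAULTS[self.effort]

s = Session("low")
s.set_thinking(True)    # manual toggle beats --effort low
assert s.thinking is True
s.set_effort("high")    # /effort high resets to the level default (on)
assert s.thinking is True
```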
## Verify

Verbose mode shows the resolved effort + sampling + thinking values:

```bash
isonforge --verbose --effort xhigh "task" 2>&1 | grep effort
```