# Effort levels
Effort controls how much compute IsonForge spends per turn. Five levels.
| Level | Thinking | Sampling | Max output | Use for |
|---|---|---|---|---|
| low | OFF | tighter (lower temp) | short cap | Quick lookups, simple edits |
| medium | server default | normal | normal | Default daily work |
| high | ON | normal | normal | Multi-step tasks, complex bugs |
| xhigh | ON | normal | extended | Long generation (full files, large diffs) |
| max | ON | normal | unlimited (full ctx) | Heavy refactors, full project audits |
## Set effort

At startup:

```bash
isonforge --effort high
isonforge -p --effort low "quick check"
```

Mid-session:

```
/effort high
/effort low
/effort        # show current
```

Default in settings:

```json
{
  "effort": "medium"
}
```
The `--effort` CLI flag always wins over the settings file.
## What each level changes

### low
- Reasoning mode forced OFF (fast answer, no thinking pass).
- Temperature trimmed (more deterministic).
- Max output tokens capped low (~1024).
Best for: trivial fixes, quick syntax questions, scripted batch jobs where latency matters.
### medium
- Follows server defaults for reasoning + sampling.
- Standard output budget.
Best for: daily coding work. Most things should be medium.
### high
- Reasoning ON (deeper analysis).
- Standard sampling.
- Standard output budget.
Best for: multi-file refactors, debugging, design questions, anything where being right matters.
### xhigh
- Reasoning ON.
- Output budget bumped (~8K+).
Best for: generating long files, writing many tests at once, producing extensive documentation, long-form analysis.
### max
- Reasoning ON.
- Output budget = whatever's left of the context window after input.
Best for: full repo audits, generating an entire migration plan in one turn, "go through every file and report issues."
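The five levels above amount to a lookup table of knobs. A hypothetical internal representation (names and structure are assumptions; the values come from the table and notes above, with `None` meaning "follow the server default" for thinking and "whatever is left of the context window" for output):

```python
# Hypothetical representation of the documented effort levels.
# thinking=None  -> follow the server default
# max_output=None -> remainder of the context window after input
EFFORT_LEVELS = {
    "low":    {"thinking": False, "sampling": "tighter", "max_output": 1024},
    "medium": {"thinking": None,  "sampling": "normal",  "max_output": "standard"},
    "high":   {"thinking": True,  "sampling": "normal",  "max_output": "standard"},
    "xhigh":  {"thinking": True,  "sampling": "normal",  "max_output": 8192},
    "max":    {"thinking": True,  "sampling": "normal",  "max_output": None},
}

def thinking_enabled(level: str, server_default: bool) -> bool:
    """Is the thinking pass on for this level, given the server default?"""
    flag = EFFORT_LEVELS[level]["thinking"]
    return server_default if flag is None else flag
```

Note that only `medium` depends on the server default; every other level pins thinking explicitly.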
## Cost vs latency tradeoffs
| Level | Time per turn | Token cost | Quality |
|---|---|---|---|
| low | ~5s | low | OK for trivial |
| medium | ~10-30s | medium | Good |
| high | ~30-90s | higher | Better |
| xhigh | ~60-120s | high | Better + longer output |
| max | up to several min | very high | Best for huge contexts |
Self-hosted IsonAI doesn't bill per token, but you still pay in wall-clock time and context-window pressure. Higher effort means more thinking tokens, so `/compact` becomes worth running sooner.
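The context-window pressure is plain arithmetic: thinking tokens occupy the same window as everything else, so heavier turns exhaust it in fewer steps. A toy sketch (the window size and per-turn figures are purely illustrative, not IsonAI's real numbers):

```python
# Toy model: every turn's tokens (input + thinking + output) accumulate
# in the context window; /compact is needed once it no longer fits.
CONTEXT_WINDOW = 128_000  # illustrative, not a real IsonAI limit

def turns_until_compact(tokens_per_turn: int) -> int:
    """How many whole turns fit before the window overflows."""
    used, turns = 0, 0
    while used + tokens_per_turn <= CONTEXT_WINDOW:
        used += tokens_per_turn
        turns += 1
    return turns

print(turns_until_compact(2_000))   # light low-effort turns -> 64 fit
print(turns_until_compact(12_000))  # heavy thinking turns -> only 10 fit
```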
## Recipe: mid-session escalation

Start cheap, escalate if needed:

```
> /effort low
> what's in auth/session.py?
[fast answer]

> /effort high
> refactor it to use redis with proper connection pooling
[thoughtful plan + execution]

> /effort low
> commit with a good message
[fast]
```
## Recipe: budget for print mode

```bash
isonforge -p \
  --effort low \
  --max-turns 3 \
  --max-budget-usd 0.01 \
  "summarize this file" < big-file.md
```
Cheap, bounded, suitable for CI.
## Interaction with /thinking

`/thinking on|off` overrides whatever `--effort` set: if you start with `--effort low` and then run `/thinking on`, thinking is on (your manual toggle wins) until you change effort again. Running `/effort high` afterwards clears the toggle, and thinking follows that level's default, i.e. back on.
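The interaction reads as a small state machine: a manual `/thinking` toggle shadows the level default, and a later `/effort` clears it. A sketch of that behavior as described above (the class, the override field, and the per-level defaults here are assumptions for illustration; `medium` actually follows the server default, shown as off):

```python
class Session:
    """Sketch of the documented /effort-vs-/thinking precedence."""
    # Assumed per-level defaults; "medium" really follows the server
    # default, modeled as off here for the sketch.
    DEFAULTS = {"low": False, "medium": False,
                "high": True, "xhigh": True, "max": True}

    def __init__(self, effort: str = "medium"):
        self.effort = effort
        self.override = None            # set by /thinking on|off

    def set_thinking(self, on: bool):   # /thinking on|off
        self.override = on

    def set_effort(self, level: str):   # /effort <level> clears the toggle
        self.effort = level
        self.override = None

    @property
    def thinking(self) -> bool:
        if self.override is not None:   # manual toggle wins
            return self.override
        return self.DEFAULTS[self.effort]

s = Session("low")
s.set_thinking(True)    # manual toggle beats --effort low
assert s.thinking is True
s.set_effort("high")    # /effort high resets to the level default (on)
assert s.thinking is True
```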
## Verify

Verbose mode shows the resolved effort + sampling + thinking values:

```bash
isonforge --verbose --effort xhigh "task" 2>&1 | grep effort
```