Image input
IsonForge supports image input through the underlying PRME 26.1 backend. You can attach images to a prompt, and the agent reasons over them alongside the text.
Attach an image
One-shot from the command line:
isonforge --image ./screenshot.png "what's wrong with this UI?"
Multiple images:
isonforge --image ./before.png --image ./after.png \
"describe the diff between these two states"
Inside the REPL:
/image ./error-screenshot.png
> what does this error mean?
The image is base64-encoded and attached to your next message.
Constraints
- Format: PNG or JPG only.
- Max size: 8 MB per image.
- Max images per turn: a few in practice; very high counts may exhaust the context window.
Files over 8 MB are skipped with a warning. Unsupported extensions are skipped with a warning.
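Because skipped files only produce a warning, it can help to check them before invoking the CLI. The following is a minimal sketch, not part of IsonForge: the helper names check_image and build_command are hypothetical, "8 MB" is assumed to mean 8 * 1024 * 1024 bytes, and .jpeg is assumed to be accepted alongside .jpg. Only the --image flag shown above is used.

```python
from __future__ import annotations

from pathlib import Path

# Assumptions (not confirmed by the IsonForge docs): "8 MB" is read as
# 8 * 1024 * 1024 bytes, and ".jpeg" is treated the same as ".jpg".
MAX_BYTES = 8 * 1024 * 1024
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg"}


def check_image(path: Path) -> str | None:
    """Return a reason the file would likely be skipped, or None if it looks fine."""
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        return f"unsupported extension {path.suffix!r}"
    if not path.is_file():
        return "not a file"
    if path.stat().st_size > MAX_BYTES:
        return "larger than 8 MB"
    return None


def build_command(paths: list[Path], prompt: str) -> list[str]:
    """Build an argv list with one --image flag per file that passes the checks."""
    argv = ["isonforge"]
    for path in paths:
        if check_image(path) is None:
            argv += ["--image", str(path)]
    argv.append(prompt)
    return argv
```

The resulting list can be passed to subprocess.run(argv) as is; passing a list rather than a shell string avoids quoting problems with prompts that contain apostrophes.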
What it's good at
- UI screenshots. "This page looks broken - what's wrong?"
- Error dialogs. "What does this error mean? Should I worry?"
- Diagrams. "Explain this architecture diagram."
- Whiteboards. "Read what's on this whiteboard and turn it into a markdown spec."
- Code screenshots, though pasting the text is better than screenshotting code.
- Mockups. "Build this UI based on the mockup."
What it struggles with
- Tiny text in dense screenshots. Crop to the relevant section instead of sending the full screen.
- Very long screenshots / scroll captures. Break them into chunks.
- Handwriting in many languages. Latin script, Bahasa Indonesia, and English work well; mixed scripts and stylized fonts are harder.
- Photos of code at an angle. Take a screenshot or paste text instead.
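For long scroll captures, the chunking advice above can be sketched as a small helper that computes crop boxes. This is an illustration only: chunk_boxes is a hypothetical name, and the chunk height and overlap are values you would pick yourself. The boxes use the (left, upper, right, lower) order that image libraries such as Pillow expect in Image.crop.

```python
from __future__ import annotations


def chunk_boxes(width: int, height: int, chunk_height: int,
                overlap: int = 0) -> list[tuple[int, int, int, int]]:
    """Split a tall image into vertical slices.

    Returns (left, upper, right, lower) boxes covering the full height.
    Consecutive boxes share `overlap` rows so text on a cut line appears
    whole in at least one chunk.
    """
    if chunk_height <= 0:
        raise ValueError("chunk_height must be positive")
    if not 0 <= overlap < chunk_height:
        raise ValueError("overlap must be in [0, chunk_height)")

    boxes = []
    top = 0
    step = chunk_height - overlap
    while top < height:
        bottom = min(top + chunk_height, height)
        boxes.append((0, top, width, bottom))
        if bottom == height:
            break  # the last slice reached the bottom edge
        top += step
    return boxes
```

Each box can then be cropped and saved as its own PNG, and the chunks attached with separate --image flags or sent across several turns.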
Pattern: paired image + repo
/image ./design.png
> implement this design as a React component. Match the spacing exactly.
Look at our existing components in src/components/ for style conventions.
The agent uses the image as a visual spec, reads the existing components for style, and writes the new component.
Pattern: error triage
isonforge --image ./browser-error.png "fix this error"
The agent reads the error text from the screenshot, traces it to the relevant file in the codebase, and proposes a fix.
Where images go
Image data lives in your session's messages. It's saved to ~/.isonforge/sessions/<id>.json as base64. This can make session files large - a few MB if you attach multiple high-res images.
/clear clears the session including image data. /export includes images in the export.
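Base64 encodes every 3 bytes as 4 characters, so stored image data is roughly one third larger than the original files. A rough sketch for estimating that growth follows; the function names are hypothetical, and the estimate covers only the encoded image payload, since the layout of the session JSON is not described here.

```python
from __future__ import annotations

from pathlib import Path


def base64_len(n_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


def estimated_payload_chars(paths: list[Path]) -> int:
    """Approximate characters of base64 image data added for the given files.

    Ignores JSON keys, quoting, and any other per-message metadata.
    """
    return sum(base64_len(p.stat().st_size) for p in paths)
```

Under this arithmetic, a single image at the 8 MB limit (read as 8 * 1024 * 1024 bytes) adds about 11.2 million characters of encoded data.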
Not yet supported
- Video (frame-by-frame).
- Audio.
- PDF as image (extract pages first with pdftoppm).
- SVG (rasterize first with rsvg-convert).
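The two conversion workarounds can be scripted before attaching the results. The sketch below only builds the command lines; the function names, file paths, and the 150 DPI default are illustrative assumptions, and it presumes pdftoppm (from Poppler) and rsvg-convert (from librsvg) are installed and on your PATH.

```python
from __future__ import annotations

import subprocess


def pdf_to_png_cmd(pdf_path: str, out_prefix: str, dpi: int = 150) -> list[str]:
    """argv for pdftoppm: writes one PNG per page, named from out_prefix."""
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix]


def svg_to_png_cmd(svg_path: str, png_path: str) -> list[str]:
    """argv for rsvg-convert: rasterizes a single SVG to a PNG file."""
    return ["rsvg-convert", "-o", png_path, svg_path]


def run(cmd: list[str]) -> None:
    """Run a conversion command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

For example, run(pdf_to_png_cmd("spec.pdf", "spec-page")) produces one PNG per page, which can then be passed to --image like any other PNG. Check each output against the 8 MB limit, since high DPI values produce large files.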
Privacy note
When you attach an image, the image data is sent to the IsonAI backend for inference. Like any prompt, it flows through the gateway. For sensitive screenshots (PII, internal dashboards), make sure your IsonAI deployment and retention policy meet your compliance requirements.
See also
- Web Forge for drag-drop image attachment in the browser UI.
- Multimodal cross-modal RAG - retrieve text content by image query (still experimental in CLI).