
Vision

Ash can analyze inbound images and inject structured image context into the user message before normal text processing.

Vision in 30 Seconds

  • Provider receives image bytes (for example from Telegram)
  • Vision integration analyzes images (currently via OpenAI)
  • Ash injects an [IMAGE_CONTEXT] block into message text
  • Main conversation pipeline remains text-first
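The injection step above can be sketched as follows (hypothetical function name and block format; the real logic lives inside Ash's vision integration):

```python
def inject_image_context(text: str, descriptions: list[str], position: str = "prepend") -> str:
    """Wrap image analysis results in an [IMAGE_CONTEXT] block and splice it into the message text."""
    block = "[IMAGE_CONTEXT]\n" + "\n".join(descriptions)
    if position == "prepend":
        return f"{block}\n\n{text}" if text else block
    return f"{text}\n\n{block}" if text else block
```

Because the pipeline stays text-first, downstream agents see one plain string whether or not an image was attached.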

Quick Start

[image]
enabled = true
provider = "openai"

Provide OpenAI credentials via config or environment:

[openai]
api_key = "sk-..."

or set it via the environment:
export OPENAI_API_KEY=sk-...
The full set of [image] options:

[image]
enabled = true
provider = "openai"
max_images_per_message = 1
max_image_bytes = 8000000
request_timeout_seconds = 12.0
include_ocr_text = true
inject_position = "prepend" # prepend | append
no_caption_auto_respond = true
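The size and count limits above can be pictured as a simple filter over the inbound images (a sketch with a hypothetical function name; the defaults mirror the config values shown):

```python
def filter_images(images: list[bytes], max_images: int = 1, max_bytes: int = 8_000_000) -> list[bytes]:
    """Drop images over the byte limit, then cap the count per message."""
    kept = [img for img in images if len(img) <= max_bytes]
    return kept[:max_images]
```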

Troubleshooting

Images are ignored

Run the built-in diagnostics:
uv run ash doctor

Check:

  • [image].enabled = true
  • OpenAI key is configured
  • the inbound message actually contains image content

Requests time out on larger images

Increase the timeout or lower the payload size:

[image]
request_timeout_seconds = 20.0
max_image_bytes = 4000000
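As a rough guide, the request payload grows with the base64-encoded image size (about 4/3 of the raw bytes), so lowering max_image_bytes shrinks uploads proportionally. A quick way to estimate an image's payload size before tuning (hypothetical helper):

```python
import base64
import pathlib

def payload_size(path: str) -> int:
    """Approximate base64 payload size, in bytes, for an image file."""
    raw = pathlib.Path(path).read_bytes()
    return len(base64.b64encode(raw))
```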

OCR content is noisy

Disable OCR text injection:

[image]
include_ocr_text = false

Reference (Advanced)

The vision integration hook, ImageIntegration.preprocess_incoming_message, runs before the standard session/agent flow.

If vision prerequisites are missing or the payload is invalid, Ash safely falls back to text-only behavior.
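The fallback described above can be pictured as a guard around the preprocessing step (a sketch with hypothetical names; the real hook is ImageIntegration.preprocess_incoming_message):

```python
def preprocess_with_fallback(text: str, images: list[bytes], analyze) -> str:
    """Try vision preprocessing; on any failure, fall back to the original text."""
    if not images:
        return text
    try:
        context = analyze(images)  # may raise on invalid payloads or missing credentials
        return f"[IMAGE_CONTEXT]\n{context}\n\n{text}"
    except Exception:
        return text  # text-only fallback keeps the conversation flowing
```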