# Vision
Ash can analyze inbound images and inject structured image context into the user message before normal text processing.
## Vision in 30 Seconds
- Provider receives image bytes (for example from Telegram)
- Vision integration analyzes images (currently via OpenAI)
- Ash injects an `[IMAGE_CONTEXT]` block into the message text
- The main conversation pipeline remains text-first
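The injection step above can be sketched as a small text transform. This is an illustrative sketch only: the function name and the exact `[IMAGE_CONTEXT]` block format are assumptions, not Ash's actual API.

```python
def inject_image_context(text: str, description: str, position: str = "prepend") -> str:
    """Wrap a vision analysis result in an [IMAGE_CONTEXT] block and
    attach it to the user's message text (hypothetical helper)."""
    block = f"[IMAGE_CONTEXT]\n{description}\n[/IMAGE_CONTEXT]"
    if not text:
        return block
    if position == "append":
        return f"{text}\n\n{block}"
    # Default: prepend, so downstream text processing sees the context first.
    return f"{block}\n\n{text}"
```

Because the pipeline stays text-first, everything after this point treats the message as ordinary text that happens to carry an extra context block.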
## Quick Start

Enable vision in your config:

```toml
[image]
enabled = true
provider = "openai"
```

Provide OpenAI credentials via config or environment:

```toml
[openai]
api_key = "sk-..."
```

or

```shell
export OPENAI_API_KEY=sk-...
```

## Recommended Config
```toml
[image]
enabled = true
provider = "openai"
max_images_per_message = 1
max_image_bytes = 8000000
request_timeout_seconds = 12.0
include_ocr_text = true
inject_position = "prepend"  # prepend | append
no_caption_auto_respond = true
```

## Troubleshooting
### Images are ignored

Run:

```shell
uv run ash doctor
```

Check:

- `[image].enabled` is set to `true`
- the OpenAI key is configured
- the inbound message actually contains image content
### Requests time out on larger images

Increase the timeout or lower the payload size:

```toml
[image]
request_timeout_seconds = 20.0
max_image_bytes = 4000000
```

### OCR content is noisy
Disable OCR text injection:
```toml
[image]
include_ocr_text = false
```

## Reference (Advanced)
The vision integration hook runs before the standard session/agent flow, via `ImageIntegration.preprocess_incoming_message`.
If vision prerequisites are missing or the payload is invalid, Ash safely falls back to text-only behavior.
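The fallback behavior can be sketched as follows. This is a minimal illustration of the pattern, not Ash's implementation: `VisionError`, `analyze_images`, and the function signature are hypothetical stand-ins.

```python
class VisionError(Exception):
    """Raised when image analysis cannot proceed (hypothetical)."""

def analyze_images(images: list) -> str:
    # Stand-in for the real OpenAI-backed analysis call.
    if not images or not all(isinstance(i, bytes) for i in images):
        raise VisionError("invalid or missing image payload")
    return "description of the image"

def preprocess_incoming_message(text: str, images: list, enabled: bool = True) -> str:
    # Vision disabled or no images attached: text-only path, untouched.
    if not enabled or not images:
        return text
    try:
        context = analyze_images(images)
    except VisionError:
        # Invalid payload: fall back safely to text-only behavior.
        return text
    return f"[IMAGE_CONTEXT]\n{context}\n[/IMAGE_CONTEXT]\n\n{text}"
```

The key design point is that every failure path returns the original text, so a broken image never blocks a message from reaching the conversation pipeline.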