
Vision

Ash can analyze inbound images and inject structured image context into the user message before normal text processing.

Vision in 30 Seconds

  • Provider receives image bytes (for example from Telegram)
  • Vision integration analyzes images (currently via OpenAI)
  • Ash injects an [IMAGE_CONTEXT] block into message text
  • Main conversation pipeline remains text-first
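The injection step above can be sketched as follows (hypothetical function name and block format; the real logic lives inside Ash's vision integration):

```python
def inject_image_context(text: str, descriptions: list[str], position: str = "prepend") -> str:
    """Wrap image analysis results in an [IMAGE_CONTEXT] block and splice it into the message text."""
    block = "[IMAGE_CONTEXT]\n" + "\n".join(descriptions)
    if position == "prepend":
        return f"{block}\n\n{text}" if text else block
    return f"{text}\n\n{block}" if text else block
```

Because the pipeline stays text-first, downstream agents see one plain string whether or not an image was attached.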

Quick Start

[image]
enabled = true
provider = "openai"

Provide OpenAI credentials via config or environment:

[openai]
api_key = "sk-..."

or set it via the environment:
export OPENAI_API_KEY=sk-...
The full set of [image] options:

[image]
enabled = true
provider = "openai"
max_images_per_message = 1
max_image_bytes = 8000000
request_timeout_seconds = 12.0
include_ocr_text = true
inject_position = "prepend" # prepend | append
no_caption_auto_respond = true
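The size and count limits above can be pictured as a simple filter over the inbound images (a sketch with a hypothetical function name; the defaults mirror the config values shown):

```python
def filter_images(images: list[bytes], max_images: int = 1, max_bytes: int = 8_000_000) -> list[bytes]:
    """Drop images over the byte limit, then cap the count per message."""
    kept = [img for img in images if len(img) <= max_bytes]
    return kept[:max_images]
```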

Troubleshooting

Images are ignored

Run the built-in diagnostics:
uv run ash doctor

Check:

  • [image].enabled = true
  • OpenAI key is configured
  • the inbound message actually contains image content

Requests time out on larger images

Increase the timeout or lower the payload size:

[image]
request_timeout_seconds = 20.0
max_image_bytes = 4000000
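As a rough guide, the request payload grows with the base64-encoded image size (about 4/3 of the raw bytes), so lowering max_image_bytes shrinks uploads proportionally. A quick way to estimate an image's payload size before tuning (hypothetical helper):

```python
import base64
import pathlib

def payload_size(path: str) -> int:
    """Approximate base64 payload size, in bytes, for an image file."""
    raw = pathlib.Path(path).read_bytes()
    return len(base64.b64encode(raw))
```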

OCR content is noisy

Disable OCR text injection:

[image]
include_ocr_text = false

Reference (Advanced)

The vision integration hook, ImageIntegration.preprocess_incoming_message, runs before the standard session/agent flow.

If vision prerequisites are missing or the payload is invalid, Ash safely falls back to text-only behavior.
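The fallback described above can be pictured as a guard around the preprocessing step (a sketch with hypothetical names; the real hook is ImageIntegration.preprocess_incoming_message):

```python
def preprocess_with_fallback(text: str, images: list[bytes], analyze) -> str:
    """Try vision preprocessing; on any failure, fall back to the original text."""
    if not images:
        return text
    try:
        context = analyze(images)  # may raise on invalid payloads or missing credentials
        return f"[IMAGE_CONTEXT]\n{context}\n\n{text}"
    except Exception:
        return text  # text-only fallback keeps the conversation flowing
```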