ollama just quietly unlocked something big.
qwen3.5 now runs locally with image + text input.
meaning your agents can see what they’re doing, not just read prompts.
run it: ollama run qwen3.5
this lets a local agent analyze screenshots, diagrams, PDFs, or UI states, then decide what action to take.
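a minimal sketch of a single vision call, assuming the ollama python client and the qwen3.5 tag from above (swap in whatever vision-capable tag you actually pulled):

# pip install ollama
import ollama

# send a screenshot plus a task to the local model
response = ollama.chat(
    model="qwen3.5",  # assumption -- use the vision tag you have installed
    messages=[{
        "role": "user",
        "content": "describe what's on this screen and suggest the next action",
        "images": ["screenshot.png"],  # local file path; raw bytes also work
    }],
)
print(response["message"]["content"])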
the simple loop builders are experimenting with (rough code sketch after the stack below):
1 capture screenshot
2 send image + task to qwen3.5
3 model decides next action
4 automation layer executes
stack looks like:
ollama
qwen3.5
playwright
python or node agent loop
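to make that concrete, here's a rough sketch wiring the pieces together with python + playwright + the ollama client. the task, the CLICK/TYPE/DONE text protocol, and the action parsing are all assumptions for illustration; a real agent wants structured outputs and guardrails:

# pip install ollama playwright && playwright install chromium
import ollama
from playwright.sync_api import sync_playwright

TASK = "find the search box and describe what to click next"  # hypothetical task

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")

    for step in range(5):  # cap the loop so it can't run forever
        # 1. capture screenshot
        page.screenshot(path="state.png")

        # 2. send image + task to qwen3.5
        reply = ollama.chat(
            model="qwen3.5",  # assumption -- use your local vision tag
            messages=[{
                "role": "user",
                "content": f"task: {TASK}\n"
                           "reply with one line: CLICK <css selector>, TYPE <text>, or DONE",
                "images": ["state.png"],
            }],
        )["message"]["content"].strip()

        # 3. model decides next action / 4. automation layer executes it
        if reply.startswith("CLICK "):
            page.click(reply.removeprefix("CLICK ").strip())
        elif reply.startswith("TYPE "):
            page.keyboard.type(reply.removeprefix("TYPE ").strip())
        elif reply.startswith("DONE"):
            break

    browser.close()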
old agents read prompts.
multimodal agents read environments.
that’s where local automation starts to feel like a real operator.
…
curious what people here are building with vision models locally.
browser agents? screen-aware copilots? something weirder?