
ollama just quietly unlocked something big.

qwen3.5 now runs locally with image + text input.

meaning your agents can see what they’re doing, not just read prompts.

run it: ollama run qwen3.5

this lets a local agent analyze screenshots, diagrams, PDFs, or UI states, then decide what action to take.
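
a minimal sketch of that single call, assuming the official ollama python client (pip install ollama) and that the qwen3.5 tag from above is what you've actually pulled locally:

# rough sketch, not a definitive implementation
# assumes: pip install ollama, plus a local pull of the model tag used above
import ollama

response = ollama.chat(
    model="qwen3.5",  # tag from the post; swap in whatever vision-capable tag you pulled
    messages=[{
        "role": "user",
        "content": "here is the current UI state. what should the agent do next?",
        "images": ["screenshot.png"],  # local path; the client handles the encoding
    }],
)
print(response["message"]["content"])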

a simple loop builders are experimenting with (rough sketch after the steps):

1 capture screenshot
2 send image + task to qwen3.5
3 model decides next action
4 automation layer executes
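
a rough python version of that loop. take_screenshot and execute are placeholders for your own automation layer, not a real library:

# hedged sketch of the capture -> see -> decide -> act loop
# take_screenshot() and execute() are hypothetical hooks you wire to your
# automation layer (playwright, pyautogui, etc.)
import ollama

def take_screenshot() -> str:
    # placeholder: return a path to the current screen/page capture
    raise NotImplementedError

def execute(action: str) -> None:
    # placeholder: map the model's decision onto clicks/keys/navigation
    raise NotImplementedError

def agent_step(task: str) -> str:
    shot = take_screenshot()                    # 1 capture screenshot
    response = ollama.chat(                     # 2 send image + task
        model="qwen3.5",
        messages=[{
            "role": "user",
            "content": f"task: {task}. look at the screenshot and reply with the single next action.",
            "images": [shot],
        }],
    )
    action = response["message"]["content"]     # 3 model decides next action
    execute(action)                             # 4 automation layer executes
    return action

in practice you'd run agent_step in a loop until the model says it's done, and constrain the reply format (json, single verb) so execute() has something parseable.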

stack looks like (playwright capture step sketched below):

ollama
qwen3.5
playwright
python or node agent loop
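
for the capture step in a browser agent, one way to fill the take_screenshot placeholder with playwright's sync api (chromium + headless assumed):

# hedged sketch, not the only way to do this
# assumes: pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

def take_screenshot(url: str = "https://example.com") -> str:
    path = "screenshot.png"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=path, full_page=True)
        browser.close()
    return path

in a real loop you'd keep the browser open between steps instead of relaunching it for every screenshot.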

old agents read prompts.

multimodal agents read environments.

that’s where local automation starts to feel like a real operator.

curious what people here are building with vision models locally.

browser agents? screen-aware copilots? something weirder?
