Agents can now see images returned by MCP tools
🎯 What is it?
We've shipped a new file system for conversations. The first thing it unlocks: agents can now see images returned by MCP tools. When a tool produces an image (JPEG, PNG, GIF, WebP), the agent receives the actual visual content and can interpret, analyze, and describe it.
💡 Why is it useful?
Until now, when an MCP tool returned an image, agents were blind to it. They could acknowledge the file existed but couldn't look at it. This was a real gap: screenshots from Microsoft Drive, charts from data tools, visual outputs from external integrations — all invisible to the agent. That's now fixed.
⚙ How does it work?
Behind the scenes, all conversation files are now organized in a unified, structured system. When an agent needs to access an image returned by an MCP tool, it receives the actual visual content instead of just metadata.
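For readers curious what "actual visual content" looks like on the wire: in the MCP specification, a tool result carries a list of content blocks, and an image block holds base64-encoded data alongside its MIME type. The sketch below is illustrative only and is not the platform's internal implementation. The `extract_images` helper, the `SUPPORTED_IMAGE_TYPES` set, and the sample payload are assumptions made up for this example; only the block shape (`type`, `data`, `mimeType`) comes from the MCP spec.

```python
import base64

# MIME types matching the formats listed in this announcement
# (JPEG, PNG, GIF, WebP). The set name is hypothetical.
SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def extract_images(tool_result: dict) -> list[tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for each supported image block.

    Hypothetical helper: assumes `tool_result` follows the MCP tool-result
    shape, i.e. a "content" list of typed blocks.
    """
    images = []
    for block in tool_result.get("content", []):
        if block.get("type") != "image":
            continue  # skip text and other block types
        mime_type = block.get("mimeType", "")
        if mime_type not in SUPPORTED_IMAGE_TYPES:
            continue  # skip formats outside the announced list
        images.append((mime_type, base64.b64decode(block["data"])))
    return images


# Made-up tool result: one text block plus one placeholder image payload.
example_result = {
    "content": [
        {"type": "text", "text": "Chart generated."},
        {
            "type": "image",
            "mimeType": "image/png",
            "data": base64.b64encode(b"\x89PNG fake bytes").decode("ascii"),
        },
    ]
}

images = extract_images(example_result)
```

Before this release, an agent would effectively have seen only the text block; the image block is the part it can now interpret.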
✨ Concrete use cases
Here's how you could use it:
Screenshot analysis: Have an agent interpret a screenshot returned by a Drive or browser tool, without having to describe what's on screen yourself.
Chart and graph reading: Ask an agent to extract trends or key metrics from a chart generated by a data tool.
Visual tool output review: Any MCP tool that produces images can now have its output directly analyzed by the agent in context.
