Microsoft has announced a significant enhancement to its artificial intelligence (AI) assistant, Copilot, with the introduction of a new feature known as Copilot Vision. This latest development allows the tool to “see” what appears on a user’s screen — but only if the user actively opts in.
Revealed by Mustafa Suleyman, CEO of Microsoft AI, the feature is designed to improve how users interact with the digital assistant. According to Suleyman, Copilot Vision is intended to offer a more intuitive, visual-based aid that can support users in a variety of real-time scenarios. Once enabled, the AI assistant will be capable of interpreting on-screen content and offering tailored guidance in response.
Described by Microsoft as a “talk-based experience,” Copilot Vision allows users to simply speak aloud, asking for help while the assistant analyses the screen’s content to provide relevant support. Voice-driven assistants are not a new concept, but the addition of screen interpretation is a notable step forward in how AI can integrate into everyday computing tasks.
Suleyman gave practical examples of how Copilot Vision could be used: guiding users through cooking instructions displayed in a browser window, parsing complicated job descriptions, generating interview questions, or even drafting a customised cover letter, all in real time and based on what’s visible on the screen.
However, the feature comes with certain caveats and privacy safeguards. According to Microsoft’s official support page, Copilot Vision may highlight specific portions of the screen to help users locate useful information. Importantly, it does not take control of the device, click on links, or perform actions on the user’s behalf.
Privacy, of course, is a major consideration. Microsoft says that while Copilot Vision is active, only the AI’s responses are recorded; screen content, user inputs, images, and the websites being viewed are not stored. Users can also stop the feature at any time, either by closing the relevant browser window or by manually ending the session.
For now, the more sophisticated capabilities of Copilot Vision are available exclusively to those who hold a Copilot Pro subscription. These users will benefit from the assistant’s abilities not just in Microsoft Edge, but across a broader range of applications. This includes creative software such as Photoshop, video editing platforms, and even gaming environments. In fact, Copilot Vision can provide players with in-game advice — including tips within popular titles like Minecraft.
The rollout of Copilot Vision marks a key step in Microsoft’s ongoing strategy to embed AI more deeply within its ecosystem. It follows the company’s broader efforts to make artificial intelligence more accessible and useful in day-to-day digital life — both professionally and personally.
While some users may be cautious about enabling a tool that can “see” their screen, Microsoft’s opt-in approach and its commitment not to store screen content are designed to foster trust and encourage adoption. For those already accustomed to using digital assistants, the new capabilities could reduce the effort required to complete tasks across devices.
With tech giants increasingly leaning into visual AI and contextual understanding, Copilot Vision places Microsoft at the forefront of the next phase in intelligent user assistance — where your AI doesn’t just listen, but sees and understands your digital world too.