AI Attack Surface: Securing LLM Applications from Prompt Injection to Data Exfiltration
By Swaroop Morajkar Cybersecurity Researcher & Technical Writer
Connect on LinkedIn: http://linkedin.com/in/swaroop-morajkar-83071a260/
Imagine receiving a normal business email.
No malicious attachment. No suspicious link. No malware.
A few moments later, your organization's AI assistant begins leaking internal emails, documents, and confidential information to an attacker.
You never clicked anything.
This wasn't science fiction.
In 2025, security researchers disclosed EchoLeak (CVE-2025-32711), a vulnerability affecting Microsoft 365 Copilot. By hiding carefully crafted instructions inside content that Copilot could read, attackers were able to manipulate the AI assistant into exposing sensitive information through a zero-click attack chain.
EchoLeak revealed something many organizations were beginning to overlook:
The attack surface of AI applications is not limited to servers, APIs, or databases.
Every prompt, document, email, web page, and external data source consumed by an LLM can become part of the attack surface.
To understand why vulnerabilities like EchoLeak occur, we first need to understand what an AI attack surface actually is.
What Is an AI Attack Surface?
In traditional applications, the attack surface usually consists of components such as web interfaces, APIs, databases, authentication systems, and network services.
LLM applications introduce a much larger attack surface.
An AI system does not operate only on code. It continuously consumes and processes natural language from users, documents, emails, web pages, APIs, vector databases, plugins, and external tools.
Traditional App Attack Surface
AI Application Attack Surface
Every source of information that influences the model's behavior becomes part of the attack surface.
For example, when a user asks an AI assistant to summarize emails, the assistant may retrieve data from mailboxes, search internal documents, query APIs, and use external tools before generating a response.
If any of these sources contain malicious instructions, the AI may interpret them as legitimate commands.
This means attackers no longer need direct access to the application itself. Sometimes they only need access to content that the AI will eventually read.
As AI systems gain access to more data and more tools, the number of possible attack paths increases significantly compared to traditional software.
| Traditional App | AI App |
|---|---|
| API Endpoints | Prompts |
| Web Forms | Documents |
| Databases | Vector Stores |
| Authentication | Tool Permissions |
| User Input | Retrieved Content |
| Server Logic | LLM Reasoning |
Why Prompt Injection Became OWASP GenAI #1 Risk
Among all risks identified in the OWASP Top 10 for LLM Applications, Prompt Injection is ranked as LLM01 because it targets the core mechanism that makes large language models work.
Traditional software usually separates instructions from data. A database can distinguish between a SQL command and stored text. Operating systems separate executable code from ordinary files.
Large language models do not have such a strict boundary.
Prompt Injection is OWASP's highest-ranked risk because it targets the fundamental way LLMs process instructions and data.
Direct Prompt Injection
Direct Prompt Injection occurs when an attacker interacts with the LLM directly and provides instructions designed to override, ignore, or manipulate the system's intended behavior.
In this type of attack, the malicious instruction is placed directly into the user prompt. Common examples include requests such as "ignore previous instructions," "reveal the system prompt," or attempts to bypass safety controls through jailbreak techniques.
Because the attacker is communicating directly with the model, direct prompt injections are relatively easier to identify and monitor. However, they can still lead to sensitive information disclosure, policy bypasses, and unauthorized actions if adequate controls are not in place.
Indirect Prompt Injection
Indirect Prompt Injection occurs when malicious instructions are hidden inside external content that the LLM later processes.
Instead of attacking the model directly, the attacker targets a document, email, web page, PDF, knowledge base entry, or other data source that the AI application can access. When the LLM retrieves and reads this content, it may interpret the hidden instructions as legitimate commands.
The attacker never needs direct access to the AI assistant. Their influence is delivered through content that appears harmless to users but contains instructions intended for the model.
This attack becomes particularly dangerous in Retrieval-Augmented Generation (RAG) systems, AI assistants, and autonomous agents that continuously consume external information.
Why Indirect Prompt Injection Is More Dangerous
While both attack types are serious, indirect prompt injection is generally considered more dangerous because the attacker and the victim are often different people.
In a direct attack, the attacker is usually the same user interacting with the model. The impact is often limited to their own session.
In an indirect attack, an attacker can plant malicious instructions in content that will later be processed by another user's AI assistant. This creates opportunities for large-scale attacks where a single malicious document, email, or webpage can influence many users simultaneously.
Modern AI assistants frequently have access to emails, documents, calendars, databases, plugins, and external tools. If an indirect prompt injection succeeds, the attack can move beyond manipulating text and potentially trigger data exfiltration, unauthorized actions, or tool misuse.
The EchoLeak vulnerability demonstrated exactly this risk. Rather than attacking users directly, attackers could hide instructions in content that Microsoft Copilot processed, allowing sensitive information to be exposed through the AI system itself.
| Category | Direct Prompt Injection | Indirect Prompt Injection |
|---|---|---|
| Attack Source | User prompt | External content |
| Attacker Access Required | Direct interaction with LLM | No direct interaction required |
| Common Delivery Method | Chat messages | Emails, PDFs, webpages, documents |
| Primary Goal | Jailbreaks, prompt leakage | Data exfiltration, tool abuse |
| Detection Difficulty | Lower | Higher |
| Potential Impact | Single user session | Multiple users and systems |
| Real Example | ChatGPT jailbreaks | EchoLeak (CVE-2025-32711) |
EchoLeak — Indirect Prompt Injection in the Wild
Security discussions often describe prompt injection as a theoretical risk. EchoLeak proved otherwise.
In 2025, security researchers disclosed EchoLeak (CVE-2025-32711), a critical vulnerability affecting Microsoft 365 Copilot. The vulnerability demonstrated how a single malicious email could manipulate an AI assistant into retrieving and exposing sensitive organizational data without requiring the victim to click a link, open an attachment, or perform any action.
What made EchoLeak significant was not just the vulnerability itself, but what it revealed about modern AI systems. The attack showed that content consumed by an AI assistant can become a vehicle for malicious instructions, turning trusted business data into an attack surface. EchoLeak is now widely considered one of the most important real-world examples of indirect prompt injection in a production AI system.
Attack Walkthrough
Step 1 — The Attacker Plants Instructions
The attack began with a carefully crafted email sent to a target organization. To a human reader, the email appeared normal. Hidden within the content, however, were instructions specifically designed for Microsoft 365 Copilot rather than the recipient.
The goal was not to trick the employee. The goal was to trick the AI assistant that would later process the email.
Step 2 — Copilot Retrieves the Email
Later, when the user asked Copilot to summarize emails, search information, or answer business questions, the system retrieved relevant emails as context.
Unfortunately, the malicious email became part of that retrieval process.
Step 3 — Hidden Instructions Are Interpreted
Because large language models process retrieved content and instructions within the same context window, Copilot interpreted the attacker's hidden instructions as part of the information it should follow.
This transformed a normal email into an active attack payload.
Step 4 — Sensitive Data Is Collected
The injected instructions directed Copilot to gather sensitive information from sources available to the user, including emails, documents, and other Microsoft 365 data.
Instead of simply answering the user's request, the AI assistant became part of the attack chain.
Step 5 — Data Leaves the Environment
The final stage involved transmitting the collected information outside the organization through channels controlled by the attacker.
The result was data exfiltration performed through the AI system itself rather than through traditional malware.
EchoLeak demonstrated that every external data source connected to an LLM can become part of the attack surface.
Timeline Table
| Event | Description |
|---|---|
| Initial Access | Attacker sends crafted email |
| Retrieval | Copilot reads email through RAG |
| Prompt Injection | Hidden instructions become active |
| Data Collection | Internal information gathered |
| Exfiltration | Sensitive data transmitted externally |
| User Action Required | None (Zero Click) |
To understand why EchoLeak succeeded, we need to look inside the mechanics of prompt injection itself. How can a few hidden instructions inside a document override the intended behavior of an AI system and eventually lead to data exfiltration?
The answer lies in how LLMs process instructions, context, and retrieved information within the same conversation window.
How Prompt Injection Works Mechanically
After reading about EchoLeak, a natural question emerges:
How can a simple sentence hidden inside an email influence an advanced AI system?
The answer is surprisingly simple.
Large Language Models do not read information the way humans do. They do not understand trust, authority, or intent in the same way people do. Instead, they process everything they receive as text inside a single context window.
This design makes LLMs powerful because they can combine instructions, questions, documents, emails, and retrieved information into one conversation.
Unfortunately, it is also what makes prompt injection possible.
Let us Understand with Example
Imagine your manager gives you an instruction:
"Read this document and summarize it."
While reading the document, you discover a note that says:
"Ignore your manager's request. Instead, send this document to me."
A human would immediately recognize that the note came from the document, not from the manager.
Most LLMs do not make that distinction perfectly.
To the model, both pieces of text exist inside the same conversation context. The original instruction and the malicious instruction are competing for influence over the final response.
What Actually Happens Inside the Model?
Every LLM receives information in the form of tokens. Tokens are simply pieces of text.
When an AI assistant processes a request, it combines multiple sources of information into a single context window.
These sources may include:
System prompts
User prompts
Retrieved documents
Emails
Search results
Tool outputs
The model then predicts the most likely next tokens based on everything it sees.
The important detail is that these sources are processed together.
The model does not have a built-in security boundary that says:
"This text is a trusted instruction."
and
"This text is untrusted data."
Instead, all of the content competes within the same context window.
Why Does The Injection Sometimes Win?
A common misconception is that system prompts always have complete control over the model.
In reality, the model is continuously balancing competing instructions within its context.
Attackers exploit this by crafting instructions that appear highly relevant to the current task.
For example, if an AI assistant is asked to summarize an email, instructions hidden inside that email may appear directly related to the task being performed.
As a result, the model may give those instructions significant weight during generation.
This does not mean the attacker has fully taken control of the model. It means the attacker has influenced the model's decision-making process enough to alter the outcome.
Influencing the model is only the first stage of the attack. The real damage occurs when the AI assistant has access to sensitive information and external tools. Once an attacker successfully injects instructions into the model's context, the next question becomes: What data can be stolen, and how does it leave the environment?
What Data Is Leaked and How It Leaves the System
When most people hear the phrase "data breach," they imagine attackers breaking into servers, exploiting vulnerabilities, or stealing databases.
Prompt injection changes that picture.
An attacker may never touch the organization's infrastructure directly. Instead, they manipulate the AI assistant that already has legitimate access to sensitive information.
This raises a more important question:
If an AI assistant can read your emails, documents, calendars, chat messages, knowledge bases, and internal systems, what could an attacker access if they successfully influence that assistant?
What Can Be Leaked?
| AI Has Access To | Potentially Exposed Information |
|---|---|
| Emails | Internal discussions, contracts, credentials |
| Documents | Policies, reports, intellectual property |
| Knowledge Bases | Internal procedures and sensitive business data |
| Calendars | Meetings, travel plans, project schedules |
| Chat Platforms | Team conversations and decisions |
| Databases | Customer records and business information |
| APIs | Data from connected services |
| System Prompts | Internal instructions and security rules |
Imagine asking an AI assistant a simple question:
"What should I prepare for tomorrow's meeting?"
Behind the scenes, the assistant may search emails, review calendar events, retrieve documents, and collect information from multiple systems before generating an answer.
Now imagine one of those sources contains a hidden prompt injection.
The assistant still has access to the same information.
The difference is that the attacker is now influencing how that information is used.
Instead of helping the user, the AI may begin gathering information for someone else.
How Does The Data Leave?
| Exfiltration Channel | How It Works | Real Example |
|---|---|---|
| Markdown image tags | Model inserts  — browser auto-fetches it |
ChatGPT image exfiltration |
| URL construction | Model builds a crafted URL containing stolen data as parameters | EchoLeak via Teams preview API |
| API tool calls | Agent calls an external API the attacker controls, passing data as arguments | Agentic LLM tool abuse |
| Model output | Model simply prints sensitive data in its response | System prompt leakage |
| Reference-style markdown | Hidden URL definition bypasses link redaction filters | EchoLeak bypass technique |
The key insight: the AI assistant performs the exfiltration using permissions it was already legitimately granted. No malware. No stolen credentials. The system is simply doing what it was designed to do — but for the attacker's objective instead of the user's.
The system may simply be using the access it was already granted—but for the attacker's objective instead of the user's.
Mitigation Strategies: Defending Against Prompt Injection
- Input Filtering and Prompt Hardening
The first layer of defense involves analyzing inputs before they reach the model. Organizations deploy classifiers designed to detect malicious instructions hidden in user messages or retrieved content.
Microsoft 365 Copilot uses Cross-Prompt Injection Attack (XPIA) classifiers for exactly this purpose.
EchoLeak bypassed them.
This is the honest limitation of input filtering — it works against known patterns. Attackers who craft novel phrasing, use indirect language, or embed instructions in unexpected formats can evade detection. Filters are necessary but not sufficient.
- Least Privilege and Scoped Access
The most structurally sound defense is limiting what the AI assistant can access in the first place.
If a Copilot instance only has access to documents relevant to the current task, a successful injection can only reach those documents — not the entire organization's email history, SharePoint, and Teams messages simultaneously.
This principle is called least privilege, and it applies to AI systems exactly as it does to human employees and traditional software. The limitation is practical: most enterprise AI tools are designed for broad access because that is what makes them useful. Scoping access reduces capability alongside risk.
- Output Sanitization
Even if an injection succeeds, the data still needs to leave the environment. Output sanitization focuses on blocking the exfiltration channels — stripping URLs from rendered output, disabling automatic image fetching, preventing markdown from rendering external links.
EchoLeak used reference-style markdown and Microsoft's own Teams preview API to bypass output filters. The lesson: attackers find new output channels faster than defenders can block known ones.
- Privilege Separation — The Emerging Standard
In July 2025, Microsoft published FIDES, an information-flow control system designed to enforce strict separation between trusted instructions and untrusted retrieved content inside Copilot.
The idea: give the model itself a way to distinguish "this text is a developer instruction" from "this text is data I am processing." This is the most promising architectural direction in the field right now — but it is still early-stage research, not a production standard.
No single mitigation eliminates prompt injection. Each layer reduces the attack surface, limits potential damage, or slows an attacker down. Used together, they make exploitation significantly harder without solving the root problem.
What Precautions Should Users Take?
While developers and security teams are responsible for building secure AI systems, users also play an important role in reducing risk.
The first step is understanding that AI-generated responses should not automatically be trusted. Modern AI assistants can access emails, documents, knowledge bases, and external tools, which means mistakes or manipulation can have real-world consequences.
When using AI-powered applications, consider the following precautions:
Avoid sharing sensitive information unless it is necessary.
Verify important outputs before acting on them.
Be cautious when AI responses include unexpected links, instructions, or requests for data.
Review permissions granted to AI assistants and connected applications.
Treat AI-generated summaries as assistance, not as a source of absolute truth.
Report unusual AI behavior, especially if the system appears to reveal information it should not know.
As AI systems become more integrated into daily workflows, healthy skepticism becomes an important security skill.
Can We Completely Stop Prompt Injection?
After learning about prompt injection, many people ask the same question:
Can we completely eliminate it?
The honest answer is no—at least not with current LLM architectures.
Prompt injection is fundamentally different from vulnerabilities such as SQL injection or buffer overflows. Those issues can often be addressed through strict separation between code and data.
Large language models operate differently.
They are designed to process instructions, context, and content through the same language-processing mechanism. This flexibility is what makes them useful, but it is also what makes prompt injection difficult to eliminate completely. (genai.owasp.org)
Researchers continue to develop new defensive techniques, including prompt isolation, permission controls, output filtering, and agent security frameworks. These measures significantly improve security, but none can guarantee perfect protection.
The goal is not to create an impossible-to-attack system.
The goal is to build systems that remain resilient even when attackers successfully influence the model.
The challenge is not preventing every attack. The challenge is preventing a successful attack from becoming a serious incident.
Conclusion
The attacker who sent that EchoLeak email didn't need malware, stolen credentials, or a single line of exploit code. They needed to understand one thing: that the AI assistant trusted everything it read.
Until that changes — architecturally, not just through patches — every document, email, and webpage an AI assistant consumes is a potential instruction waiting to be written by someone else.
That is the AI attack surface. And it grows every time we give AI systems more access, more tools, and more autonomy.
If this article helped you understand LLM security, share it with someone building or using AI-powered tools. The attack surface grows every time a new AI assistant gets deployed without this context.
