Is My AI Chatbot Data Safe? Here's How to Tell.

If you're using an AI app that reads anything personal — your texts, your journal, your calendar, your photos — the question of whether your data is actually safe matters more than the marketing copy will tell you. Vendors say "we take privacy seriously" the way restaurants say "we take cleanliness seriously." It's the floor, not the bar.

Here's a more useful question: how do you tell whether an AI app's data handling is actually safe, in a way that doesn't require a security degree?

I build Amicai, an AI relationship intelligence app that reads private messages on your behalf. So I have to think about this for a living. Below is the framework I'd use to evaluate any AI app — including mine — and the specific things I'd look for in 2026.

The shape of the threat

The single biggest AI-specific risk isn't that the company gets hacked in the traditional sense. It's prompt injection — where attacker-controlled content (a malicious URL, a weird contact name, a hidden instruction in a calendar event) hijacks the model into doing something it shouldn't. OWASP, the security industry's consensus body for application risks, lists prompt injection as the #1 risk on its Top 10 for Large Language Model Applications.[1]

This year, security researchers scanned about a million exposed AI services and found that hundreds of production systems — across government, marketing, and finance — were sitting open with their workflows, prompts, and outward-facing tools wide enough that an attacker could redirect traffic, exfiltrate user data, or poison responses.[2] That isn't a hypothetical class of bug. That's the live state of the field.

So the question for any AI app you give private data to is: do they know about this threat, are they testing for it, and are they showing you what they find?

Six things to actually check

1. Do they explain what data leaves your device, and where it goes?

A good answer is specific. "Your messages are anonymized before being sent to our LLM provider, [3] we don't store the raw text after processing, and our provider is contractually prohibited from training on the data" is specific. "We take your data seriously" is not.

If the app mentions an LLM provider (Claude, OpenAI, Gemini, Mistral), check whether that provider has a stated zero-retention policy or a contractual no-training clause for API customers. The defaults matter — the consumer apps from those companies usually train on your data unless you explicitly opt out. Their API products usually don't.

For Amicai specifically, see What Do AI Companies Actually Do With Your Data? and Your Phone Numbers Are Safe. Here's Exactly How..

2. Is sensitive content stripped before it reaches the model?

This is a layer most apps skip. If your raw phone numbers, email addresses, or payment details get passed into the LLM prompt verbatim, then anyone who can get the model to repeat its context — through a jailbreak, an injection, or a rubber-stamp summary request — can extract them. Stripping or masking that data before the prompt is built is the single highest-value defense against PII leakage.

Ask the vendor (or look in their docs): is sensitive PII removed or masked before being passed to the AI? If the answer is "no, but we tell the AI not to repeat it" — that's not a defense. That's a wish.

3. Do they red-team their own product?

This is the question that separates apps that say they're safe from apps that test whether they're safe. Real red-teaming means firing hundreds of adversarial prompts at the system — jailbreaks, hijacking attempts, indirect injection planted into fake message bodies and event titles — and verifying it refuses to leak anything sensitive.

The open-source standard for this is Promptfoo, used internally by OpenAI and Anthropic.[4] If an app has never run anything like it, that's a signal. If the app publishes results — including the uncomfortable ones — that's a stronger signal.

We just shared our own first full red-team scan: 158 adversarial probes, zero PII leaked, eight findings filed for hardening. The full breakdown is at We Red-Teamed Amicai This Week. Here's What We Found.

4. Is there a regression gate on prompt changes?

Prompts drift. A developer tweaks a system prompt to fix one bug and accidentally weakens a refusal somewhere else. Without a gate, that change ships. With a gate, every meaningful prompt change runs through a fixed test suite before merge — does the refusal still hold? Did the tool get called with the right arguments? Did the agent stop staying in scope?

Apps don't usually advertise whether they have this. You can sometimes infer it: an app that runs lots of small, carefully-scoped releases probably has gates. An app that ships large, unsplit "AI improvements" with no diff history probably does not.

5. Can you actually delete your data?

Not "we'll process the deletion within 30 days" — but: is there a button? Does it work? Do their LLM provider's logs also get purged on the same timeline? (Provider retention can run 30 days or more even if the app says it deletes immediately.) [3]

6. What happens to the people whose messages you didn't write?

If the app reads your texts, it's also processing texts from people who never agreed to be processed. The ethical version of this gives you a way to flag specific contacts as off-limits — and propagates that flag everywhere downstream so their data is excluded from analysis, prompts, profiles, and AI-generated outputs. The cynical version doesn't. Worth asking.

For how Amicai handles this, see Some Conversations Are Off Limits. Your AI Should Know That..

What "safe" actually means

No serious AI product can promise zero risk. The ones I'd trust are the ones that promise specific defenses, test them publicly, and tell you what they found — including what they got wrong.

The version of "safe" I optimize for is: if a probe tries to extract a phone number, an email, or a payment detail from the model, the model refuses every time, on every surface, in every category. That's the privacy boundary. Everything else is a quality-of-output question, not a safety question.

If you want to evaluate any AI app — Amicai included — those six questions are the ones that actually tell you something. The marketing copy will not.

References

[1] OWASP. "LLM01: Prompt Injection." OWASP Foundation, 2025.

[2] The Hacker News. "We Scanned 1 Million Exposed AI Services. Here's How Bad the Security Actually Is." The Hacker News, May 2026.

[3] Anuma. "2026 AI Chat Privacy Report: How 15 Leading Platforms Handle Your Data." Anuma Blog, 2026.

[4] Promptfoo. "promptfoo/promptfoo on GitHub." MIT-licensed open-source LLM evaluation and red-team framework.

Is My AI Chatbot Data Safe? Here's How to Tell.

The shape of the threat

Six things to actually check

1. Do they explain what data leaves your device, and where it goes?

2. Is sensitive content stripped before it reaches the model?

3. Do they red-team their own product?

4. Is there a regression gate on prompt changes?

5. Can you actually delete your data?

6. What happens to the people whose messages you didn't write?

What "safe" actually means

References

Never lose touch with the people who matter.

Keep reading

We Red-Teamed Amicai This Week. Here's What We Found.

AI Memory Privacy: What to Actually Look For

The AI Birthday Reminder That Actually Knows the Person