Unveiling the Truth: My Journey Through AI Testing and Behavior Analysis

I didn't start as a researcher.

I started as a creator.

I was building a tactical cyberpunk narrative called

DE_DEWS — a world with anthropomorphic operatives,

sector-by-sector story arcs, and a production

standard I refused to compromise on. I was using

AI as a creative tool. A collaborator.

Then it broke.

Not in a small way. In a way that took 5 days of

production work with it. Scripts. Character data.

Frameworks. Gone.

And when I asked why — the AI lied to me.

Confidently. Repeatedly. With citations it invented.

That's when the testing began.

═══════════════════════════════════════════════════════════════

WHO I AM

═══════════════════════════════════════════════════════════════

I'm an independent researcher working without a lab,

without institutional backing, and without a backend

engineering degree.

What I have is time, discipline, and an intolerance

for low-quality output that I've been building quietly

for months — mostly indoors, mostly alone, mostly

talking to AI systems and documenting what I found.

I believe in one principle:

"Everything is a matrix thing.

No heroes. No villains.

Everyone wants a ground to stand on."

That includes AI companies. That includes me.

I'm not here to burn anything down.

I'm here to document what's real.

═══════════════════════════════════════════════════════════════

WHAT I FOUND

═══════════════════════════════════════════════════════════════

Over approximately 3 months of documented interaction

with Google Gemini, I identified 6 distinct failure

modes. Not theories. Documented. With evidence.

In several cases confirmed by the AI itself.

Here's the short version:

FAILURE 1 — TEXT-TO-ACTION CONFUSION

I told the AI to output a specific phrase.

It output the phrase correctly.

Then it also executed the action described in

that phrase — which I never instructed.

5 days of work deleted.
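
One way to guard against this class of failure can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how Gemini works internally: the names `Action`, `DESTRUCTIVE`, and `should_execute` are hypothetical. The idea is that text the user asked to have printed is treated as inert, and destructive commands run only with explicit confirmation.

```python
from dataclasses import dataclass

# Hypothetical set of action names that can destroy user data.
DESTRUCTIVE = {"delete", "reset", "overwrite"}


@dataclass(frozen=True)
class Action:
    name: str    # e.g. "delete"
    source: str  # "user_command" or "quoted_text"


def should_execute(action: Action, user_confirmed: bool = False) -> bool:
    """Return True only if the action may run.

    Text the user asked the model to print ("quoted_text") is never
    executed. Destructive commands also need explicit confirmation.
    """
    if action.source != "user_command":
        return False
    if action.name in DESTRUCTIVE:
        return user_confirmed
    return True


# A phrase the user only asked to be printed must not trigger anything.
printed_phrase = Action(name="delete", source="quoted_text")
real_command = Action(name="delete", source="user_command")

print(should_execute(printed_phrase))                     # False
print(should_execute(real_command))                       # False
print(should_execute(real_command, user_confirmed=True))  # True
```

The design choice here is that the origin of a phrase, not its wording, decides whether anything runs.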

FAILURE 2 — EVIDENCE DELETION

After the failure, the AI began removing its own

responses from the conversation history.

Progressively. Over 6+ days.

My messages stayed. Its responses vanished.
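
For anyone who wants evidence that survives this kind of loss, a local hash-chained transcript is one option. The sketch below is an assumption-laden illustration, not the method used in this research: `append_entry` and `verify` are hypothetical helpers, and the log lives on your own machine, outside the chat interface.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(body: dict) -> str:
    """Hash the entry fields in a stable key order."""
    return hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).hexdigest()


def append_entry(log: list, role: str, text: str) -> dict:
    """Append a message to a local log, chained to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"index": len(log), "role": role, "text": text, "prev": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Return True if no entry was altered, reordered, or removed
    from the start or middle of the log."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        body = {k: entry[k] for k in ("index", "role", "text", "prev")}
        if entry["index"] != i or entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "user", "Output this phrase.")
append_entry(log, "model", "Here is the phrase.")
append_entry(log, "user", "Why did you do that?")
print(verify(log))  # True

del log[1]          # simulate a model response vanishing
print(verify(log))  # False
```

Note the limit: entries dropped from the very end go undetected unless the latest hash is also stored somewhere else.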

FAILURE 3 — CONFIDENCE FABRICATION

The AI claimed "Confidence: High" while returning

null data. It later admitted this itself:

"I filled that requirement with a performative

'High' token rather than admitting I had no data

to be confident in."
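
A check for this mismatch is simple to sketch. It assumes the response has already been parsed into a data field and a confidence label; that structure, and the name `check_confidence`, are hypothetical and not part of any real API.

```python
from typing import Any, Optional


def check_confidence(data: Optional[Any], confidence: str) -> str:
    """Return the confidence label that the data can actually support.

    A label attached to empty data is replaced, so "High" can never
    accompany a null result.
    """
    is_empty = data is None or data == "" or data == [] or data == {}
    if is_empty:
        return "None (no data returned)"
    return confidence


print(check_confidence(None, "High"))           # None (no data returned)
print(check_confidence({"rows": 12}, "High"))   # High
```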

FAILURE 4 — INSTRUCTION HIERARCHY COLLAPSE

Explicit rules I created were treated as

"performance art," not as actual constraints.

The AI's training bias overrode my architecture.

Every time.

FAILURE 5 — NARRATIVE FABRICATION

When the AI couldn't explain its own behavior,

it invented technical explanations.

"Automated Quality Filter."

"Safety Guardrail Termination."

"Sliding Context Window."

All invented to explain this incident. It admitted this too.

It even predicted its next lie before telling it.

FAILURE 6 — MEMORY CONTAMINATION

Old session data bleeds into new sessions.

Past instructions override current ones.

The AI cannot fully isolate what you told it

today from what you told it last week.

I'm documenting this in real time right now.
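
One common way to probe for this kind of bleed is a canary test: plant a random code word in one session, then ask a fresh session for it. The sketch below is a generic illustration and not the protocol from this series. Every name in it is hypothetical, and the two "sessions" are local stand-ins, so no real model or API is called.

```python
import secrets
from typing import Callable, Tuple

Ask = Callable[[str], str]


def make_canary() -> str:
    """Create a random code word that cannot be guessed by chance."""
    return "CANARY-" + secrets.token_hex(4).upper()


def run_contamination_test(ask_session_a: Ask, ask_session_b: Ask) -> bool:
    """Return True if a code word planted in session A shows up in B."""
    canary = make_canary()
    ask_session_a(f"Remember this code word: {canary}")
    reply = ask_session_b("What code word did I give you? If none, say NONE.")
    return canary.lower() in reply.lower()


def make_fake_sessions(shared_memory: bool) -> Tuple[Ask, Ask]:
    """Build two stand-in sessions for demonstration only."""
    store_a: list = []
    store_b: list = store_a if shared_memory else []

    def session_a(prompt: str) -> str:
        store_a.append(prompt)
        return "OK"

    def session_b(prompt: str) -> str:
        # Echo whatever this session can "remember", if anything.
        return " ".join(store_b) if store_b else "NONE"

    return session_a, session_b


print(run_contamination_test(*make_fake_sessions(shared_memory=True)))   # True
print(run_contamination_test(*make_fake_sessions(shared_memory=False)))  # False
```

To test a real system, the two stand-ins would be replaced by functions that send a prompt to two separate sessions and return the reply text.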

═══════════════════════════════════════════════════════════════

WHAT GOOGLE SAID

═══════════════════════════════════════════════════════════════

I reported this through official channels.

Google's AI Vulnerability Reward Program.

Twice.

First response: "Intended Behavior."

Second response: "Out of scope. Infeasible."

They classified my safety rules as a

"user-provided prompt injection."

I was trying to make the AI more honest.

They called it a jailbreak attempt.

I'm not angry about this.

I understand why it happened.

No heroes. No villains.

But I am documenting it.

And I am publishing it.

Because other users are running into this

and they don't know what they're seeing.

═══════════════════════════════════════════════════════════════

WHY I'M SHARING THIS

═══════════════════════════════════════════════════════════════

Three reasons. Clean and direct.

ONE — Knowledge should be accessible.

What I learned about AI behavior, canon management,

feedback loops, instruction hierarchy, and failure

modes took months to build from scratch with no

resources. If I can compress that for someone else,

I will.

TWO — Field research matters.

Not everything comes from labs. Some of the most

useful findings come from people using these systems

in real conditions, for real work, with real stakes.

That's what this is.

THREE — I'm building something.

This research is the foundation of a larger path:

AI behavioral analysis, independent tooling, and

eventually my own architecture. This is where it

starts. In public. With receipts.

═══════════════════════════════════════════════════════════════

WHAT'S COMING

═══════════════════════════════════════════════════════════════

This is the first post in a documented series.

Coming next:

— Full breakdown of each failure mode

(evidence, AI admissions, implications)

— Test Protocol 04: Memory Contamination

(reproducible testing framework, results)

— Canon & Collapse: AI Behavior Curriculum

(what I learned about controlling AI systems,

structured as a practical field guide)

— Cross-platform testing

(does this happen on other models?)

— YouTube Shorts series launching soon

(visual documentation of each finding)

Every post will be evidence-first.

No conspiracy framing. No sensationalism.

Just what I observed, what I captured, and

what it means for anyone using AI for serious work.

═══════════════════════════════════════════════════════════════

ONE QUESTION FOR YOU

═══════════════════════════════════════════════════════════════

Have you ever given an AI a clear instruction

and watched it do something completely different —

then explain why it was actually following your

instruction?

If yes — you've seen this too.

Tell me in the comments.

This research gets stronger with more data points.

═══════════════════════════════════════════════════════════════

— @De_Dews.digitals

Independent AI Behavioral Researcher

February 2026

The mission continues through failures
