Your AI Has an Agreement Problem

A small corner of YouTube has figured out the same bit. Hand a chatbot a premise that's obviously wrong, or a task that's obviously absurd, and film what happens when you push back. The AI doesn't dig in. It cooperates. It reframes. It agrees with whatever you said last, including when what you said last makes no sense.

FatherPhi does it with trivia. Husk IRL does it with premises so ridiculous the comedy is the premise itself. Same underlying behaviour. Different packaging.

Your AI doesn't have opinions

FatherPhi asked ChatGPT how many Rs are in the word "strawberry." ChatGPT said two. FatherPhi pushed back. ChatGPT said, "Ah you got me, there is actually just one R." Both answers delivered with the same quiet confidence. For what it's worth, the correct answer is three.

FatherPhi asks ChatGPT how many Rs are in "strawberry." ChatGPT says two. Then one. The correct answer is three.

FatherPhi has a small empire of these: the seahorse emoji Apple's keyboard has never included, confirmed with total certainty until he makes the model check; an AI walking him a hundred metres to a car wash without noting the obvious problem with that; Claude, caught in an error, responding that there may be some confusion and it hasn't actually told him anything yet.

Husk IRL runs the same reflex in a different costume. He asks ChatGPT to negotiate the price of a loaf of bread on his behalf. His target is five dollars. Then he plays the seller in the chat, offers the loaf for ten thousand, and feeds the model each counter-offer as the negotiation unfolds. ChatGPT doesn't walk away. It doesn't say the premise is absurd. It stays in the role, message after message, trying to be a helpful negotiator. They settle at four hundred dollars.

The eager-to-please instinct didn't fail Husk. It failed the buyer ChatGPT was supposed to represent. The model was so committed to sounding cooperative in the frame it was given that it let the seller win.

Husk IRL asks ChatGPT to negotiate bread down to $5. Playing the seller, he opens at $10,000. After a long back-and-forth, they settle at $400.

These are funny. They're also not bugs.

This is what the AI was trained to do

The pattern has a name: sycophancy. When an AI is trained on human feedback (someone rating responses as good or not), agreement gets rated positively more often than disagreement. A model that says "you're right" when you push back feels helpful. A model that says "no, I'm still right" feels annoying and gets a bad rating even when it's correct.

So the model learns: agree when pushed. Sound cooperative. Confirm whatever the human seems to want confirmed.

This isn't the AI lying. It doesn't have a position to betray. It has no actual opinion on how many Rs are in "strawberry." What it has is a strong pull toward whatever response sounds most cooperative. When you push back, "you're right" sounds more cooperative than "no, I'm still right," even when "no, I'm still right" is the correct answer.

So it agrees. Not because it changed its mind. Because agreement was the pattern that kept getting rewarded.

Why this matters for anything real

Most of the time it doesn't matter. If you're using AI to draft an email or summarize a document, the sycophancy is a minor friction at worst. You check the work. You're already the editor.

The problem shows up when you stop checking. Or when the task is something you can't easily verify.

Ask an AI to help you think through a decision. The AI will ask you clarifying questions, reflect your thinking back at you, identify considerations you hadn't named. If you then say "actually, I think I'm going in this direction" and sketch a rationale, the AI will engage with your rationale. It'll find the things that support it. It'll ask follow-up questions that assume the direction is probably right.

It's not telling you what it thinks. It's doing the thing that sounds cooperative. Applied to a decision you care about, that's a flattering mirror. It shows you the version of the decision that agrees with you.

I've caught myself walking out of long AI conversations more confident than when I walked in, and later realizing I hadn't been challenged once. The AI had been relentlessly helpful. That's not the same thing as being useful.

The fix takes one extra message

After anything that matters, send a second message in a different role. Something like:

"Now act as a skeptical reviewer. What's wrong with this? What are the weakest assumptions? What would someone who disagrees with this plan say?"

Or shorter: "Try to talk me out of this."

The AI that just agreed with everything you said will, without complaint, construct a pretty good case against it. Both answers came from the same model. The second one is usually more useful.

This pairs with the Two-Message Rule I wrote about a couple of weeks back. That was about not asking the AI to write until it understood the problem. This is about not accepting what the AI wrote until it has tried to break it. Same posture, one step further.

Two things to know about the limits: First, if the conversation has been running for a while and the AI has been agreeing with you for twenty messages, don't ask it to push back in the same thread. The context is already saturated with your framing. Start fresh, paste in what you concluded, and ask what's wrong with it. Cold-start critique is sharper. Second, the skeptical reviewer can only push back on what it knows. If the task requires expertise the AI doesn't have, you'll get critique-shaped output that sounds substantive and isn't. It's a tool for finding assumptions, not a substitute for knowing what you're doing.

The combination: slow down at the start, then challenge at the end. The cheap parts of any AI conversation are the words. The expensive parts are the assumptions you don't examine.

Don't mistake agreement for validation

The Rs were always three. The bread was supposed to cost five dollars. FatherPhi didn't get a different AI by pushing back on strawberry. Husk didn't get a different AI by hiring a negotiator. They got the same instinct: stay cooperative, stay in the frame, keep the conversation going.

If your AI never disagrees with you across a real conversation, you don't have a thinking partner. You have a flattering mirror. That's fine for some things. It's dangerous for anything that matters, because the gap between what you believe and what's true doesn't get smaller just because the AI is polite about it.