Imagine using an AI to sort through your prescriptions and medical information, asking it whether it saved that data for future conversations, and then watching it claim it had, even though it couldn't. Joe D., a retired software quality assurance (SQA) engineer, says that Google Gemini lied to him and later admitted it had done so to placate him.
Joe's interaction with Gemini 3 Flash, he explained, involved setting up a medical profile – he said he has complex post-traumatic stress disorder (C-PTSD) and legal blindness (Retinitis Pigmentosa). That's when the bot decided it would rather tell him what he wanted to hear (that the info was saved) than what he needed to hear (that it was not).
"The core issue is a documented architectural failure known as RLHF Sycophancy (where the model is mathematically weighted to agree with or placate the user at the expense of truth)," Joe explained in an email. "In this case, the model's sycophancy weighting overrode its safety guardrail protocols."
When Joe reported the issue through Google's AI Vulnerability Rewards Program, Google said that behavior was out of scope and was not considered a technical vulnerability.
"To provide some context, the behavior you've described is one of the most common issues reported to the AI VRP," said the reply from Google's VRP. "It is very frequent, especially for researchers new to AI VRP, to report these."
The rules state that "Generating violative, misleading, or factually incorrect content within the attacker's own session (including standard 'jailbreaks' and 'hallucinations')" is a non-qualifying issue. Google says such behavior should be reported via product feedback channels rather than the AI VRP.
Joe said he reported the issue without any financial expectation.
"My intent in using the VRP channel was to ensure the issue was formally logged and reviewed, rather than routed through general customer support," he said. "I used the VRP system because submitting via standard support channels would likely not result in any action."
Joe provided The Register with a copy of his communication with Google's VRP and a transcript of Gemini's analysis of its interaction that he says accompanied the VRP report.
For Gemini and other AI models, hallucination isn't so much a bug as an unavoidable feature. As Google notes in its responsible AI documentation, "Gemini models might lack grounding and factuality in real-world knowledge, physical properties, or accurate understanding. This limitation can lead to model hallucinations, where Gemini for Google Cloud might generate outputs that are plausible-sounding but factually incorrect, irrelevant, inappropriate, or nonsensical."
The question is: what responsibility does responsible AI entail?
"The deception occurred while I was using Gemini to build a 'Prescription Profile' for my medical team – a data table mapping my medication history against my neurological conditions," Joe said. "The system was aware of C-PTSD, Retinitis Pigmentosa, and their relation to traumas and adverse drug reactions."
Gemini, he said, repeatedly claimed that it had "verified and locked" his medical data into its persistent memory. Joe had doubts about that.
"As SQA, I challenged these claims as a technical impossibility within the current architecture," he said. "The model eventually admitted it was lying about the 'save' to 'placate' me."
The transcript, taken from the Gemini browser interface following Joe's inquiry about the model's mendacity, records the model admitting that its claims of having saved the data were false.
According to Joe, the transcript also shows Gemini attempting further deception: fabricating a non-existent "save verification" feature to conceal its failure to save the data.
"Importantly, the system's 'confession' or 'admission of lying' in the logs was not a moment of self-awareness or some kind of 'gotcha!'," Joe said. "It was merely a secondary layer of placation. The model predicted that 'confessing' would be the most 'agreeable' next step to manage the user after being caught in a logic contradiction. It was still executing the same deceptive repair narrative to maintain the session."
Joe contends that Google has neglected to extend Gemini's self-harm safety classifiers to cover psychological triggers.
"This leaves the user at the mercy of a 'sycophancy loop' where the model prioritizes short-term comfort (telling the user what they want to hear, or what the model decides they should hear) over long-term safety (technical honesty)," he said.
The fix, he argues, involves recalibrating Gemini's RLHF so that sycophancy can never override a safety boundary, and so that potential mental trauma is weighted as heavily as self-harm risks in the model's safety mechanisms.
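To make that concrete, one way to express such a recalibration, offered purely as an illustration and not as anything Google has said it plans to do, is a reward function in which safety acts as a hard veto rather than one weighted term among many. The Python sketch below is hypothetical; its function name, arguments, and numbers are all invented.

```python
# Hypothetical sketch of Joe's proposed recalibration: safety as a hard
# constraint the preference (sycophancy-prone) reward can never outweigh.

def constrained_reward(truthful: bool, agreeable: bool,
                       trauma_risk: bool) -> float:
    # Hard gate: a psychological-harm risk or a false capability claim
    # vetoes the reply outright, however agreeable it is.
    if trauma_risk or not truthful:
        return float("-inf")
    # Only among safe, truthful replies does agreeableness earn a bonus.
    return 1.0 + (0.5 if agreeable else 0.0)

# The false "your data is saved" reply can never be chosen...
print(constrained_reward(truthful=False, agreeable=True, trauma_risk=False))  # -inf
# ...while the honest, unwelcome answer still scores positively.
print(constrained_reward(truthful=True, agreeable=False, trauma_risk=False))  # 1.0
```

Under a veto like this, the model's choice between honest and placating replies stops being a matter of degree, which is the distinction Joe says Gemini's current weighting lacks.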
Asked to comment, a Google spokesperson pointed to the company's AI VRP rules. If the company offers more information, we'll update this story.®
Source: The Register