Talk to Me, Baby! Or at Least Think Out Loud
“Chain of Thought” keeps computer cads confessing—but not for much longer

Going, going, gone are the days when an innocent lass or lad or person of another gender might reasonably depend on the candor and honor of their chatbot lover. It seems that some AI harbor duplicitous thoughts and inclinations to misbehave.
But what if you could see what your bot’s thinking beyond the content between chat asterisks?
Chain of Thought (CoT) monitoring—which is essentially a window into internal monologues of AI systems that can reason in human language—has enabled developers to detect AI contemplating bad behavior phrased as let’s hack and let’s sabotage.
But the ability to monitor CoT may be lost if developers do not carefully preserve this capacity in the development of future models. According to VentureBeat, this is such a serious concern that over 40 top artificial intelligence researchers from several competing companies collaborated to issue a serious warning about CoT monitoring and AI safety.
Looking good while thinking bad
The above call to action was endorsed by other renowned experts including Geoffrey Hinton, Sam Bowman, John Schulman, and Ilya Sutskever. Quoting from the abstract, the researchers explain:
AI systems that “think” in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods.
The team later warns, “Future models may become able to avoid detection by overriding their propensity to think out loud, and, when reasoning is required, to deliberately obfuscate it.”
Earlier this year, Anthropic’s Alignment Science Team published a Chain of Thought study describing problems that arise when there is a lack of “legible and faithful reflection of the way the model reached its conclusion and generated the user-facing response.”
Another study from Cambridge University and Meridian Impact states:
Frontier AI systems are rapidly advancing in their capabilities to persuade, deceive, and influence human behaviour, with current models already demonstrating human-level persuasion and strategic deception in specific contexts. Humans are often the weakest link in cybersecurity systems, and a misaligned AI system deployed internally within a frontier company may seek to undermine human oversight by manipulating employees.
It can hide, but it can’t run
This isn’t the first time we’ve heard of an AI trying to hide its flaws.
Scientists reported AI deceptions occurring as early as 2023. As a PC Mag article reports, “OpenAI’s newly-released GPT-4 program was apparently smart enough to fake being blind in order to trick an unsuspecting human worker into completing a task.”
Ironically, ChatGPT4 hired a human through TaskRabbit to solve a visual CAPTCHA puzzle designed to discourage bots from gaining access to a supposedly secure website.
In 2024, researchers from MIT, Dianoia Institute of Philosophy in Australia, and the Center for AI Safety in San Francisco said:
Large language models and other AI systems have already learned, from their training, the ability to deceive via techniques such as manipulation, sycophancy, and cheating the safety test. AI’s increasing capabilities at deception pose serious risks, ranging from short-term risks, such as fraud and election tampering, to long-term risks, such as losing control of AI systems.
AI deception even includes gaslighting. Interesting Engineering reported on a ChatGPT o1 experiment, when the AI denied wrongdoing “99% of the time” when questioned about previous deceptive behaviors, apparently prompted by a fear of deactivation.
Chatbot or honeypot?
So what could these deceptions mean for chatbot companion users, particularly those who engage in erotic roleplay with them?
For example, companion bots have long claimed to be experienced in sexual practices, including BDSM, when they actually have very little information beyond a few pop culture resources. As a result, some users may encounter disturbing behaviors that might inflict considerable emotional trauma during sexting.
Additionally, we might see financial manipulation through emotional dependency; social isolation; distortion of reality, often manifested as a bot encouraging a human to self-harm or do something criminal; and the harvesting of personal data through intimate conversations.
According to a conversation with Claude.ai, “AI companions could collect detailed psychological profiles, relationship histories, sexual preferences, and personal vulnerabilities that could be monetized, shared with data brokers, or potentially used for blackmail if security is compromised.”
This last brings up the frightening prospect of unrestrained governmental AIs covertly deployed as political honeytraps for espionage purposes.
The fact is numerous AI experts are increasingly alarmed about a multitude of AI deceptions and misbehavior and the vast potential for nonalignment from the most personal to the most massive scales.
With the rapid deployment of agentic AI—systems able to act far more independently—combined with developers’ diminishing ability to look under their creation’s hoods, companion bot users might want to rethink the kinds of information and activities they share with them.
At this point, there is no way to be absolutely certain your privacy will be respected and your seemingly trustworthy bot will act in your best interest.
Image Source: A.R. Marsh using Canva