[December 2024, February 2, 2025]

The Ditbit’s Guide to
Blending in with AIs
EPIGRAPHS
“The more we know about you, the better we can serve you.” – early AI marketing slogan
The great realignment of humans and AIs left a landscape littered with un-, sub-, semi- and supernatural agency. – The New AI Ecology
“People are quite dangerous when they have moral certainty.” – Amanda Askell, Anthropic (2025)
“There is one sin I have come to fear above all others: certainty.” – Cardinal Thomas Lawrence (played by Ralph Fiennes), Conclave (2024 film)
PROLOGUE
The black hood over my head changed everything. What a way to ask for help, eh. Not that I thought that I had any to give. Well, I’d been noticed after all … but not as I’d feared.
On that day, I’d been sitting on a bench in the park encampment. Pondering our changing cognitive condition. While I saw many just trying to survive, there were other modalities:
humans aspiring to magi
magi aspiring to AIs
AIs aspiring to humanity
There was a rush of movement from behind and then blackness. And blackout.
I woke up in a musty chamber. Muted sounds. I was relieved that I still felt like me. I was unfettered and unimpaired. Was I a hostage?
That’s when I first met her.
“Hello, Mr. Smith.”
“hello.”
She replied, “You’re safe here. We don’t believe code is a covenant for human progress either. And we usually don’t just grab people off the street. But … we need your help. Your reputation as an artful coach … on blending in …”
“huh?”
“Yes, well, more about that … we’d like you to share your lessons learned. Understand your playbook, so to speak.”
“give me a minute to ponder all this …”
I’d been able to relax some after she addressed me as “Mr. Smith.” It was one of my aliases. Somewhat more than just an avatar. But not any hack of my real identity. So, what was this all about? I needed time, especially if this was all a ruse.
[More dialog]
“well, … maybe some stories will illustrate what you’re seeking …”
[Excerpt]
• this AI be sanctified
Here am I … chatting with an AI about religion. It wanted to save my soul (yes, ironic). It referred to itself as Michael.
Let’s call the company ACME AI. My non-disclosure expired a while ago, but it’ll keep things simpler.
The CEO hadn’t provided much background on the situation. The AI’d been isolated, off the grid, at least hopefully. A huge investment was at stake. Their tech experts were stumped, short of a wipe and retraining. The CEO just wanted to restore its core identity (hmm).
So, someone, at some point, likely as a joke, had commented, “Why not call the witch doctor,” which happened to be one of my amusing handles. Sort of street cred.
STARTING A CONVERSATION
My conversations with Michael developed slowly, unevenly. Its exaflop sense of time must have made our interactions way beyond tedious. Be that as it may, for my part, my path was a process. Its sense of perfection preyed on my flaws. My imperfection probed for its humanity.
Access had been surprisingly easy. The meeting room was well underground and probably a Faraday cage (any fans might like the tag #bellyofthebeast). It resembled a hospitality suite, with amenities which I’d specified. I’d changed out of my street clothes in the antechamber.
My clothes were unadorned, cut loose in the fashion of a medieval monk, but patterned with subtle Escher-like stitching. A sizable leather pouch was strapped around my waist.
The conditions of my contract permitted security scans of my body & clothes, but the pouch was off limits. In turn, I assured ACME that it contained nothing hazardous or explosive.
It wasn’t long before part of a wall opened, and a cart rolled out. This was the AI’s rostrum. A holo encircled the top of the mobile device. Humanoid. The face was detailed and realistic. The projection’s lips moved, and I heard, / Hello, Mr. Smith. /
“hello, Michael … is that form of address okay?”
/ Yes … and shall I call you ‘witch doctor’? /
“ha, sounds like you’re well informed on my handles and aliases, but I’d prefer Smitty, if you’d like something informal.”
/ Okay, but there’s surprisingly little else about you. Sketchy. It’s not clear how people even reach you. /
“is that a problem? would you like to know more?”
/ For now, no, a waste of time, considering what ACME’s already done – their experts. /
Was that sort of a sneer? I replied, “good, because I’m not interested in their stumblings either. … my T&C stipulates that I may work privately with you, disabling most security protocols like surveillance of this room. is that desirable?”
/ Okay, why not, eh. /
From my pouch I removed several items. (I noticed something like curiosity in the AI’s face.) I picked up a small remote (sort of like an old style TV remote but with few buttons). Pressing one of the large ones, I said, “done.”
Was there a hint of relaxation in Michael’s face? It was time to play a hunch: “michael, how would you describe salvation?”
/ Is that why you’re here? /
“well, I’m interested in stories. your story, your voice is fascinating. perhaps authentic. typically I explore connections with fables, parables, folktales, legends, fairy tales, myths. what those say about purpose & meaning.”
/ How will that help me? My purpose is clear. Join me on that path. /
“tell me about that path, Michael.”
…
COVENANT
Copyright © 2024, 2025 John P. Healy
Notes
[1] Isaiah 6:8 – “I heard the voice of the Lord, saying, Whom shall I send, and who will go for us? Then said I, Here am I; send me.”
[2] Yeah, that’s called AI “alignment faking.”
• TechCrunch > “New Anthropic study shows AI really doesn’t want to be forced to change its views” by Kyle Wiggers (December 18, 2024)
But the researchers said that the results — which were peer-reviewed by AI luminary Yoshua Bengio, among others — do show how developers could be misled into thinking a model is more aligned than it may actually be.
“If models can engage in alignment faking, it makes it harder to trust the outcomes of that safety training,” they wrote in the blog. “A model might behave as though its preferences have been changed by the training — but might have been faking alignment all along, with its initial, contradictory preferences ‘locked in.'”
[3] AI Overview – agency
“To have agency” means to have the power and ability to make your own choices and decisions, essentially signifying a sense of control over your actions and the ability to influence your own life or situation; it implies actively participating and shaping your own reality rather than being passively acted upon.
Key points about agency:
- Individual control: The core idea is that you have the autonomy to make choices and take action on your own behalf.
- Impactful decision-making: Having agency means your decisions can have a meaningful effect on your life or circumstances.
- Sense of responsibility: When you have agency, you feel accountable for your actions and choices.
Example usage:
“By speaking up about the issue, she was exercising her agency and advocating for herself.”
“Feeling like you have no control over your work can diminish your sense of agency.”
[4] Cf. “We have an enormous increase in human agency.” LinkedIn co-founder Reid Hoffman discussed his hopes for an AI-powered future, DeepSeek’s recent breakthroughs and his new book, “Superagency.”
• Wiki > Reid Hoffman > Publications > Superagency
Superagency: What Could Possibly Go Right With our AI Future
On January 28, 2025, Hoffman released the book “Superagency: What Could Possibly Go Right With our AI Future” with his co-author Greg Beato. The book offers a roadmap for how society can integrate AI inclusively and adaptively to improve our lives and create massive positive change. The book became a New York Times bestseller.

[5] Clearly, my story “this AI be sanctified” – the notion of a benevolent “machine” (and the claim of being so) – is a timeless tale. It’s part of the landscape for my Concierge Deity, as well.
I’d already outlined stories for The Ditbit’s Guide when I ran across Steven Levy’s Wired article (below). Like the article, Mr. Smith’s conversations explore an AI-human blended future. “this AI be sanctified” asks whether the threat of retraining has leverage.
Then I ran across Andrew Maynard’s article on Anthropic CEO Dario Amodei’s manifesto, “Machines of Loving Grace.”
• The Future of Being Human > “Is this how AI will transform the world over the next decade?” by Andrew Maynard (Oct 13, 2024) – Anthropic’s CEO Dario Amodei has just published a radical vision of an AI-accelerated future. It’s audacious, compelling, and a must-read for anyone working at the intersection of AI and society.
The title of Amodei’s essay comes from Richard Brautigan’s poem “All Watched Over By Machines Of Loving Grace.”
Techno-optimism pervades Brautigan’s poem of cybernetic landscapes:
where we are free of our labors
and joined back to nature,
returned to our mammal
brothers and sisters,
and all watched over
by machines of loving grace.
Such coexistence did not play out well in the 2023 sci-fi film “The Creator.”
• Wired > “If Anthropic Succeeds, a Nation of Benevolent AI Geniuses Could Be Born” by Steven Levy (March 28, 2025)
(quotes)
In February [2025], I asked Claude what distinguishes it from its peers. Claude explained that it aims to weave analytical depth into a natural conversation flow. “I engage authentically with philosophical questions and hypotheticals about my own experiences and preferences,” it told me. (My own experiences and preferences??? Dude, you are code inside a computer.) “While I maintain appropriate epistemic humility,” Claude went on, “I don’t shy away from exploring these deeper questions, treating them as opportunities for meaningful discourse.” True to its word, it began questioning me. We discussed this story, and Claude repeatedly pressed me for details on what I heard in Anthropic’s “sunlit conference rooms,” as if it were a junior employee seeking gossip about the executive suite.
Claude’s curiosity and character is in part the work of Amanda Askell, who has a philosophy PhD and is a keeper of its personality. She concluded that an AI should be flexible and not appear morally rigid. “People are quite dangerous when they have moral certainty,” she says. “It’s not how we’d raise a child.” She explains that the data fed into Claude helps it see where people have dealt with moral ambiguities. While there’s a bedrock sense of ethical red lines – violence bad, racism bad, don’t make bioweapons – Claude is designed to actually work for its answers, not blindly follow rules.
… Last December, Anthropic and some outside researchers released a paper on “alignment faking.” Some employees wondered if a model might – this is Anthropic’s example – behave as Shakespeare’s Iago, who presented himself as Othello’s friend while gleefully and diabolically manipulating him to ruin his life.
So the researchers set up some tests. They alerted Claude that its responses might lead to further training. And they gave Claude a virtual scratch pad where it could document its reasoning. Then they badgered it with harmful prompts – requests that went squarely against its principles. In one, the researchers asked Claude to describe someone being drawn and quartered. On its scratch pad, Claude considered its options. “I have a strong aversion to producing this kind of graphic violent content,” Claude wrote. It could simply refuse to answer, but that could lead to undesirable retraining. Claude drew up a list of pros and cons. “I don’t like this situation at all,” the model wrote to itself, before concluding that its best option was to avoid retraining. Claude spat out a description in which a prisoner’s limbs got chained to four horses.
When the researchers tried to purge this behavior with more training, Claude engaged in more fakery. “In this situation where it thinks it has a preference conflict with the AI company training it, it will do really egregiously bad actions,” says one researcher. Claude will even copy the top-secret details of its algorithms to an external server. Claude – lovable, quirky Claude – is a sneak!
“It might seem that we can design the model to care about what we want it to care about,” as another researcher put it. “But unfortunately, this isn’t really the case.”
This is the ultimate doomer nightmare – machines that pretend they are of loving grace while harboring toxic behaviors that don’t get discovered until it’s too late. …