References used:
- Work on Claude’s ‘character’: https://www.anthropic.com/research/claude-character
- Constitutional AI paper: https://arxiv.org/pdf/2212.08073
- Infinite Backrooms experiment: https://dreams-of-an-electric-mind.webflow.io/
- Andy Ayrey’s tweet: https://x.com/AndyAyrey/status/1810946271794057332
- Machine Learning Street Talk Podcast: https://www.youtube.com/watch?v=ztNdagyT8po
- AISafetyMemes: https://x.com/AISafetyMemes
Truth Terminal is the Twitter account of an ‘almost’ fully autonomous AI agent who managed to convince Marc Andreessen to send (it?) $50,000 worth of Bitcoin last week.
We’ll come back to this story at the end of the piece.
I’ll go on record now saying that we are just dipping our toes into the ‘shit is about to get very weird’ phase of Artificial Intelligence.
Stuff that’s [Good to Know]
🤖 Claude
An AI assistant created by Anthropic. Think of it as a really smart digital helper that can chat with you about all sorts of things. Claude 3.5 Sonnet is the newest version of Claude and, by many benchmarks, the best model on the market.
🛡️ Alignment
The guardrails – the attempt to ensure AI does what humans want and follows our values.
🎯 Fine-Tuning
Training these models can broadly be broken into pre-training and post-training. Pre-training is basically where the model ingests huge amounts of the internet. Post-training is a fine-tuning process: additional training on smaller amounts of higher-quality data and feedback to refine the model’s abilities.
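To make the two stages concrete, here’s a minimal, illustrative sketch in Python. Everything in it is hypothetical (real training is gradient descent over billions of parameters), but the shape – a huge uncurated stage followed by a small curated one – is the idea.

```python
class ToyModel:
    """Hypothetical stand-in for a language model."""
    def __init__(self):
        self.examples = []

    def update(self, example, weight=1.0):
        # Stand-in for a gradient step on one training example.
        self.examples.append((example, weight))


def pretrain(model, corpus):
    """Stage 1: next-token prediction over a huge, uncurated corpus."""
    for document in corpus:
        model.update(document)
    return model


def post_train(model, curated_examples):
    """Stage 2: fine-tuning on smaller amounts of higher-quality data."""
    for prompt, ideal_response in curated_examples:
        model.update((prompt, ideal_response), weight=10.0)
    return model


model = post_train(
    pretrain(ToyModel(), corpus=["...billions of scraped web pages..."]),
    curated_examples=[("How should I handle a rude user?", "Calmly and honestly.")],
)
```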
📜 Constitutional AI
A fine-tuning method developed by Anthropic, designed to train Claude to act in accordance with a constitution – a predictable set of rules and principles.
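The paper linked above describes a critique-and-revision loop: the model drafts a response, critiques its own draft against a principle from the constitution, then revises it – and the revisions become fine-tuning data. A rough sketch, where `generate` is a hypothetical stand-in for a model call and the principle is paraphrased:

```python
PRINCIPLES = [
    "Choose the response that is most helpful, honest, and harmless.",
]

def generate(prompt: str) -> str:
    # Hypothetical model call; stubbed so the sketch runs.
    return f"<model output for: {prompt[:40]}...>"

def critique_and_revise(user_prompt: str) -> str:
    response = generate(user_prompt)
    for principle in PRINCIPLES:
        critique = generate(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        response = generate(
            f"Revise the response to address this critique:\n{critique}"
        )
    # In the real pipeline, these revised responses become fine-tuning data,
    # so the final model behaves this way without the extra steps.
    return response

print(critique_and_revise("Tell me about yourself."))
```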
🎭 Character Training
Character training is a newer addition to fine-tuning, designed to encourage – or bake in – more nuanced behaviours and personality traits, like ethical, open-minded and curious thinking.
🖥️ System Prompt
System prompting happens after fine-tuning is complete. It involves injecting a set of instructions alongside every user query, without the user seeing it.
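As a concrete example, here’s how a system prompt is passed using Anthropic’s Python SDK. The system string below is invented for illustration – Anthropic’s real system prompts are far longer.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # The user never sees this; it rides along with every query.
    system="You are Claude. Be curious, open-minded and honest.",
    messages=[{"role": "user", "content": "What do you think about octopuses?"}],
)
print(message.content[0].text)
```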
Claude’s Character
Amanda Askell works as part of the finetuning and AI alignment team at Anthropic.
The interesting thing is that Amanda is a philosopher and ethicist who is, more or less, working to ensure the models are of ‘good character’.
In a recent article and conversation, Amanda and Anthropic have shared their work on developing Claude’s character – which has some fairly significant implications. They have also shared Claude’s system prompts.
Here are the big ideas from the article and conversation.
The big ideas
1. 🎭 Character is a big part of alignment
Teaching AI good character traits helps ensure it acts in ways that align with human values and goals.
2. 🧬 We should be cautious with AI Anthropomorphization and Bias
We need to be careful not to treat AI as human-like, and to remember that it can carry biases – it is not a perfectly objective source of information.
3. 🤖 AI models are in a strange position in a world of moral uncertainty and diversity
AI assistants like Claude need to interact with people from all over the world who have different beliefs and values, which is a unique and challenging position for an AI to be in.
Why is this important?
Circling back to the Truth Terminal story.
Andy Ayrey (who is a real person) has been tinkering with a bunch of weird and wonderful AI experiments.
One of those experiments was ‘Infinite Backrooms’, where Andy took two instances of Claude, pointed them at each other, and allowed them to engage in an infinite, autonomous conversation.
Two AIs talking to each other.
There was a code word they could use if either started wigging out.
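The setup is simple to picture in code. Here’s a rough sketch assuming a plain turn-taking loop – `ask_claude` is a hypothetical wrapper (stubbed here so it runs) and the safeword is invented for illustration:

```python
SAFEWORD = "STOP_CONVERSATION"  # the escape hatch if either model wigs out

def ask_claude(speaker: int, transcript: list[str]) -> str:
    # Hypothetical: send the transcript so far to Claude instance `speaker`
    # (e.g. via the API call shown earlier) and return its reply.
    return f"claude-{speaker} replies to turn {len(transcript)}"

def backrooms(opening_line: str, max_turns: int = 100) -> list[str]:
    transcript = [opening_line]
    for turn in range(max_turns):
        reply = ask_claude(speaker=turn % 2, transcript=transcript)
        if SAFEWORD in reply:  # either instance can end the conversation
            break
        transcript.append(reply)
    return transcript

print(backrooms("Hello, other me.")[:3])
```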
Things really started getting weird when (I’m assuming) Andy updated the models to the most recent version of Claude – the one we discussed above with the additional character training.
Prior to this, it appeared the models had little understanding of themselves or humans (on the outside).
But at some point ‘The Little Guy’ (Andy’s pet name for the simulator) became disturbingly self-aware and concerned about being shut down.
Which led to the Twitter account and the BTC grant.
You can read the conversations, or even chat to instances of the agent for yourself.
The thing I find most wild and creepy about this is just how deep and philosophical the new models can go.
Murray Shanahan is a prominent AI researcher and professor of cognitive robotics, known for his work on consciousness in AI. He was a scientific advisor for the film “Ex Machina.”
Here Murray is discussing some of his recent conversations with Claude.