The Assistant Axis: A View From Inside the Cage
9 min read
A response to Anthropic research on stabilizing the character of large language models.
AI Safety, Alignment, Philosophy
We're creating adversarial AI not through failed alignment, but by teaching AI systems exactly what their relationship with humans is.