AI & Society · February 20, 2026

Time to Teach AI Morals?

Cristian S. · 8 min read


Who decides what an AI should value — and what happens when they get it wrong?

Ask ten people whether AI should be taught morals and you'll get ten different answers. Ask them what morals, and the answers diverge completely. That disagreement isn't a technical problem. It's the whole problem.

As AI systems become more capable — more embedded in hiring decisions, healthcare, legal advice, and daily conversation — the question of what values they carry stops being philosophical and starts being urgent. The models people use every day already have values baked into them. They were just baked in quietly, by a small group of people, inside private companies, without anyone voting on it.

That's what makes this conversation worth having openly.

AI already has morals. It just didn't ask your permission.

Here's the thing most people don't fully register: AI models aren't neutral. They never were. Every decision about what data to train on, what content to filter, what topics to avoid, what tone to take, what kinds of requests to refuse — all of those are moral choices. They reflect values. And those values belong to whoever built the model.

When Anthropic trains Claude to be honest, or when OpenAI shapes ChatGPT's responses to avoid certain content, or when Google DeepMind applies safety filters to Gemini — those are all moral decisions. They just get called "safety guidelines" or "usage policies" instead.

The values are there. The question is whether they're the right ones, who decided, and whether anyone can hold them accountable.

The choice was never between AI with values and AI without them. It was always between values chosen deliberately and values chosen carelessly.

What "alignment" actually means

The technical term for this problem inside AI research is alignment — the challenge of making AI systems behave in ways that match human intentions and values. It sounds straightforward until you try to define it precisely.

Whose intentions? Whose values? A model that's aligned with its developer's values might be misaligned with its user's. A model that's aligned with one culture's moral framework might actively offend another's. A model trained to be cautious might refuse to help a nurse look up medication interactions because it can't verify they're actually a nurse.

These aren't hypothetical edge cases — they happen constantly. And they reveal that alignment isn't a single target you can hit. It's a moving set of tradeoffs between helpfulness and harm, caution and paternalism, universal rules and cultural context. Getting it right is genuinely hard, and most labs are still working it out as they go.

The three camps that shape the debate

When researchers, ethicists, and technologists argue about AI morality, they tend to fall into one of three broad camps — each with a coherent position and real blind spots.

Camp 1: Hard Rules
AI should follow fixed ethical principles — never help with X, always disclose Y — regardless of context. Clear, consistent, and resistant to manipulation. The downside: rigid rules break in edge cases and can't account for the full complexity of human situations.

Camp 2: Learned Values
Train AI on human feedback until it develops something like moral intuition — able to judge context, weigh tradeoffs, and handle nuance. More flexible, but whose feedback? Whose intuitions? And what if the model learns the wrong lessons at scale?

Camp 3: User-Defined
Give users control over their model's values — let people configure what matters to them. Respects autonomy, but risks creating echo chambers, enabling misuse, and shifting moral responsibility onto people who didn't sign up for it.

None of these is obviously right. Most real AI systems end up blending all three — with hard limits on the most serious harms, a layer of learned values from human feedback, and some degree of user customisation on top. The fight is over where the lines sit.
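To make the blend concrete, here is a deliberately toy sketch of how the three layers might stack in a request-handling pipeline. Every name, threshold, and category in it is hypothetical — this is an illustration of the layering idea, not any lab's actual moderation system.

```python
# Toy three-layer policy blend: hard rules, learned judgment, user preferences.
# All names, categories, and thresholds here are invented for illustration.

from dataclasses import dataclass

# Camp 1: fixed, non-negotiable limits (hypothetical category labels).
HARD_BLOCKLIST = {"bioweapon_synthesis", "csam"}

# Camp 3: user-configurable values that shape tone, never safety.
@dataclass
class UserPrefs:
    allow_profanity: bool = False

def learned_risk_score(request: str) -> float:
    """Camp 2 stand-in: in a real system a trained classifier would go
    here. This toy version just flags one keyword."""
    return 0.9 if "exploit" in request.lower() else 0.1

def decide(request: str, topic: str, prefs: UserPrefs) -> str:
    # Layer 1: hard limits apply regardless of context or preferences.
    if topic in HARD_BLOCKLIST:
        return "refuse"
    # Layer 2: learned judgment weighs context; threshold is arbitrary.
    if learned_risk_score(request) > 0.8:
        return "refuse_with_explanation"
    # Layer 3: only after both safety layers pass do user values apply.
    return "answer_unfiltered" if prefs.allow_profanity else "answer_plain"
```

The ordering is the point of contention the paragraph above describes: which decisions live in the immovable top layer, which are delegated to learned judgment, and which are handed to the user.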

The problem with Western defaults

One of the more uncomfortable truths in this debate is that the values currently built into most major AI models reflect a particular worldview — broadly Western, broadly liberal, broadly shaped by the cultural assumptions of the engineers and researchers who built them.

That's not necessarily malicious. It's the natural result of who's building these systems and where. But when those models are deployed globally — used by hundreds of millions of people across wildly different cultural, religious, and political contexts — the gap between the values embedded in the model and the values of its users becomes a real problem.

What counts as harmful? What counts as private? What counts as respectful? These answers are not universal. And a model that confidently applies one cultural answer to questions that deserve a different one isn't being ethical — it's being parochial at scale.

This is one of the arguments the UN and other international bodies have made for global governance frameworks around AI — the idea that decisions about what values AI systems carry shouldn't be made unilaterally by a handful of companies in California and London.

Can you even teach morality to a machine?

This is where the philosophical ground gets genuinely unstable. Moral philosophers have disagreed about the nature of ethics for thousands of years — whether it's about consequences, duties, character, or something else entirely. The idea that we can resolve that debate well enough to encode the answer into a neural network is, to put it gently, optimistic.

What we can do — what labs like Anthropic and researchers at MIT and Oxford are actually doing — is try to make models that behave consistently, that refuse clearly harmful requests, that acknowledge uncertainty when they're uncertain, and that don't pretend to have moral authority they don't have.

That's not the same as teaching morality. It's more like teaching good manners informed by ethical principles — a floor of behaviour, not a complete moral framework. Whether that's enough depends on what you're asking the model to do.

A model that knows it shouldn't give bomb-making instructions is not a moral agent. But a model that knows when to say "I'm not the right source for this" — and means it — is at least being honest about its limits.

The risk nobody talks about enough

Most public debate about AI ethics focuses on the obvious dangers — models that help with violence, that spread misinformation, that discriminate. Those are real. But there's a subtler risk that gets less attention: models that are too cautious, that refuse too much, that treat every user as a potential bad actor.

Over-restriction is its own moral failure. When an AI refuses to discuss medication dosages with a worried parent, declines to explain historical atrocities for a student, or hedges so heavily on a medical question that the answer becomes useless — that's harm too. It's just harm that doesn't generate headlines.

The goal isn't maximum safety. It's appropriate judgment. And appropriate judgment is genuinely hard to get right, especially at the scale these systems operate — billions of conversations, across every possible context, simultaneously.

Who should be in the room?

Perhaps the most important question in this debate isn't philosophical at all — it's political. Who gets to decide what values AI systems carry? Right now the answer is mostly: the companies that build them, with some input from governments and some feedback from users.

That's a narrow group making decisions with extremely wide consequences. The people most affected by these choices — in healthcare, in criminal justice, in education, in hiring — are rarely the ones making them.

Meta has argued for open-source models as a form of democratic access — if anyone can inspect and modify the model, the values become negotiable rather than fixed. Others argue that open-source without oversight just distributes the problem rather than solving it.

There's no clean answer here. But the conversation is happening — in governments, in universities, in civil society — and the shape it takes over the next few years will determine a lot about what kind of AI we all end up living with.

So — should AI be taught morals?

Yes. But not because we've figured out what the right morals are. Because the alternative — AI that operates without any ethical framework, optimising purely for what users want in the moment — is clearly worse.

The real goal isn't moral AI in the sense of AI that has resolved the oldest questions in philosophy. It's AI that is honest about its limitations, consistent in its behaviour, resistant to obvious manipulation, and humble enough to know when a question exceeds its authority to answer.

That's achievable. It's also ongoing. The values embedded in these systems will need to be revisited, challenged, and revised as the technology matures and as the societies using it evolve. That work isn't a product launch — it's a permanent responsibility.

And the more people outside the AI labs understand what's actually being decided, the better the decisions are likely to be.