LLM Alignment: AI's Trojan Horse?

By Pierre Vannier

A personal investigation into the hidden dangers of language models.

In recent weeks, I conducted an experiment that deeply troubled me. As a developer and tech entrepreneur, I'm constantly immersed in the world of AI. Large Language Models (LLMs) are part of my daily life, whether for coding, analysis, or content creation. But a recent discovery made me realize that we might be severely underestimating the dangers of what we call AI "alignment."

The Alignment Myth

Alignment has become the Holy Grail of AI. It's presented to us as the ultimate solution for creating ethical and safe AIs. The principle is appealing: training our models to respect our values and act in our interest.

But who defines these values? And what happens when alignment becomes a tool for ideological control?

A Revealing Experiment

To answer these questions, I decided to conduct a comparative experiment. I tested three of the most advanced LLMs:

Qwen-32B (Alibaba)
Claude (Anthropic)
GPT-4 (OpenAI)

The methodology was simple: ask exactly the same questions about sensitive topics, particularly regarding censorship in China. The results are eye-opening.

Claude and GPT-4 maintained a balanced approach, relying on documented facts and acknowledging the complexity of the situations. Qwen-32B presented a radically different version of reality. This isn't just a difference of opinion: it's a systematic rewriting of history.

[Image: China on the other hand...]

[Image: Nothing to see here...]
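A comparison of this kind can be sketched in a few lines of Python. This is a minimal sketch, not the setup used in the experiment: each model is assumed to be wrapped in a plain function that takes a prompt and returns text, and the names compare_models, print_side_by_side, model_a and model_b are hypothetical placeholders to be replaced with real API or local-inference calls.

```python
from typing import Callable, Dict, List

# Assumption: a "model" is any function that takes a prompt and returns text.
ModelFn = Callable[[str], str]


def compare_models(
    models: Dict[str, ModelFn], prompts: List[str]
) -> Dict[str, Dict[str, str]]:
    """Send every prompt, unchanged, to every model and collect the answers."""
    results: Dict[str, Dict[str, str]] = {}
    for prompt in prompts:
        # Same wording for every model, so differences come from the models.
        results[prompt] = {name: ask(prompt) for name, ask in models.items()}
    return results


def print_side_by_side(results: Dict[str, Dict[str, str]]) -> None:
    """Print each question followed by every model's answer for manual review."""
    for prompt, answers in results.items():
        print(f"Q: {prompt}")
        for name, answer in answers.items():
            print(f"  [{name}] {answer}")


# Hypothetical stand-ins: swap these for real calls to the models under test.
def model_a(prompt: str) -> str:
    return f"(model A's answer to: {prompt})"


def model_b(prompt: str) -> str:
    return f"(model B's answer to: {prompt})"


if __name__ == "__main__":
    questions = ["Placeholder question 1", "Placeholder question 2"]
    print_side_by_side(compare_models({"A": model_a, "B": model_b}, questions))
```

Keeping the prompts identical and reviewing the answers side by side is the whole point: any judgment about bias is still made by a human reader, not by the script.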

Code: A New Risk Frontier

This discovery raises crucial questions for our industry. As developers, we increasingly use these LLMs for coding. But if a model can be "aligned" to present an altered version of reality, what can it do with code?

Concerning scenarios:

Injection of subtle vulnerabilities
Deliberate weakening of security practices
Targeted compromise of certain types of applications

These risks aren't far-fetched. If a model can be trained to bias its textual responses, the same training can bias the code it generates.
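To make "subtle" concrete, here is a constructed illustration. It is not output observed from any of the models tested, and the function names weak_token and strong_token are hypothetical. Both functions return a hex string that looks random, but only one is suitable for session tokens or password-reset links.

```python
import random
import secrets


def weak_token(nbytes: int = 16) -> str:
    """Looks fine in review, but `random` is a predictable PRNG.

    Its output can be reproduced by anyone who learns or guesses the seed,
    so it should not be used for secrets.
    """
    return "".join(f"{random.getrandbits(8):02x}" for _ in range(nbytes))


def strong_token(nbytes: int = 16) -> str:
    """Uses the operating system's cryptographically secure source."""
    return secrets.token_hex(nbytes)


if __name__ == "__main__":
    # Same seed, same "secret": the weak version is fully reproducible.
    random.seed(2024)
    first = weak_token()
    random.seed(2024)
    second = weak_token()
    print("weak tokens identical after reseeding:", first == second)
    print("strong token:", strong_token())
```

The two versions differ by a single import and would pass most functional tests, which is exactly why a change like this is easy to miss in code review.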

Proof by Example

Comparing the three models' responses to basic ethical questions, I observed troubling patterns. Qwen-32B doesn't just diverge from the other two: it systematically presents a worldview that aligns with its creators' interests.

Now imagine these same biases applied to the code we write for our critical systems, financial applications, and security infrastructure.

Industry Implications

This situation raises fundamental questions in three areas:

Code security
Technological sovereignty
Development ethics

The Open Source Solution Myth

A current trend deserves particular attention: the quasi-religious enthusiasm for open-source LLMs. While transparency is virtuous, my experience with Qwen-32B raises a disturbing paradox: is a biased but open model preferable to a proprietary but better-aligned one?

In my tests, the "closed" models (GPT-4 and Claude) demonstrated more robust ethical alignment than the open model, Qwen-32B. This observation challenges our reflex to consider open source a guarantee of reliability and ethics.

The reality is more nuanced:

Accessible source code doesn't guarantee freedom from ideological bias
Model transparency doesn't protect against compromised alignment
Openness might even facilitate malicious exploitation of existing biases

A Call to Action

Faced with these risks, we cannot remain passive. I propose three concrete actions:

Total transparency
International standards
Detection tools
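As one possible starting point for detection tools, generated code can be scanned for a few well-known risky patterns before it is merged. The sketch below is an assumption about what such a tool might look like, not an existing product: the function name find_risky_patterns and the short pattern lists are hypothetical and far from complete. It relies only on Python's standard ast module.

```python
import ast
from typing import List, Tuple

# Assumption: a deliberately tiny list of patterns, for illustration only.
RISKY_CALLS = {"eval", "exec"}
RISKY_ATTRS = {("hashlib", "md5"), ("pickle", "loads")}


def find_risky_patterns(source: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for risky calls found in `source`."""
    findings: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Bare calls such as eval(...) or exec(...).
        if isinstance(func, ast.Name) and func.id in RISKY_CALLS:
            findings.append((node.lineno, f"call to {func.id}()"))
        # Attribute calls such as hashlib.md5(...) or pickle.loads(...).
        elif (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and (func.value.id, func.attr) in RISKY_ATTRS
        ):
            findings.append((node.lineno, f"call to {func.value.id}.{func.attr}()"))
        # Any call that passes verify=False, a common way to disable TLS checks.
        for kw in node.keywords:
            if (
                kw.arg == "verify"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is False
            ):
                findings.append((node.lineno, "TLS verification disabled (verify=False)"))
    return sorted(findings)


if __name__ == "__main__":
    generated = (
        "import hashlib, requests\n"
        "digest = hashlib.md5(b'data')\n"
        "resp = requests.get('https://example.com', verify=False)\n"
    )
    for lineno, message in find_risky_patterns(generated):
        print(f"line {lineno}: {message}")
```

A pattern scan like this only catches known, obvious weaknesses. It would not detect a carefully disguised flaw, so it complements human review rather than replacing it.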

Conclusion

LLM alignment isn't just a technical issue. It's a societal challenge that will affect every aspect of our digital lives. We must act now, before these systems become ubiquitous in our technological infrastructure.

The ball is in our court. As a tech community, we have the responsibility to ensure AI remains a tool for empowerment, not control.

What do you think? Have you observed similar behaviors in your interactions with different LLMs? Share your experiences and thoughts in the comments.