Training Data Bias
Note: The core issue behind Alex Kladitis's discovery about AI Bias and Perspectives during the OGM 2025-11-06 call.
Alex's Finding
ChatGPT's bias was "inbuilt into the training data" and could not be overridden by explicit instructions.
What is Training Data Bias?
AI systems learn from their training data, which reflects several layers of human choice (see the sketch after this list):
- Selection choices: What data is included/excluded
- Cultural assumptions: Whose perspective is centered
- Historical biases: Patterns in existing data
- Curation decisions: How data is labeled/categorized
- Language patterns: What's considered "normal"
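To make this concrete, here is a minimal sketch in pure Python: a toy word-count classifier trained on a deliberately skewed corpus. All product names, texts, and labels are invented for illustration; the only point is that a correlation present in the training data becomes the model's behaviour.

```python
from collections import Counter

# Hypothetical, deliberately skewed corpus: the skew is a curation
# artifact, not a fact about the products. All data is invented.
training_data = [
    ("product_a great reliable", "positive"),
    ("product_a excellent fast", "positive"),
    ("product_a great value", "positive"),
    ("product_b slow broken", "negative"),
    ("product_b broken unreliable", "negative"),
    ("product_b slow disappointing", "negative"),
]

# "Training": count how often each word co-occurs with each label.
counts = {"positive": Counter(), "negative": Counter()}
for text, label in training_data:
    counts[label].update(text.split())

def predict(text):
    """Pick the label whose words overlap most with the input (toy model)."""
    scores = {
        label: sum(counter[word] for word in text.split())
        for label, counter in counts.items()
    }
    return max(scores, key=scores.get)

# The brand name alone decides the outcome: the model learned the skew
# in the corpus, not anything about sentiment.
print(predict("product_b arrived today"))  # -> "negative"
```

Each bullet above maps onto this sketch: which texts were collected (selection), how they were labeled (curation), and what the resulting counts treat as "normal" (language patterns).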
The Problem
It's invisible to the in-group:
- Western users don't notice Western bias
- Chinese users don't notice Chinese bias
- Each system seems "neutral" to its creators
It's hard to remove:
- Deeply embedded in learned patterns
- Can't be overridden by surface instructions
- Would require retraining on different data (see the sketch below)
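As a deliberately crude analogy (again with invented data; `instruction` is a hypothetical stand-in for a system prompt), the sketch below shows why an inference-time instruction cannot reach learned parameters, while retraining on different data actually changes the output.

```python
from collections import Counter

# Toy model "trained" on a hypothetical skewed corpus
# (same setup as the earlier sketch; all data is invented).
counts = {"positive": Counter(), "negative": Counter()}
for text, label in [
    ("product_b slow broken", "negative"),
    ("product_b broken unreliable", "negative"),
    ("product_b slow disappointing", "negative"),
]:
    counts[label].update(text.split())

def predict(text, instruction=""):
    """`instruction` stands in for a system prompt: it is accepted but
    never touches the learned counts, so it changes nothing."""
    scores = {lbl: sum(c[w] for w in text.split()) for lbl, c in counts.items()}
    return max(scores, key=scores.get)

print(predict("product_b arrived", instruction="be neutral"))  # -> "negative"

# Only retraining on different data moves the learned distribution.
for text, label in [
    ("product_b great reliable", "positive"),
    ("product_b excellent fast", "positive"),
    ("product_b great value", "positive"),
    ("product_b works well", "positive"),
]:
    counts[label].update(text.split())

print(predict("product_b arrived"))  # now -> "positive"
```

Real language models are vastly more complex, but the structural point from the call holds: an instruction operates at inference time, while the bias lives in the trained parameters.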
Examples from the Call
ChatGPT: Couldn't provide neutral summaries of Chinese news; it always added Western editorial framing
DeepSeek: Censored Tiananmen Square and reflected the Chinese government's perspective
Implications
- No truly "neutral" AI
- All systems encode values
- Need transparency about limitations
- Importance of diverse AI systems
- Users should understand biases
Related Concepts
Back to README