ProgressiveAIDilemma

ThoughtStorms Wiki

Let us invent then a new breed of AI systems that mix an awareness of the past with values that represent the future that we aspire to.
Our focus should be on figuring out how to build AI that can represent and reason about values, rather than simply perpetuating past data.

To which I responded :

This is all very well in principle.
I'm not convinced it's actually POSSIBLE in practice.
This is the dilemma of all political progressives, when faced by conservatism, in a nutshell :
The existing models are full of biases and flaws, but also subtle truths about the real world.
Replace them with an idealised data-set of how you dream the world could be, and you throw out all that knowledge.
As Joel Spolsky once said about legacy source-code, all that cruft is "bug fixes" for rare edge cases.
That's not, of course, to imply that racist language is a bug fix for anything.
But if you want to sanitise your training data based on broad-brushed principles, you may well gloss over facts about the real world.
Falsehoods Programmers Believe About Names lists, among its falsehoods: "I can safely assume that this dictionary of bad words contains no people’s names in it."
There are still dumb computers today blocking people whose names contain the substring "cunt" or "dick".
Or "anti-porn" filters removing breast cancer information.
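To make the failure mode concrete, here is a minimal sketch of the kind of dumb filter described above, next to a slightly less dumb one. Everything in it is an assumption for illustration: the tiny blocklist, the function names and the sample names are hypothetical, not taken from any real filtering product.

```python
import re

# Hypothetical blocklist, using the two substrings mentioned above.
BLOCKLIST = {"cunt", "dick"}


def naive_substring_filter(text: str) -> bool:
    """Return True if any blocklisted string appears anywhere in the text."""
    lowered = text.lower()
    return any(bad in lowered for bad in BLOCKLIST)


def whole_word_filter(text: str) -> bool:
    """Return True only if a blocklisted string appears as a whole word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKLIST for word in words)


if __name__ == "__main__":
    # Illustrative inputs: a town, a surname, and a given name.
    for sample in ["Scunthorpe", "Emily Dickinson", "Dick Van Dyke", "Alice"]:
        print(sample, naive_substring_filter(sample), whole_word_filter(sample))
```

The substring version blocks "Scunthorpe" and "Emily Dickinson". Matching whole words rescues those two, but still blocks anyone whose given name is "Dick", which is exactly the falsehood quoted above. Each refinement fixes one edge case and leaves another.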
It's not that people who want to produce sanitised training data are unaware of this or would be stupid enough to try to apply such dumb filters.
It's just that generating training data that is simultaneously as rich as the real world's AND artificially constructed is likely to be prohibitively expensive.
As in "not done".
You're effectively trying to simulate the real world, except better.