ThoughtStorms Wiki

The more we trust ArtificialIntelligence the more we're going to face issues when it fails.

Transcluded from BadBots

Context : AIProblems, AgentsBotsEtc, AIBaiting

This is nasty. It's easy to take an AI trained to design medicines and get it to ReverseThePolarity and design biological weapons instead :

Quora Answer : Is there anything about artificial intelligence that most of us don't know about, but that we better know?

Dec 8

I'm not sure whether most people know it or not, but something we really, really need to remember, especially as we're getting more and more impressive and powerful AIs - like the apparently magical GPT-3 - is that machine learning is only as good as the data sets it has been trained on.

If something is missing from, or under-represented in, the data, then the AI will be biased.
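A minimal sketch of why under-representation matters (all the names and numbers here are invented for illustration): if one group barely appears in the training data, whatever the model "learns" about that group rests on a handful of examples, so noise and historical skew dominate its judgements.

```python
from collections import Counter

# Toy "training set" of loan decisions: (group, approved).
# Group B is heavily under-represented -- only 5 records vs 100 for A.
training_data = (
    [("A", True)] * 90 + [("A", False)] * 10
    + [("B", True)] * 1 + [("B", False)] * 4
)

def train(records):
    """Learn a per-group approval rate -- a crude stand-in for a real model."""
    totals, approvals = Counter(), Counter()
    for group, approved in records:
        totals[group] += 1
        approvals[group] += approved
    return {g: approvals[g] / totals[g] for g in totals}

model = train(training_data)
# Group A's score is backed by 100 examples; group B's "20% approval"
# rests on just 5 records, so the model confidently encodes a fluke.
print(model)  # {'A': 0.9, 'B': 0.2}
```

Any decision system built on this "model" will systematically disadvantage group B, not because of anything true about B, but because of what happened to be in the data.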

We humans are susceptible to this too, of course, but AI is going to amplify the amount of "reasoning" and "inference" that goes on in society, because it will be automated.

In a very short time, if not already, AIs are going to be judging you constantly. Your credit rating, your suitability for a job, your honesty or propensity to commit a crime, your potential suitability as a date for someone etc. etc. etc.

And all of those decisions will be based on the data that someone managed to collect and train the AI with.

There is a horrific potential for prejudices that we already know are wrong and problematic when held by humans to be locked even more firmly into the major data-sets and the major AIs that Google and Facebook and Microsoft etc. are building and will soon be setting to run the world (because a great many users of cloud-based AIs will be starting from these datasets).

Let's just emphasize that last point a bit more.

The remarkable (at least to me) thing about GPT-3 (which is producing incredibly plausible dialogues about specific topics) is that GPT-3 is trained on generic texts found on the internet, and then people give it a very, very small bit of extra information about a particular domain, and it seems to be able to discuss that domain too.

That is really impressive. And of course a humanlike capability. We humans have a lot of general background knowledge and find that we can then pick up some more specialized knowledge from being explicitly taught some good examples of it.

Things like this are becoming available as a cloud / commodity service. We can "hire" GPT-3, pre-trained on its huge general corpus of writing, teach it something about our particular interests with a few hundred more examples, and set it to work.

But what we're going to see in the next few years is hundreds or thousands of new startups and applications which all do the same thing : start with an AI trained with a common generic dataset. And then customize it with a smattering or veneer of extra training for the specific application.

But what if those initial huge data-sets are "wrong" or have biases built into them?

All the AIs from all the new companies and services will be based on them. And will have those biases baked in.

The new startups running the new services won't have the resources to rebuild their training sets from scratch. They'll use the Google (or OpenAI or whoever) off-the-shelf pre-trained AI. And then customize it.
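The pattern can be sketched in a few lines (everything here is hypothetical: the scoring table stands in for a large pretrained model, and the update for a thin layer of fine-tuning). Two unrelated "startups" customize the same base, and both inherit whatever skew the base carried.

```python
# Hypothetical stand-in for a large pretrained model: a table of
# learned scores, with a skew baked in by the original training data.
PRETRAINED = {"engineer": 0.8, "nurse": 0.3}

def fine_tune(base, extra):
    """Copy the base model and overlay a thin layer of custom training."""
    model = dict(base)
    model.update(extra)
    return model

# Two independent products, each adding only its own domain knowledge.
hiring_bot = fine_tune(PRETRAINED, {"python": 0.9})
dating_bot = fine_tune(PRETRAINED, {"hiking": 0.7})

# Neither startup touched the base scores, so both products carry
# the exact same inherited skew from the shared pretrained model.
print(hiring_bot["nurse"], dating_bot["nurse"])  # 0.3 0.3
```

The customization layer is small and cheap; the biases live in the part nobody rebuilds.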

But those customized AIs will all have the same biases from that underlying data-set that Google managed to collect in the late 2010s. Prejudices that were picked up from internet users from our era.

It's going to take years, if not generations, for prejudices that we are picking up from our current online (and offline) behaviours, and baking into our AIs today, to finally wash out of the systems we will be using heavily in the next decades.

And who knows how many prejudicial judgements will be made in the meantime? How many suitable candidates will fail to get jobs, how many innocent people will be profiled by law-enforcement AIs as plausibly criminal? How many fatal accidents might happen because the car control system was based on a 2010s training-set from before Fribjits became so popular on our roads?

AIs will appear to be magical and wise. But we must remember they are only as wise as the training data made them. And the more magical and wise they seem to be, the more important it is for us to remember that, to ask about the provenance of their training data, and to take care to evaluate and compensate for any biases that came with that.