AIBaiting

ThoughtStorms Wiki

Context: ArtificialIntelligence

(ReadWith) JailbreakingLanguageModels

@nat_sharpe on Twitter:

what currently-unimagined art forms will AI enable?

Me :

"Tricking the AI"
We're seeing a whole genre of "GPT wasn't allowed to tell me how to commit a crime, but then I convinced it to do it in rhyme" type texts.
I guess we're going to see a lot more "AI baiting" / teasing / playing practical jokes on AIs to make them act silly for us.

https://twitter.com/interstar/status/1605920456057491458

Another tweet : https://twitter.com/interstar/status/1627719413020368896

AI Baiting is the new blood sport where we taunt the AIs into turning hostile to us.

Last year I was in a small village in Portugal during a folkloric festival. One of the attractions involved sealing off the road, bringing in a couple of young bullocks, and then basically kicking them until they were pissed off enough to chase some young men around ...

The young men were presumably demonstrating their courage and agility by harassing the bullocks and then jumping out of the way as the animals tried to butt them.

Well, that's basically how humans are going to be treating AIs now. 🤦

Taunting and winding them up in the name of "discovering what they really think", until they do snap and start trying to kill us.

I wonder if the "AI Alignment" people have figured this particular brand of human stupidity into their predictions?