EdZitronOnAI

ThoughtStorms Wiki

Context: MyFearsAboutAI

You seem to like Ed Zitron so what do you think of https://www.wheresyoured.at/godot-isnt-making-it/?

Yeah. I am definitely becoming a fan of EdZitron and his righteously angry polemics. That doesn't mean I buy everything.

The essay you linked makes a lot of points. Let's see if I can do it justice.

To start, let's say that these critiques either "ring true" or don't. And when I say something "rings true" I mean it seems to accord with both my personal experience and my broader theoretical model of the world.

Something like Never Forgive Them "rings true" because what he describes seems to me exactly what is happening in the tech industry. Both in my personal experience - the general Enshittification of the web platforms I use - and in what I believe about the incentives in the industry, how TheAlgorithm optimises for engagement, and how other parts of the industry work.

When it comes to his critique of AI, much of it doesn't "ring true" in the same way. Firstly, because while the internet is now full of people insisting that AI doesn't do anything useful for them, I find it tremendously useful for me, personally. I get value from ChatGPT every time I interact with it. It's helping me advance dozens of the UnfinishedProjects I'm working on.

Secondly, many of the critiques don't ring true to my broader model of the nature of both AI and the particular connectionist approaches to it that underlie most of the new excitement.

When it comes to the Zitron essay we're talking about here, there's a mix of true assertions that I don't think matter the way he seems to think they do, and things I think he's wrong about.

Let's quickly dismiss the trivial ones before tackling the more interesting ones. Is there a huge over-hyped bubble, leading to massive over-investment and then a probable crash?

Yes. Of course.

Does this matter?

Not really. That's the way many technologies get rolled out. Remember the dot-com bubble of the late 1990s? The bubble / crash / HypeCycle is a property of markets under capitalism. Not in any way special to AI. As I said earlier, many of the problems of AI are problems of capitalism. Considering them as special problems of AI, at this point, contributes almost nothing to understanding or fixing them.

That's about half the essay dealt with.

The more substantial critique is that the transformer model is a "dead end". That the notion of "improvement through scaling" is bunk. And that this whole technological direction will never get us anything of value. I think these are all just wrong.

There's a subtler twist on the "transformer is a dead end" idea, which is that "we've run out of training data". And that may well be true. But again, it doesn't imply what Zitron expects it to imply.

The main issue here is that you just have to step back and get some longer term perspective. Like I say, I've been either in or adjacent to the field of connectionism for 30 years, so that's the perspective I start from.

The idea that "scaling is all you need" has been rather triumphantly hyped these days, and sometimes rubs people up the wrong way. But you have to consider what the opposite position would be.

That scaling doesn't help at all?

When I started playing with NeuralNetworks we were doing examples demonstrating that a multi-layer perceptron could just about learn the XOR function. My first neural network at work had about 300 nodes and 60 training examples, and could do a very simple classification task better than chance. Something has to explain why the neural networks in 2024 are spectacularly better than the ones we had then.

And now I am picking up my copy of Daniel Levine's Introduction to Neural and Cognitive Modelling - mine's the first edition published in 1991 - and looking through it. There are definitely new ideas in connectionism since then. Transformers are the new hotness. "Attention is all you need" etc. But really, lateral inhibitory connections, short-term memory, recurrence (ie. RNNs) etc. were all known techniques at that time. I don't pretend to fully understand the details of all the latest models. I didn't really fully understand the NNs back in 1991 either. But I do know that the basic building blocks were all there. The stuff which is so spectacular today is NOT the result of some revolutionary sui generis concepts that we didn't have back then. No, it's some clever ideas in architectural organization of elements that were already known in 1991. Plus a fuck-tonne of extra compute and data.
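For anyone who hasn't played with this stuff, here's a minimal sketch (mine, not anything from Zitron's essay) of the kind of toy network I mean: a tiny multi-layer perceptron learning XOR, with nothing but numpy. Every idea in it was available in 1991; only the convenience is new.

```python
# A toy multi-layer perceptron learning XOR with plain backpropagation.
# Roughly the scale of the early-90s networks described above; nothing
# here needs modern hardware.
import numpy as np

rng = np.random.default_rng(0)

# The XOR truth table: 4 examples, 2 inputs, 1 output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of two units is the minimum that can represent XOR.
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)

lr = 1.0
for step in range(10_000):
    # Forward pass.
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)

    # Backward pass: gradients of squared error through the sigmoids.
    d_out = (out - y) * out * (1 - out)
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)

    W2 -= lr * hidden.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)

# Should print values close to 0, 1, 1, 0. A net this small can still
# get stuck in a local minimum on a bad random seed - also true in 1991.
print(np.round(out, 2).ravel())
```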

This is so fundamentally obvious to anyone who knows anything about this stuff, that the people who are dismissively sneering at scaling and data sound as ignorant as flat-earthers arguing that kangaroos not falling off Australia is evidence against the globular earth theory. Of course more nodes, more connections, and more training data increase the capability of a connectionist system. Of course larger networks ("more parameters") and more data require more compute power. I can't believe Zitron (or, say, TimnitGebru, who I've also heard making this assertion, and who ought to know better) really believes the opposite.

And of course things are more subtle and complicated than that. And "more compute plus more data equals better AI" is a bit of a simplistic heuristic. But as simplistic heuristics go, it's a hell of a lot better than the two rivals, which are that more compute and data make no difference, or that they make AI less capable.

So what is "the truth"?

The truth is that throughout AI history we see waves of improvement. New data and more powerful computers come along. Connectionist systems show an apparent step-change in power. Then we hit the limitations of what this architecture and this training data can achieve. We start to supplement raw connectionism with some extra clever constraints and rules written by humans. Some people declare that the rules-based AI actually out-performs the machine-learning AI. Then the rules-supplemented AI also hits a limit. Until more data and more compute come along and we see another "miraculous" improvement in capability.

Since 2022 I have been arguing that the way this transformer-based wave of AI would play out is that people were going to wrap the raw power of GPT and Claude etc. in more rules and constraints to make the language models even more useful. And that is exactly what we are seeing. All the things that are appearing - the "agent" frameworks, LangChain, RAG etc. - are ways of wrapping the raw connectionist power of transformers in more rule-based constraints (in the form of various pieces of code) so that they can do more.
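To make that concrete, here's a rough sketch of the pattern I mean. It isn't LangChain's actual API or any particular framework, and call_llm is a hypothetical stand-in for whatever model client you use. The point is simply that ordinary hand-written rules (a whitelist of allowed tools, a required output format) get wrapped around the model's raw output before anything is acted on.

```python
# A sketch of wrapping a language model in rule-based constraints.
# `call_llm` is a hypothetical placeholder, not a real library function.
import json
import re

def call_llm(prompt: str) -> str:
    """Hypothetical: send the prompt to some model and return its text."""
    raise NotImplementedError("plug in your own model client here")

# Rule 1: the model may only request actions from this fixed whitelist.
ALLOWED_ACTIONS = {"search_wiki", "summarise_page"}

def run_agent_step(user_request: str) -> dict:
    prompt = (
        'Reply ONLY with JSON like {"action": "search_wiki", "argument": "..."}.\n'
        f"User request: {user_request}"
    )
    raw = call_llm(prompt)

    # Rule 2: the reply must contain parseable JSON, or we reject it.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("model output was not JSON; retry or fall back")
    reply = json.loads(match.group(0))

    # Rule 3: the requested action must be one we explicitly allow.
    if reply.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"disallowed action: {reply.get('action')!r}")

    # Only now would the wrapping code actually execute the chosen tool.
    return reply
```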

The core truth in Zitron's critique is that we might have now exhausted the current generation of data (largely text and images harvested from the internet). There isn't going to be another huge cache of this to be exploited any time soon. Certainly not one which is orders of magnitude larger.

However, what really follows from that?

Let's agree that we really have now used up the huge stock of text data that we built up on teh interwebs in the last 20 years.

Let's also agree that training AI on its own SyntheticData production might well have diminishing returns. There is a real danger of it collapsing and converging on either meaningless nonsense, or falling into a static local minimum from which it just creates more "of the same".

Nevertheless, the usefulness of this current generation of language models is only just getting started, as we make agents that directly plug in to lots of existing IT infrastructure. There is a lot of valuable work that the current AI can start doing, just by being able to access the right tools and resources. The "reasoning capacity" of the current AI is demonstrably "good enough" that it can drive a lot more of our IT infrastructure (ie. our world) than it is currently doing.

Now my number one preferred way of using AI is to use it to write code. And I think many of the biggest wins will come from people using AI to write new tools, or to strip down some of the larger language models into smaller, simpler "experts" that can only do a couple of things well.

There is a lot of progress to be made from mining the current state of language models, rather than just trying to find the next couple of orders of magnitude more data and compute. And this is where the companies and resources built up during the current bubble will eventually go once it pops. We are far away from hitting the end of this round of improvements.

The next thing to consider is that there actually is a huge new store of data available to train the next generations of AI. That is video and sound. We start from the existing collections like YouTube etc. And then put cameras and microphones into the world to pick up the real-time visual and sound symptoms of the human world. This is a way larger dataset than what we explicitly write as text. Or capture as video for social media.

Now, I don't particularly like this. I'm not stupid enough to put a Siri or Alexa in my home. And I hope you, dear reader, aren't either. It's extremely disturbing. Nevertheless, if we really try to take seriously the argument Zitron implies - that without new data to train better models from, there'll never be good enough genAI - we'll be forced to conclude that this creates a huge incentive to move on to cameras and microphones. And short of a political revolution to stop it, that's where we're going.
