EdZitronOnAI
ThoughtStorms Wiki
Context: MyFearsAboutAI
You seem to like Ed Zitron so what do you think of https://www.wheresyoured.at/godot-isnt-making-it/?
Yeah. I am definitely becoming a fan of EdZitron and his righteously angry polemics. That doesn't mean I buy everything.
The essay you linked makes a lot of points. Let's see if I can do it justice.
To start, let's say that these critiques either "ring true" or don't. And when I say something "rings true" I mean it seems to accord with both my personal experience and my broader theoretical model of the world.
Something like Never Forgive Them "rings true" because what he describes seems to me to be exactly what is happening in the tech industry. Both in my personal experience - the general Enshittification of the web platforms I use - and in what I believe about the incentives in the industry, how TheAlgorithm optimises for engagement, and how other parts of the industry work.

When it comes to his critique of AI, much of it doesn't "ring true" in the same way. Firstly, because while the internet is now full of people insisting that AI doesn't do anything useful for them, I find it tremendously useful, personally. I get value from ChatGPT every time I interact with it. It's helping me advance dozens of the UnfinishedProjects I'm working on.
Secondly, many of the critiques don't ring true to my broader model of the nature of both AI and the particular connectionist approaches to it that underlie most of the new excitement.
When it comes to the Zitron essay we're talking about here, there's a mix of true assertions that I don't think matter the way he seems to think they do. And things I think he's wrong about.
Let's quickly dismiss the trivial ones before tackling the more interesting ones. Is there a huge over-hyped bubble, leading to massive over-investment and then a probable crash?
Yes. Of course.
Does this matter?
Not really. That's the way many technologies get rolled out. Remember the internet bubble of the late 1990s? The bubble / crash / HypeCycle is a property of markets under capitalism. Not in any way special to AI. As I said earlier, many of the problems of AI are problems of capitalism. Considering them as special problems of AI, at this point, contributes almost nothing to understanding or fixing them.
That's about half the essay dealt with.
The more substantial critique is that the transformer model is a "dead end". That the notion of "improvement through scaling" is bunk. And that this whole technological direction will never get us anything of value. I think these are all just wrong.
There's a subtler twist on the "the transformer is a dead end" idea, which is that "we've run out of training data". And that may well be true. But again, it doesn't imply what Zitron expects it to imply.
The main issue here is that you just have to step back and get some longer term perspective. Like I say, I've been either in or adjacent to the field of connectionism for 30 years, so that's the perspective I start from.
The idea that "scaling is all you need" has been rather triumphally hyped these days, and sometimes rubs people up the wrong way. But you have to consider what the opposite position would be.
That scaling doesn't help at all?
When I started playing with NeuralNetworks we were doing examples demonstrating that a multi-layer perceptron could just about learn the XOR function. My first neural network at work had about 300 nodes and 60 training examples. And could do a very simple classification task better than chance. Something has to explain why the neural networks of 2024 are spectacularly better than the ones we had then.

And now I am picking up my copy of Daniel Levine's Introduction to Neural and Cognitive Modelling - mine's the first edition, published in 1991 - and looking through it. There are definitely new ideas in connectionism since then. Transformers are the new hotness. Attention is all you need, etc. But really, lateral inhibitory connections, short term memory, recurrence (ie. RNNs) etc. were all known techniques at that time.

I don't pretend to fully understand the details of all the latest models. I didn't really fully understand the NNs back in 1991 either. But I do know that the basic building blocks were all there. The stuff which is so spectacular today is NOT the result of some revolutionary sui generis concepts that we didn't have back then. No, it's some clever ideas in the architectural organization of elements that were already known in 1991. Plus a fuck-tonne of extra compute and data.
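To make that concrete, here's a minimal sketch, in NumPy, of roughly the kind of toy network we were playing with back then: a tiny multi-layer perceptron learning XOR by backpropagation. The layer sizes, learning rate and step count are illustrative choices, not anything from the period.

```python
# A toy multi-layer perceptron learning XOR, roughly the scale of
# early-90s classroom examples. All hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 sigmoid units is comfortably enough for XOR.
W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 2.0
for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass (squared-error loss, gradients averaged over the batch)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print(np.round(out, 2))  # should end up close to [[0], [1], [1], [0]]
```

The learning rule here is the same one the textbooks were teaching in 1991. Everything that has changed since is scale plus architectural cleverness.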
This is so fundamentally obvious to anyone who knows anything about this stuff, that the people who are dismissively sneering at scaling and data sound as ignorant as flat-earthers arguing that kangaroos not falling off Australia is evidence against the globular earth theory. Of course more nodes, more connections, and more training data increase the capability of a connectionist system. Of course larger networks ("more parameters") and more data require more compute power. I can't believe Zitron (or, say, TimnitGebru, who I've also heard making this assertion, and ought to know better) really believes the opposite.
And of course things are more subtle and complicated than that. And "more compute plus more data equals better AI" is a bit of a simplistic heuristic. But as simplistic heuristics go it's a hell of a lot better than the two rivals, which are that more compute and data make no difference, or that they make AI less capable.
So what is "the truth"?
The truth is that throughout AI history we see waves of improvement. New data and more powerful computers come along. Connectionist systems show an apparent step-change in power. Then we hit the limitations of what this architecture and this training data can achieve. We start to supplement raw connectionism with some extra clever constraints and rules written by humans. Some people declare that the rules-based AI actually out-performs the machine-learning AI. Then the rules-supplemented AI also hits a limit. Until more data and more compute come along and we see another "miraculous" improvement in capability.
Since 2022 I have been arguing that the way this transformer-based wave of AI would play out is that people would end up wrapping the raw power of GPT and Claude etc. in more rules and constraints to make the language models even more useful. And that is exactly what we are seeing. All the things that are appearing (the "agent" frameworks, LangChain, RAG etc. etc.) are ways of wrapping the raw connectionist power of transformers in more rule-based constraints (in the form of various pieces of code) so that they can do more.
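Here's a minimal sketch of what I mean by "wrapping". The call_llm function, the action whitelist and the retry rule are all hypothetical stand-ins, not any particular framework's API. The model produces text; plain old code decides whether that text is acceptable.

```python
# Sketch of wrapping a raw language-model call in rule-based constraints.
# call_llm() is a hypothetical placeholder for whatever model API you use;
# the point is that ordinary code enforces rules the model itself doesn't.
import json

ALLOWED_ACTIONS = {"search_wiki", "create_page", "summarise"}

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (cloud API, local model, ...)."""
    raise NotImplementedError

def constrained_request(task: str, max_retries: int = 3) -> dict:
    prompt = (
        'Respond ONLY with JSON of the form {"action": <one of %s>, "argument": <string>}.\n'
        "Task: %s" % (sorted(ALLOWED_ACTIONS), task)
    )
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            reply = json.loads(raw)
        except json.JSONDecodeError:
            continue  # rule: malformed JSON is rejected and we simply ask again
        if reply.get("action") in ALLOWED_ACTIONS and isinstance(reply.get("argument"), str):
            return reply  # rule: only whitelisted actions get through
    raise RuntimeError("Model never produced a valid, whitelisted action")
```

The model is never trusted to be right on its own. The surrounding code enforces the rules, which is exactly the rules-plus-connectionism pattern described above.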
The core truth in Zitron's critique is that we may well have exhausted the current generation of training data (largely text and images harvested from the internet). There isn't going to be another huge cache of this to be exploited any time soon. Certainly not one which is orders of magnitude larger. And, yes, training AI on its own SyntheticData production might well have diminishing returns.
BUT the usefulness of this current generation of language models is only just getting started, as we make agents that plug directly into lots of existing IT infrastructure. There is a lot of valuable work that the current AI can start doing, just by being able to access the right tools and resources. The "reasoning capacity" of the current AI is demonstrably "good enough" that it can drive a lot more of our IT infrastructure (ie. our world) than it is currently doing.
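As a sketch of what "driving more of our IT infrastructure" can look like in practice (the ask_model function and the two example tools are hypothetical, not a real product): the model's only job is to pick which tool to run; ordinary, auditable code does the actual work.

```python
# Sketch of "good enough reasoning" driving existing infrastructure:
# the model chooses a tool, plain code executes it. Everything here is
# an illustrative assumption, not a description of any real agent framework.
import subprocess

def ask_model(question: str) -> str:
    """Hypothetical stand-in for a language-model call that returns a tool name."""
    raise NotImplementedError

TOOLS = {
    "disk_usage": lambda: subprocess.run(["df", "-h"], capture_output=True, text=True).stdout,
    "list_backups": lambda: subprocess.run(["ls", "/var/backups"], capture_output=True, text=True).stdout,
}

def handle(question: str) -> str:
    tool_name = ask_model(
        "Which ONE of these tools best answers the question? %s\nQuestion: %s"
        % (list(TOOLS), question)
    ).strip()
    if tool_name not in TOOLS:
        return "Model picked an unknown tool: " + tool_name
    return TOOLS[tool_name]()  # the actual work is plain, inspectable code
```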
Now my number one preferred way of using AI is to use it to write code. And I think many of the biggest wins will come from people using AI to write new tools. Or to strip down some of the larger language models into smaller, simpler "experts" that can only do a couple of things well. There is a lot of progress to be made from mining the current state of language models, rather than just trying to find the next couple of orders of magnitude more data and compute. And this is where the companies and resources built up during the current bubble will eventually go once it pops.
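On the "smaller, simpler experts" point: one standard technique for stripping a big model down into a small specialised one is knowledge distillation, where the small "student" is trained to mimic the big "teacher" on a narrow task. A minimal sketch of one training step, assuming PyTorch, with the model objects, optimiser and temperature left as placeholders:

```python
# Minimal sketch of distilling a large "teacher" model into a small,
# task-specific "expert" student via soft targets. The models, batch and
# temperature are illustrative assumptions, not any particular recipe.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, optimizer, T=2.0):
    with torch.no_grad():
        teacher_logits = teacher(batch)      # big general-purpose model
    student_logits = student(batch)          # small specialised model

    # KL divergence between softened distributions: the student learns to
    # reproduce the teacher's behaviour on this narrow task only.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```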