GenerateAndTestInParallel

ThoughtStorms Wiki

Weblogs 4.0

In EditorializingIsParallelizable I wrote about the blogging phenomenon, and tried to emphasize the similarity with the open source community.

In both cases :

  • The established professionals or orthodox practitioners were sceptical of the performance of a largely amateur, voluntary community.
  • This was due to the lack of the hallmarks of "professionalism" in each activity. In software development these include all the quality control, quality assurance and project management techniques that software engineers have evolved. In journalism they include trained writing techniques and disciplines such as FactChecking, and even the constraint that lying journalists will be sacked.
  • But in practice, this large, undisciplined community performed as well as (if not better than) the disciplined professionals. I attributed this to a law of large numbers, an emergent effect of a critical mass of "dumb" components acting in parallel.
  • The similarity continues : this effect has actually occurred in the cases of open source software development and blogging, because of :
    • a) the large numbers who are enabled to contribute due to the connectivity of the internet;
    • b) the fact that the core activity in each case is successfully parallelizable: debugging in software development, and editorializing in blogging (journalism).

Everyone knows about and understands a). But I think that recognising b) is equally important. Eric Raymond grokked it in TheCathedralAndBazaar - and it's this understanding that makes his essay so brilliant a contribution to understanding the open source phenomenon.

So what is it about debugging and editorializing that allows them to be parallelizable?

Tentatively I'd point out the following. Programmers who study AI are taught to think about problem solving, activity planning and automated creativity as an alternating sequence of two principles: generate and test. The generate principle produces candidate solutions to a problem; the test principle evaluates them. For example, if a computer is deciding which move to make next in a game of chess, it can run through the list of its potential moves and set up an internal model of the board after each (the generate principle), and then evaluate each resulting position to see which gains it the best position against its opponent (the test principle).
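As a minimal sketch of the pattern, assuming a toy game rather than chess: the state is just a number, a move adds 1, 2 or 3, and a position counts as better the closer it is to a target of 10. The names `generate_and_test`, `successors` and `score` are invented for this illustration and are not taken from any AI library.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def generate_and_test(
    state: S,
    generate: Callable[[S], Iterable[S]],
    test: Callable[[S], float],
) -> S:
    """Return the candidate successor that scores highest."""
    candidates = list(generate(state))  # generate principle: list the options
    return max(candidates, key=test)    # test principle: evaluate and pick


# Toy stand-in for a board game (an assumption for illustration only).
def successors(n: int) -> list[int]:
    """Every position reachable in one move: add 1, 2 or 3."""
    return [n + step for step in (1, 2, 3)]


def score(n: int) -> float:
    """Higher is better: positions closer to the target of 10 score higher."""
    return -abs(10 - n)


best = generate_and_test(6, successors, score)
print(best)
```

Note that the two principles are passed in as separate functions, so either one can be made smarter or dumber without touching the other.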

Generate and test are not always so explicitly separable. And recent AI work has tended to dismiss the distinction, in the sense that it rejects implementing the principles explicitly as separate generate and test phases, or as separate modules within a system. But the value to us isn't really compromised here. I'm just using generate and test as a pattern to help think through these issues.

One of the great values of recognizing the pattern is that it allows us to express another, more radical idea : namely that the same results can be achieved by rejigging the amount of work done under each principle. We can sometimes trade more of one for the other.

Undoubtedly the most famous example of this trade-off between generate and test is Darwin's theory of evolution by natural selection. Faced with the existence of multiple, complex biological species, each well matched to its ecological niche, the obvious explanation is creation: in other words, a sophisticated, intelligent generate principle. Darwin argued that one could explain their existence just as successfully by a far dumber generate principle, simple mutant variance, and a far more sophisticated and targeted test principle : natural selection.
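The trade-off can be sketched in a few lines, under heavy simplifying assumptions: the generate principle below is as dumb as possible (change one random letter), and all the "intelligence" sits in the test that decides which variant survives. The fixed `TARGET` string is a hypothetical stand-in for an ecological niche; real natural selection has no such explicit goal, so this is only a toy.

```python
import random
import string

TARGET = "darwin"  # hypothetical "niche" the candidates must fit
ALPHABET = string.ascii_lowercase


def fitness(candidate: str) -> int:
    """Test principle: count the positions that match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mutate(candidate: str, rng: random.Random) -> str:
    """Generate principle: blindly replace one randomly chosen letter."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def evolve(seed: int = 0, max_steps: int = 100_000) -> str:
    """Alternate dumb generation with selection until the target is matched."""
    rng = random.Random(seed)
    current = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for _ in range(max_steps):
        if current == TARGET:
            break
        child = mutate(current, rng)
        # Selection: keep the child only if it is at least as fit.
        if fitness(child) >= fitness(current):
            current = child
    return current


result = evolve()
print(result)
```

Nothing in `mutate` knows what a good candidate looks like; removing the selection line turns the loop into a random walk.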

Now, software development and journalism can always be thought of as problem solving activities : "What's a program that does X?", "What's the truth about Y?" The solution of these problems involves both generate and test principles. Software is designed and implemented (generate) and tested and debugged (test). Information is written up (generate) and corrected and selected (test). And the activities that I have been arguing are parallelizable are also notably more test principle oriented.

But let's not be over-hasty, simplistic or trite. Writing software or news or opinion or anything else also involves design, coding, writing technique: plenty of generation. And when one looks at the open source movement and the blogging movement, one sees a lot of spontaneous writing of new stuff.

Nevertheless, what I think is that :

  • a) a particular form of programming and a particular form of journalism have appeared, where the emphasis has shifted away from intelligence in the generation principle and towards intelligence in the test principle; and
  • b) that in each case, the test component has proved amenable to parallelization.

Notice that I have merely said "and". I have not said that some things are parallelizable because they are test; nor that generation could not be equally parallelizable. In fact the amount of code writing and original news writing has surely increased across the net.

But my hunch is that there often is something more parallelizable about test. And therefore if there is a version of your activity which is more test-heavy than generate-heavy, then that version is the one ripe for parallel exploitation. (I wonder if we can try to justify this: ProblemOfParallelism)
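One way to read the hunch, offered only as an illustrative sketch: when each candidate can be checked without reference to the others, the checks can be handed out to as many workers as are available. The example below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the check `looks_ok` and the list of candidates are hypothetical placeholders, standing in for "does this patch work?" or "does this claim hold up?".

```python
from concurrent.futures import ThreadPoolExecutor


def looks_ok(candidate: int) -> bool:
    """Hypothetical independent check: here, simply 'is the number even?'."""
    return candidate % 2 == 0


candidates = list(range(10))

# Each check needs nothing from the other checks, so a pool of workers
# can run them in any order; map() returns verdicts in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    verdicts = list(pool.map(looks_ok, candidates))

survivors = [c for c, ok in zip(candidates, verdicts) if ok]
print(survivors)
```

This shows only that independent tests are easy to distribute; it does not by itself show that generation is harder to distribute.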

BenHyde :

WIKIs are another example of a process framework for solving a class of organizational problems where you have a huge pool of hands and eyes and you want to leverage that resource to make something good. We have stumbled upon a few things over the last few years about such systems. Some people treat these "discoveries" with near religious reverence - which is fine because we don't have too many of them so far. *

http://enthusiasm.cozy.org/archives/000222.html#000222

ZbigniewLukasiak:

Dumb activities are parallelizable. So the emergent model is to exchange intelligent activities for parallel dumb activities.

This in fact covers both generation and testing. You can have intelligent generation and intelligent testing, and the blog or Open Source model is about mass dumb generation and mass dumb testing.

I would argue that it is the same model that Darwinism uses. The testing there is not that sophisticated.