This article acts surprised by the weaknesses of Google.

I'm not. The NoFreeLunchTheorum tells us that no search method can significantly outperform all others across all possible problems. That's certainly true of searching the web: it's a mathematically provable, logical necessity that there'll be GoogleHoles.
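To sketch what the theorem actually says (this is a paraphrase of the Wolpert & Macready formulation; the notation here is mine): averaged over every possible objective function, any two search algorithms see exactly the same distribution of outcomes.

```latex
% d^y_m : the sequence of m values the algorithm has sampled so far
% a_1, a_2 : any two search algorithms
% f ranges over all possible objective functions
\sum_{f} P(d^y_m \mid f, m, a_1) \;=\; \sum_{f} P(d^y_m \mid f, m, a_2)
```

So any gain an engine makes on one class of problems is necessarily paid for on another class.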

Update : Hmm, but maybe web searching is a RealWorldProblem.

Having said that, each of these holes is an interesting opportunity for someone to fix.

Also note, Slate is a Microsoft property. Do we entirely trust their reporting on Google when Microsoft is contesting the search space?

Microsoft vs. Google : "It is a marketing battle, not a technology showdown."

See also MicrosoftVsGoogle

While I'm as skeptical of MS's marketing machine as the next man, there are some very interesting points raised in [Steven Johnson's article], and it's a shame he doesn't address them in more detail.

His points boil down to:

Top results are for shops

Cross-genre keywords pander to cultural (i.e. technological) preferences, e.g. "apple"

People are tending to rely on Google rather than on old-school book/publication research

I think these are fair points, but they paint only half the picture. Point for point:

Any automated search engine will sort according to some criteria, and will therefore attract "search engine optimisation" techniques that people are willing to pay for in order to promote their businesses. I think this has more impact than people simply linking to shops whenever they mention a product.

Any search engine entails learning how to use it, as any non-biased search engine in a technological domain will encounter a greater number of technological pages than anything else. Searching for "apple" returns Apple computers; searching for "apple trees" does not. Words can also be omitted as a vague kind of filter, although I think this could be made "fuzzier".
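A toy sketch of the kind of filtering this implies (the corpus and the minus-prefix query syntax are hypothetical, purely for illustration): adding a term narrows the results, and a minus-prefixed term acts as the sort of omission filter described above.

```python
# Hypothetical two-page corpus: one tech page, one horticultural page.
docs = {
    "apple.com": "apple computers mac",
    "orchard.org": "apple trees orchard fruit",
}

def search(query, docs):
    """Return URLs matching all plain terms and none of the '-' terms."""
    must = {t for t in query.split() if not t.startswith("-")}
    must_not = {t[1:] for t in query.split() if t.startswith("-")}
    results = set()
    for url, text in docs.items():
        terms = set(text.split())
        if must <= terms and not (must_not & terms):
            results.add(url)
    return results
```

With this sketch, "apple" matches both pages, "apple trees" narrows to the orchard page, and "apple -trees" excludes it.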

Quite possibly - indeed I find myself looking only on Google these days, but that's generally because I don't have the time or resources to seek out further sources. I think the main group this criticism applies to is journalism and the media, rather than individuals or academia, both of which seem to maintain a healthy infrastructure of off- and on-line research. However, this point: "Assuming this practice continues, and assuming that Google continues to grow in influence, we may find ourselves in a world where, if you want to get an idea into circulation, you're better off publishing a PDF file on the Web than landing a book deal" could be seen as a good thing, too.

In relation to the first point, there is an interesting observation on [Steven Johnson's blog] that people are posting links to sites as comments on his old blog posts, with comment text like "good site" and the URL they want to promote as the user name's link. Could this be a form of "pagerank piggy-backing"? (Or "pagey-backing"?)
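A minimal PageRank sketch (the link graph is invented, and this is the simplified textbook algorithm, not Google's actual implementation) shows why such comment-spam links would pay off: every extra inbound link from an indexed blog post raises the promoted page's score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. links: dict mapping page -> list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline; linked-to pages get a share of
        # each linking page's current rank.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank

# Hypothetical graph: three old blog posts and a promoted "spam" page,
# before and after the spammer drops a comment link into each post.
before = pagerank({"post1": [], "post2": [], "post3": [], "spam": []})
after = pagerank({"post1": ["spam"], "post2": ["spam"],
                  "post3": ["spam"], "spam": []})
```

In this sketch the spam page's rank rises above both its old score and the posts' scores once the comment links exist - which is exactly the piggy-backing incentive.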

There's still a lot of interesting stuff that lies ahead for search engines, especially once we start to get distributed "spidering" that allows individual users to index whatever content they find - a process already touched upon, but one that could be extended a whole lot further. By indexing the information you yourself can see, it starts to become possible to at least keep pointers into the "dark web": all the content hidden behind databases, CGI and log-ins. It also makes cataloguing and linking of any documents possible, rather than just whatever Google writes an interface for.
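A minimal sketch of what such a user-side index might look like (the class, URLs and page text are all hypothetical): each user indexes whatever pages they can actually see, including content behind log-ins, and keeps pointers to it.

```python
from collections import defaultdict

class LocalIndex:
    """Inverted index built from whatever pages this user has viewed."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs

    def add_page(self, url, text):
        # Index every whitespace-separated term on the page.
        for term in text.lower().split():
            self.postings[term].add(url)

    def search(self, term):
        return self.postings.get(term.lower(), set())

idx = LocalIndex()
# A login-protected page no central spider could reach, plus a public one.
idx.add_page("http://example.com/private/report", "quarterly apple sales")
idx.add_page("http://example.com/blog", "apple trees in the garden")
```

The point of the sketch is only that the "dark web" page is findable here because the user who could see it indexed it locally.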

Secondly, everything'll change if we can get widespread, intercompatible [semantics] for all information. (SemanticWeb)


GrahamLally also wrote this :

In a sense, isn't this distributed linking what weblogs are already doing? And isn't some kind of spider for them similar to what TechnoRati (or some variant) is up to?

Basically we seem to have these choices :

  • one big search engine like Google which applies PageRank or similar statistical algorithm to everything
  • search engines that apply a mixture of statistical analyses to human-controlled subdomains : e.g. TechnoRati only searching weblogs, or a specialist Google variant for social science sites, etc.
  • a human compiled directory (Yahoo, DMOZ)
  • the emergent web of links ... normal web pages

Hmmm. See also my interest in TypingNotCategorizing


See also :