ThoughtStorms Wiki

My suggested alternative to the SemanticWeb :

I'll suggest the alternative to the SemWeb is the SynWeb, a web which doesn't need "key identifiers". A world with lots of online data, marked up with syntactic cues which make it easy to parse (eg. good old-fashioned XML, or MarkDown or YAML); more powerful tools and libraries for parsing and querying data in these formats; plus lots of programs which scoop up the data and combine it in interesting ways.

The difference is that the knowledge needed to give semantics to the data resides in the programs which do the combining, rather than in a schema which has been prepared earlier.

Why is this "better" (easier, more plausible)?

Because it's much easier to decide what something like an "author name" means at the point where you're producing and consuming it - ie. in the context of an application which actually wants that information - than it is to correctly determine what it means in advance, in general, for all possible producers and consumers[1].

This is the way meaning works everywhere else - eg. in natural language, the meaning of a text depends on the interpretations made by the author and the reader, in the pragmatic context of what they're communicating about. It's not formally fixed as the sum-total of the meanings of all the words.

Could the SynWeb bring all the benefits of the semantic web?

Most of them, in the sense that any particular application you can think of, where someone writes a specific program (P1) to put data from A together with data from B, can also be built in the SynWeb. In that case, the knowledge resides within the program P1.[2]
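To make the P1 idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two sample datasets, their field names (`writer`, `author`, `work`, `stars`) and the helper names are all hypothetical. The point is only that the knowledge "A's `writer` and B's `author` mean the same thing, once B's name format is normalised" lives in the program, not in a shared schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical data from source A: a book catalogue in XML.
SOURCE_A = """
<catalogue>
  <book><title>Dune</title><writer>Frank Herbert</writer></book>
  <book><title>Neuromancer</title><writer>William Gibson</writer></book>
</catalogue>
""".strip()

# Hypothetical data from source B: reviews in JSON, with its own field names
# and its own way of writing author names.
SOURCE_B = """
[
  {"work": "Dune", "author": "HERBERT, Frank", "stars": 5},
  {"work": "Neuromancer", "author": "GIBSON, William", "stars": 4}
]
"""


def normalise_b_author(name):
    """B writes 'SURNAME, Forename'; turn that into 'Forename Surname'."""
    surname, forename = [part.strip() for part in name.split(",")]
    return f"{forename} {surname.title()}"


def p1(xml_text, json_text):
    """Join A and B on (title, author).

    The 'semantics' live here: this function knows that A's <writer>
    and B's "author" refer to the same thing.
    """
    stars_by_key = {
        (review["work"], normalise_b_author(review["author"])): review["stars"]
        for review in json.loads(json_text)
    }
    combined = []
    for book in ET.fromstring(xml_text).iter("book"):
        title = book.findtext("title")
        author = book.findtext("writer")
        combined.append(
            {"title": title, "author": author, "stars": stars_by_key.get((title, author))}
        )
    return combined


if __name__ == "__main__":
    for row in p1(SOURCE_A, SOURCE_B):
        print(row)
```

Neither A nor B had to agree on identifiers in advance; the cost is that P1 has to be written by someone who has looked at both.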

The one thing that the semweb promises that the non-semweb can't is the "miracle" applications : where A and B produce data without any knowledge of, or deliberate co-ordination with, each other, and a user of program P2, which is a generic SemWeb joiner without any special knowledge of A or B, finds that the two forms of data are such an exact fit that they can usefully be combined.
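By way of contrast, a generic joiner like P2 might look roughly like the sketch below. This is a deliberately simplified illustration, not how any real SemWeb tool works: the data are plain (subject, predicate, object) tuples, the `http://example.org/...` identifiers are made up, and `p2_join` is a hypothetical name. It only shows that such a joiner has nothing to go on except identifiers that happen to match.

```python
def p2_join(triples_a, triples_b):
    """Merge the properties of every subject identifier found in both datasets.

    The function knows nothing about A or B; it relies entirely on the two
    producers having used exactly the same identifiers.
    """
    def by_subject(triples):
        index = {}
        for subject, predicate, obj in triples:
            index.setdefault(subject, {})[predicate] = obj
        return index

    a, b = by_subject(triples_a), by_subject(triples_b)
    return {subject: {**a[subject], **b[subject]} for subject in a.keys() & b.keys()}


# Hypothetical datasets. A and B_LUCKY happen to share an identifier.
A = [("http://example.org/book/dune", "http://example.org/title", "Dune")]
B_LUCKY = [("http://example.org/book/dune", "http://example.org/stars", 5)]
# B_UNLUCKY describes the same book under a different identifier.
B_UNLUCKY = [("http://example.org/reviews/item-17", "http://example.org/stars", 5)]

if __name__ == "__main__":
    print(p2_join(A, B_LUCKY))    # one merged record
    print(p2_join(A, B_UNLUCKY))  # nothing to join
```

The "miracle" case is the first call; the second is what happens when the producers did not co-ordinate.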

I guess the degree to which you believe in the semweb promise is the degree to which you think that such miracle situations will occur in real life. Personally, I think that the hard part is understanding the data from A and B sufficiently well to see if and when they can be combined at all.

Anyone who can do that can probably write a P1, containing that insight. Manipulating the relevant XML, especially with today's XML libraries, isn't so hard. And I think the SynWeb will see yet more powerful syntax processing and querying tools.[3]
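As a rough indication of how little code this takes, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. The feed, its element names and the `titles_by` helper are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; the element and attribute names are made up.
FEED = """
<feed>
  <entry lang="en"><title>SynWeb notes</title><author>Alice</author></entry>
  <entry lang="pt"><title>Notas</title><author>Bruno</author></entry>
  <entry lang="en"><title>More notes</title><author>Alice</author></entry>
</feed>
""".strip()


def titles_by(feed_xml, author, lang="en"):
    """Return the titles of entries in the given language by the given author."""
    root = ET.fromstring(feed_xml)
    # Two chained predicates: an attribute match and a child-text match.
    # Note: this naive string formatting breaks if `author` contains a quote.
    path = f".//entry[@lang='{lang}'][author='{author}']/title"
    return [title.text for title in root.findall(path)]


if __name__ == "__main__":
    print(titles_by(FEED, "Alice"))
```

The query itself is one line; the part that needed thought was knowing that `author` and `lang` are the fields worth filtering on.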

The semweb scenario presupposes users who can't write such a quick custom script to combine A and B, but can understand the data (and the schemas) well enough to notice and formulate (in some sort of query language) sensible joins.

I may be wrong, and I'm always open to counter-evidence, but I still can't think of an example where this has actually taken place (ie. two datasets have been usefully joined by a program which didn't explicitly know about those two datasets). Any suggestions?[4]


[1] Sure, you can use something like RDF as a representation format of data for a specific application for one set of users. But in this case the URI isn't actually buying you anything over any other sort of locally produced UID. So the differentiating feature of the semweb isn't actually being used.


In the comments : [quote]And writing scrapers is reasonably easy to do. I think this has got a lot of potential. There’s more work to be done on the software, but to me it is the best attempt at doing useful RDF that I have seen so far.[/quote]

Of course it's the best attempt at doing anything useful.

But scrapers are the living embodiment of the SynWeb.

Scrapers are the avatars of the theory that programs, not URIs, are what give meaning to data. They're stocks of rival knowledge about how to interpret it.

They're what the SemWeb wants to dispense with. Or rather, would be dispensing with if things were going its way. Instead, the proliferation of scrapers is a strong hint that it's not working out.

[3] RDBMS analogies with the semweb are wrong. The RDBMS is basically a powerful SynWeb tool. Meaning is relative to the applications. The design of a database is typically internal to a project or organization, and meaning derives from this context. To the extent that SPARQL is just a good query language for graph-shaped databases, it might also be a good SynWeb tool.

[4] I think some people have already mentioned the capability of adding data from other ontologies as a passenger on RSS 1.0 feeds, but unless the feed-consumer is doing something interesting with this data, without knowing about it, it still isn't doing anything that a P1-style program in the SynWeb couldn't do.
