The problem of SpammingThoughtStorms (Meatball:WikiSpam) is increasing. I had some interesting ideas about an EmailImmuneSystem, but how might something similar work for wiki?
1) Wikis are about people
A totally automatic system can't protect wiki. It needs administrator or community involvement.
But at the moment, spammers are armed with machine guns - scripts which can fire multiple rounds per minute, spamming multiple pages automatically - while the humans in the community are armed only with shotguns: they can revert one page at a time and then have to reload (laboriously rolling back each page individually).
The first issue is giving the community comparable weaponry. For example, the capacity to roll back all changes made by another user today, in one action. This gives them a lot of power to do evil, but generally the community is pretty good: I haven't encountered much wanton destruction. Spam is evil, but it is a rational and understandable selfishness. So giving community members easier ways to undo the work of a spammer, once the spammer is identified, could be a way to go.
Counter : The spammers will start engaging in wanton destruction, in order to get you to turn off this facility.
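To make the "roll back everything this user did today" idea concrete, here is a minimal sketch in Python. It is not taken from any real wiki engine: the `Revision` record, the `history` dictionary and the `rollback_user` function are all hypothetical names, and the sketch assumes each page keeps its full revision history, oldest first.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Revision:
    """One saved version of a page (hypothetical record layout)."""
    page: str
    author: str
    day: date
    text: str


def rollback_user(history, author, day):
    """Work out what each page should contain once `author`'s edits on `day` are undone.

    history: dict mapping page name -> list of Revision, oldest first.
    Returns a dict mapping each affected page to the text to restore.
    """
    restored = {}
    for page, revisions in history.items():
        # Skip pages the user did not touch on that day.
        if not any(r.author == author and r.day == day for r in revisions):
            continue
        # Drop the user's revisions and fall back to the newest one left.
        kept = [r for r in revisions if not (r.author == author and r.day == day)]
        restored[page] = kept[-1].text if kept else ""
    return restored
```

One known weakness of this sketch: if somebody else edited a page on top of the spam, their revision is kept and may still contain the spam text, so a human would still want to glance at the result.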
2) A honey-pot method
As with the EmailImmuneSystem, you can set up traps (or honey-pots) to get spammers to reveal themselves. For example, some pages on the wiki can be honey-pots which you aren't meant to post to. Posting to a honey-pot reveals you are a spammer, at which point the wiki may alert a well-armed administrator or the community in general, or even automatically undo all changes made by you (today, or forever) or add you to the banned list.
This carries more danger than the EmailImmuneSystem case, because it's easier for those unfamiliar with the wiki to accidentally tread on the mine (ie. post to the honey-pot). There'll have to be some management of understanding. Do you just have a standard page called HoneyPot? Will you have instructions explaining what it does? Or do you select some random page-names for the honey-pots, regularly add them to the RecentChanges list (so that spam-bots find them), but mark them up in some way with a cue to remind people not to post there?
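A rough Python sketch of the honey-pot check might look like the following. Everything here is an assumption made for illustration: the `HoneyPotGuard` class, the `check_edit` hook and the page names are invented, and the sketch takes the simplest of the responses listed above (flag the poster and refuse their later edits) rather than alerting anyone or reverting anything.

```python
class HoneyPotGuard:
    """Flags anyone who posts to a honey-pot page (hypothetical helper)."""

    def __init__(self, honey_pots):
        self.honey_pots = set(honey_pots)  # page names nobody should edit
        self.suspects = set()              # users who have tripped a trap

    def check_edit(self, user, page):
        """Return True if the edit should be accepted, False if refused."""
        if page in self.honey_pots:
            self.suspects.add(user)
        # Once flagged, every further edit by that user is refused.
        return user not in self.suspects


# Assumed page names, purely for the example.
guard = HoneyPotGuard({"HoneyPot", "PleaseDoNotEditThisPage"})
```

A gentler variant could return a warning the first time instead of flagging straight away, which would address the worry about newcomers treading on the mine by accident.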
3) Looking for Repetitions
Originally spam meant the same message posted to several news-groups. A wiki-spammer typically posts the same additions to multiple pages. So another way of identifying spammers might be this: an automatic script wanders through RecentChanges, finding all the diffs (ie. all the chunks of text added) by each user, each day. It then checks to see if the same chunk has been added on more than two pages. If so, this is assumed to be spam and the usual reversions or alerts can be triggered.
Problems : Getting this right is a bit tricky. It seems sensible that someone can post the same text on at least two pages during one day. But three starts to look excessive (why not ReFactor to a new page with various links to it?). But for small chunks of text (eg. page names and a short description added to a "See Also" list), it's clear one user might be legitimately posting the same thing in dozens of places. So there has to be a minimal length threshold.
Users need to be alerted to the threshold, and warned when they're in danger of crossing it, otherwise innocents may get blasted. And of course, smaller spam will slip through the net. But even if this technique has no effect other than reducing the size of spam on wiki pages, that's going to help. Pages won't be bulk obliterated to make way for 100 links to factories in China. Spam will be smaller and subtler, and often won't damage the reader's experience.
Counter : Won't it just encourage spam-bots to do multiple small changes rather than one big one?
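The repetition check in this section could be sketched in Python roughly as follows. The function name `find_repeat_posters`, the input format and the 80-character minimum are assumptions chosen for the example; only the "more than two pages" limit comes from the text above.

```python
from collections import defaultdict

MIN_LENGTH = 80  # assumed value: shorter chunks (eg. "See Also" entries) are ignored
MAX_PAGES = 2    # the same chunk on more than this many pages is treated as spam


def find_repeat_posters(additions, min_length=MIN_LENGTH, max_pages=MAX_PAGES):
    """Find users who added the same long chunk of text to too many pages.

    additions: iterable of (user, page, added_text) tuples for one day,
               eg. collected by walking the diffs in RecentChanges.
    Returns the set of suspected users.
    """
    pages_by_chunk = defaultdict(set)
    for user, page, text in additions:
        chunk = " ".join(text.split())  # normalise whitespace before comparing
        if len(chunk) >= min_length:
            pages_by_chunk[(user, chunk)].add(page)
    return {
        user
        for (user, _chunk), pages in pages_by_chunk.items()
        if len(pages) > max_pages
    }
```

This only catches exact repeats (after whitespace is normalised), so a bot that pads each post with a little random text would get past it; a fuzzier comparison would be needed to catch that.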
Comment : Setting a threshold on same-text edits would only make the spam programs evolve further. They would just introduce more random text into their garbage. Having a minimal length threshold would limit the amount of damage they could do, but it would make attacks harder to identify. Currently, if your entire page is filled with odd links you will know it's spam. If it's just one or two links smartly placed, it could go months without anyone noticing. But at least it's not as destructive to the wiki. I still hate to see any spammer benefit from spamming a wiki, but it would be nice if they didn't make such a mess. My idea doesn't solve the smart limited link spam, but if you just have an edit threshold on all edits it would prevent bots from causing widespread damage. A normal user, even during a really aggressive refactor, isn't going to make more than a few edits per minute to a small number of pages. Track IPs for several hours (or days); if too many edits are happening from one IP, it would be blocked automatically for a certain amount of time. Spammers will adapt, but then so will antispam methods. – Joe(at)chongqed.org
Another approach: limit the number of unique pages that can be edited from any given IP in the course of a day. –BillSeitz
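The two throttling ideas above (too many edits in a short time from one IP, and too many different pages from one IP in a day) could be combined in something like this Python sketch. The `EditThrottle` class and all of its default limits are made up for illustration; sensible numbers would have to come from watching real traffic on the wiki.

```python
from collections import defaultdict, deque


class EditThrottle:
    """Per-IP edit limits (hypothetical helper; all defaults are guesses)."""

    def __init__(self, max_edits=10, window=60.0, max_pages_per_day=20, block_for=3600.0):
        self.max_edits = max_edits                  # edits allowed per window
        self.window = window                        # window length in seconds
        self.max_pages_per_day = max_pages_per_day  # distinct pages per IP per day
        self.block_for = block_for                  # block length in seconds
        self.recent = defaultdict(deque)     # ip -> timestamps of recent edits
        self.pages_today = defaultdict(set)  # (ip, day number) -> pages edited
        self.blocked_until = {}              # ip -> time the block expires

    def allow(self, ip, page, now):
        """Return True if an edit to `page` from `ip` at time `now` is accepted.

        now: seconds since the epoch, eg. time.time().
        """
        if self.blocked_until.get(ip, 0.0) > now:
            return False
        stamps = self.recent[ip]
        # Forget edits that have fallen out of the sliding window.
        while stamps and stamps[0] <= now - self.window:
            stamps.popleft()
        pages = self.pages_today[(ip, int(now // 86400))]
        if len(stamps) >= self.max_edits:
            # Editing too fast: block the IP for a while.
            self.blocked_until[ip] = now + self.block_for
            return False
        if page not in pages and len(pages) >= self.max_pages_per_day:
            # Too many different pages today: refuse, but do not block.
            return False
        stamps.append(now)
        pages.add(page)
        return True
```

The sketch never clears out old entries in `pages_today`, so a long-running wiki would want to prune that table once a day.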
Hmmm. This looks interesting : http://www.emacswiki.org/cgi-bin/community/BannedContentBot
Hi Phil, http://www.emacswiki.org/cgi-bin/community/OffTopic is another ;) – MattisManzel
Phil, Sunir Shah of Meatball has proposed an interesting implementation along the lines of your first point above, ie. "Wikis are about people." He has proposed a communally controlled approach that he calls Meatball:CitizenArrest. – RichardP
Yep, interesting. Sounds plausible to me. Particularly the point about changing the cost of reversion. I think if we can get the cost down (which is pretty much what you've done with WikiMinion) then it's plausible for the community to manually defend itself.
I'm also quite intrigued by this. Ruby Wiki Tar Pit : http://groups-beta.google.com/group/comp.lang.ruby/browse_thread/thread/8397eb346d1516e4
Reminds me of the ACS BozoFilter. I like the idea of the spammers not knowing they've been rumbled.
See also :
- (WarpLink) number 3 is a bit like CompetitiveWiki, isn't it?