SpammingThoughtStorms (ThoughtStorms)

Update : Now that ThoughtStorms has moved to a new engine, WikiSpam is no longer an issue. (Presumably)

Maybe WhiteLists are the answer after all.


Thanks to everyone who's been helping fight the spam. My faith in wiki-readership's capacity to maintain wiki is revitalized. :-)

: update : Particular thanks to RichardP whose bot "WikiMinion" is now cleaning this wiki

: NB : I'm backing up ThoughtStorms fairly regularly at the moment

Other responses to Spam :


Discussions



PhilJones asked, in a [http://www.nooranch.com/synaesmedia/wiki/wiki.cgi?action=history&id=HomePage comment summary] for a change made to the HomePage:

:hi as18-1-3.kp.g.bonet.se, did you add spam and then delete it? Why?

I wanted to add that I've seen as18-1-3.kp.g.bonet.se do this on more than a few Wiki sites, not just ThoughtStorms. I suspect he is a polite spammer, one who is interested in getting his spam links into the history for the page without vandalizing the page itself. I suggest you remove the incentive for that activity by outputting a robots meta tag on the pages generated by the history action and on pages generated by the browse action (when a revision number is included). A robots meta tag can be used to instruct search engines such as Google to not index the content of such pages, thereby eliminating the possibility that the spammer can improve his PageRank by getting links into your page history. Here is what such a tag might look like:
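
The standard form of such a tag is `<meta name="robots" content="noindex,nofollow">`. As a minimal sketch of the rule, written in Python rather than UseMod's Perl, and assuming a hypothetical helper `robots_meta_tag` whose `action` and `revision` parameters stand in for the wiki's request parameters:

```python
from typing import Optional


def robots_meta_tag(action: str, revision: Optional[int] = None) -> str:
    """Return a robots meta tag for pages that should stay out of search indexes.

    Hypothetical helper: 'action' and 'revision' are illustrative names,
    not UseMod identifiers.
    """
    # History pages and old revisions are where reverted spam links survive.
    is_history = action == "history"
    is_old_revision = action == "browse" and revision is not None
    if is_history or is_old_revision:
        return '<meta name="robots" content="noindex,nofollow">'
    # The current version of a page stays indexable, so no tag is emitted.
    return ""
```

The returned string would go into the page's head section; an empty string means the page is left indexable.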

-- RichardP

Hi RichardP, thanks for this. (and welcome to ThoughtStorms, want to introduce yourself? :-)

I'll probably do this. OTOH, do I want to disincentivize the "polite" spammers compared to the impolite ones?

-- PhilJones

Perhaps "stealth" spammer is a better description than "polite" spammer. In any case, I suspect the regular wiki spammers, like a "polite" spammer, don't really care if you revert the content of a page. On a site that is well-indexed by Google, both kinds of spammer gain similar PageRank from the spam links in the history for your pages as they would from the pages themselves. The only advantage that a regular spammer has over a "polite" spammer is that a visitor might click on a spam link on the current version of a page. While such a spammer is undoubtedly happy to get the additional page views, I'm certain that PageRank is their true goal. So, emitting appropriate meta tags provides essentially identical disincentives to both regular and "polite" spammers.

-- RichardP

The problem with this wiki (and any other Usemod wiki) is indeed that search engines are allowed to index old revisions. Not only does this help the spammers even if you delete their crap, it also helps spammers find spammable wikis. I always have a look at my server logs when the chongqed.org wiki gets spammed. And, sure enough, most of the time the spammers come from Google looking for spam. Some of them also look for the phrase "Edit text of this page", but most of them simply search for other instances of spam, figuring that if there already is spam, they can add even more. Stopping spiders from indexing kept pages makes it harder for spammers to find your wiki.

-- [http://wiki.chongqed.org/Manni Manni]
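
The kind of log check Manni describes could be sketched roughly as follows. This is only an illustration in Python: the function names are hypothetical, it assumes the search terms arrive in a `q` parameter of a Google referrer URL, and the list of known spam URLs is something you would have to supply yourself.

```python
from urllib.parse import parse_qs, urlparse

# Phrase mentioned in the discussion above; extend as needed.
SUSPICIOUS_PHRASES = ("edit text of this page",)


def search_query_from_referrer(referrer: str) -> str:
    """Return the search terms from a Google referrer URL, or '' if none."""
    parsed = urlparse(referrer)
    if "google." not in parsed.netloc.lower():
        return ""
    # Assumption: the search terms are carried in the 'q' query parameter.
    values = parse_qs(parsed.query).get("q", [])
    return values[0] if values else ""


def looks_like_spam_hunt(referrer: str, known_spam_urls: list) -> bool:
    """Guess whether a visitor arrived by searching for spammable wikis."""
    query = search_query_from_referrer(referrer).lower()
    if not query:
        return False
    if any(phrase in query for phrase in SUSPICIOUS_PHRASES):
        return True
    # Searching for an already spamvertized URL is the other pattern described.
    return any(url.lower() in query for url in known_spam_urls)
```

A hit is only a hint that the visit deserves a closer look in the logs, not proof that the visitor is a spammer.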

The latest spams seem to be deleting real content without inserting new links.

So there's real malice at work, rather than just the desire to increase page rank.

-- SusanJones

Susan, from various clues I suspect that the "deletions" are not directly malicious. I think they are caused by a poorly written software robot operated by a wiki spammer (in the anti-email spam crowd they call software written for use by spammers "ratware").

-- RichardP


Spammed by an edu?

Today's spam is allegedly from .mtu.edu (which is http://www.mtu.edu/).

So are they paying students to spam? Or are they managing to fake where they're coming from?

-- PhilJones

Phil, I don't think the spam was sent by a student of mtu.edu. I am familiar with the spammer who posted the spam that appeared to originate with thing1.dcs.it.mtu.edu. The spammer, who is likely Russian, exploits open proxies to post his spam in order to get around IP blacklisting. I don't know if he is taking advantage of pre-existing poorly secured proxies or if he is hacking machines to install the proxies himself. Another possibility is that the proxies are installed by viruses. In the world of e-mail spam there are several documented cases of virus authors having used viruses to compromise thousands of machines and then selling the list of compromised machines to spammers.

With regards to faking the host name, I haven't seen any spammers do it yet - although it is completely trivial. UseMod uses reverse IP lookup to determine the host name from the IP address, but since reverse lookup is under the administrative control of the owner of the origin IP address the owner is free to report any host name they wish as the origin host. Once spammers begin forging hostnames, and it is probably inevitable that they will do so, I'd recommend that the UseMod flag $UseLookup be set to zero. Setting that flag to zero disables reverse lookup in UseMod resulting in only IP addresses appearing in RecentChanges (just like they do now when reverse lookup doesn't find a host name). It is much harder to successfully forge origin IP addresses.

-- RichardP

Richard. I changed this $UseLookup, but it doesn't seem to be making much difference. Last spammer still had a name. -- PhilJones

Phil, I can think of two reasons why you might see that behavior. First, are you changing the value in the script or in the config file? The value in the config file overrides the value in the script. Second, perhaps you've configured your web server to look up host names for UseMod? UseMod will use the CGI environment variable $ENV{REMOTE_HOST}, if available, in place of the environment variable $ENV{REMOTE_ADDR}. Thus, if you've configured your web server to perform a reverse lookup of host names, UseMod will use that value, even with $UseLookup set to zero. As I mentioned above, while forging reverse host names is easy, I haven't yet seen any spammers forging their hostnames. However, if this possibility concerns you, the second reason is the cause of the problem, and you don't have access to the web server configuration to turn off reverse lookups in the server, then you can modify the GetRemoteHost subroutine to ignore $ENV{REMOTE_HOST}.

-- RichardP
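
To make the precedence Richard describes easier to follow, here is a rough sketch in Python. It is not UseMod's actual GetRemoteHost code (which is Perl): the function names, the `ignore_remote_host` switch and the injectable `lookup` parameter are assumptions made for illustration only.

```python
import socket
from typing import Callable, Mapping


def reverse_lookup(ip: str) -> str:
    """Reverse-resolve an IP address, falling back to the IP itself on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def get_remote_host(
    env: Mapping[str, str],
    use_lookup: bool,
    ignore_remote_host: bool = False,
    lookup: Callable[[str], str] = reverse_lookup,
) -> str:
    """Pick the name shown for an editor, following the precedence described above."""
    ip = env.get("REMOTE_ADDR", "")
    # A host name already resolved by the web server wins, whatever use_lookup says.
    if not ignore_remote_host:
        host = env.get("REMOTE_HOST", "")
        if host:
            return host
    # Otherwise the wiki only resolves the address itself when the flag is set.
    if use_lookup and ip:
        return lookup(ip)
    return ip
```

With `use_lookup` off and `ignore_remote_host` on, only the IP address can ever appear, which is the effect Richard is after.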

Doh! Yeah, of course it's in the config. Sorry. Next time I'll think before I hit "Edit this Page" -- PhilJones

Phil, now at the bottom of every page is the message "Config file error: Can't locate object method "U" via package "Wiki" at .//config line 1." I also notice that the user login database has been reset - probably because you appear to have changed $CookieName and $SiteName from "ThoughtStorms" to "Wiki" (note the window titles when visiting pages now begin with "Wiki:" instead of "ThoughtStorms:").

-- RichardP

Yep. Rogue character. Fixed. Cheers. -- PhilJones


No Follow Tag

NoFollowTag now contains discussion of Google's NoFollowTag


A new kind of spam?

Someone is going round creating blank pages for the UnfinishedLinks. Is this someone being helpful, or learning about wiki, or a new kind of spammer? Perhaps a bot that thinks it can earn some credit with a site, or lull spam filters into thinking it's useful? Any ideas? -- PhilJones

Phil, I think the most likely cause is buggy spammer software. The software might actually be performing edits on a bunch of pages, but for whatever reason (presumably a bug) it isn't actually changing the content. Because UseMod silently discards edits which make no changes, RecentChanges only shows the creation of new pages. However, UseMod doesn't discard an edit of a new page if you save back the default "new page" text. Another possibility is that this spam software is actually intended to actively look for UnfinishedLinks and place spam on those pages, perhaps because the spammer figures this results in spam that will be guaranteed to be findable by search engines and less likely to be noticed by a wiki's users. As a possible example, consider the recently added page ArsDigitaCommunitySystem [http://www.nooranch.com/synaesmedia/wiki/wiki.cgi?action=history&id=ArsDigitaCommunitySystem revision history]. Like the other recently created pages, ArsDigitaCommunitySystem was one of the UnfinishedLinks on the TypedThreadedDiscussion page and clearly was created by a spammer. So maybe the spammer's software had a bug yesterday and he fixed it today?

-- RichardP
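
The save behavior Richard describes can be illustrated with a small sketch. This is Python written from his description, not from UseMod's source; `should_record_edit` and the placeholder text are hypothetical.

```python
from typing import Optional

# Placeholder only; the wiki's real default "new page" text may differ.
DEFAULT_NEW_PAGE_TEXT = "(default new page text)"


def should_record_edit(stored_text: Optional[str], submitted_text: str) -> bool:
    """Decide whether a save would show up in RecentChanges.

    stored_text is None when the page does not exist yet.
    """
    if stored_text is None:
        # A brand-new page is saved even if only the default text is sent back.
        return True
    # An edit that changes nothing on an existing page is silently dropped.
    return submitted_text != stored_text
```

So a bot that re-saves every page untouched would leave traces only on pages that did not exist before, which matches the blank UnfinishedLinks pages Phil noticed.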


Wow, it couldn't have been much more than an hour after you unlocked your wiki until the first spammer arrived. They're mighty quick off the mark! -- RichardP

: Yeah! I was hoping they might have forgotten about us for a while. I suspect their spam-bot didn't even notice. They aren't going to bother deleting the URL from their list. Anyway, now I'm back I'm going to implement the patch you suggested and then I have some thoughts about putting some kind of filter in front of ThoughtStorms which I'll play with. How are things with you anyway? -- PhilJones

Perhaps migrating ThoughtStorms to another wiki engine would be a superior choice to patching UseMod to implement both META tags and content filtering. For instance, both OddMuse and MoinMoin already support both of those features. For that matter, I think MoinMoin has superior anti-spam features (besides, it is written in Python :) ); however, the migration path from UseMod to OddMuse is probably easier. Things are fine with me, thanks for asking. Oh, and I got my first death threat on my answering machine from a wiki spammer just last week! WikiMinion must be having an effect (it is now protecting approximately 50 wikis). -- RichardP

Wow, Richard. I guess this story about the death threat was not a joke, huh? Pheww. Care to tell us a little more about that?

Phil! I guess you noticed the stupid git that has been hammering ThoughtStorms (and other wikis, of course) with html spam that doesn't even work correctly. Over on the [http://wiki.chongqed.org/ chongqed.org wiki], we had a [http://wiki.chongqed.org//BackToTheFutureII report] about this spammer. It'd be interesting to know whether your logs also show this Back to the Future activity. Can you confirm that? -- Manni

My God! Richard, is that true? I think I'll hurry up with some kind of strategy to reduce WikiMinion visibility. Maybe I will have to migrate to OddMuse. I like it a lot, it's just that I'd miss the sub-page feature. Will have another look at MoinMoin too. Manni, hadn't realized that. It's incredible. Will check logs to see. -- PhilJones

:Phil, don't hurry your efforts on reducing WikiMinion visibility on my account. Not only am I convinced that the call was just harmless posturing on the part of a pissed-off spammer, but making the reverts less visible here won't significantly reduce WikiMinion's visibility overall now that it is protecting a bunch of wikis. This is particularly true since he complained about WikiMinion during a time that ThoughtStorms was locked, so he certainly couldn't have been annoyed at the cleaning going on here. MoinMoin supports subpages, but it doesn't directly support UseMod page data files, so migrating to MoinMoin involves an extra export-then-import step. The wiki-syntax parsers in OddMuse and MoinMoin both differ slightly from UseMod's; however, OddMuse is probably closer. Manni, I'd be happy to provide some more details about the call. Basically, a spammer left a rant on my answering machine. He started off by calling me names and asserting that I have no right to interfere with his legitimate business. He escalated to speculating about my sex life and ended the call with the suggestion that he knows someone who would, at his request, "break my head." He spoke in a Slavic or Russian accent. I am not concerned; many years ago I used to get occasional calls like that when I was involved with anti-spam efforts on usenet and later with my e-mail anti-spam efforts. I responded the same way I've done in the past: I report the call to my local police department and they ask the phone company to provide the caller's phone number. They then record the number (but they don't tell me the number, although in this case the detective implied the call originated from one of the former Soviet republics). They then close the case, and tell me to contact them if anything ever comes of the threat. Nothing ever has in the past. -- RichardP

Phil! No need to waste your time looking through those log files. I was able to [http://wiki.chongqed.org//BackToTheFutureII solve the mystery]. Of course, if you insist, I wouldn't mind having a look at his referrers and his user agent strings ;-)

Thinking about migrating away from UseMod is certainly a good idea in my view. It's always the same old problem, though: What do you do with those subpages? Does MoinMoin support those? Seems like a decent engine to me. But I would always choose the one that's easier to install and written in a language that I'm familiar with. There's a wealth of extensions for OddMuse that are very easy to install. And I have rarely seen spammed OddMuse wikis, probably because its spider meta tags are so effective. In fact the only two spammed OddMuse wikis I remember are my own (which has lots of content that spammers are looking for) and an abandoned one where spam is never cleaned.

-- Manni

:Manni, MoinMoin does indeed support [http://moinmoin.wikiwikiweb.de/HelpOnEditing/SubPages subpages]. In my experience the biggest difficulties involved in migrating from UseMod to MoinMoin are getting the raw page data from UseMod into MoinMoin and updating the pages to account for the differences in wiki syntax. Similarly, the biggest difficulties involved in migrating from UseMod to OddMuse are flattening the wiki structure to eliminate subpages (wikis with many subpages require a great deal of manual editing before migrating) and updating the pages to account for the mild differences in wiki syntax (even with the usemod.pl extension installed there are still a few differences). -- RichardP

Sure, if you think this is the right place, I'll give you a little overview.

** There are people pushing affiliate programs and pay-per-click services. Seems most of them are from eastern Europe and they like to use bots.

** Then there are the Chinese people. They spam manually and it's hard to tell exactly what they are spamming for and why they are doing it. But there are a couple of companies in China that like to sell (what they call) SEO-expertise.

Spammers aren't looking for any kind of specific content, at least not in the sense of wiki content as a context for their spam. They are using Google to find places to spam. Most of them enter a previously spamvertized URL as a phrase into Google and this will help them find spammable wikis. Since UseMod has this serious deficiency of not providing sensible robots meta-tags, it's mostly UseMod wikis that get hammered.

It's hard to come up with any trends. I think that more and more spammers are using bots. I know that many people have always thought that any kind of spam must be from a bot, but our logs tell a different story. With the growing number of bots (and also with the growing number of Chinese people being paid for spamming), the amount of spam is getting worse. Not only the total amount, but also the amount of spam per spam incident. They post more and more links, spam more and more pages, and they don't care whether the last revision of a spammed page was their own spam.

It seems that there are similar trends over in the blogosphere. But spammers give the impression of concentrating on one 'medium'. Wiki spammers are focusing on wikis and blog spammers spam blogs. There isn't much overlap. A notable exception is the guestbook spammer who also spams wikis. I don't know why he is doing that and why he isn't adjusting his means of spamming. I guess he's just an idiot with good tools.

-- Manni


Banned list : http://www.nooranch.com/synaesmedia/wiki/wiki.cgi?action=editbanned

See also :