The freedom to spam your fellow citizens

Is it me? Or is there some kind of 4th-of-July spamfest going on?

I noticed 172 new spam comments on the blog this morning, which surprised me because I’d cleared out the spam queue late last night, just a few hours earlier. So I handled those, and then just now I saw there are 217 new spam comments.

I’ve been getting spam comments on the blog at an unprecedented rate the last few days. Wading through the spam queue to make sure there aren’t any comments from real people in there takes a couple of minutes now, and a few new spam comments have often been posted while I was reviewing the list, so after I click “Mark all as spam” I still have a few to clean up. The spammers never sleep.

I’ve looked at which posts generate the most spam, and the answer is simple but I don’t entirely understand the thinking behind it. Blog posts about spam generate the most spam, consistently. For example, this post from over two years ago, on the subject of spam, gets more spam comments than any other by a huge margin. Why? That seems like ass-backwards reasoning to me. Hey guys, there’s a pretty good chance that a blog post about spam isn’t actually full of thoughts like “spam is cool, I wish I got more spam.”

Just for fun, I made a 4th of July collage out of a recent fireworks photo at Disney World and the Wordle of the text of everything in my spam queue today. That’s 31,743 words in those 217 comments, so it’s a nice-sized chunk of text, and the results are shown in the image above.

If you’re not familiar with Wordle, it’s a cool little tool developed by an IBM researcher that turns a chunk of text into an image that summarizes the text by showing each word in a size proportional to its number of occurrences in the text. It’s all explained on the Wordle web site.
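
For the curious, the core idea (count the words, then scale each word’s size by its count) can be sketched in a few lines of Python. This is only my rough illustration, not Wordle’s actual code: the function name `word_sizes`, the size limits, and the simple word-splitting rule are all assumptions, and it makes no attempt at filtering common words or at layout.

```python
import re
from collections import Counter

def word_sizes(text, min_size=10, max_size=72, top_n=50):
    """Map the top_n most frequent words to a font size proportional to their count.

    Hypothetical helper for illustration; the parameters are arbitrary choices.
    """
    # Lowercase the text and pull out runs of letters and apostrophes as "words".
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    biggest = counts[0][1]  # count of the most frequent word
    # The most frequent word gets max_size; the rest scale linearly,
    # but never drop below min_size so they stay readable.
    return {word: max(min_size, round(max_size * n / biggest)) for word, n in counts}

sizes = word_sizes("spam spam spam links links blog")
print(sizes)
```

Feed it the text of a spam queue and the words with the biggest numbers are the ones that would dominate the picture.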

The net effect is that you get a graphic that summarizes a large piece of writing, and it’s sometimes spooky how well it works. I’ve played around with it a bit, and I wish there was a Wordle of every book on Amazon.com. These images help you understand the content of a piece of writing in seconds, almost intuitively. As an example, I was looking at Rob Weir’s post on Wordle just now (where I first learned of it), and Megan walked in and said immediately “so you’re doing something with Moby Dick?”

On the sample above, I was at first surprised that it didn’t include more obscene phrases, since porn sites are heavy abusers of blog-comment spam. (My favorite line I noticed in today’s batch: “What to do if you have poison ivy on your penis.”) But after looking closely at the data, I realized the porn sites do relatively short comments with a few links, but some of the others, like the Lonely Planet stuff, include hundreds of links in a single comment and therefore skew the results.

Anyway … happy 4th of July, everyone! May we always live in a country where we’re free to be ourselves, which for many people seems to mean “free to be a spammer jerk.”

This entry was posted on Friday, July 4th, 2008 at 8:06 pm.

1 comment posted:

  1. Hello:

    A few of your posts came up in a Columbia City search, so I thought you might be interested in this neighborhood wiki project.

    http://ColumbiaCitizens.net

    If you’d like to get alerts for the “weekly wiki” (the Citizens Wikli), you can sign up here:
    http://ColumbiaCitizens.net/wikli:subscribe

    S.
