How to Make a Smarter Spam Filter

I have been using Earthlink’s built-in spam filter on my personal email, and it works really well. It is basically a whitelist system: any message from a pre-approved sender gets through to me, while everything else goes into a “suspect email” folder for me to review and then approve or delete. This doesn’t really eliminate spam, but at least it keeps it out of my inbox. I still have to go through the “suspect email” messages, though, and this takes a bit of time every day. Fortunately, a few simple heuristics could make that much easier. I would like to see these features in spam filters in the future; I estimate they would cut the work of managing “suspect” messages by about 90%. Hopefully one of you works for a spam-filtering company, will read this, and will add these ideas to the feature list for an upcoming release:

1. Automatically bounce a challenge reply to the sender of any incoming message that has an empty subject line. The reply should say “Nova does not accept messages without subjects. Your message has not been delivered. Please add a subject and resend.” This would eliminate many of the “(no subject)” spam messages that I get.

2. Any message whose subject line contains my name, as in “Spivack: Las Vegas Vacations on Sale”, is likely to be spam. Rank such messages lower in the list, cluster them together, bounce a challenge back to the sender, or just delete them.

3. Messages with grammatically incorrect or meaningless subject lines, such as “Octopus airframe linguine”, should also trigger a challenge to the sender, be clustered lower in the list, or be deleted.

4. Messages containing symbols such as *, &, $, # or @ in the subject line should be ranked as highly suspect; a typical example is a subject containing a string such as “A*D*L*T”.
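The four heuristics above are simple enough to sketch in code. The Python below is a minimal, hypothetical illustration, not any vendor's filter: the weights, the `FUNCTION_WORDS` list and the `judge_subject` name are assumptions made for illustration, and the owner's name is assumed to be a single word such as a surname. Heuristic 3 in particular is only a crude stand-in, since judging whether a subject is grammatical or meaningful needs real language analysis; here a subject of three or more words that contains none of a few common English words is treated as possible gibberish.

```python
import re
from dataclasses import dataclass, field

# Illustrative weights (assumptions); the post proposes rules, not numbers.
NAME_WEIGHT = 2
GIBBERISH_WEIGHT = 2
SYMBOL_WEIGHT = 3

# Heuristic 4: characters called out as suspicious in subject lines.
SUSPECT_SYMBOLS = set("*&$#@")

# Heuristic 3 stand-in (an assumption, not a real grammar check): short
# common words that ordinary subject lines tend to contain.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "on", "in", "for", "to", "and", "is",
    "your", "my", "with", "at", "from", "re", "fw", "fwd",
}


@dataclass
class Verdict:
    challenge: bool  # heuristic 1: bounce a challenge reply to the sender
    score: int = 0  # heuristics 2-4: higher means more suspect
    reasons: list = field(default_factory=list)


def judge_subject(subject: str, owner_name: str) -> Verdict:
    """Apply the four subject-line heuristics to one message.

    owner_name is assumed to be a single word, e.g. a surname.
    """
    subject = (subject or "").strip()

    # Heuristic 1: an empty subject gets a challenge instead of a score.
    if not subject:
        return Verdict(challenge=True, reasons=["empty subject"])

    verdict = Verdict(challenge=False)
    words = re.findall(r"[a-z]+", subject.lower())

    # Heuristic 2: the recipient's own name appears in the subject.
    if owner_name.lower() in words:
        verdict.score += NAME_WEIGHT
        verdict.reasons.append("name in subject")

    # Heuristic 3 (crude stand-in): 3+ words and no common function words.
    if len(words) >= 3 and not FUNCTION_WORDS.intersection(words):
        verdict.score += GIBBERISH_WEIGHT
        verdict.reasons.append("possibly meaningless subject")

    # Heuristic 4: any suspicious symbol anywhere in the subject.
    if SUSPECT_SYMBOLS.intersection(subject):
        verdict.score += SYMBOL_WEIGHT
        verdict.reasons.append("suspicious symbols")

    return verdict


if __name__ == "__main__":
    subjects = [
        "Lunch on Friday?",
        "Spivack: Las Vegas Vacations on Sale",
        "Octopus airframe linguine",
        "A*D*L*T offers",
        "",
    ]
    # Rank the suspect folder so the most suspicious messages sink to the bottom.
    for s in sorted(subjects, key=lambda s: judge_subject(s, "Spivack").score):
        v = judge_subject(s, "Spivack")
        print(f"{s!r:40} challenge={v.challenge} score={v.score} {v.reasons}")
```

Messages whose verdict has `challenge` set would get the bounce reply from heuristic 1; the rest can be sorted by `score` so that subjects like “A*D*L*T offers” cluster at the bottom of the suspect list, where they are easy to delete in bulk.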

5 thoughts on “How to Make a Smarter Spam Filter”

  1. The problem with sending a challenge reply to a message with no subject is that it confirms the address is valid, which is sometimes all a spammer is looking for. Your example reply also includes your name (and perhaps other personal information), which I’m sure you wouldn’t want to send to just anyone.
    Dealing with spam, much like dealing with viruses, is basically a losing battle. The best thing you can do is apply a combination of methods. About a year ago, I ghost-wrote a white paper on the most commonly used spam-fighting methods (http://www.inkcom.com/zwieback/publications/0303Zwieback.pdf), if you’re interested.
    I think that email, useful and ubiquitous as it is, is pretty much dead in its current form. Having had the same email address since 1996, and despite having a powerful spam filter, I still have to manually deal with up to 100 spam messages a day.
    I’ve been thinking that a Friendster-like idea would work pretty well for email. For instance, I would configure my email client to accept only emails from people on my whitelist, as well as people on the whitelists of my friends, business associates, etc. If someone unknown to me or my friends sends me an email, I could check out that person’s Friendster-like profile to decide whether I want to read the message.

  2. Hi Dave!!! Great to hear from you here. What you suggest is exactly what the new LOAF project proposes: a way to block spam using social networks as a filter. Do a Google search on LOAF and email spam.

  3. The other major issue with all whitelist systems is mail that you WANT but that is not from a human (or that comes from an address that is unread, piped to /dev/null, etc.).
    The clearest example might be your e-ticket for an upcoming trip. Others include opt-in mailing lists you join, updates and bills from companies that send you electronic bills, and newsletters or updates from business partners. The list goes on.
    It is a major challenge.
    Also, while recently cleaning up my address book, I noticed another problem: many spam-detection systems use the heuristic “if the sender is in your address book, allow the message in.” That is very sensible, but it also means keeping addresses such as mailing lists in your address book, as well as individuals you may not want to contact directly but from whom you still want to accept email (competitors, for example).
    I have also suggested in the past that whitelists based not on individual email addresses alone, but on “good” vs. “bad” domains plus specific addresses, might be the way to go.
    For example, yahoo-inc.com, Yahoo!’s corporate domain, might be “good” (email from a Yahoo! employee), while *@yahoo.com might be “bad”, except for individuals whom you already know (or who are known to the network whose whitelist you share).
    This would allow valid, reliable, ethical “mass” email systems, such as those major corporations use to send out bills and customer updates, to get through, while similar systems used solely for spam would be banned by the network.
    Much of the spam I get comes from a few major domains (at least as listed in the sender field), but from a constantly changing address at each domain.
    Anyway, good idea; I just wanted to add to the discussion.
    Shannon

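Two of the whitelist ideas from the comments, accepting mail from people on friends’ whitelists and judging senders by “good” or “bad” domain with exceptions for known individuals, could be combined roughly as follows. This is a hypothetical Python sketch, not the LOAF project’s design: the `classify_sender` function, the order of the checks, and treating a “bad” domain as an outright block (rather than, say, a trip to the suspect folder) are all assumptions, and every address and domain list is assumed to be stored in lowercase.

```python
from email.utils import parseaddr


def _domain_matches(domain, domains):
    """True if domain equals, or is a subdomain of, any entry in domains."""
    return any(domain == d or domain.endswith("." + d) for d in domains)


def classify_sender(from_header, my_whitelist, friends_whitelists,
                    good_domains, bad_domains):
    """Return "accept", "block" or "suspect" for a From header.

    All lists are assumed to hold lowercase addresses or domains.
    """
    # parseaddr turns 'Pal <pal@yahoo.com>' into ('Pal', 'pal@yahoo.com').
    _, address = parseaddr(from_header)
    address = address.lower()
    domain = address.rpartition("@")[2]

    # Individuals you already know win over any domain rule.
    if address in my_whitelist:
        return "accept"
    # Friend-of-a-friend: already on a friend's or associate's whitelist.
    if any(address in wl for wl in friends_whitelists):
        return "accept"
    # Domain rules, e.g. a corporate domain "good", a free-mail domain "bad".
    if _domain_matches(domain, good_domains):
        return "accept"
    if _domain_matches(domain, bad_domains):
        return "block"  # assumption: "bad" means blocked outright
    # Everyone else still lands in the "suspect email" folder for review.
    return "suspect"


if __name__ == "__main__":
    mine = {"pal@yahoo.com"}
    friends = [{"colleague@example.org"}]
    good, bad = {"yahoo-inc.com"}, {"yahoo.com"}
    for sender in ["Pal <pal@yahoo.com>", "x8812@yahoo.com",
                   "someone@yahoo-inc.com", "colleague@example.org",
                   "stranger@example.net"]:
        print(sender, "->", classify_sender(sender, mine, friends, good, bad))
```

Checking known individuals before the domain rules is what lets a friend at yahoo.com through even though *@yahoo.com is marked “bad”. Because the From address is easy to forge, rules like these sort mail rather than authenticate it.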
