How to Make a Smarter Spam Filter

April 6th, 2004

I have been using Earthlink’s built-in spam filter on my personal email, and it works really well. It is basically a whitelist system: any message from a pre-approved sender gets through to me, while everything else goes into a “suspect email” folder for me to review and then approve or delete. This doesn’t really eliminate spam, but at least it keeps it out of my inbox. I still have to go through the “suspect email” folder, though, and that takes a bit of time every day. Fortunately, a few simple heuristics could take care of most of that work. I would like to see these features in future spam filters; by my estimate they would cut the task of managing “suspect email” by about 90%. Hopefully one of you works for a spam-filtering company, will read this, and will add these ideas to the feature list for an upcoming release:

1. Automatically bounce a challenge reply back to the sender of any incoming message that has an empty subject line. The reply should say, “Nova does not accept messages without subjects. Your message has not been delivered. Please add a subject and resend.” This would eliminate a lot of the “(no subject)” spam that I get.

2. Any message whose subject line contains my name is likely to be spam, as in “Spivack: Las Vegas Vacations on Sale.” Rank such messages lower in the list, cluster them together, bounce a challenge back to the sender, or just delete them.

3. Messages with grammatically incorrect or meaningless subject lines, such as “Octopus airframe linguine,” should also trigger a bounced challenge to the sender, be clustered lower in the list, or be deleted.

4. Messages containing symbols such as *, *&, $, #, or @ in the subject line should be ranked as highly suspect, for example messages whose subject contains a string such as “A*D*L*T”.
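
To make the four rules above concrete, here is a rough Python sketch of how a filter might apply them to a single subject line. It is not how Earthlink's filter works. The function name `check_subject`, the action labels, and the choice of which rule leads to a challenge versus a "suspect" ranking are all assumptions made for illustration. Rule 3 needs real language analysis, so the sketch only leaves a hook for it (the hypothetical `looks_meaningless` argument) instead of implementing it.

```python
import re

# Symbols named in rule 4. "&" is taken from the "*&" example there.
SUSPICIOUS_SYMBOLS = "*&$#@"


def check_subject(subject, recipient_names, looks_meaningless=None):
    """Return (action, reason) for one subject line.

    action is "challenge", "suspect", or "ok". The mapping from each
    rule to an action is an arbitrary choice for this sketch.
    """
    text = (subject or "").strip()

    # Rule 1: empty subject, or the "(no subject)" placeholder.
    if not text or text.lower() == "(no subject)":
        return ("challenge", "empty subject")

    # Rule 2: the recipient's own name appears as a whole word.
    for name in recipient_names:
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            return ("suspect", "recipient name in subject")

    # Rule 4: any of the listed symbols appears in the subject.
    if any(ch in SUSPICIOUS_SYMBOLS for ch in text):
        return ("suspect", "symbols in subject")

    # Rule 3: left to an optional caller-supplied function, because
    # judging grammar or meaning is beyond a few lines of code.
    if looks_meaningless is not None and looks_meaningless(text):
        return ("challenge", "meaningless subject")

    return ("ok", "no heuristic matched")


if __name__ == "__main__":
    for s in ["", "Spivack: Las Vegas Vacations on Sale", "A*D*L*T", "Lunch on Friday?"]:
        print(repr(s), check_subject(s, ["Spivack"]))
```

The symbol rule as written would also flag legitimate subjects such as “Invoice for $50”, so a real filter would probably use it to lower a message's rank and not to delete it outright.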