February 2004

Kyle's Anti-Spam Program (KASP) 2.1

Sick of Spam?
Want to use your existing mail client?
Want easy setup, without your own mail server?

Choose KASP 2.1 with "Safe Bayesian" and "Correlation Analysis" technology to save your eyes from dreadful marketing-speak!

Introduction

    KASP is my Free Software contribution to the War on Spam. I wanted something that would work with my existing client and would not require a mail server. All you need is an inbox, an outbox, and Mozilla mail files of spam and legit mail (see details).

    KASP is an investigation into how well Bayesian filtering can separate the spam from the legitimate mail. It is not an attempt to make the best spam filter.

Details

    KASP takes mail from the "inbox", prefixes spam with "[SPAM]" and sends the mail to the "outbox". You then set up your mail client to receive mail from your "outbox".
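
    As a rough illustration of that flow, here is a minimal Python sketch. It is not KASP's code: the function names are made up for this example, and it assumes the "[SPAM]" prefix goes on the Subject header, which the description above does not actually specify. The spam test is passed in as a function so the sketch stays independent of any particular filter.

```python
from email.message import EmailMessage

SPAM_TAG = "[SPAM] "  # assumption: the tag is prepended to the Subject header


def tag_if_spam(msg, is_spam):
    """Prefix the Subject with SPAM_TAG when is_spam(msg) returns True."""
    if is_spam(msg):
        subject = msg.get("Subject", "")
        if not subject.startswith(SPAM_TAG):
            # Headers are multi-valued, so remove the old Subject first.
            del msg["Subject"]
            msg["Subject"] = SPAM_TAG + subject
    return msg


def relay(inbox_messages, is_spam):
    """Return the messages as they would be handed to the outbox."""
    return [tag_if_spam(m, is_spam) for m in inbox_messages]


if __name__ == "__main__":
    mail = EmailMessage()
    mail["Subject"] = "Cheap pills"
    mail.set_content("buy now")
    for m in relay([mail], lambda msg: "pills" in msg["Subject"].lower()):
        print(m["Subject"])
```

    Legit mail passes through untouched, so the mail client can sort on the prefix with an ordinary subject rule.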

    What could be simpler?

    KASP currently looks at all the words from each mail to determine if a mail is spam, and only uses those that are "good" and "safe" spam indicators. Please see the Theory page for more details.
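
    The Theory page has the real definitions. Purely as an illustration, the sketch below shows one plausible reading of the idea in Python: estimate a per-word spam probability from the two training piles, keep only words that are strong indicators ("good") and that have been seen in enough spam mails ("safe"), and combine the kept words naive-Bayes style. The thresholds, the smoothing, and every function name here are assumptions of this example, not KASP's actual rules.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-cased set of words in a mail (each word counts once per mail)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def train(spam_texts, legit_texts):
    """Count, per word, how many spam and how many legit mails contain it."""
    spam_counts, legit_counts = Counter(), Counter()
    for text in spam_texts:
        spam_counts.update(tokenize(text))
    for text in legit_texts:
        legit_counts.update(tokenize(text))
    return spam_counts, legit_counts, len(spam_texts), len(legit_texts)


def word_spam_probability(word, model):
    """P(spam | word) with add-one smoothing and equal priors (assumed)."""
    spam_counts, legit_counts, n_spam, n_legit = model
    p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_word_legit = (legit_counts[word] + 1) / (n_legit + 2)
    return p_word_spam / (p_word_spam + p_word_legit)


def select_indicators(model, min_probability=0.7, min_mails=2):
    """Keep words that are strong ("good") and well-attested ("safe")."""
    indicators = {}
    for word, seen_in in model[0].items():
        p = word_spam_probability(word, model)
        if seen_in >= min_mails and p >= min_probability:
            indicators[word] = p
    return indicators


def is_spam(text, indicators, threshold=0.9):
    """Combine the selected words' probabilities via summed log-odds."""
    probs = [indicators[w] for w in tokenize(text) if w in indicators]
    if not probs:
        return False  # no usable words: the mail is left unmarked
    log_odds = sum(math.log(p / (1 - p)) for p in probs)
    return 1 / (1 + math.exp(-log_odds)) >= threshold
```

    Requiring a minimum number of sightings before a word counts is what keeps a single odd mail from turning an ordinary word into a spam signal; the price is that the filter reacts slowly to brand-new patterns.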

    Since all the files are coded by me, I must admit the Mail library is a little incomplete (but Sun's sucks so much more). In any case, KASP only deletes a mail from the inbox once it has confirmed that the mail was sent to the outbox, so the worst that can happen is duplicate mail. You can turn on the session output and send it to me should you have difficulty. I will be more than happy to promptly repair the program so it works with your particular POP/SMTP servers.
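
    The delete-only-after-confirmed-send rule can be sketched in a few lines of Python. This is an illustration, not KASP's code: the four callables (fetch_ids, fetch, send, delete) are hypothetical stand-ins for whatever POP and SMTP operations the real program performs, and the sketch assumes send raises an exception when delivery is not confirmed.

```python
def relay_safely(fetch_ids, fetch, send, delete):
    """Move mail from inbox to outbox, deleting only after a confirmed send.

    fetch_ids() -> iterable of message ids currently in the inbox
    fetch(id)   -> the message
    send(msg)   -> returns normally on confirmed delivery, raises otherwise
    delete(id)  -> removes the message from the inbox
    """
    delivered = []
    for msg_id in fetch_ids():
        message = fetch(msg_id)
        try:
            send(message)
        except Exception:
            # Delivery not confirmed: leave the mail in the inbox so the
            # next run can retry it. Nothing is lost.
            continue
        # Only now is it safe to delete. If this step fails, the mail is
        # simply relayed again next run, which gives a duplicate.
        delete(msg_id)
        delivered.append(msg_id)
    return delivered
```

    Ordering the steps this way trades possible duplicates for a guarantee that a failed connection never loses mail.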

Results (Jan 26th, 2004)

    KASP 2.x is a significant improvement over 1.1.

    I am now using KASP 2.1 to filter my mail. I have yet to get a false positive (legit mail indicated as spam). With over 28,000 spam, the false positive rate is unmeasurably small. The false negative rate is less than 3% (less than 3% of spam is getting to my inbox).

    The Bayesian filter has a distinct problem with spam that has no usable words in it. For example, mails with just remote images are not marked as spam. This is because I have marked some of those mails (from networksolutions and my service provider) as legit. Losing these mails would be no great loss, so they could easily be categorized as spam to eliminate this problem. In any case, spammers could easily in-line images to get around this feature.

    The Bayesian filter needs time to learn. It takes some time for the filter to be confident it is seeing a spam pattern and remove it. On day 1 of a large-scale attack (January 26th, 2004), the Mydoom.A email worm was responsible for 30% of my mail, and most of it was getting through the KASP filter. This was resolved the next day, after KASP's learning cycle was run.