January 2004
KASP is presented as a single file for download. This version requires Java 1.2 (or greater). Source code is available under the GNU General Public License.
KASP.zip (349K) Updated 21:56 EST, Dec 18, 2003
You will need:
- An inbox that receives spam, and the POP server info to access it
- A "safe" mailbox that no spammer knows about, and the SMTP server info to send mail to it
- Files of spam examples, and files of legitimate examples
The KASP.ini file contains the user-specific parameters, which you will have to change. Here is what it looks like:
POP_Server = mail.mycompany.com
POP_Username = myPOPusername
POP_Password = myPOPpassword
INBOX = myname@mycompany.com
SMTP_Server = smtp.mycompany.com
SMTP_Username = mySMTPusername
SMTP_Password = mySMTPpassword
OUTBOX = safe@mycompany.com
SPAM = spammer@spam.com
SpamMail=usr/kyle/Mozilla/spam/Trash
LegitMail=usr/kyle/legit/fileA;usr/kyle/legit/fileB
Be sure to separate multiple files in SpamMail and LegitMail with semicolons.
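The KASP.ini format above is one "key = value" pair per line, with semicolon-separated file lists. As a minimal sketch, assuming the file can be read as a standard Java properties file (KASP's own reader may behave differently), the hypothetical `KaspIniSketch` class below loads the parameters and splits a list such as LegitMail. It is written for a modern JDK (Java 8 or later), not the Java 1.2 that KASP targets.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical KASP.ini reader (not KASP's actual code). Assumes the file
 * follows java.util.Properties syntax: "key = value", one per line.
 */
final class KaspIniSketch {

    /** Parses the text; Properties skips the whitespace around '='. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader does not fail in practice; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits a semicolon-separated list (e.g. LegitMail); a missing key gives an empty list. */
    static List<String> fileList(Properties props, String key) {
        List<String> files = new ArrayList<String>();
        for (String part : props.getProperty(key, "").split(";")) {
            String trimmed = part.trim(); // Properties does not trim trailing spaces
            if (!trimmed.isEmpty()) {
                files.add(trimmed);
            }
        }
        return files;
    }
}
```

To read a real file, `props.load(new java.io.FileReader("KASP.ini"))` works the same way. Note that Properties treats a backslash as an escape character, so Windows paths would need forward slashes or doubled backslashes under this assumption.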
- You need Java 1.2 (or greater) installed, with the executables in your path.
- Make sure the current directory is the main directory.
- Build the spamStats.tab file with PreKASP.sh (Linux) or PreKASP.bat (Windows).
- Run KASP with KASP.sh (Linux) or KASP.bat (Windows).
KASP and its related components are resource intensive: about 800 MB of memory is needed. KASP uses huge in-memory hashes that cause severe swapping on computers with less free memory than that. My 128 MB laptop takes 8 times longer than my 1 GB desktop, which takes 22 minutes.
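Because the memory figure above matters so much, a program like KASP could warn at start-up when the JVM is not allowed a large enough heap. The sketch below is hypothetical (KASP does not necessarily do this) and assumes "about 800 MB" means 800 MiB. It only checks the JVM heap limit, which guards against running out of heap; it cannot detect the swapping described above, which depends on free physical memory.

```java
/**
 * Hypothetical start-up check, not part of KASP: compares the JVM's maximum
 * heap with the roughly 800 MB that KASP's hashes are said to need.
 */
final class HeapCheckSketch {

    /** Assumed requirement: 800 MiB, in bytes. */
    static final long REQUIRED_BYTES = 800L * 1024 * 1024;

    /** Returns a warning message, or null if the heap limit looks sufficient. */
    static String heapWarning(long maxHeapBytes) {
        if (maxHeapBytes >= REQUIRED_BYTES) {
            return null;
        }
        return "Max heap is " + (maxHeapBytes / (1024 * 1024))
                + " MB; KASP needs about 800 MB (try a larger -Xmx setting).";
    }
}
```

A caller would pass `Runtime.getRuntime().maxMemory()`, which reports the heap limit the JVM was started with (for example via `-Xmx`).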
- Everything is already compiled into the ./work subdirectory, so recompiling is optional.
- Look at the com.bat file in the main directory. You will see that I move all files to the ./work directory before compiling, because the classes are all interdependent. This also makes for a simple classpath.
- KASP.ini: Described above.
- spamStats.tab: Has the counts for all word pairs. It is created by PreKASP and used by KASP to filter mail. I have NOT included one, so you will have to make it yourself. The columns are:
- WordA - First word of the pair
- WordB - Second word in the pair
- LegitCount - The number of legitimate mails with this word pair
- legitMid - The expected chance that a legitimate mail has this word pair
- legitMin - The lowest possible chance that a legitimate mail has this word pair
- legitMax - The highest possible chance that a legitimate mail has this word pair
- SpamCount - Number of SPAM with this word pair
- spamMid - The expected chance that a SPAM has this word pair
- spamMin - The lowest possible chance that a SPAM has this word pair
- spamMax - The highest possible chance that a SPAM has this word pair
- SpamMarker.java: Marks a mail object as spam or not
- SpamMarkerCorelationMatrix.java: Contains the logic for choosing the best set of unrelated words to use in the Bayesian filter.
- WordifyString.java: A central routine that takes a string and adds every word found to the word list. This Java file also holds all the constants that define what a "word" is. Currently, double quote (") and equals (=) are legitimate word characters, which makes src="http a valid word (95% chance of being spam).
- WordifyMail.java: Takes a mail object and uses WordifyString to produce a list of the words found.
- WordifyMozilla.java: Takes a Mozilla mail file and updates the spam/legit mail word counts. Since a Mozilla mail file is just mail contents appended one after another, the only technical issue is finding the beginning and end of each mail and sending its contents to WordifyMail. This version uses the X-UIDL and X-Mozilla-Status MIME header values to help identify the top of a mail. If you use another mail client, you will certainly have to change this Java file.
- WordPairData.java: The class that defines the spamStats.tab fields. It also contains the calculation of the Safe Bayesian probabilities for each of the words.
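The spamStats.tab column list above implies a simple row layout. As a sketch, assuming one tab-separated row per word pair, in the listed column order and with no header line (the real layout is defined by WordPairData.java), a hypothetical `SpamStatsRow` reader could look like this:

```java
/**
 * Hypothetical reader for one spamStats.tab line (not KASP's code).
 * Assumes 10 tab-separated fields in the documented column order.
 */
final class SpamStatsRow {
    final String wordA, wordB;
    final int legitCount, spamCount;
    final double legitMid, legitMin, legitMax;
    final double spamMid, spamMin, spamMax;

    private SpamStatsRow(String[] f) {
        wordA = f[0];
        wordB = f[1];
        legitCount = Integer.parseInt(f[2]);
        legitMid = Double.parseDouble(f[3]);
        legitMin = Double.parseDouble(f[4]);
        legitMax = Double.parseDouble(f[5]);
        spamCount = Integer.parseInt(f[6]);
        spamMid = Double.parseDouble(f[7]);
        spamMin = Double.parseDouble(f[8]);
        spamMax = Double.parseDouble(f[9]);
    }

    /** Parses one line; a wrong column count or bad number throws IllegalArgumentException. */
    static SpamStatsRow parse(String line) {
        String[] f = line.split("\t", -1); // -1 keeps empty trailing fields
        if (f.length != 10) {
            throw new IllegalArgumentException("expected 10 columns, got " + f.length);
        }
        return new SpamStatsRow(f);
    }
}
```

Splitting on tabs is safe under the assumption that a tab is never a word character.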
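SpamMarkerCorelationMatrix.java picks unrelated words because a naive Bayesian combination assumes its inputs are independent. The sketch below is NOT KASP's SpamMarker logic; it swaps in the combining formula popularized by Paul Graham's "A Plan for Spam" (naive Bayes with equal priors). The input probabilities and the 0.01/0.99 clamp are assumptions for illustration.

```java
/**
 * Hypothetical naive-Bayes combination (Graham-style), not KASP's code.
 * Input: P(spam | pair) for each selected, assumed-independent word pair.
 */
final class BayesCombineSketch {

    static double combine(double[] spamProbs) {
        // Sum logs instead of multiplying, to avoid underflow with many pairs.
        double logSpam = 0.0;
        double logLegit = 0.0;
        for (double p : spamProbs) {
            // Assumed clamp so that no single pair can force the result to 0 or 1.
            double q = Math.min(0.99, Math.max(0.01, p));
            logSpam += Math.log(q);
            logLegit += Math.log(1.0 - q);
        }
        // prod(p) / (prod(p) + prod(1 - p)), rewritten as 1 / (1 + ratio).
        return 1.0 / (1.0 + Math.exp(logLegit - logSpam));
    }
}
```

Two pairs at 0.9 each combine to 0.81 / 0.82, about 0.988, while 0.9 and 0.1 cancel to 0.5. If the pairs were strongly correlated, the same evidence would be counted twice, which is the problem that selecting non-related words addresses.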
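The WordifyString.java entry explains that double quote and equals count as word characters. A minimal tokenizer in that spirit, assuming the only word characters are letters, digits, `"` and `=` (the real constants live in WordifyString.java and may differ), shows how `src="http` comes out as a single word:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical tokenizer, not KASP's WordifyString. Assumed word characters:
 * letters, digits, '"' and '='. Anything else ends the current word.
 */
final class WordifySketch {

    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '"' || c == '=';
    }

    static List<String> words(String text) {
        List<String> out = new ArrayList<String>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isWordChar(c)) {
                cur.append(c);
            } else if (cur.length() > 0) {
                out.add(cur.toString()); // a separator closes the current word
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString()); // word running to the end of the text
        }
        return out;
    }
}
```

With these rules, `<img src="http://x.com/a.gif">` yields img, src="http, x, com, a and gif" (the closing quote sticks to the last word).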
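LegitCount and SpamCount are described as numbers of mails containing a word pair, so a pair repeated within one mail should count once. The sketch below assumes a "pair" is two adjacent words in the list WordifyMail produces; the text does not say how KASP forms pairs, so this is an illustration only, and `PairCountSketch` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical word-pair counting, not KASP's code. Assumes pairs are
 * adjacent words and that each mail adds at most 1 to a pair's count.
 */
final class PairCountSketch {

    /** Distinct adjacent pairs in one mail, keyed as "wordA<TAB>wordB". */
    static Set<String> pairsOf(List<String> words) {
        Set<String> pairs = new LinkedHashSet<String>();
        for (int i = 0; i + 1 < words.size(); i++) {
            pairs.add(words.get(i) + "\t" + words.get(i + 1));
        }
        return pairs;
    }

    /** Adds one mail's distinct pairs to the running per-pair mail counts. */
    static void addMail(Map<String, Integer> counts, List<String> words) {
        for (String pair : pairsOf(words)) {
            Integer old = counts.get(pair);
            counts.put(pair, old == null ? 1 : old + 1);
        }
    }
}
```

Using a set per mail is what turns raw occurrences into "number of mails with this pair".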
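WordifyMozilla.java has to find where each appended mail begins. A hypothetical sketch of that step, assuming the folder is mbox-style (each message starts with a line beginning `From `) and that a real start is followed within a few header lines by X-UIDL or X-Mozilla-Status, could look like this. The 3-line lookahead is an assumption; KASP's actual rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical Mozilla folder splitter, not KASP's WordifyMozilla.
 * A "From " line starts a message only if an X-UIDL or X-Mozilla-Status
 * header follows within LOOKAHEAD lines, so body text is not split.
 */
final class MozillaSplitSketch {

    /** Assumed number of lines after "From " to search for the markers. */
    static final int LOOKAHEAD = 3;

    static boolean startsMessage(String[] lines, int i) {
        if (!lines[i].startsWith("From ")) {
            return false;
        }
        for (int j = i + 1; j < lines.length && j <= i + LOOKAHEAD; j++) {
            String h = lines[j];
            if (h.isEmpty() || h.startsWith("From ")) {
                return false; // left the header block without seeing a marker
            }
            if (h.startsWith("X-UIDL:") || h.startsWith("X-Mozilla-Status:")) {
                return true;
            }
        }
        return false;
    }

    static List<String> split(String folder) {
        String[] lines = folder.split("\r?\n", -1);
        List<String> messages = new ArrayList<String>();
        StringBuilder cur = null; // null until the first message start is seen
        for (int i = 0; i < lines.length; i++) {
            if (startsMessage(lines, i)) {
                if (cur != null) {
                    messages.add(cur.toString());
                }
                cur = new StringBuilder();
            }
            if (cur != null) {
                cur.append(lines[i]).append('\n');
            }
        }
        if (cur != null) {
            messages.add(cur.toString());
        }
        return messages;
    }
}
```

Each returned string would then go to the WordifyMail step. A client that writes different headers would need a different `startsMessage` test, as the original notes.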
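The Mid/Min/Max columns describe an expected chance plus the lowest and highest plausible chance for a pair. The "Safe Bayesian" formula in WordPairData.java is not described here, so the sketch below swaps in a standard technique instead: the observed rate k/n with a Wilson score interval around it. The z value and the use of k/n as "mid" are assumptions.

```java
/**
 * Stand-in only, NOT KASP's Safe Bayesian calculation: observed rate k/n
 * with a Wilson score interval, returned as {mid, min, max}.
 */
final class RateIntervalSketch {

    /** k = mails containing the pair, n = mails of that class; z = 1.96 is about 95%. */
    static double[] wilson(int k, int n, double z) {
        if (n <= 0 || k < 0 || k > n) {
            throw new IllegalArgumentException("need n > 0 and 0 <= k <= n");
        }
        double p = (double) k / n;
        double z2 = z * z;
        double denom = 1.0 + z2 / n;
        double center = (p + z2 / (2.0 * n)) / denom;
        double half = (z / denom) * Math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n));
        // Clamp to [0, 1] to absorb floating-point rounding at the edges.
        double min = Math.max(0.0, center - half);
        double max = Math.min(1.0, center + half);
        return new double[] {p, min, max};
    }
}
```

The interval is wide for pairs seen in few mails and narrows as n grows, which is presumably why a "safe" filter would want the Min and Max bounds rather than the Mid value alone.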