A Bayesian Approach to Filtering Junk E-Mail
Abstract
In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user's mail stream. By casting this problem in a decision theoretic framework, we are able to make use of probabilistic learning methods in conjunction with a notion of differential misclassification cost to produce filters which are especially appropriate for the nuances of this task. While this may appear, at first, to be a straightforward text classification problem, we show that by considering domain-specific features of this problem in addition to the raw text of E-mail messages, we can produce much more accurate filters. Finally, we show the efficacy of such filters in a real-world usage scenario, arguing that this technology is mature enough for deployment.
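To make the notion of differential misclassification cost concrete, the following is a minimal sketch of a cost-sensitive decision rule, not a description of the filter evaluated in this work. It assumes that some probabilistic classifier has already produced a posterior probability `p_junk` for a message. The function names and the 999:1 cost ratio are hypothetical choices made for illustration only.

```python
def junk_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Posterior probability above which filtering has the lower expected cost.

    cost_false_positive: assumed cost of filtering a legitimate message.
    cost_false_negative: assumed cost of delivering a junk message.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_filter(p_junk: float,
                  cost_false_positive: float = 999.0,
                  cost_false_negative: float = 1.0) -> bool:
    """Return True when filtering the message minimizes expected cost.

    The default costs are illustrative assumptions, not values taken
    from the text above.
    """
    # Filtering is only costly if the message is actually legitimate.
    expected_cost_if_filtered = (1.0 - p_junk) * cost_false_positive
    # Delivering is only costly if the message is actually junk.
    expected_cost_if_delivered = p_junk * cost_false_negative
    return expected_cost_if_filtered < expected_cost_if_delivered


if __name__ == "__main__":
    print(junk_threshold(999.0, 1.0))
    print(should_filter(0.99))
    print(should_filter(0.9995))
```

Under these assumed costs, a message that the classifier considers 99% likely to be junk is still delivered, because wrongly discarding legitimate mail is treated as far more costly than letting a junk message through.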