A Bayesian Approach to Filtering Junk E-Mail

Abstract

In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user's mail stream. By casting this problem in a decision theoretic framework, we are able to make use of probabilistic learning methods in conjunction with a notion of differential misclassification cost to produce filters which are especially appropriate for the nuances of this task. While this may appear, at first, to be a straightforward text classification problem, we show that by considering domain-specific features of this problem in addition to the raw text of E-mail messages, we can produce much more accurate filters. Finally, we show the efficacy of such filters in a real world usage scenario, arguing that this technology is mature enough for deployment.
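To make the idea of differential misclassification cost concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not name a specific model here, so the sketch assumes a naive Bayes classifier over binary word-presence features with Laplace smoothing. The function names (`train`, `posterior_junk`, `is_junk`) and the `cost_ratio` parameter are hypothetical. The decision rule follows from comparing expected costs: if discarding a legitimate message is `cost_ratio` times worse than letting a junk message through, a message should be filtered only when P(junk | message) exceeds cost_ratio / (1 + cost_ratio).

```python
import math
from collections import Counter

LABELS = ("junk", "legit")

def train(messages):
    """Fit counts from (tokens, label) pairs; label is "junk" or "legit".

    Assumes at least one training message per label.
    """
    doc_counts = Counter()
    word_counts = {label: Counter() for label in LABELS}
    vocab = set()
    for tokens, label in messages:
        present = set(tokens)  # binary features: word present or absent
        doc_counts[label] += 1
        word_counts[label].update(present)
        vocab |= present
    return doc_counts, word_counts, vocab

def posterior_junk(model, tokens):
    """Return P(junk | message) under the assumed naive Bayes model."""
    doc_counts, word_counts, vocab = model
    total = sum(doc_counts.values())
    present = set(tokens)
    log_score = {}
    for label in LABELS:
        lp = math.log(doc_counts[label] / total)  # class prior
        for word in vocab:
            # Laplace-smoothed probability that the word appears in this class
            p = (word_counts[label][word] + 1) / (doc_counts[label] + 2)
            lp += math.log(p if word in present else 1.0 - p)
        log_score[label] = lp
    # Normalize in log space to avoid underflow
    top = max(log_score.values())
    weights = {label: math.exp(v - top) for label, v in log_score.items()}
    return weights["junk"] / sum(weights.values())

def is_junk(model, tokens, cost_ratio=999.0):
    """Filter only if the expected cost of filtering is the lower one.

    cost_ratio (hypothetical parameter): how many times worse it is to
    discard a legitimate message than to deliver a junk one.
    """
    threshold = cost_ratio / (1.0 + cost_ratio)
    return posterior_junk(model, tokens) > threshold
```

With `cost_ratio=1.0` the rule reduces to the usual 0.5 threshold. Raising `cost_ratio` makes the filter more conservative, so a message that is only probably junk is still delivered.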
