Authors
Mehran Sahami, Susan Dumais, David Heckerman, Eric Horvitz
Publication date
1998/7/26
Journal
Learning for Text Categorization: Papers from the 1998 workshop
Volume
62
Pages
98-105
Description
In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user's mail stream. By casting this problem in a decision-theoretic framework, we are able to make use of probabilistic learning methods in conjunction with a notion of differential misclassification cost to produce filters which are especially appropriate for the nuances of this task. While this may appear, at first, to be a straightforward text classification problem, we show that by considering domain-specific features of this problem in addition to the raw text of E-mail messages, we can produce much more accurate filters. Finally, we show the efficacy of such filters in a real-world usage scenario, arguing that this technology is mature enough for deployment.
Total citations
[Chart: citations per year, 1999 to 2024]
Scholar articles
M Sahami, S Dumais, D Heckerman, E Horvitz - Learning for Text Categorization: Papers from the 1998 …, 1998