Primer: Spam Filtering: The False Positive

What is it?
A legitimate e-mail that is not delivered because a spam filter incorrectly identifies it as junk mail.

Why is it a problem?
E-mail is an essential business tool, and there’s a cost when it doesn’t work as intended. For example, your company might use an application to send order confirmations to customers. If a spam filter blocks one of those confirmations, the customer never sees it and a legitimate order can be sidetracked.

How does it happen?
Companies and Internet service providers run spam-blocking applications on their incoming e-mail servers, and those applications red-flag messages that show traits associated with spam. A filter typically scans and scores each e-mail, blocking delivery of what it deems spam. A false positive results when a sender unwittingly includes enough red flags in a legitimate e-mail for it to be deemed spam.

How are spam scores determined?
Spam filters base scores on known spam techniques. Most filters work by parsing the headers, content and technical characteristics of e-mail, looking for specific indicators. One or two indicators alone don’t usually earn a spam label, but if the filter identifies enough suspicious patterns, such as the presence of Hypertext Markup Language (HTML) or a suspicious server origin, the score crosses the filter’s spam threshold and the e-mail is rejected. Anti-spam systems also keep blacklists of known spammers, as well as lists of approved senders. Most anti-spam systems keep their rules secret to prevent spammers from working around them.
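
The scoring approach described above can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: real filters keep their rules secret, so the indicator names, weights, threshold, addresses and message fields below are all made up for the example.

```python
import re

# Everything here is hypothetical: real filters do not publish
# their indicators, weights or thresholds.
SPAM_THRESHOLD = 3.0
BLACKLISTED_SENDERS = {"deals@bulk-mailer.example"}
APPROVED_SENDERS = {"orders@supplier.example"}


def contains_html(msg):
    """True if the body contains anything that looks like an HTML tag."""
    return re.search(r"<[a-zA-Z][^>]*>", msg["body"]) is not None


def unknown_origin(msg):
    """True if the originating server could not be identified."""
    return not msg.get("origin_host")


# Each entry: (indicator name, weight added to the score, test function)
INDICATORS = [
    ("contains_html", 1.5, contains_html),
    ("unknown_origin", 2.0, unknown_origin),
]


def spam_score(msg):
    """Return the total score and the names of the indicators that fired."""
    hits = [(name, weight) for name, weight, test in INDICATORS if test(msg)]
    return sum(weight for _, weight in hits), [name for name, _ in hits]


def classify(msg):
    """Apply the approved list and blacklist first, then the score."""
    sender = msg["from"].lower()
    if sender in APPROVED_SENDERS:
        return "deliver"
    if sender in BLACKLISTED_SENDERS:
        return "reject"
    score, _ = spam_score(msg)
    return "reject" if score >= SPAM_THRESHOLD else "deliver"
```

With these assumed weights, an HTML newsletter from an identifiable server scores 1.5 and is delivered, while the same message from an unidentifiable server scores 3.5 and is rejected. No single indicator is enough on its own, which is the behavior the paragraph describes.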

What characteristics cause problems?
Suppose you send a monthly HTML e-mail that contains an image tag pointing to a graphic on your Web server. A spam filter would likely flag it because it contains HTML and links to an image—making it look a lot like a common pornography advertisement. Other content indicators include ALL CAPS text, red font tags, huckster language like “pure profit” and even the word “remove.” Spammers often misuse the seemingly benign “remove me from this list” offer to verify e-mail addresses and subject them to more spam. Spam-blockers also check technical characteristics on the theory that spammers typically have sloppy coding habits. A common technical red flag is when the “From” address doesn’t match the header automatically added by the e-mail server of origin. That’s a problem for a company that uses a third-party service to send e-mail that appears to be coming directly from its domain. Witness Tumbleweed Communications, which makes an anti-spam application and was chagrined to find that its product filtered out Web conference invitations it had sent to its own clients using the WebEx service.
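
To check your own outgoing mail for the characteristics listed above, a rough pre-send test might look like the following Python sketch. It is an approximation under stated assumptions: the flag names and patterns are invented for illustration, the message is assumed to be single-part, and the Return-Path header stands in for "the header automatically added by the e-mail server of origin," which a real filter may check differently.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical phrase list, taken from the examples in the text.
HUCKSTER_PHRASES = ("pure profit", "remove")


def domain_of(header_value):
    """Return the lowercased domain of an address header, or '' if absent."""
    _, addr = parseaddr(header_value or "")
    return addr.rpartition("@")[2].lower()


def find_red_flags(raw_message):
    """Return a list of illustrative red flags found in a raw e-mail."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart mail is out of scope here
        body = ""
    text = re.sub(r"<[^>]+>", " ", body)  # visible text with tags removed
    flags = []

    # HTML that links to an image hosted elsewhere
    if re.search(r"<img\b[^>]*\bsrc\s*=", body, re.I):
        flags.append("html_image_link")
    # Red font tags
    if re.search(r'<font\b[^>]*\bcolor\s*=\s*["\']?(?:red|#ff0000)\b', body, re.I):
        flags.append("red_font_tag")
    # Two or more consecutive words in ALL CAPS
    if re.search(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})+\b", text):
        flags.append("all_caps_text")
    # Huckster language, including the word "remove"
    for phrase in HUCKSTER_PHRASES:
        if re.search(r"\b" + re.escape(phrase) + r"\b", text, re.I):
            flags.append("phrase:" + phrase)
    # "From" domain differs from the domain recorded in Return-Path
    from_domain = domain_of(msg["From"])
    path_domain = domain_of(msg["Return-Path"])
    if from_domain and path_domain and from_domain != path_domain:
        flags.append("from_mismatch")
    return flags
```

A monthly HTML mailing sent through a third-party service would typically trip several of these checks at once: the image link, the "remove" wording in its unsubscribe notice, and the mismatch between the "From" domain and the sending service's domain.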

I’m not selling Viagra. Why should I worry?
Because spam techniques keep changing. Your application that auto-generates e-mail can work fine one day and be flooded with return mail the next. Your staff needs to stay on top of spam-blocking updates and guard against false positives. The very fact that your company uses an automated Web script to generate e-mail may now be a red flag on your customers’ spam filters.