At the end of January, Respite, the campus-wide spam-filtering service, underwent an upgrade. This new version of Respite, dubbed ‘R3,’ brings with it updated software and new policies and procedures regarding mail delivery.
With the roll-out of R3 on January 28, all e-mail accounts, aliases, lists, and forwards on the rpi.edu domain are now filtered for spam. Students who had previously opted in to R2 are automatically opted in to R3, while students who had not opted in to R2 will now have their e-mail filtered according to default settings.
Director of Communication and Middleware Technologies Gary Schwartz said that spam has been a problem for a long time, citing a 1997 communication to campus regarding unsolicited e-mails. In fall 2003, the Institute rolled out its first version of Respite. At the time, the Institute was only able to afford 2,000 licenses, but the software came with its source code, allowing custom modifications to be added. Roaring Penguin, the developer of the software upon which Respite is based, licensed some of those modifications back from RPI for inclusion in its own product. In 2004, the number of available licenses was increased, and this past year, RPI obtained a site license for the product, allowing it to use Respite to filter all mail sent through its servers.
According to Schwartz, Respite was always meant as an “if you want it” service, minimizing paternalism and intrusion. With the recent increases in spam, however, Schwartz said that “the Institute can no longer take a laissez-faire approach due to the increased spam load. This upgrade is necessary to maintain the integrity, reliability, and availability of the system. It’s an operational issue, not a philosophical one.”
Schwartz noted that R3 now uses a globally trained Bayesian filter that receives training data from many of Roaring Penguin’s customers, including the IEEE, the University of North Carolina, and internet service provider Armstrong. By default, RPI supplements the Bayesian filters with one of the most conservative blacklists available. While he noted that R3 won’t catch all spam, Schwartz stated that “80 percent of mail gets rejected by R3, and 60 percent of that is from the blacklists alone.”
According to statistics provided by Communication and Middleware Technologies, more than 300,000 e-mails per day passed through RPI’s SMTP servers before R3 was rolled out. With the filtering provided by R3, that number has dropped to about 50,000 e-mails per day. “80–95 percent of e-mail going through RPI’s servers has been spam,” said Schwartz.
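The figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are made up for this example, the daily counts are the article's rounded numbers, and it assumes Schwartz's “60 percent of that” refers to 60 percent of the rejected mail.

```python
# Figures quoted in the article; both daily counts are approximate.
before_r3 = 300_000  # e-mails per day through RPI's SMTP servers before R3
after_r3 = 50_000    # e-mails per day with R3 filtering in place


def percent_drop(before: int, after: int) -> float:
    """Return the percentage of the original volume that no longer passes through."""
    return (before - after) / before * 100


drop = percent_drop(before_r3, after_r3)

# Schwartz: 80 percent of mail is rejected, and 60 percent of that
# (assumed here to mean 60 percent of the rejected mail) by the blacklists.
rejected_pct = 80
blacklist_share_of_rejected_pct = 60
blacklist_pct_of_all_mail = rejected_pct * blacklist_share_of_rejected_pct / 100

print(f"Volume drop: {drop:.1f}%")
print(f"Rejected by blacklists alone: {blacklist_pct_of_all_mail:.0f}% of all mail")
```

Under these assumptions, the drop from 300,000 to 50,000 messages a day is roughly 83 percent, in line with the 80 percent rejection rate Schwartz cites, and the blacklists alone would account for about 48 percent of all incoming mail.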
In addition to the high volume of spam, RPI has also been heavily blacklisted due to what Schwartz calls “second order effects.” For example, suppose a user forwards their RPI mail to a primary e-mail account on a service such as AOL. Reading their mail through that service, they see some mail, forwarded to them by RPI, that is spam. When the user legitimately marks that e-mail as spam, AOL may begin marking all mail forwarded through RPI’s servers as spam, causing even legitimate mail from RPI servers to be marked as spam throughout AOL. With the high volume and second order effects, the Institute “flat out had no choice” but to filter all e-mail, Schwartz said.
By opting in to the Respite service, users will be able to have even greater control of their mail filtering preferences. Opted-in users are able to remove all filtering from their account, modify the weight of various components in the spam scoring formula, and add custom rules, whitelists, and blacklists. “In testing, the default rules had a false positive rate of one-third of one percent,” said Schwartz, meaning that one in every 300 legitimate messages was incorrectly filtered. Many of these are e-mails from discussion lists and newsletters, formats which spammers have been attempting to imitate lately.
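To make the options described above concrete, here is a minimal sketch of how per-user preferences might combine whitelists, blacklists, and weighted score components. It is not Respite's or Roaring Penguin's actual implementation: the class, the component names, the default weights, and the threshold of 5.0 are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FilterPrefs:
    """Hypothetical per-user preferences; not Respite's real settings."""
    filtering_enabled: bool = True
    # Weight applied to each scoring component (names are assumptions).
    weights: dict[str, float] = field(
        default_factory=lambda: {"bayes": 1.0, "rules": 1.0}
    )
    whitelist: set[str] = field(default_factory=set)  # addresses or domains
    blacklist: set[str] = field(default_factory=set)  # addresses or domains
    threshold: float = 5.0  # assumed cutoff for treating a message as spam


def is_spam(sender: str, component_scores: dict[str, float],
            prefs: FilterPrefs) -> bool:
    """Decide whether a message is spam under the given preferences."""
    if not prefs.filtering_enabled:
        return False  # the user removed all filtering from the account
    sender = sender.lower()
    domain = sender.rpartition("@")[2]
    if sender in prefs.whitelist or domain in prefs.whitelist:
        return False  # whitelisted mail always reaches the inbox
    if sender in prefs.blacklist or domain in prefs.blacklist:
        return True
    # Weighted sum of component scores; unknown components get weight 1.0.
    total = sum(prefs.weights.get(name, 1.0) * score
                for name, score in component_scores.items())
    return total >= prefs.threshold


# Usage: a newsletter that scores high is rescued by a whitelist entry.
prefs = FilterPrefs(whitelist={"lists.example.org"})
print(is_spam("news@lists.example.org", {"bayes": 9.0}, prefs))
print(is_spam("offer@unknown.example", {"bayes": 9.0}, prefs))
```

In this sketch the whitelist is checked before any scoring, which mirrors the role whitelists play for newsletters and discussion lists: a listed sender is delivered no matter how spam-like the message looks to the other components.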
Schwartz recommends that all users opt in to Respite: “The customized stream interface is much nicer, and the number of custom rules required by R3 is much reduced.” He also recommends using whitelists to ensure that legitimate e-mail flagged as spam makes it to your inbox. Schwartz stated that before R3 he had over 150 custom rules to trap spam, but under R3 he needs no custom rules and uses a whitelist of 20 senders and domains.
“Spam is an ongoing situation, but I think we’re in a very good place now,” said Schwartz. “Our goal is to take as much of the burden as possible. It doesn’t add value to be sifting through spam.”
