Arthur Shapiro
The patient is my main home machine, running Outlook 2003 in POP mode.
While my ISP does a fairly commendable job of filtering spam, and SpamBayes
gets most of the rest, there's one type of message, apparently from the same
source, that appears several times per day and isn't getting caught. I'd like
to filter these out and permanently delete them. Just for the sake of anyone's
curiosity, the subject of these messages is generally along the lines of "<drug
name> Used to Control <disease>". (Does everyone get these?)
Using SpamBayes to look at one of these messages without opening it, I can see
that it has huge amounts of textual garbage, presumably to get around some of
the filtering algorithms. It's an HTML message, and each one I've looked at has
some very distinctive coding, always with the unusual break of a URL across
multiple lines:
h
t
tp:
and also the rather silly HTML string </FONT><STRONG></STRONG>.
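To make the two signatures concrete, here is a minimal sketch in Python of how
they could be checked against the raw HTML source of a message. This runs
outside Outlook and is not an Outlook rule; the names BROKEN_HTTP,
EMPTY_STRONG and looks_like_drug_spam are hypothetical, and it assumes the raw
source is already available as a string.

```python
import re

# "http:" with optional whitespace or line breaks between the letters,
# as in the h / t / tp: example above (hypothetical pattern name).
BROKEN_HTTP = re.compile(r"h\s*t\s*t\s*p\s*:", re.IGNORECASE)

# The empty-tag sequence quoted above.
EMPTY_STRONG = "</FONT><STRONG></STRONG>"


def looks_like_drug_spam(raw_html: str) -> bool:
    """Return True if either signature appears in the raw HTML source."""
    # Only count "http:" occurrences that are actually split by whitespace,
    # so that ordinary unbroken URLs do not trigger the check.
    has_broken_url = any(
        re.search(r"\s", match.group(0))
        for match in BROKEN_HTTP.finditer(raw_html)
    )
    # Compare case-insensitively, since HTML tag names are not case-sensitive.
    has_empty_strong = EMPTY_STRONG.lower() in raw_html.lower()
    return has_broken_url or has_empty_strong
```

The point of the sketch is only that both patterns are easy to match when the
test sees the unrendered source; whether a mail client's rules see that source
is a separate question.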
I normally don't have much trouble writing rules (I wish Outlook did a better
job of obeying them on the fly; it does well when I run them manually), but I
haven't been able to trap either of those two examples. I would have
intuitively thought that the second one, especially, would be a routine
filtering criterion for words in the body.
Do the rules not work on HTML tokens, or do I have to play games with the
brackets and virgule characters? Any other suggestions?
Art