Can I create a rule to detect disguised or misspelled words?

Guest · Apr 30, 2006

I routinely receive a lot of junk mail that contains disguised text in the
body, such as "C e I g A g L d I e S q" for Cialis or
"V t I b A x G u R f A f" for Viagra. The extra letters in the words are
random, so it is impossible to create a rule for every possible combination.
Is there a way to write a rule, for instance using wildcards, to filter this
type of spam?

Milly Staples [MVP - Outlook] · Apr 30, 2006

No, don't even try. Spammers are always one step ahead of rules.

Instead, get a good third-party spam blocker - I usually recommend SpamBayes
from Sourceforge.net - it is trainable and very good. I have been using it
for almost 2 years and, in combination with Outlook 2003's built-in spam
filtering, I almost never see spam in my in-box. Maybe twice weekly, easily
dealt with.

And the price is right - free from open source.

--
Milly Staples [MVP - Outlook]

Post all replies to the group to keep the discussion intact. All
unsolicited mail sent to my personal account will be deleted without
reading.


Ben M. Schorr - MVP · Apr 30, 2006

Aloha ival50,

Other than by specifying each misspelling I don't think so. But you might
Google for SpamBayes and see if it helps you filter more of that stuff out.
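
For what it's worth, the disguise shown in the question is regular enough
that a script running outside the Rules Wizard could catch it with a regular
expression. A minimal sketch in Python, assuming each capital letter of the
keyword is followed by at most one random lowercase letter, as in the two
samples. The function names and the keyword list are made up for
illustration, and nothing here plugs into Outlook rules directly:

```python
import re

def disguised_pattern(word: str) -> re.Pattern:
    """Build a regex for `word` in capitals, allowing optional spaces and
    at most one lowercase filler letter after each capital letter."""
    parts = [re.escape(ch) + r"\s*[a-z]?" for ch in word.upper()]
    return re.compile(r"\s*".join(parts))

# Hypothetical watch list; extend as needed.
KEYWORDS = ["CIALIS", "VIAGRA"]
PATTERNS = {word: disguised_pattern(word) for word in KEYWORDS}

def find_disguised(body: str) -> list[str]:
    """Return the keywords whose disguised form appears in the message body."""
    return [word for word, pattern in PATTERNS.items() if pattern.search(body)]

if __name__ == "__main__":
    print(find_disguised('Try C e I g A g L d I e S q today'))
    print(find_disguised('See you at the meeting on Monday'))
```

A spammer only has to change the disguise (two filler letters, digits,
lowercase keywords) to slip past a pattern like this, which is the main
argument for a trainable filter instead.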

-Ben-
Ben M. Schorr, MVP
Roland Schorr & Tower
http://www.rolandschorr.com
Microsoft OneNote FAQ: http://www.factplace.com/onenote.html

Vanguard · Apr 30, 2006

"Milly Staples [MVP - Outlook]" wrote:

| No, don't even try. Spammers are always one step ahead of rules.
|
| Instead, get a good third party Spam blocker - I usually recommend
| SpamBayes from Sourceforge.net - it is trainable and very good. I have
| been using it for almost 2 years and, in combination with Outlook 2003
| built-in spam filtering, I almost never see spam in my in-box. Maybe
| twice weekly, easily dealt with.
|
| And the price is right - free from open source.


Unless the Bayes filter is learning from some other anti-spam filtering
as to what is spam or not, how does the one-time instance of a
misspelled word provide enough weighting of that misspelled word in its
database? Bayesian filtering doesn't mark as spam a message simply
because you've never encountered a particular word before. Bayesian
works by weighting words (would be nice if phrases were included, too)
but you can't weight a word that you've only encountered once except to
give it a default weighting (which means it is neutral).

Say I send you an e-mail with "syzygy" (yep, it's a real word and not
misspelled in this case). How does Bayes filtering know it is spam
since you've never had that word before in the database for the word
weighting?  In the next e-mail, the word was misspelled as "sysygy" but
it is still the very first occurrence of that word in any of your mails
and so it receives default weighting (of neutral). That's why it takes
time for Bayes filtering to *learn* and it actually has to encounter the
words that it will learn. Bayes doesn't work against one-time
occurrence of a word. More likely is that OTHER words within the spam
will get included in the weighting and have been encountered before so
their weighting *changes* to provide an overall bias on the message to
determine if it is spam or not.
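
To make the point concrete, here is a toy sketch in Python of per-word
weighting with a neutral default for words never seen in training. It is a
simplified illustration, not SpamBayes's actual algorithm; the class name,
the 0.5 default, and the 0.01/0.99 clamping are assumptions chosen for the
example:

```python
from collections import Counter
from math import prod

UNKNOWN_PROB = 0.5  # assumed neutral weight for a word never seen in training

class ToyBayes:
    """Toy word-weighting filter (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        # Count each distinct word once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_prob(self, token: str) -> float:
        s = self.spam_counts[token]
        h = self.ham_counts[token]
        if s + h == 0:
            return UNKNOWN_PROB  # first occurrence: no evidence either way
        spam_rate = s / max(self.n_spam, 1)
        ham_rate = h / max(self.n_ham, 1)
        p = spam_rate / (spam_rate + ham_rate)
        return min(0.99, max(0.01, p))  # clamp so no single word is absolute

    def score(self, text: str) -> float:
        # Naive combination of the per-word weights into one message score.
        probs = [self.token_prob(t) for t in set(text.lower().split())]
        p = prod(probs)
        q = prod(1.0 - x for x in probs)
        return p / (p + q)

if __name__ == "__main__":
    f = ToyBayes()
    f.train("cheap pills online order now", is_spam=True)
    f.train("meeting agenda for monday lunch", is_spam=False)
    print(f.token_prob("sysygy"))          # unseen word gets the neutral default
    print(f.score("sysygy"))               # a lone unseen word decides nothing
    print(f.score("sysygy cheap pills"))   # the known words carry the score
```

The unseen word contributes nothing in either direction; only the
previously trained words around it move the message score.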

While "it might work for you" may make Bayes filtering alone look like
it is functional, it has to learn and keeps learning, and misspellings
eliminate the learning (i.e., weighting). That's why spammers use the
trick. You're hoping something *else* within the spam will achieve high
enough weighting (due to reoccurrence of those words) to mark the mail
as spam (and hence change the weighting of the other keywords used for
testing, and only maybe might the misspelled word be one of those other
keywords). Of course, all that Bayesian weighting is useless against
spam that hides its content inside of attached .gif or .jpg files.

Over time, what I've noticed is that Bayesian filtering can be quite
useful but it can also generate too many false positives or false
negatives. After all, it is guessing! You'll probably want to
incorporate some other methods of spam detection than just Bayes.

John Blessing · Apr 30, 2006


Trying to block spam is a pointless waste of time. Try here for some advice:

http://www.lbetoolbox.com/how-to-stop-spam.htm

--
John Blessing

http://www.LbeHelpdesk.com - Help Desk software priced to suit all
businesses
http://www.room-booking-software.com - Schedule rooms & equipment bookings
for your meeting/class over the web.
http://www.lbetoolbox.com - Remove Duplicates from MS Outlook, find/replace,
send newsletters
