word strings

Chas B · Jan 3, 2004

I wonder if someone can advise me why some of the spam I receive contains
strings of apparently unconnected words in the body?
For example:

verde frame eugenia edmonds eruption meaningful phosphorylate autonomous
andesine sclerotic regretted onrush shiny vermeil cutworm pairwise
circulatory
til bourbaki print leaf bandwagon cubic exotic allah refute brethren

Is it intended to stop anti-spam devices from working?

Just curious.
Chas

Twinkletoes · Jan 3, 2004

Hi Chris

Spammers include lots of words that you would generally find in legitimate
(non-spam) messages, the idea being to fool the Bayesian filtering method
into classifying a message as non-spam.

Steve

steve · Jan 3, 2004

Twinkletoes said:
Hi Chris

Spammers include lots of words that you would generally find in legitimate
(non-spam) messages, the idea being to fool the Bayesian filtering method
into classifying a message as non-spam.

Steve

Bayesian filters were a good idea but spammers have found ways to get
around them.

I kill-filter on the header "Content-Type: multipart/alternative".

That gets rid of most spam, but it's no good for people who are into
non-text messages.
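
As a rough illustration of that kind of header rule, here is a minimal Python sketch using the standard library's `email` package. The function name `is_multipart_alternative` and the sample messages are made up for the example; a real newsreader or mail client would express the same test in its own kill-file or rule syntax.

```python
from email import message_from_string


def is_multipart_alternative(raw_message: str) -> bool:
    """Return True if the top-level Content-Type is multipart/alternative."""
    msg = message_from_string(raw_message)
    # get_content_type() returns "maintype/subtype" in lower case and
    # falls back to text/plain when the header is missing.
    return msg.get_content_type() == "multipart/alternative"


# Hypothetical sample messages, for illustration only.
plain_mail = (
    "From: a@example.com\n"
    "Subject: hi\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
)
html_mail = (
    "From: b@example.com\n"
    "Subject: offer\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "text part\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html part</p>\n"
    "--XYZ--\n"
)

for name, raw in (("plain", plain_mail), ("html", html_mail)):
    action = "kill" if is_multipart_alternative(raw) else "keep"
    print(name, action)
```

The trade-off is the one already mentioned: any legitimate sender whose client sends a text part plus an HTML part is caught by the same rule.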

Steve

me · Jan 3, 2004

Chas said:
I wonder if someone can advise me why some of the spam I receive contains
strings of apparently unconnected words in the body?
For example:

verde frame eugenia edmonds eruption meaningful phosphorylate autonomous
andesine sclerotic regretted onrush shiny vermeil cutworm pairwise
circulatory
til bourbaki print leaf bandwagon cubic exotic allah refute brethren

Is it intended to stop anti-spam devices from working?

Just curious.
Chas

Yes, it is.

J

Gabriele Neukam · Jan 3, 2004

On that special day said:
Spammers include lots of words that you would generally find in legitimate
(non-spam) messages, the idea being to fool the Bayesian filtering method
into classifying a message as non-spam.

But that doesn't help much, as these "neutral" words are scored as
just that: neutral. They aren't weighed against the other words
within the spam, just dropped. So the count of the "bad" words
remains the same.

No spammer knows which words will be valued as "good" words in the
specific context of any given mail program of a specific user. The
Bayesian filter will always develop its filter settings according to the
individual "intake" of a given mailbox, which almost certainly won't be
identical to any other on this planet. So the guesswork of spammers will
have no effect at all, as "neutral" words cannot "dilute" the "spammity
factor" of the "bad" words, or of the mail in toto. Bad minus void won't
become good.
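
The "just dropped" behaviour can be sketched in Python. This assumes a filter in the style of Paul Graham's "A Plan for Spam", where only the tokens whose probability lies farthest from 0.5 are combined and unseen words get a default of 0.4. The thread does not say which filter anyone is running, so the function `spam_score`, the word table and the window size of 3 are all illustrative choices.

```python
from math import prod

UNKNOWN = 0.4  # assumed default probability for a word the filter has never seen


def spam_score(tokens, word_probs, n_interesting=15):
    """Combine per-word spam probabilities, using only the most extreme ones."""
    probs = [word_probs.get(t, UNKNOWN) for t in set(tokens)]
    # Keep the n tokens whose probability is farthest from the neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:n_interesting]
    spam = prod(top)
    ham = prod(1 - p for p in top)
    return spam / (spam + ham)


# Hypothetical per-user word table; the values are invented.
word_probs = {"mortgage": 0.99, "unsubscribe": 0.99, "refinance": 0.99}
spam_words = ["mortgage", "unsubscribe", "refinance"]
padding = ["verde", "frame", "eugenia", "andesine", "cutworm", "bourbaki"]

bare = spam_score(spam_words, word_probs, n_interesting=3)
padded = spam_score(spam_words + padding, word_probs, n_interesting=3)
print(bare, padded)
```

In this sketch the padded and unpadded messages get the same score, because the padding words never make it into the window of "interesting" tokens. That only holds while the message contains enough strongly scored words to fill the window.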

Gabriele Neukam

Twinkletoes · Jan 13, 2004

Gabriele Neukam said:
But that doesn't help much, as these "neutral" words are scored
as just that: neutral. They aren't weighed against the other
words within the spam, just dropped. So the count of the "bad"
words remains the same.

In its simplest form, you have two choices: either the word is spam or it's
not. The only words which would be 'neutral' by default are those like "a",
"the", "it", "in" or whatever.

No spammer knows which words will be valued as "good" words in the
specific context of any given mail program of a specific user. The

They don't "know", they guess. While it's not a good guess, it allows some
spam to get through.

Bayesian filter will always develop its filter settings according
to the individual "intake" of a given mailbox, which almost
certainly won't be identical to any other on this planet. So the
guesswork of spammers will have no effect at all, as "neutral"
words cannot "dilute" the "spammity factor" of the "bad" words, or
of the mail in toto. Bad minus void won't become good.

The bottom line is that spammers are beginning to find ways around Bayesian
filtering. While it's an extremely effective technique, it needs backing up
with other types of spam detection.
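
To make the "lucky guess" case concrete, here is a small Python sketch. It assumes a Graham-style combination of per-word probabilities (the thread does not name a specific filter), and the word table is invented: it imagines a recipient whose legitimate mail happens to use "cubic", "pairwise" and "circulatory" a lot.

```python
from math import prod

UNKNOWN = 0.4  # assumed default for words the filter has never seen


def combined_probability(tokens, word_probs, n_interesting=15):
    """Graham-style combination over the tokens farthest from 0.5."""
    probs = sorted(
        (word_probs.get(t, UNKNOWN) for t in set(tokens)),
        key=lambda p: abs(p - 0.5),
        reverse=True,
    )[:n_interesting]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical table for one particular mailbox.
word_probs = {
    "mortgage": 0.99, "unsubscribe": 0.99, "refinance": 0.99,  # seen in spam
    "cubic": 0.01, "pairwise": 0.01, "circulatory": 0.01,      # seen in good mail
}
spam_words = ["mortgage", "unsubscribe", "refinance"]
padding = ["verde", "frame", "cubic", "pairwise", "circulatory"]

bare = combined_probability(spam_words, word_probs)
padded = combined_probability(spam_words + padding, word_probs)
print(f"without padding: {bare:.3f}, with padding: {padded:.3f}")
```

Here three of the padding words happen to count as strongly "good" for this mailbox, so they cancel the three spam words and the score falls well below a typical spam threshold. For a mailbox where those words are unknown, the same padding would do far less.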

Steve

DRACO- · Jan 14, 2004

Twinkletoes said:
In its simplest form, you have two choices: either the word is spam or it's
not. The only words which would be 'neutral' by default are those like "a",
"the", "it", "in" or whatever.

They don't "know", they guess. While it's not a good guess, it allows some
spam to get through.

The bottom line is that spammers are beginning to find ways around Bayesian
filtering. While it's an extremely effective technique, it needs backing up
with other types of spam detection.

Steve

Mozilla uses Bayesian filters in its junk mail controls. It works OK. To
help it along, I set up rules so that mail from each list I'm on is
sorted into its own folder. Then the mail goes through a filter
that sorts out advertisements I subscribed to: palmgear, a robotics
catalog and a vehicle parts catalog. If the mail doesn't match that
filter, it hits the filter that compares the sender to my contact list. If
the sender is on my contact list, the mail is placed in the inbox; if
not, it is dropped into a secondary folder for me to
review. From there, if I decide it is spam, I click the junk icon and
away it goes, teaching the filter a little more. If it is from someone I
don't have in my address book but want in my inbox, I just add the
person to the address book. I have another filter in there somewhere for
emails that don't name any of my addresses in the To or Cc
field. All of that is always spam. Another filter is simply for Swen
emails, which hasn't had any hits in the past month or so.
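
The chain of rules described above could be sketched roughly like this in Python. Every address, folder name and helper (`Mail`, `route`) is made up for the example, and the position of the To/Cc check is a guess, since the description only says it is "in there somewhere". The Swen filter and the Bayesian training step are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    sender: str
    to: list = field(default_factory=list)
    cc: list = field(default_factory=list)
    list_id: str = ""  # mailing-list identifier, empty if the mail is not list mail


# Hypothetical configuration, for illustration only.
MY_ADDRESSES = {"me@example.org"}
LIST_FOLDERS = {"robots.lists.example.org": "Lists/Robots"}
SUBSCRIBED_ADS = {"news@catalog.example.com"}
CONTACTS = {"friend@example.net"}


def route(mail: Mail) -> str:
    """Return the folder a message ends up in."""
    # 1. Mailing lists are sorted into their own folders.
    if mail.list_id in LIST_FOLDERS:
        return LIST_FOLDERS[mail.list_id]
    # 2. Advertisements the user actually subscribed to.
    if mail.sender in SUBSCRIBED_ADS:
        return "Ads"
    # 3. Senders in the contact list go straight to the inbox.
    if mail.sender in CONTACTS:
        return "Inbox"
    # 4. Mail that names none of my addresses in To or Cc is treated as junk
    #    (placement of this rule is an assumption).
    if not MY_ADDRESSES & set(mail.to + mail.cc):
        return "Junk"
    # 5. Everything else waits in a secondary folder for manual review.
    return "Review"


print(route(Mail(sender="stranger@example.com", to=["me@example.org"])))
```

Anything that lands in the review folder is then either marked as junk by hand, which trains the Bayesian filter, or rescued by adding the sender to the contact list.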

DRACO-
