spam: finding strings inside < >

George Marshall · Jan 13, 2004

I am trying to get the rules wizard to match strings in the body of
messages, and not having much luck with one kind of string. These are
appearing inside of < > html constructs. I just want to catch messages
with a given web address, an href if you want to get specific, but
nothing seems to work.

I've tried strings with: just the hostname, the entire www.host.com
string, everything from the href through the end of the .com, and the
entire html construct, including the < >.

Any ideas? Outlook 2002 on Win 2000.

George

Brian Tillman · Jan 13, 2004

George Marshall said:
I am trying to get the rules wizard to match strings in the body of
messages, and not having much luck with one kind of string. These are
appearing inside of < > html constructs. I just want to catch
messages with a given web address, an href if you want to get
specific, but nothing seems to work.

And nothing will.
--
Brian Tillman
Smiths Aerospace
3290 Patterson Ave. SE, MS 1B3
Grand Rapids, MI 49512-1991
Brian.Tillman is the name, smiths-aerospace.com is the domain.

I don't speak for Smiths, and Smiths doesn't speak for me.

George Marshall · Jan 14, 2004

It beats me how Microsoft can ignore this one for so long. Multiple
versions of Outlook have come out since this and related problems were
reported, and yet there seems to be absolutely no progress at all.

The code, were it to be written, is not that complex - the addition of
two changes would enable far more spam-catching than we can do now:
1) A Rule (or option) to search raw text, letting all HTML constructs,
legit or not, go through the search engine, and
2) A Rule (or option) to search "De-HTMLed" text, i.e., after all HTML
has been interpreted, and invalid HTML expressions removed.

Failing either of these, a user-friendly macro facility, a la Excel or
Word, that allows user macros to be inserted as a Rule wherever the user
wants, with full access to the message headers and text, would also do
the job, albeit in a harder to use fashion.

As it is, we have an odd combination of none of the above: we can't
search for simple text strings that the spammers have broken up with
bogus HTML, but neither can we find the host addresses etc. inside the
links, which they change much less often.
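
To make the "host addresses inside the links" idea concrete, here is a
rough Python sketch of what such a check could look like if it had
access to the raw message body. This is not something the Rules Wizard
can do, and the names link_hosts and links_to_blocked_host are made up
for illustration; it uses only the standard html.parser and
urllib.parse modules.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkHostCollector(HTMLParser):
    """Collects the host part of every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                try:
                    host = urlparse(value).hostname
                except ValueError:
                    # Malformed URL; skip it rather than fail the scan.
                    host = None
                if host:
                    self.hosts.add(host)


def link_hosts(raw_html):
    """Return the set of hosts linked from the raw (unrendered) body."""
    collector = LinkHostCollector()
    collector.feed(raw_html)
    collector.close()
    return collector.hosts


def links_to_blocked_host(raw_html, blocked_hosts):
    """True if any link points at a blocked host or one of its subdomains."""
    for host in link_hosts(raw_html):
        for blocked in blocked_hosts:
            if host == blocked or host.endswith("." + blocked):
                return True
    return False
```

With a body containing a link to http://www.host.com/buy, a block list
of {"host.com"} would match, however the visible text around the link
has been chopped up. Links without a scheme (a bare www.host.com) are
not picked up by this sketch.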

Help!

George

Brian Tillman · Jan 15, 2004

George Marshall said:
The code, were it to be written, is not that complex - the addition of
two changes would enable far more spam-catching than we can do now:
1) A Rule (or option) to search raw text, letting all HTML constructs,
legit or not, go through the search engine, and
2) A Rule (or option) to search "De-HTMLed" text, i.e., after all HTML
has been interpreted, and invalid HTML expressions removed.

This latter change alone (i.e., searching the post-processed text) would
improve things tremendously. We used a corporate mail scanner that scanned
the rendered text and it could always find phrases like "human growth
hormone" even when the SPAMmers actually sent it as "hu</dummy>man
gr</dummy>ow</dummy>th h</dummy>orm</dummy>one".
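
I don't know how that scanner was implemented, but the general idea of
searching the post-processed text can be sketched in a few lines of
Python. The assumption here is that "rendered" simply means dropping
every tag and collapsing whitespace, which is cruder than what a real
renderer does; rendered_text and contains_phrase are names invented for
this example.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps only the character data, discarding all tags (valid or bogus)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def rendered_text(raw_html):
    """Approximate the rendered text: strip tags, collapse whitespace."""
    collector = TextCollector()
    collector.feed(raw_html)
    collector.close()
    text = "".join(collector.parts)
    return re.sub(r"\s+", " ", text).strip()


def contains_phrase(raw_html, phrase):
    """Case-insensitive phrase search on the tag-stripped text."""
    return phrase.lower() in rendered_text(raw_html).lower()


sample = "hu</dummy>man\ngr</dummy>ow</dummy>th h</dummy>orm</dummy>one"
print("human growth hormone" in sample)                  # False
print(contains_phrase(sample, "human growth hormone"))   # True
```

Because tags are removed without inserting any space, text from
adjacent block elements runs together, and the contents of script or
style elements are kept as text, so treat this as a starting point only.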