Email pollution and spam to think about
July 26, 2008 2 Comments
Most of the time, MX Lab intercepts spam that tries to sell OEM software at very low prices, Viagra and all kinds of drugs, or replica watches, or that recommends buying stocks, and so on.
From time to time we see spam campaigns that seem to have no real meaning and don’t take you to a web site with some great offers. This weekend we got a lot of this kind of spam. Some examples:
gorse yelp albuquerque
emanuel botulin competitor? masturbate, lapelled lapelled.
rout combatted prussia podge camelopard exult, lampoon
laureate sonogram camelopard stinkpot foxhall.competitor masturbate.
agony elaborate proserpine
proserpine assai percept? holster, edelweiss vile.
nationwide trash brittle rifle orwell somerville, rifle
hanna ileum agony orwell drum.fallacy ribonucleic.
Some other great readings:
“There are certain queer times and occasions in this strange mixed affair we call life when a man takes the whole universe for a vast practical joke.” Herman Melville
“This was love at first sight, love everlasting: a feeling unknown, unhoped for, unexpected–in so far as it could be a matter of conscious awareness; it took entire possession of him, and he understood, with joyous amazement, that this was for life.” Charles Augustus Lindbergh
“It is in our lives and not our words that our religion must be read.” Thomas Jefferson
This one is very nice. It really makes you think.
“And in the end it’s not the years in your life that count. It’s the life in your years.” Abraham Lincoln

I wonder what kind of purpose these campaigns could serve?
thanks
Well, when spammers quote parts of poems and books and use more or less valid words like this, Bayesian filters can’t be trained very well on these messages: the same words appear in legitimate emails, so a filter trained on them could start to catch legitimate mail. This technique is often called “Bayesian poisoning”. So if you add these kinds of messages to your Bayesian filter, the result could be a less effective anti-spam system.
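To make the poisoning effect concrete, here is a minimal sketch of a toy naive Bayes filter. It is not the filter MX Lab runs: the `TinyBayes` class, the tokenizer and the few training messages are all made up for illustration. It scores a harmless message, then scores it again after two of the quote-style spam messages above have been added to the spam training set.

```python
from collections import Counter
import math
import re


def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


class TinyBayes:
    """Toy naive Bayes spam filter with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        """P(spam | text); assumes both classes have been trained at least once."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)
            for tok in tokens(text):
                score += math.log((self.counts[label][tok] + 1) / (total + len(vocab)))
            log_score[label] = score
        # Turn the two log scores into a probability for the spam class.
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))


def demo():
    f = TinyBayes()
    # Hypothetical training data, invented for this sketch.
    f.train("please review the quarterly report before the meeting", "ham")
    f.train("lunch on friday with the project team", "ham")
    f.train("cheap oem software at very low prices", "spam")
    f.train("replica watches and cheap drugs", "spam")

    legit = "our lives and our words on friday"
    before = f.spam_probability(legit)

    # "Poison" the filter by training it on the quote-style spam.
    f.train("It is in our lives and not our words that our religion must be read.", "spam")
    f.train("And in the end it's not the years in your life that count. "
            "It's the life in your years.", "spam")
    after = f.spam_probability(legit)
    return before, after


before, after = demo()
print(f"spam probability before poisoning: {before:.2f}")
print(f"spam probability after poisoning:  {after:.2f}")
```

After the second round of training, everyday words such as “our”, “lives” and “words” count as evidence of spam, so the score of the harmless message goes up. A real filter is trained on far more mail, but the mechanism is the same.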
With these messages spammers try to get past the anti-spam filters and into your mailbox. If a mailbox doesn’t exist, they may receive a server error, and then they know the email address doesn’t exist. By sending a lot of messages this way, a spammer could rule out guessed address combinations or remove invalid addresses from an existing list, which increases the hit rate, and could then send a massive campaign to the users that remain.
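The information leak behind this is easy to picture. As a small sketch (the addresses, the `split_by_reply` helper and the sample replies are hypothetical, and no mail is sent), assume the sender has recorded the SMTP reply code it got for each recipient. In SMTP, 250 means the recipient was accepted and 550 typically means the mailbox is unavailable; any other code is treated here as inconclusive.

```python
def split_by_reply(results):
    """Partition (address, SMTP reply code) pairs by what the code reveals."""
    accepted, rejected, inconclusive = [], [], []
    for address, code in results:
        if code == 250:        # recipient accepted
            accepted.append(address)
        elif code == 550:      # mailbox unavailable, e.g. no such user
            rejected.append(address)
        else:                  # temporary errors, policy blocks, and so on
            inconclusive.append(address)
    return accepted, rejected, inconclusive


# Hypothetical recorded replies, invented for this sketch.
sample = [
    ("alice@example.com", 250),
    ("bob@example.com", 550),
    ("carol@example.com", 451),
]
accepted, rejected, inconclusive = split_by_reply(sample)
print("accepted:", accepted)
print("rejected:", rejected)
print("inconclusive:", inconclusive)
```

Every clear rejection tells the sender something about which mailboxes exist, which is why even a meaningless message can be useful to a spammer.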
To be honest, the Commtouch RPD engine we use isn’t catching all of these emails (see the first examples above), and our other techniques, such as the Bayesian filter, also have difficulty intercepting them. These messages are sent in small amounts from many different sources, which makes them quite hard to catch.
Around 50% get through, but we managed to put a small custom filter in place that blocks them all: yes, 100% detection and interception. When we submit enough examples, we see interception by the RPD engine rise to as much as 75%, so Commtouch is catching up.