Filtering SEMalt Referrer Spam

If you have a new website with Google Analytics added, you've probably seen the SEMalt spam bot crawling your website and flooding your Referrers reports. Launching a new website is difficult enough without having to sift through spam in your analytics!

The Google Analytics team have added an option to filter out known robots and spiders from Google Analytics, but it doesn't seem to block everything. What if you'd rather not have the spiders and spambots hitting your site at all, contaminating your "Time On Page" stats and other metrics?

The SEMalt bot seems to be the worst offender right now. At first they were just pinging my site every day from one domain, but now they're visiting from a flood of subdomains, filling my reports. On a relatively new site, 75% of my referrers are currently spam.

So I got fed up, and I added the following to the root Apache .htaccess file on my website:

# Banned websites and crawlers
# Each rule sets the "banned" variable when the Referer is the listed domain
# or any of its subdomains, over http or https.
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*semalt\.com(/|:|$)" banned
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*4webmasters\.org(/|:|$)" banned
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*7makemoneyonline\.com(/|:|$)" banned
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*best-seo-offer\.com(/|:|$)" banned
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*buttons-for-website\.com(/|:|$)" banned
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*buttons-for-your-website\.com(/|:|$)" banned
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*kambasoft\.com(/|:|$)" banned
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*trafficmonetize\.org(/|:|$)" banned
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*trafficmonetizer\.org(/|:|$)" banned
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*webmonetizer\.net(/|:|$)" banned

# Refuse any request flagged above.
# (Apache 2.2 syntax; on Apache 2.4 it needs mod_access_compat.)
Order Deny,Allow
Deny from env=banned

The script above blocks any client claiming to have been referred from semalt.com (or from kambasoft.com and the others, which appear to be related) from accessing your website at all. Of course, it's easy to bypass: the SEMalt folks could just change the referrer on their robots. But since the whole purpose of their spam is to advertise their website and entice you into clicking through to see their services, changing the referrer would have little advantage for them.
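If your host runs Apache 2.4, the Order and Deny directives are only available through the mod_access_compat module. Here is a minimal sketch of the same idea using the newer Require syntax. It is an illustration rather than the configuration I actually run: it assumes mod_setenvif and mod_authz_core are loaded and that your host allows these directives in .htaccess, and it combines just two of the domains into one pattern for brevity.

```apache
# Sketch for Apache 2.4+: flag spam referrers (http or https), then refuse them.
# Only two of the domains are shown; add the others to the alternation as needed.
SetEnvIfNoCase Referer "^https?://([a-z0-9\-]+\.)*(semalt|kambasoft)\.com(/|:|$)" banned

<RequireAll>
    # Allow everyone by default...
    Require all granted
    # ...except requests flagged as banned above
    Require not env banned
</RequireAll>
```

Either version should answer a flagged request with a 403 Forbidden response, so the bot never loads the page and the tracking script never runs for that visit.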

The code above has worked for robots that actually visit my website, and the same technique works on my namesuppressed website to block referrals to my software from pirate software websites.

Update: Alas, many spambots now attack the Google Analytics servers directly and never touch your website. In that case, the code above won't work. To deal with those, try the methods listed at Analytics Edge. If you have downloaded your Apache / IIS server logs, you could also try Web Log Storming on Windows to filter and analyse your historical data.

26 August 2014 // ©2014 Kohan Ikin.