Tuesday 31 May 2016

Ask an SEO: Filtering Referral Spam in Google Analytics by @jennyhalasz

Want to ask Jenny an SEO question for her bi-weekly column? Fill out our form or use #AskAnSEO on social media.

This week, we received a question so big that I decided to dedicate an entire post to it. I’ve also been asked this question several times while giving analytics presentations.

How can I filter out spammy referral traffic to my site? I heard Google started filtering them out but still see them.  –K. Fong

My stock answer when giving presentations is to visit the Google Analytics Solutions Gallery and search for “referrer spam”, “block bots”, or similar. There are some great resources there for people just getting started. But the real answer to this question is a lot bigger, and it has many parts:

  1. Understand what you’re dealing with. It’s not just bots.
  2. Filter wisely: Set up a separate view.
  3. Block Bots in Analytics.
  4. Discover referrers manually.
  5. Create a “bad referrer” filter.
  6. Block bad bots from the website. Carefully.

Part 1: Bots Aren’t Bad, They’re Just Drawn That Way

Not all bots are bad. Many, like Googlebot and Bingbot, make our search world go ‘round. There are plenty of bots belonging to companies like Screaming Frog, Deep Crawl, SpyFu and others that are respectful to the sites they crawl, not dangerous, and not bad for your visitors.

The ones you want to block are the ones that seek to hijack your traffic, find loopholes in your CMS to exploit for hacking, and scrape your content for their own nefarious purposes. Depending on what industry you are in, some forms of bot traffic may be worse than others.

But it’s not only bots you should be worried about. Plenty of referral sources send lots of traffic your way that you may not want muddying the waters of your data.

Part 2: Filter Analytics Traffic Wisely

When you’re just getting started, you should be fully aware of what is being taken out of your data set. To understand that, you have to compare. What I recommend to clients is that you create a separate view in Analytics and name it something like “bot traffic filtered”. To do this, click on “Admin”. Then, in the right-hand column under “View”, click on the drop-down menu and select “Create new view”. On the next screen, be sure to set the time zone to match your existing view; Google defaults to Pacific Time. If you forget this step, you won’t be comparing apples to apples in your new view.

Creating a new view in Google Analytics

Part 3: Block Bots in Analytics

Google gives you an “easy button” for blocking known bots. This will take out 75-80% of your work compared to doing this manually, and it’s regularly updated as Google finds new bots. For your new view only, select the “view settings” option and click the checkbox to “Exclude all hits from known bots and spiders” as shown below:

How to Filter Bots in Google Analytics

This way, you get a very clear picture of what’s going to happen to your traffic once you turn on the bot filtering. You can make sure that none of your important traffic sources are in Google’s known list of bots (they do make mistakes occasionally) and you’ll be able to prepare other people who view your analytics for the change if/when you decide to roll it out to the main profile.

If and when you do decide to roll it out to the main profile, help yourself and everyone else out by adding an annotation that explains the change, for example: “Started Filtering Bot Traffic”. To add an annotation, simply click on the little arrow under any chart in Google Analytics and follow the simple instructions:

Creating an annotation in Google Analytics

Part 4: Add Spam Referrers Manually

No matter how good Google’s bot filtering system gets, there will inevitably be other referrers that send high volumes of low-quality or worthless traffic to your site. To spot these, open the referrer report in Google Analytics as shown below. Then sort the data by bounce rate in descending order, so that referrers with a 100% bounce rate rise to the top. Finally, use the advanced filter to show only referrers with more than a certain number of sessions. The right threshold will vary according to your traffic volume; I used 50 for this example.
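If you export the referral report instead of working in the interface, the same sort-and-threshold step can be sketched in a few lines of Python. This is only an illustration: the rows below are made-up sample data, and the field names and the `suspicious_referrers` helper are hypothetical, not part of any Google Analytics export format.

```python
# Hypothetical rows, as if copied from an exported referral report.
# The sources and numbers are invented for illustration only.
rows = [
    {"source": "news.example.org", "sessions": 420, "bounce_rate": 38.5},
    {"source": "af401e8c.linkbabes.com", "sessions": 130, "bounce_rate": 100.0},
    {"source": "tiny-blog.example.net", "sessions": 12, "bounce_rate": 100.0},
    {"source": "partner.example.com", "sessions": 75, "bounce_rate": 61.0},
]


def suspicious_referrers(rows, min_sessions=50):
    """Keep referrers with more than min_sessions, highest bounce rate first."""
    busy = [r for r in rows if r["sessions"] > min_sessions]
    return sorted(busy, key=lambda r: r["bounce_rate"], reverse=True)


for row in suspicious_referrers(rows):
    print(f'{row["source"]:<28} {row["sessions"]:>5} {row["bounce_rate"]:>6.1f}%')
```

The threshold of 50 mirrors the example in this post; pick whatever number makes sense for your own traffic volume.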

Viewing referral traffic in Google Analytics

Now you can scroll through the list and find sites you may want to add to your list of referrers to exclude. I say “may” because you need to check with other stakeholders in your company to make sure none of these are just a failed advertising attempt. This is another reason why you should test this in a separate view first.

Once you have your list of sites to filter, cut each one down to its root domain (the registered domain name, without any subdomain). For example, af401e8c.linkbabes.com is probably a specific affiliate of linkbabes.com, so it’s better to add just linkbabes.com to your list of potential exclusions.
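Trimming a long list of hostnames by hand gets tedious, so here is a minimal Python sketch of the idea. The helper names are hypothetical, and the approach is deliberately naive: it simply keeps the last two labels of each hostname, which gives the wrong answer for multi-part suffixes such as .co.uk, so check the output by eye.

```python
def root_domain(hostname):
    """Return the last two labels of a hostname, e.g. 'linkbabes.com'.

    Naive on purpose: suffixes like 'co.uk' need special handling.
    """
    labels = hostname.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def unique_root_domains(hostnames):
    """Collapse referrer hostnames to a sorted, de-duplicated list of roots."""
    return sorted({root_domain(h) for h in hostnames})


print(unique_root_domains(["af401e8c.linkbabes.com", "linkbabes.com"]))
```

Both hostnames in the usage line collapse to a single entry, which is the one you would carry forward into your filter.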

By the way, this is not for the faint of heart. You may find some risqué websites in these lists. I strongly recommend you do not visit any of them to “check them out” or you may find yourself the recipient of some unwanted malware or spyware.

Once your list is fully vetted and you’re sure you won’t be blocking any important traffic that someone else in your organization wants to see, go ahead and create a custom referrer filter.

Part 5: Create a Bad Referrer Filter

Once you have a list of bad referrers that you want to block, create a new filter in the view you set up earlier specifically for “bad referrers”. Be sure to do this in the view screen (the one on the far right under admin) and not at the account level!

To set up the filter, select “Admin”, then under “View”, select “Filter”. Click on “Add Filter” and give the filter a name. Now click on “Custom” and “Exclude”. Select “Referral” under “Filter Field” and enter the domains you want to exclude in the box. Do this in a notepad or word doc first and then paste it in; it’s too easy to mess something up using this tiny little box.

Viewing filters in Google Analytics

Create a referral filter

To enter multiple domains, use regular expressions. Use a backslash (“\”) to escape the “.” in “.com” so that it is treated as a literal dot (linkbabes.com becomes linkbabes\.com), and separate multiple domains with a pipe (“|”).

Be sure to test your filter, and update it frequently as you find new domains to exclude.
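One way to build and sanity-check the pattern before pasting it into that tiny box is a short Python sketch like the one below. It assumes the escape character is the backslash and the separator is the pipe; `build_exclude_pattern` is a hypothetical helper, and spam-referrer.example is a made-up domain. Python’s regex engine is not identical to the one Analytics uses, so treat a passing check here as a first pass, not a guarantee.

```python
import re


def build_exclude_pattern(domains):
    """Escape each '.' with a backslash and join the domains with '|'."""
    return "|".join(d.replace(".", r"\.") for d in domains)


# linkbabes.com comes from the example in this post;
# spam-referrer.example is an invented placeholder.
pattern = build_exclude_pattern(["linkbabes.com", "spam-referrer.example"])
print(pattern)

# A subdomain of a listed domain should be caught by the filter...
print(bool(re.search(pattern, "af401e8c.linkbabes.com")))
# ...while an unrelated referrer should be left alone.
print(bool(re.search(pattern, "www.google.com")))
```

Running your real referrer list through a check like this makes it easier to spot a domain that is matched by accident before the filter starts throwing data away.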

Part 6: Block Bad Bots from Your Website

This last one is not for the beginner, because it involves editing .htaccess (on Apache) or web.config (on IIS), a file that acts as the backbone of your entire site. One wrong character can bring your entire site down. So make a backup copy, make sure you have direct access to your server (access through WordPress doesn’t count), and tread lightly and carefully.

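Disclaimer aside, the .htaccess file is a powerful tool at your disposal, because for very bad or very high volume bot traffic, you can block it from accessing your server entirely. The directives to use look like this (Apache 2.2-style access control; replace the example IP address with the one you want to block):

Options +FollowSymlinks
RewriteEngine On
# Let everyone in, then deny the listed address
Order Allow,Deny
Allow from all
Deny from 123.45.67.89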

You will have to integrate this code into your existing .htaccess file, so don’t just copy/paste it. Remember, one wrong character, and it’s lights out.
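Since a single typo is so costly here, it can help to validate each IP address before it goes anywhere near the file. The sketch below is one hedged way to do that in Python: `deny_lines` is a hypothetical helper, and it emits the same Apache 2.2-style “Deny from” lines used in this post. It only generates text for you to review and merge by hand; it does not touch your .htaccess file.

```python
import ipaddress


def deny_lines(addresses):
    """Return one 'Deny from' line per address.

    Raises ValueError if any entry is not a valid IP address,
    so a typo is caught before it reaches the server config.
    """
    lines = []
    for raw in addresses:
        ip = ipaddress.ip_address(raw.strip())  # ValueError on a malformed address
        lines.append(f"Deny from {ip}")
    return lines


print("\n".join(deny_lines(["123.45.67.89"])))
```

If your server runs Apache 2.4 without the compatibility module, access control is written with “Require” directives instead, so check with your host before using lines in this form.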

This is an effective way to block bot traffic that is placing a high load on your server, but it shouldn’t be used for just anyone, because the longer this list gets, the more load it puts on your server, and the more it can actually slow your site down. So don’t use it to block former employees (go ahead and laugh, this actually happened to me!) and remember that IP addresses change. If you’re having a serious security issue, contact your web host or system administrator for help.

The effect of blocking bad bot traffic at the server level is twofold. It will help reduce load on your server, and it will also take these visits out of Analytics, because a blocked request never loads your pages, so your tracking code never runs.

TL;DR?

  • Bots and referrers are different, but have the same effects: slowing down your server and muddying your analytics data.
  • You can block them by IP address or by domain, depending on what blocking solution you choose.
  • You can block them in .htaccess or web.config, or you can filter out their traffic in Analytics, either with Google’s tools or with a custom filter.
  • Be careful about what you filter and make sure other stakeholders know what you’re up to. Don’t filter at the account level; you always want one view that has all traffic just in case.
  • Annotate, label, and inform as much as possible about changes you make and the dates that you make them.

That’s all for this week. Keep the questions coming; we’ll do another rapid-fire Q&A next time.

 

Image Credits

Featured Image: Image by Paulo Bobita
All screenshots by Jenny Halasz. Taken May 2016.

