Don’t Let Spam and Bots Infect Google Analytics:
One of the main frustrations I have with Google Analytics is how to keep my data clean of spam and bot traffic. Web analytics is critical to conversion rate optimisation. You need reliable data to measure the performance of your digital marketing activity. The last thing you need is data in Google Analytics that is not accurate.
However, Incapsula estimate that up to 62% of website traffic is made up of automated bots. Spammers are constantly adapting their methods to avoid common strategies of dealing with the fake traffic they generate. It can be difficult to keep on top of the problem they create.
Don’t worry though, there are some proven methods to rid your Google Analytics data of spam and bots.
Types of Spam:
There are two types of spam in Google Analytics; ghosts and crawlers.
Ghosts don’t even access your site, but they make up a majority of fake traffic tracked by Google Analytics. This is important to understand because it explains why such traffic won’t appear in Google’s Search Console and why server-side solutions like WordPress plugins won’t prevent ghosts.
Spammers use the Measurement Protocol which allows developers to send data directly to Google Analytics’ servers. By using a randomly generated Google Analytics tracking code ID the spammers leave a “visit” with fake data without even knowing which site they are hitting.
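To see why ghost hits never need to touch your site, it helps to look at what a Measurement Protocol hit is made of. The sketch below only builds the URL-encoded body of a Universal Analytics pageview hit and sends nothing. The function name, the property ID “UA-000000-1” and the “.example” values are placeholders chosen for illustration. The point is that the hostname, referrer and language are just text supplied by the sender.

```python
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol endpoint (shown for reference only;
# this sketch never sends a request).
ENDPOINT = "https://www.google-analytics.com/collect"


def build_pageview_payload(tracking_id, client_id, hostname, page, referrer, language):
    """Return the URL-encoded body of a single pageview hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. UA-XXXXX-Y
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname, supplied by the sender
        "dp": page,          # document path
        "dr": referrer,      # document referrer, supplied by the sender
        "ul": language,      # user language, supplied by the sender
    }
    return urlencode(params)


# Placeholder values: none of these refer to a real property or site.
payload = build_pageview_payload(
    tracking_id="UA-000000-1",
    client_id="555",
    hostname="not-your-site.example",
    page="/",
    referrer="http://spam-referrer.example/",
    language="en-gb",
)
print(payload)
```

Because every field is free text, the hostname, referrer and language dimensions are the places where ghost spam shows up, and they are the dimensions the filters later in this article work on.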
Unlike ghost spam, crawlers do access your site and they ignore rules like those in robots.txt files that are designed to prevent them from reading your site. This means when they leave your site they create a record that mimics a real visitor.
Crawlers are more difficult to identify as they target known sites rather than random GA IDs. However, new crawlers are less common and so if you notice a suspicious looking referral in your analytics, checking it out on Google is relatively easy.
What does this tell us?
- Don’t try to handle spam individually because it’s time consuming and so inefficient.
- Server-side solutions (e.g. WordPress plugins or .htaccess) won’t prevent ghost spam as this type of spam never touches your site.
- Don’t get concerned about spam being detrimental to your SEO as Google Analytics is not used for search rankings.
So, what can you do to stop ghost visits and crawlers making your Google Analytics data unreliable and undermining your performance monitoring? Well, below I outline how you can use filters and segments to stop spammers in their tracks.
Google Analytics Views:
However, before we set up any filters or segments it is essential that you have at least three views set up for your Google Analytics account. When you set up filters in GA you will permanently block traffic from your account and so it’s important to use a “Test” view to initially check your new filters are working correctly before implementing them on your main view (e.g. “All traffic with filters”).
You should also have an “Unfiltered” view so that you have sight of all traffic, whether internal or fake, so that you can monitor the total impact of all the filters you use.
Your “test view” should be identical to your “All traffic with filters” view apart from when you are testing a new change to the views, such as a filter or some other setting. To create a test view go to your main view (e.g. All traffic with filters), click on “Admin” and then click on “View Settings” in the far right column. Within “View Settings” click “Copy View” and then give the new view a name and click “Copy view” to complete the process.
Let Google Block Bots:
Before creating any new filters or segments to prevent spam from hitting your Google Analytics account make sure you use GA’s own bot filtering setting. Go to the “Admin” area and in the third column from the left you will see “View Settings”. Simply click on the checkbox for “Bot Filtering – Exclude all hits from known bots and spiders”. This will remove up to 80% of bots and spiders and it’s updated regularly as Google becomes aware of new bots.
Dealing with Hostname Ghost Traffic:
If like many sites you have traffic coming from a hostname that you don’t recognise you may have a ghost visitor problem. To check if this is the case go to “Audience” – “Technology” – “Network” and click on “Hostname” as shown below.
This can be caused by one of the following issues:
- A spammer maliciously using your property ID to send fake traffic data to your Google Analytics account.
- A test server sending data to the same Google Analytics property.
To prevent such data inflating your traffic numbers you will need to set up two filters. Firstly a filter to set the value of the hostname to a custom variable and secondly an include filter for your real hostname to block incorrect hostnames.
Hostname Identifier Field:
Go to Google Analytics “Admin” and in the far left column you will see “Filters”.
- Click on “Create new” and give it a name such as “Hostname ID Field” or something that informs all users what it is.
- Select the filter type “Custom”.
- Select the option “Advanced”
- Field A – Select “Hostname” and enter “(.*)” without the speech marks.
- Field B – Leave blank
- Field “Output To – Constructor” select “Custom Field 1” and enter “$A1” as the value.
- “Field A Required” and “Override Output” should both be checked and the other two boxes should be left unchecked.
Include Valid Hostname Filter:
- Set up another new filter and name it “Include Valid Hostname” or something similar.
- Select filter type “Custom”.
- Select filter option “Include”.
- In “Filter field” drop down menu select “Custom Field 1”
- In “Filter Pattern” enter the regex for the hostname or hostnames used in your GA profile. To escape any metacharacters you will need to place a backslash “\” before each full stop “.” or hyphen “-”. For example, our website address would look like this: “(www\.)?conversion\-uplift\.co\.uk”. Use a vertical bar “|” to separate each individual hostname that you want to include. By placing brackets “()” around “www\.” and a question mark after them, GA will accept our address with or without the “www” prefix. Our full expression is:
- Now you can save the filter. Over the next five to seven days compare your hostname data from your test view with your normal filtered view to check that the filters are working as expected.
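Before saving, you can sanity check a hostname pattern outside GA. The Python sketch below imitates the two filters: the first copies the whole hostname into a stand-in for “Custom Field 1” using “(.*)”, and the second keeps a hit only if that field matches the include pattern. It assumes GA treats the pattern as a partial match, which is why re.search is used. The function names are invented for this illustration and the sample hostnames are made up.

```python
import re

# Filter 1: capture the entire hostname, as "(.*)" with "$A1" does.
HOSTNAME_TO_FIELD = re.compile(r"(.*)")

# Filter 2: include pattern with an optional "www." prefix.
VALID_HOSTNAME = re.compile(r"(www\.)?conversion\-uplift\.co\.uk")


def copy_hostname(hostname):
    """Imitate the advanced filter: return the value written to the custom field."""
    return HOSTNAME_TO_FIELD.match(hostname).group(1)


def passes_include_filter(hostname):
    """Imitate the include filter applied to the copied hostname."""
    custom_field_1 = copy_hostname(hostname)
    return VALID_HOSTNAME.search(custom_field_1) is not None


# Made-up sample hostnames for illustration.
for host in ["www.conversion-uplift.co.uk", "conversion-uplift.co.uk",
             "(not set)", "spam-host.example"]:
    print(host, passes_include_filter(host))
```

If you want to reject hostnames that merely contain your domain, one option is to anchor the pattern with “^” at the start and “$” at the end.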
Language spam often appears in your language report as messages that spammers send to get your attention. Once Google Analytics records language spam it can’t be permanently removed from your reports and so it requires a two pronged approach to prevent language spam from inflating your traffic numbers.
Firstly you can block language spam coming into your reports by setting up a filter. This is a permanent change though and so it should be tried out initially on your test view. Secondly you can apply an advanced segment to your reports to remove language spam from your historical data.
If you have just a few websites you can use the manual method outlined below. If you manage many different sites though you may want to consider an automated solution such as this anti-spam filter tool. Such tools can block referrer spam, language spam, events spam, etc, from hundreds or even thousands of Google Analytics views.
Block Language Spam Coming Into Your Reports:
To create any new filters in Google Analytics you will need “Edit” access at the account level.
This is a simple filter that will block any visitors where the language dimension contains 12 or more characters. Most legitimate language settings will contain 5 to 6 characters and sometimes 8 to 9 characters. This means that it should only block language spam.
There are also symbols which are not valid for the language dimension, but which are used to create a domain name. The filter will exclude such symbols as well. The expression that we use looks like this:
To create the filter go to your Admin area and then select “Filters” in the third column from the left and click on “Add Filter”. Give the filter a suitable name, select filter type as “Custom” and “Exclude”. You will then need to select “Language Settings” from a drop down menu and paste the filter expression into the “Filter Pattern” input box.
You can then use the “Verify Filter” option to see how it would have affected your data for the last few days. Sometimes GA can’t verify the filter because the number of cases is too low to register a significant change. However, if this does occur you should still use your test view to see if the filter does prevent language spam, even if the numbers are relatively low.
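A pattern of this kind can also be tried out locally before you rely on “Verify Filter”. The sketch below is an assumption built from the two rules described above (12 or more characters, or symbols that belong in a domain name rather than a language code). It is not necessarily the exact expression used in this article, and the sample values are made up.

```python
import re

# Hypothetical pattern: 12+ characters, OR any symbol typical of a URL or message.
LANGUAGE_SPAM = re.compile(r".{12,}|[.,:/!]")


def is_language_spam(value):
    """Return True if a language value looks like spam under the rules above."""
    return LANGUAGE_SPAM.search(value) is not None


# Made-up sample values for illustration.
samples = ["en-gb", "en-us", "zh-hans-cn", "(not set)",
           "Vote for us at spam-site.example!", "a.co"]
for value in samples:
    print(repr(value), is_language_spam(value))
```

Checking a handful of real values from your own language report this way is a quick guard against excluding legitimate visitors by accident.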
Exclude Historical Language Spam Data:
Filters are unable to block out visits that have already hit Google Analytics and so to clean up your historical data you will need to create a custom segment. Click on “+ Add Segment”. Go to the “Language” dimension, select “does not match regex” and paste the expression into the adjacent input box. You should then save the segment and use it to remove language spam from your reports.
Another common type of spam can often be seen by looking at the referrer report in Google Analytics. You can find this report by selecting “Acquisition” and “Referrals”. Now sort the table by descending bounce rate so that you bring up all the referrers with a 100% bounce rate to the top of the page.
You can then use the “Advanced Filter” to only show those referrers with a minimum threshold of sessions. We used 10 here, but you may want to use a much higher threshold depending upon how much traffic your site attracts. You can now browse through the table and decide which sites you want to add to a referral exclusion list.
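If you export the referral report, the same triage can be done in a few lines of Python. This is only a sketch: the row layout, the function name and every figure below are invented for illustration, and the thresholds mirror the ones described above (at least 10 sessions and a 100% bounce rate).

```python
def suspicious_referrers(rows, min_sessions=10, min_bounce_rate=100.0):
    """Return referrers worth checking by hand, busiest first."""
    candidates = [
        row for row in rows
        if row["sessions"] >= min_sessions and row["bounce_rate"] >= min_bounce_rate
    ]
    return sorted(candidates, key=lambda row: row["sessions"], reverse=True)


# Entirely made-up rows standing in for an exported referral report.
report = [
    {"source": "partner-blog.example", "sessions": 120, "bounce_rate": 42.5},
    {"source": "seo-offers.example", "sessions": 35, "bounce_rate": 100.0},
    {"source": "free-traffic.example", "sessions": 80, "bounce_rate": 100.0},
    {"source": "tiny-spam.example", "sessions": 3, "bounce_rate": 100.0},
]

shortlist = [row["source"] for row in suspicious_referrers(report)]
print(shortlist)
```

The output is a shortlist for manual review, not a verdict, as a genuine site can also show a high bounce rate.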
Check out any suspicious referrers using Google to ensure they are not genuine blog sites, affiliates etc that are sending quality traffic to your site. You can then build a potential referral exclusion list and create a new filter for spam referrers. Go to “Admin” and in the third column from the left select “Filters”. Don’t select “Filters” in the first column as this would set up an account wide filter.
Now select “Add Filter” and enter a suitable name such as “Bad referrers” and select “Custom” and “Exclude”. Now select “Campaign Source” from the drop down for your “Filter Field” and enter the domains you want to exclude in the box. Here is an example of the type of expression you need to enter:
If you are happy with the filter set up you now click on the “Save” button. This filter will permanently change your GA data and so check how it affects your data by first creating it in your test view.
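To avoid typing mistakes when a referral exclusion list grows, you could generate the filter pattern from a plain list of domains. The sketch below escapes each domain and joins them with a vertical bar. The domains are placeholders, not real spam sources, and the helper names are invented for this illustration.

```python
import re


def build_exclude_pattern(domains):
    """Escape each domain and join with "|" to form one filter pattern."""
    return "|".join(re.escape(domain) for domain in domains)


# Placeholder domains for illustration only.
BAD_REFERRERS = ["seo-offers.example", "free-traffic.example", "cheap-clicks.example"]

pattern = build_exclude_pattern(BAD_REFERRERS)


def is_excluded(source):
    """Return True if a campaign source would be caught by the pattern."""
    return re.search(pattern, source) is not None


print(pattern)
print(is_excluded("seo-offers.example"))    # a listed domain
print(is_excluded("partner-blog.example"))  # a genuine referrer
```

Escaping matters because an unescaped full stop matches any character, which could exclude more sources than you intended.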
Ghost spam and crawlers if left unchecked can undermine the reliability of your web analytics. Make sure you check the box to allow GA to block known bots. However, this will never protect you from all bots and crawlers. This means you will also need to create appropriate filters to stop spam hitting your web analytics and use segments to deal with historical spam.
It’s also important that you check the referrals report on a regular basis to see if any new suspicious sites are sending low quality traffic to your website. If you follow the procedures outlined above your Google Analytics views are likely to be largely free of spam and you will be able to use GA confident that your data is not overly inflated by ghosts or crawlers.
If you want help with configuring Google Analytics and analysing your data why not contact a conversion rate optimisation consultant who can complete the process for you and make sure you are getting value from web analytics.