Cleaning Data and Finding Data with Google Analytics [cxl review week 7]

Adriana Belei
Aug 16, 2020

When using Google Analytics, it is very difficult to tell whether you are analyzing the data correctly. There are factors you need to take into account to make sure your data is as accurate as possible. You achieve this by removing data and traffic sources that you don't need or that don't reflect your customers' real behavior.

Good data has a story to tell, but great data tells a story. That is why you must ensure your data is clean and set up correctly. This will make your story easier to read.

We will first look at how you can clean your data in Google Analytics, and then at how you can set up filters and segments to find more answers about your audience.

Data cleaning in Google Analytics

Filtering out spam

Although GA is aware of some common spam sources, there are still sources it cannot recognize, and those need to be filtered out manually. You want to remove spam because this bad data hides the story your analytics is trying to tell. The cleaner the data, the clearer that story. Some spam sources are:

  • Referral sources. Check which websites are referring traffic to your own website. There you might see links that do not make sense or are not useful at all. You want to remove these because they are not real traffic but were created by a bot.
  • Language. This is another area where you can see entries injected through the Measurement Protocol. With this technique it is possible to inject links to articles that are not relevant.

To solve these common issues you can:

  • Enable the bot filtering option in the view settings, which will capture the first wave of spam.
  • Create a filter that excludes known unreliable sources of traffic.
  • Set up a custom dimension and build a filter based on that dimension.
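
To make the second option more concrete, here is a minimal Python sketch of what an exclude filter does with a regular expression. It is not GA code: the hit format and the spam domains (`free-traffic.example`, `seo-offers.example`) are made up for illustration, and in GA itself you would enter the pattern in the filter settings instead.

```python
import re

# Hypothetical spam referrers; replace them with the ones you see in your own reports.
SPAM_REFERRER_PATTERN = re.compile(
    r"(^|\.)(free-traffic\.example|seo-offers\.example)$", re.IGNORECASE
)


def exclude_spam(hits):
    """Return only the hits whose referrer hostname does not match the pattern."""
    return [hit for hit in hits if not SPAM_REFERRER_PATTERN.search(hit["referrer"])]


# Made-up hits, only to show the effect of the filter.
hits = [
    {"referrer": "google.com", "page": "/"},
    {"referrer": "free-traffic.example", "page": "/"},
    {"referrer": "news.seo-offers.example", "page": "/pricing"},
    {"referrer": "facebook.com", "page": "/cart"},
]

clean_hits = exclude_spam(hits)
print([hit["referrer"] for hit in clean_hits])
```

The pattern matches a spam domain and any of its subdomains, but not a longer hostname that merely ends with the same letters, which is the kind of detail worth testing before applying a filter to a real view.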

Removing internal hits

This type of traffic is not spammy or generated by bots; it is traffic coming from your own team. It is not bad per se, but it does not give a transparent picture of who your visitors are. If you are not aware of it, you might be making bad marketing decisions based on unreliable data. Since GA counts every event as a hit, and those hits affect the data you see, you must ensure your team's activity is not included. You can remove it in the following ways, working in test views:

  • The easiest option is to use the Google Analytics Opt-out Add-on, a Chrome extension. This is the simplest way to remove internal traffic. It is perfect when you have a small team, because it turns off tracking for your internal traffic.
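
For larger teams, a common alternative is an exclude filter based on the office IP address. The sketch below only illustrates that idea in Python and is an assumption on my side, not something covered above: the networks are documentation ranges, and the hit format is invented.

```python
import ipaddress

# Hypothetical office networks (reserved documentation ranges, not real ones).
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_internal(ip):
    """Return True when the IP address belongs to one of the internal networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTERNAL_NETWORKS)


def exclude_internal(hits):
    """Keep only the hits that did not come from the team's own networks."""
    return [hit for hit in hits if not is_internal(hit["ip"])]


# Made-up hits: two from the team, one from a real visitor.
hits = [
    {"ip": "203.0.113.25", "page": "/"},
    {"ip": "198.51.100.7", "page": "/cart"},
    {"ip": "192.0.2.10", "page": "/pricing"},
]

visitor_hits = exclude_internal(hits)
print(visitor_hits)
```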

Cross Domain Tracking

Data cleaning becomes more complex when your customer journey runs across different domains and you need to attribute the traffic source accurately.

  • Traffic attribution: GA stores information against a client ID that is assigned to each user. Users have sessions, and GA can recognize their repeat sessions once their ID has been stored. For instance, a user visits your site from an ad they clicked on Facebook and buys your product. This is what would happen in an ideal world in which your sales funnel is set up perfectly: you know that your traffic source is Facebook.

The difficulty comes when these steps happen across different domains, because analyzing the traffic source becomes more challenging. For instance, someone comes to your website through an ad and arrives at the cart page. Imagine that the cart is built on a subdomain, or as a landing page on ClickFunnels. How do you track that without losing the initial source of traffic?

These end up being cross-domain problems. GA treats one session as having one traffic source, so when a visit spans two domains it is not clear where the traffic originally came from. This needs to be specified in Google Analytics so that it can track the actual source of the traffic. You need to tell GA how traffic is getting to the website, and that you want it to connect the two traffic sources as if they were one.

To diagnose this, you can use the Google Analytics Debugger (a Chrome extension), which lets you see a breakdown of each hit in the console. Each hit shows the client ID, and regardless of the domain you are on, the client ID should remain the same, because all the information is tracked by client ID. If the client ID changes, you lose the session and its attribution, and that is when you need to set up cross-domain tracking.
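
The effect of a changing client ID can be shown with a very simplified Python model. This is not how GA processes hits internally; the hit fields, the domains (`shop.example`, `cart.example`) and the rule "the source is taken from the first hit of each client ID" are assumptions made for the example.

```python
def source_per_client(hits):
    """Map each client ID to the traffic source of its first hit."""
    sources = {}
    for hit in hits:
        sources.setdefault(hit["client_id"], hit["source"])
    return sources


def purchase_sources(hits):
    """Return the attributed source for every purchase (a hit on /thank-you)."""
    sources = source_per_client(hits)
    return [sources[hit["client_id"]] for hit in hits if hit["page"] == "/thank-you"]


# Case 1: the client ID survives the jump from the shop to the cart domain.
same_id = [
    {"client_id": "111.222", "host": "shop.example", "page": "/product", "source": "facebook"},
    {"client_id": "111.222", "host": "cart.example", "page": "/thank-you", "source": "shop.example"},
]

# Case 2: the cart domain assigns a new client ID, so the visit looks like a new user.
changed_id = [
    {"client_id": "111.222", "host": "shop.example", "page": "/product", "source": "facebook"},
    {"client_id": "333.444", "host": "cart.example", "page": "/thank-you", "source": "shop.example"},
]

print(purchase_sources(same_id))     # purchase credited to the original ad source
print(purchase_sources(changed_id))  # purchase credited to your own shop domain
```

In the second case the sale appears to come from your own website, which is exactly the lost attribution described above.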

For instance, you need to pass the client ID to the cart.com page. This is called decorating the link. If we do not do this, Google Analytics is going to create a new client ID and will not identify the traffic source correctly. The same principles apply to subdomains.
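
As a rough illustration of what decorating a link means, the Python sketch below appends the client ID to the URL of the cart.com page and reads it back on the other side. The parameter name `client_id` and both helper functions are hypothetical; GA's own linker uses its own parameter and format, so treat this only as a picture of the idea.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

CLIENT_ID_PARAM = "client_id"  # hypothetical parameter name, not the one GA uses


def decorate_link(url, client_id):
    """Return the URL with the client ID added as a query parameter."""
    parts = urlparse(url)
    # Keep the existing parameters, but drop any stale client ID first.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != CLIENT_ID_PARAM
    ]
    query.append((CLIENT_ID_PARAM, client_id))
    return urlunparse(parts._replace(query=urlencode(query)))


def read_client_id(url):
    """Read the client ID back from a decorated URL, or None if it is missing."""
    return dict(parse_qsl(urlparse(url).query)).get(CLIENT_ID_PARAM)


link = decorate_link("https://cart.com/checkout?plan=basic", "111.222")
print(link)
print(read_client_id(link))
```

The destination page can then reuse the same ID instead of creating a new one, which keeps the session and its source together.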

Under Tracking Info you have the domains that you are telling GA to ignore or to identify as your main domains. You can set these up without setting up cross-domain tracking, but that is not as effective as the first option.

Finding answers

Once your data is clean, you want to start analyzing it and drawing conclusions from it. Three common methods to understand your visitors as they go through their customer journey are funnel tracking, goal flow and segmentation.

Funnel Tracking vs Goal Flow

To track your customer's journey, you can set up a funnel with destination pages and see how users behave. To set one up, go into Goals, and in the goal details define the steps of the funnel. Funnels are very limited when it comes to actions that happened in the past: a funnel only collects information from the moment it is defined onwards, so it is not possible to analyze earlier data.
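
To see what a funnel with destination pages measures, here is a small Python sketch that counts how many sessions reach each step in order. It is a strict, simplified funnel and not GA's own logic; the page paths and sessions are invented for the example.

```python
# Hypothetical destination pages that define the funnel steps, in order.
FUNNEL_STEPS = ["/product", "/cart", "/checkout", "/thank-you"]


def steps_reached(pages, steps=FUNNEL_STEPS):
    """Count how many funnel steps a session completed, in order."""
    reached = 0
    for page in pages:
        if reached < len(steps) and page == steps[reached]:
            reached += 1
    return reached


def funnel_counts(sessions, steps=FUNNEL_STEPS):
    """Return, for each step, the number of sessions that got at least that far."""
    counts = [0] * len(steps)
    for pages in sessions:
        for index in range(steps_reached(pages, steps)):
            counts[index] += 1
    return dict(zip(steps, counts))


# Made-up sessions, each a list of the pages viewed in order.
sessions = [
    ["/", "/product", "/cart", "/checkout", "/thank-you"],
    ["/product", "/cart"],
    ["/blog", "/product"],
    ["/cart", "/checkout"],  # never viewed /product, so it matches no step here
]

print(funnel_counts(sessions))
```

Reading the counts from one step to the next shows where visitors drop out of the journey.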

As a solution, you can use the Goal Flow report in GA to explore where traffic is coming from and what path it followed until it left the website. This gives you an in-depth analysis of your users' traffic as they go through the customer journey.

Segmentation

This process allows you to separate a segment of your visitors and analyze only that data. A segment is similar to a filter: it is a temporary filter that you apply within a report. You can create segments to analyze and compare specific information, and you can remove a segment without affecting the overall data available.
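
The difference between a segment and a permanent filter can be pictured with a few lines of Python. The session fields (`device`, `converted`) and the numbers are made up; the point is only that the segment returns a subset and leaves the original data untouched.

```python
def segment(sessions, condition):
    """Return the sessions that match the condition, without changing the input."""
    return [session for session in sessions if condition(session)]


def conversion_rate(sessions):
    """Share of sessions that converted, or 0.0 when there are no sessions."""
    if not sessions:
        return 0.0
    return sum(1 for session in sessions if session["converted"]) / len(sessions)


# Made-up sessions for the example.
sessions = [
    {"device": "mobile", "converted": True},
    {"device": "mobile", "converted": False},
    {"device": "desktop", "converted": True},
    {"device": "desktop", "converted": True},
    {"device": "desktop", "converted": False},
]

mobile = segment(sessions, lambda session: session["device"] == "mobile")

print(conversion_rate(mobile))    # conversion rate of the mobile segment only
print(conversion_rate(sessions))  # conversion rate of all sessions
print(len(sessions))              # the full data set is still complete
```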

*This article is part of the course Growth Marketing prepared by CXL.

Adriana Belei

Growth Marketing Specialist. In charge of business development at p.Xel Digital Agency (www.p-xel.co)