How to handle over 1 million unique pages per day? Well, you can’t
Welcome to my first web analytics post. First, a little background: my name is Aart Nicolai and I have been working in the internet industry for the past 13 years, the last 6 of them at funda.nl, Holland’s biggest property portal.
At funda I’m responsible for Web Analytics (WA) and Business Intelligence. Besides that, I’m a project manager, which means I try to get new features live on spec and on time.
About half a year ago our Omniture contract was ending and we decided to move on with Google Analytics (GA) for several good ($$$) reasons. During the transition and implementation we learned a lot about using GA on high-traffic commercial websites. That is why I decided to start sharing my experience with the rest of the GA world, particularly with those who are thinking of moving over to GA but are a bit afraid.
Our previous WA tool was configured to track every unique page on our website. This means every single property search results list and property detail page was logged separately, which resulted in a huge amount of data. The idea was to slice and dice the data later on, possibly with the help of third-party tools and the API. This never happened.
When we started running the GA tracking code alongside the Omniture code, we noticed a large number of so-called “Other” pages in the Content section. We assumed GA was still processing our data, but the “Other” number kept growing. Eventually we found out that GA has a limit of 50,000 unique pages per day; all pages beyond that are grouped under “Other”. Although you can’t see which pages these are, they are still counted as pageviews, visits and visitors.
Tag your pages
After a few days we concluded that we don’t actually need such detailed data, for instance statistics on individual property detail pages or on detailed search queries. We only need high-level page statistics, such as the total number of searches for properties for sale or the total number of photo pages for properties for sale. This resulted in an Excel document of ~300 different page names.
A few examples:
The search results page for houses for sale in Amsterdam between €450,000 and €550,000 has the URL http://www.funda.nl/koop/amsterdam/450000-550000/.
In GA we track this as pageTracker._trackPageview("koop/resultaatlijst");. Broken down, this means:
“koop”: property for sale
“resultaatlijst”: results list
- or -
A property detail page of a house in Amsterdam has the URL http://www.funda.nl/koop/amsterdam/appartement-29461527-eerste-helmersstraat-79-iii/. In GA we track this page as pageTracker._trackPageview("koop/nvm/object-overzicht");, which means:
“koop”: property for sale
“nvm”: type of broker
“object-overzicht”: property detail page
With this structure, the website became much easier to understand in GA.
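To illustrate how thousands of concrete URLs collapse into a handful of page names, here is a minimal sketch that derives the tag from the URL path using a small rule table. The rules, the `PAGE_RULES` table and the `resolvePageName` function are hypothetical, not funda’s actual implementation. Note that the broker type (“nvm”) is not part of the URL, so in this sketch the caller has to supply it.

```javascript
// Hypothetical rules: each maps a URL path pattern to a generic page name.
// Rules are tried in order; the first match wins.
const PAGE_RULES = [
  // e.g. /koop/amsterdam/appartement-29461527-eerste-helmersstraat-79-iii/
  { pattern: /^\/koop\/[^/]+\/[a-z]+-\d+-[^/]+\/$/, pageName: "koop/{broker}/object-overzicht" },
  // e.g. /koop/amsterdam/ or /koop/amsterdam/450000-550000/
  { pattern: /^\/koop\/[^/]+\/(\d+-\d+\/)?$/, pageName: "koop/resultaatlijst" },
];

// Collapse a concrete URL into one of a small set of page names.
// brokerType (e.g. "nvm") cannot be read from the URL, so the caller passes it.
// Returns null when no rule matches, so the caller can fall back to something else.
function resolvePageName(url, brokerType) {
  const path = new URL(url).pathname;
  for (const rule of PAGE_RULES) {
    if (rule.pattern.test(path)) {
      return rule.pageName.replace("{broker}", brokerType);
    }
  }
  return null;
}
```

For the two example URLs above this yields "koop/resultaatlijst" and, with brokerType "nvm", "koop/nvm/object-overzicht". Every price range and every individual property ends up under the same name, which keeps the number of unique pages far below the daily limit.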
Forgot to tag a page?
If we forget to tag a page, it is automatically tagged with a special prefix followed by the URL of that particular page, for example “/niet_gecategoriseerd_http://www.funda.nl/fout/ObjectNotFound.aspx?zoekurl=~/koop/”. The prefix means something like “not categorised”. Every once in a while we go over these “/niet_gecategoriseerd_” pages and fix them, so that after a while all pages are tagged correctly.
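A minimal sketch of that fallback, assuming the page’s tag (if any) is available as a string when the tracking code runs. The function names `pageNameOrFallback` and `findUntagged` are hypothetical; only the “/niet_gecategoriseerd_” prefix comes from our setup.

```javascript
const UNCATEGORISED_PREFIX = "/niet_gecategoriseerd_";

// Use the page's own tag when it has one; otherwise fall back to the
// "not categorised" prefix followed by the full URL, so the page shows up
// in the content reports and can be tagged properly later.
function pageNameOrFallback(pageName, url) {
  if (typeof pageName === "string" && pageName.trim() !== "") {
    return pageName;
  }
  return UNCATEGORISED_PREFIX + url;
}

// Pick out the fallback entries from a list of tracked page names
// (e.g. an export of the content report) so they can be fixed.
function findUntagged(pageNames) {
  return pageNames.filter((name) => name.startsWith(UNCATEGORISED_PREFIX));
}
```

In the page itself this would be called along the lines of pageTracker._trackPageview(pageNameOrFallback(pageName, window.location.href)); where pageName is whatever tag the page template provides.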
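Watch out for typos and case sensitivity! GA treats “koop/resultaatlijst” and “Koop/resultaatlijst” as two different pages, and every typo creates yet another page.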
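One way to catch these mistakes before they reach GA is to check every tag against the list from the tagging document. The sketch below assumes that list is available in the page as a set; `KNOWN_PAGE_NAMES` (shown with only two of the ~300 names) and `checkPageName` are hypothetical.

```javascript
// Hypothetical excerpt of the ~300 page names from the tagging document.
const KNOWN_PAGE_NAMES = new Set([
  "koop/resultaatlijst",
  "koop/nvm/object-overzicht",
]);

// Lower-case lookup so a case slip can be told apart from an unknown name.
const BY_LOWER_CASE = new Map(
  [...KNOWN_PAGE_NAMES].map((name) => [name.toLowerCase(), name])
);

// Classify a page name before it is sent to GA:
//   "ok"      exact match with the tagging document
//   "case"    only the letter case differs (GA would count a separate page)
//   "unknown" not in the document at all (likely a typo or a missing row)
function checkPageName(name) {
  if (KNOWN_PAGE_NAMES.has(name)) return { status: "ok", name };
  const canonical = BY_LOWER_CASE.get(name.toLowerCase());
  if (canonical !== undefined) return { status: "case", name: canonical };
  return { status: "unknown", name };
}
```

A "case" result can be corrected automatically with the returned canonical name; an "unknown" result is better sent through the “not categorised” fallback so it turns up in the reports.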
What’s the downside?
The Site Overlay report no longer works, because the tracked page names no longer match the site’s real URLs. We never used this feature, so for us this was not a problem.
Tagging document
If you want to use this method too, I have attached an example of the Excel document I used to generate the page names. Just define a section and a page name, and the GA tag appears in the GA tag column. If you like, you can extend this to generate the full GA JavaScript tag.
Download: Google Analytics tagging document | GA_tagging.xls
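If you would rather generate the tags in code than in Excel, the same idea fits in a few lines. This sketch assumes the tag is simply the section and the page name joined by a slash, which matches the examples above (section “koop/nvm”, page name “object-overzicht”); the function names are hypothetical.

```javascript
// Mirror of the spreadsheet columns: section + page name -> GA tag.
function gaTag(section, pageName) {
  return `${section}/${pageName}`;
}

// Extend the tag to the full ga.js tracking call.
// JSON.stringify yields a correctly quoted and escaped JavaScript string literal.
function gaTrackingCall(section, pageName) {
  return `pageTracker._trackPageview(${JSON.stringify(gaTag(section, pageName))});`;
}

// Generate the calls for all rows of the tagging document at once.
function gaTrackingCalls(rows) {
  return rows.map((row) => gaTrackingCall(row.section, row.pageName));
}
```

For example, gaTrackingCall("koop", "resultaatlijst") produces pageTracker._trackPageview("koop/resultaatlijst"); with straight quotes, ready to paste into a page template.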


