Miscellaneous Candy Jar post #2

Here are ten bite-size analytics questions that we culled from various WebTrends reports for the Outsider site, derived from on-site search terms, off-site search terms, the occasional email, and general browsing patterns.

It’s time for another candy jar post!

You know the gag where the kid gets his hand stuck in the candy jar because he won’t let go of a fistful of goodies? That’s us, for the August candy jar topics. We are now limiting candy jar posts to about 10 items, but candy jar posts will be more frequent.

Here are ten of the bite-size analytics topics that we culled from various WebTrends reports for the Outsider site, derived from on-site search terms, off-site search terms, the occasional email or comment, and general browsing patterns.

  1. Capturing time in seconds. WebTrends has two visit-length related measures in custom reports: Visit Length (min) and Viewing Time (sec). Of the two, I always use the one for seconds. Visit Length (min) is rounded off to the nearest minute, which makes it too imprecise to be useful. If you want a Visit Length number, take the sum of Seconds and divide by the number of visits.  Note: “viewing time” is in the context of pages. If you set this measure up as an average, you might expect “average seconds per visit” but you are getting “average seconds per page.”
               
  2. Differences between visits, views, hits, and visitors in the context of a Pages report.

    Visits: counted as 1 if the page was seen anywhere during a visit; the number of views during the visit doesn’t count. It’s the number of visits that had at least one view of the page. If ten visits saw the page 1 time each, that page would show 10 in the Visits column. If ten visits saw the page 100 times each, that page would still show 10 in the Visits column.

    Views: counted as 1 every time the page was viewed or requested, and the number of views during the visit DOES count. It’s the number of total requests during the time period of the report. This one should match exactly the number of log file lines for that page URL. Using the example above: If ten visits saw the page 1 time each, that page would show 10 in the Views column. If ten visits saw the page 100 times each, that page would show 1,000 in the Views column.

    Hits: Think of this as more or less identical to Views, and the numbers should match in the Pages report. The thing that makes “hits” different is that any file can be a hit — a picture, a header, a button, a footer, text that is an image file, a css style sheet, a script file. In the Pages report, you’ll just see listings for files that are, in fact, pages. If you analyze server logs, you will see Hits make a big difference in two reports in particular: the Accessed File Types and the Bandwidth reports.

    Visitors: counted as 1 if the page is seen anywhere during the report’s time frame (month etc) AND is associated with the same visitor each time. If a visitor came ten times during the month and saw that page 100 times in each of those ten visits, it would still show up as ONE visitor for that page and that time period.
     
     
  3. Visitors, continued: So, a visitor is a person? Uh, no. “Visitor” is used mainly because it’s simpler to say than “Unique Cookie Value and/or IP-User Agent Combination.” You’ll see “Visitors” and “Unique Visitors” in all kinds of analytics products, but the best they can do (usually) is count those uniquecookievaluesandorIPuseragentcombinations.

  4. PDF download numbers in my Downloads report are extremely overcounted, looks like.  Then you must be analyzing server log files.   A single PDF file typically gets broken up (by the browser’s Download Manager function) into pieces, and each piece gets logged as a View because each piece is a separate request. Therefore, Visits becomes the important measure for PDF downloads and Views, a much larger number, should be ignored.   (You can make the downloads report more accurate with this key bit of info: only the first installment of a PDF file download has a 200 status code. The subsequent file chunks have 206 status codes, so filtering out the 206’s will give an accurate Views number.)           

  5. What does “exclude activity without dimension data” mean? When putting together a report, WebTrends looks for your dimension in each hit (or visit) and then, finding the dimension, notes the value and tabulates it. For example, the dimension can be “color” with values “red,” “blue,” and “yellow”. If a hit (or visit) does not have that dimension at all, WebTrends buckets it as a hit (or visit) “without dimension data” for that particular dimension and it appears in the report as “None.” Much of the time, you don’t care about these — you just want to know about the dimension values that did happen. So, when setting up a custom report you usually want to check the “exclude activity without dimension data” box.           

  6. What does “None” mean in my reports? This is related to the previous topic. You might see “None” in the dimension column in a report. It is the count of instances where the dimension just wasn’t there. If your dimension is the parameter “color” then the None row will display the number of hits (visits, whatever) where the color parameter was NOT there at all.           

  7. What’s the deal with changing CEOs so often over there? No kidding! There have been four CEOs in the last year! The first departure reflected a fundamental change in direction (a good change IMHO), the second departure was planned (an explicitly interim person who departed on-schedule), the third departure was an embarrassing flakey event of some unknown kind about which we know very little, and the fourth person is still there, an industry and WebTrends veteran who seems to have the confidence of the insiders and of a bunch of outsiders too. Some would say that WebTrends has used up their CEO changes for the next decade.           

  8. Does the new GeoTrends (delivered with 8.5) do a better job? Not that I can tell! Supposedly it is better at Asian IP’s but that’s just hearsay. For North American IPs, as far as I can tell, it has been unchanged for a loooong time. Embarrassing, again.            

  9. What does GeoTrends do? It resolves an IP address into two things: an organization name (in the Organizations report) and a geographic location. It’s a dataset that WebTrends gets from a third party, Akamai. Akamai bases its business on being able to identify geography ultra-fast.

  10. How do I get WT to work with Google (on-site) search that uses the Google Search Appliance? The Google Search Appliance’s output is xml and the xml definitions and fields are readily available for using in tracking. You’ll have to code your search results page so that the desired fields are in the URL as parameters (for either SDC or server log analytics) OR are in WebTrends tags for collection by SDC. The DTD (xml definitions) are on your Google Search Appliance site, here: [search appliance hostname]\google.dtd. Explanations of each of the fields are here on google.com: http://tinyurl.com/6eu6yn. I suggest collecting, at minimum, Q (query), M (number of results), SN (starting number for results on the page), and EN (ending number for results on the page).  Put Q into WT.oss and put M into WT.oss_r.  See our on-site search topics for more.
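To make items 2 and 4 above concrete, here is a minimal sketch of how Views, Visits, and Visitors diverge for one page, and what filtering out 206 status codes does to a PDF count. The `Hit` record and its field names are made up for illustration; real log lines carry many more fields, and this is not how WebTrends itself is implemented.

```python
from collections import namedtuple

# Hypothetical simplified hit record (not a real WebTrends or log format).
Hit = namedtuple("Hit", ["visitor_id", "visit_id", "url", "status"])

def page_metrics(hits, url, skip_partial=True):
    """Return (views, visits, visitors) for one URL."""
    views = 0
    visits = set()
    visitors = set()
    for hit in hits:
        if hit.url != url:
            continue
        # 206 = follow-up chunk of a download (item 4); optionally ignore it.
        if skip_partial and hit.status == 206:
            continue
        views += 1                    # every request counts
        visits.add(hit.visit_id)      # each visit counts once
        visitors.add(hit.visitor_id)  # each visitor counts once
    return views, len(visits), len(visitors)

# One visitor, two visits, three views of the page in each visit.
hits = [Hit("v1", visit, "/page.html", 200) for visit in ("a", "a", "a", "b", "b", "b")]
# One PDF download logged as a 200 followed by two 206 chunks.
hits += [Hit("v1", "a", "/doc.pdf", status) for status in (200, 206, 206)]

print(page_metrics(hits, "/page.html"))                    # (6, 2, 1)
print(page_metrics(hits, "/doc.pdf"))                      # (1, 1, 1)
print(page_metrics(hits, "/doc.pdf", skip_partial=False))  # (3, 1, 1)
```

The last two lines show the PDF overcount: three Views without the 206 filter, one with it, while Visits stays at one either way.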

 

Discrepancies between WebTrends and Google AdWords

To account for the inevitable discrepancies between reports by my main analytics program and Google AdWords (and the other sponsored searches), I’ve done a lot of research including hit-by-hit comparisons. Here is my current list of reasons for discrepancies.

Face it, they are never going to agree.   But that doesn’t mean the differences are random or stupid.

From time to time I compare my data and Google AdWords’ reports, day by day and keyword by keyword.  Sometimes I do it with log data.  I’ve gotta be sure I can have a clear conscience when I tell users that everybody’s doing the best they can.

From this work and things that others have talked about, I have tried to assemble a complete list of reasons for discrepancies. 

In all, I’ve seen differences in any month that range from -10% (WT lower) to +5% (WT higher).  Sometimes it’s spot-on, and it’s usually within 2% or so. 

Sometimes the discrepancies build up gradually, apparently involving long-term slow changes in visitor habits.  #1 below is an example.  Some are relatively sudden, due to changes by another site, by the PPC programs themselves, or by spiders and bots that come into existence and disappear quickly.

In a typical month, for the sites I watch, the effects of the factors I describe here tend to be modest and generally cancel each other out, except for #1 which is fairly constant and usually dominates the other statistical differences.  But over time it’s inevitable that we’ll have occasions when The Perfect Storm happens and the differences get really big.  I’ve even seen instances when the engines report a downturn and we see an uptick.  

Reasons for discrepancies (with Google AdWords or any of the PPC programs) include:

Related to WebTrends settings

  1. WebTrends reports on visits while the search engines report on clicks.  Or rather MY WebTrends reports on visits.  I set the paid-term dimension to record only the keywords seen on the first hit of a visit.  I’ve found that a surprising number of visitors do additional searches during their visit or they back out of the site then return by clicking on the same PPC link.  When they do that, the end result is a visit with two (or many) PPC clicks in it – one to start the visit, and others happening in the middle of the visit.  WebTrends (as I set it up) will report one visit while the search engines will report several clicks.  Result:  WebTrends is lower.
  2. WebTrends is filtering out some of the PPC visitors, such as your own IP address range.  Result:  WebTrends is lower. 
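The visits-versus-clicks difference in #1 can be sketched with a handful of made-up hits. Everything here is an assumption for illustration: the tuple layout, the keywords, and the choice to express the difference relative to the engine's number (the post does not say which base its percentages use).

```python
from collections import Counter

# Hypothetical hits: (visit_id, hit number within the visit, paid keyword or None).
hits = [
    ("visit1", 1, "blue widgets"),  # PPC click starts the visit
    ("visit1", 2, None),
    ("visit1", 3, "blue widgets"),  # backed out, clicked the same ad again
    ("visit2", 1, "red widgets"),
    ("visit3", 1, None),            # visit did not start from PPC
    ("visit3", 2, "red widgets"),   # mid-visit PPC click
]

# What the ad program counts: every click on a sponsored link.
clicks = Counter(kw for _, _, kw in hits if kw is not None)

# What a first-hit-only paid-term dimension counts: keywords seen on hit 1.
visits = Counter(kw for _, n, kw in hits if n == 1 and kw is not None)

def pct_diff(wt, engine):
    """Signed difference relative to the engine's number; negative = WT lower."""
    return 100.0 * (wt - engine) / engine

print(sum(clicks.values()), sum(visits.values()))                 # 4 2
print(pct_diff(sum(visits.values()), sum(clicks.values())))      # -50.0
```

Real traffic is nowhere near this lopsided, of course. The point is only the direction: a first-hit-only setup can never count more than the click report for the same traffic.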

Invalid clicks

  1. Some fraudulent clicking software produces “hits” for search engines’ stats but the bots never actually reach the site.  Later on, the search engine may detect them and  give us a refund.  However, the search engines don’t retroactively change the click reports.  Result:  WebTrends is lower.  (Did you know that Google’s reporting allows you to get a count of the clicks they consider to be fraudulent, by keyword?)
  2. Some fraudulent clicking is caught immediately by the search engine and is removed from both billing and click reporting.  If those clicks do reach the site, the clicks still are seen as visits by WebTrends, which does not know they should be removed.  Result:  WebTrends is higher.
  3. Prefetch bots (notably the AVG Linkscanner bot we have been talking about for the last few weeks) can follow PPC links and produce false visits in WebTrends, although they may be caught and removed from reporting by the search engines.  We know that our detection methods for this bot are not as good as those used by the search engines.  Result:  WebTrends is higher.

Technical glitches

  1. Sometimes the search engines under-report for a day or so.  The cause is usually technical difficulties at their end.  Result:  WebTrends is higher.  These discrepancies are occasional but can be large, often clustering during a two or three week period when search engines make major changes in their systems.  It happens less than once a year, but the discrepancy can be as much as 3% in a month.
  2. The search engines sometimes drop our tracking parameters from the URLs, usually because of misprocessing of one of our uploads or a flaw in our uploads coming from our end.  This gets corrected when the search engines notice or we do.  Result:  WebTrends is lower.  It happens rarely, once or twice a year.  Discrepancy varies.

Clicks that don’t get into SDC logs (server logs are not affected)

  1. A class of browser add-ins often called “web beacon blockers” will prevent a visitor from being tracked by SDC and other javascript, but won’t prevent that visitor from being counted by the PPC engine if they click on a sponsored search link.  One of the most popular is the Adblock Firefox plugin (there aren’t good numbers on how extensively it’s used).  Result:  WebTrends is lower.
  2. If your landing page is longish or slow, and the SDC tag is properly at the end of the page, the visitor may leave before the SDC tag has time to load.  Result:  WebTrends is lower.
  3. You have PPC ads directed to a site page (like a special landing page) that you neglected to tag for SDC data collection.  Result:  WebTrends is lower.  For a big campaign, a LOT lower.  Voice of experience here!

Mistakenly tagged links producing real traffic, but not from PPC

  1. Search engine spiders can follow PPC ads on other sites and record the entire link in their database, including its PPC tracking parameters.  As a result, they display in their natural search results some links that contain PPC tracking parameters.  If someone clicks on one of those links, the resulting visit is actually a natural search visit but the landing page URL indicates to WebTrends that it is a PPC visit.   In the past, we have seen this to be self-correcting because the search engines eventually notice the discrepancy and change the listed URL to the one they see the most, i.e. the one without tracking parameters.  Result:  WebTrends is higher.
  2. Owners of other sites can click on PPC ads and then add a link on their site using the landing page URL they saw, i.e. one that contains PPC tracking parameters.  The result is a visit that appears to be a PPC visit, but is not.  These tend to be self-correcting over time.  Result:  WebTrends is higher.
     

Cool custom report: Dayparts

You should have a solid idea of how visitor behaviors change at the end of the work day or on weekends. “Daypart” is marketingese for sections of the day that have different audiences and different audience behaviors. WebTrends’ SDC tag conveniently collects the local time of day for every hit.

“Daypart” is marketingese for sections of the day that have different audiences and different audience behaviors.    I generally consider marketingese language to be weirder than weird — i.e. don’t “socialize it with me,” just “talk to me about it”.   

But the word “daypart” actually fills a need.

The daypart concept was somewhat useless in the online world until a few years ago when banner brokers and PPC vendors started offering the ability to schedule ads according to time of day.  About the same time, a few pioneering e-comm sites changed their on-site teasers and promos according to daypart.  Daypart analytics became more useful.

If you want to take advantage of these banner, PPC, and on-site promo scheduling capabilities you should have a solid idea of how visitor behaviors change at the end of the work day or on weekends.   

It actually can get interesting.  The last time I analyzed dayparts, the after-work visits for the same e-comm site were very different from the mid-day visits — longer in duration, fewer in page views, less window shopping, more doing.  Although I didn’t solve the problem of matching up the same person using different computers at home and at work, I was pretty convinced that I was seeing work-home dichotomies for the same people.  The “visiting organization” reports supported that interpretation too.

WebTrends’ SDC tag conveniently collects the local time of day for every hit.  The tag asks the visiting computer for its local time zone (presumably correctly set in the first place by a human).  SDC uses this to produce “Browsing Hour,” or “WT.bh”.  WT.bh is an integer with a range 0-23.  It doesn’t collect minutes.  But it’s close enough for government work, eh?
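If you ever work with the raw hits outside WebTrends, bucketing WT.bh into dayparts is simple. Here is a minimal sketch; the daypart names and hour boundaries are my own placeholders, and the example URL is invented.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical daypart boundaries (start hour inclusive, end hour exclusive).
DAYPARTS = [
    ("overnight", 0, 6),
    ("morning", 6, 9),
    ("working hours", 9, 18),
    ("evening", 18, 24),
]

def browsing_hour(url):
    """Pull WT.bh out of a hit's query string; None if absent or not 0-23."""
    values = parse_qs(urlparse(url).query).get("WT.bh")
    if not values or not values[0].isdigit():
        return None
    hour = int(values[0])
    return hour if 0 <= hour <= 23 else None

def daypart(hour):
    """Map an integer hour (0-23) to the name of its daypart."""
    for name, start, end in DAYPARTS:
        if start <= hour < end:
            return name
    raise ValueError(f"hour out of range: {hour}")

hour = browsing_hour("http://example.com/page.html?WT.bh=14&WT.ti=Home")
print(hour, daypart(hour))  # 14 working hours
```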

You can create a separate, entire profile for each daypart, or you can do it for separate custom reports.  Either way, you’ll need to make filters.  

(I prefer doing a whole profile for each daypart of interest, so I can look at many things without creating lots of custom reports.  And, with a whole profile, I also get cool stuff like content group paths.)

Approach #1:  Filter a whole profile for a daypart

There are two smart tricks here:

–  Smart trick #1:  You’ll need to have two filters applied to your profile – one that includes the right time of day, and one that excludes all the days except the ones you want.  You have to do an include/exclude combination because include filters are additive.  In other words, combining an Include time-of-day filter with an Include day-of-week filter will result in a report on all activity that was either at the right time of day on any day, *OR* all day on the included days.  That’s not what you want.   (We have a whole post on filters and how they interact.)  
–  Smart trick #2:  Create your time-of-day filter as a visit filter, not a hit filter.  That way, visits that span an hour changeover point don’t get split up, which is what would happen if each hit were put into its own daypart.  Use a visit filter that’s based on the value of the parameter WT.bh for the entry page.  
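The reason for the include/exclude combination in smart trick #1 is easier to see with plain boolean logic. This is a simplified sketch with invented visits, and it treats the day of week as a property of the whole visit, whereas the real day-of-week filter works on hits.

```python
# Hypothetical visits: (label, entry-page WT.bh, day of week).
visits = [
    ("weekday, working hours", 10, "Tue"),
    ("weekday, late night", 23, "Wed"),
    ("weekend, working hours", 11, "Sat"),
    ("weekend, late night", 22, "Sun"),
]

WEEKEND = {"Sat", "Sun"}
WEEKDAYS = {"Mon", "Tue", "Wed", "Thu", "Fri"}

def in_hours(bh):
    """True when the browsing hour falls in the 9-17 working-hours daypart."""
    return 9 <= bh <= 17

# Two Include filters are additive: a visit is kept if EITHER one matches.
two_includes = [label for label, bh, day in visits
                if in_hours(bh) or day in WEEKDAYS]

# One Include plus one Exclude: kept only if included AND not excluded.
include_exclude = [label for label, bh, day in visits
                   if in_hours(bh) and day not in WEEKEND]

print(two_includes)     # three visits, including late-night and weekend ones
print(include_exclude)  # ['weekday, working hours']
```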

The profile level time-of-day filter specifics:

  1. In the admin:  Web Analysis >> Options >> Visit Filters >> New Visit Filter 
  2. Filter Name:  something like “Daypart – working hours”
  3. Type:  Include
  4. Include/Exclude based upon:  Entry Page
  5. Entry Page:  Page Expression:  *   (you want it to apply to any entry page)
  6. New URL Parameter
  7. URL Parameter:  Parameter Name:  WT.bh
  8. Parameter Value:  Equal to:  9-17 (or whatever)
  9. NUMERIC – it’s important to specify that this is a numeric parameter
  10. Done

The profile-level day of week filter specifics:

The instructions below are for a filter that excludes weekends, i.e. a filter that should be used in a report on weekdays.  Change it to exclude weekdays if you want to study weekends.

  1. In the admin:  Web Analysis >> Options >> Hit Filters >> New Hit Filter   (yeah, it has to be a hit filter; there’s no alternative.  Adds a bit of noise since visits that span midnight on Sunday or Friday will get split up.)
  2. Filter Name:  something like “day of week – weekend EXCLUDE”  (that’s how I do it)
  3. Type:  Exclude
  4. Include/Exclude based upon:  Day of week
  5. After “Next” you’ll see a list of days.  Check Saturday and Sunday.
  6. Save and you’re done

That’s all.  Make a profile, apply both filters, name it appropriately, have fun.  Don’t get freaked when you open your profile and see the Overview Dashboard showing activity all through the day!  Remember, the Overview Dashboard shows stats in terms of server time, so there will be daytime (in their local time) visitors  who show up in the Overview Dashboard as nighttimers (in server time). 

Approach #2:  Filter a custom report for daypart

There are two smart tricks here:

–  Smart trick #1: You’ll need to have two filters applied to your custom report – one that includes the right time of day, and one that excludes all the days except the ones you want.  You have to do an include/exclude combination because include filters are additive.  In other words, combining an Include time-of-day filter with an Include day-of-week filter will result in a report on all activity that was either at the right time of day for any day, *OR* all day on the included days.  Get it?  (We have a whole post on filters and how they interact.)  
–  Smart trick #2: Create your time-of-day filter as a visit filter, not a hit filter.  That way, visits that span an hour changeover point don’t get split up.  Use a visit filter that’s based on the value of the parameter WT.bh for the entry page. 

The custom report time-of-day filter specifics:

  1. In the admin:  Web Analysis >> Report Configuration >> Custom Reports >> Filters >> New Filter 
  2. Filter Name:  something like “Daypart – working hours”
  3. Category:  whatever you want
  4. Type of Filter:  Visit
  5. Match any/all criteria choices:  doesn’t matter
  6. Add New Match Criteria
  7. Filter on:  Entry Page
  8. Page Expression:  Equal To:  *   (you want it to apply to any entry page)
  9. New URL Parameter
  10. URL Parameter:  Parameter Name:  WT.bh
  11. Parameter Value:  Equal to:  9-17 (or whatever)
  12. NUMERIC – it’s important to specify that this is a numeric parameter
  13. Done

The custom report day-of-week filter that excludes or includes weekends:

  1. In the admin:  Web Analysis >> Report Configuration >> Custom Reports >> Filters >> New Filter
  2. Filter Name:  something like “Day of week – weekend ” 
  3. Category:  whatever (I have a category called “Time”)
  4. Type:  Hit
  5. Filter must match choices … doesn’t matter which you choose
  6. Add New Match Criteria
  7. Filter on:  Day of week
  8. Check the weekend days

That’s it.  Use these in a custom report with the dimension(s) of your choice.  Be sure to apply the daypart filter as an Include and the day of week filter as an Exclude.

Adding the Chrome browser to your reports

Google released the Chrome browser today. Here’s how to add it to your WebTrends configuration so it will show up in browser reports.

Applies to:  software

(Note – in November 2008 WebTrends started making updated copies of browsers.ini and keywords.ini available to software users from their web site!  We’ll post more about this soon once we have checked it out.) 

http://www.webtrends.com/support/browser-and-keywords-updater.aspx

Back to the original September post:

Google released the Chrome browser today.  Here’s how to add it to your WebTrends configuration so it will show up in browser reports.  (On Demand will start displaying Chrome as “Google Chrome” this coming Monday afternoon.)  You can make the change yourself, or you can download an updated browsers.ini from WebTrends (go to the Knowledge Base, search on “analytics chrome”, look at the top result.)

The file you have to change is called browsers.ini.

There are usually three or four copies of browsers.ini on the typical WebTrends installation, and there could be extra copies on other servers if you use distributed architecture. You’ll need to change all of them.  Typical locations are:

/WebTrends/modules/analysis/engine/8.0d (8.1, 8.5, etc)
/WebTrends/storage/config/component/lookupdata/
/WebTrends/storage/config/engine/8.0d (8.1, 8.5, etc)
/WebTrends/

Open the first instance of browsers.ini with a proper text editor. By “proper” we mean something like TextPad rather than Notepad, because Notepad doesn’t play well with the system when the file is in use. With TextPad, you can [usually] take the risk of changing the file while WebTrends is running.

Step 1 – Change the Browser list. Go to the end of the long Browser list. Find the last numbered entry. It’ll look something like this:

Browser28=GECKO

It looks like it’s a good idea to put Chrome above both Gecko and Safari, so insert a line above these two.  It’ll look something like this:

Browser26=CHROME

Renumber!  Make sure the numbers are in order!
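If renumbering by hand makes you nervous, the insert-and-renumber step can be sketched in a few lines. This works on a list of `BrowserN=NAME` strings only. It assumes the numbering is sequential, and the OPERA entry and the starting number 25 in the example are invented. It does not read or write browsers.ini, so treat it as a way to check your work, not a tool.

```python
def insert_browser(lines, new_name, before=("GECKO", "SAFARI")):
    """Insert new_name ahead of the first entry named in `before`, then renumber."""
    names = [line.split("=", 1)[1].strip() for line in lines]
    if new_name in names:
        return list(lines)  # already present, nothing to do
    positions = [i for i, name in enumerate(names) if name in before]
    index = min(positions) if positions else len(names)
    names.insert(index, new_name)
    # Assumption: numbers are sequential, starting from the first line's number.
    first = int(lines[0].split("=", 1)[0][len("Browser"):]) if lines else 1
    return [f"Browser{first + i}={name}" for i, name in enumerate(names)]

tail = ["Browser25=OPERA", "Browser26=SAFARI", "Browser27=GECKO"]
for line in insert_browser(tail, "CHROME"):
    print(line)
# Browser25=OPERA
# Browser26=CHROME
# Browser27=SAFARI
# Browser28=GECKO
```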

Step 2 – Add the specifications.  Go to the end of the specification list (groups of three lines).  The last one might look like this (if your last one is Safari).  It will be just above the [Wireless Browsers] section of the browsers.ini file.

[SAFARI]
log=SAFARI
text=Safari

Find the right spot in the list, corresponding to your numbered entry order in Step 1.    Add a blank line and then this:

[CHROME]
log=CHROME
text=Google Chrome

Remember, this section has to be in exactly the same order as the first section!

Step 3 – Save and close the file.

Step 4 – Make the same changes to other copies of browsers.ini in your installation.

Step 5 – Restart your Scheduler Service.

There’s a decent chance that this restart won’t be necessary but it sometimes is needed, so do this step if you can (or if you notice that the browsers.ini change didn’t have an effect in your reports).

How do you do this?  You have to log on to the WebTrends machine as a local administrator, go to Services (Control Panel > Administrative Tools > Services), select “WebTrends – Scheduler Agent” and click on Restart.  Or, you ask somebody with admin access to do it.

Done!  Your Browsers report will now show Chrome, and your Browser Versions report will show Chrome with its correct version number.  The version number will probably change often. 

NOTE for the future:  Keep a copy of the new and old browsers.ini somewhere safe.  Any WebTrends upgrades will overwrite your modified browsers.ini with the one that comes with the install, so you will have to do some additional manual stuff after those upgrades or complete reinstalls.

Postscript:  The browser string is kinda long, and the string “Chrome” may sometimes stray outside the WebTrends 8.0x limit of 100 characters.  Here is one way I’ve seen the UA field show up:

Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+en-US)+AppleWebKit/525.13+(KHTML,+like+Gecko)+Chrome/0.2.149.27+Safari/525.13 

You can see that both Gecko and Safari appear as strings in there.  Because I want Chrome to show as Chrome rather than Gecko or Safari, I placed the Chrome entry a bit higher in the list in my Step 1.
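Here is a small sketch of why the order matters. It assumes WebTrends reports the first list entry whose string appears in the user agent, matched without regard to case. That is my reading of the behavior, not documented fact, and `first_match` is a made-up helper.

```python
UA = ("Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+en-US)+AppleWebKit/525.13+"
      "(KHTML,+like+Gecko)+Chrome/0.2.149.27+Safari/525.13")

def first_match(user_agent, ordered_tokens):
    """Return the first token, in list order, that occurs in the user agent."""
    ua = user_agent.upper()
    for token in ordered_tokens:
        if token in ua:
            return token
    return None

# Without a Chrome entry, the hit is credited to whichever of these comes first.
print(first_match(UA, ["SAFARI", "GECKO"]))            # SAFARI
# With Chrome listed below Safari, Safari still wins.
print(first_match(UA, ["SAFARI", "CHROME", "GECKO"]))  # SAFARI
# With Chrome listed above both, Chrome is reported.
print(first_match(UA, ["CHROME", "SAFARI", "GECKO"]))  # CHROME
```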

 

Cool custom report: Segmenting by brandedness of search terms

All search terms are not alike. Here’s how to get WebTrends to give you site behavior data for different kinds of search terms, starting with a simple branded vs nonbranded term distinction.

Our “How are first-time visits different” post last month was apparently a big success.  It produced some inquiries about branded search terms vs non-branded.  Our post said this:

Pattern uncovered:  First-timers are brought to the site by generic search terms.  Veterans almost always arrive by brand-specific search terms, if they use search engines at all.  Lesson:  make sure paid search terms that are generic go to landing pages that sell the visitor on your company as a whole and provide other first-time-critical info.  Monitor conversion and retention rates for the revised landing pages, concentrating on effects on first-timers.

This is just one of the many insights you can find if you separate brand and non-brand (generic) search terms.  It’s a must-do if you are investing in PPC or SEO. 

Warning – once you start down this path, you could get addicted.  You’ll want to subdivide search terms more finely than just brand versus no-brand.  It all has a very high probability of being worthwhile as well as interesting.  Let me know what you find, eh?

So the question is, how can you get reports on what happens in visits coming from branded versus non-branded terms?

Easy.  You make custom reports that include only brand-name search terms or only non-brand search terms.   The reports can have anything you like as a dimension — what URLs were looked at, what Content Groups, New vs Returning visits.  You can even set a Scenario Analysis as a dimension to compare funnel throughput.

The reports we describe here use the search term as found in the referrer field.  They’ll include both organic and PPC search visits.

The prep work involves two custom filters: 

The first custom filter allows/excludes visits that are from brand-term search phrases:

  • Type of filter:  Visit
  • Filter must match – one or more criteria (important to select this one! otherwise you get empty reports)
  • Add New Match Criteria
  • Filter on:  Search Phrase
  • Equal to [brand term; can be a partial match]
  • Match on:  Regular Expression (my preference is Regular Expression.  You can use Text match with * if you want)

(Keep adding search phrases that contain brand term snippets.  The list can get as long as you want.)

The other custom filter allows/excludes visits that are from search:

  • Type of filter:  Visit
  • Filter must match — doesn’t matter which one you pick
  • Add New Match Criteria
  • Filter on:  Search Phrase
  • Equal to *
  • Match on:  Text

And, the custom reports are:

Report that includes only brand-term search phrases:

  • Dimension:  Whatever you want, like URL or Content Groups
  • Filter:  The brand-term search phrase filter (first filter), as an INCLUDE

Report that includes only NON-brand-term search phrases:

  • Dimension:  Whatever you want, like URL or Content Groups
  • Filter 1:  The “search traffic” filter (other filter), as an INCLUDE
  • Filter 2: The brand-term search phrase filter (first filter), as an EXCLUDE

 Do you see how the second one works?  It gets all the non-brand-term traffic by excluding the brand-term traffic … but you have to have the “include search traffic” filter to avoid pulling in a zillion non-search visits that would result in a huge “None” row.
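The three-way split behind those two reports can be sketched like this. The brand snippets "acme" and "roadrunner" are placeholders for your own brand terms, and `segment` is a made-up helper, not anything in WebTrends.

```python
import re

# Hypothetical brand snippets; partial matches are intended.
BRAND_PATTERN = re.compile(r"acme|roadrunner", re.IGNORECASE)

def segment(search_phrase):
    """Classify a visit by the search phrase found in its referrer, if any."""
    if not search_phrase:
        # No search phrase: these visits are what the search-traffic filter keeps out.
        return "non-search"
    if BRAND_PATTERN.search(search_phrase):
        return "brand"
    return "non-brand"

for phrase in ["Acme anvils", "cheap anvils", None]:
    print(repr(phrase), "->", segment(phrase))
# 'Acme anvils' -> brand
# 'cheap anvils' -> non-brand
# None -> non-search
```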

 

Extra-credit tip:  Your brand-term filter probably isn’t complete, no matter how smart you are.  It’s a good idea to create one other report that you eyeball from time to time for strings you missed.  I call this type of report a “checkup report.”  Here it is:

  • Dimension is Search Phrase 
  • Filtering is as described above for NON-brand-term search phrases 

When you look at the results of this report, you should recognize no brand terms at all.  If you see any, add them to your filter for brand terms.
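The same checkup can be sketched outside WebTrends if you have an export of search phrases. The phrases, the "acme" brand pattern, and the `checkup_report` helper are all invented for illustration.

```python
import re
from collections import Counter

def checkup_report(search_phrases, brand_pattern, top=10):
    """Most frequent search phrases that the brand filter did NOT catch."""
    pattern = re.compile(brand_pattern, re.IGNORECASE)
    leftovers = Counter(
        phrase.lower()
        for phrase in search_phrases
        if phrase and not pattern.search(phrase)
    )
    return leftovers.most_common(top)

phrases = ["acme anvils", "cheap anvils", "akme anvils", "cheap anvils", None]
for phrase, count in checkup_report(phrases, r"acme"):
    print(count, phrase)
# 2 cheap anvils
# 1 akme anvils
```

In this made-up run, “akme anvils” is a misspelled brand term that slipped through, so “akme” is the kind of snippet you would add to the brand-term filter.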