Fun with the Visitor History File

Applies to Software, On Demand (for some extra $$)

Poking around in the WT admin interface, you may have noticed many references to the “Visitor History.” Visitor history is used to save information about your users across visits. It enables WebTrends to keep track of and report on nice things like a particular visitor’s lifetime value or their most recent campaigns. What you may not know is that this information is saved in a flat CSV file known as the visitor history file. If you’re using software and you’ve enabled “Visitor History” on the Analysis >> Visitor History screen of a profile, the file can be found in /storage/config/vhexport/<Profile ID>. It’s conveniently saved as a CSV file inside a zip file. If you use On Demand, you’ll have to pony up some cash to get access to it.

If you have access go take a look at one of yours (I’ll wait…). If you don’t have access then check out this little bitty sample of one:

 Sample visitor history file

Pretty cool, eh? It’s got just about a bazillion columns (ok, 42 columns) of information about your visitors. The neat thing is that each row represents an individual (at least as close to an individual as WebTrends will ever get) and all that data is there for you to do with as you please. Here are some fun things I’ve done with this data:

  1. Analyze it. Throw it in SPSS or even Excel and start slicing and dicing. What are your most profitable keywords? What day of the week do your most valuable customers first enter the site? How often do your most valuable customers visit? How long do they stay? Yes, you can answer almost all of these questions via custom reports, but you won’t be able to see everything all together in one file that you can easily manipulate offline.
  2. Mash it up. This is the best use of it. See that first column, the one titled “Visitorid”? That happens to be the same value that is stored in that visitor’s WebTrends cookie (assuming you’re using SDC with first party cookies). So here’s what you do: when someone converts (purchases, submits a contact form, whatever), pass their WT cookie identifier along with the rest of their info. Bingo, you now have a common key between a rich set of web analytics data and your CRM or customer file. If you’re in an industry that has a long sales cycle, you can use this to tie final sales back to the initial web visit. It’s a great way to put real dollar amounts on your lead generation efforts, which really gets management’s attention!
  3. Re-use it. Your customers reveal a lot about themselves in this file. Why not use the info to help them out? If you export it to your website’s database you can use it to personalize their subsequent visits. Their search terms can reveal their interests, and you can use the page or content group of interest columns to determine which content they want to see.
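
For the mash-up idea, here is a rough Python sketch of the join. It assumes the zip holds a single CSV with a “Visitorid” header (as described above) and that the file is UTF-8. The CRM field name wt_cookie_id and both function names are made up for illustration, so swap in whatever your own customer file uses.

```python
import csv
import io
import zipfile


def load_visitor_history(zip_path):
    """Return {visitor id: row dict} from the first CSV found inside the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_name = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        with archive.open(csv_name) as raw:
            # Assumption: the export is UTF-8 (utf-8-sig also tolerates a BOM).
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return {row["Visitorid"]: row for row in csv.DictReader(text)}


def join_crm_to_history(crm_rows, history):
    """Attach the matching visitor history row (or None) to each CRM record.

    "wt_cookie_id" is a hypothetical CRM field holding the WT cookie value
    captured at conversion time.
    """
    return [{**crm, "history": history.get(crm["wt_cookie_id"])} for crm in crm_rows]
```

Records that come back with a history of None are conversions whose cookie value never showed up in the export, which is worth a look in its own right.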

There are better ways to do most of this stuff, but they all require buying and installing more expensive software. This uses something you probably have access to right now.

What have you done with the Visitor History file?


The order in which WebTrends executes filters

Newbie users, you might want to avert your eyes because this topic could make you think you’ll just never master this program.  There are so many ins and outs to know about, aren’t there?

(Actually, newbies, you’ll probably never need to know 99% of this stuff so don’t let it get you down.  Most users never really need this information; it’s mainly for complex layered reporting.)

The cool thing about the following list is that it explains so darn much.  There IS an underlying logic.  It DOES make sense.  It’s not random, and it’s not out to get you!

WebTrends follows this order in processing filters and a few other things.

  1. Happens first: URL Search & Replace changes the URLs.   (Details about ordering within URL S&R:  as WebTrends takes in each successive log file line in a log, it looks for a match in the chronologically-sorted list of URL S&R’s and executes the first (oldest) one that matches.  Once it has executed one S&R for a log file line, it goes to the next log file line.  It only applies one URL S&R operation to any given log file line.)
  2. Next, individual hits are filtered in or out for the whole profile.  These are the Hit and Visit filters that you create under Options in the admin.  When both Include and Exclude profile-level filters are applied to the same hit, the Exclude wins because it happens first.  Includes cannot undo Excludes.
  3. Then, log file lines are sessionized (combined into visits) and whole visits are filtered in or out using profile-level Visit Filters.  If an Exclude and Include apply to the same visit, the Exclude wins because it happens first.  Includes cannot undo Excludes.   Visit filters cannot undo the previous Hit filters.
  4. Now the Main Analysis happens, including content group assembly, visit length calculations, identification of entry pages and referrers, etc.
  5. After the main analysis, Custom Report analysis happens.   Within each Custom Report, visit filters are applied first, then hit filters.   Notice that this order is the opposite of what happens at the profile level!
  6. Include filters in custom reports cannot restore something that has been excluded at the profile level, and custom-report Include Hit filters cannot restore something that has already been excluded by the custom visit filters.
  7. After custom reports are analyzed, and before the report is displayed to you, URL Rebuilding settings, if any, are applied.  URL Rebuilding is not exactly a filter, but it’s important to know that it happens toward the end of the whole process and cannot interfere with filters.
  8. The template is applied.  It filters out some reports from the display.

General cheat sheet:

  • When Excludes and Includes conflict, Exclude always wins.   An Include can never restore something that was Excluded at any point.
  • At the profile level, hit filters are applied before visit filters.  The order is the opposite for custom report filtering.
  • URL Search & Replace happens before everything.
  • URL Rebuilding and Templates happen after everything else.
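
If it helps to see the “Exclude always wins” rule as logic, here is a toy Python model. It is only a sketch of the behavior described above, not WebTrends code. The function name is invented, and the rule that an item must match at least one Include when any Includes exist is my assumption.

```python
def passes(item, includes, excludes):
    """Toy filter check: Excludes run first, then Includes.

    includes and excludes are lists of predicates (functions returning bool).
    """
    if any(rule(item) for rule in excludes):
        return False  # excluded; no Include can bring it back
    if includes and not any(rule(item) for rule in includes):
        return False  # assumption: with Includes present, one must match
    return True


is_admin = lambda url: url.startswith("/admin")
everything = lambda url: True

# The Include matches every URL, yet the excluded hit stays excluded.
print(passes("/admin/login", [everything], [is_admin]))
print(passes("/products/1", [everything], [is_admin]))
```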

The one thing you should be warned about:

  • Because of all the above, you may get unexpected results in custom reports that have a dimension based on Visit and a filter based on Hit.   For example, you may have a custom report that uses a hit filter to exclude PageA and has Entry Page as its dimension.  You’ll find that your supposedly filtered-out PageA still appears in that custom report on Entry Pages.  This is because the Entry Page for a visit is determined during the main analysis, step 4 above, and it is not changed by a hit filter at the custom level. In the above situation, if you had instead used a visit filter (removing all visits that have PageA as entry page) then PageA would not have appeared in the Entry Page custom report.
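
Here is the PageA surprise as a small Python simulation. It is a toy model under my own assumptions (a visit shows up in the report if at least one of its hits survives the filter), with invented function names, but it shows why the entry page label outlives a hit filter.

```python
from collections import Counter

visits = [
    ["/PageA", "/PageB"],
    ["/PageB", "/PageC"],
    ["/PageA"],
]


def main_analysis(visits):
    """Step 4: each visit's entry page is fixed before any custom report runs."""
    return [{"entry_page": pages[0], "pages": pages} for pages in visits]


def entry_report_with_hit_filter(analyzed, excluded_page):
    """Custom hit filter: hits are dropped, but the entry page label stays."""
    report = Counter()
    for visit in analyzed:
        remaining = [p for p in visit["pages"] if p != excluded_page]
        if remaining:  # assumption: a visit is reported if any hit survives
            report[visit["entry_page"]] += 1
    return report


def entry_report_with_visit_filter(analyzed, excluded_entry):
    """Custom visit filter: whole visits that entered on the page are dropped."""
    return Counter(v["entry_page"] for v in analyzed if v["entry_page"] != excluded_entry)


analyzed = main_analysis(visits)
print(entry_report_with_hit_filter(analyzed, "/PageA"))    # /PageA still listed
print(entry_report_with_visit_filter(analyzed, "/PageA"))  # /PageA gone
```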



Cool custom report: Daily data within a monthly report period

WebTrends contains a little-known custom report dimension that will show trends within your reporting period.   Yes, you already get this with the trend graphs.  But this gives you a table — one table row per day of the month in a monthly report period.  Or one row per hour of the day in a daily report.   The exported version of this is great for graphing trends with Excel.

Your measures can be any of the usual ones – visits, views … except for Unique Visitors.  (Stay away from that one.  As The Man says, ‘O! that way madness lies.’  With the dimension we’re discussing in this post, WebTrends does show visitor numbers that mean something, but not what you think they mean. Uh, never mind.  We’ll get into it some day.)

The dimension needs to be created before you can use it in custom reports.  It uses the little-known dimension choice called “Time Period.”  Steps:

  1. In the Custom Reports editing area, create a new dimension.  Call it something like “time period” or “time trend”
  2. Choose “Time Period” in the drop-down “Based On” menu

Once you’ve created it, it will appear as a Dimension choice for any custom report.

  • Use it as the sole dimension for anything you can filter for: an important page or KPI (URL), a specific content group, a specific browser, a specific campaign
  • Use it in 2D reports as the primary dimension with any hit-based secondary dimension:  entry page, browser, etc (but see the cautionary note at the end!)
  • If you use it as a secondary dimension, note that it will appear in non-chronological order.  The list will be sorted in highest-to-lowest order for the first measure (but, again, see the cautionary note at the end!)

WebTrends deems this to be a “hit-based” dimension.  It may, therefore, show odd results if you try to pair it with a visit-based secondary dimension.

The Cautionary Note At The End:  Don’t get carried away with this!  It slows down processing AND it creates mighty big internal tables.  A Time Period x URL 2-dimensional table will create a monthly monster that’s 30 or 31 times larger than a plain URL table.   Try to restrain yourself to one dimension, or if you insist on something 2D, put it in a profile where it is the only custom report.
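
To put a number on that monster, here is a back-of-the-envelope sketch in Python. The function name is mine, and the result is a worst case that assumes every URL shows up on every day of the month.

```python
import calendar


def time_period_rows(url_rows, year, month):
    """Worst-case row count for a monthly Time Period x URL table."""
    days_in_month = calendar.monthrange(year, month)[1]
    return url_rows * days_in_month


# A site with 10,000 distinct URLs, reported for January 2024 (31 days).
print(time_period_rows(10_000, 2024, 1))
```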

Page titles in reports – where do they come from?

(Applies to:  server log data sources, SDC)

If you are using server log files and have turned on “Retrieve HTML Page Titles,” and WebTrends doesn’t already know the title, WebTrends actually visits your site to collect the title.  Well, it visits what you’ve told it is the site.  It goes to the domain that you entered in the “Web Site URL” field on the “Home” tab of the profile’s setup.  Once on the domain, it looks for the exact URL displayed in the report.  WebTrends does NOT use the domain it finds in your logfiles. It uses the domain you specified in the setup.

So, if you put a non-existent domain name in the “Web Site URL” field, WebTrends will not find your web site and will not collect any titles.

I mentioned “… if WebTrends doesn’t already know the title.”  Here’s the deal.  For every profile, WebTrends creates a cache file of all the URLs and their titles, as it finds them.  WT checks in that file first, and then visits the site only if it doesn’t find the URL in the title cache file for the profile.

The file’s name is [profileGUID].wdb and it’s here: …/WebTrends/storage/config/wtm_wtx/datfiles/titles/

The advantage of having this file around, for server log analysis, is that WebTrends isn’t constantly visiting your site to get titles.   (Note:  when you do a fresh server log analysis of a profile and you’ve turned on HTML Title Retrieval, WebTrends will hit your server a LOT – once for each unique page.  To avoid frenzy on the part of your hosting people, you might want to run that first analysis at night or just turn off HTML Title Retrieval.  Or you can create a fake title cache file using the fresh profile’s GUID and fill it with titles from a different profile for the same site.)

Another nice thing about the file is that you can go into it and change the titles.   And if you’ve got a title cache file that you’re happy with, as I mentioned above, you can copy the entire contents into the title cache file belonging to a different profile.

A disadvantage of this file is that once WT has a title/URL combination in that title cache file, it won’t know about any later title changes you make. That is, until the individual line in the *.wdb cache expires. The default expiration time is 14 days and it can be set globally (for all profiles) here: Web Analysis >> Options >> General >> HTML Titles.  You can also force a fresh title-collecting effort by emptying that file.
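
The check-the-cache-first behavior, with its expiration, can be pictured roughly like this in Python. This is a sketch of the logic only. It says nothing about the real *.wdb format, and lookup_title, the dict-based cache and fetch_title are all invented for illustration.

```python
from datetime import datetime, timedelta

TITLE_TTL = timedelta(days=14)  # the default expiration mentioned above


def lookup_title(url, cache, fetch_title, now, ttl=TITLE_TTL):
    """Return the cached title unless it has expired; otherwise re-fetch it.

    cache maps url -> (title, time stored); fetch_title stands in for the
    visit to the live site.
    """
    entry = cache.get(url)
    if entry is not None:
        title, stored_at = entry
        if now - stored_at < ttl:
            return title  # still fresh, so the site is not visited
    title = fetch_title(url)
    cache[url] = (title, now)
    return title
```

Emptying the dict has the same effect as emptying the cache file: every lookup falls through to a fresh fetch.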

For advanced people:  A quirk of Page Titles in reports is that if you’ve got parameter truncation turned on for a file type (i.e. your Pages report does not display parameters in the URL) and if your page titles vary according to the content of those parameters, WebTrends doesn’t really know which of the page title variations to use for a truncated file name.  Get it?  Suppose you have /product.asp?productID=123 that uses title “Product 123’s Page” and /product.asp?productID=456 that uses title “Product 456’s Page”.  If WebTrends is set up to display only /product.asp in the Pages report, it won’t know which title variation to display.  The answer:  WT is set to display and store (in the title cache file) the very first instance of the page title that it comes across in the logs.  And forever after, /product.asp will be displayed with that page title.  In that situation, it’s a really good idea to edit the title cache file so a non-confusing title is shown.
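
In code terms, the first-title-wins behavior looks something like the Python sketch below. It is a toy model with an invented function name that simply strips the query string and keeps whichever title arrives first.

```python
from urllib.parse import urlsplit


def first_seen_titles(log_lines):
    """Map each truncated URL to the first title seen for it.

    log_lines is an iterable of (url, title) pairs in log order.
    """
    titles = {}
    for url, title in log_lines:
        path = urlsplit(url).path      # drop the parameters
        titles.setdefault(path, title)  # later titles never overwrite
    return titles


log = [
    ("/product.asp?productID=123", "Product 123's Page"),
    ("/product.asp?productID=456", "Product 456's Page"),
]
print(first_seen_titles(log))
```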

One final thing while we’re talking about page titles.  Suppose you want to suppress page titles completely in your reports.  If you use server logs, clear the titles file for that profile and turn off Retrieve HTML Page Titles in the UI. 

A postscript about SDC:

If you are using SDC for data collection, WebTrends looks for the title in the title cache file, as described above.   The title cache file was filled using data from the values of the WT.ti parameter in the SDC log.  If WebTrends doesn’t find the URL in the title cache file, it gets the title from WT.ti in the current SDC log file line.  WT.ti, in turn, obtained the title from the <title> element of the page the tag is on.    Or, if there’s a WT.ti <meta> tag in the head (put there by you, because you want to override the <title>), SDC will collect the <meta> WT.ti information instead of the page’s title.  The <meta> tag trumps the <title> tag.
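
To make the precedence concrete, here is a small Python sketch that reads a page and picks the title the way the paragraph above describes. The real collection is done by the SDC tag in the browser, so this is only an offline illustration, and TitleSniffer and effective_title are made-up names.

```python
from html.parser import HTMLParser


class TitleSniffer(HTMLParser):
    """Collects the <title> text and any <meta name="WT.ti"> content."""

    def __init__(self):
        super().__init__()
        self.meta_title = None
        self.page_title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "WT.ti":
            self.meta_title = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.page_title += data


def effective_title(html):
    """Return the WT.ti <meta> value if present, else the <title> text."""
    sniffer = TitleSniffer()
    sniffer.feed(html)
    if sniffer.meta_title is not None:
        return sniffer.meta_title
    return sniffer.page_title.strip()
```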

If you use SDC and want to suppress titles in reports, clear the titles file for that profile and modify the *.wlp file by adding a section called [autoconfig] plus this line:


WT.nonexistent can be any WT. parameter name as long as it doesn’t exist. 

Cool Custom Report: Actual vs Paid-for PPC Search Terms

If you have a PPC (pay-per-click) search effort happening, you may be allowing the PPC engines to loosely match your paid-for terms with various terms typed in by human searchers.

To keep abreast of what’s being matched to what (and maybe identify inappropriate matches that can be avoided by adding negative search terms, or get ideas for cheaper exact-match terms), try this report:

  1. Primary dimension:  the paid-for term, derived from the marker parameters in the landing page URL (go here for a post about this)
  2. Secondary dimension:  the actual term, derived from the referrer information, using the out of the box dimension “Search Phrase”

You end up with a list of all your paid-for terms, and under each term a list of the actual typed-in terms.  If you have appreciable amounts of broad-match traffic, we guarantee you’ll find something astonishing or hilarious, or worthwhile at the very least.
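
If you want to prototype the same pairing outside WebTrends, here is a rough Python sketch. The marker parameter name kw and the referrer parameter q are assumptions; use whatever your landing page URLs and search engines actually carry. The function names are invented too.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qs, urlsplit


def first_param(url, name):
    """Return the first value of a query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None


def paid_vs_actual(hits, marker="kw", search_param="q"):
    """Group the actual typed-in terms under each paid-for term.

    hits is an iterable of (landing page URL, referrer URL) pairs.
    """
    report = defaultdict(Counter)
    for landing_url, referrer in hits:
        paid = first_param(landing_url, marker)
        actual = first_param(referrer, search_param)
        if paid and actual:
            report[paid][actual] += 1
    return dict(report)


hits = [
    ("/landing?kw=red+widgets", "http://search.example.com/results?q=cheap+red+widgets"),
    ("/landing?kw=red+widgets", "http://search.example.com/results?q=red+widget+repair"),
    ("/landing?kw=red+widgets", "http://search.example.com/results?q=cheap+red+widgets"),
    ("/home", "http://search.example.com/results?q=widgets"),
]
for paid, actual_terms in paid_vs_actual(hits).items():
    print(paid, actual_terms.most_common())
```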

If you’re puzzled by the primary dimension described above, you probably don’t have those marker parameters.  See our topic on Yahoo PPC marker parameters to find out how to get Yahoo to supply them for you.  As for the other PPC engines, Google and MSN, they don’t make it easy.  You have to add the marker parameters yourself, either with explicit landing page URLs or with macros.  But it’s well worth the trouble.

Related post: Getting Yahoo PPC to add its own markers to landing page URLs