If you have server logs, use Excel for a quick look at details of individual visits

If you have log files and want to look at all the hits in individual visits, you can use Excel to sort hits into visits. For the intrepid.

This question just came up on the Webtrends forum and I thought I’d share a quick way to look at individual visits, using Excel plus SDC logs.

Note that I said “using logs.”  This is the raw data that Webtrends processes when it does the aggregation that it was designed for.  If you use OnDemand, you have to ask Webtrends for the SDC logs.  There’s a fee.  If you use OnPremises, you have server logs (IIS or Apache logs) or you have SDC logs.  Somewhere.

These instructions work for SDC logs.  These instructions also assume the logs contain no more than one day of data (if more, you’ll have to change the sorting to keep the days from mingling).  And I’m also assuming your logs aren’t too huge.  Excel can choke.

So … here are instructions for a quick and dirty sorting into visits.  If anybody finds it useful, let me know and I’ll come back and add some screen shots.

It seems endless, but that’s because I’m listing every little step.  It really will take only about a minute of actual work, not counting waiting time while Excel works or the time spent reading the instructions.

  1. Open Excel
  2. File >> Open
  3. Browse to your log file and select it
  4. The Text Import Wizard will open:
    1. Choose Delimited, then Next
    2. Choose Space; leave everything else as is, then Next, then Finish
    3. Wait while Excel reads and opens the log.  If your log is too big for Excel, it will tell you so (but will load the log partially)
    4. I strongly recommend that you keep only the first few hundred thousand rows or so and delete the rest.   You’ll reduce the chances of crashing Excel.
  5. Boom!  About 20 columns of stuff will appear when Excel is done.
  6. Delete the first four rows; they will be #Remark, #Software, #Version, and #Date.  Keep the #Fields row.
  7. Delete the cell that says “#Fields:” and shift the rest of the row to the left.  Right click on A1, choose Delete, then Shift Cells Left.  This will put the right headings over the right columns.
  8. Make a new column that contains, for each row, both the IP address and the User Agent string, as follows:
    1. Go to row 1 in column S (the first empty column) and enter this formula:  “=concatenate(C1,L1)” and hit Enter.
    2. The cell should now contain this:  “c-ipcs(User-Agent)”
    3. Click on that cell to highlight it, then hover over the lower right corner of the highlighted cell where there is a very small black square.  The cursor will change to a +.  Double-click.
    4. Column S should now auto-fill downwards with a concatenation of the IP and user agent string.  Don’t worry about reading and understanding these.  The point is that these are probably (but not 100% of the time) each unique to an individual visitor, which is what you are after.
  9. Highlight all columns (A-S), then open the Data tab and choose Sort.  The Sort dialog box will open.
    1. Make sure the box at the upper right is checked (“My data has headers)
    2. Click on the “Sort by” dropdown and choose the very last item, which will be “c-ipcs(User-Agent)”
    3. Click on Add Level to get another sort config line
    4. Click on the “Then by” dropdown and choose “Time”
    5. Click on OK and wait for the sorting.
  10. You now have your visits, with all the hits in each visit sorted by time from first to last.
  11. You’ll want to make it a little more intelligible by putting a divider line between individual visits, as follows:
    1. Highlight your block.  I do it by clicking on A1, then Ctrl-Shift-DownArrow, then Ctrl-Shift-RightArrow.
    2. Assuming you are still in the Data tab, select Subtotal
    3. The Subtotal dialog box will open.  Fill it in as follows:
      1. “At each change in:” open the dropdown and choose the very last item, which will be “c-ipcs(User-Agent)”
      2. “Use function:”  Count
      3. “Add subtotal to:” check the box for c-ipcs(User-Agent) which, again, will be at the very end
      4. Leave Replace current subtotals and Summary below data checked
      5. Click on OK and wait.  There’s a progress bar in the status line at the bottom if you feel panicky.
    4. Excel has inserted a new row between “visits” and also added some controls over in the left margin, labeled 1,2,3.  Click on 2.
    5. Excel is now showing you only the inserted rows.  The block should still be highlighted.  Don’t worry, the rest of the data is still there.  Look at the row numbers over on the left if you’re nervous; you’ll see lots of missing numbers.
    6. Click Alt-semicolon (Alt-;) and the screen will change subtly.  You’ve just selected only the visible cells.
    7. Click on the Paint Can to color all these rows, for example black
    8. Go back to the controls “1,2,3” on the left and click on 3.
    9. Excel is now showing you all rows, with a colored divider row between “visits.”
  12. At this point, I usually move everything to a new sheet to get rid of the underlying mechanics, as follows:
    1. Highlight the block
    2. Copy to Clipboard (I always just use Ctrl-C)
    3. Open a new sheet, select A1, then paste (Ctrl-V)
    4. Delete column S if you like
    5. Go back and delete the sheet you were just on

Now you’re ready to look at individual visits line by line, in the order in which they occurred in the visit.  You’ll probably want to concentrate on columns G and H, which are the URL stem and the URL query parameters for each hit.  Have fun, or good luck … whichever applies.

A few notes:

  • If the same “person” had two visits, they will be jammed together in chronological order.  Keep an eye on the Time column.  If there’s more than 30 minutes between one hit and the next, Webtrends is probably seeing this activity as two visits.
  • You’ll see a lot of visits with nothing in the Cookie field on the first hit.  These are first-time visits.  Don’t worry, Webtrends sorts it out just fine when it analyzes.  Using the IP-User Agent method is a quick-dirty way to get around the complexity of cookies (which are, in fact, in the Query Parameters and not in the so-called cs(Cookie) field of the logs.
  • Sometimes SDC hits are virtually simultaneous, i.e. two hits happen in the same second of time.  It’s possible that Excel will sort those two hits into the wrong order (it doesn’t know any better; hh:mm:ss is all it has to work with).
  • If you want to get fancy with logs, download Microsoft Log Parser and learn how to use it.