If you have server logs, use Excel for a quick look at details of individual visits

If you have log files and want to look at all the hits in individual visits, you can use Excel to sort hits into visits. For the intrepid.


This question just came up on the Webtrends forum and I thought I’d share a quick way to look at individual visits, using Excel plus SDC logs.

Note that I said “using logs.”  This is the raw data that Webtrends processes when it does the aggregation that it was designed for.  If you use OnDemand, you have to ask Webtrends for the SDC logs.  There’s a fee.  If you use OnPremises, you have server logs (IIS or Apache logs) or you have SDC logs.  Somewhere.

These instructions work for SDC logs.  These instructions also assume the logs contain no more than one day of data (if more, you’ll have to change the sorting to keep the days from mingling).  And I’m also assuming your logs aren’t too huge.  Excel can choke.

So … here are instructions for a quick and dirty sorting into visits.  If anybody finds it useful, let me know and I’ll come back and add some screen shots.

It seems endless, but that’s because I’m listing every little step.  It really will take only about a minute of actual work, not counting waiting time while Excel works or the time spent reading the instructions.

  1. Open Excel
  2. File >> Open
  3. Browse to your log file and select it
  4. The Text Import Wizard will open:
    1. Choose Delimited, then Next
    2. Choose Space; leave everything else as is, then Next, then Finish
    3. Wait while Excel reads and opens the log.  If your log is too big for Excel, it will tell you so (but will load the log partially)
    4. I strongly recommend that you keep only the first few hundred thousand rows or so and delete the rest.   You’ll reduce the chances of crashing Excel.
  5. Boom!  About 20 columns of stuff will appear when Excel is done.
  6. Delete the first four rows; they will be #Remark, #Software, #Version, and #Date.  Keep the #Fields row.
  7. Delete the cell that says “#Fields:” and shift the rest of the row to the left.  Right click on A1, choose Delete, then Shift Cells Left.  This will put the right headings over the right columns.
  8. Make a new column that contains, for each row, both the IP address and the User Agent string, as follows:
    1. Go to row 1 in column S (the first empty column) and enter this formula:  “=concatenate(C1,L1)” and hit Enter.
    2. The cell should now contain this:  “c-ipcs(User-Agent)”
    3. Click on that cell to highlight it, then hover over the lower right corner of the highlighted cell where there is a very small black square.  The cursor will change to a +.  Double-click.
    4. Column S should now auto-fill downwards with a concatenation of the IP and user agent string.  Don’t worry about reading and understanding these.  The point is that these are probably (but not 100% of the time) each unique to an individual visitor, which is what you are after.
  9. Highlight all columns (A-S), then open the Data tab and choose Sort.  The Sort dialog box will open.
    1. Make sure the box at the upper right is checked (“My data has headers”)
    2. Click on the “Sort by” dropdown and choose the very last item, which will be “c-ipcs(User-Agent)”
    3. Click on Add Level to get another sort config line
    4. Click on the “Then by” dropdown and choose “Time”
    5. Click on OK and wait for the sorting.
  10. You now have your visits, with all the hits in each visit sorted by time from first to last.
  11. You’ll want to make it a little more intelligible by putting a divider line between individual visits, as follows:
    1. Highlight your block.  I do it by clicking on A1, then Ctrl-Shift-DownArrow, then Ctrl-Shift-RightArrow.
    2. Assuming you are still in the Data tab, select Subtotal
    3. The Subtotal dialog box will open.  Fill it in as follows:
      1. “At each change in:” open the dropdown and choose the very last item, which will be “c-ipcs(User-Agent)”
      2. “Use function:”  Count
      3. “Add subtotal to:” check the box for c-ipcs(User-Agent) which, again, will be at the very end
      4. Leave Replace current subtotals and Summary below data checked
      5. Click on OK and wait.  There’s a progress bar in the status line at the bottom if you feel panicky.
    4. Excel has inserted a new row between “visits” and also added some controls over in the left margin, labeled 1,2,3.  Click on 2.
    5. Excel is now showing you only the inserted rows.  The block should still be highlighted.  Don’t worry, the rest of the data is still there.  Look at the row numbers over on the left if you’re nervous; you’ll see lots of missing numbers.
    6. Press Alt-semicolon (Alt-;) and the screen will change subtly.  You’ve just selected only the visible cells.
    7. Click on the Paint Can to color all these rows, for example black
    8. Go back to the controls “1,2,3” on the left and click on 3.
    9. Excel is now showing you all rows, with a colored divider row between “visits.”
  12. At this point, I usually move everything to a new sheet to get rid of the underlying mechanics, as follows:
    1. Highlight the block
    2. Copy to Clipboard (I always just use Ctrl-C)
    3. Open a new sheet, select A1, then paste (Ctrl-V)
    4. Delete column S if you like
    5. Go back and delete the sheet you were just on
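For readers who would rather script it, the whole recipe above can be sketched in Python. This is a minimal sketch, not the article’s method: it assumes a W3C-style SDC log where `#`-prefixed lines are directives and the `#Fields:` line names the columns (field names like `c-ip` and `cs(User-Agent)` are assumptions — check your own log’s `#Fields` line).

```python
def load_visits(path):
    """Read a space-delimited SDC log and return hits sorted into visits.

    Mirrors the Excel recipe: skip the #Remark/#Software/#Version/#Date
    directives, take column names from the #Fields line, build a visitor
    key from IP + User-Agent, then sort by that key and by time.
    Assumes (as the article does) the log holds no more than one day.
    """
    fields, hits = None, []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("#Fields:"):
                fields = line.split()[1:]          # drop the '#Fields:' token
            elif line.startswith("#") or not line.strip():
                continue                           # skip other directives
            elif fields:
                row = dict(zip(fields, line.split()))
                # IP + User-Agent stands in for a visitor ID, as in column S
                row["_visitor"] = row.get("c-ip", "") + row.get("cs(User-Agent)", "")
                hits.append(row)
    # same two-level sort as the Excel Sort dialog: visitor key, then time
    hits.sort(key=lambda h: (h["_visitor"], h.get("time", "")))
    return hits
```

Each run of rows sharing the same visitor key is one “visit,” already in time order, just like the colored-divider version in Excel.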

Now you’re ready to look at individual visits line by line, in the order in which they occurred in the visit.  You’ll probably want to concentrate on columns G and H, which are the URL stem and the URL query parameters for each hit.  Have fun, or good luck … whichever applies.

A few notes:

  • If the same “person” had two visits, they will be jammed together in chronological order.  Keep an eye on the Time column.  If there’s more than 30 minutes between one hit and the next, Webtrends is probably seeing this activity as two visits.
  • You’ll see a lot of visits with nothing in the Cookie field on the first hit.  These are first-time visits.  Don’t worry, Webtrends sorts it out just fine when it analyzes.  Using the IP-User Agent method is a quick-and-dirty way to get around the complexity of cookies (which are, in fact, in the Query Parameters and not in the so-called cs(Cookie) field of the logs).
  • Sometimes SDC hits are virtually simultaneous, i.e. two hits happen in the same second of time.  It’s possible that Excel will sort those two hits into the wrong order (it doesn’t know any better; hh:mm:ss is all it has to work with).
  • If you want to get fancy with logs, download Microsoft Log Parser and learn how to use it.
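The 30-minute rule in the first note can also be applied in code. Here is a rough sketch (the 30-minute timeout matches the note above; the date/time format is an assumption based on typical W3C logs) that splits one visitor’s time-ordered hits into separate visits:

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)   # a gap longer than this starts a new visit

def split_into_visits(hits):
    """Split one visitor's time-ordered hits into visits.

    `hits` is a list of (date, time) string pairs like
    ('2014-01-02', '10:15:00').  More than 30 minutes between
    consecutive hits starts a new visit, mirroring how Webtrends
    would see the activity.
    """
    visits, last = [], None
    for d, t in hits:
        stamp = datetime.strptime(d + " " + t, "%Y-%m-%d %H:%M:%S")
        if last is None or stamp - last > SESSION_GAP:
            visits.append([])                 # start a new visit
        visits[-1].append((d, t))
        last = stamp
    return visits
```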


15 thoughts on “If you have server logs, use Excel for a quick look at details of individual visits”

  1. Hi, it’s my first look at this website, this article is actually useful to me, keep it up.

  2. To return to this briefly: importing into a database such as Postgres or presumably MS SQL should be possible. You can then set up a database connection in Excel and do your analysis pretty much as you describe, but without worrying about memory and with added repeatability.

    I don’t have access to any SDC logs so I can’t actually test this or provide a detailed description.
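    A minimal sketch of what this comment describes, using Python’s built-in sqlite3 (the table layout and column names here are illustrative assumptions, not real SDC field names):

```python
import sqlite3

# Load parsed hits into a database so sorting and grouping don't depend
# on Excel's memory limits.  Column names are assumptions modeled on
# typical SDC fields; adjust them to match your log's #Fields line.
conn = sqlite3.connect(":memory:")        # use a filename to persist
conn.execute("""CREATE TABLE hits (
    date TEXT, time TEXT, c_ip TEXT, uri_stem TEXT, user_agent TEXT)""")

rows = [
    ("2014-01-02", "10:00:01", "2.2.2.2", "/b", "Moz"),
    ("2014-01-02", "10:00:00", "1.1.1.1", "/a", "Moz"),
]
conn.executemany("INSERT INTO hits VALUES (?,?,?,?,?)", rows)

# The same sort the Excel recipe performs, expressed as SQL
visits = conn.execute("""
    SELECT c_ip || user_agent AS visitor, date, time, uri_stem
    FROM hits ORDER BY visitor, date, time""").fetchall()
```

    An Excel database connection (or any front end) could then run the same query repeatably.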

  3. It might be worth mentioning that for me, “#Fields: ” (followed by the other column names) appears in A1 when I open the file, so to keep it aligned with the actual data, I would delete “#Fields:”, then highlight the column, select “Text to Columns” and use Delimited/Space as the separator.

    I have an i5 with 3 gigs of RAM and it can handle about 300k rows of data (not easily but manageable)

    I can upload screenshots if necessary.

  4. I use Notepad++

    It has capabilities far beyond normal Notepad, for starters.

    It also has the capability of allowing me to do a quick search for any calls that I’m looking for without entering it into Excel. (For example if I want to see if a certain WT.mc_id call has been placed – Notepad++ can do it without crashing and quickly too)

    Lastly, and this is for more advanced troubleshooting and data cleaning: if I want to import the data into Excel, it allows me to tidy it with regular expressions BEFORE importing. (For example, I may want to strip out a certain type of code in my WT.mc_id before importing it into Excel, since Excel only has limited search-and-replace functions.)

    Hope this helps
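    The same pre-import regex cleanup can be done outside Notepad++ too. A rough Python sketch (the `WT.mc_id` pattern below is an illustrative assumption and is deliberately simple — it only handles the parameter in `&`-joined positions):

```python
import re

# Strip a WT.mc_id query parameter from each log line before importing,
# the same kind of regex tidy-up the commenter does in Notepad++.
# Rough sketch: matches an optional leading '&' then the parameter value.
MC_ID = re.compile(r"&?WT\.mc_id=[^&\s]*")

def tidy(line):
    """Return the log line with any WT.mc_id parameter removed."""
    return MC_ID.sub("", line)
```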

  5. Tony – Interesting, thank you. Would you be willing to expand a little more on exactly how you do it in Notepad? For those who don’t use Notepad much?

  6. Thanks for the write-up, I’ve been using this a lot recently to troubleshoot our Webtrends calls.

    I sometimes use Excel, and sometimes use Notepad++

    Notepad++ seems to handle very large files quite nicely.

  7. Dan, it probably would be possible to put the SDC logs into a database but it would have to be a pretty simple database, otherwise you’d end up with something like Visitor Data Mart which is a monster to build and host.

    A company called Insight Rocket specializes in moving Webtrends output (and that of many other reporting tools) into a database that they host, and they have a full version of Tableau acting as the front end! Pretty great stuff, but they still rely on Webtrends to do the organizing of the logs.

  8. Actually, I’m realizing there is no Log Parser version of this. I checked with a couple of Log Parser mavens and they say it’s not what Log Parser does.

  9. Very interesting article! I didn’t think the above was possible. I currently maintain an On Premise setup using an SDC data source, data collection has been going on for around 6 months.

    Would it be possible to take all the SDC logs and put them into a database and sit a product like Tableau or Qlikview on top?

    Webtrends will always be used but being able to move the raw logs to a CUBE and combining them other data sources would be very advantageous, currently reports are extracted and added into CUBE rather than the raw logs.

  10. A Log Parser writeup would be great.

    Something on the bucket problem would be fantastic. I don’t exactly know what you mean by “the bucket problem” but any kind of problem that’s referred to with “the” catches my interest. Say more?

  11. Rocky – what was the memory use with 900k rows? Holding 900k in memory isn’t too much of a problem but holding them in a GUI is.

    I don’t actually use Log Parser as I only use Windows for clients but I know our agency does so maybe I can have a look at it on a VM and give a write up.

    I tend to work directly with a database, e.g. I recently found a workaround to the Webtrends “bucket” problem. We had switched to daily buckets only and I needed some monthly summaries. I can probably provide an example of that. I really hate the restrictions that buckets force upon us.

  12. Depends on your system. With Excel 2010 I have done as much as a million lines of logs at a time (well, 900 thousand). Remember that it is only about 20 columns with no formulas whatsoever. It’s worth trying.

    Charlie – would you be willing to write instructions on how to do this with Log Parser, from a novice’s point of view?

  13. Excel will run out of memory very quickly like this! I wouldn’t suggest it for anything more than a few thousand visits a day. More than that and you want either a specialised program like Microsoft Log Parser or a database.
