The fearsome, frabjous* Regular Expression

The other day, we saw a thing by Somebody-or-Other that finished with this warning: “and besides, with WebTrends you have to use regular expressions.”

Oooh, regular expressions.  Run and hide under the bed.

There are two things wrong with Somebody-or-Other’s statement.  First, we don’t know of anywhere in WebTrends where you HAVE to use regular expressions.  In WebTrends it’s always just an option.

Second, it’s not a big deal.  If you’ve already mastered the asterisk (*) wildcard, extending that skill to regular expressions will take about two minutes.  At least, it will take that long for the following starter intro to regular expressions that will get you through about 99.99% of your WebTrends needs.

WebTrends does have a full regular expression engine, and there’s no question that full-fledged regular expressions can be magnificently cringe-inducing.  I, personally, get more intimidated by a hearty regex than by a whole pageful of Perl.

But, as I said above, you’ll probably never need more than baby regexes.

(If you do develop an advanced need, just call a geeky friend.  Geeky friends love to puzzle out advanced regular expressions.  Or ask on the WebTrends user forum, where several regex mavens hang out.)

So this post is dedicated to those newish users of WebTrends who perceive the Regular Expression checkbox as solid proof that WebTrends is too technical and who have avoided that checkbox like the plague.

(To be honest, the name “regular expression” could be the most complicated thing about regular expressions, at least in this context.   What a dumb name.  “Regular Expression” just means “flexible way of matching.”   If it were called “advanced wildcards” would it make you more comfortable?)

You really only need to know two things about regular expressions (regexes) to start using them in WebTrends.

  • The simplest regular expression is just the characters you want to match.  No fancy symbols.  Suppose you want a content group that contains everything that’s an article, and all article filenames have the word “article” in them (like “article1234.htm”).  This can be handled by the simplest form of a regular expression, which is just the text that’s common to everything you want to match: in this case, “article”.  In other words, it does the same job as the ordinary wildcard match *article* (which is NOT itself a regular expression, because in a regular expression * has a funky meaning: it means “repeat whatever came just before,” not “anything goes here”).

    Not very impressive, right?  You’re thinking, “this simplest kind of regular expression doesn’t do anything that ordinary wildcards can’t do.”   Ha!  Note that it saves you from typing asterisks!  That counts a LOT.

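    There’s one little catch if there’s any punctuation in your regular expression text, for example if you’re matching “article.doc”.  You need to put a backslash before the punctuation, like this: “article\.doc”.  That’s because a bare dot is a special character in a regular expression that matches any single character, so without the backslash you’d also match things like “articleXdoc”.  (There’s a lot more to it but remember we’re giving you the pablum version.)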
  • The other majorly useful regex thingy is the pipe character “|”  (vertical bar).  It means “OR”.  So, “article|document” will match anything that has either article or document in it. Now it’s getting more interesting, right?  You can’t do that with asterisks.

    Do you see how helpful this is in setting up a content group that will contain all article and document files?   This is, IMHO, worth the price of admission right there.
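
If you’d like to poke at these two ideas outside WebTrends, here is a minimal sketch in Python.  It assumes Python’s built-in re module as a stand-in; WebTrends has its own regex engine, which may differ in details such as case sensitivity.  The page names and the matching() helper are made up for illustration.

```python
import re

# Hypothetical page URLs, invented for this illustration.
pages = [
    "/news/article1234.htm",
    "/docs/document77.htm",
    "/files/article.doc",
    "/files/articleXdoc",
    "/about/index.htm",
]


def matching(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [c for c in candidates if re.search(pattern, c)]


# Plain text: behaves like the wildcard match *article*.
plain = matching("article", pages)

# Escaped dot: only a literal "." is accepted between "article" and "doc".
escaped = matching(r"article\.doc", pages)

# Unescaped dot: any single character is accepted there, so "articleXdoc" sneaks in.
unescaped = matching("article.doc", pages)

# Pipe: matches anything containing either "article" or "document".
either = matching("article|document", pages)

for name, result in [("plain", plain), ("escaped", escaped),
                     ("unescaped", unescaped), ("either", either)]:
    print(name, result)
```

The thing to notice is the difference between the escaped and unescaped results: only the escaped pattern leaves out “/files/articleXdoc”.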

You can stop here if you want.  But if you have the courage to know just a bit more, here are two others.

  •  If you want your match to happen only if your string is found at the very beginning or the very end, you can use two other special characters, ^ and $.  The ^ goes at the beginning of the regular expression and the $ goes at the end.  If you want a filter that will match yahoo.com but not www.yahoo.com, use ^yahoo because the ^ requires matching at the very beginning.  And yahoo\.com$ (note the backslash before the dot) will match yahoo.com and www.yahoo.com but not yahoo.com.au, because the $ demands that your string match the very end.  And yahoo without either ^ or $ will of course match all of the above.

    Okay.  Take a deep breath.  Read it again slowly.  You’ll get it, I promise.
  • Maybe you have variable stuff you want to ignore in the middle of what you want to match.  For example, suppose you’re making a content group that includes all product articles, like “/products/vorpal-blades/article.htm” and “/products/slithy-toves/article.htm” but not anything like “/press-releases/vorpal-blades/article.htm”.  You need something that means “matches any kinda junk here in the middle.”  The something is .* (that’s dot asterisk).  The dot means “any character” and the asterisk means “any number of them, including none.”

    So, any of the following regular expressions would work for the above situation:
    /products/.*/article
    products/.*/article
    oducts/.*/articl
    /products/.*/artic
    cts/.*/artic
    /produ.*/artic
    cts/.*rticle
    and so forth … lotsa choices but I’d go with the first
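
Here is a minimal sketch of ^, $ and .* in Python, for trying them out away from WebTrends.  It assumes Python’s built-in re module, which is not the WebTrends engine, though these basic symbols behave the same way in most regex engines.  The matching() helper is made up for illustration; the hostnames and paths are the ones from the examples above.

```python
import re


def matching(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    return [c for c in candidates if re.search(pattern, c)]


hosts = ["yahoo.com", "www.yahoo.com", "yahoo.com.au"]

# ^ pins the match to the very beginning, so www.yahoo.com is left out.
starts = matching(r"^yahoo", hosts)

# $ pins the match to the very end, so yahoo.com.au is left out.
ends = matching(r"yahoo\.com$", hosts)

# No anchors: the text may appear anywhere, so all three match.
anywhere = matching("yahoo", hosts)

paths = [
    "/products/vorpal-blades/article.htm",
    "/products/slithy-toves/article.htm",
    "/press-releases/vorpal-blades/article.htm",
]

# .* soaks up whatever sits between "/products/" and "/article".
product_articles = matching(r"/products/.*/article", paths)

# One of the lazier spellings from the list above picks out the same pages.
lazy = matching(r"cts/.*rticle", paths)

print(starts)
print(ends)
print(anywhere)
print(product_articles)
print(lazy == product_articles)
```

Both the full and the lazy pattern skip the press-release page, because nothing in that path contains “/products/” or “cts/”.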

This just scratches the surface of regular expressions, but as a WebTrends user the surface is exactly where you’ll be most of the time.  Callooh!  Callay!

 

Oh.  About the asterisk in the title.  That’s a footnote.  Just having a little wildcard pun.

The borrowed words are from Lewis Carroll’s “Jabberwocky,” if you didn’t know.

‘Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

“Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!”

He took his vorpal sword in hand:
Long time the manxome foe he sought —
So rested he by the Tumtum tree,
And stood awhile in thought.

And as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!

One, two! One, two! and through and through
The vorpal blade went snicker-snack!
He left it dead, and with its head
He went galumphing back.

“And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!”
He chortled in his joy.

‘Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.