PS – RegEx and parsing log files – the nice way

Hello everyone,

Today I want to show you some nice uses of regular expressions. Many people do not like regex. Yes, at first these expressions look like hieroglyphs, but it gets better with time.

I will start with some trivial examples of how regex can be used in PowerShell, then dive deeper into more complex structures, and finally parse log files into a nice-looking hash table with some really useful features.

[regex]$rx='David'
$rx.Match('David')

$rx.Match('Let me introduce David das Neves. David is a DevOps.')

$rx.Matches('Let me introduce David das Neves. David is a DevOps.')

$rx.Replace('Let me introduce David das Neves. David is a DevOps.','Dave')

Here I create a regex object with the [regex] type accelerator.
The methods used here are mostly self-explanatory: 'Match' returns the first match, 'Matches' returns all matches, and 'Replace' replaces all matches. Each method returns its result directly (a Match object, a MatchCollection, or the new string); the automatic variable $Matches is only filled by the -match operator.
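To make the return values concrete, here is a small sketch. The variable names ($text, $first, $all, $replaced) are made up for illustration, and the pattern 'David' is chosen so that it actually occurs in the sample text.

```powershell
# Illustrative sketch; variable names are not from the original post.
[regex]$rx = 'David'
$text = 'Let me introduce David das Neves. David is a DevOps.'

$first = $rx.Match($text)        # a single Match object for the first hit
$first.Success                   # True
$first.Index                     # 17 (zero-based position of the first hit)

$all = $rx.Matches($text)        # a MatchCollection with every hit
$all.Count                       # 2

$replaced = $rx.Replace($text, 'Dave')   # returns a new string, $text is unchanged
$replaced
```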

These are the easiest ways to use regex: find something, grab something, and replace something. But more is possible with regex:

$Computername='BER-DC01'
Switch -regex ($Computername) {
'^BER' {
#run Berlin location code
}
'^MUC' {
#run Munich location code
}
'^LON' {
#run London location code
}
'DC' {
#run Domain controller specific code
}
Default {Write-Warning "No code found for the computer: $computername"}
}

That is switching with regex.
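One detail worth knowing: switch tests every pattern in turn, so a name like 'BER-DC01' triggers more than one block. The following sketch collects a label for each block that fires; the $hits variable and the label strings are invented for this illustration.

```powershell
$Computername = 'BER-DC01'

# Each matching block emits a label; switch keeps testing the remaining patterns.
$hits = switch -Regex ($Computername) {
    '^BER' { 'Berlin' }
    '^MUC' { 'Munich' }
    '^LON' { 'London' }
    'DC'   { 'DomainController' }
}

$hits -join ', '    # Berlin, DomainController
```

If only the first matching block should run, end that block with break.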

$s='2012-03-14 18:57:35:321 1196 13f8 PT Server URL = http://172.116.110.1/SimpleAuthWebService/SimpleAuth.asmx'
[regex]$r='\d{2}:\d{2}:\d{2}:\d{3}'

$r.match($s)
$r.split($s)

And that is splitting with regex, and there is much more. As you can see, it is not as hard as you think.
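For reference, this is roughly what the two calls give back for that line. The $parts variable is just an illustrative name.

```powershell
$s = '2012-03-14 18:57:35:321 1196 13f8 PT Server URL = http://172.116.110.1/SimpleAuthWebService/SimpleAuth.asmx'
[regex]$r = '\d{2}:\d{2}:\d{2}:\d{3}'

$r.Match($s).Value       # 18:57:35:321

# Split cuts the string at the timestamp and returns what is left on each side.
$parts = $r.Split($s)
$parts.Count             # 2
$parts[0].Trim()         # 2012-03-14
$parts[1].Trim()         # 1196 13f8 PT Server URL = http://172.116.110.1/SimpleAuthWebService/SimpleAuth.asmx
```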

But now I want to show you a really nice feature:
Named captures grab pieces of a string and store each piece under a name of your choice:

'David 12356 Regex 23548' -match '(?<word>\w{5}) (?<number>\d{5})'
$matches
$matches.number

The digits are stored in 'number' and the matched word in 'word'. You just have to add

?<CAPTURE_NAME>

directly after the opening parenthesis of the group that should be captured.
In this example the text matched by \w{5} is put into word and the text matched by \d{5} is put into number.
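A short sketch of what ends up where. Note that -match stops at the first match; to get every match, the static [regex]::Matches method can be used instead. The variable names $input1, $pattern, and $numbers are only illustrative.

```powershell
$input1  = 'David 12356 Regex 23548'
$pattern = '(?<word>\w{5}) (?<number>\d{5})'

# -match fills the automatic variable $Matches with the first match only.
if ($input1 -match $pattern) {
    $Matches.word      # David
    $Matches.number    # 12356
    $Matches[0]        # David 12356 (the whole match)
}

# [regex]::Matches returns all matches; named groups are read via Groups['name'].
$numbers = [regex]::Matches($input1, $pattern) |
    ForEach-Object { $_.Groups['number'].Value }
$numbers -join ', '    # 12356, 23548
```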
Now let's take a look at how we can get even more out of this feature.
For this we use a log file; in this example it will be 'cbs.log'.

Here is an example line from it:

2015-12-09 18:53:04, Info CBS Session: 30487210_1944050851 initialized by client WindowsUpdateAgent, external staging directory: (null)

As you can see, this log line can easily be separated by hand into different sections:

  • Date
  • Time
  • Type
  • Component
  • Message

Now we build a regex to split the line into these parts.
We work from left to right. The first part is the date:

(?<Date>\d{4}-\d{2}-\d{2})

First we put parentheses around it to separate it from the other parts. Then we add our container

 ?<Date> 

right after the opening parenthesis. The pattern itself is four digits, a dash, two digits, a dash, and finally two digits. Done!
Okay, now we add the whitespace and the second part. The regex then looks as follows:

(?<Date>\d{4}-\d{2}-\d{2})\s+(?<Time>(\d{2}:)+\d{2})

One or more whitespace characters (\s+) come first, followed by the new part. The same rule applies here: parentheses around it, then the container, then the pattern. Repeat this procedure until you reach the message part.

(?<Date>\d{4}-\d{2}-\d{2})\s+(?<Time>(\d{2}:)+\d{2}),\s+(?<Type>\w+)\s+(?<Component>\w+)\s+(?<Message>.*)$

With .* you grab everything up to the end of the line, and then you are ready to go. This pattern captures all the information and puts it into our named captures. Wonderful!
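Before running this over a whole file, the pattern can be checked against the single example line from above. This is only a quick sketch; $line and $pattern are illustrative variable names.

```powershell
$line    = '2015-12-09 18:53:04, Info CBS Session: 30487210_1944050851 initialized by client WindowsUpdateAgent, external staging directory: (null)'
$pattern = '(?<Date>\d{4}-\d{2}-\d{2})\s+(?<Time>(\d{2}:)+\d{2}),\s+(?<Type>\w+)\s+(?<Component>\w+)\s+(?<Message>.*)$'

if ($line -match $pattern) {
    $Matches.Date        # 2015-12-09
    $Matches.Time        # 18:53:04
    $Matches.Type        # Info
    $Matches.Component   # CBS
    $Matches.Message     # Session: 30487210_1944050851 initialized by client WindowsUpdateAgent, external staging directory: (null)
}
```

Besides the five named entries, $Matches also holds the key 0 (the whole match) and the key 1, which comes from the unnamed inner group (\d{2}:) in the Time part.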

Now we need some more code to run this over the whole log file and collect everything into an object we can query. For this I have built a function with two parameters, Path and RegexString: Download me

First the names of the containers are retrieved and the log file is loaded. Then a loop runs through all lines, retrieves the values of the captures, and writes them to a hash table. Finally I add the keys as a property to the returned object.
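The steps just described could be sketched roughly as follows. This is not the downloadable function itself, only a minimal approximation under assumptions: the name Get-ParsedLogSketch is hypothetical, lines that do not match are simply skipped, and the result is assumed to expose Keys and Log properties as used later in this post.

```powershell
# Hypothetical sketch, not the original Get-RegExParsedLogfile implementation.
function Get-ParsedLogSketch {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$RegexString
    )

    $regex = [regex]$RegexString

    # Names of the containers; purely numeric names belong to unnamed groups.
    $keys = @($regex.GetGroupNames() | Where-Object { $_ -notmatch '^\d+$' })

    $log = @(foreach ($line in Get-Content -Path $Path) {
        $m = $regex.Match($line)
        if (-not $m.Success) { continue }    # assumption: skip lines that do not fit

        # One ordered hash table per line, keyed by capture name.
        $entry = [ordered]@{}
        foreach ($key in $keys) { $entry[$key] = $m.Groups[$key].Value }
        [pscustomobject]$entry
    })

    # Return the capture names alongside the parsed lines.
    [pscustomobject]@{ Keys = $keys; Log = $log }
}

# Usage against a throwaway file: the example line plus one line that does not match.
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value @(
    '2015-12-09 18:53:04, Info CBS Session: 30487210_1944050851 initialized by client WindowsUpdateAgent, external staging directory: (null)',
    'this line does not follow the format'
)
$result = Get-ParsedLogSketch -Path $tmp -RegexString '(?<Date>\d{4}-\d{2}-\d{2})\s+(?<Time>(\d{2}:)+\d{2}),\s+(?<Type>\w+)\s+(?<Component>\w+)\s+(?<Message>.*)$'
Remove-Item -Path $tmp

$result.Keys -join ', '     # Date, Time, Type, Component, Message
$result.Log[0].Component    # CBS
```

Emitting one object per line, rather than a bare hash table, is a design choice in this sketch so that Where-Object and Select-Object can work on property names directly.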

How do you work with it? Try it yourself!

$parsedLogFile = Get-RegExParsedLogfile -Path 'c:\windows\logs\cbs\cbs.log' -RegexString '(?<Date>\d{4}-\d{2}-\d{2})\s+(?<Time>(\d{2}:)+\d{2}),\s+(?<Type>\w+)\s+(?<Component>\w+)\s+(?<Message>.*)$'

$parsedLogFile.Keys

$parsedLogFile.Log | Where-Object Component -eq 'CBS' | Select-Object Date, Type, Message | Format-Table -AutoSize -Wrap

$parsedLogFile.Log | Where-Object Message -like '*error*' | Format-Table -AutoSize -Wrap

The function returns a custom object. 'Keys' holds the capture names and 'Log' holds all the parsed lines of the log file. A great benefit of all this work is that you can now filter the data easily with Where-Object and select only the information you need.

This is very powerful if you want to retrieve all the lines with a specific error etc.

I hope you enjoyed it and hopefully you can use this function.

Greetings,

David
