Here is how to extract data from a file that contains a list of information. Let's say I have a log file, /tmp/logfile, that contains hostnames in the format shown below. All I want is the hostname, nothing else. Running grep with a hostname returns the entire line that contains that hostname. However, using sed, awk, sort and uniq, we can narrow the output down to the unique hostnames, in alphabetical order.

cat /tmp/logfile

list 1 get-host1-now
list 2 get-host3-now
list 3 get-host4-now
list 4 get-host1-now
list 5 get-host5-now
list 6 get-host6-now
list 7 get-host1-now

awk '{ print $3 }' /tmp/logfile | sed 's/get-/get- /' | sed 's/-now/ -now/' | awk '{ print $2 }' | sort | uniq
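
To see what the pipeline prints, here is a small self-contained sketch that writes the seven sample lines to a temporary file and runs the same command against it. The temporary file and the variable names `sample` and `result` are only for this illustration; on a real system you would point the pipeline at /tmp/logfile directly.

```shell
# Write the sample lines to a temporary file (illustration only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
list 1 get-host1-now
list 2 get-host3-now
list 3 get-host4-now
list 4 get-host1-now
list 5 get-host5-now
list 6 get-host6-now
list 7 get-host1-now
EOF

# Same pipeline as above, reading from the temporary file.
result=$(awk '{ print $3 }' "$sample" \
  | sed 's/get-/get- /' \
  | sed 's/-now/ -now/' \
  | awk '{ print $2 }' \
  | sort \
  | uniq)

printf '%s\n' "$result"
# host1
# host3
# host4
# host5
# host6

rm -f "$sample"
```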

Explanation:

awk '{ print $3 }' /tmp/logfile (this prints the 3rd column, e.g. "get-host1-now"; columns are separated by whitespace)

sed 's/get-/get- /' (this adds a space after "get-", e.g. get- host1-now)

sed 's/-now/ -now/' (this adds a space before "-now", e.g. get- host1 -now)

awk '{ print $2 }' (this prints the 2nd column, which, thanks to the previous steps, is now host1)
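
As a rough illustration of the first four stages, the sketch below pushes a single sample line through them one at a time and keeps each intermediate result. The variable names (`line`, `step1` to `step4`) are made up for this example.

```shell
# Feed one sample line through the first four stages, one stage at a time.
line='list 1 get-host1-now'

step1=$(printf '%s\n' "$line"  | awk '{ print $3 }')    # get-host1-now
step2=$(printf '%s\n' "$step1" | sed 's/get-/get- /')   # get- host1-now
step3=$(printf '%s\n' "$step2" | sed 's/-now/ -now/')   # get- host1 -now
step4=$(printf '%s\n' "$step3" | awk '{ print $2 }')    # host1

# Print every intermediate result, one per line.
printf '%s\n' "$step1" "$step2" "$step3" "$step4"
```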

| sort (this puts everything in alphabetical order, which is important for the "uniq" step)

| uniq (this prints only one instance of repeated data; without the "sort" this wouldn’t have worked, since uniq only collapses adjacent duplicate lines and host1 does not recur immediately after its 1st occurrence)
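
The effect of sorting before uniq can be checked with a short sketch like the one below, which uses the hostnames in the order they appear in the sample log. The variable names are made up for this example. It also tries `sort -u`, a standard sort option that should give the same result as `sort | uniq` in one step.

```shell
# Hostnames in the order they appear in the sample log.
hosts='host1
host3
host4
host1
host5
host6
host1'

# uniq alone only collapses adjacent duplicates, so all 7 lines survive.
uniq_only=$(printf '%s\n' "$hosts" | uniq | wc -l | tr -d ' ')

# Sorting first makes the duplicates adjacent, so uniq can drop them.
sorted_uniq=$(printf '%s\n' "$hosts" | sort | uniq)

# sort -u sorts and removes duplicates in a single command.
sort_u=$(printf '%s\n' "$hosts" | sort -u)

echo "lines after uniq alone: $uniq_only"
printf '%s\n' "$sorted_uniq"
```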

Hope this helps!

Additional info:

http://student.northpark.edu/pemente/sed/sed1line.txt (Sed info)

http://analyser.oli.tudelft.nl/regex/index.html.en (RegEx info)

http://www.linuxfocus.org/English/September1999/article103.html (Awk info)