Tuesday Tiny Techie Tip

awk(1)

Whole books have been written on awk, and rightly so. It's a powerful text processing programming language, perfectly capable of performing all sorts of wondrous feats of text manipulation, provided you're willing to spend hours trying to figure out what the heck it means when it says things like:
awk: syntax error near line 1
awk: illegal statement near line 1
I wouldn't want to try to write an awk script longer than a few lines without a lot of time on my hands, especially now that there's perl, which has pretty much swallowed all of awk's functionality.

However, there are very useful things you can do with a single-line awk script, and those are what I'll talk about here.

awk's basic mode of operation is to read its input, chop each line into fields separated by some delimiter (white space by default, but you can change it), and then allow you to do pattern matching and other operations based on those fields. The thing I use it for most often is to grab a particular field.

Let's look at a trusty long ls listing:


% ls -l
-rw-r--r--  1 jeffy          28 May  9 16:12 Makefile
-rwxr-xr-x  1 jeffy       24576 May 28 11:31 foo
-rw-r--r--  1 jeffy          57 May  9 16:13 foo.c
-rw-r--r--  1 jeffy          57 May 28 11:37 foobar
-rw-r--r--  1 jeffy          71 Jun  2 11:45 fumpty

Suppose I want to grab just the file sizes for some reason. awk numbers fields starting with 1 (not zero like you'd expect from a bunch of unix geeks). Counting across, the size is field 4 in this listing (if your ls -l also shows a group column, it will be field 5 instead), so just do this:
% ls -l | awk '{print $4}'
28
24576
57
57
71

Easy as pie. Notice that the awk program is enclosed in single quotes. This protects the "$4" from the shell, so it gets evaluated by awk, not by csh (or whatever shell you use).
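
To see what the quotes are protecting against, here is a small sketch written for a Bourne-style shell. The sample line, the function name "unprotected", and the variable names are all made up for illustration. Inside double quotes the shell substitutes its own $4 (here, the function's fourth argument) before awk ever starts:

```shell
# Hypothetical sample line; any whitespace-separated text will do.
line='one two three four five'

# Single quotes: the shell passes $4 through untouched and awk prints field 4.
single=$(echo "$line" | awk '{print $4}')

# Double quotes inside a function: the shell replaces $4 with the
# function's own fourth argument before awk sees the program.
unprotected() {
    echo "$line" | awk "{print $4}"
}

# The fourth argument is the literal text $1, so awk ends up running
# {print $1} and prints the first field instead of the fourth.
double=$(unprotected a b c '$1')

echo "single quotes: $single"   # four
echo "double quotes: $double"   # one
```

Whatever the shell happens to have in $4 is what awk gets, which is rarely what you meant.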

You can print out multiple columns in any order by separating them with commas:


% ls -l | awk '{print $3, $1, $4, $NF}'
jeffy -rw-r--r-- 28 Makefile
jeffy -rwxr-xr-x 24576 foo
jeffy -rw-r--r-- 57 foo.c
jeffy -rw-r--r-- 57 foobar
jeffy -rw-r--r-- 71 fumpty

Notice that the separating white space is not preserved, but gets scrunched down to a single space.
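
The single space comes from awk's output field separator, the built-in variable OFS, which print inserts wherever you wrote a comma. A quick sketch on a made-up line (the input and the variable names are mine) shows what happens with the comma, without it, and with OFS changed in a BEGIN block:

```shell
# Hypothetical input with runs of spaces between the fields.
line='jeffy      28   Makefile'

# Comma: fields are joined with OFS, a single space by default.
with_comma=$(echo "$line" | awk '{print $1, $3}')

# No comma: awk concatenates the two fields with nothing in between.
no_comma=$(echo "$line" | awk '{print $1 $3}')

# Setting OFS before any input is read changes what the comma produces.
with_colon=$(echo "$line" | awk 'BEGIN {OFS=":"} {print $1, $3}')

echo "$with_comma"   # jeffy Makefile
echo "$no_comma"     # jeffyMakefile
echo "$with_colon"   # jeffy:Makefile
```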

Wait a minute, what's with that "$NF" in that last example? NF is an internal awk variable that always holds the Number of Fields in the current line. By sticking a dollar sign in front of it, I get the equivalent of a "$8" when I run the script on the "ls -l" output. That way I don't have to know how many fields there are; I can just grab the last one.
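
Here is a small sketch of NF at work on lines of different lengths. The two input lines and the variable names are invented for the example. NF by itself is the count, $NF is the last field, and $(NF-1) is the one before it:

```shell
# Two hypothetical input lines with different numbers of fields.
sample='alpha beta gamma
delta epsilon'

# Print the field count followed by the last field of each line.
counts=$(printf '%s\n' "$sample" | awk '{print NF, $NF}')

# Arithmetic works inside $(...): this grabs the next-to-last field.
next_to_last=$(printf '%s\n' "$sample" | awk '{print $(NF-1)}')

echo "$counts"         # 3 gamma, then 2 epsilon
echo "$next_to_last"   # beta, then delta
```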

See if you can figure out what this is doing:


% ypcat passwd | grep -i jeff | awk -F: '{print $5}' | awk '{print $NF}' | sort

Hint: the "-F" flag sets the field separator character.
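
As a warm-up that doesn't give the puzzle away, here is a sketch of "-F" on a single made-up line in passwd format. The account is invented, not taken from any real passwd map. With the separator set to a colon, awk sees seven fields instead of splitting on the spaces inside the name:

```shell
# Hypothetical passwd-style entry: seven colon-separated fields.
entry='jdoe:x:1001:100:Jane Q. Doe:/home/jdoe:/bin/csh'

# Default separator is white space, so this line splits into only 3 fields.
default_split=$(echo "$entry" | awk '{print NF}')

# With -F: the colon is the separator and all 7 fields appear.
colon_split=$(echo "$entry" | awk -F: '{print NF}')

# Print the login name (field 1) and the login shell (field 7).
login_and_shell=$(echo "$entry" | awk -F: '{print $1, $7}')

echo "$default_split"     # 3
echo "$colon_split"       # 7
echo "$login_and_shell"   # jdoe /bin/csh
```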

After all this I hear you asking "What the heck does awk mean?!" I'm glad you asked. You might think it was "A Wondrous Kluge", or just short for "awkward", but you'd be wrong. It's made up of the initials of Alfred Aho, Peter Weinberger, and Brian Kernighan who wrote the silly thing and deserve all the blame for its persnicketiness.


Tuesday Tiny Techie Tip -- 3 June 1997
Written by Jeff Youngstrom

Tuesday Tiny Techie Tips are all © Copyright 1996-1997 by Jeff Youngstrom. Please ask permission before reproducing any of this material.