Tuesday Tiny Techie Tip

Being `uniq`

uniq(1) takes a stream of lines and collapses each run of adjacent duplicate lines into a single copy. So if you had a file called foo that looked like this:

davel
davel
davel
jeffy
jones
jeffy
mark
mark
mark
chuck
bonnie
chuck

You could run uniq on it like this:

% uniq foo
davel
jeffy
jones
jeffy
mark
chuck
bonnie
chuck

Notice that there are still two jeffy lines and two chuck lines. This is because the duplicates were not adjacent. To get a truly unique list, you have to sort the stream first so that duplicates end up next to each other:

% sort foo | uniq
bonnie
chuck
davel
jeffy
jones
mark

That gives you a truly unique list. However, it's also a useless use of uniq, since sort(1) has an option, -u, that does this very common operation itself:

% sort -u foo
bonnie
chuck
davel
jeffy
jones
mark

That does exactly the same thing as "sort | uniq", but only takes one process instead of two.
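
To check that claim on your own system, here is a small sketch that rebuilds the example file and compares the two pipelines. The scratch directory and the use of mktemp and cmp are my own choices for the illustration, not part of the original tip, and the syntax assumes a Bourne-style shell.

```shell
# Rebuild the sample file from the top of this tip in a scratch directory.
# (The scratch directory is an assumption for illustration only.)
tmpdir=$(mktemp -d)
printf '%s\n' davel davel davel jeffy jones jeffy \
              mark mark mark chuck bonnie chuck > "$tmpdir/foo"

# Two processes versus one; both should produce the same six lines.
sort "$tmpdir/foo" | uniq > "$tmpdir/two_procs"
sort -u "$tmpdir/foo"     > "$tmpdir/one_proc"

# cmp -s is silent and reports only through its exit status.
if cmp -s "$tmpdir/two_procs" "$tmpdir/one_proc"; then
    echo "same output"
else
    echo "outputs differ"
fi
```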

uniq has other arguments that let it do more interesting mutilations on its input:

-d tells uniq to eliminate all lines with only a single occurrence (delete unique lines), and print just one copy of repeated lines:
```
% sort foo | uniq -d
chuck
davel
jeffy
mark
```
-u tells uniq to eliminate all duplicated lines and show only those which appear once (only the unique lines):
```
% sort foo | uniq -u
bonnie
jones
```
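
One way to see how -d and -u relate: between them they account for every distinct line exactly once. A hedged sketch, which feeds the same twelve names through printf (my stand-in for the file foo) so it can be pasted into any Bourne-style shell:

```shell
# names() is a hypothetical helper that stands in for "cat foo".
names() {
    printf '%s\n' davel davel davel jeffy jones jeffy \
                  mark mark mark chuck bonnie chuck
}

repeated=$(names | sort | uniq -d)   # lines that occur more than once
singles=$(names | sort | uniq -u)    # lines that occur exactly once

# Merging the two lists should give back the same list as sort -u.
merged=$(printf '%s\n%s\n' "$repeated" "$singles" | sort)
distinct=$(names | sort -u)

[ "$merged" = "$distinct" ] && echo "-d and -u together cover every name"
```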

-c tells uniq to count the occurrences of each line:

% sort foo | uniq -c
   1 bonnie
   2 chuck
   3 davel
   2 jeffy
   1 jones
   3 mark

I often pipe the output of "uniq -c" to "sort -n" (sort in numeric order) to get the list in order of frequency:

% sort foo | uniq -c | sort -n
   1 bonnie
   1 jones
   2 chuck
   2 jeffy
   3 davel
   3 mark
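
If you want the most frequent lines first, a common variation is to reverse the numeric sort and trim the result with head. This is a sketch under my own assumptions: top_lines is a hypothetical helper name, and the awk step is only there to strip the count padding, whose width differs between uniq implementations.

```shell
# top_lines N: read lines on stdin, print the N most frequent as "count line".
# (Hypothetical helper, not part of uniq or sort.)
top_lines() {
    sort | uniq -c | sort -rn | head -n "$1" |
        awk '{ count = $1; sub(/^[ \t]*[0-9]+[ \t]+/, ""); print count, $0 }'
}

printf '%s\n' davel davel davel jeffy jones jeffy \
              mark mark mark chuck bonnie chuck | top_lines 2
```

On the sample data this prints the two names that occur three times. Which of the two comes first depends on how your sort breaks ties.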

Finally, there are arguments to make uniq ignore leading characters and fields. See the man page for details.
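
As a taste of what those options look like, here is a sketch using the POSIX spellings: -f N skips N leading fields and -s N skips N leading characters. Older uniq implementations spelled these differently, so check your man page first. The log lines and the log helper are made up for the example.

```shell
# Made-up log lines: a timestamp field followed by an event.
log() {
    printf '%s\n' '09:01 login davel' \
                  '09:02 login davel' \
                  '09:05 logout davel'
}

# Skip the first field (the timestamp) when comparing adjacent lines.
log | uniq -f 1

# Skip the first six characters ("HH:MM ") instead; same effect on this input.
log | uniq -s 6
```

In both cases uniq keeps the first line of each run, so the 09:02 login is dropped while the 09:01 login and the 09:05 logout survive.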

Tuesday Tiny Techie Tip -- 03 December 1996
Written by Jeff Youngstrom
