Tuesday Tiny Techie Tip
Being uniq
uniq(1) takes a stream of lines and collapses each run of
adjacent duplicate lines into a single copy. So if
you had a file called foo that looked like:
davel
davel
davel
jeffy
jones
jeffy
mark
mark
mark
chuck
bonnie
chuck
You could run uniq on it like this:
% uniq foo
davel
jeffy
jones
jeffy
mark
chuck
bonnie
chuck
Notice that there are still two jeffy lines and
two chuck lines. This is because the duplicates
were not adjacent. To get a true unique list you have to
make sure the stream is sorted:
% sort foo | uniq
bonnie
chuck
davel
jeffy
jones
mark
That gives you a truly unique list. However, it's also a
useless use of uniq, since sort(1) has an
option, -u, that does this very common operation:
% sort -u foo
bonnie
chuck
davel
jeffy
jones
mark
That does exactly the same thing as "sort | uniq",
but only takes one process instead of two.
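If you want to convince yourself that the two forms agree, here is a minimal sketch. It assumes a POSIX shell; the function name sample and the variable names are made up for this example, and the function simply stands in for the file foo. Setting LC_ALL=C is my own precaution so the sort order does not depend on your locale.

```shell
# Print the sample data from this tip (stands in for the file foo).
# "sample" is a hypothetical helper name used only in this sketch.
sample() {
    printf '%s\n' davel davel davel jeffy jones jeffy \
                  mark mark mark chuck bonnie chuck
}

# Assumption: force the C locale so sorting is plain byte order everywhere.
LC_ALL=C
export LC_ALL

# Capture the output of the two-process and one-process forms.
two_procs=$(sample | sort | uniq)
one_proc=$(sample | sort -u)

# Compare the two results.
if [ "$two_procs" = "$one_proc" ]; then
    echo "same output"
else
    echo "outputs differ"
fi
```

For this data the script prints "same output", and both variables hold the six names in sorted order.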
uniq has other options that let it do more
interesting mutilations on its input:
- -d tells uniq to eliminate all lines
with only a single occurrence (delete unique lines),
and print just one copy of repeated lines:
% sort foo | uniq -d
chuck
davel
jeffy
mark
- -u tells uniq to eliminate all
duplicated lines and show only those which appear once
(only the unique lines):
% sort foo | uniq -u
bonnie
jones
- -c tells uniq to count the
occurrences of each line:
% sort foo | uniq -c
1 bonnie
2 chuck
3 davel
2 jeffy
1 jones
3 mark
I often pipe the output of "uniq -c" to "sort -n"
(sort in numeric order) to get the list in order of frequency:
% sort foo | uniq -c | sort -n
1 bonnie
1 jones
2 chuck
2 jeffy
3 davel
3 mark
- Finally, there are options to make uniq ignore
leading fields (-f) and leading characters (-s) when
comparing lines. See the man page for details.
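As a rough illustration of ignoring leading fields and characters, here is a sketch using the POSIX option spellings, -f (skip fields) and -s (skip characters). Older versions of uniq may spell these differently, so check your man page. The data and the names log, tagged, by_field and by_chars are invented for this example.

```shell
# Hypothetical log lines: a timestamp field followed by an event.
log() {
    printf '%s\n' '09:01 login davel' \
                  '09:02 login davel' \
                  '09:03 login jeffy' \
                  '09:04 logout jeffy'
}

# -f 1 skips the first blank-separated field (the timestamp) when
# comparing, so the two "login davel" lines count as duplicates.
# uniq keeps the first line of each run.
by_field=$(log | uniq -f 1)
echo "$by_field"
# 09:01 login davel
# 09:03 login jeffy
# 09:04 logout jeffy

# Hypothetical lines with a two-character prefix to ignore.
tagged() {
    printf '%s\n' 'A:davel' 'B:davel' 'C:jones'
}

# -s 2 skips the first two characters of each line when comparing.
by_chars=$(tagged | uniq -s 2)
echo "$by_chars"
# A:davel
# C:jones
```

Note that the skipped part is still printed. It is only left out of the comparison.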
Tuesday Tiny Techie Tip -- 03 December 1996
Written by Jeff Youngstrom
Tuesday Tiny Techie Tips are all © Copyright
1996-1997 by Jeff Youngstrom. Please ask permission before
reproducing any of this material.