Tuesday Tiny Techie Tip
Quoting
One of the trickiest parts about interacting with the shell
is using the two different kinds of quotes and the backslash.
I've seen lots of people write the definitions:
- '...'
	Remove special meaning of the enclosed characters except for
	single quote (') itself.
- "..."
	Remove special meaning of the enclosed characters except for
	dollar sign ($), backtick (`), backslash (\), and bang
	(exclamation point) (!).
- \c
	Remove special meaning of character c.
But translating these "definitions" into practical usage is
another matter entirely.  So I'll take a crack at making some
sense out of this nonsense.
First off, why do you need to use quotes at all?  The answer
lies in the interaction between you, the shell, and the
application programs you're running.  The shell generally
takes what you type at the command line and breaks it up
into arguments and does some pre-processing on it before
calling whatever command you asked for.  Here's an example:
% m -s "Here's $USER's .aliases file" $USER < ~/.aliases
/usr/ucb/mail -s Here's jeffy's .aliases file jeffy
NOTE: I have the shell's "echo" variable set (set echo) for all
examples in this tip so we can see what the shell comes up with
after the aliases, variables, and quotes are evaluated.
So the shell did a lot of work on that command line:
- The m alias was expanded out to its value:
	/usr/ucb/mail
- The double quotes around "Here's $USER's .aliases
	file" grouped it into a single argument.
- The variable USER was expanded into its value
	(twice)
- The ~ on the file path was expanded into my
	home directory
- The input of the command was redirected from the file
	/home/jeffy/.aliases
All of that before any program was even called.
The two main things quotes get used for are as follows:
- group multiple words into a single argument to a
	command as with the "Here's $USER's .aliases
	file" subject line for the email above.
- keep the shell from munging stuff that it would
	normally munge.
Another example:
% sed s/foobar/foo bar/ testfile
sed s/foobar/foo bar/ testfile
sed: Ending delimiter missing on substitution: s/foobar/foo
Here the shell echoes back our command just as we
typed it, but sed(1) gives us a strange error
message.  It looks like sed is only seeing half of
our command.  This is because sed is called with
arguments as "sed command file...", with the
command part being a single argument.  What
happened to us is the shell happily broke up our command
line into arguments using spaces (and tabs) to separate
arguments, but we really wanted it to treat the space
between "foo" and "bar" as a part of the
first argument.  So sed got called as:
sed 's/foobar/foo' 'bar/' testfile
Those aren't real quotes, they're just there so I can show
where the argument boundaries were when sed was
called.
So we need to protect the space from the shell.  According
to our definitions above, we can use either style of quotes
to do it since the space is not in the exception list for
either.  We could also use a backslash.  So any one of the
following will work:
% sed 's/foobar/foo bar/' testfile
% sed "s/foobar/foo bar/" testfile
% sed s/foobar/foo\ bar/ testfile
% sed s/foobar/foo' 'bar/ testfile
etc...
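One way to check where the argument boundaries fall is to swap sed for a command that prints each argument it receives on its own line. This is only a sketch: it assumes a printf command is on your path, and these simple cases should behave the same in csh and in Bourne-style shells. The # lines are comments for reading; leave them out if you paste into an interactive csh.

```shell
# Unquoted: the shell splits on the space, so three arguments arrive.
printf '[%s]\n' s/foobar/foo bar/ testfile

# Quoted or escaped: the space survives, so only two arguments arrive.
printf '[%s]\n' 's/foobar/foo bar/' testfile
printf '[%s]\n' "s/foobar/foo bar/" testfile
printf '[%s]\n' s/foobar/foo\ bar/ testfile
printf '[%s]\n' s/foobar/foo' 'bar/ testfile
```

The brackets play the same role as the not-real quotes above: they mark where each argument starts and stops.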
So that's how things work when you want to just group
multiple words into a single argument to a command.  The
other use is where things tend to get messy.  What happens
if you want to pass a character which would normally be
expanded by the shell on to a command?
% sed s/foobar/foo $USER bar/ testfile
Here we still need to group the sed command into a
single argument, but there's some ambiguity about what we
want to do with the "$USER" part.  If we want to
have the shell expand the variable into its value, then we
need to make sure we use one of the quoting methods which
does not protect the dollar sign:
% sed "s/foobar/foo $USER bar/" testfile
sed s/foobar/foo jeffy bar/ testfile
or
% sed s/foobar/foo\ $USER\ bar/ testfile
sed s/foobar/foo jeffy bar/ testfile
But if we want sed to replace "foobar" with the
literal string "foo $USER bar" then we need to
make sure we protect the dollar sign from the shell:
% sed 's/foobar/foo $USER bar/' testfile
sed s/foobar/foo $USER bar/ testfile
or
% sed s/foobar/foo\ \$USER\ bar/ testfile
sed s/foobar/foo $USER bar/ testfile
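The same difference can be seen without sed or a test file by printing the argument directly. A minimal sketch, assuming printf is available and USER is set in your environment:

```shell
# Double quotes group the words but still let the shell expand $USER.
printf '[%s]\n' "s/foobar/foo $USER bar/"

# Single quotes group the words and keep $USER literal.
printf '[%s]\n' 's/foobar/foo $USER bar/'
```

The first line should show your login name between "foo" and "bar"; the second should show the five characters $USER untouched.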
Things get even trickier when you need to protect the
quotes themselves from the shell.  It gets messy because
the shell does a single left-to-right scan to evaluate
what's quoted and what isn't.
This one actually came up recently, and took a while to
figure out.  I want to call awk and have it print the first
column of a file with a single quote in front of it for
each line of the file.  So the first guess would be
something like:
% awk -F: {print ' $1} /etc/passwd
Unmatched '.
As the shell scans that from left to right, it sees that
single quote, then waits for the matching quote which never
comes.  So let's try to protect the single quote from the
shell:
% awk -F: {print \' $1} /etc/passwd
Missing }.
Hmm.  What's that mean?  It looks like our curly braces
match up.  Here we're running into a little-known feature
of csh(1) which lets you do pattern matching on
files with alternation of different strings (see a future
tip for details).  That feature uses curly braces, so the
shell is mucking with the curly braces we're trying to pass
to awk.  Any kind of quote will do to protect the curly
braces, and we really need everything within the curlies to
be treated as a single argument to awk anyway, so let's
enclose the whole awk program in double quotes.  We should
be able to remove the backslash from the single quote too
since double quotes protect single quotes:
% awk -F: "{print ' $1}" /etc/passwd
awk -F: {print ' } /etc/passwd
awk: syntax error near line 1
awk: illegal statement near line 1
Foo!  What happened to our "$1"?  It turns out double
quotes don't protect the dollar sign (duh, we knew that),
so the shell substitutes the value of its own "$1"
variable.  That variable has no value, so awk's print
command ends up with nothing after the single quote.  How
about a backslash inside the double quotes to protect the
dollar sign?
% awk -F: "{print ' \$1}" /etc/passwd
awk -F: {print ' \} /etc/passwd
awk: syntax error near line 1
awk: illegal statement near line 1
Huh?  The backslash didn't work!  Our definition of double
quotes says it should have.  We've run into a little-known
feature of double quotes in csh.  Not only do
double quotes not protect the dollar sign, they FORCE the
shell to evaluate it even if it's protected by some other
quoting mechanism inside the quotes.  So since we're
enclosing the awk program in double quotes, there is no way
for us to protect that dollar sign from the shell.  Okay,
so what if we use single quotes around the program?
% awk -F: '{print ' $1}' /etc/passwd
Unmatched '.
Okay, this at least makes a little sense.  As the shell
scans from left to right, it sees the first quote and waits
until it finds a matching quote which it does in the quote
that we're trying to send to awk unchanged, then what we
were thinking of as the closing quote comes along and never
gets matched.  So somehow we need to tell the shell to
ignore that single quote that we're trying to pass to
print.  Try a backslash:
% awk -F: '{print \' $1}' /etc/passwd
Unmatched '.
Hmm.  Okay, remember that single quotes protect everything
except themselves from the shell?  That means they'll
protect the backslash itself from the shell!  So the
backslash isn't doing its job of protecting our single
quote.  We could try sprinkling backslashes everywhere to
avoid having to use surrounding quotes at all:
% awk -F: \{print\ \'\ \$1\} /etc/passwd
awk -F: {print ' $1} /etc/passwd
awk: syntax error near line 1
awk: illegal statement near line 1
Well, that's a little better.  At least we're getting our
command to awk just the way we wanted to.  Unfortunately
we're trying to send a string constant ("'") to
awk's print command, but awk only recognizes string
constants if they're enclosed in double quotes!  Ack!
% awk -F: \{print\ "\'"\ \$1\} /etc/passwd
awk -F: {print \' $1} /etc/passwd
awk: syntax error near line 1
awk: illegal statement near line 1
Okay, why's the backslash still there?  And where did the
double quotes go?  The shell ate the double quotes, but the
backslash should have been eaten as well if our definitions
at the beginning are right.  They're not.  Backslash has
some special rules for when it's used inside of quotes.
Basically it only protects things that aren't already
protected by the quotes.  More mud in the mix.  In practice
I hardly ever use backslash to escape things, and
practically never inside quotes.
So get rid of the backslash and see if we can protect those
quotes from the shell.  How about some more backslashes?
(since I never use them, they must be the right way to
solve this silly problem)
% awk -F: \{print\ \"'\"\ \$1\} /etc/passwd
Unmatched '.
Well poo.  Now that the quotes aren't being eaten by the
shell we need to protect that doggone single quote again.
Bring back the backslash.
% awk -F: \{print\ \"\'\"\ \$1\} /etc/passwd
awk -F: {print "'" $1} /etc/passwd
Believe it or not, this actually works (and if you can
still remember what we were trying to do, then you deserve
a prize! ;-).  It's not the solution I came up with when I
was asked this question, though.  (Larry Wall's slogan for
Perl applies equally well for shell programming:  "There's
more than one way to do it")
The first thing I thought of was to give up on trying to do
the quote thing with awk altogether and use sed to post
process the quote into the first column:
% awk -F: '{print $1}' /etc/passwd | sed "s/^/'/"
awk -F: {print $1} /etc/passwd
sed s/^/'/
A little cleaner, that.  And it points out the fact that
sometimes the best way to solve a problem is to give up on your
initial strategy if it leads you down a useless path, and break
out the part that seems impossible and do it with another tool
altogether, or do the whole thing in two steps.
But then I went hunting for a quote solution and came up
with this:
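% awk -F: '{print "'"'"'" $1}' /etc/passwd
awk -F: {print "'" $1} /etc/passwd
Which is confusing as heck, but also works.  Let me reformat that so you
can see what quotes are protecting what.  The program is three quoted
pieces written with no space between them, so the shell glues them
together into a single argument:
awk -F:
	'{print "'	single quotes protect {print "
	"'"		double quotes protect '
	'" $1}'		single quotes protect " $1}
/etc/passwd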
As clear as mud, I know.
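If you want to convince yourself that the adjacent pieces really do arrive as one argument, print the argument first and then run the program on a couple of lines of input. This is a sketch under assumptions: the two passwd-style lines are made up for illustration, printf is assumed to be available, and -F: is used as in the earlier attempts.

```shell
# Three adjacent quoted pieces form one argument:
#   '{print "'   then   "'"   then   '" $1}'
printf '[%s]\n' '{print "'"'"'" $1}'

# Run the same program on two made-up passwd-style lines.
printf 'root:x:0\njeffy:x:100\n' | awk -F: '{print "'"'"'" $1}'
```

The first command prints the program exactly as awk receives it, inside one pair of brackets. The second prints each login name with a single quote in front of it.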
I think I'll declare this week a complete failure and try
this one again next week ;-)
UPDATE 02/02/2005:
Internet correspondent Philippe Goossens points out an
elegant awk-only solution to the "insert a leading single
quote on just the first field in the file" problem.
Use an awk variable to contain the literal quote.
% awk -F: '{print thequote$1}' thequote="'" /etc/passwd
awk -F: {print thequote$1} thequote=' /etc/passwd
This avoids the whole quote nesting issue altogether.  And
proves once again that there's always more than one way to
do it.
Thanks for writing in, Philippe!
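Two variations in the same spirit, neither of them from the original tip, also keep the single quote out of the shell's way. Both assume a POSIX-style awk (very old awks may not accept -v), the variable name q is just a placeholder, and the input lines are made up for illustration.

```shell
# \047 is the octal escape for the single quote character in an awk string,
# so no literal single quote appears inside the single-quoted program.
printf 'root:x:0\njeffy:x:100\n' | awk -F: '{print "\047" $1}'

# -v assigns the awk variable q before the program starts running.
printf 'root:x:0\njeffy:x:100\n' | awk -F: -v q="'" '{print q $1}'
```

Either way the awk program itself can sit inside plain single quotes, which is the easiest quoting to reason about.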
Tuesday Tiny Techie Tip -- 4 February 1997
Written by Jeff Youngstrom
Tuesday Tiny Techie Tips are all © Copyright
1996-1997 by Jeff Youngstrom.  Please ask permission  before
reproducing any of this material.