Tuesday Tiny Techie Tip

Quoting

One of the trickiest parts about interacting with the shell is using the two different kinds of quotes and the backslash. I've seen lots of people write the definitions:
'...'
Remove special meaning of the enclosed characters except for single quote (') itself.
"..."
Remove special meaning of the enclosed characters except for dollar sign ($), backtick (`), backslash (\), and bang (exclamation point) (!).
\c
Remove special meaning of the single character c.
But translating these "definitions" into practical usage is another matter entirely. So I'll take a crack at making some sense out of this nonsense.

First off, why do you need to use quotes at all? The answer lies in the interaction between you, the shell, and the application programs you're running. The shell generally takes what you type at the command line and breaks it up into arguments and does some pre-processing on it before calling whatever command you asked for. Here's an example:


% m -s "Here's $USER's .aliases file" $USER < ~/.aliases
/usr/ucb/mail -s Here's jeffy's .aliases file jeffy

NOTE: I'm going to have the "echo" option to the shell set for all examples in this tip so we can see what the shell comes up with after the quotes and stuff are evaluated.

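So the shell did a lot of work on that command line:
  1. turned "m" (an alias) into /usr/ucb/mail,
  2. replaced both occurrences of $USER with its value, jeffy,
  3. grouped the words inside the double quotes into a single argument and threw the quotes themselves away,
  4. expanded ~/.aliases and hooked that file up to the command's standard input, which is why the redirection doesn't appear in the echoed command.

All of that before any program was even called.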



The two main things quotes get used for are as follows:
  1. group multiple words into a single argument to a command as with the "Here's $USER's .aliases file" subject line for the email above.
  2. keep the shell from munging stuff that it would normally munge.
Another example:
% sed s/foobar/foo bar/ testfile
sed s/foobar/foo bar/ testfile
sed: Ending delimiter missing on substitution: s/foobar/foo

Here the shell echoes back our command just as we typed it, but sed(1) gives us a strange error message. It looks like sed is only seeing half of our command. This is because sed is called with arguments as "sed command file...", with the command part being a single argument. What happened is that the shell happily broke up our command line into arguments, using spaces (and tabs) as separators, but we really wanted it to treat the space between "foo" and "bar" as part of the first argument. So sed got called as:
sed 's/foobar/foo' 'bar/' testfile

Those aren't real quotes, they're just there so I can show where the argument boundaries were when sed was called.
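
A quick way to see the boundaries for yourself is to hand the same words to printf(1), which repeats its format string once for every argument it receives. This is only a sketch and assumes a printf command is available on your system. The lines starting with # are notes for the reader; an interactive csh won't treat them as comments, so type only the command itself.

```shell
# printf repeats its format once per argument, so each bracket pair
# marks exactly one argument as the shell delivered it.
printf '[%s]\n' s/foobar/foo bar/ testfile
# [s/foobar/foo]
# [bar/]
# [testfile]
```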

So we need to protect the space from the shell. According to our definitions above, we can use either style of quotes to do it since the space is not in the exception list for either. We could also use a backslash. So any one of the following will work:


% sed 's/foobar/foo bar/' testfile
% sed "s/foobar/foo bar/" testfile
% sed s/foobar/foo\ bar/ testfile
% sed s/foobar/foo' 'bar/ testfile
etc...
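
If you'd like to convince yourself that these really are equivalent, one option (a sketch, assuming a printf command is available) is to print the argument in brackets instead of passing it to sed. printf emits its format once per argument, so a single pair of brackets means a single argument.

```shell
# Four ways of protecting the space; each line prints [s/foobar/foo bar/]
printf '[%s]\n' 's/foobar/foo bar/'
printf '[%s]\n' "s/foobar/foo bar/"
printf '[%s]\n' s/foobar/foo\ bar/
printf '[%s]\n' s/foobar/foo' 'bar/
```
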
So that's how things work when you want to just group multiple words into a single argument to a command. The other use is where things tend to get messy. What happens if you want to pass a character which would normally be expanded by the shell on to a command?
% sed s/foobar/foo $USER bar/ testfile

Here we still need to group the sed command into a single argument, but there's some ambiguity about what we want to do with the "$USER" part. If we want to have the shell expand the variable into its value, then we need to make sure we use one of the quoting methods which does not protect the dollar sign:
% sed "s/foobar/foo $USER bar/" testfile
sed s/foobar/foo jeffy bar/ testfile
or
% sed s/foobar/foo\ $USER\ bar/ testfile
sed s/foobar/foo jeffy bar/ testfile

But if we want sed to replace "foobar" with the literal string "foo $USER bar" then we need to make sure we protect the dollar sign from the shell:
% sed 's/foobar/foo $USER bar/' testfile
sed s/foobar/foo $USER bar/ testfile
or
% sed s/foobar/foo\ \$USER\ bar/ testfile
sed s/foobar/foo $USER bar/ testfile
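
The literal case is easy to try without a test file by feeding sed one line on standard input. A sketch, assuming a printf command is available to supply that line:

```shell
# The dollar sign reaches sed untouched, so $USER lands in the output as-is.
printf 'foobar\n' | sed 's/foobar/foo $USER bar/'
printf 'foobar\n' | sed s/foobar/foo\ \$USER\ bar/
# both lines print: foo $USER bar
```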

Things get even trickier when you need to protect the quotes themselves from the shell. It gets messy because the shell does a single left to right scan to evaluate what's quoted and what isn't. This one actually came up recently, and took a while to figure out. I want to call awk and have it print the first column of a file with a single quote in front of it for each line of the file. So the first guess would be something like:
% awk -F: {print ' $1} /etc/passwd
Unmatched '.

As the shell scans that from left to right, it sees that single quote, then waits for the matching quote which never comes. So let's try to protect the single quote from the shell:
% awk -F: {print \' $1} /etc/passwd
Missing }.

Hmm. What's that mean? It looks like our curly braces match up. Here we're running into a little-known feature of csh(1) which lets you do pattern matching on files with alternation of different strings (see a future tip for details). That feature uses curly braces, so the shell is mucking with the curly braces we're trying to pass to awk. Any kind of quote will do to protect the curly braces, and we really need everything within the curlies to be treated as a single argument to awk anyway, so let's enclose the whole awk program in double quotes. We should be able to remove the backslash from the single quote too, since double quotes protect single quotes:
% awk -F: "{print ' $1}" /etc/passwd
awk -F: {print ' } /etc/passwd
awk: syntax error near line 1
awk: illegal statement near line 1

Foo! What happened to our "$1"? Turns out double quotes don't protect the dollar sign (duh, we knew that), so the shell is evaluating the "$1" variable, which has no value, and we end up with a null argument to awk's print command. How about a backslash inside the double quotes to protect the dollar sign?
% awk -F: "{print ' \$1}" /etc/passwd
awk -F: {print ' \} /etc/passwd
awk: syntax error near line 1
awk: illegal statement near line 1

Huh? The backslash didn't work! Our definition of double quotes says it should have. We've run into a little-known feature of double quotes in csh. Not only do double quotes not protect the dollar sign, they FORCE the shell to evaluate it even if it's protected by some other quoting mechanism inside the quotes. So since we're enclosing the awk program in double quotes, there is no way for us to protect that dollar sign from the shell. Okay, so what if we use single quotes around the program?
% awk -F: '{print ' $1}' /etc/passwd
Unmatched '.

Okay, this at least makes a little sense. As the shell scans from left to right, it sees the first quote and waits until it finds a matching quote which it does in the quote that we're trying to send to awk unchanged, then what we were thinking of as the closing quote comes along and never gets matched. So somehow we need to tell the shell to ignore that single quote that we're trying to pass to print. Try a backslash:
% awk -F: '{print \' $1}' /etc/passwd
Unmatched '.

Hmm. Okay, remember that single quotes protect everything except themselves from the shell? That means they'll protect the backslash itself from the shell! So the backslash isn't doing its job of protecting our single quote. We could try sprinkling backslashes everywhere to avoid having to use surrounding quotes at all:
% awk -F: \{print\ \'\ \$1\} /etc/passwd
awk -F: {print ' $1} /etc/passwd
awk: syntax error near line 1
awk: illegal statement near line 1

Well, that's a little better. At least the command is reaching awk just the way we wanted it to. Unfortunately we're trying to send a string constant (a lone ') to awk's print command, but awk only recognizes string constants if they're enclosed in double quotes! Ack!
% awk -F: \{print\ "\'"\ \$1\} /etc/passwd
awk -F: {print \' $1} /etc/passwd
awk: syntax error near line 1
awk: illegal statement near line 1

Okay, why's the backslash still there? And where did the double quotes go? The shell ate the double quotes, but the backslash should have been eaten as well if our definitions at the beginning are right. They're not. Backslash has some special rules for when it's used inside of quotes. Basically it only protects things that aren't already protected by the quotes. More mud in the mix. In practice I hardly ever use backslash to escape things, and practically never inside quotes.

So let's get rid of the backslash and see if we can protect those double quotes from the shell. How about some more backslashes? (Since I never use them, they must be the right way to solve this silly problem.)


% awk -F: \{print\ \"'\"\ \$1\} /etc/passwd
Unmatched '.

Well poo. Now that the quotes aren't being eaten by the shell we need to protect that doggone single quote again. Bring back the backslash.
% awk -F: \{print\ \"\'\"\ \$1\} /etc/passwd
awk -F: {print "'" $1} /etc/passwd

Believe it or not, this actually works (and if you can still remember what we were trying to do, then you deserve a prize! ;-). It's not the solution I came up with when I was asked this question, though. (Larry Wall's slogan for Perl applies equally well to shell programming: "There's more than one way to do it.")
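
To check the all-backslash version without reading the real password file, you can pipe in a single passwd-style line. The input line below is made up for illustration, and the sketch assumes a printf command is available:

```shell
# Every special character is escaped individually, so awk receives
# the program {print "'" $1} as one argument.
printf 'jeffy:x:100\n' | awk -F: \{print\ \"\'\"\ \$1\}
# 'jeffy
```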

The first thing I thought of was to give up on trying to do the quote thing with awk altogether and use sed to post process the quote into the first column:


% awk -F: '{print $1}' /etc/passwd | sed "s/^/'/"
awk -F: {print $1} /etc/passwd
sed s/^/'/

A little cleaner, that. And it points out the fact that sometimes the best way to solve a problem is to give up on your initial strategy if it leads you down a useless path, and break out the part that seems impossible and do it with another tool altogether, or do the whole thing in two steps.

But then I went hunting for a quote solution and came up with this:


% awk -F: '{print "'"'"'" $1}' /etc/passwd
awk -F: {print "'" $1} /etc/passwd

Which is confusing as heck, but also works. Let me reformat that so you can see what quotes are protecting what:
awk -F:
'
    {print "
'
"
    '
"
'
    " $1}
'
/etc/passwd

As clear as mud, I know.
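
If the picture is still muddy, printing the argument instead of running it can help. This is a sketch that assumes a printf command is available; the passwd-style input line is invented for the example.

```shell
# Three quoted pieces sit side by side with no spaces between them:
#   '{print "'   then   "'"   then   '" $1}'
# The shell glues them into the single argument shown by the first command.
printf '%s\n' '{print "'"'"'" $1}'
# {print "'" $1}

# The same argument handed to awk as its program.
printf 'jeffy:x:100\n' | awk -F: '{print "'"'"'" $1}'
# 'jeffy
```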

I think I'll declare this week a complete failure and try this one again next week ;-)


UPDATE 02/02/2005:

Internet correspondent Philippe Goossens points out an elegant awk-only solution to the "insert a leading single quote on just the first field in the file" problem.

Use an awk variable to contain the literal quote.


% awk -F: '{print thequote$1}' thequote="'" /etc/passwd
awk -F: {print thequote$1} thequote=' /etc/passwd

This avoids the whole quote nesting issue altogether. And proves once again that there's always more than one way to do it.
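
For a quick trial away from the real password file, a made-up passwd-style line on standard input will do. In this sketch the "-" operand is an addition that tells awk to read standard input once the assignment has taken effect, and a printf command is assumed to be available:

```shell
# thequote is set by the assignment operand before awk reads "-"
# (standard input), so the program itself needs no nested quotes.
printf 'jeffy:x:100\n' | awk -F: '{print thequote $1}' thequote="'" -
# 'jeffy
```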

Thanks for writing in, Philippe!


Tuesday Tiny Techie Tip -- 4 February 1997
Written by Jeff Youngstrom

Tuesday Tiny Techie Tips are all © Copyright 1996-1997 by Jeff Youngstrom. Please ask permission before reproducing any of this material.