But you know what? That shouldn't have to be the case.
Let me start by bringing up an issue that's always irked me. Say you're using BASH (though tcsh, csh, zsh are all going to do pretty much the same thing here). You want to, say, cat a file with spaces in its name. (For the uninitiated, 'cat' takes a variable number of arguments, which are names of files, and then prints their contents.) So call our file
a b c.txt
As you probably know, shells differentiate (tokenize) arguments based on spaces (using spaces as delimiters). So
cat a b c.txt
will give errors - it tries to print the file a, then the file b, then the file c.txt. Whoops.
Fortunately, the original UNIX shell designers realized that you might want to use spaces in filenames. Good call. After all, we're not DOS users, here.
cat "a b c.txt"
does what we wanted. Great, you say. Problem solved.
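If you want to see the tokenizing for yourself, here's a quick sketch. The count_args function is a made-up helper (not a standard command) that just reports how many arguments it was handed:

```shell
# Hypothetical helper: print the number of arguments received.
count_args() {
    echo "$#"
}

count_args a b c.txt      # unquoted: the shell splits on the spaces
count_args "a b c.txt"    # quoted: the spaces stay inside one argument
```

In BASH the first call should report 3 and the second 1, which is exactly the difference cat sees.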
So how about we do this?
MYVAR="a b c.txt"
cat $MYVAR
We again get contents of a, b, and c.txt, not of our target file. Whoops again!
Okay, so that makes sense. Because those quotes escaped the spaces for the MYVAR= command, and then were discarded, setting the value of MYVAR to
a b c.txt
Makes sense.
So let's fix that code! We'll put quotes inside the quotes and get a quoted string for MYVAR! Now the shell will extract those quotes and send cat the parameter (singular) that we wanted.
MYVAR="\"a b c.txt\""
cat $MYVAR
Uh-oh! Instead, cat got three parameters again - this time they were
"a
b
c.txt"
Now, this makes sense, frustrating though it is (at first). Because after all, the real meaning of double-quotes is "spaces inside of here (and several other special characters we won't mention) are to be taken literally, not interpreted by the shell."
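Here's a small sketch of what cat is being handed in that case. show_args is a made-up helper that prints each argument it gets on its own line, wrapped in angle brackets, so the boundaries are visible. This assumes BASH with the default IFS:

```shell
# Hypothetical helper: print every argument on its own line, in angle brackets.
show_args() {
    printf '<%s>\n' "$@"
}

# The inner quotes are stored in the variable as ordinary characters.
MYVAR="\"a b c.txt\""

# Unquoted expansion: split on spaces, quote characters left untouched.
show_args $MYVAR
```

The quote characters come through as plain data, glued to the first and last tokens.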
Now, say that we're shell designers and we're thinking this stuff up. The above is a pretty good decision. After all, if it weren't the case, how would you set MYVAR to the values
"a
b
c.txt"
anyway? It might be important to do so.
However, we've created a syntactic hole for ourselves now. How can we have our shell be smart enough to understand when the quotes in a variable are supposed to be interpreted by the shell? Tough question.
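So there's another route - manually escape the spaces using the escape character \ (backslash). So we do this:
MYVAR=a\ b\ c.txt
cat $MYVAR
No luck there, either. The backslashes get used up by the MYVAR= command exactly the way the quotes did, so MYVAR is a b c.txt once more and cat sees three parameters. What does the trick is quoting the variable at the point where it's used:
cat "$MYVAR"
Whoohoo! We did it! Go, you mighty gods of UNIX, you Stephen Bournes out there, you really did use good sense.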
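A quick way to check what cat actually receives from a variable is to count the arguments. This is a sketch assuming BASH with the default IFS; count_args is a made-up helper, not a standard command. As far as I can tell, the backslashes only matter on the assignment line, and what decides the argument count is whether the expansion itself is quoted:

```shell
# Hypothetical helper: print the number of arguments received.
count_args() {
    echo "$#"
}

MYVAR=a\ b\ c.txt         # backslashes are consumed here; MYVAR holds: a b c.txt

count_args $MYVAR         # unquoted expansion: split on the spaces
count_args "$MYVAR"       # quoted expansion: passed along as one argument
```

The first call should report 3 and the second 1.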
But now say that the program foobar gives us the output
"a b c.txt"
"h i j.txt"
and we need to, say, cat both those files. Okay. That's fine. We could try
cat `foobar`
but no! Unwise, you say. It's the same problem we had before. The shell doesn't interpret the output of foobar, silly, it just passes each space-delimited token on to cat, so we try to cat the following files:
"a
b
c.txt"
"h
i
j.txt"
and at best get complaints about those files not existing, at worst (and this is a serious problem) get the wrong files cat-ed. Yikes!
Okay, so you say, there must be a way around this. And there is. You just have to learn the wonder of sed!
cat `foobar | sed -e s/ /\\\\\ /g -e s/\"//g`
(at least, I think that's it.) Right? What could be easier and more logical? And of course, sed isn't part of the shell, so boy, you're in a lot of trouble right now if you want this to work in a more limited environment. And make sure it's GNU sed; otherwise it might not work quite as expected. Oh, and make sure it's in your path.
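For what it's worth, here's a rough sketch of one way to get there with nothing but the shell's own read loop and parameter expansion. It assumes foobar prints one filename per line, each wrapped in double quotes, with no newlines inside the names. The foobar function below is only a stand-in for the real program, and cat_listed is a name I made up:

```shell
# Hypothetical stand-in for the foobar program described in the text.
foobar() {
    printf '%s\n' '"a b c.txt"' '"h i j.txt"'
}

# Read foobar's output a line at a time, strip the surrounding quotes,
# and hand each name to cat as a single argument.
cat_listed() {
    foobar | while IFS= read -r name; do
        name=${name#\"}    # drop one leading double quote, if present
        name=${name%\"}    # drop one trailing double quote, if present
        cat -- "$name"
    done
}
```

Running cat_listed in a directory that holds both files should print their contents in order. It's hardly pretty, and it leans on you knowing about IFS, read -r, and the # and % expansions.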
The point I'm beating everyone over the head with here is this: To really accomplish everything you need in a shell, you need to learn more than a shell; you need to learn UNIX tradition. That's great and all, I guess, but wouldn't it be better if the shell could really do everything you need? Like maybe process all those strings in a more intelligent way without external tools like perl, or sed, or Python?
In short, this is how I think things were (ideally) meant to work:
- The shell is your operating environment. It should be relatively self-contained.
- (Non-shell) programs are things that we should run because they do something to files, or to the system, or to each other. In this sense, we can think of them as state-changers; and as such, we can think of their jobs as side effects, as computer science folk say. (They might also do calculation; they're better than the shell for this because they're generally faster, but we'll deal with that later.)
- Conversely, the shell should take care of expressive issues - the issues of how to direct programs to do what we want, and to coordinate them to do our bidding. (Computation, the lambda calculus teaches us, is an expressive issue, but as noted before, programs might want to do that, too).
The above leads me to conclude that what we really need is a new way of thinking about shells, and a new way to make shells fulfill the above criteria. Right now they don't, and please, don't tell me that making sed into a BASH-builtin will change this issue.
And as you might have figured out from the above distinction between side effects and expression/computation, my mind is looking in the direction of functional programming - and its ill-loved, underappreciated bastard-child, LISP.
 
 
 

