arch detail

Friday, September 29, 2006

Musings on the Flaws of *NIX Shells

I spend most of my time during the day running shell commands. They're the fastest way to get complex (and even very simple) tasks done. I write plenty of code in other, more serious languages, but I'm somewhat hesitant to admit that I've never been good with any shell. I'm getting there, but boy, is it frustrating. And it's taken me years to deal with that frustration instead of running off to stronger tools like, say, C.

But you know what? That shouldn't have to be the case.

Let me start by bringing up an issue that's always irked me. Say you're using BASH (though tcsh, csh, zsh are all going to do pretty much the same thing here). You want to, say, cat a file with spaces in its name. (For the uninitiated, 'cat' takes a variable number of arguments, which are names of files, and then prints their contents.) So call our file
a b c.txt
As you probably know, shells differentiate (tokenize) arguments based on spaces (using spaces as delimiters). So
cat a b c.txt
will give errors - it tries to print the file a, then the file b, then the file c.txt. Whoops.

Fortunately, the original UNIX shell designers realized that you might want to use spaces in filenames. Good call. After all, we're not DOS users, here.
cat "a b c.txt"
does what we wanted. Great, you say. Problem solved.

So how about we do this?
MYVAR="a b c.txt"
cat $MYVAR
We again get contents of a, b, and c.txt, not of our target file. Whoops again!

Okay, so that makes sense: those quotes escaped the spaces for the MYVAR= assignment and were then discarded, setting the value of MYVAR to
a b c.txt
with no quotes left to protect the spaces.
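If you want to watch this happen, counting the arguments a command receives is the quickest check. This is a minimal sketch, assuming a Bourne-style shell (sh or bash) with the default IFS; count_args is a made-up helper, not a standard tool.

```shell
# count_args is a hypothetical helper: it prints how many arguments it was given.
count_args() { echo "$#"; }

MYVAR="a b c.txt"

# The quotes are already gone by the time the value is stored.
stored=$MYVAR

# Unquoted expansion: the shell splits the value on whitespace.
nargs=$(count_args $MYVAR)

echo "stored value: $stored"
echo "arguments seen: $nargs"
```

The stored value has no quotes in it, and the unquoted expansion arrives as three separate arguments.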

So let's fix that code! We'll put quotes inside the quotes and get a quoted string for MYVAR! Now the shell will extract those quotes and send cat the parameter (singular) that we wanted.
MYVAR="\"a b c.txt\""
cat $MYVAR
Uh-oh! Instead, cat got three parameters again - this time they were
"a
b
c.txt"
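Now, this makes sense, frustrating though it is (at first). The shell only treats double-quotes specially when they appear in the command you typed, where they mean "spaces inside of here (and several other special characters we won't mention) are to be taken literally, not interpreted by the shell." Quotes that turn up inside a variable's value are just ordinary characters, so the unquoted expansion gets split on spaces as usual.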

Now, say that we're shell designers and we're thinking this stuff up. The above is a pretty good decision. After all, if it weren't the case, how would you set MYVAR to the values
"a
b
c.txt"
anyway? It might be important to do so.
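To see that the embedded quotes really do travel along as plain characters, you can print each argument on its own line. Again a sketch, assuming sh or bash with the default IFS; show_args is a hypothetical helper.

```shell
# show_args is a hypothetical helper: it prints each argument in brackets, one per line.
show_args() { printf '[%s]\n' "$@"; }

# The inner quotes are escaped, so they become part of the stored value.
MYVAR="\"a b c.txt\""

# Unquoted expansion: split on whitespace, quotes left untouched.
result=$(show_args $MYVAR)
echo "$result"
```

Three arguments come out, and the first and last still carry their literal double-quote characters.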

However, we've created a syntactic hole for ourselves now. How can we have our shell be smart enough to understand when the quotes in a variable are supposed to be interpreted by the shell? Tough question.

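So there's another route - manually escape the spaces using the escape character \ (backslash). So we do this:
MYVAR=a\ b\ c.txt
cat $MYVAR
No dice, actually. The backslashes get eaten at assignment time, exactly like the quotes were, so MYVAR holds the same value as before and the unquoted expansion still splits into three arguments. What does work is quoting the expansion itself:
cat "$MYVAR"
Whoohoo! We did it! Go, you mighty gods of UNIX, you Stephen Bournes out there, you really did use good sense.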
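It's worth verifying in your own shell what a command actually receives after a backslash-escaped assignment. As far as I can tell, in sh and bash the backslashes are consumed when the value is assigned, and it is quoting the expansion that delivers a single argument. A sketch, with count_args as a hypothetical helper and the default IFS assumed:

```shell
# count_args is a hypothetical helper: it prints how many arguments it was given.
count_args() { echo "$#"; }

# Backslashes protect the spaces during assignment only; they are not stored.
MYVAR=a\ b\ c.txt

unquoted=$(count_args $MYVAR)    # expansion is split on whitespace
quoted=$(count_args "$MYVAR")    # expansion is kept as one word

echo "value: $MYVAR"
echo "unquoted: $unquoted  quoted: $quoted"
```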

But now say that the program foobar gives us the output

"a b c.txt"
"h i j.txt"
and we need, say, cat both those files. Okay. That's fine. We could try
cat `foobar`
but no! Unwise, you say. It's the same problem we had before. The shell doesn't interpret the quotes in the output of foobar, silly, it just passes each whitespace-delimited token on to cat, so we try to cat the following files:

"a
b
c.txt"
"h
i
j.txt"
and at best get complaints about those files not existing, at worst (and this is a serious problem) get the wrong files cat-ed. Yikes!
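Here is the splitting made visible. Since foobar is imaginary, the sketch fakes it with a shell function. show_args is likewise a made-up helper, and the default IFS is assumed.

```shell
# foobar stands in for the hypothetical program from the text.
foobar() {
  printf '%s\n' '"a b c.txt"' '"h i j.txt"'
}

# show_args is a hypothetical helper: it prints each argument in brackets, one per line.
show_args() { printf '[%s]\n' "$@"; }

# Unquoted command substitution: the output is split on spaces and newlines,
# and the quote characters in it are passed along literally.
tokens=$(show_args `foobar`)
echo "$tokens"
```

Six arguments reach the command, none of which names a file we care about.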

Okay, so you say, there must be a way around this. And there is. You just have to learn the wonder of sed!
foobar | sed -e 's/"//g' | while IFS= read -r f; do cat "$f"; done
(at least, I think that's it.) Right? What could be easier and more logical? And of course, sed isn't part of the shell, so boy, you're in a lot of trouble right now if you want this to work in a more limited environment. And make sure it's GNU sed; otherwise it might not work quite as expected. Oh, and make sure it's in your path.
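For this one case the shell can get there on its own, at the price of knowing yet more of its corners (read, IFS, parameter expansion). A sketch under loud assumptions: foobar prints exactly one double-quoted filename per line, no filename contains a newline, and the files and their contents below are invented for the demonstration.

```shell
# Work in a scratch directory so the example is self-contained.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'first\n' > 'a b c.txt'
printf 'second\n' > 'h i j.txt'

# foobar stands in for the hypothetical program from the text.
foobar() {
  printf '%s\n' '"a b c.txt"' '"h i j.txt"'
}

# Read one line at a time, strip one leading and one trailing double quote
# with parameter expansion, and hand the result to cat as a single argument.
out=$(foobar | while IFS= read -r line; do
  name=${line#\"}
  name=${name%\"}
  cat "$name"
done)
echo "$out"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

It runs, but notice how much you had to know to write it, which is rather the point.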

The problem I'm beating over everyone's head here is this: To really accomplish everything you need in a shell, you need to learn more than a shell, you need to learn UNIX tradition. That's great and all, I guess, but wouldn't it be better if the shell could really do everything you need? Like maybe process all those strings in a more intelligent way without external tools like perl, or sed, or Python?

In short, this is how I think things were (ideally) meant to work:
  • The shell is your operating environment. It should be relatively self-contained.

  • (Non-shell) programs are things that we should run because they do something to files, or to the system, or to each other. In this sense, we can think of them as state-changers; and as such, we can think of their jobs as side effects, as computer science folk say. (They might also do calculation; they're better than the shell for this because they're generally faster, but we'll deal with that later.)

  • Conversely, the shell should take care of expressive issues - the issues of how to direct programs to do what we want, and to coordinate them to do our bidding. (Computation, the lambda calculus teaches us, is an expressive issue, but as noted before, programs might want to do that, too).


The above leads me to conclude that what we really need is a new way of thinking about shells, and a new way to make shells fulfill the above criteria. Right now they don't, and please, don't tell me that making sed into a BASH-builtin will change this issue.

And as you might have figured out from the above distinction between side effects and expression/computation, my mind is looking in the direction of functional programming - and its ill-loved, underappreciated bastard-child, LISP.

Tuesday, September 12, 2006

strace: A Great Tool I Never Noticed

I ran across this article completely accidentally, and now I'm gawking at myself and wondering why I didn't really know about this stuff. Make your life easier, Linux folksen:
All about Linux: strace - A very powerful troubleshooting tool for all Linux users

Wednesday, September 06, 2006

iSight Experiences

I recently picked up an iSight for my girlfriend and nabbed a spare one from work (which, of course, will be returned... eventually) and started exploring the wonderful world of iChat A/V.

All in all, I'm quite impressed - it's one of the few handy-dandy out-of-the-box Mac OS X things that makes me really very glad I have a G4 lying around. It's easy enough for everybody, and it actually makes this temporary long-distance situation between C. Rose and me quite a bit easier. I prefer its audio to a cell phone's, and the video quality is shockingly good.

I have heard that it works much better between two OS X boxes than between OS X and Windows (or as the guy at the store said, "Mac to PC" - seriously, people, PC is not the inverse of Mac - but that's a rant for another time), so I assume that there must be some major scheduling hacks going on in the kernel to make it so silky-smooth.

The one significant problem, though - and it's a doozy - is that iChat now sporadically causes my Linksys Wireless-G router to take a hard dive. The lights on the front panel keep blinking, but connection between LAN and internet goes away. At first I blamed Cox, the famously questionable service provider, but I found that power cycling my router results in the problems going away.

This is a pretty perplexing situation, and thus far I haven't Googled up any explanation. Will post when I have further information.