arch detail

Friday, September 29, 2006

Musings on the Flaws of *NIX Shells

I spend most of my time during the day running shell commands. They're the fastest way to get complex (and even very simple) tasks done. I do plenty of writing code in other, more serious languages, but I'm somewhat hesitant to admit that I've never been good with any shell. I'm getting there, but boy, is it frustrating. And it's taken me years to deal with that frustration instead of running off to stronger tools like, say, C.

But you know what? That shouldn't have to be the case.

Let me start by bringing up an issue that's always irked me. Say you're using BASH (though tcsh, csh, zsh are all going to do pretty much the same thing here). You want to, say, cat a file with spaces in its name. (For the uninitiated, 'cat' takes a variable number of arguments, which are names of files, and then prints their contents.) So call our file
a b c.txt
As you probably know, shells differentiate (tokenize) arguments based on spaces (using spaces as delimiters). So
cat a b c.txt
will give errors - it tries to print the file a, then the file b, then the file c.txt. Whoops.

Fortunately, the original UNIX shell designers realized that you might want to use spaces in filenames. Good call. After all, we're not DOS users, here.
cat "a b c.txt"
does what we wanted. Great, you say. Problem solved.

So how about we do this?
MYVAR="a b c.txt"
cat $MYVAR
We again get contents of a, b, and c.txt, not of our target file. Whoops again!

Okay, so that makes sense: those quotes escaped the spaces for the MYVAR= assignment and were then discarded, setting the value of MYVAR to
a b c.txt
with no quotes left in it.

So let's fix that code! We'll put quotes inside the quotes and get a quoted string for MYVAR! Now the shell will extract those quotes and send cat the parameter (singular) that we wanted.
MYVAR="\"a b c.txt\""
cat $MYVAR
Uh-oh! Instead, cat got three parameters again - this time they were
"a
b
c.txt"
Now, this makes sense, frustrating though it is (at first). Because after all, the real meaning of double-quotes is "spaces inside of here (and several other special characters we won't mention) are to be taken literally, not interpreted by the shell."
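One way to see exactly what the shell hands to a command is to swap cat for a tiny function that prints each argument on its own line. This is only a sketch: show_args is a name made up for the illustration, and the rest is plain POSIX shell.

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints how many arguments it
# received, then each one wrapped in <> so the boundaries are visible.
show_args() {
    echo "$# argument(s)"
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

MYVAR="a b c.txt"
show_args $MYVAR          # unquoted expansion: split on the spaces

MYVAR="\"a b c.txt\""
show_args $MYVAR          # the embedded quotes are just ordinary characters
```

Both calls report three arguments; in the second, the first and last arguments simply carry a literal double quote along with them.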

Now, say that we're shell designers and we're thinking this stuff up. The above is a pretty good decision. After all, if it weren't the case, how would you set MYVAR to the values
"a
b
c.txt"
anyway? It might be important to do so.

However, we've created a syntactic hole for ourselves now. How can we have our shell be smart enough to understand when the quotes in a variable are supposed to be interpreted by the shell? Tough question.

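So there's another route. Manually escaping the spaces using the escape character \ (backslash), as in MYVAR=a\ b\ c.txt, doesn't help by itself: it stores exactly the same value as the quoted assignment did, and a bare $MYVAR gets split all over again. What does help is quoting the variable where it's used, not just where it's set. So we do this:
MYVAR=a\ b\ c.txt
cat "$MYVAR"
Whoohoo! We did it! Go, you mighty gods of UNIX, you Stephen Bournes out there, you really did use good sense.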

But now say that the program foobar gives us the output

"a b c.txt"
"h i j.txt"
and we need, say, cat both those files. Okay. That's fine. We could try
cat `foobar`
but no! Unwise, you say. It's the same problem we had before. The shell doesn't interpret the output of foobar, silly, it just passes each space-delimited token on to cat, so we try to cat the following files:

"a
b
c.txt"
"h
i
j.txt"
and at best get complaints about those files not existing, at worst (and this is a serious problem) get the wrong files cat-ed. Yikes!

Okay, so you say, there must be a way around this. And there is. You just have to learn the wonder of sed!
foobar | sed -e 's/^"//' -e 's/"$//' | while IFS= read -r f; do cat "$f"; done
(at least, I think that's it.) Right? What could be easier and more logical? And of course, sed isn't part of the shell, so boy, you're in a lot of trouble right now if you want this to work in a more limited environment. And make sure it's GNU sed; otherwise it might not work quite as expected. Oh, and make sure it's in your path.

The problem I'm beating over everyone's head here is this: To really accomplish everything you need in a shell, you need to learn more than a shell, you need to learn UNIX tradition. That's great and all, I guess, but wouldn't it be better if the shell could really do everything you need? Like maybe process all those strings in a more intelligent way without external tools like perl, or sed, or Python?
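For what it's worth, this particular job can be done with nothing but builtins, though the incantation is hardly obvious. A sketch, assuming foobar prints one double-quoted filename per line as in the example above; the foobar function here is a stand-in for the real program, and strip_quotes is a name invented for the illustration.

```shell
#!/bin/sh
# Stand-in for the foobar program from the example (hypothetical output).
foobar() {
    printf '%s\n' '"a b c.txt"' '"h i j.txt"'
}

# strip_quotes is a made-up helper: it removes one leading and one trailing
# double quote using parameter expansion, with no external tools.
strip_quotes() {
    s=$1
    s=${s#\"}
    s=${s%\"}
    printf '%s\n' "$s"
}

# Read foobar's output a line at a time so the spaces survive intact.
foobar | while IFS= read -r line; do
    name=$(strip_quotes "$line")
    if [ -f "$name" ]; then
        cat "$name"
    else
        echo "no such file: $name" >&2
    fi
done
```

It works, but note how much has to be known up front: IFS, read -r, and two flavors of parameter expansion, just to pass along two filenames.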

In short, this is how I think things were (ideally) meant to work:
  • The shell is your operating environment. It should be relatively self-contained.

  • (Non-shell) programs are things that we should run because they do something to files, or to the system, or to each other. In this sense, we can think of them as state-changers; and as such, we can think of their jobs as side effects, as computer science folk say. (They might also do calculation; they're better than the shell for this because they're generally faster, but we'll deal with that later.)

  • Conversely, the shell should take care of expressive issues - the issues of how to direct programs to do what we want, and to coordinate them to do our bidding. (Computation, the lambda calculus teaches us, is an expressive issue, but as noted before, programs might want to do that, too).


The above leads me to conclude that what we really need is a new way of thinking about shells, and a new way to make shells fulfill the above criteria. Right now they don't, and please, don't tell me that making sed into a BASH-builtin will change this issue.

And as you might have figured out from the above distinction between side effects and expression/computation, my mind is looking in the direction of functional programming - and its ill-loved, underappreciated bastard-child, LISP.

Tuesday, September 12, 2006

strace: A Great Tool I Never Noticed

I ran across this article completely accidentally, and now I'm gawking at myself and wondering why I didn't really know about this stuff. Make your life easier, Linux folksen:
All about Linux: strace - A very powerful troubleshooting tool for all Linux users

Wednesday, September 06, 2006

iSight Experiences

I recently picked up an iSight for my girlfriend and nabbed a spare one from work (which, of course, will be returned... eventually) and started exploring the wonderful world of iChat A/V.

All in all, I'm quite impressed - it's one of the few handy-dandy out-of-the-box Mac OS X things that makes me really very glad I have a G4 lying around. It's easy enough for everybody, and it actually makes this temporary long-distance situation between C. Rose and me quite a bit easier. I prefer it to a cell phone for audio, and the video quality is shockingly good.

I have heard that it works much better between two OS X boxes than between OS X and Windows (or as the guy at the store said, "Mac to PC" - seriously, people, PC is not the inverse of Mac - but that's a rant for another time), so I assume that there must be some major scheduling hacks going on in the kernel to make it so silky-smooth.

The one significant problem, though - and it's a doozy - is that iChat now sporadically causes my Linksys Wireless-G router to take a hard dive. The lights on the front panel keep blinking, but connection between LAN and internet goes away. At first I blamed Cox, the famously questionable service provider, but I found that power cycling my router results in the problems going away.

This is a pretty perplexing situation, and thus far I haven't Googled up any explanation. Will post when I have further information.

Monday, August 28, 2006

HOWTO: Subversion over SSH with different usernames

I was recently faced with the problem of needing to check out a Subversion repository from a machine where I had one username onto a machine where I had another.

This was deceptively difficult.

The problem is, svn checkout has a --username ARG option, but that only applies to Subversion. We use svn+ssh:// for security.

I tried the obvious things - svn+ssh://[user]@[host], etc - but nothing worked. After butting my head for a while, I decided to actually read up on how to do this.

Well, Subversion will let you define your own tunnelling schemes if you tell it which program each one runs. The trick is this: In the [tunnels] section of your ~/.subversion/config, create an entry along these lines:

dummyssh = dummyssh

Then create a BASH script somewhere in your path called dummyssh and make it executable. The script should basically be this:

#!/bin/bash
# Pass along every argument svn supplies, unchanged, under the right login name.
ssh -l [your username] "$@"

Now you just do

svn checkout svn+dummyssh://[host]/[path to repository]

And you can pull it off.

I have to admit, I wish the Subversion manual included this information. Hope somebody finds it useful.
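One detail worth checking in a wrapper like this is how the arguments get forwarded. An unquoted $* re-splits every argument on whitespace, while "$@" passes each one through intact. Here is a small sketch of the difference; count_args, forward_star, and forward_at are names invented for the demonstration, and no ssh or svn is involved.

```shell
#!/bin/sh
# count_args (hypothetical) reports how many arguments reached it.
count_args() {
    echo "$#"
}

# Forwards its arguments the way an unquoted $* does.
forward_star() {
    count_args $*
}

# Forwards its arguments with "$@", preserving each one as given.
forward_at() {
    count_args "$@"
}

forward_star one "two words"   # prints 3: the second argument was split
forward_at   one "two words"   # prints 2: both arguments arrive intact
```

Whether this bites in practice depends on what arguments svn hands to the tunnel program, which I haven't verified; "$@" is simply the safer habit.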

LISP as an XML Replacement

This discussion deserves much more attention. Particularly on my part.

The abstract version: you think LISP is a pain? Actually, XML is a lot *more* painful, and we use it when we *should* use LISP, because people are terrified of LISP.

Prof. Salter at my alma mater, Oberlin College, made a big deal about Scheme. Nobody much appreciated it at the time, but I took his pet class, Programming Languages, my senior year, and when it came time to write some compilers in a hurry, it was amazing how after staring at the screen long enough, enlightenment came and the code would just write itself. It was enough to make me a believer.

Tragically, LISP is not Scheme, but if I can get that experience when trying to deal with actual real-life problems (like the ones XML seeks to solve) I'd be willing to learn.

The Joy of Eshell

Fellow emacsers:

Today I got embarrassed because Friday at work I'd used M-x shell in front of a coworker to commit a source change to Subversion. Emacs' shell doesn't hide passwords, so I unintentionally typed mine in plain-text... in front of someone I would like to actually impress.

Whoops.

Naturally, this morning, I decided to see if anyone had thought to deal with this problem, and I discovered eshell, which has actually been part of emacs since v. 21. Just run M-x eshell.

There are quite a few things that make eshell different than having emacs open a shell inside of its goofy pseudo-terminal buffer. First of all, it's actually a shell. Just for emacs. Written in elisp. Here are some highlights:
  • you can redirect stdout to an emacs buffer. Try env > #<buffer *scratch*>
  • it can be used anywhere that can run emacs, e.g. Win32, or DOS, or a good toaster. Now you don't have to install BASH on Windows to use your Windows box like a Real Computer.
  • your aliases are automatically saved between sessions.
  • you can use emacs functions like shell commands! Try alias emacs 'find-file $1' for an experience in silky-smoothness.
  • up and down arrows behave like most shells, cycling through your history. However, left and right arrow allow you to place the cursor *behind* the prompt and select text from the output of your last command. I've been looking for this feature in a shell for years now. Eshell makes it easy to copy, say, the PID from a ps and paste it into a "kill."
  • it's elisp, so it's easily configurable and extensible.
There are some problems - see the wishlist on the wiki. Doing a for loop into a pipe doesn't really work correctly. But that's what BASH one-liners are for.