return to first page linux journal archive
keywordscontents

The Cold, Thin Edge

Taking the Shell Paradigm to its Brutal Limits

by Todd Graham Lewis

The Shell Paradigm is described (by me at least) as taking some of a true operating system's most beautiful characteristics and bending, twisting, folding, spindling, and mutilating them into obscenely obtuse and imperfect tools. That these characteristics can be bent, twisted, etc., and still work is, of course, what gives them their beauty.

Open up your Unix toolbox (/usr/bin for you gnubies), and you will see a complete set of tools, ready for use. Much as the discovery of a basic technology distinguishes one epoch of human history from another, redirection and job control under Unix create a golden age of computing in contrast to the iron-age toils of MS-DOS. Because of the simple ability to differentiate separate, simultaneous processes and direct their input and output at your discretion, there are few limits to the ways in which you can use these tools to assemble simple Unix processes. This ability, and the will to use it, constitute the shell paradigm.

But where power resides lies danger. How much | & and popen() can a single process take before it disintegrates into a heap of intractable spaghetti code? How many different programming contexts can we use before our simple program hurtles out of control towards the nether-regions of ``Kernel Panic: Out of memory''? [A lot--ED]

This article will describe to you how to mix and match I/O streams to and from executables in different environments. If you are hacking a Perl script and want to throw a little grep in for good measure, go right ahead; it's possible. Finally, we will discuss the limits to and wisdom of these techniques.

The Shell

The capability to have processes communicate easily among themselves is inherent in the design of Unix systems, so the appellation ``shell paradigm'' is somewhat of a misnomer. Nonetheless, the shell is the context in which most people are familiar with I/O redirections, so we will start there. As we will later see, all these facilities can be easily recreated in places other than at the shell prompt.

There are several ways to use process redirection within the shell. You can take the output of a process and direct it to a file, for example:

cd ~; ls > /tmp/ls.file

Alternatively, you can append output to existing files:

cd ~/bin; ls >> /tmp/ls.file

You can also take the output of a process and redirect it as the input of another process:

cd ~; ls | grep lj.article

Within most shells, including the Bourne-compatible bash and zsh, you can integrate the output of your command within other commands. For example, if you wanted to generate a file with yesterday's time appended to the end, you could do the following:

touch /usr/acct/atlanta/data.`
  date --date '1 day ago' +"%Y%m%d"
 `

which just generated a file named data.19960503 for me. What you get depends on how quickly you read your Linux Journal. It also depends on which free OS you are running; FreeBSD's version of date does not offer the 1 day ago facility, so you will have to get and compile gnu-date if you are silly enough not to run Linux (or if your employer uses FreeBSD.)

C

External-command inclusion is nice in C when you need a function already implemented as a Unix tool which you don't want to recode. For example, if you need to sort a stream of data or compress an output file, using sort or gzip rather than coding it natively is an efficient way to accomplish the task. There are two ways to use external programs under C: system() and popen().

If you have a large amount of data in strings that you want to sort using the sort program, you can use popen() to call the sort program, sort the data and read the result back from the program. If you just want to compress a file, you can use the simpler system() function. Neither function is unfamiliar to a C programmer, but if either is unfamiliar to you, Look in the Linux man pages, where they are documented. If you want more explanation, you can read Advanced Programming in the Unix Environment, by W. Richard Stevens.

However, if you need to interact with the program you call, it is possible to do this with a C library that comes with a tool called ``Expect'', which is described later in the Tcl section.

Mother of Perl

Whereas there are a number of different ways to manipulate process I/O within the shell, there is really only one within Perl: as a filehandle. This is actually a testimony to the beauty of Perl's design; kudos to Larry Wall for making it so simple.

You can include other processes from within Perl in several different manners, all with the open () command. For example, if you wanted to open a process bottle to which the output of your Perl script should be sent, you would use

open (BOTTLE, "| ~<bin/bottle"

to direct the output. Similarly, if you wanted to read the input of bottle, you would do much the same thing, adding the pipe symbol (|) at the end:

open (BOTTLE, "~<bin/bottle |")

In the first case, you could only write to filehandle bottle, whereas in the second case, you could do nothing but read.

Commands opened in this manner can also get fancy. Everything within the quotation marks is executed from within a subshell, so commands like either of the following will work:

open (BOTTLE, "cd ~; /bin/bottle |")
open (FIND, "cd /home/tlewis; find . -name $string -print |")

At this point many people ask, ``What if I want to do both reading and writing?'' You can't do this with the open () command, so Perl is broken, right? No, not really. The fact that you can't easily open a two-way pipe is a design decision. As explained in the Unix FAQ:

The problem with trying to pipe both input and output to an arbitrary slave process is that deadlock can occur, if both processes are waiting for not-yet-generated input at the same time.

Again, it is possible to do this with Expect, as we'll see later.

A short example:

#!/usr/bin/Perl
open (ACCT, "(cd /usr/acct/;".
  "for i in `ls | grep -v admin`; do; ".
  "cat $i/date.19960503; done) | sort |");
while (<ACCT>) {
     chop;
     ($A,$B,$C) = split;
     print "$C $A $B\n";
}

This would take the data in a limited subset of the /usr/acct/ directory, sort it based on the first entry in each line of each file, reformat the data and print it to standard output. By mixing Perl and shell tools, this job becomes a lot easier.

Tcl/Tk

Tcl is a simple scripting language designed as a command language which could easily be applied to various C programs for smooth configuration and user interaction. Tk is a language which grew out of Tcl in which graphical user interfaces can be constructed. One usually refers to them together as Tcl/Tk.

Tk has gained much popularity recently as an extremely easy way to construct graphical interfaces under X-Windows. If you have used make xconfig when compiling any of the recent (since 1.3.60) development kernels, you have used Tk. The program Tkined, a network management tool for Linux, uses Tk; it is based on Scotty, a Tcl extension offering various network functions such as access to SNMP data.

In accordance with its original design goals, Tcl allows you to interact with external processes in a fairly intuitive manner. Simple commands may be executed under Tcl with a simple exec command. For example:

exec ls | grep -v admin

returns exactly the same result as it did in the previous Perl example, but prints it to standard output, much like the system() command in C.

If you wish to interact with the output of a process or direct information to its input, you need to associate it with a filehandle, much as in Perl. This is done via the open command, as in:

set g0 [open |sort r+]

This opens the command sort for input. You would send data to the handle g0 elsewhere in the program using puts and then read from the output using gets. The r+ switch means that you can both write data to the process (data to be sorted) and read data from the process (sorted data). If you just wanted the data to be sent to standard output, you would use:

set g0 [open |sort w]

giving you write access to the process.

Wait, you say, this means that I can both read and write from a process? Yes, it does. Doesn't the Unix FAQ say this is a bad thing? Yes, it does. If you use this functionality to construct webs of interlocking, self-feeding processes, then you are really asking for trouble. Keep it simple if you are going to do this at all.

Expect

While it is potentially dangerous, people went so wild over this feature of Tcl that an extension to Tcl called Expect, a programming environment in its own right, was invented and has soared to new heights of popularity among certain users.

For example, ftp is a fairly simple program. You interact via a command line with a local program which then executes your commands. Because this uses the simple Unix STDIN/STDOUT method of interaction, you can write shell scripts to ftp files; I use such a script to retreive RFCs from the Internet automatically. However, a program like telnet is virtually impossible to script because you are not sending data to the program itself--you are sending data through a network connection to be interpreted on a remote machine. So, if you need to maintain a large number of routers, and if the only way to configure or check on these routers is via telnet, you are in trouble.

Expect solves this problem by using Unix's pseudo-tty mechanism. With Expect, you can script dialogues between your program and another one in which your program responds intelligently to the other. Think of a dialer program like dip or chat, except you can script dialogues with other programs instead of modems.

Expect is the height of inter-program communication, short of socket-based or sysV-ipc. (If you don't know, don't ask.) While it originally started as a Tcl extension, it has also been rendered into a C library; you can access its features from within C programs or from other environments which can use C libs, such as Perl.

Smooth Sailing, But Rocks Ahead

In the introduction to his book Tcl and the Tk Toolkit, John Ousterhout mentions that even though Tcl was originally designed to be a simple scripting language where all programs would have at least ``some new C code'', the simplicity of the environment which they gave the programmer proved too enticing. ``Most Tcl/Tk users never write any C code at all,'' Ousterhout writes, ``and most Tcl/Tk applications consist solely of Tcl scripts.''

This is either a good or a bad thing, depending on whether your criteria are ease-of-use or efficiency/power. Responding to the rise of Tcl in his typically understated manner, GNU Luminary and urban legend Richard Stallman posted a USENET article entitled ``Why you should not use Tcl'':,

Tcl was not designed to be a serious programming language. It was designed to be a ``scripting language'', on the assumption that a ``scripting language'' need not try to be a real programming language. So Tcl doesn't have the capabilities of one.

The ability to interact with other programs in new, unorthodox and some would say dangerous ways is what makes Tcl so appealing to some and so appalling to others. This is typical of the dilemma in using Unix tools from within non-shell programs.

Conclusion

It usually comes down to a matter of time. If you're trying to enter your code in the country fair, these techniques aren't going to win you a blue ribbon. If, however, you want to get it done by 7 PM so you can go to the fair, these might do the trick.

In an age of near-gigaflops-speed chips in home computers, a few wasted cycles here and there aren't going to kill anyone, especially for a program that will be run once or twice and then thrown away. Extending the shell philosophy to development work is also an attractive option--the speed with which you can hack together workable programs makes these techniques alluring to programmers on a tight deadline. Tcl/Tk is a perfect example of extending the shell philosophy to speed up development cycles. Of course, the inefficiencies of this approach are the cause of nearly all of the intense debate over the merits of Tcl/Tk.

Whether it be Tcl, shell, Perl, or C, no matter what your programming technique of choice might be, there is usually an option whereby tools from other programming environments can be imported for your use. If Richard Stallman writes you a nasty letter criticizing you for it, though, don't say you weren't warned.

Todd Graham Lewis has moved on to bigger and much better things with Mindspring Enterprises, the largest Internet Service Provider in the Southeastern US. There, he is learning a lot from his fellow engineers who have fancy ``Computer Science'' degrees. He wonders why everyone doesn't learn computing the same way he did--by playing with his Linux box. He may be reached at tlewis@mindspring.com .

  Previous    Next