Good Shell Coding Practices
SHELLdorado - your UNIX shell scripting resource


1. Handling Command Line Arguments

Why is it necessary to write something about command line arguments? The concept is very easy and clear: if you enter the following command

    $ ls -l *.txt

the command "ls" is executed with the command line flag "-l" and all files in the current directory ending with ".txt" as arguments.

Still many shell scripts do not accept command line arguments the way we are used to (and came to like) from other standard commands. Some shell programmers do not even bother implementing command line argument parsing, often aggravating the script's users with other strange calling conventions.

For examples on how to name command line flags to be consistent with existing UNIX commands see the table Frequent option names.

Here are some examples of bad coding practices.

Setting environment variables for script input that could be specified on the command line.

One example:

:
# AUTORUN must be specified by the user
if [ "$AUTORUN" != yes ]
then
    echo "Do you really want to run this script?"
    echo "Enter ^D to quit:"
    if read answer
    then
        echo "o.k, starting up memhog daemon"
    else
        echo "terminating"
        exit 0
    fi
fi
# start of script...

Consider the script's user who might ponder "What was the name of this variable? FORCERUN? AUTOSTART? AUTO_RUN? or AUTORUN?"

Don't get me wrong: environment variables do have their place and can make life easier for the user. But a much better way to implement the autorun option would be a command line flag, e.g. "-f" for "force non-interactive execution".
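
A minimal sketch of that idea, assuming the flag is called "-f" as suggested above. The names parse_force, confirm_start and force are made up for this example and are not part of the original script:

```shell
# Sketch: a "-f" flag takes over the role of the AUTORUN variable.
# parse_force, confirm_start and force are illustrative names.
parse_force () {
    force=no
    if [ "$1" = "-f" ]
    then
        force=yes
    fi
}

confirm_start () {
    # Ask for confirmation only if "-f" was not given
    if [ "$force" != yes ]
    then
        echo "Do you really want to run this script?"
        echo "Enter ^D to quit:"
        read answer || return 1
    fi
    return 0
}

# Example call: with "-f" no question is asked
parse_force -f
confirm_start && echo "o.k, starting up"
```

In a real script, parse_force would be called with "$@", and the script would terminate if confirm_start fails.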

Positional parameters.

Example:

:
# process - process input file

ConfigFile="$1"
InputFile="$2"
OutputFile="$3"

# Read config file
get_defaults "$ConfigFile"
# Do the processing
process_input < "$InputFile" > "$OutputFile"

This script expects exactly three parameters in exactly this order: the name of a configuration file with default settings, the name of an input file, and the name of an output file. The script could be called with the following parameters:

    $ process defaults.cf important.dat output.dat

It then reads the configuration file "defaults.cf", processes the input file "important.dat" and then writes (possibly overwriting) the output file "output.dat". Now see what happens if you call it like this:

    $ process output.dat defaults.cf important.dat

Now the script tries to read the output file "output.dat" as its configuration file. If the user is lucky, the script will terminate at this point, before it overwrites the data file "important.dat", which it would be using as the output file!

This script would have been better with the following usage:

    $ process -c defaults.cf -o output.dat important.dat

The command line option "-c" precedes the configuration file, the output file is specified with the "-o" option, and every other argument is taken to be an input file name.

Our goal is shell scripts that use "standard" command line flags and options. We will develop a shell script code fragment that handles command line options well. You may then use this template in your shell scripts and modify it to fit your needs.

Consider the following command line:

    $ fgrep -v -i -f excludes.list *.c *.h

This command line consists of a command ("fgrep") with three flags: "-v", "-i" and "-f". The last flag takes an argument ("excludes.list"). After the command line flags, multiple file names ("*.c", "*.h") may follow. At this point we do not know how many file names there will be; the shell will expand the file name patterns (or "wildcards") to a list of actual file names before calling the command "fgrep". The command itself does not have to deal with wildcards.

What happens if there is no file matching the pattern "*.c" in the current directory? In this case the shell will pass the parameter unchanged to the program.
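
This behaviour can be checked with a small experiment. The sketch below assumes a Bourne-type shell with default settings (options such as bash's "nullglob" change the result) and a system that has the "mktemp -d" command; the variable names are made up:

```shell
# Sketch: an unmatched pattern reaches the command unchanged.
# The empty temporary directory guarantees that no "*.c" file exists.
tmpdir=`mktemp -d` || exit 1
unmatched=`cd "$tmpdir" && set -- *.c && echo "$# $1"`
rmdir "$tmpdir"

# Number of arguments, followed by the first argument
echo "$unmatched"
```

The pattern is passed on as one literal argument, so a script should be prepared to receive a "file name" that does not exist.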

If we want to handle command lines like the one above, we must be prepared to handle

command line flags (e.g. "-v", "-i")
command line flags with arguments (e.g. "-f file")
multiple file names following the flags

The shell sets some variables (the positional and special parameters, not environment variables) according to the command line arguments specified:

`$0`	The name the script was invoked with. This may be a basename without directory component, or a path name. This variable is not changed with subsequent `shift` commands.
`$1`, `$2`, `$3`, ...	The first, second, third, ... command line argument, respectively. The argument may contain whitespace if the argument was quoted, e.g. "two words".
`$#`	Number of command line arguments, not counting the invocation name `$0`
`$@`	`"$@"` is replaced with all command line arguments, each enclosed in quotes, e.g. "one", "two three", "four". Whitespace within an argument is preserved.
`$*`	`$*` is replaced with all command line arguments. Whitespace is not preserved, i.e. "one", "two three", "four" would be changed to "one", "two", "three", "four". This variable is not used very often; `"$@"` is the normal case, because it leaves the arguments unchanged.
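
The difference between `"$@"` and `$*` can be made visible with a short sketch. The helper function count_args is made up for this example, and the default value of IFS is assumed:

```shell
# count_args prints the number of arguments it received
# (helper made up for this example)
count_args () {
    echo $#
}

set -- one "two three" four

quoted=`count_args "$@"`    # arguments are passed on unchanged
unquoted=`count_args $*`    # "two three" is split into two words

echo "quoted: $quoted, unquoted: $unquoted"
```

With the three arguments above, the quoted form passes on three arguments, while the unquoted form passes on four.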

The following code segment loops through all command line arguments, and prints them:

:
# cmdtest - print command line arguments

while [ $# -gt 0 ]
do
    echo "$1"
    shift
done

The shell variable $# is automatically set to the number of command line arguments. If the script was called with the following command line:

    $ cmdtest one "two three" four

$# would have the value "3" for the arguments: "one", "two three", and "four". "two three" counts as one argument, because it is enclosed within quotes.

The shift command "shifts" all command line arguments one position to the left. The leftmost argument is lost. The following table lists the values of $# and the command line arguments during the iterations of the while loop:

`$#`	remaining arguments	comments
3	$1 = "one" $2 = "two three" $3 = "four"	start of the command
2	$1 = "two three" $2 = "four"	after the first `shift`
1	$1 = "four"	after the second `shift`
0		end of the `while` loop

Now that we can loop through the argument list, we can set script variables depending on command line flags:

vflag=off
while [ $# -gt 0 ]
do
    case "$1" in
        -v)  vflag=on;;
    esac
    shift
done

The command line option -v will now result in the variable vflag being set to "on". We can then use this variable throughout the script.
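
One possible way to use the variable later on is a small helper that prints messages only in verbose mode. This is just a sketch; the function name verbose is made up here:

```shell
# verbose: print a message on standard error, but only if
# the -v flag was given (function name made up for this example)
verbose () {
    if [ "$vflag" = on ]
    then
        echo >&2 "$@"
    fi
}

vflag=on
verbose "reading input file"    # printed
vflag=off
verbose "reading input file"    # not printed
```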

Now let's improve this code fragment to handle file names. It would be nice if the script handled all command line flags, but left the file names alone. This way we could use the shell variable "$@" with the remaining command line arguments later on, e.g.

    # ...
    grep "$searchstring" "$@"

and be sure that it only contains file names. But how do we distinguish file names from command line switches? That's easy: files do not start with a dash "-" (at least not yet...):

vflag=off
while [ $# -gt 0 ]
do
    case "$1" in
        -v)  vflag=on;;
        -*)
            echo >&2 "usage: $0 [-v] [file ...]"
            exit 1;;
        *)  break;;     # terminate while loop
    esac
    shift
done

This example prints a short usage message and terminates if an unknown command line flag starting with a dash was specified. If the current argument does not start with a dash (and therefore probably is a file name), the while loop is terminated with the break statement, leaving the file name in the variable "$1".

Now we just need a switch for command line flags with arguments, e.g. "-f filename". This is also pretty straightforward:

vflag=off
filename=
while [ $# -gt 0 ]
do
    case "$1" in
        -v) vflag=on;;
        -f) filename="$2"; shift;;
        -*) echo >&2 \
            "usage: $0 [-v] [-f file] [file ...]"
            exit 1;;
        *)  break;;     # terminate while loop
    esac
    shift
done

If the argument $1 is "-f", the next argument ($2) should be the file name. We have now handled two arguments ("-f" and the file name), but the shift after the case construct will only "consume" one argument. This is the reason why we execute an initial shift after saving the file name in the variable filename. This shift removes the "-f" flag, while the second (after the case construct) removes the file name argument.

We still have a problem handling file names starting with a dash ("-"), but that's a problem every standard UNIX command interpreting command line switches has. It is commonly solved by a special command line option named "--" meaning "end of the option list".

If, for example, you had a file named "-f", it could not be removed using the command "rm -f", because "-f" is a valid command line option. Instead you can use "rm -- -f". The double dash "--" means "end of command line flags", and the following "-f" is then interpreted as a file name.

Note:

You can also remove a file named "-f" using the command "rm ./-f"
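
Both variants can be tried safely in a scratch directory. The following sketch assumes the "mktemp -d" command is available; the final rmdir only succeeds if both files were really removed:

```shell
# Sketch: removing a file named "-f" in two ways
tmpdir=`mktemp -d` || exit 1
(
    cd "$tmpdir" || exit 1
    : > ./-f        # create an empty file named "-f"
    rm -- -f        # "--" ends the option list, "-f" is a file name
    : > ./-f        # create the file again
    rm ./-f         # a path name starting with "./" is never an option
)
rmdir "$tmpdir"     # fails if the directory is not empty
```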

The following (recommended) command line handling code is a good way to solve this problem:

vflag=off
filename=
while [ $# -gt 0 ]
do
    case "$1" in
        -v)  vflag=on;;
        -f)  filename="$2"; shift;;
        --)  shift; break;;
        -*)
            echo >&2 \
            "usage: $0 [-v] [-f file] [file ...]"
            exit 1;;
        *)   break;;    # terminate while loop
    esac
    shift
done
# all command line switches are processed,
# "$@" contains all file names

The drawback of this command line handling is that it needs whitespace between the option character and its argument ("-f file" works, but "-ffile" fails), and that multiple option characters cannot be written behind one switch character ("-v -l" works, but "-vl" does not).

Portability: This method works with all shells derived from the Bourne Shell, e.g. sh, ksh, ksh93, bash, pdksh, zsh.
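
As an illustration, the template could be applied to the "process" script from the beginning of this section roughly as follows. This is only a sketch: the function names usage and parse_args are made up, and the check that "-c" and "-o" are really followed by a value is an addition that is not part of the template above.

```shell
# Sketch: the template applied to the "process" example.
# usage and parse_args are names made up for this example.
usage () {
    echo >&2 "usage: process [-c configfile] [-o outputfile] inputfile"
}

parse_args () {
    ConfigFile=
    OutputFile=
    while [ $# -gt 0 ]
    do
        case "$1" in
            -c) [ $# -ge 2 ] || { usage; return 1; }   # value missing
                ConfigFile="$2"; shift;;
            -o) [ $# -ge 2 ] || { usage; return 1; }   # value missing
                OutputFile="$2"; shift;;
            --) shift; break;;
            -*) usage; return 1;;
            *)  break;;     # terminate while loop
        esac
        shift
    done
    InputFile="$1"          # first remaining argument
}

parse_args -c defaults.cf -o output.dat important.dat &&
    echo "config=$ConfigFile output=$OutputFile input=$InputFile"
```

Because every file is now named by its option, the order of "-c" and "-o" on the command line no longer matters.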

Using "getopt"

Now this script processes its command line arguments like any standard UNIX command, with one exception. Multiple command line flags may be combined with standard commands, e.g. "ls -l -a -i" may be written as "ls -lai". This is not that easy to handle from inside of our shell script, but fortunately there is a command that does the work for us: getopt(1).

The following test shows us how getopt rewrites the command line arguments "-vl -ffile one two three":

    $ getopt f:vl -vl -ffile one two three

produces the output

    -v -l -f file -- one two three

These are the command line flags we would have liked to get! The flags "-vl" are separated into two flags "-v" and "-l", and "-ffile" is split into "-f" and "file". The command line options are separated from the file names by a "--" argument.

How did getopt know that "-f" needed a second argument, but "-v" and "-l" did not? The first argument to getopt describes which options are acceptable, and whether they have arguments. An option character followed by a colon (":") means that the option expects an argument.

Now we are ready to let getopt rewrite the command line arguments for us. Since getopt writes the rewritten arguments to standard output, we use

   set -- `getopt f:vl "$@"`

to set the arguments. `getopt ...` means "the output of the command getopt", and "set -- " sets the command line arguments to the result of this output. In our example

    set -- `getopt f:vl -vl -ffile one two three`

is replaced with

    set -- -v -l -f file -- one two three

which results in the command line arguments

    -v -l -f file -- one two three

These arguments can easily be processed by the script we developed above.

Now we include getopt within our script:

vflag=off
filename=
set -- `getopt vf: "$@"`
[ $# -lt 1 ] && exit 1	# getopt failed
while [ $# -gt 0 ]
do
    case "$1" in
        -v) vflag=on;;
        -f) filename="$2"; shift;;
        --) shift; break;;
        -*)
            echo >&2 \
            "usage: $0 [-v] [-f file] file ..."
            exit 1;;
        *)  break;;     # terminate while loop
    esac
    shift
done
# all command line switches are processed,
# "$@" contains all file names

The first version of this document contained the line

set -- `getopt vf: "$@"` || exit 1

This command does not work with all shells, because the set command doesn't always return an error code if getopt fails. The line assumes that getopt sets its return value if the command line arguments are wrong (which is almost certainly the case) and that set returns an error code if the command substitution (that executes getopt) fails. This is not always true.
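
One alternative, shown here only as a sketch, is to capture the output of getopt in a variable first: the exit status of a plain variable assignment is the exit status of the command substitution, so a failure of getopt can be tested directly. This may also help with getopt implementations that still print some output after rejecting an option. The names parse_with_getopt, args and files are made up for this example, and a getopt(1) command with the traditional output format is assumed.

```shell
# Sketch: test the exit status of getopt itself.
# parse_with_getopt, args and files are illustrative names.
parse_with_getopt () {
    vflag=off
    filename=
    args=`getopt vf: "$@"` || return 1   # exit status of getopt
    set -- $args                          # unquoted on purpose
    while [ $# -gt 0 ]
    do
        case "$1" in
            -v) vflag=on;;
            -f) filename="$2"; shift;;
            --) shift; break;;
            -*) return 1;;
            *)  break;;     # terminate while loop
        esac
        shift
    done
    files="$*"              # remaining arguments
}

parse_with_getopt -vf excludes.list one two &&
    echo "vflag=$vflag filename=$filename files=$files"
```

Arguments containing whitespace are still split by the unquoted $args, just as with the "set --" line above.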

Why didn't we use getopt in the first place? There is one drawback with the use of getopt: it removes whitespace within arguments. The command line

    one "two three" four

(three command line arguments) is rewritten as

    one two three four

(four arguments). Don't use the getopt command if the arguments may contain whitespace characters.

Newer shells (Korn Shell, Bash) have the built-in getopts command, which does not have this problem. This command is described in the following section.

Portability: The getopt command is part of almost any UNIX system.

Using "getopts"

On newer shells, the getopts command is built-in. Do not confuse it with the older getopt (without the trailing "s") command. getopts strongly resembles the C library function getopt(3).

Below is a typical example of how getopts is used:

vflag=off
filename=
while getopts vf: opt
do
    case "$opt" in
      v)  vflag=on;;
      f)  filename="$OPTARG";;
      \?) # unknown flag
          echo >&2 \
          "usage: $0 [-v] [-f filename] [file ...]"
          exit 1;;
    esac
done
shift `expr $OPTIND - 1`

Portability: The getopts command is a built-in command of newer shells. As a rule of thumb, on systems that provide the Korn Shell (ksh), the other shells (including the Bourne Shell sh) also have a built-in getopts command.
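
The following sketch wraps the loop in a function to show that getopts handles combined flags and option arguments containing whitespace. The names parse_opts and nfiles are made up for this example; OPTIND is reset because getopts is used inside a function that may be called more than once.

```shell
# Sketch: getopts with combined flags and whitespace in arguments.
# parse_opts and nfiles are illustrative names.
parse_opts () {
    vflag=off
    filename=
    OPTIND=1                # restart option parsing on every call
    while getopts vf: opt
    do
        case "$opt" in
            v)  vflag=on;;
            f)  filename="$OPTARG";;
            \?) return 1;;  # unknown flag
        esac
    done
    shift `expr $OPTIND - 1`
    nfiles=$#               # number of remaining arguments
}

parse_opts -vf "my list" one "two three" &&
    echo "vflag=$vflag filename=$filename files=$nfiles"
```

Here "-vf" is recognized as the two flags "-v" and "-f", "my list" stays one option argument, and two file name arguments remain.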

Frequent option names

The following table should help you find good names for your command line flags. Look at the second column (Meaning), and see if you find a rough description of your command line option there. If, for example, you are looking for the name of an option to append to a file, you could use the "-a" flag.

Flag	Meaning	UNIX examples
-a	append, i.e. output to a file	`tee -a`
	show/process all files, ...	`ls -a`
-c	count something	`grep -c`
	command string	`sh -c command`
-d	directory	`cpio -d`
	specify a delimiter	`cut -ddelimiter`
-e	expand something, i.e. tabs to spaces	`pr -e`
	execute command	`xterm -e /bin/ksh`
-f	read input from a file	`fgrep -f file`
	force some condition (i.e. no prompts, non-interactive execution)	`rm -f`
	specify field number	`cut -ffieldnumber`
-h	print a help message	
	print a header (Note: `-t` for title may be more appropriate.)	`pr -hheader`
-i	ignore the case of characters	`grep -i`
	Turn on interactive mode	`rm -i`
	Specify input option	
-l	long output format	`ls -l`, `ps -l`, `who -l`
	list file names	`grep -l`
	line count	`wc -l`
	login name	`rlogin -lname`
-L	follow symbolical links	`cpio -L`, `ls -L`
-n	non-interactive mode	`rsh -n`
	numeric processing	`sort -n`
-o	output option, i.e. output file name	`cc -o`, `sort -o`
-p	process id	`ps -p pid`
	process path	`mkdir -p`
-q	quick mode	`finger -q`, `who -q`
	quiet mode	
-r	process directories recursively (Note: the flag `-R` would be better for this purpose.)	`rm -r`
	process something in the reverse order	`sort -r`, `ls -r`
	specify root directory	
-R	process directories recursively	`chmod -R`, `ls -R`
-s	be silent about errors (Note: such an option is unnecessary, because the user can make the program silent by redirecting standard output and standard error to /dev/null.)	`cat -s`, `lp -s`
-t	specify tab character	`sort -ttabchar`
-u	Produce unique output	`sort -u`
	process data unbuffered	`cat -u`
-v	print verbose output, the opposite of `-q`	`cpio -v`, `tar -v`
	reverse the functionality	`grep -v`
-w	specify width	`pr -w`, `sdiff -w`
	wide output format	`ps -w`
	work with words	`wc -w`
-x	exclude something	
-y	answer yes to all questions (effectively making the command non-interactive) (Note: The flag `-f` may be better for this purpose.)	`fsck -y`, `shutdown -y`

Now that you know the standard option names, here are some "standard" UNIX commands that do not use them.

dd - disk dump: dd if=infile of=outfile bs=10k

The syntax of this command is probably older than UNIX itself. One major disadvantage is that argument names and file names are written together without whitespace, e.g. if=mydoc*.txt. The shell takes "if=" as part of the file name, and therefore cannot expand the wildcard pattern "mydoc*.txt".

find - find files: find / -name '*.txt' -print

With this command, option names have more than one character. This makes them more memorable and more readable. If only all commands were like this! And if only -print were a default option!

By the way, did you know that the command line

    $ ls -bart -simpson -is -cool

is a valid usage for the Solaris ls command?
