SHELLdorado Newsletter 3/2001 - August 26, 2001
================================================================
The "SHELLdorado Newsletter" covers UNIX shell script related
topics. To subscribe to this newsletter, leave your e-mail
address at the SHELLdorado home page:
http://www.shelldorado.com/
"Heiner's SHELLdorado" is a place for UNIX shell script
programmers, providing many shell script examples, shell
scripting tips & tricks + more...
================================================================
Focus on CGI programming
Contents
o Editorial
o Q&A: How can I write CGI programs using shell scripts?
o Q&A: How can I encode/decode URL data?
o Q&A: Are there further CGI resources at the SHELLdorado?
o Amendment: Arrays for Bourne shell
-----------------------------------------------------------------
>> Editorial
-----------------------------------------------------------------
The World Wide Web is one of the most exciting topics today.
It provides a wealth of information for those searching for it,
and a comparatively easy way for programmers to
present it. No need to guess screen sizes and screen column
positions: just create HTML output, and the browser will
render text and images in the best possible way.
If you have never tried to create HTML pages, or do not know what
CGI is, this issue of the SHELLdorado Newsletter is not
written for you. Wait for the next one, or browse some of
the back issues:
http://www.shelldorado.com/newsletter/
If you are writing CGI scripts of your own, or are planning
to do so, please read on! You will find tools and tips for
writing CGI scripts using the Bourne (or Korn) shell.
Heiner Steven, Editor
<heiner.steven@shelldorado.com>
-----------------------------------------------------------------
>> Q&A: How can I write CGI programs using shell scripts?
-----------------------------------------------------------------
The Web mostly consists of static HTML pages, connected with
hyperlinks. But the real power of the web becomes apparent
with dynamically generated pages: pages that are created
"on-the-fly" for each request, and therefore are always
up-to-date.
One method for creating Web pages dynamically is to use CGI
("Common Gateway Interface") programs or scripts. While Perl
and Java are the languages of choice for many CGI
programmers, shell scripts offer some advantages over them:
the shell is present on any UNIX system, so no additional
language interpreters or compilers need to be installed, and
scripts tend to be short and easily written.
The following shell script template may be used to create
arbitrary CGI scripts. It parses CGI arguments into shell
script variables, and creates the minimum output required
for a CGI script.
The CGI arguments are processed by a helper script "urlgetopt":
http://www.shelldorado.com/scripts/cmds/urlgetopt
The script is executable without any changes, and just
prints the arguments it was invoked with. Install it as a
template into the server's /cgi-bin/ directory, and you can
start using it to create new CGI scripts of your own!
#! /bin/sh
# Minimal example of a CGI script. Uses "urlgetopt" to
# parse CGI arguments.

# Append directory of "urlgetopt" to command search
# path:
PATH=$PATH:/usr/local/bin; export PATH

# Print minimum CGI header. For debugging "text/plain"
# could be used instead of "text/html". Note the
# embedded new line:
echo "Content-type: text/html
"

# Debugging: redirect error messages to standard output
# where we can see them:
exec 2>&1

# The standard variable REQUEST_METHOD tells us whether the
# CGI arguments are in the variable QUERY_STRING, or whether
# we have to read them from standard input:
case "$REQUEST_METHOD" in
    GET)    # Arguments in QUERY_STRING
        ;;
    POST)
        # Form data is on the first line of the standard
        # input:
        read query_string
        QUERY_STRING=$query_string
        ;;
    *)
        echo "ERROR: request method: $REQUEST_METHOD"
        exit 1
        ;;
esac

# Parse CGI arguments, and set environment variables for
# each HTML form variable. Prefix each variable name
# with "FORM_", i.e. the contents of the HTML form name
# "email" become available in the variable "FORM_email".
# If this script was invoked in the following way:
#     http://host/cgi-bin/test?email=nn@mail.com&street=none
# the variables "FORM_email" and "FORM_street" would be
# set.
eval "`urlgetopt -l -p FORM_ \"$QUERY_STRING\"`"

# At this point, the form variables are accessible using
# FORM_* environment variables. The script should print
# HTML code to standard output.
echo "DEBUG: CGI arguments:"
set | grep "^FORM_"

exit 0
A more sophisticated version (written for Korn Shell) is
available at the SHELLdorado:
http://www.shelldorado.com/scripts/cmds/cgitemplate.ksh
This version additionally shows how to write a script that can
be used interactively from the command line as well as
invoked as a CGI program.
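To illustrate the two steps the template performs (fetching the
form data, then turning it into FORM_* variables), here is a
minimal sketch in plain POSIX shell. It is not the code of
"urlgetopt": the function names "read_form_data" and
"parse_form_sketch" are made up for this example, the values
are left URL-encoded, and the use of the standard CGI variable
CONTENT_LENGTH for POST requests is a variation on the
template's "read", not something the template itself does.

```shell
#!/bin/sh
# Sketch only. "read_form_data" and "parse_form_sketch" are
# hypothetical helper names, not part of "urlgetopt".

# Print the raw (still URL-encoded) form data of the current
# request to standard output.
read_form_data() {
    case $REQUEST_METHOD in
        GET)
            printf '%s\n' "$QUERY_STRING"
            ;;
        POST)
            # CONTENT_LENGTH holds the size of the request body
            # in bytes; refuse anything that is not a number.
            case $CONTENT_LENGTH in
                ''|*[!0-9]*) return 1 ;;
            esac
            # Read exactly that many bytes (slow but portable).
            dd bs=1 count="$CONTENT_LENGTH" 2>/dev/null
            echo
            ;;
        *)
            return 1
            ;;
    esac
}

# parse_form_sketch PREFIX DATA
# Split DATA at "&" and "=", and set the variable PREFIX<name>
# for each pair. Values are NOT decoded here.
parse_form_sketch() {
    _prefix=$1
    _rest=$2
    while [ -n "$_rest" ]; do
        case $_rest in
            *"&"*) _pair=${_rest%%&*}; _rest=${_rest#*&} ;;
            *)     _pair=$_rest;       _rest= ;;
        esac
        case $_pair in
            *=*) _name=${_pair%%=*}; _value=${_pair#*=} ;;
            *)   _name=$_pair;       _value= ;;
        esac
        # Only accept names that are valid shell identifiers,
        # so that "eval" never sees anything unexpected.
        case $_name in
            ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        eval "$_prefix$_name=\"\$_value\""
    done
}

# Usage example, simulating a GET request from the command line:
REQUEST_METHOD=GET
QUERY_STRING='email=nn@mail.com&street=none'
parse_form_sketch FORM_ "`read_form_data`"
echo "email=$FORM_email street=$FORM_street"
```

The check on the variable name is the important design choice:
the name comes from the network, and it ends up in an "eval"
command line, so anything that is not a plain identifier is
skipped.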
[Further reading:
NCSA: The Common Gateway Interface.
http://hoohoo.ncsa.uiuc.edu/cgi/overview.html
]
-----------------------------------------------------------------
>> Q&A: How can I encode/decode URL data?
-----------------------------------------------------------------
Within CGI scripts it is sometimes necessary to encode or
decode HTML form data. The data usually is encoded using the
MIME type "application/x-www-form-urlencoded", which
basically represents "unsafe" characters with a percent
character ('%') followed by their code value printed as a
two-digit hex code. For example, if a user entered the string
"a:*.txt", the encoded string would look like "a%3A%2A.txt",
where %3A is the representation of a colon (':'), and %2A
represents an asterisk ('*'). A space character may be
encoded using %20 (using its ASCII code 32 = hex 20) or just
with a plus sign ('+').
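As an illustration of the rule described above, here is a
minimal sketch in plain POSIX shell. It is not the code of the
SHELLdorado "urlencode" and "urldecode" scripts: the function
names "urlencode_sketch" and "urldecode_sketch" are made up for
this example, the set of characters passed through unchanged
(letters, digits, '.', '_' and '-') is an assumption, and only
ASCII input is considered.

```shell
#!/bin/sh
# Sketch only. Both function names are hypothetical.
# The functions run in a subshell "( ... )" so that their
# variables and the LC_ALL setting do not leak to the caller.

# urlencode_sketch STRING - print STRING in urlencoded form
urlencode_sketch() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}             # everything but the first character
        c=${s%"$rest"}          # the first character
        case $c in
            [A-Za-z0-9._-]) out=$out$c ;;
            ' ')            out=$out+ ;;
            # "'$c" makes printf use the character's code value
            *)              out=$out`printf '%%%02X' "'$c"` ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# urldecode_sketch STRING - print the decoded form of STRING
# (a decoded NUL byte, %00, cannot be stored and is lost)
urldecode_sketch() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        case $s in
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                hex=${s#?}              # drop the '%'
                rest=${hex#??}          # text after the two digits
                hex=${hex%"$rest"}      # the two hex digits
                oct=`printf '%03o' "0x$hex"`
                # The trailing "x" protects a decoded newline
                # from being stripped by command substitution.
                chr=`printf "\\\\${oct}x"`
                out=$out${chr%x}
                s=$rest
                ;;
            +*)
                out="$out "
                s=${s#?}
                ;;
            *)
                rest=${s#?}
                out=$out${s%"$rest"}
                s=$rest
                ;;
        esac
    done
    printf '%s\n' "$out"
)

# Usage example with the string from the text:
urlencode_sketch 'a:*.txt'
urldecode_sketch 'a%3A%2A.txt'
```

A '%' that is not followed by two hex digits is copied to the
output unchanged rather than treated as an error.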
The following two scripts handle encoding/decoding of
"urlencoded" data:
http://www.shelldorado.com/scripts/cmds/urlencode
http://www.shelldorado.com/scripts/cmds/urldecode
The following example shows how the scripts could be used
e.g. from a CGI script:
# translate - translate English word into German, and
# vice versa
#...
baseurl=http://dict.leo.org/?search=
word=shell
encoded=`echo "$word" | urlencode`
requrl=$baseurl$encoded
# retrieve $requrl, parse output...
Note that the "urlgetopt" script mentioned above already
decodes "urlencoded" data.
[Further links:
Berners-Lee, Tim: Uniform Resource Locators (URL).
RFC 1738, December 1994.
http://www.ietf.org/rfc/rfc1738.txt
]
-----------------------------------------------------------------
>> Q&A: Are there further CGI resources at the SHELLdorado?
-----------------------------------------------------------------
The following scripts may be useful for CGI script
programmers:
o dumphtmltbl - extract ASCII table data from HTML page
Example:
$ wget -O- http://www.table.com | dumphtmltbl
http://www.shelldorado.com/scripts/cmds/dumphtmltbl
o htmltable - formats ASCII data as HTML table
Example:
$ ls | htmltable
http://www.shelldorado.com/scripts/cmds/htmltable
o striphtml - removes all HTML tags from a page
Example:
$ striphtml index.html
http://www.shelldorado.com/scripts/quickies/striphtml
o fmtlinks - create HTML links
Example (in a CGI program):
echo "<pre>"
fmtlinks linklist.txt
echo "</pre>"
http://www.shelldorado.com/scripts/quickies/fmtlinks
o extracturl - extract URL list from text file
Example:
$ extracturl index.html
http://www.shelldorado.com/scripts/quickies/extracturl
-------------------------------------------------------
The following scripts are examples of how to
retrieve and process information from the Web:
o dailynews - prints daily news message from the Web
http://www.shelldorado.com/scripts/quickies/dailynews
o findhomepage, guesshomepage
- see SHELLdorado Newsletter June 2001
http://www.shelldorado.com/newsletter/issues/2001-2-Jun.html
o translate - translates words between English and German
http://www.shelldorado.com/scripts/quickies/translate
o syn - find synonyms for words
http://www.shelldorado.com/scripts/quickies/syn
-----------------------------------------------------------------
>> Amendment: Arrays for Bourne shell
-----------------------------------------------------------------
The last SHELLdorado Newsletter contained an example of how
to simulate arrays for the Bourne shell using "eval"
(http://www.shelldorado.com/newsletter/issues/2001-2-Jun.html).
Daniel E. Singer kindly pointed out that the quoting for
the "eval" command was unnecessarily complicated. He suggested
calling "eval" with just one argument and applying proper
quoting, e.g. var$n="$value" is rewritten to read:
eval "var$n=\"\$value\""
instead of
eval var$n="'""$value""'"
This makes quoting a little easier, and even works if the
variable on the right hand side of the assignment contains a
"single quote" character (').
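The suggested quoting can be tried out with a short sketch.
The helper functions "array_set" and "array_get" are made up
for this example (they are not taken from the previous
newsletter), and they assume that the array name and the index
consist only of characters valid in a variable name.

```shell
#!/bin/sh
# Sketch only. "array_set" and "array_get" are hypothetical
# helper names.

# The form from the text: "eval" gets one argument, and $value
# is expanded only when "eval" runs the assignment.
n=3
value="Heiner's \"quoted\" \$text"
eval "var$n=\"\$value\""
printf '%s\n' "$var3"

# array_set NAME INDEX VALUE - store VALUE in NAME_INDEX
array_set() {
    eval "$1_$2=\"\$3\""
}

# array_get NAME INDEX - print the contents of NAME_INDEX
array_get() {
    eval "printf '%s\n' \"\$$1_$2\""
}

# Usage example:
array_set fruit 1 "apple"
array_set fruit 2 "it's a pear"
array_get fruit 2
```

Because the value is never part of the text that "eval"
parses, single quotes, double quotes, and dollar signs in it
need no special treatment.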
----------------------------------------------------------------
If you want to comment on the newsletter, have suggestions for
new topics to be covered in one of the next issues, or even want
to submit an article of your own, send an e-mail to
mailto:heiner.steven@shelldorado.com
================================================================
To unsubscribe send a mail with the body "unsubscribe" to
newsletter@shelldorado.com
================================================================