A web browser sends a request to a web server. The request goes to port number 80 on the serving computer. A request looks like this:
GET /program.cgi HTTP/1.0
Host: www.lemoda.net
Each line is followed by a carriage return and a line feed character. At the end of the request, there is a blank line.
If the browser requests an ordinary file, the
web server software searches for the file and sends it.
A web server which can run CGI programs,
on receiving the instruction above, checks it, then looks for a
program called /program.cgi. If the web server does not find the
program, it returns a "not found" error back over the
internet to the user.
If the program is found, and the web server software has been set up to run CGI programs, the web server software starts the program, and then sends the program's output to the user.
The web server software sends information to the CGI program
/program.cgi via environment variables. It gets
information back from the program via the program's standard
output. Web servers do not get information back from the CGI program
via environment variables. Web servers do not use command line
options. In some cases (see Post requests), web servers use
the program's standard input to send information to a program.
The server starts the program. If the program expects input, it may
read the input from environment variables. In C, the value of an
environment variable is obtained using the getenv
call. (See Set and get environment variables in C).
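As a minimal sketch of reading an environment variable in C, the helper below wraps getenv and supplies a fallback when the variable is not set. The function name env_or_default is a hypothetical name chosen for this illustration; only getenv itself comes from the C standard library.

```c
#include <stdlib.h>

/* Return the value of the environment variable `name`, or `fallback`
   when the variable is not set. getenv returns NULL for an unset
   variable. env_or_default is a hypothetical helper name. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value ? value : fallback;
}
```

A CGI program might call env_or_default("REQUEST_METHOD", "GET") near the start of main. The returned pointer refers to storage owned by the environment, so it should not be modified or freed.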
The type of request which the user has made is given by the
environment variable REQUEST_METHOD. This usually has the
value GET or POST. A POST
request is a request where the user sends information to the CGI
program via the program's standard input. See Post requests.
The CGI program returns a message to the user via its "standard
output". In the case of a C program, that means that
printf statements are enough to send the message from
the CGI program via the web server.
The message consists of two parts. The first consists of HTTP headers, and the second can take any form. Between the two parts there is a blank line.
The message returned by a CGI program must begin with correctly
formatted HTTP (HyperText Transfer Protocol) headers. The bare minimum allowed is
a Content-Type header with a MIME type for
the content to follow:
Content-Type: text/plain

hello world
Here the MIME type of the output is
text/plain, indicating that the output
of the CGI program is "just text", with no formatting added. For
example, the Figlet server at LeMoDa.net returns its output as plain
text. Another common option for CGI output is text/html,
which says that the output of the CGI program is HTML (HyperText
Markup Language).
The headers output by the CGI program do not need to end with carriage return plus line feed. The server guarantees that it will alter any line feeds into carriage return plus line feeds before sending the output over the internet.
At the end of the HTTP headers, there is a blank line. This blank line indicates the end of the headers and the start of the text returned by the CGI program.
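The header, blank line and body described above can be sketched as a small C function. The name write_plain_response is hypothetical; it writes to any FILE pointer so that it can be exercised without a web server, and a real CGI program would pass stdout.

```c
#include <stdio.h>

/* Write a minimal CGI response to `out`: one Content-Type header,
   the blank line that ends the headers, then the body.
   write_plain_response is a hypothetical name for this sketch. */
void write_plain_response(FILE *out, const char *body)
{
    fprintf(out, "Content-Type: text/plain\n");
    fprintf(out, "\n");            /* blank line: end of headers */
    fputs(body, out);
}
```

Calling write_plain_response(stdout, "hello world\n") from main produces the minimal message shown earlier.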
A CGI program may also print a
Status header which
indicates the status of the request in the form
Status: 200 (OK)
The web server turns this into the HTTP status line
HTTP/1.1 200 OK
Here 200 means "success". A full set of possible statuses can be found on this website under "HTTP status codes as C defines". Common status codes are 404, which occurs when something is not found, and 500, which occurs when a program or the server fails.
A CGI program does not need to print a status line, since the server
adds a 200 status if no
Status: header is found in the program's output.
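For a program that does want to report a status other than 200, the following sketch formats a Status header line. The helper name format_status is hypothetical, and the "code, space, reason phrase" layout is an assumption based on the CGI specification rather than on the example above.

```c
#include <stdio.h>

/* Format a CGI Status header line such as "Status: 404 Not Found\n"
   into `buf`. Returns what snprintf returns: the length of the full
   line, which exceeds size - 1 if the buffer was too small.
   format_status is a hypothetical helper name. */
int format_status(char *buf, size_t size, int code, const char *reason)
{
    return snprintf(buf, size, "Status: %d %s\n", code, reason);
}
```

The formatted line would be printed before the Content-Type header and the blank line that ends the headers.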
A common problem when creating CGI programs is accidental output of text before the HTTP header. This error may be caused by print statements added for debugging or by forgetting to print the header first. The print statements appear to be part of the HTTP header. The web server software sends an error message to the user about "malformed HTTP headers" with a status of 500, instead of the intended message.
A request to a CGI program can also send information from an HTML form. HTML forms are described in the HTML 4.01 specification. If the HTML form uses the "GET" method, then the contents of the form are converted into a part of the URL of the request itself. For example, a form like
<FORM action='http://www.lemoda.net/games/figlet/figlet.cgi' method='GET'>
<INPUT name='text' type='text' value='monty'>
<INPUT type='submit'>
</FORM>
sends a request to the server of the form
http://www.lemoda.net/games/figlet/figlet.cgi?text=monty
where the "action" is the URL of the CGI script itself, the name of the form field "text" is the word after the question mark, and the value given to that field in the form is the word after the equals sign.
If there are multiple fields in the form, they are separated with an ampersand. For example,
<FORM action='http://www.lemoda.net/games/figlet/figlet.cgi' method='GET'>
<INPUT name='text' type='text' value='monty'>
<INPUT name='width' type='text' value='80'>
<INPUT type='submit'>
</FORM>
sends a request to the server of the form
http://www.lemoda.net/games/figlet/figlet.cgi?text=monty&width=80
This URL is sent to the web server. The web server separates out the
query string from the URL and passes it to the CGI program in an
environment variable called QUERY_STRING. The CGI program must
parse the query string to extract the parameters; the web server does
not do that job.
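Since the parsing is left to the CGI program, here is one possible sketch of it in C. The function name query_param is hypothetical. It looks up a single field in an ampersand-separated query string and copies out the raw value, which is still percent-encoded at this point.

```c
#include <stddef.h>
#include <string.h>

/* Copy the raw (still percent-encoded) value of the field `name` from
   `query` into `out`. Returns 1 if the field was found, 0 if it is
   absent or its value does not fit in `out`.
   query_param is a hypothetical helper name. */
int query_param(const char *query, const char *name,
                char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *p = query;

    while (*p) {
        /* Each field runs up to the next ampersand or the end. */
        size_t field_len = strcspn(p, "&");

        if (field_len > name_len
            && strncmp(p, name, name_len) == 0
            && p[name_len] == '=') {
            size_t value_len = field_len - name_len - 1;
            if (value_len >= out_size)
                return 0;
            memcpy(out, p + name_len + 1, value_len);
            out[value_len] = '\0';
            return 1;
        }
        p += field_len;
        if (*p == '&')
            p++;               /* step over the separator */
    }
    return 0;
}
```

A CGI program would typically pass the result of getenv("QUERY_STRING") as the first argument, after checking that it is not NULL.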
Some characters are not allowed to appear in a URL. In order to circumvent the restrictions on allowed characters, percent encoding or URL encoding substitutes disallowed characters. The method used is to substitute disallowed characters with a hexadecimal number, preceded by a percent sign.
For example, if one types
@&=+?# into the above HTML
form, the browser converts this string of disallowed characters into
%40%26%3D%2B%3F%23
where each character is now represented by a percent sign and its
hexadecimal ASCII value. For example, @ has the ASCII value 40 in
hexadecimal, so it becomes %40.
The web server software does not decode the percent encodings before sending them to the CGI program. It is left up to the CGI program to do this.
Percent encodings may also be used to encode non-ASCII characters. In this case, the interpretation of the bytes depends on the text encoding used in the original web page.
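The decoding step that falls to the CGI program might look like the sketch below. The function names percent_decode and hex_value are hypothetical. Converting "+" to a space is an assumption based on the usual convention for form data, not something stated above, and the sketch works on bytes only, leaving any interpretation of non-ASCII text to the caller.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit. Call only when isxdigit(c) is true. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode percent-encoded text in place and return the decoded length.
   A '%' not followed by two hex digits is copied unchanged.
   Assumption: '+' stands for a space, as is usual in form data. */
size_t percent_decode(char *s)
{
    char *in = s;
    char *out = s;

    while (*in) {
        if (in[0] == '%'
            && isxdigit((unsigned char)in[1])
            && isxdigit((unsigned char)in[2])) {
            *out++ = (char)(hex_value((unsigned char)in[1]) * 16
                            + hex_value((unsigned char)in[2]));
            in += 3;
        } else if (*in == '+') {
            *out++ = ' ';
            in++;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

Decoding in place is safe because the decoded text is never longer than the encoded text. The query string should be split on ampersands and equals signs before decoding, since a decoded value may itself contain those characters.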
A "post" request is a request to the web server where the user sends some input along with the request. For example, if the user types some text into a form, or uploads an image, these are usually sent as "post" requests.
The user's input is sent to the CGI program via the program's "standard input".
Post requests involve two new environment variables,
CONTENT_LENGTH and CONTENT_TYPE.
CONTENT_LENGTH gives the
number of bytes to expect on standard input.
CONTENT_TYPE
is the value of the
Content-Type: header sent with the
user's request from the web browser, and tells the CGI program what
kind of content it will receive.
There are two main types of content which may be sent. The default behaviour of an HTML form is to send the form's contents as url-encoded. For example,
<form action='http://www.lemoda.net/games/figlet/figlet.cgi' method='POST'>
<input type='text' name='text' value='dandy!'>
<input type='text' name='width' value='10'>
<input type='submit'>
</form>
sends a message of the form
text=dandy%21&width=10
with a CONTENT_LENGTH of 22, the number of bytes in the above text. This
uses exactly the same ampersand-separated, percent-encoded format as
for GET requests.
It is possible to specify this encoding by setting the enctype attribute
of the HTML form element to the value application/x-www-form-urlencoded,
<form action='http://www.lemoda.net/games/figlet/figlet.cgi' method='POST' enctype='application/x-www-form-urlencoded'>
but this is not necessary since this is already the default.
As with the GET method, the web server software does not split the parameters or decode the percent encoding, so the CGI software must do this itself.
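Reading the body of a post request could be sketched as follows. The function name read_post_body and the max_len limit are assumptions made for this illustration. The function takes the CONTENT_LENGTH value as a string and reads exactly that many bytes from the given stream.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read exactly the number of bytes announced by `content_length`
   (the CONTENT_LENGTH value, possibly NULL) from `in`. Returns a
   malloc'd, NUL-terminated buffer that the caller must free, or NULL
   if the length is missing, malformed, larger than `max_len`, or the
   input ends early. read_post_body and max_len are hypothetical. */
char *read_post_body(FILE *in, const char *content_length, size_t max_len)
{
    char *end;
    unsigned long n;
    char *body;

    if (content_length == NULL || *content_length == '\0')
        return NULL;
    n = strtoul(content_length, &end, 10);
    if (*end != '\0' || n > max_len)
        return NULL;            /* not a plain number, or too large */

    body = malloc(n + 1);
    if (body == NULL)
        return NULL;
    if (fread(body, 1, n, in) != n) {
        free(body);
        return NULL;            /* fewer bytes arrived than announced */
    }
    body[n] = '\0';
    return body;
}
```

A CGI program might call read_post_body(stdin, getenv("CONTENT_LENGTH"), 65536). Capping the length guards against a request that announces far more data than the program is prepared to hold in memory.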
The encoding described in Simple forms is inefficient for transferring large binary files such as images, since each non-ASCII byte turns into three bytes if the percent encoding is used. The "multipart/form-data" MIME type is suitable for transferring binary files.
This encoding sends the values from each field of a form as individual
parts of a MIME message separated by a boundary. The boundary is
extracted from the
CONTENT_TYPE environment variable. For example, a form like
<form action='http://www.lemoda.net/games/figlet/figlet.cgi' method='POST' enctype='multipart/form-data'>
<input type='text' name='text' value='dandy!'>
<input type='text' name='width' value='10'>
<input type='submit'>
</form>
turns into a
CONTENT_TYPE value of the form
multipart/form-data; boundary=----WebKitFormBoundaryHhVVGE1xV4vmz0WV
and standard input of the following kind:
------WebKitFormBoundaryHhVVGE1xV4vmz0WV
Content-Disposition: form-data; name="text"

dandy!
------WebKitFormBoundaryHhVVGE1xV4vmz0WV
Content-Disposition: form-data; name="width"

10
------WebKitFormBoundaryHhVVGE1xV4vmz0WV--
The boundary text is
repeated to split the message into pieces. In standard input, each
occurrence of the boundary is preceded by two minus signs. The exact form of the
boundary text is random and depends on the web browser the user
has. The above example is from Google Chrome. The CGI program has to
extract the boundary string from the CONTENT_TYPE
variable and then split standard input. The very last place to split is
marked by the boundary plus two minus signs added at the end.
Unlike the URL encoding, characters like ! are not percent-encoded in this format. Thus this method is more suitable for a transfer of a large amount of binary data. Percent encoding triples the length of most of the binary data, but multipart/form-data leaves it in its original form.
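The first step for the CGI program, extracting the boundary, might be sketched like this. The function name extract_boundary is hypothetical, and the sketch assumes the value has the usual layout "multipart/form-data; boundary=..." with an unquoted boundary. Quoted boundaries, which the MIME format also permits, are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Copy the boundary parameter of a CONTENT_TYPE value into `out`.
   Returns 1 on success, 0 if there is no boundary or it does not fit.
   Assumes an unquoted boundary; extract_boundary is a hypothetical
   helper name. */
int extract_boundary(const char *content_type, char *out, size_t out_size)
{
    static const char key[] = "boundary=";
    const char *start = strstr(content_type, key);
    size_t len;

    if (start == NULL)
        return 0;
    start += sizeof key - 1;        /* skip past "boundary=" */

    /* The value runs to the next semicolon or the end of the string. */
    len = strcspn(start, ";");
    if (len == 0 || len >= out_size)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

To split standard input, the program would then search for two minus signs followed by the extracted boundary, which is why the lines in the message body carry two more minus signs than the boundary itself.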
The usual behaviour of a CGI program is to run, create the web page or other data requested, and then halt. Therefore, even if the user sends some identification with a first message, the CGI program does not remember the user for the next request. The user therefore needs to inform the CGI program of his identity each time.
The most common way for a user to give an identity is by means of "cookies". Cookies are small pieces of information supplied by the server to the user, which the user then returns to the server with each request.
The server sets a cookie on the user's computer with a Set-Cookie header
sent as part of the HTTP headers (see HTTP headers). The cookie is then remembered on the user's computer until he closes his browser. It is also possible to make a cookie persist beyond the point when the user closes the browser by setting an expiry date.
Set-Cookie: name=value; expires=Wed, 27-Oct-2010 10:10:10 GMT
It is also possible to delete a cookie by setting the expiry date to before the current time.
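A Set-Cookie line with an expiry date could be produced along the following lines. The function name format_set_cookie is hypothetical, and the date layout follows the day-month-year style of the example above with the time expressed in GMT. Passing a time in the past gives a header that deletes the cookie.

```c
#include <stdio.h>
#include <time.h>

/* Format a Set-Cookie header line with an expiry date into `buf`.
   Returns the snprintf result, or -1 if the date cannot be formatted.
   format_set_cookie is a hypothetical helper name. */
int format_set_cookie(char *buf, size_t size,
                      const char *name, const char *value, time_t expires)
{
    char date[64];
    struct tm *utc = gmtime(&expires);   /* expiry expressed in GMT */

    if (utc == NULL)
        return -1;
    if (strftime(date, sizeof date, "%a, %d-%b-%Y %H:%M:%S GMT", utc) == 0)
        return -1;
    return snprintf(buf, size, "Set-Cookie: %s=%s; expires=%s\n",
                    name, value, date);
}
```

For a cookie lasting one day, a program might pass time(NULL) + 24 * 60 * 60 as the last argument and print the result together with its other headers. The weekday and month names come from the "C" locale, which is in effect unless the program calls setlocale.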
The user sends back his identifying cookie with each subsequent
request to the web server using the
Cookie: field of the
HTTP request. One page may have more than one cookie set.
The value of the
Cookie field of the request is made
available to the CGI program via the environment variable
HTTP_COOKIE. The CGI program must extract the
relevant cookie from a list of semicolon-separated cookies.
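Picking one cookie out of the semicolon-separated list might be done as in the sketch below. The function name cookie_value is hypothetical, and the sketch assumes that each semicolon may be followed by spaces, as browsers commonly send.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of the cookie `name` from a Cookie header value such
   as "id=abc123; theme=dark" into `out`. Returns 1 if found, 0 if the
   cookie is absent or its value does not fit.
   cookie_value is a hypothetical helper name. */
int cookie_value(const char *cookies, const char *name,
                 char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *p = cookies;

    while (*p) {
        size_t pair_len;

        while (*p == ' ')
            p++;                    /* skip spaces after a semicolon */
        pair_len = strcspn(p, ";");

        if (pair_len > name_len
            && strncmp(p, name, name_len) == 0
            && p[name_len] == '=') {
            size_t value_len = pair_len - name_len - 1;
            if (value_len >= out_size)
                return 0;
            memcpy(out, p + name_len + 1, value_len);
            out[value_len] = '\0';
            return 1;
        }
        p += pair_len;
        if (*p == ';')
            p++;
    }
    return 0;
}
```

The first argument would normally be the result of getenv("HTTP_COOKIE"), which is NULL when the browser sent no cookies, so the caller should check for that before calling.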
The MD5 checksum is a way of ensuring the integrity of data as it travels across the internet. The value of the MD5 checksum is calculated from the content of the page before any compression (see Compressing the output) is applied. It does not apply to the HTTP headers.
See RFC 1864 for more on the Content-MD5 header.
There are various ways to compress the output of a CGI script. Whether
or not these may be sent by the CGI program is controlled by the
Accept-Encoding header of the HTTP request which the
browser sends. Its value is available to a CGI program via the
environment variable HTTP_ACCEPT_ENCODING. Most browsers
and web crawlers accept data compressed using the gzip format.
Only the part of the message after the HTTP headers is compressed. The HTTP headers are never compressed.
The CGI program informs the client that it is sending compressed
content using the
Content-Encoding header. For example,
if it sends gzip-compressed output, this looks like
Content-Encoding: gzip
Compression is usually applied to data in text form, such as with the
MIME type text/plain, but not
to image files such as JPEG or PNG files, which are already
compressed.
For a simple example of compressing CGI output, see Compressing CGI output with Perl.
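In C, a rough check of whether the client accepts gzip might look like the sketch below. The function name client_accepts_gzip is hypothetical, and the substring search is a deliberate simplification: it ignores quality values, so a header that explicitly refuses gzip with a quality of zero would still be reported as accepting it.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the Accept-Encoding value (the HTTP_ACCEPT_ENCODING
   environment variable, possibly NULL) mentions gzip, otherwise 0.
   Simplification: quality values such as "gzip;q=0" are ignored.
   client_accepts_gzip is a hypothetical helper name. */
int client_accepts_gzip(const char *accept_encoding)
{
    return accept_encoding != NULL
        && strstr(accept_encoding, "gzip") != NULL;
}
```

A program would call client_accepts_gzip(getenv("HTTP_ACCEPT_ENCODING")) and print the Content-Encoding header only when the result is 1, sending uncompressed output otherwise.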