A web browser sends a request to a web server. The request goes to port number 80 on the serving computer. A request looks like this:
GET /program.cgi HTTP/1.0
Host: www.lemoda.net
Each line is followed by a carriage return and a line feed character. At the end of the request, there is a blank line.
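The request is plain text, so the exact bytes are easy to show in code. The following is a minimal sketch, not taken from any browser; the function name build_get_request is hypothetical. It assembles the request above with snprintf, ending each line with "\r\n" and adding one more "\r\n" for the blank line.

```c
#include <stdio.h>

/* Hypothetical helper: write a minimal HTTP/1.0 GET request for `path`
   on `host` into `buf`. Each line ends with "\r\n" (carriage return plus
   line feed), and the final extra "\r\n" is the blank line that ends the
   request. Returns the number of bytes written, or -1 if `buf` is too
   small to hold the whole request. */
int build_get_request(char *buf, size_t size, const char *path, const char *host)
{
    int n = snprintf(buf, size, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    if (n < 0 || (size_t) n >= size) {
        return -1;
    }
    return n;
}
```

Called as build_get_request(buf, sizeof buf, "/program.cgi", "www.lemoda.net"), this produces the 51 bytes of the request shown above.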
If the browser requests a file such as index.html, the
web server software searches for the file and sends it.
A web server which can run CGI programs, on receiving the request
above, checks it and then looks for a program called
/program.cgi. If the web server does not find the
program, it returns a "not found" error to the user over the
internet.
If the program is found, and the web server software has been set up to run CGI programs, the web server software starts the program, and then sends the program's output to the user.
The web server software sends information to a CGI program
like /program.cgi via environment variables. It gets
information back from the program via the program's standard
output. Web servers do not get information back from the CGI program
via environment variables, and they do not generally pass information
to the program as command-line arguments. In some cases (see Post
requests), web servers use the program's standard input to send
information to a program.
The server starts the program. If the program expects input, it may
read the input from environment variables. In C, the value of an
environment variable is obtained using the getenv library
call (see Set and get environment variables in C).
The type of request which the user has made is given by the
environment variable REQUEST_METHOD. This usually has the
value GET or POST. The POST
request is a request where the user sends information to the CGI
program via the program's standard input. See Post requests.
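As a minimal sketch of reading REQUEST_METHOD with getenv, the code below sorts the request into GET, POST, or anything else. The enum and function names are made up for this example. getenv returns NULL when the variable is not set, for example when the program is run from the command line instead of by a web server, so the code checks for that before comparing strings.

```c
#include <stdlib.h>
#include <string.h>

/* The request methods this sketch distinguishes (hypothetical names). */
enum method { METHOD_GET, METHOD_POST, METHOD_OTHER };

/* Classify a REQUEST_METHOD value. A NULL value, meaning the variable
   is not set, is treated as METHOD_OTHER. */
enum method classify_method(const char *value)
{
    if (value == NULL) {
        return METHOD_OTHER;
    }
    if (strcmp(value, "GET") == 0) {
        return METHOD_GET;
    }
    if (strcmp(value, "POST") == 0) {
        return METHOD_POST;
    }
    return METHOD_OTHER;
}

/* Read REQUEST_METHOD from the environment and classify it. */
enum method current_method(void)
{
    return classify_method(getenv("REQUEST_METHOD"));
}
```

Keeping the classification separate from the getenv call makes it possible to test the logic without setting environment variables.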
The CGI program returns a message to the user via its "standard
output". In the case of a C program, that means that
printf statements are sufficient to send back a message
from the CGI program via the web server.
The message must consist of two parts. The first part must consist of
correctly formatted HTTP (HyperText Transfer Protocol) headers; the
second part is completely arbitrary. The bare minimum allowed is
the Content-Type header with a MIME type for
the content to follow:
Content-Type: text/plain

hello world
Here the MIME type of the message is text/plain,
indicating that the output of the CGI program is "just text", with no
formatting added. For example, the Figlet server at LeMoDa.net returns its output
as plain text. Another common option for CGI output
is text/html, which says that the output of the CGI
program is HTML (HyperText Markup Language).
The header lines output by the CGI program do not need to end with a carriage return plus line feed; a plain line feed is enough. The web server converts the header line endings into carriage return plus line feed before sending the headers over the internet.
At the end of the HTTP headers, there is a blank line. This blank line indicates the end of the headers and the start of the text returned by the CGI program.
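The following minimal sketch puts the header, the blank line, and the content together. The function send_plain_text is hypothetical; it takes the output stream as a parameter so that it can be tried out on a file, but in a CGI program it would be called with stdout.

```c
#include <stdio.h>

/* Hypothetical helper: send `body` as a plain-text CGI response on `out`.
   The header line is followed by an empty line, which tells the web
   server where the headers end and the content begins.
   In a CGI program:
       int main(void) { send_plain_text(stdout, "hello world\n"); return 0; }
*/
void send_plain_text(FILE *out, const char *body)
{
    fputs("Content-Type: text/plain\n", out);
    fputs("\n", out);          /* blank line: end of the headers */
    fputs(body, out);          /* the content, sent as-is */
}
```

Using fputs for the body rather than printf(body) avoids misreading any % characters in the content as format directives.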
A CGI program may also print a Status header, which
indicates the status of the request in the form
Status: 200 OK
The web server turns this into the HTTP status line
HTTP/1.1 200 OK
Here 200 means "success". A full list of possible statuses can be found on this website under "HTTP status codes as C defines". Common status codes are 404, which is returned when something is not found, and 500, which is returned when a program or the server fails.
A CGI program does not need to print a status line, since the server
adds a 200 status if no Status: header is found in the
CGI output.
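When the status is not 200, the program has to print the Status header itself. As a hedged sketch, with the function name send_not_found invented for this example, a "not found" response could look like this; the Status line is simply one more header before the blank line.

```c
#include <stdio.h>

/* Hypothetical helper: tell the user that the requested item was not
   found. The web server turns the Status header into the status line
   of its HTTP response. */
void send_not_found(FILE *out)
{
    fputs("Status: 404 Not Found\n", out);
    fputs("Content-Type: text/plain\n", out);
    fputs("\n", out);                  /* end of the headers */
    fputs("Not found\n", out);         /* message shown to the user */
}
```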
A common problem when creating CGI programs is the accidental output of text before the HTTP headers. This error may be caused by print statements added for debugging or by forgetting to print the headers first. The web server treats the stray output as part of the HTTP headers, cannot make sense of it, and sends the user an error message about "malformed HTTP headers" with a status of 500 instead of the intended message.
A request to a CGI program can also send information from an HTML form. HTML forms are described in the HTML 4.01 specification. If the HTML form uses the "GET" method, then the contents of the form are converted into a part of the URL of the request itself. For example, a form like
<FORM action='http://www.lemoda.net/games/figlet/figlet.cgi' method='GET'>
<INPUT name='text' type='text' value='monty'>
<INPUT type='submit'>
</FORM>
sends a request for the URL
http://www.lemoda.net/games/figlet/figlet.cgi?text=monty
where the "action" is the URL of the CGI script itself, the name of the form field, "text", is the word after the question mark, and the value given to that field in the form is the word after the equals sign.
If there are multiple fields in the form, they are separated with an ampersand. For example,
<FORM action='http://www.lemoda.net/games/figlet/figlet.cgi' method='GET'>
<INPUT name='text' type='text' value='monty'>
<INPUT name='width' type='text' value='80'>
<INPUT type='submit'>
</FORM>
produces a URL like
http://www.lemoda.net/games/figlet/figlet.cgi?text=monty&width=80
This URL is sent to the web server. The web server separates out the
query string from the URL and passes it to the CGI program in an
environment variable QUERY_STRING. The CGI program must
parse the query string to extract the parameters; the web server does
not do that job.
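Here is a minimal sketch of that parsing job in C, assuming the query string has already been copied into writable memory. The struct and function names are hypothetical. The string is split in place, so it should be a copy of the value returned by getenv("QUERY_STRING") (made, for example, with malloc and strcpy), not the environment string itself. The values are still percent-encoded at this stage.

```c
#include <string.h>

/* One name=value pair from a query string (hypothetical type). */
struct param {
    char *name;
    char *value;
};

/* Split a query string such as "text=monty&width=80" in place into
   name/value pairs, storing at most `max` of them in `params`.
   Each '&' and the first '=' in each field are replaced by '\0'.
   A field without '=' gets an empty value. Returns the number of pairs. */
int parse_query(char *query, struct param *params, int max)
{
    int count = 0;
    char *field = query;

    while (field != NULL && *field != '\0' && count < max) {
        char *next = strchr(field, '&');
        if (next != NULL) {
            *next++ = '\0';                 /* end this field */
        }
        char *eq = strchr(field, '=');
        if (eq != NULL) {
            *eq = '\0';                     /* end the name */
            params[count].value = eq + 1;
        } else {
            params[count].value = field + strlen(field);  /* "" */
        }
        params[count].name = field;
        count++;
        field = next;
    }
    return count;
}
```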
Some characters are not allowed to appear in a URL, or have a special meaning there, like & and =. Percent encoding, also called URL encoding, gets around this restriction by substituting each such character with its value as a two-digit hexadecimal number, preceded by a percent sign.
For example, if one types @&=+?# into the first HTML
form above, the browser converts this string of disallowed characters
into
http://www.lemoda.net/games/figlet/figlet.cgi?text=%40%26%3D%2B%3F%23
where each character is now represented by a percent sign and its hexadecimal ASCII value. For example, %40 represents @, and %3F represents ?.
The web server software does not decode the percent encodings before sending them to the CGI program. It is left up to the CGI program to do this.
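A minimal in-place decoder in C might look like the following sketch; the function names are hypothetical. It also turns '+' into a space, which is how HTML forms encode spaces in form data; that rule is an addition not described above. A '%' that is not followed by two hexadecimal digits is copied unchanged.

```c
#include <ctype.h>

/* Value of one hexadecimal digit, or -1 if `c` is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    c = tolower(c);
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    return -1;
}

/* Decode percent encodings in place, so "%40" becomes '@' and '+'
   becomes a space. The decoded string is never longer than the input. */
void percent_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (*s == '%' && s[1] != '\0' && s[2] != '\0'
            && hex_value((unsigned char) s[1]) >= 0
            && hex_value((unsigned char) s[2]) >= 0) {
            *out++ = (char) (hex_value((unsigned char) s[1]) * 16
                             + hex_value((unsigned char) s[2]));
            s += 3;                    /* skip '%' and two hex digits */
        } else if (*s == '+') {
            *out++ = ' ';
            s++;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

Note that an encoded %2B decodes to a literal '+', not a space, because only unencoded '+' characters are turned into spaces.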
Percent encodings may also be used to encode non-ASCII characters. In this case, the interpretation of the bytes depends on the text encoding used in the original web page.
A "post" request is a request to the web server where the user sends some input along with the request. For example, if the user types some text into a form, or uploads an image, these are usually sent as "post" requests.
The user's input is sent to the CGI program via the program's "standard input".
Post requests involve two new variables, CONTENT_LENGTH
and CONTENT_TYPE. CONTENT_LENGTH gives the
number of bytes to expect on standard input. CONTENT_TYPE
is the value of the Content-Type: header sent with the
user's request from the web browser, and tells the CGI program what
kind of content it will receive.
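The following is a sketch of reading a POST body, under the assumption that the program trusts CONTENT_LENGTH to give the exact size. The function name read_post_body is hypothetical. It reads exactly that many bytes and returns NULL if the length is missing or invalid, or if fewer bytes arrive. Because an uploaded file may contain zero bytes, the length is also returned so that the caller need not rely on strlen.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read exactly CONTENT_LENGTH bytes from `in`
   (stdin in a CGI program). `content_length` is the string from
   getenv("CONTENT_LENGTH"). Returns a '\0'-terminated buffer which the
   caller must free, or NULL on any error. The number of bytes read is
   stored in *length_out if it is not NULL. */
char *read_post_body(FILE *in, const char *content_length, size_t *length_out)
{
    char *end;
    long length;
    char *body;

    if (content_length == NULL) {
        return NULL;                    /* not a POST request */
    }
    length = strtol(content_length, &end, 10);
    if (end == content_length || *end != '\0' || length < 0) {
        return NULL;                    /* not a valid number */
    }
    body = malloc((size_t) length + 1);
    if (body == NULL) {
        return NULL;
    }
    if (fread(body, 1, (size_t) length, in) != (size_t) length) {
        free(body);                     /* fewer bytes than promised */
        return NULL;
    }
    body[length] = '\0';
    if (length_out != NULL) {
        *length_out = (size_t) length;
    }
    return body;
}
```

A real program would probably also refuse lengths above some limit of its own choosing, so that a single request cannot make it allocate a huge buffer.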
There are two main types of content which may be sent. By default, an HTML form sends its contents URL-encoded. For example, the form
<form action='http://www.lemoda.net/games/figlet/figlet.cgi' method='POST'>
<input type='text' name='text' value='dandy!'>
<input type='text' name='width' value='10'>
<input type='submit'>
</form>
sends the following on standard input
text=dandy%21&width=10
with a CONTENT_LENGTH of 22, the number of bytes in
the text above. This uses exactly the same ampersand-separated,
percent-encoded format as for GET requests.
It is possible to specify this encoding by setting
the enctype attribute of the HTML form element to the
value "application/x-www-form-urlencoded":
<form action='http://www.lemoda.net/games/figlet/figlet.cgi'
method='POST'
enctype='application/x-www-form-urlencoded'>
but this is not necessary since this is already the default.
As with the GET method, the web server software does not split the parameters or decode the percent encoding, so the CGI software must do this itself.
The encoding described in Simple forms is inefficient for transferring large binary files such as images, since each byte which is not allowed in a URL, including every non-ASCII byte, turns into three bytes when percent encoding is used. The "multipart/form-data" MIME type is suitable for transferring binary files.
This encoding sends the values from each field of a form as individual
parts of a MIME message separated by a boundary. The boundary is
extracted from the CONTENT_TYPE header. For example, a
form like
<form action='http://www.lemoda.net/games/figlet/figlet.cgi' method='POST' enctype='multipart/form-data'>
<input type='text' name='text' value='dandy!'>
<input type='text' name='width' value='10'>
<input type='submit'>
</form>
results in a CONTENT_TYPE value of the form
multipart/form-data; boundary=----WebKitFormBoundaryHhVVGE1xV4vmz0WV
and standard input of the following kind:
------WebKitFormBoundaryHhVVGE1xV4vmz0WV
Content-Disposition: form-data; name="text"

dandy!
------WebKitFormBoundaryHhVVGE1xV4vmz0WV
Content-Disposition: form-data; name="width"

10
------WebKitFormBoundaryHhVVGE1xV4vmz0WV--
The boundary text is
repeated to split the message into pieces. The exact form of the
boundary text is random and depends on the user's web browser; the
above example is from Google Chrome. The CGI program has to
extract the boundary string from the CONTENT_TYPE
variable and then split standard input. Each dividing line on standard
input consists of two minus signs followed by the boundary string, and
the very last one has two more minus signs added at the end.
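The first step, pulling the boundary out of CONTENT_TYPE, might be sketched as follows. The function name get_boundary is hypothetical, and this simple version looks for "boundary=" case-sensitively. It also strips the double quotes which the MIME rules allow around the boundary value.

```c
#include <string.h>

/* Hypothetical helper: copy the boundary parameter of a CONTENT_TYPE
   value such as
       multipart/form-data; boundary=----WebKitFormBoundaryHhVVGE1xV4vmz0WV
   into `out`, removing surrounding double quotes if present.
   Returns 1 on success, 0 if there is no boundary or `out` is too small. */
int get_boundary(const char *content_type, char *out, size_t size)
{
    const char *start = strstr(content_type, "boundary=");
    size_t len;

    if (start == NULL) {
        return 0;
    }
    start += strlen("boundary=");
    if (*start == '"') {
        const char *close = strchr(start + 1, '"');
        if (close == NULL) {
            return 0;                   /* unterminated quoted value */
        }
        start++;
        len = (size_t) (close - start);
    } else {
        len = strcspn(start, "; \t");   /* value ends at ';' or space */
    }
    if (len == 0 || len + 1 > size) {
        return 0;
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

The dividing lines to search for on standard input are then "--" followed by this string, with the final one followed by a further "--".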
Notice here that unlike the URL encoding, characters like ! are not percent-encoded in this format. Thus this method is more suitable for large transfers of binary data.
The usual behaviour of a CGI program is to run, create the web page or other data requested, and then halt. Therefore, even if the user sends some identification with a first message, the CGI program does not remember the user for the next request. The user therefore needs to inform the CGI program of his identity each time.
The most common way for a user to give an identity is by means of "cookies". Cookies are small pieces of information supplied by the server to the user, which the user then returns to the server with each request.
The server sets a cookie on the user's computer with
Set-Cookie: name=value
sent as part of the HTTP headers (see HTTP headers). The cookie is then remembered on the user's computer until he closes his browser. It is also possible to make a cookie persist beyond the point when the user closes the browser by setting an expiry date:
Set-Cookie: name=value; expires=Wed, 27-Oct-2010 10:10:10 GMT
It is also possible to delete a cookie by setting the expiry date to before the current time.
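The expiry date has to be written out in GMT. A minimal sketch using the standard gmtime and strftime functions follows; the function name print_set_cookie is hypothetical, and the code assumes the default "C" locale so that %a and %b give English day and month names. Passing a time in the past, such as 0, produces a header which deletes the cookie.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: print a Set-Cookie header on `out` with an
   expiry date of `expires`, formatted in GMT, for example
   "Thu, 01-Jan-1970 00:00:00 GMT". If the date cannot be formatted,
   a cookie without an expiry date is sent instead. */
void print_set_cookie(FILE *out, const char *name, const char *value, time_t expires)
{
    char date[64];
    struct tm *gmt = gmtime(&expires);

    if (gmt == NULL
        || strftime(date, sizeof date, "%a, %d-%b-%Y %H:%M:%S GMT", gmt) == 0) {
        fprintf(out, "Set-Cookie: %s=%s\n", name, value);
        return;
    }
    fprintf(out, "Set-Cookie: %s=%s; expires=%s\n", name, value, date);
}
```

To keep a cookie for a week, a program could pass time(NULL) + 7 * 24 * 60 * 60 as the expiry time.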
The user sends back his identifying cookie with each subsequent
request to the web server using the Cookie: field of the
HTTP request. One page may have more than one cookie set.
The value of the Cookie field of the request is made
available to the CGI program via the environment
variable HTTP_COOKIE. The CGI program must extract the
relevant cookie from this list of semicolon-separated cookies.
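A sketch of that extraction follows, assuming the cookies arrive in the usual "name=value; name=value" form, with a space after each semicolon. The function name get_cookie is hypothetical. It compares whole names, so looking for "sess" does not match a cookie called "session".

```c
#include <string.h>

/* Hypothetical helper: find the cookie called `name` in an HTTP_COOKIE
   value such as "a=1; session=abc; b=2" and copy its value into `out`.
   Returns 1 if found, 0 if not found or if `out` is too small. */
int get_cookie(const char *cookies, const char *name, char *out, size_t size)
{
    size_t name_len = strlen(name);
    const char *p = cookies;

    while (p != NULL && *p != '\0') {
        while (*p == ' ') {
            p++;                        /* skip the space after ';' */
        }
        size_t pair_len = strcspn(p, ";");
        if (pair_len > name_len && strncmp(p, name, name_len) == 0
            && p[name_len] == '=') {
            size_t value_len = pair_len - name_len - 1;
            if (value_len + 1 > size) {
                return 0;
            }
            memcpy(out, p + name_len + 1, value_len);
            out[value_len] = '\0';
            return 1;
        }
        p += pair_len;                  /* move to the next ';' or end */
        if (*p == ';') {
            p++;
        }
    }
    return 0;
}
```

In a CGI program, the first argument would be getenv("HTTP_COOKIE"), which is NULL if the browser sent no cookies; the function returns 0 in that case.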
The MD5 checksum is a way of ensuring the integrity of data as it travels across the internet. A CGI program can send it in a Content-MD5 header. The checksum is calculated from the body of the response as it is sent, including any compression (see Compressing the output) which has been applied. It does not cover the HTTP headers. The value of the header is the base64 encoding of the 16-byte MD5 digest:
Content-MD5: 8NawWewruZdHSG8mAtZNIQ==
See RFC 1864 for more on the Content-MD5 header.
There are various ways to compress the output of a CGI script. Whether
the CGI program may send compressed output is controlled by
the Accept-Encoding header of the HTTP request which the
browser sends. Its value is available to a CGI program via the
environment variable HTTP_ACCEPT_ENCODING. Most browsers
and web crawlers accept data compressed using the gzip
and deflate formats.
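A CGI program therefore needs to check whether a particular format is listed in HTTP_ACCEPT_ENCODING before compressing anything. The following sketch, with the hypothetical function name accepts_encoding, splits the comma-separated list and compares each entry without regard to case. As a simplification, it ignores any ";q=" weighting, so an entry such as "gzip;q=0", which means gzip is not acceptable, is still reported as accepted.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return 1 if the HTTP_ACCEPT_ENCODING value
   `header` lists `coding` (for example "gzip"), ignoring case and any
   ";q=" parameter, or 0 otherwise. A NULL header returns 0. */
int accepts_encoding(const char *header, const char *coding)
{
    size_t coding_len = strlen(coding);

    while (header != NULL && *header != '\0') {
        while (*header == ' ' || *header == ',') {
            header++;                   /* skip separators */
        }
        size_t token_len = strcspn(header, ",; ");
        if (token_len == coding_len) {
            size_t i = 0;
            while (i < token_len
                   && tolower((unsigned char) header[i])
                      == tolower((unsigned char) coding[i])) {
                i++;
            }
            if (i == token_len) {
                return 1;
            }
        }
        header += strcspn(header, ",");  /* skip the rest of this entry */
    }
    return 0;
}
```

If accepts_encoding(getenv("HTTP_ACCEPT_ENCODING"), "gzip") returns 1, the program may compress its output, for example with the zlib library, and announce this with a Content-Encoding header.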
Only the part of the message after the HTTP headers is compressed. The HTTP headers are never compressed.
The CGI program informs the client that it is sending compressed
content using the Content-Encoding header. For example,
if it sends gzip-compressed output, this looks like
Content-Encoding: gzip
Compression is usually applied to data in text form, such as with the
MIME types text/html or text/plain, but not
to image files such as JPEG or PNG files, which are already
compressed.
For a simple example of compressing CGI output, see Compressing CGI output with Perl.
In C, the standard input, standard output, and standard error streams
are declared in the stdio.h header file, and they are called stdin,
stdout and stderr respectively.