Introduction

Introduction to CGI Programming with ClearSilver & Python

This document is meant to be a fairly complete tutorial on how to write some dynamic web pages using a webserver that supports CGI. In order to use this tutorial, the following assumptions are made:
  • That you have knowledge of the Python programming language
  • That you have access to a web server that supports CGI and that you have permission to run CGI programs on that server. I personally recommend using Apache.
  • That you know the basics of ClearSilver templates.

What is CGI?

CGI, or the Common Gateway Interface, is the lowest common denominator for writing dynamic web pages. It is a very simple interface by which a web server can launch a program which generates a web page.

  1. Webserver receives request and translates the request into Environment Variables
  2. Webserver runs CGI program
  3. CGI program reads the environment variables and spits out some meta-information and the data to send back to the client
  4. Webserver reads the results of the CGI and creates a proper response for the client
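
As a rough illustration of step 3, here is a minimal sketch of a CGI program that uses only the Python standard library and no ClearSilver. The function name run_cgi is made up for this example, and the environment is passed in as a plain dict so the sketch can run without a web server.

```python
import io


def run_cgi(environ, stdout):
    """Write a minimal CGI response: header lines, a blank line, then the body."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    # Meta-information comes first; the blank line ends the headers.
    stdout.write("Content-Type: text/plain\r\n")
    stdout.write("\r\n")
    # Everything after the blank line is the data sent back to the client.
    stdout.write("method=%s query=%s\n" % (method, query))


# Stand-in for what the web server would provide for /cgi-bin/hello.py?day=1
out = io.StringIO()
run_cgi({"REQUEST_METHOD": "GET", "QUERY_STRING": "day=1"}, out)
response = out.getvalue()
```

Under a real web server you would call run_cgi(os.environ, sys.stdout) instead of using the in-memory stand-ins.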

In general, there are very few things that you'll need access to from the client:

  • The URL or other information to tell you what was requested (SCRIPT_URI, PATH_INFO, PATH_TRANSLATED)
  • Any query parameters (either explicit links or from GET FORM submission) (QUERY_STRING)
  • Any Cookies the user passes (HTTP_COOKIE)
  • POST FORM information (CONTENT_TYPE, CONTENT_LENGTH and stdin)
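
To make the list above concrete, here is a hedged sketch that gathers those pieces using only the Python standard library. The helper name read_request and the layout of the returned dict are assumptions for illustration; the ClearSilver CGI Kit does this work for you, as described later.

```python
import io
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def read_request(environ, stdin):
    """Collect the pieces of a CGI request into one dict.

    stdin must be a binary stream (for example sys.stdin.buffer).
    """
    request = {
        "path_info": environ.get("PATH_INFO", ""),
        # parse_qs URL-decodes names and values; each value is a list.
        "query": parse_qs(environ.get("QUERY_STRING", "")),
        "cookies": {},
        "form": {},
    }
    cookie = SimpleCookie()
    cookie.load(environ.get("HTTP_COOKIE", ""))
    request["cookies"] = {name: morsel.value for name, morsel in cookie.items()}

    # A urlencoded POST body arrives on stdin; CONTENT_LENGTH says how much to read.
    content_type = environ.get("CONTENT_TYPE", "")
    if content_type.startswith("application/x-www-form-urlencoded"):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        request["form"] = parse_qs(stdin.read(length).decode("ascii"))
    return request


# Made-up request values standing in for what a web server would set.
environ = {
    "PATH_INFO": "/hello",
    "QUERY_STRING": "day=1&name=Brandon%20Long",
    "HTTP_COOKIE": "session=abc123",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": "9",
}
request = read_request(environ, io.BytesIO(b"file=a.py"))
```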

This interface is not meant to be high performance, and it isn't meant to be complicated, but it works. If your CGI is written in C, you can easily do 500k pageviews a day on a single server. Languages with more start-up overhead, such as interpreted languages like Perl and Python or JIT-compiled languages like C# and Java, will seem sluggish because they take longer to start. For those languages, you probably want something longer running, such as PyApache, mod_python, mod_perl, or mod_jk.

Trakken is a fairly complicated Python web application with lots of dependent modules that need to be loaded. In production, we run Trakken under PyApache, but in development we run it as regular CGI. The start-up overhead is about 0.5s in CGI mode. Since our goal is for every page to take less than 0.2s, that was obviously too long. Besides, all that parsing and byte-compiling is work the machine doesn't need to repeat on every request. If you are running in CGI mode, you'll want to make sure that the webserver can write to the directory containing your Python code so that it can create pre-compiled .pyc files. I'll leave the obvious security implications of that up to you.
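
One way to get compiled files without a writable code directory is to byte-compile ahead of time, as the user who deploys the code. The following is a minimal sketch using the standard library's compileall module. The precompile helper and the throwaway demo directory are illustrative only, and where the compiled files land depends on the Python version.

```python
import compileall
import os
import tempfile


def precompile(code_dir):
    """Byte-compile every .py file under code_dir; return True on success."""
    # quiet=1 suppresses the per-file listing but still reports errors.
    return bool(compileall.compile_dir(code_dir, quiet=1))


# Demo on a throwaway directory standing in for the CGI code directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "hello.py"), "w") as handle:
    handle.write("print('Hello World')\n")
ok = precompile(demo_dir)
```

Note that only imported modules are loaded from compiled files; the script the web server launches directly is still parsed on every request.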

ClearSilver CGI Basics

The ClearSilver CGI Kit works by adding an interface layer between the standard CGI interface and your code. This works by taking the information from the CGI interface, and adding it to the HDF dataset. Your program then continues to fill in the dataset based on the data in the dataset. Finally, your program chooses a template, and asks the CGI Kit to display the template with your dataset. The CGI Kit then takes care of formatting the output in a manner that the CGI interface expects.

  Client \
          Webserver \
                     CGI \
                          ClearSilver HDF \
                                           Your program
                          ClearSilver HDF /
                                 +
                          ClearSilver Template
                     CGI /
          Webserver /
  Client /

Because of this separation, you can easily move your program to different environments. By creating an analog interface, you can replace CGI with something else, such as PyApache or mod_python. You can rewrite your program in a different language. As long as it took the same inputs into HDF and wrote the same outputs back into HDF, the whole thing should just work.
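
The idea can be sketched in plain Python, with an ordinary dict standing in for the HDF dataset. Everything here is hypothetical: the function names, the second front end, and the Page.Title output name are invented for illustration and are not part of ClearSilver. Only the Query.* naming follows the convention used later in this tutorial.

```python
from urllib.parse import parse_qs


def hdf_from_cgi(environ):
    """Front end 1: build the flat dataset from CGI environment variables."""
    hdf = {}
    for name, values in parse_qs(environ.get("QUERY_STRING", "")).items():
        hdf["Query." + name] = values[-1]
    return hdf


def hdf_from_mapping(params):
    """Front end 2: build the same dataset from an in-process mapping."""
    return {"Query." + name: value for name, value in params.items()}


def application(hdf):
    """The program proper: reads its inputs from the dataset, writes outputs back."""
    day = hdf.get("Query.day", "0")
    hdf["Page.Title"] = "Day %s" % day
    return hdf


# The application cannot tell which front end filled in the dataset.
via_cgi = application(hdf_from_cgi({"QUERY_STRING": "day=1"}))
via_mapping = application(hdf_from_mapping({"day": "1"}))
```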

Starting with Hello World

Hello world doesn't actually require any program beyond the included static.cgi. The static.cgi program included with ClearSilver is almost the most basic ClearSilver CGI program: it merely uses the URL information in the HDF to choose a template, and then asks the CGI kit to display that template. This entire website and my personal website (including my blog) are just static.cgi with the right templates and the right HDF data files.

But let's do the Hello World anyway. Here's the hello.py program:

  #!/usr/bin/python
  #
  # Hello World using the ClearSilver CGI Kit and Python

  import os
  import sys

  import neo_cgi
 
  # send neo_cgi handler through the python stdin/stdout/environ
  # calls
  neo_cgi.cgiWrap(sys.stdin, sys.stdout, os.environ)

  # create a CGI handler context
  ncgi = neo_cgi.CGI() 

  # parse the form data (and post upload data)
  ncgi.parse()

  ncgi.display("hello.cst") 

And the hello.cst template:

  Hello World


Of course, that's not particularly interesting without actually changing the generated page, so this time let's load some static HDF data and display that. Here is the hello.hdf file:
  # This is my static data for Hello World (and this is a comment)

  Hello = Hello World!

  WeekDays {
    0 = Sunday
    1 = Monday
    2 = Tuesday
    3 = Wednesday
    4 = Thursday
    5 = Friday
    6 = Saturday
  }
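
To show how the nested block above maps onto dotted names such as WeekDays.1, here is a toy parser for a small subset of the HDF syntax. It is only a sketch: it handles comments, name = value lines, and name { ... } blocks, and it is not the parser ClearSilver uses. The function name parse_hdf is made up for this example.

```python
def parse_hdf(text):
    """Flatten a small subset of HDF into {dotted.name: value}."""
    values = {}
    prefix = []  # names of the blocks we are currently inside
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line == "}":
            prefix.pop()  # leave the current block
        elif line.endswith("{"):
            prefix.append(line[:-1].strip())  # enter a named block
        else:
            name, _, value = line.partition("=")
            values[".".join(prefix + [name.strip()])] = value.strip()
    return values


sample = """
# static data
Hello = Hello World!
WeekDays {
  0 = Sunday
  1 = Monday
}
"""
data = parse_hdf(sample)
```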


Now, let's modify hello.py to load our static data:
  #!/usr/bin/python
  #
  # Hello World using the ClearSilver CGI Kit and Python

  import os
  import sys

  import neo_cgi
 
  # send neo_cgi handler through the python stdin/stdout/environ
  # calls
  neo_cgi.cgiWrap(sys.stdin, sys.stdout, os.environ)

  # create a CGI handler context
  ncgi = neo_cgi.CGI() 

  # parse the form data (and post upload data)
  ncgi.parse()

  # Load our static data
  ncgi.hdf.readFile("hello.hdf")

  ncgi.display("hello.cst") 

Then add this to hello.cst to use the Hello variable:

  <?cs var:Hello ?>


Or, to display the days of the week:

  <?cs each:day = WeekDays ?>
    On <?cs var:day ?>, <?cs var:Hello ?>
  <?cs /each ?>


Form variables are available without modifying the program. Change your URL to /cgi-bin/hello.py?day=1 and use this hello.cst:

  <?cs if:?Query.day ?>
    On <?cs var:WeekDays[Query.day] ?>, <?cs var:Hello ?>
  <?cs /if ?>


See how the day variable is available in the HDF data set as Query.day? That variable is already URL decoded. Of course, if you are going to output such variables directly, you need to escape them to prevent HTML errors and cross-site scripting attacks. For instance, with /cgi-bin/hello.py?name=Brandon%20Long:

  Hey <?cs var:html_escape(Query.name) ?>, <?cs var:Hello ?>


html_escape is one of the filters available to manipulate string expressions.
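
To see what escaping buys you, here is a small Python sketch of the same idea. It uses the standard library's html.escape as a stand-in for the template filter, so the exact set of characters escaped may differ from ClearSilver's html_escape. The greet function is hypothetical.

```python
import html


def greet(name):
    """Build the greeting line with the user-supplied name HTML-escaped."""
    return "Hey %s, Hello World!" % html.escape(name)


# An ordinary name passes through unchanged.
safe = greet("Brandon Long")
# Markup in the name is turned into harmless text instead of a script tag.
hostile = greet("<script>alert(1)</script>")
```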

One last piece completes the puzzle: manipulating variables from Python. Let's modify our hello world to display a file:

  #!/usr/bin/python
  #
  # Hello World using the ClearSilver CGI Kit and Python

  import os
  import sys

  import neo_cgi
 
  # send neo_cgi handler through the python stdin/stdout/environ
  # calls
  neo_cgi.cgiWrap(sys.stdin, sys.stdout, os.environ)

  # create a CGI handler context
  ncgi = neo_cgi.CGI() 

  # parse the form data (and post upload data)
  ncgi.parse()

  # Load our static data
  ncgi.hdf.readFile("hello.hdf")

  # Which file is requested?
  filename = ncgi.hdf.getValue("Query.file", "")
  if filename:
      # Only allow access to files in this directory
      filename = filename.replace("/", "")
      data = open(filename, "r").read()
      ncgi.hdf.setValue("FileContents", neo_cgi.text2html(data))

  ncgi.display("hello.cst") 

And our hello.cst

  <?cs var:Hello ?>
  File: <?cs var:html_escape(Query.file) ?>
  <p>
  <?cs var:FileContents ?>


Now, call this with /cgi-bin/hello.py?file=hello.py
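
The filename check in hello.py simply strips slashes. If you want something stricter, the following is one possible sketch using only the standard library. The helper name resolve_in_dir is an assumption for this example; it keeps just the last path component and confirms that the resolved path still sits directly inside the allowed directory.

```python
import os
import tempfile


def resolve_in_dir(requested, base_dir):
    """Return the path of `requested` inside base_dir, or None if not allowed."""
    # Keep only the final path component, dropping any directory parts.
    name = os.path.basename(requested.replace("\\", "/"))
    if name in ("", ".", ".."):
        return None
    base = os.path.realpath(base_dir)
    path = os.path.realpath(os.path.join(base, name))
    # After resolving symlinks, the file must still sit directly in base_dir.
    if os.path.dirname(path) != base:
        return None
    return path


# Demo with a throwaway directory standing in for the CGI directory.
demo_dir = tempfile.mkdtemp()
allowed = resolve_in_dir("hello.py", demo_dir)
rejected = resolve_in_dir("..", demo_dir)
```

In hello.py you would then open the returned path only when it is not None.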