Computational Science & Engineering Support

Condor

What is Condor?

Put simply, Condor is a system that allocates free processing cycles across a set of machines called a pool. Put more bluntly, Condor scavenges unused compute cycles. For a fuller answer, see the What is Condor page on the Condor site at the University of Wisconsin.

At Princeton University we have a Condor pool called 'Princeton OIT'. The pool currently comprises the three arizona systems, 18 Sun Cluster systems and some Sun E4000s, for a potential total of about 45 Sun Solaris processors, plus more than 200 public cluster machines running Windows 2000.

Quick Start for Sun Solaris

To get a quick start with the 'Princeton OIT' Condor pool, log into one of the arizonas (this demo assumes you are familiar with using your OIT Unix account on the arizonas). You will need to modify your environment to get convenient access to the Condor executables. To do so, run one of the following commands at the prompt, and add the same command to the appropriate startup file (.cshrc, .bashrc, .profile and so on).

If your shell is csh:

source /var/condor/cshrc

If your shell is ksh, sh or bash:

. /var/condor/profile

Now you will be able to execute Condor commands. For example, type condor_status to see the status of all the machines in the Condor pool.
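A quick way to confirm that the environment change took effect is to check whether condor_status can be found on your PATH. The following is a minimal sketch for sh-style shells; the variable name condor_env is illustrative only and is not part of Condor.

```shell
#!/bin/sh
# Load the Condor environment if the site profile is readable.
if [ -r /var/condor/profile ]; then
    . /var/condor/profile
fi

# Report whether the Condor commands can now be found.
# condor_env is a hypothetical variable name used only in this sketch.
if type condor_status >/dev/null 2>&1; then
    condor_env=ready
else
    condor_env=missing
fi
echo "Condor environment: $condor_env"
```

If the result is "missing", check that you sourced the file that matches your shell.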

The best way to understand how to submit jobs to the Condor pool is to look at some examples. The following script will set up a copy of the Condor examples in /var/scratch/${USER}/condor_examples (where ${USER} is your netid):

/var/condor/condor_example_setup

Once you have run that script, change your current directory to the newly created copy of the examples:

cd /var/scratch/${USER}/condor_examples

You will need to build the executables by running make:

make

This command will produce a number of executables with the extension .remote in your condor_examples directory. In that directory, the files with the .cmd extension are Condor submit files. To submit one of the .cmd files to the Condor pool, type:

condor_submit <filename>
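To give a rough idea of what a submit file contains, the sketch below writes a minimal one from the shell. The file names hello.cmd and hello.remote are hypothetical and are not taken from the shipped examples, which may use additional settings; treat this as an illustration, not as a copy of the local files.

```shell
#!/bin/sh
# Write a minimal submit file. hello.cmd and hello.remote are
# hypothetical names chosen for this sketch.
cat > hello.cmd <<'EOF'
universe   = vanilla
executable = hello.remote
output     = hello.out
error      = hello.err
log        = hello.log
queue
EOF

# Show the file that was written.
cat hello.cmd
```

Such a file would then be submitted with condor_submit hello.cmd. The output, error and log settings name the files the job writes as it runs.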

If you would like to run all the examples you can do so with the script in that directory called submit:

./submit

You can watch the progress of your jobs in a number of ways. Typing condor_status again will show you that some more of the systems in the pool have been 'claimed', and typing condor_q on the system where you submitted the jobs will show you the status of each job. As the jobs run and produce output they will create files with extensions like .out, .err and .log in your condor_examples directory (as specified in the .cmd files). When each job completes you will receive an email containing statistics about the job.

There is also an example in the sub-directory dagman for running a directed acyclic graph. The example in the PVM directory does not currently work (we are working to fix the problem).

You are now ready to start submitting Condor programs on your own. For more information on using Condor please see the documentation in the next section.

Note: Condor has a number of 'universes', which determine the facilities available to your job. Examples of universes are vanilla, standard and java. To run jobs in the standard universe, which includes facilities like checkpointing, you will need to build the executables using condor_compile. condor_compile has not been 'fully installed' as described in the Condor manual (PDF), because that would mean replacing ld on every member of the pool with a special Condor ld. Therefore, you cannot do things like 'condor_compile make'; instead, edit your Makefile and add condor_compile to the compiler commands.
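As an illustration of adding condor_compile to the compiler commands, the sketch below writes a two-line Makefile in which the compiler variable is prefixed with condor_compile. The names Makefile.condor, hello.c and hello.remote are hypothetical, and the choice of cc as the compiler is an assumption; substitute whatever compiler your own Makefile uses.

```shell
#!/bin/sh
# Write a small Makefile whose compile command runs under condor_compile.
# Makefile.condor, hello.c and hello.remote are hypothetical names, and
# cc is an assumed compiler.
cat > Makefile.condor <<'EOF'
CC = condor_compile cc
hello.remote: hello.c ; $(CC) -o hello.remote hello.c
EOF

# Show the line that carries the condor_compile prefix.
grep condor_compile Makefile.condor
```

With this in place, make -f Makefile.condor hello.remote would invoke the compiler through condor_compile, which is the per-command form the note above describes.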

Quick Start for Windows 2000

The preceding section, "Quick Start for Sun Solaris", is also a good introduction for Windows 2000 users. In fact, the recommended procedure for running a Condor job on Windows is to prepare the job on your personal Windows desktop, transfer the program(s) by ftp to your home file system on the arizonas, and then issue the condor_submit command as discussed above.

To direct your Condor job to the Windows pool machines, include the following two statements in your command file:

universe = vanilla
Requirements = (OpSys == "WINNT50")

Additional documentation and examples for Windows programs will be added to this page soon.

Documentation

You can view our local PDF copy of the Condor manual. Chapter 2 is the User's Manual, Chapter 7 is the FAQ and Chapter 9 is the command reference. These chapters will serve as good starting points for submitting and monitoring jobs in the 'Princeton OIT' Condor pool.

The Condor web site provides further information about Condor.

Contact Information:

Computational Science & Engineering Support Manager
Curt Hillegas
87 Prospect Avenue
Phone: 609-258-6033
Fax: 609-258-3943
curt@princeton.edu

