Condor
What is Condor?
Put simply, Condor is a system that allocates free computer processing
cycles across a set of machines called a pool. Put more bluntly,
Condor is a scavenger of unused compute cycles. For a more complete
answer, see the What is Condor page on the Condor site at the
University of Wisconsin.
At Princeton University we have a Condor pool called 'Princeton OIT'.
The current members of the pool are the three arizona systems, 18 Sun
Cluster systems and some Sun E4000s, for a potential total of about 45
Sun Solaris processors, plus more than 200 public cluster machines running
Windows 2000.
Quick Start for Sun Solaris
To get a quick start with the 'Princeton OIT' Condor pool, log into
one of the arizonas (this demo assumes you are familiar with using your
OIT Unix account on the arizonas). You will need to modify your environment
to get convenient access to the Condor executables. To do so, run
one of the following commands at the prompt (and add the same command to
the appropriate startup file, such as .cshrc, .bashrc or .profile).
If your shell is csh:
source /var/condor/cshrc
If your shell is ksh or sh:
. /var/condor/profile
Now you will be able to execute Condor commands. For example, type condor_status
to see the status of all the machines in the Condor pool.
The best way to understand how to submit jobs to the Condor pool is
to look at some examples. The following script will set up a copy of
the Condor examples in /var/scratch/${USER}/condor_examples (where ${USER}
is your netid):
/var/condor/condor_example_setup
Once you have run that script, change your current directory to the newly
created copy of the examples:
cd /var/scratch/${USER}/condor_examples
You will need to build the executables by running make:
make
This command will produce a number of executables with the extension
.remote in your condor_examples directory. In that directory, the files
with the .cmd extension are Condor submit files. To submit one of the
.cmd files to the Condor pool, type:
condor_submit <filename>
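To give a feel for what a submit file contains, the following is a minimal sketch rather than a copy of one of the supplied examples. It is a shell script that writes a small submit file for the vanilla universe. The names myjob, myjob.cmd, myjob.out, myjob.err and myjob.log are hypothetical; the .cmd files in your condor_examples directory remain the authoritative reference.

```shell
#!/bin/sh
# Write a minimal submit file. Every file name used here is hypothetical.
cat > myjob.cmd <<'EOF'
# Run in the vanilla universe (no relinking with condor_compile needed).
universe   = vanilla
# The program to run (hypothetical name).
executable = myjob
# Where the job's standard output and standard error are written.
output     = myjob.out
error      = myjob.err
# Condor's own record of events for this job.
log        = myjob.log
# Queue one instance of the job.
queue
EOF

# Show the file that was written.
cat myjob.cmd
```

With a real executable in place of myjob, the file would then be submitted with condor_submit myjob.cmd, as described above.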
If you would like to run all the examples you can do so with the script
in that directory called submit:
./submit
You can watch the progress of your jobs in a number of ways. Typing
condor_status again will show you that some more of the systems in the
pool have been 'claimed', and typing condor_q on the system where you
submitted the jobs will show you the status of each job. As the jobs
run and produce output they will create files with extensions like .out,
.err and .log in your condor_examples directory (as specified in the
.cmd files). When each job completes you will receive an email containing
statistics about the job.
There is also an example in the dagman sub-directory that runs a directed
acyclic graph (DAG) of jobs. The example in the PVM directory does not
currently work (we are working to fix the problem).
You are now ready to start submitting Condor programs on your own. For
more information on using Condor please see the documentation in the
next section.
Note: Condor has a number of 'universes' which determine the facilities
available to your job. Examples of universes are vanilla, standard and
java. To run jobs in the standard universe, which includes facilities
like checkpointing, you will need to build the executables using condor_compile.
condor_compile has not been 'fully installed' as described in the Condor
manual (PDF), because doing so would mean replacing ld on all the members
of the pool with a special Condor ld. Therefore, you cannot do things
like 'condor_compile make'; rather, you will need to edit your Makefile
and add condor_compile to the compiler commands.
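As an illustration of that Makefile edit, here is a sketch under stated assumptions, not a description of the Makefile shipped with the examples. It assumes your Makefile selects its compiler through a CC variable and that the compiler is gcc. The files Makefile.sample, Makefile.condor, hello.c and hello.remote are hypothetical. The script builds a tiny sample Makefile and then prefixes the compiler with condor_compile.

```shell
#!/bin/sh
# Create a tiny sample Makefile to edit (hello.c and hello.remote are hypothetical).
# printf is used so the recipe line gets the literal tab that make requires.
printf 'CC = gcc\n\nhello.remote: hello.c\n\t$(CC) -o hello.remote hello.c\n' > Makefile.sample

# Prefix the compiler command with condor_compile, writing the result to a new file
# so that the original Makefile is left untouched.
sed 's/^CC *= */CC = condor_compile /' Makefile.sample > Makefile.condor

# Show the edited Makefile.
cat Makefile.condor
```

If your Makefile names the compiler directly in its rules instead of using a variable, the same prefix would need to be added to each compile and link command by hand.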
Quick Start for Windows 2000
The preceding section, "Quick Start for Sun Solaris", is also a good
introduction for Windows 2000 users. In fact, the recommended procedure
for running a Condor job on Windows is to prepare the job on your
personal Windows desktop, transfer the program(s) by ftp to your home
file system on the arizonas, and then issue the condor_submit command
as discussed above.
To direct your Condor job to the Windows machines in the pool, include
the following two statements in your command file (the .cmd submit
file):
universe = vanilla
Requirements = (OpSys == "WINNT50")
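To show where those two statements sit in a complete command file, here is a minimal sketch. It is a shell script, meant to be run on the arizonas, that writes such a file. The names winjob.exe, winjob.cmd, winjob.out, winjob.err and winjob.log are hypothetical, and depending on how the pool is configured further settings may be needed that this page does not describe.

```shell
#!/bin/sh
# Write a command file aimed at the Windows 2000 machines in the pool.
# winjob.exe and the other winjob.* names are hypothetical.
cat > winjob.cmd <<'EOF'
# The two statements from this section.
universe     = vanilla
Requirements = (OpSys == "WINNT50")
# Hypothetical Windows program and its output files.
executable   = winjob.exe
output       = winjob.out
error        = winjob.err
log          = winjob.log
# Queue one instance of the job.
queue
EOF

# Show the file that would be passed to condor_submit.
cat winjob.cmd
```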
Additional documentation and examples for Windows programs will be added
to this page soon.
Documentation
You can view our local PDF copy of the Condor
manual. Chapter 2 is the User's Manual, Chapter 7 is the FAQ and
Chapter 9 is the command reference. These chapters will serve as good
starting points for submitting and monitoring jobs in the 'Princeton
OIT' Condor pool.
The Condor web site contains
much information about Condor.
