What is Cluster101?
Cluster101 is a simple tool that allows you to run applications on many hosts. You can think of it as a front end for ssh ("ssh on steroids").
Requirements
You just need ssh access to the hosts where you want to run applications. There is no need to install any software on them (i.e. no need to bother the administrators).
WARNING: You should be able to access all the hosts without typing your password (typing your password hundreds of times is not fun). Here is an explanation on how to do it.
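Before loading a long host list, it can help to confirm that password-less login really works on each host. The following is a minimal sketch, not part of Cluster101: the function names are made up for illustration, and it assumes an OpenSSH client on your machine. It relies on the standard OpenSSH options BatchMode (fail instead of prompting) and ConnectTimeout.

```python
import subprocess


def ssh_check_command(user, host, timeout_s=5):
    """Build an ssh command that fails instead of asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt; exit with an error instead
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly on unreachable hosts
        f"{user}@{host}",
        "true",                               # trivial remote command
    ]


def can_login_without_password(user, host, timeout_s=5):
    """Return True if ssh runs a remote command without any interaction."""
    try:
        result = subprocess.run(
            ssh_check_command(user, host, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung
        return False
    return result.returncode == 0
```

Calling can_login_without_password("myUserName", "host-1.university.edu") for each host gives a quick list of the machines that still need key setup. Note that with BatchMode a host whose key has not been accepted yet also counts as a failure, since ssh cannot ask for confirmation.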
What is the typical usage of Cluster101?
You need to run hundreds of instances of a scientific application (e.g. a complex statistical algorithm with different sets of data / parameters). You have access to a lot of computers around your campus / company, but don't have administrator privileges to install clustering software.
You can use Cluster101 to run one application per host (or one per core) on those computers. Cluster101 will take care of distributing the load until the whole batch is finished.
What Cluster101 is NOT
This is NOT a fully featured clustering platform or development framework or process communication API. This is just a tool to run tasks on many hosts.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Features
- Run a list of tasks on all hosts
- Keep load balanced
- Selectable number of instances per host (e.g. depending on number of cores)
- Run a command once per host
- Save stdout / stderr for each task
- Monitor hosts for alarm conditions
- Avoid running tasks on hosts with alarm conditions
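To make the load-balancing features above more concrete, here is a small sketch of one way such a dispatcher could work. It is an illustration only and does not claim to be Cluster101's actual algorithm: the Host class, pick_host and dispatch are hypothetical names, and "least loaded" is assumed to mean the lowest ratio of running tasks to allowed processes.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    max_processes: int   # instances allowed on this host (e.g. number of cores)
    running: int = 0     # tasks currently running there
    alarm: bool = False  # True if monitoring reported an alarm condition


def pick_host(hosts):
    """Return the least-loaded host with a free slot and no alarm, or None."""
    candidates = [h for h in hosts if not h.alarm and h.running < h.max_processes]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h.running / h.max_processes)


def dispatch(tasks, hosts):
    """Assign queued tasks while free slots exist.

    Returns (assignments, remaining): a list of (task, host name) pairs and
    the tasks that must wait until a slot frees up.
    """
    assignments = []
    remaining = list(tasks)
    while remaining:
        host = pick_host(hosts)
        if host is None:
            break
        host.running += 1
        assignments.append((remaining.pop(0), host.name))
    return assignments, remaining
```

In a real run, the running count of a host would drop again when one of its tasks finishes, and dispatch would be called again for the tasks still waiting.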
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Download CVS
- cvs -z3 -d:pserver:anonymous@cluster101.cvs.sourceforge.net:/cvsroot/cluster101 checkout -P Cluster101
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Configuration
Cluster101 will load a configuration file named "default.cluster101" on startup. You can load an alternate configuration by clicking on the "File -> Load Configuration" menu.
The configuration file should look like this (for each host, you have to set the user name and the maximum number of processes you want to run):
Host : host-1.university.edu    User : myUserName    Max number of processes : 2    Enabled : true
Host : host-2.university.edu    User : myUserName    Max number of processes : 2    Enabled : false
... ... ...
Host : host-1499.university.edu    User : myUserName    Max number of processes : 8
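If you want to sanity-check a host list before loading it, the sample above can be read with a few lines of Python. This is a sketch, not Cluster101's own parser: parse_hosts is a hypothetical helper, the field names are taken from the sample, and the exact whitespace and line layout of the real file are assumptions (the regular expression does not depend on where the line breaks fall).

```python
import re

# Field names as they appear in the sample configuration above.
FIELDS = ["Host", "User", "Max number of processes", "Enabled"]
_FIELD_RE = re.compile(
    r"(%s)\s*:\s*(\S+)" % "|".join(re.escape(name) for name in FIELDS)
)


def parse_hosts(text):
    """Return one dict per host; every 'Host' field starts a new entry."""
    hosts = []
    for key, value in _FIELD_RE.findall(text):
        if key == "Host":
            hosts.append({"Host": value})
        elif hosts:
            # Attach the field to the most recent host entry
            hosts[-1][key] = value
    return hosts
```

Values are kept as strings, and a field that is missing from the file (such as "Enabled" in the last sample entry) is simply absent from the resulting dict, so the caller decides how to treat it.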
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Task queue
The task queue is a list of the tasks you want to run. You just type them in a file, one task per line.
E.g. you want to run your algorithm with different parameters:
/home//mySuperSecretAlgotihm 0.1
/home//mySuperSecretAlgotihm 0.2
/home//mySuperSecretAlgotihm 0.3
/home//mySuperSecretAlgotihm 0.6
... ... ...
/home//mySuperSecretAlgotihm 9.9
Each line must be a valid shell command or a semicolon-separated list of commands (e.g. "date; sleep 1; date" is a valid line in a task list).
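For long parameter sweeps it is usually easier to generate the task file than to type it. Here is a minimal sketch: the helper names, the command path /path/to/myAlgorithm and the output file name tasks.txt are placeholders chosen for illustration, not names that Cluster101 requires.

```python
def task_lines(command, values):
    """Build one task per parameter value, e.g. '/path/to/myAlgorithm 0.1'."""
    return [f"{command} {value}" for value in values]


def queue_text(lines):
    """Join tasks into file content with exactly one task per line."""
    for line in lines:
        if "\n" in line:
            raise ValueError("a task must fit on a single line: %r" % line)
    return "".join(line + "\n" for line in lines)


def write_task_queue(path, lines):
    """Write the task list to a file that can be loaded as a task queue."""
    with open(path, "w") as handle:
        handle.write(queue_text(lines))
```

For example, write_task_queue("tasks.txt", task_lines("/path/to/myAlgorithm", [i / 10 for i in range(1, 100)])) writes 99 tasks with parameters from 0.1 to 9.9. Dividing integers by 10 avoids the rounding drift you would get from repeatedly adding 0.1.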
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Running stuff
Load your task list by selecting the "Task queue -> Load queue" menu. Then run it by clicking on the "Task queue -> Run" menu. You can see the progress by clicking on the "Task queue" tab.
If you need to run one task on all the hosts (e.g. a script to install your software), you can do it by clicking on the "Task queue -> Run 1 per host" menu. It will show you a dialog box where you type the command you want to run.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
This is a screen capture of the program running on a small cluster
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -