Files as semaphores
Sun 15 April 2012 by Dr. Dirk Colbry
The following script is designed to make it easy to run a large number of embarrassingly parallel jobs on our scheduling system. The trick to getting this to work is to use files as semaphores. The program compares a directory of input files with a directory of flag files. If a file exists in the input directory but not in the flag directory, that job still needs to run, so the program creates a corresponding file in the flag directory and then does the computation. The flag files therefore keep track of which jobs are running and which are complete.
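The following minimal bash sketch illustrates the comparison loop described above; it is not the author's FilesAsSemaphores.qsub. It assumes one input file per job, flag files named after their inputs, and directories taken from the hypothetical INPUT_DIR and FLAG_DIR variables (defaulting to inputs and flags). The function names process_all and run_job are hypothetical, and run_job is only a placeholder for the real computation.

```bash
#!/bin/bash
# Sketch of the "files as semaphores" loop. Assumptions (hypothetical, not
# taken from the original script): one input file per job, flag files named
# after their input, and run_job as a stand-in for the real computation.

# Hypothetical placeholder: replace with the real work for one input file.
run_job() {
    wc -l < "$1" > /dev/null
}

process_all() {
    local input_dir=$1 flag_dir=$2 input name flag
    [ -d "$input_dir" ] || return 0     # nothing to do without an input directory
    mkdir -p "$flag_dir"
    for input in "$input_dir"/*; do
        [ -f "$input" ] || continue     # skip the literal glob if the directory is empty
        name=$(basename "$input")
        flag="$flag_dir/$name"
        # A flag file exists: this job is running or finished elsewhere.
        [ -e "$flag" ] && continue
        # Claim the job with an empty flag file. Check-then-create is NOT atomic,
        # so two processes can occasionally claim the same input.
        : > "$flag"
        # Give the flag file content only on success, so a zero-size flag
        # means "started but not completed".
        if run_job "$input"; then
            echo "done" > "$flag"
        fi
    done
}

process_all "${INPUT_DIR:-inputs}" "${FLAG_DIR:-flags}"
```

Writing content into the flag only after the job succeeds is one assumed way to make flag-file size a completion indicator; the original script may record completion differently.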
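Strictly speaking, this method is risky: checking for a flag file and then creating it is not a single atomic operation, and many file systems do not guarantee that file creation is atomic. In my experience, however, it works quite well, especially if you do not mind two processes occasionally running the same job.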
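One way to narrow the race between checking for a flag and creating it is to merge the two steps. The sketch below uses bash's noclobber option (set -C), which makes a ">" redirection fail if the file already exists; when the file does not exist yet, bash opens it with an exclusive create. This is a swapped-in technique, not part of the original script, and claim_flag is a hypothetical helper name. Network file systems may not honor exclusive creation, so this reduces the risk described above rather than eliminating it.

```bash
#!/bin/bash
# Sketch: claim a flag file with bash's noclobber option.
# claim_flag is a hypothetical helper, not taken from the original script.

# Succeeds only for the process that creates the flag file; fails if it exists.
claim_flag() {
    # Run in a subshell so "set -C" (noclobber) does not leak into the caller;
    # the "cannot overwrite existing file" message is discarded.
    ( set -C; : > "$1" ) 2> /dev/null
}

# Example: replace the separate "does the flag exist?" test and the
# "create the flag" step with a single claim.
demo_dir=$(mktemp -d)
for attempt in 1 2; do
    if claim_flag "$demo_dir/job.flag"; then
        echo "attempt $attempt: claimed, running job"
    else
        echo "attempt $attempt: already claimed, skipping"
    fi
done
rm -rf "$demo_dir"
```

The first attempt creates the flag and runs the job; the second finds the flag already present and skips.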
"FilesAsSemaphores.qsub"
Obviously, you will need to modify the script to fit your needs (this one was designed to work with Stata). To start the script, issue the following command:
qsub -t 1-149 FilesAsSemaphores.qsub
This starts 149 jobs. When each job is done, it starts another job, until all of the input files have been processed. The script could easily be modified to take its work from a list of jobs in a text file, or from some other description of what needs to be done. One nice feature of this approach is that you can look at the size of the flag files to determine whether a job completed successfully: a zero-size flag file marks a job that started but did not finish. If needed, you can delete all of the jobs in your queue, delete all zero-size flag files, and restart the entire process.
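The status check and restart procedure above can be sketched with find. This assumes, as in the paragraph above, that a zero-size flag file marks an unfinished job and that all flag files sit directly in one directory; flag_status and reset_incomplete are hypothetical helper names. Only reset flags after your jobs have been removed from the queue (for example with qdel), otherwise a job that is still running could be picked up a second time.

```bash
#!/bin/bash
# Sketch of checking and resetting flag files. Assumption: a zero-size flag
# means "started but not completed". Helper names are hypothetical.

# Print how many flag files exist and how many are zero-size.
flag_status() {
    local flag_dir=$1 total incomplete
    # $(( )) strips the leading spaces some wc implementations print.
    total=$(( $(find "$flag_dir" -maxdepth 1 -type f | wc -l) ))
    incomplete=$(( $(find "$flag_dir" -maxdepth 1 -type f -size 0 | wc -l) ))
    echo "$total flags, $incomplete zero-size"
}

# Delete zero-size flags so the next submission reruns those inputs.
# Run this only after all jobs have been removed from the queue.
reset_incomplete() {
    local flag_dir=$1
    [ -d "$flag_dir" ] || return 1
    find "$flag_dir" -maxdepth 1 -type f -size 0 -delete
}
```

For example, after deleting your queued jobs, run `flag_status flags` to see how far the run got, then `reset_incomplete flags`, and resubmit with the same qsub command.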
I do not recommend walltimes of less than 5 minutes, as these will unnecessarily flood the scheduler. Please use this script at your own risk.
- Dirk
Blog post migrated from the ICER Wiki using a custom Python script.