Singularity Overview
Fri 15 July 2022 by Dr. Dirk Colbry
Singularity is a versatile tool that gives researchers more flexibility in installing software and running their workflows on the HPCC. I am figuring out Singularity for my own research, to better use the Open Science Grid, and hopefully to help other people on the HPCC.
Most workflows don't need Singularity, but it can be extremely helpful for solving certain weirdly difficult problems. Some common examples of researchers using Singularity on the HPCC include:
- Installing software that needs a special/different base operating system.
- Installing software that requires administrator privileges (aka root, su, and/or sudo).
- Installing complex dependency trees (like Python and R).
- Using existing software inside a pre-built virtual machine.
- Working with lots of tiny files on the HPCC filesystems, which are designed for smaller numbers of big files.
- Building workflows that can easily move between different resources.
NOTE: This overview is specific to the High Performance Computing Center (HPCC) at Michigan State University (MSU). For a complete tutorial, see the Singularity documentation. This overview assumes that you have an HPCC account and know how to navigate to and use a development node.
Step 1: Get a singularity image
As a starting point, we need a Singularity image, also known as a container or virtual machine. You can think of a Singularity image as a "software hard drive" that contains an entire operating system in a file. There are three main ways to get these images:
- Use one of the Singularity images already on the HPCC.
- Download an image from one of the many online libraries.
- Build your own image.
If you don't know which one of the above to use, I recommend that you pick number 1 and just use the singularity image we already have on the system.
1. Use one of the Singularity images already on the HPCC.
For this introduction, we can keep things simple and just use one of the Singularity images already on the HPCC. This image runs CentOS 7 Linux and is a good starting point. Use the following command to start Singularity in a "shell" using the provided image:
singularity shell --env TERM=vt100 /opt/software/CentOS.container/7.4/bin/centos
Once you run this command you should see the "Singularity" prompt which will look something like the following:
Singularity>
You did it! You are now running a different operating system (OS) than the base operating system. All of the main HPCC folders are still accessible from this "container" (e.g. /mnt/home, /mnt/research, /mnt/scratch, etc.), so things shouldn't look much different than before (except for the different prompt, and you no longer have access to some of the base HPCC software).
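If you want to double-check which OS the container is running, you can print the standard /etc/os-release file (assuming the image provides one, which CentOS 7 does) from the Singularity prompt:
cat /etc/os-release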
At this point, if you know what you need, you should be able to install your software and it will compile/run using the singularity OS instead of the base OS.
NOTE: You can just install software in your /mnt/home/$USER and/or /mnt/research folders. The software you install will probably only work from "inside" this Singularity image. However, you will also be able to see and manipulate the files from within your standard HPCC account. This is fine for many researchers, but I recommend you jump down to "Step 2: Overlays" to make Singularity even more flexible.
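For example, a typical from-source install into your home directory might look something like this sketch (mytool and its URL are placeholders, not a real package):
wget https://example.com/mytool-1.0.tar.gz
tar -xzf mytool-1.0.tar.gz
cd mytool-1.0
./configure --prefix=/mnt/home/$USER/software/mytool
make && make install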
2. Download an image from one of the many online libraries
Many people publish Singularity images and post them on public "libraries" for easy install. Here is a list of online libraries you can browse:
Sylabs Library (browse at https://cloud.sylabs.io/library). Example:
singularity pull alpine.sif library://alpine:latest
singularity shell alpine.sif
Docker Hub (browse at https://hub.docker.com). Example:
singularity pull tensorflow.sif docker://tensorflow/tensorflow:latest
singularity shell tensorflow.sif
Singularity Hub, aka shub (browse at https://singularity-hub.org). Example:
singularity pull shub_image.sif shub://vsoch/singularity-images
singularity shell shub_image.sif
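Once an image is pulled, you don't have to open an interactive shell; singularity exec runs a single command inside the image. For example, to check the Python version bundled in the TensorFlow image pulled above:
singularity exec tensorflow.sif python3 --version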
3. Build your own image
This one is more complex and outside the scope of this overview. However, if you are interested, I recommend you try using the build command with a Docker image, since Docker is fairly easy to install on your personal computer. The Singularity documentation describes how to use Docker to make a Singularity image.
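As a small taste of what building looks like, Singularity can build an image file directly from a Docker Hub image; the ubuntu:20.04 image here is just an illustrative choice:
singularity build ubuntu.sif docker://ubuntu:20.04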
Step 2: Overlays
One problem we often encounter on the HPCC is "lots-of-small-files" (hundreds of files, each smaller than 50MB). The filesystem is optimized for large files, and lots of small files end up "clogging" things up, which can slow things down for everyone. One useful trick with Singularity is that you can make a single large file called an "overlay" which can be attached to a Singularity session. You can use an overlay as a "filesystem inside a single file" and store lots of small files inside the single overlay file. From the user's point of view, you can have as many small files as you want accessible from the Singularity image (within reasonable limits). However, these small files act as a single file from the HPCC's point of view and don't clog things up.
This technique is really helpful if you are using complex software installs, such as Python, R, or Conda environments with lots of packages. It can also be helpful if your research data consists of lots of small files.
Make your overlay file
Making an overlay is not hard but takes multiple steps. For details on how to make an overlay we recommend viewing the singularity overlay documentation.
Fortunately, the HPCC has a "powertool" that can make a basic overlay for you. All you need to do is run the following command:
mk_overlay
To inspect and learn about what this command is doing, type the following (note the backtick character is not the single quote; it is typically found above the tab key on a standard keyboard):
cat `which mk_overlay`
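If you would rather make an overlay by hand (or the powertool is unavailable), a minimal sketch based on the Singularity overlay documentation is to create an empty file and format it as an ext3 filesystem; the 500MB size here is an arbitrary example:
dd if=/dev/zero of=overlay.img bs=1M count=500
mkfs.ext3 -F overlay.img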
This overlay can be applied to an image using the --overlay option as follows:
singularity shell --overlay overlay.img --env TERM=vt100 /opt/software/CentOS.container/7.4/bin/centos
Once you are in the Singularity shell, you can write to the overlay as if you were adding files to the "root" directory. For example, run the following commands from the Singularity prompt (with the overlay enabled); this will download a copy of Miniconda and install it in the /miniconda3/ directory:
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
bash Miniconda3-py39_4.12.0-Linux-x86_64.sh -b -p /miniconda3/
To use this Miniconda you will need to add the folder /miniconda3/bin to your path with the following commands:
export PATH=/miniconda3/bin:$PATH
conda init
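To confirm that the shell is now picking up the overlay install, you can check where conda resolves from:
which conda
conda --version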
At this point you can use pip and conda installs as much as you like. These generate hundreds of small files, but it doesn't matter because everything will be stored in the overlay.img file as one big file.
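For example, the following installs (the package choices are arbitrary) will land inside the overlay rather than scattering files across the HPCC filesystem:
conda install -y numpy
pip install pandas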
I recommend creating a file called start-conda.sh which contains the following. This will enable easy startup of your Singularity environment with the overlay and conda path already set up:
#!/bin/bash
singularity shell --env TERM=vt100 \
--env PATH=/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/sysbin/ \
--overlay overlay.img \
/opt/software/CentOS.container/7.4/bin/centos
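Then make the script executable and use it to launch your pre-configured session:
chmod +x start-conda.sh
./start-conda.sh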
Step 3: Submitting Jobs
Once we have our image and our overlay working on a development node, we can execute a script inside the Singularity image using the exec subcommand. For example, this command uses our Miniconda install in the overlay and runs the Python script called "mypython.py", which is stored in my home directory on the HPCC.
singularity exec --overlay overlay.img /opt/software/CentOS.container/7.4/bin/centos python3 /mnt/home/$USER/mypython.py
Once the above is running on a development node, we can submit this as a job to the HPCC using the following submission script:
#!/bin/bash
#SBATCH --walltime=04:00:00
#SBATCH --mem=5gb
#SBATCH -c 1
#SBATCH -N 1
singularity exec --overlay overlay.img /opt/software/CentOS.container/7.4/bin/centos python3 /mnt/home/$USER/mypython.py
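Assuming you save the submission script above as, for example, singularity-job.sb, you can submit it with:
sbatch singularity-job.sb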
If you need to have multiple jobs running the same software (such as in a job array), you can't have them all writing to the same overlay file. The following script still allows the overlay to work, but all of the changes will be discarded after the run, so make sure you copy your results back to your home directory:
#!/bin/bash
#SBATCH --walltime=04:00:00
#SBATCH --mem=5gb
#SBATCH --array=1-10
#SBATCH -c 1
#SBATCH -N 1
singularity exec --overlay overlay.img:ro --writable-tmpfs /opt/software/CentOS.container/7.4/bin/centos python3 /mnt/home/$USER/mypython.py $SLURM_ARRAY_TASK_ID
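Since changes to the container and overlay are thrown away, one pattern (sketched here; the results filename is just an illustration) is to have each array task write its output directly under your bind-mounted home directory, which does persist:
singularity exec --overlay overlay.img:ro --writable-tmpfs /opt/software/CentOS.container/7.4/bin/centos python3 /mnt/home/$USER/mypython.py $SLURM_ARRAY_TASK_ID > /mnt/home/$USER/results_$SLURM_ARRAY_TASK_ID.txt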
This overview of Singularity was initially written by Dirk Colbry. Please contact the ICER User Support Team if you need any help getting your workflow up and running.