Tutorial on NYU HPC(High Performance Computer)

Tutorial on NYU HPC(High Performance Computer)

In modern research, High Performance Computing (HPC) has become an indispensable tool, especially in fields requiring the processing of large datasets or the undertaking of complex simulations. NYU’s HPC resources provide powerful computing capabilities to help researchers and students accelerate their research.

This tutorial is designed to guide both beginners and experienced users on how to effectively utilize NYU’s HPC facilities.

A special focus will be given to working with Singularity containers and overlay images, which are crucial for ensuring that your computing environment is both flexible and reproducible.

1. Create dir for all overlay images

mkdir /scratch/$USER/overlay_images
cd /scratch/$USER/overlay_images

2. Browse and copy an appropriate gzipped overlay images from the overlay directory

# browse all overlay images 
ls /scratch/work/public/overlay-fs-ext3

# copy a suitable one for your environment
cp -rp /scratch/work/public/overlay-fs-ext3/overlay-50G-10M.ext3.gz .
gunzip overlay-50G-10M.ext3.gz # unzip might take a while on login node

# rename that image to your project 
mv overlay-50G-10M.ext3 my_pytorch.ext3

3. Browse and find the appropriate Singularity container

# browse all singularity containers
ls /scratch/work/public/singularity/

4. Start an interactive job with adequate compute and memory resources, launch singularity container in read/write mode (with the :rw flag) to install packages.

srun --cpus-per-task=2 --mem=10GB --time=04:00:00 --pty /bin/bash
# wait to be assigned a node

singularity exec --nv --overlay my_pytorch.ext3:rw /scratch/work/public/singularity/cuda11.6.124-cudnn8.4.0.27-devel-ubuntu20.04.4.sif /bin/bash

5. Install Miniconda and create a wrapper script

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /ext3/miniconda3
rm Miniconda3-latest-Linux-x86_64.sh # if you don't need this file any longer

vim /ext3/env.sh

paste the following codes in env.sh

#!/bin/bash

source /ext3/miniconda3/etc/profile.d/conda.sh
export PATH=/ext3/miniconda3/bin:$PATH
export PYTHONPATH=/ext3/miniconda3/bin:$PATH

6. Create conda virtual environment and Install packages

conda create -n myenv python=3.10
conda activate myenv
pip install -r requirements.txt # install dependencies
exit # exit that singularity container after finish
exit # exit that assigned computation node

7. (After setting up the env) Run that singularity using slurm

create sbatch file:

vim my_sbatch.sbatch

paste the following script:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --time=24:00:00
#SBATCH --mem=128GB
#SBATCH --gres=gpu:a100:4
#SBATCH --job-name=My_project
#SBATCH --output=python-hello-world.out

module purge

singularity exec --nv \
	    --overlay /scratch/$USER/overlay_images/my_pytorch.ext3:ro \
	    /scratch/work/public/singularity/cuda11.3.0-cudnn8-devel-ubuntu20.04.sif\
	    /bin/bash -c "source /ext3/env.sh; conda activate myenv; cd /scratch/$USER/myProject/;nohup python training.py &"

submit that job to slurm:

sbatch my_sbatch.sbatch

8. Check job status

squeue -u $USER

results matching ""

    No results matching ""