HiPerGator Tutorial

HiPerGator is the University of Florida’s fourth-generation supercomputer. It features 63 DGX B200 nodes (504 NVIDIA Blackwell GPUs), 19,200 CPU cores, 600 NVIDIA L4 GPUs, and an 11 PB all-flash storage system, and is slated for full production in Fall 2025.

This tutorial aims to be simple, clean, and practical: clear structure, minimal decoration, easy-to-copy command sections, and key ideas and warnings called out along the way.

1. Quick Start

This section guides you through the very basics of connecting to HiPerGator and running your first job.

1.1 Connecting to HiPerGator

There are two main ways to access HiPerGator:

1.1.1 SSH (command line access)

If you prefer the terminal, log in via SSH:

ssh <gatorlink_username>@hpg.rc.ufl.edu

Replace <gatorlink_username> with your UF GatorLink ID. On first login you may be asked to confirm the server’s host key fingerprint, and Duo multi-factor authentication (MFA) is required.

Once logged in, you start in your home directory:

/home/<gatorlink_username>    # i.e., $HOME

Tip: keep login node usage light — editing files, submitting jobs, checking status. Do not run heavy CPU/GPU workloads on the login node.
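If you connect often, an entry in your local SSH configuration can shorten the command. A minimal sketch (the alias hpg is arbitrary; substitute your own GatorLink ID):

# ~/.ssh/config on your local machine
Host hpg
    HostName hpg.rc.ufl.edu
    User <gatorlink_username>

With this entry, ssh hpg behaves like the full command above; Duo MFA is still required.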

1.1.2 Open OnDemand (web interface)

For users who prefer a graphical interface, Open OnDemand offers browser-based remote access:

👉 https://ood.rc.ufl.edu

You can launch VNC-based GUI sessions, file explorers, and interactive jobs.

Note: The GUI provided by Open OnDemand is VNC-based, not a full desktop environment. Some GUI applications requiring hardware acceleration may not run properly.

1.2 Submit Your First Job

HiPerGator uses Slurm as the job scheduler. Heavy computations must not be run on the login nodes.

You have two main ways to run jobs:

  • Batch jobs
  • Interactive jobs

Batch Job Example

Create a Slurm script:

vim test_job.slurm

Paste:

#!/bin/bash
#SBATCH --job-name=Example
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your_email@ufl.edu

#SBATCH -p your_preferred_partition
#SBATCH --qos=your_qos
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=3
# #SBATCH --gres=gpu:b200:4     # Uncomment if you need GPUs
#SBATCH --mem=16GB
#SBATCH --time=00:20:00
#SBATCH --output=my_first_job_%j.log

echo "Hello from HiPerGator!"

Key fields to check before submitting:

  • --job-name: short, descriptive job name
  • --mail-user: use a valid email so you get notifications
  • -p / --qos: must match the partition/QOS you are allowed to use
  • --time: set a realistic time limit; too long can delay scheduling

Warning: Requesting more CPUs/GPUs/memory/time than your QOS allows may cause the job to be held or rejected.

Submit the job:

sbatch test_job.slurm

Check queue:

squeue -u $USER

View output:

cat my_first_job_*.log
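Once the job finishes, Slurm accounting can show how it ran. A quick check (replace <jobid> with the ID printed by sbatch; the format fields shown are just one reasonable selection):

sacct -j <jobid> --format=JobID,JobName,Partition,State,Elapsed,MaxRSS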

Interactive Job Example

Request an interactive session on a compute node:

srun --ntasks=16 --mem=64GB --gres=gpu:1 \
     --partition=hpg-turin --time=00:20:00 \
     --pty /bin/bash

Use this for: debugging, testing commands, running notebooks, or exploring the environment.
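Once the shell opens on the compute node, a few quick checks confirm what you were given (the cuda module name below is an assumption; check module avail first):

echo $SLURM_JOB_ID       # confirms you are inside a Slurm job
nvidia-smi               # lists only the GPU(s) allocated to this job
module load cuda         # assumption: the exact module name may differ on your system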

✓ At this point, you’ve successfully:

  • Logged into HiPerGator
  • Submitted a Slurm job
  • Retrieved output
  • Started an interactive session

You now have the full basic workflow from login → job submission → output checking → interactive work.

2. Tricks

This section collects short notes and quick tips for fast lookup.

Module System (ml)

module load apptainer     # load a module (Apptainer shown as an example)
module avail              # list available modules
module list               # show currently loaded modules
ml                        # Lmod shorthand: "ml" alone lists loaded modules, "ml <name>" loads one

Tip: always load required modules inside your job script or interactive session, so the environment is correctly set up on the compute node.
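As an illustration, here is a minimal batch script that loads a module on the compute node before using it (partition, QOS, and GPU lines omitted; add whatever your account requires):

#!/bin/bash
#SBATCH --job-name=module_example
#SBATCH --time=00:05:00
#SBATCH --mem=1GB
#SBATCH --output=module_example_%j.log

module load apptainer      # load inside the job so the compute node gets the right environment
apptainer --version        # quick check that the tool is now on the PATH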

Monitor your QOS / System Status

sacctmgr show associations where user=$USER format=account,user,partition,qos,grpcpus,grpmem,grpnodes
sinfo -s

Once you know your account and group, check the group’s fair-share and limits:

sacctmgr show account Your_Account withassoc format=account,user,fairshare,grpcpurunmins,grpcpus,grpmem
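To see the limits attached to a specific QOS (maximum wall time, per-user resource caps, and so on), you can query the QOS table directly; the exact field names may vary with your Slurm version:

sacctmgr show qos your_qos format=name,maxwall,maxtrespu,maxjobspu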

SSH Tunnel (VS Code Remote Tunnels)

You can use VS Code Remote Tunnels to connect without manually forwarding ports:

👉 https://code.visualstudio.com/docs/remote/tunnels

On the HiPerGator login node:

code tunnel

Then connect from your local VS Code.
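If the tunnel will drive anything heavier than editing, one option is to start it from inside an interactive job so the editor backend runs on a compute node instead of the login node (a sketch; resources and partition are placeholders, and it assumes the code binary is available on compute nodes as it is on the login node):

srun --partition=hpg-turin --cpus-per-task=4 --mem=16GB --time=04:00:00 --pty /bin/bash
code tunnel        # run on the compute node, then pick that machine in VS Code's Remote Tunnels list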

Warning: If you SSH from the login node directly into a compute node, running commands such as nvidia-smi will show all GPUs on the node. This happens because GPU isolation is enforced by Slurm, not by SSH.

Avoid manually SSH-ing into compute nodes. Always use srun or salloc so Slurm enforces proper GPU isolation.
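For example, rather than SSH-ing to a node directly, allocate it through Slurm first (a sketch; adjust partition and resources to what your allocation allows):

salloc --partition=hpg-turin --gres=gpu:1 --mem=16GB --time=01:00:00
srun --pty /bin/bash     # opens a shell on the allocated node; nvidia-smi now shows only your GPU(s)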

Tip: keep the tunnel session open while you work.
