fornax.nus.edu.sg is the group’s main HPC cluster for CPU and GPU workloads. It is located in the department server room (E3A-05-04) and can be accessed remotely through ssh.

If you need access, software installation, or help with Fornax, please contact the Computer Officer or use the group helpdesk.

Overview

  • Address: fornax.nus.edu.sg
  • Purpose: computing for simulations, data analysis, and GPU workloads
  • Related pages: Facilities, Job submission

Maintenance

Power off Fornax

This step should be performed by the Computer Officer as it requires root permission.

  1. Login to the Head Node, List Jobs, and Close the Job Queue. Log in to the head node as the root account, check for any running jobs, close the job queues, and terminate all running jobs:
ssh fornax.nus.edu.sg -l <rootaccount>

Kill all jobs and close the queues.
  2. Shutdown Compute Nodes. On the head node, shut down all compute nodes and check that every compute node has powered off. The output must be “noping”.

psh shutdown -h now
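The “noping” check can be done with pping, assuming the cluster's psh comes from xCAT (pping is its companion parallel-ping tool); the node group name below is illustrative:

```shell
# Ping all compute nodes in parallel; powered-off nodes report "noping".
# "compute" is an assumed node group name - substitute the group or
# node range actually defined on Fornax.
pping compute
```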
  3. Stop PBS Pro Services. This command may take several minutes to complete.
/etc/init.d/pbs stop
  4. Shutdown the Head Node. Log in to the head node and run the shutdown command:
ssh fornax.nus.edu.sg -l <rootaccount>
shutdown -h now

Power on Fornax

  1. Power On Head Node. Log in and check that all services are running:
/etc/init.d/pbs status
ps ax | grep zabbix
service nfs status

You should see output similar to the following:

rpc.svcgssd is stopped
rpc.mountd (pid 2107) is running...
nfsd (pid 2122 2121 2120 2119 2118 2117 2116 2115) is running...
rpc.rquotad (pid 2103) is running...
  2. Power On Compute Nodes. Turn on the compute nodes manually.
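After the compute nodes are powered on, it is worth confirming that they respond and have rejoined the scheduler. A minimal check, assuming xCAT's pping and PBS Pro's pbsnodes are available on the head node ("compute" is an assumed node group name):

```shell
# Every node should now answer "ping" rather than "noping".
pping compute

# List any nodes that PBS Pro still considers down or offline;
# an empty list means all nodes have rejoined the scheduler.
pbsnodes -l
```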

Hardware

Node              | Count | CPU                  | Memory | Storage | Comment
fornax.nus.edu.sg | 1     | 2 x 16 (AMD 7513)    | 128 GB | 14 TB   | head node
smallmem          | 4     | 2 x 32 (AMD 7551)    | 128 GB | 250 GB  | fornax-c01 to c04, currently not powered on
largemem          | 8     | 2 x 64 (AMD 7742)    | 512 GB | 250 GB  | fornax-c05 to c12
smallmem          | 1     | 2 x 32 (AMD 7543)    | 512 GB | 250 GB  | fornax-c13
genoa             | 7     | 2 x 32 (AMD 9354)    | 384 GB | 200 GB  | fornax-c14 to c20
rtx5090           | 8     | AMD 9950X (16 cores) | 64 GB  | N/A     | fornax-g01 to g08
atlas             | 3     | IBM Power9           | N/A    | N/A     | under deployment
fornax-scratch    | 1     | N/A                  | 128 GB | 21 TB   | scratch node

Partition summary

  • smallmem: 64 cores and 128 GB RAM per node on fornax-c01 to c04, but these older nodes are currently not powered on
  • largemem: 128 cores and 512 GB RAM per node on fornax-c05 to c12
  • smallmem on fornax-c13: 64 cores and 512 GB RAM
  • genoa: 64 cores and 384 GB RAM per node on fornax-c14 to c20
  • rtx5090: GPU nodes with an AMD 9950X, 64 GB RAM, and an NVIDIA RTX 5090 32 GB per node, on fornax-g01 to g08
  • atlas: IBM Power9 nodes with 4 x NVIDIA V100 16 GB, under deployment on atlas-1 to atlas-3

Queues

The cluster is currently managed by PBS Pro on fornax-ib. Based on qstat -Qf and qstat -Bf checked on 2026-03-18:

  • Default queue: smallmem
  • Scheduler enabled: True
  • Default placement: exclhost
  • Default chunk size: 64 CPUs
  • Scheduler iteration: 600 seconds
  • PBS Pro version: 23.06.06
Queue           | Status   | Queue label | CPU limits                          | Walltime limit | Notes
smallmem        | enabled  | SMQ         | min 16, max 256 ncpus               | 72:00:00       | default queue
largemem        | enabled  | LMQ         | min 64, max 2048 ncpus              | 72:00:00       | for large CPU jobs
genoa           | enabled  | GenQ        | max 64 ncpus                        | 72:00:00       | for Genoa-labeled nodes
high_throughput | enabled  | HTQ         | min 64, max 768 ncpus, max 12 nodes | 72:00:00       | high-throughput queue
rtx5090         | enabled  | GPU         | max 512 ncpus                       | 72:00:00       | GPU queue
express         | enabled  | SMQ         | min 64, max 64 ncpus                | 72:00:00       | restricted by ACL to selected users
dev             | disabled | dev         | no general limits shown             | not set        | ACL-restricted and currently disabled

Queue notes

  • smallmem is the scheduler default if no queue is specified.
  • The older smallmem nodes fornax-c01 to fornax-c04 are currently not powered on, so queue availability should be verified against the live scheduler state before submitting jobs there.
  • express and dev are access-controlled queues and are not general-purpose queues for all users.
  • qstat -Qf shows queue-level limits, while pbsnodes -av shows the current node labels such as LMQ, GenQ, HTQ, and GPU.
  • The current scheduler output shows GenQ on fornax-c13 and on Genoa-class nodes, so queue labels should be treated as the live source of truth over older static hardware summaries.
  • Queue labels and node assignments can change over time, so re-check the live scheduler config before documenting major queue changes.
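To illustrate the limits above, a minimal PBS Pro job script for the default smallmem queue might look like the following sketch; the job name, walltime, and ncpus request are illustrative, and the ncpus value must fall within the queue's min/max limits from the table:

```shell
#!/bin/bash
#PBS -q smallmem
#PBS -N example-job
#PBS -l select=1:ncpus=16
#PBS -l walltime=24:00:00

# PBS starts the job in the home directory; change to the submit directory.
cd "$PBS_O_WORKDIR"

# Placeholder for the actual program to run.
./my_program
```

Submit with qsub job.sh and check status with qstat -u $USER.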

Software configuration

  • Application source directory: /home/app/source
  • Installation directory: /home/app
  • Installation prefix: /home/app/<appname>
  • Since the /home/app directory is shared with all nodes, applications only need to be compiled or built once rather than on every node.
  • There are two ways to load the settings:
    • Automatic loading: write an application profile in /etc/profile.d/
    • Manual loading with environment modules: recommended, and makes switching between versions easier
  • Environment modules directory: /home/app/modulefiles/
  • An example of a simple module file:
#%Module1.0
prepend-path PATH /home/app/<appname>/bin
prepend-path LD_LIBRARY_PATH /home/app/<appname>/lib
setenv <APPNAME>_HOME /home/app/<appname>
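Assuming a standard environment-modules setup, the module directory above can be registered and a module loaded roughly as follows (the module name is a placeholder):

```shell
# Make the group's module files visible to the module command.
module use /home/app/modulefiles

# List what is available, then load an application module.
module avail
module load <appname>
```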