Hawk

A team of Lehigh researchers led by Prof. Edmund B. Webb III (Department of Mechanical Engineering and Mechanics) was awarded a $400K grant from the National Science Foundation's (NSF) Campus Cyberinfrastructure program to acquire an HPC cluster, Hawk, to enhance collaboration, research productivity, and educational impact.

The proposal team included co-PIs Ganesh Balasubramanian (Mechanical Engineering & Mechanics), Lisa Fredin (Chemistry), Alexander Pacheco (Library & Technology Services), and Srinivas Rangarajan (Chemical & Biomolecular Engineering), and Senior Personnel Stephen Anthony (Library & Technology Services), Rosi Reed (Physics), Jeffrey Rickman (Materials Science & Engineering), and Martin Takac (Industrial & Systems Engineering).

Acknowledgement

In publications, reports, and presentations that utilize Sol, Hawk and Ceph, please acknowledge Lehigh University using the following statement:

"Portions of this research were conducted on Lehigh University's Research Computing infrastructure partially supported by NSF Award 2019035"

Hawk shares storage, the software stack, and a login node with Sol. Users log in to Sol (sol.cc.lehigh.edu) and submit jobs to the hawkcpu, hawkgpu, and hawkmem partitions, which serve the Hawk compute nodes.
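For example, the Hawk partitions can be inspected and targeted from a Sol login session with the standard Slurm commands (myjob.sh below is a placeholder script name):

# List the Hawk partitions and the state of their nodes
sinfo --partition=hawkcpu,hawkmem,hawkgpu

# Submit a job script to one of the Hawk partitions
sbatch --partition=hawkcpu myjob.sh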




Configuration

Compute


                               Regular Nodes      Big Mem Nodes      GPU Nodes
Partition Name                 hawkcpu            hawkmem            hawkgpu
Nodes                          26                 4                  4
CPU Type                       Xeon Gold 6230R    Xeon Gold 6230R    Xeon Gold 5220R
CPUs/Socket                    26                 26                 24
CPU Speed                      2.1GHz             2.1GHz             2.2GHz
RAM (GB)                       384                1536               192
GPU Type                       -                  -                  NVIDIA Tesla T4
GPUs/Node                      -                  -                  8
GPU Memory (GB)                -                  -                  16
Total CPUs                     1352               208                192
Total RAM (GB)                 9984               6144               768
Total SUs/year                 11,843,520         1,822,080          1,681,920
Peak Performance (TFLOPs)      56.2432            8.6528             4.3008
HPL (turbo speed) TFLOPs/Node  2.058              2.220              1.436
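The Total SUs/year row is consistent with 1 SU = 1 core-hour, i.e. total CPUs multiplied by the 8,760 hours in a year. A quick check in bash, using the numbers from the table above:

# SUs/year = total CPUs x 8760 hours/year (assuming 1 SU = 1 core-hour)
echo $(( 1352 * 8760 ))   # hawkcpu -> 11843520
echo $((  208 * 8760 ))   # hawkmem -> 1822080
echo $((  192 * 8760 ))   # hawkgpu -> 1681920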

Summary



Nodes                                34
CPUs                                 1752
RAM (GB)                             16896
GPUs                                 32
GPU Memory (GB)                      512
Annual SUs                           15,347,520
CPU Performance (TFLOPs)             69.1968
GPU Performance FP32 (TFLOPs)        8.11
GPU Performance FP16 (TFLOPs)        259.2
GPU Performance FP16/FP32 (TFLOPs)   526.5
GPU Performance INT8 (TOPs)          1053
GPU Performance INT4 (TOPs)          2106

Storage

Nodes                  7
CPU Type               AMD EPYC 7302P
CPUs/node              16
RAM/node (GB)          128
OS SSD/node            2x 240GB
Ceph HDD/node          9x 12TB
CephFS SSD/node        3x 1.92TB
Total Ceph (TB)        756
Total CephFS (TB)      39.9
Total Storage (TB)     798
Usable Ceph (TB)       214
Usable CephFS (TB)     11.3

Differences between Hawk and Sol

There are three major differences between Hawk and Sol:

  1. There is no InfiniBand interconnect on Hawk, so running multi-node jobs is not recommended.
  2. LMOD loads a default MPI library: mvapich2 on Sol, mpich on Hawk.
  3. Hawk's compute nodes have Cascade Lake CPUs, so the matching optimized software stack must be loaded as described below.

Please add the following line to your submit script, before loading any modules, so that applications optimized for Hawk's Cascade Lake CPUs are loaded. By default, applications optimized for Haswell (the head node's CPU) are loaded.

source /etc/profile.d/zlmod.sh

intel/20.0.3 and mpich/3.3.2 are automatically loaded on the hawkcpu and hawkmem partitions, while intel/19.0.3, mpich/3.3.2, and cuda/10.2.89 are automatically loaded on hawkgpu. These are loaded via the hawk and hawkgpu modules, respectively. Load the sol or solgpu module to switch to the mvapich2 equivalents on the Sol and Pavo (debug) clusters after the OS upgrade.
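As a sketch (not a required workflow), switching between the stacks in an interactive session on a Hawk node could look like the following, using the module names listed under Available Software below:

# Pick up the Cascade Lake (avx512) module trees
source /etc/profile.d/zlmod.sh

# hawk/hawkgpu load the mpich-based stack; sol/solgpu load the mvapich2 equivalents
module load hawkgpu     # intel/19.0.3, mpich/3.3.2, cuda/10.2.89
module list             # verify what is currently loaded
module load solgpu      # switch to the Sol/Pavo (mvapich2) GPU stack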

To make the best use of the hawkgpu nodes, please request 6 CPUs per GPU (each hawkgpu node has 48 CPUs and 8 GPUs).

Sample Hawk Submit script
#!/bin/bash


#SBATCH --partition=hawkgpu
#SBATCH --time=1:00:00
#SBATCH --nodes=1
# Request 2 GPUs
#SBATCH --gpus-per-node=2
# Request 6 CPUs per GPU
#SBATCH --ntasks-per-node=12
#SBATCH --job-name=myjob
 
# source  LMOD profile
source /etc/profile.d/zlmod.sh

# Load LAMMPS Module
module load lammps
 
# Run LAMMPS for input file in.lj on the 2 requested GPUs (device IDs 0 and 1)
srun $(which lmp) -in in.lj -sf gpu -pk gpu 2 gpuID 0 1
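Assuming the script above is saved as lammps_gpu.sh (a placeholder name), it can be submitted and monitored with the usual Slurm commands:

sbatch lammps_gpu.sh    # submit to the hawkgpu partition
squeue -u $USER         # check the job's status in the queue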

Available Software
[2020-12-08 19:25.50] ~
[alp514.hawk-a125](1004): module av

-------------------------------------------------------- /share/Apps/share/Modules/lmod/linux-centos8-x86_64/py_venv/2020.07 --------------------------------------------------------
   conda/biofluids    conda/bioinformatics    conda/chem    conda/mldl    conda/nlp    conda/r (D)

--------------------------------------------------- /share/Apps/share/Modules/lmod/linux-centos8-x86_64/mpich/3.3.2/intel/20.0.3 ----------------------------------------------------
   crystal17/1.0.2    dl_poly/4.09    namd/2.14    nwchem/7.0.2    openmolcas/19.11    vasp/5.4.4.pl2-neb    vasp/5.4.4.pl2    vasp/6.1.2 (D)

----------------------------------------- /share/Apps/lusoft/share/spack/lmod/avx512/linux-centos8-x86_64/mpich/3.3.2-u7dnjec/intel/20.0.3 ------------------------------------------
   almabte/1.3.2           gromacs/2020.4-plumed        netcdf-cxx/4.2       (D)    parmetis/4.0.3       quantum-espresso/6.5      superlu-dist/6.3.1
   boost/1.74.0     (D)    gromacs/2020.4        (D)    netcdf-fortran/4.5.3 (D)    plumed/2.6.2         relion/3.1.1
   globalarrays/5.7        lammps/20200303              openfoam/2006               py-espresso/4.0.2    shengbte/1.1.1-8a63749

--------------------------------------------------- /share/Apps/lusoft/share/spack/lmod/avx512/linux-centos8-x86_64/intel/20.0.3 ----------------------------------------------------
   arpack-ng/3.7.0 (D)    gsl/2.5              (D)    metis/5.1.0    (D)      netcdf-c/4.7.4       (D)    openblas/0.3.10    (D)    qhull/2020.1       (D)    superlu-mt/3.1 (D)
   boost/1.74.0           hdf5/1.10.7          (D)    mpich/3.3.2    (L,D)    netcdf-cxx/4.2              openmpi/4.0.5      (D)    qrupdate/1.1.2     (D)    superlu/5.2.1  (D)
   fftw/3.3.8      (D)    intel-mkl/2020.3.279 (D)    mvapich2/2.3.4 (D)      netcdf-fortran/4.5.3        py-pyparsing/2.4.2 (D)    suite-sparse/5.7.2 (D)

------------------------------------------------------- /share/Apps/lusoft/share/spack/lmod/avx512/linux-centos8-x86_64/Core --------------------------------------------------------
   angsd/0.933          canu/1.8                    gsl/2.5                       mpich/3.3.2             paml/4.9j               py-pyparsing/2.4.2          samtools/1.10
   arpack-ng/3.7.0      cuda/10.2.89                hal/2.1                       mvapich2/2.3.4          perl/5.32.0             py-pyspark/2.4.4            snpeff/5.0
   bamtools/2.5.1       cuda/11.0.2                 hdf5/1.10.7                   netcdf-c/4.7.4          phast/1.4               py-scikit-build/0.10.0      suite-sparse/5.7.2
   bartender/1.1        cuda/11.1.0          (D)    intel-mkl/2020.3.279          netcdf-cxx/4.2          pilon/1.22              py-scikit-image/0.17.2      superlu-mt/3.1
   bayescan/2.1         exonerate/2.4.0             intel/19.0.3                  netcdf-fortran/4.5.3    py-ase/3.18.0           py-scikit-optimize/0.5.2    superlu/5.2.1
   bayestraits/3.0.2    express/1.5.2               intel/20.0.3         (L,D)    ngmlr/0.2.7             py-biopython/1.73       pythia8/8303                tabix/2013-12-16
   bgc/1.03             fastjet/3.3.3               jellyfish/2.2.7               ngstools/2019-06-24     py-cclib/1.5.post1      python/3.8.6                trimmomatic/0.39
   blast-plus/2.9.0     fastqc/0.11.7               kallisto/0.46.2               nlopt/2.6.2             py-keras/2.2.4          qhull/2020.1                vcftools/0.1.15
   blat/35              fastx-toolkit/0.0.14        libarchive/3.4.3              nvhpc/20.9              py-mdanalysis/1.0.0     qrupdate/1.1.2              vesta/3.4.6
   boost/1.74.0         fftw/3.3.8                  masurca/3.4.2                 octave/5.2.0            py-moltemplate/2.5.8    r/4.0.3
   bowtie/1.3.0         freebayes/1.1.0             metis/5.1.0                   openbabel/3.0.0         py-phonopy/1.10.0       root/6.20.08
   bowtie2/2.4.1        gatk/4.1.0.0                miniasm/2018-3-30             openblas/0.3.10         py-pip/20.2             rsem/1.3.1
   bwa/0.7.17           gcc/9.3.0                   minimap2/2.14                 openmpi/4.0.5           py-porechop/0.2.4       salmon/0.14.1

------------------------------------------------------- /share/Apps/lusoft/share/spack/lmod/default/linux-centos8-x86_64/Core -------------------------------------------------------
   anaconda3/2019.10          cmake/3.18.4    motif/2.3.8          scons/3.1.2     tcl/8.6.8    tmux/3.1b
   anaconda3/2020.07 (L,D)    julia/1.5.2     parallel/20200822    screen/4.8.0    tk/8.6.8

--------------------------------------------------------------- /share/Apps/share/Modules/lmod/applications/licensed ----------------------------------------------------------------
   abaqus/6.14        ansys/14.5        gaussian/g09       guppy/4.0.14-gpu        knitro/10.3.0           matlab/R2016a    matlab/R2019b
   abaqus/2016        ansys/17.1        guppy/3.1.5-gpu    guppy/4.0.14     (D)    mathematica/10.4        matlab/R2016b    matlab/R2020a (D)
   abaqus/2019 (D)    ansys/20.2 (D)    guppy/3.1.5        gurobi/8.1.1            mathematica/12.0 (D)    matlab/R2018a

---------------------------------------------------------------------- /share/Apps/share/Modules/lmod/default -----------------------------------------------------------------------
   gpu    hawk (L)    hawkgpu    sol    solgpu

  Where:
   L:  Module is loaded
   D:  Default Module

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".


Resource Allocation

Policies and procedures for requesting computing time are described in Account & Allocations. The following is the allocation distribution as described in the proposal.


Compute

  • 50% will be allocated to the PI, co-PI, and Sr. Personnel team (7,673,760 SUs/year)
  • 20% will be shared with the Open Science Grid, a grant requirement (3,069,504 SUs/year)
  • 25% will be available to the General Lehigh Research Community (3,836,880 SUs/year)
  • 5% will be distributed at the discretion of the Lehigh Provost (767,376 SUs/year)
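These shares are fractions of the 15,347,520 annual SUs listed in the summary table above; for example, in bash:

# 50% / 20% / 25% / 5% of 15,347,520 annual SUs
echo $(( 15347520 * 50 / 100 ))   # 7673760 (PI, co-PI, Sr. Personnel)
echo $(( 15347520 * 20 / 100 ))   # 3069504 (Open Science Grid)
echo $(( 15347520 * 25 / 100 ))   # 3836880 (General Lehigh Research Community)
echo $(( 15347520 *  5 / 100 ))   # 767376  (Provost's discretion)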

Storage

The 11TB CephFS space will not be partitioned but will be combined with the CephFS from the Ceph cluster to provide a 29TB (as of Mar 2021) distributed scratch file system. Of the available 215TB Ceph space,

  • 85TB (40%) will be allocated to the PI, co-PI and Sr. Personnel team
  • 75TB (35%) will be available to the General Lehigh Research Community
  • 30TB (14%) is allocated to R Drive and available to all faculty
  • 20TB (10%) will be distributed at the discretion of Lehigh Provost
  • 5TB will be shared with OSG

Project Timeline

  1. Award Announcement: June 5, 2020
  2. Issue RFP: July 2 or 6, 2020 - Completed
    1. Questions Due: July 13, 2020 - Completed
    2. Questions returned: July 17, 2020 - Completed
    3. Proposals/Quote Due: July 31, 2020 - Completed
    4. Vendor Selection: August 14, 2020 - Awarded bid to Lenovo
    5. Delivery: September 30, 2020
  3. Actual Purchase: Sep 1, 2020
  4. Actual Delivery: Sep 21, 2020 - Oct 5 (shipping began 9/17; all items had shipped by 9/28)
  5. Rack, cable and power nodes: Scheduled for week of Oct 12
  6. Install OS
  7. Scheduler up and running
  8. Testing Begins: November 1, 2020
  9. User Friendly mode - co-PI, Sr. Personnel Team: November 12, 2020
  10. User Friendly mode - everyone else: December 7, 2020
  11. Production/Commission Date: February 1, 2021
  12. Target Decommissioning Date: Dec 31, 2025