New Colonial One Info

We are excited to announce the next phase of High Performance Computing at GWU: a brand-new cluster to go alongside the one we have all come to know. In its newest iteration, Colonial One has been rebuilt, with the old Colonial One becoming Colonial X and the new cluster assuming the well-known title of Colonial One, GW's premier source for its High-Performance Computing needs. The new Colonial One consists of 210 compute nodes accessible through 4 highly available login nodes. The robust Dell-built cluster utilizes PowerEdge R740 and C4140 servers and can be broken down into compute, GPU (small and large), high-memory, and high-throughput nodes. All nodes will be loaded with CentOS 7.4 and utilize the SLURM job scheduler, and the new cluster is capable of a total of 2.14 PFLOPS of single-precision performance!
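Since the cluster uses the SLURM scheduler, work is submitted as batch scripts. A minimal sketch of a CPU job script follows; the partition name `defq` is a hypothetical placeholder, since actual partition names are not listed on this page:

```shell
#!/bin/bash
#SBATCH --job-name=hello_c1       # job name shown in the queue
#SBATCH --nodes=1                 # one compute node
#SBATCH --ntasks-per-node=40      # all 40 cores of a dual 20-core R740 node
#SBATCH --time=01:00:00           # one-hour wall-clock limit
#SBATCH --partition=defq          # partition name is an assumption

# Replace the line below with the real application launch command.
msg="Job running on ${SLURM_JOB_NODELIST:-local shell (no SLURM)}"
echo "$msg"
```

The script would be submitted with `sbatch script.sh`; because the `#SBATCH` lines are shell comments, the script also runs under plain bash for testing.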


Check out the build blog.


Core Counts


Total CPU Cores*: 8,112

Total NVIDIA Tensor Cores: 76,800

Total NVIDIA CUDA Cores: 614,400


A breakdown can be found here.
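The totals above can be cross-checked against the per-node specifications in the sections below. The node counts and per-node core counts come from this page; the per-card figures of 640 Tensor Cores and 5,120 CUDA Cores are NVIDIA's published V100 specifications:

```shell
#!/bin/bash
# CPU cores: nodes * cores-per-node for each node type
# (login and head nodes excluded, per the footnote)
cpu_cores=$((164*40 + 16*40 + 22*36 + 6*8 + 2*36))

# GPUs: 16 small nodes with 2 V100s each, 22 large nodes with 4 each
gpus=$((16*2 + 22*4))
tensor_cores=$((gpus*640))    # 640 Tensor Cores per V100
cuda_cores=$((gpus*5120))     # 5,120 CUDA Cores per V100

echo "CPU cores:    $cpu_cores"     # 8,112
echo "Tensor cores: $tensor_cores"  # 76,800
echo "CUDA cores:   $cuda_cores"    # 614,400
```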

Compute Nodes

There are 164 CPU nodes in Colonial One. Each of these is a:

  • Dell PowerEdge R740 server
  • Dual 20-Core 3.70GHz Intel Xeon Gold 6148 processors
  • 192GB of 2666MHz DDR4 ECC Registered DRAM
  • 800GB SSD onboard storage (used for boot and local scratch space)
  • Mellanox EDR InfiniBand controller to the 100Gb/s fabric

GPU Nodes


Small GPU nodes

There are 16 Small GPU nodes in Colonial One. Each of these is a:

  • Dell PowerEdge R740 server
  • Two (2) NVIDIA Tesla V100 GPUs
  • Dual 20-Core 3.70GHz Intel Xeon Gold 6148 processors
  • 192GB of 2666MHz DDR4 ECC Registered DRAM
  • 800GB SSD onboard storage (used for boot and local scratch space)
  • Mellanox EDR InfiniBand controller to the 100Gb/s fabric

Large GPU nodes

There are 22 Large GPU nodes in Colonial One. Each of these is a:

  • Dell PowerEdge C4140 server
  • 6TB NVMe card
  • Four (4) NVIDIA Tesla V100 SXM2 16GB GPUs with NVLink enabled
  • Dual 18-Core 3.70GHz Intel Xeon Gold 6140 processors
  • 384GB of 2666MHz DDR4 ECC Registered DRAM
  • 800GB SSD onboard storage (used for boot and local scratch space)
  • Mellanox EDR InfiniBand controller to the 100Gb/s fabric
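A job targeting these nodes would typically request GPUs through SLURM's generic resource (GRES) mechanism. A sketch, assuming a hypothetical partition name `gpu` and that the cards are exposed as plain `gres/gpu` (the actual partition and GRES names on Colonial One may differ):

```shell
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --nodes=1
#SBATCH --gres=gpu:4              # all four V100s on a large GPU node
#SBATCH --cpus-per-task=36        # the node's 36 CPU cores
#SBATCH --time=02:00:00
#SBATCH --partition=gpu           # partition name is an assumption

# SLURM sets CUDA_VISIBLE_DEVICES when GPUs are granted;
# fall back to a placeholder when run outside a SLURM allocation.
gpu_list="${CUDA_VISIBLE_DEVICES:-none (no SLURM allocation)}"
echo "Visible GPUs: $gpu_list"
```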

High Throughput Nodes

There are 6 High Throughput nodes in Colonial One. Each of these is a:

  • Dell PowerEdge R740 server
  • Dual 4-Core 3.70GHz Intel Xeon Gold 5122 processors
  • 384GB of 2666MHz DDR4 ECC Registered DRAM
  • 800GB SSD onboard storage (used for boot and local scratch space)
  • Mellanox EDR InfiniBand controller to the 100Gb/s fabric

High Memory Nodes

There are 2 High Memory Nodes in Colonial One. Each of these is a:

  • Dell PowerEdge R740 server
  • Dual 18-Core 3.70GHz Intel Xeon Gold 6140M processors
  • 3TB of 2666MHz DDR4 ECC Registered DRAM
  • 800GB SSD onboard storage (used for boot and local scratch space)
  • Mellanox EDR InfiniBand controller to the 100Gb/s fabric

Login Nodes

There are 4 Highly Available Login Nodes in Colonial One. Each of these is a:

  • Dell PowerEdge R740 server
  • Dual 16-Core 3.70GHz Intel Xeon Gold 6130 processors
  • 192GB of 2666MHz DDR4 ECC Registered DRAM
  • 2TB RAID 1 HDD onboard storage (used for boot and local scratch space)
  • 40Gb/s Ethernet for the external network
  • Mellanox EDR InfiniBand controller to the 100Gb/s fabric

Head Nodes

There are 2 Highly Available Head Nodes to control Colonial One. Each of these is a:

  • Dell PowerEdge R740 server
  • Dual 16-Core 3.70GHz Intel Xeon Gold 6130 processors
  • 192GB of 2666MHz DDR4 ECC Registered DRAM
  • 5TB RAID 5 HDD onboard storage
  • 40Gb/s Ethernet for the external network
  • Mellanox EDR InfiniBand controller to the 100Gb/s fabric

Filesystem

For NFS, the cluster will utilize a DDN GS7K storage appliance with a total of 2PB of space, connected to the compute and login nodes via Mellanox EDR InfiniBand over the 100Gb/s fabric.


For scratch/high-speed storage, the cluster will utilize a DDN ES14K Lustre appliance providing 2PB of parallel scratch storage. The ES14K will be connected to the compute nodes via Mellanox EDR InfiniBand over the 100Gb/s fabric.


*Not including Login Nodes or Head Nodes