Case Study: Supercomputer "Chundoong"


ManyCoreSoft designed and built the heterogeneous supercomputer "Chundoong" together with the Center for Manycore Programming at Seoul National University, Korea. Its distinctive low-cost, low-power design significantly reduced construction and maintenance costs. Its power efficiency is almost 8 times better than that of homogeneous supercomputers at the same performance level.


  • Ranked 277th in the TOP500 list of November 2012
  • Ranked 32nd in the Green500 list of November 2012
  • Referred to as the 7th most power-efficient architecture in the TOP500 list at SC12
  • Its per-node performance (1.907 TFLOPS) is #1 among 412 clusters in the TOP500 list of November 2012

The First Supercomputer with Gaming GPUs

Modern HPC systems typically contain expensive, dedicated CPUs and accelerators such as GPGPUs. Gaming GPUs were originally designed for desktop PCs, but their performance is comparable to or better than that of dedicated HPC accelerators, and they cost up to 10 times less. Using gaming GPUs in an HPC system therefore significantly reduces construction cost.

However, gaming GPUs do not have ECC memory and are not designed for high-density systems. When they are packed into an HPC system, the large amount of heat they generate cannot be dissipated effectively. This causes erroneous computation and shortens the lifetime of the GPUs.

Chundoong uses gaming GPUs (AMD Radeon HD 7970) as accelerators and adopts a custom-built water cooling system to solve the heat dissipation problem, keeping both CPUs and GPUs below 50°C.

Chundoong is the world's first supercomputer to contain high-density gaming GPUs while ensuring their reliability. It demonstrates ManyCoreSoft's world-class technology and experience in using gaming GPUs for HPC systems.

Software Technology to Maximize Performance

The performance of an HPC system is determined not only by its hardware but also by the software that exploits that hardware. The LINPACK benchmark, the software used to measure the performance of supercomputers, was ported to Chundoong and optimized with various software techniques for multiple GPUs. As a result, its measured performance improved by about 37%.
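To make concrete what a LINPACK measurement consists of, here is a minimal single-node sketch in pure Python. It is not HPL and not the multi-GPU code ported to Chundoong. It times a dense solve of Ax = b by LU factorization with partial pivoting, converts the time to a FLOPS rate using the conventional LINPACK operation count 2/3·n³ + 2·n², and checks the answer with a scaled residual in the spirit of HPL's validation. The function names `lu_solve` and `linpack_style_run`, the random matrix, and the problem sizes are illustrative assumptions.

```python
import random
import sys
import time


def lu_solve(a, b):
    """Solve a @ x = b by LU factorization with partial pivoting.

    This is the kind of dense-solve kernel that LINPACK-style benchmarks time.
    """
    n = len(a)
    # Work on copies so the caller's matrix and right-hand side stay intact.
    m = [row[:] for row in a]
    x = list(b)
    for k in range(n):
        # Partial pivoting: move the row with the largest |entry| in column k up.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            m[k], m[p] = m[p], m[k]
            x[k], x[p] = x[p], x[k]
        pivot_row = m[k]
        # Eliminate column k below the pivot, updating the right-hand side too.
        for i in range(k + 1, n):
            f = m[i][k] / pivot_row[k]
            if f != 0.0:
                row = m[i]
                for j in range(k + 1, n):
                    row[j] -= f * pivot_row[j]
                x[i] -= f * x[k]
    # Back substitution on the resulting upper-triangular system.
    for i in range(n - 1, -1, -1):
        s = x[i] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def linpack_style_run(n, seed=0):
    """Time one n x n solve and return (GFLOPS, scaled residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]

    start = time.perf_counter()
    x = lu_solve(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)

    # Conventional LINPACK operation count for an n x n dense solve.
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    gflops = flops / elapsed / 1e9

    # Scaled residual ||Ax - b|| / (eps * (||A|| ||x|| + ||b||) * n), infinity norms.
    # Small values (HPL accepts below 16) indicate a numerically valid answer.
    resid = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    norm_a = max(sum(abs(v) for v in row) for row in a)
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    eps = sys.float_info.epsilon
    scaled = resid / (eps * (norm_a * norm_x + norm_b) * n)
    return gflops, scaled


if __name__ == "__main__":
    gflops, scaled = linpack_style_run(150)
    print(f"{gflops:.4f} GFLOPS, scaled residual {scaled:.3f}")
```

Pure Python reaches only a tiny fraction of a GFLOPS; the point is the measurement scheme. Because the operation count is fixed by n, any software that finishes the same solve faster, for example by spreading the work across multiple GPUs, raises the reported rate on unchanged hardware.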