Benchmarks at CyberInfrastructure Partnership

PARATEC

[Results: Walltime | Speedup | Relative Speed*]

*Speed per processor normalized to DataStar.

“PARATEC performs ab initio quantum-mechanical total energy calculations using pseudopotentials and a plane wave basis set.” [PAR]

PARATEC achieves very high processor efficiency by using optimized libraries. The most important are the 1D FFT routines and the Level 3 BLAS (BLAS3). On the IBM computers, these routines were obtained from the ESSL math library. On Cobalt, they were taken from the SCS library. On the other computers, FFTW was used for the FFT routines, and CMKL was used for the BLAS3. ScaLAPACK and BLACS were also used.

PARATEC Release 111 was used for the benchmarks. The two problems considered are from the NERSC benchmark set. Both are for silicon in the diamond structure. The medium problem has 250 silicon atoms, while the large problem has 686 atoms.

The table lists run times on the various computers for both problem sizes. The plots show the same results converted to relative speeds per processor in strong scaling scans, though scaling results are available only for Blue Gene and DataStar. For both problem sizes, Cobalt runs the fastest.
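As an illustration of how run times might be converted into the quantities plotted, here is a minimal Python sketch. It assumes that "speed per processor" means one benchmark run per (walltime x processor count) and that normalization divides by the same quantity for a DataStar reference run; the page does not spell out the exact formula, so treat this as an assumption. The function names and all numbers are hypothetical and are not measured results.

```python
def speed_per_processor(walltime_s, processors):
    """Work rate per processor: one benchmark run per (walltime * processors)."""
    if walltime_s <= 0 or processors <= 0:
        raise ValueError("walltime and processor count must be positive")
    return 1.0 / (walltime_s * processors)


def relative_speed(walltime_s, processors, ref_walltime_s, ref_processors):
    """Speed per processor divided by the reference machine's speed per processor."""
    return (speed_per_processor(walltime_s, processors)
            / speed_per_processor(ref_walltime_s, ref_processors))


def strong_scaling_speedup(base_walltime_s, walltime_s):
    """Speedup of a run relative to a baseline run of the same fixed-size problem."""
    return base_walltime_s / walltime_s


# Hypothetical walltimes (seconds) and processor counts, NOT measured results.
reference_run = (1000.0, 64)   # stand-in for a DataStar reference run
other_run = (400.0, 128)       # stand-in for a run on another machine

rel = relative_speed(other_run[0], other_run[1],
                     reference_run[0], reference_run[1])
speedup = strong_scaling_speedup(1000.0, 400.0)
print(f"relative speed per processor: {rel:.2f}")
print(f"strong-scaling speedup: {speedup:.2f}")
```

With this definition, a relative speed above 1 means each processor does more work per second than a processor in the reference run, independent of how many processors were used.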

Even though most computation is done in cache, the benchmark problems still use considerable memory. On Blue Gene, this prevents the large problem from running and forces the medium problem to run in coprocessor mode, which uses only one processor per node for MPI. Since both processors per node are allocated and charged to the user, the Blue Gene processor counts in the table and plot are twice the MPI processor counts.
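The processor accounting described above can be sketched in a few lines of Python. The sketch assumes two processors per Blue Gene node, as the text implies, and one MPI task per node in coprocessor mode; the function name and the task counts are hypothetical and chosen only for illustration.

```python
PROCESSORS_PER_NODE = 2  # per the text: both processors on a node are allocated


def charged_processors_coprocessor_mode(mpi_tasks):
    """Processors charged when each node runs exactly one MPI task."""
    if mpi_tasks < 1:
        raise ValueError("need at least one MPI task")
    nodes = mpi_tasks  # coprocessor mode: one MPI task per node
    return nodes * PROCESSORS_PER_NODE


# Hypothetical MPI task counts, not taken from the benchmark table.
for mpi_tasks in (64, 128, 256):
    charged = charged_processors_coprocessor_mode(mpi_tasks)
    print(f"{mpi_tasks} MPI tasks -> {charged} processors charged")
```

This is why a per-processor comparison that uses the charged count, rather than the MPI task count, halves the apparent per-processor speed of a coprocessor-mode run.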

[PAR] PARAllel Total Energy Code (PARATEC), http://www.nersc.gov/projects/paratec