Like Hollywood’s Academy Awards, the Top500 list of supercomputers is dutifully watched by high-performance computing (HPC) participants and observers, even as they vocally doubt its fidelity to excellence.
“The Top 500 [uses] an artificial problem; it doesn’t measure about 80 per cent of the workloads” that are usually run on supercomputers, said John Hengeveld, director of technical compute marketing for Intel’s Data Centre Group, speaking on the sidelines of the Supercomputing 2010 (SC2010) conference this week. “It is not a representative benchmark for the industry.”
“The list is unclear about exactly what it measures,” agreed Dave Turek, who heads up IBM’s deep computing division, in an interview last week.
The selection process for the Academy Awards (“The Oscars”), run by the Academy of Motion Picture Arts and Sciences, is shrouded in mystery, and, perhaps not surprisingly, observers grumble over which movies and people receive awards and which remain neglected. With the Top500, though, the discontent centers on the single metric used to measure the supercomputers, called Linpack.
Many question the use of a single metric to rank the performance of something as mind-bogglingly complex as a supercomputer.
During one panel at the SC2010 conference this week in New Orleans, a high-performance-computing vendor executive joked about stringing together 100,000 Android smartphones to produce the largest Linpack number, thereby revealing the “stupidity” of Linpack.
The Top500 list is compiled twice a year by researchers at the University of Mannheim, Germany; the U.S. Department of Energy’s Lawrence Berkeley National Laboratory; and the University of Tennessee, Knoxville.
In the latest iteration, unveiled Sunday, China’s newly built Tianhe-1A system topped the list with a sustained performance of 2.57 petaflops. Placing second was the Jaguar system at the DOE’s Oak Ridge Leadership Computing Facility, which reported 1.75 petaflops.
While grumbling about Linpack is nothing new, the discontent was pronounced this year as more systems, such as the Tianhe-1A, used GPUs (graphics processing units) to boost Linpack ratings, in effect gaming the Top500 list.
“It is difficult to figure out the real application speed from the benchmark. I don’t want to make the Top500 just a contest for the number of GPUs,” noted an attendee at the Top500 awards ceremony.
“Linpack has many problems with it, but it has a few positive things. It is important to keep in mind that it is one number and it should be taken in the context of a number of things,” argued Jack Dongarra, one of the compilers of the list, during the awards presentation.
One advantage of Linpack is that, thanks to its simplicity, supercomputer operators can improve their scores fairly regularly, which makes for exciting news coverage as facilities and system builders vie with one another for the top spot.
One year ago, for instance, the Cray-built Jaguar system, clocking in at 1.75 petaflops, knocked the DOE Los Alamos National Laboratory’s Roadrunner system from the top spot. Roadrunner, which then clocked 1.04 petaflops, had been the first system to break the petaflop barrier, in June 2008.
But Linpack, being a single metric, does not capture many aspects of a supercomputer’s performance.
“The important thing is how well does your application run on these machines. That is a harder thing to measure and a harder thing to compare across different machines,” Dongarra said.
Because Linpack is essentially a set of Fortran routines that solve a dense system of linear equations, it is best suited to measuring the raw computational muscle of a machine. “Linpack captures how well can you make lots of processors work in parallel on a single big problem,” Intel’s Hengeveld said.
Linpack is less useful, however, for gauging the memory performance of a machine, an increasingly crucial factor for many of today’s “big data”-style problems, Hengeveld said.
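To make the distinction concrete, the sketch below shows what a Linpack-style measurement boils down to: time a dense solve of Ax = b and convert the conventional operation count into a floating-point rate. It is only an illustration under assumed parameters (a random 2,000 x 2,000 matrix, NumPy’s dense solver, the standard 2/3 n^3 + 2 n^2 flop count), not the actual HPL benchmark code.

```python
# Minimal sketch of a Linpack-style measurement: time a dense solve of
# Ax = b and convert the conventional operation count into a rate.
# This is an illustration only, not the actual HPL benchmark code.
import time
import numpy as np

def linpack_style_gflops(n=2000, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))   # dense random coefficient matrix
    b = rng.standard_normal(n)        # right-hand side

    start = time.perf_counter()
    np.linalg.solve(a, b)             # LU factorization plus solve
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # standard Linpack operation count
    return flops / elapsed / 1e9             # billions of floating-point ops per second

if __name__ == "__main__":
    print(f"{linpack_style_gflops():.1f} GFLOP/s")
```

The point of the sketch is that the score depends almost entirely on how fast a machine can grind through the factorization’s floating-point arithmetic, which is exactly the narrowness critics describe.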
Also at SC2010, a group of researchers introduced the first edition of the Graph500, a planned set of benchmarks that measure supercomputer performance on data-intensive applications.
The team has finished the first benchmark, a search problem run across multiple nodes using multiple analysis techniques, and announced the winners. Topping the list was the DOE Argonne National Laboratory’s 8,192-core Intrepid system, which traversed 6.6 billion graph edges per second.
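As a rough illustration of what that metric means, the toy sketch below runs a breadth-first search over a randomly generated graph and reports traversed edges per second (TEPS). The graph size, the single-node Python implementation, and the simple edge-counting rule are assumptions for demonstration; the real Graph500 runs a distributed search over a far larger synthetic graph.

```python
# Toy illustration of the Graph500-style metric: breadth-first search
# over a random graph, reporting traversed edges per second (TEPS).
# The real benchmark spreads a much larger synthetic graph across many
# nodes; this sketch only shows the shape of the metric.
import random
import time
from collections import defaultdict, deque

def random_graph(num_vertices=100_000, num_edges=1_000_000, seed=0):
    rng = random.Random(seed)
    adjacency = defaultdict(list)
    for _ in range(num_edges):
        u = rng.randrange(num_vertices)
        v = rng.randrange(num_vertices)
        adjacency[u].append(v)      # store both directions of the edge
        adjacency[v].append(u)
    return adjacency

def bfs_teps(adjacency, root=0):
    visited = {root}
    queue = deque([root])
    edges_traversed = 0
    start = time.perf_counter()
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:      # each examined edge counts as traversed
            edges_traversed += 1
            if v not in visited:
                visited.add(v)
                queue.append(v)
    elapsed = time.perf_counter() - start
    return edges_traversed / elapsed

if __name__ == "__main__":
    graph = random_graph()
    print(f"{bfs_teps(graph) / 1e6:.1f} million traversed edges per second")
```

Where the Linpack sketch is dominated by arithmetic, this one is dominated by irregular memory accesses, which is the kind of behaviour the Graph500 is meant to reward.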
Time will tell whether benchmarks such as the Graph500 will supplant the Top500. Like the Oscars, though, the Top500 may have a staying power that defies logic.