Building Titan: The ‘world’s fastest’ supercomputer

18 November 2014

John Pavlus

Features correspondent

An exclusive, behind-the-scenes look at the US bid to build a radical new machine, capable of solving some of the most complex questions in science today. Its secret: video game technology.

The sound of 20 quadrillion calculations happening every second is dangerously loud. Anyone spending more than 15 minutes in the same room with the Titan supercomputer must wear earplugs or risk permanent hearing damage. The din in the room does not come from the computer's 40,000 whirring processors, but from the fans and water pipes cooling them. If the dull roar surrounding Titan were to fall silent, those tens of thousands of processors doing those thousands of trillions of calculations would melt right down into their racks.

Titan is expected to become the world's most powerful supercomputer when it comes fully online at the US Oak Ridge National Laboratory, near Knoxville, Tennessee, in late 2012 or early 2013. But on this afternoon in mid-October, Titan isn't technically Titan yet. It's still a less-powerful supercomputer called Jaguar, which the US Department of Energy (DoE) has operated and continuously upgraded since 2005. Supercomputing power is measured in Flops (floating point operations per second), and Jaguar was the first civilian supercomputer to break the "petaflop barrier" of one quadrillion operations per second (a quadrillion is a one followed by 15 zeroes). In June 2010 it was the fastest supercomputer on Earth.
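For readers who want to check the arithmetic, the units can be written out as a short Python sketch. The names `FLOPS_PER_PETAFLOP` and `to_petaflops` are illustrative and do not come from any benchmarking tool; the 20 quadrillion figure is the one quoted for Titan at the top of this article.

```python
# Illustrative unit helpers; these names are hypothetical, not from any benchmark suite.
FLOPS_PER_PETAFLOP = 10**15  # one quadrillion: a one followed by 15 zeroes


def to_petaflops(flops: int) -> float:
    """Convert a raw operations-per-second figure into petaflops."""
    return flops / FLOPS_PER_PETAFLOP


# Titan's quoted rate: 20 quadrillion calculations per second.
titan_flops = 20 * 10**15
print(to_petaflops(titan_flops))  # 20.0
```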

But high-performance computing records don't last long: a Chinese machine pushed Jaguar into second place just six months later. Then in October 2011, the supercomputer design firm Cray announced that it would transform Jaguar into a new machine that could retake the number-one spot, with an estimated peak performance of 20 petaflops.

Cray's blue-jacketed technicians have been pacing up and down Jaguar's catacomb-like aisles for months, opening its 200 monolithic black cabinets and sliding out its processor blades like enormous safe-deposit boxes. Jaguar's brain surgery takes place on spartan worktables that wouldn't look out of place in a hobbyist's garage. A technician fits a paperback-sized ingot of metal and silicon into an empty space in the blade and fastens it into place with a battery-powered screwdriver. The ingot contains a graphics processing unit, or GPU. Cray has installed one of these GPUs alongside every one of Jaguar's 18,688 CPU chips. It's this "hybrid architecture" that will turn Jaguar into Titan, packing an order of magnitude more computing horsepower into the same amount of physical space.

‘Turbo-charged’

GPU-accelerated supercomputers burst onto the world stage in 2010, when China's Tianhe-1A machine overtook Jaguar as the fastest supercomputer on Earth. "It came out of nowhere," says Wu-chun Feng, a high-performance computing expert at Virginia Tech. "China didn't even have a high-performance computing program." Instead of relying solely on expensive, highly customized, multicore microprocessors, Tianhe-1A got a speed boost by using "off the shelf" GPUs made by Nvidia, whose chips power the displays of video-game consoles and consumer laptops. Titan takes the same approach using the same chip design that powers the ultra-high resolution Retina display on Apple’s MacBook Pro. These intricate squares of silicon will provide 90% of Titan's peak supercomputing performance.

So, what do video-game graphics have in common with high-end scientific computing? Simulation. "About ten years ago, we observed that the chips we designed for gaming were starting to look more like general purpose processors for simulating physics," says Sumit Gupta, Nvidia's senior director of high performance GPU computing. "When you'd shoot a tree in a video game and it would fall, you'd want it to look natural, so the simulations became more and more complex."

At the same time, redrawing every pixel on an HD laptop screen 60 times per second also requires so-called parallel computation. "This is why GPUs are designed to run hundreds of calculations at the same time very efficiently," says Steve Scott, Tesla chief technology officer at Nvidia. "It turns out that this is very similar to the way high performance scientific computing is done, where you're simulating the climate, or the interactions between drug molecules, or the airflow over a wing."
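The idea behind that kind of parallel computation can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions: the `shade` function and the tiny 8-by-4 "screen" are invented for the example, and Python threads stand in for GPU hardware without delivering its speed. The point is that each pixel's value depends on nothing but its own coordinates, so the work can be handed out to many workers at once and still give the same picture.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4  # a tiny stand-in "screen"; real displays have millions of pixels


def shade(pixel):
    """Compute one pixel's brightness (0-255) from its own coordinates alone.

    Hypothetical formula, chosen only so every pixel gets a distinct-looking value.
    """
    x, y = pixel
    return (x * 31 + y * 17) % 256


pixels = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]

# Serial version: one pixel after another.
serial_frame = [shade(p) for p in pixels]

# Parallel version: the same function handed to a pool of workers.
# No pixel depends on any other, so the work can be split freely.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_frame = list(pool.map(shade, pixels))

print(serial_frame == parallel_frame)  # True
```

A scientific simulation that updates every cell of a grid with the same rule has the same shape, which is why hardware built for pixels can be put to work on physics.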

But where video game physics only have to look real enough to a distracted teenager, supercomputer simulations have to be scientifically accurate down to the level of individual atoms - which is why Titan needs tens of thousands of GPUs all working together on the same problem, not to mention enough Random Access Memory (RAM) to hold the entire simulation in memory at once. (Titan has 710 terabytes of RAM, about as much as a stack of iPads 7km high.)

But supercomputers have been getting along without GPUs for decades. A CPU chip - the same general-purpose silicon "brain" inside your laptop, your smartphone, and every computer at Google or Facebook - can run high-performance scientific calculations, too, if you chain enough of them together. The current fastest supercomputer, IBM's "Sequoia" system at Lawrence Livermore National Laboratory in California, contains over 98,000 CPUs, each with 18 cores.

What GPUs offer that CPUs can't is a blast of relatively cheap, energy-efficient horsepower. Scaling up the Jaguar supercomputer from 1.75 petaflops to 20 could have been done by adding more cabinets stuffed full of CPUs. But those take up space, and more importantly, suck up power. Off-the-shelf GPUs, meanwhile, aren't designed to act self-sufficiently like normal chips - they're add-ons "that accelerate a CPU like a turbo engine," says Gupta - so they consume much less energy than a CPU would to do the same amount of calculating. By bolting a GPU onto each one of the 18,688 AMD Opteron CPU chips already in Jaguar, the DoE was able to create a next-generation supercomputer without scrapping the one they already had - or blowing up their electric bill.

Bigger is better

The new machine, like any supercomputer, is all about speed: "time to solution," as Jack Wells, director of science for Oak Ridge’s computing facility, puts it. "It's about solving problems that are so important that you can't wait," he says. "If you can afford to wait, you're not doing supercomputing." Competition among research projects for "core hours" on Titan is intense. Of the 79 new-project proposals received by Oak Ridge's selection panel, only 19 will run on Titan in 2013.

Winning proposals will apply Titan's computational might to problems in areas such as astrophysics (simulating Type-1A supernovae and core collapses), biology (modeling human skin and blood flow at a molecular level), earth science (global climate simulations and seismic hazard analysis of the San Andreas fault in California), and chemistry (optimizing biofuels and engine combustion turbulence). According to Buddy Bland, project director of the Oak Ridge computing facility, Titan will typically run four or five of these supercomputing "jobs" at once.

But some jobs are so complex that they'll take over Titan entirely. The Princeton Plasma Physics Laboratory, for example, will use all of Titan's computing cores to help design components for the International Thermonuclear Experimental Reactor (Iter), a prototype nuclear fusion project in France. "Their goal is to have this reactor online by 2017," Bland says. "It'll use magnetic fields to circulate plasma through a big donut-shaped reactor at 100 million degrees Fahrenheit. How do you contain that kind of energy? That's what they need Titan to help them figure out."

As fast as Titan is, these simulations can still take days, weeks, or even months to complete. And the very idea of "fast" has a different meaning to computational scientists than it does to users of consumer apps like Photoshop or Final Cut Pro. "It's not so much about running our applications and calculations faster - we want to run them bigger," says Tom Evans, a scientist at Oak Ridge who uses the supercomputer to model nuclear reactor systems. "Maybe that means adding four times more spatial resolution in our simulations, or replacing approximations with more accurate physics. Of course we always like to go faster. But it's less interesting to do the same science faster than it is to do something new that you couldn't even do before."

In other words, bigger is better - and not just for the scientific bragging rights. Having a top-ranked supercomputer on American soil "demonstrates global competitiveness and attracts brainpower," says Jack Wells. Take Jeremy Smith, director of Oak Ridge's Center for Molecular Biophysics, who used to work at the University of Heidelberg in Germany. "I found out that Oak Ridge would have this nice toy to play with," he says, "so I nipped across the pond." (Smith's research on biofuels began on Jaguar and will continue on Titan.)

Power play

Many of the smart people that Titan attracts will use the supercomputer to chart the future of supercomputing itself. So-called petascale machines like Titan and Sequoia can accomplish amazing feats of simulation, like screening millions of potential drug compounds against a target molecule in a single day. But researchers like Jeremy Smith want to do even more.

They envisage an "exascale" computer - about 50 times more powerful than Titan at its 20-petaflop peak and able to do one quintillion calculations per second (a quintillion is a one with 18 zeroes after it). A machine like this "would have enough computing power to screen tens of millions of drug compounds against all known living protein classes," Smith says. "That means we'll be able to predict if the drug will work and what all the side effects will be - not only generically, but for individual people, based on their own genetic sequences. This is amazing potential."

The trouble with building an exascale machine, however, is the amount of energy required to get there. "If we just scaled up what we're doing today, it would take a couple of nuclear power plants to power," says Buddy Bland. But Wu Feng, who curates an annual list of the world's most energy-efficient supercomputers, is less pessimistic. "The trends indicate that we'll be able to get to the exascale for 50 megawatts," he says. That's about half as much power as Apple and Google’s data centers in North Carolina are estimated to use.

But government-funded scientific institutions don't have tech companies’ bottomless bank accounts. The DoE wants an exascale computer by 2020 that can run on 20 megawatts of electricity or less. Reaching that goal will require entirely new chip designs that draw even less power than the GPU-accelerated systems like Titan do.
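The gap between those two power budgets is easier to see as efficiency, meaning calculations per second for each watt consumed. The sketch below uses only figures quoted in this article (one quintillion calculations per second, Feng's 50 megawatts and the DoE's 20 megawatts); the function name `gigaflops_per_watt` is an illustrative choice.

```python
EXAFLOP = 10**18   # one quintillion operations per second
MEGAWATT = 10**6   # watts


def gigaflops_per_watt(flops, watts):
    """Efficiency in billions of operations per second per watt."""
    return flops / watts / 10**9


doe_target = gigaflops_per_watt(EXAFLOP, 20 * MEGAWATT)      # DoE's 20-megawatt goal
trend_estimate = gigaflops_per_watt(EXAFLOP, 50 * MEGAWATT)  # Feng's 50-megawatt estimate
print(doe_target, trend_estimate)  # 50.0 20.0
```

On these numbers, the DoE's goal asks for two and a half times the efficiency that Feng's trend line implies.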

Mobile devices, most of which use chip designs from the UK firm Arm, could offer a way forward. "You've probably noticed that when you put a smartphone in your pocket it doesn't burn through your pants," says Jack Wells. "The same design principle is going to be used in high-performance computing to get to the exascale." Jack Dongarra, a computer scientist at the University of Tennessee whose Top500 list ranks the world's fastest supercomputers, ran benchmarking software on an iPad 2 and found that the tablet was equivalent to some of the fastest supercomputers of the mid-1990s. "That's incredible computing power in your hand," he says. "The Arm processor is clearly capable."

Still, simply lashing together thousands of low-power processors - whether they come from smartphones, gaming consoles, or laptops - does not a supercomputer make. Passing data between all those chips creates bandwidth bottlenecks that limit the total speed of the system. "It's like having two hemispheres of your brain on opposite sides of the room connected by a wire," says Feng. An exascale computer will have to speed up its entire internal network - perhaps by using fibre optic connections between racks of chips, accelerators on every piece of silicon, or both.

Meanwhile, says Buddy Bland, jockeying for the title of "world's fastest supercomputer" will continue, and no single interconnect design or chip architecture is "best." "Whoever has the biggest budget is likely to be in the top spot," he says wryly. "But a healthy diversity in architectures is a wonderful thing because certain applications can run well on one, and others well on another."

What's indisputable is that supercomputing has become the "third pillar" of doing science, alongside theory and experimentation. The best way to grasp the power of Titan, says Bronson Messer, a computational astrophysicist at Oak Ridge, is not to compare it to a Formula 1 racing car or a turbocharged engine, but to the Large Hadron Collider. "Titan is like the particle accelerator, and the simulations and applications that we run on Titan are like the detectors that discovered the Higgs boson," Messer says. "The size or power of these machines isn't what pushes science forward. It's the people using them, who know what to look for."
