NOAA's next supercomputer will be a Cray-IBM hybrid system

On Jan. 14, the U.S. upgraded its main weather forecasting model, which subsequently did a very good job in predicting the track of last week's East Coast blizzard. It correctly predicted that heavier snows would be east of New York City, even as the official weather forecast -- based on a mix of computer models -- had the city getting buried in two feet of snow.

When the forecast snow totals didn't materialize, there was some political fallout as officials struggled to explain to residents why they had closed schools and shut down public transportation.

The model that got the storm track right -- the updated U.S. Global Forecast System (GFS) model -- now runs on a relatively small 213-teraflop supercomputer (with each teraflop representing one trillion floating point operations per second). Ahead of the big storm, that hardware was running at near capacity.

Now, that system is about to get a serious upgrade to 776 teraflops.

The new system is now in acceptance testing and by October, the National Weather Service (NWS), which is part of the National Oceanic and Atmospheric Administration (NOAA), expects to have two 2.5-petaflop systems up and running. The Weather Service actually runs two GFS systems, one in Virginia and the other in Florida, with one serving as a potential backup.

The petaflop system (each petaflop equals 1,000 teraflops) will be a little unusual, thanks to complications arising from IBM's sale of its x86 server line to Lenovo.
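The unit arithmetic behind those figures can be sketched out directly. This is an illustrative calculation based on the numbers reported in this article, not NOAA code; the variable names are invented for clarity:

```python
# Illustrative unit arithmetic based on figures in the article (not NOAA code).
TERAFLOP = 1e12          # one trillion floating point operations per second
PETAFLOP = 1000 * TERAFLOP

old_system = 213 * TERAFLOP      # GFS machine that ran the blizzard forecast
interim_system = 776 * TERAFLOP  # upgraded system now in acceptance testing
target_system = 2.5 * PETAFLOP   # each of the two planned 2.5-petaflop systems

# The interim upgrade is roughly a 3.6x jump over the old hardware;
# the petaflop target works out to nearly a 12x jump.
print(round(interim_system / old_system, 1))
print(round(target_system / old_system, 1))
```

In other words, the October systems would each offer close to twelve times the compute capacity of the machine that was running near capacity ahead of the blizzard.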

NOAA uses an IBM System x iDataPlex running Intel Sandy Bridge chips and IBM was scheduled to deliver an upgrade for it last year. But the then-pending sale of its x86 division to Lenovo changed that plan.

Lenovo is a Chinese company, and the sale created security concerns and procurement complications for the U.S. government.

Rather than rebidding the hardware, NOAA worked out a new agreement with IBM to subcontract with Cray to create an integrated system that would reach the 2.5-petaflop goal, according to Michelle Mainelli, deputy director of central operations for NOAA's National Centers for Environmental Prediction.

These will be integrated platforms: they combine the compute capacity of two versions of the IBM iDataPlex systems, which use Intel's Sandy Bridge and Ivy Bridge processors, with the Cray XC40. The Cray hardware uses Intel's Haswell chips and runs Linux. InfiniBand switches tie the systems together.

Having these mix-and-match systems has turned out to be an advantage, said Mainelli. There are subtle differences in running systems on three different platforms, but that "just makes us smarter and more prepared for what comes next," she said.

NOAA will continue to source parts from IBM, not from Lenovo, to maintain the platform.

Integrating separate systems is not uncommon, said Steve Conway, a high performance computing (HPC) analyst at IDC. Many large users have multiple systems, "and I'm sure those are not all isolated from each other."

IBM's sale of its x86 server line-up to Lenovo, however, "is really causing a shift in the whole supercomputing market," said Conway. For years, Hewlett-Packard and IBM were nose-to-nose in the lead for HPC market share, but with the divestiture, IBM is now half the size it was in the supercomputing market. Its HPC efforts are now focused on its Power-based systems.

The upgraded GFS model now in use adds new detail and data to the weather forecast picture, offering better parity with the European Centre for Medium-Range Weather Forecasts (ECMWF) model. An even newer GFS model is expected to be in use roughly a year from now, and researchers will be able to run it on the more powerful petaflop system.

That new hardware will also be useful for other tasks. NOAA's hurricane model will begin running on the 776-teraflop system at the start of this year's Atlantic hurricane season. An upgrade to that model, designed to take advantage of that system, is due in late May.

While the GFS model may have gotten the recent blizzard right, weather forecasters use multiple models and still must weigh scientific uncertainty in making their forecasts. The ECMWF model, for instance, did a better job than the U.S. model in forecasting the path of Hurricane Sandy in 2012. That became something of a sore point in the U.S. -- in part leading to the recent model updates and hardware upgrades.


Patrick Thibodeau
