HPC Benchmark Study using AWS and Azure

Nov 14, 2015

Summary

Risk analysis during the day, music and video streaming in the evenings and on weekends, and a tsunami early warning simulation whenever one is needed: all of this is possible with cloud computing. With its flexibility and cost effectiveness, cloud computing supersedes conventional IT in many areas.

Motivation

Cloud systems are useful in many scenarios and are used by a broad spectrum of user groups. For example, an insurance company can perform extensive risk analysis during the day on the same cloud system that internet portals use in the evening, after work hours, to stream music (Spotify) or videos (Spiegel TV). Amazon Web Services (AWS) is currently the global leader in cloud services, especially Infrastructure as a Service (IaaS). Other well-known IT giants such as Google and Microsoft are also expanding their cloud service offerings at breakneck speed.

The High Performance Computing (HPC) sector demands flexible and scalable IT systems just as much as the commercial sector does. The simulation behind a tsunami early warning system serves as a specific example.

Data such as seismic shocks and radar measurements of waves must be interpreted to determine whether a tsunami threatens a coastal region. To this end, the readings are combined with the topography of the seabed in a computational model, so that extensive calculations can establish whether a wave is only a few centimeters high or several meters high and thus life-threatening. If people in the coastal region are to be warned in time and rescue measures are to be taken, the time between the warning and the actual arrival of the wave at the coast is crucial.

Because predictions must be fast and timely, computers in data centers have to be kept on constant standby so that the necessary calculations can be performed in an emergency. This consumes a large amount of energy and money and is difficult to realize, especially in developing countries.

Axtrion, in cooperation with the Numerical Methods in Geosciences team at the University of Hamburg, applied an Infrastructure-as-a-Service (IaaS) cloud to this use case, leveraging its flexible, cost-effective and scalable IT resources. In an emergency, computational resources that are being used for financial risk analysis or music streaming, for instance, must be made available within seconds to launch a tsunami simulation and predict the wave height for each area.

Project Realisation

Based on this idea, the University of Hamburg and Axtrion conducted a feasibility study intended to show that the required computing resources, distributed across the cloud network, can be allocated quickly enough to carry out the calculation in a reasonable time (15 minutes). The test runs used special algorithms that report indicators such as network performance, compute performance and data I/O.
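The kind of indicator collection described above can be illustrated with a small sketch. This is not the benchmark code used in the study, which is not reproduced here; the function names, workload sizes and the `fits_budget` helper are assumptions chosen for illustration. It times a fixed compute loop and a write/read cycle on a temporary file, and checks a projected runtime against the 15-minute target.

```python
import os
import tempfile
import time

# The 15-minute target named in the feasibility study, in seconds.
TIME_BUDGET_SECONDS = 15 * 60


def time_compute(n: int = 200_000) -> float:
    """Return the seconds taken by a fixed floating-point loop (compute indicator)."""
    start = time.perf_counter()
    total = 0.0
    for i in range(1, n + 1):
        total += (i * 0.5) ** 0.5
    return time.perf_counter() - start


def time_data_io(size_bytes: int = 4 * 1024 * 1024) -> float:
    """Return the seconds taken to write and read back a temporary file (data-I/O indicator)."""
    payload = os.urandom(size_bytes)
    start = time.perf_counter()
    with tempfile.NamedTemporaryFile() as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())  # force the data out of the OS cache
        handle.seek(0)
        data = handle.read()
    elapsed = time.perf_counter() - start
    if len(data) != size_bytes:
        raise RuntimeError("read back fewer bytes than were written")
    return elapsed


def fits_budget(seconds_per_step: float, steps: int,
                budget: float = TIME_BUDGET_SECONDS) -> bool:
    """Hypothetical helper: does a run of `steps` steps fit into the time budget?"""
    return seconds_per_step * steps <= budget


if __name__ == "__main__":
    print(f"compute: {time_compute():.4f} s")
    print(f"data I/O: {time_data_io():.4f} s")
```

A network indicator would additionally need a second machine or an MPI ping-pong test, which this single-machine sketch leaves out.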

The study was implemented on Amazon Web Services (AWS) EC2 virtual machines by porting the existing software code of the University of Hamburg team. To compile the source code on these machines, the compiler parameters were optimized and the message-passing libraries (MPI and Open MPI) were adjusted.

After the simple provisioning of IT resources in the cloud and the porting of the existing software, a system for testing the cloud infrastructure and carrying out the feasibility study was ready in just a few days. Axtrion provided consulting services and support to the University of Hamburg on the provisioning and use of cloud services. Once the source code, the MPI library and the compiler options had been adjusted to the demands of the virtual AWS EC2 machines, Axtrion conducted the test runs and analyzed and interpreted the results together with the scientists.

Benefits

The cloud is ideally suited to the selected use case. The AWS EC2 instances delivered good measurement results. The machine types used were more expensive than smaller (and significantly less powerful) types, but they proved highly cost-effective because they are provisioned only on demand. By comparison, a conventional data center dedicated to this scenario would cost substantially more because of the high up-front investment in buildings, hardware, air conditioning, network technology and personnel, and because of its long standby times.
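The cost argument can be made concrete with a little arithmetic. The sketch below is an illustration only: the study publishes no prices, so the function names are hypothetical and every rate, instance count and duration in the usage example is a placeholder, not a measured or quoted figure.

```python
def on_demand_cost(hourly_rate: float, instances: int, hours_used: float) -> float:
    """Cost of machines that are paid for only while a simulation runs."""
    return hourly_rate * instances * hours_used


def standby_cost(hourly_rate: float, instances: int, hours_in_period: float) -> float:
    """Cost of machines that are kept available for the whole period."""
    return hourly_rate * instances * hours_in_period


if __name__ == "__main__":
    # Placeholder inputs (assumptions): 10 large instances at 2.00 per hour
    # for one 15-minute run, versus 10 smaller machines at 0.50 per hour
    # kept on standby for a 720-hour month.
    print(on_demand_cost(2.0, 10, 0.25))
    print(standby_cost(0.5, 10, 720))
```

The point of the comparison is structural: the on-demand cost scales with the hours actually used, while the standby cost scales with the length of the whole period, regardless of whether an emergency occurs.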

“The benchmark comparison shows that using highly available cloud systems in the field of high-performance computing and simulation for complex model calculations in emergency situations makes sense. In addition to the simple, on-demand, scalable and fail-safe use of virtually limitless IT resources in the cloud, the cost factor also plays an essential role for us as a research institution, because even for larger simulation projects we would not need to make any upfront investments.”
Prof. Dr. Jörn Behrens

Professor of Numerical Methods in Geosciences, Department of Mathematics, University of Hamburg