Central storage system


Project origins

A high-performance computing centre, established to provide large-scale computing power to scientific communities from all over Poland, faced the challenge of creating a central storage system for high-capacity data.
The computer centre is one of Poland's leading high-performance computing (HPC) centres. It plays a major role in an extensive project associated with the HPC Infrastructure for Grand Challenges of Science and Engineering (POWIEW) programme, which involves the installation and operation of a leading-edge computing infrastructure across three Polish cities. Large-scale computing systems are made available to researchers in accordance with the principle of open access to infrastructure and software resources.
The computer centre operated an environment built on compute and storage resources from leading vendors, including (among others) IBM Blue Gene/P, IBM Power 775 and NetApp FAS systems, part of whose computing power was allocated to meet the project's needs. When the computing infrastructure ran low on storage capacity, the team of engineers decided to deploy a central storage system for both temporary data (created by the HPC clusters) and long-term data (home directories).

Aim of the project

The computer centre requested the delivery of an efficient high-end solution offering at least 6 GB/s of sequential data throughput, scalable capacity and high density (due to the limited physical space available at the centre). An additional requirement was that all disk space should be available as a single large resource pool shared between multiple computer systems.

Solution

The proposed design for the central storage system consisted of the Dell server platform, NetApp storage arrays, a high-performance Brocade SAN network and the IBM cluster file system. The system provided 2.1 PB of raw disk space across 720 SAS disks, four servers and the SAN network, with a physical footprint of only 48 RU (rack units). The entire raw space of the storage arrays was merged into a single file system with a total capacity of 1.7 PB. Client access to the resulting cluster file system was provided over NFS and 40 Gb/s InfiniBand interfaces. To prevent the communication channel between the cluster file system layer and the array layer from becoming a bottleneck for the entire solution, a high-performance SAN based on 16 Gb/s Fibre Channel technology was built. To ensure the security and integrity of the stored data, RAID 6 was used, protecting against two simultaneous disk drive failures within a single disk group.
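
The ratio between the 2.1 PB of raw space and the 1.7 PB file system is consistent with a RAID 6 layout in which each disk group dedicates two drives to parity. The sketch below reproduces the quoted figures approximately; the 8+2 group geometry and the 3 TB per-disk capacity are assumptions made for illustration and are not stated in the case study.

```python
# Rough capacity check for the described layout. The 8+2 RAID 6 geometry and the
# 3 TB per-disk capacity are assumptions; the case study only states 720 disks,
# 2.1 PB of raw space and a 1.7 PB file system.
DISKS = 720
DISK_TB = 3.0                        # assumed capacity of a single SAS drive
DATA_DISKS, PARITY_DISKS = 8, 2      # assumed RAID 6 group geometry (8+2)

raw_pb = DISKS * DISK_TB / 1000
usable_pb = raw_pb * DATA_DISKS / (DATA_DISKS + PARITY_DISKS)

print(f"raw capacity:    {raw_pb:.2f} PB")     # ~2.16 PB (quoted as 2.1 PB)
print(f"usable capacity: {usable_pb:.2f} PB")  # ~1.73 PB (quoted as 1.7 PB)
```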

Results

The central data storage system was designed to meet current needs and to enable fast, trouble-free expansion through the addition of individual components (arrays, disk shelves, cluster file system nodes) to satisfy the growing demand for such services. This capability was considered particularly important in a rapidly developing scientific environment and was therefore given high priority.
The use of a cluster file system enabled the creation of a contiguous, high-performance, shared disk space that tolerates the simultaneous failure of 3 of the 4 available access nodes while maintaining continuous access to data. System tests carried out with the participation of Client representatives showed sequential throughput of 9 GB/s for writes and 8 GB/s for reads from the file system. These results exceeded one of the main project objectives (system performance of at least 6 GB/s), providing an additional performance reserve.
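
For context, a single-client sequential write test of the kind used to obtain such figures could be sketched as follows; the mount point and file name are hypothetical, and a full acceptance test would aggregate results from multiple clients and access nodes running in parallel.

```python
import os
import time

def sequential_write_throughput(path, total_bytes, block_size=8 * 1024 * 1024):
    """Stream `total_bytes` of zeros to `path` and return the throughput in GB/s."""
    buf = b"\0" * block_size
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            f.write(buf)
            written += block_size
        f.flush()
        os.fsync(f.fileno())   # ensure the data has actually reached the file system
    elapsed = time.monotonic() - start
    return written / elapsed / 1e9

if __name__ == "__main__":
    # /gpfs/scratch is a hypothetical mount point for the cluster file system
    gbps = sequential_write_throughput("/gpfs/scratch/bench.dat", 32 * 1024**3)
    print(f"sequential write: {gbps:.1f} GB/s (project target: at least 6 GB/s)")
```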
Because the solution was built on technology already in use in the computer centre's environment, the resulting system could, to the satisfaction of the Client, be managed by maintenance department staff without additional training, as they were already familiar with the required management procedures.

Technology

  • Storage arrays: NetApp E5460
  • Servers: Dell R620
  • SAN switches: Brocade 5300
  • Operating system: Red Hat Enterprise Linux Server 6.3
  • Cluster file system: IBM General Parallel File System (GPFS) 3.5