‘Charliecloud’ stack simplifies HPC big data analysis
Tue 13 Jun 2017
Researchers at Los Alamos have developed a new product that simplifies the use of supercomputers for big data analysis.
‘Charliecloud’ supports big data analysis on the Los Alamos supercomputers with a containerized approach that lets users run their own software stack, isolated from the host operating system.
Charliecloud itself is compact: roughly 800 lines of code, consisting of 500 lines of C and 300 lines of shell, built to work with the open source Docker platform. Users download Docker to build customized container images, then move those images from Docker to a supercomputer, where Charliecloud runs their analysis applications.
Users gain administrative freedom over their own customized big data analysis while drawing on Los Alamos supercomputing power. Because different applications are kept separate from one another, the security of the system as a whole is unaffected.
Reid Priedhorsky, lead developer of the Charliecloud system, said, “Charliecloud lets users easily run crazy new things on our supercomputers. Los Alamos has lots of supercomputing power, and we do lots of simulations that are well supported here. But we’ve found that Big Data analysis projects need to use different frameworks, which often have dependencies that differ from what we have already on the supercomputer. So, we’ve developed a lightweight ‘container’ approach that lets users package their own user-defined software stack in isolation from the host operating system.”
Charliecloud was designed to provide a simple, standard, interoperable workflow that runs on existing HPC hardware with minimal modifications. Because it is implemented using Linux user namespaces, it needs no privileged or trusted operations on center resources, so there is no need to manage elevated user privileges.
Charliecloud is currently in production on two Los Alamos supercomputers, and the team has found that applications run in its containers perform as well as they do without containers.
Charliecloud grew out of increasing demand for user-defined software stacks (UDSS) within the Los Alamos central computing environment. UDSS are complex and vary widely, yet big data analysis depends on them. Supporting them, however, burdens the computing center with minimizing the security and performance risks they pose to the wider environment.