HEP researchers are increasingly directed toward high performance computing (HPC) platforms as solutions to current and future production computing problems. Unlike grid-based high throughput computing (HTC) platforms, where grid nodes share similar architectures and runtime environments, HPC resources come in a wide variety of architectures that are often not designed for data-intensive applications. As a result, solutions often (though not always) target a specific type of HPC resource using a target-specific strategy. This situation leads to a complex landscape in which the researcher must select the strategy and resource best suited to the problem they are trying to solve.
A number of efforts are currently underway within the LArSoft community to support running LArSoft on HPC platforms, each using a different strategy that targets a different type of resource. Three of these are listed below:
- Simple: Running LArSoft out of the box inside Singularity containers. Note that in the case discussed here, infrastructure changes at the HPC center were required to make this work. More information can be found on the “Experience running LArSoft out-of-the-box on HPC” page.
- Optimized for HPC: Running LArSoft with algorithms that have been modified and optimized to take full advantage of the acceleration and multi-threading capabilities of a particular HPC resource. This effort required performing native builds of LArSoft code on the HPC platform, along with accompanying changes to the architecture of the algorithms. More information can be found on the “LArSoft algorithm optimization for HPC workflows” page.
- GPU as a service (GPUaaS): Deep learning inference runs with high efficiency and speed on GPUs. While porting general LArSoft computations to GPUs remains an open problem, LArSoft now has a native capability to dispatch deep learning inference tasks to a GPU server. Instructions for doing this can be found on the “Using GPU as a service in LArSoft” page.
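To make the “Simple” strategy above concrete, a batch job of this kind might be sketched as follows. This is only an illustration: the container image path, the CVMFS setup script, the LArSoft release and qualifiers, and the FHiCL file name are all placeholders, and the exact steps are site-specific (see the “Experience running LArSoft out-of-the-box on HPC” page for details).

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# Sketch of running an unmodified `lar` job inside a Singularity container.
# Bind-mount CVMFS (where LArSoft releases are distributed) into the container
# so the standard environment setup works unchanged. All paths, the release
# version, and the FHiCL file below are hypothetical placeholders.
singularity exec \
  -B /cvmfs:/cvmfs \
  /path/to/images/sl7-larsoft.sif \
  bash -c '
    source /cvmfs/larsoft.opensciencegrid.org/setup_larsoft.sh
    setup larsoft v09_00_00 -q e20:prof
    lar -c prodsingle.fcl -n 10
  '
```

Because the container carries the expected OS environment and CVMFS provides the pre-built releases, no native build on the HPC platform is needed; this is what distinguishes the “Simple” strategy from the “Optimized for HPC” one.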