Reproducible High Performance Computing

ReHPC is about high-performance scientific computation. High performance is only one aspect of this. Reproducibility is another, as are the related aspects of stability and reliability.

There appears to be utter confusion about the meaning of reproducibility (or different people simply mean different things by it).

On the software side, the most fundamental misunderstanding is that containers resolve the problem of reproducibility. Dependency hell remains unresolved, but that is not even the biggest of our problems. Typical use of containers by scientists implies obfuscated reproducibility, and that is the bigger issue. For instance, a typical scientist's workflow is to build a Docker container that provides their own software together with a wide range of dependencies, so that fellow scientists can run the code on their own machines. Freedom 0 (running the code) is thus fulfilled, as is freedom 2 (provided they chose a free software licence compatible with all the dependencies), but exercising freedoms 1 and 3 (studying and changing the code, and releasing your improvements) is far from trivial. This is where ReHPC comes in. We give advice and, when needed, provide design and programming services to make your software run reliably and your results reproducible.

On the hardware side, HPC almost always emphasises the high-performance aspect, for quite obvious reasons. As a result, other aspects often suffer. These include freedom, stability and predictability. To achieve acceptable levels of those properties, you need the right hardware and the right software. In many applications, proprietary microcode (e.g. Intel CPUs or network cards) may be an acceptable compromise (after all, the low-level instructions are documented and the OS and everything therein may be free software). However, there can be bugs and side effects. Our main focus is therefore ARM and POWER (instead of Intel and Nvidia).

Worse, many HPC companies only provide non-free systems with proprietary hardware components and thus, inevitably, proprietary drivers. This puts a strain on the system administrator. Moreover, upgrading such a driver sometimes even changes the results of a computational model.

On the scientific side, everything must be as portable as possible and as easy as possible for the scientific users and administrators of the system. Vendor lock-in must be avoided, and the autonomy of every type of user of the system must be maximised.

On the legal side, we provide advice concerning licensing and copyright. Here we simplify matters by starting from the principle of Coherent Open Source, a term coined by Bruce Perens, co-founder of the Open Source Initiative (OSI).

In keeping with the ideas of reproducibility and stability, we provide advice and services for a range of platforms, most prominently ARM64 and POWER-based systems. The operating system depends on what the administrator and the scientific users are most comfortable with and what their aims are. The options include OpenBSD, DragonFly BSD, GNU Guix, Devuan GNU/Linux and Rocky Linux (sic). This covers a wide range of system types, which we believe only strengthens the reproducibility and stability aspects (because of the porting efforts involved).

We tend to work together with smaller companies like SiPearl and Raptor Computing Systems, but for some projects also with large companies like IBM and HPE.

Contact us for more information.