EasyBuild

Overview

This section of best practice is based on the wisdom distilled from the EasyBuild Workshop at the University of Birmingham, 22nd October 2018, hosted by Andrew Edmondson, Research Software Group Leader, Advanced Research Computing, with Kenneth Hoste, lead EasyBuild developer at the University of Ghent.

22 people from the UK academic HPC community attended the event.

Many thanks to Andrew for organising it, and to Kenneth and the University of Ghent for presenting, as well as the other volunteer presenters giving their experiences with EasyBuild.

Agenda

The agenda for the meeting was:

10:00 Registration and coffee
10:15 EasyBuild introduction/history/future (Kenneth Hoste)
11:00 How it’s done at various universities
11:45 Questions and open discussion of common problems
12:30 Lunch
13:00 Contributing back to the community (Kenneth Hoste)
14:00 Workshop time where we help each other solve problems
14:30 Coffee
16:00 Close

Presentations

The presentations are attached here.

Questions and Answers

Managing package updates

Q: How do you manage package updates? Say I want to rebuild a package with some additional compiler options. Do you just build in place, or build a new version?

A: It is best to build a new version, so that the old version is retained for reproducibility. Customise the build script to add a qualifier (for example a version suffix) that distinguishes the new build. An in-place build can also be done.
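As a minimal sketch of the "additional qualifier" approach: easyconfig files are written in Python syntax, and a versionsuffix keeps the rebuilt package separate from the original. The package name abc, the suffix -opt and the toolchainopts value are illustrative assumptions, and the module name assumes EasyBuild's default module naming scheme.

```python
# Fragment of a hypothetical easyconfig, abc-1.2.3-foss-2017a-opt.eb.
# All values are illustrative assumptions, not a tested recipe.
name = 'abc'
version = '1.2.3'
versionsuffix = '-opt'  # qualifier that keeps this build apart from the original

toolchain = {'name': 'foss', 'version': '2017a'}
# Placeholder for the additional compiler options of this variant.
toolchainopts = {'pic': True}

# Not part of the easyconfig itself: with the default module naming scheme the
# resulting module is <name>/<version>-<toolchain name>-<toolchain version><versionsuffix>.
module_name = '%s/%s-%s-%s%s' % (name, version, toolchain['name'],
                                 toolchain['version'], versionsuffix)
print(module_name)
```

The original abc/1.2.3-foss-2017a module stays installed next to the new one, so existing job scripts keep working.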

Q: How do you manage package updates? Say, moving abc-1.2.3-foss-2017a to abc-1.2.3-foss-2018b? Can you automate this along with the dependencies?

A: Yes. For example, if you have Python 3.7 you can try eb original-version.eb --try-software-version 3.7.1. This keeps the original dependencies, which may or may not be correct, so you can use the cached EB version as the basis for a new EB file if required.

For toolchains, use eb easybuild-file.eb --try-toolchain foss,2018.08. In this case new versions of the dependencies are created.
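The two update routes above can be sketched as a small helper that assembles the eb command line. The helper name eb_try_command and the file name abc-1.2.3-foss-2017a.eb are hypothetical. The options used are --try-software-version, --try-toolchain (which takes a name,version pair, whereas --try-toolchain-version takes the version only), --robot and --dry-run.

```python
import shlex


def eb_try_command(easyconfig, software_version=None, toolchain=None, dry_run=True):
    """Build an eb command line that derives a new build from an existing easyconfig."""
    cmd = ['eb', easyconfig]
    if software_version:
        # Same toolchain and dependencies, newer software version.
        cmd.append('--try-software-version=%s' % software_version)
    if toolchain:
        # Same software version, different toolchain: (name, version) pair.
        cmd.append('--try-toolchain=%s,%s' % toolchain)
    cmd.append('--robot')  # resolve and build missing dependencies as well
    if dry_run:
        cmd.append('--dry-run')  # only report what would be built
    return cmd


cmd = eb_try_command('abc-1.2.3-foss-2017a.eb', toolchain=('foss', '2018b'))
print(' '.join(shlex.quote(arg) for arg in cmd))
```

Starting with a dry run shows which dependencies would be regenerated before anything is built. The same list can then be handed to subprocess.run(cmd, check=True) with dry_run=False.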

Doing this can lead to dependency hell, for example when a lot of packages are compiled into R. You can create a bare-bones version and then side-load other modules with EB, or let the users compile against R themselves, although this can cause issues for users. For dependencies you can use module load or module req in higher-level modules (e.g. module load numpy-1.18-python-2.7.12 loads python-2.7.12 first).

Ghent builds for the new foss toolchain every 6 months. easyupdate?

Note that Anaconda installs pre-compiled binaries built for a generic x86 platform, which may not be efficient, as they won’t necessarily include AVX, AVX2 and other optimisations.

Linking to existing Intel installation

Q: How do I set up intel-2017a or similar to find my existing Intel installation?

A: This can be done via external modules; see the documentation. Use the metadata file to specify the prefix, etc., and pass the module to EasyBuild. You will need to create a custom toolchain for this, marking your existing Intel modules as EXTERNAL_MODULE in the toolchain definition. You must also have icc/2017a, etc. available as modules. For MKL you will have to build the FFTW wrappers for MKL, as well as having MKL itself.
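As a rough sketch of the metadata file mentioned above: it is an INI-style file with one section per external module, giving the software name, version and installation prefix, and it is passed to EasyBuild with --external-modules-metadata. The versions and paths below are made-up placeholders, and the helper external_modules_metadata is hypothetical.

```python
import configparser
import io


def external_modules_metadata(modules):
    """Render an INI-style metadata file with one section per external module."""
    parser = configparser.ConfigParser()
    for module_name, info in modules.items():
        parser[module_name] = info
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


# Placeholder values: replace with what is actually installed on your system.
intel_modules = {
    'icc/2017a': {'name': 'icc', 'version': '2017.1.132', 'prefix': '/opt/intel/2017'},
    'imkl/2017a': {'name': 'imkl', 'version': '2017.1.132', 'prefix': '/opt/intel/2017/mkl'},
}

metadata_text = external_modules_metadata(intel_modules)
print(metadata_text)

# In an easyconfig or toolchain definition, the existing module is then
# referenced as an external module, for example:
#   dependencies = [('icc/2017a', EXTERNAL_MODULE)]
```

The printed text would be saved to a file and given to eb via --external-modules-metadata, so EasyBuild knows which software each pre-existing module provides.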

There is also an icc-system option. gcc-system works well for GCC, but it is unknown how well the equivalent works for icc. There is a similar option for Intel MPI. However, system-mkl does not exist for MKL.

Doing this is more difficult than downloading the ICC installation files.

Further information from post-workshop experience:

It has turned out to be relatively simple to achieve this by using an intel/2017a module that includes icc/2017a, ifort/2017a, imkl/2017a and impi/2017a, each of which is a simple module that loads the corresponding pre-existing Intel module on the system. To get FFTW to work, the wrappers need to be compiled, and the resulting libraries moved to the MKL library location. Instructions are available from Intel on how to do this.

Custom toolchains

Q: How do I define a custom toolchain to link with stuff I’ve already built?

A: Use modules together with EXTERNAL_MODULE, but note that this is prone to failure.

Q: Should I define a custom toolchain?

A: Not recommended.

Further information from post-workshop experience:

EasyBuild provides toolchains for YYYYa and YYYYb for GCC, Intel, etc., which may not match what is installed, but a YYYY.XX toolchain can be defined to use a different version. This mostly works, but some dependencies are not easily built, and it can be time consuming, as --try-toolchain needs to be used to convert the nearest equivalent YYYYa or YYYYb EasyBuild models to YYYY.XX. Also, the end-user software built in this way will not have gone through the EasyBuild test process. It is much faster to install the elements for the required toolchain.

Reloc Errors

Q: My builds end up failing with reloc errors. How do I work around this in EasyBuild?

A: This usually happens when binutils is not in the EasyBuild definition, or when different binutils versions are mixed.
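One way to read the answer above is to list binutils explicitly in the easyconfig. This is a sketch under assumptions: the GCCcore 6.3.0 and binutils 2.27 pairing is an example, and the has_binutils check is only there for illustration and is not part of an easyconfig.

```python
# Fragment of a hypothetical easyconfig that uses a GCCcore toolchain.
toolchain = {'name': 'GCCcore', 'version': '6.3.0'}

# Declaring binutils as a build dependency makes the build use the assembler and
# linker that match the toolchain, instead of the ones provided by the OS.
builddependencies = [('binutils', '2.27')]

# Illustration only: check that binutils is among the build dependencies.
has_binutils = any(dep[0] == 'binutils' for dep in builddependencies)
print(has_binutils)
```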

How to avoid so many modules

Q: EasyBuild builds so much. How do I avoid every build creating a version of GCC, etc.?

A: Use fewer toolchain versions, and swap to new versions of toolchains. Also modules can be built as ‘hidden’ to avoid end-users seeing all the underlying dependencies.

Multiple ‘Flavours’ of modules

Q: The build seems to build several different ‘flavours’ of a version of flex, such as a plain one, GCCcore-5.4.0, GCC-5.4.0-2.26, etc. Is this necessary?

A: This happens for some dependencies. The plain flex-2.6.4 is the one built with the system tools. GCCcore needs binutils, which needs flex, so you need a version of flex built with the system compiler. At that point flex-2.6.4 isn’t used, but EB doesn’t then build this. You could install these dependencies as hidden, using --hide-deps, so they don’t show up via module avail, although they do show up with module list. module spider shouldn’t show them either, unless it is used as module --show-hidden spider ...
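To illustrate what "hidden" means in practice: hidden modules are installed with a leading dot in the version part of the module name, which is why module avail skips them by default. The module names below are made-up examples and is_hidden is a hypothetical helper, not an EasyBuild or Lmod function.

```python
def is_hidden(module_name):
    """Return True if the last component of the module name starts with a dot."""
    return module_name.rsplit('/', 1)[-1].startswith('.')


# Made-up list of installed modules; two dependencies were installed as hidden.
modules = [
    'flex/2.6.4',
    'flex/.2.6.4-GCCcore-5.4.0',
    'binutils/.2.26-GCCcore-5.4.0',
    'GCC/5.4.0-2.26',
]

# Roughly what a plain 'module avail' would list.
visible = [m for m in modules if not is_hidden(m)]
print(visible)
```

Hidden modules can still be loaded by their full name, which is how they end up in the output of module list.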

Failing finding multiple versions of a package

Q: Sometimes builds fail with issues of finding multiple versions of a package. What might I be doing wrong?

A: There are few to no instances where this should occur, as it is checked for the common toolchains in the EB testing procedure. Using uncommon toolchains may be problematic. It can sometimes be an issue with module loads.

LLVM (clang) and IBM XL

Q: Is there a way of supporting LLVM-based toolchains? Or IBM XL, etc? Or do I need to create toolchains for these?

A: For Clang, yes, but not for Flang. These toolchains are rarely used. For IBM XL there are xlmvapich2 and xlmpich2, but it is not known how well they work.

Continuous Integration with Jenkins

Q: Has anyone integrated EasyBuild with Jenkins for continuous integration testing of applications and open-sourced it so I can cheat and use that off-the-shelf?

A: Yes, this is being done by the Swiss National Supercomputing Centre (CSCS).

Integration with Python

Q: Is it possible to directly integrate EasyBuild with Python scripts to do wider tasks (regenerate web docs) without running easybuild on the command line from Python?

A: This is possible, but not very easy, e.g. from easybuild.tools.run import run_cmd. EasyBuild is not expected to be usable as a library. There is a trick to make this work. Open an issue to get this fixed.

This will be improved in the upcoming EasyBuild v3.8.0, see https://github.com/easybuilders/easybuild-framework/pull/2638. If you want to use EasyBuild as a library, you can do so after first calling set_up_configuration().
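A minimal sketch of the library-style usage described above, assuming EasyBuild 3.8 or a 4.x release is importable, where set_up_configuration lives in easybuild.tools.options and run_cmd in easybuild.tools.run. The wrapper run_via_easybuild is hypothetical and returns None when those imports are unavailable. Later EasyBuild releases may differ.

```python
def run_via_easybuild(cmd):
    """Run a shell command through EasyBuild's run_cmd, or return None if unavailable."""
    try:
        from easybuild.tools.options import set_up_configuration
        from easybuild.tools.run import run_cmd
    except ImportError:
        # EasyBuild is not installed, or this release no longer provides run_cmd.
        return None

    # The configuration must be set up before most framework functions will work.
    set_up_configuration(args=[], silent=True)

    # With simple=False, run_cmd returns a tuple of (output, exit code).
    output, exit_code = run_cmd(cmd, simple=False)
    return output, exit_code


result = run_via_easybuild('echo hello')
print(result)
```

Passing args=[] keeps the script's own command line arguments from being parsed as eb options.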

Slides

Q: Are there slides available, please?

A: Yes, will be on the website. (See links on this page).

Debugging

Q: The program is great when it works. However, when it fails, debugging the problem can be a bit of a pain. Are there any good procedures?

A: EasyBuild doesn’t really help with reporting errors; you have to look in the log files. There is work to be done on this. AMBER can be difficult to build.

Contributing back

Q: How can we contribute EasyBuild recipes back to the community, in the UK and more widely?

A: See presentation.

Minimal toolchains

Q: What is the benefit of minimal toolchains?

A: Minimal toolchains (--minimal-toolchains) are only relevant when you have hierarchies of modules.