Survey
https://tinyurl.com/system-test-bof
Survey of responses. Will find results later to copy in.
Lightning Talks
KAUST
- ~200k cores
- No special priveleges for tests [different to Lboro acceptance tests]
- Component tests (scheduler, etc)
- Synthetic tests (performance)
- VASP, WRF, etc used as well as SPEC tests.
- Minimised software tickets to basically zero in 24 hours following release to users. More reproducible for users.
NCSA
- Testing of tests – debugging
- Tracking of progress on acceptance tests
- Initially manual, special job queue
- Use of Jenkins (considered doing this for regression tests too – common approach)
DOE
- Testing requirements may change over time – e.g. more stable as time goes on.
- Statement of Works
- Functionality tests
- Reliability tests
- System
- Performance. Change over time. Run continually. Published to the web. This should be good practice.
- Availabiity
- Need to be flexible about new tech – vendors may not understand it yet
- Boot/reboot testing important
- Use ReFrame
- Good configuration management and test systems for changes
- Tests sourced from users.
- Kabana and Grafana
Indiana
- 4 systems of approx 500-1000 nodes each
OLCF
- Hardware acceptance, functionality, performance, stability (2 weeks)
- Application:isolated test on each and meet contracted metrics (LAMMPS, NWCHEM, NAMD, etc)
- Python test harness – being or is open source
CSCS
- Avoid tying framework too closely to the system. Use high level.
- ReFrame: Python
- Easier to read
Future Work
- There is an interest in continuing to work together
- Responses will be on the web to the survey
- Definitely something to look at within the UK community, e.g. common contract terms and their expression as tests using ReFrame, as an example.