https://slides.app.goo.gl/JwW8d
Talks
Nicholas Thorne, TACC
- Teach HPC software stack.
- Small groups or individuals
- Hardware – hard to find such a resource for teaching system admins.
- Don’t change tool stacks too often
- Planning for scale
- Parallel file systems important, but hard to get training resources.
- Training is viewed as good
Bryan Johnston, South Africa
- Use older HPC systems. E.g. Ranger, Stampede from TACC, Cambridge, etc.
- Used Puppet, Cobbler, etc., but now using in-house tooling
- Now OpenHPC [This might be a good option instead of TrinityX]
? Pawsey, Australia
- Difficult to hire people, not enough time to mentor.
- Have to recruit internationally
- ‘HPC Sys Admin Factory’. Flexible learning paths, internal courses, exporting system admins.
Gin Tan, Monash
- Asked to become a system admin of HPC from a background of non-HPC
- Team has grown
Anton Limbo, Namibia
- Was asked to do HPC in 2016 because of knowing Linux.
- 1 week training, plus another week.
- “HPC for Dummies”
- Help from group in South Africa above.
- Improve community support seen as important.
NSTDA, Thailand
- 0.2 PFlops
- Worked with partners, e.g. Luxembourg, Taiwan, to learn best practices
- Internships (3 months)
- Need to build a career path to attract people.
Types of Admin
- HPC Storage Engineer – new path?
- Maintaining, capacity planning, infrastructure, performance tuning.
- More than just system admin.
- Cluster engineer
- scheduling.
- Network engineer also.
Craig Morris, Edinburgh
- Unique skills
- Windows also possible?
- FPGA, GPU, and other technologies
- Build versus buy
- In-house/outsourced
- Technical vs Management (IT management like PRINCE, ITIL)
Purdue and SIGHPC
- trying to determine scope required for training (cf. HPC-SIG’s concerns), subject matter, resources, gaps.
- Longer term things like content review (again cf. HPC-SIG’s concerns)
Discussion
- How to disseminate things?
- Christine Kitchen’s comments. Tier 2 funding: not successful but there is an interest in funding. Need to be technology neutral but need vendor involvement to obtain sufficient resources. Issues with managerial versus technical.
- Bringing industry into the mix seen as mission critical.
- Student cluster challenge and links to skills?
- Checking that an HPC system meets its specification: is that a system administration job, or is it the engineer's job rather than the administrator's? [cf. approach at Loughborough – two-part verification of specification – run by the vendor, then rerun by system admins]
- How do national labs do this? Yes, worth looking at this.
- Bruno Silva: we are not making it clear what a great job it is. Incredibly hard to recruit people. Need to show the world how interesting and valuable it is, to get more people interested in it.
- Noted: system admins get all the complaints, none of the glory.
- We know what the skills are, but need to advertise the job better
- Concept of HPC not really well known?
- Salary an issue – higher in industry.
- Mentoring? This needs time which is often not given.
- Making it easier to do the job? Share information – how easy? Avoid dealing with the same issues. Not wanting to share. [My comment would be that sharing takes time that may not be available – e.g. sharing an automation framework].
- Where does HPC training fit in a curriculum?
- Many online courses [I will add some links here]
- Training in cloud – doesn’t sufficiently translate to the real world?
- System admins don’t like to admit making mistakes. Like to solve own problems. Need to communicate problems and solutions more. Make links with other centres, bounce ideas off each other.
- Teaching knowledge versus teaching how to troubleshoot.
- Straw poll: 50% in audience from Comp Sci background
- Breaking things is important to finding out how to fix them.
- Sharing [e.g. procurement docs, acceptance testing frameworks, automations – I need to do more on this in the next couple of months]
- Make opportunities for HPC development clear at an undergraduate level.
- Student clusters?
- Show the HPC in the science.
- Look at other organisations (will add links to PERC and CARC).
- There are many support tools out there: use them.
Slack: https://bit.ly/37qkoCa