Better Education for Sysadmins

https://slides.app.goo.gl/JwW8d

Talks

Nicholas Thorne, TACC

  • Teach HPC software stack.
  • Small groups or individuals
  • Hardware – hard to find such a resource for teaching system admins.
  • Don’t change tool stacks too often
  • Planning for scale
  • Parallel file systems important, but hard to get training resources.
  • Training is viewed as good

Bryan Johnston, South Africa

  • Use older HPC systems. E.g. Ranger, Stampede from TACC, Cambridge, etc.
  • Previously used Puppet, Cobbler, etc., but now using in-house tools
  • Now OpenHPC [This might be a good option instead of TrinityX]

? Pawsey, Australia

  • Difficult to hire people, not enough time to mentor.
  • Have to recruit internationally
  • ‘HPC Sys Admin Factory’. Flexible learning paths, internal courses, exporting system admins.

Gin Tan, Monash

  • Asked to become an HPC system admin from a non-HPC background
  • Team has grown

Anton Limbo, Namibia

  • Was asked to do HPC because he knew Linux in 2016.
  • 1 week training, plus another week.
  • “HPC for Dummies”
  • Help from group in South Africa above.
  • Improving community support seen as important.

NSTDA, Thailand

  • 0.2 PFlops
  • Worked with partners, e.g. Luxembourg and Taiwan, to learn best practices
  • Internships (3 months)
  • Need to build a career path to attract people.

Types of Admin

  • HPC Storage Engineer – new path?
    • Maintaining, capacity planning, infrastructure, performance tuning.
    • More than just system admin.
  • Cluster engineer
    • Scheduling.
  • Network engineer also.

Craig Morris, Edinburgh

  • Unique skills
  • Windows also possible?
  • FPGA, GPU, and other technologies
  • Build versus buy
  • In-house/outsourced
  • Technical vs Management (IT management like PRINCE, ITIL)

Purdue and SIGHPC

  • Trying to determine scope required for training (cf. HPC-SIG’s concerns), subject matter, resources, gaps.
  • Longer term things like content review (again cf. HPC-SIG’s concerns)

Discussion

  • How to disseminate things?
  • Christine Kitchen’s comments. Tier 2 funding: not successful but there is an interest in funding. Need to be technology neutral but need vendor involvement to obtain sufficient resources. Issues with managerial versus technical.
  • Bringing industry into the mix seen as mission critical.
  • Student cluster challenge and links to skills? 
  • Checking that an HPC system meets specification: is that a system administration job, or is it for the engineer rather than the administrator? [cf. approach at Loughborough – two-part verification of specification – run by the vendor, then rerun by system admins]
  • How do national labs do this? Yes, worth looking at this.
  • Bruno Silva: we are not making it clear what a great job it is. Incredibly hard to recruit people. Need to show the world how interesting and valuable the job is, to get more people interested in it.
    • Noted: system admins get all the complaints, none of the glory.
  • We know what the skills are, but need to advertise the job better
  • Concept of HPC not really well known? 
  • Salary an issue – higher in industry.
  • Mentoring? This needs time which is often not given.
  • Making it easier to do the job? Share information – how easy? Avoid dealing with the same issues. Not wanting to share. [My comment would be that sharing takes time that may not be available – e.g. sharing an automation framework].
  • Where does HPC training fit in a curriculum?
  • Many online courses [I will add some links here]
  • Training in cloud – doesn’t sufficiently translate to the real world?
  • System admins don’t like to admit making mistakes. Like to solve own problems. Need to communicate problems and solutions more. Make links with other centres, bounce ideas off each other.
  • Teaching knowledge versus teaching how to troubleshoot.
  • Straw poll: 50% in audience from Comp Sci background
  • Breaking things is important to finding out how to fix them.
  • Sharing [e.g. procurement docs, acceptance testing frameworks, automations – I need to do more on this in the next couple of months]
  • Make opportunities for HPC development clear at an undergraduate level.
    • Student clusters?
  • Show the HPC in the science.
  • Look at other organisations (will add links to PERC and CARC).
  • There are many support tools out there: use them.

Slack: https://bit.ly/37qkoCa