There were interesting discussions, especially concerning CycleCloud and the Microsoft partner UberCloud, led by Wolfgang Gentzsch, an original Sun Grid Engine developer.
CycleCloud was bought by Microsoft and provides a dashboard that allows a researcher or IT department to do things such as provision a Sun Grid Engine cluster, a Condor pool, or specific applications (e.g. Star-CCM+) on a set of resources selected from drop-down lists.
This would seem to lower the barrier to entry considerably for users running one-off and interactive jobs. Running multiple jobs is probably a bit more complex. Whilst in theory a user could create a Sun Grid Engine cluster to do the work, I would question whether it would then be within their skill set to interact with it and, indeed, to ensure that it remains sufficiently busy to make efficient use of resources. For example, if 100 users ran 100 mini-clusters and each managed only 50% utilisation because their own jobs get in the way of each other, it might lead to both the underlying resource being full (the SGE instances using up the whole of it) and another 100 users having to wait. So I think this would need to be considered carefully in terms of efficiency. Even if using public cloud rather than an internal cloud, it could lead to higher costs than strictly necessary. Now, you could argue that it’s up to a research group to manage its own budget, but it’s still useful for a university to help users get the best from it. But I can see that there are some research groups at Loughborough not currently using the central HPC that could benefit from having DRMs managing their resources, and I can see a use case for an IT department provisioning a suitable cluster in the cloud or on other resources and allowing that to be efficiently utilised by batch jobs.
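To put rough numbers on the utilisation concern above, here is a minimal back-of-envelope sketch. Only the 100 mini-clusters and the 50% utilisation come from the example; the 16 cores per mini-cluster, the 24-hour period and the 80% target for a shared cluster are my own assumptions, picked purely for illustration.

```python
def core_hours(n_clusters, cores_per_cluster, hours, utilisation):
    """Return (reserved, used, idle) core-hours for a set of identical clusters."""
    reserved = n_clusters * cores_per_cluster * hours
    used = reserved * utilisation
    return reserved, used, reserved - used


def shared_cores_needed(used_core_hours, hours, target_utilisation):
    """Cores one shared cluster would need to do the same work in the same time."""
    return used_core_hours / (hours * target_utilisation)


# From the example: 100 mini-clusters, each running at 50% utilisation.
N_CLUSTERS = 100
UTILISATION = 0.5
# Assumptions for illustration only (not from any real deployment).
CORES_PER_CLUSTER = 16
HOURS = 24
SHARED_TARGET_UTILISATION = 0.8

reserved, used, idle = core_hours(N_CLUSTERS, CORES_PER_CLUSTER, HOURS, UTILISATION)
shared = shared_cores_needed(used, HOURS, SHARED_TARGET_UTILISATION)

print(f"Cores held by mini-clusters: {N_CLUSTERS * CORES_PER_CLUSTER}")
print(f"Core-hours reserved: {reserved}, used: {used}, idle: {idle}")
print(f"Cores needed by one shared cluster at the target utilisation: {shared:.0f}")
```

Under these assumptions half of the reserved core-hours sit idle, and a single shared cluster with noticeably fewer cores could in principle do the same work. Whether a shared scheduler would actually reach the assumed target depends entirely on the job mix, so treat this as an illustration of the argument, not a prediction.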
UberCloud is a company that uses a number of technologies, such as Docker and Kubernetes, to allow an end-user to spin up containerised instances of their tools on the cloud in an intuitive way. The focus is on engineering tools.
This comes at a subscription cost, but that has to be balanced against productivity improvements. It is, though, notoriously difficult for an IT department to determine the productivity improvements made by researchers, as it’s hard to capture sufficiently detailed information on a regular basis to allow trends to be tracked and attributed to a particular action. I have discussed the difficulty of determining the R part of ROI many times before so I won’t repeat that here.
Containers are available pre-packaged (e.g. with Star-CCM+ already installed), integrated by UberCloud (give them the application, let them package it) or DIY for on-premise use. The last could be useful for those developing code and then wishing others in a research group to be able to use it. In some ways it again reminds me of things like YouShare, where the ‘services’ (a.k.a. programs) ran either in containers (OpenVZ) or in full-fat VMs, in the sense that the metadata indicated which container or VM instances a service could be run on, with the option of using an existing container/VM or starting a new one. But this wasn’t the fully self-contained image option (although that could have been supported), and UberCloud does the deployment in a more supportable way. It would be interesting to see how something like this could be tied into provenance engines (see mentions in the Democratisation talk from yesterday) and workflow engines.
At the back end, technologies like Kubernetes are likely to be employed to provide the required scalability. It is a technology I’ve used a bit (cf. Docker Swarm), and we have researchers working with it too, for scalable services that are commercialisations of research work and will run on cloud.