OzStar users do not have permissions to create conda environments from another user's installation
Setup
I currently have Mambaforge installed at /fred/oz016/envs/mambaforge on OzStar under my user (dtang).
I have created a few conda environments with this installation from environment .yaml files located in /fred/oz016/envs/conda/igwn/.
The default output location for conda virtual environments is the envs/ folder of the base installation, so they can be found in /fred/oz016/envs/mambaforge/envs/.
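For reference, this layout can be inspected either through conda itself or by listing the envs/ directory directly (paths from the setup above):

```shell
# list environments registered with this Mambaforge installation
/fred/oz016/envs/mambaforge/condabin/conda env list

# or inspect the default envs directory directly
ls /fred/oz016/envs/mambaforge/envs/
```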
Issue
@qian.hu and I wanted to set up an environment for Qian to edit his SealGW package - an installed Python dependency of our pipeline environment for !130 - and then run the pipeline. He needed his own virtual environment because he would be editing this dependency in an unstable way, which could break code for other users who depend on igwn-py38-testing-spiir (currently used as a shared conda development environment for the gstreamer_python_upgrade).
However, while I can create, update, and activate new virtual environments, other users appear unable to create them. I have not tested whether they can update pre-existing conda virtual environments created by someone else.
@qian.hu and I had encountered this issue when trying to create a new virtual environment for his user.
The commands we ran were
# to initialise the conda/mamba tool in the mambaforge installed by @daniel.tang
/fred/oz016/envs/mambaforge/condabin/conda init
/fred/oz016/envs/mambaforge/condabin/mamba init
# create env from file
mamba env create --file /fred/oz016/envs/conda/igwn/igwn-py38-testing-spiir.yaml --name spiir-qian
However, after running these commands, mamba attempted to build the environment from the locally cached metadata/packages, but failed with a permission error on these cache files, which prevented another user (Qian) from using the shared mambaforge installation. Running the same command with conda failed in the same way.
Potential Solutions
We have some ideas that could allow users to create their own conda virtual environments.
Download without cache
It is possible the permission errors only occur when writing to the shared cache files. If a user could download package data fresh, rather than reading and writing the shared cache, these permission errors might not occur. A quick Google search did not yield an option that ignores the cache entirely (but that doesn't mean it doesn't exist).
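A related workaround worth testing (an assumption, not yet verified on OzStar): conda and mamba honour the `CONDA_PKGS_DIRS` environment variable (the `pkgs_dirs` config setting), which relocates the package cache. Pointing it at a user-writable directory should avoid writes to the shared cache, at the cost of re-downloading packages:

```shell
# redirect the package cache to a directory the current user owns
# (hypothetical path; any writable location would do)
export CONDA_PKGS_DIRS="$HOME/.conda/pkgs"
mkdir -p "$CONDA_PKGS_DIRS"

# then retry the environment creation
mamba env create \
    --file /fred/oz016/envs/conda/igwn/igwn-py38-testing-spiir.yaml \
    --name spiir-qian
```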
Edit Mambaforge permissions for other users
It is possible that we can simply edit the permissions of the mambaforge installation for all users of our group. However, we cannot use sudo on OzStar, at least not without requesting permission from the system administrators. It may also be possible to change the permissions on these files without requiring sudo, since they are owned by my user.
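Since the installation is owned by dtang, the owning user can relax group permissions without sudo. A sketch, assuming the shared project group is named oz016 (verify the actual group with `ls -l` first):

```shell
# run as the owner (dtang); no sudo is needed to chmod/chgrp files you own
# grant the group read/write, and execute/search only where already executable
chmod -R g+rwX /fred/oz016/envs/mambaforge

# ensure files belong to the shared project group (group name assumed)
chgrp -R oz016 /fred/oz016/envs/mambaforge

# optional: set the setgid bit on directories so new files inherit the group
find /fred/oz016/envs/mambaforge -type d -exec chmod g+s {} +
```

Note that files created later (e.g. new cache entries) may still default to owner-only permissions unless users also set a group-friendly umask such as `umask 002`.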
New Mambaforge installation with proper permissions
Maybe we, or the system administrators, can install a version of conda for us with appropriate group permissions.
If this is the route taken, please endeavour to make sure mamba is also installed. The igwn conda environments are extremely large and can often take a considerable amount of time to solve and build with conda alone; mamba improves performance here considerably.
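If a fresh installation is made, installing with a group-friendly umask from the start should avoid the permissions problem entirely. A sketch, using the install command from the Mambaforge (conda-forge/miniforge) README and a hypothetical target prefix; release asset names may change, so check the releases page first:

```shell
# make files created during the install group-writable by default
umask 002

# fetch the Mambaforge installer from the conda-forge/miniforge releases
curl -LO "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"

# -b: batch (non-interactive), -p: install prefix (hypothetical path)
bash "Mambaforge-$(uname)-$(uname -m).sh" -b -p /fred/oz016/envs/mambaforge-shared
```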
Complexities
While this would be very good for users, the OzStar supercomputing cluster gives our research group limited disk space and inodes.
We've observed that our conda installations consume a considerable number of inodes on the cluster as well. Additionally, our analysis workloads can become quite data heavy (e.g. if we continue to run the pipeline for 2+ week durations with 100 nodes, which will only grow as we require larger simulation data sets and analyses). If we combine this with many more conda virtual environments (i.e. one for each user instead of shared environments), we will find ourselves at capacity very quickly.
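To quantify this, the disk and inode footprint of the shared installation can be checked directly (GNU coreutils; `du --inodes` requires coreutils >= 8.22):

```shell
# inode usage of the installation and of each environment under it
du --inodes -s /fred/oz016/envs/mambaforge /fred/oz016/envs/mambaforge/envs/*

# filesystem-level free inodes for the project space
df -i /fred/oz016
```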
However, we still need separate development environments (and specifically conda, with our current setup) when users want to develop external packages (e.g. spiir-group/spiir or spiir-group/SealGW) and run some kind of integration/end-to-end test with our gstlal-spiir pipeline.