Add support for Conda lock file #172

Open
pditommaso opened this issue Oct 20, 2022 · 3 comments · May be fixed by #642
@pditommaso
Contributor

pditommaso commented Oct 20, 2022

Summary

Wave can build container images starting from Conda recipes.

It does so by creating a Dockerfile on the fly from the given Conda packages, using Micromamba.

The problem with this approach is that the resulting containers are not reproducible: the exact list of dependency packages resolved by Conda can change over time, even when specific versions are provided. As a result, a container rebuilt from the same list of packages can end up with different package versions.

To solve this, a Conda lock file should be used. A lock file lists the exact URL and checksum of each package that Conda resolved to create the required environment (and that Wave used to build the container).

A Conda lock file looks like the following:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-11.2.0-h1234567_1.conda#57623d10a70e09e1d048c2b2b6f4e2dd
https://repo.anaconda.com/pkgs/main/linux-64/_libgcc_mutex-0.1-main.conda#c3473ff8bdb3d124ed5ff11ec380d6f9
https://repo.anaconda.com/pkgs/main/linux-64/libgomp-11.2.0-h1234567_1.conda#b372c0eea9b60732fdae4b817a63c8cd
https://repo.anaconda.com/pkgs/main/linux-64/_openmp_mutex-5.1-1_gnu.conda#71d281e9c2192cb3fa425655a8defb85
https://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-11.2.0-h1234567_1.conda#a87728dabf3151fb9cfa990bd2eb0464
https://repo.anaconda.com/pkgs/main/linux-64/tbb-2020.3-hfd86e86_0.conda#7d06fdc8b4f3e389f26f67311c7ccf5f
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.2.11-h166bdaf_1014.tar.bz2#757138ba3ddc6777b82e91d9ff62e7b9
https://conda.anaconda.org/conda-forge/linux-64/icu-68.2-h9c3ff4c_0.tar.bz2#6618c9b191638993f2a818c6529e1b49
https://repo.anaconda.com/pkgs/main/linux-64/lz4-c-1.9.3-h295c915_1.conda#d9bd18f73ff566e08add10a54a3463cf
https://conda.anaconda.org/conda-forge/linux-64/libjemalloc-5.2.1-h9c3ff4c_6.tar.bz2#ea7e614f3a641bd6cec18de031228d9a
https://repo.anaconda.com/pkgs/main/linux-64/bzip2-1.0.8-h7b6447c_0.conda#9303f4af7c004e069bae22bde8d800ee
https://repo.anaconda.com/pkgs/main/linux-64/xz-5.2.6-h5eee18b_0.conda#8abc704d4a473839d5351b43deb793bb
https://conda.anaconda.org/conda-forge/linux-64/zlib-1.2.11-h166bdaf_1014.tar.bz2#def3b82d1a03aa695bb38ac1dd072ff2
https://repo.anaconda.com/pkgs/main/linux-64/zstd-1.5.0-ha4553b6_1.conda#a6e4de6b7a4a014d2f76888f11100568
https://conda.anaconda.org/conda-forge/linux-64/boost-cpp-1.74.0-h312852a_4.tar.bz2#22ee6de84c28eb7bd76802cf071c5d25
https://conda.anaconda.org/bioconda/linux-64/salmon-1.6.0-h84f40af_0.tar.bz2#dc109be1c7b0a0c2911d0f5ad14bf94f
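To make the format above concrete, here is a minimal Python sketch that parses an explicit lock file into per-package records (URL, MD5, name, version, build). It assumes every package line has the form <url>#<md5> and that archive file names follow Conda's <name>-<version>-<build> convention with a .conda or .tar.bz2 extension. LockedPackage and parse_explicit_lock are hypothetical names used only for illustration; they are not part of Wave.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LockedPackage:
    url: str
    md5: Optional[str]  # hex digest after '#', if present
    filename: str       # e.g. "salmon-1.6.0-h84f40af_0.tar.bz2"
    name: str
    version: str
    build: str


def parse_explicit_lock(text: str) -> List[LockedPackage]:
    """Parse the @EXPLICIT section of a Conda lock file into package records."""
    packages: List[LockedPackage] = []
    seen_explicit = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line == "@EXPLICIT":
            seen_explicit = True
            continue
        if not seen_explicit:
            raise ValueError(f"unexpected line before @EXPLICIT: {line!r}")
        url, _, md5 = line.partition("#")
        filename = url.rsplit("/", 1)[-1]
        # Strip the archive extension (.conda or .tar.bz2).
        for ext in (".conda", ".tar.bz2"):
            if filename.endswith(ext):
                stem = filename[: -len(ext)]
                break
        else:
            raise ValueError(f"unknown package extension: {filename!r}")
        # Names may contain dashes (e.g. libstdcxx-ng), so split from the right.
        name, version, build = stem.rsplit("-", 2)
        packages.append(LockedPackage(url, md5 or None, filename, name, version, build))
    return packages
```

Splitting from the right matters because package names such as libstdcxx-ng or boost-cpp contain dashes, while the version and build strings in this format do not.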

Goal

The goal of this issue is to use the Conda lock file to guarantee the reproducibility of containers built by Wave starting from a Conda recipe.

The general flow looks like the following:

  • A request is sent to Wave to build a container from a Conda recipe
  • A checksum of the Conda recipe is computed
  • The checksum is used to look up the corresponding lock file stored in the Wave DB
  • If the lock file exists, it is used to recreate the container
  • If it does not exist, the current process is used to build the container, and the corresponding lock file is created and stored in the Wave DB
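The flow above can be sketched in Python as follows. This is a minimal sketch under several assumptions: a plain dictionary stands in for the Wave DB, the two callbacks stand in for the real container builds, and SHA-256 is an assumed choice of checksum. recipe_checksum and build_with_lock are hypothetical names, not Wave's API.

```python
import hashlib
from typing import Callable, Dict


def recipe_checksum(recipe: str) -> str:
    """SHA-256 hex digest of the Conda recipe content (hash choice is an assumption)."""
    return hashlib.sha256(recipe.encode("utf-8")).hexdigest()


def build_with_lock(
    recipe: str,
    lock_store: Dict[str, str],
    build_from_recipe: Callable[[str], str],
    build_from_lock: Callable[[str], None],
) -> str:
    """Build from the stored lock file if one exists; otherwise build from the
    recipe and store the lock file it produced. Returns the recipe checksum."""
    key = recipe_checksum(recipe)
    lock = lock_store.get(key)
    if lock is not None:
        # A lock file exists: rebuild with exactly the same packages.
        build_from_lock(lock)
    else:
        # First build: resolve from the recipe, then persist the resulting lock.
        lock_store[key] = build_from_recipe(recipe)
    return key
```

Because the key is derived only from the recipe content, two requests with byte-identical recipes map to the same stored lock file, while any textual change to the recipe triggers a fresh resolution.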

Implementation

The Wave plugin currently turns the Conda requirement into a Dockerfile and passes it via the containerFile attribute.

We need to modify this behaviour so that, when a Nextflow process specifies one or more Conda packages via a conda directive, like this, the requirement is turned into a Conda recipe file, like the one shown here.

The recipe file is then passed via the condaFile attribute.

Once that is done, it is enough to compute a checksum of the given condaFile content and apply the strategy described in the Goal section.
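The conversion of a conda directive into a recipe file can be sketched in Python as below. It assumes the directive holds space-separated package specs (for example bioconda::salmon=1.6.0 samtools=1.15) and that a fixed default channel list is acceptable; the function name and the channel defaults are illustrative assumptions, not what the Wave plugin actually does.

```python
from typing import Sequence


def conda_directive_to_recipe(
    directive: str,
    channels: Sequence[str] = ("conda-forge", "bioconda", "defaults"),
) -> str:
    """Turn a space-separated conda directive value into an environment YAML string.

    Package specs are copied verbatim, so channel-qualified specs such as
    'bioconda::salmon=1.6.0' keep their channel prefix.
    """
    packages = directive.split()
    if not packages:
        raise ValueError("conda directive contains no packages")
    lines = ["channels:"]
    lines += [f"  - {ch}" for ch in channels]
    lines.append("dependencies:")
    lines += [f"  - {pkg}" for pkg in packages]
    return "\n".join(lines) + "\n"
```

Emitting the YAML with a fixed layout keeps the output deterministic, which is what makes the condaFile checksum stable across identical requests.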

@pditommaso
Contributor Author

I've looked a bit more into the commands required to handle this. In a nutshell, to create a lock file the following command should be executed after the conda install command:

micromamba env export --name base --explicit > env.lock

The env.lock file should then be read and stored in the DB.

To recreate the environment, the following command should be used:

micromamba install --yes --name base --file /tmp/env.lock

This is essentially the same command used to install from a recipe file; the only difference is that it uses a lock file instead of an env (YAML) file. See:

https://github.com/nextflow-io/nextflow/blob/904c9409d25c64236a48962c95625666e6c57aca/plugins/nf-wave/src/main/io/seqera/wave/plugin/WaveClient.groovy#L377-L377
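The two micromamba commands above can be assembled programmatically, for instance when generating the build steps. The Python sketch below only reproduces the exact flags shown in this comment; the helper names are hypothetical, and /tmp/env.lock is simply the path used in the command above.

```python
import shlex
from typing import List

LOCK_PATH = "/tmp/env.lock"  # path taken from the install command above


def export_lock_cmd(env_name: str = "base") -> List[str]:
    """argv that writes the explicit lock of an environment to stdout."""
    return ["micromamba", "env", "export", "--name", env_name, "--explicit"]


def install_from_lock_cmd(lock_path: str = LOCK_PATH, env_name: str = "base") -> List[str]:
    """argv that recreates an environment from a lock file."""
    return ["micromamba", "install", "--yes", "--name", env_name, "--file", lock_path]


def dockerfile_run_line(argv: List[str]) -> str:
    """Render an argv as a Dockerfile RUN instruction with shell quoting."""
    return "RUN " + shlex.join(argv)
```

At build time, the export argv could be run with subprocess.run(export_lock_cmd(), stdout=lock_file, check=True) to capture env.lock, whose content is then stored in the DB; keeping commands as argv lists avoids shell-quoting mistakes until they are rendered into a Dockerfile.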

@pditommaso pditommaso self-assigned this Oct 31, 2022
@pditommaso
Contributor Author

pditommaso commented Sep 9, 2024

Resuming this with a simplified approach. The build will be done using the classic Conda environment, and Wave will only take care of storing and reporting the corresponding lock file generated by micromamba, using seqeralabs/libseqera#23.

The Conda lock file is going to be generated using this command:

micromamba env export --explicit > environment.lock.yml

Wave will then save it to an S3 bucket and render it in the build API response and page view.

Conda lock files will be stored and retrieved via a CondaLockService, using BuildLogService as a template for the implementation.
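For illustration, the Python sketch below outlines the kind of interface such a service might expose: store a lock file keyed by build ID and fetch it back for the build API response and page view. The method names, the object key layout, and the in-memory backend (standing in for the S3 bucket) are all assumptions; the real service is meant to follow BuildLogService, whose API is not reproduced here.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class CondaLockService(ABC):
    """Hypothetical interface: store and fetch the lock file of a build."""

    @abstractmethod
    def store_conda_lock(self, build_id: str, lock: str) -> None: ...

    @abstractmethod
    def fetch_conda_lock(self, build_id: str) -> Optional[str]: ...


class InMemoryCondaLockService(CondaLockService):
    """In-memory stand-in for an object-store-backed implementation."""

    def __init__(self, prefix: str = "conda-locks/") -> None:
        self._objects: Dict[str, str] = {}
        self._prefix = prefix

    def _key(self, build_id: str) -> str:
        # One object per build, e.g. "conda-locks/<build_id>.lock" (layout is an assumption).
        return f"{self._prefix}{build_id}.lock"

    def store_conda_lock(self, build_id: str, lock: str) -> None:
        self._objects[self._key(build_id)] = lock

    def fetch_conda_lock(self, build_id: str) -> Optional[str]:
        return self._objects.get(self._key(build_id))
```

Separating the interface from the storage backend keeps the rendering code (API response, page view) independent of where the lock files actually live.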

@pinin4fjords
Member

I can't tell you how happy this makes me @pditommaso :-)
