Upload model to HuggingFace #10

Closed
jatkinson1000 opened this issue Feb 16, 2023 · 6 comments

@jatkinson1000
Member

We eventually want the model to be stored on Hugging Face for distribution.

I suggest this is done once the GitHub repo is in a reasonable state.
What about datasets?
Can we upload a subset of the Pangeo CMIP dataset to Hugging Face?

jatkinson1000 added the "documentation" label on Feb 16, 2023
@jatkinson1000
Member Author

Hugging Face ideally wants a project separated into a model and a dataset.
This suggests we may need to split this repo into the data access and processing part, and the model.
However, I'd leave this until the repo has been cleaned up and is installable and runnable, at which point we should start looking at adding it to Hugging Face.

  • The simplest option would be a model with documented, downloadable code that can be run in inference mode.
  • For datasets, Hugging Face suggests you create an API and provide code for accessing the data.
    • This will require some discussion as to how the data is accessed from Pangeo, as pay-on-request currently presents a barrier.

@dorchard
Collaborator

It seems the best thing for us to do would be to split this up, use submodules, and then use GitHub Actions to sync with Hugging Face (see https://huggingface.co/docs/hub/spaces-github-actions).
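
As a rough illustration of what such a sync step could run, here is a minimal Python sketch. It is not the git-push workflow described in the linked docs; instead it assumes the `huggingface_hub` package and its `HfApi.upload_folder` call. The environment variable names (`HF_TOKEN`, `HF_REPO_ID`, `HF_REPO_TYPE`, `HF_FOLDER`) and the default repo name are hypothetical and would be set by the workflow.

```python
def sync_settings(env):
    """Collect upload settings from an environment mapping (e.g. os.environ)."""
    token = env.get("HF_TOKEN")  # hypothetical secret name
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it as a CI secret")
    return {
        "token": token,
        # Hypothetical defaults; override them in the workflow.
        "repo_id": env.get("HF_REPO_ID", "your-org/your-model"),
        "repo_type": env.get("HF_REPO_TYPE", "model"),
        "folder_path": env.get("HF_FOLDER", "."),
    }


def sync_to_hub(settings):
    """Upload the configured folder to the Hub as a single commit."""
    # Imported lazily so sync_settings works without the package installed.
    from huggingface_hub import HfApi

    api = HfApi(token=settings["token"])
    return api.upload_folder(
        folder_path=settings["folder_path"],
        repo_id=settings["repo_id"],
        repo_type=settings["repo_type"],
        commit_message="Sync from GitHub",
    )


# In a CI job (after `import os`): sync_to_hub(sync_settings(os.environ))
```

Keeping the settings in environment variables means the token can live in a repository secret rather than in the codebase.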

@mondus
Collaborator

mondus commented May 9, 2023

This should be left until our second milestone and we should first prioritise getting a release tag out.

dorchard removed the "documentation" label on May 16, 2023
@raehik
Collaborator

raehik commented Aug 14, 2023

Some updates.

Datasets:

  • Small low-resolution forcing dataset uploaded to Hugging Face @ datasets/M2LInES/gfdl-cmip26-gz21-ocean-forcing. (Not the CM2.6 dataset itself; it should be possible to take a "chunk" to upload to Hugging Face, but it's large & would need some expertise.)
  • Forcing data approximating the configuration described in Arthur's paper has been generated -- uploading will take some time due to size and getting to grips with HPC use.

Model:

  • Low-resolution trained model uploaded to Hugging Face @ M2LInES/gz21-ocean-momentum.
  • Training model with hyperparameters approximating the paper's configuration -- pending HPC (CSD3) time slice.
  • No code for loading in inference mode yet, other than that in the testing step (which isn't configurable).
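
For anyone wanting to fetch the two uploaded artifacts, a minimal sketch follows. It assumes the `huggingface_hub` package and its `snapshot_download` function; the helper names `hub_url` and `fetch` are made up for illustration. It only downloads the repository contents: the file layout inside each repo is not described here, so loading the model for inference is not shown.

```python
# Repo IDs as given in the lists above.
ARTIFACTS = {
    "forcing data": ("M2LInES/gfdl-cmip26-gz21-ocean-forcing", "dataset"),
    "trained model": ("M2LInES/gz21-ocean-momentum", "model"),
}


def hub_url(repo_id, repo_type="model"):
    """Return the Hub web page for a repo; dataset repos live under /datasets/."""
    prefix = "datasets/" if repo_type == "dataset" else ""
    return f"https://huggingface.co/{prefix}{repo_id}"


def fetch(repo_id, repo_type="model"):
    """Download a snapshot of the repo and return the local cache directory."""
    # Imported lazily so hub_url works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type=repo_type)


# Example (requires network access):
#   local_dir = fetch(*ARTIFACTS["forcing data"])
```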

@MarionBWeinzierl
Collaborator

Added initial information about the data to the README in #96.

@raehik
Collaborator

raehik commented Dec 6, 2023

Re my Aug 14 comment: this is complete. The forcing data & model hosted on Hugging Face are low-resolution but usable. Uploading of higher-resolution artifacts is being tracked at #110.

raehik closed this as completed on Dec 6, 2023