
Commit

more tuto
cgoliver committed Sep 20, 2024
1 parent f8d37b1 commit 5b0bc95
Showing 3 changed files with 57 additions and 24 deletions.
50 changes: 26 additions & 24 deletions docs/source/conf.py
@@ -20,12 +20,12 @@

# -- Project information -----------------------------------------------------

project = 'rnaglib'
copyright = '2021, Vincent Mallet et al.'
author = 'Vincent Mallet, Carlos Oliver, Jonathan Broadbent, William L. Hamilton, Jerome Waldispuhl'
project = "rnaglib"
copyright = "2021, Vincent Mallet et al."
author = "Vincent Mallet, Carlos Oliver, Jonathan Broadbent, William L. Hamilton, Jerome Waldispuhl"

# The full version, including alpha/beta/rc tags
release = '0.0.1'
release = "0.0.1"


# -- General configuration ---------------------------------------------------
@@ -34,40 +34,44 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.intersphinx',
    'sphinx.ext.mathjax',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'sphinx_autodoc_typehints',
    'myst_parser'
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.intersphinx",
    "sphinx.ext.mathjax",
    "sphinx.ext.napoleon",
    "sphinx.ext.viewcode",
    "sphinx_autodoc_typehints",
    "myst_parser",
]

html_favicon = "images/favicon.png"

myst_enable_extensions = [
    "substitution",
]
extensions += ['sphinx-prompt', 'sphinx_substitution_extensions']
    "substitution",
]
extensions += ["sphinx-prompt", "sphinx_substitution_extensions"]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []


autoclass_content = 'both'
autodoc_member_order = 'bysource'
autoclass_content = "both"
autodoc_member_order = "bysource"

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'nature'
html_logo = 'https://jwgitlab.cs.mcgill.ca/cgoliver/rnaglib/-/raw/zenodo/images/rgl.png'
html_sidebars = { '**': ['globaltoc.html', 'relations.html', 'sourcelink.html', 'searchbox.html'] }
html_theme = "nature"
html_logo = "https://jwgitlab.cs.mcgill.ca/cgoliver/rnaglib/-/raw/zenodo/images/rgl.png"
html_sidebars = {
    "**": ["globaltoc.html", "relations.html", "sourcelink.html", "searchbox.html"]
}


# Add any paths that contain custom static files (such as style sheets) here,
@@ -76,6 +80,4 @@
# html_static_path = ['_static']


source_suffix = { '.rst': 'restructuredtext',
'.txt': 'markdown',
'.md': 'markdown'}
source_suffix = {".rst": "restructuredtext", ".txt": "markdown", ".md": "markdown"}
31 changes: 31 additions & 0 deletions docs/source/tuto_custom_task.rst
@@ -49,6 +49,7 @@ An instance of the ``Task`` class packages the following attributes:
- ``splitter``: method for partitioning the dataset into train, validation, and test subsets.
- ``target_vars``: method for setting and encoding input and target variables.
- ``evaluate``: method which accepts a model and returns performance metrics.
- ``metadata``: a simple (optional) dictionary holding useful information about the task (e.g. task type, number of classes).

Once task processing is complete, all task data is dumped into ``root``, a path passed to the task's ``__init__`` method.

@@ -81,6 +82,9 @@ Here is a minimal template for a custom task::
            # ...
            pass

        def init_metadata(self):
            return {'task_name': 'my task'}


In this tutorial we will walk through the steps to create a task that predicts, for each residue, whether or not it is chemically modified. As a more advanced example, we will then build a task that predicts the Rfam classification of an RNA.

@@ -246,6 +250,33 @@ Here is the full task implementation::
            return RNAalignSplitter(similarity_threshold=0.6)


Metadata
~~~~~~~~~~~~~~~

Each task holds a ``metadata`` attribute which is a simple dictionary holding useful information about the task (e.g. number of classes, task type, name, description). You can modify this during task setup and it is saved to disk once the task is built.
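
The metadata lifecycle described above (start from a dictionary, fill it in during setup, save it with the task) can be sketched in plain Python. This is not the rnaglib implementation: ``update_metadata_from_labels``, ``write_metadata`` and ``read_metadata`` are hypothetical helpers, and the ``num_classes`` and ``task_type`` keys are illustrative. Only the ``metadata.json`` file name and the ``init_metadata()`` return value come from this tutorial.

```python
import json
from pathlib import Path


def init_metadata():
    # Same shape as the template's init_metadata(): a plain dictionary.
    return {"task_name": "my task"}


def update_metadata_from_labels(metadata, labels):
    # Hypothetical setup step: record facts discovered while building the task.
    metadata["num_classes"] = len(set(labels))
    metadata["task_type"] = "binary" if metadata["num_classes"] == 2 else "multiclass"
    return metadata


def write_metadata(metadata, root):
    # Hypothetical: dump the dictionary to <root>/metadata.json, creating root if needed.
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    path = root / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2))
    return path


def read_metadata(root):
    # Hypothetical: load the dictionary back, e.g. when the task is re-instantiated.
    return json.loads((Path(root) / "metadata.json").read_text())
```

Keeping metadata as a plain, JSON-serializable dictionary means it can be written and reloaded without any custom serialization code.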

Task saving and loading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the task is fully built (dataset and splits), the task class automatically calls its ``write()`` method, which dumps to the ``root`` directory all the information needed to skip processing when the task is reloaded.

Your ``root`` directory will look something like::

    my_root/
        train_idx.txt
        val_idx.txt
        test_idx.txt
        task_id.txt
        metadata.json
        dataset/
            1abc.json
            2xzy.json
            ...

The task folder contains three split files (``train_idx.txt``, ``val_idx.txt`` and ``test_idx.txt``) holding the indices for each split. The ``metadata.json`` file stores any additional information relevant to the task. The ``task_id.txt`` file holds a unique identifier for the task, built by hashing all the RNAs and splits, so that if anything about the task changes, the ID changes too. Finally, the ``dataset/`` folder holds ``.json`` files that can be loaded into RNA dicts and used to re-instantiate the task.
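
A rough sketch of the two mechanisms in that paragraph, under stated assumptions: ``compute_task_id`` hashes RNA identifiers and split indices with SHA-256 (the tutorial does not say which hash rnaglib uses), and ``load_split_indices`` assumes whitespace-separated integer indices in each ``*_idx.txt`` file. Both function names are hypothetical; this is not rnaglib's ``write()`` logic.

```python
import hashlib
from pathlib import Path


def compute_task_id(rna_names, train_idx, val_idx, test_idx):
    # Hypothetical: fold every RNA identifier and every split into one digest,
    # so changing any RNA or any split assignment changes the ID.
    h = hashlib.sha256()
    for name in rna_names:
        h.update(name.encode("utf-8") + b"\n")  # newline keeps names distinct
    for split in (train_idx, val_idx, test_idx):
        h.update(b"|")  # marks where each split starts
        h.update(",".join(str(i) for i in split).encode("utf-8"))
    return h.hexdigest()


def load_split_indices(root):
    # Assumed file format: whitespace-separated integers in <split>_idx.txt.
    root = Path(root)
    splits = {}
    for name in ("train", "val", "test"):
        text = (root / f"{name}_idx.txt").read_text()
        splits[name] = [int(tok) for tok in text.split()]
    return splits
```

The separators matter: without them, two different splittings (for example ``[1], [2]`` versus ``[12], []``) could map to the same byte stream and therefore to the same ID.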



Customize Splitting
------------------------

Binary file added images/icon.png