Merge pull request #33 from monarch-initiative/develop

update deps and readme
monarch-initiative · Apr 24, 2024 · 390c894
2 parents 675b879 + 712e360
commit 390c894

Showing 3 changed files with 536 additions and 255 deletions.
diff --git a/docs/index.md b/docs/index.md
@@ -1,6 +1,6 @@
 # KG-Hub Downloader
 
-| [Documentation](https://monarch-initiative.github.io/kghub-downloader) |
+| [Documentation](https://monarch-initiative.github.io/kghub-downloader) | [Repository](https://github.com/monarch-initiative/kghub-downloader) | [PyPI](https://pypi.org/project/kghub-downloader) |
 
 ### Overview
 
@@ -9,17 +9,19 @@ This is a configuration based file caching downloader with initial support for h
 ### Installation
 
 KGHub Downloader is available to install via pip:
+
 ```
 pip install kghub-downloader
 ```
 
-### Configure 
+### Configure
 
 The downloader requires a YAML file which contains a list of target URLs to download, and local names to save those downloads.  
 For an example, see [example/download.yaml](example/download.yaml)
 
 Available options are:
-- \***url**: The URL to download from. Currently supported:  
+
+- \***url**: The URL to download from. Currently supported:
   - `http(s)`
   - `ftp`
     - with `glob:` option to download files with specific extensions (currently FTP only; searches recursively).
@@ -29,29 +31,33 @@ Available options are:
 - **local_name**: The name to save the file as locally
 - **tag**: A tag to use to filter downloads
 - **api**: The API to use to download the file. Currently supported: `elasticsearch`
-- elastic search options  
+- elastic search options
   - **query_file**: The file containing the query to run against the index
   - **index**: The elastic search index for query
 
 > \* Note:  
->  Google Cloud Storage URLs require that you have set up your credentials as described [here](https://cloud.google.com/artifact-registry/docs/python/authentication#keyring-user). You must:  
-> - [create a service account](https://cloud.google.com/iam/docs/service-accounts-create)  
-> - [add the service account to the relevant bucket](https://cloud.google.com/storage/docs/access-control/using-iam-permissions#bucket-iam) and  
+>  Google Cloud Storage URLs require that you have set up your credentials as described [here](https://cloud.google.com/artifact-registry/docs/python/authentication#keyring-user). You must:
+>
+> - [create a service account](https://cloud.google.com/iam/docs/service-accounts-create)
+> - [add the service account to the relevant bucket](https://cloud.google.com/storage/docs/access-control/using-iam-permissions#bucket-iam) and
 > - [download a JSON key](https://cloud.google.com/iam/docs/keys-create-delete) for that service account.  
->  Then, set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point to that file.
+>   Then, set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point to that file.
 >
 > Mirroring local files to an Amazon AWS S3 bucket requires the following:
->  - [Create an AWS account](https://portal.aws.amazon.com/)
->  - [Create an IAM user in AWS](https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started.html): This enables getting the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` needed for authentication. These two should be stored as environment variables in the user's system.
->  - [Create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html): This will be the destination for pushing local files.
+>
+> - [Create an AWS account](https://portal.aws.amazon.com/)
+> - [Create an IAM user in AWS](https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started.html): This enables getting the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` needed for authentication. These two should be stored as environment variables in the user's system.
+> - [Create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html): This will be the destination for pushing local files.
+
+You can also include any secrets like API keys you have set as environment variables using `{VARIABLE_NAME}`, for example:
 
-You can also include any secrets like API keys you have set as environment variables using `{VARIABLE_NAME}`, for example:  
 ```yaml
 ---
 - url: "https://example.com/myfancyfile.json?key={YOUR_SECRET}"
   local_name: myfancyfile.json
 ```
-Note: `YOUR_SECRET` *MUST* as an environment variable, and be sure to include the {curly braces} in the url string.
+
+Note: `YOUR_SECRET` _MUST_ be set as an environment variable, and be sure to include the {curly braces} in the URL string.
 
 ### Usage
 
@@ -67,30 +73,31 @@ download_from_yaml(yaml_file="download.yaml", output_dir="data")
 
 #### Command Line
 
+To download files listed in a download.yaml file:
+
 ```bash
 $ downloader [OPTIONS] ARGS
 ```
-╰ Download files listed in a download.yaml file
 
-| OPTIONS | | 
-| --- | --- |
-| yaml_file | A string pointing to the download.yaml file, to be parsed for things to download.<br>Defaults to `./download.yaml` |
-| ignore_cache | Ignore cache and download files even if they exist [false] |
-| snippet_only | Downloads only the first 5 kB of each uncompressed source, for testing and file checks |
-| tags | Limit to only downloads with this tag |
-| mirror | Remote storage URL to mirror download to. Supported buckets: Google Cloud Storage |
+| OPTIONS      |                                                                                                       |
+| ------------ | ----------------------------------------------------------------------------------------------------- |
+| yaml_file    | Path to the download.yaml file, to be parsed for things to download.<br>Defaults to `./download.yaml` |
+| ignore_cache | Ignore cache and download files even if they exist (Default `False`)                                  |
+| snippet_only | Downloads only the first 5 kB of each uncompressed source, for testing and file checks                |
+| tags         | Limit to only downloads with this tag                                                                 |
+| mirror       | Remote storage URL to upload downloaded files to.<br/>Supported buckets: Google Cloud Storage         |
 
-
-| ARGUMENTS | | 
-| --- | --- |
-| output_dir | A string pointing to where to write out downloaded files. |
+| ARGUMENTS  |                                 |
+| ---------- | ------------------------------- |
+| output_dir | Where to save downloaded files. |
 
 Examples:
+
 ```bash
 $ downloader --output_dir example_output --tags zfin_gene_to_phenotype example.yaml
 $ downloader --output_dir example_output --mirror gs://your-bucket/desired/directory
 
-# Note that if your YAML file is named `download.yaml`, 
+# Note that if your YAML file is named `download.yaml`,
 # the argument can be omitted from the CLI call.
 $ downloader --output_dir example_output
 ```
@@ -111,4 +118,4 @@ poetry install
 poetry run pytest
 ```
 
-NOTE: The tests require gcloud credentials to be set up as described above, using the monarch github actions service account.
+NOTE: The tests require gcloud credentials to be set up as described above, using the Monarch GitHub Actions service account.
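
Note on the `{VARIABLE_NAME}` placeholders documented in the README changes above: the sketch below shows one way such placeholders could be expanded from environment variables before a URL is requested. It is an illustration only, not the downloader's actual implementation. The function name `expand_env_placeholders` and the choice to raise `KeyError` when a variable is unset are assumptions made for this example.

```python
import os
import re
from typing import Mapping, Optional

# Matches {NAME} where NAME looks like an environment variable identifier.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(url: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each {NAME} in `url` with the value of environment variable NAME.

    Hypothetical helper for illustration; kghub-downloader may behave differently.
    """
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env

    def _lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly rather than send a URL with a raw placeholder.
            raise KeyError(f"environment variable {name!r} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_lookup, url)
```

For example, calling `expand_env_placeholders("https://example.com/myfancyfile.json?key={YOUR_SECRET}")` with `YOUR_SECRET` exported in the shell would return the URL with the secret filled in. Passing an explicit mapping as `env` makes the behavior easy to check without touching the real environment.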