monarch-initiative · glass-ships · Oct 10, 2023 · Oct 10, 2023
diff --git a/docs/index.md b/docs/index.md
@@ -15,29 +15,35 @@ pip install kghub-downloader
 
 ### Configure 
 
-#### Download Configuration
-
-The downloader requires a YAML file which contains a list of target URLs to download, and local names to save those downloads.
-The format for the file is:
-```yaml
----
-- 
-  url: "http://example.com/myawesomefile.tsv"
-  local_name: myawesomefile.tsv
--
-  url: "http://example.com/myokfile.json"
-  local_name: myokfile.json
-
-```
+The downloader requires a YAML file that lists the target URLs to download and the local names under which to save them.  
+For an example, see [example/download.yaml](example/download.yaml).
+
+Available options are:
+- \***url**: The URL to download from. Currently supported:  
+  - `http(s)`
+  - Google Cloud Storage (`gs://`)
+  - Google Drive (`gdrive://` or https://drive.google.com/...). The file must be publicly accessible.
+- **local_name**: The name to save the file as locally
+- **tag**: A tag to use to filter downloads
+- **api**: The API to use to download the file. Currently supported: `elasticsearch`
+- Elasticsearch options  
+  - **query_file**: The file containing the query to run against the index
+  - **index**: The Elasticsearch index to query
+
+> \* Note:  
+>  Google Cloud Storage URLs require that you have set up your credentials as described [here](https://cloud.google.com/artifact-registry/docs/python/authentication#keyring-user). You must:  
+> - [create a service account](https://cloud.google.com/iam/docs/service-accounts-create)  
+> - [add the service account to the relevant bucket](https://cloud.google.com/storage/docs/access-control/using-iam-permissions#bucket-iam) and  
+> - [download a JSON key](https://cloud.google.com/iam/docs/keys-create-delete) for that service account.  
+>  Then, set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point to that file.
 
 You can also include any secrets like API keys you have set as environment variables using `{VARIABLE_NAME}`, for example:  
 ```yaml
 ---
--
-  url: "https://example.com/myfancyfile.json?key={YOUR_SECRET}"
+- url: "https://example.com/myfancyfile.json?key={YOUR_SECRET}"
   localname: myfancyfile.json
 ```
-Note: You _MUST_ have this secret set as an environment variable, and be sure to include the {curly braces}
+Note: `YOUR_SECRET` *MUST* be set as an environment variable, and be sure to include the {curly braces} in the URL string.
 
 ### Usage
 
@@ -80,3 +86,21 @@ $ downloader --output_dir example_output --mirror gs://your-bucket/desired/direc
 # the argument can be omitted from the CLI call.
 $ downloader --output_dir example_output
 ```
+
+### Development
+
+#### Install
+
+```bash
+git clone https://github.com/monarch-initiative/kghub-downloader.git
+cd kghub-downloader
+poetry install
+```
+
+#### Run tests
+
+```bash
+poetry run pytest
+```
+
+NOTE: The tests require gcloud credentials to be set up as described above, using the Monarch GitHub Actions service account.
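
The `{VARIABLE_NAME}` placeholder documented in the `docs/index.md` change can be illustrated with a minimal Python sketch. This is not kghub-downloader's actual implementation: the function name `expand_secrets`, the placeholder regex, and the choice to raise `KeyError` for an unset variable are all assumptions made for illustration.

```python
import os
import re

# Hypothetical pattern: a valid environment variable name wrapped in {curly braces}.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_secrets(url, env=None):
    """Replace every {NAME} in url with the value of env[NAME].

    Illustrative helper only (not part of kghub-downloader).
    Raises KeyError if a referenced variable is not set.
    """
    env = os.environ if env is None else env

    def _substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name!r} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_substitute, url)


if __name__ == "__main__":
    # Pass an explicit mapping so the example does not depend on the real environment.
    demo_env = {"YOUR_SECRET": "abc123"}
    print(expand_secrets("https://example.com/myfancyfile.json?key={YOUR_SECRET}", demo_env))
```

Failing loudly on an unset variable mirrors the documented requirement that the secret must be set; a real implementation may handle this case differently.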
diff --git a/example/download.yaml b/example/download.yaml
@@ -1,23 +1,21 @@
 ---
--
-  url: https://zfin.org/downloads/phenoGeneCleanData_fish.txt
+- url: https://zfin.org/downloads/phenoGeneCleanData_fish.txt
   local_name: zfin/fish_phenotype.txt
--
-  url: gs://monarch-test/kghub_downloader_test_file.yaml
+
+- url: gs://monarch-test/kghub_downloader_test_file.yaml
   local_name: test_file.yaml
+
+- url: gdrive:10ojJffrPSl12OMcu4gyx0fak2CNu6qOs
+  local_name: gdrive_test_1.txt
   tag: testing
-# -
+
+- url: https://drive.google.com/uc?id=10ojJffrPSl12OMcu4gyx0fak2CNu6qOs
+  local_name: gdrive_test_2.txt
+
+# - url: https://www.ebi.ac.uk/chembl/elk/es/
 #   api: elasticsearch
-#   url: https://www.ebi.ac.uk/chembl/elk/es/
 #   query_file: example/query.json
 #   local_name: molecule.json
 #   index: chembl_28_molecule
 #   tag: ebi
-
--
-   url: gdrive:10ojJffrPSl12OMcu4gyx0fak2CNu6qOs
-   local_name: gdrive_test_1.txt
-
--
-  url: https://drive.google.com/uc?id=10ojJffrPSl12OMcu4gyx0fak2CNu6qOs
-  local_name: gdrive_test_2.txt 
+
diff --git a/test/integration/test_download.py b/test/integration/test_download.py
@@ -24,8 +24,11 @@ def test_download():
 
 
 def test_tag():
-    files = ["test/output/zfin/fish_phenotype.txt", "test/output/test_file.yaml"]
-    tagged_files = ["test/output/test_file.yaml"]
+    files = [
+        "test/output/zfin/fish_phenotype.txt",
+        "test/output/test_file.yaml"
+    ]
+    tagged_files = ["test/output/gdrive_test_1.txt"]
 
     for file in files:
         if exists(file):