ops-cdisc-reports

This project contains a set of modules that create reports and artifacts from the NCI Thesaurus OWL file. The code was moved from https://github.com/NCIEVS/ops-data-conversion and https://github.com/NCIEVS/nci-diff-CDISC. This project aims to consolidate those disparate pieces and provide a unified way to build, deploy, and run the jobs that create the reports.

Note that in this README the term report is loosely defined and can refer to TXT or XML files that are more like artifacts than reports. However, in the interest of brevity, we refer to all files produced by these modules as reports.

Architecture

This project makes use of Java 9 modules to separate and isolate the code that produces individual reports. There is a common module used by all the other modules for shared code. All reports can be produced from the command line. Each module that produces a report also has a Dockerfile; its main purpose is to make it easy to deploy the code as an AWS Lambda function.
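As a rough illustration of this layout, a report module's descriptor might look like the sketch below. The module names are assumptions for illustration only and do not reflect the actual descriptors in this repository.

module changes.report {
    // Shared OWL-parsing and report-writing utilities live in the common module.
    requires cdisc.common;
    // A report module typically also requires the OWL API and AWS Lambda runtime
    // libraries it uses; those requires directives are omitted from this sketch.
}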

AWS

The primary mode of running these reports is as a Step Function in AWS. The following diagram shows the cloud infrastructure used to run them.

Architecture

File Transfer

The first step in the process is to transfer the Thesaurus file to Amazon Elastic File System (EFS). There is no direct way to upload this file through the console. If you have the file locally, it is possible to set up Direct Connect and mount EFS as a network file system. At the moment, the way to upload the file to EFS is to first upload it to S3 and then use an AWS DataSync task to transfer it to EFS.

Step Function

The following diagram shows the Step Function execution graph. It is essentially a set of Lambda functions executed in a specific order. There is an option to skip a particular report based on the input to the Step Function. When all the reports have run, they are transferred to a Google Drive.

Step Function Definition

Sample input JSON

{
  "thesaurusOwlFile": "/mnt/cdisc/Thesaurus-220613-22.06b.owl",
  "publicationDate": "06-14-2022",
  "conceptCodes": [
    "C81222",
    "C77527",
    "C67497",
    "C132298",
    "C120166",
    "C66830",
    "C77526",
    "C165634"
  ],
  "deliveryEmailAddresses":[<EMAIL_ADDRESSES>],
  "pairingReportRequest":{
  "thesaurusOwlFile": "/mnt/cdisc/Thesaurus-220613-22.06b.owl",
  "publicationDate": "2022-06-14",
  "conceptCodes": [
    "C81222",
    "C66830",
    "C77526"
  ]
  }
}

Description of fields in the input JSON

  • thesaurusOwlFile: Absolute path to the Thesaurus file. Note that this file actually lives on EFS, which is mounted to all the Lambda functions.
  • publicationDate: The date when the reports are published. The reports are run quarterly and published on a pre-determined date.
  • conceptCodes: These are codes for which the reports are created. The Thesaurus file has lots of concepts in it. The reports are created for only a subset of those.
  • deliveryEmailAddresses: The email addresses that will be granted access to the reports in Google Drive.
  • pairingReportRequest: The pairing reports use only a subset of conceptCodes, so they have a separate request object.
  • Note: The default is to run all reports. To skip a report, specify skip<REPORT_NAME> in the root of the request, for example skipTextExcelReport (see the sketch after this list).
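As a sketch of the skip flags, a request that skips the text Excel report might look like the following. Whether the flag is a boolean or only needs to be present is an assumption here, and the other fields are omitted for brevity.

{
  "thesaurusOwlFile": "/mnt/cdisc/Thesaurus-220613-22.06b.owl",
  "publicationDate": "06-14-2022",
  "skipTextExcelReport": true
}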

As mentioned above, each module can be run as an executable jar, so these reports can also be run standalone outside of AWS.
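A standalone run might look like the following. The jar path follows Gradle's default build/libs layout, but the exact jar name and arguments are hypothetical; check the module's main class for the actual options.

java -jar changes-report/build/libs/changes-report.jar /mnt/cdisc/Thesaurus-220613-22.06b.owl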

Code

The code is written in Java 11. We use Lombok in this project, so some POJOs may look sparse and your IDE may report compile errors until you install the Lombok plugin.
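For instance, a Lombok-annotated POJO carries no hand-written getters, setters, or constructors; they are generated at compile time. The class below is illustrative only, not a class from this repository.

import lombok.Builder;
import lombok.Data;

// Lombok generates getters, setters, equals/hashCode, toString, and a builder at
// compile time, which is why the source looks "empty" and why an IDE without the
// Lombok plugin flags calls such as row.getCode() as compile errors.
@Data
@Builder
public class ReportRow {
    private String code;
    private String preferredName;
    private String definition;
}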

Build

Gradle

Gradle is used to build the project. Individual modules can be built with

gradle changes-report:clean changes-report:build changes-report:assemble

Or the entire project with

gradle clean build assemble

To build zip for Lambda use

gradle clean buildZip

Some of the tests take a long time because they load the entire Thesaurus file into memory, so it can be useful to skip the tests by adding the -x test option

gradle clean build assemble -x test

At the moment we are not publishing any artifacts to Nexus, but we can if we need to in the future. An example configuration exists in the text-excel-reports project; its publish task would be used to publish to Nexus.

Terraform

We use Terraform to manage our AWS infrastructure. Terraform is run with the following command

terraform apply -var-file dev.tfvars

This should set up everything that is needed in AWS to run these reports. Once the reports have run, use the following command to tear down the resources.

terraform apply -destroy -var-file dev.tfvars

Note that the teardown is important because EFS is billed for storage. A couple of cloud resources, such as the S3 bucket that holds the Thesaurus file and the secrets used to connect to Google Drive, are set up outside of Terraform.

Tests

There are unit tests for each of the reports. They can be run through Gradle or in your IDE.
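For example, the tests for a single module (using the changes-report module from the build examples above) or for the whole project can be run with

gradle changes-report:test

gradle test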
