[EPIC] DCAT-US Harvesting Pipeline MVP #4395

Closed
11 tasks
btylerburton opened this issue Jul 18, 2023 · 1 comment
Labels: H2.0/Harvest-General (General Harvesting 2.0 Issues)

Comments

btylerburton commented Jul 18, 2023

Feature/what we're after

Create an MVP for a DCAT-US Harvesting Pipeline

Anticipated/hypothesized benefits

  • Will allow team to define approach to pipeline development
  • Will allow us to "design on the fly" while still holding to best practices
  • Will allow us to test our assumptions

Measurements/metrics

  • Benchmark processing speed for each pipeline component
  • [... more metrics here ...]
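One way the per-component benchmark could be collected is with a small timing context manager. This is only a sketch: the names `benchmark`, `summarize`, and `timings` are hypothetical and not part of any existing harvester code, and wall-clock time is assumed to be the measurement of interest.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical registry: component name -> list of elapsed seconds per run.
timings = defaultdict(list)


@contextmanager
def benchmark(component):
    """Record wall-clock time spent inside the with-block under `component`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the component raises, so failed runs are still timed.
        timings[component].append(time.perf_counter() - start)


def summarize():
    """Return {component: (run_count, mean_seconds)} for everything recorded."""
    return {
        name: (len(runs), sum(runs) / len(runs))
        for name, runs in timings.items()
    }
```

Each pipeline stage (extract, compare, load, validate) would wrap its work in `with benchmark("extract"): ...`, and `summarize()` could be logged at the end of a harvest job.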

Implementation/tickets

For the MVP, work that would normally be cast as epics will be organized into single stories. New tickets created for the MVP will reference the parent Production ticket, which will remain in "New Dev". Anything marked with [TODO] has open questions.

Infrastructure

  • [TODO] //do changes to the SSB need to be done for the MVP?
  • [TODO] //do we need to modify brokerpaks to get a working MVP?

Controller

  • [MVP] Initialize Flask app #4398
  • [TODO] Controller API routes //define the routes w/ team
  • [TODO] Controller blueprint and dashboard URL //do we need a dashboard for MVP?
  • Integrate DB client to track harvest jobs
    • Harvest source table:
      • contains configs for each harvest source
    • Harvest job table:
      • contains job state information for each job that's run through the pipeline
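The two tables described above could look roughly like the following. This is a minimal sketch using SQLite from the standard library purely for illustration; the actual database, column names, and job states are not specified in this ticket and are assumptions here.

```python
import json
import sqlite3

# Assumed schema: one row per harvest source, one row per pipeline run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS harvest_source (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    config  TEXT NOT NULL          -- JSON-encoded source configuration
);
CREATE TABLE IF NOT EXISTS harvest_job (
    id         INTEGER PRIMARY KEY,
    source_id  INTEGER NOT NULL REFERENCES harvest_source(id),
    state      TEXT NOT NULL DEFAULT 'new',
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path=":memory:"):
    """Open a connection and create both tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_source(conn, name, config):
    """Store a harvest source config (a dict) and return its row id."""
    cur = conn.execute(
        "INSERT INTO harvest_source (name, config) VALUES (?, ?)",
        (name, json.dumps(config)),
    )
    conn.commit()
    return cur.lastrowid


def start_job(conn, source_id):
    """Create a job in the default 'new' state and return its row id."""
    cur = conn.execute(
        "INSERT INTO harvest_job (source_id) VALUES (?)", (source_id,)
    )
    conn.commit()
    return cur.lastrowid


def set_job_state(conn, job_id, state):
    """Move a job to a new state, e.g. as it passes between pipeline stages."""
    conn.execute(
        "UPDATE harvest_job SET state = ?, updated_at = CURRENT_TIMESTAMP "
        "WHERE id = ?",
        (state, job_id),
    )
    conn.commit()


def get_job_state(conn, job_id):
    """Return the job's current state, or None if the job does not exist."""
    row = conn.execute(
        "SELECT state FROM harvest_job WHERE id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the source config as a JSON blob avoids committing to a fixed set of config columns while the MVP is still being designed.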

Extract

  • Integrate & Extend S3 client
  • Integrate & Extend DB client

Compare

  • Integrate & Extend S3 client
  • Integrate & Extend DB client
  • Integrate Catalog UI API call
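The ticket does not define what the Compare step computes. One plausible reading, sketched below, is that it diffs freshly extracted records against what the catalog already holds and sorts them into create, update, and delete sets. The function names, the hash-based comparison, and the `{identifier: hash}` shape for catalog state are all assumptions; only the `identifier` field comes from the DCAT-US schema.

```python
import hashlib
import json


def record_hash(record):
    """Stable hash of a record; keys are sorted so field order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare(harvested, catalog_hashes):
    """Split harvested records into create/update/delete identifier lists.

    harvested: list of dataset dicts, each with an "identifier" key.
    catalog_hashes: {identifier: hash} for what the catalog currently holds
        (hypothetical shape; the real catalog API response may differ).
    """
    incoming = {rec["identifier"]: record_hash(rec) for rec in harvested}
    create = sorted(i for i in incoming if i not in catalog_hashes)
    update = sorted(
        i for i in incoming
        if i in catalog_hashes and incoming[i] != catalog_hashes[i]
    )
    delete = sorted(i for i in catalog_hashes if i not in incoming)
    return {"create": create, "update": update, "delete": delete}
```

Records whose hash matches the catalog's are left out of all three lists, so unchanged datasets would not be reloaded.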

Transform

  • Not necessary since it's a no-op DCAT -> DCAT
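If the pipeline is written as a chain of per-record stages, the no-op can still occupy the Transform slot so that a real transform (for other source formats) could be dropped in later. A minimal sketch, with `transform` and `run_stages` as hypothetical names:

```python
def transform(record):
    """No-op transform: DCAT-US in, DCAT-US out. Returns the record unchanged."""
    return record


def run_stages(records, stages):
    """Apply each stage, in order, to every record and return the results."""
    for stage in stages:
        records = [stage(record) for record in records]
    return records
```

With this shape, `run_stages(records, [transform])` returns the same records it was given, which matches the "not necessary" note above.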

Load

Validate

Modules

Add additional module tickets here as they are created

@hkdctol hkdctol added the H2.0/Harvest-General General Harvesting 2.0 Issues label Jul 20, 2023
@btylerburton btylerburton self-assigned this Jul 20, 2023
@hkdctol hkdctol removed the Epic label Jul 20, 2023
@hkdctol hkdctol changed the title DCAT-US Harvesting Pipeline MVP [EPIC] DCAT-US Harvesting Pipeline MVP Jul 20, 2023
@btylerburton

This effort has been superseded by Airflow Spike #4422
