feat: add hackathon notebooks (#1)
* Change resource suffix for releases

* Add hackathon instructions

* Fix links in README.md

* Minor changes

* fix: code in health check notebook that doesn't run correctly

---------

Co-authored-by: Lily <[email protected]>
Co-authored-by: efbbrown-dt <[email protected]>
3 people committed Jan 16, 2024
1 parent c884b65 commit ec587bb
Showing 16 changed files with 2,438 additions and 167 deletions.
188 changes: 27 additions & 161 deletions README.md
@@ -1,171 +1,37 @@
<!--
Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# MLOps Hackathon

*Learn about MLOps by deploying your own ML pipelines in Google Cloud.
You'll solve a number of exercises and challenges to run pipelines in Vertex AI, continuously monitor your models, and promote your artifacts to a production environment.*

## Getting started
As a hackathon attendee, simply follow this notebook series in your Vertex AI Workbench instance:

1. **[Health check](./hackathon/01_health_check.ipynb) - start here**
1. [Run pipelines](./hackathon/02_run_pipelines.ipynb)
1. [Promote model](./hackathon/03_promote_model.ipynb)
1. [Challenge: Model monitoring](./hackathon/04_monitoring_challenge.ipynb)
1. [Challenge: Real-time predictions](./hackathon/05_realtime_challenge.ipynb)

**❗Note:** This workshop has been designed to be run in Vertex AI Workbench.
Support for running the workshop locally is provided, but we recommend Vertex AI Workbench for the best experience.

## For instructors

The notebooks are self-contained, but instructors of this hackathon are asked to prepare the following for attendees:

1. Create 3x Google Cloud projects (dev, test, prod)
1. Use `make deploy` to deploy resources in each of them. It is advisable to follow the [infrastructure setup notebook](./docs/notebooks/01_infrastructure_setup.ipynb) for each environment.
1. Create an E2E test trigger in the test project
1. Create a release trigger in the prod project
1. Grant each attendee's own Google account the following IAM roles (see the example command after this list):
- `Vertex AI User` (roles/aiplatform.user)
- `Storage Object Viewer` (roles/storage.objectViewer)
- `Service Usage Consumer` (roles/serviceusage.serviceUsageConsumer)
1. Create one Vertex Workbench instance per user.
1. Confirm that users can access the GCP resources.
1. ❗ Post-workshop, remember to delete all attendee users from the projects and to clean up branches and releases in this repository.
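
For reference, the roles above can also be granted from the command line. The snippet below is only a sketch: the project ID and attendee email are placeholders, and the commands should be repeated for each attendee and each project.

```bash
# Sketch only: project ID and attendee email are placeholders.
export DEV_PROJECT_ID=my-dev-gcp-project
export ATTENDEE_EMAIL=attendee@example.com

for ROLE in roles/aiplatform.user roles/storage.objectViewer roles/serviceusage.serviceUsageConsumer; do
  gcloud projects add-iam-policy-binding "$DEV_PROJECT_ID" \
    --member="user:$ATTENDEE_EMAIL" \
    --role="$ROLE"
done
```
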
2 changes: 1 addition & 1 deletion cloudbuild/release.yaml
@@ -57,7 +57,7 @@ steps:
--tag=${TAG_NAME}; \
done
env:
- RESOURCE_SUFFIX=default
- RESOURCE_SUFFIX=${TAG_NAME}

options:
logging: CLOUD_LOGGING_ONLY
4 changes: 2 additions & 2 deletions docs/Production.md
@@ -20,11 +20,11 @@ This document describes the full process from making a change to your pipeline c

## Pre-requisites

- Suitable GCP environments set up - see the [README](../README.md)
- Suitable GCP environments set up - see the [README](README.md)
- This repo forked / used as a template for a new GitHub repo
- CI/CD set up - see the instructions [here](cloudbuild/README.md)
- Access set up for the BigQuery datasets used in the example pipelines
- Git repo cloned locally (or in a notebook environment) and local setup complete - see [here](/README.md#local-setup)
- Git repo cloned locally (or in a notebook environment) and local setup complete - see [here](/docs/README.md#local-setup)

## Making your changes to the pipelines

169 changes: 169 additions & 0 deletions docs/README.md
@@ -0,0 +1,169 @@
<!--
Copyright 2023 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Vertex Pipelines End-to-End Samples

_AKA "Vertex AI Turbo Templates"_

## Introduction

This repository provides a reference implementation of [Vertex Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/) for creating a production-ready MLOps solution on Google Cloud.
You can take this repository as a starting point for your own ML use cases.
The implementation includes:

* **Infrastructure-as-Code** using Terraform for a typical dev/test/prod setup of Vertex AI and other relevant services
* **ML training and prediction pipelines** using Kubeflow Pipelines
* **Reusable Kubeflow components** that can be used in common ML pipelines
* **CI/CD** using Google Cloud Build for linting, testing, and deploying ML pipelines
* **Developer scripts** (Makefile, Python scripts etc.)

**Get started today by following [this step-by-step notebook tutorial](notebooks)! 🚀**
In this three-part notebook series you'll deploy a Google Cloud project and run production-ready ML pipelines using Vertex AI without writing a single line of code.

## Cloud Architecture

The diagram below shows the cloud architecture for this repository.

![Cloud Architecture diagram](images/architecture.png)

There are four different Google Cloud projects in use:

* `dev` - a shared sandbox environment for use during development
* `test` - environment for testing new changes before they are promoted to production. This environment should be treated as much as possible like a production environment.
* `prod` - production environment
* `admin` - separate Google Cloud project for setting up CI/CD in Cloud Build (since the CI/CD pipelines operate across the different environments)

Vertex Pipelines are scheduled using Google Cloud Scheduler.
Cloud Scheduler emits a Pub/Sub message that triggers a Cloud Function, which in turn triggers the Vertex Pipeline to run.
_In future, this will be replaced with the Vertex Pipelines Scheduler (once there is a Terraform resource for it)._
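
In this repository those scheduling resources are provisioned by Terraform, but the trigger chain can be sketched with equivalent gcloud commands. The topic name, job name, schedule, and message body below are illustrative placeholders (not the names used by the Terraform modules), and the `DEV_PROJECT_ID`/`DEV_LOCATION` variables are the ones introduced in the Setup section below.

```bash
# Illustrative sketch of the trigger chain (Terraform creates the real resources).
# Create the Pub/Sub topic that the Cloud Function listens on:
gcloud pubsub topics create pipeline-trigger-topic --project=$DEV_PROJECT_ID

# Publish to that topic on a cron schedule; the subscribed Cloud Function then
# submits the compiled pipeline to Vertex AI. The payload schema is defined by
# the Cloud Function in this repository, so an empty body is shown here.
gcloud scheduler jobs create pubsub trigger-training-pipeline \
  --project=$DEV_PROJECT_ID \
  --location=$DEV_LOCATION \
  --schedule="0 0 * * 0" \
  --topic=pipeline-trigger-topic \
  --message-body='{}'
```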

## Setup

**Prerequisites:**

- [Terraform](https://www.terraform.io/) for managing cloud infrastructure
- [tfswitch](https://tfswitch.warrensbox.com/) to automatically choose and download an appropriate Terraform version (recommended)
- [Pyenv](https://github.com/pyenv/pyenv#installation) for managing Python versions
- [Poetry](https://python-poetry.org/) for managing Python dependencies
- [Google Cloud SDK (gcloud)](https://cloud.google.com/sdk/docs/quickstart)
- Make
- Cloned repo

**Deploy infrastructure:**

You will need four Google Cloud projects: dev, test, prod, and admin.
The Cloud Build pipelines will run in the _admin_ project, and deploy resources into the dev/test/prod projects.
Before your CI/CD pipelines can deploy the infrastructure, you will need to set up a Terraform state bucket for each environment:

```bash
export DEV_PROJECT_ID=my-dev-gcp-project
export DEV_LOCATION=europe-west2
gsutil mb -l $DEV_LOCATION -p $DEV_PROJECT_ID --pap=enforced gs://$DEV_PROJECT_ID-tfstate && \
gsutil ubla set on gs://$DEV_PROJECT_ID-tfstate
```
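
The test and prod environments need the same state bucket. A sketch of the equivalent commands, with placeholder project ID and location:

```bash
# Repeat for the test and prod environments (placeholder values shown).
export TEST_PROJECT_ID=my-test-gcp-project
export TEST_LOCATION=europe-west2
gsutil mb -l $TEST_LOCATION -p $TEST_PROJECT_ID --pap=enforced gs://$TEST_PROJECT_ID-tfstate && \
gsutil ubla set on gs://$TEST_PROJECT_ID-tfstate
```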

Enable APIs in admin project:

```bash
export ADMIN_PROJECT_ID=my-admin-gcp-project
gcloud services enable cloudresourcemanager.googleapis.com serviceusage.googleapis.com --project=$ADMIN_PROJECT_ID
```

```bash
make deploy env=dev
```
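
Once each environment has its Terraform state bucket (and any environment-specific configuration in place), the same target can be pointed at the other environments:

```bash
make deploy env=test
make deploy env=prod
```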

More detail about the infrastructure is provided in [this guide](Infrastructure.md).
It describes the scheduling of pipelines and how to tear down infrastructure.

**Install dependencies:**

```bash
pyenv install --skip-existing # install Python
poetry config virtualenvs.prefer-active-python true # configure Poetry
make install # install Python dependencies
cd pipelines && poetry run pre-commit install # install pre-commit hooks
cp env.sh.example env.sh
```

Update the environment variables for your dev environment in `env.sh`.
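
The authoritative variable names live in `env.sh.example`; purely as an illustration (the names below are hypothetical, not taken from the repository), the file points the helper scripts at your dev project and region, for example:

```bash
# Hypothetical example - use the variable names defined in env.sh.example.
export VERTEX_PROJECT_ID=my-dev-gcp-project   # hypothetical name for the dev project ID
export VERTEX_LOCATION=europe-west2           # hypothetical name for the region
```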

**Authenticate to Google Cloud:**

```bash
gcloud auth login
gcloud auth application-default login
```
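
Optionally, you can also make the dev project the default for subsequent gcloud commands and for Application Default Credentials quota. This convenience step is an assumption on our part rather than something the repository requires:

```bash
# Optional convenience: default gcloud commands and ADC quota to the dev project.
gcloud config set project $DEV_PROJECT_ID
gcloud auth application-default set-quota-project $DEV_PROJECT_ID
```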

## Run

This repository contains example ML training and prediction pipelines which are explained in [this guide](Pipelines.md).

**Build containers:** The [model/](/model/) directory contains the code for custom training and prediction container images, including the model training script at [model/training/train.py](../model/training/train.py).
You can modify this to suit your own use case.
Build the training and prediction container images and push them to Artifact Registry with:

```bash
make build [ images="training prediction" ]
```

Optionally specify the `images` variable to only build one of the images.
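
For example, to rebuild and push only the training image:

```bash
make build images=training
```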

**Execute pipelines:** Vertex AI Pipelines uses Kubeflow to orchestrate your training steps. As such, you'll need to:

1. Compile the pipeline
1. Build dependent Docker containers
1. Run the pipeline in Vertex AI

Execute the following command to run through steps 1-3:

```bash
make run pipeline=training [ build=<true|false> ] [ compile=<true|false> ] [ cache=<true|false> ] [ wait=<true|false> ]
```

The command has the following true/false flags:

- `build` - re-build containers for training & prediction code (limit by setting `images=training` to build only one of the containers)
- `compile` - re-compile the pipeline to YAML
- `cache` - cache pipeline steps
- `wait` - wait for the pipeline run to finish before returning (synchronous) or return immediately (asynchronous); see the example below
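
For example, a run that reuses the existing containers, recompiles the pipeline, disables caching, and waits for the result might look like this (flag values are illustrative):

```bash
make run pipeline=training build=false compile=true cache=false wait=true
```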

**Shortcuts:** Use these commands, which support the same options as `run`, to run the training or prediction pipeline:

```bash
make training
make prediction
```

## Test

Unit tests are performed using [pytest](https://docs.pytest.org).
The unit tests are run on each pull request.
To run them locally, execute the following command, optionally limiting which packages are tested:

```bash
make test [ packages=<pipelines components> ]
```
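
For example, assuming `components` is one of the selectable packages as the bracketed hint above suggests:

```bash
make test packages=components
```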

## Automation

For details on setting up CI/CD, see [this guide](Automation.md).

## Putting it all together

For a full walkthrough of the journey from changing the ML pipeline code to having it scheduled and running in production, please see the guide [here](Production.md).

We value your contribution, see [this guide](Contribution.md) for contributing to this project.