Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduce new KubeTaskParallel function to run pods with two containers #3095

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

hairyhum
Copy link
Contributor

@hairyhum hairyhum commented Sep 6, 2024

Change Overview

In order to separate data dump generation by database tools and export by kando, a pod performing database dump can have two separate containers and set up the pipe over file instead of anonymous pipe.

The goal of this function is to enable database-specific blueprints to use official database images separately from kanister-tools image instead of building a custom image for each databse blueprint.

Pull request type

Please check the type of change your PR introduces:

  • 🚧 Work in Progress
  • 🌈 Refactoring (no functional changes, no api changes)
  • 🐹 Trivial/Minor
  • 🐛 Bugfix
  • 🌻 Feature
  • 🗺️ Documentation
  • 🤖 Test
  • 🏗️ Build

Issues

  • fixes #issue-number

Test Plan

  • 💪 Manual
  • ⚡ Unit test
  • 💚 E2E

@hairyhum
Copy link
Contributor Author

hairyhum commented Sep 6, 2024

There is an example in tests and docs.
Another example of blueprint:

apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: kube-task-parallel-bp
  namespace: kanister
actions:
  dump:
    phases:
    - func: KubeTaskParallel
      name: dump
      args:
        namespace: kanister
        sharedVolumeMedium: Memory
        sharedVolumeSizeLimit: 1Gi
        sharedVolumeDir: "/tmp/"
        backgroundImage: ubuntu:latest
        backgroundCommand:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          mkfifo /tmp/pipe-file
          for i in {1..10}
          do
            echo $i
            sleep 1
          done > /tmp/pipe-file
        outputImage: ghcr.io/kanisterio/kanister-tools:v9.99.9-dev
        outputCommand:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          while [ ! -e /tmp/pipe-file  ]; do sleep 1; done
          kando output numbers "$(cat /tmp/pipe-file)"

docs/functions.rst Outdated Show resolved Hide resolved
@viveksinghggits
Copy link
Contributor

Hi @hairyhum ,
Are we doing this to solve any performance issue that were reported? Can you please share some background just for my understanding.

@hairyhum
Copy link
Contributor Author

hairyhum commented Sep 9, 2024

Hi @hairyhum , Are we doing this to solve any performance issue that were reported? Can you please share some background just for my understanding.

The main goal of this change is to separate image management for databse dump blueprints. Not the performance.

@hairyhum hairyhum changed the title feat: Introduce new Dump function to run pods with two containers feat: Introduce new KubeTaskParallel function to run pods with two containers Sep 11, 2024
@hairyhum
Copy link
Contributor Author

Renamed function into KubeTaskParallel and the arguments for containers to background and output containers.

@hairyhum hairyhum force-pushed the dump-function branch 2 times, most recently from 514f8b2 to fdc0a1b Compare September 11, 2024 16:11
Copy link
Contributor

@e-sumin e-sumin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, but for me function name looks a bit misleading.
But, frankly, I can't propose better name. Maybe other reviewers will have an ideas ?

pkg/function/kube_task_parallel.go Outdated Show resolved Hide resolved
Copy link
Contributor

@e-sumin e-sumin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@viveksinghggits viveksinghggits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me overall. just left some nit comments/formatting suggestion.

@@ -150,6 +150,82 @@ Example:
- |
echo "Example"

KubeTaskParallel
Copy link
Contributor

@viveksinghggits viveksinghggits Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we should the name of the function in next community call? It's good right now, but by reading the name it looks like in this function we are doing things parallelly that kubetask does sequentially, which doesn't seem to be the case right?
I think it doesn't necessarily have to have KubeTask in its name right? Maybe MultiContainerUpload or something in that line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid the "purpose" style names like "dump", "upload" or "export". While this function is used for that purpose it's deliberately made generic.

Copy link
Contributor

@viveksinghggits viveksinghggits Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's take opinions from other folks as well. @pavannd1 @PrasadG193 @mlavi .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow KubeTaskParallel doesn't sound good to me. I agree that it would be better if we avoid KubeTask* prefix for the new function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use MultiContainerPod or MultiContainerRun, but we already use KubeTaslk instead of Pod in the other function.

It's similar to KubeTask, but allows using multiple images to move backup data.
"background" container is one responsible for generating data, while "output" container
should export it to destination.
The main difference between them is that phase outputs can only generated from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The main difference between them is that phase outputs can only generated from the
The main difference between these containers is that phase outputs can only be generated from the

nit

"background" container is one responsible for generating data, while "output" container
should export it to destination.
The main difference between them is that phase outputs can only generated from the
"output" container outputs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"output" container outputs.
"output" container.

?

Comment on lines +171 to +172
`backgroundImage`, Yes, `string`, image to be used in "background" container
`backgroundCommand`, Yes, `[]string`, command list to execute in "background" container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we call this container snapshot or backup container? or maybe data container? just a suggestion, if you think background is ok, I am good.

Comment on lines +173 to +174
`outputImage`, Yes, `string`, image to be used in "output" container
`outputCommand`, Yes, `[]string`, command list to execute in "output" container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be called upload container?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note as a note before, trying to name things for what they do rather than what they can be used for.

@@ -0,0 +1,2 @@
---
features: Introduce new KubeTaskParallel function to run pods with two containers connected by shared volume
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
features: Introduce new KubeTaskParallel function to run pods with two containers connected by shared volume
features: Introduced new Kanister function ``KubeTaskParallel`` to run pods with two containers connected by shared volume

?

…ntainers

In order to separate data dump generation by database tools and export
by `kando`, a pod performing database dump can have two separate containers
and set up the pipe over file instead of anonymous pipe
Init container can be used to set up data in the shared volume.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants