Update of Habana SynapseAI notebook to 1.15.1 #533

Closed
wants to merge 2 commits into from

Xaenalt
Member

@Xaenalt Xaenalt commented May 22, 2024

Description

Adds the SynapseAI 1.15.1 notebook, which is needed for RHOAI 2.10 compatibility

This required quite a few changes, since the SynapseAI stack requires Python 3.10 for 1.15.1. It uses a workflow similar to the Anaconda one, with a custom base image. However, this should be easy to update in the future just by swapping out the FROM in the base image and verifying the Python versions going forward.
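
A minimal sketch of the "swap out the FROM" idea described above, not the actual Dockerfile from this PR. `BASE_IMAGE`, `PYTHON_VERSION`, and the registry path are hypothetical placeholders, and the check assumes `python3` is on the `PATH` in the base image:

```dockerfile
# Hypothetical build argument: the image reference is a placeholder,
# not the base image used in this PR.
ARG BASE_IMAGE=registry.example.com/habana/base:1.15.1
FROM ${BASE_IMAGE}

# Assumed: the Python minor version that the chosen SynapseAI release supports.
ARG PYTHON_VERSION=3.10

# Fail the build early if the interpreter shipped in the base image
# does not match the expected minor version.
RUN python3 --version | grep -q "Python ${PYTHON_VERSION}\."
```

With a layout like this, moving to a newer stack would mostly mean passing different values, for example `--build-arg BASE_IMAGE=... --build-arg PYTHON_VERSION=3.11`, and letting the version check confirm the pairing.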

How Has This Been Tested?

Builds successfully, could use some functional testing

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@Xaenalt
Member Author

Xaenalt commented May 22, 2024

@harshad16 PTAL

@Xaenalt
Member Author

Xaenalt commented May 22, 2024

/retest

@harshad16
Member

@Xaenalt, I'm not sure if we would like to maintain Python 3.10, to be honest.
We should have this checked with others.
I understand this version of Habana is only available on 3.10; however, without confirmation
this would make it harder for the team to maintain in the long run.

@harshad16
Member

/hold

@Xaenalt
Member Author

Xaenalt commented May 22, 2024

Yeah, thankfully the same versions of everything exist in 3.10, but yeah let's have a longer discussion about it

@Xaenalt
Member Author

Xaenalt commented May 22, 2024

According to Intel, this would only be necessary until the 1.17 release which will be end of July-ish

Contributor

openshift-ci bot commented Jun 7, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign harshad16 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jiridanek
Member

jiridanek commented Jun 7, 2024

@Xaenalt, I'm not sure if we would like to maintain Python 3.10, to be honest

Red Hat ships Python 3.11, can you wait and use that? It is supported until its retirement date in May 2026 in RHEL 8 as well as RHEL 9. If you base the image on ubi9/rhel9 and use the Python 3.12 available there, that has a retirement date in Apr 2027. https://access.redhat.com/support/policy/updates/rhel-app-streams-life-cycle#rhel8_application_streams

This required quite a few changes, since the SynapseAI stack
requires Python 3.10 for 1.15.1. Uses a similar workflow to
Anaconda with a custom base image, this however should be easy
to update in the future just by swapping out the FROM in the base
image and verifying the python versions going forward.

Signed-off-by: Sean Pryor <[email protected]>

In loading Torch (and other modules), it prints out the following

Warning: please export TSAN_OPTIONS='ignore_noninstrumented_modules=1' to avoid false positive reports from the OpenMP runtime!

I added this to an ENV section in the image, since it seems worth
squashing these false positive reports. I left it as an extra
commit, though, in case others want to leave that to the user
@Xaenalt
Member Author

Xaenalt commented Jun 7, 2024

If we're good to use Python 3.11 that'll be excellent, and will be supported in the 1.17 release

I was just doing a few small fixes to the notebook PR, since it's a much easier-to-use template for the future

@Xaenalt
Member Author

Xaenalt commented Jun 7, 2024

If possible, I'd like to revisit the timeline until 1.17 (EO July timeframe) at which point we can retire the 1.15/1.16 series which are stuck on Python 3.10. Currently (pre 1.17) the notebook SynapseAI version has to match what's in the operator

@jstourac
Member

jstourac commented Aug 7, 2024

@Xaenalt JFYI, I updated the description of the https://issues.redhat.com/browse/RHOAIENG-5404 which is tracking addition of the Gaudi v1.17 into RHOAI as v1.17 seems to be out now.

In the meantime, there is a plan to introduce Python 3.11 via #659 (just the image definitions, which won't be used in the actual RHOAI/ODH builds yet; that will be a next step).

PROMPT_COMMAND=". ${APP_ROOT}/bin/activate"

# Squash a warning from the Habana runtime
ENV TSAN_OPTIONS='ignore_noninstrumented_modules=1'
Member

are we really compiling something with thread sanitizer enabled? this gives you 3 to 10x more memory consumption and about the same amount of performance degradation; tsan is intended for debug builds only

tar -xzvf /tmp/openshift-client-linux.tar.gz oc && \
rm -f /tmp/openshift-client-linux.tar.gz

# Fix permissions to support pip in Openshift environments
Member

this should be done in the same RUN as the micropipenv install that created these files
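
A rough sketch of what that suggestion could look like, assuming the packages are installed with `micropipenv install` from a `Pipfile.lock` and land under `${APP_ROOT}`. The site-packages path and the Python version in it are assumptions for illustration, not taken from this PR's Dockerfile:

```dockerfile
# Assumed: the lock file sits next to the Dockerfile and the working
# directory is where micropipenv should look for it.
COPY Pipfile.lock ./

# Install the packages and fix their permissions in the same layer.
# A chmod in a later RUN would copy every touched file into a new layer,
# which inflates the image size.
RUN micropipenv install && \
    rm -f ./Pipfile.lock && \
    chmod -R g+w "${APP_ROOT}/lib/python3.10/site-packages"
```

Keeping both steps in one `RUN` means the files are written once with their final permissions, so no second copy of site-packages ends up in the image.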

Contributor

openshift-ci bot commented Aug 29, 2024

@Xaenalt: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/runtime-intel-pyt-ubi9-python-3-9-pr-image-mirror | 06cfbcd | link | true | /test runtime-intel-pyt-ubi9-python-3-9-pr-image-mirror |
| ci/prow/notebook-jupyter-intel-pyt-ubi9-python-3-9-pr-image-mirror | 06cfbcd | link | true | /test notebook-jupyter-intel-pyt-ubi9-python-3-9-pr-image-mirror |
| ci/prow/notebook-jupyter-intel-tf-ubi9-python-3-9-pr-image-mirror | 06cfbcd | link | true | /test notebook-jupyter-intel-tf-ubi9-python-3-9-pr-image-mirror |
| ci/prow/notebooks-e2e-tests | 4009767 | link | true | /test notebooks-e2e-tests |
| ci/prow/images | 4009767 | link | true | /test images |
| ci/prow/habana-notebooks-e2e-tests | 4009767 | link | true | /test habana-notebooks-e2e-tests |
| ci/prow/anaconda-ubi8-e2e-tests | 4009767 | link | true | /test anaconda-ubi8-e2e-tests |
| ci/prow/amd-runtimes-ubi9-e2e-tests | 4009767 | link | true | /test amd-runtimes-ubi9-e2e-tests |
| ci/prow/notebook-rocm-ubi9-python-3-9-pr-image-mirror | 4009767 | link | true | /test notebook-rocm-ubi9-python-3-9-pr-image-mirror |
| ci/prow/runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror | 4009767 | link | true | /test runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror |
| ci/prow/runtime-rocm-tensorflow-ubi9-python-3-9-pr-image-mirror | 4009767 | link | true | /test runtime-rocm-tensorflow-ubi9-python-3-9-pr-image-mirror |
| ci/prow/runtimes-ubi8-e2e-tests | 4009767 | link | true | /test runtimes-ubi8-e2e-tests |
| ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror | 4009767 | link | true | /test notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror |
| ci/prow/notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror | 4009767 | link | true | /test notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror |
| ci/prow/notebook-jupyter-pytorch-ubi9-python-3-11-pr-image-mirror | 4009767 | link | true | /test notebook-jupyter-pytorch-ubi9-python-3-11-pr-image-mirror |
| ci/prow/runtimes-ubi9-e2e-tests | 4009767 | link | true | /test runtimes-ubi9-e2e-tests |
| ci/prow/rocm-runtimes-ubi9-e2e-tests | 4009767 | link | true | /test rocm-runtimes-ubi9-e2e-tests |
| ci/prow/notebooks-ubi9-e2e-tests | 4009767 | link | true | /test notebooks-ubi9-e2e-tests |
| ci/prow/codeserver-notebook-e2e-tests | 4009767 | link | true | /test codeserver-notebook-e2e-tests |
| ci/prow/intel-notebooks-e2e-tests | 4009767 | link | true | /test intel-notebooks-e2e-tests |
| ci/prow/rstudio-notebook-e2e-tests | 4009767 | link | true | /test rstudio-notebook-e2e-tests |
| ci/prow/rocm-notebooks-e2e-tests | 4009767 | link | true | /test rocm-notebooks-e2e-tests |

@daniellutz daniellutz mentioned this pull request Sep 2, 2024
3 tasks
@jstourac
Member

jstourac commented Sep 3, 2024

I think that this work will be superseded by the #695.

@Xaenalt
Member Author

Xaenalt commented Sep 3, 2024

Yep, superseded by #695

@Xaenalt Xaenalt closed this Sep 3, 2024

4 participants