Add vllm chart in ai-stack (#18)
Signed-off-by: Sanket <[email protected]>
sanketsudake committed Sep 19, 2024
1 parent ee45d6a commit 62a897c
Showing 5 changed files with 59 additions and 3 deletions.
7 changes: 5 additions & 2 deletions charts/ai-stack/Chart.lock
@@ -11,5 +11,8 @@ dependencies:
 - name: chromadb
   repository: https://infracloudio.github.io/charts
   version: 0.1.3
-digest: sha256:0febd220a71c6533c04a53affcfbeca2a77261acba6ded41f424cc34c2a056ff
-generated: "2024-08-19T19:46:03.544448+05:30"
+- name: vllm
+  repository: https://infracloudio.github.io/charts
+  version: 0.1.0
+digest: sha256:14b5e60e54b3618e5d950841fee42743eb9d50d2fed44d8d46484c97adbffde6
+generated: "2024-09-19T18:46:10.274968+05:30"
8 changes: 7 additions & 1 deletion charts/ai-stack/Chart.yaml
@@ -16,7 +16,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.3.8
+version: 0.4.0
 
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
@@ -49,6 +49,12 @@ dependencies:
     alias: vectordb
     condition: vectordb.enabled
 
+  - name: vllm
+    version: 0.1.0
+    repository: "https://infracloudio.github.io/charts"
+    alias: vllm
+    condition: vllm.enabled
+
 keywords:
   - ai-stack
   - ai-services
1 change: 1 addition & 0 deletions charts/ai-stack/README.md
@@ -13,6 +13,7 @@ The AI stack consists of the following components:
 - [Text Generation Inference(TGI)](../text-generation-inference/)
 - [Grafana Dashboards](../infracloud-dashboards)
 - [ChromaDB](../chromadb)
+- [vLLM](../vllm)
 
 ## Setup Helm Repository
 
Binary file added charts/ai-stack/charts/vllm-0.1.0.tgz
Binary file not shown.
46 changes: 46 additions & 0 deletions charts/ai-stack/values.yaml
@@ -185,3 +185,49 @@ reranker:
     - name: hf-cache
       persistentVolumeClaim:
         claimName: hf-cache
+
+
+# Values for vllm: the vllm chart
+# Reference: https://artifacthub.io/packages/helm/infracloud-charts/vllm?modal=values
+vllm:
+  enabled: false
+
+  config:
+    model: "meta-llama/Meta-Llama-3.1-8B-Instruct"
+
+  env:
+    - name: HF_API_TOKEN
+      valueFrom:
+        secretKeyRef:
+          name: hf-api-token
+          key: HF_API_TOKEN
+    - name: HF_HUB_OFFLINE
+      value: "1"
+    - name: HF_HUB_CACHE
+      value: "/model"
+
+  resources:
+    limits:
+      nvidia.com/gpu: 1
+    requests:
+      nvidia.com/gpu: 1
+
+  strategy:
+    type: Recreate
+
+  service:
+    type: LoadBalancer
+    port: 8000
+
+  volumeMounts:
+    - name: hf-cache
+      mountPath: /model
+
+  volumes:
+    - name: hf-cache
+      persistentVolumeClaim:
+        claimName: hf-cache
+    - name: shm
+      emptyDir:
+        medium: Memory
+        sizeLimit: "1Gi"
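
Usage note: the new dependency is gated by `condition: vllm.enabled` and the default added here is `enabled: false`, so the vllm subchart is presumably only rendered when a values override turns it on. A minimal override sketch follows; the file name `vllm-values.yaml` is hypothetical, and every other setting is assumed to fall back to the defaults introduced in this commit.

```yaml
# vllm-values.yaml (hypothetical file name, not part of this commit)
# Turns on the optional vllm dependency of the ai-stack chart.
# All other vllm settings are left at the defaults from values.yaml.
vllm:
  enabled: true
```

Such a file could be passed with Helm's `-f` flag, for example `helm upgrade --install <release> <chart> -f vllm-values.yaml`, where `<release>` and `<chart>` are placeholders. The defaults above reference a PersistentVolumeClaim named `hf-cache` and a Secret named `hf-api-token`, so both would likely need to exist in the target namespace beforehand.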
