Aws docker support #663

Closed
wants to merge 118 commits into from
Changes from 99 commits
Commits
19fc2fc
add redis
kaiyingshan Jul 29, 2022
399c6ee
able to run without mpi
kaiyingshan Jul 30, 2022
ffdfcca
remove useless file
kaiyingshan Jul 30, 2022
5b2da21
separate oob logic from ucx/ucc communicators
kaiyingshan Aug 21, 2022
6001c24
re-enable tests
kaiyingshan Aug 21, 2022
720d71e
code placements
kaiyingshan Aug 22, 2022
3a221bf
minor fixes
kaiyingshan Aug 22, 2022
82f4bda
added python script to run cylon ucx/ucc without mpirun
kaiyingshan Aug 29, 2022
ccb323a
mimic gather with allgather
kaiyingshan Sep 28, 2022
9aa2ce2
Fixes missing MPI_Comm in UCXConfig
mstaylor Mar 16, 2023
aed0a44
Adds CYLON_USE_REDIS flag to allow UCC/UCX builds that don't require …
mstaylor Mar 23, 2023
7670557
Changes ucc_operations to reflect code in cylondata/cylon main
mstaylor Mar 23, 2023
cdadc14
Changes adds defaults for CYLON_UCX, CYLON_UCC and CYLON_USE_REDIS
mstaylor Mar 23, 2023
6eb56f9
Changes adds defaults for CYLON_UCX, CYLON_UCC and CYLON_USE_REDIS
mstaylor Mar 23, 2023
da99851
Changes adds defaults for CYLON_UCX, CYLON_UCC and CYLON_USE_REDIS
mstaylor Mar 23, 2023
2e15781
Changes adds defaults for CYLON_UCX, CYLON_UCC and CYLON_USE_REDIS
mstaylor Mar 23, 2023
6de16fd
Adds missing constructor for UCXUCCCommunicator from main
mstaylor Mar 24, 2023
103eccd
Further resolution of differences between main and branch
mstaylor Mar 31, 2023
d337a68
Fix merge issue where CommType Type() was private and results in a co…
mstaylor Mar 31, 2023
b7b0173
Fix issue with UCX build where CreateChannel in UCX Communicator is n…
mstaylor Mar 31, 2023
9f7e38b
Fixes issue with UCX (non-UCC) tests
mstaylor Mar 31, 2023
2ff91a0
Adds support for redis build git workflow
mstaylor Apr 1, 2023
f7b6fee
fix hiredis workflow
mstaylor Apr 1, 2023
3694a2c
Adds support for redis build git workflow
mstaylor Apr 1, 2023
2ad10fa
Adds support for redis build git workflow
mstaylor Apr 1, 2023
4bcbae3
Adds support for redis build git workflow - root install via sudo
mstaylor Apr 3, 2023
0ce6df6
moves OOBType to separate hpp + cython support
mstaylor Apr 5, 2023
a65a7fe
adds oob_context cython + updates to build.py and setup.py in support…
mstaylor Apr 7, 2023
7211adf
adds oob_context cython + updates to build.py and setup.py in support…
mstaylor Apr 7, 2023
1d45b39
adds oob_context cython + updates to build.py and setup.py in support…
mstaylor Apr 7, 2023
fdbe7c9
adds oob_context cython + updates to build.py and setup.py in support…
mstaylor Apr 9, 2023
2c47ef7
adds oob_context cython + updates to build.py and setup.py in support…
mstaylor Apr 10, 2023
3f9ba96
separates redis oob contexts in separate source files to facilitate c…
mstaylor Apr 11, 2023
5022567
refactoring related to non-redis environments + introducing UCXRedisO…
mstaylor Apr 12, 2023
32005ce
refactoring related to non-redis environments + introducing UCXRedisO…
mstaylor Apr 12, 2023
11d3ed1
introduces UCCRedisOOBContext and adds calls to wrapper class
mstaylor Apr 13, 2023
fbbba2e
adds UCC Config
mstaylor Apr 16, 2023
f836668
adds UCC Config (removes redis hard dependency)
mstaylor Apr 17, 2023
e218ac0
adds necessary hooks in lib.pxd, lib.pyx and context for initDistributed
mstaylor Apr 17, 2023
3d92e3c
includes aws cf scripts for redis and minor change for operator examp…
mstaylor May 3, 2023
ccb6166
minor changes to oob contexts to support Cython initialization + upda…
mstaylor May 9, 2023
3e7e123
adds redis example and minor changes related to redis oob context
mstaylor May 9, 2023
7974db9
fixes for running redis example
mstaylor May 15, 2023
ec3b2f9
updates to redis_example to take argument for world size, redis host …
mstaylor May 18, 2023
a9f9573
adds support for ReduceOp
mstaylor May 24, 2023
33d45f4
fixes circular dependency when using CScalar
mstaylor May 26, 2023
83bc34e
UCC/UCX AllReduce partial
mstaylor May 28, 2023
d9f3abf
UCC/UCX AllReduce partial
mstaylor May 31, 2023
3a5fe29
UCC/UCX AllReduce partial
mstaylor Jun 1, 2023
91778d6
UCC/UCX AllReduce partial
mstaylor Jun 5, 2023
c46e029
UCC/UCX AllReduce partial - adds support for MPICommunicator
mstaylor Jun 14, 2023
d36d093
UCC/UCX AllReduce partial - adds support for UCXCommunicator
mstaylor Jun 14, 2023
06137c9
UCC/UCX AllReduce partial - adds boto3 push to s3 for summary and sto…
mstaylor Jun 18, 2023
c71eed9
UCC/UCX AllReduce partial - adds boto3 push to s3 for summary and sto…
mstaylor Jun 19, 2023
e005afc
partial: adds ucc-ucx dockerfile + minor cmake changes for docker bui…
mstaylor Jun 26, 2023
8c51ba5
cylon git commands
mstaylor Jun 27, 2023
5e73dd7
partial: adds ucc-ucx dockerfile + minor cmake changes for docker bui…
mstaylor Jun 27, 2023
5901882
partial: rolls back cmake changes
mstaylor Jun 27, 2023
07c5b5f
partial: rolls back cmake changes
mstaylor Jun 27, 2023
4b827f5
cylon git commands
mstaylor Jun 27, 2023
7d55c35
cylon git commands
mstaylor Jun 27, 2023
e774f21
test fix for panda/numpy bound (to fix github action build failures)
mstaylor Jun 27, 2023
a6b9c03
adds support for running cylon ucc/ucx/redis in a docker container
mstaylor Jun 27, 2023
1ad31af
adds aws dockerfile + a script that will pull a script (python) for r…
mstaylor Jun 29, 2023
a066a71
minor fix - addresses minor bug in output file creation
mstaylor Jul 2, 2023
3b1653f
docker file environment variable
mstaylor Jul 2, 2023
5872d8c
updates dockerfile and S3 run script to pull args from environment
mstaylor Jul 3, 2023
8ead38d
updates dockerfile and S3 run script to pull args from environment
mstaylor Jul 3, 2023
fb0b331
adds conda run for aws entry point
mstaylor Jul 5, 2023
9c0d17e
adds conda run for aws entry point
mstaylor Jul 5, 2023
293716d
adds conda run for aws entry point
mstaylor Jul 5, 2023
062d512
adds conda run for aws entry point
mstaylor Jul 5, 2023
a0e75f4
adds conda run for aws entry point
mstaylor Jul 5, 2023
fa8402f
adds conda run for aws entry point
mstaylor Jul 5, 2023
33e0158
adds conda run for aws entry point
mstaylor Jul 5, 2023
efc7f8f
adds conda run for aws entry point
mstaylor Jul 5, 2023
dfedb9f
adds conda run for aws entry point
mstaylor Jul 5, 2023
acda5e8
adds conda run for aws entry point
mstaylor Jul 5, 2023
3bc10b9
adds conda run for aws entry point
mstaylor Jul 5, 2023
4879603
adds conda run for aws entry point
mstaylor Jul 5, 2023
7671572
adds conda run for aws entry point
mstaylor Jul 5, 2023
123b86b
partial - first docker execution on aws
mstaylor Jul 6, 2023
4e7255d
partial - first docker execution on aws
mstaylor Jul 6, 2023
f801c9f
partial - first docker execution on aws
mstaylor Jul 6, 2023
998b067
partial - first docker execution on aws
mstaylor Jul 6, 2023
a5ca04b
partial - first docker execution on aws
mstaylor Jul 6, 2023
d6f86c5
partial - first docker execution on aws
mstaylor Jul 6, 2023
12740fa
partial - first docker execution on aws
mstaylor Jul 6, 2023
c4c5bac
partial - first docker execution on aws
mstaylor Jul 6, 2023
6d6740d
partial - first docker execution on aws
mstaylor Jul 6, 2023
6013d8b
partial - first docker execution on aws
mstaylor Jul 6, 2023
14c29c7
partial - first docker execution on aws
mstaylor Jul 6, 2023
736676b
partial - first docker execution on aws
mstaylor Jul 6, 2023
007dbc4
partial - first docker execution on aws
mstaylor Jul 6, 2023
6d012f2
partial - first docker execution on aws
mstaylor Jul 6, 2023
a0b6bba
initial changes for ucx port and address mapping in support for conta…
mstaylor Jul 18, 2023
f8208d3
Merge remote-tracking branch 'origin/aws-docker-support' into aws-doc…
mstaylor Jul 18, 2023
4bf27d5
initial changes for ucx port and address mapping in support for conta…
mstaylor Jul 18, 2023
1e505ce
initial changes for ucx port and address mapping in support for conta…
mstaylor Jul 24, 2023
cf699ab
reverts ucx/ucc port changes
mstaylor Aug 2, 2023
270d366
updates aws docker file instructions includes necessary configuration…
mstaylor Aug 2, 2023
327c177
updates aws docker file instructions includes necessary configuration…
mstaylor Aug 2, 2023
ab43bc8
updates aws docker file instructions includes necessary configuration…
mstaylor Aug 2, 2023
3064b04
updates aws docker file instructions includes necessary configuration…
mstaylor Aug 2, 2023
3be33cb
flushes redis db on Finalize() (prevents the need for db cleanup afte…
mstaylor Aug 4, 2023
8fc86ab
Merge remote-tracking branch 'origin/aws-docker-support' into aws-doc…
mstaylor Aug 4, 2023
f5ca0af
updates scaling script to include sort and slice from rivanna tests +…
mstaylor Aug 7, 2023
ba8b5bb
minor fix on cylon_scaling + commits finalize changes in support for …
mstaylor Aug 7, 2023
7af4f55
adds "None" check for script args
mstaylor Aug 8, 2023
16477c8
fixes minor issue in cylon_scaling script
mstaylor Aug 9, 2023
bbc7363
moves UCCOOBCtx finalization to the beginning of the finalization fun…
mstaylor Aug 10, 2023
2b3f77e
Merge remote-tracking branch 'origin/aws-docker-support' into aws-doc…
mstaylor Aug 10, 2023
09235e6
adds clearDB function and exposes via cython
mstaylor Aug 11, 2023
4f32c0e
adds Status return to remove db
mstaylor Aug 12, 2023
7ec0db6
removes db flush and add python script
mstaylor Aug 12, 2023
03b8017
fixes minor issue in cylon_scaling script
mstaylor Aug 15, 2023
cfcf891
adds cylon aws scaling results + minor changes to cloudformation scri…
mstaylor Aug 19, 2023
e8977fd
adds support for creating N redis clusters via cloudformation stack, …
mstaylor Aug 25, 2023
77 changes: 77 additions & 0 deletions .github/workflows/conda-cpp-redis.yml
@@ -0,0 +1,77 @@
name: Conda C++/Python/Redis - gcc,OpenMPI,Redis,UCX/UCC

on:
  push:
    branches:
      - main
      - 0.**
  pull_request:
    branches:
      - main
      - 0.**

jobs:
  build:
    runs-on: ${{ matrix.os }}
    defaults:
      run:
        shell: bash -l {0}
    strategy:
      fail-fast: false
      # explicit include-based build matrix, of known valid options
      matrix:
        include:
          # 20.04 supports CUDA 11.0+
          - os: ubuntu-20.04
            gcc: 9
            ucc: "master"

    steps:
      - uses: actions/checkout@v2

      # Specify the correct host compilers
      - name: Install/Select gcc and g++
        run: |
          sudo apt-get install -y gcc-${{ matrix.gcc }} g++-${{ matrix.gcc }} git
          echo "CC=/usr/bin/gcc-${{ matrix.gcc }}" >> $GITHUB_ENV
          echo "CXX=/usr/bin/g++-${{ matrix.gcc }}" >> $GITHUB_ENV

      - uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: cylon_dev
          environment-file: conda/environments/cylon.yml

      - name: Activate conda
        run: conda activate cylon_dev

      - name: Install UCC
        run: |
          git clone --single-branch -b ${{ matrix.ucc }} https://github.com/openucx/ucc.git $HOME/ucc
          cd $HOME/ucc
          echo "conda ucx: $(conda list | grep ucx)"
          ./autogen.sh
          ./configure --prefix=$HOME/ucc/install --with-ucx=$CONDA/envs/cylon_dev
          make install

      - name: Install Redis
        run: |
          git clone https://github.com/redis/hiredis.git $HOME/hiredis
          cd $HOME/hiredis
          make
          sudo make install
          git clone https://github.com/sewenew/redis-plus-plus.git $HOME/redis-plus-plus
          cd $HOME/redis-plus-plus
          mkdir build
          cd build
          cmake -DREDIS_PLUS_PLUS_CXX_STANDARD=11 ..
          make
          sudo make install

      - name: Build cylon, pycylon and run cpp test
        run: python build.py -cmake-flags="-DCYLON_UCX=1 -DCYLON_UCC=1 -DUCC_INSTALL_PREFIX=$HOME/ucc/install -DCYLON_USE_REDIS=1" -ipath="$HOME/cylon/install" --cpp --python --test

      - name: Run pytest
        run: python build.py -ipath="$HOME/cylon/install" --pytest

      - name: Build Java
        run: python build.py -ipath="$HOME/cylon/install" --java
5 changes: 5 additions & 0 deletions aws/README.md
@@ -0,0 +1,5 @@
# Running Cylon on AWS ECS

Mills Wellons Staylor, III


64 changes: 64 additions & 0 deletions aws/scripts/S3_run_script.py
@@ -0,0 +1,64 @@
import argparse
import logging
import os
import subprocess

import boto3
from botocore.exceptions import ClientError


def environ_or_required(key):
    # Use the environment variable as the argparse default when it is set;
    # otherwise make the command-line argument required.
    return (
        {'default': os.environ.get(key)} if os.environ.get(key)
        else {'required': True}
    )


def get_file(file_name, bucket, object_name=None):
    """Download a file from an S3 bucket

    :param file_name: Local path to write the downloaded object to
    :param bucket: Bucket to download from
    :param object_name: S3 object name. If not specified then file_name is used
    :return: file_name if the file was downloaded, else None
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = os.path.basename(file_name)

    # download the file
    s3_client = boto3.client('s3')
    try:
        with open(file_name, 'wb') as f:
            s3_client.download_fileobj(bucket, object_name, f)
        # Return the path rather than the (now closed) file handle.
        return file_name
    except ClientError as e:
        logging.error(e)
        return None


def join(data=None):
    script = get_file(file_name=data['output_filename'], bucket=data['s3_bucket'], object_name=data['s3_object_name'])

    if script is None:
        print(f"unable to retrieve file {data['output_filename']} from AWS S3")
        # Nothing was downloaded, so there is no script to run.
        return

    cmd = data['args'].split()
    subprocess.call(['python'] + [data['output_filename']] + cmd, shell=False)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="run S3 script")

    parser.add_argument('-b', dest='s3_bucket', type=str, help="S3 Bucket Name", **environ_or_required('S3_BUCKET'))
    parser.add_argument('-o', dest='s3_object_name', type=str, help="S3 Object Name", **environ_or_required('S3_OBJECT_NAME'))
    parser.add_argument('-f', dest='output_filename', type=str, help="Output filename",
                        **environ_or_required('OUTPUT_FILENAME'))
    parser.add_argument('-a', dest='args', type=str, help="script exec arguments",
                        **environ_or_required('EXEC_ARGS'))

    args = vars(parser.parse_args())
    join(args)
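The `environ_or_required` helper in the script above lets each argument come from either the container environment or the command line. A minimal standalone sketch of that pattern follows; `build_parser` and the bucket names are placeholders introduced only for illustration and are not part of the PR.

```python
import argparse
import os


def environ_or_required(key):
    """Return argparse kwargs: the env value as default if set, else required=True."""
    value = os.environ.get(key)
    return {'default': value} if value else {'required': True}


def build_parser():
    # Hypothetical helper: builds a parser with a single S3 bucket argument.
    parser = argparse.ArgumentParser(description="run S3 script")
    parser.add_argument('-b', dest='s3_bucket', type=str,
                        **environ_or_required('S3_BUCKET'))
    return parser


# With S3_BUCKET set, -b may be omitted ("example-bucket" is a placeholder).
os.environ['S3_BUCKET'] = 'example-bucket'
args = build_parser().parse_args([])
print(args.s3_bucket)

# An explicit flag still overrides the environment value.
args = build_parser().parse_args(['-b', 'other-bucket'])
print(args.s3_bucket)
```

Because the kwargs are computed when `add_argument` is called, the environment has to be populated before the parser is built, which is the case when the variables are set on the container.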
71 changes: 71 additions & 0 deletions aws/scripts/cloudformation/cylon-elasticache.yaml
@@ -0,0 +1,71 @@
AWSTemplateFormatVersion: 2010-09-09

Parameters:
  AvailabilityZone1:
    Type: String

  AvailabilityZone2:
    Type: String

  CacheEngine:
    Type: String

  CacheEngineVersion:
    Type: String

  CacheNodeType:
    Type: String

  CacheParameterGroupName:
    Type: String

  CacheSecurityGroup:
    Type: String

  CacheSubnet1:
    Type: String

  CacheSubnet2:
    Type: String

  Prefix:
    Type: String

  RedisPort:
    Type: Number

  ReplicaCount:
    Type: Number


Resources:
  SubnetGroup:
    Type: AWS::ElastiCache::SubnetGroup
    Properties:
      CacheSubnetGroupName: !Sub "${Prefix}-subnetgroup"
      Description: !Sub "${Prefix}-SubnetGroup"
      SubnetIds:
        - !Ref CacheSubnet1
        - !Ref CacheSubnet2
      Tags:
        - Key: "name"
          Value: !Sub "${Prefix}-Redis SubnetGroup"

  CacheCluster:
    Type: AWS::ElastiCache::CacheCluster
    Properties:
      ClusterName: !Sub "${Prefix}-Redis"
      CacheNodeType: !Ref CacheNodeType
      CacheSubnetGroupName: !Ref SubnetGroup
      Engine: !Ref CacheEngine
      # hardcoded; the CacheEngineVersion parameter above is not referenced
      EngineVersion: 7.0
      NumCacheNodes: 1 #has to be 1 for redis
      VpcSecurityGroupIds:
        - !Ref CacheSecurityGroup
      Tags:
        - Key: "name"
          Value: !Sub "${Prefix}-Redis Cluster"
    DependsOn: SubnetGroup
90 changes: 90 additions & 0 deletions aws/scripts/cloudformation/cylon-redis.yaml
@@ -0,0 +1,90 @@
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  TemplateBucketName:
    Type: String
    Default: staylor.dev2

  Prefix:
    Type: String
    Default: cylon

  Architecture:
    Type: String
    Default: arm64

  AvailabilityZone1:
    Type: String
    Default: us-east-1c

  AvailabilityZone2:
    Type: String
    Default: us-east-1d

  CacheEngine:
    Type: String
    Default: redis

  CacheEngineVersion:
    Type: String
    Default: 6.2

  CacheNodeType:
    Type: String
    Default: cache.t4g.micro

  CacheParameterGroupName:
    Type: String
    Default: default.redis7.cluster.on

  CacheSecurityGroupName:
    Type: String
    Default: sg-0da3e3dcebe706315

  CacheSubnet1:
    Type: String
    Default: subnet-07995eea6c462cd73

  CacheSubnet2:
    Type: String
    Default: subnet-039df5ab7fd94f516

  ImageId:
    Type: String
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-arm64-gp2

  InstanceType:
    Type: String
    Default: t4g.nano

  ReplicaCount:
    Type: Number
    Default: 1

  Runtime:
    Type: String
    Default: python3.8

  RedisPort:
    Type: Number
    Default: 6379


Resources:
  ElastiCacheStack:
    Type: AWS::CloudFormation::Stack
    DeletionPolicy: Delete
    Properties:
      TemplateURL: !Sub "https://s3.amazonaws.com/${TemplateBucketName}/${Prefix}/${Prefix}-elasticache.yaml"
      Parameters:
        AvailabilityZone1: !Ref AvailabilityZone1
        AvailabilityZone2: !Ref AvailabilityZone2
        CacheEngine: !Ref CacheEngine
        CacheEngineVersion: !Ref CacheEngineVersion
        CacheNodeType: !Ref CacheNodeType
        CacheParameterGroupName: !Ref CacheParameterGroupName
        CacheSecurityGroup: !Ref CacheSecurityGroupName
        CacheSubnet1: !Ref CacheSubnet1
        CacheSubnet2: !Ref CacheSubnet2
        Prefix: !Ref Prefix
        RedisPort: !Ref RedisPort
        ReplicaCount: !Ref ReplicaCount
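The parent template above only declares defaults; any of them can be overridden when the stack is created. As a rough sketch of how that might look from Python, the helper below converts a plain dict into the `ParameterKey`/`ParameterValue` list that CloudFormation's CreateStack call expects. The function names `to_cfn_parameters` and `create_redis_stack` are hypothetical and not part of this PR, and the launch function assumes boto3 is installed and AWS credentials are configured, so it is defined but not called here.

```python
def to_cfn_parameters(overrides):
    """Convert a {name: value} dict into CloudFormation's Parameters list shape."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(overrides.items())
    ]


def create_redis_stack(stack_name, template_url, overrides):
    """Launch the parent stack (hypothetical helper; needs boto3 and credentials)."""
    import boto3  # imported lazily so to_cfn_parameters works without boto3

    client = boto3.client("cloudformation")
    return client.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=to_cfn_parameters(overrides),
    )


# Override two of the template's parameters; values are stringified for the API.
params = to_cfn_parameters({"Prefix": "cylon", "RedisPort": 6379})
print(params)
```

Parameters left out of the dict fall back to the `Default` values declared in the template.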