Spec Scheduler #1487

Open

wants to merge 599 commits into base: inference

Commits (599 commits)
38f7920
Change ssh to https
Flechman Jun 2, 2024
a9a6126
Merge branch 'specscheduler' into specschedul_profiling
Flechman Jun 2, 2024
0753a58
Merge pull request #1393 from flexflow/specschedul_profiling
Flechman Jun 2, 2024
6881f40
feat: add submodule deps/flashinfer
chenzhuofu Jun 3, 2024
0755b3c
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into HEAD
chenzhuofu Jun 3, 2024
e3eefeb
feat: add flashinfer into build process
chenzhuofu Jun 3, 2024
919a464
chore: backup tree_inc_multihead_self_attention
chenzhuofu Jun 3, 2024
a54b42f
chore: remove unused
chenzhuofu Jun 3, 2024
6370a0d
chore: change file output to check alignment
chenzhuofu Jun 3, 2024
516f290
chore: minor
chenzhuofu Jun 3, 2024
5e39ca1
chore: minor
chenzhuofu Jun 3, 2024
0d85f58
feat: change commit tokens behavior
chenzhuofu Jun 4, 2024
146fa2e
feat: eliminate usage of devQKVProjArray in commit_tokens_kernel
chenzhuofu Jun 4, 2024
94f7abf
chore: minor
chenzhuofu Jun 4, 2024
c6f4b0a
chore: minor
chenzhuofu Jun 4, 2024
e63c042
feat: compact q_vec into continuous one
chenzhuofu Jun 4, 2024
ed4fbc0
chore: add TODOs
chenzhuofu Jun 4, 2024
cc51599
chore: add CustomMask to BatchConfig
chenzhuofu Jun 4, 2024
433ae1c
feat: add custom_mask in tree_verify_attn op (GPU mem)
chenzhuofu Jun 4, 2024
ff7aa52
feat: update custom_mask on gpu side
chenzhuofu Jun 4, 2024
55c7480
feat: add scratch_space
chenzhuofu Jun 4, 2024
b30dcce
chore: split the attention kernel
chenzhuofu Jun 4, 2024
638df76
feat: add tree_verify_attention based on flashinfer
chenzhuofu Jun 4, 2024
4da0839
Support sampling and speculative sampling.
zikun-li Jun 5, 2024
51a83ab
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
zikun-li Jun 5, 2024
9cf91b9
feat: add flashinfer attention kernel instantiate
chenzhuofu Jun 5, 2024
26fea0c
feat: use kernel dispatch
chenzhuofu Jun 5, 2024
0137964
feat: orig attention kernel use custom_mask
chenzhuofu Jun 5, 2024
27a830a
fix: minor
chenzhuofu Jun 5, 2024
44ce741
chore: original kernel eliminate CausalMask dependency
chenzhuofu Jun 5, 2024
f0b8636
fix: MaskMode::kCustom
chenzhuofu Jun 6, 2024
10f7dea
chore: add debug output
chenzhuofu Jun 6, 2024
fa61220
fix: neg_inf for half
chenzhuofu Jun 6, 2024
edfa073
fix: flashinfer support non-prompt phase
chenzhuofu Jun 6, 2024
b1e4d49
fix: align w/ original attention kernel
chenzhuofu Jun 6, 2024
da54433
chore: remove unused
chenzhuofu Jun 6, 2024
6f1d7c9
chore: finetune max ssm steps
chenzhuofu Jun 6, 2024
792a4b7
Fixed bugs in the sampling kernel. Fixed bugs in request manager prof…
zikun-li Jun 7, 2024
b0caa8f
feat: add queryTmp into attentionMeta
chenzhuofu Jun 7, 2024
cbba29b
feat: tree attn supports parallel qkv cache updates
chenzhuofu Jun 7, 2024
7721caa
Tried to support gumbel sampling. Left some todos.
zikun-li Jun 7, 2024
84d1ba8
feat: support full precision attention
chenzhuofu Jun 7, 2024
01c4c0c
feat: key and value cache allocate together
chenzhuofu Jun 8, 2024
a83b0ba
feat: add data structure for flashinfer batch inference
chenzhuofu Jun 9, 2024
8fb8bfb
feat: add batch template instance
chenzhuofu Jun 9, 2024
c891fd1
feat: add some aux function for batch inference
chenzhuofu Jun 9, 2024
255ad0c
feat: updating function for batch inference
chenzhuofu Jun 9, 2024
17f61a3
feat: add batch inference
chenzhuofu Jun 9, 2024
1e3febd
chore: minor
chenzhuofu Jun 9, 2024
cade300
feat: split orig_attn
chenzhuofu Jun 9, 2024
116aada
chore: minor
chenzhuofu Jun 10, 2024
96855dc
Fixed request manager.
zikun-li Jun 10, 2024
f4fa3b6
Merge branch 'specscheduler-new-attention' of github.com:flexflow/Fle…
zikun-li Jun 10, 2024
2438b7a
feat: update commit_token_kernel
chenzhuofu Jun 10, 2024
fb5fe7a
Merge branch 'specscheduler-new-attention' of github.com:flexflow/Fle…
chenzhuofu Jun 10, 2024
60b7923
fix: minor
chenzhuofu Jun 10, 2024
389ee7a
fix: get wrong batchconfig from future
chenzhuofu Jun 10, 2024
3c90892
Fixed request manager bug.
zikun-li Jun 10, 2024
3fb17c4
chore: minor add some fields
chenzhuofu Jun 11, 2024
4a537be
fix: kvlayout
chenzhuofu Jun 11, 2024
54266ee
feat: switch to appropriate page_size (64)
chenzhuofu Jun 11, 2024
184460e
feat: add qk_indtr for flashinfer forward
chenzhuofu Jun 11, 2024
578545d
fix: corner-case bug
chenzhuofu Jun 11, 2024
40e7c30
feat: move batch_prefill_handler into meta for performance
chenzhuofu Jun 11, 2024
3c1b039
feat: remove unused legacy code
chenzhuofu Jun 12, 2024
e77c9fd
chore: minor output
chenzhuofu Jun 13, 2024
ed4b4ee
Modified the update_custom_mask kernel.
zikun-li Jun 13, 2024
11c3048
Merge branch 'specscheduler-new-attention' of github.com:flexflow/Fle…
chenzhuofu Jun 13, 2024
fb360c0
Modified the commit_tokens kernel
zikun-li Jun 13, 2024
81d3052
Merge branch 'specscheduler-new-attention' of github.com:flexflow/Fle…
zikun-li Jun 13, 2024
78b8dac
fix: minor typo
chenzhuofu Jun 13, 2024
9e84bd7
fix: minor
chenzhuofu Jun 13, 2024
e099405
feat: improve attention handler beginforward
chenzhuofu Jun 14, 2024
c2ab4ca
chore: minor
chenzhuofu Jun 14, 2024
6f9cb95
Added some profiling output
zikun-li Jun 16, 2024
e98ae57
Merge branch 'specscheduler-new-attention' of github.com:flexflow/Fle…
zikun-li Jun 16, 2024
0002b25
Fix 0 logit caused by half precision.
zikun-li Jun 22, 2024
93c3583
Commented out the profiling codes, but keep them there.
zikun-li Jun 22, 2024
8f134fb
Added more parameters to parse_args
zikun-li Jun 23, 2024
aeee29a
Merged specscheduler-new-attention
zikun-li Jun 23, 2024
a7f649e
Fix
zikun-li Jun 23, 2024
02c1a82
Fix ssm_decoding_step timeframe
Flechman Jun 26, 2024
8dbe98d
Specscheduler new attention (#1434)
chenzhuofu Aug 14, 2024
2b02648
Custom AllReduce (#1467)
chenzhuofu Aug 15, 2024
b0e6da2
feat: support llama-2 architecture
chenzhuofu Aug 17, 2024
fdd6a61
chore: minor rename
chenzhuofu Aug 20, 2024
90c0d40
chore: comment out debug output
chenzhuofu Aug 20, 2024
f5a2b1a
fix: temporarily support GQA (which is downgraded to MHA)
chenzhuofu Aug 20, 2024
16db39c
chore: minor rename
chenzhuofu Aug 20, 2024
f78dfd7
chore: minor update
chenzhuofu Aug 20, 2024
a6751d9
chore: minor rename
chenzhuofu Aug 20, 2024
24f4f39
feat: incr_decode switch to flashinfer-based implementation
chenzhuofu Aug 20, 2024
a14b9b2
feat: clean up incr_attention, move global code into separate file
chenzhuofu Aug 20, 2024
ae96135
fix: template instantiate
chenzhuofu Aug 20, 2024
98641fb
chore: more clean up
chenzhuofu Aug 21, 2024
8e9e35b
chore: minor
chenzhuofu Aug 21, 2024
f03479a
chore: q/k/vSize reduce to hidden_size
chenzhuofu Aug 21, 2024
a363b79
chore: reduce projSize into head_dim
chenzhuofu Aug 21, 2024
6eba778
chore: minor
chenzhuofu Aug 21, 2024
77ac4fd
feat: support GQA after compute_qkv, but got runtime error
chenzhuofu Aug 21, 2024
880309e
feat: update flashinfer version
chenzhuofu Aug 21, 2024
e75137b
chore: minor
chenzhuofu Aug 21, 2024
e34bde9
fix: reserve enough space for batch_handler
chenzhuofu Aug 22, 2024
739a333
feat: significantly reduce memory consumption of batch_handler
chenzhuofu Aug 22, 2024
3902b74
chore: minor
chenzhuofu Aug 22, 2024
ce40f7e
style: format code
chenzhuofu Aug 22, 2024
60c1dbe
chore: minor
chenzhuofu Aug 22, 2024
d29d155
chore: separate attention meta into another header file
chenzhuofu Aug 23, 2024
7367079
Merge pull request #1470 from flexflow/gqa-support
chenzhuofu Aug 23, 2024
bc4d9f7
feat: avoid patch query
chenzhuofu Aug 24, 2024
e41f374
chore: separate apply_pos_encoding from compute_qkv
chenzhuofu Aug 25, 2024
5783cf1
chore: remove unused ptr
chenzhuofu Aug 27, 2024
ea580f7
fix: memory pointer alignment
chenzhuofu Aug 27, 2024
be93e5c
chore: minor simplification
chenzhuofu Aug 27, 2024
b6bcd4e
feat: StreamingCacheInfo
chenzhuofu Aug 27, 2024
f2634a9
feat: add streamingCache-related meta params
chenzhuofu Aug 28, 2024
828b1b8
chore: more accurate definition
chenzhuofu Aug 28, 2024
7e71229
chore: minor
chenzhuofu Aug 28, 2024
a2041ea
feat: add streamingCacheInfo
chenzhuofu Aug 28, 2024
694cedf
feat: apply_pos_encoding & update_qkv_cache, add offset control
chenzhuofu Aug 28, 2024
810721e
chore: minor rename
chenzhuofu Aug 28, 2024
ced5e34
Modified the scheduling algorithm.
zikun-li Aug 29, 2024
70a4c2d
feat: kernel implementation for streaming cache usage
chenzhuofu Aug 30, 2024
686bdae
Removed an unused variable.
zikun-li Aug 30, 2024
7e10e1d
Removed unused variable and added tree pruning.
zikun-li Aug 30, 2024
8face9c
feat: implement position encoding for streaming cache
chenzhuofu Aug 31, 2024
e94598f
fix: params should add (de)serialization method
chenzhuofu Aug 31, 2024
f0d56ec
chore: reduce kv cache size
chenzhuofu Aug 31, 2024
419e0f8
chore: minor
chenzhuofu Sep 1, 2024
fb31261
fix: output misalignment
chenzhuofu Sep 1, 2024
fdf7b86
chore: minor
chenzhuofu Sep 1, 2024
77aa1af
fix: speculative decoding update_custom_mask only consider mask withi…
chenzhuofu Sep 1, 2024
425d770
fix: barrier_flag initial value
chenzhuofu Sep 2, 2024
049dfcb
fix: barrier_flag initial value
chenzhuofu Sep 2, 2024
263d9d6
doc: attention meta info
chenzhuofu Sep 2, 2024
689dbd6
docs: minor
chenzhuofu Sep 2, 2024
fe5a8ad
Added indexing support for streaming cache.
chenzhuofu Sep 3, 2024
f095ab7
Merge streamingllm
chenzhuofu Sep 3, 2024
4a32e47
Fix bugs.
chenzhuofu Sep 3, 2024
3a1cf30
Merge branch 'streamingllm' of github.com:flexflow/FlexFlow into stre…
chenzhuofu Sep 3, 2024
fc4c1cd
docs: minor
chenzhuofu Sep 3, 2024
e55cc6e
chore: minor rename
chenzhuofu Sep 3, 2024
8f056af
feat: add streaming-llm logic to attention
chenzhuofu Sep 3, 2024
813e43f
fix: typo
chenzhuofu Sep 3, 2024
e1477d4
fix: minor bugs in streaming llm
chenzhuofu Sep 3, 2024
2f9ef18
fix: minor runtime bug
chenzhuofu Sep 4, 2024
30867e0
Added statics.
chenzhuofu Sep 4, 2024
7f7daeb
Fix output.
zikun-li Sep 4, 2024
417e70a
Fix a bug.
chenzhuofu Sep 4, 2024
f184321
Merge with streamingllm
chenzhuofu Sep 4, 2024
b5eeb26
chore: minor output
chenzhuofu Sep 4, 2024
13850bb
fix: minor offset transition bug
chenzhuofu Sep 5, 2024
30d17a2
chore: minor
chenzhuofu Sep 5, 2024
07d57b8
Fix bug in counting mean acc rate.
chenzhuofu Sep 5, 2024
61177ee
style: format code
chenzhuofu Sep 5, 2024
e42f596
Merge pull request #1489 from flexflow/streamingllm
chenzhuofu Sep 5, 2024
10a1824
Merge branch 'specscheduler' into specscheduler-new-scheduler
chenzhuofu Sep 5, 2024
8100123
Removed unused outputs.
chenzhuofu Sep 7, 2024
3b87329
Removed unused output.
chenzhuofu Sep 7, 2024
b252b31
Merge pull request #1492 from flexflow/specscheduler-new-scheduler
zikun-li Sep 7, 2024
f4e46d2
Fix bug.
zikun-li Sep 7, 2024
bacc515
fix: indeterminate output of customAllReduce
chenzhuofu Sep 7, 2024
101c420
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Sep 7, 2024
3a35387
fix: request expected latency
chenzhuofu Sep 8, 2024
9b2245b
feat: add GenerationRequest
chenzhuofu Sep 9, 2024
2112b48
feat: add EmissionMachine to simulate requests arrival
chenzhuofu Sep 9, 2024
86e31c3
chore: minor
chenzhuofu Sep 9, 2024
0997fad
chore: minor
chenzhuofu Sep 9, 2024
ae0b8e3
feat: update load_pending_requests logic
chenzhuofu Sep 9, 2024
132f68f
fix: deadlock in request manager; client waits until server init
chenzhuofu Sep 10, 2024
c57b3ee
feat: client support prompt input with slo_ratio
chenzhuofu Sep 10, 2024
2040cf7
feat: add a prompt processing script
chenzhuofu Sep 10, 2024
03ba37e
style: minor format
chenzhuofu Sep 10, 2024
36fb00e
feat: add slo attainment metric
chenzhuofu Sep 10, 2024
fd6f610
chore: minor
chenzhuofu Sep 10, 2024
6f89252
feat: separate max_tokens_per_batch for SSM and LLM
chenzhuofu Sep 10, 2024
d67d577
chore: remove redundant max_spec_tree_tokens
chenzhuofu Sep 11, 2024
1b5c66e
chore: minor
chenzhuofu Sep 11, 2024
d19cd75
style: format
chenzhuofu Sep 11, 2024
6c20f18
Merge pull request #1494 from flexflow/specscheduler-request-emission
chenzhuofu Sep 12, 2024
6e37125
chore: minor output
chenzhuofu Sep 14, 2024
3c4e50e
Fix bugs in the scheduler.
zikun-li Sep 14, 2024
62ac7ed
feat: add max_tokens_per_prefilling_batch
chenzhuofu Sep 14, 2024
da91d84
feat: support batched prefilling
chenzhuofu Sep 14, 2024
d013079
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Sep 15, 2024
1637ed4
style: format
chenzhuofu Sep 15, 2024
bcb028c
Add a switch for early termination based on slo attainment.
zikun-li Sep 15, 2024
020a210
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
zikun-li Sep 15, 2024
06d332c
fix: memory misalignment
chenzhuofu Sep 15, 2024
cf7b7b9
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Sep 15, 2024
5ddeb11
chore: minor
chenzhuofu Sep 16, 2024
fd6eb7b
Reimplemented add_tokens_to_spec_token_tree.
chenzhuofu Sep 16, 2024
4b4d55c
merge
chenzhuofu Sep 16, 2024
5623fc5
chore: refactor lock
chenzhuofu Sep 16, 2024
f524aac
fix: request per batch
chenzhuofu Sep 17, 2024
d42c6ce
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Sep 17, 2024
0b7a02f
Optimizes CPU performance of the scheduler
chenzhuofu Sep 18, 2024
fa13afa
chore: incr decode add slo attainment
chenzhuofu Sep 18, 2024
86f95dc
Optimized some usage of priority queues.
chenzhuofu Sep 18, 2024
f169812
feat: support slo ratio sampling
chenzhuofu Sep 18, 2024
e1f711b
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Sep 18, 2024
7ae7edd
fix: incr_decode doesn't have slo attainment metric
chenzhuofu Sep 19, 2024
ff3af26
feat: support early_drop switch
chenzhuofu Sep 19, 2024
1a1dc56
chore: add request_per_second param
chenzhuofu Sep 19, 2024
9f034a4
chore: change early drop logic
chenzhuofu Sep 19, 2024
fe55382
feat: add emission output
chenzhuofu Sep 20, 2024
0420199
Dynamically control tree width to not exceed max_tokens_per_ssm_batch.
chenzhuofu Sep 21, 2024
7c7376a
Simplified the method to add tokens to the token trees.
chenzhuofu Sep 22, 2024
4396fc9
Dynamic max tree depth control
chenzhuofu Sep 24, 2024
eee85fe
feat: update raft dependency (select_k)
chenzhuofu Sep 24, 2024
7caaf72
feat: raft build file
chenzhuofu Sep 24, 2024
2ab10b1
chore: minor
chenzhuofu Sep 24, 2024
57f6378
feat: update argTopk op
chenzhuofu Sep 24, 2024
47be784
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Sep 24, 2024
9fa1f4e
chore: update emission trace
chenzhuofu Sep 24, 2024
0a516c6
feat: add TraceEmissionMachine
chenzhuofu Sep 26, 2024
2071273
Add back old scheduler
chenzhuofu Sep 28, 2024
79f9130
feat: add trace generator
chenzhuofu Oct 1, 2024
f224b5e
fix: initialization issue; read microsecond
chenzhuofu Oct 1, 2024
1fe612b
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Oct 1, 2024
18a70ff
Update nccl (#1507)
goliaro Sep 21, 2024
ebd45d3
speedup docker builds
goliaro Sep 22, 2024
347d9ad
update
goliaro Sep 22, 2024
62925bb
fix: emission time
chenzhuofu Oct 2, 2024
2e5db3c
feat: trace generator add scaling_factor
chenzhuofu Oct 2, 2024
a17ec6e
feat: add old_scheduler option
chenzhuofu Oct 3, 2024
efead4f
feat: cherry-pick https://github.com/flexflow/FlexFlow/commit/9784b5c…
jiazhihao Aug 12, 2024
285696e
update legion version
goliaro Aug 28, 2024
de55a2e
Fix nccl-induced segfault (#1481)
goliaro Aug 31, 2024
b5fbc8b
Add option to enable old scheduler.
chenzhuofu Oct 4, 2024
a1035f8
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Oct 4, 2024
03eb516
Merge.
chenzhuofu Oct 4, 2024
3fbb364
feat: cherry-pick from https://github.com/flexflow/FlexFlow/pull/1517…
jiazhihao Oct 3, 2024
6482d76
fix: long request support
chenzhuofu Oct 4, 2024
622b8a8
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Oct 4, 2024
a23cddb
fix: memory leakage in file_loader
chenzhuofu Oct 5, 2024
e845953
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Oct 5, 2024
3574e51
feat: support inf slo ratio
chenzhuofu Oct 5, 2024
4accd43
Merge branch 'specscheduler' of github.com:flexflow/FlexFlow into spe…
chenzhuofu Oct 5, 2024
272a2e9
chore: minor
chenzhuofu Oct 5, 2024
29f5c69
fix: add logic of batch prefilling, request should be taken back and …
chenzhuofu Oct 6, 2024
dcb61c7
style: minor format
chenzhuofu Oct 6, 2024
1659fde
chore: minor info output
chenzhuofu Oct 6, 2024
a2a5174
chore: use unordered_map in argtopk
chenzhuofu Oct 7, 2024
00a98eb
chore: minor
chenzhuofu Oct 7, 2024
8a28da5
chore: add goodput report
chenzhuofu Oct 7, 2024
1e68324
chore: minor
chenzhuofu Oct 7, 2024
239fe17
chore: replace busy_waiting with condition_variable
chenzhuofu Oct 7, 2024
381a808
feat: make some tasks concurrent
chenzhuofu Oct 8, 2024
d9ff5ee
chore: add more profiling
chenzhuofu Oct 9, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,6 +6,8 @@ python/flexflow/core/flexflow_cffi_header.py
*.pb.h
*.o
*.a
*.nsys-rep
*.nfs*

# Byte-compiled / optimized / DLL files
__pycache__/
8 changes: 7 additions & 1 deletion .gitmodules
@@ -22,4 +22,10 @@
[submodule "deps/tokenizers-cpp"]
path = deps/tokenizers-cpp
url = https://github.com/mlc-ai/tokenizers-cpp.git
fetchRecurseSubmodules = true
fetchRecurseSubmodules = true
[submodule "deps/flashinfer"]
path = deps/flashinfer
url = https://github.com/flashinfer-ai/flashinfer.git
[submodule "deps/raft"]
path = deps/raft
url = https://github.com/rapidsai/raft.git
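
After checking out this branch, the two new submodules declared above need to be fetched before building. A typical sequence would be (standard git submodule commands; paths taken from the `.gitmodules` diff above):

```shell
# Sync submodule URLs with the updated .gitmodules, then fetch the new deps
git submodule sync
git submodule update --init --recursive deps/flashinfer deps/raft
```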
26 changes: 25 additions & 1 deletion CMakeLists.txt
@@ -4,6 +4,12 @@ project(FlexFlow)

include(ExternalProject)

enable_language(CXX)
enable_language(CUDA)
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 8)
message(FATAL_ERROR "Your C++ compiler is too old. Please upgrade to version 8 or higher.")
endif()

# Set policy CMP0074 to eliminate cmake warnings
cmake_policy(SET CMP0074 NEW)
cmake_policy(SET CMP0077 NEW)
@@ -128,6 +134,9 @@ list(APPEND CC_FLAGS
list(APPEND NVCC_FLAGS
-std=c++17)

list(APPEND NVCC_FLAGS
--expt-relaxed-constexpr
--extended-lambda)

add_compile_options(${CC_FLAGS})
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} ${NVCC_FLAGS})
@@ -201,6 +210,12 @@ if(NOT BUILD_LEGION_ONLY)
# optional
include(optional)

set(CMAKE_PREFIX_PATH ${CMAKE_PREFIX_PATH} ${CMAKE_CURRENT_SOURCE_DIR}/deps/raft/cpp/build/install)
find_package(raft)
list(APPEND FLEXFLOW_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/deps/raft/cpp/include)

list(APPEND FLEXFLOW_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/deps/flashinfer/include)

if (FF_GPU_BACKEND STREQUAL "cuda")
list(APPEND FF_CC_FLAGS
-DFF_USE_CUDA)
@@ -290,6 +305,12 @@ if(NOT BUILD_LEGION_ONLY)
LIST_DIRECTORIES False
${FLEXFLOW_ROOT}/src/*.cu)

# tensorrt_llm custom allreduce
if(FF_USE_NCCL)
list(APPEND FLEXFLOW_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/deps/tensorrt_llm)
list(APPEND FLEXFLOW_GPU_SRC ${CMAKE_CURRENT_SOURCE_DIR}/deps/tensorrt_llm/tensorrt_llm/custom_allreduce_kernels.cu)
endif()

add_compile_definitions(FF_USE_CUDA)

if(BUILD_SHARED_LIBS)
@@ -397,6 +418,8 @@ if(NOT BUILD_LEGION_ONLY)
target_link_libraries(flexflow ${LEGION_LIBRARY} ${FLEXFLOW_EXT_LIBRARIES} nlohmann_json::nlohmann_json mpark_variant optional)
endif()

target_link_libraries(flexflow raft::raft)

#library api version, bump from time to time
set(SOVERSION 1)

@@ -425,7 +448,7 @@ if(NOT BUILD_LEGION_ONLY)
# generate the Legion Python bindings library. When building from pip, we need to do this post-install to prevent Legion from overwriting the path to the Legion shared library
add_custom_command(TARGET flexflow
POST_BUILD
COMMAND ${Python_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/deps/legion/bindings/python/setup.py build --cmake-build-dir ${Legion_BINARY_DIR}/runtime --prefix ${Legion_BINARY_DIR} --build-lib=${Legion_BINARY_DIR}/bindings/python ${Legion_PYTHON_EXTRA_INSTALL_ARGS}
COMMAND CMAKE_BUILD_DIR=${Legion_BINARY_DIR}/runtime CMAKE_INSTALL_PREFIX=${Legion_BINARY_DIR} ${Python_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/deps/legion/bindings/python/setup.py build --build-lib=${Legion_BINARY_DIR}/bindings/python ${Legion_PYTHON_EXTRA_INSTALL_ARGS}
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/deps/legion/bindings/python
)
# create flexflow_python interpreter. When building from pip, we install the FF_HOME/python/flexflow_python script instead.
@@ -558,6 +581,7 @@ if(NOT BUILD_LEGION_ONLY)
if(FF_BUILD_ALL_INFERENCE_EXAMPLES OR FF_BUILD_ALL_EXAMPLES)
add_subdirectory(inference/spec_infer)
add_subdirectory(inference/incr_decoding)
add_subdirectory(inference/trace_generator)
endif()


7 changes: 5 additions & 2 deletions FlexFlow.mk
@@ -95,9 +95,12 @@ ifneq ($(strip $(FF_USE_PYTHON)), 1)
endif


INC_FLAGS += -I${FF_HOME}/include -I${FF_HOME}/inference -I${FF_HOME}/deps/optional/include -I${FF_HOME}/deps/variant/include -I${FF_HOME}/deps/json/include -I${FF_HOME}/deps/tokenizers-cpp/include -I${FF_HOME}/deps/tokenizers-cpp/sentencepiece/src
INC_FLAGS += -I${FF_HOME}/include -I${FF_HOME}/inference -I${FF_HOME}/deps/optional/include -I${FF_HOME}/deps/variant/include -I${FF_HOME}/deps/json/include -I${FF_HOME}/deps/tokenizers-cpp/include -I${FF_HOME}/deps/tokenizers-cpp/sentencepiece/src \
-I${FF_HOME}/deps/raft/cpp/include -I${FF_HOME}/deps/rmm/include -I${FF_HOME}/deps/spdlog/include \
-I${FF_HOME}/deps/flashinfer/include
CC_FLAGS += -DMAX_TENSOR_DIM=$(MAX_DIM) -DLEGION_MAX_RETURN_SIZE=32768
NVCC_FLAGS += -DMAX_TENSOR_DIM=$(MAX_DIM) -DLEGION_MAX_RETURN_SIZE=32768
NVCC_FLAGS += -DMAX_TENSOR_DIM=$(MAX_DIM) -DLEGION_MAX_RETURN_SIZE=32768 \
--expt-relaxed-constexpr --extended-lambda
HIPCC_FLAGS += -DMAX_TENSOR_DIM=$(MAX_DIM) -DLEGION_MAX_RETURN_SIZE=32768
GASNET_FLAGS +=
# For Point and Rect typedefs
200 changes: 74 additions & 126 deletions cmake/nccl.cmake
@@ -2,140 +2,88 @@ set(NCCL_NAME nccl)
# set(NCCL_CUDA_ARCH "-gencode=arch=compute_${CUDA_ARCH},code=sm_${CUDA_ARCH}")
# message("NCCL_CUDA_ARCH: ${NCCL_CUDA_ARCH}")

set(NCCL_URL "")
if((FF_USE_PREBUILT_NCCL OR FF_USE_ALL_PREBUILT_LIBRARIES) AND CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "x86_64")
if(LINUX_VERSION MATCHES "20.04")
if (CUDA_VERSION VERSION_EQUAL "11.0")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-20.04_11.0.3.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.1")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-20.04_11.1.1.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.2")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-20.04_11.2.2.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.3")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-20.04_11.3.1.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.4")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-20.04_11.4.3.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.5")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-20.04_11.5.2.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.6")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-20.04_11.6.2.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.7")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-20.04_11.7.0.tar.gz")
endif()
elseif(LINUX_VERSION MATCHES "18.04")
if (CUDA_VERSION VERSION_EQUAL "10.1")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_10.1.243.tar.gz")
elseif (CUDA_VERSION VERSION_EQUAL "10.2")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_10.2.89.tar.gz")
elseif (CUDA_VERSION VERSION_EQUAL "11.0")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_11.0.3.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.1")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_11.1.1.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.2")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_11.2.2.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.3")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_11.3.1.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.4")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_11.4.3.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.5")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_11.5.2.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.6")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_11.6.2.tar.gz")
elseif(CUDA_VERSION VERSION_EQUAL "11.7")
set(NCCL_URL "https://github.com/flexflow/flexflow-third-party/releases/latest/download/nccl_ubuntu-18.04_11.7.0.tar.gz")
endif()
endif()
if(NCCL_PATH)
set(NCCL_ROOT ${NCCL_PATH})
else()
# if NCCL_PATH is not set, let's try to find it in the CUDA root
set(NCCL_ROOT ${CUDA_TOOLKIT_ROOT_DIR})
endif()

if(NCCL_URL)
# Download and import pre-compiled NCCL library
message(STATUS "Using pre-compiled NCCL library")
message(STATUS "NCCL_URL: ${NCCL_URL}")
find_library(NCCL_LIBRARY
NAMES libnccl${LIBEXT}
PATHS ${NCCL_ROOT} ${CUDA_ROOT}
PATH_SUFFIXES lib lib64
DOC "NCCL library." )

include(FetchContent)
FetchContent_Declare(${NCCL_NAME}
URL ${NCCL_URL}
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
)
FetchContent_GetProperties(${NCCL_NAME})
if(NOT ${NCCL_NAME}_POPULATED)
FetchContent_Populate(${NCCL_NAME})
endif()

set(NCCL_FOLDER_PATH ${${NCCL_NAME}_SOURCE_DIR}/deps/${NCCL_NAME})
set(NCCL_INCLUDE_DIR ${NCCL_FOLDER_PATH}/include)
set(NCCL_LIB_DIR ${NCCL_FOLDER_PATH}/lib)
message(STATUS "NCCL library path: ${NCCL_FOLDER_PATH}")
add_library(nccl SHARED IMPORTED)
set_target_properties(nccl PROPERTIES IMPORTED_LOCATION ${NCCL_FOLDER_PATH})
find_path(NCCL_INCLUDE_DIR
NAMES nccl.h
HINTS ${NCCL_ROOT}
PATH_SUFFIXES include
DOC "NCCL include directory.")

list(APPEND FLEXFLOW_INCLUDE_DIRS ${NCCL_INCLUDE_DIR})
list(APPEND FLEXFLOW_EXT_LIBRARIES ${NCCL_LIB_DIR}/libnccl${LIBEXT})
install(DIRECTORY ${NCCL_INCLUDE_DIR}/ DESTINATION include)
install(DIRECTORY ${NCCL_LIB_DIR}/ DESTINATION lib PATTERN "pkgconfig" EXCLUDE)

else()
if(NCCL_PATH)
set(NCCL_ROOT ${NCCL_PATH})
# find NCCL, set NCCL lib and include
if(NCCL_LIBRARY AND NCCL_INCLUDE_DIR)
set(NCCL_FOUND ON)
set(NCCL_LIBRARIES ${NCCL_LIBRARY})
set(NCCL_INCLUDE_DIRS ${NCCL_INCLUDE_DIR})

# Check NCCL version
if(EXISTS "${NCCL_INCLUDE_DIR}/nccl.h")
file(STRINGS "${NCCL_INCLUDE_DIR}/nccl.h" NCCL_VERSION_DEFINES
REGEX "#define NCCL_MAJOR [0-9]+" )
file(STRINGS "${NCCL_INCLUDE_DIR}/nccl.h" NCCL_VERSION_DEFINES2
REGEX "#define NCCL_MINOR [0-9]+" )
string(REGEX MATCH "([0-9]+)" NCCL_MAJOR ${NCCL_VERSION_DEFINES})
string(REGEX MATCH "([0-9]+)" NCCL_MINOR ${NCCL_VERSION_DEFINES2})
set(NCCL_VERSION "${NCCL_MAJOR}.${NCCL_MINOR}")
if(NCCL_VERSION VERSION_LESS 2.23)
set(NCCL_OLD TRUE)
else()
set(NCCL_OLD FALSE)
endif()
message(STATUS "Found NCCL version: ${NCCL_VERSION}")
else()
# if NCCL_PATH is not set, let's try to find it in the CUDA root
set(NCCL_ROOT ${CUDA_TOOLKIT_ROOT_DIR})
message(WARNING "NCCL header not found, unable to determine version")
set(NCCL_OLD TRUE) # Assume old version if we can't determine
endif()

find_library(NCCL_LIBRARY
NAMES libnccl${LIBEXT}
PATHS ${NCCL_ROOT} ${CUDA_ROOT}
PATH_SUFFIXES lib lib64
DOC "NCCL library." )
endif()

find_path(NCCL_INCLUDE_DIR
NAMES nccl.h
HINTS ${NCCL_ROOT}
PATH_SUFFIXES include
DOC "NCCL include directory.")

# find NCCL, set NCCL lib and include
if(NCCL_LIBRARY AND NCCL_INCLUDE_DIR)
set(NCCL_FOUND ON)
set(NCCL_LIBRARIES ${NCCL_LIBRARY})
set(NCCL_INCLUDE_DIRS ${NCCL_INCLUDE_DIR})
endif()

# find NCCL
if(NCCL_FOUND)
list(APPEND FLEXFLOW_EXT_LIBRARIES ${NCCL_LIBRARIES})
list(APPEND FLEXFLOW_INCLUDE_DIRS ${NCCL_INCLUDE_DIRS})
message( STATUS "NCCL include : ${NCCL_INCLUDE_DIRS}" )
message( STATUS "NCCL libraries : ${NCCL_LIBRARIES}" )
add_library(nccl SHARED IMPORTED)

# Build NCCL from source
else()
message(STATUS "Building NCCL from source")
list(TRANSFORM CUDA_GENCODE PREPEND "NVCC_GENCODE=" OUTPUT_VARIABLE NCCL_BUILD_NVCC_GENCODE)

ExternalProject_Add(${NCCL_NAME}
SOURCE_DIR ${PROJECT_SOURCE_DIR}/deps/${NCCL_NAME}
PREFIX ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}
INSTALL_DIR ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}
BUILD_BYPRODUCTS ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}/lib/libnccl${LIBEXT}
INSTALL_COMMAND ""
CONFIGURE_COMMAND ""
BUILD_COMMAND make src.build "${NCCL_BUILD_NVCC_GENCODE}" "CUDA_HOME=${CUDA_TOOLKIT_ROOT_DIR}" "BUILDDIR=${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}"
BUILD_IN_SOURCE 1
)
# find NCCL
if(NCCL_FOUND AND (NOT NCCL_OLD OR CUDA_VERSION VERSION_LESS 12.0))
list(APPEND FLEXFLOW_EXT_LIBRARIES ${NCCL_LIBRARIES})
list(APPEND FLEXFLOW_INCLUDE_DIRS ${NCCL_INCLUDE_DIRS})
message( STATUS "NCCL include : ${NCCL_INCLUDE_DIRS}" )
message( STATUS "NCCL libraries : ${NCCL_LIBRARIES}" )
add_library(nccl SHARED IMPORTED)

# Build NCCL from source
else()
message(STATUS "Building NCCL from source")
list(TRANSFORM CUDA_GENCODE PREPEND "NVCC_GENCODE=" OUTPUT_VARIABLE NCCL_BUILD_NVCC_GENCODE)

ExternalProject_Get_Property(${NCCL_NAME} INSTALL_DIR)
message(STATUS "NCCL install dir: ${INSTALL_DIR}")
list(APPEND FLEXFLOW_INCLUDE_DIRS
${INSTALL_DIR}/include)
list(APPEND FLEXFLOW_EXT_LIBRARIES
${INSTALL_DIR}/lib/libnccl${LIBEXT})
set_directory_properties(PROPERTIES ADDITIONAL_CLEAN_FILES "${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}/lib/")

install(DIRECTORY ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}/include/ DESTINATION include)
install(DIRECTORY ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}/lib/ DESTINATION lib PATTERN "pkgconfig" EXCLUDE)
set(NCCL_BUILD_CMD make src.build "${NCCL_BUILD_NVCC_GENCODE}" "CUDA_HOME=${CUDA_TOOLKIT_ROOT_DIR}" "BUILDDIR=${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}")
if(DEFINED ENV{MAKEFLAGS})
set(NCCL_BUILD_CMD ${CMAKE_COMMAND} -E env MAKEFLAGS=$ENV{MAKEFLAGS} ${NCCL_BUILD_CMD})
endif()
ExternalProject_Add(${NCCL_NAME}
SOURCE_DIR ${PROJECT_SOURCE_DIR}/deps/${NCCL_NAME}
PREFIX ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}
INSTALL_DIR ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}
BUILD_BYPRODUCTS ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}/lib/libnccl${LIBEXT}
INSTALL_COMMAND ""
CONFIGURE_COMMAND ""
BUILD_COMMAND ${NCCL_BUILD_CMD}
BUILD_IN_SOURCE 1
)

ExternalProject_Get_Property(${NCCL_NAME} INSTALL_DIR)
message(STATUS "NCCL install dir: ${INSTALL_DIR}")
list(APPEND FLEXFLOW_INCLUDE_DIRS
${INSTALL_DIR}/include)
list(APPEND FLEXFLOW_EXT_LIBRARIES
${INSTALL_DIR}/lib/libnccl${LIBEXT})
set_directory_properties(PROPERTIES ADDITIONAL_CLEAN_FILES "${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}/lib/")

install(DIRECTORY ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}/include/ DESTINATION include)
install(DIRECTORY ${CMAKE_BINARY_DIR}/deps/${NCCL_NAME}/lib/ DESTINATION lib PATTERN "pkgconfig" EXCLUDE)
endif()
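
The rewritten `nccl.cmake` above extracts `NCCL_MAJOR`/`NCCL_MINOR` from `nccl.h` with regexes and treats anything older than 2.23 as "old", falling back to a source build in that case. A minimal Python sketch of that parsing and comparison logic (illustrative only; the authoritative check is the CMake code above):

```python
import re

def parse_nccl_version(header_text: str) -> str:
    """Extract "MAJOR.MINOR" from nccl.h-style #define lines."""
    major = re.search(r"#define NCCL_MAJOR ([0-9]+)", header_text)
    minor = re.search(r"#define NCCL_MINOR ([0-9]+)", header_text)
    if not (major and minor):
        raise ValueError("NCCL version defines not found")
    return f"{major.group(1)}.{minor.group(1)}"

def is_old_nccl(version: str) -> bool:
    """Mirror CMake's `NCCL_VERSION VERSION_LESS 2.23` comparison."""
    major, minor = (int(x) for x in version.split("."))
    return (major, minor) < (2, 23)

header = "#define NCCL_MAJOR 2\n#define NCCL_MINOR 18\n"
print(parse_nccl_version(header))  # 2.18
print(is_old_nccl("2.18"), is_old_nccl("2.23"))  # True False
```

Note that comparing `(major, minor)` tuples avoids the lexicographic pitfall of comparing "2.18" and "2.23" as strings, matching CMake's numeric `VERSION_LESS` semantics.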
5 changes: 5 additions & 0 deletions config/config.linux
@@ -111,6 +111,11 @@ function get_build_configs() {
BUILD_CONFIGS="FF_CUDA_ARCH=${FF_CUDA_ARCH} FF_HIP_ARCH=${FF_HIP_ARCH} CUDA_DIR=${CUDA_DIR} CUDNN_DIR=${CUDNN_DIR} CUBLAS_DIR=${CUBLAS_DIR} CURAND_DIR=${CURAND_DIR} NCCL_DIR=${NCCL_DIR} FF_USE_PYTHON=${FF_USE_PYTHON} BUILD_LEGION_ONLY=${BUILD_LEGION_ONLY} FF_GASNET_CONDUIT=${FF_GASNET_CONDUIT} UCX_DIR=${UCX_DIR} FF_LEGION_NETWORKS=${FF_LEGION_NETWORKS} FF_BUILD_ALL_EXAMPLES=${FF_BUILD_ALL_EXAMPLES} FF_BUILD_ALL_INFERENCE_EXAMPLES=${FF_BUILD_ALL_INFERENCE_EXAMPLES} FF_BUILD_UNIT_TESTS=${FF_BUILD_UNIT_TESTS} FF_USE_PREBUILT_NCCL=${FF_USE_PREBUILT_NCCL} FF_USE_PREBUILT_LEGION=${FF_USE_PREBUILT_LEGION} FF_USE_ALL_PREBUILT_LIBRARIES=${FF_USE_ALL_PREBUILT_LIBRARIES} FF_USE_AVX2=${FF_USE_AVX2} FF_MAX_DIM=${FF_MAX_DIM} ROCM_PATH=${ROCM_PATH} FF_GPU_BACKEND=${FF_GPU_BACKEND} INSTALL_DIR=${INSTALL_DIR}"
}

#install raft
echo "Building raft dependency ..."
INSTALL_PREFIX=./install $(dirname $0)/../deps/raft/build.sh libraft > /dev/null
echo "Building raft dependency ... Done"

if [[ -n "$1" && ( "$1" == "CMAKE_FLAGS" || "$1" == "CUDA_PATH" ) ]]; then
. $(dirname $0)/config.inc
# Passing CMAKE_FLAGS or CUDA_PATH as $1 will print the value of the CMAKE_FLAGS/CUDA_PATH variable,
1 change: 1 addition & 0 deletions deps/flashinfer
Submodule flashinfer added at be6bf5
2 changes: 1 addition & 1 deletion deps/legion
Submodule legion updated from 24e8c4 to 0d32b3
2 changes: 1 addition & 1 deletion deps/nccl
Submodule nccl updated 188 files
1 change: 1 addition & 0 deletions deps/raft
Submodule raft added at b79f15
5 changes: 5 additions & 0 deletions deps/tensorrt_llm/README.md
@@ -0,0 +1,5 @@
## Custom AllReduce Implementation

This is an adapted version of the custom AllReduce plugin from NVIDIA's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) repository.

To replace the NCCL AllReduce call, we also add CUDA IPC support to the custom AllReduce usage. Our IPC and AllReduce implementation is referenced from [mlc-ai/relax](https://github.com/mlc-ai/relax).
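
For intuition, any AllReduce replacement must satisfy the same contract as the NCCL call it substitutes: every rank ends up with the elementwise reduction of all ranks' input buffers. A toy Python sketch of that contract (purely illustrative; it models the semantics, not the CUDA IPC implementation in this directory):

```python
def allreduce_sum(buffers):
    """Simulate an AllReduce-sum over len(buffers) ranks: every rank
    receives the elementwise sum of all ranks' input buffers."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]  # each rank gets its own copy

ranks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(allreduce_sum(ranks))  # [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
```

A correctness test for the real kernel can compare its output against a reference like this one, since the result must be identical on all ranks regardless of how the reduction is staged.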