-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add interface for differentiating inputs and weights in CG & PCG #1493
Add interface for differentiating inputs and weights in CG & PCG #1493
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 75 of 75 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/models/src/models/transformer.cc
line 93 at r1 (raw file):
vdim, config.dropout, /*bias=*/false);
Should we make bias
false by default?
lib/op-attrs/include/op-attrs/ops/batch_matmul.h
line 12 at r1 (raw file):
namespace FlexFlow { CHECK_VALID_OP_ATTR(BatchMatmulAttrs);
What does this do / was there any reason it wasn't there before?
lib/pcg/include/pcg/computation_graph_builder.h
line 166 at r1 (raw file):
std::optional<InitializerAttrs> const &bias_initializer = std::nullopt, std::optional<std::string> const &name = std::nullopt, std::optional<std::string> const &projection_name = std::nullopt,
why are names necessary?
lib/pcg/src/pcg/computation_graph.cc
line 25 at r1 (raw file):
} LayerAddedResult add_layer(ComputationGraph &computation_graph,
Why a new interface?
lib/op-attrs/test/src/op-attrs/ops/conv_2d.cc
line 23 at r1 (raw file):
SUBCASE("with bias") { Conv2DAttrs attrs = make_attrs(true);
Suggestion:
make_attrs(/*use_bias=*/true)
lib/op-attrs/test/src/op-attrs/ops/conv_2d.cc
line 37 at r1 (raw file):
SUBCASE("without bias") { Conv2DAttrs attrs = make_attrs(false);
Suggestion:
make_attrs(/*use_bias=*/false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 74 of 75 files reviewed, 6 unresolved discussions (waiting on @reyna-abhyankar and @wmdi)
lib/models/src/models/transformer.cc
line 93 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Should we make
bias
false by default?
Probably not, since bias=true
is the pytorch default: https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html
lib/op-attrs/include/op-attrs/ops/batch_matmul.h
line 12 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
What does this do / was there any reason it wasn't there before?
It just checks a couple basic properties (mainly just value/copy semantics) that honestly probably aren't all that necessary anymore now that we have an actual testing framework in place. I'll leave it in for now for consistency with the other operators, but I've created an issue to eventually go and remove them: #1498
lib/pcg/include/pcg/computation_graph_builder.h
line 166 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
why are names necessary?
For testing--currently names are the way to retrieve layer_guid_t
s from a ComputationGraph
built by ComputationGraphBuilder
--see https://github.com/lockshaw/FlexFlow/blob/3440b3e669943b69320b092fdd1e2975dbbf5496/lib/pcg/test/src/pcg/computation_graph.cc#L155-L204 for an example.
Currently there's a bit of awkwardness in that ComputationGraphBuilder
is a bit high-level for a lot of tests (has the ability to auto-insert cast and broadcast operators, auto-create weights, etc. which makes getting layer_guid_t
s and tensor_guid_t
s a bit difficult and makes reasoning about what a test actually does a bit more challenging than it should be), but the low-level graph interface is often way too verbose. The solution is likely to add an interface in the middle (between ComputationGraphBuilder
and the raw graph interface--tracked in #1499 and somewhat related to #1474), but I'm not yet ready to implement it, so for now we're using a balance of the two interfaces while I figure out a better long-term solution
lib/pcg/src/pcg/computation_graph.cc
line 25 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Why a new interface?
This interface actually has been there for a while, it just wasn't implemented. That said, there is good reason for having interfaces between the raw graph interface and ComputationGraphBuilder
, as outlined in https://reviewable.io/reviews/flexflow/FlexFlow/1493#-O6UVfTk8CqARIr-WYgQ
lib/op-attrs/test/src/op-attrs/ops/conv_2d.cc
line 23 at r1 (raw file):
SUBCASE("with bias") { Conv2DAttrs attrs = make_attrs(true);
Done.
lib/op-attrs/test/src/op-attrs/ops/conv_2d.cc
line 37 at r1 (raw file):
SUBCASE("without bias") { Conv2DAttrs attrs = make_attrs(false);
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @reyna-abhyankar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @lockshaw)
lib/pcg/include/pcg/computation_graph_builder.h
line 166 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For testing--currently names are the way to retrieve
layer_guid_t
s from aComputationGraph
built byComputationGraphBuilder
--see https://github.com/lockshaw/FlexFlow/blob/3440b3e669943b69320b092fdd1e2975dbbf5496/lib/pcg/test/src/pcg/computation_graph.cc#L155-L204 for an example.Currently there's a bit of awkwardness in that
ComputationGraphBuilder
is a bit high-level for a lot of tests (has the ability to auto-insert cast and broadcast operators, auto-create weights, etc. which makes gettinglayer_guid_t
s andtensor_guid_t
s a bit difficult and makes reasoning about what a test actually does a bit more challenging than it should be), but the low-level graph interface is often way too verbose. The solution is likely to add an interface in the middle (betweenComputationGraphBuilder
and the raw graph interface--tracked in #1499 and somewhat related to #1474), but I'm not yet ready to implement it, so for now we're using a balance of the two interfaces while I figure out a better long-term solution
Yeah, that's fair.
Description of changes:
PCG
andop-attrs
test suite organizingtransformer
implementation (MHAs should not usebias
)Related Issues:
Linked Issues:
Issues closed by this PR:
This change is