Adds a new merge method, `nuslerp`. This method provides a superset of the functionality of `slerp`. If provided with a base model, `nuslerp` will perform spherical interpolation of the task vectors (each model's weights minus the base model's weights). While the original `slerp` always flattens weight tensors into a single dimension, `nuslerp` can also do row-wise and column-wise interpolation of tensors.

This method remedies one of my long-standing gripes with how I implemented `slerp`. Instead of taking a `t` parameter and using `base_model` to specify which is the "first" model, `nuslerp` simply takes a `weight` parameter for each model and computes the interpolation factor `t` internally. This makes it fit the conventions of the other merge methods much better. The `weight` parameter behaves in the same fashion as it does for `merge_method: linear` with `normalize: true`.

The idea to add task vector SLERP is inspired by DeepMind's great use of it in their WARP paper.
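The behaviour described above (per-model weights turned into `t`, optional task-vector mode, and flattened, row-wise, or column-wise interpolation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this PR. The names `compute_t`, `slerp_vec`, `nuslerp_sketch`, and the `mode` argument are hypothetical. Nested lists stand in for 2-D weight tensors. The formula `t = w_b / (w_a + w_b)` is inferred from the statement that `weight` behaves like `linear` with `normalize: true`. The fallback to linear interpolation for nearly parallel vectors (threshold `0.9995`) is an assumed heuristic.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]  # plain nested lists stand in for 2-D weight tensors


def compute_t(weight_a: float, weight_b: float) -> float:
    """Hypothetical helper: map per-model weights to an interpolation factor.

    Normalized linear merging gives (w_a*A + w_b*B) / (w_a + w_b), which equals
    (1 - t)*A + t*B with t = w_b / (w_a + w_b). Assumed to carry over to SLERP.
    """
    total = weight_a + weight_b
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return weight_b / total


def slerp_vec(t: float, a: List[float], b: List[float],
              dot_threshold: float = 0.9995, eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two flat vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    lerp = [(1 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a < eps or norm_b < eps:
        return lerp  # a zero vector has no direction: fall back to linear
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    if abs(cos_omega) > dot_threshold:
        return lerp  # (anti-)parallel: SLERP is ill-conditioned, use linear
    omega = math.acos(cos_omega)  # angle between the two vectors
    sin_omega = math.sin(omega)
    w_a = math.sin((1 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def _transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def _combine(m: Matrix, n: Matrix, sign: float) -> Matrix:
    """Element-wise m + sign * n."""
    return [[x + sign * y for x, y in zip(rm, rn)] for rm, rn in zip(m, n)]


def nuslerp_sketch(weight_a: float, weight_b: float, a: Matrix, b: Matrix,
                   base: Optional[Matrix] = None, mode: str = "flatten") -> Matrix:
    """Hypothetical sketch of the merge described above.

    mode="flatten": treat each matrix as one long vector (what `slerp` does).
    mode="row" / "column": interpolate each row / column vector independently.
    If `base` is given, interpolate the task vectors (a - base, b - base)
    and add the base back afterwards.
    """
    t = compute_t(weight_a, weight_b)
    if base is not None:
        a, b = _combine(a, base, -1.0), _combine(b, base, -1.0)

    if mode == "flatten":
        cols = len(a[0])
        flat = slerp_vec(t, [x for r in a for x in r], [y for r in b for y in r])
        out = [flat[i:i + cols] for i in range(0, len(flat), cols)]
    elif mode == "row":
        out = [slerp_vec(t, ra, rb) for ra, rb in zip(a, b)]
    elif mode == "column":
        cols_out = [slerp_vec(t, ca, cb)
                    for ca, cb in zip(_transpose(a), _transpose(b))]
        out = _transpose(cols_out)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    return _combine(out, base, 1.0) if base is not None else out
```

For example, `nuslerp_sketch(1.0, 1.0, a, b, mode="row")` interpolates each row of `a` toward the matching row of `b` with `t = 0.5`, while `mode="flatten"` measures a single angle over the whole matrix, so the two modes generally give different results. Passing `base=` switches to interpolating the task vectors instead of the raw weights.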