Update changelog (#126)
Forgot to do this in the earlier PR.

Also replaces the GitHub Markdown checkmark codes in the PyPI description with
the Unicode character.
alihassanijr authored May 2, 2024
1 parent f92e750 commit 8422e23
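
A minimal sketch of the substitution described in the commit message, assuming it were scripted rather than edited by hand; the file path and emoji shortcode are taken from the diff below:

```python
from pathlib import Path

# Swap the GitHub-flavored Markdown emoji shortcode for the plain Unicode
# checkmark, since PyPI's renderer does not expand emoji shortcodes.
readme = Path("assets/README_pypi.md")
readme.write_text(readme.read_text().replace(":white_check_mark:", "✓"))
```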
Showing 2 changed files with 18 additions and 14 deletions.
10 changes: 7 additions & 3 deletions CHANGELOG.md
@@ -1,26 +1,30 @@
# Changelog

## [Main branch]
* Fused neighborhood attention (FNA) kernels (forward pass only for now)

## [0.17.0] - 2024-05-02
* [Fused neighborhood attention](https://github.com/SHI-Labs/NATTEN/tree/main/docs/fna) (FNA) kernels
* 1D, 2D and 3D Neighborhood Attention are supported,
* Causal neighborhood attention is implemented,
* Window (kernel) size, dilation, and causality can be defined *per-axis*,
* All GPU architectures since Maxwell (SM50) are supported,
* SM50 up to SM70 are SIMT-only, but support both FP16 and FP32,
* SM70 and SM75 target Tensor Cores in FP16, and SIMT-style in FP32,
* SM80 and above target Tensor Cores in FP16, BF16, and FP32.
* Relative positional biases are implemented (not defined for causal masking yet),
* NATTEN [Auto-tuner](https://github.com/SHI-Labs/NATTEN/blob/main/docs/fna/autotuner.md),
* Memory preferences and [KV parallelism](https://github.com/SHI-Labs/NATTEN/blob/main/docs/fna/kv-parallelism.md) modes,
* Relative positional biases are only supported in forward pass (inference).
* Memory layout in FNA is different from existing kernels (`[B, *, heads, dim]` instead of `[B, heads, *, dim]`).
* Eventually this layout can eliminate the permute/explicit reshape step that follows the QKV projection in the attention module (see the sketch after this list).
* For more, refer to [Fused vs unfused NA](docs/fna/fused-vs-unfused.md).
* Naive kernels now support causal masking,
* Naive kernels (CPU and CUDA) now allow varying parameters (window size, dilation, causal) across axes,
* Major bug fix in Volta GEMM kernels
* The epilogue was different for Volta, and it slipped through unit tests,
* Tests are now more aggressive, and the issue has been fixed.
* Memory alignment bug in half RPB gradient kernels fixed
* See [#97](https://github.com/SHI-Labs/NATTEN/issues/97).
* Naive FP16 enabled for SM50-SM60.
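
As a hedged illustration of the layout note above (not NATTEN's actual module code), the sketch below contrasts the two layouts after a fused QKV projection, using made-up sizes; only the `[B, *, heads, dim]` vs. `[B, heads, *, dim]` distinction comes from the changelog entry.

```python
import torch

B, H, W, heads, dim = 2, 14, 14, 4, 32
qkv = torch.randn(B, H, W, 3 * heads * dim)   # fused QKV projection output
q, k, v = qkv.chunk(3, dim=-1)

# Layout expected by the unfused (BMM-style) kernels: [B, heads, H, W, dim]
# needs an explicit head split plus a permute, typically followed by
# .contiguous(), i.e. an extra copy after the projection.
q_unfused = q.reshape(B, H, W, heads, dim).permute(0, 3, 1, 2, 4).contiguous()

# Layout used by FNA: [B, H, W, heads, dim] only needs the head split --
# no permute, so no extra copy of the projection output.
q_fna = q.reshape(B, H, W, heads, dim)
```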

## [0.15.1] - 2024-01-24
* Attention tensors can now be views, which allows combining neighborhood and any other attention pattern (i.e. registers,
22 changes: 11 additions & 11 deletions assets/README_pypi.md
@@ -66,9 +66,9 @@ compatible with NATTEN.

| Problem space | CPU Backend | Causal masking | Varying parameters | Relative positional bias | Autograd support |
| ----------- | ----------- | ------------------ | ------------------ | ------------------------ | ------------------------ |
| 1D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode |
| 2D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode |
| 3D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode |
| 1D | naive | ✓ | ✓ | ✓ | Forward and reverse mode |
| 2D | naive | ✓ | ✓ | ✓ | Forward and reverse mode |
| 3D | naive | ✓ | ✓ | ✓ | Forward and reverse mode |

Notes:
* Forward mode autograd does not support relative positional biases and causal masking yet.
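
For context on the forward-mode note above, here is a generic `torch.func.jvp` sketch; the attention callable is a plain softmax-attention stand-in, not a NATTEN op, since the functional interface is not part of this diff.

```python
import torch
from torch.func import jvp

def attn(q, k, v):
    # Stand-in for a neighborhood attention op without relative positional
    # bias or causal masking: plain scaled dot-product attention.
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v

q, k, v = (torch.randn(1, 4, 16, 32) for _ in range(3))
tangents = tuple(torch.randn_like(t) for t in (q, k, v))

# Forward-mode autograd: output and its Jacobian-vector product in one pass.
out, out_tangent = jvp(attn, (q, k, v), tangents)
```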
@@ -78,14 +78,14 @@ Notes:

| Problem space | CUDA Backend | Causal masking | Varying parameters | Relative positional bias | Autograd support | Min. Arch |
| ----------- | ----------- | ------------------ | ------------------ | ------------------------ | ------------------------ | --------- |
| 1D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode | SM35 |
| 2D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode | SM35 |
| 3D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode | SM35 |
| 1D | gemm | - | - | :white_check_mark: | Forward and reverse mode | SM70 |
| 2D | gemm | - | - | :white_check_mark: | Forward and reverse mode | SM70 |
| 1D | fna | :white_check_mark: | :white_check_mark: | :white_check_mark: | Reverse mode | SM50 |
| 2D | fna | :white_check_mark: | :white_check_mark: | :white_check_mark: | Reverse mode | SM50 |
| 3D | fna | :white_check_mark: | :white_check_mark: | :white_check_mark: | Reverse mode | SM50 |
| 1D | naive | ✓ | ✓ | ✓ | Forward and reverse mode | SM35 |
| 2D | naive | ✓ | ✓ | ✓ | Forward and reverse mode | SM35 |
| 3D | naive | ✓ | ✓ | ✓ | Forward and reverse mode | SM35 |
| 1D | gemm | - | - | ✓ | Forward and reverse mode | SM70 |
| 2D | gemm | - | - | ✓ | Forward and reverse mode | SM70 |
| 1D | fna | ✓ | ✓ | ✓ | Reverse mode | SM50 |
| 2D | fna | ✓ | ✓ | ✓ | Reverse mode | SM50 |
| 3D | fna | ✓ | ✓ | ✓ | Reverse mode | SM50 |

Notes:
* FP16 kernels are only available on SM50 and above*, and BF16 requires SM80 and above.
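
A small sketch of how the note above might translate into a runtime dtype choice; `torch.cuda.get_device_capability` is standard PyTorch, and the SM50/SM80 thresholds come from the note (the starred caveat is not covered here).

```python
import torch

def pick_attention_dtype(device: int = 0) -> torch.dtype:
    # Compute capability is reported as (major, minor), e.g. (8, 6) for SM86.
    major, minor = torch.cuda.get_device_capability(device)
    sm = 10 * major + minor
    if sm >= 80:
        return torch.bfloat16   # BF16 requires SM80 and above
    if sm >= 50:
        return torch.float16    # FP16 kernels available from SM50 and above
    return torch.float32        # otherwise fall back to full precision
```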
