Update changelog (#126)
Forgot to do this in the earlier PR.

Also replaces the GitHub Markdown checkmark codes in the PyPI description with
the Unicode character.
alihassanijr authored May 2, 2024
1 parent f92e750 commit 8422e23
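
A minimal sketch of the substitution described in the commit message, assuming it were scripted rather than edited by hand; the file path and emoji shortcode are taken from the diff below:

```python
from pathlib import Path

# Swap the GitHub-flavored Markdown emoji shortcode for the plain Unicode
# checkmark, since PyPI's renderer does not expand emoji shortcodes.
readme = Path("assets/README_pypi.md")
readme.write_text(readme.read_text().replace(":white_check_mark:", "✓"))
```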
Showing 2 changed files with 18 additions and 14 deletions.
10 changes: 7 additions & 3 deletions CHANGELOG.md
@@ -1,26 +1,30 @@
# Changelog

## [Main branch]
* Fused neighborhood attention (FNA) kernels (forward pass only for now)

## [0.17.0] - 2024-05-02
* [Fused neighborhood attention](https://github.com/SHI-Labs/NATTEN/tree/main/docs/fna) (FNA) kernels
* 1D, 2D and 3D Neighborhood Attention are supported,
* Causal neighborhood attention is implemented,
* Window (kernel) size, dilation, and causality can be defined *per-axis*,
* All GPU architectures since Maxwell (SM50) are supported,
* SM50 up to SM70 are SIMT-only, but support both FP16 and FP32,
* SM70 and SM75 target Tensor Cores in FP16, and SIMT-style in FP32,
* SM80 and above target Tensor Cores in FP16, BF16, and FP32.
* Relative positional biases are implemented (not defined for causal masking yet),
* NATTEN [Auto-tuner](https://github.com/SHI-Labs/NATTEN/blob/main/docs/fna/autotuner.md),
* Memory preferences and [KV parallelism](https://github.com/SHI-Labs/NATTEN/blob/main/docs/fna/kv-parallelism.md) modes,
* Relative positional biases are only supported in forward pass (inference).
* Memory layout in FNA is different from existing kernels (`[B, *, heads, dim]` instead of `[B, heads, *, dim]`).
* Eventually this layout can eliminate the permute/explicit reshape step that follows the QKV projection in the attention module (see the sketch after this list).
* For more, refer to [Fused vs unfused NA](docs/fna/fused-vs-unfused.md).
* Naive kernels now support causal masking,
* Naive kernels (CPU and CUDA) now allow varying parameters (window size, dilation, causal) across axes,
* Major bug fix in Volta GEMM kernels
* The epilogue was different for Volta, and it slipped through unit tests,
* Tests are now more aggressive, and the issue has been fixed.
* Memory alignment bug in half RPB gradient kernels fixed
* See [#97](https://github.com/SHI-Labs/NATTEN/issues/97).
* Naive FP16 enabled for SM50-SM60.
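
As a hedged illustration of the layout note above (not NATTEN's actual module code), the sketch below contrasts the two layouts after a fused QKV projection, using made-up sizes; only the `[B, *, heads, dim]` vs. `[B, heads, *, dim]` distinction comes from the changelog entry.

```python
import torch

B, H, W, heads, dim = 2, 14, 14, 4, 32
qkv = torch.randn(B, H, W, 3 * heads * dim)   # fused QKV projection output
q, k, v = qkv.chunk(3, dim=-1)

# Layout expected by the unfused (BMM-style) kernels: [B, heads, H, W, dim]
# needs an explicit head split plus a permute, typically followed by
# .contiguous(), i.e. an extra copy after the projection.
q_unfused = q.reshape(B, H, W, heads, dim).permute(0, 3, 1, 2, 4).contiguous()

# Layout used by FNA: [B, H, W, heads, dim] only needs the head split --
# no permute, so no extra copy of the projection output.
q_fna = q.reshape(B, H, W, heads, dim)
```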

## [0.15.1] - 2024-01-24
* Attention tensors can now be views, which allows combining neighborhood and any other attention pattern (i.e. registers,
22 changes: 11 additions & 11 deletions assets/README_pypi.md
@@ -66,9 +66,9 @@ compatible with NATTEN.

| Problem space | CPU Backend | Causal masking | Varying parameters | Relative positional bias | Autograd support |
| ----------- | ----------- | ------------------ | ------------------ | ------------------------ | ------------------------ |
| 1D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode |
| 2D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode |
| 3D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode |
| 1D | naive | ✓ | ✓ | ✓ | Forward and reverse mode |
| 2D | naive | ✓ | ✓ | ✓ | Forward and reverse mode |
| 3D | naive | ✓ | ✓ | ✓ | Forward and reverse mode |

Notes:
* Forward mode autograd does not support relative positional biases and causal masking yet.
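
For context on the forward-mode note above, here is a generic `torch.func.jvp` sketch; the attention callable is a plain softmax-attention stand-in, not a NATTEN op, since the functional interface is not part of this diff.

```python
import torch
from torch.func import jvp

def attn(q, k, v):
    # Stand-in for a neighborhood attention op without relative positional
    # bias or causal masking: plain scaled dot-product attention.
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v

q, k, v = (torch.randn(1, 4, 16, 32) for _ in range(3))
tangents = tuple(torch.randn_like(t) for t in (q, k, v))

# Forward-mode autograd: output and its Jacobian-vector product in one pass.
out, out_tangent = jvp(attn, (q, k, v), tangents)
```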
@@ -78,14 +78,14 @@ Notes:

| Problem space | CUDA Backend | Causal masking | Varying parameters | Relative positional bias | Autograd support | Min. Arch |
| ----------- | ----------- | ------------------ | ------------------ | ------------------------ | ------------------------ | --------- |
| 1D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode | SM35 |
| 2D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode | SM35 |
| 3D | naive | :white_check_mark: | :white_check_mark: | :white_check_mark: | Forward and reverse mode | SM35 |
| 1D | gemm | - | - | :white_check_mark: | Forward and reverse mode | SM70 |
| 2D | gemm | - | - | :white_check_mark: | Forward and reverse mode | SM70 |
| 1D | fna | :white_check_mark: | :white_check_mark: | :white_check_mark: | Reverse mode | SM50 |
| 2D | fna | :white_check_mark: | :white_check_mark: | :white_check_mark: | Reverse mode | SM50 |
| 3D | fna | :white_check_mark: | :white_check_mark: | :white_check_mark: | Reverse mode | SM50 |
| 1D | naive | ✓ | ✓ | ✓ | Forward and reverse mode | SM35 |
| 2D | naive | ✓ | ✓ | ✓ | Forward and reverse mode | SM35 |
| 3D | naive | ✓ | ✓ | ✓ | Forward and reverse mode | SM35 |
| 1D | gemm | - | - | ✓ | Forward and reverse mode | SM70 |
| 2D | gemm | - | - | ✓ | Forward and reverse mode | SM70 |
| 1D | fna | ✓ | ✓ | ✓ | Reverse mode | SM50 |
| 2D | fna | ✓ | ✓ | ✓ | Reverse mode | SM50 |
| 3D | fna | ✓ | ✓ | ✓ | Reverse mode | SM50 |

Notes:
* FP16 kernels are only available on SM50 and above*, and BF16 requires SM80 and above.
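
A small sketch of how the note above might translate into a runtime dtype choice; `torch.cuda.get_device_capability` is standard PyTorch, and the SM50/SM80 thresholds come from the note (the starred caveat is not covered here).

```python
import torch

def pick_attention_dtype(device: int = 0) -> torch.dtype:
    # Compute capability is reported as (major, minor), e.g. (8, 6) for SM86.
    major, minor = torch.cuda.get_device_capability(device)
    sm = 10 * major + minor
    if sm >= 80:
        return torch.bfloat16   # BF16 requires SM80 and above
    if sm >= 50:
        return torch.float16    # FP16 kernels available from SM50 and above
    return torch.float32        # otherwise fall back to full precision
```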
