Usage:

```python
optimizer_step(optimizer, powersgd)
```

## Differences with the paper version

The version in this code base is a slight improvement over the version in the PowerSGD paper.
It roughly follows Algorithm 2 in [this follow-up paper](https://arxiv.org/pdf/2008.01425.pdf).

We found that there are two ways to control the approximation quality in PowerSGD: the 'rank' of the approximation and the 'number of power iterations'. Because the cost of orthogonalisation grows as $O(\text{rank}^2)$, increasing the rank quickly becomes inefficient, which leaves increasing the number of iterations as the better knob.
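
For intuition on the cost: the expensive step is orthogonalising the tall-and-skinny iterate. A minimal sketch, assuming a QR-based orthogonalisation (the routine actually used in this code base may differ, e.g. Gram-Schmidt):

```python
import torch

def orthogonalise(P: torch.Tensor) -> torch.Tensor:
    # P has shape (n, rank). QR factorisation costs O(n * rank^2):
    # doubling the rank roughly quadruples this step, while doubling
    # the number of power iterations only doubles the total work.
    Q, _ = torch.linalg.qr(P)
    return Q
```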

In the original PowerSGD paper, more iterations only improve the quality of the rank-k approximation, as it converges to the "best rank-k approximation". In the [follow-up paper](https://arxiv.org/pdf/2008.01425.pdf), the intermediate results of these power iterations are all used, effectively increasing the rank as the number of iterations grows.
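
To make the distinction concrete, here is a toy sketch of the two behaviours on a single gradient matrix `M` with a warm-started sketch `Q` of shape (m, rank). This is an illustration of the idea, not the code in this repository:

```python
import torch

def best_rank_k(M, Q, num_iters):
    # Paper version: power iteration converges towards the best
    # rank-k approximation of M; the rank never grows.
    for _ in range(num_iters):
        P, _ = torch.linalg.qr(M @ Q)  # left multiplication + orthogonalisation
        Q = M.T @ P                    # right multiplication
    return P @ Q.T

def accumulated(M, Q, num_iters):
    # Follow-up version (roughly Algorithm 2): keep every intermediate
    # rank-k piece by compressing the remaining residual, so the
    # effective rank grows with the number of iterations.
    approx = torch.zeros_like(M)
    for _ in range(num_iters):
        P, _ = torch.linalg.qr(M @ Q)
        Q = M.T @ P
        piece = P @ Q.T
        approx = approx + piece
        M = M - piece                  # next iteration only sees the residual
    return approx
```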

In the original PowerSGD paper, we used two iterations per SGD step (one left and one right iteration). In that setting, there is not much of a difference between the two versions; the difference appears when you use more power-iteration steps per SGD step.
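
In code, that original scheme is just one left and one right pass per SGD step with a warm-started sketch. A hedged sketch with illustrative names, not this repository's API:

```python
import torch

def compress_step(grad_matrix, Q):
    # One left iteration: project onto the warm-started sketch Q and
    # orthogonalise; then one right iteration to refresh the sketch.
    P, _ = torch.linalg.qr(grad_matrix @ Q)
    Q = grad_matrix.T @ P
    # P and Q are what gets communicated; the receiver reconstructs
    # the rank-k approximation as P @ Q.T.
    return P, Q  # Q doubles as the warm start for the next SGD step
```

With this single left/right pass, both variants above produce the same approximation, which is why the difference only shows up with more passes per step.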

## PyTorch implementation
PyTorch features an implementation of PowerSGD as a [communication hook](https://pytorch.org/docs/stable/ddp_comm_hooks.html) for `DistributedDataParallel` models.
Because of the integration with DDP, the code is more involved than the code in this repository.
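
For reference, attaching PyTorch's built-in hook looks roughly like this (a sketch based on the linked PyTorch documentation; it assumes an already-initialised process group and a `model` placed on the right device):

```python
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD
from torch.nn.parallel import DistributedDataParallel as DDP

ddp_model = DDP(model)  # `model` and the process group are assumed set up
state = powerSGD.PowerSGDState(
    process_group=None,           # None means the default process group
    matrix_approximation_rank=1,  # the 'rank' knob discussed above
    start_powerSGD_iter=10,       # warm-up steps with plain all-reduce first
)
ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)
```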
