BYOL Single GPU implementation #1
base: master
Conversation
Summary: Pull Request resolved: facebookresearch#343

Some basic changes to make this script work within FB infra:
1. Register Manifold in PathManager.
2. In order to do 1, create fb/extra_scripts/convert_sharded_checkpoint.y and add necessary dependencies in TARGETS.
3. Replace some torch.loads using PathManager.

Reviewed By: prigoyal
Differential Revision: D29166520
fbshipit-source-id: a61b4eb80d74526b0a7e2d38f973eb688b311a94
As per commit fc1217d, the following review comments were addressed:
def __init__(self, base_momentum: float, shuffle_batch: bool = True):
    super().__init__()
    self.base_momentum = base_momentum
    self.is_distributed = False
Question: Do we need this?
class BYOLHook(ClassyHook):
    """
    TODO: Update description
Update description to BYOL.
@staticmethod
def cosine_decay(training_iter, max_iters, initial_value):
    # TODO: Why do we need this min statement?
Comment this method.
Add type hints to the method.
    return initial_value * cosine_decay_value
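As a reference for the review comments above, here is a sketch of how this helper is commonly written in BYOL implementations, with the requested docstring and type hints. The body is an assumption reconstructed from the snippet (only the `return` line is visible in the diff), but it also suggests one answer to the TODO: the `min()` clamps progress at 1.0 so the schedule stays at its fully decayed value if training runs past `max_iters`.

```python
import math

def cosine_decay(training_iter: int, max_iters: int, initial_value: float) -> float:
    """Cosine-anneal `initial_value` down to 0 over `max_iters` iterations.

    Note: the min() clamps progress to 1.0, so iterations beyond
    `max_iters` keep the final (fully decayed) value instead of
    letting the cosine rise again.
    """
    progress = min(float(training_iter) / float(max_iters), 1.0)
    cosine_decay_value = 0.5 * (1.0 + math.cos(math.pi * progress))
    return initial_value * cosine_decay_value
```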
@staticmethod
def target_ema(training_iter, base_ema, max_iters):
Let's comment every method in byol_hooks.py and byol_losses.py and make sure they all have type hints.
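For reference, a commented, type-hinted sketch of what `target_ema` typically computes in BYOL (the schedule from the paper, tau = 1 - (1 - tau_base) * (cos(pi * k / K) + 1) / 2). Only the signature appears in the diff, so the body here is an assumption, not the PR's actual code:

```python
import math

def target_ema(training_iter: int, base_ema: float, max_iters: int) -> float:
    """EMA momentum schedule for the target network (BYOL paper):
    starts at `base_ema` and cosine-increases toward 1.0 over training,
    so target updates slow down as training progresses.
    """
    progress = min(float(training_iter) / float(max_iters), 1.0)
    decay = 0.5 * (1.0 + math.cos(math.pi * progress)) * (1.0 - base_ema)
    return 1.0 - decay
```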
def _build_byol_target_network(self, task: tasks.ClassyTask) -> None:
    """
    Create the model replica called the target. This will slowly track
Improve comment. Something like: "Target network is exponential moving average of online network, ... "
@torch.no_grad()
def on_forward(self, task: tasks.ClassyTask) -> None:
    """
    - Update the target model.
Update comment for BYOL (this was copy/pasted from moco).
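For context on what the updated comment should describe, a sketch of the EMA update that a BYOL `on_forward` hook typically performs under `@torch.no_grad()` (function name and signature are illustrative assumptions, not the PR's code):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def update_target_network(online_net: nn.Module,
                          target_net: nn.Module,
                          momentum: float) -> None:
    """EMA update of the target parameters:
    target <- momentum * target + (1 - momentum) * online.
    Runs under no_grad because the target is never backpropagated."""
    for tgt_p, onl_p in zip(target_net.parameters(), online_net.parameters()):
        tgt_p.data = tgt_p.data * momentum + onl_p.data * (1.0 - momentum)
```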
@register_loss("byol_loss")
class BYOLLoss(ClassyLoss):
    """
    TODO: change description
Change loss description.
and https://github.com/facebookresearch/moco for a reference implementation, reused here
Config params:
    embedding_dim (int): head output dimension
Do we need/use these vars?
    "_BYOLLossConfig", ["embedding_dim", "momentum"]
)
def regression_loss(x, y):
Type-hints + comments for all these functions.
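As an example of the requested style, a type-hinted, commented sketch of the BYOL regression loss (the standard formulation from the paper: mean squared error between l2-normalized vectors, equivalent to 2 - 2 * cosine similarity). The body is an assumption since only the signature appears in the diff:

```python
import torch
import torch.nn.functional as F

def regression_loss(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """BYOL regression loss between online predictions `x` and target
    projections `y`: l2-normalize both, then 2 - 2 * cosine similarity,
    averaged over the batch. Ranges from 0 (aligned) to 4 (opposite)."""
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    return (2.0 - 2.0 * (x * y).sum(dim=-1)).mean()
```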
@classmethod
def from_config(cls, config: BYOLLossConfig):
    """
    Instantiates BYOLLoss from configuration.
Put the config options in the docstring here.
def forward(self, online_network_prediction: torch.Tensor, *args, **kwargs) -> torch.Tensor:
    """
    Given the encoder queries, the key and the queue of the previous queries,
I think this comment was copy/pasted.
self.is_distributed = False

self.momentum = None
self.inv_momentum = None
Remove this.
self.momentum = None
self.inv_momentum = None
self.total_iters = None
Rename this max_iters.
Implementation of BYOL (https://arxiv.org/abs/2006.07733) on a single GPU.
Issue #190