Sharing Parameters across PyTorch modules

MC · 2 min read · Jul 30, 2024


Hello Medium! This is my first post, so I will start with something simple: how to share parameters between PyTorch modules.

What is Parameter Sharing?

Parameter sharing in PyTorch allows different parts of your neural network to use the same weights (i.e. parameters). This technique can:

  • Reduce the total number of parameters in your model
  • Help prevent overfitting (parameter sharing is basically a form of inductive bias)
  • Decrease memory usage

Let’s dive into how it works with some simple examples.

How to Share Parameters

In PyTorch, sharing parameters is as simple as assigning the same nn.Parameter object to multiple modules. Here's a basic example with linear layers:

import torch.nn as nn

hidden_size, output_size = 128, 10  # example dimensions

# Two linear layers with identical shapes
linear1 = nn.Linear(hidden_size, output_size)
linear2 = nn.Linear(hidden_size, output_size)

# Share the weight and bias of linear1 with linear2
linear2.weight = linear1.weight
linear2.bias = linear1.bias

In this example, linear1 and linear2 share the same weights and biases. For this to work, the shared parameters must have compatible shapes. Once the weight and bias of linear2 have been reassigned to those of linear1, any update to these parameters during training affects both modules, because both layers now reference the same underlying nn.Parameter objects.
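A quick way to see the sharing in action is to update one layer and check the other. Below is a minimal, self-contained sketch (the dimensions are arbitrary and chosen only for illustration):

import torch
import torch.nn as nn

hidden_size, output_size = 128, 10  # arbitrary example dimensions

linear1 = nn.Linear(hidden_size, output_size)
linear2 = nn.Linear(hidden_size, output_size)
linear2.weight = linear1.weight
linear2.bias = linear1.bias

print(linear1.weight is linear2.weight)  # True: same nn.Parameter object

# Take a gradient step through linear2 only...
optimizer = torch.optim.SGD(linear2.parameters(), lr=0.1)
loss = linear2(torch.randn(4, hidden_size)).sum()
loss.backward()
optimizer.step()

# ...and linear1 reflects exactly the same updated values.
print(torch.equal(linear1.weight, linear2.weight))  # True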

Practical Applications

Parameter sharing is useful in various scenarios:

  1. Siamese networks: When you need to process similar inputs in the same way.
  2. Weight-tied autoencoders: Where the decoder weights are tied to the encoder weights.
  3. Certain types of language models: Where you might want to share embeddings between the input and output layers (a minimal sketch of this follows the list).
  4. Time series models that use reversible instance normalization (Kim et al., 2022): Where inputs are normalized before being fed to the network, and the normalization is reversed at the output to help the model handle distribution shifts in the data.
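To make the third point concrete, here is a minimal, hypothetical sketch of a toy language model that ties its output projection to the input embedding matrix. The class, module, and dimension names are my own inventions for illustration, not taken from any specific library:

import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy language model with tied input/output embeddings (illustrative only)."""
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Tie the output projection to the embedding matrix.
        # The shapes are compatible: both are (vocab_size, d_model).
        self.lm_head.weight = self.embed.weight

    def forward(self, token_ids):
        h, _ = self.backbone(self.embed(token_ids))
        return self.lm_head(h)  # logits over the vocabulary

model = TinyLM()
logits = model(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])

Because the embedding and the output projection point to the same nn.Parameter, the tied weight is counted only once in sum(p.numel() for p in model.parameters()), which is where the parameter savings come from.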

Conclusion

Parameter sharing is a simple yet effective technique in PyTorch. By reducing the number of parameters, you can build more efficient models and encode useful inductive biases without necessarily sacrificing performance.

The best way to understand any concept is to experiment with it yourself.
