Issue
What is the correct way of sharing weights between two layers (modules) in PyTorch?
Based on my findings in the PyTorch discussion forum, there are several ways of doing this.
As an example, based on this discussion, I thought simply assigning the transposed weights would do it, that is:
self.decoder[0].weight = self.encoder[0].weight.t()
This, however, proved to be wrong and causes an error.
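For reference, here is a minimal sketch that reproduces the error, using hypothetical layer sizes (784 inputs, a 32-dimensional code) since the full model isn't shown: nn.Module only accepts an nn.Parameter (or None) when assigning to an attribute registered as a parameter, and .t() returns a plain tensor.
import torch
import torch.nn as nn

# Hypothetical encoder/decoder; the sizes 784 and 32 are assumptions.
encoder = nn.Sequential(nn.Linear(784, 32, bias=False))
decoder = nn.Sequential(nn.Linear(32, 784, bias=False))

try:
    # .t() returns a plain (view) tensor, not an nn.Parameter
    decoder[0].weight = encoder[0].weight.t()
except TypeError as err:
    print(err)  # cannot assign 'torch.FloatTensor' as parameter 'weight' ...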
I then tried wrapping the above line in an nn.Parameter():
self.decoder[0].weight = nn.Parameter(self.encoder[0].weight.t())
This eliminates the error, but then again, there is no sharing happening here; I have merely initialized a new tensor with the same values as encoder[0].weight.t().
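One way to see the lack of sharing, continuing the hypothetical encoder/decoder sketch above: an nn.Parameter is always a new autograd leaf, so gradients computed through the decoder never reach encoder[0].weight.
# Continuing the hypothetical sketch above: the wrapped weight is a new
# autograd leaf, so the decoder's gradients never reach the encoder's weight,
# i.e. the two layers would be optimized independently.
decoder[0].weight = nn.Parameter(encoder[0].weight.t())

out = decoder(torch.randn(1, 32))        # 32 is the assumed bottleneck size
out.sum().backward()

print(decoder[0].weight.grad is None)    # False: the new parameter got a gradient
print(encoder[0].weight.grad is None)    # True: nothing flowed back to the encoder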
I then found this link, which provides different ways of sharing weights. However, I'm skeptical that all the methods given there are actually correct.
For example, one way is demonstrated like this:
# tied autoencoder using off the shelf nn modules
class TiedAutoEncoderOffTheShelf(nn.Module):
    def __init__(self, inp, out, weight):
        super().__init__()
        self.encoder = nn.Linear(inp, out, bias=False)
        self.decoder = nn.Linear(out, inp, bias=False)

        # tie the weights
        self.encoder.weight.data = weight.clone()
        self.decoder.weight.data = self.encoder.weight.data.transpose(0, 1)

    def forward(self, input):
        encoded_feats = self.encoder(input)
        reconstructed_output = self.decoder(encoded_feats)
        return encoded_feats, reconstructed_output
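For concreteness, here is a usage sketch with assumed sizes (784 inputs, a 32-dimensional code, neither given in the post); the weight argument is expected in the encoder's (out, inp) shape:
import torch

init_weight = torch.randn(32, 784)                      # (out, inp) for the encoder
model = TiedAutoEncoderOffTheShelf(784, 32, init_weight)

feats, recon = model(torch.randn(8, 784))
print(feats.shape, recon.shape)   # torch.Size([8, 32]) torch.Size([8, 784])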
Basically, it creates a new weight tensor using nn.Parameter() and assigns it to each layer/module like this:
weights = nn.Parameter(torch.randn_like(self.encoder[0].weight))
self.encoder[0].weight.data = weights.clone()
self.decoder[0].weight.data = self.encoder[0].weight.data.transpose(0, 1)
This really confuses me: how is this sharing the same variable between the two layers? Isn't it just cloning the 'raw' data?
When I used this approach and visualized the weights, I noticed the visualizations were different, which made me even more certain something was not right.
I'm not sure whether the different visualizations were solely due to one weight being the transpose of the other, or whether, as I suspected, the weights were being optimized independently (i.e. not shared between the layers).
Example weight initialization:
Solution
As it turned out after further investigation (which was simply re-transposing the decoder's weight and visualizing it), the weights were indeed shared.
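The reason is that .transpose(0, 1) returns a view of the encoder weight's storage rather than a copy, so both parameters are backed by the same memory and in-place updates (such as standard optimizer steps) are visible through both. A programmatic version of the check, as a sketch with the same assumed sizes as above (784 inputs, a 32-dimensional code):
import torch
import torch.nn.functional as F

model = TiedAutoEncoderOffTheShelf(784, 32, torch.randn(32, 784))

# Both parameters are backed by the same storage (transpose is a view, not a copy):
print(model.encoder.weight.data_ptr() == model.decoder.weight.data_ptr())  # True

# They also stay in sync through training, since optimizer updates are in place:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(8, 784)
_, reconstruction = model(x)
loss = F.mse_loss(reconstruction, x)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Re-transposing the decoder's weight still matches the encoder's weight exactly:
print(torch.equal(model.decoder.weight.t(), model.encoder.weight))  # True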
Below is the visualization of the encoder's and decoder's weights:
Answered By - Hossein