Issue
What is the correct way of sharing weights between two layers (modules) in PyTorch?
Based on my findings in the PyTorch discussion forum, there are several ways of doing this.
As an example, based on this discussion, I thought simply assigning the transposed weights would do it, that is:
self.decoder[0].weight = self.encoder[0].weight.t()
This, however, proved to be wrong and causes an error.
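For reference, here is a minimal sketch that reproduces the error, using hypothetical layer sizes (784 inputs, a 32-dimensional code) since the full model isn't shown: nn.Module only accepts an nn.Parameter (or None) when assigning to an attribute registered as a parameter, and .t() returns a plain tensor.
import torch
import torch.nn as nn

# Hypothetical encoder/decoder; the sizes 784 and 32 are assumptions.
encoder = nn.Sequential(nn.Linear(784, 32, bias=False))
decoder = nn.Sequential(nn.Linear(32, 784, bias=False))

try:
    # .t() returns a plain (view) tensor, not an nn.Parameter
    decoder[0].weight = encoder[0].weight.t()
except TypeError as err:
    print(err)  # cannot assign 'torch.FloatTensor' as parameter 'weight' ...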
I then tried wrapping the above line in an nn.Parameter():
self.decoder[0].weight = nn.Parameter(self.encoder[0].weight.t())
This eliminates the error, but then again, there is no sharing happening here; I have merely initialized a new tensor with the same values as encoder[0].weight.t().
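One way to see the lack of sharing, continuing the hypothetical encoder/decoder sketch above: an nn.Parameter is always a new autograd leaf, so gradients computed through the decoder never reach encoder[0].weight.
# Continuing the hypothetical sketch above: the wrapped weight is a new
# autograd leaf, so the decoder's gradients never reach the encoder's weight,
# i.e. the two layers would be optimized independently.
decoder[0].weight = nn.Parameter(encoder[0].weight.t())

out = decoder(torch.randn(1, 32))        # 32 is the assumed bottleneck size
out.sum().backward()

print(decoder[0].weight.grad is None)    # False: the new parameter got a gradient
print(encoder[0].weight.grad is None)    # True: nothing flowed back to the encoder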
I then found this link, which provides different ways of sharing weights. However, I'm skeptical that all the methods given there are actually correct.
For example, one way is demonstrated like this:
# tied autoencoder using off the shelf nn modules
class TiedAutoEncoderOffTheShelf(nn.Module):
    def __init__(self, inp, out, weight):
        super().__init__()
        self.encoder = nn.Linear(inp, out, bias=False)
        self.decoder = nn.Linear(out, inp, bias=False)

        # tie the weights
        self.encoder.weight.data = weight.clone()
        self.decoder.weight.data = self.encoder.weight.data.transpose(0, 1)

    def forward(self, input):
        encoded_feats = self.encoder(input)
        reconstructed_output = self.decoder(encoded_feats)
        return encoded_feats, reconstructed_output
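For concreteness, here is a usage sketch with assumed sizes (784 inputs, a 32-dimensional code, neither given in the post); the weight argument is expected in the encoder's (out, inp) shape:
import torch

init_weight = torch.randn(32, 784)                      # (out, inp) for the encoder
model = TiedAutoEncoderOffTheShelf(784, 32, init_weight)

feats, recon = model(torch.randn(8, 784))
print(feats.shape, recon.shape)   # torch.Size([8, 32]) torch.Size([8, 784])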
Basically, it creates a new weight tensor using nn.Parameter() and assigns it to each layer/module like this:
weights = nn.Parameter(torch.randn_like(self.encoder[0].weight))
self.encoder[0].weight.data = weights.clone()
self.decoder[0].weight.data = self.encoder[0].weight.data.transpose(0, 1)
This really confuses me: how is this sharing the same variable between the two layers? Isn't it just cloning the 'raw' data?
When I used this approach and visualized the weights, I noticed the visualizations were different, which made me even more certain something was not right.
I'm not sure whether the different visualizations were solely due to one weight being the transpose of the other, or whether, as I suspected, the weights were being optimized independently (i.e. not shared between the layers).
Example weight initialization:
Solution
As it turned out after further investigation (which was simply re-transposing the decoder's weight and visualizing it), the weights were indeed shared.
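The reason is that .transpose(0, 1) returns a view of the encoder weight's storage rather than a copy, so both parameters are backed by the same memory and in-place updates (such as standard optimizer steps) are visible through both. A programmatic version of the check, as a sketch with the same assumed sizes as above (784 inputs, a 32-dimensional code):
import torch
import torch.nn.functional as F

model = TiedAutoEncoderOffTheShelf(784, 32, torch.randn(32, 784))

# Both parameters are backed by the same storage (transpose is a view, not a copy):
print(model.encoder.weight.data_ptr() == model.decoder.weight.data_ptr())  # True

# They also stay in sync through training, since optimizer updates are in place:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(8, 784)
_, reconstruction = model(x)
loss = F.mse_loss(reconstruction, x)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Re-transposing the decoder's weight still matches the encoder's weight exactly:
print(torch.equal(model.decoder.weight.t(), model.encoder.weight))  # True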
Below is the visualization of the encoder's and decoder's weights:
Answered By - Hossein