Issue
I'm trying to create a module that contains layers of nn.Parameter. If I initialize a layer as follows:
self.W = nn.Parameter(torch.randn(4,4), requires_grad=True).double()
then the layer doesn't get registered in the module's parameters.
However, this initialization does work:
self.W = nn.Parameter(torch.FloatTensor(4,4), requires_grad=True)
Full example:
class TestNet(nn.Module):
    def __init__(self):
        super(TestNet, self).__init__()
        self.W = nn.Parameter(torch.randn(4,4), requires_grad=True).double()

    def forward(self, x):
        x = torch.matmul(x, self.W.T)
        x = torch.sigmoid(x)
        return x

tnet = TestNet()
print(list(tnet.parameters()))
# Output: [] (an empty list)
Compared to:
class TestNet(nn.Module):
    def __init__(self):
        super(TestNet, self).__init__()
        self.W = nn.Parameter(torch.FloatTensor(4,4), requires_grad=True)

    def forward(self, x):
        x = torch.matmul(x, self.W.T)
        x = torch.sigmoid(x)
        return x

tnet = TestNet()
print(list(tnet.parameters()))
Which prints:
[Parameter containing:
tensor([[-1.8859e+26,  6.0240e-01,  1.0842e-19,  3.8177e-05],
        [ 1.5229e-27, -8.5899e+09,  1.5226e-27, -3.6893e+19],
        [ 4.2039e-45, -4.6566e-10,  1.5229e-27, -2.0000e+00],
        [ 2.8026e-45,  0.0000e+00,  0.0000e+00,  4.5918e-40]],
       requires_grad=True)]
So what is the difference? Why doesn't the torch.randn() version work? I couldn't find anything about this in the docs or in previous answers online.
Solution
Calling randn is completely fine. The issue is the .double() call at the end: it returns a new plain torch.Tensor rather than an nn.Parameter, and nn.Module.__setattr__ only registers attributes that are nn.Parameter instances, so self.W never ends up in the module's parameters. The fix is to convert the tensor before wrapping it in nn.Parameter:
class TestNet(nn.Module):
    def __init__(self):
        super(TestNet, self).__init__()
        self.W = nn.Parameter(torch.randn(4, 4, dtype=torch.double), requires_grad=True)
        # self.W = nn.Parameter(torch.randn(4,4).double(), requires_grad=True)  # also works

    def forward(self, x):
        x = torch.matmul(x, self.W.T)
        x = torch.sigmoid(x)
        return x

tnet = TestNet()
print(tnet.W.dtype)
# torch.float64
print(list(tnet.parameters()))
# [Parameter containing:
# tensor([[-1.9645, -1.5445,  0.2435,  0.4380],
#         [ 1.1403,  0.8836,  0.1811, -0.1212],
#         [ 1.5983, -0.1854, -0.2626,  0.2881],
#         [-1.2364, -0.4802, -0.6038,  0.1164]], dtype=torch.float64,
#        requires_grad=True)]
Now the code registers the parameter. I added dtype=torch.double to the randn call to make sure that self.W contains doubles, as before.
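You can check this directly; the following quick sketch (my addition, not part of the original answer) shows the type change that breaks registration:
import torch
import torch.nn as nn

p = nn.Parameter(torch.randn(4, 4))
print(type(p))                      # <class 'torch.nn.parameter.Parameter'>

q = p.double()                      # the conversion produces a new tensor
print(type(q))                      # <class 'torch.Tensor'>
print(isinstance(q, nn.Parameter))  # False, so nn.Module won't register it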
In summary: we cannot take an nn.Parameter, convert it to another data type, and expect the result to be registered as a weight of the network, because the converted tensor is no longer a Parameter.
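As a follow-up (my suggestion, not part of the original answer): if you want float64 weights without touching the layer definitions, you can also cast the whole module after construction. nn.Module.double() converts every registered parameter in place, so registration is preserved. Using the float32 variant of TestNet from the question:
tnet = TestNet().double()            # casts all registered parameters to float64
print(tnet.W.dtype)                  # torch.float64
print(len(list(tnet.parameters())))  # 1, the parameter is still registered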
Answered By - C-3PO