Issue
I'm trying to implement a deep supervision strategy in an encoder-decoder architecture using PyTorch. The idea is to compute a weighted sum of the outputs of three convolution layers (with learnable parameters w_i).
Suppose we have three tensors A, B, and C, all of identical shape (64, 48, 48, 48).
My goal is to compute a weighted linear sum of these three tensors, (w0 * A + w1 * B + w2 * C), where w0, w1, and w2 are parameters learned by the network.
Maybe I have to use torch.nn.Linear(in_features, out_features), but I don't know what the in and out features would be in this case.
Any suggestions please?
Solution
You could define a custom parameter tensor and store the w_i values in it, then compute the weighted sum of the tensors with those weights.
Register the custom parameter like so:
import torch
import torch.nn as nn

W = nn.Parameter(torch.rand(3))
You can either compute the sum by hand:
w0, w1, w2 = W
res = w0*A + w1*B + w2*C
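To make the weights trainable end to end, the parameter should live inside a module so the optimizer picks it up. Here is a minimal self-contained sketch; the class name WeightedSum is my own, not part of the original answer:

import torch
import torch.nn as nn

class WeightedSum(nn.Module):
    # Hypothetical helper: learns one scalar weight per input branch.
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.rand(3))

    def forward(self, A, B, C):
        w0, w1, w2 = self.W
        return w0 * A + w1 * B + w2 * C

layer = WeightedSum()
A = B = C = torch.rand(64, 48, 48, 48)
res = layer(A, B, C)  # shape: (64, 48, 48, 48)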
Or instead use torch.einsum for conciseness. Do note this approach doesn't depend on the number of components in your linear sum:
X = torch.stack([A, B, C])
res = torch.einsum('mbchw,m->bchw', X, W)
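As a quick sanity check (the random tensors below are just placeholders), the einsum result should match the manual sum:

import torch
A, B, C = torch.rand(3, 64, 48, 48, 48)
W = torch.nn.Parameter(torch.rand(3))
X = torch.stack([A, B, C])                 # shape: (3, 64, 48, 48, 48)
res = torch.einsum('mbchw,m->bchw', X, W)
manual = W[0] * A + W[1] * B + W[2] * C
assert torch.allclose(res, manual)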
Good thing you pointed out nn.Linear; you can actually pull it off with this layer. Notice that nn.Linear can take an n-dimensional tensor as input, shaped (batch_size, *, in_features), and will output (batch_size, *, out_features), where * can be any number of dimensions. In your case, in_features is the number of weights and out_features is 1. Looking at it differently: you only require one neuron to compute the weighted sum. Do keep in mind that nn.Linear adds a bias term by default, so pass bias=False to get a pure weighted sum. One important thing though is that the "feature" dimension must be last, i.e. the stack must be done on the last dimension:
W = nn.Linear(3, 1, bias=False)
res = W(torch.stack([A, B, C], dim=-1))
And the output shape will be:
>>> res.shape
torch.Size([64, 48, 48, 48, 1])
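If you need the result back in the original (64, 48, 48, 48) shape, one option (my addition, not from the original answer) is to squeeze the trailing singleton dimension:

# drop the trailing dimension of size 1 to recover the input shape
res = W(torch.stack([A, B, C], dim=-1)).squeeze(-1)  # (64, 48, 48, 48)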
Answered By - Ivan