Issue
I am trying to understand an example snippet that makes use of the PyTorch transposed convolution function, with documentation here, where in the docs the author writes:
"The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sizes of the input."
Consider the snippet below where a sample image of shape [1, 1, 4, 4]
containing all ones is input to a ConvTranspose2D
operation with arguments stride=2
and padding=1
with a weight matrix of shape (1, 1, 4, 4)
that has entries from a range between 1
and 16
(in this case dilation=1
and added_padding = 1*(4-1)-1 = 2
)
sample_im = torch.ones(1, 1, 4, 4).cuda()
sample_deconv = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False).cuda()
sample_deconv.weight = torch.nn.Parameter(
torch.tensor([[[[ 1., 2., 3., 4.],
[ 5., 6., 7., 8.],
[ 9., 10., 11., 12.],
[13., 14., 15., 16.]]]]).cuda())
Which yields:
>>> sample_deconv(sample_im)
tensor([[[[ 6., 12., 14., 12., 14., 12., 14., 7.],
[12., 24., 28., 24., 28., 24., 28., 14.],
[20., 40., 44., 40., 44., 40., 44., 22.],
[12., 24., 28., 24., 28., 24., 28., 14.],
[20., 40., 44., 40., 44., 40., 44., 22.],
[12., 24., 28., 24., 28., 24., 28., 14.],
[20., 40., 44., 40., 44., 40., 44., 22.],
[10., 20., 22., 20., 22., 20., 22., 11.]]]], device='cuda:0',
grad_fn=<CudnnConvolutionTransposeBackward>)
Now I have seen simple examples of transposed convolution without stride and padding. For instance, if the input is a 2x2
image [[2, 4], [0, 1]]
, and the convolutional filter with one output channel is [[3, 1], [1, 5]]
, then the resulting tensor of shape (1, 1, 3, 3)
can be seen as the sum of the four colored matrices in the image below:
The problem is I can't seem to find examples that use strides and/or padding in the same visualization. As per my snippet, I am having a very difficult time understanding how the padding is applied to the sample image, or how the stride works to get this output. Any insights appreciated, even just understanding how the 6
in the (0,0)
entry or the 12
in the (0,1)
entry of the resulting matrix are computed would be very helpful.
Solution
The output spatial dimensions of nn.ConvTranspose2d
are given by:
out = (x - 1)s - 2p + d(k - 1) + op + 1
where x
is the input spatial dimension and out
the corresponding output size, s
is the stride, d
the dilation, p
the padding, k
the kernel size, and op
the output padding.
If we keep the following operands:
For each value of the input, we compute a buffer (of the corresponding color) by calculating the product with each element of the kernel.
Here are the visualizations for s=1, p=0
, s=1, p=1
, s=2, p=0
, and s=2, p=1
:
s=1, p=0
: output is3x3
For the blue buffer, we have (1) 2*k_top-left = 2*3 = 6
; (2) 2*k_top-right = 2*1 = 2
; (3) 2*k_bottom-left = 2*1 = 2
; (4) 2*k_bottom-right = 2*5 = 10
.
s=1, p=1
: output is1x1
s=2, p=0
: output is4x4
s=2, p=2
: output is2x2
Answered By - Ivan
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.