Sunday, November 27, 2022

[FIXED] Understanding the PyTorch implementation of Conv2DTranspose

November 27, 2022 conv-neural-network, pytorch No comments

Issue

I am trying to understand an example snippet that makes use of the PyTorch transposed convolution function, with documentation here, where in the docs the author writes:

"The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sizes of the input."

Consider the snippet below where a sample image of shape [1, 1, 4, 4] containing all ones is input to a ConvTranspose2D operation with arguments stride=2 and padding=1 with a weight matrix of shape (1, 1, 4, 4) that has entries from a range between 1 and 16 (in this case dilation=1 and added_padding = 1*(4-1)-1 = 2)

sample_im = torch.ones(1, 1, 4, 4).cuda()
sample_deconv = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False).cuda()
sample_deconv.weight = torch.nn.Parameter(
    torch.tensor([[[[ 1.,  2.,  3.,  4.], 
                    [ 5.,  6.,  7.,  8.], 
                    [ 9., 10., 11., 12.], 
                    [13., 14., 15., 16.]]]]).cuda())

Which yields:

>>> sample_deconv(sample_im)
tensor([[[[ 6., 12., 14., 12., 14., 12., 14.,  7.],
          [12., 24., 28., 24., 28., 24., 28., 14.],
          [20., 40., 44., 40., 44., 40., 44., 22.],
          [12., 24., 28., 24., 28., 24., 28., 14.],
          [20., 40., 44., 40., 44., 40., 44., 22.],
          [12., 24., 28., 24., 28., 24., 28., 14.],
          [20., 40., 44., 40., 44., 40., 44., 22.],
          [10., 20., 22., 20., 22., 20., 22., 11.]]]], device='cuda:0',
       grad_fn=<CudnnConvolutionTransposeBackward>)

Now I have seen simple examples of transposed convolution without stride and padding. For instance, if the input is a 2x2 image [[2, 4], [0, 1]], and the convolutional filter with one output channel is [[3, 1], [1, 5]], then the resulting tensor of shape (1, 1, 3, 3) can be seen as the sum of the four colored matrices in the image below:

The problem is I can't seem to find examples that use strides and/or padding in the same visualization. As per my snippet, I am having a very difficult time understanding how the padding is applied to the sample image, or how the stride works to get this output. Any insights appreciated, even just understanding how the 6 in the (0,0) entry or the 12 in the (0,1) entry of the resulting matrix are computed would be very helpful.

Solution

The output spatial dimensions of nn.ConvTranspose2d are given by:

out = (x - 1)s - 2p + d(k - 1) + op + 1

where x is the input spatial dimension and out the corresponding output size, s is the stride, d the dilation, p the padding, k the kernel size, and op the output padding.

If we keep the following operands:

For each value of the input, we compute a buffer (of the corresponding color) by calculating the product with each element of the kernel.

Here are the visualizations for s=1, p=0, s=1, p=1, s=2, p=0, and s=2, p=1:

s=1, p=0: output is 3x3

For the blue buffer, we have (1) 2*k_top-left = 2*3 = 6; (2) 2*k_top-right = 2*1 = 2; (3) 2*k_bottom-left = 2*1 = 2; (4) 2*k_bottom-right = 2*5 = 10.

s=1, p=1: output is 1x1

s=2, p=0: output is 4x4

s=2, p=2: output is 2x2

Answered By - Ivan

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Sunday, November 27, 2022

[FIXED] Understanding the PyTorch implementation of Conv2DTranspose

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels