Issue
I'm working with very sparse vectors as input. I started with simple Linear layers (dense/fully connected) and my network yielded pretty good results (let's take accuracy as my metric here: 95.8%). I later tried a Conv1d with kernel_size=1 followed by a MaxPool1d, and this network works slightly better (96.4% accuracy).
Question: How are these two implementations different? Shouldn't a Conv1d with a unit kernel_size do the same as a Linear layer?
I've tried multiple runs; the CNN always yields slightly better results.
Solution
nn.Conv1d with a kernel size of 1 and nn.Linear give essentially the same results. The only differences are the initialization procedure and how the operations are applied (which has some effect on speed). Note that using a linear layer should be faster, as it is implemented as a simple matrix multiplication (plus a broadcasted bias vector).
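That "matrix multiplication plus broadcasted bias" claim is easy to verify directly. Here is a minimal sketch (not from the original answer) comparing nn.Linear against an explicit matmul:
import torch

# A Linear layer computes x @ W.T + b; a Conv1d with kernel_size=1
# applies the same per-position transformation across the sequence.
linear = torch.nn.Linear(8, 32)
x = torch.randn(128, 8)

manual = x @ linear.weight.T + linear.bias  # explicit matmul + broadcasted bias
print(torch.allclose(linear(x), manual))
# True (up to floating-point tolerance)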
@RobinFrcd, your results likely differ either due to MaxPool1d or due to the different initialization procedure.
Here are a few experiments to prove my claims:
import torch

def count_parameters(model):
    """Count the number of parameters in a model."""
    return sum(p.numel() for p in model.parameters())
conv = torch.nn.Conv1d(8,32,1)
print(count_parameters(conv))
# 288
linear = torch.nn.Linear(8,32)
print(count_parameters(linear))
# 288
print(conv.weight.shape)
# torch.Size([32, 8, 1])
print(linear.weight.shape)
# torch.Size([32, 8])
# use same initialization
linear.weight = torch.nn.Parameter(conv.weight.squeeze(2))
linear.bias = torch.nn.Parameter(conv.bias)
tensor = torch.randn(128, 256, 8)  # nn.Linear expects (*, in_features)
permuted_tensor = tensor.permute(0, 2, 1).clone().contiguous()  # nn.Conv1d expects (N, C, L)
out_linear = linear(tensor)
print(out_linear.mean())
# tensor(0.0067, grad_fn=<MeanBackward0>)
out_conv = conv(permuted_tensor)
print(out_conv.mean())
# tensor(0.0067, grad_fn=<MeanBackward0>)
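To go beyond comparing means, one can check (a small addition, not in the original answer) that the full output tensors match element-wise up to floating-point tolerance:
# After permuting the conv output back to (batch, length, channels),
# the two outputs should agree element-wise.
print(torch.allclose(out_linear, out_conv.permute(0, 2, 1), atol=1e-6))
# True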
Speed test:
%%timeit
_ = linear(tensor)
# 151 µs ± 297 ns per loop
%%timeit
_ = conv(permuted_tensor)
# 1.43 ms ± 6.33 µs per loop
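For scripts outside a notebook, a rough equivalent of the %%timeit cells above (my own suggestion, not part of the original answer) is torch.utils.benchmark; exact numbers will vary with hardware and PyTorch version:
import torch.utils.benchmark as benchmark

# Time both layers on the same inputs as above; printing a Timer
# measurement reports the mean time per run.
t_linear = benchmark.Timer(stmt="linear(tensor)",
                           globals={"linear": linear, "tensor": tensor})
t_conv = benchmark.Timer(stmt="conv(permuted_tensor)",
                         globals={"conv": conv, "permuted_tensor": permuted_tensor})
print(t_linear.timeit(1000))
print(t_conv.timeit(1000))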
As Hanchen's answer shows, the results can differ very slightly due to numerical precision.
Answered By - Yann Dubois