Issue
I'm working with very sparse vectors as input. I started with simple Linear layers (dense/fully connected) and my network yielded pretty good results (let's take accuracy as my metric here: 95.8%). I later tried a Conv1d with kernel_size=1 followed by a MaxPool1d, and this network works slightly better (96.4% accuracy).
Question: How are these two implementations different? Shouldn't a Conv1d with a unit kernel_size do the same as a Linear layer?
I've tried multiple runs; the CNN always yields slightly better results.
Solution
nn.Conv1d with a kernel size of 1 and nn.Linear give essentially the same results. The only differences are the initialization procedure and how the operations are applied (which has some effect on speed). Note that using a linear layer should be faster, as it is implemented as a simple matrix multiplication (plus a broadcasted bias vector).
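That "matrix multiplication plus broadcasted bias" claim is easy to verify directly. Here is a minimal sketch (not from the original answer) comparing nn.Linear against an explicit matmul:
import torch

# A Linear layer computes x @ W.T + b; a Conv1d with kernel_size=1
# applies the same per-position transformation across the sequence.
linear = torch.nn.Linear(8, 32)
x = torch.randn(128, 8)

manual = x @ linear.weight.T + linear.bias  # explicit matmul + broadcasted bias
print(torch.allclose(linear(x), manual))
# True (up to floating-point tolerance)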
@RobinFrcd, your results likely differ either due to MaxPool1d or due to the different initialization procedure.
Here are a few experiments to prove my claims:
import torch

def count_parameters(model):
    """Count the number of parameters in a model."""
    return sum(p.numel() for p in model.parameters())
conv = torch.nn.Conv1d(8,32,1)
print(count_parameters(conv))
# 288
linear = torch.nn.Linear(8,32)
print(count_parameters(linear))
# 288
print(conv.weight.shape)
# torch.Size([32, 8, 1])
print(linear.weight.shape)
# torch.Size([32, 8])
# use same initialization
linear.weight = torch.nn.Parameter(conv.weight.squeeze(2))
linear.bias = torch.nn.Parameter(conv.bias)
tensor = torch.randn(128, 256, 8)  # nn.Linear expects (*, in_features)
permuted_tensor = tensor.permute(0, 2, 1).clone().contiguous()  # nn.Conv1d expects (N, C, L)
out_linear = linear(tensor)
print(out_linear.mean())
# tensor(0.0067, grad_fn=<MeanBackward0>)
out_conv = conv(permuted_tensor)
print(out_conv.mean())
# tensor(0.0067, grad_fn=<MeanBackward0>)
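To go beyond comparing means, one can check (a small addition, not in the original answer) that the full output tensors match element-wise up to floating-point tolerance:
# After permuting the conv output back to (batch, length, channels),
# the two outputs should agree element-wise.
print(torch.allclose(out_linear, out_conv.permute(0, 2, 1), atol=1e-6))
# True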
Speed test:
%%timeit
_ = linear(tensor)
# 151 µs ± 297 ns per loop
%%timeit
_ = conv(permuted_tensor)
# 1.43 ms ± 6.33 µs per loop
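For scripts outside a notebook, a rough equivalent of the %%timeit cells above (my own suggestion, not part of the original answer) is torch.utils.benchmark; exact numbers will vary with hardware and PyTorch version:
import torch.utils.benchmark as benchmark

# Time both layers on the same inputs as above; printing a Timer
# measurement reports the mean time per run.
t_linear = benchmark.Timer(stmt="linear(tensor)",
                           globals={"linear": linear, "tensor": tensor})
t_conv = benchmark.Timer(stmt="conv(permuted_tensor)",
                         globals={"conv": conv, "permuted_tensor": permuted_tensor})
print(t_linear.timeit(1000))
print(t_conv.timeit(1000))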
As Hanchen's answer shows, the results can differ very slightly due to numerical precision.
Answered By - Yann Dubois