Issue
I am trying to use torchvision's video classification models (R3D, R(2+1)D, MC3), but my data is single-channel (grayscale video) while these models expect 3-channel input. To handle this I am trying to override the stem class. Can someone please confirm whether what I am doing is correct?
For R3D-18 and MC3-18, `stem=BasicStem`:
import torch
import torch.nn as nn
import torchvision

class BasicStemModified(nn.Sequential):
    def __init__(self):
        super(BasicStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(7, 7, 1),  # changing filter to 1 channel input
                      stride=(2, 2, 1), padding=(3, 3, 0),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),
            nn.Conv3d(45, 64, kernel_size=(1, 1, 3),
                      stride=(1, 1, 1), padding=(0, 0, 1),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))
model = torchvision.models.video.mc3_18(pretrained=False)
model.stem = BasicStemModified()  # here assigning the modified stem
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)
)
model.to('cuda:0')
For R(2+1)D, `stem=R2Plus1dStem`:
class R2Plus1dStemModified(nn.Sequential):
    """R(2+1)D stem differs from the default one in that it uses separated 3D convolutions."""
    def __init__(self):
        super(R2Plus1dStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(1, 7, 7),  # changing filter to 1 channel input
                      stride=(1, 2, 2), padding=(0, 3, 3),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),
            nn.Conv3d(45, 64, kernel_size=(3, 1, 1),
                      stride=(1, 1, 1), padding=(1, 0, 0),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))
model = torchvision.models.video.r2plus1d_18(pretrained=False)
model.stem = R2Plus1dStemModified()  # here assigning the modified stem
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)
)
model.to('cuda:0')
Solution
When switching from RGB to grayscale, the simplest way to go is to change the DATA and not the model:
If you have an input frame with only one channel (gray), you can simply expand
the singleton channel dimension to span three channels. This is trivial and allows you to use pre-trained models as-is.
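For example, a minimal sketch of this approach, assuming clips are batched as (N, C, T, H, W) tensors the way torchvision's video models expect:

import torch
import torchvision

clip = torch.randn(4, 1, 16, 112, 112)  # hypothetical grayscale batch: (N, 1, T, H, W)
clip_rgb = clip.expand(-1, 3, -1, -1, -1)  # broadcast the channel dim; no data is copied
# use clip.repeat(1, 3, 1, 1, 1) instead if a writable copy is needed

model = torchvision.models.video.mc3_18(pretrained=True)
model.eval()
with torch.no_grad():
    out = model(clip_rgb)  # the unmodified pretrained model works as-is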
If you insist on modifying the model - you can do so while preserving most of the pre-trained weights:
model = torchvision.models.video.mc3_18(pretrained=True)  # get the pretrained model
# modify only the first conv layer
origc = model.stem[0]  # the orig conv layer
# build a new layer with only one input channel
c1 = torch.nn.Conv3d(1, origc.out_channels, kernel_size=origc.kernel_size,
                     stride=origc.stride, padding=origc.padding,
                     bias=origc.bias is not None)  # bias expects a bool, not the original tensor
# this is the nice part - init the new weights using the original ones
with torch.no_grad():
    c1.weight.data = origc.weight.data.sum(dim=1, keepdim=True)
# swap the modified layer back into the model
model.stem[0] = c1
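Summing the pretrained weights over the RGB dimension means the new filter responds to a grayscale frame exactly as the original filter would respond to that frame replicated across all three channels. A quick sanity check, continuing from the snippet above and assuming grayscale clips shaped (N, 1, T, H, W):

x = torch.randn(2, 1, 8, 112, 112)  # hypothetical grayscale batch
with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([2, 400]) - the pretrained Kinetics-400 head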
Answered By - Shai