Issue
I am trying to use torchvision's video classification models (R3D, R(2+1)D, MC3), but my data is single-channel (grayscale video) while these models expect 3-channel input. To handle this I am trying to override the stem class. Can someone please confirm whether what I am doing is correct?
For R3D-18 and MC3-18, `stem=BasicStem`:
import torch
import torch.nn as nn
import torchvision

class BasicStemModified(nn.Sequential):
    def __init__(self):
        super(BasicStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(7, 7, 1),  # changing filter to 1 channel input
                      stride=(2, 2, 1), padding=(3, 3, 0),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),
            nn.Conv3d(45, 64, kernel_size=(1, 1, 3),
                      stride=(1, 1, 1), padding=(0, 0, 1),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))
model = torchvision.models.video.mc3_18(pretrained=False)
model.stem = BasicStemModified()  # here assigning the modified stem
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)
)
model.to('cuda:0')
For R(2+1)D, `stem=R2Plus1dStem`:
class R2Plus1dStemModified(nn.Sequential):
    """R(2+1)D stem differs from the default one in that it uses separated 3D convolutions."""
    def __init__(self):
        super(R2Plus1dStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(1, 7, 7),  # changing filter to 1 channel input
                      stride=(1, 2, 2), padding=(0, 3, 3),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),
            nn.Conv3d(45, 64, kernel_size=(3, 1, 1),
                      stride=(1, 1, 1), padding=(1, 0, 0),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))
model = torchvision.models.video.r2plus1d_18(pretrained=False)
model.stem = R2Plus1dStemModified()  # here assigning the modified stem
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)
)
model.to('cuda:0')
Solution
When switching from RGB to grayscale, the simplest way to go is to change the DATA and not the model:
If you have an input frame with only one channel (gray), you can simply expand
the singleton channel dimension to span three channels. This is trivial and allows you to use pre-trained models as-is.
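For example, a minimal sketch of this approach, assuming clips are batched as (N, C, T, H, W) tensors the way torchvision's video models expect:

import torch
import torchvision

clip = torch.randn(4, 1, 16, 112, 112)  # hypothetical grayscale batch: (N, 1, T, H, W)
clip_rgb = clip.expand(-1, 3, -1, -1, -1)  # broadcast the channel dim; no data is copied
# use clip.repeat(1, 3, 1, 1, 1) instead if a writable copy is needed

model = torchvision.models.video.mc3_18(pretrained=True)
model.eval()
with torch.no_grad():
    out = model(clip_rgb)  # the unmodified pretrained model works as-is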
If you insist on modifying the model - you can do so while preserving most of the pre-trained weights:
model = torchvision.models.video.mc3_18(pretrained=True)  # get the pretrained model
# modify only the first conv layer
origc = model.stem[0]  # the orig conv layer
# build a new layer with only one input channel
c1 = torch.nn.Conv3d(1, origc.out_channels, kernel_size=origc.kernel_size,
                     stride=origc.stride, padding=origc.padding,
                     bias=origc.bias is not None)  # bias expects a bool, not the original tensor
# this is the nice part - init the new weights using the original ones
with torch.no_grad():
    c1.weight.data = origc.weight.data.sum(dim=1, keepdim=True)
# swap the modified layer back into the model
model.stem[0] = c1
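Summing the pretrained weights over the RGB dimension means the new filter responds to a grayscale frame exactly as the original filter would respond to that frame replicated across all three channels. A quick sanity check, continuing from the snippet above and assuming grayscale clips shaped (N, 1, T, H, W):

x = torch.randn(2, 1, 8, 112, 112)  # hypothetical grayscale batch
with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([2, 400]) - the pretrained Kinetics-400 head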
Answered By - Shai