Thursday, May 19, 2022

[FIXED] Why does a model based on Dense layers give better results than one based on Conv2D?

Tags: conv-neural-network, machine-learning, tensorflow

Issue

In TensorFlow, training a model based on Dense layers gives better results than training a model based on equivalent Conv2D layers.

Results:

Using Dense: loss: 16.1930 - mae: 2.5369 - mse: 16.1930
Using Conv2D: loss: 83.7851 - mae: 6.5585 - mse: 83.7851

Should this be expected or are we doing something wrong?

The code we are using is the following (adapted from here):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

import numpy as np
import pandas as pd
import sys

model_type = int(sys.argv[1]) # 0: Dense, Else: Conv2D

verbose = 0

# load data & normalize

(train_features, train_labels), (test_features, test_labels) = keras.datasets.boston_housing.load_data()

train_mean = np.mean(train_features, axis=0)
train_std = np.std(train_features, axis=0)
train_features_norm = (train_features - train_mean) / train_std
test_features_norm = (test_features - train_mean) / train_std

train_labels_norm = train_labels
test_labels_norm = test_labels

input_height = train_features_norm.shape[1]

# model

if model_type == 0:
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height,)),  # one-element tuple, not a bare int
        layers.Dense(20, activation='relu'),
        layers.Dense(1)])

else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1))
    
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height, 1, 1)),
        layers.Conv2D(20, (input_height, 1), activation='relu'),
        layers.Conv2D(1, (1, 1))]) # replacing this layer with Dense(1) gives the same results
    
model.compile(
    optimizer=tf.optimizers.Adam(),
    loss='mse',
    metrics=['mae', 'mse'])

model.summary()

# training

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=50)

history = model.fit(
    train_features_norm,
    train_labels_norm,
    epochs=1000,
    verbose=verbose,
    validation_split=0.1,
    callbacks=[early_stop])  # early_stop was defined above but never passed to fit

# results

hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
print(hist)

rmse_final = np.sqrt(hist['val_mse'].iloc[-1])  # validation MSE of the last epoch
print('Final Root Mean Square Error on validation set: {}'.format(round(rmse_final, 3)))

# compare how the model performs on the test dataset
mse, _, _ = model.evaluate(test_features_norm, test_labels_norm)
rmse = np.sqrt(mse)
print('Root Mean Square Error on test set: {}'.format(round(rmse, 3)))

NOTE: model_type can be used to select a model based on Dense layers (= 0), or a model based on Conv2D (any other value).

Background

We have a system (BeagleBone AI using TIDL) which doesn't support Dense layers. It does, however, support Conv2D layers and, as far as we know, a Conv2D layer can be configured to be equivalent to a Dense layer.

For example, in a Dense layer with two units/outputs, no bias, and two inputs, the output is:

O1 = W11 * I1 + W12 * I2
O2 = W21 * I1 + W22 * I2

O - output, I - input, W - weight

Similarly, in a Conv2D layer with two 1x1 output channels, no bias, one 1x2 input channel, and one 1x2 kernel per output channel, the output is:

O1 = K11 * I11 + K12 * I12
O2 = K21 * I11 + K22 * I12

O - output channel, I - input channel, K - kernel weights

This means that mathematically they are equivalent. But training works better when the Dense layer is used.
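
The equivalence can be checked numerically. The following is a minimal NumPy sketch, not TensorFlow code: it assumes a "valid" convolution whose kernel covers the entire input, so each output channel reduces to a single weighted sum. The names x, W, image and kernels are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_units = 2, 2
x = rng.normal(size=n_inputs)             # inputs I1, I2
W = rng.normal(size=(n_units, n_inputs))  # W[j, i]: weight from input i to output j

# Dense layer without bias: O_j = sum_i W[j, i] * I_i
dense_out = W @ x

# Convolution whose kernel spans the whole 1x2 input channel:
# one kernel per output channel, each producing a single 1x1 value.
image = x.reshape(1, n_inputs)             # one 1x2 input channel
kernels = W.reshape(n_units, 1, n_inputs)  # kernels[j] is the 1x2 kernel of channel j
conv_out = np.array([np.sum(k * image) for k in kernels])

print(np.allclose(dense_out, conv_out))  # True
```

With the same weights, both formulations produce the same outputs, so any difference in training results has to come from somewhere else, such as the shapes of the data fed to the model.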

Solution

There are two issues here:

1. The shape of the features (None, input_height, 1) doesn't match the shape of the model's input (None, input_height, 1, 1).
2. The shape of the labels (None, 1) doesn't match the shape of the model's output (None, 1, 1, 1).

Each of these has an impact on the performance of the model. Both are needed to reach the performance level of the model based on Dense layers.
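
One plausible reason the label shape matters is broadcasting. The sketch below uses plain NumPy and assumes the loss takes an elementwise squared difference with NumPy-style broadcasting; Keras may adjust shapes in some cases, so treat this as an illustration rather than a description of its internals. The variable names are made up for the example.

```python
import numpy as np

batch = 4
y_true = np.arange(batch, dtype=float).reshape(batch, 1)  # labels, shape (4, 1)
y_pred = y_true.reshape(batch, 1, 1, 1).copy()            # perfect predictions, shape (4, 1, 1, 1)

# Mismatched shapes broadcast to (4, 1, 4, 1): every prediction is
# compared with every label in the batch, not only with its own.
mismatched = (y_pred - y_true) ** 2
print(mismatched.shape)   # (4, 1, 4, 1)
print(mismatched.mean())  # 2.5, although the predictions are perfect

# With the labels reshaped to the output shape, the comparison is one-to-one.
matched = (y_pred - y_true.reshape(batch, 1, 1, 1)) ** 2
print(matched.shape)      # (4, 1, 1, 1)
print(matched.mean())     # 0.0
```

Under this assumption, a loss computed on mismatched shapes rewards predictions close to the batch mean of the labels instead of predictions close to each sample's own label.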

Fix (add an extra dim to the features, reshape the labels):

if model_type == 0:
    ...

else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1, 1))

    train_labels_norm = np.reshape(train_labels_norm, (-1, 1, 1, 1))
    test_labels_norm = np.reshape(test_labels_norm, (-1, 1, 1, 1))
    
    ...

Should this be expected or are we doing something wrong?

No, this is not expected. I am not sure the original code can be considered wrong. Because the "missing" dimensions were of size 1, and because no error about mismatched shapes was raised (as usually happens), my expectation was that they didn't really matter. Well, they do.
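
Since no error is raised, an explicit check before training can catch this kind of mismatch. The following is a minimal sketch: check_shape is a hypothetical helper, not part of Keras, and the expected shapes are written out by hand for the Conv2D model with 13 input features. In a real script they would presumably be read from model.input_shape and model.output_shape.

```python
import numpy as np

def check_shape(array, expected_shape, name):
    """Raise ValueError unless array.shape matches expected_shape (None matches any size)."""
    actual = array.shape
    same_rank = len(actual) == len(expected_shape)
    if not same_rank or any(e is not None and e != a
                            for a, e in zip(actual, expected_shape)):
        raise ValueError(f"{name}: shape {actual} does not match {expected_shape}")

# Shapes of the Conv2D model, written out by hand (assumption: 13 features).
model_input_shape = (None, 13, 1, 1)
model_output_shape = (None, 1, 1, 1)

features = np.zeros((8, 13, 1))  # missing the trailing dimension
labels = np.zeros((8, 1))        # rank 2 instead of rank 4

for array, expected, name in [(features, model_input_shape, "features"),
                              (labels, model_output_shape, "labels")]:
    try:
        check_shape(array, expected, name)
    except ValueError as err:
        print(err)
```

Both checks fail for the original shapes and pass once the arrays are reshaped to (-1, 13, 1, 1) and (-1, 1, 1, 1).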

Thank you @elbe. Your answer was key for me to realize the issues above.

Answered By - Adriano Carvalho

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
