Issue
In TensorFlow, training a model based on Dense layers gives better results than training a model based on equivalent Conv2D layers.
Results:
- Using Dense: loss: 16.1930 - mae: 2.5369 - mse: 16.1930
- Using Conv2D: loss: 83.7851 - mae: 6.5585 - mse: 83.7851
Should this be expected or are we doing something wrong?
The code we are using is the following (adapted from here):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import pandas as pd
import sys
model_type = int(sys.argv[1]) # 0: Dense, Else: Conv2D
verbose = 0
# load data & normalize
(train_features, train_labels), (test_features, test_labels) = keras.datasets.boston_housing.load_data()
train_mean = np.mean(train_features, axis=0)
train_std = np.std(train_features, axis=0)
train_features_norm = (train_features - train_mean) / train_std
test_features_norm = (test_features - train_mean) / train_std
train_labels_norm = train_labels
test_labels_norm = test_labels
input_height = train_features_norm.shape[1]
# model
if model_type == 0:
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height,)),
        layers.Dense(20, activation='relu'),
        layers.Dense(1)])
else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1))
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height, 1, 1)),
        layers.Conv2D(20, (input_height, 1), activation='relu'),
        layers.Conv2D(1, (1, 1))])  # replacing this layer with Dense(1) gives the same results
model.compile(
    optimizer=tf.optimizers.Adam(),
    loss='mse',
    metrics=['mae', 'mse'])
model.summary()
# training
# note: early_stop is never passed to model.fit (no callbacks=[early_stop]), so all 1000 epochs run
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=50)
history = model.fit(
    train_features_norm,
    train_labels_norm,
    epochs=1000,
    verbose=verbose,
    validation_split=0.1)
# results
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
print(hist)
rmse_final = np.sqrt(float(hist['val_mse'].iloc[-1]))
print('Final Root Mean Square Error on validation set: {}'.format(round(rmse_final, 3)))
# compare how the model performs on the test dataset
mse, _, _ = model.evaluate(test_features_norm, test_labels_norm)
rmse = np.sqrt(mse)
print('Root Mean Square Error on test set: {}'.format(round(rmse, 3)))
NOTE: model_type can be used to select a model based on Dense layers (= 0), or a model based on Conv2D (any other value).
Background
We have a system (BeagleBone AI using TIDL) which doesn't support Dense layers. It does, however, support Conv2D layers and, as far as we know, a Conv2D can be configured to be equivalent to a Dense layer.
For example, in a Dense layer with two units/outputs, no bias, and two inputs, the output is:
- O1 = W11 * I1 + W12 * I2
- O2 = W21 * I1 + W22 * I2
O - output, I - input, W - weight
Similarly, in a Conv2D layer with two 1x1 output channels, no bias, one 1x2 input channel, and a 1x2 kernel, the output is:
- O1 = K11 * I11 + K12 * I12
- O2 = K21 * I11 + K22 * I12
O - output channel, I - input channel, K - kernel weights
This means that, mathematically, they are equivalent. Yet training works better when the Dense layer is used.
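The equivalence described above can be checked numerically. The sketch below uses plain NumPy instead of TensorFlow (an assumption made to keep it self-contained); `dense` and `conv2d_valid` are hypothetical helpers. `conv2d_valid` is a naive stride-1 "valid" cross-correlation in channels-last layout, with the kernel stored as (kernel_height, kernel_width, in_channels, out_channels), which is the layout Keras uses for Conv2D kernels. A kernel of size (input_height, 1) applied to an (input_height, 1, 1) input reproduces a Dense layer with the same weights and bias.

```python
import numpy as np


def dense(x, w, b):
    """Dense layer without activation: x (batch, n_in), w (n_in, n_out), b (n_out,)."""
    return x @ w + b


def conv2d_valid(x, k, b):
    """Naive 'valid' cross-correlation with stride 1, channels-last.

    x: (batch, H, W, C_in), k: (kh, kw, C_in, C_out), b: (C_out,)
    returns (batch, H - kh + 1, W - kw + 1, C_out)
    """
    batch, h, w, _ = x.shape
    kh, kw, _, c_out = k.shape
    out = np.zeros((batch, h - kh + 1, w - kw + 1, c_out))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[:, i:i + kh, j:j + kw, :]  # (batch, kh, kw, C_in)
            # contract over kernel height, kernel width and input channels
            out[:, i, j, :] = np.tensordot(patch, k, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


rng = np.random.default_rng(0)
batch, input_height, units = 5, 13, 20  # 13 features and 20 units, as in the question

x = rng.normal(size=(batch, input_height))
w = rng.normal(size=(input_height, units))
b = rng.normal(size=units)

y_dense = dense(x, w, b)  # shape (5, 20)

# The same weights used as a Conv2D kernel: height input_height, width 1, 1 input channel, 20 filters
k = w.reshape(input_height, 1, 1, units)
y_conv = conv2d_valid(x.reshape(batch, input_height, 1, 1), k, b)

print(y_conv.shape)                                      # (5, 1, 1, 20)
print(np.allclose(y_dense, y_conv.reshape(batch, units)))  # True
```

Adding a ReLU to both keeps the outputs identical, since the activation is applied elementwise after the linear part.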
Solution
There are two issues here:
- The shape of the features (None, input_height, 1) doesn't match the shape of the model's input (None, input_height, 1, 1).
- The shape of the labels (None, 1) doesn't match the shape of the model's output (None, 1, 1, 1).
Each of these mismatches affects the model's performance, and both must be fixed to reach the performance of the model based on Dense layers.
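A plausible mechanism for the label mismatch: subtracting arrays of shapes (N, 1) and (N, 1, 1, 1) does not fail, it broadcasts to (N, 1, N, 1), so every prediction is compared with every label in the batch. The NumPy sketch below uses toy labels chosen for illustration. It assumes that Keras' 'mse' loss broadcasts y_true against y_pred in the same way when their ranks differ by more than one, which is consistent with the training raising no error, but is not confirmed here. Under that broadcasting, even perfect predictions score 2 * Var(y), while always predicting the label mean scores Var(y), so training is pulled towards predicting the mean.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean of squared differences, letting NumPy broadcast the two arrays."""
    return float(np.mean((y_pred - y_true) ** 2))


y = np.array([[10.0], [20.0], [30.0], [40.0]])  # toy labels, shape (N, 1) with N = 4
variance = float(np.var(y))                      # 125.0

perfect = y.reshape(-1, 1, 1, 1)                 # "perfect" predictions, shape (N, 1, 1, 1)
print((perfect - y).shape)                       # (4, 1, 4, 1): every prediction vs. every label

print(mse(y, perfect))                           # 250.0 = 2 * variance, despite perfect predictions
print(mse(y, np.full_like(perfect, y.mean())))   # 125.0 = variance, by always predicting the mean
print(mse(y.reshape(-1, 1, 1, 1), perfect))      # 0.0 once the labels are reshaped to (N, 1, 1, 1)
```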
Fix (add an extra dim to the features, reshape the labels):
if model_type == 0:
    ...
else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1, 1))
    train_labels_norm = np.reshape(train_labels_norm, (-1, 1, 1, 1))
    test_labels_norm = np.reshape(test_labels_norm, (-1, 1, 1, 1))
    ...
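Because the mismatched shapes produced no error, an explicit check before calling fit can catch this kind of problem early. The helper below, `shape_mismatches`, is hypothetical (it is not part of Keras); with a Keras model one could pass `model.input_shape[1:]` and `model.output_shape[1:]` as the expected per-sample shapes. The arrays are zero-filled placeholders with 13 features, and the expected shapes are those of the Conv2D model from the question.

```python
import numpy as np


def shape_mismatches(features, labels, input_shape, output_shape):
    """Return a list of per-sample shape mismatches (empty if everything matches).

    input_shape and output_shape exclude the batch dimension,
    e.g. model.input_shape[1:] and model.output_shape[1:] for a Keras model.
    """
    problems = []
    if tuple(features.shape[1:]) != tuple(input_shape):
        problems.append(f"features {features.shape[1:]} != model input {tuple(input_shape)}")
    if tuple(labels.shape[1:]) != tuple(output_shape):
        problems.append(f"labels {labels.shape[1:]} != model output {tuple(output_shape)}")
    return problems


input_height = 13
features = np.zeros((8, input_height))  # placeholder: 8 samples, 13 features
labels = np.zeros((8, 1))

model_in, model_out = (input_height, 1, 1), (1, 1, 1)  # per-sample shapes of the Conv2D model

# Shapes used in the question: (N, input_height, 1) and (N, 1)
before = shape_mismatches(features.reshape(-1, input_height, 1), labels, model_in, model_out)
print(before)  # two mismatches, one for the features and one for the labels

# Shapes after the fix: (N, input_height, 1, 1) and (N, 1, 1, 1)
after = shape_mismatches(features.reshape(-1, input_height, 1, 1),
                         labels.reshape(-1, 1, 1, 1), model_in, model_out)
print(after)  # []
```

In a training script one would raise an error (for example `ValueError("; ".join(problems))`) whenever the returned list is not empty, before calling `model.fit`.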
Should this be expected or are we doing something wrong?
No, this is not expected, although I am not sure the original code can be considered wrong. Because the "missing" dimensions were of size 1, and because Keras didn't complain about mismatching shapes (as it usually does), I expected them not to matter. Well, they do.
Thank you @elbe. Your answer was key for me to realize the issues above.
Answered By - Adriano Carvalho