Issue
I am a data analyst trying to improve my knowledge of machine learning.
I've completed a model for a time series dataset where each point is one day apart, with no gaps. The specific model I have attempted is a multi-layer autoregressive bidirectional LSTM built with TensorFlow's Keras; see the model-specific code below:
from tensorflow import keras
from tensorflow.keras.layers import Bidirectional, LSTM

model = keras.Sequential()
# Only the first layer's input_shape is used; Keras ignores it on later layers.
model.add(Bidirectional(LSTM(
    units=128,
    input_shape=(X_train.shape[1], X_train.shape[2]),
    return_sequences=True)))
model.add(Bidirectional(LSTM(
    units=64,
    input_shape=(X_train.shape[1], X_train.shape[2]),
    return_sequences=True)))
model.add(Bidirectional(LSTM(
    units=32,
    input_shape=(X_train.shape[1], X_train.shape[2]),
    return_sequences=True)))
# Final recurrent layer returns only the last timestep's output.
model.add(Bidirectional(LSTM(
    units=16,
    input_shape=(X_train.shape[1], X_train.shape[2]),
    return_sequences=False)))
model.add(keras.layers.Dense(16))
model.add(keras.layers.Dropout(rate=0.5))
model.add(keras.layers.Dense(1))
model.compile(loss='mean_squared_error', optimizer='Adam')
history = model.fit(
    X_train, y_train,
    epochs=100,
    batch_size=128,
    validation_split=0.2,
    shuffle=False  # preserve temporal order
)
model.summary()
I've been told by a senior member of staff that this is likely overkill for this specific learning task, but I wanted to include it for full transparency. See the summary below:
Layer (type)                     Output Shape          Param #
=================================================================
bidirectional (Bidirectional)    (None, 50, 256)       133120
bidirectional_1 (Bidirectional)  (None, 50, 128)       164352
bidirectional_2 (Bidirectional)  (None, 50, 64)        41216
bidirectional_3 (Bidirectional)  (None, 32)            10368
dense (Dense)                    (None, 16)            528
dropout (Dropout)                (None, 16)            0
dense_1 (Dense)                  (None, 1)             17
=================================================================
Total params: 349,601
Trainable params: 349,601
Non-trainable params: 0
_________________________________________________________________
After 100 epochs, the model reports the following loss values (Mean Squared Error):
loss: 0.0040 - val_loss: 0.0050 (overfit)
The RMSE values were derived with math.sqrt(mean_squared_error(y_train, train_predict)) and math.sqrt(mean_squared_error(y_test, test_predict)), using the mean_squared_error function from sklearn.metrics:
Train RMSE: 28.795422522129595
Test RMSE: 34.17014386085355
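For reference, a minimal sketch of that computation (assumptions: X_test and y_test are defined analogously to the training arrays, and train_predict/test_predict are the model's predictions on the original scale; the gap between the ~0.004 MSE loss and an RMSE of ~29 suggests the loss was computed on scaled data, in which case an inverse transform would be applied first):

import math
from sklearn.metrics import mean_squared_error

# Model predictions; if the targets were scaled before training,
# apply the scaler's inverse_transform here first (assumed).
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

print("Train RMSE:", math.sqrt(mean_squared_error(y_train, train_predict)))
print("Test RMSE:", math.sqrt(mean_squared_error(y_test, test_predict)))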
And for a graphical representation: [prediction plot not included in this post]
This brings me, finally, to my question: how do I better fit my model so that it more closely represents the noise within the data, which I believe is what is causing the high RMSE values? I have looked into attention mechanisms, in the hope of highlighting specific peaks and troughs within the data, but these seem to be best suited to image/text prediction models. I could train over more epochs, but the model is already slightly overfit, so that would only exacerbate this particular issue.
I understand this is a fairly open-ended question, but I have tried my best to "show my working". Thank you in advance.
Solution
This does look like massive overkill for the task. Start by reducing the number of LSTM layers and adding dropout, both between the LSTM layers and within each LSTM (recurrent dropout).
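As a minimal sketch of that suggestion (the layer sizes and dropout rates below are illustrative assumptions, not tuned values), a smaller stack with Dropout between the recurrent layers and recurrent_dropout inside each LSTM might look like this:

from tensorflow import keras
from tensorflow.keras.layers import Bidirectional, LSTM, Dense, Dropout

model = keras.Sequential()
model.add(Bidirectional(LSTM(
    units=64,
    input_shape=(X_train.shape[1], X_train.shape[2]),
    return_sequences=True,
    dropout=0.2,              # dropout on the layer's inputs
    recurrent_dropout=0.2)))  # dropout on the recurrent connections
model.add(Dropout(0.2))       # dropout between the LSTM layers
model.add(Bidirectional(LSTM(
    units=32,
    dropout=0.2,
    recurrent_dropout=0.2)))  # return_sequences=False by default
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

The idea is that fewer parameters, combined with regularisation inside the recurrent layers, should narrow the train/validation gap before anything else is tuned.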
Answered By - KonstantinosKokos